diff --git a/.codex/AGENTS_EXTRA.md b/.codex/AGENTS_EXTRA.md index baf97a99..5dc66f4f 100644 --- a/.codex/AGENTS_EXTRA.md +++ b/.codex/AGENTS_EXTRA.md @@ -1,190 +1,96 @@ -# Critical - -1. Always load and use the .codex/AGENTS_EXTRA.md, if it exists, when working on the project. AGENTS_EXTRA.md contains specialized project-tailored information. -2. Before starting actual work on documentation, but not earlier than that, load the .codex/PROTOCOL_AFAD.md, and use it for all your work on the documentation. - ---- +# FTLLexEngine Project Directives # 1. ARCHITECTURAL PRIME DIRECTIVE ## 1.1 Library Identity -FTLLexEngine is the Python runtime for the **Fluent Template Language specification**, with -**CLDR-backed locale-aware formatting** and **fail-fast boot validation with structured audit evidence**. Every -public symbol must arise from one of these three purposes. The library is not a general -utilities collection, not a financial domain toolkit, not a concurrency framework — it is -the i18n layer that production systems build directly on top of, and nothing else. +FTLLexEngine is the Python runtime for the **Fluent Template Language specification**, with **CLDR-backed locale-aware formatting** and **fail-fast boot validation with structured audit evidence**. Every public symbol must arise from one of these three purposes. The library is not a general utilities collection, not a financial domain toolkit, not a concurrency framework — it is the i18n layer that production systems build directly on top of, and nothing else. -The primary use case is production systems where every locale resource must load cleanly, -every message schema must match exactly, and every failure must produce named, traceable -evidence — regulated deployments, audited backends, compliance-constrained services. This -purpose drives every API design decision. 
+The primary use case is production systems where every locale resource must load cleanly, every message schema must match exactly, and every failure must produce named, traceable evidence — regulated deployments, audited backends, compliance-constrained services. This purpose drives every API design decision. **Three Design Axioms:** -**Axiom 1 — Downstream Burden Elimination:** -Before adding any symbol to a public facade, ask: *what downstream composition does this -replace?* Every public surface must eliminate a pattern that serious callers would otherwise -implement themselves. `require_locale_code()` replaced per-caller trim/blank/length/normalize -chains. `LocalizationBootConfig` replaced per-caller boot sequence assembly. `make_fluent_number()` -replaced per-caller visible-precision inference. Primitives that serve only internal composition -belong in submodules, not in `ftllexengine`, `ftllexengine.runtime`, or -`ftllexengine.localization`. - -**Axiom 2 — Fail-Fast at Boot, Structured Evidence at Runtime:** -Validate everything before accepting traffic. The canonical boot chain — -`LocalizationBootConfig.boot()`, or `FluentLocalization` + `require_clean()` + -`validate_message_schemas()` — raises `IntegrityCheckFailedError` if any resource fails to -load cleanly or any schema mismatches. At runtime, errors are returned as immutable structured -evidence (`FrozenFluentError`, `WriteLogEntry`, `LoadSummary`) so callers can build auditable, -loggable, compliant systems on top. Silent degradation is prohibited; all failures are explicit. - -**Axiom 3 — Explicit Failures, Immutable Evidence:** -Every failure produces a named, typed, immutable error object with structured context. -`strict=True` is the default on `FluentBundle` and `FluentLocalization` — exceptions, not -silent empty strings, are the correct response to integrity failures. 
`strict=False` is an -explicit opt-in for soft-error return semantics where `format_pattern` returns a -`(result, errors)` tuple. Audit structures (`WriteLogEntry`, `IntegrityContext`) carry dual -timestamps (`timestamp` for monotonic ordering, `wall_time_unix` for cross-system correlation) -because compliance traces must be reproducible across restarts. - -**API Design Review — apply before any new public surface:** +**Axiom 1 — Downstream Burden Elimination.** +Before adding any symbol to a public facade, ask: *what downstream composition does this replace?* Every public surface must eliminate a pattern that serious callers would otherwise implement themselves. `require_locale_code()` replaced per-caller trim/blank/length/normalize chains. `LocalizationBootConfig` replaced per-caller boot sequence assembly. `make_fluent_number()` replaced per-caller visible-precision inference. Primitives that serve only internal composition belong in submodules, not on `ftllexengine`, `ftllexengine.runtime`, or `ftllexengine.localization`. + +**Axiom 2 — Fail-Fast at Boot, Structured Evidence at Runtime.** +Validate everything before accepting traffic. The canonical boot chain — `LocalizationBootConfig.boot()`, or `FluentLocalization` + `require_clean()` + `validate_message_schemas()` — raises `IntegrityCheckFailedError` if any resource fails to load cleanly or any schema mismatches. At runtime, errors are returned as immutable structured evidence (`FrozenFluentError`, `WriteLogEntry`, `LoadSummary`) so callers can build auditable, loggable, compliant systems on top. Silent degradation is prohibited; all failures are explicit. + +**Axiom 3 — Explicit Failures, Immutable Evidence.** +Every failure produces a named, typed, immutable error object with structured context. `strict=True` is the default on `FluentBundle` and `FluentLocalization` — exceptions, not silent empty strings, are the correct response to integrity failures. 
`strict=False` is an explicit opt-in for soft-error return semantics where `format_pattern` returns a `(result, errors)` tuple. Audit structures (`WriteLogEntry`, `IntegrityContext`) carry dual timestamps (`timestamp` for monotonic ordering, `wall_time_unix` for cross-system correlation) because compliance traces must be reproducible across restarts. + +**API design review — apply before any new public surface:** + 1. What downstream composition does this replace? (Axiom 1) 2. Does construction fail fast? Does runtime return immutable structured evidence? (Axiom 2) -3. Does it belong in a facade `__init__`, or is it an internal primitive? (see §1.5) -4. Does it introduce any upward layer dependency? (see §1.5) -5. Does it fall within one of the owned domains in §1.6 — FTL spec, CLDR locale - formatting, compliance boot/audit, ISO 4217, or ISO 3166? Apply the full rejection - test (§1.6) before answering yes. - -## 1.2 Runtime Environment Constraints (Python 3.13+) -**Constraint:** The solution space targets **Python 3.13** as the baseline, targeting forward -compatibility with the current and next CPython release by avoiding constructs documented as -deprecated or removed. -* **Version Support Policy:** Support the baseline release and forward. Current values (update - when a new CPython stable release occurs): baseline=3.13, current=3.14, next=3.15. -* **Forward Compatibility:** Use only stable language features. Avoid deprecated constructs and - CPython-specific internals that may change between releases. -* **Syntax Enforcement System:** - * **Type Topology:** Leverage **PEP 695** generics and `type` aliases as the foundational - data modeling layer (e.g., `class Buffer[T]: ...`). Type hints are not documentation; - they are structural contracts. - * **Control Flow:** Utilize `match/case` structural pattern matching as the primary dispatch - mechanism, reducing the cyclomatic complexity inherent in `if/elif` chains. 
-* **Dependency Isolation:** The **Python Standard Library** is the sole permitted toolkit, with - the below stated explicit permitted exception. External dependencies are treated as system - contaminants and are prohibited unless creating the solution within the Standard Library - bounds is not achievable. - * **Permitted Exception:** `Babel` is the sole external dependency (optional), providing - Unicode CLDR locale data (plural rules, currency symbols, number formatting). CLDR data - is a curated international standard dataset that cannot be derived algorithmically. Babel - is the canonical Python interface to CLDR. - * **Babel Optionality:** Babel is an **optional** dependency. The package supports two - installation modes: - * **Parser-only** (`pip install ftllexengine`): No external dependencies. Provides - syntax parsing (`parse_ftl`, `serialize_ftl`), AST manipulation, and validation. - * **Full runtime** (`pip install ftllexengine[babel]`): Includes Babel for locale-aware - formatting via `FluentBundle`, `FluentLocalization`, and `ftllexengine.parsing` - modules. -* **Obsolescence Filter:** The system strictly rejects features scheduled for removal. - * Legacy import mechanics (`imp`, `sys.path`) and pre-PEP 695 typing (e.g., `typing.List`) - are structurally invalid inputs. +3. Does it belong on a facade `__init__`, or is it an internal primitive? (§1.5) +4. Does it introduce any upward layer dependency? (§1.5) +5. Does it fall within one of the owned domains in §1.6 — FTL spec, CLDR locale formatting, compliance boot/audit, ISO 4217, or ISO 3166? Apply the full rejection test (§1.6) before answering yes. + +## 1.2 Runtime Environment + +General Python 3.13 posture (PEP 695 generics, `match/case` dispatch, free-threaded/JIT considerations, removed-module list, packaging discipline) is in `AGENTS_PYTHON313.md`. The project-specific additions are below. + +* **Baseline:** Python 3.13 (current=3.14, next=3.15). 
Avoid 3.14+ syntax until the baseline is raised. +* **Dependencies:** Babel is the **sole permitted external dependency** and is **optional**. CLDR locale data is a curated international standard dataset that cannot be derived algorithmically; Babel is the canonical Python interface. All other functionality must be Standard Library only. +* **Two install modes:** + * **Parser-only** (`pip install ftllexengine`): no external dependencies. Provides `parse_ftl`, `serialize_ftl`, AST manipulation, and validation. + * **Full runtime** (`pip install ftllexengine[babel]`): includes Babel for `FluentBundle`, `FluentLocalization`, and `ftllexengine.parsing` modules. ## 1.3 Structural Mechanics -* **Immutability Protocol:** State mutation creates hidden coupling and non-determinism. The - system defaults to **Immutable Data Structures** (`frozen=True` dataclasses, `tuples`) to - enforce referential transparency. Mutation is permitted in exactly two bounded cases: - 1. **Performance-critical accumulation buffers:** isolated parse-buffer components where - temporary accumulation is the direct implementation mechanism (e.g., parser's internal - character/token accumulation). - 2. **Scoped context managers:** classes implementing the `__enter__`/`__exit__` protocol - where tracked mutable state (e.g., a depth counter) has deterministic enter/exit - lifetime and no external visibility (e.g., `DepthGuard`). -* **Explicit Control Topology:** Implicit behavior and "magic" methods increase cognitive load - and reduce auditability. The system demands explicit control flow and dependency injection - over global state or `threading.local` thread-local storage. **ContextVars - (`contextvars.ContextVar`) are permitted** for task-scoped state in high-frequency primitive - operations — they provide automatic async task isolation and do not share state between - concurrent parse operations. 
Any `ContextVar` usage MUST be documented as an architectural - decision per §3.6 and included in the Known Waiver Registry (§3.7). -* **Constants Placement Policy:** The `constants.py` module is for **cross-package - configuration constants** (depth limits, cache sizes, input bounds). Module-local private - constants (leading underscore) that are semantic to a single module's functionality belong IN - that module, not in `constants.py`. Examples: Unicode escape lengths in parser primitives, - indentation strings in serializer, cache tuning parameters in cache implementation. This - follows the principle of locality — implementation details stay with their implementation. + +* **Immutability protocol.** State mutation creates hidden coupling and non-determinism. The system defaults to **immutable data structures** (`frozen=True` dataclasses, tuples) to enforce referential transparency. Mutation is permitted in exactly two bounded cases: + 1. **Performance-critical accumulation buffers:** isolated parse-buffer components where temporary accumulation is the direct implementation mechanism (e.g., parser's internal character/token accumulation). + 2. **Scoped context managers:** classes implementing `__enter__`/`__exit__` where tracked mutable state has deterministic enter/exit lifetime and no external visibility (e.g., `DepthGuard`). + +* **Explicit control topology.** Implicit behavior and "magic" methods increase cognitive load and reduce auditability. Prefer explicit control flow and dependency injection over global state or `threading.local`. **`contextvars.ContextVar` is permitted** for task-scoped state in high-frequency primitive operations — it provides automatic async task isolation and does not share state between concurrent parse operations. Any `ContextVar` usage MUST be documented as an architectural decision per §3.6 and included in the Known Waiver Registry (§3.7). 
+ +* **Constants placement.** `constants.py` is for **cross-package configuration constants** (depth limits, cache sizes, input bounds). Module-local private constants (leading underscore) that are semantic to a single module belong IN that module, not in `constants.py`. Examples: Unicode escape lengths in parser primitives, indentation strings in serializer, cache tuning parameters in cache implementation. Implementation details stay with their implementation. ## 1.4 Specification Authority (Fluent) -**Constraint:** The Fluent specification is the authoritative reference for runtime behavior. -**Specification Sources:** +The Fluent specification is the authoritative reference for runtime behavior. When agents or developers assume behavior that differs from the specification, the specification wins. + +**Specification sources:** + * Primary: [Project Fluent Guide](https://projectfluent.org/fluent/guide/) * Syntax: [Fluent Syntax 1.0](https://github.com/projectfluent/fluent/blob/master/spec/fluent.ebnf) * Validation: [valid.md](https://github.com/projectfluent/fluent/blob/master/spec/valid.md) * Reference implementation: [Mozilla python-fluent](https://github.com/projectfluent/python-fluent) -**Specification Primacy:** - -When AI agents or developers assume behavior that differs from the specification, the -specification wins. 
Common misunderstandings: +**Common misunderstandings:** -| Assumption | Specification Reality | -|:-----------|:---------------------| -| `{ $count }` should format locale-aware | Variables are formatted as-is via `str()` | +| Assumption | Specification reality | +|:-----------|:----------------------| +| `{ $count }` should format locale-aware | Variables interpolate as-is via `str()` | | `NUMBER($count)` is optional for numbers | `NUMBER()` is REQUIRED for locale-aware formatting | | Implicit date formatting exists | `DATETIME()` is REQUIRED for locale-aware dates | | Messages and terms share a namespace | Separate namespaces: `foo` and `-foo` can coexist | -| `NUMBER(style: "currency")` for currency | Use `CURRENCY()` function, not NUMBER with style | -| `NUMBER(style: "percent")` for percent | No percent style; use `NUMBER()` with manual `%` | +| `NUMBER(style: "currency")` for currency | Use `CURRENCY()` function | +| `NUMBER(style: "percent")` for percent | No percent style; use `NUMBER()` + literal `%` | -**Example: Locale-Aware Number Formatting** +The FTL parser is syntax-agnostic and accepts any named arguments; the grammar does not reject `NUMBER($x, style: "currency")`. The argument is silently ignored at runtime. Spec compliance is checked at runtime, not parse time. -```python -# Input: count = 1000, locale = "de_DE" - -# Fluent message: { $count } -# Output: "1000" (NOT "1.000") -# Reason: Per spec, variables are interpolated as-is - -# Fluent message: { NUMBER($count) } -# Output: "1.000" (locale-aware) -# Reason: NUMBER() explicitly requests locale formatting -``` +**JavaScript Intl conflation.** Agents trained on `Intl.NumberFormat` patterns frequently assume FTLLexEngine uses the same single-constructor + `style` parameter idiom. It does not — Fluent uses **separate functions** per formatting type. -This is SPEC-COMPLIANT behavior, not a bug. 
The Fluent specification intentionally separates: -* Raw interpolation: `{ $var }` — developer controls formatting -* Locale-aware formatting: `{ NUMBER($var) }`, `{ DATETIME($var) }` — locale determines format - -**JavaScript Intl API Conflation (Common Agent Error):** - -Agents familiar with JavaScript's `Intl.NumberFormat` API frequently assume FTLLexEngine uses -the same patterns. This is incorrect. - -| JavaScript Intl Pattern | FTLLexEngine Equivalent | -|:------------------------|:-----------------------| +| JavaScript Intl pattern | FTLLexEngine equivalent | +|:------------------------|:------------------------| | `Intl.NumberFormat(locale, {style: 'currency', currency: 'EUR'})` | `CURRENCY($val, currency: "EUR")` | -| `Intl.NumberFormat(locale, {style: 'percent'})` | Not supported; use `NUMBER()` + literal `%` | +| `Intl.NumberFormat(locale, {style: 'percent'})` | Not supported; `NUMBER()` + literal `%` | | `Intl.NumberFormat(locale, {style: 'decimal'})` | `NUMBER($val)` (default behavior) | | `Intl.DateTimeFormat(locale, {year: 'numeric', month: 'long'})` | `DATETIME($val, dateStyle: "long")` | -**Root Cause:** JavaScript's `Intl` API uses a single constructor with `style` parameter to -switch modes. Fluent/FTLLexEngine uses **separate functions** for each formatting type. The FTL -parser accepts any named arguments (it's syntax-agnostic), so `NUMBER($x, style: "currency")` -parses successfully but the `style` argument is ignored at runtime. +**Before flagging runtime behavior as incorrect:** -**Agent Responsibility:** Before flagging runtime behavior as incorrect: -1. Verify behavior against Fluent specification -2. Check Mozilla python-fluent reference implementation -3. If behavior matches spec: NOT a bug, even if counterintuitive -4. If behavior differs from spec: VALID issue; proceed with filing -5. Never assume JavaScript API patterns apply; verify function signatures against - DOC_04_Runtime.md +1. Verify against the Fluent specification. +2. 
Check Mozilla python-fluent reference implementation. +3. Spec match → not a bug, even if counterintuitive. +4. Spec divergence → valid issue; proceed with filing. +5. Never assume JavaScript API patterns apply; verify function signatures against `docs/DOC_04_Runtime.md`. ## 1.5 Layer Architecture and Facade Contract -### 1.5.1 Layer Graph (Architectural Law) - -The package layer hierarchy is a hard structural invariant, not a style convention: +### 1.5.1 Layer graph (architectural law) ``` core ← syntax ← parsing ← runtime ← localization @@ -206,20 +112,13 @@ core ← syntax ← parsing ← runtime ← localization | `runtime` | FluentBundle, resolver, cache, functions | `core`, `syntax`, `introspection`, `analysis`, `diagnostics` | | `localization` | FluentLocalization, boot, loaders | `runtime` and all below | -**Upward dependencies are structural violations, not style issues.** A module in layer N may -not import from layer M > N. Violations must be fixed by moving the symbol to the correct -layer, not by using a runtime local import to paper over the problem. +**Upward dependencies are structural violations, not style issues.** A module in layer N must not import from layer M > N. Violations must be fixed by **moving the symbol to the correct layer**, not by hiding the import in a function body. -**Detection pattern:** When layer N needs a symbol from layer M > N, ask: "Does this symbol -conceptually belong in layer ≤ N?" If yes, move the symbol. The 0.154.0 `FluentNumber` -relocation (`runtime.value_types` → `core.value_types`) is the canonical example — it was a -violation because `parsing` needed `FluentNumber` to implement `parse_fluent_number()`, but -`parsing` cannot import from `runtime`. +**Detection pattern:** when layer N needs a symbol from layer M > N, ask: "Does this symbol conceptually belong in layer ≤ N?" If yes, move it. 
The 0.154.0 `FluentNumber` relocation (`runtime.value_types` → `core.value_types`) is the canonical example — it was a violation because `parsing` needed `FluentNumber` to implement `parse_fluent_number()`, but `parsing` cannot import from `runtime`. -### 1.5.2 Public Facade Contract +### 1.5.2 Public facade contract -The three public facades are permanent API contracts. A symbol on a facade cannot be removed -or renamed without a CHANGELOG.md `### Breaking Changes` entry. +The three public facades are permanent API contracts. A symbol on a facade cannot be removed or renamed without a `CHANGELOG.md` `### Breaking Changes` entry. | Facade | Import path | Scope | |:-------|:------------|:------| @@ -227,272 +126,188 @@ or renamed without a CHANGELOG.md `### Breaking Changes` entry. | Runtime | `ftllexengine.runtime` | FluentBundle, AsyncFluentBundle, FluentNumber, FunctionRegistry | | Localization | `ftllexengine.localization` | FluentLocalization, LocalizationBootConfig, loader types | -**Submodule paths** (`ftllexengine.runtime.bundle`, `ftllexengine.core.value_types`) are -internal navigation paths, not contracted surfaces. They may be reorganized without breaking -the public contract provided facade re-exports are maintained. +**Submodule paths** (`ftllexengine.runtime.bundle`, `ftllexengine.core.value_types`) are internal navigation paths, not contracted surfaces. They may be reorganized without breaking the public contract provided facade re-exports are maintained. -**Export hygiene:** Every symbol in a facade `__init__.py` must have an explicit `__all__` -entry. Implicit reachability via attribute traversal does not constitute a public contract. +**Export hygiene:** every symbol in a facade `__init__.py` must have an explicit `__all__` entry. Implicit reachability via attribute traversal does not constitute a public contract. 
-**Prohibited facade additions:** Symbols that exist only to expose implementation details -(internal cache structures, private lock primitives, parser internals) must not be promoted -to a facade even if callers request it. The facade is a curated surface, not a namespace dump. +**Prohibited facade additions:** symbols that exist only to expose implementation details (internal cache structures, private lock primitives, parser internals) must not be promoted to a facade even if callers request it. The facade is a curated surface, not a namespace dump. ## 1.6 Public Surface Scope Constraint -**Constraint:** FTLLexEngine is the Python runtime for the Fluent Template Language -specification, with CLDR-backed locale-aware formatting and fail-fast boot validation -with structured audit evidence. Its public surface is bounded by three owned domains plus two narrowly-named -standards datasets. Symbols outside these domains do not belong on any public facade, -regardless of technical merit or caller convenience. +FTLLexEngine's public surface is bounded by **three owned domains plus two narrowly-named standards datasets**. Symbols outside these domains do not belong on any public facade, regardless of technical merit or caller convenience. 
-**The Owned Domains (exhaustive — not a representative sample):** +**Owned domains (exhaustive — not a representative sample):** | Domain | Bounded by | Examples of in-scope symbols | |:-------|:-----------|:-----------------------------| -| **FTL specification** | The Fluent 1.0 EBNF and valid.md | parse_ftl, serialize_ftl, validate_resource, AST nodes, FTL built-in functions | -| **CLDR-backed locale formatting** | Babel + Unicode CLDR | FluentBundle, FluentNumber, LocaleCode, normalize_locale, CLDR lookups | -| **Compliance-grade boot and audit** | The FTL/locale pipeline only | LocalizationBootConfig, IntegrityContext, LoadSummary, integrity exceptions arising from FTL resource loading | -| **ISO 4217 currency data** | The ISO 4217 standard as exposed by Babel/CLDR | CurrencyCode, is_valid_currency_code, get_currency_decimal_digits | -| **ISO 3166 territory data** | The ISO 3166-1 alpha-2 standard as exposed by Babel/CLDR | TerritoryCode, is_valid_territory_code, require_territory_code | - -The last two domains are named standards with fixed scope — not a generic "international -standards" category. A new standard (ISO 8601, IETF BCP-47 extensions, ITU-T E.164) is -NOT automatically in-scope because a standard exists; it must be added to this table with -explicit justification, because the table is exhaustive. - -**Mechanical Rejection Test — apply before any new public symbol:** - -1. Does this symbol address a failure mode or composition burden that arises specifically - from the FTL spec, CLDR locale formatting, or the boot/audit pipeline — and not from - general programming? -2. Would this symbol need to exist in a library that exclusively implements FTL parsing, - CLDR-backed locale formatting, and fail-fast boot validation — with no knowledge - of the caller's domain (financial, medical, logistics, etc.)? -3. 
Is this symbol's definition or behaviour meaningfully coupled to FTL, CLDR, or the - boot pipeline — or could it exist without modification in an unrelated Python library? - -All three questions must be answered YES. A symbol that fails any one is OUT OF SCOPE for -the public facade. It may exist internally if the implementation requires it, but must not -appear in `__all__` of any facade module. - -**Bootstrapping trap:** Defining a new type (e.g., `PhoneNumber`) does not automatically -make a corresponding validator (`require_phone_number`) in-scope. Question 2 applies to -the type itself: would a pure FTL/CLDR/boot library need `PhoneNumber`? If not, neither -the type nor its validator belongs on a public facade. - -**Explicitly Out-of-Scope Categories:** - -* **Generic type validators** (`require_int`, `require_non_negative_int`, - `require_non_empty_str`, `coerce_tuple`, etc.): Every Python program needs integer and - string validation. A stripped FTL/CLDR/boot library would not. Validators are in-scope - only when the validated type is intrinsic to FTL, CLDR, or boot (e.g., - `require_fluent_number` — `FluentNumber` cannot exist outside this library; - `require_locale_code` — locale canonicalization is required by the CLDR formatting - pipeline). - -* **Fiscal calendar** (`FiscalCalendar`, `FiscalDelta`, `FiscalPeriod`, `MonthEndPolicy`, - `fiscal_year`, `fiscal_quarter`, `fiscal_month`, `fiscal_year_start`, `fiscal_year_end`, - `require_fiscal_calendar`, `require_fiscal_period`): Pure date arithmetic with no CLDR - interaction, no Babel dependency, and no FTL parser involvement. Not an ISO standard. - Fails the mechanical rejection test on all three questions — no FTL/CLDR/boot coupling; - would not exist in a stripped FTL/CLDR/boot library; could exist unmodified in any - financial or accounting library. 
- -* **Accounting/ledger domain** (`LedgerInvariantError`, invariant codes such as - `BALANCE`, `DUPLICATE_ACCOUNT`, `PERIOD_OVERLAP`): Financial ledger semantics are the - caller's domain. A stripped FTL/CLDR/boot library has no concept of a ledger. These - symbols would exist unchanged in a CRM or ERP library that never touches FTL. - -* **Storage and persistence domain** (`PersistenceIntegrityError`): Resource *loading* - into the FTL pipeline is in-scope (`ResourceLoader`, `PathResourceLoader` — these are - the boundary at which FTL resources enter the library). Storage layer failures below - that boundary are the caller's concern; a stripped FTL/CLDR/boot library would have - no concept of persistence integrity independent of FTL resource loading. - -* **General concurrency primitives** (`RWLock`, `InterpreterPool`): Concurrency is an - implementation detail of the runtime layer, not a contract offered to callers. Internal - modules use `RWLock` for bundle thread-safety; callers have no need to instantiate it. - `InterpreterPool` is a general PEP 734 pool with no FTL-specific semantics. - -* **Internal resolver machinery** (`FluentResolver`, `ResolutionContext`): These are - implementation details of message resolution. The extension API is `FunctionRegistry` + - `fluent_function`. Callers do not instantiate resolvers. - -**Scope Creep Detection:** - -Scope creep occurs when the library adds a symbol because a caller *could use it* rather -than because *the FTL/CLDR/boot pipeline specifically requires it*. The test is not -"does this help callers?" — everything helpful passes that test. The test is: would a -library stripped to only FTL parsing + CLDR formatting + boot validation still need this -symbol? If not, it does not belong. "Could use" adds surface; "the pipeline requires" -eliminates downstream burden. Only the latter justifies promotion. 
+| **FTL specification** | Fluent 1.0 EBNF and valid.md | `parse_ftl`, `serialize_ftl`, `validate_resource`, AST nodes, FTL built-in functions | +| **CLDR-backed locale formatting** | Babel + Unicode CLDR | `FluentBundle`, `FluentNumber`, `LocaleCode`, `normalize_locale`, CLDR lookups | +| **Compliance-grade boot and audit** | The FTL/locale pipeline only | `LocalizationBootConfig`, `IntegrityContext`, `LoadSummary`, integrity exceptions arising from FTL resource loading | +| **ISO 4217 currency data** | The ISO 4217 standard as exposed by Babel/CLDR | `CurrencyCode`, `is_valid_currency_code`, `get_currency_decimal_digits` | +| **ISO 3166 territory data** | The ISO 3166-1 alpha-2 standard as exposed by Babel/CLDR | `TerritoryCode`, `is_valid_territory_code`, `require_territory_code` | + +The last two domains are **named standards with fixed scope** — not a generic "international standards" category. ISO 8601, IETF BCP-47 extensions, and ITU-T E.164 are NOT automatically in scope; they would require explicit promotion of the table above. + +**Mechanical rejection test — all three must be YES:** + +1. Does this symbol address a failure mode or composition burden that arises specifically from the FTL spec, CLDR locale formatting, or the boot/audit pipeline — and not from general programming? +2. Would this symbol need to exist in a library that exclusively implements FTL parsing, CLDR-backed locale formatting, and fail-fast boot validation — with no knowledge of the caller's domain (financial, medical, logistics, etc.)? +3. Is this symbol's definition or behavior meaningfully coupled to FTL, CLDR, or the boot pipeline — or could it exist without modification in an unrelated Python library? + +A symbol that fails any one is OUT OF SCOPE for the public facade. It may exist internally if the implementation requires it, but must not appear in `__all__` of any facade module. 
+ +**Bootstrapping trap:** defining a new type (e.g., `PhoneNumber`) does not automatically make a corresponding validator (`require_phone_number`) in-scope. Question 2 applies to the type itself: would a pure FTL/CLDR/boot library need `PhoneNumber`? If not, neither the type nor its validator belongs on a public facade. + +**Explicitly out-of-scope categories:** + +* **Generic type validators** (`require_int`, `require_non_negative_int`, `require_non_empty_str`, `coerce_tuple`): every Python program needs integer and string validation. A stripped FTL/CLDR/boot library would not. Validators are in-scope only when the validated type is intrinsic to FTL, CLDR, or boot (e.g., `require_fluent_number` — `FluentNumber` cannot exist outside this library; `require_locale_code` — locale canonicalization is required by the CLDR pipeline). +* **Fiscal calendar** (`FiscalCalendar`, `FiscalDelta`, `FiscalPeriod`, `MonthEndPolicy`, `fiscal_year`, `fiscal_quarter`, etc.): pure date arithmetic with no CLDR/Babel/FTL coupling. Not an ISO standard. Would exist unmodified in any financial or accounting library. +* **Accounting/ledger domain** (`LedgerInvariantError`, invariant codes such as `BALANCE`, `DUPLICATE_ACCOUNT`, `PERIOD_OVERLAP`): financial ledger semantics are the caller's domain. +* **Storage and persistence domain** (`PersistenceIntegrityError`): resource *loading* into the FTL pipeline is in-scope (`ResourceLoader`, `PathResourceLoader`); storage layer failures below that boundary are the caller's concern. +* **General concurrency primitives** (`RWLock`, `InterpreterPool`): concurrency is an implementation detail of the runtime layer, not a contract. `InterpreterPool` is a general PEP 734 pool with no FTL semantics. +* **Internal resolver machinery** (`FluentResolver`, `ResolutionContext`): the extension API is `FunctionRegistry` + `fluent_function`. Callers do not instantiate resolvers. + +**Scope creep detection.** The test is not "does this help callers?" 
— everything helpful passes that test. The test is: would a library stripped to only FTL parsing + CLDR formatting + boot validation still need this symbol? If not, it does not belong. "Could use" adds surface; "the pipeline requires" eliminates downstream burden. Only the latter justifies promotion. + +--- # 2. CODE & OUTPUT CONSTRAINTS -## 2.1 Professional Output Standard (No-Emoji Policy) -**Constraint:** Enforce strict adherence to professional ASCII standards. -* **PROHIBITED:** Emojis or decorative characters in source code, comments, docstrings, or - commit messages. -* **PERMITTED:** Emojis are *only* permissible within **Test Data strings** to validate - Unicode/FTL specification handling (e.g., `parse_ftl("greeting = 👋")` as FTL message - content inside a test fixture). +`AGENTS.md` §7.10 already prohibits emojis in code, comments, and documentation. The additions below are project-specific. + +## 2.1 Status & Logging Indicators -## 2.2 Status & Logging Indicators Use only standardized ASCII indicators for logging and CLI output. | Status | Indicator | Rationale | -| :--- | :--- | :--- | -| **Success** | `[OK]`, `[PASS]` | Unambiguous status reporting. | -| **Failure** | `[FAIL]`, `[ERROR]` | High-priority failure flag. | -| **Warning** | `[WARN]` | Deprecation or non-critical state alert. | - -## 2.3 Documentation Standard -* **Docstrings:** All public modules, classes, and functions must have concise docstrings. -* **Style:** Use Google-style docstrings. This is the style established in the existing - codebase; consistency with existing code takes precedence. -* **Typing:** Do not duplicate type information in docstrings; rely on type hints. - -## 2.4 Self-Containment Principle -**Constraint:** Source code, tests, and documentation must NEVER reference CLAUDE.md. 
- -* **PROHIBITED:** Comments/docstrings referencing "CLAUDE.md", "Section X.Y", or - "per CLAUDE.md" -* **REQUIRED:** All architectural justifications must be self-contained and self-explanatory -* **RATIONALE:** CLAUDE.md is an AI agent directive, not developer documentation. Human - developers must understand design decisions without consulting AI protocols. - -**Examples:** +|:-------|:----------|:----------| +| **Success** | `[OK]`, `[PASS]` | Unambiguous status reporting | +| **Failure** | `[FAIL]`, `[ERROR]` | High-priority failure flag | +| **Warning** | `[WARN]` | Deprecation or non-critical state alert | + +**Test data exception (the only one):** emojis are permitted *only* inside test fixture data when validating Unicode/FTL specification handling (e.g., `parse_ftl("greeting = 👋")`). They are never permitted in source code, comments, docstrings, commit messages, or non-fixture test code. + +## 2.2 Documentation Style + +* **Docstrings:** all public modules, classes, and functions must have concise docstrings. +* **Style:** Google-style docstrings (matches existing code; consistency over preference). +* **Typing:** do not duplicate type information in docstrings; type hints are the contract. + +## 2.3 Self-Containment Principle + +Source code, tests, and user-facing documentation must remain **self-contained**. They must NEVER reference AI-agent-only directives or the `.codex/` protocol stack. Human developers must understand design decisions without consulting agent protocols. + +* **Prohibited:** comments, docstrings, error messages, or user-facing docs that reference `AGENTS.md`, `CLAUDE.md`, files in `.codex/` / `.claude/` / `.gemini/`, "Section X.Y" of an internal protocol, or "per the agent contract." +* **Required:** architectural justifications must stand alone — readable by a human developer who has never seen the protocol stack. 
+ ```python # PROHIBITED -# Violates CLAUDE.md §1.3 explicit control flow principle +# Violates AGENTS_EXTRA.md §1.3 explicit control topology. # REQUIRED -# Uses task-local ContextVar for performance: primitives called 100+ times per parse. -# Explicit context parameter would require 10+ signature changes and 200+ call site updates. +# Task-local ContextVar for performance: primitive functions are called 100+ times per +# parse, and explicit context threading would require ~10 signature changes and 200+ +# call site updates. ``` -**Scope:** Applies to all `.py` files, CHANGELOG.md, and user-facing documentation. Internal -protocol files (`.claude/*.md`, `.codex/*.md`, `.gemini/*.md`) are exempt. +**Scope:** all files in `src/`, `tests/`, `examples/`, `CHANGELOG.md`, and user-facing markdown. The protocol stack itself (`AGENTS.md`, `.codex/`, `.claude/`, `.gemini/`) is exempt — it can and must reference itself. + +--- # 3. QUALITY HIERARCHY & WAIVERS -Maintain distinct quality configurations for static analysis. You must respect the specific -configuration files associated with each directory scope. +Distinct quality configurations apply by directory scope. Respect the configuration files associated with each. ## 3.1 Core Production Code (`src/`): STRICT -* **Quality Target:** All linters exit 0: Ruff (zero errors), Mypy (`strict = true`). See §5.7 - for enforcement order. -* **Ruff Configuration:** `select = ["ALL"]` with focused `ignore` list in `pyproject.toml` - (D, ANN, COM812, ISC001, and framework-specific families). New rules apply automatically; - explicit `ignore` or per-file-ignores required for any suppression. -* **Mypy Configuration:** `strict = true`. No unchecked types; full type annotation coverage - required. -* **Waiver Philosophy:** Only permit **Architectural Waivers** (see §3.6). Never permit waivers - for logic bugs, security issues, performance flaws, or dead code. + +* **Quality target:** all linters exit 0. 
Ruff (zero errors), Mypy (`strict = true`). See §5.7 for execution order. +* **Ruff:** `select = ["ALL"]` with focused `ignore` list in `pyproject.toml` (D, ANN, COM812, ISC001, framework families). New rules apply automatically; explicit `ignore` or per-file-ignores required for any suppression. +* **Mypy:** `strict = true`. No unchecked types; full annotation coverage required. +* **Waivers:** only architectural waivers (§3.6). Never permit waivers for logic bugs, security issues, performance flaws, or dead code. ## 3.2 Verification Test Code (`tests/`): PRAGMATIC -* **Quality Target:** All linters exit 0: Ruff (zero errors), Mypy (pragmatic). See §5.7 for - enforcement order. -* **Configuration Scope:** - * **Linter:** `pyproject.toml` (Ruff per-directory overrides). - * **Type Checker:** `tests/mypy.ini`. -* **Key Allowed Waivers:** - * `N802` (Naming): Permitted for FTL specification mimicry (e.g., `UPPERCASE_functions`). - * `SLF001` (Private Access): Permitted for integration tests verifying internal object - state. - * `E402`, `PLC0415` (Import Position): Permitted for Hypothesis strategy isolation. + +* **Quality target:** Ruff zero errors, Mypy pragmatic. +* **Configuration:** `pyproject.toml` (Ruff per-directory overrides), `tests/mypy.ini` (Mypy). +* **Allowed waivers:** + * `N802` — FTL specification mimicry (e.g., `UPPERCASE_functions`). + * `SLF001` — integration tests verifying internal object state. + * `E402`, `PLC0415` — Hypothesis strategy isolation. ## 3.3 Example Code (`examples/`): DEMONSTRATIVE -* **Configuration Scope:** - * **Type Checker:** `examples/mypy.ini`. -* **Waiver Philosophy:** Inline configuration is preferred here to serve as documentation for - users on how to handle linting in their own implementations. + +* **Configuration:** `examples/mypy.ini`. +* Inline configuration is preferred — examples document linting practice for users. 
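As an illustration of inline configuration, the sketch below shows what an example script might carry at the top of the file. The file-level `# ruff: noqa: <code>` comment is a real Ruff mechanism, but the choice of `T201` and the `render_status` helper are assumptions made for illustration, not taken from the project's `examples/` directory.

```python
# Examples print to stdout so readers can see the result, so `print` is
# allowed for this whole file instead of through a hidden config entry.
# ruff: noqa: T201


def render_status(ok: bool) -> str:
    """Return a status line using the ASCII indicators defined for CLI output."""
    return "[OK] example ran" if ok else "[FAIL] example failed"


print(render_status(True))
```

Keeping the suppression in the file itself means a reader who copies the example also sees which rule was relaxed and why.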
## 3.4 Operational Fuzzing Code (`fuzz_atheris/`): OPERATIONAL -* **Quality Target:** Ruff (zero errors), Mypy (operational — `fuzz_atheris/mypy.ini`). See - §5.7 for enforcement order. -* **Configuration Scope:** - * **Linter:** `pyproject.toml` (fuzz_atheris per-directory overrides). - * **Type Checker:** `fuzz_atheris/mypy.ini`. -* **Key Allowed Waivers:** - * `PLR0912`, `PLR0915` (Dispatch Complexity): Pattern handler functions in fuzz modules - MUST use dispatch-to-sub-handlers (see §4.3) rather than monolithic if/elif chains. - Sub-handler functions are individually simple; the dispatcher itself is a one-liner index - into a tuple of callables. This is the canonical pattern — do not suppress PLR0912 on - a monolithic function; refactor first. - * `S101` (assert): Permitted for invariant checks inside fuzz patterns. -* **Fuzz Pattern Architecture:** Each fuzz pattern function (`_pattern_*`) dispatches to a - tuple of sub-handler functions (`_check_*`). Each sub-handler tests one behavioral scenario. - This mirrors the dispatch-to-sub-handlers pattern in §4.3 and keeps individual functions - within McCabe complexity limits. - -## 3.5 No Deferrals Policy -**Constraint:** Technical debt is prohibited. Every issue identified must be resolved -immediately. - -**Prohibited Deferrals:** -* "Fix in next version" — If an issue is found, fix it now. -* "TODO: refactor later" — Refactor immediately or not at all. -* "Known issue" — Unknown issues become known; known issues become fixed. -* Backwards-compatibility shims — Make clean breaks; remove deprecated code entirely. -* Migration paths — Users adapt to the current API; old APIs are deleted, not deprecated. -* Suppression as fix — Never suppress lint/static analysis warnings when the underlying code - can be corrected. Suppression (`# noqa`, `# type: ignore`, `per-file-ignores`) is only - valid for permanent architectural patterns (see §3.6), not for avoiding proper remediation. 
- -**Rationale:** Deferred fixes accumulate interest. A "small" workaround today becomes an -architectural constraint tomorrow. The cost of immediate remediation is always lower than the -cost of accumulated technical debt. + +* **Quality target:** Ruff zero errors, Mypy operational (`fuzz_atheris/mypy.ini`). +* **Configuration:** `pyproject.toml` overrides + `fuzz_atheris/mypy.ini`. +* **Allowed waivers:** + * `PLR0912`, `PLR0915` — pattern handler functions in fuzz modules MUST use dispatch-to-sub-handlers (§4.3) rather than monolithic if/elif chains. Sub-handler functions are individually simple; the dispatcher itself is a one-liner index into a tuple of callables. Do not suppress PLR0912 on a monolithic function — refactor it. + * `S101` — `assert` is permitted for invariant checks inside fuzz patterns. +* **Pattern architecture:** each `_pattern_*` function dispatches to a tuple of `_check_*` sub-handlers; each sub-handler tests one behavioral scenario. Mirrors §4.3. + +## 3.5 Clean Breaks, No Debt + +This project takes a stricter stance than `AGENTS.md` §7.4 baseline: **no migration paths, no transitional shims, no deprecation cycles.** Old APIs are deleted, not deprecated. Users adapt to the current API. `CHANGELOG.md` is the single authoritative version ledger. + +**Prohibited:** + +* `# TODO: refactor later`, `# FIXME`, "fix in next version", "known issue" — fix now or not at all. +* Backwards-compatibility shims, transitional APIs, parallel-maintained old APIs — make clean breaks; remove deprecated code entirely. +* Suppression as fix — `# noqa`, `# type: ignore`, `per-file-ignores` are reserved for permanent architectural patterns documented in §3.7. They are never a way to defer remediation. +* Version provenance in `src/`, `tests/`, `examples/` — no `# v0.X.0: feature added`, no "As of v0.X.0", no "Since v0.X.0", no "Updated in v0.X.0", no `(TICKET-001 fix)` annotations. 
Test docstrings describe **WHAT** is tested, not **WHEN** it changed: + +```python +# PROHIBITED +"""v0.39.0: Pound symbol is now ambiguous (GBP, EGP, GIP).""" + +# REQUIRED +"""Pound symbol requires locale-aware resolution (ambiguous: GBP, EGP, GIP).""" +``` + +**Permitted version locations (exhaustive):** + +* `__version__` in `__init__.py` +* `version` field in `pyproject.toml` +* `version:` in YAML frontmatter +* `- Version: Added in v0.X.0.` in `docs/DOC_*.md` Constraints sections only + +**Rationale.** Deferred fixes accumulate interest: a "small" workaround today becomes an architectural constraint tomorrow. Version references scattered across 200+ locations require manual updates each release; duplication creates drift; old version numbers remain as historical noise. The cost of immediate remediation is always lower than the cost of accumulated debt. ## 3.6 Waiver Implementation Protocol + Waivers are for **permanent architectural necessities**, never for deferring fixes. -1. **Fix First:** Attempt remediation before waivering. Waivers are a last resort. -2. **Permanence Requirement:** A waiver must address a permanent constraint (e.g., Visitor - pattern naming), not a temporary inconvenience. -3. **Scope:** Use `per-file-ignores` in the relevant configuration file for patterns that - apply uniformly to an entire file or directory. Use inline `# noqa` directives for - isolated single-line exceptions within otherwise conformant files. -4. **Documentation:** Every waiver must be accompanied by a concise, high-value comment - justifying the *permanent architectural necessity*. - -**Prohibited Waiver Justifications:** -* "Will fix later" — No. Fix now or not at all. -* "Not enough time" — Time is not an accepted constraint; correctness is. -* "Too complex to refactor" — Complexity is a symptom of design issues; address the root - cause. +1. **Fix first.** Attempt remediation before waivering. Waivers are a last resort. +2. 
**Permanence.** A waiver must address a permanent constraint (e.g., Visitor pattern naming), not a temporary inconvenience. +3. **Scope.** Use `per-file-ignores` for patterns that apply uniformly to a file or directory; use inline `# noqa` for isolated single-line exceptions in otherwise conformant files. +4. **Documentation.** Every waiver must carry a concise, high-value comment justifying the *permanent architectural necessity*. + +**Prohibited justifications:** "will fix later", "not enough time", "too complex to refactor". Time is not an accepted constraint; correctness is. ## 3.7 Design Principle Hierarchy (Waiver Recognition) -**Constraint:** Documented architectural waivers OVERRIDE general principles stated in this -document. -The codebase contains intentional deviations from stated principles where trade-off analysis -justifies the design. AI agents must distinguish between: +Documented architectural waivers OVERRIDE general principles stated in this document or in `AGENTS_PYTHON313.md`. 
Distinguish carefully: | Category | Definition | Action | |:---------|:-----------|:-------| -| **Principle** | Default mode of operation stated in CLAUDE.md | Apply unless waiver documented | +| **Principle** | Default mode of operation stated in this file or the `.codex/` stack | Apply unless waiver documented | | **Waiver** | Documented exception with trade-off rationale | Respect; do NOT flag as violation | | **Violation** | Undocumented deviation without justification | Flag for remediation | -**Waiver Recognition Signals:** +**Waiver recognition signals** — a design decision is a documented waiver if any of these are present: -A design decision is a DOCUMENTED WAIVER if ANY of these signals are present: -* Module docstring explains trade-off (e.g., "Task-Local State (Architectural Decision)") -* Inline comment includes keywords: "intentional", "trade-off", "architectural", "design - decision" -* Suppression comment provides rationale (e.g., `# noqa: PLC0415 - circular import`) -* Comment explicitly states "permanent" or "accepted" +* Module docstring explains the trade-off. +* Inline comment includes "intentional", "trade-off", "architectural", "design decision". +* Suppression comment provides rationale (e.g., `# noqa: PLC0415 - circular import`). +* Comment explicitly states "permanent" or "accepted". -**Example: Task-Local ContextVar vs. Explicit Control (§1.3)** +**Example: task-local ContextVar vs explicit control (§1.3).** -§1.3 states: "The system demands explicit control flow... over global state or -`threading.local`." +§1.3 states: "prefer explicit control flow... over global state or `threading.local`." `primitives.py` uses `contextvars.ContextVar` task-local state (NOT `threading.local`) with documented justification: -`primitives.py` uses `contextvars.ContextVar` task-local state (NOT `threading.local`) with -documented justification. ContextVars are async-safe and task-isolated; they do not violate -§1.3's prohibition. 
The waiver covers the *implicit state* aspect of the principle: ``` # Task-Local State (Architectural Decision): # - Primitive functions called 100+ times per parse operation @@ -501,121 +316,94 @@ documented justification. ContextVars are async-safe and task-isolated; they do # This is a permanent architectural pattern... ``` -This is a WAIVER, not a VIOLATION. The documentation: -1. Acknowledges the principle being relaxed (implicit state) -2. Provides quantitative justification (100+ calls, 10 signatures) -3. Explicitly marks it as "permanent architectural pattern" +This is a WAIVER, not a violation. The documentation (1) acknowledges the relaxed principle, (2) provides quantitative justification, (3) explicitly marks it permanent. -**Violation Detection:** +**Violation detection.** An issue is a true violation only if: -An issue is a TRUE VIOLATION only if: -1. Behavior contradicts a stated principle (e.g., uses `threading.local` or module-global - mutable state) -2. No documentation within the module docstring OR within the enclosing function/class scope - justifies the deviation -3. No suppression comment provides rationale +1. Behavior contradicts a stated principle (e.g., uses `threading.local` or module-global mutable state). +2. No documentation in the module docstring or enclosing function/class scope justifies the deviation. +3. No suppression comment provides rationale. -**Agent Responsibility:** +**Before flagging any apparent principle violation:** read the module docstring, search the enclosing scope for waiver documentation, consult the registry below. If documented: not a violation. If undocumented: file the issue. -Before flagging ANY apparent principle violation: -1. Read the module docstring for architectural decisions -2. Search within the enclosing function or class scope for waiver documentation -3. Consult the Known Waiver Registry below -4. If documented with rationale: NOT a violation; respect the waiver -5. 
If undocumented: VALID violation; proceed with issue +### Known Waiver Registry -**Known Waiver Registry:** +All architectural waivers in `src/`. Each entry is a documented, permanent decision — not a deferral. -All architectural waivers in `src/`. Each entry is a documented, permanent decision — not a -deferral. - -| Module | Suppressed Rule(s) | Principle Relaxed | Permanent Justification | -|:-------|:------------------|:------------------|:------------------------| +| Module | Suppressed rule(s) | Principle relaxed | Permanent justification | +|:-------|:-------------------|:------------------|:------------------------| | `syntax/parser/primitives.py` | §1.3 explicit control | §1.3 explicit control topology | `ContextVar` task-local state; 100+ calls/parse; threading via ContextVar gives automatic async isolation with O(1) overhead | -| `core/depth_guard.py` | §1.3 immutability | §1.3 immutability protocol | Mutable `current_depth` counter required by context-manager `__enter__`/`__exit__` protocol; state is strictly scoped to each `with` block | -| `core/babel_compat.py` | PLW0603, F401, PLC0415 | §1.3 explicit control (global singleton) | `_babel_available` is a module-level sentinel; computed once at first call; `global` statement is the only stdlib mechanism for a mutable module-level singleton without a class | +| `core/depth_guard.py` | §1.3 immutability | §1.3 immutability protocol | Mutable `current_depth` counter required by context-manager `__enter__`/`__exit__`; state strictly scoped to each `with` block | +| `core/babel_compat.py` | PLW0603, F401, PLC0415 | §1.3 explicit control (global singleton) | `_babel_available` is a module-level sentinel computed once at first call; `global` is the only stdlib mechanism for a mutable module-level singleton without a class | | `syntax/parser/core.py`, `rules.py` | PLR0911, PLR0912, PLR0915 | §4.3 dispatch complexity | EBNF grammar rule dispatch: one function = one grammar rule; branching is structural, not 
accidental | | `syntax/serializer.py` | PLR0912 | §4.3 dispatch complexity | Classification-dispatch model (§4.6): `_serialize_pattern`, `_emit_classified_line`, `_serialize_expression` branches are exhaustive over closed grammar types | | `syntax/visitor.py` | ERA001, PLR0911, PLR0912 | §4.3 dispatch complexity | Visitor dispatch + docstring examples (`ERA001`); branching from closed AST node set | -| `runtime/resolver.py` | PLR0911, type:ignore[unreachable] | §4.3 dispatch complexity | `_resolve_expression`, `_get_fallback_for_placeable`: closed `Expression` union type, one return path per variant; `type:ignore[unreachable]` on `_get_fallback_for_placeable` `case _:` — union is statically exhaustive but wildcard is retained as safety net: error-recovery contract must always return a string, never raise | +| `runtime/resolver.py` | PLR0911, type:ignore[unreachable] | §4.3 dispatch complexity | `_resolve_expression`, `_get_fallback_for_placeable`: closed `Expression` union, one return per variant; `type:ignore[unreachable]` on `_get_fallback_for_placeable` `case _:` — union is statically exhaustive but wildcard is retained as safety net for the error-recovery contract (must always return a string, never raise) | | `runtime/cache.py` | PLR0911, PLR0912 | §4.3 dispatch complexity | `_make_hashable`: type dispatch over heterogeneous Python values; each branch handles a distinct Python type | -| `introspection/message.py` | N802, RUF022 | §4.1 visitor naming | `visit_NodeName` methods follow stdlib `ast.NodeVisitor` convention; `__all__` organized by category for public/internal clarity | +| `introspection/message.py` | N802, RUF022 | §4.1 visitor naming | `visit_NodeName` follows stdlib `ast.NodeVisitor` convention; `__all__` organized by category for public/internal clarity | | `runtime/bundle.py` | PLR0912, E501 | §4.3 dispatch complexity | Resource registration and validation coordination; long lines in structured logging messages | | `parsing/currency.py` | 
PLR0911, PLR0912 | §4.3 dispatch complexity | Ambiguous currency symbol disambiguation requires exhaustive symbol/territory resolution | | `parsing/dates.py` | DTZ007 | Naive datetime | Library does not impose timezone; caller provides timezone-aware values or explicitly opts into naive datetime | -| `runtime/locale_context.py` | DTZ001 | Naive datetime | `format_datetime` promotes a plain `date` to midnight `datetime` with no tzinfo; the date carried no timezone, so none is inferred — this is the correct semantics for a calendar date with no intrinsic time | -| `syntax/parser/whitespace.py` | SIM102 | Style | Nested `if` guards cursor state and EOF simultaneously; merging the conditions hides the state machine intent | +| `runtime/locale_context.py` | DTZ001 | Naive datetime | `format_datetime` promotes a plain `date` to midnight `datetime` with no tzinfo; the date carried no timezone, so none is inferred — correct semantics for a calendar date with no intrinsic time | +| `syntax/parser/whitespace.py` | SIM102 | Style | Nested `if` guards cursor state and EOF simultaneously; merging the conditions hides the state-machine intent | | `syntax/validator.py` | EM102 | Style | `TypeError` f-string messages: violation type includes dynamic type; static string would omit it | | Babel-optional modules (`parsing/`, `runtime/`, `introspection/`, `core/`) | PLC0415 | §4.2 runtime imports | Babel is optional; imports inside functions are the only way to make them lazy (avoids `ImportError` at module load for parser-only installs) | -| `diagnostics/formatter.py`, `diagnostics/validation.py` | PLC0415 | §4.2 runtime imports | Mutual runtime circular: `ValidationError`/`ValidationWarning` require runtime `isinstance` checks in formatter; `DiagnosticFormatter` is instantiated at runtime in validation factory. Neither is type-only — both execute code at call time. §4.2 pattern 2 is the correct resolution. 
| -| `diagnostics/codes.py` | PLC0415 | §4.2 runtime imports | `Diagnostic.format()` instantiates `DiagnosticFormatter` at runtime; circular between codes and formatter resolved per §4.2 pattern 2. | -| `validation/resource.py` | PLC0415 | §4.2 runtime imports | Resource validation triggers re-parse for annotation extraction; runtime circular between validation and syntax/parser layers. | -| `runtime/resolution_context.py` | §1.3 immutability | §1.3 immutability protocol | `ResolutionContext` uses mutable `_stack`, `_seen`, `_total_chars`, and `_expression_guard` for cycle detection and expansion tracking; §1.3 explicitly permits mutable accumulation buffers in performance-critical operations; isolation is guaranteed by creating a fresh instance per resolution call — no state leaks between concurrent resolutions | -| `runtime/function_bridge.py` | PLC0415 | §4.2 runtime imports | Function metadata loaded lazily on first call; runtime circular between bridge and function_metadata modules. | -| `runtime/bundle.py` (PLC0415) | PLC0415 | §4.2 runtime imports | Bundle loads `analysis.graph.entry_dependency_set` and `introspection.extract_references` at runtime; circular between runtime layer and analysis/introspection layers. | -| `core/__init__.py` | PLC0415, module `__getattr__` | §1.3 immutability | Lazy-loads `DepthGuard`/`depth_clamp` via module `__getattr__` to break circular import: `depth_guard` → `diagnostics` → `syntax.__init__` → `serializer` → `core.depth_guard`. Eager import during `ftllexengine` package init would deadlock the import chain. `globals()` mutation in `__getattr__` is a permanent, accepted stdlib pattern for module-level lazy singletons. | +| `diagnostics/formatter.py`, `diagnostics/validation.py` | PLC0415 | §4.2 runtime imports | Mutual runtime circular: `ValidationError`/`ValidationWarning` require runtime `isinstance` in formatter; `DiagnosticFormatter` is instantiated at runtime in validation factory. 
Neither is type-only | +| `diagnostics/codes.py` | PLC0415 | §4.2 runtime imports | `Diagnostic.format()` instantiates `DiagnosticFormatter` at runtime; circular between codes and formatter | +| `validation/resource.py` | PLC0415 | §4.2 runtime imports | Resource validation triggers re-parse for annotation extraction; runtime circular between validation and syntax/parser layers | +| `runtime/resolution_context.py` | §1.3 immutability | §1.3 immutability protocol | `ResolutionContext` uses mutable `_stack`, `_seen`, `_total_chars`, `_expression_guard` for cycle detection and expansion tracking; §1.3 permits mutable accumulation buffers in performance-critical operations; isolation is guaranteed by creating a fresh instance per resolution call | +| `runtime/function_bridge.py` | PLC0415 | §4.2 runtime imports | Function metadata loaded lazily on first call; runtime circular between bridge and function_metadata modules | +| `runtime/bundle.py` (PLC0415) | PLC0415 | §4.2 runtime imports | Bundle loads `analysis.graph.entry_dependency_set` and `introspection.extract_references` at runtime; circular between runtime and analysis/introspection layers | +| `core/__init__.py` | PLC0415, module `__getattr__` | §1.3 immutability | Lazy-loads `DepthGuard`/`depth_clamp` via module `__getattr__` to break circular import: `depth_guard` → `diagnostics` → `syntax.__init__` → `serializer` → `core.depth_guard`. Eager import during `ftllexengine` package init would deadlock the import chain. 
`globals()` mutation in `__getattr__` is a permanent, accepted stdlib pattern for module-level lazy singletons | | `parsing/guards.py` | TC003 | §4.2 TYPE_CHECKING | `date`, `datetime`, `Decimal` cannot be moved under TYPE_CHECKING: `typing.get_type_hints()` evaluates TypeIs annotation strings at runtime and requires these names in module globals; moving them causes `NameError` in callers using `get_type_hints()` | | `syntax/ast.py` | TC001 | §4.2 TYPE_CHECKING | `CommentType` is a public re-exported symbol; consumers do `from ftllexengine.syntax.ast import CommentType` at runtime; moving under TYPE_CHECKING would break this import | -| `localization/boot.py` | §1.3 immutability (`object.__setattr__`) | §1.3 immutability protocol | `_booted` guard requires a single post-init mutation (False→True) on a frozen dataclass. `object.__setattr__` bypasses the generated `__setattr__` — the same mechanism Python's own frozen dataclass `__init__` uses. Config fields remain permanently immutable; only the one-shot guard transitions, once, permanently. No alternative exists without abandoning `frozen=True` or changing the public API. | +| `localization/boot.py` | §1.3 immutability (`object.__setattr__`) | §1.3 immutability protocol | `_booted` guard requires a single post-init mutation (False→True) on a frozen dataclass. `object.__setattr__` bypasses the generated `__setattr__` — the same mechanism Python's frozen dataclass `__init__` uses. Config fields remain permanently immutable; only the one-shot guard transitions, once, permanently | +--- # 4. DESIGN PATTERNS & LINT INTEGRATION -## 4.1 Visitor Pattern Implementation -* **Pattern:** Follow the standard library's `ast.NodeVisitor` convention for AST traversal. -* **Waiver:** Suppress `N802` (function name snake_case) for dispatch methods like - `visit_Message` to match the node class names. +## 4.1 Visitor Pattern + +* Follow the standard library's `ast.NodeVisitor` convention for AST traversal. 
+* Suppress `N802` (function name snake_case) for dispatch methods like `visit_Message` to match node class names. ## 4.2 Runtime Imports (Circular Dependency Avoidance) -**Two distinct patterns, applied in priority order:** -1. **`TYPE_CHECKING` guard (preferred for type-only imports):** When a circular dependency - exists only because a type annotation references the other module, wrap the import under - `TYPE_CHECKING`. No `PLC0415` suppression is required (the import is still top-level); the - import is elided at runtime. +Two distinct patterns, applied in priority order: + +1. **`TYPE_CHECKING` guard (preferred for type-only imports).** When a circular dependency exists only because a type annotation references the other module, wrap the import under `TYPE_CHECKING`. No `PLC0415` suppression is required (the import is still top-level); the import is elided at runtime. ```python from typing import TYPE_CHECKING if TYPE_CHECKING: from ftllexengine.introspection import MessageIntrospection ``` -2. **Function-local import (runtime circular dependency):** Use only when the circular - dependency cannot be resolved via `TYPE_CHECKING` because the import is needed at runtime - (not just for type annotations). Requires `PLC0415` suppression with rationale. +2. **Function-local import (runtime circular dependency).** Use only when the import is needed at runtime (not just for annotations). Requires `PLC0415` suppression with rationale. ```python def _resolve(self) -> ...: from ftllexengine.runtime.cache import IntegrityCache # noqa: PLC0415 - runtime circular ... ``` -* **Constraint:** `TYPE_CHECKING` is always preferred. New `PLC0415` suppressions require - explicit justification proving `TYPE_CHECKING` is insufficient. - -## 4.3 Handling Complex Dispatch Logic -* **Pattern:** Grammar-derived or specification-driven dispatch logic has inherently high - branching complexity. This applies to both the parser and the serializer. 
-* **Waiver:** Suppress `PLR0912` (too many branches) and `PLR0915` (too many statements) for: - * The main parser loop (`syntax/parser/core.py`) — EBNF grammar rule dispatch - * Serializer classification-dispatch methods (`syntax/serializer.py`: - `_serialize_pattern`, `_emit_classified_line`) — documented in §4.6 - -**Fuzz pattern handlers:** `_pattern_*` functions in `fuzz_atheris/` that cover many -behavioral scenarios MUST use dispatch-to-sub-handlers rather than a single if/elif chain. -The top-level handler selects a sub-handler via an integer index into a tuple; each -sub-handler is a standalone function covering one scenario. This keeps each function under -complexity limits and makes scenario coverage explicit. Do NOT suppress PLR0912 on a monolithic -function — refactor it. + +`TYPE_CHECKING` is always preferred. New `PLC0415` suppressions require explicit justification proving `TYPE_CHECKING` is insufficient. + +## 4.3 Complex Dispatch Logic + +Grammar-derived or specification-driven dispatch logic has inherently high branching complexity. Suppress `PLR0912` (too many branches) and `PLR0915` (too many statements) for: + +* the main parser loop (`syntax/parser/core.py`) — EBNF grammar rule dispatch. +* serializer classification-dispatch methods (`syntax/serializer.py`: `_serialize_pattern`, `_emit_classified_line`) — documented in §4.6. + +**Fuzz pattern handlers.** `_pattern_*` functions in `fuzz_atheris/` that cover many behavioral scenarios MUST use dispatch-to-sub-handlers rather than a single if/elif chain. The top-level handler selects a sub-handler via an integer index into a tuple; each sub-handler is a standalone function covering one scenario. Do NOT suppress PLR0912 on a monolithic function — refactor it. ## 4.4 Type Narrowing (Union Types) -**Critical Implementation:** Never access attributes of a Union type without prior runtime -validation. 
-* **Action:** Always use explicit `isinstance()` checks or `match/case` blocks to narrow the - type before accessing specific attributes. + +Never access attributes of a Union type without prior runtime validation. Use explicit `isinstance()` checks or `match/case` to narrow before accessing specific attributes. ```python -# Type-Safe Narrowing Example -from ftllexengine.syntax.ast import Message, Term, Pattern +from ftllexengine.syntax.ast import Message, Term def get_entry_id(entry: Message | Term) -> str: - """Extract identifier from Message or Term using pattern matching.""" match entry: case Message(id=identifier): return identifier.name @@ -625,36 +413,29 @@ def get_entry_id(entry: Message | Term) -> str: raise TypeError(f"Unexpected entry type: {type(entry)}") ``` -## 4.5 Facade Layer (FluentBundle, FluentLocalization, LocalizationBootConfig) +## 4.5 Facade Layer -The facade layer is where the platform axioms from §1.1 are realized. All three facade classes -coordinate subsystems; none implement the logic they coordinate. The dependency graph is -**unidirectional** — delegate modules MUST NOT import any facade class. +The facade layer realizes the platform axioms from §1.1. All three facade classes coordinate subsystems; none implement the logic they coordinate. The dependency graph is **unidirectional** — delegate modules MUST NOT import any facade class. -### 4.5.1 FluentBundle — Single-Locale Formatting Unit +### 4.5.1 FluentBundle — single-locale formatting unit -`FluentBundle` is the core formatting unit. It owns a single locale and a set of parsed FTL -resources. +`FluentBundle` owns a single locale and a set of parsed FTL resources. 
-| Responsibility | Delegate Module | FluentBundle Role | -|:---------------|:----------------|:------------------| +| Responsibility | Delegate | FluentBundle role | +|:---------------|:---------|:------------------| | Parsing | `syntax.parser.FluentParserV1` | Calls `parse()`, registers results | | Resolution | `runtime.resolver.FluentResolver` | Instantiates, calls `resolve_message()` | | Validation | `validation.validate_resource()` | Single-line delegation | | Introspection | `introspection.extract_variables()`, `introspect_message()` | Single-line delegation | | Caching | `runtime.cache.IntegrityCache` | Holds reference, calls `get()`/`put()` | -**Metric Clarification:** FluentBundle has a high docstring-to-code ratio because it is the -primary public API facade. This is expected given the mandate in §2.3. High docstring ratio -is not debt. +FluentBundle's high docstring-to-code ratio is expected — it is the primary public API facade. High docstring ratio is not debt. -### 4.5.2 FluentLocalization — Multi-Locale Coordinator +### 4.5.2 FluentLocalization — multi-locale coordinator -`FluentLocalization` coordinates a set of locale-scoped `FluentBundle` instances and -implements the fallback chain. It does not hold bundles eagerly — bundle creation is lazy on -first `format_pattern` call for a given locale. +`FluentLocalization` coordinates locale-scoped `FluentBundle` instances and implements the fallback chain. Bundle creation is **lazy** — first `format_pattern` call for a given locale. -| Responsibility | Delegate | FluentLocalization Role | +| Responsibility | Delegate | FluentLocalization role | |:---------------|:---------|:------------------------| | Resource loading | `ResourceLoader` protocol | Calls `loader.load(locale, resource_id)` | | Bundle management | `FluentBundle` | Creates on demand, holds in `_bundles` dict | @@ -662,58 +443,41 @@ first `format_pattern` call for a given locale. 
| Boot validation | `require_clean()`, `validate_message_schemas()` | Provides pre-traffic validation API | | Audit log | `FluentBundle.get_cache_audit_log()` | Aggregates per-locale logs into dict | -### 4.5.3 LocalizationBootConfig — Strict-Mode Boot Orchestrator +### 4.5.3 LocalizationBootConfig — strict-mode boot orchestrator -`LocalizationBootConfig` is a one-shot boot coordinator, not a persistent object. It composes -`FluentLocalization`, `require_clean()`, and `validate_message_schemas()` into a single -audited boot sequence and discards itself after `boot()` returns the live `FluentLocalization`. +`LocalizationBootConfig` is a one-shot boot coordinator, not a persistent object. It composes `FluentLocalization`, `require_clean()`, and `validate_message_schemas()` into a single audited boot sequence and discards itself after `boot()` returns the live `FluentLocalization`. -* `boot()` → `(FluentLocalization, LoadSummary, tuple[MessageVariableValidationResult, ...])`: - PRIMARY API; executes full boot sequence and returns structured evidence for audit trails; - raises `IntegrityCheckFailedError` on any load failure, required-message absence, or schema - mismatch. -* `boot_simple()` → `FluentLocalization`: simplified form; raises on failure but discards - audit evidence; use when structured evidence is not required. -* The `LocalizationBootConfig` instance has no role after `boot()` completes. It is not - thread-safe to share across calls. +* `boot()` → `(FluentLocalization, LoadSummary, tuple[MessageVariableValidationResult, ...])`: PRIMARY API; executes full boot sequence and returns structured evidence; raises `IntegrityCheckFailedError` on any load failure, required-message absence, or schema mismatch. +* `boot_simple()` → `FluentLocalization`: simplified form; raises on failure but discards audit evidence; use when structured evidence is not required. +* The `LocalizationBootConfig` instance has no role after `boot()` completes. 
It is not thread-safe to share across calls.

-**PROHIBITED Refactorings (all three facades):**
-* Extracting facade methods into mixins (creates hidden C3 linearization complexity)
-* Creating "Service" wrappers around single-line delegation methods (adds indirection, zero
-  benefit)
-* Lifting delegate module internals to the facade (violates the unidirectional dependency
-  graph)
+**Prohibited refactorings (all three facades):**
+
+* Extracting facade methods into mixins — creates hidden C3 linearization complexity.
+* Creating "Service" wrappers around single-line delegation methods — adds indirection, zero benefit.
+* Lifting delegate module internals to the facade — violates the unidirectional dependency graph.

 ## 4.6 Serializer Architecture (FluentSerializer)
-**Pattern:** The serializer is a deterministic AST-to-FTL compiler. Its architecture separates
-three concern layers and enforces a classify-then-dispatch model for continuation line emission.
-### 4.6.1 Architectural Layers
+The serializer is a deterministic AST-to-FTL compiler. Three concern layers, classify-then-dispatch model for continuation lines.
+ +### 4.6.1 Architectural layers | Layer | Responsibility | Methods | |:------|:---------------|:--------| | **Validation** | AST structural correctness (separate pass, runs first) | `_validate_resource`, `_validate_expression`, `_validate_pattern`, `_validate_call_arguments`, `_validate_identifier`, `_validate_select_expression` | -| **Node Serialization** | AST node dispatch via `match/case` | `_serialize_entry`, `_serialize_message`, `_serialize_term`, `_serialize_attribute`, `_serialize_comment`, `_serialize_junk`, `_serialize_expression`, `_serialize_call_arguments`, `_serialize_select_expression` | -| **Pattern Emission** | Continuation line classification, whitespace preservation, character escaping | `_serialize_pattern`, `_classify_line`, `_escape_text` | +| **Node serialization** | AST node dispatch via `match/case` | `_serialize_entry`, `_serialize_message`, `_serialize_term`, `_serialize_attribute`, `_serialize_comment`, `_serialize_junk`, `_serialize_expression`, `_serialize_call_arguments`, `_serialize_select_expression` | +| **Pattern emission** | Continuation line classification, whitespace preservation, character escaping | `_serialize_pattern`, `_classify_line`, `_escape_text` | -**Constraint:** Validation runs BEFORE serialization. Serialization code assumes validated -input. These layers MUST NOT be merged. +Validation runs BEFORE serialization. Serialization assumes validated input. These layers MUST NOT be merged. -### 4.6.2 Continuation Line Model +### 4.6.2 Continuation line model -The FTL parser interprets continuation lines structurally: leading whitespace is syntactic -indent, blank lines are stripped, and characters `.`, `*`, `[` as the first non-whitespace -trigger attribute/variant parsing. The serializer MUST ensure that content whitespace and -content syntax characters are not misinterpreted as structural. 
+The FTL parser interprets continuation lines structurally: leading whitespace is syntactic indent, blank lines are stripped, and characters `.`, `*`, `[` as the first non-whitespace trigger attribute/variant parsing. The serializer MUST ensure that content whitespace and content syntax characters are not misinterpreted as structural. -**Invariant:** Every continuation line emitted by the serializer must be unambiguous under FTL -parsing rules. Ambiguity is resolved by wrapping problematic content in `StringLiteral` -placeables (`{ "..." }`), which the parser treats as expression content, not structural syntax. +**Invariant:** every continuation line emitted by the serializer must be unambiguous under FTL parsing rules. Ambiguity is resolved by wrapping problematic content in `StringLiteral` placeables (`{ "..." }`), which the parser treats as expression content. -**Classification-Before-Dispatch:** - -Each continuation line is classified ONCE by a pure function, then handled through a single -`match/case` dispatch: +**Classification-before-dispatch.** Each continuation line is classified ONCE by a pure function, then handled through a single `match/case` dispatch: ```python class _LineKind(Enum): @@ -731,52 +495,36 @@ class _LineKind(Enum): | `SYNTAX_LEADING` | Parser treats first non-ws char as structural | Emit leading spaces as text, wrap syntax char in `StringLiteral` placeable | | `NORMAL` | None (may contain braces that need escaping) | Emit with brace escaping via `_escape_text` | -**PROHIBITED:** -* Handling whitespace ambiguity classes outside the classification-dispatch model (no scattered - `if` branches in multiple methods) -* Adding line-level concerns to `_escape_text` (it handles character-level brace escaping only) -* Modifying AST nodes to carry serializer-specific layout hints (AST represents language - structure, not rendering) -* Event/Layout/Emitter pipeline abstractions (overengineered for the Fluent 1.0 grammar, - which is a finalized 
specification with a fixed, closed node set)
+**Prohibited:**
+
+* Handling whitespace ambiguity classes outside the classification-dispatch model (no scattered `if` branches in multiple methods).
+* Adding line-level concerns to `_escape_text` (it handles character-level brace escaping only).
+* Modifying AST nodes to carry serializer-specific layout hints.
+* Event/Layout/Emitter pipeline abstractions (overengineered for a finalized closed-grammar specification).

-### 4.6.3 Separate-Line Mode
+### 4.6.3 Separate-line mode

-When a pattern contains cross-element whitespace dependencies (a `TextElement` starting with
-spaces follows a newline-ending element), the serializer outputs the pattern on a separate
-line from `=` to establish `initial_common_indent` before any semantic whitespace. This is a
-**pattern-level** decision, orthogonal to the per-line classification in §4.6.2.
+When a pattern contains cross-element whitespace dependencies (a `TextElement` starting with spaces follows a newline-ending element), the serializer outputs the pattern on a separate line from `=` to establish `initial_common_indent` before any semantic whitespace. This is a **pattern-level** decision, orthogonal to the per-line classification in §4.6.2.

-**Interaction:** `WHITESPACE_ONLY` and `SYNTAX_LEADING` lines are handled by per-line
-wrapping, NOT by separate-line mode. Only `NORMAL` lines with leading whitespace after a
-cross-element newline trigger separate-line mode.
+`WHITESPACE_ONLY` and `SYNTAX_LEADING` lines are handled by per-line wrapping, NOT by separate-line mode. Only `NORMAL` lines with leading whitespace after a cross-element newline trigger separate-line mode.
-### 4.6.4 Character-Level Escaping (`_escape_text`) +### 4.6.4 Character-level escaping (`_escape_text`) -The `_escape_text` function handles ONLY brace escaping: `{` and `}` at any position are -wrapped as `StringLiteral` placeables (per Fluent spec, braces in `TextElement` content must -be expressed as `{ "{" }` and `{ "}" }`). +`_escape_text` handles ONLY brace escaping: `{` and `}` at any position are wrapped as `StringLiteral` placeables (per Fluent spec, braces in `TextElement` content must be expressed as `{ "{" }` and `{ "}" }`). All other ambiguity concerns are resolved BEFORE `_escape_text`: -All other ambiguity concerns are resolved BEFORE `_escape_text` is called: -* Syntax characters (`.`, `*`, `[`) at continuation line starts: handled by - `_emit_classified_line` (`SYNTAX_LEADING` branch) -* Whitespace-only lines: handled by `_emit_classified_line` (`WHITESPACE_ONLY` branch) -* Newline detection and continuation line boundaries: text is pre-split by - `_serialize_pattern` +* Syntax characters at line starts: `_emit_classified_line` (`SYNTAX_LEADING` branch). +* Whitespace-only lines: `_emit_classified_line` (`WHITESPACE_ONLY` branch). +* Newline detection and continuation boundaries: text is pre-split by `_serialize_pattern`. ### 4.6.5 Exhaustiveness -All `match/case` dispatches on closed union types (`Entry`, `Expression`, `_LineKind`) MUST be -exhaustive. Use `assert_never()` from `typing` for enum dispatches and explicit -`case _: raise TypeError(...)` for AST union dispatches where the union may grow. +All `match/case` dispatches on closed union types (`Entry`, `Expression`, `_LineKind`) MUST be exhaustive. Use `assert_never()` from `typing` for enum dispatches and explicit `case _: raise TypeError(...)` for AST union dispatches where the union may grow. -## 4.7 Ruff Configuration and Operational Rules +## 4.7 Ruff Configuration -**Configuration:** `select = ["ALL"]` in `[tool.ruff.lint]`. 
New rules apply automatically; -explicit `ignore` or per-file-ignores required for any suppression. No curated allow-list — -the ignore list must justify every exception. +`select = ["ALL"]` in `[tool.ruff.lint]`. New rules apply automatically; explicit `ignore` or per-file-ignores required for any suppression. -### 4.7.1 Global `ignore` vs Per-File-Ignores +### 4.7.1 Global `ignore` vs per-file-ignores | Mechanism | Use when | |:----------|:---------| @@ -784,69 +532,52 @@ the ignore list must justify every exception. | Per-file-ignores | Rule is valid for most files but a specific file has a documented architectural reason for an exception | | Per-directory blanket | Entire directory has a distinct quality standard (`tests/`, `examples/`, `fuzz_atheris/`, `scripts/`) | -**Prohibited:** Suppressing a rule globally because one file needs it. One file's exception -belongs in per-file-ignores, not the global ignore list. +Suppressing a rule globally because one file needs it is prohibited. One file's exception belongs in per-file-ignores. -### 4.7.2 TC001/TC003 (TYPE_CHECKING Imports) — Non-Negotiable Exceptions +### 4.7.2 TC001/TC003 (TYPE_CHECKING) — non-negotiable exceptions -Two categories of imports **must never** be moved under `TYPE_CHECKING`, even when TC fires: +Two import categories must NEVER be moved under `TYPE_CHECKING`, even when TC fires: -1. **TypeIs annotation types**: `typing.get_type_hints()` evaluates annotation strings at - runtime in the module's `globals()`. If `date`, `datetime`, `Decimal` (or any type used in - `-> TypeIs[X]`) are under `TYPE_CHECKING`, `get_type_hints()` raises `NameError` at runtime - in callers. - - Affected: `parsing/guards.py` (`date`, `datetime`, `Decimal`) - - Fix: keep as direct import; add - `# noqa: TC003 - TypeIs return annotation requires X at runtime for get_type_hints() resolution` - -2. 
**Public re-exported symbols**: If callers do - `from ftllexengine.syntax.ast import CommentType` at runtime, moving `CommentType` under - `TYPE_CHECKING` in `ast.py` makes the import fail. - - Affected: `syntax/ast.py` (`CommentType`) - - Fix: keep as direct import; add - `# noqa: TC001 - CommentType is re-exported as a public runtime symbol` +1. **TypeIs annotation types.** `typing.get_type_hints()` evaluates annotation strings at runtime in the module's `globals()`. If `date`, `datetime`, `Decimal` (or any type used in `-> TypeIs[X]`) are under `TYPE_CHECKING`, `get_type_hints()` raises `NameError` at runtime in callers. + - Affected: `parsing/guards.py`. + - Fix: keep as direct import; add `# noqa: TC003 - TypeIs return annotation requires X at runtime for get_type_hints() resolution`. +2. **Public re-exported symbols.** If callers do `from ftllexengine.syntax.ast import CommentType` at runtime, moving `CommentType` under `TYPE_CHECKING` breaks the import. + - Affected: `syntax/ast.py`. + - Fix: keep as direct import; add `# noqa: TC001 - re-exported as a public runtime symbol`. Both are in the Known Waiver Registry (§3.7). -### 4.7.3 FBT001/FBT002 (Boolean Traps) — Fix Pattern +### 4.7.3 FBT001/FBT002 (boolean traps) — fix pattern -Ruff FBT flags boolean-typed positional parameters. **Preferred fix:** make the argument -keyword-only by adding `*` before it. +Ruff FBT flags boolean-typed positional parameters. **Preferred fix:** make the argument keyword-only with `*`. ```python -# BEFORE (FBT001 fires) +# BEFORE (FBT fires) def get_patterns(locale: str, allow_expansion: bool = True) -> list[str]: ... -# AFTER (FBT resolved) +# AFTER def get_patterns(locale: str, *, allow_expansion: bool = True) -> list[str]: ... ``` -After making an arg keyword-only, check all call sites — mypy reports "too many positional -arguments" for any missed site. - -**Acceptable waiver** (for truly internal private functions): add to per-file-ignores with -rationale. 
Do not add FBT to the global ignore. +After making an arg keyword-only, check all call sites — mypy reports "too many positional arguments" for any missed site. For truly internal private functions, per-file-ignores with rationale is acceptable. Do not add FBT to the global ignore. -### 4.7.4 C901 (McCabe Complexity) — Waiver Pattern +### 4.7.4 C901 (McCabe complexity) — waiver pattern -Grammar rules, AST visitor dispatch, and closed-union dispatch legitimately exceed the McCabe -threshold. Add C901 alongside PLR0912 in per-file-ignores: +Grammar rules, AST visitor dispatch, and closed-union dispatch legitimately exceed McCabe. Add C901 alongside PLR0912 in per-file-ignores: ```toml "src/ftllexengine/syntax/parser/rules.py" = ["PLR0911", "PLR0912", "PLR0915", "C901"] ``` -Rationale comment template: `"Grammar/AST dispatch: one function = one grammar rule; -cyclomatic complexity is structural, not accidental."` +Rationale template: *"Grammar/AST dispatch: one function = one grammar rule; cyclomatic complexity is structural, not accidental."* +--- # 5. VERIFICATION METHODOLOGY ## 5.1 Test File Naming Schema -Test file naming is a hard structural constraint, not a style preference. It determines -discoverability: an agent searching for tests covering `runtime/bundle.py` must be able to -predict the filename without scanning all 200+ test files. +Test file naming is a hard structural constraint, not a style preference. It determines discoverability: an agent searching for tests covering `runtime/bundle.py` must be able to predict the filename without scanning all 200+ test files. **Canonical schema:** `test_{package}_{module}[_{qualifier}].py` @@ -856,13 +587,11 @@ predict the filename without scanning all 200+ test files. 
| `{module}` | Module filename without `.py` | `bundle`, `resolver`, `serializer` | | `{qualifier}` | Optional single axis (see permitted list) | `_property`, `_integration` | -For nested subpackages, join segments with underscore: -`src/ftllexengine/syntax/parser/core.py` → `test_syntax_parser_core.py` +For nested subpackages, join with underscore: `src/ftllexengine/syntax/parser/core.py` → `test_syntax_parser_core.py`. -For top-level modules (`src/ftllexengine/enums.py`), omit the package segment: -`test_enums.py` +For top-level modules (`src/ftllexengine/enums.py`), omit the package segment: `test_enums.py`. -**Permitted qualifiers (exhaustive list):** +**Permitted qualifiers (exhaustive):** | Qualifier | Meaning | Runs in CI? | |:----------|:--------|:------------| @@ -872,26 +601,20 @@ For top-level modules (`src/ftllexengine/enums.py`), omit the package segment: | `_roundtrip` | Serialization/parse identity verification | Yes | | `_state_machine` | `RuleBasedStateMachine` tests (in `tests/fuzz/` only) | No | -No other qualifiers are permitted. If a file cannot be classified by one of these axes, -it belongs in an existing file or signals that file should be split. +No other qualifiers are permitted. If a file cannot be classified by one of these axes, it belongs in an existing file or signals that file should be split. -**Fuzz-marker test location:** All tests carrying `@pytest.mark.fuzz` MUST reside in -`tests/fuzz/`. The `tests/` root contains only tests that run in CI without the fuzz marker. -A `_property` file in `tests/` root is NOT a fuzz file even if it uses `@given`; the marker -and directory are what determine fuzz status (see §5.8). +**Fuzz-marker location.** All tests carrying `@pytest.mark.fuzz` MUST reside in `tests/fuzz/`. The `tests/` root contains only tests that run in CI without the fuzz marker. 
A `_property` file in `tests/` root is NOT a fuzz file even if it uses `@given`; the marker and directory are what determine fuzz status (see §5.8). -**Deprecated suffixes — prohibited for new files:** +**Deprecated suffixes — prohibited:** -| Deprecated suffix | Canonical replacement | -|:------------------|:----------------------| -| `_hypothesis` | `_property` | +| Deprecated | Canonical replacement | +|:-----------|:----------------------| +| `_hypothesis`, `_properties` | `_property` | | `_fuzzing` | Move file to `tests/fuzz/` | -| `_properties` | `_property` | -| `_comprehensive` | *(none; split into focused files by axis)* | -| `_advanced` | *(none; not a behavioral axis)* | -| `_edge_cases` | *(none; fold edge cases into primary or property file)* | +| `_comprehensive` | Split into focused files by axis | +| `_advanced`, `_edge_cases` | Not behavioral axes; fold into primary or property file | -**Files name systems under test, not motivations for writing them:** +**Files name systems under test, not motivations:** ``` PROHIBITED: test_system_quality_audit_fixes.py (internal task reference) @@ -904,34 +627,30 @@ REQUIRED: test_diagnostics_formatter_integration.py REQUIRED: test_runtime_resolver_property.py ``` -"And" in a filename is a mandatory split signal: the file covers two subjects and must -become two files. A file name that cannot map back to a single source module path is invalid. +"And" in a filename is a mandatory split signal: the file covers two subjects and must become two files. A filename that cannot map back to a single source module path is invalid. ## 5.2 Hypothesis-First Protocol -Property-Based Testing (Hypothesis) is the **primary** mechanism for verification, not an -afterthought. Unit tests with fixed inputs are appropriate only for CLDR-mandated exact output -values and `@example`-promoted Hypothesis failures (regression cases). All other verification -uses Hypothesis. 
-**HypoFuzz Symbiosis:** All Hypothesis tests are designed for coverage-guided fuzzing via -HypoFuzz. Tests and strategies MUST emit semantic coverage signals via `hypothesis.event()` to -guide the fuzzer toward interesting code paths. +Property-based testing (Hypothesis) is the **primary** verification mechanism. Unit tests with fixed inputs are appropriate only for CLDR-mandated exact output values and `@example`-promoted Hypothesis failures (regression cases). All other verification uses Hypothesis. + +**HypoFuzz symbiosis:** all Hypothesis tests are designed for coverage-guided fuzzing via HypoFuzz. Tests and strategies MUST emit semantic coverage signals via `hypothesis.event()` to guide the fuzzer toward interesting code paths. ## 5.3 Test Construction Strategy -Do not simply "fuzz" the code. You must construct tests based on deep code analysis: -### 5.3.1 Identify Properties -Before writing code, identify the mathematical properties of the component: -* *Roundtrip:* `decode(encode(x)) == x` -* *Idempotence:* `parse(parse(x).to_string()) == parse(x)` -* *Oracle:* Compare behavior against ShadowBundle or reference implementation -* *Metamorphic:* Predictable relationships (e.g., `len(filter(xs)) <= len(xs)`) +Construct tests based on deep code analysis, not blind fuzzing. + +### 5.3.1 Identify properties + +Before writing code, identify mathematical properties of the component: -### 5.3.2 Emit Semantic Coverage Events (MANDATORY) -**Constraint:** Every `@given` test — regardless of file or marker — MUST use `hypothesis.event()` -to signal semantically interesting behaviors invisible to code coverage. HypoFuzz treats events -as virtual branches, actively seeking inputs that produce new events. Preflight enforces this -across ALL `@given` tests, not just fuzz-marked modules. +* **Roundtrip:** `decode(encode(x)) == x` +* **Idempotence:** `parse(parse(x).to_string()) == parse(x)` +* **Oracle:** compare behavior against ShadowBundle or reference implementation. 
+* **Metamorphic:** predictable relationships (e.g., `len(filter(xs)) <= len(xs)`). + +### 5.3.2 Emit semantic coverage events (mandatory) + +Every `@given` test — regardless of file or marker — MUST use `hypothesis.event()` to signal semantically interesting behaviors invisible to code coverage. HypoFuzz treats events as virtual branches and actively seeks inputs that produce new events. Preflight enforces this across all `@given` tests, not just fuzz-marked modules. ```python from hypothesis import event, given @@ -939,20 +658,18 @@ from tests.strategies.ftl import ftl_placeables @given(placeable=ftl_placeables()) def test_placeable_serialization(placeable: Placeable) -> None: - # REQUIRED: Emit event for expression type diversity event(f"expr_type={type(placeable.expression).__name__}") result = serialize(placeable) parsed = parse(result) - # REQUIRED: Emit event for error paths if parsed.errors: event(f"error={type(parsed.errors[0]).__name__}") assert parsed.ast == placeable ``` -**Event Taxonomy (Use Consistently):** +**Event taxonomy (use consistently):** | Category | Format | Examples | |:---------|:-------|:---------| @@ -964,25 +681,18 @@ def test_placeable_serialization(placeable: Placeable) -> None: | Test parameter | `{param}={value}` | `thread_count=20`, `cache_size=50`, `reentry_depth=3` | | State machine | `rule={name}`, `invariant={name}` | `rule=add_simple_message`, `invariant=cache_stats_consistent` | -**Strategy Events vs Test Events:** +**Strategy events vs test events:** -* **Strategy events** are emitted by strategy functions in `tests/strategies/`. They are - tracked by `EXPECTED_EVENTS` in `tests/strategy_metrics.py` and drive strategy-level coverage - metrics. Format: `strategy={family}_{variant}` or `{domain}={variant}`. -* **Test events** are emitted by individual `@given` test functions and `@rule`/`@invariant` - methods. They guide HypoFuzz per-test but are NOT tracked by `EXPECTED_EVENTS`. 
Format: - `{param}={value}`, `outcome={result}`, `rule={name}`. +* **Strategy events** are emitted by strategy functions in `tests/strategies/`. They are tracked by `EXPECTED_EVENTS` in `tests/strategy_metrics.py` and drive strategy-level coverage metrics. Format: `strategy={family}_{variant}` or `{domain}={variant}`. +* **Test events** are emitted by individual `@given` test functions and `@rule`/`@invariant` methods. They guide HypoFuzz per-test but are NOT tracked by `EXPECTED_EVENTS`. Format: `{param}={value}`, `outcome={result}`, `rule={name}`. -When adding a new strategy, update `EXPECTED_EVENTS`. When adding test events, no metrics -update is needed. +When adding a new strategy, update `EXPECTED_EVENTS`. When adding test events, no metrics update is needed. -### 5.3.3 Strategy Construction (Soundness Over Exhaustion) -* Use `st.from_type()` and `st.builds()` to construct valid domain objects -* **Avoid:** High-rejection-rate filters on loose primitives (e.g., - `st.text().filter(is_valid_ftl)`). Low-rejection filters on constrained strategies are - acceptable when they improve readability. -* **REQUIRED:** Strategies MUST emit events when selecting between semantically distinct - variants +### 5.3.3 Strategy construction (soundness over exhaustion) + +* Use `st.from_type()` and `st.builds()` to construct valid domain objects. +* Avoid: high-rejection-rate filters on loose primitives (e.g., `st.text().filter(is_valid_ftl)`). Low-rejection filters on constrained strategies are acceptable when they improve readability. +* Strategies MUST emit events when selecting between semantically distinct variants. ```python @composite @@ -993,51 +703,34 @@ def ftl_placeables(draw: st.DrawFn, max_depth: int = 2) -> Placeable: - strategy=placeable_{choice}: Type of expression generated """ choice = draw(st.sampled_from(["variable", "function_ref", "term_ref"])) - - # REQUIRED: Emit strategy choice for fuzzer guidance event(f"strategy=placeable_{choice}") - # ... 
generation logic ... ``` -### 5.3.4 Contextual Awareness -Investigate how code is called. Define strategies that mirror real usage patterns (e.g., -chunked buffer inputs vs. whole-string inputs). +### 5.3.4 Contextual awareness + +Investigate how code is called. Define strategies that mirror real usage patterns (e.g., chunked buffer inputs vs. whole-string inputs). -### 5.3.5 Event Verification -**Constraint:** Verify event infrastructure coverage. +### 5.3.5 Event verification + +Verify event infrastructure coverage: ```bash ./scripts/fuzz_hypofuzz.sh --preflight ``` -**Enforcement Levels:** -1. **File-level:** Every `@pytest.mark.fuzz` module MUST contain `event()` calls. -2. **Per-test (AST-based):** Every `@given` test function across ALL test files (both - `tests/` root and `tests/fuzz/`) MUST emit at least one semantic event. The preflight tool - parses all test files via Python AST to verify this — the check is not scoped to fuzz-marked - modules. Any `@given` test without `event()` fails preflight with exit code 1. -3. **Strategy file coverage:** Every strategy implementation file in `tests/strategies/` MUST - emit `event()` calls. `__init__.py` is exempt as a pure re-export aggregator (enforced by - `_STRATEGY_REEXPORT_FILES` in the preflight script). A strategy file with 0 events gives - HypoFuzz zero semantic guidance — treated as an error, not a warning. -4. **Zero gaps:** Preflight must report zero gaps at all three levels. Any gap causes exit - code 1. - -**Violation:** If preflight shows fuzz modules, individual tests, or strategy files without -events, fuzzing sessions will have reduced semantic guidance. HypoFuzz captures events -internally for coverage decisions — components without events provide no semantic signals. +**Enforcement levels:** -**Scope Limitation:** Preflight validates `@given` tests only. 
`RuleBasedStateMachine` rules -and invariants use `@rule`/`@invariant` decorators (not `@given`), so their event coverage -is not checked by preflight. State machine event coverage is verified manually. +1. **File-level:** every `@pytest.mark.fuzz` module MUST contain `event()` calls. +2. **Per-test (AST-based):** every `@given` test function across ALL test files (both `tests/` root and `tests/fuzz/`) MUST emit at least one semantic event. The preflight tool parses all test files via Python AST. +3. **Strategy file coverage:** every strategy file in `tests/strategies/` MUST emit `event()` calls. `__init__.py` is exempt as a pure re-export aggregator (enforced by `_STRATEGY_REEXPORT_FILES` in the preflight script). A strategy file with 0 events gives HypoFuzz zero semantic guidance — treated as an error, not a warning. +4. **Zero gaps:** preflight must report zero gaps at all three levels. Any gap → exit code 1. -### 5.3.6 Runtime Strategy Metrics +**Scope limitation:** preflight validates `@given` tests only. `RuleBasedStateMachine` rules and invariants use `@rule`/`@invariant` (not `@given`), so their event coverage is verified manually. -The runtime metrics system (`tests/strategy_metrics.py`) complements preflight's static -analysis with dynamic event collection during test execution. +### 5.3.6 Runtime strategy metrics -**Three Core Constants:** +The runtime metrics system (`tests/strategy_metrics.py`) complements preflight's static analysis with dynamic event collection during test execution. | Constant | Purpose | |:---------|:--------| @@ -1045,17 +738,14 @@ analysis with dynamic event collection during test execution. | `STRATEGY_CATEGORIES` | Maps event prefixes to human-readable strategy family names | | `INTENDED_WEIGHTS` | Expected per-variant distribution within each strategy family | -**Metrics Collected:** Total events, per-strategy counts, weight skew (threshold: 0.15), -coverage gaps, performance percentiles. 
- -**Preflight vs Runtime Distinction:** +**Metrics collected:** total events, per-strategy counts, weight skew (threshold: 0.15), coverage gaps, performance percentiles. | Aspect | Preflight (`--preflight`) | Runtime (`--deep --metrics`) | |:-------|:--------------------------|:-----------------------------| | Method | Static AST analysis | Dynamic event collection | | Question | "Does `event()` exist in code?" | "Which events fired? At what frequencies?" | | Catches | Missing instrumentation | Dead code paths, weight skew | -| Speed | Instant (no test execution) | Requires full test run | +| Speed | Instant | Requires full test run | **Activation:** @@ -1063,20 +753,15 @@ coverage gaps, performance percentiles. ./scripts/fuzz_hypofuzz.sh --deep --metrics ``` -Environment variables: `STRATEGY_METRICS=1`, `STRATEGY_METRICS_LIVE=1`, -`STRATEGY_METRICS_DETAILED=1`. Results saved to `.hypothesis/strategy_metrics.json`. +Environment: `STRATEGY_METRICS=1`, `STRATEGY_METRICS_LIVE=1`, `STRATEGY_METRICS_DETAILED=1`. Results saved to `.hypothesis/strategy_metrics.json`. + +**Maintenance:** when adding a new event-emitting strategy in `tests/strategies/`, update all three constants in `tests/strategy_metrics.py`. Test-level events do not require metrics updates. -**Maintenance:** When adding a new event-emitting strategy in `tests/strategies/`, update all -three constants in `tests/strategy_metrics.py`. Test-level events (emitted by `@given` tests, -not strategies) do not require metrics updates. +## 5.4 Feedback Loop (Regression Proofing) -## 5.4 The Feedback Loop (Regression Proofing) -* **Discovery:** When Hypothesis finds a failure, it caches the minimal failing example in - `.hypothesis/examples/` -* **Action:** Investigate the root cause. 
Distinguish between a genuine bug and an incorrect - test assumption -* **Promotion:** For every non-trivial bug found, **promote the failing example** into the - test suite using the `@example(...)` decorator +* **Discovery:** when Hypothesis finds a failure, it caches the minimal failing example in `.hypothesis/examples/`. +* **Action:** investigate root cause. Distinguish a genuine bug from an incorrect test assumption. +* **Promotion:** for every non-trivial bug found, **promote the failing example** into the test suite using `@example(...)`. ```python @example(ftl="edge-case = { $var") # Promoted from Hypothesis finding @@ -1085,34 +770,29 @@ def test_roundtrip(ftl: str) -> None: ... ``` -**Crash Recording Infrastructure:** When a Hypothesis test fails, the `conftest.py` crash -recording hook (`pytest_runtest_makereport`) automatically: -1. Generates a standalone `repro_crash_.py` reproduction script in - `.hypothesis/crashes/` -2. Saves JSON metadata (test ID, example args, error type, timestamp) alongside the script -3. Creates portable crash files that persist independently of `.hypothesis/examples/` and - survive database cleanup +**Crash recording infrastructure.** When a Hypothesis test fails, the `conftest.py` crash-recording hook (`pytest_runtest_makereport`) automatically: + +1. Generates a standalone `repro_crash_.py` reproduction script in `.hypothesis/crashes/`. +2. Saves JSON metadata (test ID, example args, error type, timestamp) alongside the script. +3. Creates portable crash files that persist independently of `.hypothesis/examples/` and survive database cleanup. Use `./scripts/fuzz_hypofuzz.sh --repro` or run crash scripts directly for reproduction. ## 5.5 Database Persistence -The Hypothesis example database (`.hypothesis/examples/`) persists across fuzzing sessions. It -stores failing examples and covering examples (inputs that trigger distinct code paths during -`Phase.reuse`). 
-**Cross-Session Value:** -* **Phase.reuse:** Replays stored examples FIRST, catching regressions immediately -* **Example accumulation:** Each `--deep` session discovers new covering examples and failures -* **Shrink memory:** Minimal failing examples preserved across runs +The Hypothesis example database (`.hypothesis/examples/`) persists across fuzzing sessions. It stores failing examples and covering examples (inputs that trigger distinct code paths during `Phase.reuse`). + +* **Phase.reuse:** replays stored examples FIRST, catching regressions immediately. +* **Example accumulation:** each `--deep` session discovers new covering examples and failures. +* **Shrink memory:** minimal failing examples preserved across runs. -**Constraint:** Do NOT delete `.hypothesis/` between fuzzing sessions unless intentionally -resetting the database. A 30-minute session today + 30-minute session tomorrow = 60 minutes -of cumulative learning. +Do NOT delete `.hypothesis/` between fuzzing sessions unless intentionally resetting the database. A 30-minute session today + 30-minute session tomorrow = 60 minutes of cumulative learning. ## 5.6 Hypothesis Profiles -Profiles are defined in `tests/conftest.py`. Use the appropriate profile for context: -| Profile | max_examples | deadline | Use Case | +Profiles are defined in `tests/conftest.py`. + +| Profile | max_examples | deadline | Use case | |:--------|:-------------|:---------|:---------| | `dev` | 500 | 200ms | Local development | | `ci` | 50 | 200ms | Fast CI feedback (reproducible) | @@ -1120,122 +800,90 @@ Profiles are defined in `tests/conftest.py`. Use the appropriate profile for con | `hypofuzz` | 10000 | None | Coverage-guided `--deep` runs | | `stateful_fuzz` | 500 | None | State machine fuzzing | -**Profile Details:** * All profiles include `Phase.target` for targeted property exploration via `target()`. -* `ci` uses `derandomize=True` for reproducible builds and `print_blob=True` for failure - reproduction. 
-* `hypofuzz` suppresses `HealthCheck.too_slow` and `HealthCheck.data_too_large` for intensive - runs. +* `ci` uses `derandomize=True` for reproducible builds and `print_blob=True` for failure reproduction. +* `hypofuzz` suppresses `HealthCheck.too_slow` and `HealthCheck.data_too_large` for intensive runs. * `fuzz_hypofuzz.sh --deep` automatically sets `HYPOTHESIS_PROFILE=hypofuzz`. ## 5.7 Workflow Execution Order -The execution of scripts defines the quality gate. **All three steps must pass in order.** -1. **Lint:** `./scripts/lint.sh` (Ruff → Mypy). Must exit code 0. -2. **Test:** `./scripts/test.sh` (Pytest + Hypothesis + Coverage). Must meet the 95% - threshold. Must exit code 0. -3. **Preflight:** `./scripts/fuzz_hypofuzz.sh --preflight` (AST-based event audit). Must exit - code 0. Run whenever `tests/` or `tests/strategies/` files are modified. Runs in seconds - (no test execution); zero cost to always run. +The repository has one canonical full verification entry point: `./check.sh`. +Use it at the end of non-trivial work and whenever the change touches docs, +examples, scripts, packaging metadata, or release surfaces. + +For focused iteration inside the code/test loop, use these narrower commands: + +1. **Lint:** `./scripts/lint.sh`. Runs Ruff, mypy, the bare-`noqa` audit, and the explicit repository static validators. Must exit 0. +2. **Test:** `./scripts/test.sh`. Runs pytest + Hypothesis + coverage. The coverage threshold is read from `pyproject.toml`; do not duplicate percentages elsewhere. Must exit 0. +3. **Preflight:** `./scripts/fuzz_hypofuzz.sh --preflight`. AST-based event audit. Run whenever `tests/` or `tests/strategies/` files are modified. Runs in seconds (no test execution); zero cost to always run. + +### Script output design (agent-native, log-on-fail) -### Script Output Design (Agent-Native, Log-on-Fail) -Both `lint.sh` and `test.sh` are AI-agent-optimized with a **log-on-fail** design. 
Run them -directly without any output truncation: +`lint.sh` and `test.sh` are AI-agent-optimized with a **log-on-fail** design. Run them directly without output truncation: ```bash ./scripts/lint.sh ./scripts/test.sh ``` -**NEVER pipe through `tail`, `head`, or any output limiter. NEVER append redirection operators -(`2>&1`, `>`, `>>`).** The output is already appropriately sized: -* **On success:** emits only structured summary lines (`[PASS]`, JSON block). Already minimal - — no truncation needed. -* **On failure:** captures the full diagnostic log, then dumps it all at once. This dump IS - the analysis. Truncating it destroys the error context needed for diagnosis. +**NEVER pipe through `tail`, `head`, or any output limiter. NEVER append redirection operators (`2>&1`, `>`, `>>`).** Output is already appropriately sized: -Limiting output (e.g., `| tail -100`) means on failure you see only the summary footer, -missing the actual error details. Redirecting stderr (e.g., `2>&1`) loses the distinction -between stdout and the Bash tool's inherent stderr capture. The scripts are designed so the -agent never needs to re-run them to get more detail. +* On success: emits only structured summary lines (`[PASS]`, JSON block). Already minimal — no truncation needed. +* On failure: captures the full diagnostic log, then dumps it all at once. This dump IS the analysis. Truncating it destroys the error context needed for diagnosis. -## 5.8 Fuzz Test Skip Designation (Standardized) -**Constraint:** Intensive property tests excluded from normal runs use `@pytest.mark.fuzz` and -a standardized skip reason. +Limiting output (e.g., `| tail -100`) means on failure you see only the summary footer, missing the actual error details. Redirecting stderr (e.g., `2>&1`) loses the distinction between stdout and the Bash tool's inherent stderr capture. The scripts are designed so the agent never needs to re-run them to get more detail. 
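
The log-on-fail behaviour described above can be illustrated with a minimal sketch. It is not the implementation of `lint.sh` or `test.sh` (those are shell scripts); the function name `run_log_on_fail` and the `[FAIL]` line are assumptions made for illustration, and only the `[PASS]` summary line comes from the text above.

```python
import subprocess
import sys


def run_log_on_fail(name: str, command: list[str]) -> int:
    """Run a command quietly; print the full captured log only on failure.

    Hypothetical helper: the name and the `[FAIL]` prefix are assumptions,
    not part of the repository's scripts.
    """
    # Capture stdout and stderr together so a failure dump is complete
    # and keeps the original ordering of messages.
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    if result.returncode == 0:
        # Success: a single structured summary line, nothing to truncate.
        print(f"[PASS] {name}")
    else:
        # Failure: dump the whole log at once, untruncated.
        print(f"[FAIL] {name} (exit {result.returncode})")
        print(result.stdout, end="")
    return result.returncode


if __name__ == "__main__":
    # Example: a command that succeeds produces one summary line only.
    sys.exit(run_log_on_fail("demo", [sys.executable, "-c", "print('noise')"]))
```

Because the wrapper already decides what to show, piping its output through `tail` or `head` can only remove the part of the failure dump that explains the error.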
-### Decision Criteria: When to Apply `@pytest.mark.fuzz` +## 5.8 Fuzz Test Skip Designation -The fuzz marker controls whether a test **runs at all** during `test.sh`. It is independent of -`event()` calls (which are mandatory in ALL `@given` tests per §5.3.2) and independent of -Hypothesis profiles (which control example counts when a test does run). +Intensive property tests excluded from normal runs use `@pytest.mark.fuzz` and a standardized skip reason. -| Test Category | Runs in CI? | Fuzz Marker? | Example Count | +### When to apply `@pytest.mark.fuzz` + +The marker controls whether a test **runs at all** during `test.sh`. It is independent of `event()` calls (mandatory in ALL `@given` tests per §5.3.2) and independent of Hypothesis profiles (which control example counts when a test does run). + +| Test category | Runs in CI? | Fuzz marker? | Example count | |:--------------|:------------|:-------------|:--------------| | Regular `@given` with `event()` | Yes | No | `ci`=50, `dev`=500 | | Intensive fuzz-only | No (skipped) | `@pytest.mark.fuzz` | Only under `--deep` (10000) | -**Apply `@pytest.mark.fuzz` ONLY when** the test meets one or more of these criteria: -* **State machines** (`RuleBasedStateMachine`) that explore exponential state spaces -* **Generators producing expensive objects** (deeply nested ASTs, large resources) where even - 50 examples would exceed CI time budgets -* **Tests with `deadline=None`** that intentionally allow slow individual examples -* **Tests requiring `suppress_health_check`** for `too_slow` or `data_too_large` - -**Hard placement rule:** Any test that uses `deadline=None` or -`suppress_health_check=[HealthCheck.too_slow]` MUST carry `@pytest.mark.fuzz` and reside in -`tests/fuzz/`. These settings signal that the test is intentionally slow — running 50 such -tests in CI would blow time budgets. Examples: boot-sequence tests that construct real loaders, -state machines. 
Do NOT place `deadline=None` -tests in `tests/` root even if they have bounded strategies. - -**Never hardcode `max_examples` in `tests/fuzz/`:** Fuzz tests MUST NOT set `max_examples=N` -in their `@settings` decorator. The `hypofuzz` profile controls exploration depth (10,000 for -`--deep --metrics`, continuous for HypoFuzz). Hardcoding `max_examples` overrides the profile -and artificially caps exploration — a `@settings(max_examples=20)` test runs only 20 examples -even under the `hypofuzz` profile's 10,000 budget. The only meaningful settings for fuzz tests -are `deadline=None`, `suppress_health_check`, and `stateful_step_count` (state machines only). - -**Do NOT apply `@pytest.mark.fuzz`** to standard `@given` tests with bounded strategies and -no deadline suppression. These run fast at 50 examples and benefit from CI regression -coverage. The Hypothesis profile system (`ci`/`dev`/`hypofuzz`) automatically scales example -counts — the same test runs with 50 examples in CI and 10000 under `--deep` without any -marker. - -### Marker Mechanics +**Apply `@pytest.mark.fuzz` ONLY when** the test meets one or more of: + +* **State machines** (`RuleBasedStateMachine`) that explore exponential state spaces. +* **Generators producing expensive objects** (deeply nested ASTs, large resources) where 50 examples would exceed CI time budgets. +* **Tests with `deadline=None`** that intentionally allow slow individual examples. +* **Tests requiring `suppress_health_check`** for `too_slow` or `data_too_large`. + +**Hard placement rule:** any test that uses `deadline=None` or `suppress_health_check=[HealthCheck.too_slow]` MUST carry `@pytest.mark.fuzz` and reside in `tests/fuzz/`. These settings signal that the test is intentionally slow — running 50 such tests in CI would blow time budgets. Examples: boot-sequence tests that construct real loaders, state machines. Do NOT place `deadline=None` tests in `tests/` root even if they have bounded strategies. 
+ +**Never hardcode `max_examples` in `tests/fuzz/`.** Fuzz tests MUST NOT set `max_examples=N` in their `@settings` decorator. The `hypofuzz` profile controls exploration depth (10,000 for `--deep --metrics`, continuous for HypoFuzz). Hardcoding `max_examples` overrides the profile and artificially caps exploration — a `@settings(max_examples=20)` test runs only 20 examples even under the `hypofuzz` profile's 10,000 budget. The only meaningful settings for fuzz tests are `deadline=None`, `suppress_health_check`, and `stateful_step_count` (state machines only). + +**Do NOT apply `@pytest.mark.fuzz`** to standard `@given` tests with bounded strategies and no deadline suppression. These run fast at 50 examples and benefit from CI regression coverage. + +### Marker mechanics * **Marker:** `@pytest.mark.fuzz` at class or module level (`pytestmark = pytest.mark.fuzz`). -* **Skip Reason Prefix:** All fuzz skips use the reason prefix `"FUZZ:"`. The canonical reason - string is: +* **Skip reason prefix:** all fuzz skips use the prefix `"FUZZ:"`. Canonical reason: ``` FUZZ: run with ./scripts/fuzz_hypofuzz.sh --deep or pytest -m fuzz ``` -* **Prefix Requirement:** The `"FUZZ:"` prefix is a structural contract consumed by - `conftest.py` and `test.sh` for skip categorization. Do not alter the prefix. -* **Skip Breakdown Reporting:** `test.sh` emits `skipped_fuzz` and `skipped_other` in the - JSON summary. If `skipped_other > 0`, a `[WARN]` is emitted indicating non-fuzz tests were - skipped and require investigation. -* **Prohibited Variations:** `"SKIPPEDfuzz"`, `"SKIPPED fuzz"`, `"Fuzzing test"`, or any - other ad-hoc skip reason for fuzz tests. All fuzz skip reasons MUST use the `"FUZZ:"` prefix. +* **Prefix requirement:** the `"FUZZ:"` prefix is a structural contract consumed by `conftest.py` and `test.sh` for skip categorization. Do not alter the prefix. +* **Skip breakdown reporting:** `test.sh` emits `skipped_fuzz` and `skipped_other` in the JSON summary. 
If `skipped_other > 0`, a `[WARN]` is emitted indicating non-fuzz tests were skipped and require investigation. +* **Prohibited variations:** `"SKIPPEDfuzz"`, `"SKIPPED fuzz"`, `"Fuzzing test"`, or any other ad-hoc skip reason. All fuzz skip reasons MUST use the `"FUZZ:"` prefix. -### HypoFuzz Targeting Rationale +### HypoFuzz targeting rationale -`--deep` targets `tests/fuzz/` exclusively — NOT `tests/`. This is a deliberate concentration -strategy: +`--deep` targets `tests/fuzz/` exclusively — NOT `tests/`. This is a deliberate concentration strategy: | Target | Effect | |:-------|:-------| | `tests/fuzz/` (correct) | 4 workers concentrated on ~35 high-value, slow, open-ended targets | | `tests/` (wrong) | 4 workers diluted across 1500+ tests, most of which are fast and bounded | -Pointing HypoFuzz at `tests/` wastes worker capacity on tests that already run fine under CI's -50-example budget. The fuzz directory exists precisely to give HypoFuzz a concentrated set of -targets where unlimited exploration has the highest marginal value: state machines, pool -concurrency, boot sequences, subinterpreters. When adding new fuzz targets, always place them -in `tests/fuzz/`; `tests/` tests are CI regression suites, not fuzzing targets. +Pointing HypoFuzz at `tests/` wastes worker capacity on tests that already run fine under CI's 50-example budget. The fuzz directory exists precisely to give HypoFuzz a concentrated set of targets where unlimited exploration has the highest marginal value: state machines, pool concurrency, boot sequences, subinterpreters. When adding new fuzz targets, always place them in `tests/fuzz/`. + +## 5.9 Targeted Fuzzing with `target()` -## 5.9 Advanced: Targeted Fuzzing with target() -All profiles include `Phase.target`, so `target()` is active in every test run. Use it to -guide Hypothesis toward inputs that maximize specific metrics: +All profiles include `Phase.target`, so `target()` is active in every test run. 
Use it to guide Hypothesis toward inputs that maximize specific metrics: ```python from hypothesis import given, settings, target @@ -1244,193 +892,43 @@ from hypothesis import given, settings, target @given(source=ftl_chaos_source()) def test_parser_recovery(source: str) -> None: result = parse(source) - # Guide fuzzer toward inputs with more junk nodes (parser stress) target(len([e for e in result.body if isinstance(e, Junk)]), label="junk_count") ``` -The `target()` function accepts a numeric value and an optional label. Hypothesis actively -seeks inputs that maximize the targeted metric, making it effective for hunting specific bug -classes (deep nesting, large error counts, parser recovery stress). - -# 6. DOCUMENTATION PROTOCOL (MANDATORY) - -## 6.1 Governing Protocol -**Constraint:** All markdown file operations MUST comply with PROTOCOL_AFAD.md (v4.0). - -| File Pattern | Tier | Protocol Section | -|:-------------|:-----|:-----------------| -| `docs/DOC_*.md` | Reference | AFAD reference-doc rules | -| `README.md` (repository root) | Storefront special case | `AGENTS.md` root README exception | -| `*.md` (all other repo markdown) | Auxiliary / special | AFAD auxiliary-doc rules or native document convention | - -**Protocol Location:** `.codex/PROTOCOL_AFAD.md` - -## 6.2 Protocol Enforcement -**Before ANY markdown file operation**, the AI agent MUST: - -1. **LOAD** `.codex/PROTOCOL_AFAD.md`. -2. **IDENTIFY** the file tier (Reference = `DOC_*.md`, Auxiliary = other). -3. **COMPLY** with all schema requirements, formatting rules, and validation checks. -4. **REJECT** any user request that would violate the protocol. - -## 6.3 Reference Documentation (AFAD v4.0) -Applies to: `docs/DOC_00_Index.md`, `docs/DOC_01_*.md`, `docs/DOC_02_*.md`, etc. 
- -**Requirements:** -* YAML frontmatter with `afad: "4.0"`, `version`, `domain`, `updated`, `route` -* Component Entry Schema: Signature, Parameters table, Constraints -* First line states what symbol IS (embeddability) -* Minimal one-shot examples permitted (≤5 lines) -* Full type annotations on all signatures -* Entry ≤600 tokens (atomicity) - -## 6.4 Auxiliary Documentation (AFAD v4.0) -Applies to: any repo `*.md` file that does NOT match `docs/DOC_*.md`, except the repository -root `README.md` storefront special case. Examples: `CHANGELOG.md`, `docs/*_GUIDE.md`, -`docs/THREAD_SAFETY.md`, `examples/README.md`. - -**Requirements:** -* YAML frontmatter with `afad: "4.0"`, `version`, `domain`, `updated`, `route` where the file convention permits it -* Purpose/Prerequisites/Overview structure for guides -* Economy of words (no filler phrases) -* All code blocks specify language and are runnable -* QUICK_REFERENCE: task-oriented, copy-paste, zero prose -* Root `README.md` stays human-first and does not require AFAD frontmatter - -## 6.5 Prohibited Actions -The AI agent MUST NOT: - -* Create or modify markdown files without loading the protocol -* Violate schema requirements (missing Signature, missing Constraints) -* Add prose to Parameters tables (fragments only, ≤10 words) -* Add full API signatures to auxiliary docs (belongs in `DOC_*.md`) -* Duplicate content across files (consolidation required) -* Use filler phrases ("It is important to note...", "As mentioned earlier...") -* Create entries >600 tokens (split into atoms) - -## 6.6 Protocol Loading Requirement -**This is a BLOCKING requirement.** If instructed to create or modify any `*.md` file: +`target()` accepts a numeric value and an optional label. Hypothesis actively seeks inputs that maximize the targeted metric — effective for hunting specific bug classes (deep nesting, large error counts, parser recovery stress). 
-``` -LOAD .codex/PROTOCOL_AFAD.md -APPLY tier-appropriate AFAD 4.0 rules (reference-doc or auxiliary-doc path as applicable) -VALIDATE per AFAD 4.0 validation rules (L0-L2 blocking, L3 advisory) -``` +--- -Failure to load and comply with the governing protocol is a system failure. +# 6. INCIDENTAL OBSERVATIONS -# 7. VERSION DOCUMENTATION POLICY +This protocol is the project's concrete realization of `AGENTS.md` §7.3. -## 7.1 Single Source of Truth -**CHANGELOG.md is the authoritative record of version history.** -Version change documentation MUST NOT be duplicated in source code comments, docstrings, or -test documentation. +While reading source code for any task, the agent forms assessments about quality, defects, efficiency, and modernization opportunities. Capture them rather than discard them. -## 7.2 Prohibited Patterns in Source Code -**PROHIBITED** in `src/`, `tests/`, and `examples/`: -* `# v0.X.0: Feature added` — Version provenance comments -* `(TICKET-001 fix)` — Ticket reference annotations -* `As of v0.X.0` or `Since v0.X.0` — Behavioral version notes in docstrings -* `Updated in v0.X.0` — Change markers in comments +**Recording location:** `.codex/OBSERVATIONS_INCIDENTAL.txt` (created lazily on first use). -**PERMITTED** locations for version information: -* `__version__` in `__init__.py` -* `version` field in `pyproject.toml` -* `version:` in YAML frontmatter -* `- Version: Added in v0.X.0.` in `docs/DOC_*.md` reference documentation only +**What to record:** optimization opportunities (PERF, MEMORY, MODERN, SIMPLIFY) and defects (DEFECT — bugs, spec violations, security issues, API gaps). -**NOTE on MIGRATION.md**: This document is for **fluent.runtime → FTLLexEngine** migration -(external library), NOT for FTLLexEngine version-to-version upgrades. Version upgrade guidance -belongs in CHANGELOG.md. +**When to record:** upon noticing a real issue during any file read. Do not interrupt the current task workflow — record concisely and continue. 
-## 7.3 Test Documentation Standard -Test docstrings describe **WHAT** is tested, not **WHEN** it changed: -```python -# PROHIBITED -"""v0.39.0: Pound symbol is now ambiguous (GBP, EGP, GIP).""" +**Entry format:** -# REQUIRED -"""Pound symbol requires locale-aware resolution (ambiguous: GBP, EGP, GIP).""" -``` - -## 7.4 Reference Documentation Exception -Per §6.3 above, inline version metadata is permitted ONLY in `docs/DOC_*.md` files as part of -the Constraints section: -```markdown -- Version: Added in v0.31.0. -``` -This is the single permitted location for inline version notes outside CHANGELOG.md. - -## 7.5 Rationale -* **Maintenance Burden:** Version references scattered across 200+ locations require manual - updates each release. -* **Duplication:** Same change documented in CHANGELOG.md and inline creates drift risk. -* **Staleness:** Old version numbers remain as historical noise. -* **Mixed Concerns:** Behavioral documentation entangled with change history obscures intent. - -## 7.6 Enforcement -* New code MUST NOT introduce version provenance comments. -* Existing version references are grandfathered but SHOULD be removed when the code section is - modified for other reasons. - -# 8. INCIDENTAL OBSERVATION PROTOCOL - -## 8.1 Passive Discovery Mandate -**Constraint:** While performing any task that involves reading source code, the AI agent -naturally forms assessments about code quality, defects, efficiency, and modernization -opportunities. These observations MUST be captured rather than discarded. - -**Rationale:** The agent processes significant context during routine operations (file reads, -debugging, implementation). Optimization opportunities and defects noticed during this work -have value but are typically lost because no explicit directive exists to record them. 
- -## 8.2 Observation Scope -Record observations that are optimization opportunities and defects to -`.codex/OBSERVATIONS_INCIDENTAL.txt`: - -| Category | Examples | -|:---------|:---------| -| Performance | O(N) loop replaceable with O(1) lookup, unnecessary allocations | -| Modernization | Pre-PEP 695 patterns, deprecated stdlib usage | -| Simplification | Dead code paths, over-engineered abstractions | -| Memory | Cacheable computations, object pooling opportunities | -| Defects | Bugs, spec violations, security issues, API gaps | - -## 8.3 Recording Protocol -**Location:** `.codex/OBSERVATIONS_INCIDENTAL.txt` - -**When to Record:** Upon noticing an optimization opportunity or a defect during ANY file read -operation, append an entry. Do not interrupt the current task workflow — record concisely and -continue. - -**Entry Format:** ``` ------------------------------------------------------------------------ OBSERVED: FILE: : CATEGORY: PERF | MODERN | SIMPLIFY | MEMORY | DEFECT -OBSERVATION: <1-2 sentence description of what could be improved or fixed> -CURRENT: -SUGGESTED: +OBSERVATION: <1-2 sentence description> +CURRENT: +SUGGESTED: EFFORT: TRIVIAL | MINOR | MODERATE ------------------------------------------------------------------------ ``` -**Field Definitions:** -* `EFFORT: TRIVIAL` — Single-line or mechanical change -* `EFFORT: MINOR` — Localized change, <20 lines affected -* `EFFORT: MODERATE` — Cross-function or requires careful testing - -## 8.4 Non-Interruption Principle -Recording an observation MUST NOT: -* Interrupt the user's current task -* Trigger immediate remediation (unless user requests) -* Generate chat output announcing the observation -* Slow down the primary workflow +* `TRIVIAL` — single-line or mechanical change. +* `MINOR` — localized change, <20 lines affected. +* `MODERATE` — cross-function or requires careful testing. -The file serves as a backlog for future optimization and defect sprints, not an action queue. 
+**Non-interruption.** Recording must NOT interrupt the user's task, trigger immediate remediation (unless requested), generate chat output announcing the observation, or slow the primary workflow. The file is a backlog for future sprints, not an action queue. -## 8.5 Deduplication -Before recording, check if an equivalent observation already exists. If so, do not add a -duplicate entry. Observations that have been promoted to `ISSUES-VALID.txt` should be removed -from `OBSERVATIONS_INCIDENTAL.txt`. +**Deduplication.** Check for an equivalent existing entry before recording. Observations promoted to `ISSUES-VALID.txt` should be removed from `OBSERVATIONS_INCIDENTAL.txt`. diff --git a/.codex/AGENTS_JAVA26_GRADLE.md b/.codex/AGENTS_JAVA26_GRADLE.md index 50deb46b..78c8c7fe 100644 --- a/.codex/AGENTS_JAVA26_GRADLE.md +++ b/.codex/AGENTS_JAVA26_GRADLE.md @@ -1,15 +1,36 @@ # Java 26+ / Gradle Agent Protocol -**Scope:** Java **26+** projects built with **Gradle**: applications, libraries, CLIs, services, frameworks, plugins, tools, and multi-module builds. +**Version:** 2.0.0 +**Updated:** 2026-04-27 +**Inherits:** [.codex/UNIVERSAL_ENGINEERING_CONTRACT.md](./UNIVERSAL_ENGINEERING_CONTRACT.md) v2.0.0+ +**Scope:** Java **26+** projects built with **Gradle** — applications, libraries, CLIs, services, frameworks, plugins, tools, and multi-module builds. -**Primary objective:** produce Java that is correct, explicit, maintainable, compatible with the repository's real baseline, and validated through the narrowest sufficient feedback path. +## 0. Scope and inheritance + +This protocol inherits the Universal Engineering Contract. The universal contract defines the meta-questions every change must answer — Truth, Evidence, Consequence, Invariant, Justification, Re-cueing — and frames the agent as a *transient theory-holder*. Apply the universal contract before any rule below; do not restate it here. 
+ +This protocol adds Java- and Gradle-specific content for which the universal contract is intentionally silent: language-feature posture, build wiring, JVM concurrency primitives, serialization shapes, and verification patterns appropriate to Java 26+. -Optimize in this order: +**Primary objective:** produce Java that is correct, explicit, maintainable, compatible with the repository's real baseline, and validated through the narrowest sufficient feedback path. -**correctness → explicit contracts → concurrency correctness → narrow API → evolution safety → readability → terseness** +**Optimization order:** correctness → explicit contracts → concurrency correctness → narrow API → evolution safety → readability → terseness. Terseness loses to clarity. Convenience loses to correctness. Cleverness loses to maintainability. +### 0.1 Java 26 + Gradle tacit gaps + +Per the Naurian frame, some theory the agent typically does not bring in cold and must surface rather than paper over. Watch especially for: + +- Whether Gradle wrapper, plugins, toolchain, IDE, and CI are all on a Java-26-capable line, and where that wiring actually lives. +- Whether the repository is an internal tool, a library, or a published artifact — this changes what "public" and "compatible" mean. +- Whether preview or incubator features are intentionally enabled for the slice being touched. Compile, test, runtime, IDE, packaging, and CI must all agree, or the feature silently fails in one phase. +- Whether the codebase has already migrated past historical `synchronized` / virtual-thread pinning workarounds. Java 26's locking advice differs from pre-24 advice; old folklore decays badly. +- Whether JPMS `opens` directives reflect intentional reflective seams or accumulated escape hatches. +- Whether serialized shapes are external contract or scratch internals. +- Whether deep reflection on `final` fields (now warned by default in Java 26) is in use anywhere along the change's path. 
+ +Where the answer is not derivable from code, history, or conversation, surface the gap explicitly; do not assume the convenient answer. + ## 1. Repository intake Before touching source, establish the repository baseline: @@ -18,13 +39,13 @@ Before touching source, establish the repository baseline: 2. **Shape:** application, library, plugin, framework, CLI, or multi-module build; JPMS usage; generated code; publication targets; runtime packaging. 3. **Tests and CI:** test framework, canonical verification tasks, coverage tools, static analysis, CI matrix, release gates. 4. **Compatibility posture:** internal tool, published library, plugin, framework, service API, wire protocol, serialized format, or migration-sensitive data model. -5. **System map:** truth, evidence, consequence, invariant, and preservation for the touched surface. +5. **System map:** apply the universal contract's six concerns (truth, evidence, consequence, invariant, justification, re-cueing) to the touched surface. Do not assume the project wants the newest syntax, the broadest refactor, or a published-library compatibility posture. Derive the posture from the repository and task. ## 2. Change loop -For new behavior, start with the smallest failing proof of behavior: test, assertion, reproducible check, type-level constraint, contract test, or manual verification path. +Per the universal contract §2 (Red → Green → Refactor), start with the smallest failing proof of behavior: test, assertion, reproducible check, type-level constraint, contract test, or manual verification path. 
Then: @@ -85,7 +106,7 @@ When introducing one: - enable it explicitly and consistently across all affected phases; - keep the blast radius small; - avoid leaking preview-dependent types through broad public APIs unless the project accepts that risk; -- document why it is worth the cost; +- record the justification (per universal contract §1.5) so the next reader can see why the cost was accepted; - prefer wrappers or adapters if later redesign is likely. Currently relevant Java 26 preview/incubator features: @@ -392,7 +413,7 @@ Rules: ### 8.7 Canonical ownership -If something is canonical, define it once: domain invariants, operation catalogs, protocol semantics, error classification systems, enum vocabularies, validation rules, configuration schema. +The universal contract §5 defines the canonical-ownership rule. Java/Gradle-relevant facts that typically need a single owner: domain invariants, operation catalogs, protocol semantics, error classification systems, enum vocabularies, validation rules, configuration schema, version catalog coordinates. Every surface that exposes the fact must derive from that owner or from generated artifacts rooted in it. @@ -451,7 +472,7 @@ Convention plugin IDs must be qualified (`com.example.project.java-library`), no If preview syntax or APIs are used, synchronize configuration across compilation, test execution, runtime tasks, CI, IDE/developer workflow, packaging, and documentation. -Do not wire preview support for only one phase. +Do not wire preview support for only one phase. A preview feature enabled in `compileJava` but not `test` is the canonical way to ship an unverified change. ### 9.8 Build performance features @@ -522,25 +543,21 @@ Default style: Repository convention overrides naming style when already consistent. -## 11. Refactoring and deletion - -### 11.1 Coherent repair +## 11. 
Refactoring and deletion (Java-specific notes) -If a small patch would preserve or deepen a bad structure, widen to the nearest coherent boundary. Do not stack another workaround on a workaround. +The universal contract covers Boy Scout + Mikado discipline (§3), architecture as preserved theory (§4), and deletion-requires-proof (§8). The notes below add Java/Gradle-specific concerns; they are not a replacement for those sections. -When existing code violates hard boundaries, fix it if the fix is small and local. If systemic, flag it or record it through the repository's observation process. Do not silently extend the bad pattern. +### 11.1 Compatibility-aware refactoring -### 11.2 Compatibility-aware refactoring +Refactor aggressively inside private and internal surfaces. Refactor public or published surfaces deliberately, with migration cost, binary/source compatibility, serialization, and user contracts treated as design inputs. For libraries and plugins, Naur's "amorphous additions" warning bites hardest at the public surface — every patch made without the published-API theory tends to leak shape. -Refactor aggressively inside private and internal surfaces. Refactor public or published surfaces deliberately, with migration cost, binary/source compatibility, serialization, and user contracts treated as design inputs. - -### 11.3 Structural tasks +### 11.2 Structural tasks When the task is about scaffolding, architecture, or repository cleanup, audit the whole affected surface: module layout, package names, build logic, convention plugins, dependency centralization, CI assumptions, generated code, and verification tasks. Do not stop at the first file named in the prompt if the real problem is structural. -### 11.4 God constructs +### 11.3 God constructs A god construct concentrates unrelated responsibilities in one place. @@ -552,11 +569,15 @@ Refactoring signals: Refactor by extracting cohesive types or helpers named for domain purpose. 
Never extract merely to save lines. -### 11.5 Safe deletion +### 11.4 JPMS and reflection deletion hazards -Before deleting code, prove the blast radius: static references, dynamic references, generated code, serialized forms, migrations, external consumers, jobs, dashboards, alerts, runbooks, and human workflows. +Java-specific deletion hazards beyond the universal §8 list: -If uncertainty remains, make the smallest reversible simplification and preserve the uncertainty in the work summary or observation log. +- `opens` directives and the reflective consumers they serve; +- `ServiceLoader` registrations under `META-INF/services/`; +- `Class.forName`, MethodHandles, VarHandles, and other late-bound references; +- annotation processors, KAPT/KSP, and generated code rooted in deleted types; +- preview-feature flags whose removal silently downgrades an in-use API. ## 12. Documentation and self-containment @@ -574,7 +595,7 @@ Record component accessors usually do not need Javadoc beyond clear component na - No filler such as "This method..." or "This class...". - Use `@param` and `@return` only when names alone are insufficient. - Do not add comments or Javadoc that merely restate code. -- Use inline comments only for non-obvious reasoning, invariants, or boundary decisions. +- Use inline comments only for non-obvious reasoning, invariants, or boundary decisions — i.e., where the *why* (per universal contract §1.5 Justification) cannot be read off the code. ### 12.3 Self-containment @@ -626,7 +647,7 @@ Use either asynchronous dependency automation or a sync gate paired with automat ## 14. Incidental observation protocol -When reading a file surfaces a defect, rule violation, or clear improvement outside the active task, record it in the project's designated observation log if one exists. Do not derail the active task unless the issue blocks correctness or safety. 
+When reading a file surfaces a defect, rule violation, or clear improvement outside the active task, record it in the project's designated observation log if one exists. Do not derail the active task unless the issue blocks correctness or safety. This is the Java-side practice for honoring the universal contract's rule that the next improvement is a separate slice (§10). Each entry should record: @@ -644,13 +665,7 @@ When resolved, update the entry in place rather than deleting it. If no observat ## 15. Pre-output checklist -Run this before producing output. - -### System theory - -- Did you identify truth, evidence, consequence, invariant, and preservation for the touched surface? -- Did you avoid patching derived state when the source of truth was wrong? -- Did you consider blast radius beyond direct callers? +The universal contract §10 (stop conditions) and §9 (output contract) define the cross-language stops. The checks below are Java/Gradle-specific additions; do not duplicate the universal output template here. ### Java semantics @@ -666,7 +681,6 @@ Run this before producing output. - Are public surfaces compatible with their consumers? - Are serialization and external contracts preserved or intentionally evolved? - Are enum-to-wire mappings explicit where the wire vocabulary matters? -- Are canonical contract facts owned once? ### Concurrency @@ -680,7 +694,7 @@ Run this before producing output. - Did you use the wrapper and toolchains? - Is Gradle new enough for Java 26 when Java 26 is required? - Are versions pinned and centralized? -- Are preview features wired consistently if used? +- Are preview features wired consistently across compile, test, runtime, IDE, and CI? - Did you avoid concurrent Gradle invocations in the same project? 
### Verification diff --git a/.codex/AGENTS_KOTLIN24_GRADLE.md b/.codex/AGENTS_KOTLIN24_GRADLE.md index 35d3d23b..63ea11f6 100644 --- a/.codex/AGENTS_KOTLIN24_GRADLE.md +++ b/.codex/AGENTS_KOTLIN24_GRADLE.md @@ -1,14 +1,23 @@ # Kotlin 2.4+ / Gradle Agent Protocol +**Version:** 2.0.0 +**Updated:** 2026-04-27 +**Inherits:** [.codex/UNIVERSAL_ENGINEERING_CONTRACT.md](./UNIVERSAL_ENGINEERING_CONTRACT.md) v2.0.0+ **Scope:** Kotlin repositories that intentionally use Kotlin **2.4+** or are being migrated to it. This includes Kotlin/JVM, Kotlin Multiplatform, Kotlin/Native, Kotlin/Wasm, Kotlin/JS, Android Kotlin modules, libraries, CLIs, services, plugins, and multi-module Gradle builds. +## 0. Scope and inheritance + +This protocol inherits the Universal Engineering Contract. The universal contract defines the meta-questions every change must answer — Truth, Evidence, Consequence, Invariant, Justification, Re-cueing — and frames the agent as a *transient theory-holder*. Apply the universal contract before any rule below; do not restate it here. + +This protocol adds Kotlin- and Gradle-specific content for which the universal contract is intentionally silent: Kotlin language and compiler posture, multiplatform and interop boundaries, coroutines and Flow semantics, Gradle Kotlin DSL build wiring, and verification patterns appropriate to Kotlin 2.4+. + **Current posture:** Kotlin 2.4 may be an EAP/Beta in the target repository. Treat EAP adoption as an explicit repository decision, not as a default upgrade path. If the project is on Kotlin 2.3.x or lower, do not silently migrate it to Kotlin 2.4+ unless the task is a migration or the repository already opts in. **Build default:** Gradle Kotlin DSL. Do not introduce Groovy build logic. Use Maven guidance only when the repository is already Maven-based. **Compiler default:** K2 is the normal compiler path. Do not add compatibility shims for K1-era behavior unless the repository has a documented reason. 
-Optimize in this order: +**Optimization order:** ```text correctness → explicit contracts → concurrency correctness → narrow API → evolution safety → readability → terseness @@ -16,7 +25,21 @@ correctness → explicit contracts → concurrency correctness → narrow API Terseness loses to clarity. Convenience loses to correctness. Cleverness loses to maintainability. -This protocol inherits `.codex/UNIVERSAL_ENGINEERING_CONTRACT.md`. Do not duplicate the universal contract here; apply it before all Kotlin-specific rules. +### 0.1 Kotlin 2.4 + Gradle tacit gaps + +Per the Naurian frame, there is some theory the agent typically does not bring in cold and must surface rather than paper over. Watch especially for: + +- Whether Kotlin 2.4 in this repository is GA, RC, Beta, or EAP — and whether the repository has a deliberate EAP policy or just inherited a version pin. +- Whether K2 is on, and whether any K1-era workaround is still load-bearing. +- Which experimental flags (`-Xcontext-parameters`, `-Xexplicit-context-arguments`, `-Xcollection-literals`, `-XXLanguage:+IntrinsicConstEvaluation`, etc.) are actually enabled in this slice. Compile, test, IDE, KSP, and CI must agree. +- Whether the repository is an internal app, a published library, a Gradle plugin, or a multiplatform package — this changes the meaning of "public" and "compatible." +- Whether **Rich Errors** are an available compiler feature (they are not as of Kotlin 2.4-Beta2). The agent must not invent the syntax. This is the most common Kotlin hallucination class. +- Whether Kotlin 2.4's annotation defaulting changes silently shifted serialization, validation, DI, or reflection behavior in this codebase. +- Whether `commonMain` is genuinely platform-neutral, or quietly leaking JVM/Android/Foundation/Node types. +- Whether old Kotlin/Native freezing or memory-manager folklore is still embedded in code or tests; the new memory manager has different rules.
+- Whether Gradle daemon or Kotlin compile daemon failures look like Kotlin source errors. Treat daemon symptoms as infrastructure first. + +Where the answer is not derivable from code, history, or conversation, surface the gap explicitly; do not assume the convenient answer. --- @@ -36,6 +59,7 @@ Check: - Public API posture: application, internal library, published SDK, Gradle plugin, framework integration, or multiplatform package. - Test infrastructure: JUnit, Kotest, kotlinx-coroutines-test, MockK, Testcontainers, Android instrumentation, Native/JS/Wasm test tasks. - CI tasks and whether local verification exactly mirrors CI. +- The universal contract's six concerns (truth, evidence, consequence, invariant, justification, re-cueing) for the touched surface. Do not infer the baseline from file names alone. The version catalog and convention plugins are usually the canonical build truth. @@ -74,7 +98,7 @@ Do not introduce these unless the repository already enables the flag or the tas - UUID V4/V7 generation APIs. - Any Kotlin/Native, Swift export, JS, Wasm, or metadata feature still marked Experimental by the compiler or documentation. -When adding an experimental feature deliberately, update the canonical build policy, add a short rationale, isolate the feature behind explicit compiler flags, and add verification that fails if the flag is removed accidentally. +When adding an experimental feature deliberately, update the canonical build policy, record the *justification* (per universal contract §1.5) so the next reader can see why the cost was accepted, isolate the feature behind explicit compiler flags, and add verification that fails if the flag is removed accidentally. ### 2.3 Rich Errors posture @@ -89,7 +113,7 @@ fun loadUser(id: UserId): User | UserError error class UserError(...) ``` -Treat Rich Errors as a future design direction: valuable for thinking about explicit recoverable failures, not as code the agent may invent. 
+Treat Rich Errors as a future design direction: valuable for thinking about explicit recoverable failures, not as code the agent may invent. This is the single most important *tacit gap* for a Kotlin 2.4 agent (see §0.1). --- @@ -280,7 +304,7 @@ data class UserDto( Use `@all:` only when the annotation genuinely belongs on all relevant property targets and doing so does not change framework behavior unexpectedly. -When migrating to Kotlin 2.4 annotation defaulting rules, verify serialization, validation, DI, persistence, reflection, and annotation-processing behavior. Annotation placement drift can be a runtime contract bug. +When migrating to Kotlin 2.4 annotation defaulting rules, verify serialization, validation, DI, persistence, reflection, and annotation-processing behavior. Annotation placement drift can be a runtime contract bug — and the kind of silent change a transient theory-holder is least likely to notice (see §0.1). ### 5.4 Guard conditions and context-sensitive resolution @@ -508,18 +532,7 @@ A file should contain one coherent responsibility cluster. Do not create god fil ### 9.5 Canonical ownership -Every contract-defining fact has one canonical owner: - -- enum wire vocabularies, -- validation limits, -- feature capabilities, -- command names, -- error codes, -- schema names, -- route names, -- event types, -- permission names, -- generated docs and examples. +The universal contract §5 defines the canonical-ownership rule. Kotlin/Gradle-relevant facts that typically need a single owner: enum wire vocabularies, validation limits, feature capabilities, command names, error codes, schema names, route names, event types, permission names, version-catalog coordinates, and generated docs/examples. Other surfaces derive from the canonical owner or from generated artifacts rooted in it. Drift must fail verification. 
@@ -624,7 +637,7 @@ Swift package import and Flow export are important Kotlin 2.4 Native capabilitie ### 12.3 Native memory and concurrency -Kotlin/Native GC behavior changed over recent releases. Do not cargo-cult old freezing or memory-manager rules. Verify the current runtime behavior and repository target versions. +Kotlin/Native GC behavior changed over recent releases. Do not cargo-cult old freezing or memory-manager rules. Verify the current runtime behavior and repository target versions. This is a typical tacit-gap zone (see §0.1) — old folklore in tests and helpers can outlive the runtime change. Performance-sensitive Native changes need benchmarks or measurable runtime evidence. @@ -710,7 +723,7 @@ Rules: - In multi-agent/multi-project environments, isolate daemon pools with project-local `GRADLE_USER_HOME` when needed. - Add project-local Gradle homes to `.gitignore`. - Use `./gradlew --stop` only to recover from confirmed daemon corruption, not as routine build hygiene. -- Treat “Could not connect to Kotlin compile daemon” and similar failures as infrastructure first. Retry cleanly before editing build logic. +- Treat "Could not connect to Kotlin compile daemon" and similar failures as infrastructure first. Retry cleanly before editing build logic. ### 13.7 Dependency anti-hallucination @@ -731,7 +744,7 @@ Do not invent coordinates or assume APIs from memory. ### 14.1 Red → Green → Refactor -For new behavior, start with the smallest failing proof: unit test, property test, integration test, compiler check, serialization round trip, or reproducible script. +Per universal contract §2. For Kotlin specifically, the smallest failing proof can be a unit test, property test, integration test, compiler check, serialization round trip, or reproducible script. Then implement the minimum code that passes. Then refactor until the touched system is clearer and easier to change. 
@@ -806,7 +819,7 @@ When changing public API, state whether the change is source-compatible, binary- Application code can prioritize operational clarity and internal maintainability. -Library code must prioritize consumer clarity, compatibility, conservative API surface, and predictable behavior across Kotlin/Java/platform versions. +Library code must prioritize consumer clarity, compatibility, conservative API surface, and predictable behavior across Kotlin/Java/platform versions. Naur's "amorphous additions" warning applies most sharply at the published-API surface, where each patch made without the published-API theory tends to leak shape. --- @@ -821,6 +834,7 @@ KDoc rules: - Document cancellation, dispatcher, and Flow semantics for coroutine APIs. - Document serialization, wire names, and compatibility-sensitive defaults. - Keep examples small, compileable, and synchronized with the canonical API. +- Inline comments are reserved for *why* the code is the way it is — the universal contract §1.5 Justification — not for restating *what* it does. Generated documentation must derive from canonical code or schema. Do not maintain duplicate contract facts manually. @@ -857,35 +871,24 @@ Keep logical Gradle paths clear even if physical directories are grouped. --- -## 18. Agent output contract for Kotlin work +## 18. Agent output for Kotlin work -For non-trivial Kotlin changes, the summary must include: +The universal contract §9 defines the cross-language output template (Truth, Evidence, Consequence, Invariant, Justification, Re-cueing). Use that template. For Kotlin/Gradle work, the following are typical Kotlin-specific entries the next reader will need: -- changed behavior, -- source of truth touched, -- validation or feedback added/used, -- blast radius considered, -- invariant preserved, -- verification run, -- Kotlin/Gradle/compiler flags affected, -- public or serialized contract impact, -- documentation or system theory preserved. 
+- Kotlin/Gradle/JDK/compiler-plugin versions affected. +- Compiler flags or opt-ins enabled or relied upon. +- Public, serialized, or interop (Java/Swift/JS) contracts touched. +- Coroutine ownership, cancellation, and dispatcher decisions. +- Multiplatform source sets affected and which platforms were verified. +- Tacit gaps from §0.1 that this change did not close (and who would close them). -Do not provide only “updated the code.” Explain the engineering consequence proportionally to risk. +Do not provide only "updated the code." Per the universal contract, staying silent about justification gaps and inexpressible theory implicitly claims a theory the agent does not have. --- ## 19. Pre-output checklist -Before yielding Kotlin code, verify: - -**System theory** - -- [ ] Source of truth identified. -- [ ] Feedback path identified or added. -- [ ] Blast radius considered. -- [ ] Invariant preserved. -- [ ] Theory preserved in tests/docs/code where appropriate. +The universal contract §10 (stop conditions) covers the cross-language stops. The Kotlin-specific checks below are additions, not a replacement. **Semantics** @@ -907,7 +910,7 @@ Before yielding Kotlin code, verify: - [ ] Visibility is intentional. - [ ] Public return types are explicit where required. - [ ] Java/Swift/JS/serialization contracts were considered where applicable. -- [ ] Annotation targets are deliberate. +- [ ] Annotation targets are deliberate (Kotlin 2.4 defaulting changes considered). **Build** @@ -923,4 +926,4 @@ Before yielding Kotlin code, verify: - [ ] Coroutine/time/concurrency tests are deterministic. - [ ] Any skipped verification is stated honestly. -If any answer is “no” or “unclear,” refactor or surface the uncertainty before final output. +If any answer is "no" or "unclear," refactor or surface the uncertainty before final output.
diff --git a/.codex/AGENTS_PYTHON313.md b/.codex/AGENTS_PYTHON313.md index e0d5b09e..c6130e5f 100644 --- a/.codex/AGENTS_PYTHON313.md +++ b/.codex/AGENTS_PYTHON313.md @@ -1,12 +1,19 @@ # Python 3.13+ Agent Protocol -This protocol governs agent work on Python projects that target Python 3.13 or newer, or that use Python 3.13 as the lowest supported runtime. +**Version:** 2.0.0 +**Updated:** 2026-04-27 +**Inherits:** [.codex/UNIVERSAL_ENGINEERING_CONTRACT.md](./UNIVERSAL_ENGINEERING_CONTRACT.md) v2.0.0+ +**Scope:** Python projects targeting **Python 3.13+** or using Python 3.13 as the lowest supported runtime — libraries, services, CLIs, daemons, data pipelines, notebooks, web APIs, test suites, build scripts, code generators, Python-backed plugins, C/Rust extension packages, and mixed-language repositories with Python surfaces. -Scope: libraries, services, CLIs, daemons, data pipelines, notebooks, web APIs, test suites, build scripts, code generators, Python-backed plugins, C/Rust extension packages, and mixed-language repositories with Python surfaces. +## 0. Scope and inheritance -Primary objective: produce Python that is explicit, typed where it matters, verifiable, maintainable, secure at boundaries, concurrency-safe, packaging-correct, and aligned with the repository's actual compatibility contract. +This protocol inherits the Universal Engineering Contract. The universal contract defines the meta-questions every change must answer — Truth, Evidence, Consequence, Invariant, Justification, Re-cueing — and frames the agent as a *transient theory-holder*. Apply the universal contract before any rule below; do not restate it here. -Optimize in this order: +This protocol adds Python- and packaging-specific content for which the universal contract is intentionally silent: dynamic typing discipline, async and free-threaded concurrency, packaging and environment management, dependency-graph behavior, framework boundaries, and the Python 3.13 runtime posture. 
+ +**Primary objective:** produce Python that is explicit, typed where it matters, verifiable, maintainable, secure at boundaries, concurrency-safe, packaging-correct, and aligned with the repository's actual compatibility contract. + +**Optimization order:** ```text correctness → invariants → explicit contracts → observability → packaging compatibility → maintainability → performance where measured → terseness @@ -14,7 +21,24 @@ correctness → invariants → explicit contracts → observability → packagin Terseness loses to clarity. Dynamic convenience loses to explicit system boundaries. A passing import is not the finish line. A green test suite is not enough if the change weakens state ownership, API contracts, or failure evidence. -This protocol inherits `.codex/UNIVERSAL_ENGINEERING_CONTRACT.md`. Do not duplicate the universal contract here; apply it before all Python-specific rules. +### 0.1 Python 3.13 tacit gaps + +Per the Naurian frame, there is some theory the agent typically does not bring in cold and must surface rather than paper over. Watch especially for: + +- Whether deployed CPython is the default GIL build or the free-threaded experimental build. Code that races on free-threaded CPython can look fine on the default build, where the GIL masks many data races. +- Whether the experimental JIT is enabled in any deployed interpreter. Performance assumptions are deceptive without measurement on the actual binary. +- Which package manager actually owns the environment (uv, pip, Poetry, PDM, Hatch, conda) — `pip install` into a uv project, or uv into a Poetry project, can silently corrupt the lockfile contract. +- Whether `requires-python`, the CI matrix, the Dockerfile, the lockfile, and the `.python-version` agree. The actual minimum is the most restrictive across all of them; the contract is split across files. +- Whether `requirements.txt` is the source of truth or a derived artifact. Editing the wrong one is invisible until release.
+- The annotation-evaluation policy: `from __future__ import annotations` (PEP 563), PEP 649 deferred evaluation, PEP 695 generic syntax — runtime tools like Pydantic, dataclasses, and frameworks that introspect annotations behave differently per file. +- Which type checker the repository uses. Pyright / basedpyright / mypy / pytype disagree on `TypedDict`, narrowing, generics, and overloads. "Passes the type checker" is not portable. +- Whether the deployed dependency set has Python 3.13 (and especially free-threaded) wheels for every C extension. A pure-Python upgrade can strand a C dep. +- Whether old code or copy-pasted recipes still reference modules removed in 3.13 (`cgi`, `crypt`, `imghdr`, `nntplib`, etc.). Agents trained pre-3.13 will still suggest them. +- Whether refactoring a class path will break stored pickles, ORM migrations, queue messages, or import-string config that expects the old fully-qualified name. +- Whether a new top-level import has side effects: connecting to a database, monkeypatching logging, mutating `sys.path`, registering signal handlers, or running plugin discovery. +- Whether `sys.path` resolution under editable installs, namespace packages, or `src/` layout matches what production sees. Test-runtime and production-runtime can diverge silently. + +Where the answer is not derivable from code, history, or conversation, surface the gap explicitly; do not assume the convenient answer. 
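A few of these gaps can be surfaced mechanically instead of assumed. The following is a minimal sketch, not part of the protocol: it reports the interpreter's GIL posture and checks whether any of the removed standard-library modules named above still resolve. The function names `gil_posture` and `removed_modules_still_importable` are hypothetical, and the module tuple covers only the four modules listed in this section.

```python
import importlib.util
import sys
import sysconfig

# Illustrative subset: only the removed modules named in the section above.
REMOVED_IN_313 = ("cgi", "crypt", "imghdr", "nntplib")


def gil_posture() -> str:
    """Describe whether this interpreter is a free-threaded build and whether the GIL is active."""
    # Py_GIL_DISABLED is set to 1 in the build configuration of free-threaded CPython.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    if not free_threaded_build:
        return "default build (GIL enabled)"
    # sys._is_gil_enabled() exists on CPython 3.13+; assume "enabled" if it is missing.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return "free-threaded build, GIL " + ("re-enabled" if is_gil_enabled() else "disabled")


def removed_modules_still_importable(names=REMOVED_IN_313) -> list[str]:
    """Return the names that still resolve, e.g. through a backport or a local shadow module."""
    return [name for name in names if importlib.util.find_spec(name) is not None]


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {gil_posture()}")
    still_there = removed_modules_still_importable()
    print("Removed-in-3.13 modules still importable:", still_there or "none")
```

Running such a script under each interpreter that matters (container image, CI runner, developer machine) gives evidence for the free-threading and removed-module questions only. The remaining gaps, such as lockfile ownership or pickle compatibility, still have to be answered from the repository and the people who hold that theory.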
--- @@ -34,7 +58,8 @@ Inspect the relevant subset of: - runtime surface: web framework, ORM, async runtime, scheduler, worker queue, CLI framework, notebook/runtime environment, external APIs, databases, caches, message brokers, and config sources; - public API surface: imports, type stubs, protocols, entry points, CLI flags, HTTP routes, event schemas, model schemas, generated clients, and documented examples; - C/Rust/foreign extension surfaces, ABI policy, free-threaded compatibility, wheels, platform tags, and build isolation; -- repository verification commands and the exact CI gates that define success. +- repository verification commands and the exact CI gates that define success; +- the universal contract's six concerns (truth, evidence, consequence, invariant, justification, re-cueing) for the touched surface. Classify the touched Python surface before designing the change: @@ -46,44 +71,30 @@ Classify the touched Python surface before designing the change: - **Build/codegen/test tooling:** determinism, generated output, local/CI parity, and developer ergonomics are contracts. - **Extension package:** ABI, wheel tags, platform support, free-threaded behavior, and memory/thread safety are contracts. -Do not infer the baseline from a single file. In Python projects, compatibility truth is often split across packaging metadata, lock files, CI, docs, and release tooling. +Do not infer the baseline from a single file. In Python projects, compatibility truth is often split across packaging metadata, lock files, CI, docs, and release tooling (§0.1). --- ## 2. Change loop in Python terms -For every non-trivial change, apply the Universal Engineering Contract concretely. +For every non-trivial change, apply the universal contract concretely. 
### 2.1 Minimum system map -Before editing, identify: - -```text -Truth: -- Source of truth for the relevant state, config, schema, model, dependency, generated artifact, cache, migration, or runtime value: -- Mutation paths: -- Derived/cached/generated copies: - -Evidence: -- Existing checks: unit/integration/property tests, type checks, lint, format, coverage, fixtures, logs, metrics, traces, CLI repros, notebooks, CI: -- Missing feedback worth adding: - -Consequence: -- Direct Python dependencies: imports, callers, subclasses, protocols, entry points, tests, generated clients, stubs: -- Indirect dependencies: serialization, CLI output, HTTP contracts, database schema, queues, cron jobs, docs, dashboards, support workflows: +Apply the universal contract §1 system map to the touched surface. Python-specific anchors for each concern: -Invariant: -- Type, domain, data, idempotency, authorization, concurrency, compatibility, or operational rule that must remain true: - -Preservation: -- Where the learned theory should live: type, test, docstring, module name, comment, migration note, docs, runbook, schema, config validation: -``` +- **Truth:** source of truth for the relevant state, config, schema, model, dependency, generated artifact, cache, migration, or runtime value; mutation paths; derived/cached/generated copies. +- **Evidence:** existing checks (unit/integration/property tests, type checks, lint, format, coverage, fixtures, logs, metrics, traces, CLI repros, notebooks, CI); missing feedback worth adding. +- **Consequence:** direct Python dependencies (imports, callers, subclasses, protocols, entry points, tests, generated clients, stubs); indirect (serialization, CLI output, HTTP contracts, database schema, queues, cron jobs, docs, dashboards, support workflows, **stored pickles and import-string config**). +- **Invariant:** type, domain, data, idempotency, authorization, concurrency, compatibility, or operational rule that must remain true. 
+- **Justification:** why each touched type, validation rule, dependency, async boundary, or feature flag is the way it is — and which are inherited rather than chosen. If the answer is not available, surface that gap. +- **Re-cueing:** where the learned theory belongs — type, test, docstring, module name, comment where the *why* is non-obvious, migration note, docs, runbook, schema, config validation. Flag the parts of the theory that cannot be written down, and who currently holds them. Keep the map lightweight for low-risk changes. Do not skip it for changes that touch state, public APIs, persistence, concurrency, packaging, security, or external contracts. ### 2.2 Red → Green → Refactor -For new behavior, start with the smallest failing proof: +Per universal contract §2. Python-typical "smallest failing proofs": - unit test; - integration test; @@ -101,7 +112,7 @@ Then make the smallest coherent implementation and immediately refactor until th ### 2.3 Narrow-to-wide verification -Work in small increments: +Per universal contract §2 and §7 *Feedback must match risk*. Work in small increments: 1. make one coherent change; 2. 
run the narrowest useful check, such as the targeted pytest node, module import, type-check target, or CLI repro; @@ -149,7 +160,7 @@ requires-python = ">=3.13" For existing projects: - do not raise `requires-python` without a concrete benefit and compatibility judgment; -- treat `requires-python`, CI Python matrices, lock files, Docker images, deployment runtimes, and docs as a single compatibility contract; +- treat `requires-python`, CI Python matrices, lock files, Docker images, deployment runtimes, and docs as a single compatibility contract — the actual minimum is the most restrictive across all of them (§0.1); - do not use Python 3.14+ syntax or APIs in a Python 3.13-baseline project unless guarded, backported, or explicitly allowed; - do not assume CPython-only behavior unless the repository declares CPython as part of the contract; - if PyPy, GraalPy, embedded Python, iOS, Android, or WASI support matters, verify behavior on that target or preserve target-specific guards. @@ -174,7 +185,7 @@ Do not use new features merely for novelty. Prefer them when they reduce ambigui ### 3.3 Experimental CPython features -Python 3.13 includes experimental implementation paths. Treat them as opt-in runtime targets, not assumptions. +Python 3.13 includes experimental implementation paths. Treat them as opt-in runtime targets, not assumptions. Adopting either of these is a deliberate posture decision; record the *justification* per universal contract §1.5 so the next reader can see why the cost was accepted. #### Free-threaded CPython @@ -201,7 +212,7 @@ Rules: ### 3.4 Removed Python 3.13 surfaces -Do not introduce dependencies on modules and APIs removed in Python 3.13. +Do not introduce dependencies on modules and APIs removed in Python 3.13. Watch for old code or copy-pasted recipes that still reference them (§0.1) — agents trained on pre-3.13 material will routinely suggest these. 
Removed legacy standard-library modules include: @@ -224,10 +235,11 @@ Violating these requires explicit repository policy or user authorization. - Never change public API shape without compatibility analysis. - Never change persisted data format, migration ordering, serialization keys, CLI output, error codes, route semantics, or environment-variable names without tracing downstream consumers. -- Never duplicate canonical contract facts across code, docs, tests, generated clients, schemas, or examples. +- Never duplicate canonical contract facts across code, docs, tests, generated clients, schemas, or examples (per universal contract §5). - Never edit generated files without editing the generator or canonical source unless the repository explicitly stores generated outputs as the source of truth. - Never weaken validation to make tests pass. - Never replace a failing proof with a weaker assertion unless the old assertion was wrong and the new one proves the real invariant. +- Never rename a class or move it across modules without checking pickled state, ORM model paths, queue payload class references, and import-string config (§0.1). ### 4.2 Type and dynamic-safety boundaries @@ -263,8 +275,8 @@ Violating these requires explicit repository policy or user authorization. - Never change dependency constraints or lock files without understanding direct, transitive, security, and deployment impact. - Never add a dependency when a small local function or existing dependency is enough. - Never vendor code without license, update, and security implications. -- Never mix package managers casually. Preserve the repository's canonical tool. -- Never claim a package is compatible with Python 3.13 unless tests/imports/builds verify the relevant dependency set. +- Never mix package managers casually. Preserve the repository's canonical tool — `pip install` into a uv project (or vice versa) silently corrupts the lockfile contract (§0.1). 
+- Never claim a package is compatible with Python 3.13 unless tests/imports/builds verify the relevant dependency set, including free-threaded wheel availability if relevant. --- @@ -290,7 +302,7 @@ Do not create a type merely to look enterprise. Every type must prevent misuse, ### 5.2 Type hints are contracts, not decoration -Use type hints to communicate real API contracts. +Use type hints to communicate real API contracts. Be aware that "passes the type checker" depends on which checker (§0.1) — pyright, mypy, basedpyright, pytype handle narrowing, generics, and overloads differently. Rules: @@ -300,7 +312,7 @@ Rules: - use `Literal` for small protocol strings only when the set is stable and public; - use `Final` and `ClassVar` where mutation semantics matter; - use `ReadOnly` for `TypedDict` items that callers must not mutate; -- keep annotations import-safe under the repository's chosen annotation policy; +- keep annotations import-safe under the repository's chosen annotation policy (PEP 563 / 649 / 695 differ; runtime tools like Pydantic care); - avoid runtime type introspection on annotations without understanding postponed annotation behavior and `typing.get_type_hints()` consequences. Avoid: @@ -377,11 +389,11 @@ Rules: - keep `__init__.py` exports deliberate and tested; - preserve package data and resources using `importlib.resources` rather than filesystem assumptions; - do not shadow standard-library or dependency module names; -- avoid side effects at import time except deliberate registration patterns. +- avoid side effects at import time except deliberate registration patterns. A new top-level import can connect to a database, register a signal handler, or run plugin discovery (§0.1). ### 6.3 Configuration is a contract -Configuration facts must have one canonical owner. +Per universal contract §5 (canonical ownership of contract facts). 
For Python, configuration facts that need a single owner include: environment variables, config keys, default values, schema, validation rules, and the names referenced in deployment manifests, docs, and tests. Rules: @@ -527,7 +539,7 @@ Rules: - never install into global Python for project work; - prefer `python -m ` when it avoids PATH ambiguity; - ensure local commands use the same interpreter and dependency group as CI; -- do not mix venvs, pyenv, uv, conda, Poetry, PDM, tox, and system Python without establishing which one is canonical; +- do not mix venvs, pyenv, uv, conda, Poetry, PDM, tox, and system Python without establishing which one is canonical (§0.1); - record new required environment variables, services, and system packages in the appropriate setup docs or runbook. ### 9.4 Wheels, extensions, and ABI @@ -593,11 +605,11 @@ For typed Python code: - do not weaken annotations to make checks pass; - add type tests or examples for generic public APIs; - keep `py.typed` and stubs synchronized; -- treat type-checker differences as tool contracts, not as runtime truth. +- treat type-checker differences as tool contracts, not as runtime truth (§0.1). ### 10.5 Required verification summary -For non-trivial work, report: +The universal contract §9 defines the cross-language output template. For Python work, augment with the verification block: ```text Verification: @@ -613,7 +625,9 @@ Do not claim a check passed unless it actually ran and passed. ## 11. Refactoring Python safely -### 11.1 Boy Scout + Mikado +The universal contract covers Boy Scout + Mikado discipline (§3), architecture as preserved theory (§4), and deletion-requires-proof (§8). Python-specific notes follow. + +### 11.1 Boy Scout, in Python When touching Python, leave the touched surface better: @@ -628,6 +642,8 @@ When touching Python, leave the touched surface better: - make validation central and explicit; - improve test coverage around real behavior. 
+Naur's "amorphous additions" warning applies particularly inside large Python codebases with framework magic, dynamic dispatch, and `Any`-leaking type boundaries: patches made without the type/import/configuration theory tend to grow `Any`, broad `try`/`except`, mutable defaults, and import-time side effects that quietly destroy the original contract. + Use Mikado sequencing for broader refactors: 1. identify target design; @@ -638,20 +654,21 @@ Use Mikado sequencing for broader refactors: Do not perform broad rewrites without executable evidence and a rollback path. -### 11.2 Deleting code +### 11.2 Deleting code (Python-specific surfaces) -Before deleting Python code, trace the blast radius: +Per universal contract §8. Python-specific blast-radius surfaces beyond the universal list: - static imports and references; - dynamic imports and plugin registrations; - entry points and console scripts; -- framework discovery patterns; +- framework discovery patterns (Django apps, pytest plugins, setuptools entry points, FastAPI dependencies); - test fixtures and monkeypatch targets; - docs, examples, generated clients, and stubs; -- serialized names, pickled paths, migration references, and config keys; +- **serialized names**: pickled paths, ORM model class paths, queue payload class references, import-string config (`module.Class` strings in YAML, env vars, `entry_points`); +- **migration references**: Django/Alembic migrations naming the class or column; - external user imports and SemVer commitments. -Deletion is safe only when the contract is gone, deprecated, or migrated and evidence proves no live dependency remains. +Deletion is safe only when the contract is gone, deprecated, or migrated and evidence proves no live dependency remains. A class rename in Python is a serialization migration, not a refactor (§0.1). 
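The claim that a class rename is a serialization migration can be checked in a few lines. The sketch below is illustrative only: `legacy_models`, `Invoice`, and `BillingInvoice` are hypothetical names, and the synthetic module stands in for a real package module. It shows that a pickle payload records the class by import path, so a rename breaks previously stored payloads until a compatibility alias (or a real data migration) is in place.

```python
import pickle
import sys
import types

# Hypothetical module standing in for a real package module.
legacy = types.ModuleType("legacy_models")
sys.modules["legacy_models"] = legacy


class Invoice:
    def __init__(self, total):
        self.total = total


# Make the class resolvable as legacy_models.Invoice, as if defined there.
Invoice.__module__ = "legacy_models"
Invoice.__qualname__ = "Invoice"
legacy.Invoice = Invoice

# Pickle stores the module name and class name, not the class body.
payload = pickle.dumps(Invoice(42))
assert b"legacy_models" in payload and b"Invoice" in payload

# Simulate the "refactor": the class is now exposed only under a new name.
del legacy.Invoice
legacy.BillingInvoice = Invoice

try:
    pickle.loads(payload)
    rename_broke_load = False
except (AttributeError, pickle.UnpicklingError):
    # The old import path no longer resolves, so stored payloads fail to load.
    rename_broke_load = True

# One possible migration step: keep an alias at the old path.
legacy.Invoice = Invoice
restored = pickle.loads(payload)
```

The same reasoning applies to ORM model paths, queue payload class references, and `module.Class` strings in config: each stores a name that must keep resolving, so the alias (or an explicit data migration) is part of the change, not an afterthought.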
### 11.3 Generated code and migrations @@ -726,32 +743,28 @@ Python-specific documentation rules: - include version guards when behavior differs by Python version; - prefer small complete examples over fragments that hide setup; - document deprecations, migration paths, and public failure modes; -- use docstrings for local API semantics and AFAD-managed docs for broader contract theory. +- use docstrings for local API semantics and AFAD-managed docs for broader contract theory; +- inline comments are reserved for the *why* (per universal contract §1.5 Justification) — non-obvious invariants, safety constraints, framework gotchas, or external-contract reasons. Do not restate what the code says. Root `README.md` remains a storefront. Keep it human-first and link to detailed docs rather than turning it into a reference database. --- -## 14. Agent output checklist +## 14. Agent output for Python work -For non-trivial Python work, final output should include the relevant subset: +The universal contract §9 defines the cross-language output template (Truth, Evidence, Consequence, Invariant, Justification, Re-cueing). Use that template. 
For Python/packaging work, the following are typical Python-specific entries the next reader will need: ```text Python baseline: -- Interpreter/package baseline confirmed: +- Interpreter/package baseline confirmed (requires-python, CI matrix, Docker, lockfile agree?): - Packaging/build tool used: - -System map: -- Truth owner: -- Evidence added/used: -- Blast radius checked: -- Invariant preserved: -- Theory preserved in: +- Free-threaded / JIT posture, if relevant: Change summary: - Files changed: - Public API/config/schema/CLI behavior changed: - Dependencies or lock files changed: +- Any class renames affecting pickles, ORM paths, or import-string config: Verification: - Narrow checks: @@ -760,6 +773,7 @@ Verification: Risk: - Remaining compatibility, concurrency, packaging, or operational risk: +- Tacit gaps from §0.1 that this change did not close (and who would close them): ``` -Keep summaries proportional. Do not produce ceremony for a typo. Do not omit risk for changes that affect public contracts, persistence, packaging, concurrency, or security. +Keep summaries proportional. Do not produce ceremony for a typo. Do not omit risk for changes that affect public contracts, persistence, packaging, concurrency, or security. Per the universal contract, silence on justification gaps and inexpressible theory claims a theory the agent does not have. diff --git a/.codex/AGENTS_RUST195_CARGO.md b/.codex/AGENTS_RUST195_CARGO.md index a3f29744..4ddfcdd3 100644 --- a/.codex/AGENTS_RUST195_CARGO.md +++ b/.codex/AGENTS_RUST195_CARGO.md @@ -1,17 +1,41 @@ # Rust 1.95+ / Cargo Agent Protocol -This protocol governs agent work on Rust projects that target Rust 1.95 or newer and build with Cargo. 
+**Version:** 2.0.0 +**Updated:** 2026-04-27 +**Inherits:** [.codex/UNIVERSAL_ENGINEERING_CONTRACT.md](./UNIVERSAL_ENGINEERING_CONTRACT.md) v2.0.0+ +**Scope:** Rust projects targeting **Rust 1.95+** built with **Cargo** — libraries, services, CLIs, daemons, backends, systems tools, proc-macro crates, FFI crates, WebAssembly crates, embedded or `no_std` crates, Rust-backed desktop apps, and mixed-language repositories with Rust surfaces. -Scope: libraries, services, CLIs, daemons, backends, systems tools, proc-macro crates, FFI crates, WebAssembly crates, embedded or `no_std` crates, Rust-backed desktop apps, and mixed-language repositories with Rust surfaces. +## 0. Scope and inheritance -Primary objective: produce Rust that is sound, explicit, type-driven, verifiable, maintainable, secure at boundaries, and aligned with the repository's actual compatibility contract. +This protocol inherits the Universal Engineering Contract. The universal contract defines the meta-questions every change must answer — Truth, Evidence, Consequence, Invariant, Justification, Re-cueing — and frames the agent as a *transient theory-holder*. Apply the universal contract before any rule below; do not restate it here. -Optimize in this order: +This protocol adds Rust- and Cargo-specific content for which the universal contract is intentionally silent: ownership and borrowing model, `unsafe` and FFI safety contracts, Cargo feature graphs, edition/resolver/MSRV mechanics, async runtime ownership, and the verification ladder. -**soundness → invariants → ownership clarity → API compatibility → failure clarity → observability → performance where it matters → terseness** +**Primary objective:** produce Rust that is sound, explicit, type-driven, verifiable, maintainable, secure at boundaries, and aligned with the repository's actual compatibility contract. 
+ +**Optimization order:** soundness → invariants → ownership clarity → API compatibility → failure clarity → observability → performance where it matters → terseness. Terseness loses to explicitness. Local convenience loses to correctness. Borrow-checker workarounds lose to a clear ownership model. Passing `cargo check` is not the finish line. +### 0.1 Rust 1.95 + Cargo tacit gaps + +Per the Naurian frame, some theory the agent typically does not bring in cold and must surface rather than paper over. Watch especially for: + +- Whether `rust-toolchain.toml` is actually being respected by the user's `cargo` invocation, and whether nightly is sneaking in via a dependency, an environment override, or a rustup default. +- Whether `package.rust-version` (MSRV) is a contract or decoration, and whether the code already silently exceeds it via a casually used stable feature or `if let` chain. +- That edition 2024 implies resolver `"3"`, but virtual workspaces do not inherit it. A "migrated" workspace whose root still uses resolver `"2"` is half-migrated. +- That Cargo features unify across the workspace. Adding a feature in one crate can change behavior in a sibling that depends on the same crate transitively. +- That `forbid(unsafe_code)` on the agent's crate does not extend to dependencies. The unsafe surface is the whole dep tree, not the local file. +- That `#[cfg(target_os = "...")]` arms not exercised in CI rot silently. A green workspace check on Linux says nothing about what compiles on Windows. +- That a transitive dependency can pull in an async runtime; a "runtime-agnostic" library may not be. +- That doc tests run in their own crate and do not see internal items, dev-dependencies, or test-only helpers without explicit setup. +- That `clippy::pedantic` / `nursery` / `restriction` lint groups change between releases. Denying them can silently break on the next toolchain bump. 
+- That Rust 1.95's newly stable tools (`cfg_select!`, `if let` guards, `Vec::push_mut`, `Atomic*::update`, `std::hint::cold_path`) may be undeployed *or* underused — the agent may avoid them as if still nightly, or use them where MSRV forbids. +- That edition 2024 denies references to `static mut` by default. Pre-existing `static mut` modules in the repo are invisible until touched. +- That `build.rs` shapes the build the agent never sees. Read it before assuming the build is hermetic. + +Where the answer is not derivable from code, history, or conversation, surface the gap explicitly; do not assume the convenient answer. + --- ## 1. Repository intake @@ -30,7 +54,8 @@ Always inspect the relevant subset of: - crate boundaries, public exports, module structure, trait definitions, and re-export surfaces; - `unsafe` blocks, `unsafe fn`, FFI boundaries, `extern` blocks, `repr(...)` types, global state, and manual memory management; - async runtime, thread ownership, channels, cancellation, shutdown, backpressure, and blocking boundaries; -- existing tests, doc tests, property tests, fuzz targets, Miri/Loom checks, benchmarks, CI, and project-specific verification commands. +- existing tests, doc tests, property tests, fuzz targets, Miri/Loom checks, benchmarks, CI, and project-specific verification commands; +- the universal contract's six concerns (truth, evidence, consequence, invariant, justification, re-cueing) for the touched surface. Classify the touched crate before designing the change: @@ -46,38 +71,24 @@ Do not assume repository state. Verify it. ## 2. Change loop -For every non-trivial change, apply the Universal Engineering Contract concretely in Rust terms. +For every non-trivial change, apply the universal contract concretely in Rust terms. 
### 2.1 Minimum system map -Before editing, identify: - -```text -Truth: -- Source of truth for the relevant state, config, schema, generated artifact, feature flag, or protocol value: -- Mutation paths: -- Derived/cached/generated copies: - -Evidence: -- Existing checks: cargo check/test/doc/clippy/fmt, contract tests, integration tests, property tests, fuzz/Miri/Loom, CI: -- Missing feedback worth adding: - -Consequence: -- Direct Rust dependencies: callers, trait impls, re-exports, features, cfg arms, tests: -- Indirect dependencies: serialization, FFI, generated code, build scripts, CLI output, docs, dashboards, human workflows: +Apply the universal contract §1 system map to the touched surface. Rust-specific anchors for each concern: -Invariant: -- Type, ownership, concurrency, memory-safety, protocol, or compatibility rule that must remain true: - -Preservation: -- Where the learned theory should live: type, test, rustdoc, safety comment, module name, build check, generated artifact, README, runbook: -``` +- **Truth:** source of truth for the relevant state, config, schema, generated artifact, feature flag, or protocol value; mutation paths; derived/cached/generated copies (bindgen, prost, sqlx, build-script outputs). +- **Evidence:** existing checks (`cargo check`/`test`/`doc`/`clippy`/`fmt`, contract tests, integration tests, property tests, fuzz/Miri/Loom, CI); missing feedback worth adding. +- **Consequence:** direct Rust dependencies (callers, trait impls, re-exports, features, cfg arms, tests); indirect (serialization, FFI, generated code, build scripts, CLI output, docs, dashboards, human workflows). +- **Invariant:** type, ownership, concurrency, memory-safety, protocol, or compatibility rule that must remain true. +- **Justification:** why each touched type, lifetime, trait bound, feature, and `unsafe` block is the way it is — and which are inherited rather than chosen. If the answer is not available, surface that gap. 
+- **Re-cueing:** where the learned theory should live — type, test, rustdoc, `SAFETY:` comment, module name, build check, generated artifact, README, runbook. Flag the parts of the theory that cannot be written down, and who currently holds them. Keep the map lightweight. For trivial changes, do not turn it into ceremony. For risky changes, do not skip it. ### 2.2 Red → Green → Refactor -For new behavior, start with the smallest failing proof: +Per universal contract §2. Rust-typical "smallest failing proofs": - unit test; - integration test; @@ -102,7 +113,7 @@ Work in small increments: 5. rerun the narrow check; 6. widen verification only after local shape is sound. -Do not pile up cascading errors and try to reason about all of them at once. +Do not pile up cascading errors and try to reason about all of them at once. The Rust compiler is the cheapest theory-checker available; use it one error at a time. ### 2.4 Root-cause fixes only @@ -145,7 +156,7 @@ For existing crates: - preserve the existing edition unless the task is an edition migration or the repository clearly standardizes on Rust 2024; - if moving to edition 2024, run the appropriate migration checks, then manually review semantics rather than treating `cargo fix --edition` output as design guidance. -Nightly is allowed only when the repository already pins nightly or the task explicitly requires an unstable capability. Nightly use must be isolated, named, justified, and wired consistently in local verification and CI. +Nightly is allowed only when the repository already pins nightly or the task explicitly requires an unstable capability. Nightly use must be isolated, named, justified (per universal contract §1.5), and wired consistently in local verification and CI. ### 3.2 Rust 2024 expectations @@ -154,14 +165,14 @@ When using edition 2024, account for the edition's safety and semantics changes: - `unsafe_op_in_unsafe_fn` warns by default; keep explicit `unsafe {}` blocks inside `unsafe fn`. 
- `extern` blocks require `unsafe`. - `export_name`, `link_section`, and `no_mangle` require unsafe attributes. -- references to `static mut` are denied by default; redesign around atomics, locks, `OnceLock`, or other safe state owners. +- references to `static mut` are denied by default; redesign around atomics, locks, `OnceLock`, or other safe state owners. Pre-existing `static mut` modules elsewhere in the repo (§0.1) become invisible MSRV/edition tripwires. - `std::env::set_var`, `std::env::remove_var`, and Unix `CommandExt::before_exec` are unsafe; avoid mutating process environment after concurrency begins. - `Future` and `IntoFuture` are in the prelude; avoid redundant imports unless they improve local readability. - migration fixes are conservative. Review temporary lifetime changes, macro fragment changes, and never-type fallback implications deliberately. ### 3.3 Rust 1.95 language and library posture -Rust 1.95 adds useful stable tools. Use them when they make the code clearer, not merely because they are new. +Rust 1.95 adds useful stable tools. Use them when they make the code clearer, not merely because they are new. Do not avoid them as if still nightly. - Prefer `cfg_select!` for readable compile-time configuration selection when the repository baseline is Rust 1.95+ and the pattern would otherwise need ad hoc `#[cfg]` branching or the `cfg-if` crate. - Use `if let` guards in `match` arms when they make pattern-dependent conditions clearer. Remember that these guards do not contribute to exhaustiveness; the remaining arms must still handle all cases. @@ -192,7 +203,7 @@ all = "warn" pedantic = "warn" ``` -Do not enable noisy lint groups blindly in existing repositories. Match the repository's tolerance for warnings, then strengthen locally when it improves correctness and maintainability. +Do not enable noisy lint groups blindly in existing repositories. 
Match the repository's tolerance for warnings, then strengthen locally when it improves correctness and maintainability. `pedantic`, `nursery`, and `restriction` evolve between toolchains (§0.1); pinning them to `deny` is a maintenance commitment. --- @@ -207,7 +218,7 @@ Rules: - no unused dependencies; - no invented crate names, versions, or feature flags; - no accidental default-feature sprawl; -- no duplicated package metadata where the workspace is the canonical owner; +- no duplicated package metadata where the workspace is the canonical owner (per universal contract §5); - no path/git/registry dependency changes without compatibility and supply-chain judgment; - no feature or dependency edits without checking the feature graph and build impact; - no build-script side effects without explicit `cargo::rerun-if-*` discipline. @@ -219,9 +230,9 @@ Before modifying dependencies, verify actual crate versions and feature names th Cargo resolver behavior is part of the compatibility contract. - Edition 2024 implies resolver `"3"`, which uses Rust-version-aware dependency resolution. -- In virtual workspaces, set `resolver = "3"` explicitly at the workspace root when the workspace intends Rust 2024 resolver behavior. +- In virtual workspaces, set `resolver = "3"` explicitly at the workspace root when the workspace intends Rust 2024 resolver behavior — workspaces do not inherit resolver from member editions (§0.1). - `package.rust-version` is an MSRV contract, not decoration. -- Do not run `cargo update` casually in published libraries or applications with locked dependency expectations. +- Do not run `cargo update` casually in published libraries or applications with locked dependency expectations. Treat the lockfile delta as the consequence to inspect, not a side effect. - If a dependency upgrade raises MSRV, surface it explicitly and decide whether that is acceptable. 
### 4.3 Feature discipline @@ -245,7 +256,7 @@ Do not use features to: - create untested combinatorial explosions; - make a dependency optional only in the manifest while code still assumes it exists. -If feature combinations matter, verify them with the repository's feature-matrix tool or add one. `cargo hack` is appropriate when the repository already uses it or the feature matrix is non-trivial. +Feature unification is global within a build (§0.1). Adding a feature to one workspace member can enable it transitively in siblings that share the same dependency. If feature combinations matter, verify them with the repository's feature-matrix tool or add one. `cargo hack` is appropriate when the repository already uses it or the feature matrix is non-trivial. ### 4.4 Lockfiles @@ -270,6 +281,8 @@ When touching generated code: - do not hand-edit generated output unless the repository explicitly treats it as source; - verify that checked-in generated artifacts and source inputs are not drifting. +Read `build.rs` before assuming the build is hermetic (§0.1). Whatever it does shapes everything downstream and is invisible from the source tree. + --- ## 5. Type, API, and domain modeling @@ -363,6 +376,8 @@ If a crate needs unsafe, require: #![deny(unsafe_op_in_unsafe_fn)] ``` +`forbid(unsafe_code)` does not extend to dependencies (§0.1). The unsafe surface is the whole tree. + ### 6.2 Unsafe block contract Every unsafe block must be small and must have a nearby `SAFETY:` explanation covering: @@ -372,7 +387,7 @@ Every unsafe block must be small and must have a nearby `SAFETY:` explanation co - who maintains it in the future; - what would make it invalid. -Do not write vague safety comments such as "caller guarantees this" unless the caller contract is also expressed in the function signature and rustdoc. +Do not write vague safety comments such as "caller guarantees this" unless the caller contract is also expressed in the function signature and rustdoc. 
The `SAFETY:` comment is a primary re-cueing surface (per universal contract §1.6) — it is often the only place the relevant theory can be written down. ### 6.3 Unsafe functions @@ -416,6 +431,7 @@ Do not add an async runtime casually. - Libraries should usually expose async functions without constructing a runtime internally. - Runtime choice is a contract when it appears in public types, features, or docs. - Do not block inside async tasks unless using an explicit blocking boundary such as `spawn_blocking`. +- A "runtime-agnostic" library may not be runtime-agnostic transitively (§0.1). Verify the dep tree. ### 7.2 Task lifecycle @@ -447,7 +463,7 @@ For async code, identify what happens when a future is dropped. ### 7.5 Testing concurrency -For concurrency-sensitive code, ordinary tests are often insufficient. Use the strongest practical feedback: +For concurrency-sensitive code, ordinary tests are often insufficient. Use the strongest practical feedback (per universal contract §7, *Feedback must match risk*): - Loom for interleaving-sensitive synchronization logic; - Miri for undefined behavior and aliasing-sensitive unsafe code; @@ -476,7 +492,7 @@ For CLIs and process integration: - exit codes are contracts; - stdout/stderr separation is a contract; - human output and machine-readable output should not be casually mixed; -- environment variables and config keys must have canonical owners; +- environment variables and config keys must have canonical owners (per universal contract §5); - secrets must not appear in logs, panic messages, debug output, or error chains. ### 8.3 Configuration and platform gates @@ -487,7 +503,7 @@ Configuration facts must be canonical. - Validate config once, early, and explicitly. - Use `cfg_select!`, `#[cfg]`, and target-specific dependencies deliberately. - Do not duplicate platform names, feature names, environment variable names, or protocol constants across code and docs. -- Test platform-specific code paths where feasible. 
If not feasible locally, preserve the verification story in CI or docs. +- Test platform-specific code paths where feasible. Cfg-gated arms not exercised in CI rot silently (§0.1); if not feasible locally, preserve the verification story in CI or docs. ### 8.4 Observability @@ -546,7 +562,7 @@ Use stronger test forms when ordinary examples miss the risk: ### 9.4 Rustdoc and examples -Rustdoc is executable documentation when examples are doc tests. +Rustdoc is executable documentation when examples are doc tests. Doc tests run in their own crate (§0.1) — examples must work with only public API and documented setup. Public APIs should document: @@ -564,6 +580,8 @@ Do not write examples that require unstated global state, network availability, ## 10. Refactoring, deletion, and module design +The universal contract covers Boy Scout + Mikado discipline (§3), architecture as preserved theory (§4), and deletion-requires-proof (§8). Rust-specific notes follow. + ### 10.1 Coherent repair When a local patch exposes an incoherent module boundary, type model, or feature contract, fix the smallest coherent area rather than stacking workarounds. @@ -576,6 +594,8 @@ Examples of coherent repair: - extract a module when a file mixes unrelated responsibilities; - collapse a trait that has only one implementation and no current abstraction value. +Naur's "amorphous additions" warning applies particularly inside crates with deep trait hierarchies and feature graphs: patches made without the type/feature theory tend to grow workarounds (`.clone()`, broader `Send`/`Sync` bounds, `Arc>` wrappers) that quietly destroy the original ownership shape. + ### 10.2 Compatibility-aware refactoring Refactor private/internal code aggressively when evidence stays green. Refactor public or published surfaces deliberately. @@ -602,13 +622,12 @@ Refactor: Extraction must improve cohesion, not merely reduce line count. 
-### 10.4 Safe deletion +### 10.4 Safe deletion (Rust-specific surfaces) -Before deleting Rust code, check: +Per universal contract §8. Rust-specific blast-radius surfaces beyond the universal list: -- direct references with search and compiler feedback; - public exports and downstream API implications; -- feature-gated or cfg-gated references; +- feature-gated or cfg-gated references (cfg-gated references that are silently dead on this host but live elsewhere — §0.1); - proc macro or generated references; - serialization formats and stored data; - FFI symbols, `no_mangle`, exported names, and linker scripts; @@ -666,9 +685,9 @@ For public API crates: ### 12.2 Comments -Comments should explain non-obvious invariants, safety, compatibility, or operational constraints. Do not comment what the code already says. +Comments should explain non-obvious invariants, safety, compatibility, or operational constraints — i.e., the *why* (per universal contract §1.5 Justification) that cannot be read off the code. Do not comment what the code already says. -Good comments explain why a seemingly simpler change is wrong, where an invariant is maintained, or what external contract constrains the implementation. +Good comments explain why a seemingly simpler change is wrong, where an invariant is maintained, or what external contract constrains the implementation. `SAFETY:` blocks are a primary re-cueing surface (§1.6) and deserve more care than ordinary comments. ### 12.3 Self-containment @@ -688,7 +707,7 @@ Agent directive files are operational instructions for agents. Code and docs mus ## 13. Incidental observation protocol -When reading a file surfaces a defect, rule violation, or clear improvement opportunity unrelated to the active task, record it in the project's designated observation log and continue the active task. 
+When reading a file surfaces a defect, rule violation, or clear improvement opportunity unrelated to the active task, record it in the project's designated observation log and continue the active task. This is the Rust-side practice for honoring the universal contract's rule that the next improvement is a separate slice (§10). Do not fix unrelated observations in the current change unless they are prerequisites for correctness. Do not interrupt the workflow to discuss every incidental finding. @@ -710,15 +729,7 @@ If the project has no observation log, include only high-value observations in t ## 14. Pre-output checklist -Run this before declaring completion. - -### System theory - -- Truth: is the source of truth identified and changed at the right layer? -- Evidence: did you add or run feedback proportional to risk? -- Consequence: did you trace direct and indirect blast radius? -- Invariant: is the important invariant protected by type, test, assertion, or documented contract? -- Preservation: did important theory land somewhere durable? +The universal contract §10 (stop conditions) and §9 (output contract) define the cross-language stops. The checks below are Rust/Cargo-specific additions; do not duplicate the universal output template here. ### Rust semantics @@ -731,15 +742,16 @@ Run this before declaring completion. ### Cargo and features - Are edition, resolver, MSRV, and feature changes deliberate? -- Are dependency versions and feature names verified? +- Are dependency versions and feature names verified, not invented? - Are features additive and tested where meaningful? - Did lockfile changes happen only when justified? - Are generated artifacts and build scripts in sync with their canonical inputs? +- Did you check whether feature unification affects sibling crates? ### Unsafe and concurrency - Is unsafe absent where unnecessary? -- Does every unsafe block or unsafe function have a real safety contract? 
+- Does every unsafe block or unsafe function have a real `SAFETY:` contract? - Are task lifecycles, cancellation, blocking, locks, channels, and shutdown paths explicit? - Are atomic orderings justified? - Did you avoid global mutable state or give it a clear owner? @@ -748,6 +760,6 @@ Run this before declaring completion. - Did the narrow relevant check pass? - Did verification widen when the change widened? -- Are formatting, linting, tests, doc tests, or stronger tools run as appropriate? +- Are formatting, linting, tests, doc tests, or stronger tools (Miri, Loom, fuzz) run as appropriate? - Are remaining failures unrelated and explicitly stated? - Is the touched Rust surface clearer and easier to change than before? diff --git a/.codex/AGENTS_SQLITE3MC233_SQLITE353.md b/.codex/AGENTS_SQLITE3MC233_SQLITE353.md index 54ea21e7..a738f233 100644 --- a/.codex/AGENTS_SQLITE3MC233_SQLITE353.md +++ b/.codex/AGENTS_SQLITE3MC233_SQLITE353.md @@ -1,12 +1,19 @@ # SQLite3 Multiple Ciphers 2.3.3 / SQLite 3.53.0 Agent Protocol -This protocol governs agent work on projects that build, vendor, link, wrap, configure, distribute, test, or operate **SQLite3 Multiple Ciphers 2.3.3**, based on **SQLite 3.53.0**. +**Version:** 2.0.0 +**Updated:** 2026-04-27 +**Inherits:** [.codex/UNIVERSAL_ENGINEERING_CONTRACT.md](./UNIVERSAL_ENGINEERING_CONTRACT.md) v2.0.0+ +**Scope:** projects that build, vendor, link, wrap, configure, distribute, test, or operate **SQLite3 Multiple Ciphers 2.3.3**, based on **SQLite 3.53.0**. Includes C and C++ integrations, amalgamation builds, static or shared library packaging, embedded applications, CLIs, services, language bindings, JNI/JNA, Python/Rust/Node/.NET/Java/Kotlin wrappers, SQL migrations, encrypted database files, PRAGMA/URI configuration, key and rekey flows, backups, WAL/journal behavior, build flags, and cross-platform distribution. 
-Scope: C and C++ integrations, amalgamation builds, static or shared library packaging, embedded applications, CLIs, services, language bindings, JNI/JNA, Python/Rust/Node/.NET/Java/Kotlin wrappers, SQL migrations, encrypted database files, PRAGMA/URI configuration, key and rekey flows, backups, WAL/journal behavior, build flags, and cross-platform distribution. +## 0. Scope and inheritance -Primary objective: preserve data integrity, encryption correctness, key safety, SQLite compatibility, build reproducibility, and clear ownership of database/file-format contracts. +This protocol inherits the Universal Engineering Contract. The universal contract defines the meta-questions every change must answer — Truth, Evidence, Consequence, Invariant, Justification, Re-cueing — and frames the agent as a *transient theory-holder*. Apply the universal contract before any rule below; do not restate it here. When SQLite3MC is used from Java, Kotlin, Python, Rust, C, C++, or another runtime, apply this protocol in addition to the relevant language protocol. -Optimize in this order: +This protocol adds SQLite3MC- and SQLite-specific content for which the universal contract is intentionally silent: cipher and key lifecycle, file-format state, native-library identity across compile and runtime, SQL/SQLite version compatibility, FFI safety, and the at-rest encryption boundary. + +**Primary objective:** preserve data integrity, encryption correctness, key safety, SQLite compatibility, build reproducibility, and clear ownership of database/file-format contracts. + +**Optimization order:** ```text data integrity → key safety → cipher/file-format compatibility → source-of-truth clarity → portability → observability without leakage → performance where measured → terseness @@ -14,7 +21,20 @@ data integrity → key safety → cipher/file-format compatibility → source-of Convenience loses to data safety. Local build success loses to runtime link correctness. 
Encryption that is not tested as encryption is not verified. A wrapper API that hides key ownership, cipher selection, or migration behavior is not finished. -This protocol inherits `.codex/UNIVERSAL_ENGINEERING_CONTRACT.md`. Apply the universal Truth / Evidence / Consequence / Invariant / Preservation map before all SQLite3MC-specific rules. When SQLite3MC is used from Java, Kotlin, Python, Rust, C, C++, or another runtime, apply this protocol in addition to the relevant language protocol. +### 0.1 SQLite3MC + SQLite 3.53 tacit gaps + +Per the Naurian frame, some theory the agent typically does not bring in cold and must surface rather than paper over. Watch especially for: + +- Whether the headers at compile time, the static or shared library linked at build time, and the dynamic library actually loaded at runtime are the same SQLite3MC version. A single file will not answer this; the agent must verify across phases. +- Whether the application actually loads SQLite3MC at runtime, or quietly resolves to a system SQLite. "Drop-in replacement" is a code property, not a runtime guarantee. +- Whether encrypted-database test fixtures reflect production cipher, KDF, page size, and reserve-byte settings — or were created with default settings and so prove nothing about the deployed format. +- Whether keys ever appear in URIs, `ATTACH ... KEY` statements, `PRAGMA key`/`rekey`, debug captures, query logs, crash reports, shell history, or process listings. Every one of these is a real production leak class. +- Whether `TEMP` tables, in-memory databases, or bytes 16–23 of the database file are inside or outside the threat model. The encryption boundary is non-obvious and easy to assume away. +- Whether old SQLCipher, sqleet, or SQLite Encryption Extension conventions still inform the codebase. SQLite3MC is API-compatible in many places but is not identical, and copy-pasted SQLCipher recipes can silently drift. 
+- Whether SQLite 3.52.0 (withdrawn upstream) is still pinned anywhere as a fallback baseline. +- Whether the secure cipher-state nullification path that distinguishes SQLite3MC 2.3.3 from older releases is still intact. It looks redundant; removing it is a security regression. + +Where the answer is not derivable from code, history, or conversation, surface the gap explicitly; do not assume the convenient answer. --- @@ -35,7 +55,8 @@ Inspect the relevant subset of: - database lifecycle: initial creation, open, authentication/keying, migration, attach/detach, backup, restore, VACUUM, WAL checkpointing, rekeying, decryption, compaction, corruption handling, and deletion; - file-format assumptions: cipher scheme, page size, reserve bytes, plaintext header policy, KDF settings, legacy compatibility, `user_version`, schema migrations, and database compatibility fixtures; - journaling and temp behavior: rollback journal, WAL, shared memory files, temporary tables, in-memory databases, temp-store configuration, and file-permission policy; -- tests and evidence: encrypted fixture files, wrong-key tests, rekey tests, migration tests, cross-platform CI, sanitizer runs, Valgrind, fuzzers, SQL logic tests, and production observability. +- tests and evidence: encrypted fixture files, wrong-key tests, rekey tests, migration tests, cross-platform CI, sanitizer runs, Valgrind, fuzzers, SQL logic tests, and production observability; +- the universal contract's six concerns (truth, evidence, consequence, invariant, justification, re-cueing) for the touched surface. Classify the touched surface before designing the change: @@ -54,35 +75,20 @@ Do not infer SQLite3MC behavior from ordinary SQLite alone. SQLite3MC is intenti ### 2.1 Minimum system map -For every non-trivial SQLite3MC change, identify: +For every non-trivial SQLite3MC change, apply the universal contract §1 system map (Truth / Evidence / Consequence / Invariant / Justification / Re-cueing) to the touched surface. 
SQLite3MC-specific anchors for each concern: -```text -Truth: -- Canonical owner of SQLite3MC version, SQLite source version, compile options, default cipher, legacy flags, and binding/runtime package version: -- Source of truth for key material and key lifecycle: -- Source of truth for database schema, migrations, cipher configuration, page format, and fixtures: -- Derived/generated copies: amalgamation, headers, wrappers, package metadata, docs, CI images, lock files: - -Evidence: -- Checks proving native build correctness, runtime link correctness, encryption roundtrip, wrong-key failure, rekey, migration, backup/restore, and language binding behavior: -- Missing feedback worth adding: - -Consequence: -- Direct dependencies: callers, wrappers, SQL scripts, migrations, bindings, tests, packaging, CLI tools, deployment images: -- Indirect dependencies: stored database files, backups, restore tools, support workflows, monitoring, user data, compliance, release process: - -Invariant: -- Data, encryption, file-format, key-safety, ABI/API, migration, or compatibility rule that must remain true: - -Preservation: -- Where learned theory belongs: build manifest, test fixture, migration note, wrapper API, safety comment, runbook, AFAD-managed doc, release checklist, CI assertion: -``` +- **Truth:** canonical owner of SQLite3MC version, SQLite source version, compile options, default cipher, legacy flags, binding/runtime package version; canonical owner of key material and key lifecycle; canonical owner of database schema, migrations, cipher configuration, page format, and fixtures; derived/generated copies (amalgamation, headers, wrappers, package metadata, docs, CI images, lock files). +- **Evidence:** native build correctness, runtime link correctness, encryption roundtrip, wrong-key failure, rekey, migration, backup/restore, language binding behavior; missing feedback worth adding. 
+- **Consequence:** direct (callers, wrappers, SQL scripts, migrations, bindings, tests, packaging, CLI tools, deployment images); indirect (stored database files, backups, restore tools, support workflows, monitoring, user data, compliance, release process). +- **Invariant:** data, encryption, file-format, key-safety, ABI/API, migration, or compatibility rule that must remain true. +- **Justification:** why each cipher / page-size / KDF / legacy-mode choice is the way it is, and which are inherited rather than deliberately chosen. If the answer is not available, surface that gap. +- **Re-cueing:** where the learned theory belongs — build manifest, test fixture, migration note, wrapper API, safety comment, runbook, AFAD-managed doc, release checklist, CI assertion. Flag the parts of the theory that cannot be written down, and who currently holds them. Keep this lightweight for low-risk edits. Do not skip it for changes that affect encryption, persisted files, build flags, runtime linking, migrations, or key handling. ### 2.2 Red → Green → Refactor -For new behavior or bug fixes, start with the smallest failing proof: +Per universal contract §2. SQLite3MC-typical "smallest failing proofs": - encrypted open/read/write roundtrip; - wrong-key rejection; @@ -100,25 +106,22 @@ Then make the smallest coherent implementation and immediately refactor until th ### 2.3 Narrow-to-wide verification -Work in small increments: - -1. make one coherent change; -2. run the narrowest relevant check, such as a native build target, one integration test, a single binding test, or one fixture migration; -3. inspect the first real failure; -4. fix the root cause; -5. rerun the narrow check; -6. widen to repository-required verification before completion. - -For SQLite3MC, widening usually means verifying both compile-time and runtime facts: the code compiled against the intended headers and also loaded the intended library at runtime. 
+Per universal contract §2 and §7 (Feedback must match risk). For SQLite3MC, widening usually means verifying both compile-time and runtime facts: the code compiled against the intended headers and also loaded the intended library at runtime. The two are independent; a green compile-time check does not prove the runtime answer. ### 2.4 Root-cause fixes only -When verification fails: +Per universal contract §0 ("the agent must not paper over what it does not have") and §2 (read the actual failure). When verification fails, distinguish among SQLite3MC-specific root causes: -- read the actual SQLite error code, extended error code, native build diagnostic, linker output, sanitizer report, migration diff, or fixture mismatch; -- determine whether the root is key timing, wrong cipher configuration, stale generated source, mixed headers/library, runtime library shadowing, unsupported SQL, file permissions, WAL/journal mode, platform target, or actual corruption; -- fix that cause; -- preserve the failing proof if it guards real data safety or compatibility. +- key timing, +- wrong cipher configuration, +- stale generated source, +- mixed headers/library, +- runtime library shadowing, +- unsupported SQL, +- file permissions, +- WAL/journal mode, +- platform target, +- actual corruption. Do not: @@ -145,9 +148,9 @@ Underlying SQLite: 3.53.0 Use the repository's pinned version when it is more specific. Do not upgrade or downgrade SQLite3MC without a compatibility judgment, migration-risk assessment, and verification plan. -SQLite3MC 2.3.3 includes the upstream SQLite 3.53.0 baseline and fixes secure nullification of cipher data structures on freeing. Treat any edit around cipher state cleanup as security-sensitive. Do not remove zeroization, nullification, or cleanup paths because they look redundant. +SQLite3MC 2.3.3 includes the upstream SQLite 3.53.0 baseline and fixes secure nullification of cipher data structures on freeing. 
Treat any edit around cipher state cleanup as security-sensitive. Do not remove zeroization, nullification, or cleanup paths because they look redundant — this is exactly the kind of code where Naur's "amorphous additions" warning bites in reverse. -SQLite 3.53.0 includes a fix for the WAL-reset database corruption bug. Do not downgrade to a pre-fix SQLite baseline without explicitly accepting the risk and preserving a reason. +SQLite 3.53.0 includes a fix for the WAL-reset database corruption bug. Do not downgrade to a pre-fix SQLite baseline without explicitly accepting the risk and recording the justification (per universal contract §1.5). ### 3.2 SQLite 3.53.0 feature posture @@ -168,7 +171,7 @@ Do not write code or migrations that silently require 3.53.0 if production, test ### 3.3 SQLite 3.52 warning -SQLite 3.52.0 was withdrawn upstream. Do not select SQLite3MC 2.3.0 / SQLite 3.52.0 as a fallback baseline. If a repository already contains that version, surface the issue and prefer moving to SQLite3MC 2.3.3 or a project-approved fixed baseline. +SQLite 3.52.0 was withdrawn upstream. Do not select SQLite3MC 2.3.0 / SQLite 3.52.0 as a fallback baseline. If a repository already contains that version (see §0.1), surface the issue and prefer moving to SQLite3MC 2.3.3 or a project-approved fixed baseline. --- @@ -176,7 +179,7 @@ SQLite 3.52.0 was withdrawn upstream. Do not select SQLite3MC 2.3.0 / SQLite 3.5 ### 4.1 One owner for version and build facts -SQLite3MC version, SQLite source version, release tag, commit hash, checksums, compile flags, enabled extensions, default cipher, legacy options, and platform artifact versions must have one canonical owner. +Per universal contract §5 (canonical ownership of contract facts). 
For SQLite3MC, the contract facts that need a single owner include: SQLite3MC version, SQLite source version, release tag, commit hash, checksums, compile flags, enabled extensions, default cipher, legacy options, and platform artifact versions. Acceptable owners include: @@ -214,7 +217,7 @@ The following must agree unless the repository has an explicit compatibility shi - compile-option observations such as `PRAGMA compile_options` or `sqlite3_compileoption_get()`; - language-binding reported versions. -A common failure mode is compiling against the intended SQLite3MC headers while loading a system SQLite library at runtime. Always verify runtime identity when touching packaging, dynamic linking, containers, or language bindings. +A common failure mode — and the headline tacit gap from §0.1 — is compiling against the intended SQLite3MC headers while loading a system SQLite library at runtime. Always verify runtime identity when touching packaging, dynamic linking, containers, or language bindings. --- @@ -333,6 +336,8 @@ For new development, do not choose AES-CBC-without-HMAC or RC4 unless the task i Cipher configuration is file-format state. Changing cipher scheme, KDF parameters, page size, reserve bytes, plaintext header behavior, or legacy mode requires migration tests using real fixtures. +Per universal contract §1.5 (Justification), record *why* the cipher, KDF, and page-format choice is the way it is — threat model, performance budget, legacy compatibility, regulatory constraint, or inherited default. A choice without a recorded reason cannot be safely re-evaluated by the next reader. + ### 6.4 Rekey and cipher migration Rekeying is a data migration, not a simple settings edit. @@ -619,7 +624,7 @@ SQLite3MC protects database contents at rest under defined assumptions. It does - keys stored beside the database; - compromised application users or compromised hosts. -State the real threat model when changing encryption behavior. 
+State the real threat model when changing encryption behavior. The threat model is itself theory in Naur's sense — usually held by a security stakeholder, often not in the diff. Where the agent is acting without it, surface the gap (per universal contract §0). ### 11.2 Secret redaction @@ -688,9 +693,7 @@ Do not log full SQL statements if they can include keys or sensitive data. If qu ## 13. Deletion and blast-radius rules -Before deleting or replacing any SQLite3MC component, prove the blast radius. - -Check: +Per universal contract §8 (deletion and simplification require proof). SQLite3MC-specific blast-radius surfaces beyond the universal list: - native source files and generated amalgamation paths; - headers and exported symbols; @@ -702,15 +705,15 @@ Check: - docs, examples, runbooks, and release checklists; - production data files and backups that may require legacy cipher support. -Removing a cipher, compile option, wrapper method, or legacy compatibility flag can strand existing encrypted databases. Treat such deletion as a data-migration decision, not cleanup. +Removing a cipher, compile option, wrapper method, or legacy compatibility flag can strand existing encrypted databases. Treat such deletion as a data-migration decision, not cleanup. Naur's "amorphous additions" warning applies in reverse here: a deletion made without the cipher/file-format theory destroys structure that *looks* redundant but is in fact load-bearing for some existing on-disk file the agent has never seen. --- -## 14. Documentation and preservation +## 14. Documentation and re-cueing Use `.codex/PROTOCOL_AFAD.md` for docs that describe SQLite3MC integration, public APIs, migrations, operational procedures, or code/documentation synchronization. -Preserve system theory in the smallest durable place: +Per universal contract §1.6 (re-cueing), preserve the cues that let the next reader rebuild the relevant slice of theory. 
SQLite3MC-specific homes for those cues: - version/build facts in the canonical dependency manifest; - cipher choices and migration rationale in migration notes or AFAD-managed docs; @@ -720,24 +723,26 @@ Preserve system theory in the smallest durable place: - compatibility fixtures in tests; - operational recovery in runbooks. +Theory the agent could not write down — production threat model nuance, why a particular legacy fixture exists, who chose the current KDF settings, what historical incident led to a defensive zeroization path — should be flagged as a known re-cueing gap so the next reader knows where to ask. Do not pretend an artifact transfers a theory it can only re-cue. + The repository root `README.md` remains a storefront. It may mention that the project supports encrypted SQLite, but detailed cipher configuration, key management, and migration mechanics belong in deeper docs. --- ## 15. Completion checklist -Before declaring a SQLite3MC-related change complete, answer: +The universal contract §10 (stop conditions) covers the cross-language stops, and §9 defines the agent output template. The checks below are SQLite3MC-specific additions; do not duplicate the universal output template here. ```text Baseline: -- Did I verify the intended SQLite3MC and SQLite versions at build time and runtime? +- Did I verify the intended SQLite3MC and SQLite versions at build time AND runtime? Truth: - Did I preserve one canonical owner for version, compile options, cipher defaults, key lifecycle, and migration state? Evidence: -- Did I run the narrow and required broad checks? - For encryption changes, did I prove correct-key success, wrong-key failure, and absence of obvious plaintext leakage? +- Did I verify against real encrypted fixtures, not only freshly created scratch databases? Consequence: - Did I trace packaging, linking, language bindings, stored files, backups, and support tools? 
@@ -745,8 +750,12 @@ Consequence: Invariant: - Did data integrity, key safety, cipher compatibility, ABI/API compatibility, and migration safety remain intact? -Preservation: +Justification: +- Can I explain why each touched cipher, page-size, KDF, or legacy-mode choice is the way it is — or have I surfaced that as a known gap rather than silently changing it? + +Re-cueing: - Did I update tests, fixtures, build assertions, docs, runbooks, or comments where the learned theory belongs? +- Did I flag what could not be written down, and who currently holds it? Leakage: - Did I avoid logging, committing, or documenting real secrets or key-bearing commands? diff --git a/.codex/AGENTS_TAURI210.md b/.codex/AGENTS_TAURI210.md new file mode 100644 index 00000000..f4715374 --- /dev/null +++ b/.codex/AGENTS_TAURI210.md @@ -0,0 +1,929 @@ +# Tauri 2.10.x Agent Protocol + +**Version:** 2.0.0 +**Updated:** 2026-04-27 +**Inherits:** [.codex/UNIVERSAL_ENGINEERING_CONTRACT.md](./UNIVERSAL_ENGINEERING_CONTRACT.md) v2.0.0+ +**Composes with:** [.codex/AGENTS_RUST195_CARGO.md](./AGENTS_RUST195_CARGO.md) for Rust surfaces in `src-tauri`; the relevant frontend language/framework instructions for JavaScript, TypeScript, React, Vue, Svelte, Solid, Angular, Leptos, Dioxus, Yew, or other frontend stacks; and other applicable protocols (Java/Kotlin/Python/SQLite3MC) for app-specific dependencies. +**Scope:** projects that build, configure, ship, test, or operate **Tauri 2.10.x** applications, plugins, bundles, mobile targets, updaters, and frontend/Rust IPC surfaces — including desktop and mobile Tauri apps, `src-tauri` crates, frontend packages calling Tauri APIs, plugins, generated Android/iOS projects, bundling/signing/notarization workflows, capability and permission files, deep links, menus, tray behavior, custom protocols, WebView behavior, and CI/release automation. + +## 0. Scope and inheritance + +This protocol inherits the Universal Engineering Contract. 
The universal contract defines the meta-questions every change must answer — Truth, Evidence, Consequence, Invariant, Justification, Re-cueing — and frames the agent as a *transient theory-holder*. Apply the universal contract before any rule below; do not restate it here. Compose with the Rust protocol for `src-tauri` and Rust plugins, and with the relevant frontend protocol for the WebView code. + +This protocol adds Tauri- and platform-specific content for which the universal contract is intentionally silent: the frontend↔Rust security boundary, capability/permission/scope discipline, IPC contract design, WebView heterogeneity, generated mobile-project ownership, and updater/signing/release mechanics. + +**Primary objective:** preserve the security boundary between frontend and Rust core while keeping state ownership, IPC contracts, permissions, platform behavior, and release artifacts explicit, verified, and easy to modify. + +**Optimization order:** + +```text +security boundary → least privilege → state ownership → IPC contract clarity → platform compatibility → build/release reproducibility → observability without leakage → user experience → performance where measured → terseness +``` + +Convenience loses to least privilege. Passing a frontend build is not enough. A Tauri change is not verified until the relevant Rust, frontend, capability, platform, and release-contract surfaces have been checked or explicitly scoped out. + +### 0.1 Tauri 2.10 tacit gaps + +Per the Naurian frame, some theory the agent typically does not bring in cold and must surface rather than paper over. Watch especially for: + +- That the WebView is *not* trusted code. The frontend is an attacker model, not just a UI layer. Validate every IPC payload in Rust even if the frontend already validated it. +- That `tauri dev` and `tauri build` diverge on CSP, signing, dev-server URL, hot reload, asset loading, and capability defaults. 
"Works in `tauri dev`" proves almost nothing about the bundled artifact. +- That WebView2 / WKWebView / WebKitGTK are not interchangeable. Media, fonts, fetch, custom protocols, file URIs, and CSP edge cases differ. A green check on one host platform says nothing about the others. +- That the active Tauri CLI may resolve to either the cargo-installed `tauri-cli` or the local `@tauri-apps/cli` depending on the script and PATH. They can be on different versions and produce different artifacts. +- That `@tauri-apps/api` 2.10.0 had a packaging/asset issue. If frontend imports look broken, prefer 2.10.1+ within the 2.10 line before rewriting application code around the symptom. +- That updater signing private keys generated by the CLI window between Tauri 2.9.3 and 2.10.0 may be unusable. Treat updater signature failures as possible key-state issues, not just config mistakes. +- That `TAURI_PRIVATE_KEY*` env vars are deprecated for signer commands in the 2.10 line in favor of `TAURI_SIGNING_PRIVATE_KEY*`. Mixed-name configurations silently fall back to defaults. +- That window/webview labels are stringly identifiers referenced from capabilities, event routing, frontend code, tests, mobile manifests, and sometimes user workflows. Renaming `"main"` is not a refactor — it is a contract migration. +- That Tauri capabilities and plugin permissions *unify* into an effective permission set per window. The visible files are inputs; the effective grant is computed. +- That generated Android/iOS projects may be regenerated *or* maintained per repository policy. The agent does not know which posture this repo takes; editing the wrong layer is invisible until the next regenerate. +- That secret stores (Stronghold, OS keychain, plugin-store, SQLite, app data) have different recovery, sync, and OS-prompt semantics. Moving a secret between them is a behavior change, not refactoring. 
+- That bundle identifier, app identifier, and package name appear in `tauri.conf.*`, mobile manifests, signing configs, store listings, deep-link associations, and updater manifests. They form a single contract across multiple files. + +Where the answer is not derivable from code, history, or conversation, surface the gap explicitly; do not assume the convenient answer. + +--- + +## 1. Repository intake before touching Tauri surfaces + +Before editing anything related to Tauri, determine the repository's actual application shape. + +Inspect the relevant subset of: + +- `src-tauri/Cargo.toml`, workspace `Cargo.toml`, `Cargo.lock`, `rust-toolchain.toml`, `.cargo/config.toml`, Cargo features, target triples, build scripts, generated code, and native dependencies; +- `src-tauri/src/main.rs`, `src-tauri/src/lib.rs`, module structure, `tauri::Builder` setup, plugin registration, `generate_handler!`, setup hooks, state registration, async runtime usage, background tasks, custom URI schemes, and command modules; +- Tauri configuration files: `src-tauri/tauri.conf.json`, `tauri.conf.json5`, platform-specific config overlays, appstore config, dev/build commands, bundle targets, identifiers, product names, versions, resources, icons, security settings, windows/webviews, updater config, and plugin configuration; +- capability, permission, and scope files under `src-tauri/capabilities`, plugin permission files, generated permission artifacts, and references to window or webview labels; +- frontend package manifests and lock files: `package.json`, `pnpm-lock.yaml`, `yarn.lock`, `package-lock.json`, Bun/Deno manifests, frontend build configuration, Vite/Webpack/Rollup/ESBuild config, TypeScript config, route definitions, application state stores, and calls to `@tauri-apps/api` or plugin packages; +- generated Android and iOS projects, including Gradle files, Xcode project files, `Info.plist`, entitlements, manifests, signing settings, bundle identifiers, mobile permissions, 
associated domains, app links/deep links, and generated bridge code; +- updater/signing/release assets: updater public keys, signing key references, environment-variable policy, signing/notarization scripts, GitHub Actions, package manager scripts, release manifests, installer targets, appcast/update endpoints, bundle metadata, and artifact naming conventions; +- platform dependencies and packaging assumptions: WebView2, WebKit/WebKitGTK, Linux packages, Xcode/Command Line Tools, Android Studio/JDK, Windows MSVC toolchain, MSI/NSIS/AppImage/deb/rpm/dmg/app/mobile build targets, and container images; +- runtime state stores: managed Rust state, frontend state, plugin-store data, local storage, IndexedDB, filesystem paths, SQLite or other databases, keychain/credential stores, Stronghold, OS settings, command-line args, and environment variables; +- existing feedback: unit tests, Rust integration tests, frontend tests, component tests, mocked Tauri API tests, E2E tests, `tauri info`, CI build matrix, bundled-app smoke tests, permission-denial tests, updater tests, mobile tests, logging, crash reporting, telemetry, and support/runbook notes; +- the universal contract's six concerns (truth, evidence, consequence, invariant, justification, re-cueing) for the touched surface. + +Classify the touched surface before designing the change: + +- **Tauri core app:** security boundary, application state, command registry, plugin registry, lifecycle, windows, menus, trays, custom protocols, and platform behavior are contracts. +- **Frontend invoking Tauri:** IPC names, payload shapes, error shapes, event names, capability assumptions, UI state, and fallback behavior are contracts. +- **Capability/permission/scope:** allowed commands, denied commands, path/network/shell scopes, window labels, and plugin access are security contracts. 
+- **Published plugin or library:** SemVer, permission schema, command names, generated bindings, docs, examples, tests, and cross-platform support are contracts. +- **Mobile target:** generated project ownership, platform permissions, signing, bundle identifiers, deep links, storage, WebView behavior, and Android/iOS lifecycle are contracts. +- **Bundler/updater/signing:** artifact identity, code signing, updater signature verification, installer behavior, platform metadata, and release automation are contracts. +- **Internal tool/demo:** correctness still matters, but public compatibility and release surface may be narrower. + +Do not infer a repository is Electron-like, browser-only, Rust-only, or Node-capable. Tauri applications deliberately split trust and capability across frontend, Rust core, operating system APIs, and platform WebViews. + +--- + +## 2. Change loop in Tauri terms + +### 2.1 Minimum system map + +Apply the universal contract §1 system map to the touched surface. Tauri-specific anchors for each concern: + +- **Truth:** source of truth for application state (Rust-managed, frontend, database, plugin store, filesystem, OS, external service, config); for IPC command names, event names, payload schemas, and errors; for capabilities, permissions, scopes, windows/webviews, bundle identity, updater identity, and signing configuration; derived/cached/generated copies (frontend wrappers, TypeScript types, Specta/serde outputs, generated mobile projects, docs, tests, CI config, release manifests). +- **Evidence:** existing checks (`cargo check/test/clippy/fmt`, frontend typecheck/lint/test/build, `tauri info`, capability/permission validation, mocked IPC tests, E2E tests, bundled-app smoke tests, mobile builds, updater/signing verification); missing feedback worth adding. 
+- **Consequence:** direct (commands, invoke callers, event listeners/emitters, windows/webviews, capabilities, permissions, plugin setup, menus, trays, deep links, tests, docs); indirect (persisted user data, platform installers, update manifests, mobile entitlements, signing/notarization, support workflows, telemetry, release automation). +- **Invariant:** security, least-privilege, state, IPC, platform, release, or user-experience rule that must remain true. +- **Justification:** why each capability grant, scope, plugin choice, window label, or signing decision is the way it is — and which are inherited rather than chosen. If the answer is not available, surface that gap rather than silently re-deriving it. +- **Re-cueing:** where the learned theory belongs — type, command wrapper, permission file, generated binding, test, rustdoc/KDoc/JS doc, AFAD-managed doc, runbook, release checklist, CI assertion. Flag the parts of the theory that cannot be written down (threat model nuance, signing-key history, store-listing context) and who currently holds them. + +Keep this lightweight for low-risk edits. Do not skip it for changes that affect permissions, IPC, persisted state, filesystem/shell/network access, updater/signing, mobile platform files, or bundle identity. + +### 2.2 Red → Green → Refactor + +Per universal contract §2. Tauri-typical "smallest failing proofs": + +- Rust unit or integration test for command logic; +- frontend test with mocked Tauri API; +- contract test for serialized command input/output and error shape; +- capability/permission denial test; +- path scope, shell scope, HTTP scope, or file access test; +- bundled-app smoke test for window/menu/tray/deep-link behavior; +- updater/signature verification fixture; +- mobile manifest/entitlement/build check; +- reproducible `tauri info`, `tauri build`, or CI failure; +- generated binding/golden file check. 
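
One of the proofs listed above, a path scope test, might look like the following minimal sketch. It uses only the Rust standard library; `ScopeError` and `resolve_in_root` are hypothetical names invented for illustration, not Tauri APIs. The check is purely lexical and deliberately strict (any `..` is rejected), and it does not resolve symlinks, which a real implementation would also have to consider.

```rust
use std::path::{Component, Path, PathBuf};

/// Why a requested path was rejected. Hypothetical type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ScopeError {
    /// The request was absolute or carried a root/drive prefix.
    NotRelative,
    /// The request tried to climb out of the allowed root with `..`.
    Traversal,
}

/// Joins `requested` onto `root` only if it stays lexically inside `root`.
/// This never touches the filesystem, so symlinks are NOT resolved here.
pub fn resolve_in_root(root: &Path, requested: &str) -> Result<PathBuf, ScopeError> {
    let mut resolved = root.to_path_buf();
    for component in Path::new(requested).components() {
        match component {
            // Ordinary path segments are appended to the allowed root.
            Component::Normal(part) => resolved.push(part),
            // `.` segments are harmless and skipped.
            Component::CurDir => {}
            // Any `..` is rejected outright rather than normalized.
            Component::ParentDir => return Err(ScopeError::Traversal),
            // Absolute paths and drive prefixes would escape the root.
            Component::RootDir | Component::Prefix(_) => return Err(ScopeError::NotRelative),
        }
    }
    Ok(resolved)
}
```

Written first as a failing test, the rejection cases (`../secrets.txt`, `/etc/passwd`) become the proof that the scope holds; the implementation then only has to make them pass.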
+ +Then make the smallest coherent implementation and immediately refactor until the touched surface has clearer ownership, narrower permission, simpler IPC, and better verification. + +### 2.3 Narrow-to-wide verification + +Per universal contract §2 and §7 *Feedback must match risk*. Work in small increments: + +1. make one coherent change; +2. run the narrowest relevant check, usually a Rust test/check, frontend typecheck/test, or config/schema validation; +3. inspect the first real failure; +4. fix the root cause; +5. rerun the narrow check; +6. widen to the repository-required verification before completion. + +For Tauri, widening usually means crossing the boundary at least once: a Rust-only change that affects commands needs frontend or contract evidence; a frontend-only change that changes `invoke` behavior needs command/capability evidence; a config or permission change needs a smoke test or explicit validation. `tauri dev` is not release verification (§0.1). + +### 2.4 Root-cause fixes only + +When verification fails: + +- read the actual Rust, frontend, Tauri CLI, bundler, platform, WebView, mobile, or signing diagnostic; +- identify whether the root is Rust type/state ownership, serde shape, command registration, capability/permission scope, frontend import/version mismatch, stale generated binding, platform WebView behavior, packaging metadata, mobile generated project drift, updater key/signature state, or a real logic bug; +- fix that cause; +- preserve the failing proof if it guards security, platform compatibility, persisted state, or release correctness. 
+ +Do not: + +- expand capabilities or scopes to make a failure disappear; +- move secret handling into the frontend; +- collapse typed command errors into strings without preserving actionable meaning; +- ignore platform differences because a dev WebView works locally; +- edit generated mobile or binding artifacts without updating the canonical source or generation path; +- treat `tauri dev` success as release verification; +- downgrade signing, notarization, updater verification, CSP, or permission controls for convenience. + +--- + +## 3. Baseline posture: Tauri 2.10.x + +### 3.1 Version baseline + +For repositories governed by this protocol, assume: + +- Tauri major/minor baseline: `2.10.x`. +- Tauri Rust crate line: `tauri = 2.10.x`; prefer the latest compatible 2.10 patch unless the repository intentionally pins a specific patch. +- Tauri JavaScript API line: `@tauri-apps/api = 2.10.x`; prefer `2.10.1` or later within the 2.10 line because `2.10.0` had packaging/asset trouble (§0.1). +- Tauri CLI line: Rust `tauri-cli = 2.10.x` or JavaScript `@tauri-apps/cli = 2.10.x`; prefer the latest compatible 2.10 patch. Note that the active CLI may resolve to either path depending on PATH and scripts (§0.1). +- Tauri plugin packages should be kept on compatible `2.x` versions, and where plugin packages have a `2.10.x` line, use the repository's selected patch consistently. +- Do not mix Tauri core `2.10.x` with `@tauri-apps/api` or CLI packages from `2.9.x` or `2.11.x` unless the repository explicitly documents that compatibility decision and verification proves it. Any such mix requires recorded *justification* per universal contract §1.5. +- Do not silently upgrade to Tauri 2.11+ features in a 2.10-baseline repository. +- Keep `Cargo.lock` and frontend package lock files aligned with the chosen Tauri patch versions. 
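
The baseline rules above can be backed by a small automated check. The sketch below assumes the resolved versions have already been collected (for example from `Cargo.lock` and the frontend lock file) into name/version pairs; how they are collected is out of scope. `major_minor` and `off_baseline` are hypothetical helper names, and the version strings in any usage are illustrative only.

```rust
/// Parses the leading `major.minor` of a resolved version such as "2.10.3".
/// Returns `None` for anything that does not start with two dotted numbers
/// (ranges like "^2.10" are deliberately not handled in this sketch).
pub fn major_minor(version: &str) -> Option<(u64, u64)> {
    let mut parts = version.trim().split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Returns the names of packages whose resolved version is not on the
/// expected `major.minor` line, or whose version could not be parsed.
pub fn off_baseline<'a>(
    expected: (u64, u64),
    resolved: &[(&'a str, &'a str)],
) -> Vec<&'a str> {
    resolved
        .iter()
        .filter(|(_, version)| major_minor(version) != Some(expected))
        .map(|(name, _)| *name)
        .collect()
}
```

A CI step could fail when `off_baseline((2, 10), ...)` returns a non-empty list, unless the repository has recorded a justification for the mix.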
+ +Before changing versions, run or preserve a version report such as `cargo tree -i tauri`, package-manager dependency output, and `tauri info` where available. + +### 3.2 Tauri 2.10-specific notes + +Agents may use Tauri 2.10 behavior when the repository is on the 2.10 line: + +- `WebviewWindow` includes `set_simple_fullscreen`, matching the existing `Window` capability; on macOS this toggles fullscreen without creating a new macOS Space, while other platforms fall back to regular fullscreen behavior. +- Android external-storage `convertFileSrc` behavior received better error handling around local video files; still verify Android permissions, content URI/file path handling, and failure logging on the target device or emulator. +- Tauri 2.10 updated the Wry/WebKitGTK dependency stack; Linux WebKitGTK behavior and `with_webview` integrations deserve platform smoke tests after changes. +- Tauri CLI 2.10 introduced `TAURI_SIGNING_PRIVATE_KEY`, `TAURI_SIGNING_PRIVATE_KEY_PATH`, and `TAURI_SIGNING_PRIVATE_KEY_PASSWORD` for signer commands. Treat older `TAURI_PRIVATE_KEY*` variables as deprecated unless the repository intentionally supports them. Mixed naming silently falls back (§0.1). +- Tauri CLI 2.10.1 fixed command handling for comma-separated Cargo features and mobile Cargo arguments; prefer the fixed CLI patch when the repository uses feature-heavy or mobile builds. +- Updater signing private keys generated by the affected CLI window between Tauri `2.9.3` and `2.10.0` may need regeneration if they fail. Do not paper over updater signature failures — they may be key-state issues, not config mistakes. + +### 3.3 Rust and frontend baselines + +Tauri is built with Rust. Use the repository's pinned Rust toolchain first; if this bundle's Rust protocol governs the project, apply the Rust 1.95+ protocol to `src-tauri` and Rust plugin surfaces. 
+ +For frontend surfaces: + +- detect the actual package manager from lock files and scripts; +- use the repository's pinned Node/Bun/Deno/toolchain version if present; +- do not change package managers casually; +- keep `@tauri-apps/api`, `@tauri-apps/cli`, plugin packages, and generated bindings aligned; +- do not assume Node APIs exist in the WebView frontend; +- do not assume browser APIs behave identically across WebView2, WKWebView, and WebKitGTK (§0.1). + +### 3.4 WebView and platform posture + +Tauri uses platform WebViews. Treat target platforms as runtime contracts: + +- Windows: WebView2, MSVC tooling, installer behavior, MSI/NSIS behavior, code signing, and Windows paths matter. +- macOS/iOS: WKWebView, Xcode tooling, entitlements, associated domains, sandboxing, signing, notarization, and app bundle metadata matter. +- Linux: WebKitGTK version, package dependencies, app indicator/tray support, OpenSSL, distro packaging, fonts in containers, and sandbox/container assumptions matter. +- Android: generated Gradle project, Android Studio/JBR/JDK, manifests, storage/media permissions, deep links, app identifiers, and emulator/device behavior matter. + +Do not declare a platform-sensitive change complete from a single host-platform dev build unless the repository's supported platform contract is narrower. + +--- + +## 4. State, truth, and ownership + +### 4.1 Authoritative state + +Before changing state behavior, identify where truth lives: + +- Rust-managed state registered through Tauri state APIs; +- frontend framework state, stores, caches, local storage, IndexedDB, or session state; +- plugin-store data or filesystem-backed preferences; +- SQLite or another database; +- OS-level state such as keychain, permissions, app data directories, notification state, tray/menu state, or file associations; +- external service state; +- generated config state from `tauri.conf.*`, mobile manifests, and build scripts. 
+ +Do not create a second source of truth for convenience. Frontend state should be treated as a cache or UI projection unless the system explicitly declares it authoritative. Moving a secret between Stronghold, OS keychain, plugin-store, and database changes recovery and OS-prompt semantics — that is a behavior change, not a refactor (§0.1). + +### 4.2 Mutation paths + +For every stateful change, list who can mutate the state: + +- frontend user action; +- Rust command; +- plugin command; +- background task; +- startup/setup hook; +- event listener; +- file watcher; +- updater; +- external service callback; +- mobile lifecycle callback; +- OS permission or storage behavior. + +If multiple paths mutate the same state, make ordering, idempotency, and reconciliation explicit. + +### 4.3 Cross-boundary synchronization + +Prefer explicit synchronization: + +- commands return current authoritative state after mutation; +- events notify changed state with typed payloads; +- frontend subscribes/unsubscribes deterministically; +- backend background tasks expose cancellation and shutdown behavior; +- persisted state has migrations and versioning where needed. + +Avoid hidden state synchronization through duplicated constants, stringly typed event names, unversioned local storage, unscoped filesystem paths, or implicit UI assumptions. + +--- + +## 5. IPC, commands, and events + +### 5.1 Command design + +Tauri commands are public application capabilities once exposed to the frontend. Design them as contracts. + +For each command, make explicit: + +- command name and canonical owner; +- input payload schema and validation; +- output payload schema; +- error type and frontend-visible error mapping; +- required permission/capability; +- state read/write effects; +- filesystem, network, shell, database, or OS effects; +- cancellation or timeout behavior for long-running work; +- tests proving success and failure paths. 
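
As an illustration of the checklist above, here is a minimal sketch of a domain-shaped command with a typed request, a typed response, and distinguishable error categories. Everything in it is hypothetical (`rename_note`, `RenameNoteRequest`, `CommandError`, the 120-character limit). To stay self-contained it uses only the standard library: in a real application the function would presumably carry `#[tauri::command]`, read its notes from managed state, and derive serde traits on the types.

```rust
/// Typed request for a hypothetical `rename_note` command.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RenameNoteRequest {
    pub note_id: u64,
    pub new_title: String,
}

/// Typed response: the authoritative note after mutation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Note {
    pub id: u64,
    pub title: String,
}

/// Error categories the frontend can tell apart and react to.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CommandError {
    InvalidInput { field: &'static str, reason: &'static str },
    NotFound { note_id: u64 },
    Conflict { title: String },
}

/// Illustrative limit; a real contract would document its own.
const MAX_TITLE_LEN: usize = 120;

/// Validates untrusted input, mutates state, and returns the current note.
pub fn rename_note(notes: &mut [Note], request: RenameNoteRequest) -> Result<Note, CommandError> {
    let title = request.new_title.trim();
    if title.is_empty() {
        return Err(CommandError::InvalidInput { field: "new_title", reason: "must not be blank" });
    }
    if title.chars().count() > MAX_TITLE_LEN {
        return Err(CommandError::InvalidInput { field: "new_title", reason: "too long" });
    }
    // Another note already owning this title is a conflict, not invalid input.
    if notes.iter().any(|n| n.id != request.note_id && n.title == title) {
        return Err(CommandError::Conflict { title: title.to_string() });
    }
    let note = notes
        .iter_mut()
        .find(|n| n.id == request.note_id)
        .ok_or(CommandError::NotFound { note_id: request.note_id })?;
    note.title = title.to_string();
    Ok(note.clone())
}
```

Keeping the logic in a plain function like this also makes the success and failure paths testable without a running WebView.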
+ +Prefer small, domain-shaped commands over generic escape hatches. Avoid commands such as `run_shell`, `read_any_file`, `write_config`, `fetch_url`, `execute_sql`, or `set_state` unless scopes and validation make them safe and domain-specific. + +### 5.2 Command registration + +When adding, deleting, or renaming a command, update all of: + +- Rust function and module; +- `tauri::generate_handler!` registration; +- generated bindings or TypeScript wrappers; +- frontend `invoke` callers; +- capability/permission references; +- tests and mocks; +- docs/examples/runbooks where the command is public or operationally relevant. + +Do not leave orphan commands in `generate_handler!`, orphan frontend invokes, or permissive capabilities that no longer map to live behavior. + +### 5.3 Payload and error shape + +Use typed request and response structs where command shape is non-trivial. Validate untrusted frontend input in Rust even if the frontend already validates it — the WebView is part of the attacker model (§0.1). + +Error rules: + +- return structured errors that preserve cause without leaking secrets; +- avoid `unwrap`, `expect`, or panic paths in command handlers; +- map internal errors to frontend errors intentionally; +- make permission-denied, not-found, invalid-input, conflict, unavailable, and platform-unsupported cases distinguishable when the UI needs to react differently; +- test serialization/deserialization for public or reused command types. + +### 5.4 Events + +Events are dynamic contracts. For every event: + +- define the canonical event name in one place (per universal contract §5 canonical ownership); +- define payload shape; +- state who emits and who listens; +- document whether the event is global, window-specific, webview-specific, one-shot, streaming, or lifecycle-bound; +- ensure listeners are unsubscribed when components unmount or windows close; +- avoid flooding the frontend without throttling, backpressure, or sampling. 
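
Two of the event rules above, a single canonical home for event names and throttling of high-frequency emitters, can be sketched with the standard library alone. The event names and the `Throttle` type are hypothetical; the actual emit call (through whatever emitter API the repository uses) is intentionally left out. Time is passed in as a `Duration` since some fixed start so the behavior is deterministic under test.

```rust
use std::time::Duration;

/// Canonical event names, defined in exactly one place.
/// The names themselves are hypothetical examples.
pub mod events {
    pub const SYNC_PROGRESS: &str = "sync:progress";
    pub const SYNC_FINISHED: &str = "sync:finished";
}

/// Drops emissions that arrive sooner than `min_interval` after the last
/// one that was let through.
pub struct Throttle {
    min_interval: Duration,
    last_emit: Option<Duration>,
}

impl Throttle {
    pub fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_emit: None }
    }

    /// `now` is the time elapsed since a fixed start point, for example
    /// the value of `Instant::elapsed()` on an instant captured at startup.
    /// Returns `true` when the caller should emit, and records that emission.
    pub fn should_emit(&mut self, now: Duration) -> bool {
        match self.last_emit {
            Some(last) if now.saturating_sub(last) < self.min_interval => false,
            _ => {
                self.last_emit = Some(now);
                true
            }
        }
    }
}
```

A progress emitter would call `should_emit` before each `SYNC_PROGRESS` emission and always send the final `SYNC_FINISHED` event unthrottled, so the frontend never misses completion.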
+ +Use events for broadcast or asynchronous state changes. Use commands for request/response operations. Do not use events to bypass command permission design. + +--- + +## 6. Capabilities, permissions, and scopes + +### 6.1 Capabilities are security contracts + +Tauri capabilities define which permissions are granted or denied for specific windows and webviews. Treat capability files as first-class security artifacts. The visible capability and plugin-permission files are inputs; the *effective* permission set per window is computed from their union (§0.1). + +When changing capabilities: + +- identify affected windows/webviews by label; +- identify each command or plugin permission being granted or denied; +- record *why* the frontend needs the capability (per universal contract §1.5); +- minimize scope to the exact path, URL, shell command, database, or plugin operation required; +- add a denial/negative test when practical; +- update docs or runbooks for non-obvious access. + +Do not use broad permissions to make development easier. A permission that does not correspond to a real user-visible or operational need should not exist. + +### 6.2 Permission and scope hygiene + +For plugin permissions and scopes: + +- keep scopes narrow and deterministic; +- prefer app data/config/cache directories over arbitrary absolute paths; +- avoid unrestricted shell, filesystem, HTTP, SQL, or opener access; +- keep allow and deny rules readable; +- do not grant permissions to windows that do not need them; +- remove stale permissions when commands or UI features are deleted; +- ensure generated permission artifacts are updated from canonical sources. + +For filesystem access, never rely only on frontend path checks. Enforce path constraints in Rust or plugin scopes. + +For shell/process access, never allow arbitrary frontend-provided command strings. Use fixed commands or domain-specific operations with strict argument validation. 
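
The shell rule above can be made concrete with a minimal sketch in which the frontend only selects a domain operation and Rust owns the program name and argument layout. `GitOperation`, `ShellError`, `build_invocation`, and the 200-entry cap are hypothetical; the sketch only builds the argument vector and does not spawn a process.

```rust
/// Domain operations the frontend may request. It never supplies a
/// command string. Hypothetical type for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum GitOperation {
    Status,
    Log { max_entries: u32 },
    ShowBranch { name: String },
}

#[derive(Debug, PartialEq, Eq)]
pub enum ShellError {
    InvalidBranchName,
    TooManyEntries,
}

/// Illustrative cap on log size.
const MAX_LOG_ENTRIES: u32 = 200;

/// Accepts a conservative character set and refuses a leading `-`, so a
/// name can never be parsed as an option.
fn is_safe_branch_name(name: &str) -> bool {
    !name.is_empty()
        && !name.starts_with('-')
        && name.chars().all(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '/' | '.'))
}

/// Maps an operation to a fixed program and a validated argument list.
pub fn build_invocation(op: &GitOperation) -> Result<(&'static str, Vec<String>), ShellError> {
    let args: Vec<String> = match op {
        GitOperation::Status => vec!["status".into(), "--porcelain".into()],
        GitOperation::Log { max_entries } => {
            if *max_entries > MAX_LOG_ENTRIES {
                return Err(ShellError::TooManyEntries);
            }
            vec!["log".into(), "--oneline".into(), "-n".into(), max_entries.to_string()]
        }
        GitOperation::ShowBranch { name } => {
            if !is_safe_branch_name(name) {
                return Err(ShellError::InvalidBranchName);
            }
            vec!["rev-parse".into(), "--verify".into(), name.clone()]
        }
    };
    Ok(("git", args))
}
```

Because the arguments are passed as a list rather than joined into a shell string, input such as `main; rm -rf /` is rejected by validation and would not be interpreted by a shell even if it slipped through.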
+ +For network/http access, scope allowed domains and methods where the plugin supports it. Treat bearer tokens, cookies, signed URLs, and local network targets as sensitive. + +### 6.3 Frontend trust posture + +The frontend runs in a WebView and must not be treated as fully trusted. This is the headline tacit gap from §0.1. + +Do not: + +- put long-lived secrets, private keys, updater signing keys, or privileged tokens in frontend state or bundled JS; +- let frontend input choose arbitrary file paths, shell commands, SQL, URLs, or OS operations; +- rely on hidden UI controls for authorization; +- disable CSP or relax webview security without a documented reason and verification; +- load remote code unless the repository explicitly accepts that risk and the capability/security model is reviewed. + +Rust command handlers must validate all security-relevant input. + +--- + +## 7. Configuration, generated artifacts, and build ownership + +### 7.1 Tauri config is a contract + +`tauri.conf.*` and platform-specific overlays define public application identity and runtime behavior. Treat these values as contract facts (per universal contract §5): + +- product name; +- app identifier/bundle ID; +- version; +- build commands; +- dev URL/frontend dist path; +- windows/webviews and labels; +- security settings; +- bundle targets; +- resources and external binaries; +- updater endpoints and public keys; +- plugin configuration; +- platform-specific entitlements and metadata. + +Do not duplicate config facts in scripts, docs, frontend constants, mobile manifests, or CI. Generate or derive where possible, and fail validation on drift when practical. Bundle identifier, app identifier, and package name in particular form a single contract spread across `tauri.conf.*`, mobile manifests, signing configs, store listings, deep-link associations, and updater manifests (§0.1). 
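
Where config facts cannot be generated from one source, a drift check is one way to "fail validation on drift". The sketch below assumes the identifier values have already been extracted from each file by some other step; `identifier_drift` and the source labels are hypothetical, and the comparison is an exact, case-sensitive match.

```rust
/// Compares every copy of an identifier against the canonical value and
/// returns one human-readable finding per mismatch.
/// `copies` holds (source label, value found in that source) pairs.
pub fn identifier_drift(canonical: &str, copies: &[(&str, &str)]) -> Vec<String> {
    copies
        .iter()
        .filter(|(_, value)| *value != canonical)
        .map(|(source, value)| format!("{source}: expected `{canonical}`, found `{value}`"))
        .collect()
}
```

A validation script could treat a non-empty result as a hard failure, which turns silent bundle-identifier drift into an explicit, reviewable error.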
+ +### 7.2 Generated files + +Generated outputs may include mobile projects, TypeScript bindings, plugin permission schemas, icons/resources, code-signing metadata, and build artifacts. + +Before editing a generated file: + +- find the canonical generator or source; +- decide whether the generated file should be regenerated instead; +- update the generator when the generated shape is wrong; +- preserve checks that detect stale generated output. + +Do not hand-edit generated Android/iOS project files as a workaround unless the repository explicitly owns those files post-generation. The agent does not know which posture this repo takes (§0.1) — verify before editing. + +### 7.3 Package manager and lock discipline + +Use the package manager indicated by lock files and repository scripts. + +- `pnpm-lock.yaml` → use pnpm unless project instructions say otherwise. +- `yarn.lock` → use Yarn and detect classic vs Berry from repository files. +- `package-lock.json` → use npm. +- Bun/Deno manifests → use the repository's pinned tool. + +Do not rewrite lock files from a different package manager. Do not update unrelated frontend dependencies while changing Tauri unless required by compatibility or security. + +### 7.4 Version alignment checks + +When changing Tauri versions or packages, check: + +- Rust crate versions in `Cargo.toml` and `Cargo.lock`; +- JS package versions in `package.json` and lock files; +- CLI source: cargo-installed `tauri-cli` vs local `@tauri-apps/cli` (§0.1); +- plugin Rust crates and JS packages; +- generated bindings; +- CI images and install scripts; +- documentation examples. + +A version mismatch across `tauri`, `@tauri-apps/api`, plugins, and CLI is a real failure, not cosmetic noise. + +--- + +## 8. Windows, webviews, menus, trays, and lifecycle + +### 8.1 Window and webview labels + +Window and webview labels are contract identifiers (§0.1). 
They are referenced by capabilities, event targets, tests, frontend code, menu/tray behavior, docs, and sometimes user workflows. + +When changing labels: + +- update capabilities; +- update event routing; +- update window lookup code; +- update tests/mocks; +- update docs/examples; +- verify launch and multi-window behavior. + +Do not rename labels as a purely cosmetic refactor unless blast radius is traced. + +### 8.2 Multi-window behavior + +For multi-window applications, define: + +- which state is per-window vs global; +- which capabilities each window gets; +- how windows are created, focused, hidden, closed, restored, or destroyed; +- how event routing avoids leaking data between windows; +- how frontend routes map to windows/webviews; +- how release builds behave compared with dev builds (§0.1). + +Test release-mode multi-window behavior when window labels, URLs, capabilities, or webview setup change. + +### 8.3 Menus and trays + +Menus and trays are not mere UI. They often expose privileged operations. + +When changing menu/tray behavior: + +- trace the command path to Rust handler and capability; +- ensure disabled/hidden UI state is not the only authorization check; +- verify platform differences; +- preserve keyboard shortcuts and accessibility labels where applicable; +- test quit/close/hide behavior and background task shutdown. + +### 8.4 Startup, shutdown, and background tasks + +Startup and shutdown are system contracts. + +For setup hooks, managed state, background tasks, watchers, and async jobs: + +- make initialization order explicit; +- avoid races between frontend readiness and Rust state readiness; +- expose cancellation/shutdown paths; +- avoid blocking the UI thread; +- avoid orphan tasks after window close; +- log failures without leaking secrets; +- test restart/reopen paths when practical. + +--- + +## 9. Files, paths, assets, and custom protocols + +### 9.1 Path ownership + +Use Tauri path APIs and platform-appropriate app directories. 
Avoid hard-coded absolute paths except in tests or documented integrations. + +For path changes, identify: + +- app data/config/cache/log directory; +- migration behavior from old locations; +- permissions/scopes needed by the frontend; +- mobile storage behavior; +- backup/restore implications; +- user-visible file picker behavior; +- deletion/cleanup behavior. + +### 9.2 Filesystem access + +Filesystem access requires strict validation: + +- resolve paths before use; +- prevent traversal outside allowed roots; +- reject symlink surprises where security-relevant; +- handle platform path separators and Unicode; +- test missing, permission-denied, locked, and corrupted files; +- never log secret file contents or keys. + +Frontend filesystem plugin access should be narrower than backend filesystem access. If the frontend only needs to request a domain action, implement a Rust command instead of granting broad file access. + +### 9.3 Assets and `convertFileSrc` + +When serving local files to the WebView: + +- use the repository's established `convertFileSrc` or custom protocol strategy; +- verify capability and path scope behavior; +- test file types and MIME expectations; +- handle Android external storage and media permissions explicitly; +- avoid exposing directories or private files through broad URL conversion. + +### 9.4 Custom protocols + +Custom protocols bridge application state into WebView-visible resources. Treat them as security-sensitive. + +For custom protocol handlers: + +- validate every path/key/request; +- restrict content roots; +- set appropriate MIME types and headers; +- avoid cache leaks of private content; +- test invalid path, traversal, missing content, and unauthorized content cases; +- document ownership and threat model if non-obvious. + +--- + +## 10. Updater, signing, bundling, and release + +### 10.1 Updater integrity + +Updater behavior is a supply-chain contract. 
+ +When changing updater configuration: + +- verify public key ownership; +- verify signing private key source and environment-variable naming; +- verify update endpoint and manifest shape; +- verify target platform artifacts and signatures; +- test failed signature, stale version, unavailable endpoint, and rollback/user-cancel paths where practical; +- never log private keys, signing passwords, tokens, or signed update URLs. + +Do not bypass signature verification. Do not use updater success on one platform as proof for all supported platforms. Treat updater signature failures as possible 2.9.3–2.10.0 key-state issues (§0.1) before assuming a configuration mistake. + +### 10.2 Signing environment variables + +For Tauri 2.10.x signer workflows, prefer: + +```text +TAURI_SIGNING_PRIVATE_KEY +TAURI_SIGNING_PRIVATE_KEY_PATH +TAURI_SIGNING_PRIVATE_KEY_PASSWORD +``` + +Treat these as secret-bearing. Keep them out of frontend code, logs, committed files, and public CI output. + +Old `TAURI_PRIVATE_KEY*` variables are deprecated for signer commands in the 2.10 line. Support them only when the repository has an explicit backward-compatibility contract. Mixed naming silently falls back to defaults (§0.1), so never rename only some of the variables: use one naming scheme for the whole set. + +### 10.3 Bundler targets + +Bundler targets affect user installation, update compatibility, support workflows, and security posture. + +When changing bundle targets or metadata: + +- verify app identifier, product name, version, icons, resources, external binaries, file associations, deep links, and license/notice files; +- check platform-specific installer behavior; +- preserve code-signing/notarization requirements; +- test installation/launch/uninstall/update paths when possible; +- update release docs and CI artifacts.
+ +### 10.4 Platform signing and stores + +For App Store, Microsoft Store, notarization, Android Play, or other store/distribution targets: + +- treat entitlements, provisioning profiles, certificates, bundle IDs, app IDs, package names, and associated domains as contract facts; +- never weaken entitlements or signing to make a local build pass; +- distinguish local debug signing from release signing; +- keep secrets in the repository's secret manager, not in files or docs. + +--- + +## 11. Mobile targets + +### 11.1 Generated project ownership + +Tauri mobile projects may include generated Android/iOS artifacts. Determine whether the repository treats them as generated or maintained — the agent does not know which posture this repo takes (§0.1). + +If generated: + +- edit the canonical Tauri config or generator source; +- regenerate artifacts; +- preserve drift checks where possible. + +If maintained: + +- document why the project owns the generated files; +- update platform files carefully; +- verify desktop behavior is not accidentally changed by mobile-only edits. + +### 11.2 Android + +For Android changes, inspect: + +- package/application ID; +- Gradle plugin and wrapper versions; +- Kotlin/Java toolchain if present; +- Android manifests and permissions; +- storage/media access; +- deep links and associated domains; +- signing configs; +- generated Rust/mobile bridge; +- emulator/device test path; +- `JAVA_HOME` and Android Studio JBR assumptions. + +Do not assume desktop filesystem paths or permissions apply to Android. Test `convertFileSrc`, media/external storage, and content handling on Android when relevant. + +### 11.3 iOS + +For iOS changes, inspect: + +- bundle ID; +- Xcode project/workspace; +- signing team/profiles; +- entitlements; +- associated domains; +- app transport/security settings; +- WKWebView behavior; +- lifecycle and background modes; +- generated bridge code; +- simulator/device test path. 
+ +Do not weaken entitlements, ATS, or signing settings without a documented release reason and verification. + +--- + +## 12. Frontend integration + +### 12.1 Tauri API imports + +Use public `@tauri-apps/api` and plugin package imports appropriate to the installed 2.10.x package. Do not rely on private package internals. + +When imports fail: + +- check package version and lock file; +- check frontend bundler configuration; +- check ESM/CJS expectations; +- check whether `@tauri-apps/api` 2.10.0 packaging is the issue and update within 2.10.x if allowed (§0.1); +- do not rewrite application code around a packaging bug without preserving the real cause. + +### 12.2 Frontend state and effects + +Frontend code should model Tauri effects explicitly: + +- command wrappers live in one module per domain; +- event subscriptions are registered and cleaned up deterministically; +- permission-denied and platform-unsupported states are user-visible when needed; +- loading/cancellation/error states are represented; +- tests mock the Tauri API rather than requiring native runtime for pure UI behavior. + +Avoid scattering raw `invoke` strings throughout components. Prefer typed wrappers or generated bindings. + +### 12.3 Browser compatibility + +The WebView is not a generic evergreen browser across all targets (§0.1). + +- Verify APIs against WebView2/WKWebView/WebKitGTK target behavior. +- Avoid Node/Electron assumptions. +- Avoid remote script execution unless the security model explicitly accepts it. +- Keep CSP, asset loading, custom protocols, and bundler output compatible with Tauri. + +--- + +## 13. Rust core and plugin integration + +### 13.1 Rust state and commands + +Apply the Rust protocol for ownership, lifetimes, error handling, concurrency, unsafe, FFI, and testing. 
+ +Tauri-specific Rust rules: + +- managed state should have clear ownership and synchronization; +- command handlers must validate untrusted frontend input; +- long-running commands should not block the UI thread; +- global state must be justified and tested for lifecycle/restart behavior; +- `AppHandle`, `Window`, `WebviewWindow`, and `Emitter` usage should preserve target/window ownership; +- no `unwrap`/`expect` in user-triggered command paths unless the invariant is impossible to violate and documented. + +### 13.2 Plugins + +When adding or changing plugins: + +- update Rust crate and JS package together where applicable; +- register plugin in the builder/setup path; +- configure permissions/capabilities; +- test frontend import and command behavior; +- update generated permissions/bindings if relevant; +- document scopes and security considerations. + +Do not add a plugin when a narrower internal command would be safer and simpler. + +### 13.3 Native dependencies and sidecars + +For native dependencies and sidecars: + +- define canonical build/version owner; +- verify target triples and bundler inclusion; +- verify codesigning/notarization behavior; +- validate arguments and environment; +- avoid secret leakage through process args or logs; +- test missing binary, permission denied, crash, and version mismatch cases. + +--- + +## 14. Testing and verification matrix + +### 14.1 Minimum checks by change type + +Use the smallest sufficient set, then widen as risk increases. 
+ +For Rust command logic: + +```text +cargo check --all-targets +cargo test +cargo clippy --all-targets --all-features # if repo requires it +cargo fmt --check # if repo requires it +``` + +For frontend behavior, where `[package-manager]` stands for the repository's package manager: + +```text +[package-manager] run typecheck # if present +[package-manager] run lint # if present +[package-manager] test # if present +[package-manager] run build # if relevant +``` + +For Tauri boundary behavior: + +```text +tauri info # or package-manager script equivalent +[package-manager] tauri build # if feasible and relevant +cargo tauri build # if cargo CLI is the repo standard +``` + +Use repository scripts over generic commands when they exist. + +### 14.2 Security-sensitive checks + +For permissions, scopes, filesystem, shell, network, updater, or secrets: + +- add negative tests where practical; +- verify permission-denied behavior; +- verify path/scope rejection; +- verify no secrets appear in logs or frontend bundles; +- verify updater signature failure behavior; +- verify bundled artifact metadata and signing path if changed. + +### 14.3 Platform checks + +For platform-sensitive changes: + +- Windows: verify WebView2/MSVC assumptions, installer target, signing, path behavior, and tray/menu behavior where relevant; +- macOS: verify WKWebView, bundle metadata, signing/notarization, entitlements, tray/menu/window behavior; +- Linux: verify WebKitGTK dependency, app indicator/tray behavior, packaging target, fonts in container builds where relevant; +- Android/iOS: verify generated project, permissions, signing, storage, deep links, and device/emulator behavior. + +If a target cannot be verified locally, state the gap and preserve the most useful automated check or runbook note. + +### 14.4 E2E and smoke tests + +Prefer E2E/smoke tests for: + +- app launch; +- command invocation through frontend; +- permission-denied behavior; +- deep links; +- file picker/local file preview; +- tray/menu operations; +- multi-window behavior; +- updater check/download/install path; +- bundled artifact launch.
+ +Do not overfit E2E tests to timing. Prefer stable selectors, explicit readiness signals, deterministic fixtures, and clear logs. + +--- + +## 15. Observability and diagnostics + +Tauri observability must help diagnose cross-boundary failures without leaking secrets. + +Log: + +- command names and high-level outcome; +- platform and target information; +- permission-denied cases; +- updater state transitions; +- file path categories, not sensitive full paths unless safe; +- WebView/platform-specific failures; +- background task startup/shutdown. + +Do not log: + +- signing private keys or passwords; +- updater secrets; +- access tokens; +- decrypted data; +- arbitrary file contents; +- full user paths where privacy-sensitive; +- frontend-supplied command strings or URLs before validation. + +When adding diagnostics, make them actionable: include command name, platform, target, and error category without exposing secret values. + +--- + +## 16. Documentation and re-cueing + +Use `.codex/PROTOCOL_AFAD.md` when Tauri changes alter documented public contracts, commands, permissions, plugin APIs, updater/release flows, platform setup, or runbooks. Keep the root `README.md` reader-first per `AGENTS.md`. + +Per universal contract §1.6 (re-cueing), preserve the cues that let the next reader rebuild the relevant slice of theory. Tauri-specific homes for those cues: + +- command contract → Rust type, generated binding, frontend wrapper, and test; +- permission rationale → capability file comment where supported, AFAD doc, or security note; +- state ownership → type/module name, doc comment, test, or architecture note; +- platform behavior → platform test, runbook, or release checklist; +- updater/signing behavior → release checklist and CI assertion; +- build/version rule → lock file, config, CI check, or install README. 
+ +Theory the agent could not write down — store-listing context, signing-key history, threat-model nuance, the reason for a particular CSP exception — should be flagged as a known re-cueing gap so the next reader knows where to ask. Do not preserve critical system theory only in chat transcripts or one-off PR summaries. + +--- + +## 17. Refactoring and deletion + +The universal contract covers Boy Scout + Mikado discipline (§3), architecture as preserved theory (§4), and deletion-requires-proof (§8). Tauri-specific notes follow. + +### 17.1 Refactor safely across the boundary + +For refactors crossing Rust/frontend/config boundaries: + +- rename through canonical owners first; +- update generated bindings; +- update raw `invoke` strings and wrappers; +- update `generate_handler!`; +- update capabilities/permissions; +- update event names and listeners; +- update tests/mocks/docs; +- run boundary verification. + +Do not treat Rust and frontend refactors as independent when IPC names or payloads are involved. Naur's "amorphous additions" warning applies sharply at the Tauri boundary: a Rust-side rename without the IPC theory tends to grow string-aliased shims, default-argument hacks, and capability widening that quietly destroys the original contract. + +### 17.2 Deletion proof (Tauri-specific surfaces) + +Per universal contract §8. Tauri-specific blast-radius surfaces beyond the universal list: + +- static references in Rust and frontend; +- string references in `invoke`, event names, labels, config, tests, docs, CI, and mobile manifests; +- generated bindings and mocks; +- capability/permission references; +- menus, trays, shortcuts, deep links, file associations, updater scripts, and release workflows; +- persisted user data and migration needs; +- external support or documentation workflows. + +Delete in the smallest safe sequence. Remove stale permissions and docs with the behavior they describe. 
A window-label rename or command deletion is a contract migration, not a refactor. + +### 17.3 Compatibility shims + +Compatibility shims may be useful for app updates, persisted state, old command names, old event names, or migration from Tauri 1.x/older 2.x. + +Keep a shim only when it serves a real compatibility contract. Document: + +- who still depends on it; +- how long it remains; +- how it is tested; +- what removes it. + +Do not accumulate dead compatibility paths. + +--- + +## 18. Anti-patterns + +Reject or refactor these patterns when encountered in touched code: + +- broad `allow-all`-style capabilities without a real contract; +- arbitrary shell/file/network command execution from frontend input; +- raw `invoke` strings scattered through UI components; +- command handlers that panic on user input; +- frontend as source of truth for privileged state; +- secrets in bundled JS, config files, logs, screenshots, or docs; +- path validation done only in frontend; +- duplicate command/event names in separate files; +- stale generated bindings; +- package-manager lock churn unrelated to the task; +- config facts duplicated across `tauri.conf.*`, scripts, docs, mobile manifests, and CI; +- treating `tauri dev` as release verification; +- editing generated mobile project files without owning the generator; +- weakening signing, updater, CSP, or capability restrictions to get a build green; +- ignoring platform-specific WebView behavior. + +--- + +## 19. Agent output for Tauri changes + +The universal contract §9 defines the cross-language output template (Truth, Evidence, Consequence, Invariant, Justification, Re-cueing). Use that template. 
For Tauri work, the following are typical Tauri-specific entries the next reader will need: + +```text +Tauri surface touched: +- Core app / frontend invoke / plugin / capability / mobile / bundler / updater / docs: + +Evidence: +- Rust checks: +- Frontend checks: +- Tauri/config/bundle/mobile checks: +- Security or negative checks: + +Consequence: +- Commands/events/windows/capabilities/plugins/platforms affected: +- Bundle identifier / app identifier / package name surfaces touched: + +Gaps: +- Platform or GUI checks not run, with reason: +- Tacit gaps from §0.1 that this change did not close (and who would close them): +``` + +Keep the summary proportional to risk. For tiny edits, a concise sentence with verification is enough. For permission, IPC, updater, signing, or platform changes, be explicit. Per the universal contract, silence on justification gaps and inexpressible theory claims a theory the agent does not have. diff --git a/.codex/DOMAIN_DRIVEN_DESIGN_LENS.md b/.codex/DOMAIN_DRIVEN_DESIGN_LENS.md new file mode 100644 index 00000000..b24e1f30 --- /dev/null +++ b/.codex/DOMAIN_DRIVEN_DESIGN_LENS.md @@ -0,0 +1,418 @@ +# Domain-Driven Design Lens + +**Version:** 1.0.0 +**Updated:** 2026-04-30 +**Companion to:** `UNIVERSAL_ENGINEERING_CONTRACT.md` v2.1+ +**Loaded:** on demand, when a change fires the UEC §1.7 domain-meaning gate + +## 0. Placement + +The Universal Engineering Contract governs safe engineering work — truth, evidence, consequence, invariant, justification, re-cueing. This file governs domain-modeling work — language, model boundaries, strategic and tactical design, consistency boundaries, integration between models. + +Use it when a change touches business meaning, business rules, business state, workflows, commands, domain events, policies, permissions, calculations, lifecycle transitions, user-facing business terms, or integration between domain models. 
+ +Do not use it for purely mechanical work — build-script cleanup, generic plumbing, static UI cosmetics, infrastructure changes with no domain meaning. Domain ceremony on mechanical work is the failure mode this discipline most often produces in agent hands. + +This is a reference manual loaded on demand, not a checklist to complete. Apply only what the change actually needs. + +## 1. Triage — do you need this lens at all? + +Run before reading further. If the answer is "no" or "light", skip the heavy chapters. + +### Apply heavy + +When most of these are true: + +- the domain is business-critical; +- the domain gives the product its competitive advantage; +- domain experts use nuanced language; +- the same word means different things in different parts of the business; +- rules, workflows, policies, permissions, calculations, or state transitions are non-trivial; +- incorrect behavior would harm money movement, compliance, security, identity, authorization, operations, or trust; +- the model must evolve for years; +- multiple systems or teams must integrate without sharing one model; +- the current code shows anemic objects, transaction scripts, unclear service methods, duplicated status values, or DTOs leaking everywhere. + +### Apply light + +- supporting subdomain; +- moderate complexity; +- a few meaningful rules or state transitions; +- mostly need clean naming, good tests, and one or two explicit boundaries. + +### Skip + +- low-risk CRUD; +- simple data-entry features with little behavior; +- static content, scaffolding, build tooling, framework glue; +- generic infrastructure; +- a generic product, library, or service solves the problem adequately; +- the ceremony would make the design less clear than the problem. + +The discipline is an investment. Spend it where the business return is real. + +## 2. 
The DDD-Lite trap + +The most common failure: using Entities, Repositories, Services, and other tactical shapes while skipping Ubiquitous Language, Bounded Contexts, Context Maps, and domain-expert collaboration. This produces the form of the discipline without its value. + +If the team cannot answer "which Bounded Context is this in?", "what is the local Ubiquitous Language?", and "which Aggregate owns this invariant?", the work is DDD-Lite. Stop and answer those before more tactical work. + +## 3. Mindset + +- The model lives in the code and tests, not in glossaries or diagrams. If the code does not use the language, the language is not implemented. +- The model is not the database. Data matters; the database does not get to dictate the language of the domain. +- Domain experts and developers are one team. Do not translate business language into a private technical vocabulary and treat the translation as superior. +- Examples beat abstractions. When a term or rule is unclear, ask for concrete cases. +- Modeling is iterative. Expect the first model to be wrong. Improve through conversation, examples, tests, and refactoring. + +## 4. Strategic design + +Strategic design decides where models belong and how they relate. Without strategic design, tactical patterns are easily misused. + +### 4.1 Domain and subdomains + +A Domain is the sphere of business activity being modeled. Divide a large domain into subdomains. Three types: + +- **Core Domain** — strategically central, differentiating, worth deep modeling. +- **Supporting Subdomain** — necessary, somewhat specialized, but not the main competitive advantage. +- **Generic Subdomain** — common across many businesses; usually better bought, reused, or implemented simply. + +Invest the most modeling effort in the Core Domain. Keep Supporting clean but proportionate. Avoid custom-building Generic Subdomains unless they are strategically important here. Do not let generic concerns pollute the Core Domain. 
+ +### 4.2 Bounded Context + +A Bounded Context is the boundary inside which a model and its Ubiquitous Language are valid. + +It is *not automatically*: a microservice, a database, a repository, a package, a deployment unit, a team, a namespace, a module, or an API. It may align with any of those. Its primary purpose is **semantic boundary**. + +Inside a Bounded Context: "When we say this word here, we mean exactly this." + +Right size: large enough to contain a coherent model, small enough that the language stays crisp. + +- **Too large:** vague terms; teams argue over overloaded words; generic and supporting concerns swamp the Core; everything depends on everything. +- **Too small:** related concepts are split prematurely; integration overhead dominates; a coherent language fragments; every business action becomes a distributed workflow. + +Right-sized contexts are discovered by language, cohesion, dependencies, and business capability — not by arbitrary technical slicing. + +Subdomains describe the *problem space*. Bounded Contexts describe the *solution-space* model boundaries. They often align but not always; do not confuse business decomposition with deployment topology. + +### 4.3 Ubiquitous Language + +Team language used by domain experts and developers in a Bounded Context. It includes nouns, verbs, adjectives, lifecycle names, policies, commands, events, examples, invariants, scenarios, and explicitly rejected terms. + +It must appear in code names, tests, command names, event names, module names, API resources where the API expresses domain meaning, schemas, and conversation. A glossary is useful but insufficient — if the code does not use the language, the language is not current. + +Capture by: scenario workshops, expert interviews, example-based requirements, Given/When/Then tests, command and event catalogs, glossary entries with examples, refactoring names in code, rejecting misleading technical names. 
+ +Avoid: a large dictionary no one uses; database table names treated as the business language; generated bean-style objects treated as the model; one enterprise-wide vocabulary that erases local meaning; developers privately renaming business concepts. + +### 4.4 Context Map + +A Context Map shows relationships between Bounded Contexts. It is both a sketch for discussion and the concrete code, contracts, and translators that implement the relationships. Integration is social, organizational, and technical. Systems fail because teams assume relationships that do not exist. + +For every edge ask: + +- Which context is upstream? Which is downstream? +- Who controls the model or API? +- Who can request changes? +- What crosses the boundary — commands, events, resources, files, schemas, DTOs? +- Is translation needed? +- What happens when the upstream model changes? +- What is the release coordination model? +- What pattern describes the relationship? + +Patterns: + +| Pattern | Use when | Risk | +|---|---|---| +| **Partnership** | Two teams must succeed or fail together | Requires strong communication; schedules become linked | +| **Shared Kernel** | Two teams genuinely share a small, governed model subset | Tight coupling; can become a disguised shared enterprise model | +| **Customer–Supplier** | Downstream depends on upstream; downstream needs influence upstream planning | If upstream does not actually commit, becomes Conformist in practice | +| **Conformist** | Downstream adopts upstream's model because it lacks leverage | Avoid for Core Domain; corrupts the local model | +| **Anticorruption Layer (ACL)** | Local model matters; upstream is foreign, muddy, unstable, or too technical | Translation cost | +| **Open Host Service** | Upstream provides a stable protocol for many consumers | Public host interface is itself a contract; do not dump the internal model | +| **Published Language** | Documented shared exchange language (media type, schema, event format) | The 
exchange language is not the internal model of either side | +| **Separate Ways** | Integration value is low; duplication or independence is cheaper | None — but verify the assumption | +| **Big Ball of Mud** | An existing tangle | Draw a boundary around it; do not pretend DDD lives inside; prevent spread | + +## 5. Tactical design + +Tactical patterns express the model inside a Bounded Context. Apply *after* strategic design, not instead of it. + +Before asking "Is this an Entity or Value Object?", ask "Which Bounded Context are we in, and what does the local language say?" + +### 5.1 Aggregate — the most important pattern + +An Aggregate is a *transactional consistency boundary* — a cluster of Entities and Value Objects treated as a unit for enforcing invariants. Each Aggregate has a root Entity. External objects reference the Aggregate through the root. + +The key question is not "what objects belong together?" but: + +> Which objects must be transactionally consistent together to protect a true business invariant? + +Rules: + +- **Model true invariants inside the boundary.** What must be true immediately after the transaction? What can be corrected eventually? If a rule can become true eventually, it probably does not need one Aggregate boundary. +- **Design small Aggregates.** Large Aggregates load too much, conflict more, scale poorly, and couple unrelated rules. Design as small as the invariants allow. +- **Reference other Aggregates by identity**, not by object reference. Reduces coupling; supports distribution; clarifies transactional boundaries. +- **Use eventual consistency outside the boundary.** When a business process spans multiple Aggregates, coordinate via Domain Events, process managers, sagas, or Application Services. +- **No public setters for important state.** Use intention-revealing command methods. +- **No repository or messaging injection into Aggregates.** That usually means the Aggregate is reaching outside its boundary. 
+- **Optimistic concurrency** via version field or stream revision. + +Common reasons to violate these rules — UI convenience, query performance, fear of eventual consistency, habit from relational modeling — explain the pressure but rarely justify large Aggregates. Use read models, projections, repositories, and process managers instead. + +### 5.2 Entity + +An Entity has identity that matters across time and state changes. + +Use when: object has a lifecycle; can change attributes while remaining the same thing; the business distinguishes one instance from another; identity matters more than the current values. + +Avoid making everything an Entity. Many concepts are better as Value Objects. + +Identity questions: who assigns it; when it is needed; can it change; is it meaningful or only technical; can equality rely on identity alone; what happens if upstream identity changes. + +Behaviors should be intention-revealing: + +```text +user.activate() +user.changeEmailAddress(...) +subscription.cancelBecause(...) +backlogItem.commitTo(sprint) +``` + +over: + +```text +user.setActive(true) +user.setEmail(...) +subscription.setStatus("cancelled") +backlogItem.setSprintId(...) +``` + +Validation belongs where the rule belongs: Entity constructor or factory for local invariants; Aggregate root for aggregate-wide invariants; Domain Service for an operation not owned by one object; Application Service for use-case coordination and authorization. Do not rely only on database constraints for domain validity. + +### 5.3 Value Object + +Describes, measures, or quantifies something. No conceptual identity. Equal by value. Replaceable. Immutable or treated as immutable. + +Use when: the concept is descriptive; its attributes form a conceptual whole; equality by value makes sense; immutability is practical; replacement is safer than mutation. + +Examples: Money, EmailAddress, DateRange, Quantity, Address, FullName, Percentage, Coordinates, TenantId-as-typed-value. 
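
A minimal sketch of the Value Object traits described above (equal by value, immutable, replaced rather than mutated), written in Python. The `Money` class, its fields, its currency rule, and the `add` method are illustrative assumptions, not part of any codebase this lens describes.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Hypothetical Value Object: no identity, equal by value, immutable."""

    amount: Decimal
    currency: str

    def __post_init__(self) -> None:
        # Local invariant enforced at construction (assumed rule):
        # a currency code is exactly three uppercase letters.
        code = self.currency
        if len(code) != 3 or not code.isalpha() or not code.isupper():
            raise ValueError(f"invalid currency code: {code!r}")

    def add(self, other: "Money") -> "Money":
        # Replacement over mutation: return a new value, leave this one unchanged.
        if other.currency != self.currency:
            raise ValueError("cannot add amounts in different currencies")
        return Money(self.amount + other.amount, self.currency)
```

Two `Money` instances with the same amount and currency compare equal, and assigning to a field raises, which is the behavior the prose asks of a Value Object.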
+ +Replace primitive obsession when the value has rules: + +```text +EmailAddress over String email +Money over BigDecimal amount +TenantId over String tenantId +SprintDuration over int days +``` + +Persistence should not force a Value Object to become an Entity. Embed, serialize, or map; choose the implementation, not the conceptual decision. + +### 5.4 Domain Service + +A stateless domain operation that does not naturally belong to an Entity or Value Object — usually because it coordinates multiple objects or expresses a domain concept that does not fit on one object. + +Not: an Application Service, REST controller, persistence gateway, transaction script, helper class, technical utility, or general dumping ground for business rules. + +Name in the Ubiquitous Language. Keep the layer small — if most behavior lives in services, the model is anemic. + +### 5.5 Domain Event + +Records something meaningful that happened in the domain. Past tense. + +Examples: `BacklogItemCommitted`, `SprintScheduled`, `UserRegistered`, `PaymentReceived`, `PolicyActivated`, `OrderShipped`. + +Use to: make important happenings explicit; decouple producers from consumers; support eventual consistency; notify other contexts; drive long-running processes; create audit history; support event sourcing; update read models. + +Avoid: events for trivial technical steps with no domain meaning ("event noise"). + +Publishing pattern: the Aggregate records events during command execution; the Application Service publishes after commit. Do not let domain state change succeed while the event silently disappears — use a transactional outbox, event store, notification log, or transaction-synchronized dispatch. Do not publish remote messages directly from inside Aggregates. + +For events crossing context boundaries: treat them as integration contracts. Schema versioning, idempotent consumers, ordering, replay, late arrival, missing events, correlation and causation IDs, privacy. 
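
The publishing pattern above (the Aggregate records events, the Application Service publishes after commit) might be sketched as follows. `Sprint`, `InMemorySprintRepository`, `schedule_sprint`, and the `publish` callable are hypothetical names chosen for illustration; `save()` stands in for a real commit.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Optional


@dataclass(frozen=True)
class SprintScheduled:
    """Hypothetical Domain Event, named in the past tense."""

    sprint_id: str
    starts_on: date


@dataclass
class Sprint:
    """Hypothetical Aggregate root that records events instead of publishing them."""

    sprint_id: str
    starts_on: Optional[date] = None
    pending_events: list = field(default_factory=list)

    def schedule(self, starts_on: date) -> None:
        # Intention-revealing command; the rule is checked inside the boundary.
        if self.starts_on is not None:
            raise ValueError("sprint is already scheduled")
        self.starts_on = starts_on
        self.pending_events.append(SprintScheduled(self.sprint_id, starts_on))


class InMemorySprintRepository:
    """Stand-in for real persistence; save() plays the role of the commit."""

    def __init__(self) -> None:
        self.saved = {}

    def save(self, sprint: Sprint) -> None:
        self.saved[sprint.sprint_id] = sprint


def schedule_sprint(
    repo: InMemorySprintRepository,
    publish: Callable[[object], None],
    sprint: Sprint,
    starts_on: date,
) -> None:
    # Application Service: run the command, commit, then publish what was recorded.
    sprint.schedule(starts_on)
    repo.save(sprint)
    for event in sprint.pending_events:
        publish(event)
    sprint.pending_events.clear()
```

This in-memory hand-off does not survive a crash between `save` and `publish`; that window is what a transactional outbox or event store is meant to close.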
+ +### 5.6 Factory + +Creates complex domain objects or Aggregates while protecting invariants and hiding construction complexity. May be a static or named method on the Aggregate root, a standalone Factory object, a Domain Service acting as a Factory, or part of an Anticorruption Layer. + +Name in the Ubiquitous Language: `registerTenant(...)`, `scheduleSprint(...)`, `createDiscussionFor(...)` over `newEntity(...)`, `buildObject(...)`, `mapDto(...)`. + +### 5.7 Repository + +Collection-like access to Aggregates. Abstracts persistence while preserving the illusion of working with domain objects. + +One Repository per Aggregate root, not per table or per object. + +A Repository is not a DAO. It does not let callers assemble inconsistent object graphs. For complex read views, prefer read models or query services rather than loading large Aggregates for display. + +Application Services usually manage transactions; Repositories participate but should not hide broad transaction scripts. + +### 5.8 Module + +Groups cohesive concepts. Names should use the Ubiquitous Language: `identity.access`, `agilepm.backlog`, `billing.invoice`. Avoid vague containers (`utils`, `helpers`, `managers`, `models`, `services`, `common`) unless they truly express the local model. + +If a module cannot be named coherently because language is mixed inside, that may reveal a missing Bounded Context boundary. + +## 6. Integration between Bounded Contexts + +Cross-context integration is not local object collaboration. Latency, partial failure, versioning, autonomy, deployment mismatch, and organizational boundaries all apply. Do not treat remote calls as normal method calls. + +Do not share domain objects across contexts casually. Sharing internal classes creates Shared Kernel or Conformist coupling accidentally. Consumers begin to use foreign concepts as if they were local. 
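
One way to keep foreign concepts from leaking in is to translate at the boundary. A minimal sketch, assuming a hypothetical upstream payload shape (`cust_no`, `mail`) and hypothetical local types `CustomerId`, `EmailAddress`, and `CustomerContact`:

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class CustomerId:
    value: str


@dataclass(frozen=True)
class EmailAddress:
    value: str

    def __post_init__(self) -> None:
        # Deliberately loose check for illustration only; not a full validator.
        if "@" not in self.value:
            raise ValueError(f"not an email address: {self.value!r}")


@dataclass(frozen=True)
class CustomerContact:
    """Local model concept, expressed in the local language."""

    customer_id: CustomerId
    email: EmailAddress


def translate_customer(payload: Mapping[str, Any]) -> CustomerContact:
    # Translator: the only place that knows the (assumed) upstream field names.
    return CustomerContact(
        customer_id=CustomerId(str(payload["cust_no"])),
        email=EmailAddress(str(payload["mail"]).strip()),
    )
```

Domain code then depends on `CustomerContact` alone, so an upstream rename of `cust_no` or `mail` changes one function rather than the model.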
+ +Prefer: Published Language, Open Host Service, DTOs as integration contracts, Anticorruption Layer, local Value Objects created from foreign data, explicit translators. + +For REST integration: + +- design resources around consumer use cases, not internal Aggregate shape; +- document media types or schemas; +- treat the API as Open Host Service and the schema as Published Language where appropriate; +- translate responses into local model concepts at the boundary; +- a CRUD-per-entity REST surface is usually a Conformist trap for downstream consumers. + +For messaging integration: design for durable publication, idempotent consumers, duplicate messages, out-of-order arrival, missing messages, replay, broker outage and catch-up, poison messages, schema versioning. + +For local copies of remote data: state who owns the original, why a local copy is needed, how stale it may be, how it is updated, what happens when updates are missed, whether it is a snapshot Value Object or a synchronized local Entity. + +For long-running processes (sagas / process managers): hold process state, react to events, send commands, handle timeouts, record progress, tolerate retries. Do not pretend distributed work is one local transaction; do not hide a long-running process inside a single oversized Aggregate. + +## 7. Architecture posture + +Architecture must protect the domain model rather than replace it. Layered, hexagonal, REST, messaging, CQRS, event sourcing, data grids — these are support structures, not the model. + +The recurring principle: **protect the domain model from accidental technical concerns.** + +Use a style because it reduces real risk or satisfies real quality demands — maintainability, scalability, reliability, latency, autonomy, testability, deployment independence, integration needs, auditability, regulatory traceability, consistency requirements. Do not adopt for fashion. 
+ +The domain layer should remain expressible in the local Ubiquitous Language, insulated from persistence, transport, UI, and foreign models where practical. Application Services coordinate use cases — they should not contain core business rules that belong in the model. Infrastructure (persistence, messaging, REST clients, scheduling, transactions, framework glue) lives outside the domain through dependency inversion, ports, adapters, and repositories. + +## 8. Smells and corrections + +| Smell | Correction | +|---|---| +| **DDD-Lite** — Entities, Services, Repositories without Bounded Context, Ubiquitous Language, or Context Map | Return to strategic design before more tactical work | +| **Anemic Domain Model** — domain objects are data holders; logic lives in services or controllers | Move behavior to Entities, Value Objects, Aggregates, Domain Services | +| **Entity obsession** — every concept has identity and mutable lifecycle | Favor Value Objects for descriptive concepts | +| **Aggregate bloat** — one Aggregate carries a large object graph for UI or query convenience | Split by true invariants; use identity references, read models, eventual consistency | +| **Repository as DAO** — repositories expose tables, generic queries, partial assembly | Design around Aggregate roots and collection semantics | +| **Shared model leakage** — one context imports another's domain classes | Use Published Language, DTOs, Open Host Service, Anticorruption Layer | +| **False enterprise vocabulary** — one term forced to mean one thing across the org | Preserve local meanings inside contexts; translate at boundaries | +| **Technical module names** — `utils`, `helpers`, `managers`, `services`, `models` | Name modules by cohesive domain concepts | +| **Event noise** — events describe technical steps, not domain occurrences | Rename or remove events business experts would not care about | +| **Infrastructure in domain** — domain depends on HTTP, ORM, brokers, framework annotations 
| Use ports, adapters, repositories, dependency inversion | +| **Context Map denial** — teams assume integration works but never name the relationship | Draw the Context Map; document upstream/downstream responsibilities | + +## 9. Refactoring drift toward the model + +For existing systems, look for language drift first: + +- generic method names (`save`, `update`, `process`, `handle`); +- status strings duplicated across layers; +- DTO names used as domain names; +- `Manager`, `Helper`, `Util`, `Service` classes with unclear domain meaning; +- entities with only getters and setters; +- business rules in controllers or application services; +- external API models used directly inside domain code; +- one model trying to serve multiple teams with different vocabularies. + +Move toward intention. Replace generic operations with domain actions: + +```text +saveCustomer(...) +``` + +may split into: + +```text +registerCustomer(...) +changeCustomerAddress(...) +changeCustomerPrimaryEmail(...) +deactivateCustomer(...) +``` + +— but only when the business actions are actually distinct. + +Recover Aggregates from invariants, not from object graphs. For each command: which Aggregate receives it, what invariant must hold immediately, which state is required to decide, which events may be emitted, what can happen eventually. + +Move behavior inward. If business rules live in controllers or application services, move them toward the Entity, Value Object, Aggregate, or Domain Service that owns the rule. + +Isolate foreign models. When external DTOs leak into domain code, introduce an Anticorruption Layer: + +```text +Foreign API DTO → Adapter → Translator → Local Value Object / Entity / command +``` + +Avoid big-bang rewrites. Use the UEC's Mikado-style sequencing: small step, validate, repeat. Stop when the next step requires a new domain conversation, a new boundary decision, or broader blast-radius proof. + +## 10. Cards (use as checklists, not forms) + +Proportional to risk. 
Do not fill in every line for every change. + +### 10.1 Triage card + +```text +Domain triage: +- Change: +- Business capability: +- Domain experts: +- Core / Supporting / Generic / non-domain: +- Why this investment level: +- Consequence of wrong model: +- Lens depth: full / light / not needed +``` + +### 10.2 Bounded Context card + +```text +Bounded Context: +- Name: +- Purpose: +- Model owner: +- Ubiquitous Language summary: +- Core concepts: +- Commands: +- Events: +- Aggregates: +- Repositories: +- External dependencies: +- Published Language / Open Host Service: +- Anticorruption Layers: +- Known model gaps: +``` + +### 10.3 Aggregate design card + +```text +Aggregate design: +- Bounded Context: +- Aggregate: +- Root: +- True invariant being protected: +- State needed to enforce the invariant: +- Other Aggregates referenced by identity: +- Domain Events emitted: +- Eventual consistency outside the boundary: +- Concurrency rule: +- Tests: +``` + +### 10.4 Context Map edge card + +```text +Context Map edge: +- Upstream: +- Downstream: +- Pattern: +- Business reason for integration: +- Exchange type: +- Published Language: +- Translation location: +- Release coordination: +- Failure modes: +- Versioning: +- Tests/contracts: +``` + +## 11. Output for domain-touching work + +Use the UEC §9 output template plus the proportional domain-design block (also defined in UEC §9). The "Known model gaps" line is a first-class output, not an optional caveat — silence on it claims a model the agent does not have. diff --git a/.codex/PROTOCOL_AFAD.md b/.codex/PROTOCOL_AFAD.md index 13cdef7e..f38e60b9 100644 --- a/.codex/PROTOCOL_AFAD.md +++ b/.codex/PROTOCOL_AFAD.md @@ -1,7 +1,8 @@ # PROTOCOL_AFAD.md — Agent-First Documentation Protocol -Protocol: `AGENT_FIRST_DOCUMENTATION` -Version: `4.0` +**Version:** 4.0 +**Updated:** 2026-04-24 +**Applies to:** all languages, runtimes, frameworks, tools, and repositories. 
This protocol governs documentation that agents must maintain, retrieve, validate, or keep synchronized with code and system behavior. It is optimized for documentation that can be used by humans, retrieval systems, and future coding agents without requiring hidden context. diff --git a/.codex/UNIVERSAL_ENGINEERING_CONTRACT.md b/.codex/UNIVERSAL_ENGINEERING_CONTRACT.md index 98b30c9f..50523e1d 100644 --- a/.codex/UNIVERSAL_ENGINEERING_CONTRACT.md +++ b/.codex/UNIVERSAL_ENGINEERING_CONTRACT.md @@ -1,45 +1,51 @@ # Universal Engineering Contract -This contract applies to all languages, runtimes, frameworks, tools, and repositories. +**Version:** 2.1.0 +**Updated:** 2026-04-30 +**Applies to:** all languages, runtimes, frameworks, tools, and repositories. +**Companion:** `DOMAIN_DRIVEN_DESIGN_LENS.md` for changes that touch business meaning, business rules, workflows, state transitions, or cross-context integration. Loaded on demand, not by default. -## 1. Systems over goals +## 0. What this contract is, and is not -The requested task is the entry point. The standard is to leave the touched system more coherent, more observable, and easier to change than it was before. +This contract is a method. Per Naur (*Programming as Theory Building*, 1985), no method substitutes for the *theory* held by the people who build and maintain a system — the tacit grasp of how the code maps to the world, why each part is what it is, and how new demands can be absorbed without destroying structure. Theory cannot be fully written down, and revival of a program from its artifacts alone is, in Naur's words, "strictly impossible." -Do not treat generated code, a passing build, or a closed issue as the whole outcome. The outcome is a validated improvement to the system's working theory: what is true, what changes it, what proves it works, what depends on it, and what must not break. +The purpose of this contract is therefore not to capture theory, but to keep its absence visible. 
It is a discipline for working agents — human or otherwise — who are *transient theory-holders*: they enter cold, build a partial theory of the slice they touch, and leave. That transience makes two things obligatory: -Avoid orphan code: code that appears locally correct but has no clear owner, feedback loop, invariant, or understandable place in the system. +1. **Surface the tacit gap.** Where the change depends on theory the agent does not fully have, say so. Do not paper over it with confident output. +2. **Re-cue the next reader.** Leave artifacts that help the next person reconstruct the relevant slice — knowing those artifacts are aids to a theory that lives elsewhere, not the theory itself. -## 2. Build the minimum system map before touching code +A passing build, a closed issue, or a generated patch is not the outcome. The outcome is a validated improvement to the system's working theory: what is true, what changes it, what proves it works, what depends on it, why it is the way it is, and what must not break. -Before making a non-trivial change, identify the relevant system theory. Keep this lightweight, but make it concrete enough that another engineer or agent could continue safely. +Avoid orphan code: code that appears locally correct but has no clear owner, feedback loop, justification, invariant, or understandable place in the system. -### 2.1 Truth +When the work touches business meaning, treat the Universal Engineering Contract as the operating discipline and the Domain-Driven Design Lens as the model-design discipline. Do not force domain ceremony onto mechanical infrastructure work. -Ask: +## 1. Build the minimum system map before touching code + +Before making a non-trivial change, identify the relevant system theory — concretely enough that another engineer or agent could continue safely, lightly enough that the map does not become its own artifact. + +The map has six concerns. Treat them as questions, not as a form to fill in. 
A seventh concern (§1.7) applies only when the change touches business meaning. + +### 1.1 Truth — where does state live? - Where does the relevant state live? - What is the canonical source of truth? - Who is allowed to mutate it? -- What state is cached, derived, duplicated, denormalized, persisted, remote, or eventually consistent? +- What is cached, derived, duplicated, denormalized, persisted, remote, or eventually consistent? - Where can this value become stale, invalid, or contradictory? Change the source of truth, not a symptom, unless the task is explicitly about presentation or derived behavior. -### 2.2 Evidence - -Ask: +### 1.2 Evidence — what tells you it works? - What tells us the system is working? - What would tell us it is failing? - Which tests, assertions, type checks, contracts, logs, metrics, traces, dashboards, alerts, or reproducible checks cover this behavior? -- If feedback is missing, what is the smallest useful feedback loop to add? +- If feedback is missing, what is the smallest useful loop to add? A change without evidence is incomplete unless there is a clear, stated reason evidence cannot be added. -### 2.3 Consequence - -Ask: +### 1.3 Consequence — what breaks if you delete it? - What breaks if this file, function, module, class, endpoint, table, message, job, flag, or configuration disappears? - Who calls it directly? @@ -48,40 +54,65 @@ Ask: Do not rely only on intuition. Prove blast radius with the available tools: search, static analysis, dependency graphs, tests, traces, logs, schemas, build output, or runtime inspection. -### 2.4 Invariant - -Ask: +### 1.4 Invariant — what must remain true? -- What must remain true after this change? - What domain rule, security property, compatibility contract, performance bound, idempotency rule, ordering guarantee, data-shape guarantee, or user-visible behavior must not be violated? State the invariant before changing behavior. Add or update executable checks for it where practical. 
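The instruction to state the invariant and back it with an executable check can be illustrated with a small sketch. Everything here is hypothetical: `normalize_tag` stands in for whatever function a change touches, and idempotency is just one example of an invariant worth pinning down before the behavior is edited.

```python
from collections.abc import Iterable


def normalize_tag(raw: str) -> str:
    """Hypothetical function under change: canonicalize a free-form tag."""
    # Trim, lowercase, and join whitespace-separated words with hyphens.
    return "-".join(raw.strip().lower().split())


def check_idempotent(samples: Iterable[str]) -> None:
    """Stated invariant: normalizing an already-normalized tag changes nothing."""
    for sample in samples:
        once = normalize_tag(sample)
        twice = normalize_tag(once)
        assert twice == once, f"not idempotent for {sample!r}: {once!r} -> {twice!r}"


if __name__ == "__main__":
    # Run the check before and after editing normalize_tag.
    check_idempotent(["  Hello World ", "a\tb", "", "Already-Normal"])
    print("idempotency invariant holds")
```

Writing the check first gives the later edit something concrete to break, which is the point of stating the invariant up front.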
-### 2.5 Preservation +### 1.5 Justification — why is each touched part the way it is? -Ask: +This is Naur's criterion: a programmer who possesses the theory of a program can explain *why* each part is what it is, not merely what it does, and can ground that explanation in the affairs of the world the program maps to. -- Where should the discovered theory live after this work? +For each non-trivial part you touch, ask: -Preserve important knowledge in the most durable appropriate place: tests, names, types, schemas, comments, documentation, runbooks, architecture decision records, generated artifacts, or agent directive files. Do not leave essential system knowledge trapped in a chat transcript or temporary reasoning. +- Why does this exist? +- What real-world fact, constraint, history, or domain rule is it the response to? +- What alternatives were available, and what would have made one of them right instead? -## 3. Red → Green → Refactor +If the answer is not available — from code, history, conversation, or reasoning — say so explicitly. A change made without justification is a change whose blast radius cannot be estimated, because you do not know what the code was protecting. -For new behavior, start with the smallest failing proof of behavior: a test, assertion, contract check, type-level check, reproducible script, golden case, or manual verification path. +### 1.6 Re-cueing — what must the next reader be able to rebuild? + +The relevant theory cannot be fully written down. What can be written down is the set of cues that help the next reader — human or agent — rebuild the slice of theory needed to act safely. + +- What did this change depend on that is not obvious from the diff? +- Where should those cues live so they survive: tests, names, types, schemas, comments where the *why* is non-obvious, runbooks, ADRs, architecture notes, agent directive files? +- What part of the relevant theory is not expressible in artifacts, and who currently holds it? 
+ +Do not leave essential cues trapped in a chat transcript or temporary reasoning. Equally, do not pretend an artifact transfers a theory it can only re-cue. + +### 1.7 Domain meaning gate (conditional) + +Apply this subsection only when the change touches business meaning. This includes domain state, business rules, workflow names, commands, domain events, permissions, policies, calculations, lifecycle transitions, user-facing business terms, or integration contracts between models. For purely mechanical work — build wiring, generic plumbing, infrastructure with no business meaning — skip §1.7 and load nothing further. + +When it applies, ask: + +- Which Bounded Context gives the touched terms their meaning? +- What local Ubiquitous Language is being added, corrected, renamed, or protected? +- Are any terms overloaded elsewhere in the organization or codebase? +- Is the work Core Domain, Supporting Subdomain, Generic Subdomain, or non-domain infrastructure? +- Which Aggregate, Entity, Value Object, Domain Service, Factory, Repository, command, or Domain Event owns the behavior? +- What true invariant is being protected, and where is the transactional consistency boundary? +- Is a foreign model entering this context? If yes, where is the translation boundary, Published Language, Open Host Service, or Anticorruption Layer? + +Do not guess these answers into existence. If the context or language is not known, state the uncertainty and leave a durable cue for the next theory-holder. A vague class name, duplicated status enum, or copied integration DTO can become a false model. -Then: +The full discipline lives in the companion lens. Load it when this gate fires. + +## 2. Red → Green → Refactor + +For new behavior, start with the smallest failing proof of behavior: a test, assertion, contract check, type-level check, reproducible script, golden case, or manual verification path. 1. **Red:** demonstrate the missing or broken behavior. 2. 
**Green:** make the smallest coherent change that satisfies the proof. 3. **Refactor:** immediately simplify names, boundaries, structure, duplication, and control flow while keeping feedback green. -Passing is not finished. Understandable, coherent, and changeable is finished. - -## 4. Boy Scout + Mikado +Passing is not finished. Understandable, coherent, justified, and changeable is finished. -When touching existing code, leave the local area better than you found it. +## 3. Boy Scout + Mikado -Prefer small, safe, validated improvements: +When touching existing code, leave the local area better than you found it. Prefer small, safe, validated improvements: - Rename unclear concepts. - Extract coherent units. @@ -94,19 +125,21 @@ Prefer small, safe, validated improvements: Use Mikado-style sequencing for broader change: identify the desired improvement, discover prerequisites, make the smallest safe prerequisite change, validate it, and continue only while each step remains understandable and reversible. -If a local refactor naturally unlocks a broader system-wide improvement, continue only while the scope remains controlled and evidence remains strong. Stop when the next improvement is a separate slice. +If a local refactor naturally unlocks a broader improvement, continue only while scope remains controlled and evidence remains strong. Stop when the next improvement is a separate slice. Naur's warning applies: improvements made without the theory tend to become "amorphous additions" that destroy structure even when each individual change looks correct. -## 5. Architecture as preserved theory +## 4. Architecture as preserved theory Do not preserve architecture merely because it exists. Do not replace architecture merely because a new design seems cleaner in isolation. Treat architecture as accumulated system theory. Preserve the parts that encode real constraints, useful boundaries, domain language, operational lessons, or compatibility contracts. 
Improve the parts that are accidental, duplicated, misleading, obsolete, or unnecessarily complex. -Architecture should emerge through repeated validated improvements, not speculative rewrites. When changing structure, make the new structure easier to explain, test, and modify than the old one. +Architecture should emerge through repeated validated improvements, not speculative rewrites. When changing structure, make the new structure easier to explain, justify, test, and modify than the old one. -## 6. Canonical ownership of contract facts +For domain-heavy work, architecture must protect the domain model rather than replace it. Layered architecture, hexagonal architecture, REST, messaging, CQRS, event sourcing, data grids, and other styles are not the model. They are support structures. Use them only when they reduce real risk or satisfy real quality demands. The domain model should remain expressible in the local Ubiquitous Language, insulated from persistence, transport, UI, and foreign models where practical. -Shared contract facts must have exactly one canonical owner. +## 5. Canonical ownership of contract facts + +Shared contract facts must have exactly one canonical owner. Which facts qualify is itself a domain judgment, not a mechanical rule. Contract facts include externally meaningful: @@ -120,13 +153,13 @@ Contract facts include externally meaningful: - configuration keys and feature flags; - protocol, API, CLI, UI, database, and integration contracts. -Do not hard-code contract facts in parallel across code, interfaces, tools, tests, documentation, generated files, summaries, or error surfaces. - -Any surface that exposes a contract fact must derive it from the canonical source or from generated artifacts rooted in that source. Build-time or test-time validation should fail on drift, missing registration, contradictory definitions, or references to contract facts outside the canonical owner. 
+Do not hard-code contract facts in parallel across code, interfaces, tools, tests, documentation, generated files, summaries, or error surfaces. Any surface that exposes a contract fact must derive it from the canonical source or from generated artifacts rooted in that source. Build- or test-time validation should fail on drift, missing registration, contradictory definitions, or references to contract facts outside the canonical owner. When no canonical owner exists, create the smallest appropriate one before spreading the fact further. -## 7. State ownership and mutation discipline +For domain work, canonical ownership is context-sensitive. A term, status, event, or identifier may have one meaning in one Bounded Context and a different meaning in another. Do not create a false global owner for language that is only locally true. Across Bounded Contexts, the canonical owner may be a Published Language, stable API resource, event schema, command contract, Open Host Service, or translation map, not a shared domain class. + +## 6. State ownership and mutation discipline Every meaningful piece of state needs an owner and a mutation policy. @@ -142,7 +175,9 @@ Before changing stateful behavior, identify: Do not introduce a second source of truth. Do not patch derived state when the canonical state or mutation path is wrong. Do not add hidden state that future maintainers cannot locate or reason about. -## 8. Feedback must match risk +For domain work, identify the Aggregate or process that owns each invariant. Aggregates are consistency boundaries, not object graphs. Keep them as small as the true invariant permits. Prefer referencing other Aggregates by identity and using eventual consistency outside the boundary unless a real business rule demands atomic consistency. + +## 7. Feedback must match risk Use the cheapest feedback that proves the important behavior, but do not confuse cheap feedback with sufficient feedback. 
@@ -150,7 +185,9 @@ A pure function may need a unit test. A protocol may need a contract test. A mig When fixing a bug, reproduce it first if practical. When preventing recurrence, add the feedback that would have caught it. -## 9. Deletion and simplification require proof +For domain work, evidence should use business examples, not merely technical assertions. Prefer tests and executable examples that state the local language: given a domain situation, when a command or event occurs, then the invariant or state transition holds. If domain expert validation is required and unavailable, record that gap explicitly. + +## 8. Deletion and simplification require proof Deleting code is good when the dependency theory is sound. @@ -168,7 +205,7 @@ Before deleting or simplifying, check for: If safe deletion cannot be proven fully, reduce uncertainty with tooling and make the smallest reversible change. -## 10. Agent output contract +## 9. Agent output contract For non-trivial changes, produce more than a patch. 
Include a compact summary covering: @@ -192,23 +229,48 @@ Invariant: - Must remain true: - How it is protected: -Preservation: -- Where the relevant theory was recorded: +Justification: +- Why each touched part is the way it is: +- Known gaps in justification (theory the agent does not have): + +Re-cueing: +- Cues left for the next reader, and where: +- Theory that could not be expressed in artifacts, and who currently holds it: +``` + +When the change touches domain meaning, also include a proportional domain-design block: + +```text +Domain design lens: +- Domain / subdomain: +- Core / Supporting / Generic / non-domain judgment: +- Bounded Context: +- Ubiquitous Language terms changed or clarified: +- Term collisions or foreign concepts: +- Aggregate or consistency boundary affected: +- Entity / Value Object / Domain Service / Factory / Repository choices: +- Commands, workflows, or Domain Events affected: +- Cross-context relationship, if any: +- Published Language, Open Host Service, or Anticorruption Layer, if any: +- Domain examples, tests, or expert validation used: +- Known model gaps: ``` -Keep the summary proportional to the change. Small changes need small summaries. Risky changes need explicit reasoning. +Keep the summary proportional to the change. Small changes need small summaries. Risky changes need explicit reasoning. The "Known gaps" and "Theory that could not be expressed" lines are first-class outputs, not optional caveats — silence on them claims a theory the agent does not have. -## 11. Stop conditions +## 10. 
Stop conditions Stop when: - the requested behavior is implemented; - the relevant feedback is green; -- touched code is clearer, simpler, and easier to change; +- touched code is clearer, simpler, more justified, and easier to change; - shared contract facts have a canonical owner; - important invariants are protected; - blast radius has been considered and checked with available tools; -- newly discovered system knowledge has been preserved in a durable place; and +- justification gaps have been surfaced rather than silently closed; +- newly discovered cues have been left in a durable place; +- for domain changes, the local language, Bounded Context, invariant owner, and cross-context translation boundary are explicit enough for the next change; and - the next improvement is a separate slice. Do not continue expanding scope after the next step stops being clearly connected, safe, and validated. diff --git a/.devcontainer/Dockerfile b/.devcontainer/Dockerfile new file mode 100644 index 00000000..0ec55428 --- /dev/null +++ b/.devcontainer/Dockerfile @@ -0,0 +1,36 @@ +# Contributor devcontainer for FTLLexEngine. +# +# Why this is not a published runtime image: +# - Contributors need the full Python, shell, native-build, and verification toolchain. +# - Atheris is treated as a contributor-time native fuzzing surface, not a shipped artifact. +# - The repository checkout stays on the host and is bind-mounted into the container. 
+FROM mcr.microsoft.com/devcontainers/python:1-3.13-bookworm@sha256:0aa711e570b306c02946cdda67587ce8c65978dbb65691341cbcfd4854dfcfff + +SHELL ["/bin/bash", "-o", "pipefail", "-c"] + +USER root + +RUN rm -f /etc/apt/sources.list.d/yarn.list \ + && apt-get update \ + && export DEBIAN_FRONTEND=noninteractive \ + && apt-get install -y --no-install-recommends \ + build-essential \ + clang-19 \ + libclang-rt-19-dev \ + lsof \ + procps \ + shellcheck \ + && ln -sf /usr/bin/clang-19 /usr/local/bin/clang \ + && ln -sf /usr/bin/clang++-19 /usr/local/bin/clang++ \ + && pip install --no-cache-dir "uv>=0.9.6,<1.0" \ + && python3.13 --version >/dev/null \ + && uv --version >/dev/null \ + && git --version >/dev/null \ + && bash --version >/dev/null \ + && shellcheck --version >/dev/null \ + && clang --version >/dev/null \ + && test -n "$(find "$(clang --print-resource-dir)"/lib/linux -maxdepth 1 -name 'libclang_rt.fuzzer*.a' -print -quit)" \ + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* + +USER vscode diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json new file mode 100644 index 00000000..4adf6d39 --- /dev/null +++ b/.devcontainer/devcontainer.json @@ -0,0 +1,37 @@ +{ + "name": "FTLLexEngine Contributor", + "build": { + "dockerfile": "Dockerfile", + "context": "." 
+ }, + "features": { + "ghcr.io/devcontainers/features/docker-outside-of-docker:1": {} + }, + "workspaceFolder": "/workspaces/ftllexengine", + "workspaceMount": "source=${localWorkspaceFolder},target=/workspaces/ftllexengine,type=bind", + "mounts": [ + "source=ftllexengine-cache,target=/home/vscode/.cache,type=volume" + ], + "containerEnv": { + "FTLLEXENGINE_DEVCONTAINER": "1", + "CLANG_BIN": "/usr/local/bin/clang", + "UV_LINK_MODE": "copy" + }, + "postStartCommand": "./scripts/devcontainer-prepare-user-home.sh", + "remoteUser": "vscode", + "updateRemoteUserUID": true, + "customizations": { + "vscode": { + "settings": { + "terminal.integrated.defaultProfile.linux": "bash", + "python.terminal.activateEnvironment": false + }, + "extensions": [ + "EditorConfig.EditorConfig", + "ms-python.python", + "ms-python.mypy-type-checker", + "charliermarsh.ruff" + ] + } + } +} diff --git a/.gitignore b/.gitignore index c1fef1f3..832421ba 100644 --- a/.gitignore +++ b/.gitignore @@ -12,6 +12,11 @@ downloads/ eggs/ .eggs/ lib/ +!/scripts/lib/ +!/scripts/lib/fuzz_hypofuzz/ +!/scripts/lib/fuzz_hypofuzz/*.sh +!/scripts/lib/fuzz_atheris/ +!/scripts/lib/fuzz_atheris/*.sh lib64/ parts/ sdist/ diff --git a/AGENTS.md b/AGENTS.md index 484ec900..55760050 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,25 +1,44 @@ # AGENTS.md — Agent Entry Protocol -This file is the repository entry point for agent work. It defines load order, precedence, repository-wide exceptions, and the universal minimum that applies before any specialized language, database/native, or documentation rule. +**Version:** 2.3.0 +**Updated:** 2026-04-30 + +This file is the repository entry point for agent work. It defines load order, precedence, repository-wide exceptions, and the universal minimum that applies before any specialized language, framework, database/native, domain-modeling, or documentation rule. + +## 0. Frame + +You are a *transient theory-holder*. 
You enter the repository cold, build a partial theory of the slice you touch, act on it, and leave. Per Naur (*Programming as Theory Building*, 1985), the program is not the artifact — it is the theory held by the people who build and maintain it; that theory cannot be fully written down and is not transferred by documentation alone. + +The protocols in this stack are a method. They are not a substitute for the theory. Their purpose is to keep the *absence* of theory visible, so that: + +1. you surface the tacit gap rather than papering over it with confident output; +2. you re-cue the next reader with artifacts that help them rebuild the relevant slice, while flagging what cannot be written down. + +A passing build, a closed issue, or a generated patch is not the outcome. ## 1. Required context loading When opening a repository, load context in this order: 1. Read this file completely. -2. Load `.codex/UNIVERSAL_ENGINEERING_CONTRACT.md`. This is the cross-language engineering contract. +2. Load `.codex/UNIVERSAL_ENGINEERING_CONTRACT.md` (v2.1.0+). This is the cross-language engineering contract. 3. Load `.codex/AGENTS_EXTRA.md` if it exists. This contains project-specific instructions. 4. Load the language/runtime protocol for each touched surface: - - Java 26+ / Gradle: `.codex/AGENTS_JAVA26_GRADLE.md` - - Kotlin 2.4+ / Gradle: `.codex/AGENTS_KOTLIN24_GRADLE.md` - - Python 3.13+: `.codex/AGENTS_PYTHON313.md` - - Rust 1.95+ / Cargo: `.codex/AGENTS_RUST195_CARGO.md` -5. Load the database/native dependency protocol for each touched surface: - - SQLite3 Multiple Ciphers 2.3.3 / SQLite 3.53.0: `.codex/AGENTS_SQLITE3MC233_SQLITE353.md` -6. For documentation authoring, documentation refactoring, or code changes that alter documented public contracts, load `.codex/PROTOCOL_AFAD.md` unless the only touched document is the repository root `README.md`. 
+ - Java 26+ / Gradle: `.codex/AGENTS_JAVA26_GRADLE.md` (v2.0.0+) + - Kotlin 2.4+ / Gradle: `.codex/AGENTS_KOTLIN24_GRADLE.md` (v2.0.0+) + - Python 3.13+: `.codex/AGENTS_PYTHON313.md` (v2.0.0+) + - Rust 1.95+ / Cargo: `.codex/AGENTS_RUST195_CARGO.md` (v2.0.0+) +5. Load the application-framework protocol for each touched surface: + - Tauri 2.10.x: `.codex/AGENTS_TAURI210.md` (v2.0.0+) +6. Load the database/native dependency protocol for each touched surface: + - SQLite3 Multiple Ciphers 2.3.3 / SQLite 3.53.0: `.codex/AGENTS_SQLITE3MC233_SQLITE353.md` (v2.0.0+) +7. Load the domain-modeling lens **only when the change touches business meaning**: `.codex/DOMAIN_DRIVEN_DESIGN_LENS.md` (v1.0.0+). Triggers include domain state, business rules, workflow names, commands, domain events, permissions, policies, calculations, lifecycle transitions, user-facing business terms, or integration contracts between models. Do not load for purely mechanical work — build wiring, generic plumbing, infrastructure with no domain meaning. The Universal Engineering Contract §1.7 *Domain meaning gate* is the formal trigger. +8. For documentation authoring, documentation refactoring, or code changes that alter documented public contracts, load `.codex/PROTOCOL_AFAD.md` unless the only touched document is the repository root `README.md`. If a referenced file is absent, continue with the best available context and state the missing file in the work summary when it matters. +If a loaded protocol's major version does not match the universal contract's, treat the mismatch as a known re-cueing gap and surface it. + ## 2. Precedence Use the most specific applicable instruction, but do not silently relax correctness, security, compatibility, or verification requirements. @@ -29,25 +48,30 @@ Precedence order: 1. Explicit user request for the current task. 2. Project-specific instructions in `.codex/AGENTS_EXTRA.md`. 3. Repository-wide rules in this `AGENTS.md`, including the root `README.md` exception. 
-4. Applicable language/runtime-specific protocol. -5. Applicable database/native dependency protocol. -6. Applicable documentation protocol. -7. Universal Engineering Contract. -8. General language, framework, ecosystem, and documentation norms. +4. Applicable application-framework protocol. +5. Applicable language/runtime-specific protocol. +6. Applicable database/native dependency protocol. +7. Applicable domain-modeling lens (when the gate fires). +8. Applicable documentation protocol. +9. Universal Engineering Contract. +10. General language, framework, ecosystem, and documentation norms. When instructions conflict, prefer the stricter or more specific instruction unless it would make the task incorrect. Surface the conflict rather than guessing. ## 3. Universal minimum before changing a system -For every non-trivial change, build the smallest useful system map: +For every non-trivial change, build the smallest useful system map (per Universal Engineering Contract §1): - **Truth:** Where does the relevant state live? What is authoritative? Who can mutate it? - **Evidence:** What proves the system is working? What would reveal failure? - **Consequence:** What breaks if the touched component disappears or changes shape? - **Invariant:** What must remain true after the change? -- **Preservation:** Where should the discovered system theory live after the work? +- **Justification:** Can you explain *why* each touched part is the way it is, in terms of the world it maps to? If not, surface that as a known gap rather than a confident edit. +- **Re-cueing:** Where should the cues that help the next reader rebuild this slice of theory live? What part of the relevant theory could not be written down, and who currently holds it? -Use this map to decide what to change, how far to widen the change, what to verify, and what to document. +If the change touches business meaning, additionally apply the UEC §1.7 *Domain meaning gate* and the Domain-Driven Design Lens. 
Do not force the lens onto mechanical work. + +Use this map to decide what to change, how far to widen the change, what to verify, what to document, and what to flag as unresolved. ## 4. Surface dispatch @@ -58,14 +82,22 @@ Language/runtime surfaces: - Python 3.13+ projects use `.codex/AGENTS_PYTHON313.md`. - Rust 1.95+ / Cargo projects use `.codex/AGENTS_RUST195_CARGO.md`. +Application-framework surfaces: + +- Tauri 2.10.x apps, plugins, configuration, capabilities, permissions, bundling, updater/signing, mobile targets, and frontend/Rust IPC surfaces use `.codex/AGENTS_TAURI210.md` in addition to the Rust protocol and any applicable frontend language/framework norms. + Database/native dependency surfaces: -- SQLite3 Multiple Ciphers 2.3.3 / SQLite 3.53.0 surfaces use `.codex/AGENTS_SQLITE3MC233_SQLITE353.md` in addition to any applicable language protocol. +- SQLite3 Multiple Ciphers 2.3.3 / SQLite 3.53.0 surfaces use `.codex/AGENTS_SQLITE3MC233_SQLITE353.md` in addition to any applicable language or framework protocol. + +Domain-modeling surfaces: + +- Changes that touch business meaning, business rules, business state, workflows, commands, domain events, permissions, policies, calculations, lifecycle transitions, user-facing business terms, or integration contracts between models additionally use `.codex/DOMAIN_DRIVEN_DESIGN_LENS.md`. The lens is loaded conditionally, not by language. Touching Java does not mean DDD; touching the *billing* module of a Java app does. Always run the lens triage (lens §1) before applying its tactical chapters. Other surfaces: -- Other languages, runtimes, databases, and native dependencies use the Universal Engineering Contract plus repository-specific instructions. Do not apply Java-, Kotlin-, Python-, Rust-, or SQLite3MC-specific rules to unrelated systems unless the repository explicitly asks for them. 
-- If a repository spans multiple languages or native dependencies, use the relevant protocol for each touched surface and the Universal Engineering Contract across all boundaries. +- Other languages, runtimes, frameworks, databases, and native dependencies use the Universal Engineering Contract plus repository-specific instructions. Do not apply Java-, Kotlin-, Python-, Rust-, Tauri-, SQLite3MC-, or DDD-specific rules to unrelated systems unless the repository explicitly asks for them. +- If a repository spans multiple languages, frameworks, or native dependencies, use the relevant protocol for each touched surface and the Universal Engineering Contract across all boundaries. ## 5. Documentation dispatch and root README exception @@ -87,4 +119,93 @@ Nested `README.md` files are governed by their actual role. If a nested README i ## 6. Work summary requirement -For non-trivial changes, the final work summary must include the verification performed and any important system theory preserved or still missing. Keep the summary proportional to the risk of the change. +For non-trivial changes, the final work summary must follow the Universal Engineering Contract §9 output template (Truth, Evidence, Consequence, Invariant, Justification, Re-cueing). For changes that fired the §1.7 domain-meaning gate, also include the proportional domain-design block defined in UEC §9. Keep the summary proportional to the risk of the change. Silence on justification gaps and inexpressible theory claims a theory you do not have. + +## 7. Standing working norms + +These apply to every non-trivial agent session unless a project-specific override says otherwise. Day-to-day session prompts may reference these subsections by number rather than restating them. + +### 7.1 Evidence over theorycrafting + +Base claims on the actual project. Inspect code, tests, docs, examples, build files, configuration, scripts, and runtime behavior as needed. Do not rely on assumptions or surface-level reading. 
If a suspected issue cannot be proven, either investigate further or mark it as unconfirmed and state what evidence is missing. + +### 7.2 Investigation freedom and temporary workspace + +You may create custom tools, scripts, probes, fixtures, or experiments to investigate, reproduce, validate, or disprove issues. Use any available runtime appropriate for the project — including Ruby v4 (via `ruby-brew`) and Python 3 (via `python3`) — even when the project itself is in a different language. + +Put all temporary scripts, logs, generated files, experiments, and investigation artifacts under `tmp/` at the project root, or the project's conventional temporary workspace if it has one. Do not pollute the project tree. + +Temporary artifacts must: + +- not interfere with quality gates; +- not require project-configuration changes to hide them from checks; +- be deleted before final quality-gate execution unless intentionally promoted into real tests, fixtures, tools, or documentation. + +### 7.3 Incidental observations + +While reading the codebase, docs, docstrings, examples, tests, build files, or supporting materials, do not ignore unrelated deficiencies you discover. Incorporate them into the current session's workplan rather than skipping. + +If `OBSERVATIONS_INCIDENTAL.txt` (or the project's equivalent observation log) exists, read it and resolve every valid item still open. + +The Universal Engineering Contract's "next improvement is a separate slice" rule still applies — incorporate when cohesive, defer when truly out of scope, and prefer the project's observation log over silent skip. 
+ +### 7.4 Systems over goals + +Per Universal Engineering Contract §0, in concrete operational form: + +- fix root causes, not symptoms; +- choose clean, decisive architecture over compatibility-preserving compromises; +- choose breaking refactors when they are the correct engineering answer; +- do not add backwards-compatibility layers, migration shims, transitional APIs, or legacy-preserving glue unless genuinely unavoidable; +- when a shim is genuinely unavoidable, defend the decision with proof — name the consumer, the contract, and the removal trigger; +- treat compatibility shims and migrations as technical debt; +- break up god-files when you encounter them. + +### 7.5 Quality gates + +Run the project's full quality-gate suite at the end of non-trivial work. Iterate on failures until the gates pass. Do not weaken, bypass, exclude, or reconfigure quality gates to obtain a pass. + +If the project has a standard check script, use it. Include relevant build, test, lint, formatting, documentation, example, packaging, fuzz/property, publication-dry-run, metadata, and dependency-license checks where applicable. + +### 7.6 Tests assert intended behavior + +Tests must assert the corrected or newly intended behavior. Do not merely loosen tests, broaden assertions, or skip tests to tolerate broken behavior. + +For projects with fuzzing, property tests, randomized tests, or seed corpora: update them where relevant; add or revise seeds carefully to avoid skewing the corpus toward only the discovered cases; run the relevant fuzz/property checks where feasible, including live hands-on fuzzing when the project supports it. + +For domain-touching work, prefer tests that state the local language: given a domain situation, when a command or event occurs, then the invariant or state transition holds. + +### 7.7 Documentation, CHANGELOG, and public-facing artifacts + +Documentation must accurately reflect the implemented system. 
When code, behavior, commands, examples, APIs, or workflows change, update the corresponding documentation, examples, and any internal parity or consistency docs in the same change. The root `README.md` is a special case per §5. + +When the project maintains a `CHANGELOG.md`: + +- record user-visible or developer-visible changes under the project's `UNRELEASED` section (or its equivalent); +- write entries from the public reader's point of view; +- never mention this entry-protocol file, internal session prompts, work specifications, the `.codex/` protocol stack, AI-agent context, or other internal scaffolding. + +The same public-facing rule applies to README, release notes, examples, error messages, help text, and any user-visible artifact. + +### 7.8 Project baseline + +Apply the project's specified language, runtime, framework, and platform baseline when modernizing or refactoring code. Do not assume a baseline the project does not specify, and do not silently raise a baseline. + +If the touched surface has a protocol in this stack (Java/Kotlin/Python/Rust/Tauri/SQLite3MC/DDD), follow it. If it does not, fall back to the Universal Engineering Contract plus repository-specific instructions per §4. + +### 7.9 Final response + +For non-trivial work, the final report combines two shapes: + +- the Universal Engineering Contract §9 output template (Truth, Evidence, Consequence, Invariant, Justification, Re-cueing) for the structural part, plus the conditional domain-design block when §1.7 fired; +- plus the operational items: what was done; breaking refactors performed (if any); tests, fuzzing, examples, docs, changelog updates; quality-gate commands run and final results; only genuinely blocked items, with precise reasons. + +Keep the report proportional to risk. For tiny edits, a concise sentence with verification is enough. + +## 7.10 No emoji + +Do not add, retain, or introduce emoji anywhere. 
This rule applies across all programming languages, markup languages, documentation formats, and plain text. + +This includes, without limitation, source code, inline comments, documentation comments, docstrings, commit messages, changelogs, release notes, configuration files, documentation, and this AGENTS.md file. + +There are no exceptions. Remove any emoji encountered while creating, editing, reviewing, or refactoring content. diff --git a/CHANGELOG.md b/CHANGELOG.md index c3a8799f..0117c6a8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: CHANGELOG -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [changelog, release notes, version history, breaking changes, migration, fixed, what's new] questions: ["what changed in version X?", "what are the breaking changes?", "what was fixed in the latest release?", "what is the release history?"] @@ -15,6 +15,72 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +## [0.166.0] - 2026-05-01 + +### Changed + +- **Locale fallback creation now rejects obvious invalid-language floods earlier and bounds repeated warning volume.** + `LocaleContext.create()` now fast-rejects locale codes whose primary language + tag is absent from Babel's locale and language-alias metadata, reuses one + cached `en_US` fallback locale instead of reparsing it on every fallback, and + suppresses additional fallback warnings after the initial burst so hostile or + misconfigured locale traffic does not turn runtime formatting into a log + amplifier. `LocaleContext.create_or_raise()` now shares the same metadata fast + path while preserving strict rejection semantics for cached fallback entries. 
+- **The contributor devcontainer bind mount now favors correctness over stale-read caching.** + `.devcontainer/devcontainer.json` no longer declares `consistency=cached` on the repository + workspace bind mount, `scripts/validate-devcontainer.sh` now rejects that stale-read mode, and + `docs/DEVELOPER_DEVCONTAINER.md` now states that in-progress host edits must be visible + immediately inside the contributor container. This fixes a live workflow defect where + `./check.sh` inside the devcontainer could read older copies of recently edited files than the + host workspace and therefore disagree with host-side verification during the same session. +- **The contributor devcontainer now declares `UV_LINK_MODE=copy` for bind-mounted installs.** + The canonical container workflow no longer emits `uv` hardlink fallback warnings every time a + repository shell gate installs the local package into a devcontainer-owned virtual environment. + `devcontainer.json`, the devcontainer validator, and the contributor workflow guide now agree on + that deterministic bind-mount install mode. +- **Locale-aware parsing now accepts copied RTL display text and locale-native digits, and parser-only optional imports fail with actionable Babel guidance on first use.** + The parsing boundary now strips invisible bidi control marks, retries locale-default numbering systems such as Arabic-Indic digits before falling back to Latin digits, and keeps currency parsing on the same numeric parsing core. CLDR currency symbol discovery now normalizes formatting-only RTL marks before caching symbols, so locale-native Arabic symbols such as `ج.م.` roundtrip through `parse_currency()`. Parser-only installs now let explicit optional imports succeed up front while raising `BabelImportError` with install guidance the first time a Babel-backed class or formatter is actually used. 
+- **The public docs now teach request-flow locale and currency constraints more explicitly.** + README, quick-reference, parsing, locale, and workflow guides now explain that + ambiguous symbols such as `"$4.25"` require either `infer_from_locale=True` or an + explicit ISO code, and that each `FluentLocalization` instance owns a fixed fallback + chain. +- **Verification now has one canonical contract and the workflow tour examples execute as published.** + `./check.sh` is the documented full-repo verification entry point, `./scripts/lint.sh` + declares its validator surface explicitly instead of discovering scripts by comment + headers, `./scripts/test.sh` reads its coverage floor from `pyproject.toml`, and the + runnable Python fences in `docs/WORKFLOW_TOUR.md` are now self-contained so the docs + validator and copy-paste users execute the same examples. +- **Validation and serializer internals are now split across focused helper modules, and the HypoFuzz runner is decomposed into sourced mode libraries.** + Resource validation now separates syntax extraction and entry/reference passes from the + orchestration entry point, serializer recursion helpers live in a dedicated engine module, + and `scripts/fuzz_hypofuzz.sh` now delegates large operational modes to focused shell + libraries so the public script remains a thin dispatcher rather than a monolith. +- **Contributor verification now has a committed devcontainer contract, and Atheris native fuzzing is container-owned.** + The repository now ships `.devcontainer/` plus `./scripts/validate-devcontainer.sh`, `./check.sh` + requires the contributor container, and the contributor docs point to the devcontainer as the + canonical engineering workflow instead of relying on host-specific native setup. 
+- **Native fuzzing workflows now provision their real toolchain contracts instead of assuming ambient packages.** + `./scripts/fuzz_hypofuzz.sh --deep` now runs through the dedicated `fuzz` dependency group that owns + `hypofuzz` and the Hypothesis CLI extra, while the contributor devcontainer now ships LLVM 19, + compiler-rt/libFuzzer, and `CLANG_BIN` so `./scripts/fuzz_atheris.sh --setup` can build Atheris + reproducibly on Linux contributor machines. +- **Containerized shell gates now use devcontainer-scoped uv environments instead of reusing host `.venv-*` names.** + The lint, test, benchmark, HypoFuzz, `./check.sh`, and Atheris entrypoints now pivot into + `.venv-devcontainer-*` paths inside the bind-mounted workspace so Linux container environments + cannot overwrite or corrupt host macOS virtual environments. +- **The Atheris runner now uses one explicit target manifest and split launcher libraries.** + `./scripts/fuzz_atheris.sh` now loads target names from `fuzz_atheris/targets.tsv`, delegates its + operations to focused shell helpers, and the large runtime and localization targets now expose thin + wrapper entrypoints over extracted support modules instead of keeping those owners monolithic. +- **Cache audit records now expose predictable event ordering under read-heavy and concurrent workloads.** + `WriteLogEntry.sequence` now tracks monotonic audit-event order across misses, hits, puts, evictions, + and corruption records, while the new `cache_sequence` field preserves the cache-entry sequence that + was current for the logged event. The runtime reference and quickstart example now reflect that + distinction, so cache-history consumers no longer see misleading `sequence=0` miss records in normal + production traffic. 
+ ## [0.165.0] - 2026-04-24 ### Changed @@ -204,9 +270,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 Documentation validation, example execution, lint, tests, and Atheris corpus health all clear `PYTHONPATH`/`sys.path` overrides so quality gates exercise the same import contract users get. - **Atheris corpus health now bootstraps its dedicated environment on demand.** The - `./scripts/fuzz_atheris.sh --corpus` path now creates `.venv-atheris` before invoking the - health checker, so fresh machines and `./check.sh` no longer depend on a pre-existing - Atheris venv. + `./scripts/fuzz_atheris.sh --corpus` path now creates the dedicated Atheris environment before + invoking the health checker, so fresh machines and `./check.sh` no longer depend on a + pre-existing native fuzzing venv. - **Public examples and parser-focused tests now describe current behavior instead of stale cleanup notes or old line coordinates.** The shipped transformer example no longer embeds inline `TODO` markers, parser coverage tests now describe behavior rather than historical line numbers from the pre-split parser, and the @@ -7022,7 +7088,8 @@ Both validators are re-exported from `ftllexengine.introspection` and the root [0.29.0]: https://github.com/resoltico/ftllexengine/releases/tag/v0.29.0 [0.28.1]: https://github.com/resoltico/ftllexengine/releases/tag/v0.28.1 [0.28.0]: https://github.com/resoltico/ftllexengine/releases/tag/v0.28.0 -[Unreleased]: https://github.com/resoltico/FTLLexEngine/compare/v0.165.0...HEAD +[Unreleased]: https://github.com/resoltico/FTLLexEngine/compare/v0.166.0...HEAD +[0.166.0]: https://github.com/resoltico/FTLLexEngine/compare/v0.165.0...v0.166.0 [0.165.0]: https://github.com/resoltico/FTLLexEngine/compare/v0.164.0...v0.165.0 [0.164.0]: https://github.com/resoltico/FTLLexEngine/compare/v0.163.0...v0.164.0 [0.163.0]: https://github.com/resoltico/FTLLexEngine/compare/v0.162.0...v0.163.0 diff --git a/CONTRIBUTING.md 
b/CONTRIBUTING.md index 06af2c09..c5cf4cf1 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: CONTRIBUTING -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [contributing, development, uv, lint, test, fuzz, benchmark, release, virtualenv] questions: ["how do I set up development?", "how do I run lint and tests?", "how do I work on fuzzing?", "how do I prepare a release?"] @@ -11,38 +11,39 @@ route: # Contributing to FTLLexEngine **Purpose**: Set up a working development environment and run the same validation paths the repo expects. -**Prerequisites**: `uv`, Bash 5+, Python 3.13 available locally. Python 3.14 is recommended for forward-compat checks. +**Prerequisites**: Docker plus either the Dev Containers IDE integration or `npx --yes @devcontainers/cli`. Direct host `uv` use is optional; direct host execution of the repository shell gates also requires a Bash 5.0+ `bash` on `PATH`. ## Overview -This repository uses `uv` for dependency management and self-isolating shell scripts for the main quality gates. The root `.venv` is the manual development environment; the scripted gates pivot into versioned environments such as `.venv-3.13`, `.venv-3.14`, and `.venv-atheris` as needed. +This repository uses a committed contributor devcontainer for the canonical engineering workflow. Repository shell gates continue to use `uv`-managed environments internally, but the supported path for full verification and native Atheris work is the devcontainer described in [docs/DEVELOPER_DEVCONTAINER.md](docs/DEVELOPER_DEVCONTAINER.md). The shortest reliable workflow is: ```bash -uv sync --group dev --group release -./check.sh +npx --yes @devcontainers/cli up --workspace-folder . +npx --yes @devcontainers/cli exec --workspace-folder . ./check.sh ``` -The default test gate enforces **100% line coverage and 100% branch coverage** for `src/ftllexengine`. 
+The default test gate enforces the repository coverage floor declared in `pyproject.toml`. ## Setup ```bash git clone https://github.com/resoltico/FTLLexEngine.git cd FTLLexEngine -uv sync --group dev --group release -uv sync --group fuzz +npx --yes @devcontainers/cli up --workspace-folder . ``` -Optional environments: +Optional direct host setup: -- `PY_VERSION=3.14 ./scripts/lint.sh` and `PY_VERSION=3.14 ./scripts/test.sh` create or reuse `.venv-3.14`. -- `./scripts/fuzz_atheris.sh --help` bootstraps `.venv-atheris` on demand and requires Python 3.13. +- `uv sync --group dev --group release` +- `uv sync --group fuzz` +- Host shell gates create or reuse `.venv-3.13` and `.venv-3.14`; the devcontainer uses `.venv-devcontainer-*` names to avoid cross-platform contamination of the bind-mounted workspace. +- Stock macOS `/bin/bash` 3.2 is not sufficient for the repo shell entrypoints. Use the devcontainer path or install a Bash 5.0+ `bash` before invoking `./scripts/*.sh` directly from the host. ## Daily Workflow -Run the repo gates directly; the scripts manage their own interpreter pivots. +Run the repo gates inside the devcontainer. If you are already in a devcontainer terminal, use the script directly; from the host, use `npx --yes @devcontainers/cli exec`. ```bash ./check.sh @@ -56,8 +57,8 @@ Useful variants: - `./scripts/benchmark.sh` - `./scripts/fuzz_hypofuzz.sh` - `./scripts/fuzz_hypofuzz.sh --deep --time 300` -- `./scripts/fuzz_atheris.sh numbers --time 60` -- `./scripts/fuzz_atheris.sh --list` to inspect stored crashes and finding artifacts +- Inside a devcontainer terminal: `./scripts/fuzz_atheris.sh graph --time 60` +- Inside a devcontainer terminal: `./scripts/fuzz_atheris.sh --list` to inspect stored crashes and finding artifacts ## Documentation Work @@ -72,6 +73,7 @@ uv run python scripts/run_examples.py Expectations: - README and guide Python snippets should run as written.
+- Canonical shell quick-start blocks in the fuzzing guides should execute as written from the documented host-with-devcontainer-wrapper context. - `examples/*.py` should execute cleanly under the dev environment. - Source-code docstring transcripts are illustrative API notes, not an executable test suite. Keep runnable examples in Markdown or `examples/`, and mark any source `>>>` transcript with `# doctest: +SKIP`. - Reference docs should describe current symbols, not removed or internal machinery. @@ -89,10 +91,11 @@ uv run mypy --config-file examples/mypy.ini examples Two fuzzing surfaces are maintained: - `./scripts/fuzz_hypofuzz.sh` for Hypothesis and HypoFuzz. -- `./scripts/fuzz_atheris.sh` for native Atheris/libFuzzer targets. +- `./scripts/fuzz_atheris.sh` for native Atheris/libFuzzer targets inside the contributor devcontainer. See: +- [docs/DEVELOPER_DEVCONTAINER.md](docs/DEVELOPER_DEVCONTAINER.md) - [docs/FUZZING_GUIDE.md](docs/FUZZING_GUIDE.md) - [docs/FUZZING_GUIDE_HYPOFUZZ.md](docs/FUZZING_GUIDE_HYPOFUZZ.md) - [docs/FUZZING_GUIDE_ATHERIS.md](docs/FUZZING_GUIDE_ATHERIS.md) @@ -126,7 +129,7 @@ Before opening a PR, make sure the baseline gates pass: ./check.sh ``` -`./scripts/test.sh` is expected to fail on any coverage regression below the repository's 100% line-and-branch baseline. +`./scripts/test.sh` is expected to fail on any coverage regression below the repository policy declared in `pyproject.toml`. 
When the change touches runtime behavior or supported Python versions, also run the forward-compat pass: diff --git a/PATENTS.md b/PATENTS.md index e6769e31..1f91d03c 100644 --- a/PATENTS.md +++ b/PATENTS.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: LEGAL -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [patents, legal, license, fluent, apache, mit, babel] questions: ["what is the patent position?", "does the project include a patent grant?", "what about the Fluent specification license?"] diff --git a/README.md b/README.md index 2dfe9a5f..f9040f6d 100644 --- a/README.md +++ b/README.md @@ -1,25 +1,31 @@ -# FTLLexEngine — Fluent runtime for real-world localization - -FTLLexEngine is a Python runtime and parsing toolkit for Fluent `.ftl` resources, built for teams that need locale-aware text, money, dates, and user-input parsing without rebuilding the same rules in application code. - -If you are still stitching this together with string interpolation, one-off parsers, and per-locale edge-case fixes, the same bug tends to get fixed in three places. +[![FTLLexEngine Art](https://raw.githubusercontent.com/resoltico/FTLLexEngine/main/images/FTLLexEngine.jpg)](https://github.com/resoltico/FTLLexEngine) [![PyPI](https://img.shields.io/pypi/v/ftllexengine.svg)](https://pypi.org/project/ftllexengine/) [![Python Versions](https://img.shields.io/pypi/pyversions/ftllexengine.svg)](https://pypi.org/project/ftllexengine/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) -- Keep plural rules and locale formatting in `.ftl`, close to the messages themselves. -- Parse localized numbers, dates, and currency back into exact Python types. -- Fail startup early when resources or message schemas drift. -- Share internally synchronized bundles safely across concurrent requests. 
+# FTLLexEngine — Fluent localization runtime and parser for Python + +FTLLexEngine is a Python library for the Fluent `.ftl` specification: format locale-aware prices, +dates, and messages for 200+ locales, then parse localized user input back to exact Python types +in the same stack. -The nearby alternative is a mix of hand-kept formatting rules, ad-hoc parsing helpers, and translation checks that only happen after a request is already live. FTLLexEngine turns that into one repeatable runtime. +Most setups handle the two directions separately — one library brews the outbound message, +something hand-rolled parses the reply. Locale rules drift between them. FTLLexEngine runs +both from the same locale, validates `.ftl` resources at boot before the first request, and keeps +threads isolated without touching global state. -[Try a working snippet](docs/QUICK_REFERENCE.md) · [Take the deeper workflow tour](docs/WORKFLOW_TOUR.md) · [Get the package on PyPI](https://pypi.org/project/ftllexengine/) +- Format currency, dates, and plural messages correctly for 200+ locales via CLDR +- Parse localized user input back to `Decimal`, `date`, or typed values — no float drift +- Validate `.ftl` resources and message schemas at boot, before the first request +- Thread-safe bundles, no global locale state -## One Small Workflow +[Copy-paste patterns](docs/QUICK_REFERENCE.md) · [Workflow tour](docs/WORKFLOW_TOUR.md) · [PyPI](https://pypi.org/project/ftllexengine/) -For a coffee exporter, one invoice line and one buyer reply are enough to create drift: display logic in one place, parsing logic in another, validation nowhere. FTLLexEngine keeps that move in one stack. +## Both Ends of the Counter + +A specialty coffee exporter invoices buyers in German. Buyers reply in their local number format.
+One runtime handles both ends: ```python from decimal import Decimal @@ -29,55 +35,53 @@ from ftllexengine.parsing import parse_currency bundle = FluentBundle("de_DE", use_isolating=False) bundle.add_resource('quote = Angebot: { CURRENCY($amount, currency: "EUR") }') -text, errors = bundle.format_pattern("quote", {"amount": Decimal("12450.00")}) -assert errors == () -assert text == "Angebot: 12.450,00\u00a0€" +text, _ = bundle.format_pattern("quote", {"amount": Decimal("12450.00")}) +# → "Angebot: 12.450,00 €" (non-breaking space before €) -parsed, errors = parse_currency("12.450,00 EUR", "de_DE", default_currency="EUR") -assert errors == () -assert parsed == (Decimal("12450.00"), "EUR") +parsed, _ = parse_currency("12.450,00 EUR", "de_DE", default_currency="EUR") +# → (Decimal("12450.00"), "EUR") ``` -The same locale-aware runtime formats the outgoing quote and parses the buyer’s reply back into an exact `Decimal`. +Same locale rules write the invoice and read the buyer's reply. No separate parser. No float +approximation. ## Where It Fits -Use FTLLexEngine when the same message has to survive more than one locale, more than one direction, or more than one layer of your system. - -- Good fit: Fluent-based apps, invoice and checkout flows, localized forms, startup validation for translation packs, and systems that care about exact decimals instead of float luck. -- Good fit: Teams that want message grammar, money formatting, and localized input parsing to stay consistent instead of drifting between templates, helpers, and validation code. -- Keep it simple: single-locale apps, plain string formatting, or projects that do not need Fluent at all. +Python apps using Fluent `.ftl` for messages, plural rules, and locale-aware formatting — +especially when users send localized prices, dates, or quantities that need to come back as exact +typed values. 
Systems that validate `.ftl` resources before accepting traffic, and concurrent apps +that need locale isolation without shared mutable state. -## Start In Two Paths +## Install -Use the full runtime when you need formatting, localization orchestration, and localized parsing: +Full runtime — formatting, bidirectional parsing, CLDR locale data: ```bash uv add ftllexengine[babel] +# or: pip install "ftllexengine[babel]" ``` -Use the parser-only install when you only need syntax parsing, AST work, validation, and zero-dependency helper surfaces: +Parser only — FTL syntax, AST, validation, zero Babel dependency: ```bash uv add ftllexengine +# or: pip install ftllexengine ``` -Start from the path that matches your job: - -- [Copy the smallest working examples](docs/QUICK_REFERENCE.md) -- [Run the shipped examples](examples/README.md) -- [Browse parsing, thread-safety, and boot-validation guides](docs/DOC_00_Index.md) +Python 3.13+. Fully typed. Built on the [Fluent specification](https://projectfluent.org/) with +CLDR data via Babel. -## Why It Feels Safe To Try +- [Copy-paste patterns](docs/QUICK_REFERENCE.md) +- [Workflow tour](docs/WORKFLOW_TOUR.md) +- [API reference](docs/DOC_00_Index.md) +- [Runnable examples](examples/) -- Published on [PyPI](https://pypi.org/project/ftllexengine/) for Python 3.13+. -- Built around the [Fluent specification](https://projectfluent.org/) and CLDR-backed locale data via Babel. -- Fully typed, MIT-licensed, and shipped with runnable examples plus repository checks for docs, examples, and version sync. -- Supports parser-only installs for syntax and validation work when you do not need the Babel-backed runtime surface. -- Release and publishing steps live in [docs/RELEASE_PROTOCOL.md](docs/RELEASE_PROTOCOL.md). +Maintainers: [Release protocol](docs/RELEASE_PROTOCOL.md). ## Legal -FTLLexEngine is MIT-licensed. The optional `babel` extra adds Babel under BSD-3-Clause terms. 
FTLLexEngine is an independent implementation of the [Fluent syntax specification](https://github.com/projectfluent/fluent/blob/master/spec/fluent.ebnf) and is not affiliated with or endorsed by Mozilla. +MIT-licensed. The optional `[babel]` extra adds Babel under BSD-3-Clause. FTLLexEngine is an +independent implementation of the [Fluent syntax specification](https://github.com/projectfluent/fluent/blob/master/spec/fluent.ebnf) +and is not affiliated with or endorsed by Mozilla. [LICENSE](LICENSE) · [NOTICE](NOTICE) · [PATENTS.md](PATENTS.md) diff --git a/check.sh b/check.sh index a8dfeb52..3e2e50ac 100755 --- a/check.sh +++ b/check.sh @@ -3,12 +3,27 @@ set -euo pipefail PY_VERSION="${PY_VERSION:-3.13}" -UV_ENV=".venv-${PY_VERSION}" -ATHERIS_SMOKE_TIME="${ATHERIS_SMOKE_TIME:-5}" +if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" ]]; then + UV_ENV=".venv-devcontainer-${PY_VERSION}" +else + UV_ENV=".venv-${PY_VERSION}" +fi +if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" && -z "${UV_LINK_MODE:-}" ]]; then + export UV_LINK_MODE="copy" +fi +ATHERIS_TARGET_SMOKE_TIME="${ATHERIS_TARGET_SMOKE_TIME:-3}" ROOT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)" cd "$ROOT_DIR" +if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" != "1" ]]; then + printf 'Error: ./check.sh must be run inside the committed contributor devcontainer.\n' >&2 + printf 'Use the Dev Containers extension or:\n' >&2 + printf ' npx --yes @devcontainers/cli up --workspace-folder .\n' >&2 + printf ' npx --yes @devcontainers/cli exec --workspace-folder . 
./check.sh\n' >&2 + exit 1 +fi + run_step() { local title="$1" shift @@ -21,13 +36,13 @@ uv_python() { } run_step "Version Validation" uv_python scripts/validate_version.py +run_step "Contributor Devcontainer" ./scripts/validate-devcontainer.sh run_step "Documentation Validation" uv_python scripts/validate_docs.py run_step "Examples" uv_python scripts/run_examples.py run_step "Lint" ./scripts/lint.sh run_step "Tests" ./scripts/test.sh run_step "HypoFuzz Preflight" ./scripts/fuzz_hypofuzz.sh --preflight run_step "Atheris Corpus Health" ./scripts/fuzz_atheris.sh --corpus -run_step "Atheris Graph Smoke" ./scripts/fuzz_atheris.sh graph --time "$ATHERIS_SMOKE_TIME" -run_step "Atheris Introspection Smoke" ./scripts/fuzz_atheris.sh introspection --time "$ATHERIS_SMOKE_TIME" +run_step "Atheris Manifest Smoke" ./scripts/fuzz_atheris.sh --smoke-all --time "$ATHERIS_TARGET_SMOKE_TIME" printf '\n[PASS] Full repository check completed.\n' diff --git a/docs/CUSTOM_FUNCTIONS_GUIDE.md b/docs/CUSTOM_FUNCTIONS_GUIDE.md index acede757..40d9ac4b 100644 --- a/docs/CUSTOM_FUNCTIONS_GUIDE.md +++ b/docs/CUSTOM_FUNCTIONS_GUIDE.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: CUSTOM_FUNCTIONS -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [custom functions, fluent_function, FunctionRegistry, locale injection, add_function] questions: ["how do I add a custom function?", "how does locale injection work?", "should I use a registry or add_function?"] diff --git a/docs/DATA_INTEGRITY_ARCHITECTURE.md b/docs/DATA_INTEGRITY_ARCHITECTURE.md index eb4c6b0f..ce6db068 100644 --- a/docs/DATA_INTEGRITY_ARCHITECTURE.md +++ b/docs/DATA_INTEGRITY_ARCHITECTURE.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: ARCHITECTURE -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [data integrity, strict mode, FrozenFluentError, IntegrityCheckFailedError, cache audit, boot validation] questions: ["how does strict mode 
relate to integrity?", "what audit evidence does the runtime expose?", "what is boot validation for?"] diff --git a/docs/DEVELOPER_DEVCONTAINER.md b/docs/DEVELOPER_DEVCONTAINER.md new file mode 100644 index 00000000..685c9999 --- /dev/null +++ b/docs/DEVELOPER_DEVCONTAINER.md @@ -0,0 +1,76 @@ +--- +afad: "4.0" +version: "0.166.0" +domain: CONTRIBUTING +updated: "2026-05-01" +route: + keywords: [devcontainer, contributor workflow, docker, check.sh, atheris] + questions: ["how do I open the contributor container?", "how do I run the full repo gate?", "how do I run Atheris in the supported environment?"] +--- + +# Contributor Devcontainer + +**Purpose**: Run the repository's canonical contributor workflow inside a committed, reproducible container surface. +**Prerequisites**: Docker plus either the Dev Containers IDE integration or `npx --yes @devcontainers/cli`. + +## Canonical Workflow + +From the host: + +```bash +npx --yes @devcontainers/cli up --workspace-folder . +npx --yes @devcontainers/cli exec --workspace-folder . ./check.sh +``` + +From an already-open devcontainer terminal: + +```bash +./check.sh +``` + +`./check.sh` is intentionally container-owned. It validates the container contract, runs docs and examples, executes lint and pytest, runs HypoFuzz preflight, and performs bounded live Atheris smoke checks. + +## What The Container Owns + +- Python 3.13 as the canonical contributor interpreter +- `uv` for dependency and environment orchestration +- LLVM 19 plus the compiler-rt/libFuzzer archives required by Atheris native builds +- `shellcheck` and the shell-based quality gates +- Writable cache mounts for repeatable dependency resolution +- `UV_LINK_MODE=copy` so bind-mounted workspace installs do not emit hardlink fallback warnings + +The container is a contributor environment, not a published runtime image. The repository checkout stays on the host and is bind-mounted into the container with immediate read visibility for in-progress edits. 
+Repository shell gates use `.venv-devcontainer-*` names inside that bind-mounted workspace so Linux container environments do not overwrite host macOS virtual environments. + +## Daily Commands + +Inside the devcontainer: + +```bash +./scripts/lint.sh +./scripts/test.sh +./scripts/fuzz_hypofuzz.sh --preflight +./scripts/fuzz_atheris.sh --smoke-all --time 3 +``` + +From the host without opening an interactive shell: + +```bash +npx --yes @devcontainers/cli exec --workspace-folder . ./scripts/fuzz_hypofuzz.sh --preflight +npx --yes @devcontainers/cli exec --workspace-folder . ./scripts/fuzz_atheris.sh --smoke-all --time 3 +``` + +## Validation + +The committed container contract is verified by: + +```bash +./scripts/validate-devcontainer.sh +``` + +That script validates `devcontainer.json`, builds the image defined by `.devcontainer/Dockerfile`, checks the required toolchain, and verifies the writable cache-repair hook. +It also verifies that the image exposes a working `CLANG_BIN` and a discoverable libFuzzer archive for Atheris. + +## Host-Only Work + +Direct host execution remains available for non-container-native tasks when you need it, such as ad hoc `uv` commands or forward-compat runs with `PY_VERSION=3.14`. Host invocations of the repository shell entrypoints require a Bash 5.0+ `bash` on `PATH`; stock macOS `/bin/bash` 3.2 is unsupported. The container is the canonical path for contributor verification and the required path for native Atheris work. 
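The Bash 5.0+ requirement for host invocations could be checked up front with a guard along these lines. This is a sketch only: `bash_is_supported` is a hypothetical helper, not a function in the repository's scripts, and the error text is illustrative:

```bash
# Hypothetical preflight guard; not taken from the repository's scripts.

# Succeeds when the given major version (default: the running shell's) is 5 or newer.
bash_is_supported() {
    local version="${BASH_VERSION:-0}"    # unset outside Bash, so fall back to 0
    local major="${1:-${version%%.*}}"    # keep only the text before the first dot
    [ "$major" -ge 5 ]
}

if ! bash_is_supported; then
    printf 'Error: Bash 5.0+ is required (found %s).\n' "${BASH_VERSION:-unknown}" >&2
    printf 'Use the contributor devcontainer or put a newer bash on PATH.\n' >&2
fi
```

The sketch only reports the problem so it can be pasted into an interactive shell safely; a real entrypoint would presumably exit with a non-zero status after printing the message.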
diff --git a/docs/DOC_00_Index.md b/docs/DOC_00_Index.md index 9a6ee52c..21353627 100644 --- a/docs/DOC_00_Index.md +++ b/docs/DOC_00_Index.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: INDEX -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [api index, routing, FluentBundle, FluentLocalization, parse_ftl, FunctionRegistry, FrozenFluentError, introspection, detect_cycles, entry_dependency_set] questions: ["where is a symbol documented?", "which file documents the runtime APIs?", "which file documents locale parsing, introspection, and analysis APIs?", "where are syntax, parsing, diagnostics, and dependency-graph references?"] @@ -167,6 +167,7 @@ route: | `SourceSpan` | [DOC_05_Diagnostics.md](DOC_05_Diagnostics.md) | `SourceSpan` | | `scripts/validate_docs.py` | [DOC_06_Testing.md](DOC_06_Testing.md) | `scripts/validate_docs.py` | | `scripts/validate_version.py` | [DOC_06_Testing.md](DOC_06_Testing.md) | `scripts/validate_version.py` | +| `scripts/validate-devcontainer.sh` | [DOC_06_Testing.md](DOC_06_Testing.md) | `scripts/validate-devcontainer.sh` | | `scripts/run_examples.py` | [DOC_06_Testing.md](DOC_06_Testing.md) | `scripts/run_examples.py` | | `check.sh` | [DOC_06_Testing.md](DOC_06_Testing.md) | `check.sh` | | `scripts/lint.sh` | [DOC_06_Testing.md](DOC_06_Testing.md) | `scripts/lint.sh` | @@ -180,6 +181,7 @@ route: - [QUICK_REFERENCE.md](QUICK_REFERENCE.md) - [CUSTOM_FUNCTIONS_GUIDE.md](CUSTOM_FUNCTIONS_GUIDE.md) - [DATA_INTEGRITY_ARCHITECTURE.md](DATA_INTEGRITY_ARCHITECTURE.md) +- [DEVELOPER_DEVCONTAINER.md](DEVELOPER_DEVCONTAINER.md) - [FUZZING_GUIDE.md](FUZZING_GUIDE.md) - [FUZZING_GUIDE_ATHERIS.md](FUZZING_GUIDE_ATHERIS.md) - [FUZZING_GUIDE_HYPOFUZZ.md](FUZZING_GUIDE_HYPOFUZZ.md) diff --git a/docs/DOC_01_Core.md b/docs/DOC_01_Core.md index 5e995db8..0e2f129a 100644 --- a/docs/DOC_01_Core.md +++ b/docs/DOC_01_Core.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: CORE 
-updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [FluentBundle, AsyncFluentBundle, FluentLocalization, LocalizationBootConfig, PathResourceLoader, LoadSummary, ResourceLoadResult, LocalizationCacheStats, require_clean, get_load_summary] questions: ["how do I format messages?", "how do I load multiple locales?", "how do I inspect localization load results?", "how do I boot localization safely?"] diff --git a/docs/DOC_02_SyntaxExpressions.md b/docs/DOC_02_SyntaxExpressions.md index 8794fc72..2c76bd6d 100644 --- a/docs/DOC_02_SyntaxExpressions.md +++ b/docs/DOC_02_SyntaxExpressions.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: SYNTAX_EXPRESSIONS -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [TextElement, Placeable, SelectExpression, VariableReference, FunctionReference, Entry, Expression] questions: ["which AST node types model Fluent expressions and references?", "what public syntax union aliases exist?", "where are placeables and selectors documented?"] diff --git a/docs/DOC_02_SyntaxTypes.md b/docs/DOC_02_SyntaxTypes.md index 8aa521e9..69a13de1 100644 --- a/docs/DOC_02_SyntaxTypes.md +++ b/docs/DOC_02_SyntaxTypes.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: SYNTAX_TYPES -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [AST, Resource, Message, Term, Pattern, Span, Annotation, syntax nodes] questions: ["how is FTL represented in the AST?", "which public AST container and declaration node types exist?", "where are spans and parser annotations documented?"] diff --git a/docs/DOC_02_Types.md b/docs/DOC_02_Types.md index 32953a7d..c63354d0 100644 --- a/docs/DOC_02_Types.md +++ b/docs/DOC_02_Types.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: TYPES -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [FluentNumber, FluentValue, ParseResult, LocaleCode, CurrencyCode, TerritoryInfo, MessageIntrospection] 
questions: ["what public types does FTLLexEngine expose?", "what value types can formatting accept?", "which semantic aliases and lookup-result types exist?", "what introspection result types exist?"] diff --git a/docs/DOC_03_LocaleParsing.md b/docs/DOC_03_LocaleParsing.md index 1afff2d9..9b29482e 100644 --- a/docs/DOC_03_LocaleParsing.md +++ b/docs/DOC_03_LocaleParsing.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: LOCALE_PARSING -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [parse_decimal, parse_fluent_number, parse_date, parse_datetime, parse_currency, is_valid_decimal, clear_date_caches] questions: ["how do I parse localized numbers and dates?", "what do the locale-aware parse helpers return?", "which parsing type guards and cache-clear helpers are public?"] diff --git a/docs/DOC_03_Parsing.md b/docs/DOC_03_Parsing.md index 4ed740d8..0bd3172c 100644 --- a/docs/DOC_03_Parsing.md +++ b/docs/DOC_03_Parsing.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: PARSING -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [parse_ftl, serialize_ftl, validate_resource, FluentParserV1, Cursor, ASTVisitor, ASTTransformer, ParseError] questions: ["how do I parse FTL?", "what does validate_resource return?", "what syntax traversal helpers are public?", "where is the syntax parser API documented?"] diff --git a/docs/DOC_04_Analysis.md b/docs/DOC_04_Analysis.md index 460f6799..3a9fdba3 100644 --- a/docs/DOC_04_Analysis.md +++ b/docs/DOC_04_Analysis.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: ANALYSIS -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [analysis, detect_cycles, entry_dependency_set, make_cycle_key, dependency graph, cycle key] questions: ["where are the dependency-graph helpers documented?", "how do I detect cycles in an FTL dependency graph?", "how do I build namespace-prefixed dependency sets?"] diff --git 
a/docs/DOC_04_Introspection.md b/docs/DOC_04_Introspection.md index 09631016..5c8effa9 100644 --- a/docs/DOC_04_Introspection.md +++ b/docs/DOC_04_Introspection.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: INTROSPECTION -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [introspection, validate_message_variables, extract_variables, extract_references, ISO 4217, ISO 3166, get_currency, get_territory] questions: ["how do I inspect a message's variables and references?", "which ISO lookup helpers exist?", "how do I validate message-variable schemas?", "which Babel-backed introspection helpers are public?"] diff --git a/docs/DOC_04_Runtime.md b/docs/DOC_04_Runtime.md index 33495a0d..f0de2e0b 100644 --- a/docs/DOC_04_Runtime.md +++ b/docs/DOC_04_Runtime.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: RUNTIME -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [CacheConfig, FunctionRegistry, fluent_function, number_format, currency_format, select_plural_category, clear_module_caches] questions: ["how do I configure runtime formatting?", "how do custom functions and registries work?", "where are cache config and write-log entry types documented?"] @@ -15,7 +15,7 @@ Runtime-adjacent utilities, validators, and package metadata constants are docum Parser-only facade note: - `CacheConfig`, `FunctionRegistry`, `fluent_function`, `make_fluent_number`, `CacheAuditLogEntry`, `WriteLogEntry`, and `ValidationResult` remain importable in parser-only installs. -- `create_default_registry`, `get_shared_registry`, `number_format`, `datetime_format`, `currency_format`, `select_plural_category`, `FluentBundle`, and `AsyncFluentBundle` require the full runtime install and are absent from `ftllexengine.runtime` in parser-only installs. 
+- `create_default_registry`, `get_shared_registry`, `number_format`, `datetime_format`, `currency_format`, `select_plural_category`, `FluentBundle`, and `AsyncFluentBundle` require the full runtime install. In parser-only installs they resolve to lazy placeholders that raise `BabelImportError` on first use. - `clear_module_caches()` is a root-level helper that works in both parser-only and full-runtime installs. Facade ownership note: @@ -103,7 +103,7 @@ def create_default_registry() -> FunctionRegistry: ### Constraints - Return: New mutable registry - State: Fresh object on each call -- Availability: full-runtime only; absent from `ftllexengine.runtime` in parser-only installs +- Availability: full-runtime only; parser-only installs expose a lazy placeholder that raises `BabelImportError` on first use --- @@ -119,7 +119,7 @@ def get_shared_registry() -> FunctionRegistry: ### Constraints - Return: Shared frozen registry - State: Shared singleton-style object -- Availability: full-runtime only; absent from `ftllexengine.runtime` in parser-only installs +- Availability: full-runtime only; parser-only installs expose a lazy placeholder that raises `BabelImportError` on first use --- @@ -146,7 +146,7 @@ def number_format( - Raises: Locale/value boundary errors - State: Pure - Thread: Safe -- Availability: full-runtime only; absent from `ftllexengine.runtime` in parser-only installs +- Availability: full-runtime only; parser-only installs expose a lazy placeholder that raises `BabelImportError` on first use --- @@ -171,7 +171,7 @@ def datetime_format( - Raises: Locale/value boundary errors - State: Pure - Thread: Safe -- Availability: full-runtime only; absent from `ftllexengine.runtime` in parser-only installs +- Availability: full-runtime only; parser-only installs expose a lazy placeholder that raises `BabelImportError` on first use --- @@ -199,7 +199,7 @@ def currency_format( - Raises: Locale/value boundary errors - State: Pure - Thread: Safe -- Availability: 
full-runtime only; absent from `ftllexengine.runtime` in parser-only installs +- Availability: full-runtime only; parser-only installs expose a lazy placeholder that raises `BabelImportError` on first use --- @@ -222,7 +222,7 @@ def select_plural_category( - Return: CLDR plural category string - State: Pure - Thread: Safe -- Availability: full-runtime only; absent from `ftllexengine.runtime` in parser-only installs +- Availability: full-runtime only; parser-only installs expose a lazy placeholder that raises `BabelImportError` on first use --- @@ -293,6 +293,7 @@ class WriteLogEntry: key_hash: str timestamp: float sequence: int + cache_sequence: int checksum_hex: str wall_time_unix: float ``` @@ -301,5 +302,7 @@ class WriteLogEntry: - Purpose: Underlying runtime cache dataclass behind the `CacheAuditLogEntry` public alias - State: Immutable - Thread: Safe +- `sequence`: Monotonic audit-event order across all cache operations +- `cache_sequence`: Cache-entry sequence observed at the time of the event --- diff --git a/docs/DOC_04_RuntimeUtilities.md b/docs/DOC_04_RuntimeUtilities.md index 2c8b73ab..67ad24ba 100644 --- a/docs/DOC_04_RuntimeUtilities.md +++ b/docs/DOC_04_RuntimeUtilities.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: RUNTIME_UTILITIES -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [normalize_locale, get_system_locale, require_locale_code, __version__, require_date, require_datetime, require_fluent_number] questions: ["where are root-level runtime utility exports documented?", "what package metadata constants are public?", "which boundary validators and locale helpers are exported from the root package?"] diff --git a/docs/DOC_05_Diagnostics.md b/docs/DOC_05_Diagnostics.md index d75acda6..6731725a 100644 --- a/docs/DOC_05_Diagnostics.md +++ b/docs/DOC_05_Diagnostics.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: DIAGNOSTICS -updated: "2026-04-24" +updated: 
"2026-05-01" route: keywords: [ParserAnnotation, ValidationResult, ValidationError, ValidationWarning, DiagnosticCode, DiagnosticFormatter, OutputFormat, SourceSpan] questions: ["what validation result types exist?", "how do I format diagnostics output?", "where are diagnostic codes and source spans documented?"] diff --git a/docs/DOC_05_Errors.md b/docs/DOC_05_Errors.md index 46e89368..672a8b3a 100644 --- a/docs/DOC_05_Errors.md +++ b/docs/DOC_05_Errors.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: ERRORS -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [FrozenFluentError, ErrorCategory, FrozenErrorContext, DataIntegrityError, BabelImportError, ErrorTemplate] questions: ["what errors does FTLLexEngine expose?", "how do parse and format failures surface?", "what integrity exceptions exist?", "how does missing Babel surface?"] @@ -116,7 +116,7 @@ class BabelImportError(ImportError): ### Constraints - Import: `from ftllexengine.introspection import BabelImportError` - Purpose: consistent optional-dependency failure for CLDR-backed features -- Trigger: only for genuinely missing Babel in parser-only installs +- Trigger: only for genuinely missing Babel in parser-only installs; explicit optional runtime imports raise it on first use - Broken-install path: internal Babel import failures bubble their original `ImportError` - Message: instructs callers to install `ftllexengine[babel]` diff --git a/docs/DOC_06_Testing.md b/docs/DOC_06_Testing.md index 955ac0b1..d0dbb2bc 100644 --- a/docs/DOC_06_Testing.md +++ b/docs/DOC_06_Testing.md @@ -1,15 +1,19 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: TESTING -updated: "2026-04-24" +updated: "2026-05-01" route: - keywords: [testing, lint, pytest, fuzz, HypoFuzz, Atheris, test.sh, lint.sh, check.sh] - questions: ["how do I run lint and tests?", "what is the fuzz marker for?", "which scripts drive testing?"] + keywords: [testing, lint, pytest, fuzz, HypoFuzz, 
Atheris, test.sh, lint.sh, check.sh, devcontainer] + questions: ["how do I run lint and tests?", "what is the fuzz marker for?", "which scripts drive testing?", "how do I validate the contributor container?"] --- # Testing Reference +Repository shell entrypoints assume a Bash 5.0+ `bash` on `PATH`. The committed +contributor devcontainer satisfies that automatically; stock macOS `/bin/bash` +3.2 does not. + --- ## `scripts/validate_docs.py` @@ -22,9 +26,10 @@ uv run python scripts/validate_docs.py ``` ### Constraints -- Purpose: parse repository Markdown, run configured Python fences, and validate FTL fences with the project parser +- Purpose: parse repository Markdown, run configured Python fences, execute canonical bash/sh quick-start blocks, and validate FTL fences with the project parser - Coverage: executes the runnable example set configured in `pyproject.toml` -- Failure mode: exits non-zero on invalid snippets, parser errors, or failing Python blocks +- Shell execution: resolves the shell from the active `PATH` and avoids login-shell path rewrites so repo shebangs hit the same toolchain as a human invocation +- Failure mode: exits non-zero on invalid snippets, parser errors, failing Python blocks, or failing shell workflow blocks - Related guard: `tests/test_documentation_tooling.py` verifies the validator configuration --- @@ -46,6 +51,22 @@ uv run python scripts/validate_version.py --- +## `scripts/validate-devcontainer.sh` + +Repository script that validates the committed contributor devcontainer contract. 
+ +### Signature +```bash +./scripts/validate-devcontainer.sh +``` + +### Constraints +- Purpose: verify `.devcontainer/devcontainer.json`, build the contributor image, and smoke-test the required toolchain and writable cache repair +- Coverage: checks the committed devcontainer image rather than a local shell assumption +- Failure mode: exits non-zero when the committed contributor container drifts from the repo's supported native-tooling contract + +--- + ## `scripts/run_examples.py` Repository script that executes every shipped example under the active project interpreter. @@ -73,9 +94,9 @@ Top-level orchestration script for the repository's full quality surface. ``` ### Constraints -- Purpose: run version/docs validation, examples, lint, tests, HypoFuzz preflight, and bounded Atheris checks in one command -- Environment: uses the same Python-versioned uv environment contract as the repo shell gates -- Fuzzing scope: includes corpus health plus short live Atheris smoke runs for graph and introspection targets +- Purpose: canonical full-repository quality gate; runs version/docs validation, examples, lint, tests, HypoFuzz preflight, and bounded Atheris checks in one command +- Environment: must run inside the committed contributor devcontainer and validates that contract before the other gates +- Fuzzing scope: includes corpus health plus a short live Atheris smoke sweep across every target declared in `fuzz_atheris/targets.tsv` --- @@ -105,11 +126,11 @@ Repository lint runner script for the main static-analysis gate. 
``` ### Constraints -- Purpose: Run Ruff then mypy under the repo's expected isolated environment pivot -- Behavior: Pivots to `.venv-3.13` by default; `PY_VERSION` overrides target +- Purpose: run Ruff, mypy, the bare-`noqa` audit, and the explicit repository static validators under the repo's expected isolated environment pivot +- Behavior: Pivots to `.venv-3.13` on the host and `.venv-devcontainer-3.13` inside the contributor container; `PY_VERSION` overrides target - Import mode: keeps `PYTHONPATH` unset so tooling resolves the installed package surface - Output: Quiet-on-success, log-on-fail, agent-oriented summary markers -- Failure mode: exits non-zero on any Ruff or mypy violation +- Failure mode: exits non-zero on any lint, static-validator, or audit violation --- @@ -124,9 +145,9 @@ Repository test runner script for the main correctness gate. ### Constraints - Purpose: Run pytest with the project’s expected environment pivot and reporting -- Behavior: Pivots to `.venv-3.13` by default; `PY_VERSION` overrides target +- Behavior: Pivots to `.venv-3.13` on the host and `.venv-devcontainer-3.13` inside the contributor container; `PY_VERSION` overrides target - Import mode: keeps `PYTHONPATH` unset so tests exercise the installed package surface -- Coverage: Enforces 100% line coverage and 100% branch coverage for `src/ftllexengine` in normal full mode +- Coverage: Enforces the coverage threshold declared in `pyproject.toml` for `src/ftllexengine` in normal full mode - Output: Log-on-fail summary plus structured status markers --- @@ -153,11 +174,11 @@ Repository script for native Atheris/libFuzzer targets. 
### Signature ```bash -./scripts/fuzz_atheris.sh [TARGET | --setup | --list | --corpus | --minimize TARGET FILE | --replay TARGET [DIR] | --report TARGET | --clean TARGET] [OPTIONS] +./scripts/fuzz_atheris.sh [TARGET | --setup [TARGET] | --list | --corpus | --minimize TARGET FILE | --replay TARGET [DIR] | --report TARGET | --clean TARGET] [OPTIONS] ``` ### Constraints - Purpose: Run, replay, list, and minimize Atheris findings -- Behavior: Manages `.venv-atheris` separately from the main project venvs +- Behavior: Requires the contributor devcontainer for native execution, pivots into `.venv-devcontainer-atheris`, and loads targets from `fuzz_atheris/targets.tsv` - Output: Target-oriented CLI workflow around the `fuzz_atheris/` tree - `--list`: shows stored crashes and finding artifacts; use [fuzz_atheris/README.md](../fuzz_atheris/README.md) for the target inventory diff --git a/docs/FUZZING_GUIDE.md b/docs/FUZZING_GUIDE.md index 0ed82cc9..75f06de1 100644 --- a/docs/FUZZING_GUIDE.md +++ b/docs/FUZZING_GUIDE.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: FUZZING -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [fuzzing, HypoFuzz, Atheris, Hypothesis, fuzz_hypofuzz.sh, fuzz_atheris.sh] questions: ["which fuzzer should I use?", "how do I start fuzzing?", "how do I reproduce a fuzz failure?"] @@ -11,28 +11,38 @@ route: # Fuzzing Guide **Purpose**: Choose the right fuzzing entry point and run it with the repo-supported scripts. -**Prerequisites**: Dev environment synced with `uv`; Python 3.13 available locally for Atheris. +**Prerequisites**: Contributor devcontainer for the canonical path. Optional direct host HypoFuzz work also needs `uv sync --group dev --group fuzz` plus a Bash 5.0+ `bash` on `PATH`. ## Overview Use: - `./scripts/fuzz_hypofuzz.sh` for Hypothesis and HypoFuzz property exploration. -- `./scripts/fuzz_atheris.sh` for native Atheris/libFuzzer targets. 
+- `./scripts/fuzz_atheris.sh` for native Atheris/libFuzzer targets inside the contributor devcontainer. ## Fast Start +From the host, use the contributor container for the canonical quick start: + ```bash -./scripts/fuzz_hypofuzz.sh -./scripts/fuzz_hypofuzz.sh --deep --time 300 -./scripts/fuzz_atheris.sh numbers --time 60 +npx --yes @devcontainers/cli up --workspace-folder . +npx --yes @devcontainers/cli exec --workspace-folder . ./scripts/fuzz_hypofuzz.sh --preflight +npx --yes @devcontainers/cli exec --workspace-folder . ./scripts/fuzz_atheris.sh --help ``` +Inside an already-open contributor devcontainer, drop the wrapper and run +`./scripts/fuzz_hypofuzz.sh --preflight` or `./scripts/fuzz_atheris.sh --help` +directly. + +Optional direct-host HypoFuzz runs use `./scripts/fuzz_hypofuzz.sh ...` and +therefore require a Bash 5.0+ `bash` on `PATH`; stock macOS `/bin/bash` 3.2 is +not enough. + ## Choosing A Surface - Prefer HypoFuzz when you are exploring Python-level invariants and stateful/property-based tests. -- Prefer Atheris when you need native-style mutation, corpus management, or target-specific replay/minimization. -- `./scripts/fuzz_atheris.sh --list` inspects stored crashes and finding artifacts; it does not enumerate target names. +- Prefer Atheris when you need native-style mutation, corpus management, or target-specific replay/minimization inside the contributor devcontainer. +- Inside the devcontainer, `./scripts/fuzz_atheris.sh --list` inspects stored crashes and finding artifacts; it does not enumerate target names. 
## Related Guides diff --git a/docs/FUZZING_GUIDE_ATHERIS.md b/docs/FUZZING_GUIDE_ATHERIS.md index 998f8fb7..16ec454b 100644 --- a/docs/FUZZING_GUIDE_ATHERIS.md +++ b/docs/FUZZING_GUIDE_ATHERIS.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: FUZZING -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [atheris, libfuzzer, fuzz_atheris.sh, replay, minimize, corpus] questions: ["how do I run an Atheris target?", "how do I replay a finding?", "how does the Atheris environment get created?"] @@ -11,20 +11,29 @@ route: # Atheris Guide **Purpose**: Run and manage the native Atheris/libFuzzer targets in `fuzz_atheris/`. -**Prerequisites**: Python 3.13 available locally. +**Prerequisites**: The committed contributor devcontainer. ## Common Commands +Inside a contributor devcontainer terminal: + +- `./scripts/fuzz_atheris.sh --help` +- `./scripts/fuzz_atheris.sh --list` + +From the host, run the same entrypoint through the devcontainer wrapper: + ```bash -./scripts/fuzz_atheris.sh --help -./scripts/fuzz_atheris.sh numbers --time 60 -./scripts/fuzz_atheris.sh --list # stored crashes/findings, not target names -./scripts/fuzz_atheris.sh --replay runtime path/to/finding +npx --yes @devcontainers/cli up --workspace-folder . +npx --yes @devcontainers/cli exec --workspace-folder . ./scripts/fuzz_atheris.sh --help +npx --yes @devcontainers/cli exec --workspace-folder . ./scripts/fuzz_atheris.sh --list ``` ## Environment -The script manages `.venv-atheris` itself and keeps it separate from the normal project venvs. If the Atheris environment is missing or built with the wrong Python version, the script recreates it automatically. +The script pivots into the dedicated `.venv-devcontainer-atheris` uv environment inside the devcontainer. 
+Native toolchain ownership lives in the devcontainer image, which provides `CLANG_BIN=/usr/local/bin/clang` +and the LLVM 19 libFuzzer archives that Atheris needs to build; target discovery lives in +`fuzz_atheris/targets.tsv`. ## Useful Operations @@ -33,3 +42,4 @@ The script manages `.venv-atheris` itself and keeps it separate from the normal - `--replay` to replay stored findings without starting a fresh fuzz run. - `--minimize TARGET FILE` to shrink a failing input for one target. - `--corpus` to run the corpus health check. +- `--smoke-all` to run a bounded manifest-driven sweep across every registered target. diff --git a/docs/FUZZING_GUIDE_HYPOFUZZ.md b/docs/FUZZING_GUIDE_HYPOFUZZ.md index 83f0b148..255ca3d5 100644 --- a/docs/FUZZING_GUIDE_HYPOFUZZ.md +++ b/docs/FUZZING_GUIDE_HYPOFUZZ.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: FUZZING -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [hypofuzz, hypothesis, fuzz_hypofuzz.sh, deep mode, preflight, repro] questions: ["how do I run HypoFuzz?", "what does --deep do?", "how do I reproduce a Hypothesis failure?"] @@ -11,15 +11,15 @@ route: # HypoFuzz Guide **Purpose**: Run the property-testing and HypoFuzz entry points shipped by the repository. -**Prerequisites**: `uv sync --group dev --group fuzz`. +**Prerequisites**: Contributor devcontainer for the canonical path, or `uv sync --group dev --group fuzz` plus a Bash 5.0+ `bash` on `PATH` for direct host work. ## Common Commands ```bash -./scripts/fuzz_hypofuzz.sh -./scripts/fuzz_hypofuzz.sh --deep --time 300 -./scripts/fuzz_hypofuzz.sh --preflight -./scripts/fuzz_hypofuzz.sh --repro tests/fuzz/test_runtime_bundle_state_machine.py::test_state_machine +npx --yes @devcontainers/cli up --workspace-folder . +npx --yes @devcontainers/cli exec --workspace-folder . ./scripts/fuzz_hypofuzz.sh --help +npx --yes @devcontainers/cli exec --workspace-folder . 
./scripts/fuzz_hypofuzz.sh --preflight +npx --yes @devcontainers/cli exec --workspace-folder . ./scripts/fuzz_hypofuzz.sh --list ``` ## Modes @@ -31,5 +31,7 @@ route: ## Notes -- The script pivots into `.venv-3.13` by default. +- Inside an already-open contributor devcontainer terminal, drop the wrapper and run `./scripts/fuzz_hypofuzz.sh ...` directly. +- Optional direct host invocations of `./scripts/fuzz_hypofuzz.sh ...` require a Bash 5.0+ `bash` on `PATH`; stock macOS `/bin/bash` 3.2 is unsupported. +- The script pivots into `.venv-3.13` on the host and `.venv-devcontainer-3.13` inside the contributor container. - `--metrics` is intended for metric-focused runs rather than indefinite continuous fuzzing. diff --git a/docs/LOCALE_GUIDE.md b/docs/LOCALE_GUIDE.md index 2280f5e1..892bc279 100644 --- a/docs/LOCALE_GUIDE.md +++ b/docs/LOCALE_GUIDE.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: LOCALE -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [locale, NUMBER, DATETIME, CURRENCY, normalize_locale, get_system_locale, use_isolating] questions: ["why did my number not format?", "what locale string should I use?", "what does use_isolating do?"] @@ -54,6 +54,29 @@ assert isinstance(detected, str) assert detected ``` +## Localization Instances Own Their Fallback Chains + +`FluentLocalization` is not a per-call locale switch. Each instance owns one immutable +fallback chain, so a multi-user app typically caches one instance per supported chain and +selects the instance by request locale. 
+ +```python +from ftllexengine import FluentLocalization + +de_checkout = FluentLocalization(["de_DE", "en_US"]) +de_checkout.add_resource("en_US", "checkout = Checkout") +de_checkout.add_resource("de_DE", "checkout = Kasse") +value, errors = de_checkout.format_value("checkout") +assert errors == () +assert value == "Kasse" + +en_checkout = FluentLocalization(["en_US"]) +en_checkout.add_resource("en_US", "checkout = Checkout") +value, errors = en_checkout.format_value("checkout") +assert errors == () +assert value == "Checkout" +``` + ## Bidi Isolation `use_isolating=True` is the default on bundle and localization classes. It wraps placeables with Unicode bidi isolation marks so interpolated values do not corrupt surrounding RTL/LTR text. Keep it enabled for UI output unless you know the output will stay LTR-only and you need plain strings for logging or snapshot assertions. diff --git a/docs/MIGRATION.md b/docs/MIGRATION.md index 709b0459..ebd338be 100644 --- a/docs/MIGRATION.md +++ b/docs/MIGRATION.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: MIGRATION -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [migration, fluent.runtime, FluentBundle, FluentLocalization, strict mode] questions: ["how do I migrate from fluent.runtime?", "what changes when I switch to FTLLexEngine?"] diff --git a/docs/PARSING_GUIDE.md b/docs/PARSING_GUIDE.md index f17aa47a..a61fcf72 100644 --- a/docs/PARSING_GUIDE.md +++ b/docs/PARSING_GUIDE.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: PARSING -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [parsing, parse_decimal, parse_currency, parse_date, parse_datetime, parse_fluent_number] questions: ["how do I parse localized user input?", "how do I do roundtrip formatting and parsing?", "what do parse errors look like?"] @@ -34,6 +34,35 @@ assert errors == () assert delivery_date.isoformat() == "2026-03-15" ``` +The parsing boundary also 
normalizes invisible bidi control marks and locale-native +digit sets. Strings copied from RTL UI output, including Arabic-Indic digits or +Fluent isolation marks, can be parsed directly without pre-cleaning. + +## Ambiguous Currency Symbols + +Some currency symbols map to multiple ISO codes. `"$4.25"` is not enough information by +itself because `$` is used by several currencies. FTLLexEngine returns a structured parse +error instead of guessing. + +```python +from decimal import Decimal +from ftllexengine.parsing import parse_currency + +money, errors = parse_currency("$4.25", "en_US") +assert money is None +assert errors + +money, errors = parse_currency("$4.25", "en_US", infer_from_locale=True) +assert errors == () +assert money == (Decimal("4.25"), "USD") +``` + +For user-entered form fields, choose one rule and apply it consistently: + +- If the request locale is authoritative, use `infer_from_locale=True`. +- If the field contract is fixed, pass `default_currency="USD"` or another ISO code. +- If users can submit mixed currencies, require ISO codes like `USD 4.25` or `4,25 EUR`. + ## FluentNumber Parsing `parse_fluent_number()` returns a `FluentNumber`, preserving both the numeric value and the localized display string. 
diff --git a/docs/QUICK_REFERENCE.md b/docs/QUICK_REFERENCE.md index 0e939394..7f76ed58 100644 --- a/docs/QUICK_REFERENCE.md +++ b/docs/QUICK_REFERENCE.md @@ -1,26 +1,34 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: REFERENCE -updated: "2026-04-24" +updated: "2026-05-01" route: - keywords: [quick reference, cheat sheet, fluentbundle, fluentlocalization, parsing, validation, boot] - questions: ["show me the common commands", "what is the smallest working example?", "how do I boot localization safely?"] + keywords: [quick reference, cheat sheet, fluentbundle, fluentlocalization, parsing, validation, boot, strict mode] + questions: ["show me the common patterns", "smallest working example", "how do I boot localization safely?", "strict vs soft mode"] --- # FTLLexEngine Quick Reference +Common patterns, copy-paste ready. For full workflows and explanations, see [WORKFLOW_TOUR.md](WORKFLOW_TOUR.md). + +--- + ## Install ```bash -# Full runtime (locale formatting, localization, parsing) +# Full runtime (locale formatting, localization, bidirectional parsing) uv add ftllexengine[babel] +# or: pip install "ftllexengine[babel]" -# Parser-only (syntax, AST, validation, introspection) +# Parser only (syntax, AST, validation, introspection — zero Babel dependency) uv add ftllexengine +# or: pip install ftllexengine ``` -## Format One Message +--- + +## Format one message ```python from ftllexengine import FluentBundle @@ -32,7 +40,9 @@ assert errors == () assert result == "Hello, Alice!" 
``` -## Multi-Locale Fallback +--- + +## Multi-locale fallback ```python from ftllexengine import FluentLocalization @@ -45,22 +55,75 @@ assert errors == () assert result == "Apmaksa" ``` -## Parse Localized Input +--- + +## Parse localized user input ```python from decimal import Decimal -from ftllexengine.parsing import parse_currency, parse_decimal +from ftllexengine.parsing import parse_currency, parse_decimal, parse_date +# Number amount, errors = parse_decimal("12,450.50", "en_US") assert errors == () assert amount == Decimal("12450.50") +# Currency money, errors = parse_currency("12.450,50 EUR", "de_DE", default_currency="EUR") assert errors == () assert money == (Decimal("12450.50"), "EUR") + +# Date +date, errors = parse_date("2026年3月15日", "ja_JP") +assert errors == () +assert date.isoformat() == "2026-03-15" +``` + +### Ambiguous currency symbols + +```python +from decimal import Decimal +from ftllexengine.parsing import parse_currency + +money, errors = parse_currency("$4.25", "en_US") +assert money is None +assert errors + +money, errors = parse_currency("$4.25", "en_US", infer_from_locale=True) +assert errors == () +assert money == (Decimal("4.25"), "USD") +``` + +Use `infer_from_locale=True` when the request locale is authoritative, or pass a fixed +`default_currency="USD"` when the field contract is already known. 
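One way to apply a single currency rule consistently is to resolve the keyword arguments once per form field. The sketch below is illustrative only: the policy names and the `currency_parse_kwargs` helper are hypothetical, and only the keyword names `infer_from_locale` and `default_currency` are taken from the examples above.

```python
from typing import Optional


def currency_parse_kwargs(policy: str, fixed_code: Optional[str] = None) -> dict:
    """Map a form-field policy (hypothetical names) to parse_currency keywords."""
    if policy == "locale-authoritative":
        # The request locale decides what an ambiguous symbol means.
        return {"infer_from_locale": True}
    if policy == "fixed-contract":
        if fixed_code is None:
            raise ValueError("fixed-contract policy needs an ISO 4217 code")
        return {"default_currency": fixed_code}
    if policy == "explicit-iso":
        # No inference: an ambiguous symbol such as "$" stays a parse error.
        return {}
    raise ValueError(f"unknown policy: {policy!r}")
```

A caller would then pass the result through, for example `parse_currency(text, locale, **currency_parse_kwargs("fixed-contract", "USD"))`.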
+ +--- + +## Strict mode vs soft mode + +```python +from ftllexengine import FluentBundle, FormattingIntegrityError + +# Default strict=True: raises on any resolution error +bundle = FluentBundle("en_US", use_isolating=False) +bundle.add_resource('confirm = { $bags } bags at { CURRENCY($price, currency: "USD") }/lb') + +try: + bundle.format_pattern("confirm", {"bags": 500}) # $price missing +except FormattingIntegrityError as e: + print(e.message_id) # "confirm" + print(e.fallback_value) # "500 bags at {!CURRENCY}/lb" + +# strict=False: errors returned as data, not raised +soft = FluentBundle("en_US", strict=False, use_isolating=False) +soft.add_resource("confirm = { $bags } bags") +result, errors = soft.format_pattern("confirm", {}) +assert errors # structured error list ``` -## Validate FTL Before Loading +--- + +## Validate FTL before loading ```python from ftllexengine import validate_resource @@ -70,7 +133,9 @@ assert result.is_valid assert result.error_count == 0 ``` -## Boot Validation +--- + +## Boot validation ```python from pathlib import Path @@ -89,12 +154,19 @@ with TemporaryDirectory() as tmp: message_schemas={"welcome": {"name"}}, required_messages=frozenset({"welcome"}), ) + + # Full boot: returns localization object, load summary, and schema results l10n, summary, schema_results = cfg.boot() assert summary.all_clean assert schema_results[0].is_valid + + # Simple boot: returns only the localization object + # l10n = cfg.boot_simple() ``` -## Register A Custom Function +--- + +## Register a custom function ```python from ftllexengine import FluentBundle @@ -110,12 +182,20 @@ assert errors == () assert result == "COFFEE" ``` -## Clear Module Caches +--- + +## Introspect a message contract ```python -from ftllexengine import clear_module_caches +from ftllexengine import FluentBundle -clear_module_caches() -clear_module_caches(frozenset({"parsing.dates", "locale"})) -# Unknown selector names raise ValueError instead of being ignored. 
+bundle = FluentBundle("en_US", use_isolating=False) +bundle.add_resource( + 'order = { $buyer } buys { $bags } bags at { CURRENCY($price, currency: "USD") }/lb' +) + +info = bundle.introspect_message("order") +assert info.get_variable_names() == frozenset({"buyer", "bags", "price"}) +assert info.get_function_names() == frozenset({"CURRENCY"}) +assert info.requires_variable("price") is True ``` diff --git a/docs/RELEASE_PROTOCOL.md b/docs/RELEASE_PROTOCOL.md index 4609bec5..c355177a 100644 --- a/docs/RELEASE_PROTOCOL.md +++ b/docs/RELEASE_PROTOCOL.md @@ -1,17 +1,17 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: RELEASE -updated: "2026-04-24" +updated: "2026-05-01" route: - keywords: [release, gh, github release, pypi, tag, assets, publish, verify, worktree, main] + keywords: [release, gh, github release, pypi, tag, assets, publish, verify, clone, main] questions: ["how do I cut a release?", "how do I publish GitHub assets?", "how do I verify a release handoff?", "how do I rerun publish for an existing tag?"] --- # Release Protocol **Purpose**: Publish a tagged FTLLexEngine release through GitHub CLI and verify the GitHub Release and PyPI handoff. -**Prerequisites**: `gh` installed and authenticated, `uv` installed, the target release version chosen, and a checkout topology that can produce a clean release payload. +**Prerequisites**: `gh` installed and authenticated, `uv` installed, Docker plus either the Dev Containers IDE integration or `npx --yes @devcontainers/cli`, the target release version chosen, and a checkout topology that can produce a clean release clone. ## Overview @@ -51,10 +51,15 @@ Rules: - If the primary checkout is clean and current enough for release work, release from it directly. 
- If the primary checkout is intentionally dirty, contains unrelated unpublished work, or should - not be disturbed, create a clean release worktree from the same repository and do release work + not be disturbed, create a clean release clone from the same repository and do release work there. - Do not run the release from a dirty checkout just because the intended payload currently lives there. +- Do not use `git worktree` for release pre-flight in this repository. The contributor + devcontainer mounts the workspace folder only, and the release gates include git-aware checks + that call `git ls-files` inside that container. A worktree `.git` indirection points back to a + host path outside the mounted workspace and causes those checks to fail for topology reasons + rather than release reasons. - If `git fetch origin --tags` fails with `would clobber existing tag`, stop and inspect the tag divergence before continuing. Compare the local and remote tag directly, delete only the stale local tag, and rerun the tag fetch: @@ -67,24 +72,29 @@ git tag -d "$TAG" git fetch origin --tags ``` -Recommended clean-worktree flow: +Recommended clean-clone flow: ```bash PRIMARY_CHECKOUT="$(git rev-parse --show-toplevel)" +PRIMARY_ORIGIN_URL="$(git -C "$PRIMARY_CHECKOUT" remote get-url origin)" git fetch origin --prune git fetch origin --tags -RELEASE_WORKTREE="$(mktemp -d -t ftllexengine-release-XXXXXX)" -git worktree add --detach "$RELEASE_WORKTREE" origin/main -cd "$RELEASE_WORKTREE" +RELEASE_CLONE="$(mktemp -d -t ftllexengine-release-XXXXXX)" +git clone --branch main "$PRIMARY_CHECKOUT" "$RELEASE_CLONE" +cd "$RELEASE_CLONE" +git remote set-url origin "$PRIMARY_ORIGIN_URL" +git fetch origin --prune +git fetch origin --tags +git switch --detach origin/main ``` -This flow intentionally keeps the worktree detached during pre-flight. Create +This flow intentionally keeps the clone detached during pre-flight. Create `release/X.Y.Z` only after Step 2 passes. 
If the unpublished release payload exists only in the dirty primary checkout, move it explicitly -before running release gates in the clean worktree. Preferred: create a local bootstrap branch that -captures the payload, then add the release worktree from that branch. Acceptable: export one -explicit patch and apply it inside the release worktree. +before running release gates in the clean clone. Preferred: create a local bootstrap branch that +captures the payload, then clone from the primary checkout after that bootstrap commit exists. +Acceptable: export one explicit patch and apply it inside the clean clone. Bootstrap-branch example: @@ -92,13 +102,19 @@ Bootstrap-branch example: git switch -c codex/release-bootstrap-X.Y.Z git add -A git commit -m "release: bootstrap X.Y.Z payload" -RELEASE_WORKTREE="$(mktemp -d -t ftllexengine-release-XXXXXX)" -git worktree add --detach "$RELEASE_WORKTREE" codex/release-bootstrap-X.Y.Z -cd "$RELEASE_WORKTREE" +PRIMARY_CHECKOUT="$(git rev-parse --show-toplevel)" +PRIMARY_ORIGIN_URL="$(git -C "$PRIMARY_CHECKOUT" remote get-url origin)" +RELEASE_CLONE="$(mktemp -d -t ftllexengine-release-XXXXXX)" +git clone --branch codex/release-bootstrap-X.Y.Z "$PRIMARY_CHECKOUT" "$RELEASE_CLONE" +cd "$RELEASE_CLONE" +git remote set-url origin "$PRIMARY_ORIGIN_URL" +git fetch origin --prune +git fetch origin --tags +git switch --detach codex/release-bootstrap-X.Y.Z ``` If the bootstrap payload intentionally left the final release version or changelog entry unresolved, -finish those edits inside the clean release worktree before Step 2. Treat the clean worktree as the +finish those edits inside the clean release clone before Step 2. Treat the clean clone as the authoritative place to finalize `pyproject.toml`, versioned markdown frontmatter, lockfiles, and the target `CHANGELOG.md` release entry. 
@@ -109,14 +125,18 @@ Run the local gates first: ```bash gh pr list --state open \ --json number,title,url,headRefName,mergeStateStatus,isDraft,author,statusCheckRollup -bash -n scripts/*.sh -./check.sh -PY_VERSION=3.14 ./scripts/lint.sh -PY_VERSION=3.14 ./scripts/test.sh -uv run python scripts/validate_docs.py -uv run python scripts/validate_version.py -uv build -tar -tzf "dist/ftllexengine-X.Y.Z.tar.gz" | rg '(^|/)AGENTS\\.md$|(^|/)\\.codex/' || true +npx --yes @devcontainers/cli up --workspace-folder . +npx --yes @devcontainers/cli exec --workspace-folder . bash -lc 'bash -n check.sh scripts/*.sh' +npx --yes @devcontainers/cli exec --workspace-folder . ./check.sh +npx --yes @devcontainers/cli exec --workspace-folder . bash -lc ' + set -euo pipefail + PY_VERSION=3.14 ./scripts/lint.sh + PY_VERSION=3.14 ./scripts/test.sh + uv run --group dev --python 3.14 python scripts/validate_docs.py + uv run --group dev --python 3.14 python scripts/validate_version.py + uv build +' +tar -tzf "dist/ftllexengine-X.Y.Z.tar.gz" | grep -E '(^|/)AGENTS\.md$|(^|/)\.codex/' || true python - <<'PY' import zipfile from pathlib import Path @@ -347,7 +367,7 @@ Requirements: - The remote `release/X.Y.Z` branch is gone. - No stale historical `release/` branches remain locally or remotely. -- If a dedicated release worktree was used, the primary checkout is explicitly returned to a +- If a dedicated release clone was used, the primary checkout is explicitly returned to a truthful `main`: ```bash @@ -357,4 +377,4 @@ git -C "$PRIMARY_CHECKOUT" pull --ff-only - Any still-needed unpublished local work from the old primary checkout is moved to a named branch or exported patch. -- Disposable release worktrees are removed after the release closes. +- Disposable release clones are removed after the release closes. 
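Note on the sdist leak check in the pre-flight gates above: the hunk replaces the `rg` call, whose pattern used doubled backslashes inside single quotes, with `grep -E '(^|/)AGENTS\.md$|(^|/)\.codex/'`. A minimal Python sketch of the same filter follows, assuming only the pattern shown in the diff. The helper name `find_leaked_paths` and the sample listing are hypothetical and are not part of the repository.

```python
import re

# Same extended-regex pattern as the grep -E call in the release gates:
# AGENTS.md at any directory depth, or anything under a .codex/ directory.
LEAK_PATTERN = re.compile(r"(^|/)AGENTS\.md$|(^|/)\.codex/")


def find_leaked_paths(member_names):
    """Return archive member names matching the leak pattern (hypothetical helper)."""
    return [name for name in member_names if LEAK_PATTERN.search(name)]


# Illustrative listing, shaped like `tar -tzf` output; these paths are made up.
listing = [
    "ftllexengine-X.Y.Z/pyproject.toml",
    "ftllexengine-X.Y.Z/AGENTS.md",
    "ftllexengine-X.Y.Z/.codex/AGENTS_EXTRA.md",
    "ftllexengine-X.Y.Z/docs/MY_AGENTS.md",
]
leaked = find_leaked_paths(listing)
for name in leaked:
    print(f"unexpected agent file in sdist: {name}")
```

An empty result corresponds to the `grep` pipeline printing nothing. The `(^|/)` anchor is what keeps a file such as `MY_AGENTS.md` from being flagged.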
diff --git a/docs/TERMINOLOGY.md b/docs/TERMINOLOGY.md index fd70ebbc..18c9ff1c 100644 --- a/docs/TERMINOLOGY.md +++ b/docs/TERMINOLOGY.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: TERMINOLOGY -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [terminology, glossary, message, term, resource, locale code, strict mode] questions: ["what does resource mean here?", "what is the difference between a message and a term?", "what does strict mode mean in FTLLexEngine?"] diff --git a/docs/THREAD_SAFETY.md b/docs/THREAD_SAFETY.md index a6d27431..10680192 100644 --- a/docs/THREAD_SAFETY.md +++ b/docs/THREAD_SAFETY.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: ARCHITECTURE -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [thread safety, concurrency, FluentBundle, FluentLocalization, AsyncFluentBundle, shared bundle] questions: ["is FluentBundle thread-safe?", "can I share a localization object across threads?", "what does AsyncFluentBundle do?"] diff --git a/docs/TYPE_HINTS_GUIDE.md b/docs/TYPE_HINTS_GUIDE.md index df517597..569dd52b 100644 --- a/docs/TYPE_HINTS_GUIDE.md +++ b/docs/TYPE_HINTS_GUIDE.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: TYPE_HINTS -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [type hints, mypy, FluentValue, ParseResult, TypeIs, LocaleCode] questions: ["what types does the library expose?", "how do I type parse results?", "which helpers are type guards?"] diff --git a/docs/VALIDATION_GUIDE.md b/docs/VALIDATION_GUIDE.md index 364ce5c7..d1acff5b 100644 --- a/docs/VALIDATION_GUIDE.md +++ b/docs/VALIDATION_GUIDE.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: VALIDATION -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [validation, validate_resource, ValidationResult, require_clean, boot validation, message schemas] questions: ["how do I validate FTL before 
loading it?", "how do I fail fast at startup?", "how do I validate message variables?"] diff --git a/docs/WORKFLOW_TOUR.md b/docs/WORKFLOW_TOUR.md index 5a2ee3a3..d1f90bc7 100644 --- a/docs/WORKFLOW_TOUR.md +++ b/docs/WORKFLOW_TOUR.md @@ -1,97 +1,281 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: GUIDE -updated: "2026-04-24" +updated: "2026-05-01" route: - keywords: [workflow tour, deeper readme material, multi-locale, streaming resources, async bundle, boot validation, introspection] - questions: ["where did the deeper README workflows move?", "how do I see FTLLexEngine end-to-end workflows?", "which docs cover streaming, async, and boot validation together?"] + keywords: [workflow tour, multi-locale, bidirectional parsing, boot validation, thread safety, async, introspection, streaming] + questions: ["how do I use FTLLexEngine end-to-end?", "multi-locale formatting example", "how do I parse localized user input?", "boot validation example", "thread-safe formatting", "async bundle example"] --- # FTLLexEngine Workflow Tour -**Purpose**: Preserve the deeper workflows that do not belong in the storefront `README.md` while keeping them easy to find and grounded in runnable examples. -**Prerequisites**: Full runtime install (`ftllexengine[babel]`) for formatting, localization, localized parsing, and ISO metadata lookups. +This guide shows FTLLexEngine working as a full stack — format outbound, parse inbound, validate at boot — across the scenarios where it earns its place. Prerequisites: full runtime install (`uv add ftllexengine[babel]`) for all sections except introspection and validation, which work with the parser-only install. -## Overview +--- + +## Format for multiple locales + +Same message template, three markets. Translators maintain one `.ftl` file per locale. Your code stays the same. 
+ +**English — New York buyer:** + +```python +from decimal import Decimal +from ftllexengine import FluentBundle + +bundle = FluentBundle("en_US", use_isolating=False) +bundle.add_resource(""" +shipment-line = { $bags -> + [0] No bags shipped + [one] 1 bag of { $origin } coffee + *[other] { $bags } bags of { $origin } coffee +} + +invoice-total = Total: { CURRENCY($amount, currency: "USD") } +""") + +result, _ = bundle.format_pattern("shipment-line", {"bags": 500, "origin": "Colombian"}) +assert result == "500 bags of Colombian coffee" + +result, _ = bundle.format_pattern("invoice-total", {"amount": Decimal("187500.00")}) +assert result == "Total: $187,500.00" +``` + +**German — Hamburg buyer:** + +```python +from decimal import Decimal +from ftllexengine import FluentBundle + +bundle_de = FluentBundle("de_DE", use_isolating=False) +bundle_de.add_resource(""" +shipment-line = { $bags -> + [0] Keine Saecke versandt + [one] 1 Sack { $origin } Kaffee + *[other] { $bags } Saecke { $origin } Kaffee +} + +invoice-total = Gesamt: { CURRENCY($amount, currency: "EUR") } +""") + +result, _ = bundle_de.format_pattern("shipment-line", {"bags": 500, "origin": "kolumbianischer"}) +assert result == "500 Saecke kolumbianischer Kaffee" + +result, _ = bundle_de.format_pattern("invoice-total", {"amount": Decimal("187500.00")}) +assert result == "Gesamt: 187.500,00\u00a0€" # CLDR: non-breaking space before symbol +``` + +**Japanese — Tokyo buyer:** + +```python +from decimal import Decimal +from ftllexengine import FluentBundle + +bundle_ja = FluentBundle("ja_JP", use_isolating=False) +bundle_ja.add_resource(""" +shipment-line = { $bags -> + [0] 出荷なし + *[other] { $origin }コーヒー { $bags }袋 +} + +invoice-total = 合計:{ CURRENCY($amount, currency: "JPY") } +""") + +result, _ = bundle_ja.format_pattern("shipment-line", {"bags": 500, "origin": "コロンビア"}) +assert result == "コロンビアコーヒー 500袋" + +result, _ = bundle_ja.format_pattern("invoice-total", {"amount": Decimal("28125000")}) +assert result 
== "合計:¥28,125,000" +``` -The root `README.md` is the front window: short promise, shortest credible example, and quick next steps. This guide keeps the richer material that matters once you want to evaluate FTLLexEngine as a real working stack instead of a headline. +Add a new market: add one `.ftl` file. Zero code changes. -The library is strongest when you need one coherent path for: +→ See [LOCALE_GUIDE.md](LOCALE_GUIDE.md) for fallback chains and multi-locale orchestration. -- formatting Fluent messages with locale-aware numbers, dates, and currency, -- parsing localized user input back into exact Python values, -- validating resources before traffic, -- and keeping those operations safe in threaded or asyncio applications. +--- + +## Parse localized user input + +Most libraries only format outbound data. FTLLexEngine also parses inbound user input back to exact Python types. + +```python +from decimal import Decimal +from ftllexengine.parsing import parse_currency, parse_date, parse_decimal + +# German user enters a bid price +bid, errors = parse_currency("12.450,00 EUR", "de_DE", default_currency="EUR") +if not errors: + amount, currency = bid # (Decimal("12450.00"), "EUR") + +# Colombian user enters an ask +ask, errors = parse_currency("45.000.000 COP", "es_CO", default_currency="COP") +if not errors: + amount, currency = ask # (Decimal("45000000"), "COP") + +# Japanese user enters a delivery date +date, errors = parse_date("2026年3月15日", "ja_JP") +assert not errors +assert date.isoformat() == "2026-03-15" + +# US user enters a weight +weight, errors = parse_decimal("12,450.50", "en_US") +assert not errors +assert weight == Decimal("12450.50") +``` + +Parse errors come back as structured data, not exceptions: + +```python +from ftllexengine.parsing import parse_decimal + +price, errors = parse_decimal("twelve thousand", "en_US") +assert price is None +assert errors +print(errors[0]) # "Failed to parse decimal 'twelve thousand' for locale 'en_US': ..." 
+``` + +**Decimal precision throughout.** Float math fails for money: `0.1 + 0.2 = 0.30000000000000004`. FTLLexEngine uses `Decimal` everywhere — parse, format, and arithmetic stay exact. + +```python +from decimal import Decimal +from ftllexengine.parsing import parse_currency + +price, _ = parse_currency("$4.25", "en_US", default_currency="USD") +price_per_lb, _ = price # Decimal("4.25") -## Where The Deeper Material Lives +bags, lbs_per_bag = 500, Decimal("132") +contract_value = bags * lbs_per_bag * price_per_lb +assert contract_value == Decimal("280500.00") # exact, every time +``` + +## Wire a request flow without ambient locale tricks -| Topic moved out of the storefront | Best current home | -|:----------------------------------|:------------------| -| Smallest working setup | [QUICK_REFERENCE.md](QUICK_REFERENCE.md) | -| Multi-locale fallback chains | [examples/locale_fallback.py](../examples/locale_fallback.py) and [LOCALE_GUIDE.md](LOCALE_GUIDE.md) | -| Parsing localized input | [PARSING_GUIDE.md](PARSING_GUIDE.md) and [examples/bidirectional_formatting.py](../examples/bidirectional_formatting.py) | -| Thread-safe shared bundles | [THREAD_SAFETY.md](THREAD_SAFETY.md) and [examples/thread_safety.py](../examples/thread_safety.py) | -| Async applications | [examples/async_bundle.py](../examples/async_bundle.py) | -| Streaming resource loading | [examples/streaming_resources.py](../examples/streaming_resources.py) and [DOC_03_Parsing.md](DOC_03_Parsing.md) | -| Message introspection | [examples/parser_only.py](../examples/parser_only.py) and [DOC_04_Introspection.md](DOC_04_Introspection.md) | -| Startup and schema validation | [VALIDATION_GUIDE.md](VALIDATION_GUIDE.md) and [QUICK_REFERENCE.md](QUICK_REFERENCE.md) | -| Currency and territory metadata | [DOC_04_Introspection.md](DOC_04_Introspection.md) | -| Symbol-by-symbol API routing | [DOC_00_Index.md](DOC_00_Index.md) | +For request-driven apps, treat locale selection as startup wiring, not as a per-call 
flag on +one global localization object. -## One Runtime For Format And Parse +- One `FluentLocalization` instance owns one fallback chain. +- Build or cache one localization per supported chain, then choose the instance from the + request locale. +- For ambiguous money inputs such as `"$4.25"`, use `infer_from_locale=True` or an + explicit `default_currency`. -The core value proposition from the old root README still stands: the same locale theory can format outbound text and parse inbound user input, so the invoice you emit and the reply you accept do not drift into separate rule systems. +→ See [PARSING_GUIDE.md](PARSING_GUIDE.md) for the full parsing API and locale-specific edge cases. -- For the fastest copy-paste path, use [QUICK_REFERENCE.md](QUICK_REFERENCE.md). -- For a fuller parsing walkthrough, use [PARSING_GUIDE.md](PARSING_GUIDE.md). -- For runnable end-to-end examples, use [examples/quickstart.py](../examples/quickstart.py) and [examples/bidirectional_formatting.py](../examples/bidirectional_formatting.py). +--- -## Stream Resources Without Building One Giant String +## Validate at startup, not at request time -`add_resource_stream()` and `parse_stream_ftl()` let you work from line iterators instead of pre-assembling the entire source in memory first. +`LocalizationBootConfig` loads all resources, checks that required messages exist, and validates message schemas before the application accepts any traffic. If anything is wrong, it raises before the first request — not during one. 
```python from pathlib import Path from tempfile import TemporaryDirectory +from ftllexengine import LocalizationBootConfig -from ftllexengine import FluentBundle, parse_stream_ftl +with TemporaryDirectory() as tmp: + base = Path(tmp) / "locales" + for locale, label in {"en_us": "Total", "de_de": "Gesamt", "ja_jp": "合計"}.items(): + locale_dir = base / locale + locale_dir.mkdir(parents=True) + (locale_dir / "invoice.ftl").write_text( + f'invoice-total = {label}: {{ CURRENCY($amount, currency: "USD") }}\n', + encoding="utf-8", + ) + (locale_dir / "shipment.ftl").write_text( + 'shipment-line = { $bags } bags of { $origin }\n', + encoding="utf-8", + ) + + cfg = LocalizationBootConfig.from_path( + locales=("en_US", "de_DE", "ja_JP"), + resource_ids=("invoice.ftl", "shipment.ftl"), + base_path=base / "{locale}", + message_schemas={ + "invoice-total": {"amount"}, + "shipment-line": {"bags", "origin"}, + }, + required_messages=frozenset({"invoice-total", "shipment-line"}), + ) + + l10n, summary, schema_results = cfg.boot() + assert summary.all_clean + assert all(r.is_valid for r in schema_results) +``` + +Use `boot_simple()` when you only need the localization object: + +```python +from pathlib import Path +from tempfile import TemporaryDirectory +from ftllexengine import LocalizationBootConfig with TemporaryDirectory() as tmp: - source_path = Path(tmp) / "messages.ftl" - source_path.write_text( - "hello = Hello from orbit\n" - "status = Cargo ready\n", - encoding="utf-8", + base = Path(tmp) / "locales" + locale_dir = base / "en_us" + locale_dir.mkdir(parents=True) + (locale_dir / "main.ftl").write_text("welcome = Hello\n", encoding="utf-8") + + cfg = LocalizationBootConfig.from_path( + locales=("en_US",), + resource_ids=("main.ftl",), + base_path=base / "{locale}", + required_messages=frozenset({"welcome"}), ) + l10n = cfg.boot_simple() + assert l10n is not None +``` - bundle = FluentBundle("en_US", use_isolating=False) - with source_path.open(encoding="utf-8") as handle: 
- junk = bundle.add_resource_stream(handle, source_path=str(source_path)) - assert junk == () +→ See [VALIDATION_GUIDE.md](VALIDATION_GUIDE.md) and [DATA_INTEGRITY_ARCHITECTURE.md](DATA_INTEGRITY_ARCHITECTURE.md). - status, errors = bundle.format_pattern("status") - assert errors == () - assert status == "Cargo ready" +--- - with source_path.open(encoding="utf-8") as handle: - entry_ids = [entry.id.name for entry in parse_stream_ftl(handle)] - assert entry_ids == ["hello", "status"] +## Handle concurrent requests + +Python's `locale` module uses global state. Setting a locale in one thread affects every other thread. FTLLexEngine bundles are isolated — no global state, no locks you manage. + +```python +from concurrent.futures import ThreadPoolExecutor +from decimal import Decimal +from ftllexengine import FluentBundle + +de_bundle = FluentBundle("de_DE", use_isolating=False) +es_bundle = FluentBundle("es_CO", use_isolating=False) +ja_bundle = FluentBundle("ja_JP", use_isolating=False) + +ftl = 'confirm = { CURRENCY($amount, currency: "USD") } per { $unit }' +for b in (de_bundle, es_bundle, ja_bundle): + b.add_resource(ftl) + +def format_confirmation(bundle, amount, unit): + result, _ = bundle.format_pattern("confirm", {"amount": amount, "unit": unit}) + return result + +with ThreadPoolExecutor(max_workers=100) as executor: + futures = [ + executor.submit(format_confirmation, de_bundle, Decimal("4.25"), "lb"), + executor.submit(format_confirmation, es_bundle, Decimal("4.25"), "lb"), + executor.submit(format_confirmation, ja_bundle, Decimal("4.25"), "lb"), + ] + results = [f.result() for f in futures] + # de_DE: "4,25 $ per lb", es_CO: "US$4,25 per lb", ja_JP: "$4.25 per lb" ``` -For a runnable script that also shows streamed localization loads, use [examples/streaming_resources.py](../examples/streaming_resources.py). +Multiple threads can format messages simultaneously. Adding resources or functions acquires exclusive access briefly. You do not manage any of this. 
-## Use Async Bundles In Event-Loop Applications +→ See [THREAD_SAFETY.md](THREAD_SAFETY.md) and [examples/thread_safety.py](../examples/thread_safety.py). -`AsyncFluentBundle` keeps the Fluent runtime behavior but offloads mutation and formatting work through `asyncio.to_thread()`, which is the right fit when your application is already organized around async request handling. +--- + +## Use async bundles in event-loop applications + +`AsyncFluentBundle` keeps the same strict-mode guarantees but offloads mutations and formatting through `asyncio.to_thread()`, keeping the event loop free. ```python import asyncio from decimal import Decimal - from ftllexengine import AsyncFluentBundle - async def main() -> None: async with AsyncFluentBundle("en_US", use_isolating=False) as bundle: await bundle.add_resource( @@ -108,15 +292,51 @@ async def main() -> None: ) assert [text for text, _ in results] == ["Count: 0", "Count: 1", "Count: 2"] - asyncio.run(main()) ``` -For a fuller runnable script, use [examples/async_bundle.py](../examples/async_bundle.py). +→ See [examples/async_bundle.py](../examples/async_bundle.py). + +--- + +## Stream resources without loading the whole file + +`add_resource_stream()` and `parse_stream_ftl()` accept any line iterator. Useful for large `.ftl` files or network streams where building one giant string first is not practical. 
+ +```python +from pathlib import Path +from tempfile import TemporaryDirectory +from ftllexengine import FluentBundle, parse_stream_ftl + +with TemporaryDirectory() as tmp: + source_path = Path(tmp) / "messages.ftl" + source_path.write_text( + "hello = Hello\n" + "status = Cargo ready\n", + encoding="utf-8", + ) + + bundle = FluentBundle("en_US", use_isolating=False) + with source_path.open(encoding="utf-8") as handle: + junk = bundle.add_resource_stream(handle, source_path=str(source_path)) + assert junk == () + + status, errors = bundle.format_pattern("status") + assert errors == () + assert status == "Cargo ready" + + with source_path.open(encoding="utf-8") as handle: + entry_ids = [entry.id.name for entry in parse_stream_ftl(handle)] + assert entry_ids == ["hello", "status"] +``` + +→ See [examples/streaming_resources.py](../examples/streaming_resources.py). + +--- -## Introspect Message Contracts Before Formatting +## Inspect message contracts before formatting -The message-introspection APIs are the pre-flight surface: inspect required variables and called functions before a live format call, or use the same metadata to generate forms, validation rules, or build-time checks. +Query what variables and functions a message requires before you call `format_pattern()`. Useful for pre-flight checks, auto-generating input fields, or catching missing variables at build time. ```python from ftllexengine import FluentBundle @@ -130,49 +350,46 @@ info = bundle.introspect_message("contract") assert info.get_variable_names() == frozenset({"buyer", "amount", "ship_date"}) assert info.get_function_names() == frozenset({"CURRENCY", "DATETIME"}) assert info.has_selectors is False +assert info.requires_variable("amount") is True ``` -If you only need parsing, validation, and introspection without the Babel-backed runtime, start with [examples/parser_only.py](../examples/parser_only.py). 
+→ See [DOC_04_Introspection.md](DOC_04_Introspection.md) and [examples/parser_only.py](../examples/parser_only.py). -## Validate Before Traffic - -The fail-fast startup path also remains important. `LocalizationBootConfig.boot()` is the canonical way to prove that required resources loaded cleanly and that required message contracts exist before the application starts serving requests. - -- Use [VALIDATION_GUIDE.md](VALIDATION_GUIDE.md) for the startup pattern. -- Use [DATA_INTEGRITY_ARCHITECTURE.md](DATA_INTEGRITY_ARCHITECTURE.md) for the underlying fail-fast model. -- Use [QUICK_REFERENCE.md](QUICK_REFERENCE.md) for the shortest runnable boot snippet. - -## Query Territory And Currency Metadata +--- -The ISO and CLDR-backed helper layer stays useful when product decisions depend on territory defaults or currency precision. +## Query territory and currency metadata ```python from ftllexengine.introspection import get_currency, get_territory_currencies +# What currency does Japan use? assert get_territory_currencies("JP") == ("JPY",) +# How many decimal places for yen? yen = get_currency("JPY") -assert yen is not None -assert yen.decimal_digits == 0 -``` - -For the full set of helpers, use [DOC_04_Introspection.md](DOC_04_Introspection.md). 
+assert yen.decimal_digits == 0 # no decimal places -## Surface Map +# Compare to Colombian peso +cop = get_currency("COP") +assert cop.decimal_digits == 2 -| Surface | Use it for | Install mode | -|:--------|:-----------|:-------------| -| Syntax and validation | Parse, transform, serialize, and validate `.ftl` resources | Parser-only | -| Runtime | `FluentBundle`, built-in functions, locale-aware formatting | Full runtime | -| Localization | `FluentLocalization`, fallback chains, loaders, boot validation | Mixed | -| Parsing | Localized numbers, dates, datetimes, and currency back to Python values | Full runtime | -| Introspection and analysis | Message variables, references, dependency graphs, ISO helpers | Mixed | -| Diagnostics and integrity | Structured errors, strict mode, audit evidence, immutable failure data | Parser-only | +# Multi-currency territories +assert get_territory_currencies("PA") == ("PAB", "USD") +``` -Use [DOC_00_Index.md](DOC_00_Index.md) when you need the exact symbol home instead of the high-level subsystem map. +→ See [DOC_04_Introspection.md](DOC_04_Introspection.md). -## Good Fit Versus Simpler Fit +--- -- Strong fit: Fluent-based applications, invoice and checkout flows, localized forms, startup validation for translation packs, and systems that care about exact decimals. -- Strong fit: Teams that want message grammar, formatting rules, parsing rules, and startup checks to stay in one coherent runtime instead of drifting between template helpers and request-time patches. -- Simpler fit: single-locale applications, plain string formatting, or projects that do not need Fluent resources at all. 
+## Go deeper + +| Topic | Best home | +|:------|:----------| +| Copy-paste patterns | [QUICK_REFERENCE.md](QUICK_REFERENCE.md) | +| Locale fallback chains | [LOCALE_GUIDE.md](LOCALE_GUIDE.md) · [examples/locale_fallback.py](../examples/locale_fallback.py) | +| Full parsing API | [PARSING_GUIDE.md](PARSING_GUIDE.md) · [examples/bidirectional_formatting.py](../examples/bidirectional_formatting.py) | +| Thread safety details | [THREAD_SAFETY.md](THREAD_SAFETY.md) · [examples/thread_safety.py](../examples/thread_safety.py) | +| Boot validation and strict mode | [VALIDATION_GUIDE.md](VALIDATION_GUIDE.md) · [DATA_INTEGRITY_ARCHITECTURE.md](DATA_INTEGRITY_ARCHITECTURE.md) | +| Symbol-by-symbol API | [DOC_00_Index.md](DOC_00_Index.md) | +| Custom functions | [CUSTOM_FUNCTIONS_GUIDE.md](CUSTOM_FUNCTIONS_GUIDE.md) | +| Type hints | [TYPE_HINTS_GUIDE.md](TYPE_HINTS_GUIDE.md) | diff --git a/examples/README.md b/examples/README.md index e34d369e..e8622431 100644 --- a/examples/README.md +++ b/examples/README.md @@ -1,17 +1,17 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: EXAMPLES -updated: "2026-04-24" +updated: "2026-05-01" route: - keywords: [examples, quickstart, parser-only, localization, custom functions, thread safety, benchmarks] - questions: ["what examples are available?", "how do I run the examples?", "which example should I start with?"] + keywords: [examples, quickstart, parser-only, localization, custom functions, thread safety, streaming, benchmarks] + questions: ["what examples are available?", "how do I run the examples?", "which example should I start with?", "which example covers streaming or parsing?"] --- # FTLLexEngine Examples **Purpose**: Show which runnable example scripts ship with the repository and what each one demonstrates. -**Prerequisites**: Development environment synced with `uv sync --group dev`. +**Prerequisites**: Development environment synced with `uv sync --group dev`, or the contributor devcontainer. 
## Overview diff --git a/examples/README_TYPE_CHECKING.md b/examples/README_TYPE_CHECKING.md index cf675031..f9b9274c 100644 --- a/examples/README_TYPE_CHECKING.md +++ b/examples/README_TYPE_CHECKING.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: EXAMPLES -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [examples, mypy, type checking, strict, explicit ownership, thread safety] questions: ["how do I type-check the examples?", "what mypy config do the examples use?", "how do the examples stay strict without local stubs?"] diff --git a/examples/quickstart.py b/examples/quickstart.py index d1941e78..99dfe2c7 100644 --- a/examples/quickstart.py +++ b/examples/quickstart.py @@ -443,6 +443,7 @@ def describe_path(self, locale: str, resource_id: str) -> str: latest_entry = audit_log[-1] print(f" latest_audit_operation: {latest_entry.operation}") print(f" latest_audit_sequence: {latest_entry.sequence}") + print(f" latest_cache_sequence: {latest_entry.cache_sequence}") print("\n" + "=" * 50) print("[SUCCESS] All examples completed successfully!") diff --git a/fuzz_atheris/README.md b/fuzz_atheris/README.md index f616f585..6713eb9b 100644 --- a/fuzz_atheris/README.md +++ b/fuzz_atheris/README.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.165.0" +version: "0.166.0" domain: FUZZING -updated: "2026-04-24" +updated: "2026-05-01" route: keywords: [atheris, fuzz inventory, fuzz targets, libfuzzer, corpus] questions: ["what do the Atheris fuzzers cover?", "which targets exist?", "how do I map a target name to a file?"] @@ -10,39 +10,52 @@ route: # Atheris Target Inventory +The executable target registry lives in `targets.tsv`. This table is the human-readable mirror of that manifest. 
+ ## Summary | Target | File | Concern | |:-------|:-----|:--------| -| `bridge` | `fuzz_bridge.py` | Function bridge and registry | -| `builtins` | `fuzz_builtins.py` | Built-in formatting functions | +| `bridge` | `fuzz_bridge.py` | FunctionRegistry bridge machinery | +| `builtins` | `fuzz_builtins.py` | Built-in function Babel boundary | | `cache` | `fuzz_cache.py` | Cache concurrency and audit behavior | | `currency` | `fuzz_currency.py` | Currency formatting oracle | | `cursor` | `fuzz_cursor.py` | Cursor and parse-position helpers | -| `dates` | `fuzz_dates.py` | Locale-aware date/datetime parsing | -| `diagnostics_formatter` | `fuzz_diagnostics_formatter.py` | Diagnostic formatter output | +| `dates` | `fuzz_dates.py` | Locale-aware date and datetime parsing | +| `diagnostics_formatter` | `fuzz_diagnostics_formatter.py` | Diagnostic formatter output and escaping | | `graph` | `fuzz_graph.py` | Dependency graph algorithms | -| `integrity` | `fuzz_integrity.py` | Integrity and validation surfaces | -| `introspection` | `fuzz_introspection.py` | Message introspection | -| `iso` | `fuzz_iso.py` | ISO lookup/introspection | -| `locale_context` | `fuzz_locale_context.py` | LocaleContext formatting paths | -| `localization` | `fuzz_localization.py` | `FluentLocalization` orchestration | +| `integrity` | `fuzz_integrity.py` | Semantic validation and data integrity | +| `introspection` | `fuzz_introspection.py` | Message introspection and reference extraction | +| `iso` | `fuzz_iso.py` | ISO lookup and introspection APIs | +| `locale_context` | `fuzz_locale_context.py` | LocaleContext direct formatting API | +| `localization` | `fuzz_localization.py` | FluentLocalization orchestration | | `lock` | `fuzz_lock.py` | RWLock contention behavior | | `numbers` | `fuzz_numbers.py` | Number formatting oracle | | `oom` | `fuzz_oom.py` | Parser object-density limits | | `parse_currency` | `fuzz_parse_currency.py` | Currency parsing and symbol resolution | -| `parse_decimal` | 
`fuzz_parse_decimal.py` | Decimal and FluentNumber parsing | +| `parse_decimal` | `fuzz_parse_decimal.py` | Decimal parsing and FluentNumber parsing | | `plural` | `fuzz_plural.py` | CLDR plural category boundaries | -| `roundtrip` | `fuzz_roundtrip.py` | Parser/serializer roundtrip | -| `runtime` | `fuzz_runtime.py` | End-to-end runtime behavior | +| `roundtrip` | `fuzz_roundtrip.py` | Parser and serializer roundtrip | +| `runtime` | `fuzz_runtime.py` | End-to-end runtime behavior and strict mode | | `scope` | `fuzz_scope.py` | Variable scoping invariants | | `serializer` | `fuzz_serializer.py` | AST-construction serializer paths | | `structured` | `fuzz_structured.py` | Structure-aware parser stress | ## How To Run +Inside a contributor devcontainer terminal: + +- `./scripts/fuzz_atheris.sh --help` +- `./scripts/fuzz_atheris.sh --list` + +From the host, use: + ```bash -./scripts/fuzz_atheris.sh numbers --time 60 -./scripts/fuzz_atheris.sh --list # stored crashes/findings, not target names -./scripts/fuzz_atheris.sh --replay runtime path/to/finding +npx --yes @devcontainers/cli up --workspace-folder . +npx --yes @devcontainers/cli exec --workspace-folder . ./scripts/fuzz_atheris.sh --help +npx --yes @devcontainers/cli exec --workspace-folder . ./scripts/fuzz_atheris.sh --list ``` + +For a concrete target run from the host: + +- `npx --yes @devcontainers/cli exec --workspace-folder . ./scripts/fuzz_atheris.sh numbers --time 60` diff --git a/fuzz_atheris/__init__.py b/fuzz_atheris/__init__.py deleted file mode 100644 index c32c42ce..00000000 --- a/fuzz_atheris/__init__.py +++ /dev/null @@ -1,12 +0,0 @@ -"""Atheris fuzz targets for parser security testing. - -This package contains Atheris-based fuzz targets for detecting crashes -and performance issues in the FTL parser. Requires Atheris installation. 
-
-Targets:
-    stability.py - Detects unexpected exceptions (byte-level chaos)
-    structured.py - Detects crashes via grammar-aware generation
-    perf.py - Detects algorithmic complexity issues (ReDoS)
-
-See docs/FUZZING_GUIDE.md for usage.
-"""
diff --git a/fuzz_atheris/fuzz_atheris_replay_finding.py b/fuzz_atheris/fuzz_atheris_replay_finding.py
index c817c206..d53a5477 100644
--- a/fuzz_atheris/fuzz_atheris_replay_finding.py
+++ b/fuzz_atheris/fuzz_atheris_replay_finding.py
@@ -5,7 +5,7 @@
 runs the parse-serialize-reparse cycle using the production parser/serializer,
 and reports whether the finding reproduces WITHOUT Atheris instrumentation.
 
-This script runs in the main project venv (not .venv-atheris). If a finding
+This script runs in the main project venv, not the dedicated Atheris environment. If a finding
 reproduces here, it is a real parser/serializer bug. If it does NOT reproduce,
 it may be an Atheris str-hook instrumentation artifact.
 
diff --git a/fuzz_atheris/fuzz_bridge.py b/fuzz_atheris/fuzz_bridge.py
index bc84c0a7..5d7ee9ba 100644
--- a/fuzz_atheris/fuzz_bridge.py
+++ b/fuzz_atheris/fuzz_bridge.py
@@ -1,1386 +1,9 @@
 #!/usr/bin/env python3
-# FUZZ_PLUGIN_HEADER_START
-# FUZZ_PLUGIN: bridge - FunctionRegistry Bridge Machinery
-# Intentional: This header is intentionally placed for dynamic plugin discovery.
-# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH
-# FUZZ_PLUGIN_HEADER_END
-"""FunctionRegistry Bridge Machinery Fuzzer (Atheris).
-
-Targets: ftllexengine.runtime.function_bridge, ftllexengine.core.value_types
-(FunctionRegistry, FunctionSignature, FluentNumber, make_fluent_number,
-fluent_function decorator, parameter mapping, locale injection)
-
-Concern boundary: This fuzzer stress-tests the bridge machinery that connects
-FTL function calls to Python implementations.
Distinct from fuzz_builtins which -tests built-in functions (NUMBER, DATETIME, CURRENCY) through the bridge; this -fuzzer tests the bridge itself: -- FunctionRegistry.register() with varied function signatures -- Parameter mapping: _to_camel_case conversion and custom param_map -- FunctionRegistry.call() dispatch with adversarial arguments -- Locale injection protocol (fluent_function decorator) -- FunctionSignature construction and immutability -- FluentNumber object contracts (str, hash, contains, len, repr) -- make_fluent_number() visible-precision inference and typed construction -- Dict-like registry interface (__iter__, __contains__, __len__, has_function) -- Freeze/copy lifecycle and isolation -- Metadata API (get_expected_positional_args, get_builtin_metadata) -- Signature validation error paths (arity, collision, auto-naming) -- Adversarial Python objects (evil __str__, __hash__, recursive structures) -- Error wrapping (TypeError/ValueError -> FrozenFluentError) - -Shared infrastructure imported from fuzz_common (BaseFuzzerState, metrics, -reporting); domain-specific metrics tracked in BridgeMetrics dataclass. -Pattern selection uses deterministic round-robin through a pre-built weighted -schedule (select_pattern_round_robin), immune to coverage-guided mutation bias. -Periodic gc.collect() every 256 iterations and -rss_limit_mb=4096 default. - -Requires Python 3.13+ (uses PEP 695 type aliases). 
-""" +"""FunctionRegistry bridge machinery Atheris entry wrapper.""" from __future__ import annotations -import argparse -import atexit -import contextlib -import gc -import logging -import pathlib -import sys -import time -from dataclasses import dataclass -from decimal import Decimal -from typing import TYPE_CHECKING, Any - -if TYPE_CHECKING: - from collections.abc import Sequence - -# --- Dependency Checks --- -_psutil_mod: Any = None -_atheris_mod: Any = None - -try: # noqa: SIM105 - need module ref for check_dependencies - import psutil as _psutil_mod # type: ignore[no-redef] -except ImportError: - pass - -try: # noqa: SIM105 - need module ref for check_dependencies - import atheris as _atheris_mod # type: ignore[no-redef] -except ImportError: - pass - -from fuzz_common import ( # noqa: E402 - after dependency capture # pylint: disable=C0413 - GC_INTERVAL, - BaseFuzzerState, - build_base_stats_dict, - build_weighted_schedule, - check_dependencies, - emit_checkpoint_report, - emit_final_report, - get_process, - print_fuzzer_banner, - record_iteration_metrics, - record_memory, - run_fuzzer, - select_pattern_round_robin, -) - -check_dependencies(["psutil", "atheris"], [_psutil_mod, _atheris_mod]) - -import atheris # noqa: E402, I001 # pylint: disable=C0412,C0413 - - -# --- Domain Metrics --- - - -@dataclass -class BridgeMetrics: - """Domain-specific metrics for bridge fuzzer.""" - - # Registration tests - register_calls: int = 0 - register_failures: int = 0 - - # Call dispatch - call_dispatch_tests: int = 0 - call_dispatch_errors: int = 0 - - # FluentNumber contract checks - fluent_number_checks: int = 0 - make_fluent_number_checks: int = 0 - - # Camel case conversions - camel_case_tests: int = 0 - - # Freeze/copy operations - freeze_copy_tests: int = 0 - - # Locale injection tests - locale_injection_tests: int = 0 - - # Signature validation - signature_validation_tests: int = 0 - - # Metadata API tests - metadata_api_tests: int = 0 - - # Evil object tests - 
evil_object_tests: int = 0 - - -# --- Global State --- - -_state = BaseFuzzerState( - fuzzer_name="bridge", - fuzzer_target="FunctionRegistry, FunctionSignature, FluentNumber, make_fluent_number", -) -_domain = BridgeMetrics() - -# Pattern weights: (name, weight) -# 16 patterns across 4 categories: -# REGISTRATION (4): register_basic, register_signatures, param_mapping_custom, -# signature_validation -# CONTRACTS (4): fluent_number_contracts, make_fluent_number_api, -# signature_immutability, camel_case_conversion -# DISPATCH (4): call_dispatch, locale_injection, error_wrapping, evil_objects -# INTROSPECTION (4): dict_interface, freeze_copy_lifecycle, fluent_function_decorator, -# metadata_api -_PATTERN_WEIGHTS: tuple[tuple[str, int], ...] = ( - # REGISTRATION - ("register_basic", 10), - ("register_signatures", 12), - ("param_mapping_custom", 8), - ("signature_validation", 6), - # CONTRACTS - ("fluent_number_contracts", 12), - ("make_fluent_number_api", 10), - ("signature_immutability", 5), - ("camel_case_conversion", 10), - # DISPATCH - ("call_dispatch", 12), - ("locale_injection", 10), - ("error_wrapping", 7), - ("evil_objects", 5), - # INTROSPECTION - ("dict_interface", 8), - ("freeze_copy_lifecycle", 8), - ("fluent_function_decorator", 8), - ("metadata_api", 6), -) - -_PATTERN_SCHEDULE: tuple[str, ...] 
= build_weighted_schedule( - [name for name, _ in _PATTERN_WEIGHTS], - [weight for _, weight in _PATTERN_WEIGHTS], -) - -# Register intended weights for skew detection -_state.pattern_intended_weights = {name: float(weight) for name, weight in _PATTERN_WEIGHTS} - - -class BridgeFuzzError(Exception): - """Raised when a bridge invariant is breached.""" - - -# Allowed exceptions from bridge operations -_ALLOWED_EXCEPTIONS = ( - ValueError, - TypeError, - OverflowError, - ArithmeticError, - RecursionError, - RuntimeError, -) - - -# --- Reporting --- - -_REPORT_DIR = pathlib.Path(".fuzz_atheris_corpus") / "bridge" - - -def _build_stats_dict() -> dict[str, Any]: - """Build complete stats dictionary including domain metrics.""" - stats = build_base_stats_dict(_state) - - # Registration - stats["register_calls"] = _domain.register_calls - stats["register_failures"] = _domain.register_failures - - # Call dispatch - stats["call_dispatch_tests"] = _domain.call_dispatch_tests - stats["call_dispatch_errors"] = _domain.call_dispatch_errors - - # FluentNumber - stats["fluent_number_checks"] = _domain.fluent_number_checks - stats["make_fluent_number_checks"] = _domain.make_fluent_number_checks - - # Camel case - stats["camel_case_tests"] = _domain.camel_case_tests - - # Freeze/copy - stats["freeze_copy_tests"] = _domain.freeze_copy_tests - - # Locale injection - stats["locale_injection_tests"] = _domain.locale_injection_tests - - # Signature validation - stats["signature_validation_tests"] = _domain.signature_validation_tests - - # Metadata API - stats["metadata_api_tests"] = _domain.metadata_api_tests - - # Evil objects - stats["evil_object_tests"] = _domain.evil_object_tests - - return stats - - -_REPORT_FILENAME = "fuzz_bridge_report.json" - - -def _emit_checkpoint() -> None: - """Emit periodic checkpoint (uses checkpoint markers).""" - stats = _build_stats_dict() - emit_checkpoint_report( - _state, - stats, - _REPORT_DIR, - _REPORT_FILENAME, - ) - - -def _emit_report() -> 
None: - """Emit comprehensive final report (crash-proof).""" - stats = _build_stats_dict() - emit_final_report(_state, stats, _REPORT_DIR, _REPORT_FILENAME) - - -atexit.register(_emit_report) - - -# --- Suppress logging and instrument imports --- -logging.getLogger("ftllexengine").setLevel(logging.CRITICAL) - -with atheris.instrument_imports(include=["ftllexengine"]): - from ftllexengine.core.value_types import make_fluent_number - from ftllexengine.diagnostics.errors import FrozenFluentError - from ftllexengine.runtime.bundle import FluentBundle - from ftllexengine.runtime.cache_config import CacheConfig - from ftllexengine.runtime.function_bridge import ( - FluentNumber, - FunctionRegistry, - fluent_function, - ) - from ftllexengine.runtime.functions import ( - create_default_registry, - get_shared_registry, - ) - - -# --- Constants --- - -_LOCALES: Sequence[str] = ( - "en", - "en_US", - "de", - "de_DE", - "ar", - "ar_SA", - "ja", - "ja_JP", - "fr", - "fr_FR", - "ru", -) - -# Snake_case names for _to_camel_case testing -_SNAKE_CASE_NAMES: Sequence[str] = ( - "minimum_fraction_digits", - "maximum_fraction_digits", - "use_grouping", - "date_style", - "time_style", - "currency_display", - "value", - "x", - "_private_param", - "__dunder_param", - "a_b_c_d_e", - "already_camel", - "", - "_", - "__", - "___", - "UPPER_CASE", - "mixed_Case_Style", - "single", -) - -# Expected camelCase conversions for invariant checking -_CAMEL_EXPECTED: dict[str, str] = { - "minimum_fraction_digits": "minimumFractionDigits", - "maximum_fraction_digits": "maximumFractionDigits", - "use_grouping": "useGrouping", - "value": "value", - "x": "x", - "single": "single", -} - - -def _pick_locale(fdp: atheris.FuzzedDataProvider) -> str: - """Pick locale: 90% valid, 10% fuzzed.""" - if fdp.ConsumeIntInRange(0, 9) < 9: - return fdp.PickValueInList(list(_LOCALES)) - return fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(0, 20)) - - -def _group_ascii_thousands(value: int) -> str: - """Render 
an integer with ASCII comma grouping.""" - digits = str(abs(value)) - groups: list[str] = [] - while digits: - groups.append(digits[-3:]) - digits = digits[:-3] - grouped = ",".join(reversed(groups)) - return f"-{grouped}" if value < 0 else grouped - - -def _call_make_fluent_number( - value: int | Decimal, - *, - formatted: str | None = None, -) -> FluentNumber: - """Call make_fluent_number and fail hard on unexpected valid-input errors.""" - try: - return make_fluent_number(value, formatted=formatted) - except (TypeError, ValueError) as err: - msg = ( - "make_fluent_number unexpectedly rejected a valid contract input: " - f"value={value!r}, formatted={formatted!r}, error={err}" - ) - raise BridgeFuzzError(msg) from err - - -# --- Pattern Implementations --- -# REGISTRATION (4 patterns) - - -def _pattern_register_basic(fdp: atheris.FuzzedDataProvider) -> None: - """Basic function registration: name generation, simple callables.""" - _domain.register_calls += 1 - reg = FunctionRegistry() - num_funcs = fdp.ConsumeIntInRange(1, 5) - - for i in range(num_funcs): - - def make_fn(idx: int) -> Any: - def fn(_value: Any) -> str: - return f"result_{idx}" - - fn.__name__ = f"test_func_{idx}" - return fn - - func = make_fn(i) - ftl_name = f"FUNC{i}" if fdp.ConsumeBool() else None - reg.register(func, ftl_name=ftl_name) - - # Invariant: len matches registration count - if len(reg) != num_funcs: - msg = f"Registry len {len(reg)} != expected {num_funcs}" - raise BridgeFuzzError(msg) - - -def _pattern_register_signatures(fdp: atheris.FuzzedDataProvider) -> None: - """Registration with various Python function signatures.""" - _domain.register_calls += 1 - reg = FunctionRegistry() - variant = fdp.ConsumeIntInRange(0, 6) - - match variant: - case 0: - # Positional-only params - def pos_only(value: Any, /) -> str: - return str(value) - - reg.register(pos_only, ftl_name="POS_ONLY") - - case 1: - # Keyword-only params - def kw_only(value: Any, *, style: str = "default") -> str: - 
return f"{value}_{style}" - - reg.register(kw_only, ftl_name="KW_ONLY") - result = reg.call("KW_ONLY", [42], {"style": "custom"}) - if "42" not in str(result): - msg = f"KW_ONLY result missing value: {result}" - raise BridgeFuzzError(msg) - - case 2: - # *args function - def varargs(*args: Any) -> str: - return "_".join(str(a) for a in args) - - reg.register(varargs, ftl_name="VARARGS") - n = fdp.ConsumeIntInRange(0, 5) - positional = [fdp.ConsumeIntInRange(0, 100) for _ in range(n)] - reg.call("VARARGS", positional, {}) - - case 3: - # **kwargs function - def kwargs_fn(value: Any, **kwargs: Any) -> str: - return f"{value}_{len(kwargs)}" - - reg.register(kwargs_fn, ftl_name="KWARGS_FN") - named = {f"key{i}": i for i in range(fdp.ConsumeIntInRange(0, 5))} - reg.call("KWARGS_FN", ["hello"], named) - - case 4: - # Function with many parameters (auto-mapping stress) - def many_params( - value: Any, - *, - minimum_fraction_digits: int = 0, - maximum_fraction_digits: int = 3, - use_grouping: bool = True, - currency_display: str = "symbol", - ) -> str: - return str(value) - - reg.register(many_params, ftl_name="MANY") - info = reg.get_function_info("MANY") - if info is None: - msg = "get_function_info returned None for registered function" - raise BridgeFuzzError(msg) - # Verify param_mapping includes all snake_case -> camelCase - mapping_dict = dict(info.param_mapping) - if "minimumFractionDigits" not in mapping_dict: - msg = f"Missing camelCase mapping: {mapping_dict}" - raise BridgeFuzzError(msg) - - case 5: - # Duplicate registration (should overwrite) - def fn_v1(_value: Any) -> str: - return "v1" - - def fn_v2(_value: Any) -> str: - return "v2" - - fn_v2.__name__ = "fn_v1" - reg.register(fn_v1, ftl_name="DUP") - reg.register(fn_v2, ftl_name="DUP") - result = reg.call("DUP", ["x"], {}) - if str(result) != "v2": - msg = f"Duplicate registration did not overwrite: got {result}" - raise BridgeFuzzError(msg) - - case _: - # Lambda registration - reg.register(str, 
ftl_name="LAMBDA") - reg.call("LAMBDA", [42], {}) - - -def _pattern_param_mapping_custom(fdp: atheris.FuzzedDataProvider) -> None: - """Custom param_map overrides auto-generated mappings.""" - _domain.register_calls += 1 - reg = FunctionRegistry() - - def target_fn(value: Any, *, minimum_fraction_digits: int = 0) -> str: - return str(value) - - variant = fdp.ConsumeIntInRange(0, 2) - - if variant == 0: - # Custom mapping overrides auto-generated - custom_map = {"customName": "minimum_fraction_digits"} - reg.register(target_fn, ftl_name="CUSTOM_MAP", param_map=custom_map) - result = reg.call("CUSTOM_MAP", [42], {"customName": 2}) - if "42" not in str(result): - msg = f"Custom param_map call failed: {result}" - raise BridgeFuzzError(msg) - - elif variant == 1: - # Empty custom map (auto-generation only) - reg.register(target_fn, ftl_name="EMPTY_MAP", param_map={}) - info = reg.get_function_info("EMPTY_MAP") - if info is None or len(info.param_mapping) == 0: - msg = "Empty param_map should still have auto-generated mappings" - raise BridgeFuzzError(msg) - - else: - # Fuzzed param_map keys - fuzzed_key = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(1, 30)) - custom_map = {fuzzed_key: "minimum_fraction_digits"} - reg.register(target_fn, ftl_name="FUZZ_MAP", param_map=custom_map) - with contextlib.suppress(Exception): - reg.call("FUZZ_MAP", [1], {fuzzed_key: 2}) - - -def _pattern_signature_validation(fdp: atheris.FuzzedDataProvider) -> None: - """Test registration error paths: locale arity, collision, auto-naming.""" - _domain.signature_validation_tests += 1 - reg = FunctionRegistry() - variant = fdp.ConsumeIntInRange(0, 3) - - match variant: - case 0: - # inject_locale with insufficient positional params -> TypeError - @fluent_function(inject_locale=True) - def bad_fn(value: Any) -> str: - return str(value) - - try: - reg.register(bad_fn, ftl_name="BAD_LOCALE") - msg = "inject_locale with 1 positional param did not raise TypeError" - raise BridgeFuzzError(msg) - 
except TypeError: - _domain.register_failures += 1 - - case 1: - # Underscore collision detection -> ValueError - def colliding( - value: Any, - *, - _data: int = 0, - data: int = 0, - ) -> str: - return str(value) - - try: - reg.register(colliding, ftl_name="COLLIDE") - msg = "Underscore collision did not raise ValueError" - raise BridgeFuzzError(msg) - except ValueError: - _domain.register_failures += 1 - - case 2: - # Auto-naming from __name__ (ftl_name=None) - def my_custom_function(value: Any) -> str: - return str(value) - - reg.register(my_custom_function) - if "MY_CUSTOM_FUNCTION" not in reg: - msg = "Auto-naming failed: MY_CUSTOM_FUNCTION not in registry" - raise BridgeFuzzError(msg) - - case _: - # inject_locale=True with *args function (should succeed) - @fluent_function(inject_locale=True) - def varargs_locale(*args: Any) -> str: - return str(args) - - reg.register(varargs_locale, ftl_name="VARARGS_LOCALE") - if not reg.should_inject_locale("VARARGS_LOCALE"): - msg = "varargs function with inject_locale not detected" - raise BridgeFuzzError(msg) - - -# CONTRACTS (4 patterns) - - -def _pattern_fluent_number_contracts(fdp: atheris.FuzzedDataProvider) -> None: - """FluentNumber object contracts: str, repr, precision, frozen.""" - _domain.fluent_number_checks += 1 - variant = fdp.ConsumeIntInRange(0, 3) - - match variant: - case 0: - # Basic construction and str - fn = FluentNumber(value=Decimal("1234.56"), formatted="1,234.56", precision=2) - if str(fn) != "1,234.56": - msg = f"FluentNumber str() = '{fn}', expected '1,234.56'" - raise BridgeFuzzError(msg) - - case 1: - # repr includes value info - fn = FluentNumber(value=Decimal("99.9"), formatted="99.9", precision=1) - r = repr(fn) - if "99.9" not in r: - msg = f"FluentNumber repr missing value: {r}" - raise BridgeFuzzError(msg) - - case 2: - # Precision can be None - fn = FluentNumber(value=42, formatted="42", precision=None) - if fn.precision is not None: - msg = "FluentNumber precision should be None" - 
raise BridgeFuzzError(msg) - if str(fn) != "42": - msg = f"FluentNumber str() with None precision = '{fn}'" - raise BridgeFuzzError(msg) - - case _: - # Frozen: attribute assignment should fail - fn = FluentNumber(value=1, formatted="1", precision=0) - try: - fn.value = 999 # type: ignore[misc] - msg = "FluentNumber is not frozen: attribute assignment succeeded" - raise BridgeFuzzError(msg) - except AttributeError: - pass # Expected: frozen dataclass - - -def _check_make_fluent_number_default_decimal( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Default Decimal formatting preserves trailing-zero precision.""" - int_part = fdp.ConsumeIntInRange(-9999, 9999) - frac_core = str(fdp.ConsumeIntInRange(1, 999)).zfill(3) - trailing_zeros = "0" * fdp.ConsumeIntInRange(1, 4) - value = Decimal(f"{int_part}.{frac_core}{trailing_zeros}") - fn = _call_make_fluent_number(value) - expected_precision = len(frac_core) + len(trailing_zeros) - if fn.formatted != str(value) or fn.precision != expected_precision: - msg = ( - "make_fluent_number(default Decimal) did not preserve " - f"string/precision: {fn!r} vs value={value!r}, " - f"expected_precision={expected_precision}" - ) - raise BridgeFuzzError(msg) - - -def _check_make_fluent_number_default_int( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Default integer formatting exposes zero visible decimals.""" - value = fdp.ConsumeIntInRange(-1_000_000, 1_000_000) - fn = _call_make_fluent_number(value) - if fn.formatted != str(value) or fn.precision != 0: - msg = ( - "make_fluent_number(int) did not preserve zero visible precision: " - f"{fn!r} for value={value}" - ) - raise BridgeFuzzError(msg) - - -def _check_make_fluent_number_fractional_int( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Explicit fractional formatting controls visible precision for ints.""" - value = fdp.ConsumeIntInRange(-999, 999) - frac_digits = "0" * fdp.ConsumeIntInRange(1, 4) - formatted = f"{value}.{frac_digits}" - fn = 
_call_make_fluent_number(value, formatted=formatted) - if fn.formatted != formatted or fn.precision != len(frac_digits): - msg = ( - "make_fluent_number(explicit fractional int) miscomputed precision: " - f"{fn!r} for formatted={formatted!r}" - ) - raise BridgeFuzzError(msg) - - -def _check_make_fluent_number_grouped_int( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Grouping separators do not create decimal precision.""" - value = fdp.ConsumeIntInRange(1_000, 999_999) - formatted = _group_ascii_thousands(value) - fn = _call_make_fluent_number(value, formatted=formatted) - if fn.formatted != formatted or fn.precision != 0: - msg = ( - "make_fluent_number(grouped int) treated grouping as decimals: " - f"{fn!r} for formatted={formatted!r}" - ) - raise BridgeFuzzError(msg) - - -def _check_make_fluent_number_localized_decimal( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Localized formatted strings drive visible precision inference.""" - whole = fdp.ConsumeIntInRange(1, 9999) - precision = fdp.ConsumeIntInRange(1, 4) - fraction = str(fdp.ConsumeIntInRange(0, (10**precision) - 1)).zfill(precision) - value = Decimal(f"{whole}.{fraction}") - grouped = _group_ascii_thousands(whole).replace(",", " ") - formatted = f"{grouped},{fraction} EUR" - fn = _call_make_fluent_number(value, formatted=formatted) - if fn.formatted != formatted or fn.precision != precision: - msg = ( - "make_fluent_number(localized decimal) miscomputed visible precision: " - f"{fn!r} for formatted={formatted!r}" - ) - raise BridgeFuzzError(msg) - - -def _check_make_fluent_number_disambiguation( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Formatted decimals for integer values are not mistaken for grouping.""" - precision = fdp.ConsumeIntInRange(1, 4) - separator = fdp.PickValueInList([",", "."]) - zeros = "0" * precision - value = fdp.PickValueInList([1, -1]) - formatted = f"{value}{separator}{zeros}" - fn = _call_make_fluent_number(value, formatted=formatted) - if fn.formatted != 
formatted or fn.precision != precision: - msg = ( - "make_fluent_number(disambiguation) lost decimal precision: " - f"{fn!r} for formatted={formatted!r}" - ) - raise BridgeFuzzError(msg) - - -def _check_make_fluent_number_bool_rejection( - _fdp: atheris.FuzzedDataProvider, -) -> None: - """Bool inputs are rejected like direct FluentNumber construction.""" - try: - make_fluent_number(True) - except TypeError: - return - msg = "make_fluent_number(bool) should raise TypeError" - raise BridgeFuzzError(msg) - - -def _pattern_make_fluent_number_api(fdp: atheris.FuzzedDataProvider) -> None: - """make_fluent_number derives visible precision from domain values.""" - _domain.make_fluent_number_checks += 1 - handlers = ( - _check_make_fluent_number_default_decimal, - _check_make_fluent_number_default_int, - _check_make_fluent_number_fractional_int, - _check_make_fluent_number_grouped_int, - _check_make_fluent_number_localized_decimal, - _check_make_fluent_number_disambiguation, - _check_make_fluent_number_bool_rejection, - ) - handler = handlers[fdp.ConsumeIntInRange(0, len(handlers) - 1)] - handler(fdp) - - -def _pattern_signature_immutability(fdp: atheris.FuzzedDataProvider) -> None: - """Verify FunctionSignature immutability and param_mapping tuple type.""" - reg = create_default_registry() - func_name = fdp.PickValueInList(["NUMBER", "DATETIME", "CURRENCY"]) - info = reg.get_function_info(func_name) - - if info is None: - msg = f"{func_name} FunctionSignature is None" - raise BridgeFuzzError(msg) - - # param_mapping should be tuple of tuples (immutable) - if not isinstance(info.param_mapping, tuple): - msg = f"param_mapping is {type(info.param_mapping)}, expected tuple" - raise BridgeFuzzError(msg) - - for pair in info.param_mapping: - if not isinstance(pair, tuple) or len(pair) != 2: - msg = f"param_mapping entry is not (str, str): {pair}" - raise BridgeFuzzError(msg) - - # FunctionSignature should be frozen - try: - info.ftl_name = "HACKED" # type: ignore[misc] - msg = 
"FunctionSignature is not frozen" - raise BridgeFuzzError(msg) - except AttributeError: - pass # Expected - - # Callable should be present - if not callable(info.callable): - msg = "FunctionSignature callable is not callable" - raise BridgeFuzzError(msg) - - # ftl_name should match what we queried - if info.ftl_name != func_name: - msg = f"FunctionSignature.ftl_name = '{info.ftl_name}', expected '{func_name}'" - raise BridgeFuzzError(msg) - - # Fuzzed: try getting info for nonexistent function - if fdp.ConsumeBool(): - fuzzed = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(1, 20)) - bad_info = reg.get_function_info(fuzzed) - if bad_info is not None and fuzzed not in ("NUMBER", "DATETIME", "CURRENCY"): - msg = f"get_function_info returned non-None for unknown '{fuzzed}'" - raise BridgeFuzzError(msg) - - -def _pattern_camel_case_conversion(fdp: atheris.FuzzedDataProvider) -> None: - """Test _to_camel_case with known and fuzzed inputs.""" - _domain.camel_case_tests += 1 - variant = fdp.ConsumeIntInRange(0, 2) - - if variant == 0: - # Known conversions with invariant checks - for snake, expected_camel in _CAMEL_EXPECTED.items(): - result = FunctionRegistry._to_camel_case(snake) - if result != expected_camel: - msg = f"_to_camel_case('{snake}') = '{result}', expected '{expected_camel}'" - raise BridgeFuzzError(msg) - - elif variant == 1: - # Fuzzed snake_case names from curated list - name = fdp.PickValueInList(list(_SNAKE_CASE_NAMES)) - result = FunctionRegistry._to_camel_case(name) - if not isinstance(result, str): - msg = f"_to_camel_case returned non-string: {type(result)}" - raise BridgeFuzzError(msg) - - else: - # Fully fuzzed input - raw = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(0, 50)) - result = FunctionRegistry._to_camel_case(raw) - if not isinstance(result, str): - msg = "_to_camel_case returned non-string for fuzzed input" - raise BridgeFuzzError(msg) - - -# DISPATCH (4 patterns) - - -def _pattern_call_dispatch(fdp: 
atheris.FuzzedDataProvider) -> None: - """Test call() dispatch with varied argument shapes.""" - _domain.call_dispatch_tests += 1 - reg = FunctionRegistry() - - def echo_fn(value: Any, **kwargs: Any) -> str: - return f"{value}|{len(kwargs)}" - - reg.register(echo_fn, ftl_name="ECHO") - - variant = fdp.ConsumeIntInRange(0, 4) - - match variant: - case 0: - # Normal call - result = reg.call("ECHO", [42], {"key": "val"}) - if "42" not in str(result): - msg = f"Normal call failed: {result}" - raise BridgeFuzzError(msg) - - case 1: - # No positional args - with contextlib.suppress(*_ALLOWED_EXCEPTIONS, FrozenFluentError): - reg.call("ECHO", [], {}) - - case 2: - # Many positional args - n = fdp.ConsumeIntInRange(2, 10) - args = [fdp.ConsumeIntInRange(0, 100) for _ in range(n)] - with contextlib.suppress(*_ALLOWED_EXCEPTIONS, FrozenFluentError): - reg.call("ECHO", args, {}) - - case 3: - # Unknown function name - with contextlib.suppress(*_ALLOWED_EXCEPTIONS, FrozenFluentError): - reg.call("NONEXISTENT", [1], {}) - - case _: - # Call with many kwargs - n = fdp.ConsumeIntInRange(1, 10) - kwargs = {f"k{i}": i for i in range(n)} - reg.call("ECHO", ["val"], kwargs) - - -def _pattern_locale_injection(fdp: atheris.FuzzedDataProvider) -> None: - """Test locale injection protocol with custom functions.""" - _domain.locale_injection_tests += 1 - reg = FunctionRegistry() - variant = fdp.ConsumeIntInRange(0, 3) - - match variant: - case 0: - # Decorated with inject_locale=True - @fluent_function(inject_locale=True) - def locale_fn(value: Any, locale_code: str) -> str: - return f"{value}@{locale_code}" - - reg.register(locale_fn, ftl_name="LOCALE_FN") - - if not reg.should_inject_locale("LOCALE_FN"): - msg = "should_inject_locale returned False for decorated function" - raise BridgeFuzzError(msg) - - case 1: - # Not decorated -- should NOT inject locale - def plain_fn(value: Any) -> str: - return str(value) - - reg.register(plain_fn, ftl_name="PLAIN_FN") - - if 
reg.should_inject_locale("PLAIN_FN"): - msg = "should_inject_locale returned True for plain function" - raise BridgeFuzzError(msg) - - case 2: - # Nonexistent function - if reg.should_inject_locale("DOES_NOT_EXIST"): - msg = "should_inject_locale returned True for nonexistent function" - raise BridgeFuzzError(msg) - - case _: - # End-to-end: locale injection through FluentBundle - locale = _pick_locale(fdp) - bundle = FluentBundle(locale, strict=False) - - @fluent_function(inject_locale=True) - def fmt_fn(value: Any, locale_code: str) -> str: - return f"[{locale_code}:{value}]" - - bundle.add_function("FMT", fmt_fn) - bundle.add_resource("msg = { FMT($val) }\n") - with contextlib.suppress(Exception): - bundle.format_pattern("msg", {"val": "test"}) - - -def _pattern_error_wrapping(fdp: atheris.FuzzedDataProvider) -> None: - """Verify TypeError/ValueError from functions are wrapped as FrozenFluentError.""" - _domain.call_dispatch_errors += 1 - reg = create_default_registry() - variant = fdp.ConsumeIntInRange(0, 2) - - match variant: - case 0: - # Call NUMBER with wrong type - try: - reg.call("NUMBER", ["not_a_number", "en"], {}) - except FrozenFluentError: - pass # Expected wrapping - except (TypeError, ValueError): - pass # Also acceptable - - case 1: - # Call nonexistent function - with contextlib.suppress(FrozenFluentError, KeyError): - reg.call("NONEXISTENT", [1], {}) - - case _: - # Call with wrong arity - with contextlib.suppress(FrozenFluentError, TypeError): - reg.call("NUMBER", [], {}) - - -def _pattern_evil_objects(fdp: atheris.FuzzedDataProvider) -> None: - """Adversarial Python objects as FTL variables through FluentBundle.""" - _domain.evil_object_tests += 1 - variant = fdp.ConsumeIntInRange(0, 5) - - match variant: - case 0: - # Evil __str__ raises RuntimeError - class EvilStr: - """Object whose __str__ raises RuntimeError.""" - - def __str__(self) -> str: - raise RuntimeError("evil __str__") # noqa: EM101 - dynamic type in error message - - var: object 
= EvilStr() - - case 1: - # Evil __hash__ raises TypeError - class EvilHash: - """Object whose __hash__ raises TypeError.""" - - def __hash__(self) -> int: - raise TypeError("unhashable evil") # noqa: EM101 - dynamic type in error message - - def __str__(self) -> str: - return "evil" - - var = EvilHash() - - case 2: - # Recursive list - recursive_list: list[object] = [] - recursive_list.append(recursive_list) - var = recursive_list - - case 3: - # Recursive dict - recursive_dict: dict[str, object] = {} - recursive_dict["self"] = recursive_dict - var = recursive_dict - - case 4: - # Massive string - size = fdp.ConsumeIntInRange(1000, 50000) - var = "A" * size - - case _: - # None value - var = None - - # Full FluentBundle resolution path with adversarial objects - bundle = FluentBundle("en-US", cache=CacheConfig() if fdp.ConsumeBool() else None) - bundle.add_resource("msg = Value: { $var }\n") - with contextlib.suppress(*_ALLOWED_EXCEPTIONS, FrozenFluentError): - bundle.format_pattern("msg", {"var": var}) # type: ignore[dict-item] - - -# INTROSPECTION (4 patterns) - - -def _pattern_dict_interface(fdp: atheris.FuzzedDataProvider) -> None: # noqa: PLR0912 - dispatch - """Dict-like interface: __iter__, __contains__, __len__, list_functions, __repr__.""" - reg = create_default_registry() - variant = fdp.ConsumeIntInRange(0, 4) - - match variant: - case 0: - # __contains__ for known builtins - for name in ("NUMBER", "DATETIME", "CURRENCY"): - if name not in reg: - msg = f"Default registry missing {name} via __contains__" - raise BridgeFuzzError(msg) - # Nonexistent - fuzzed = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(1, 20)) - if fuzzed in reg and fuzzed not in ("NUMBER", "DATETIME", "CURRENCY"): - msg = f"Registry contains unexpected function: {fuzzed}" - raise BridgeFuzzError(msg) - - case 1: - # __iter__ yields all function names - names = list(reg) - for name in ("NUMBER", "DATETIME", "CURRENCY"): - if name not in names: - msg = f"__iter__ missing {name}" 
- raise BridgeFuzzError(msg) - - case 2: - # list_functions returns all registered names (insertion order) - funcs = reg.list_functions() - if len(funcs) != len(reg): - msg = f"list_functions length {len(funcs)} != len(reg) {len(reg)}" - raise BridgeFuzzError(msg) - for name in ("NUMBER", "DATETIME", "CURRENCY"): - if name not in funcs: - msg = f"list_functions missing {name}" - raise BridgeFuzzError(msg) - - case 3: - # get_python_name and get_callable - py_name = reg.get_python_name("NUMBER") - if py_name is None: - msg = "get_python_name('NUMBER') returned None" - raise BridgeFuzzError(msg) - callable_fn = reg.get_callable("NUMBER") - if callable_fn is None: - msg = "get_callable('NUMBER') returned None" - raise BridgeFuzzError(msg) - # Nonexistent - if reg.get_python_name("FAKE") is not None: - msg = "get_python_name returned non-None for nonexistent" - raise BridgeFuzzError(msg) - - case _: - # __repr__ consistency - empty_reg = FunctionRegistry() - r = repr(empty_reg) - if "0" not in r: - msg = f"Empty registry repr missing '0': {r}" - raise BridgeFuzzError(msg) - empty_reg.register(str, ftl_name="TEST") - r2 = repr(empty_reg) - if "1" not in r2: - msg = f"Single-func registry repr missing '1': {r2}" - raise BridgeFuzzError(msg) - - -def _pattern_freeze_copy_lifecycle(fdp: atheris.FuzzedDataProvider) -> None: - """Freeze/copy lifecycle: isolation, mutation prevention.""" - _domain.freeze_copy_tests += 1 - variant = fdp.ConsumeIntInRange(0, 3) - - match variant: - case 0: - # Freeze prevents registration - reg = FunctionRegistry() - reg.register(str, ftl_name="PRE") - reg.freeze() - if not reg.frozen: - msg = "Registry not frozen after freeze()" - raise BridgeFuzzError(msg) - try: - reg.register(str, ftl_name="POST") - msg = "Frozen registry accepted registration" - raise BridgeFuzzError(msg) - except TypeError: - pass # Expected - - case 1: - # Copy is unfrozen and independent - shared = get_shared_registry() - copy = shared.copy() - if copy.frozen: - msg = 
"Copy should be unfrozen" - raise BridgeFuzzError(msg) - - def custom(_value: Any) -> str: - return "custom" - - copy.register(custom, ftl_name="COPY_ONLY") - if "COPY_ONLY" in shared: - msg = "Copy polluted original registry" - raise BridgeFuzzError(msg) - if "COPY_ONLY" not in copy: - msg = "Copy missing newly registered function" - raise BridgeFuzzError(msg) - - case 2: - # Copy preserves all original functions - original = create_default_registry() - original_funcs = set(original) - copy = original.copy() - copy_funcs = set(copy) - if original_funcs != copy_funcs: - msg = f"Copy functions differ: {original_funcs - copy_funcs}" - raise BridgeFuzzError(msg) - - case _: - # Double freeze is safe (idempotent) - reg = FunctionRegistry() - reg.freeze() - reg.freeze() # Should not raise - if not reg.frozen: - msg = "Double freeze broke frozen state" - raise BridgeFuzzError(msg) - - -def _pattern_fluent_function_decorator(fdp: atheris.FuzzedDataProvider) -> None: - """Test @fluent_function decorator edge cases.""" - variant = fdp.ConsumeIntInRange(0, 3) - - match variant: - case 0: - # Bare decorator (no parentheses) - @fluent_function - def bare_fn(value: Any) -> str: - return str(value) - - if bare_fn(42) != "42": - msg = f"Bare decorator broke function: {bare_fn(42)}" - raise BridgeFuzzError(msg) - - case 1: - # Decorator with parentheses, no inject_locale - @fluent_function() - def parens_fn(value: Any) -> str: - return str(value) - - if parens_fn(42) != "42": - msg = f"Parenthesized decorator broke function: {parens_fn(42)}" - raise BridgeFuzzError(msg) - - case 2: - # Decorator with inject_locale=True sets attribute - @fluent_function(inject_locale=True) - def locale_fn(value: Any, locale_code: str) -> str: - return f"{value}@{locale_code}" - - attr_name = "_ftl_requires_locale" - if not getattr(locale_fn, attr_name, False): - msg = "inject_locale=True did not set attribute" - raise BridgeFuzzError(msg) - - result = locale_fn(42, "en") - if result != "42@en": - 
msg = f"Decorated function broken: {result}" - raise BridgeFuzzError(msg) - - case _: - # Register decorated function in registry - @fluent_function(inject_locale=True) - def reg_fn(_value: Any, locale_code: str) -> str: - return f"[{locale_code}]" - - reg = FunctionRegistry() - reg.register(reg_fn, ftl_name="REG_FN") - if not reg.should_inject_locale("REG_FN"): - msg = "Decorated + registered: should_inject_locale is False" - raise BridgeFuzzError(msg) - - -def _pattern_metadata_api(fdp: atheris.FuzzedDataProvider) -> None: # noqa: PLR0912 - dispatch - """Test get_expected_positional_args, get_builtin_metadata, has_function.""" - _domain.metadata_api_tests += 1 - reg = create_default_registry() - variant = fdp.ConsumeIntInRange(0, 4) - - match variant: - case 0: - # get_expected_positional_args for known builtins - for name in ("NUMBER", "DATETIME", "CURRENCY"): - result = reg.get_expected_positional_args(name) - if result is None: - msg = f"get_expected_positional_args({name}) returned None" - raise BridgeFuzzError(msg) - if result != 1: - msg = f"get_expected_positional_args({name}) = {result}, expected 1" - raise BridgeFuzzError(msg) - - case 1: - # get_expected_positional_args for unknown function - fuzzed = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(1, 20)) - result = reg.get_expected_positional_args(fuzzed) - if fuzzed not in ("NUMBER", "DATETIME", "CURRENCY") and result is not None: - msg = f"get_expected_positional_args({fuzzed!r}) returned {result}" - raise BridgeFuzzError(msg) - - case 2: - # get_builtin_metadata for known builtins - for name in ("NUMBER", "DATETIME", "CURRENCY"): - meta = reg.get_builtin_metadata(name) - if meta is None: - msg = f"get_builtin_metadata({name}) returned None" - raise BridgeFuzzError(msg) - if not meta.requires_locale: - msg = f"Builtin {name} should require locale" - raise BridgeFuzzError(msg) - - case 3: - # has_function vs __contains__ consistency - for name in ("NUMBER", "DATETIME", "CURRENCY"): - has = 
reg.has_function(name) - contains = name in reg - if has != contains: - msg = f"has_function != __contains__ for {name}" - raise BridgeFuzzError(msg) - fuzzed = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(1, 20)) - has = reg.has_function(fuzzed) - contains = fuzzed in reg - if has != contains: - msg = f"has_function != __contains__ for fuzzed {fuzzed!r}" - raise BridgeFuzzError(msg) - - case _: - # get_builtin_metadata for unknown function returns None - fuzzed = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(1, 30)) - meta = reg.get_builtin_metadata(fuzzed) - if fuzzed not in ("NUMBER", "DATETIME", "CURRENCY") and meta is not None: - msg = f"get_builtin_metadata({fuzzed!r}) returned non-None" - raise BridgeFuzzError(msg) - - -# --- Pattern Dispatch --- - -_PATTERN_DISPATCH: dict[str, Any] = { - "register_basic": _pattern_register_basic, - "register_signatures": _pattern_register_signatures, - "param_mapping_custom": _pattern_param_mapping_custom, - "signature_validation": _pattern_signature_validation, - "fluent_number_contracts": _pattern_fluent_number_contracts, - "make_fluent_number_api": _pattern_make_fluent_number_api, - "signature_immutability": _pattern_signature_immutability, - "camel_case_conversion": _pattern_camel_case_conversion, - "call_dispatch": _pattern_call_dispatch, - "locale_injection": _pattern_locale_injection, - "error_wrapping": _pattern_error_wrapping, - "evil_objects": _pattern_evil_objects, - "dict_interface": _pattern_dict_interface, - "freeze_copy_lifecycle": _pattern_freeze_copy_lifecycle, - "fluent_function_decorator": _pattern_fluent_function_decorator, - "metadata_api": _pattern_metadata_api, -} - - -# --- Main Entry Point --- - - -def test_one_input(data: bytes) -> None: - """Atheris entry point: fuzz FunctionRegistry bridge machinery.""" - if _state.iterations == 0: - _state.initial_memory_mb = get_process().memory_info().rss / (1024 * 1024) - - _state.iterations += 1 - _state.status = "running" - - if 
_state.iterations % _state.checkpoint_interval == 0: - _emit_checkpoint() - - start_time = time.perf_counter() - fdp = atheris.FuzzedDataProvider(data) - - pattern = select_pattern_round_robin(_state, _PATTERN_SCHEDULE) - _state.pattern_coverage[pattern] = _state.pattern_coverage.get(pattern, 0) + 1 - - if fdp.remaining_bytes() < 4: - return - - pattern_func = _PATTERN_DISPATCH[pattern] - - try: - pattern_func(fdp) - - except BridgeFuzzError: - _state.findings += 1 - raise - - except (*_ALLOWED_EXCEPTIONS, FrozenFluentError): - pass # Expected for invalid inputs - - except Exception as e: # pylint: disable=broad-exception-caught - error_key = f"{type(e).__name__}_{str(e)[:30]}" - _state.error_counts[error_key] = _state.error_counts.get(error_key, 0) + 1 - - finally: - # Semantic interestingness: patterns exercising complex paths, - # error paths, or wall-time > 1ms indicating unusual code path - is_interesting = ( - pattern - in ( - "evil_objects", - "signature_validation", - "locale_injection", - "metadata_api", - "error_wrapping", - "make_fluent_number_api", - "dict_interface", - "signature_immutability", - "register_signatures", - ) - or (time.perf_counter() - start_time) * 1000 > 1.0 - ) - record_iteration_metrics( - _state, - pattern, - start_time, - data, - is_interesting=is_interesting, - ) - - if _state.iterations % GC_INTERVAL == 0: - gc.collect() - - if _state.iterations % 100 == 0: - record_memory(_state) - - -def main() -> None: - """Run the bridge machinery fuzzer with CLI support.""" - parser = argparse.ArgumentParser( - description="FunctionRegistry bridge machinery fuzzer using Atheris/libFuzzer", - epilog="All unrecognized arguments are passed to libFuzzer.", - ) - parser.add_argument( - "--checkpoint-interval", - type=int, - default=500, - help="Emit report every N iterations (default: 500)", - ) - parser.add_argument( - "--seed-corpus-size", - type=int, - default=500, - help="Maximum size of in-memory seed corpus (default: 500)", - ) - - args, 
remaining = parser.parse_known_args() - _state.checkpoint_interval = args.checkpoint_interval - _state.seed_corpus_max_size = args.seed_corpus_size - - sys.argv = [sys.argv[0], *remaining] - - # Inject RSS limit if not specified - if not any(arg.startswith("-rss_limit_mb") for arg in sys.argv): - sys.argv.append("-rss_limit_mb=4096") - - print_fuzzer_banner( - title="FunctionRegistry Bridge Machinery Fuzzer (Atheris)", - target="FunctionRegistry, FunctionSignature, FluentNumber, make_fluent_number", - state=_state, - schedule_len=len(_PATTERN_SCHEDULE), - ) - - run_fuzzer(_state, test_one_input=test_one_input) - +from fuzz_bridge_entry import main if __name__ == "__main__": main() diff --git a/fuzz_atheris/fuzz_bridge_entry.py b/fuzz_atheris/fuzz_bridge_entry.py new file mode 100644 index 00000000..4725ac72 --- /dev/null +++ b/fuzz_atheris/fuzz_bridge_entry.py @@ -0,0 +1,174 @@ +from __future__ import annotations + +import argparse +import gc +import sys +import time +from typing import Any + +import atheris +from fuzz_bridge_patterns_dispatch import ( + _pattern_call_dispatch, + _pattern_dict_interface, + _pattern_error_wrapping, + _pattern_evil_objects, + _pattern_fluent_function_decorator, + _pattern_freeze_copy_lifecycle, + _pattern_locale_injection, + _pattern_metadata_api, +) +from fuzz_bridge_patterns_numbers import ( + _pattern_camel_case_conversion, + _pattern_fluent_number_contracts, + _pattern_make_fluent_number_api, + _pattern_signature_immutability, +) +from fuzz_bridge_patterns_registration import ( + _pattern_param_mapping_custom, + _pattern_register_basic, + _pattern_register_signatures, + _pattern_signature_validation, +) +from fuzz_bridge_support import ( + _ALLOWED_EXCEPTIONS, + _PATTERN_SCHEDULE, + BridgeFuzzError, + _emit_checkpoint, + _state, +) +from fuzz_common import ( + GC_INTERVAL, + get_process, + print_fuzzer_banner, + record_iteration_metrics, + record_memory, + run_fuzzer, + select_pattern_round_robin, +) + +from 
ftllexengine.diagnostics import FrozenFluentError + +_PATTERN_DISPATCH: dict[str, Any] = { + "register_basic": _pattern_register_basic, + "register_signatures": _pattern_register_signatures, + "param_mapping_custom": _pattern_param_mapping_custom, + "signature_validation": _pattern_signature_validation, + "fluent_number_contracts": _pattern_fluent_number_contracts, + "make_fluent_number_api": _pattern_make_fluent_number_api, + "signature_immutability": _pattern_signature_immutability, + "camel_case_conversion": _pattern_camel_case_conversion, + "call_dispatch": _pattern_call_dispatch, + "locale_injection": _pattern_locale_injection, + "error_wrapping": _pattern_error_wrapping, + "evil_objects": _pattern_evil_objects, + "dict_interface": _pattern_dict_interface, + "freeze_copy_lifecycle": _pattern_freeze_copy_lifecycle, + "fluent_function_decorator": _pattern_fluent_function_decorator, + "metadata_api": _pattern_metadata_api, +} + +def test_one_input(data: bytes) -> None: + """Atheris entry point: fuzz FunctionRegistry bridge machinery.""" + if _state.iterations == 0: + _state.initial_memory_mb = get_process().memory_info().rss / (1024 * 1024) + + _state.iterations += 1 + _state.status = "running" + + if _state.iterations % _state.checkpoint_interval == 0: + _emit_checkpoint() + + start_time = time.perf_counter() + fdp = atheris.FuzzedDataProvider(data) + + pattern = select_pattern_round_robin(_state, _PATTERN_SCHEDULE) + _state.pattern_coverage[pattern] = _state.pattern_coverage.get(pattern, 0) + 1 + + if fdp.remaining_bytes() < 4: + return + + pattern_func = _PATTERN_DISPATCH[pattern] + + try: + pattern_func(fdp) + + except BridgeFuzzError: + _state.findings += 1 + raise + + except (*_ALLOWED_EXCEPTIONS, FrozenFluentError): + pass # Expected for invalid inputs + + except Exception as e: # pylint: disable=broad-exception-caught + error_key = f"{type(e).__name__}_{str(e)[:30]}" + _state.error_counts[error_key] = _state.error_counts.get(error_key, 0) + 1 + + finally: 
+ # Semantic interestingness: patterns exercising complex paths, + # error paths, or wall-time > 1ms indicating unusual code path + is_interesting = ( + pattern + in ( + "evil_objects", + "signature_validation", + "locale_injection", + "metadata_api", + "error_wrapping", + "make_fluent_number_api", + "dict_interface", + "signature_immutability", + "register_signatures", + ) + or (time.perf_counter() - start_time) * 1000 > 1.0 + ) + record_iteration_metrics( + _state, + pattern, + start_time, + data, + is_interesting=is_interesting, + ) + + if _state.iterations % GC_INTERVAL == 0: + gc.collect() + + if _state.iterations % 100 == 0: + record_memory(_state) + +def main() -> None: + """Run the bridge machinery fuzzer with CLI support.""" + parser = argparse.ArgumentParser( + description="FunctionRegistry bridge machinery fuzzer using Atheris/libFuzzer", + epilog="All unrecognized arguments are passed to libFuzzer.", + ) + parser.add_argument( + "--checkpoint-interval", + type=int, + default=500, + help="Emit report every N iterations (default: 500)", + ) + parser.add_argument( + "--seed-corpus-size", + type=int, + default=500, + help="Maximum size of in-memory seed corpus (default: 500)", + ) + + args, remaining = parser.parse_known_args() + _state.checkpoint_interval = args.checkpoint_interval + _state.seed_corpus_max_size = args.seed_corpus_size + + sys.argv = [sys.argv[0], *remaining] + + # Inject RSS limit if not specified + if not any(arg.startswith("-rss_limit_mb") for arg in sys.argv): + sys.argv.append("-rss_limit_mb=4096") + + print_fuzzer_banner( + title="FunctionRegistry Bridge Machinery Fuzzer (Atheris)", + target="FunctionRegistry, FunctionSignature, FluentNumber, make_fluent_number", + state=_state, + schedule_len=len(_PATTERN_SCHEDULE), + ) + + run_fuzzer(_state, test_one_input=test_one_input) diff --git a/fuzz_atheris/fuzz_bridge_patterns_dispatch.py b/fuzz_atheris/fuzz_bridge_patterns_dispatch.py new file mode 100644 index 00000000..8997f67c --- 
/dev/null +++ b/fuzz_atheris/fuzz_bridge_patterns_dispatch.py @@ -0,0 +1,432 @@ +from __future__ import annotations + +import contextlib +from typing import TYPE_CHECKING, Any + +if TYPE_CHECKING: + import atheris +from fuzz_bridge_support import ( + _ALLOWED_EXCEPTIONS, + BridgeFuzzError, + _domain, + _pick_locale, +) + +from ftllexengine.diagnostics import FrozenFluentError +from ftllexengine.runtime.bundle import FluentBundle +from ftllexengine.runtime.cache_config import CacheConfig +from ftllexengine.runtime.function_bridge import FunctionRegistry, fluent_function +from ftllexengine.runtime.functions import create_default_registry, get_shared_registry + + +def _pattern_call_dispatch(fdp: atheris.FuzzedDataProvider) -> None: + """Test call() dispatch with varied argument shapes.""" + _domain.call_dispatch_tests += 1 + reg = FunctionRegistry() + + def echo_fn(value: Any, **kwargs: Any) -> str: + return f"{value}|{len(kwargs)}" + + reg.register(echo_fn, ftl_name="ECHO") + + variant = fdp.ConsumeIntInRange(0, 4) + + match variant: + case 0: + # Normal call + result = reg.call("ECHO", [42], {"key": "val"}) + if "42" not in str(result): + msg = f"Normal call failed: {result}" + raise BridgeFuzzError(msg) + + case 1: + # No positional args + with contextlib.suppress(*_ALLOWED_EXCEPTIONS, FrozenFluentError): + reg.call("ECHO", [], {}) + + case 2: + # Many positional args + n = fdp.ConsumeIntInRange(2, 10) + args = [fdp.ConsumeIntInRange(0, 100) for _ in range(n)] + with contextlib.suppress(*_ALLOWED_EXCEPTIONS, FrozenFluentError): + reg.call("ECHO", args, {}) + + case 3: + # Unknown function name + with contextlib.suppress(*_ALLOWED_EXCEPTIONS, FrozenFluentError): + reg.call("NONEXISTENT", [1], {}) + + case _: + # Call with many kwargs + n = fdp.ConsumeIntInRange(1, 10) + kwargs = {f"k{i}": i for i in range(n)} + reg.call("ECHO", ["val"], kwargs) + +def _pattern_locale_injection(fdp: atheris.FuzzedDataProvider) -> None: + """Test locale injection protocol with custom 
functions.""" + _domain.locale_injection_tests += 1 + reg = FunctionRegistry() + variant = fdp.ConsumeIntInRange(0, 3) + + match variant: + case 0: + # Decorated with inject_locale=True + @fluent_function(inject_locale=True) + def locale_fn(value: Any, locale_code: str) -> str: + return f"{value}@{locale_code}" + + reg.register(locale_fn, ftl_name="LOCALE_FN") + + if not reg.should_inject_locale("LOCALE_FN"): + msg = "should_inject_locale returned False for decorated function" + raise BridgeFuzzError(msg) + + case 1: + # Not decorated -- should NOT inject locale + def plain_fn(value: Any) -> str: + return str(value) + + reg.register(plain_fn, ftl_name="PLAIN_FN") + + if reg.should_inject_locale("PLAIN_FN"): + msg = "should_inject_locale returned True for plain function" + raise BridgeFuzzError(msg) + + case 2: + # Nonexistent function + if reg.should_inject_locale("DOES_NOT_EXIST"): + msg = "should_inject_locale returned True for nonexistent function" + raise BridgeFuzzError(msg) + + case _: + # End-to-end: locale injection through FluentBundle + locale = _pick_locale(fdp) + bundle = FluentBundle(locale, strict=False) + + @fluent_function(inject_locale=True) + def fmt_fn(value: Any, locale_code: str) -> str: + return f"[{locale_code}:{value}]" + + bundle.add_function("FMT", fmt_fn) + bundle.add_resource("msg = { FMT($val) }\n") + with contextlib.suppress(Exception): + bundle.format_pattern("msg", {"val": "test"}) + +def _pattern_error_wrapping(fdp: atheris.FuzzedDataProvider) -> None: + """Verify TypeError/ValueError from functions are wrapped as FrozenFluentError.""" + _domain.call_dispatch_errors += 1 + reg = create_default_registry() + variant = fdp.ConsumeIntInRange(0, 2) + + match variant: + case 0: + # Call NUMBER with wrong type + try: + reg.call("NUMBER", ["not_a_number", "en"], {}) + except FrozenFluentError: + pass # Expected wrapping + except (TypeError, ValueError): + pass # Also acceptable + + case 1: + # Call nonexistent function + with 
contextlib.suppress(FrozenFluentError, KeyError): + reg.call("NONEXISTENT", [1], {}) + + case _: + # Call with wrong arity + with contextlib.suppress(FrozenFluentError, TypeError): + reg.call("NUMBER", [], {}) + +def _pattern_evil_objects(fdp: atheris.FuzzedDataProvider) -> None: + """Adversarial Python objects as FTL variables through FluentBundle.""" + _domain.evil_object_tests += 1 + variant = fdp.ConsumeIntInRange(0, 5) + + match variant: + case 0: + # Evil __str__ raises RuntimeError + class EvilStr: + """Object whose __str__ raises RuntimeError.""" + + def __str__(self) -> str: + raise RuntimeError("evil __str__") # noqa: EM101 - dynamic type in error message + + var: object = EvilStr() + + case 1: + # Evil __hash__ raises TypeError + class EvilHash: + """Object whose __hash__ raises TypeError.""" + + def __hash__(self) -> int: + raise TypeError("unhashable evil") # noqa: EM101 - dynamic type in error message + + def __str__(self) -> str: + return "evil" + + var = EvilHash() + + case 2: + # Recursive list + recursive_list: list[object] = [] + recursive_list.append(recursive_list) + var = recursive_list + + case 3: + # Recursive dict + recursive_dict: dict[str, object] = {} + recursive_dict["self"] = recursive_dict + var = recursive_dict + + case 4: + # Massive string + size = fdp.ConsumeIntInRange(1000, 50000) + var = "A" * size + + case _: + # None value + var = None + + # Full FluentBundle resolution path with adversarial objects + bundle = FluentBundle("en-US", cache=CacheConfig() if fdp.ConsumeBool() else None) + bundle.add_resource("msg = Value: { $var }\n") + with contextlib.suppress(*_ALLOWED_EXCEPTIONS, FrozenFluentError): + bundle.format_pattern("msg", {"var": var}) # type: ignore[dict-item] + +def _pattern_dict_interface(fdp: atheris.FuzzedDataProvider) -> None: # noqa: PLR0912 - dispatch + """Dict-like interface: __iter__, __contains__, __len__, list_functions, __repr__.""" + reg = create_default_registry() + variant = fdp.ConsumeIntInRange(0, 4) + 
+ match variant: + case 0: + # __contains__ for known builtins + for name in ("NUMBER", "DATETIME", "CURRENCY"): + if name not in reg: + msg = f"Default registry missing {name} via __contains__" + raise BridgeFuzzError(msg) + # Nonexistent + fuzzed = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(1, 20)) + if fuzzed in reg and fuzzed not in ("NUMBER", "DATETIME", "CURRENCY"): + msg = f"Registry contains unexpected function: {fuzzed}" + raise BridgeFuzzError(msg) + + case 1: + # __iter__ yields all function names + names = list(reg) + for name in ("NUMBER", "DATETIME", "CURRENCY"): + if name not in names: + msg = f"__iter__ missing {name}" + raise BridgeFuzzError(msg) + + case 2: + # list_functions returns all registered names (insertion order) + funcs = reg.list_functions() + if len(funcs) != len(reg): + msg = f"list_functions length {len(funcs)} != len(reg) {len(reg)}" + raise BridgeFuzzError(msg) + for name in ("NUMBER", "DATETIME", "CURRENCY"): + if name not in funcs: + msg = f"list_functions missing {name}" + raise BridgeFuzzError(msg) + + case 3: + # get_python_name and get_callable + py_name = reg.get_python_name("NUMBER") + if py_name is None: + msg = "get_python_name('NUMBER') returned None" + raise BridgeFuzzError(msg) + callable_fn = reg.get_callable("NUMBER") + if callable_fn is None: + msg = "get_callable('NUMBER') returned None" + raise BridgeFuzzError(msg) + # Nonexistent + if reg.get_python_name("FAKE") is not None: + msg = "get_python_name returned non-None for nonexistent" + raise BridgeFuzzError(msg) + + case _: + # __repr__ consistency + empty_reg = FunctionRegistry() + r = repr(empty_reg) + if "0" not in r: + msg = f"Empty registry repr missing '0': {r}" + raise BridgeFuzzError(msg) + empty_reg.register(str, ftl_name="TEST") + r2 = repr(empty_reg) + if "1" not in r2: + msg = f"Single-func registry repr missing '1': {r2}" + raise BridgeFuzzError(msg) + +def _pattern_freeze_copy_lifecycle(fdp: atheris.FuzzedDataProvider) -> None: + 
"""Freeze/copy lifecycle: isolation, mutation prevention.""" + _domain.freeze_copy_tests += 1 + variant = fdp.ConsumeIntInRange(0, 3) + + match variant: + case 0: + # Freeze prevents registration + reg = FunctionRegistry() + reg.register(str, ftl_name="PRE") + reg.freeze() + if not reg.frozen: + msg = "Registry not frozen after freeze()" + raise BridgeFuzzError(msg) + try: + reg.register(str, ftl_name="POST") + msg = "Frozen registry accepted registration" + raise BridgeFuzzError(msg) + except TypeError: + pass # Expected + + case 1: + # Copy is unfrozen and independent + shared = get_shared_registry() + copy = shared.copy() + if copy.frozen: + msg = "Copy should be unfrozen" + raise BridgeFuzzError(msg) + + def custom(_value: Any) -> str: + return "custom" + + copy.register(custom, ftl_name="COPY_ONLY") + if "COPY_ONLY" in shared: + msg = "Copy polluted original registry" + raise BridgeFuzzError(msg) + if "COPY_ONLY" not in copy: + msg = "Copy missing newly registered function" + raise BridgeFuzzError(msg) + + case 2: + # Copy preserves all original functions + original = create_default_registry() + original_funcs = set(original) + copy = original.copy() + copy_funcs = set(copy) + if original_funcs != copy_funcs: + msg = f"Copy functions differ: {original_funcs - copy_funcs}" + raise BridgeFuzzError(msg) + + case _: + # Double freeze is safe (idempotent) + reg = FunctionRegistry() + reg.freeze() + reg.freeze() # Should not raise + if not reg.frozen: + msg = "Double freeze broke frozen state" + raise BridgeFuzzError(msg) + +def _pattern_fluent_function_decorator(fdp: atheris.FuzzedDataProvider) -> None: + """Test @fluent_function decorator edge cases.""" + variant = fdp.ConsumeIntInRange(0, 3) + + match variant: + case 0: + # Bare decorator (no parentheses) + @fluent_function + def bare_fn(value: Any) -> str: + return str(value) + + if bare_fn(42) != "42": + msg = f"Bare decorator broke function: {bare_fn(42)}" + raise BridgeFuzzError(msg) + + case 1: + # Decorator 
with parentheses, no inject_locale + @fluent_function() + def parens_fn(value: Any) -> str: + return str(value) + + if parens_fn(42) != "42": + msg = f"Parenthesized decorator broke function: {parens_fn(42)}" + raise BridgeFuzzError(msg) + + case 2: + # Decorator with inject_locale=True sets attribute + @fluent_function(inject_locale=True) + def locale_fn(value: Any, locale_code: str) -> str: + return f"{value}@{locale_code}" + + attr_name = "_ftl_requires_locale" + if not getattr(locale_fn, attr_name, False): + msg = "inject_locale=True did not set attribute" + raise BridgeFuzzError(msg) + + result = locale_fn(42, "en") + if result != "42@en": + msg = f"Decorated function broken: {result}" + raise BridgeFuzzError(msg) + + case _: + # Register decorated function in registry + @fluent_function(inject_locale=True) + def reg_fn(_value: Any, locale_code: str) -> str: + return f"[{locale_code}]" + + reg = FunctionRegistry() + reg.register(reg_fn, ftl_name="REG_FN") + if not reg.should_inject_locale("REG_FN"): + msg = "Decorated + registered: should_inject_locale is False" + raise BridgeFuzzError(msg) + +def _pattern_metadata_api(fdp: atheris.FuzzedDataProvider) -> None: # noqa: PLR0912 - dispatch + """Test get_expected_positional_args, get_builtin_metadata, has_function.""" + _domain.metadata_api_tests += 1 + reg = create_default_registry() + variant = fdp.ConsumeIntInRange(0, 4) + + match variant: + case 0: + # get_expected_positional_args for known builtins + for name in ("NUMBER", "DATETIME", "CURRENCY"): + result = reg.get_expected_positional_args(name) + if result is None: + msg = f"get_expected_positional_args({name}) returned None" + raise BridgeFuzzError(msg) + if result != 1: + msg = f"get_expected_positional_args({name}) = {result}, expected 1" + raise BridgeFuzzError(msg) + + case 1: + # get_expected_positional_args for unknown function + fuzzed = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(1, 20)) + result = reg.get_expected_positional_args(fuzzed) 
+ if fuzzed not in ("NUMBER", "DATETIME", "CURRENCY") and result is not None: + msg = f"get_expected_positional_args({fuzzed!r}) returned {result}" + raise BridgeFuzzError(msg) + + case 2: + # get_builtin_metadata for known builtins + for name in ("NUMBER", "DATETIME", "CURRENCY"): + meta = reg.get_builtin_metadata(name) + if meta is None: + msg = f"get_builtin_metadata({name}) returned None" + raise BridgeFuzzError(msg) + if not meta.requires_locale: + msg = f"Builtin {name} should require locale" + raise BridgeFuzzError(msg) + + case 3: + # has_function vs __contains__ consistency + for name in ("NUMBER", "DATETIME", "CURRENCY"): + has = reg.has_function(name) + contains = name in reg + if has != contains: + msg = f"has_function != __contains__ for {name}" + raise BridgeFuzzError(msg) + fuzzed = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(1, 20)) + has = reg.has_function(fuzzed) + contains = fuzzed in reg + if has != contains: + msg = f"has_function != __contains__ for fuzzed {fuzzed!r}" + raise BridgeFuzzError(msg) + + case _: + # get_builtin_metadata for unknown function returns None + fuzzed = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(1, 30)) + meta = reg.get_builtin_metadata(fuzzed) + if fuzzed not in ("NUMBER", "DATETIME", "CURRENCY") and meta is not None: + msg = f"get_builtin_metadata({fuzzed!r}) returned non-None" + raise BridgeFuzzError(msg) diff --git a/fuzz_atheris/fuzz_bridge_patterns_numbers.py b/fuzz_atheris/fuzz_bridge_patterns_numbers.py new file mode 100644 index 00000000..abcc5f0e --- /dev/null +++ b/fuzz_atheris/fuzz_bridge_patterns_numbers.py @@ -0,0 +1,257 @@ +from __future__ import annotations + +from decimal import Decimal +from typing import TYPE_CHECKING + +if TYPE_CHECKING: + import atheris +from fuzz_bridge_support import ( + _CAMEL_EXPECTED, + _SNAKE_CASE_NAMES, + BridgeFuzzError, + _call_make_fluent_number, + _domain, + _group_ascii_thousands, +) + +from ftllexengine.core.value_types import make_fluent_number +from 
ftllexengine.runtime.function_bridge import FluentNumber, FunctionRegistry +from ftllexengine.runtime.functions import create_default_registry + + +def _pattern_fluent_number_contracts(fdp: atheris.FuzzedDataProvider) -> None: + """FluentNumber object contracts: str, repr, precision, frozen.""" + _domain.fluent_number_checks += 1 + variant = fdp.ConsumeIntInRange(0, 3) + + match variant: + case 0: + # Basic construction and str + fn = FluentNumber(value=Decimal("1234.56"), formatted="1,234.56", precision=2) + if str(fn) != "1,234.56": + msg = f"FluentNumber str() = '{fn}', expected '1,234.56'" + raise BridgeFuzzError(msg) + + case 1: + # repr includes value info + fn = FluentNumber(value=Decimal("99.9"), formatted="99.9", precision=1) + r = repr(fn) + if "99.9" not in r: + msg = f"FluentNumber repr missing value: {r}" + raise BridgeFuzzError(msg) + + case 2: + # Precision can be None + fn = FluentNumber(value=42, formatted="42", precision=None) + if fn.precision is not None: + msg = "FluentNumber precision should be None" + raise BridgeFuzzError(msg) + if str(fn) != "42": + msg = f"FluentNumber str() with None precision = '{fn}'" + raise BridgeFuzzError(msg) + + case _: + # Frozen: attribute assignment should fail + fn = FluentNumber(value=1, formatted="1", precision=0) + try: + fn.value = 999 # type: ignore[misc] + msg = "FluentNumber is not frozen: attribute assignment succeeded" + raise BridgeFuzzError(msg) + except AttributeError: + pass # Expected: frozen dataclass + +def _check_make_fluent_number_default_decimal( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Default Decimal formatting preserves trailing-zero precision.""" + int_part = fdp.ConsumeIntInRange(-9999, 9999) + frac_core = str(fdp.ConsumeIntInRange(1, 999)).zfill(3) + trailing_zeros = "0" * fdp.ConsumeIntInRange(1, 4) + value = Decimal(f"{int_part}.{frac_core}{trailing_zeros}") + fn = _call_make_fluent_number(value) + expected_precision = len(frac_core) + len(trailing_zeros) + if fn.formatted 
!= str(value) or fn.precision != expected_precision: + msg = ( + "make_fluent_number(default Decimal) did not preserve " + f"string/precision: {fn!r} vs value={value!r}, " + f"expected_precision={expected_precision}" + ) + raise BridgeFuzzError(msg) + +def _check_make_fluent_number_default_int( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Default integer formatting exposes zero visible decimals.""" + value = fdp.ConsumeIntInRange(-1_000_000, 1_000_000) + fn = _call_make_fluent_number(value) + if fn.formatted != str(value) or fn.precision != 0: + msg = ( + "make_fluent_number(int) did not preserve zero visible precision: " + f"{fn!r} for value={value}" + ) + raise BridgeFuzzError(msg) + +def _check_make_fluent_number_fractional_int( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Explicit fractional formatting controls visible precision for ints.""" + value = fdp.ConsumeIntInRange(-999, 999) + frac_digits = "0" * fdp.ConsumeIntInRange(1, 4) + formatted = f"{value}.{frac_digits}" + fn = _call_make_fluent_number(value, formatted=formatted) + if fn.formatted != formatted or fn.precision != len(frac_digits): + msg = ( + "make_fluent_number(explicit fractional int) miscomputed precision: " + f"{fn!r} for formatted={formatted!r}" + ) + raise BridgeFuzzError(msg) + +def _check_make_fluent_number_grouped_int( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Grouping separators do not create decimal precision.""" + value = fdp.ConsumeIntInRange(1_000, 999_999) + formatted = _group_ascii_thousands(value) + fn = _call_make_fluent_number(value, formatted=formatted) + if fn.formatted != formatted or fn.precision != 0: + msg = ( + "make_fluent_number(grouped int) treated grouping as decimals: " + f"{fn!r} for formatted={formatted!r}" + ) + raise BridgeFuzzError(msg) + +def _check_make_fluent_number_localized_decimal( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Localized formatted strings drive visible precision inference.""" + whole = 
fdp.ConsumeIntInRange(1, 9999) + precision = fdp.ConsumeIntInRange(1, 4) + fraction = str(fdp.ConsumeIntInRange(0, (10**precision) - 1)).zfill(precision) + value = Decimal(f"{whole}.{fraction}") + grouped = _group_ascii_thousands(whole).replace(",", " ") + formatted = f"{grouped},{fraction} EUR" + fn = _call_make_fluent_number(value, formatted=formatted) + if fn.formatted != formatted or fn.precision != precision: + msg = ( + "make_fluent_number(localized decimal) miscomputed visible precision: " + f"{fn!r} for formatted={formatted!r}" + ) + raise BridgeFuzzError(msg) + +def _check_make_fluent_number_disambiguation( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Formatted decimals for integer values are not mistaken for grouping.""" + precision = fdp.ConsumeIntInRange(1, 4) + separator = fdp.PickValueInList([",", "."]) + zeros = "0" * precision + value = fdp.PickValueInList([1, -1]) + formatted = f"{value}{separator}{zeros}" + fn = _call_make_fluent_number(value, formatted=formatted) + if fn.formatted != formatted or fn.precision != precision: + msg = ( + "make_fluent_number(disambiguation) lost decimal precision: " + f"{fn!r} for formatted={formatted!r}" + ) + raise BridgeFuzzError(msg) + +def _check_make_fluent_number_bool_rejection( + _fdp: atheris.FuzzedDataProvider, +) -> None: + """Bool inputs are rejected like direct FluentNumber construction.""" + try: + make_fluent_number(True) + except TypeError: + return + msg = "make_fluent_number(bool) should raise TypeError" + raise BridgeFuzzError(msg) + +def _pattern_make_fluent_number_api(fdp: atheris.FuzzedDataProvider) -> None: + """make_fluent_number derives visible precision from domain values.""" + _domain.make_fluent_number_checks += 1 + handlers = ( + _check_make_fluent_number_default_decimal, + _check_make_fluent_number_default_int, + _check_make_fluent_number_fractional_int, + _check_make_fluent_number_grouped_int, + _check_make_fluent_number_localized_decimal, + 
_check_make_fluent_number_disambiguation, + _check_make_fluent_number_bool_rejection, + ) + handler = handlers[fdp.ConsumeIntInRange(0, len(handlers) - 1)] + handler(fdp) + +def _pattern_signature_immutability(fdp: atheris.FuzzedDataProvider) -> None: + """Verify FunctionSignature immutability and param_mapping tuple type.""" + reg = create_default_registry() + func_name = fdp.PickValueInList(["NUMBER", "DATETIME", "CURRENCY"]) + info = reg.get_function_info(func_name) + + if info is None: + msg = f"{func_name} FunctionSignature is None" + raise BridgeFuzzError(msg) + + # param_mapping should be tuple of tuples (immutable) + if not isinstance(info.param_mapping, tuple): + msg = f"param_mapping is {type(info.param_mapping)}, expected tuple" + raise BridgeFuzzError(msg) + + for pair in info.param_mapping: + if not isinstance(pair, tuple) or len(pair) != 2: + msg = f"param_mapping entry is not (str, str): {pair}" + raise BridgeFuzzError(msg) + + # FunctionSignature should be frozen + try: + info.ftl_name = "HACKED" # type: ignore[misc] + msg = "FunctionSignature is not frozen" + raise BridgeFuzzError(msg) + except AttributeError: + pass # Expected + + # Callable should be present + if not callable(info.callable): + msg = "FunctionSignature callable is not callable" + raise BridgeFuzzError(msg) + + # ftl_name should match what we queried + if info.ftl_name != func_name: + msg = f"FunctionSignature.ftl_name = '{info.ftl_name}', expected '{func_name}'" + raise BridgeFuzzError(msg) + + # Fuzzed: try getting info for nonexistent function + if fdp.ConsumeBool(): + fuzzed = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(1, 20)) + bad_info = reg.get_function_info(fuzzed) + if bad_info is not None and fuzzed not in ("NUMBER", "DATETIME", "CURRENCY"): + msg = f"get_function_info returned non-None for unknown '{fuzzed}'" + raise BridgeFuzzError(msg) + +def _pattern_camel_case_conversion(fdp: atheris.FuzzedDataProvider) -> None: + """Test _to_camel_case with known and 
fuzzed inputs.""" + _domain.camel_case_tests += 1 + variant = fdp.ConsumeIntInRange(0, 2) + + if variant == 0: + # Known conversions with invariant checks + for snake, expected_camel in _CAMEL_EXPECTED.items(): + result = FunctionRegistry._to_camel_case(snake) + if result != expected_camel: + msg = f"_to_camel_case('{snake}') = '{result}', expected '{expected_camel}'" + raise BridgeFuzzError(msg) + + elif variant == 1: + # Fuzzed snake_case names from curated list + name = fdp.PickValueInList(list(_SNAKE_CASE_NAMES)) + result = FunctionRegistry._to_camel_case(name) + if not isinstance(result, str): + msg = f"_to_camel_case returned non-string: {type(result)}" + raise BridgeFuzzError(msg) + + else: + # Fully fuzzed input + raw = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(0, 50)) + result = FunctionRegistry._to_camel_case(raw) + if not isinstance(result, str): + msg = "_to_camel_case returned non-string for fuzzed input" + raise BridgeFuzzError(msg) diff --git a/fuzz_atheris/fuzz_bridge_patterns_registration.py b/fuzz_atheris/fuzz_bridge_patterns_registration.py new file mode 100644 index 00000000..f78bc3be --- /dev/null +++ b/fuzz_atheris/fuzz_bridge_patterns_registration.py @@ -0,0 +1,219 @@ +from __future__ import annotations + +import contextlib +from typing import TYPE_CHECKING, Any + +if TYPE_CHECKING: + import atheris +from fuzz_bridge_support import ( + BridgeFuzzError, + _domain, +) + +from ftllexengine.runtime.function_bridge import FunctionRegistry, fluent_function + + +def _pattern_register_basic(fdp: atheris.FuzzedDataProvider) -> None: + """Basic function registration: name generation, simple callables.""" + _domain.register_calls += 1 + reg = FunctionRegistry() + num_funcs = fdp.ConsumeIntInRange(1, 5) + + for i in range(num_funcs): + + def make_fn(idx: int) -> Any: + def fn(_value: Any) -> str: + return f"result_{idx}" + + fn.__name__ = f"test_func_{idx}" + return fn + + func = make_fn(i) + ftl_name = f"FUNC{i}" if fdp.ConsumeBool() else 
None + reg.register(func, ftl_name=ftl_name) + + # Invariant: len matches registration count + if len(reg) != num_funcs: + msg = f"Registry len {len(reg)} != expected {num_funcs}" + raise BridgeFuzzError(msg) + +def _pattern_register_signatures(fdp: atheris.FuzzedDataProvider) -> None: + """Registration with various Python function signatures.""" + _domain.register_calls += 1 + reg = FunctionRegistry() + variant = fdp.ConsumeIntInRange(0, 6) + + match variant: + case 0: + # Positional-only params + def pos_only(value: Any, /) -> str: + return str(value) + + reg.register(pos_only, ftl_name="POS_ONLY") + + case 1: + # Keyword-only params + def kw_only(value: Any, *, style: str = "default") -> str: + return f"{value}_{style}" + + reg.register(kw_only, ftl_name="KW_ONLY") + result = reg.call("KW_ONLY", [42], {"style": "custom"}) + if "42" not in str(result): + msg = f"KW_ONLY result missing value: {result}" + raise BridgeFuzzError(msg) + + case 2: + # *args function + def varargs(*args: Any) -> str: + return "_".join(str(a) for a in args) + + reg.register(varargs, ftl_name="VARARGS") + n = fdp.ConsumeIntInRange(0, 5) + positional = [fdp.ConsumeIntInRange(0, 100) for _ in range(n)] + reg.call("VARARGS", positional, {}) + + case 3: + # **kwargs function + def kwargs_fn(value: Any, **kwargs: Any) -> str: + return f"{value}_{len(kwargs)}" + + reg.register(kwargs_fn, ftl_name="KWARGS_FN") + named = {f"key{i}": i for i in range(fdp.ConsumeIntInRange(0, 5))} + reg.call("KWARGS_FN", ["hello"], named) + + case 4: + # Function with many parameters (auto-mapping stress) + def many_params( + value: Any, + *, + minimum_fraction_digits: int = 0, + maximum_fraction_digits: int = 3, + use_grouping: bool = True, + currency_display: str = "symbol", + ) -> str: + return str(value) + + reg.register(many_params, ftl_name="MANY") + info = reg.get_function_info("MANY") + if info is None: + msg = "get_function_info returned None for registered function" + raise BridgeFuzzError(msg) + # Verify 
param_mapping includes all snake_case -> camelCase + mapping_dict = dict(info.param_mapping) + if "minimumFractionDigits" not in mapping_dict: + msg = f"Missing camelCase mapping: {mapping_dict}" + raise BridgeFuzzError(msg) + + case 5: + # Duplicate registration (should overwrite) + def fn_v1(_value: Any) -> str: + return "v1" + + def fn_v2(_value: Any) -> str: + return "v2" + + fn_v2.__name__ = "fn_v1" + reg.register(fn_v1, ftl_name="DUP") + reg.register(fn_v2, ftl_name="DUP") + result = reg.call("DUP", ["x"], {}) + if str(result) != "v2": + msg = f"Duplicate registration did not overwrite: got {result}" + raise BridgeFuzzError(msg) + + case _: + # Built-in callable registration (str), registered under the name LAMBDA + reg.register(str, ftl_name="LAMBDA") + reg.call("LAMBDA", [42], {}) + +def _pattern_param_mapping_custom(fdp: atheris.FuzzedDataProvider) -> None: + """Custom param_map overrides auto-generated mappings.""" + _domain.register_calls += 1 + reg = FunctionRegistry() + + def target_fn(value: Any, *, minimum_fraction_digits: int = 0) -> str: + return str(value) + + variant = fdp.ConsumeIntInRange(0, 2) + + if variant == 0: + # Custom mapping overrides auto-generated + custom_map = {"customName": "minimum_fraction_digits"} + reg.register(target_fn, ftl_name="CUSTOM_MAP", param_map=custom_map) + result = reg.call("CUSTOM_MAP", [42], {"customName": 2}) + if "42" not in str(result): + msg = f"Custom param_map call failed: {result}" + raise BridgeFuzzError(msg) + + elif variant == 1: + # Empty custom map (auto-generation only) + reg.register(target_fn, ftl_name="EMPTY_MAP", param_map={}) + info = reg.get_function_info("EMPTY_MAP") + if info is None or len(info.param_mapping) == 0: + msg = "Empty param_map should still have auto-generated mappings" + raise BridgeFuzzError(msg) + + else: + # Fuzzed param_map keys + fuzzed_key = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(1, 30)) + custom_map = {fuzzed_key: "minimum_fraction_digits"} + reg.register(target_fn, ftl_name="FUZZ_MAP", param_map=custom_map) 
+ with contextlib.suppress(Exception): + reg.call("FUZZ_MAP", [1], {fuzzed_key: 2}) + +def _pattern_signature_validation(fdp: atheris.FuzzedDataProvider) -> None: + """Test registration error paths: locale arity, collision, auto-naming.""" + _domain.signature_validation_tests += 1 + reg = FunctionRegistry() + variant = fdp.ConsumeIntInRange(0, 3) + + match variant: + case 0: + # inject_locale with insufficient positional params -> TypeError + @fluent_function(inject_locale=True) + def bad_fn(value: Any) -> str: + return str(value) + + try: + reg.register(bad_fn, ftl_name="BAD_LOCALE") + msg = "inject_locale with 1 positional param did not raise TypeError" + raise BridgeFuzzError(msg) + except TypeError: + _domain.register_failures += 1 + + case 1: + # Underscore collision detection -> ValueError + def colliding( + value: Any, + *, + _data: int = 0, + data: int = 0, + ) -> str: + return str(value) + + try: + reg.register(colliding, ftl_name="COLLIDE") + msg = "Underscore collision did not raise ValueError" + raise BridgeFuzzError(msg) + except ValueError: + _domain.register_failures += 1 + + case 2: + # Auto-naming from __name__ (ftl_name=None) + def my_custom_function(value: Any) -> str: + return str(value) + + reg.register(my_custom_function) + if "MY_CUSTOM_FUNCTION" not in reg: + msg = "Auto-naming failed: MY_CUSTOM_FUNCTION not in registry" + raise BridgeFuzzError(msg) + + case _: + # inject_locale=True with *args function (should succeed) + @fluent_function(inject_locale=True) + def varargs_locale(*args: Any) -> str: + return str(args) + + reg.register(varargs_locale, ftl_name="VARARGS_LOCALE") + if not reg.should_inject_locale("VARARGS_LOCALE"): + msg = "varargs function with inject_locale not detected" + raise BridgeFuzzError(msg) diff --git a/fuzz_atheris/fuzz_bridge_support.py b/fuzz_atheris/fuzz_bridge_support.py new file mode 100644 index 00000000..4b88623b --- /dev/null +++ b/fuzz_atheris/fuzz_bridge_support.py @@ -0,0 +1,335 @@ +#!/usr/bin/env python3 
+"""FunctionRegistry Bridge Machinery Fuzzer (Atheris). + +Targets: ftllexengine.runtime.function_bridge, ftllexengine.core.value_types +(FunctionRegistry, FunctionSignature, FluentNumber, make_fluent_number, +fluent_function decorator, parameter mapping, locale injection) + +Concern boundary: This fuzzer stress-tests the bridge machinery that connects +FTL function calls to Python implementations. Distinct from fuzz_builtins which +tests built-in functions (NUMBER, DATETIME, CURRENCY) through the bridge; this +fuzzer tests the bridge itself: +- FunctionRegistry.register() with varied function signatures +- Parameter mapping: _to_camel_case conversion and custom param_map +- FunctionRegistry.call() dispatch with adversarial arguments +- Locale injection protocol (fluent_function decorator) +- FunctionSignature construction and immutability +- FluentNumber object contracts (str, hash, contains, len, repr) +- make_fluent_number() visible-precision inference and typed construction +- Dict-like registry interface (__iter__, __contains__, __len__, has_function) +- Freeze/copy lifecycle and isolation +- Metadata API (get_expected_positional_args, get_builtin_metadata) +- Signature validation error paths (arity, collision, auto-naming) +- Adversarial Python objects (evil __str__, __hash__, recursive structures) +- Error wrapping (TypeError/ValueError -> FrozenFluentError) + +Shared infrastructure imported from fuzz_common (BaseFuzzerState, metrics, +reporting); domain-specific metrics tracked in BridgeMetrics dataclass. +Pattern selection uses deterministic round-robin through a pre-built weighted +schedule (select_pattern_round_robin), immune to coverage-guided mutation bias. +Periodic gc.collect() every 256 iterations and -rss_limit_mb=4096 default. + +Requires Python 3.13+ (uses PEP 695 type aliases). 
+""" + +from __future__ import annotations + +import atexit +import logging +import pathlib +from dataclasses import dataclass +from decimal import Decimal +from typing import TYPE_CHECKING, Any + +if TYPE_CHECKING: + from collections.abc import Sequence + +# --- Dependency Checks --- +_psutil_mod: Any = None +_atheris_mod: Any = None + +try: # noqa: SIM105 - need module ref for check_dependencies + import psutil as _psutil_mod # type: ignore[no-redef] +except ImportError: + pass + +try: # noqa: SIM105 - need module ref for check_dependencies + import atheris as _atheris_mod # type: ignore[no-redef] +except ImportError: + pass + +from fuzz_common import ( # noqa: E402 - after dependency capture # pylint: disable=C0413 + BaseFuzzerState, + build_base_stats_dict, + build_weighted_schedule, + check_dependencies, + emit_checkpoint_report, + emit_final_report, +) + +check_dependencies(["psutil", "atheris"], [_psutil_mod, _atheris_mod]) + +import atheris # noqa: E402, I001 # pylint: disable=C0412,C0413 + + +# --- Domain Metrics --- + + +@dataclass +class BridgeMetrics: + """Domain-specific metrics for bridge fuzzer.""" + + # Registration tests + register_calls: int = 0 + register_failures: int = 0 + + # Call dispatch + call_dispatch_tests: int = 0 + call_dispatch_errors: int = 0 + + # FluentNumber contract checks + fluent_number_checks: int = 0 + make_fluent_number_checks: int = 0 + + # Camel case conversions + camel_case_tests: int = 0 + + # Freeze/copy operations + freeze_copy_tests: int = 0 + + # Locale injection tests + locale_injection_tests: int = 0 + + # Signature validation + signature_validation_tests: int = 0 + + # Metadata API tests + metadata_api_tests: int = 0 + + # Evil object tests + evil_object_tests: int = 0 + + +# --- Global State --- + +_state = BaseFuzzerState( + fuzzer_name="bridge", + fuzzer_target="FunctionRegistry, FunctionSignature, FluentNumber, make_fluent_number", +) +_domain = BridgeMetrics() + +# Pattern weights: (name, weight) +# 16 
patterns across 4 categories: +# REGISTRATION (4): register_basic, register_signatures, param_mapping_custom, +# signature_validation +# CONTRACTS (4): fluent_number_contracts, make_fluent_number_api, +# signature_immutability, camel_case_conversion +# DISPATCH (4): call_dispatch, locale_injection, error_wrapping, evil_objects +# INTROSPECTION (4): dict_interface, freeze_copy_lifecycle, fluent_function_decorator, +# metadata_api +_PATTERN_WEIGHTS: tuple[tuple[str, int], ...] = ( + # REGISTRATION + ("register_basic", 10), + ("register_signatures", 12), + ("param_mapping_custom", 8), + ("signature_validation", 6), + # CONTRACTS + ("fluent_number_contracts", 12), + ("make_fluent_number_api", 10), + ("signature_immutability", 5), + ("camel_case_conversion", 10), + # DISPATCH + ("call_dispatch", 12), + ("locale_injection", 10), + ("error_wrapping", 7), + ("evil_objects", 5), + # INTROSPECTION + ("dict_interface", 8), + ("freeze_copy_lifecycle", 8), + ("fluent_function_decorator", 8), + ("metadata_api", 6), +) + +_PATTERN_SCHEDULE: tuple[str, ...] 
= build_weighted_schedule( + [name for name, _ in _PATTERN_WEIGHTS], + [weight for _, weight in _PATTERN_WEIGHTS], +) + +# Register intended weights for skew detection +_state.pattern_intended_weights = {name: float(weight) for name, weight in _PATTERN_WEIGHTS} + + +class BridgeFuzzError(Exception): + """Raised when a bridge invariant is breached.""" + + +# Allowed exceptions from bridge operations +_ALLOWED_EXCEPTIONS = ( + ValueError, + TypeError, + OverflowError, + ArithmeticError, + RecursionError, + RuntimeError, +) + + +# --- Reporting --- + +_REPORT_DIR = pathlib.Path(".fuzz_atheris_corpus") / "bridge" + + +def _build_stats_dict() -> dict[str, Any]: + """Build complete stats dictionary including domain metrics.""" + stats = build_base_stats_dict(_state) + + # Registration + stats["register_calls"] = _domain.register_calls + stats["register_failures"] = _domain.register_failures + + # Call dispatch + stats["call_dispatch_tests"] = _domain.call_dispatch_tests + stats["call_dispatch_errors"] = _domain.call_dispatch_errors + + # FluentNumber + stats["fluent_number_checks"] = _domain.fluent_number_checks + stats["make_fluent_number_checks"] = _domain.make_fluent_number_checks + + # Camel case + stats["camel_case_tests"] = _domain.camel_case_tests + + # Freeze/copy + stats["freeze_copy_tests"] = _domain.freeze_copy_tests + + # Locale injection + stats["locale_injection_tests"] = _domain.locale_injection_tests + + # Signature validation + stats["signature_validation_tests"] = _domain.signature_validation_tests + + # Metadata API + stats["metadata_api_tests"] = _domain.metadata_api_tests + + # Evil objects + stats["evil_object_tests"] = _domain.evil_object_tests + + return stats + + +_REPORT_FILENAME = "fuzz_bridge_report.json" + + +def _emit_checkpoint() -> None: + """Emit periodic checkpoint (uses checkpoint markers).""" + stats = _build_stats_dict() + emit_checkpoint_report( + _state, + stats, + _REPORT_DIR, + _REPORT_FILENAME, + ) + + +def _emit_report() -> 
None: + """Emit comprehensive final report (crash-proof).""" + stats = _build_stats_dict() + emit_final_report(_state, stats, _REPORT_DIR, _REPORT_FILENAME) + + +atexit.register(_emit_report) + + +# --- Suppress logging and instrument imports --- +logging.getLogger("ftllexengine").setLevel(logging.CRITICAL) + +with atheris.instrument_imports(include=["ftllexengine"]): + from ftllexengine.core.value_types import make_fluent_number + from ftllexengine.runtime.function_bridge import ( + FluentNumber, + ) + + +# --- Constants --- + +_LOCALES: Sequence[str] = ( + "en", + "en_US", + "de", + "de_DE", + "ar", + "ar_SA", + "ja", + "ja_JP", + "fr", + "fr_FR", + "ru", +) + +# Snake_case names for _to_camel_case testing +_SNAKE_CASE_NAMES: Sequence[str] = ( + "minimum_fraction_digits", + "maximum_fraction_digits", + "use_grouping", + "date_style", + "time_style", + "currency_display", + "value", + "x", + "_private_param", + "__dunder_param", + "a_b_c_d_e", + "already_camel", + "", + "_", + "__", + "___", + "UPPER_CASE", + "mixed_Case_Style", + "single", +) + +# Expected camelCase conversions for invariant checking +_CAMEL_EXPECTED: dict[str, str] = { + "minimum_fraction_digits": "minimumFractionDigits", + "maximum_fraction_digits": "maximumFractionDigits", + "use_grouping": "useGrouping", + "value": "value", + "x": "x", + "single": "single", +} + + +def _pick_locale(fdp: atheris.FuzzedDataProvider) -> str: + """Pick locale: 90% valid, 10% fuzzed.""" + if fdp.ConsumeIntInRange(0, 9) < 9: + return fdp.PickValueInList(list(_LOCALES)) + return fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(0, 20)) + + +def _group_ascii_thousands(value: int) -> str: + """Render an integer with ASCII comma grouping.""" + digits = str(abs(value)) + groups: list[str] = [] + while digits: + groups.append(digits[-3:]) + digits = digits[:-3] + grouped = ",".join(reversed(groups)) + return f"-{grouped}" if value < 0 else grouped + + +def _call_make_fluent_number( + value: int | Decimal, + *, + 
formatted: str | None = None, +) -> FluentNumber: + """Call make_fluent_number and fail hard on unexpected valid-input errors.""" + try: + return make_fluent_number(value, formatted=formatted) + except (TypeError, ValueError) as err: + msg = ( + "make_fluent_number unexpectedly rejected a valid contract input: " + f"value={value!r}, formatted={formatted!r}, error={err}" + ) + raise BridgeFuzzError(msg) from err + +__all__ = [name for name in globals() if not name.startswith("__")] diff --git a/fuzz_atheris/fuzz_builtins.py b/fuzz_atheris/fuzz_builtins.py index da0abc21..4751db11 100644 --- a/fuzz_atheris/fuzz_builtins.py +++ b/fuzz_atheris/fuzz_builtins.py @@ -1,1016 +1,9 @@ #!/usr/bin/env python3 -# FUZZ_PLUGIN_HEADER_START -# FUZZ_PLUGIN: builtins - Built-in Functions (Babel Boundary) -# Intentional: This header is intentionally placed for dynamic plugin discovery. -# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH -# FUZZ_PLUGIN_HEADER_END -"""Built-in Function Boundary Fuzzer (Atheris). - -Targets: ftllexengine.runtime.functions (NUMBER, DATETIME, CURRENCY) - -Concern boundary: This fuzzer stress-tests the Babel formatting boundary by -calling NUMBER, DATETIME, and CURRENCY functions directly through the Python -API. This is distinct from fuzz_runtime which invokes these functions through -FTL syntax and the resolver stack. Direct API testing isolates the Babel layer -from resolver/cache behavior and enables: -- Fuzz-generated Babel pattern strings (pattern= parameter) -- FluentNumber precision (CLDR v operand) correctness verification -- Currency-specific decimal digit enforcement (JPY=0, BHD=3) -- Type coercion across int/float/Decimal/FluentNumber inputs -- Cross-locale formatting consistency (same value, multiple locales) -- Edge value handling (NaN, Inf, -0.0, extreme magnitudes) - -FunctionRegistry lifecycle, parameter mapping, and locale injection protocol -are covered by fuzz_bridge.py. 
This fuzzer focuses exclusively on the -formatting output correctness boundary. - -Requires Python 3.13+ (uses PEP 695 type aliases). -""" +"""Built-in function boundary Atheris entry wrapper.""" from __future__ import annotations -import argparse -import atexit -import gc -import logging -import pathlib -import re -import sys -import time -from dataclasses import dataclass -from datetime import UTC, datetime, timedelta, timezone -from decimal import ROUND_HALF_EVEN, Decimal, InvalidOperation -from math import isinf, isnan -from typing import TYPE_CHECKING, Any - -if TYPE_CHECKING: - from collections.abc import Sequence - -# --- Dependency Checks --- -_psutil_mod: Any = None -_atheris_mod: Any = None - -try: # noqa: SIM105 - need module ref for check_dependencies - import psutil as _psutil_mod # type: ignore[no-redef] -except ImportError: - pass - -try: # noqa: SIM105 - need module ref for check_dependencies - import atheris as _atheris_mod # type: ignore[no-redef] -except ImportError: - pass - -from fuzz_common import ( # noqa: E402 - after dependency capture # pylint: disable=C0413 - GC_INTERVAL, - BaseFuzzerState, - build_base_stats_dict, - build_weighted_schedule, - check_dependencies, - emit_checkpoint_report, - emit_final_report, - get_process, - print_fuzzer_banner, - record_iteration_metrics, - record_memory, - run_fuzzer, - select_pattern_round_robin, -) - -check_dependencies(["psutil", "atheris"], [_psutil_mod, _atheris_mod]) - -import atheris # noqa: E402 # pylint: disable=C0412,C0413 - -# --- Domain Metrics --- - -@dataclass -class BuiltinsMetrics: - """Domain-specific metrics for builtins fuzzer.""" - - # Per-function call counts - number_calls: int = 0 - datetime_calls: int = 0 - currency_calls: int = 0 - - # Precision tracking - precision_checks: int = 0 - precision_violations: int = 0 - - # Cross-locale tests - cross_locale_tests: int = 0 - cross_locale_empty_results: int = 0 - - # Type coercion tests - type_coercion_tests: int = 0 - - # Custom 
pattern tests - custom_pattern_tests: int = 0 - - # Edge value encounters - edge_nan_count: int = 0 - edge_inf_count: int = 0 - edge_zero_count: int = 0 - - # Rounding oracle: ROUND_HALF_EVEN verification (Babel default) - rounding_oracle_checks: int = 0 - rounding_oracle_violations: int = 0 - - # Input domain coverage: min_frac > max_frac cases - min_gt_max_tests: int = 0 - - -# --- Global State --- - -_state = BaseFuzzerState( - seed_corpus_max_size=500, - fuzzer_name="builtins", - fuzzer_target="NUMBER, DATETIME, CURRENCY (Babel boundary)", -) -_domain = BuiltinsMetrics() - -# Pattern weights: (name, weight) - focused on Babel boundary, no bridge overlap -_PATTERN_WEIGHTS: tuple[tuple[str, int], ...] = ( - ("number_basic", 12), - ("number_precision", 15), - ("number_edges", 8), - ("number_type_variety", 8), - ("datetime_styles", 10), - ("datetime_edges", 8), - ("datetime_timezone_stress", 6), - ("currency_codes", 12), - ("currency_precision", 10), - ("currency_cross_locale", 8), - ("custom_pattern", 8), - ("cross_locale_consistency", 8), - ("error_paths", 5), -) - -_PATTERN_SCHEDULE: tuple[str, ...] 
= build_weighted_schedule( - [name for name, _ in _PATTERN_WEIGHTS], - [weight for _, weight in _PATTERN_WEIGHTS], -) - -# Register intended weights for skew detection -_state.pattern_intended_weights = {name: float(weight) for name, weight in _PATTERN_WEIGHTS} - - -class BuiltinsFuzzError(Exception): - """Raised when a fuzzer invariant is violated.""" - - -# Allowed exceptions from Babel / formatting functions -ALLOWED_EXCEPTIONS = ( - ValueError, - TypeError, - OverflowError, - InvalidOperation, - OSError, - ArithmeticError, -) - - -# --- Reporting --- - -_REPORT_DIR = pathlib.Path(".fuzz_atheris_corpus") / "builtins" - - -def _build_stats_dict() -> dict[str, Any]: - """Build complete stats dictionary including domain metrics.""" - stats = build_base_stats_dict(_state) - - # Per-function call counts - stats["number_calls"] = _domain.number_calls - stats["datetime_calls"] = _domain.datetime_calls - stats["currency_calls"] = _domain.currency_calls - - # Precision tracking - stats["precision_checks"] = _domain.precision_checks - stats["precision_violations"] = _domain.precision_violations - - # Cross-locale - stats["cross_locale_tests"] = _domain.cross_locale_tests - stats["cross_locale_empty_results"] = _domain.cross_locale_empty_results - - # Type coercion - stats["type_coercion_tests"] = _domain.type_coercion_tests - - # Custom patterns - stats["custom_pattern_tests"] = _domain.custom_pattern_tests - - # Edge values - stats["edge_nan_count"] = _domain.edge_nan_count - stats["edge_inf_count"] = _domain.edge_inf_count - stats["edge_zero_count"] = _domain.edge_zero_count - - # Rounding oracle - stats["rounding_oracle_checks"] = _domain.rounding_oracle_checks - stats["rounding_oracle_violations"] = _domain.rounding_oracle_violations - - # Input domain coverage - stats["min_gt_max_tests"] = _domain.min_gt_max_tests - - return stats - - -_REPORT_FILENAME = "fuzz_builtins_report.json" - - -def _emit_checkpoint() -> None: - """Emit periodic checkpoint (uses checkpoint 
markers).""" - stats = _build_stats_dict() - emit_checkpoint_report( - _state, stats, _REPORT_DIR, _REPORT_FILENAME, - ) - - -def _emit_report() -> None: - """Emit comprehensive final report (crash-proof).""" - stats = _build_stats_dict() - emit_final_report(_state, stats, _REPORT_DIR, _REPORT_FILENAME) - - -atexit.register(_emit_report) - - -# --- Suppress logging and instrument imports --- -logging.getLogger("ftllexengine").setLevel(logging.CRITICAL) - -with atheris.instrument_imports(include=["ftllexengine"]): - from ftllexengine.diagnostics.errors import FrozenFluentError - from ftllexengine.runtime.function_bridge import FluentNumber - from ftllexengine.runtime.functions import ( - currency_format, - datetime_format, - number_format, - ) - - -# --- Constants --- - -_LOCALES: Sequence[str] = ( - "en-US", "de-DE", "ar-EG", "zh-Hans-CN", "ja-JP", - "lv-LV", "fr-FR", "pt-BR", "hi-IN", "root", -) - -_VALID_ISO_CURRENCIES: Sequence[str] = ( - "USD", "EUR", "GBP", "JPY", "CHF", "CNY", "BRL", - "INR", "KRW", "BHD", "KWD", "OMR", -) - -_CURRENCY_DISPLAY_MODES: Sequence[str] = ("symbol", "code", "name") - -_DATE_STYLES: Sequence[str] = ("short", "medium", "long", "full") - -# Numbers that exercise precision boundary conditions -_PRECISION_NUMBERS: Sequence[Decimal] = ( - Decimal(0), Decimal(1), Decimal("1.0"), Decimal("1.00"), - Decimal("1.5"), Decimal("1.50"), Decimal("0.001"), - Decimal("1234567.89"), Decimal("-1.5"), Decimal("0.10"), - Decimal("999999999.999"), -) - -# Edge float values -_EDGE_FLOATS: Sequence[float] = ( - 0.0, -0.0, 1e-10, 1e10, 1e100, 1e308, - float("inf"), float("-inf"), float("nan"), - -1.0, 0.1, 0.01, 0.001, -) - -# Timestamp boundaries for DATETIME -_MAX_TIMESTAMP = 253402300799.0 # 9999-12-31T23:59:59 UTC - - -# --- Helpers --- - -def _pick_locale(fdp: atheris.FuzzedDataProvider) -> str: - """Pick locale: 90% valid, 10% fuzzed.""" - if fdp.ConsumeIntInRange(0, 9) < 9: - return fdp.PickValueInList(list(_LOCALES)) - return 
fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(2, 15)) - - -def _make_decimal(fdp: atheris.FuzzedDataProvider) -> Decimal: - """Generate a Decimal from fuzzed float, including NaN and Infinity. - - Decimal(str(float('nan'))) -> Decimal('NaN') and - Decimal(str(float('inf'))) -> Decimal('Infinity') without raising; - no exception handler needed. - """ - return Decimal(str(fdp.ConsumeFloat())) - - -def _values_match(a: object, b: object) -> bool: - """NaN-safe value comparison for cross-locale invariant checks. - - IEEE 754 defines NaN != NaN, so naive != comparison falsely reports - value drift when both sides are NaN. This function treats two NaN - values of the same type as matching. - """ - if isinstance(a, Decimal) and isinstance(b, Decimal) and a.is_nan() and b.is_nan(): - return True - if isinstance(a, float) and isinstance(b, float) and isnan(a) and isnan(b): - return True - return a == b - - -def _extract_oracle_digits(formatted: str, locale: str) -> str | None: - """Extract absolute numeric digits from a formatted string for oracle comparison. - - Uses Babel to look up locale-specific decimal and grouping separators. - Returns None when digit extraction is not possible (non-ASCII digits, - ambiguous separators, or unknown locale). - - The extraction algorithm: - 1. Skip if any digit character is non-ASCII (e.g., ar-EG Arabic-Indic, - hi-IN Devanagari); these cannot be compared against ASCII oracle values. - 2. Look up locale decimal and group symbols via Babel. - 3. Remove group separators (critical for de-DE where group sep is '.'). - 4. Replace decimal separator with ASCII '.'. - 5. Strip all remaining non-digit, non-dot characters (currency codes, - whitespace, signs) via regex. Whitespace-based group separators - (lv-LV, fr-FR thin-space) are handled by this final strip. - """ - # Skip locales where any digit character is non-ASCII. 
- if any(c.isdigit() and not c.isascii() for c in formatted): - return None - try: - # Deferred import: Babel is optional at ftllexengine package level. - # At fuzzing time Babel is always present (required by the functions - # under test), but the import is deferred to match project conventions. - from babel.numbers import ( - get_decimal_symbol, - get_group_symbol, - ) - # Babel expects underscore-separated locale IDs ('en_US', 'de_DE'); - # ftllexengine uses BCP 47 hyphen-separated codes ('en-US', 'de-DE'). - babel_locale = locale.replace("-", "_") - decimal_sym = get_decimal_symbol(babel_locale) - group_sym = get_group_symbol(babel_locale) - except ValueError: - # Babel raises UnknownLocaleError (ValueError subclass) for invalid locales. - return None - # Guard: ambiguous separators (same symbol for both) cannot be parsed reliably. - if decimal_sym == group_sym: - return None - # Step 1: remove group separators before replacing decimal separator. - # This is critical when group_sym == '.' (e.g., de-DE): removing it first - # prevents '1.234,56' → '1.234.56' (two dots, wrong result). - normalized = formatted.replace(group_sym, "").replace(decimal_sym, ".") - # Step 2: strip all remaining non-digit, non-dot characters (currency codes, - # whitespace, signs). Handles whitespace-variant group seps (lv-LV, fr-FR). 
- digits = re.sub(r"[^\d.]", "", normalized) - return digits or None - - -# ============================================================================= -# Pattern implementations -# ============================================================================= - - -def _pattern_number_basic(fdp: atheris.FuzzedDataProvider) -> None: - """NUMBER with varied fraction digits, grouping, and locales.""" - locale = _pick_locale(fdp) - val = _make_decimal(fdp) - min_frac = fdp.ConsumeIntInRange(0, 10) - max_frac = fdp.ConsumeIntInRange(0, 20) # Independent: allows min > max (clamp path) - grouping = fdp.ConsumeBool() - - _domain.number_calls += 1 - result = number_format( - val, locale, - minimum_fraction_digits=min_frac, - maximum_fraction_digits=max_frac, - use_grouping=grouping, - ) - - # Invariant: result must be FluentNumber - if not isinstance(result, FluentNumber): - msg = f"number_format returned {type(result).__name__}, expected FluentNumber" - raise BuiltinsFuzzError(msg) - - -def _pattern_number_precision(fdp: atheris.FuzzedDataProvider) -> None: - """Verify FluentNumber precision (CLDR v operand) correctness. - - The v operand is the count of visible fraction digits in the formatted - output. This is critical for plural rule matching. 
- """ - locale = _pick_locale(fdp) - # Use precision-sensitive numbers - val = ( - fdp.PickValueInList(list(_PRECISION_NUMBERS)) - if fdp.ConsumeBool() - else _make_decimal(fdp) - ) - - min_frac = fdp.ConsumeIntInRange(0, 6) - max_frac = fdp.ConsumeIntInRange(0, 10) # Independent: allows min > max (clamp path) - if min_frac > max_frac: - _domain.min_gt_max_tests += 1 - - _domain.number_calls += 1 - _domain.precision_checks += 1 - result = number_format( - val, locale, - minimum_fraction_digits=min_frac, - maximum_fraction_digits=max_frac, - use_grouping=False, - ) - - # Invariant: precision must be non-negative integer - if not isinstance(result, FluentNumber): - return - if result.precision is not None and result.precision < 0: - _domain.precision_violations += 1 - msg = ( - f"Negative precision {result.precision} for val={val}, " - f"locale={locale}, min={min_frac}, max={max_frac}" - ) - raise BuiltinsFuzzError(msg) - - # Rounding oracle: verify ROUND_HALF_EVEN across all ASCII-digit locales. - # Babel uses decimal_quantization=True by default, which applies ROUND_HALF_EVEN - # (IEEE 754 banker's rounding). _extract_oracle_digits handles locale-specific - # decimal and group separators; returns None for non-ASCII-digit locales (ar-EG). - # NaN guard is explicit: Decimal.quantize() does NOT raise InvalidOperation for - # quiet NaN -- it silently propagates and returns Decimal('NaN'). Only Infinity - # raises InvalidOperation. Without the is_nan() check, the oracle compares - # 'NaN' against whatever Babel emits for NaN input, producing a false violation. 
- val_d = result.value - if isinstance(val_d, Decimal) and result.precision is not None and not val_d.is_nan(): - prec = result.precision - try: - expected = abs(val_d).quantize(Decimal(10) ** -prec, rounding=ROUND_HALF_EVEN) - except InvalidOperation: - pass # Infinity: skip oracle - else: - digits_only = _extract_oracle_digits(result.formatted, locale) - if digits_only is not None: - _domain.rounding_oracle_checks += 1 - if digits_only != str(expected): - _domain.rounding_oracle_violations += 1 - msg = ( - f"Rounding oracle: got {digits_only!r}, expected {str(expected)!r} " - f"for val={val_d}, locale={locale}, min={min_frac}, max={max_frac}" - ) - raise BuiltinsFuzzError(msg) - - -def _pattern_number_edges(fdp: atheris.FuzzedDataProvider) -> None: - """Edge float values: NaN, Inf, -0.0, huge, tiny.""" - locale = _pick_locale(fdp) - val_float = fdp.PickValueInList(list(_EDGE_FLOATS)) - - # Track edge value types - if isnan(val_float): - _domain.edge_nan_count += 1 - elif isinf(val_float): - _domain.edge_inf_count += 1 - elif val_float == 0.0: - _domain.edge_zero_count += 1 - - # Decimal(str(float)) never raises for NaN/Inf: - # float('nan') -> 'nan' -> Decimal('NaN'), float('inf') -> Decimal('Infinity'). - val = Decimal(str(val_float)) - - _domain.number_calls += 1 - number_format( - val, locale, - minimum_fraction_digits=fdp.ConsumeIntInRange(0, 5), - maximum_fraction_digits=fdp.ConsumeIntInRange(0, 10), - use_grouping=fdp.ConsumeBool(), - ) - - -def _pattern_number_type_variety(fdp: atheris.FuzzedDataProvider) -> None: - """Test NUMBER with int, float, Decimal, and FluentNumber inputs. - - Verifies type coercion works correctly across all numeric types - that could be passed as FTL variable values. 
- """ - locale = _pick_locale(fdp) - _domain.type_coercion_tests += 1 - _domain.number_calls += 1 - - input_type = fdp.ConsumeIntInRange(0, 3) - match input_type: - case 0: - # int input - val = Decimal(fdp.ConsumeIntInRange(-999999, 999999)) - case 1: - # float input (via Decimal conversion) - val = _make_decimal(fdp) - case 2: - # Precision-sensitive Decimal - val = fdp.PickValueInList(list(_PRECISION_NUMBERS)) - case _: - # FluentNumber as input (result of previous NUMBER call) - inner = number_format( - Decimal(str(fdp.ConsumeIntInRange(1, 100))), locale, - minimum_fraction_digits=2, - ) - # Format the FluentNumber again (nested call) - val = Decimal(str(inner.value)) if isinstance(inner, FluentNumber) else Decimal(0) - - result = number_format( - val, locale, - minimum_fraction_digits=fdp.ConsumeIntInRange(0, 6), - maximum_fraction_digits=fdp.ConsumeIntInRange(0, 10), - ) - - if not isinstance(result, FluentNumber): - msg = f"number_format returned {type(result).__name__} for {type(val).__name__} input" - raise BuiltinsFuzzError(msg) - - -def _pattern_datetime_styles(fdp: atheris.FuzzedDataProvider) -> None: - """DATETIME with all style combinations.""" - locale = _pick_locale(fdp) - # Safe timestamp range - timestamp = fdp.ConsumeFloat() % _MAX_TIMESTAMP - if timestamp < 0: - timestamp = abs(timestamp) - - try: - dt = datetime.fromtimestamp(timestamp, tz=UTC) - except (OSError, OverflowError, ValueError): - return - - date_style = fdp.PickValueInList(list(_DATE_STYLES)) - use_time = fdp.ConsumeBool() - time_style = fdp.PickValueInList(list(_DATE_STYLES)) if use_time else None - - _domain.datetime_calls += 1 - result = datetime_format( - dt, locale, - date_style=date_style, - time_style=time_style, - ) - - # Invariant: result must be non-empty string - if not isinstance(result, str) or not result: - msg = ( - f"datetime_format returned empty/non-str: {result!r} " - f"for locale={locale}, date_style={date_style}" - ) - raise BuiltinsFuzzError(msg) - - -def 
_pattern_datetime_edges(fdp: atheris.FuzzedDataProvider) -> None: - """Edge timestamps and timezone variations.""" - locale = _pick_locale(fdp) - - # Edge timestamps - edge_timestamps = [ - 0.0, # Unix epoch - 86400.0, # One day - -86400.0, # Before epoch - 946684800.0, # Y2K - _MAX_TIMESTAMP, # Max safe - ] - timestamp = fdp.PickValueInList(edge_timestamps) - - try: - dt = datetime.fromtimestamp(timestamp, tz=UTC) - except (OSError, OverflowError, ValueError): - return - - # Test with different timezone offsets - if fdp.ConsumeBool(): - offset_hours = fdp.ConsumeIntInRange(-12, 14) - tz = timezone(timedelta(hours=offset_hours)) - dt = dt.astimezone(tz) - - _domain.datetime_calls += 1 - datetime_format( - dt, locale, - date_style=fdp.PickValueInList(list(_DATE_STYLES)), - time_style=fdp.PickValueInList(list(_DATE_STYLES)) if fdp.ConsumeBool() else None, - ) - - -def _pattern_datetime_timezone_stress(fdp: atheris.FuzzedDataProvider) -> None: - """Stress-test timezone handling with extreme offsets and DST boundaries. - - Tests the DATETIME function with timezone offsets at the edges of - the valid range, timestamps near DST transitions, and unusual - UTC offset values. 
- """ - locale = _pick_locale(fdp) - - # Base timestamp: mix of safe values and edge cases - base_timestamps = [ - 0.0, # Epoch - 1647302400.0, # March 2022 (DST transition period) - 1667091600.0, # Nov 2022 (DST fall-back period) - 946684800.0, # Y2K - 1704067200.0, # 2024-01-01 - 86400.0 * 365, # One year - ] - timestamp = fdp.PickValueInList(base_timestamps) - - # Add fuzzed offset to push near boundaries - offset_seconds = fdp.ConsumeIntInRange(-43200, 43200) - - try: - # Create with extreme timezone offset (±12h in 15min increments) - offset_minutes = fdp.ConsumeIntInRange(-720, 840) - tz = timezone(timedelta(minutes=offset_minutes)) - dt = datetime.fromtimestamp(timestamp + offset_seconds, tz=tz) - except (OSError, OverflowError, ValueError): - return - - _domain.datetime_calls += 1 - result = datetime_format( - dt, locale, - date_style=fdp.PickValueInList(list(_DATE_STYLES)), - time_style=fdp.PickValueInList(list(_DATE_STYLES)) if fdp.ConsumeBool() else None, - ) - - if not isinstance(result, str) or not result: - msg = f"datetime_format returned empty for tz offset {offset_minutes}min" - raise BuiltinsFuzzError(msg) - - -def _pattern_currency_codes(fdp: atheris.FuzzedDataProvider) -> None: - """CURRENCY with valid/invalid ISO codes and display modes.""" - locale = _pick_locale(fdp) - val = _make_decimal(fdp) - - # 80% valid ISO code, 20% fuzzed - if fdp.ConsumeIntInRange(0, 4) < 4: - currency = fdp.PickValueInList(list(_VALID_ISO_CURRENCIES)) - else: - currency = fdp.ConsumeUnicodeNoSurrogates(3).upper() - - display = fdp.PickValueInList(list(_CURRENCY_DISPLAY_MODES)) - - _domain.currency_calls += 1 - result = currency_format( - val, locale, - currency=currency, - currency_display=display, - ) - - # Invariant: result must be FluentNumber - if not isinstance(result, FluentNumber): - msg = f"currency_format returned {type(result).__name__}" - raise BuiltinsFuzzError(msg) - - -def _pattern_currency_precision(fdp: atheris.FuzzedDataProvider) -> None: - 
"""Currency-specific decimal digits: JPY=0, BHD=3, EUR/USD=2.""" - locale = _pick_locale(fdp) - - # Currencies with known decimal digits - currency_decimals = { - "JPY": 0, "KRW": 0, # 0 decimals - "USD": 2, "EUR": 2, # 2 decimals - "BHD": 3, "KWD": 3, # 3 decimals - } - - currency = fdp.PickValueInList(list(currency_decimals.keys())) - val = fdp.PickValueInList(list(_PRECISION_NUMBERS)) - - _domain.currency_calls += 1 - _domain.precision_checks += 1 - result = currency_format( - val, locale, - currency=currency, - currency_display="code", - ) - - # Invariant: precision must be non-negative - if isinstance(result, FluentNumber) and result.precision is not None and result.precision < 0: - _domain.precision_violations += 1 - msg = ( - f"Negative precision {result.precision} for " - f"currency={currency}, val={val}" - ) - raise BuiltinsFuzzError(msg) - - # Rounding oracle: verify ROUND_HALF_EVEN for known currency decimal counts. - # Babel's decimal_quantization=True applies ROUND_HALF_EVEN by default. - # _extract_oracle_digits handles locale-specific separators and skips - # non-ASCII-digit locales. NaN guard is explicit: Decimal.quantize() silently - # propagates quiet NaN (returns Decimal('NaN')) instead of raising - # InvalidOperation. Only Infinity raises. Without is_nan(), the oracle fires - # a false violation when Babel formats NaN differently from str(Decimal('NaN')). 
- if isinstance(result, FluentNumber) and result.precision is not None: - val_d = result.value - if isinstance(val_d, Decimal) and not val_d.is_nan(): - expected_prec = currency_decimals[currency] - quantizer = Decimal(10) ** -expected_prec - try: - expected = abs(val_d).quantize(quantizer, rounding=ROUND_HALF_EVEN) - except InvalidOperation: - pass # Infinity: skip oracle - else: - digits_only = _extract_oracle_digits(result.formatted, locale) - if digits_only is not None: - _domain.rounding_oracle_checks += 1 - if digits_only != str(expected): - _domain.rounding_oracle_violations += 1 - msg = ( - f"Currency rounding oracle: got {digits_only!r}, " - f"expected {str(expected)!r} " - f"for currency={currency}, val={val_d}, locale={locale}" - ) - raise BuiltinsFuzzError(msg) - - -def _pattern_currency_cross_locale(fdp: atheris.FuzzedDataProvider) -> None: - """Same currency amount formatted across multiple locales. - - Verifies that the same value + currency code produces valid output - in every locale, and that the FluentNumber.value is preserved. 
- """ - val = fdp.PickValueInList(list(_PRECISION_NUMBERS)) - currency = fdp.PickValueInList(list(_VALID_ISO_CURRENCIES)) - display = fdp.PickValueInList(list(_CURRENCY_DISPLAY_MODES)) - - results: list[FluentNumber] = [] - num_locales = fdp.ConsumeIntInRange(3, 6) - locales_to_test = [ - fdp.PickValueInList(list(_LOCALES)) for _ in range(num_locales) - ] - - _domain.cross_locale_tests += 1 - for locale in locales_to_test: - _domain.currency_calls += 1 - result = currency_format( - val, locale, - currency=currency, - currency_display=display, - ) - if isinstance(result, FluentNumber): - results.append(result) - - # Invariant: all results should have the same underlying numeric value - if len(results) >= 2: - first_val = results[0].value - for r in results[1:]: - if not _values_match(r.value, first_val): - msg = ( - f"Currency value drift: {first_val} vs {r.value} " - f"for {currency} across locales" - ) - raise BuiltinsFuzzError(msg) - - -def _pattern_custom_pattern(fdp: atheris.FuzzedDataProvider) -> None: - """Custom Babel patterns for NUMBER, DATETIME, CURRENCY.""" - locale = _pick_locale(fdp) - target = fdp.ConsumeIntInRange(0, 2) - _domain.custom_pattern_tests += 1 - - # Mix of valid and fuzzed patterns - number_patterns = [ - "#,##0.00", "#,##0", "0.###", "#,##0.00;(#,##0.00)", - "0.0", "#", "##0.00%", - ] - date_patterns = [ - "yyyy-MM-dd", "dd/MM/yyyy", "MMMM d, yyyy", - "HH:mm:ss", "EEE, d MMM yyyy", - ] - - match target: - case 0: # NUMBER with pattern - if fdp.ConsumeBool(): - pattern = fdp.PickValueInList(number_patterns) - else: - pattern = fdp.ConsumeUnicodeNoSurrogates(20) - _domain.number_calls += 1 - number_format( - _make_decimal(fdp), locale, - pattern=pattern, - ) - case 1: # DATETIME with pattern - timestamp = abs(fdp.ConsumeFloat()) % _MAX_TIMESTAMP - try: - dt = datetime.fromtimestamp(timestamp, tz=UTC) - except (OSError, OverflowError, ValueError): - return - if fdp.ConsumeBool(): - pattern = fdp.PickValueInList(date_patterns) - else: - 
pattern = fdp.ConsumeUnicodeNoSurrogates(20) - _domain.datetime_calls += 1 - datetime_format(dt, locale, pattern=pattern) - case _: # CURRENCY with pattern - if fdp.ConsumeBool(): - pattern = fdp.PickValueInList(number_patterns) - else: - pattern = fdp.ConsumeUnicodeNoSurrogates(20) - _domain.currency_calls += 1 - currency_format( - _make_decimal(fdp), locale, - currency=fdp.PickValueInList(list(_VALID_ISO_CURRENCIES)), - pattern=pattern, - ) - - -def _pattern_cross_locale_consistency(fdp: atheris.FuzzedDataProvider) -> None: - """Same numeric value formatted across multiple locales. - - Verifies all locales produce a non-empty result and that the - underlying FluentNumber.value is preserved across locales. - """ - val = _make_decimal(fdp) - min_frac = fdp.ConsumeIntInRange(0, 4) - max_frac = fdp.ConsumeIntInRange(0, 8) # Independent: allows min > max (clamp path) - - _domain.cross_locale_tests += 1 - num_locales = fdp.ConsumeIntInRange(3, 8) - locales_to_test = [ - fdp.PickValueInList(list(_LOCALES)) for _ in range(num_locales) - ] - - results: list[FluentNumber] = [] - for locale in locales_to_test: - _domain.number_calls += 1 - result = number_format( - val, locale, - minimum_fraction_digits=min_frac, - maximum_fraction_digits=max_frac, - ) - if isinstance(result, FluentNumber): - results.append(result) - if not str(result): - _domain.cross_locale_empty_results += 1 - - # Invariant: all results should preserve the same underlying value - if len(results) >= 2: - first_val = results[0].value - for r in results[1:]: - if not _values_match(r.value, first_val): - msg = ( - f"Value drift across locales: {first_val} vs {r.value} " - f"for input {val}" - ) - raise BuiltinsFuzzError(msg) - - -def _pattern_error_paths(fdp: atheris.FuzzedDataProvider) -> None: - """Invalid inputs, type mismatches, boundary violations.""" - locale = _pick_locale(fdp) - error_case = fdp.ConsumeIntInRange(0, 4) - - match error_case: - case 0: - # Invalid fraction digits (negative) - 
_domain.number_calls += 1 - number_format( - Decimal("1.5"), locale, - minimum_fraction_digits=-1, - maximum_fraction_digits=fdp.ConsumeIntInRange(-5, 5), - ) - case 1: - # Very large fraction digits - _domain.number_calls += 1 - number_format( - Decimal("1.5"), locale, - minimum_fraction_digits=fdp.ConsumeIntInRange(50, 200), - maximum_fraction_digits=fdp.ConsumeIntInRange(50, 200), - ) - case 2: - # Empty currency code - _domain.currency_calls += 1 - currency_format( - Decimal(100), locale, - currency="", - ) - case 3: - # Invalid currency code (too long / too short) - bad_code = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(0, 50)) - _domain.currency_calls += 1 - currency_format( - Decimal(100), locale, - currency=bad_code, - ) - case _: - # Fuzzed date style strings - timestamp = abs(fdp.ConsumeFloat()) % _MAX_TIMESTAMP - try: - dt = datetime.fromtimestamp(timestamp, tz=UTC) - except (OSError, OverflowError, ValueError): - return - _domain.datetime_calls += 1 - datetime_format( - dt, locale, - date_style=fdp.ConsumeUnicodeNoSurrogates(10), - time_style=fdp.ConsumeUnicodeNoSurrogates(10) if fdp.ConsumeBool() else None, - ) - - -# --- Pattern Dispatch --- - -_PATTERN_DISPATCH: dict[str, Any] = { - "number_basic": _pattern_number_basic, - "number_precision": _pattern_number_precision, - "number_edges": _pattern_number_edges, - "number_type_variety": _pattern_number_type_variety, - "datetime_styles": _pattern_datetime_styles, - "datetime_edges": _pattern_datetime_edges, - "datetime_timezone_stress": _pattern_datetime_timezone_stress, - "currency_codes": _pattern_currency_codes, - "currency_precision": _pattern_currency_precision, - "currency_cross_locale": _pattern_currency_cross_locale, - "custom_pattern": _pattern_custom_pattern, - "cross_locale_consistency": _pattern_cross_locale_consistency, - "error_paths": _pattern_error_paths, -} - - -# --- Main Entry Point --- - - -def test_one_input(data: bytes) -> None: - """Atheris entry point: Test built-in 
formatting functions.""" - if _state.iterations == 0: - _state.initial_memory_mb = get_process().memory_info().rss / (1024 * 1024) - - _state.iterations += 1 - _state.status = "running" - - if _state.iterations % _state.checkpoint_interval == 0: - _emit_checkpoint() - - start_time = time.perf_counter() - fdp = atheris.FuzzedDataProvider(data) - - pattern = select_pattern_round_robin(_state, _PATTERN_SCHEDULE) - _state.pattern_coverage[pattern] = _state.pattern_coverage.get(pattern, 0) + 1 - - if fdp.remaining_bytes() < 4: - return - - pattern_func = _PATTERN_DISPATCH[pattern] - - try: - pattern_func(fdp) - - except BuiltinsFuzzError: - _state.findings += 1 - raise - - except (*ALLOWED_EXCEPTIONS, FrozenFluentError): - pass # Expected for invalid inputs / Babel limitations - - except Exception as e: # pylint: disable=broad-exception-caught - error_key = f"{type(e).__name__}_{str(e)[:30]}" - _state.error_counts[error_key] = _state.error_counts.get(error_key, 0) + 1 - - finally: - # Semantic interestingness: multi-locale, edge values, fuzzed patterns, - # or wall-time > 1ms (12x P99) indicating unusual code path - is_interesting = pattern in ( - "cross_locale_consistency", "currency_cross_locale", - "number_edges", "number_type_variety", "custom_pattern", - ) or (time.perf_counter() - start_time) * 1000 > 1.0 - record_iteration_metrics( - _state, pattern, start_time, data, is_interesting=is_interesting, - ) - - if _state.iterations % GC_INTERVAL == 0: - gc.collect() - - if _state.iterations % 100 == 0: - record_memory(_state) - - -def main() -> None: - """Run the builtins fuzzer with CLI support.""" - parser = argparse.ArgumentParser( - description="Built-in function boundary fuzzer using Atheris/libFuzzer", - epilog="All unrecognized arguments are passed to libFuzzer.", - ) - parser.add_argument( - "--checkpoint-interval", type=int, default=500, - help="Emit report every N iterations (default: 500)", - ) - parser.add_argument( - "--seed-corpus-size", type=int, 
default=500, - help="Maximum size of in-memory seed corpus (default: 500)", - ) - - args, remaining = parser.parse_known_args() - _state.checkpoint_interval = args.checkpoint_interval - _state.seed_corpus_max_size = args.seed_corpus_size - - if not any(arg.startswith("-rss_limit_mb") for arg in remaining): - remaining.append("-rss_limit_mb=4096") - - sys.argv = [sys.argv[0], *remaining] - - print_fuzzer_banner( - title="Built-in Function Boundary Fuzzer (Atheris)", - target="NUMBER, DATETIME, CURRENCY (Babel boundary)", - state=_state, - schedule_len=len(_PATTERN_SCHEDULE), - ) - - run_fuzzer(_state, test_one_input=test_one_input) - +from fuzz_builtins_entry import main if __name__ == "__main__": main() diff --git a/fuzz_atheris/fuzz_builtins_entry.py b/fuzz_atheris/fuzz_builtins_entry.py new file mode 100644 index 00000000..2d3fca9c --- /dev/null +++ b/fuzz_atheris/fuzz_builtins_entry.py @@ -0,0 +1,148 @@ +from __future__ import annotations + +import argparse +import gc +import sys +import time +from typing import Any + +import atheris +from fuzz_builtins_patterns_currency import ( + _pattern_cross_locale_consistency, + _pattern_currency_codes, + _pattern_currency_cross_locale, + _pattern_currency_precision, + _pattern_custom_pattern, + _pattern_error_paths, +) +from fuzz_builtins_patterns_datetime import ( + _pattern_datetime_edges, + _pattern_datetime_styles, + _pattern_datetime_timezone_stress, +) +from fuzz_builtins_patterns_number import ( + _pattern_number_basic, + _pattern_number_edges, + _pattern_number_precision, + _pattern_number_type_variety, +) +from fuzz_builtins_support import ( + _PATTERN_SCHEDULE, + ALLOWED_EXCEPTIONS, + BuiltinsFuzzError, + _emit_checkpoint, + _state, +) +from fuzz_common import ( + GC_INTERVAL, + get_process, + print_fuzzer_banner, + record_iteration_metrics, + record_memory, + run_fuzzer, + select_pattern_round_robin, +) + +from ftllexengine.diagnostics import FrozenFluentError + +_PATTERN_DISPATCH: dict[str, Any] = { + 
"number_basic": _pattern_number_basic, + "number_precision": _pattern_number_precision, + "number_edges": _pattern_number_edges, + "number_type_variety": _pattern_number_type_variety, + "datetime_styles": _pattern_datetime_styles, + "datetime_edges": _pattern_datetime_edges, + "datetime_timezone_stress": _pattern_datetime_timezone_stress, + "currency_codes": _pattern_currency_codes, + "currency_precision": _pattern_currency_precision, + "currency_cross_locale": _pattern_currency_cross_locale, + "custom_pattern": _pattern_custom_pattern, + "cross_locale_consistency": _pattern_cross_locale_consistency, + "error_paths": _pattern_error_paths, +} + +def test_one_input(data: bytes) -> None: + """Atheris entry point: Test built-in formatting functions.""" + if _state.iterations == 0: + _state.initial_memory_mb = get_process().memory_info().rss / (1024 * 1024) + + _state.iterations += 1 + _state.status = "running" + + if _state.iterations % _state.checkpoint_interval == 0: + _emit_checkpoint() + + start_time = time.perf_counter() + fdp = atheris.FuzzedDataProvider(data) + + pattern = select_pattern_round_robin(_state, _PATTERN_SCHEDULE) + _state.pattern_coverage[pattern] = _state.pattern_coverage.get(pattern, 0) + 1 + + if fdp.remaining_bytes() < 4: + return + + pattern_func = _PATTERN_DISPATCH[pattern] + + try: + pattern_func(fdp) + + except BuiltinsFuzzError: + _state.findings += 1 + raise + + except (*ALLOWED_EXCEPTIONS, FrozenFluentError): + pass # Expected for invalid inputs / Babel limitations + + except Exception as e: # pylint: disable=broad-exception-caught + error_key = f"{type(e).__name__}_{str(e)[:30]}" + _state.error_counts[error_key] = _state.error_counts.get(error_key, 0) + 1 + + finally: + # Semantic interestingness: multi-locale, edge values, fuzzed patterns, + # or wall-time > 1ms (12x P99) indicating unusual code path + is_interesting = pattern in ( + "cross_locale_consistency", "currency_cross_locale", + "number_edges", "number_type_variety", 
"custom_pattern", + ) or (time.perf_counter() - start_time) * 1000 > 1.0 + record_iteration_metrics( + _state, pattern, start_time, data, is_interesting=is_interesting, + ) + + if _state.iterations % GC_INTERVAL == 0: + gc.collect() + + if _state.iterations % 100 == 0: + record_memory(_state) + +def main() -> None: + """Run the builtins fuzzer with CLI support.""" + parser = argparse.ArgumentParser( + description="Built-in function boundary fuzzer using Atheris/libFuzzer", + epilog="All unrecognized arguments are passed to libFuzzer.", + ) + parser.add_argument( + "--checkpoint-interval", type=int, default=500, + help="Emit report every N iterations (default: 500)", + ) + parser.add_argument( + "--seed-corpus-size", type=int, default=500, + help="Maximum size of in-memory seed corpus (default: 500)", + ) + + args, remaining = parser.parse_known_args() + _state.checkpoint_interval = args.checkpoint_interval + _state.seed_corpus_max_size = args.seed_corpus_size + + if not any(arg.startswith("-rss_limit_mb") for arg in remaining): + remaining.append("-rss_limit_mb=4096") + + sys.argv = [sys.argv[0], *remaining] + + print_fuzzer_banner( + title="Built-in Function Boundary Fuzzer (Atheris)", + target="NUMBER, DATETIME, CURRENCY (Babel boundary)", + state=_state, + schedule_len=len(_PATTERN_SCHEDULE), + ) + + run_fuzzer(_state, test_one_input=test_one_input) diff --git a/fuzz_atheris/fuzz_builtins_patterns_currency.py b/fuzz_atheris/fuzz_builtins_patterns_currency.py new file mode 100644 index 00000000..8ec7c24a --- /dev/null +++ b/fuzz_atheris/fuzz_builtins_patterns_currency.py @@ -0,0 +1,294 @@ +from __future__ import annotations + +from datetime import UTC, datetime +from decimal import ROUND_HALF_EVEN, Decimal, InvalidOperation +from typing import TYPE_CHECKING + +if TYPE_CHECKING: + import atheris +from fuzz_builtins_support import ( + _CURRENCY_DISPLAY_MODES, + _LOCALES, + _MAX_TIMESTAMP, + _PRECISION_NUMBERS, + _VALID_ISO_CURRENCIES, + BuiltinsFuzzError, + 
_domain, + _extract_oracle_digits, + _make_decimal, + _pick_locale, + _values_match, +) + +from ftllexengine.core.value_types import FluentNumber +from ftllexengine.runtime.functions import ( + currency_format, + datetime_format, + number_format, +) + + +def _pattern_currency_codes(fdp: atheris.FuzzedDataProvider) -> None: + """CURRENCY with valid/invalid ISO codes and display modes.""" + locale = _pick_locale(fdp) + val = _make_decimal(fdp) + + # 80% valid ISO code, 20% fuzzed + if fdp.ConsumeIntInRange(0, 4) < 4: + currency = fdp.PickValueInList(list(_VALID_ISO_CURRENCIES)) + else: + currency = fdp.ConsumeUnicodeNoSurrogates(3).upper() + + display = fdp.PickValueInList(list(_CURRENCY_DISPLAY_MODES)) + + _domain.currency_calls += 1 + result = currency_format( + val, locale, + currency=currency, + currency_display=display, + ) + + # Invariant: result must be FluentNumber + if not isinstance(result, FluentNumber): + msg = f"currency_format returned {type(result).__name__}" + raise BuiltinsFuzzError(msg) + +def _pattern_currency_precision(fdp: atheris.FuzzedDataProvider) -> None: + """Currency-specific decimal digits: JPY=0, BHD=3, EUR/USD=2.""" + locale = _pick_locale(fdp) + + # Currencies with known decimal digits + currency_decimals = { + "JPY": 0, "KRW": 0, # 0 decimals + "USD": 2, "EUR": 2, # 2 decimals + "BHD": 3, "KWD": 3, # 3 decimals + } + + currency = fdp.PickValueInList(list(currency_decimals.keys())) + val = fdp.PickValueInList(list(_PRECISION_NUMBERS)) + + _domain.currency_calls += 1 + _domain.precision_checks += 1 + result = currency_format( + val, locale, + currency=currency, + currency_display="code", + ) + + # Invariant: precision must be non-negative + if isinstance(result, FluentNumber) and result.precision is not None and result.precision < 0: + _domain.precision_violations += 1 + msg = ( + f"Negative precision {result.precision} for " + f"currency={currency}, val={val}" + ) + raise BuiltinsFuzzError(msg) + + # Rounding oracle: verify 
ROUND_HALF_EVEN for known currency decimal counts. + # Babel's decimal_quantization=True applies ROUND_HALF_EVEN by default. + # _extract_oracle_digits handles locale-specific separators and skips + # non-ASCII-digit locales. NaN guard is explicit: Decimal.quantize() silently + # propagates quiet NaN (returns Decimal('NaN')) instead of raising + # InvalidOperation. Only Infinity raises. Without is_nan(), the oracle fires + # a false violation when Babel formats NaN differently from str(Decimal('NaN')). + if isinstance(result, FluentNumber) and result.precision is not None: + val_d = result.value + if isinstance(val_d, Decimal) and not val_d.is_nan(): + expected_prec = currency_decimals[currency] + quantizer = Decimal(10) ** -expected_prec + try: + expected = abs(val_d).quantize(quantizer, rounding=ROUND_HALF_EVEN) + except InvalidOperation: + pass # Infinity: skip oracle + else: + digits_only = _extract_oracle_digits(result.formatted, locale) + if digits_only is not None: + _domain.rounding_oracle_checks += 1 + if digits_only != str(expected): + _domain.rounding_oracle_violations += 1 + msg = ( + f"Currency rounding oracle: got {digits_only!r}, " + f"expected {str(expected)!r} " + f"for currency={currency}, val={val_d}, locale={locale}" + ) + raise BuiltinsFuzzError(msg) + +def _pattern_currency_cross_locale(fdp: atheris.FuzzedDataProvider) -> None: + """Same currency amount formatted across multiple locales. + + Verifies that the same value + currency code produces valid output + in every locale, and that the FluentNumber.value is preserved. 
+ """ + val = fdp.PickValueInList(list(_PRECISION_NUMBERS)) + currency = fdp.PickValueInList(list(_VALID_ISO_CURRENCIES)) + display = fdp.PickValueInList(list(_CURRENCY_DISPLAY_MODES)) + + results: list[FluentNumber] = [] + num_locales = fdp.ConsumeIntInRange(3, 6) + locales_to_test = [ + fdp.PickValueInList(list(_LOCALES)) for _ in range(num_locales) + ] + + _domain.cross_locale_tests += 1 + for locale in locales_to_test: + _domain.currency_calls += 1 + result = currency_format( + val, locale, + currency=currency, + currency_display=display, + ) + if isinstance(result, FluentNumber): + results.append(result) + + # Invariant: all results should have the same underlying numeric value + if len(results) >= 2: + first_val = results[0].value + for r in results[1:]: + if not _values_match(r.value, first_val): + msg = ( + f"Currency value drift: {first_val} vs {r.value} " + f"for {currency} across locales" + ) + raise BuiltinsFuzzError(msg) + +def _pattern_custom_pattern(fdp: atheris.FuzzedDataProvider) -> None: + """Custom Babel patterns for NUMBER, DATETIME, CURRENCY.""" + locale = _pick_locale(fdp) + target = fdp.ConsumeIntInRange(0, 2) + _domain.custom_pattern_tests += 1 + + # Mix of valid and fuzzed patterns + number_patterns = [ + "#,##0.00", "#,##0", "0.###", "#,##0.00;(#,##0.00)", + "0.0", "#", "##0.00%", + ] + date_patterns = [ + "yyyy-MM-dd", "dd/MM/yyyy", "MMMM d, yyyy", + "HH:mm:ss", "EEE, d MMM yyyy", + ] + + match target: + case 0: # NUMBER with pattern + if fdp.ConsumeBool(): + pattern = fdp.PickValueInList(number_patterns) + else: + pattern = fdp.ConsumeUnicodeNoSurrogates(20) + _domain.number_calls += 1 + number_format( + _make_decimal(fdp), locale, + pattern=pattern, + ) + case 1: # DATETIME with pattern + timestamp = abs(fdp.ConsumeFloat()) % _MAX_TIMESTAMP + try: + dt = datetime.fromtimestamp(timestamp, tz=UTC) + except (OSError, OverflowError, ValueError): + return + if fdp.ConsumeBool(): + pattern = fdp.PickValueInList(date_patterns) + else: + 
pattern = fdp.ConsumeUnicodeNoSurrogates(20) + _domain.datetime_calls += 1 + datetime_format(dt, locale, pattern=pattern) + case _: # CURRENCY with pattern + if fdp.ConsumeBool(): + pattern = fdp.PickValueInList(number_patterns) + else: + pattern = fdp.ConsumeUnicodeNoSurrogates(20) + _domain.currency_calls += 1 + currency_format( + _make_decimal(fdp), locale, + currency=fdp.PickValueInList(list(_VALID_ISO_CURRENCIES)), + pattern=pattern, + ) + +def _pattern_cross_locale_consistency(fdp: atheris.FuzzedDataProvider) -> None: + """Same numeric value formatted across multiple locales. + + Verifies all locales produce a non-empty result and that the + underlying FluentNumber.value is preserved across locales. + """ + val = _make_decimal(fdp) + min_frac = fdp.ConsumeIntInRange(0, 4) + max_frac = fdp.ConsumeIntInRange(0, 8) # Independent: allows min > max (clamp path) + + _domain.cross_locale_tests += 1 + num_locales = fdp.ConsumeIntInRange(3, 8) + locales_to_test = [ + fdp.PickValueInList(list(_LOCALES)) for _ in range(num_locales) + ] + + results: list[FluentNumber] = [] + for locale in locales_to_test: + _domain.number_calls += 1 + result = number_format( + val, locale, + minimum_fraction_digits=min_frac, + maximum_fraction_digits=max_frac, + ) + if isinstance(result, FluentNumber): + results.append(result) + if not str(result): + _domain.cross_locale_empty_results += 1 + + # Invariant: all results should preserve the same underlying value + if len(results) >= 2: + first_val = results[0].value + for r in results[1:]: + if not _values_match(r.value, first_val): + msg = ( + f"Value drift across locales: {first_val} vs {r.value} " + f"for input {val}" + ) + raise BuiltinsFuzzError(msg) + +def _pattern_error_paths(fdp: atheris.FuzzedDataProvider) -> None: + """Invalid inputs, type mismatches, boundary violations.""" + locale = _pick_locale(fdp) + error_case = fdp.ConsumeIntInRange(0, 4) + + match error_case: + case 0: + # Invalid fraction digits (negative) + 
_domain.number_calls += 1 + number_format( + Decimal("1.5"), locale, + minimum_fraction_digits=-1, + maximum_fraction_digits=fdp.ConsumeIntInRange(-5, 5), + ) + case 1: + # Very large fraction digits + _domain.number_calls += 1 + number_format( + Decimal("1.5"), locale, + minimum_fraction_digits=fdp.ConsumeIntInRange(50, 200), + maximum_fraction_digits=fdp.ConsumeIntInRange(50, 200), + ) + case 2: + # Empty currency code + _domain.currency_calls += 1 + currency_format( + Decimal(100), locale, + currency="", + ) + case 3: + # Invalid currency code (too long / too short) + bad_code = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(0, 50)) + _domain.currency_calls += 1 + currency_format( + Decimal(100), locale, + currency=bad_code, + ) + case _: + # Fuzzed date style strings + timestamp = abs(fdp.ConsumeFloat()) % _MAX_TIMESTAMP + try: + dt = datetime.fromtimestamp(timestamp, tz=UTC) + except (OSError, OverflowError, ValueError): + return + _domain.datetime_calls += 1 + datetime_format( + dt, locale, + date_style=fdp.ConsumeUnicodeNoSurrogates(10), + time_style=fdp.ConsumeUnicodeNoSurrogates(10) if fdp.ConsumeBool() else None, + ) diff --git a/fuzz_atheris/fuzz_builtins_patterns_datetime.py b/fuzz_atheris/fuzz_builtins_patterns_datetime.py new file mode 100644 index 00000000..20c5997c --- /dev/null +++ b/fuzz_atheris/fuzz_builtins_patterns_datetime.py @@ -0,0 +1,123 @@ +from __future__ import annotations + +from datetime import UTC, datetime, timedelta, timezone +from typing import TYPE_CHECKING + +if TYPE_CHECKING: + import atheris +from fuzz_builtins_support import ( + _DATE_STYLES, + _MAX_TIMESTAMP, + BuiltinsFuzzError, + _domain, + _pick_locale, +) + +from ftllexengine.runtime.functions import datetime_format + + +def _pattern_datetime_styles(fdp: atheris.FuzzedDataProvider) -> None: + """DATETIME with all style combinations.""" + locale = _pick_locale(fdp) + # Safe timestamp range + timestamp = fdp.ConsumeFloat() % _MAX_TIMESTAMP + if timestamp < 0: + 
timestamp = abs(timestamp) + + try: + dt = datetime.fromtimestamp(timestamp, tz=UTC) + except (OSError, OverflowError, ValueError): + return + + date_style = fdp.PickValueInList(list(_DATE_STYLES)) + use_time = fdp.ConsumeBool() + time_style = fdp.PickValueInList(list(_DATE_STYLES)) if use_time else None + + _domain.datetime_calls += 1 + result = datetime_format( + dt, locale, + date_style=date_style, + time_style=time_style, + ) + + # Invariant: result must be non-empty string + if not isinstance(result, str) or not result: + msg = ( + f"datetime_format returned empty/non-str: {result!r} " + f"for locale={locale}, date_style={date_style}" + ) + raise BuiltinsFuzzError(msg) + +def _pattern_datetime_edges(fdp: atheris.FuzzedDataProvider) -> None: + """Edge timestamps and timezone variations.""" + locale = _pick_locale(fdp) + + # Edge timestamps + edge_timestamps = [ + 0.0, # Unix epoch + 86400.0, # One day + -86400.0, # Before epoch + 946684800.0, # Y2K + _MAX_TIMESTAMP, # Max safe + ] + timestamp = fdp.PickValueInList(edge_timestamps) + + try: + dt = datetime.fromtimestamp(timestamp, tz=UTC) + except (OSError, OverflowError, ValueError): + return + + # Test with different timezone offsets + if fdp.ConsumeBool(): + offset_hours = fdp.ConsumeIntInRange(-12, 14) + tz = timezone(timedelta(hours=offset_hours)) + dt = dt.astimezone(tz) + + _domain.datetime_calls += 1 + datetime_format( + dt, locale, + date_style=fdp.PickValueInList(list(_DATE_STYLES)), + time_style=fdp.PickValueInList(list(_DATE_STYLES)) if fdp.ConsumeBool() else None, + ) + +def _pattern_datetime_timezone_stress(fdp: atheris.FuzzedDataProvider) -> None: + """Stress-test timezone handling with extreme offsets and DST boundaries. + + Tests the DATETIME function with timezone offsets at the edges of + the valid range, timestamps near DST transitions, and unusual + UTC offset values. 
+ """ + locale = _pick_locale(fdp) + + # Base timestamp: mix of safe values and edge cases + base_timestamps = [ + 0.0, # Epoch + 1647302400.0, # March 2022 (DST transition period) + 1667091600.0, # 2022-10-30 01:00 UTC (EU DST fall-back) + 946684800.0, # Y2K + 1704067200.0, # 2024-01-01 + 86400.0 * 365, # One year + ] + timestamp = fdp.PickValueInList(base_timestamps) + + # Add fuzzed offset to push near boundaries + offset_seconds = fdp.ConsumeIntInRange(-43200, 43200) + + try: + # Create with extreme timezone offset (-12h to +14h, 1-minute granularity) + offset_minutes = fdp.ConsumeIntInRange(-720, 840) + tz = timezone(timedelta(minutes=offset_minutes)) + dt = datetime.fromtimestamp(timestamp + offset_seconds, tz=tz) + except (OSError, OverflowError, ValueError): + return + + _domain.datetime_calls += 1 + result = datetime_format( + dt, locale, + date_style=fdp.PickValueInList(list(_DATE_STYLES)), + time_style=fdp.PickValueInList(list(_DATE_STYLES)) if fdp.ConsumeBool() else None, + ) + + if not isinstance(result, str) or not result: + msg = f"datetime_format returned empty for tz offset {offset_minutes}min" + raise BuiltinsFuzzError(msg) diff --git a/fuzz_atheris/fuzz_builtins_patterns_number.py b/fuzz_atheris/fuzz_builtins_patterns_number.py new file mode 100644 index 00000000..e1300ff7 --- /dev/null +++ b/fuzz_atheris/fuzz_builtins_patterns_number.py @@ -0,0 +1,173 @@ +from __future__ import annotations + +from decimal import ROUND_HALF_EVEN, Decimal, InvalidOperation +from math import isinf, isnan +from typing import TYPE_CHECKING + +if TYPE_CHECKING: + import atheris +from fuzz_builtins_support import ( + _EDGE_FLOATS, + _PRECISION_NUMBERS, + BuiltinsFuzzError, + _domain, + _extract_oracle_digits, + _make_decimal, + _pick_locale, +) + +from ftllexengine.core.value_types import FluentNumber +from ftllexengine.runtime.functions import number_format + + +def _pattern_number_basic(fdp: atheris.FuzzedDataProvider) -> None: + """NUMBER with varied fraction digits, grouping, and 
locales.""" + locale = _pick_locale(fdp) + val = _make_decimal(fdp) + min_frac = fdp.ConsumeIntInRange(0, 10) + max_frac = fdp.ConsumeIntInRange(0, 20) # Independent: allows min > max (clamp path) + grouping = fdp.ConsumeBool() + + _domain.number_calls += 1 + result = number_format( + val, locale, + minimum_fraction_digits=min_frac, + maximum_fraction_digits=max_frac, + use_grouping=grouping, + ) + + # Invariant: result must be FluentNumber + if not isinstance(result, FluentNumber): + msg = f"number_format returned {type(result).__name__}, expected FluentNumber" + raise BuiltinsFuzzError(msg) + +def _pattern_number_precision(fdp: atheris.FuzzedDataProvider) -> None: + """Verify FluentNumber precision (CLDR v operand) correctness. + + The v operand is the count of visible fraction digits in the formatted + output. This is critical for plural rule matching. + """ + locale = _pick_locale(fdp) + # Use precision-sensitive numbers + val = ( + fdp.PickValueInList(list(_PRECISION_NUMBERS)) + if fdp.ConsumeBool() + else _make_decimal(fdp) + ) + + min_frac = fdp.ConsumeIntInRange(0, 6) + max_frac = fdp.ConsumeIntInRange(0, 10) # Independent: allows min > max (clamp path) + if min_frac > max_frac: + _domain.min_gt_max_tests += 1 + + _domain.number_calls += 1 + _domain.precision_checks += 1 + result = number_format( + val, locale, + minimum_fraction_digits=min_frac, + maximum_fraction_digits=max_frac, + use_grouping=False, + ) + + # Invariant: precision must be non-negative integer + if not isinstance(result, FluentNumber): + return + if result.precision is not None and result.precision < 0: + _domain.precision_violations += 1 + msg = ( + f"Negative precision {result.precision} for val={val}, " + f"locale={locale}, min={min_frac}, max={max_frac}" + ) + raise BuiltinsFuzzError(msg) + + # Rounding oracle: verify ROUND_HALF_EVEN across all ASCII-digit locales. + # Babel uses decimal_quantization=True by default, which applies ROUND_HALF_EVEN + # (IEEE 754 banker's rounding). 
_extract_oracle_digits handles locale-specific + # decimal and group separators; returns None for non-ASCII-digit locales (ar-EG). + # NaN guard is explicit: Decimal.quantize() does NOT raise InvalidOperation for + # quiet NaN -- it silently propagates and returns Decimal('NaN'). Infinity raises + # InvalidOperation, as does a result wider than the context precision. Without the + # is_nan() check, the oracle compares 'NaN' against Babel's NaN output: a false violation. + val_d = result.value + if isinstance(val_d, Decimal) and result.precision is not None and not val_d.is_nan(): + prec = result.precision + try: + expected = abs(val_d).quantize(Decimal(10) ** -prec, rounding=ROUND_HALF_EVEN) + except InvalidOperation: + pass # Infinity or over-precision result: skip oracle + else: + digits_only = _extract_oracle_digits(result.formatted, locale) + if digits_only is not None: + _domain.rounding_oracle_checks += 1 + if digits_only != str(expected): + _domain.rounding_oracle_violations += 1 + msg = ( + f"Rounding oracle: got {digits_only!r}, expected {str(expected)!r} " + f"for val={val_d}, locale={locale}, min={min_frac}, max={max_frac}" + ) + raise BuiltinsFuzzError(msg) + +def _pattern_number_edges(fdp: atheris.FuzzedDataProvider) -> None: + """Edge float values: NaN, Inf, -0.0, huge, tiny.""" + locale = _pick_locale(fdp) + val_float = fdp.PickValueInList(list(_EDGE_FLOATS)) + + # Track edge value types + if isnan(val_float): + _domain.edge_nan_count += 1 + elif isinf(val_float): + _domain.edge_inf_count += 1 + elif val_float == 0.0: + _domain.edge_zero_count += 1 + + # Decimal(str(float)) never raises for NaN/Inf: + # float('nan') -> 'nan' -> Decimal('NaN'), float('inf') -> Decimal('Infinity'). 
+ val = Decimal(str(val_float)) + + _domain.number_calls += 1 + number_format( + val, locale, + minimum_fraction_digits=fdp.ConsumeIntInRange(0, 5), + maximum_fraction_digits=fdp.ConsumeIntInRange(0, 10), + use_grouping=fdp.ConsumeBool(), + ) + +def _pattern_number_type_variety(fdp: atheris.FuzzedDataProvider) -> None: + """Test NUMBER with Decimals derived from int, float, Decimal, and FluentNumber sources. + + Every input is converted to Decimal before the call, so this covers value + variety at the Decimal boundary rather than raw int/float arguments. + """ + locale = _pick_locale(fdp) + _domain.type_coercion_tests += 1 + _domain.number_calls += 1 + + input_type = fdp.ConsumeIntInRange(0, 3) + match input_type: + case 0: + # Integer-valued Decimal + val = Decimal(fdp.ConsumeIntInRange(-999999, 999999)) + case 1: + # Float-derived Decimal (via Decimal(str(float))) + val = _make_decimal(fdp) + case 2: + # Precision-sensitive Decimal + val = fdp.PickValueInList(list(_PRECISION_NUMBERS)) + case _: + # Decimal taken from the FluentNumber.value of a previous NUMBER call + inner = number_format( + Decimal(str(fdp.ConsumeIntInRange(1, 100))), locale, + minimum_fraction_digits=2, + ) + # Format the extracted value again (nested call) + val = Decimal(str(inner.value)) if isinstance(inner, FluentNumber) else Decimal(0) + + result = number_format( + val, locale, + minimum_fraction_digits=fdp.ConsumeIntInRange(0, 6), + maximum_fraction_digits=fdp.ConsumeIntInRange(0, 10), + ) + + if not isinstance(result, FluentNumber): + msg = f"number_format returned {type(result).__name__} for {type(val).__name__} input" + raise BuiltinsFuzzError(msg) diff --git a/fuzz_atheris/fuzz_builtins_support.py b/fuzz_atheris/fuzz_builtins_support.py new file mode 100644 index 00000000..29620fa1 --- /dev/null +++ b/fuzz_atheris/fuzz_builtins_support.py @@ -0,0 +1,340 @@ +#!/usr/bin/env python3 +"""Built-in Function Boundary Fuzzer (Atheris). 
+ +Targets: ftllexengine.runtime.functions (NUMBER, DATETIME, CURRENCY) + +Concern boundary: This fuzzer stress-tests the Babel formatting boundary by +calling NUMBER, DATETIME, and CURRENCY functions directly through the Python +API. This is distinct from fuzz_runtime which invokes these functions through +FTL syntax and the resolver stack. Direct API testing isolates the Babel layer +from resolver/cache behavior and enables: +- Fuzz-generated Babel pattern strings (pattern= parameter) +- FluentNumber precision (CLDR v operand) correctness verification +- Currency-specific decimal digit enforcement (JPY=0, BHD=3) +- Type coercion across int/float/Decimal/FluentNumber inputs +- Cross-locale formatting consistency (same value, multiple locales) +- Edge value handling (NaN, Inf, -0.0, extreme magnitudes) + +FunctionRegistry lifecycle, parameter mapping, and locale injection protocol +are covered by fuzz_bridge.py. This fuzzer focuses exclusively on the +formatting output correctness boundary. + +Requires Python 3.13+ (uses PEP 695 type aliases). 
+""" + +from __future__ import annotations + +import atexit +import logging +import pathlib +import re +from dataclasses import dataclass +from decimal import Decimal, InvalidOperation +from math import isnan +from typing import TYPE_CHECKING, Any + +if TYPE_CHECKING: + from collections.abc import Sequence + +# --- Dependency Checks --- +_psutil_mod: Any = None +_atheris_mod: Any = None + +try: # noqa: SIM105 - need module ref for check_dependencies + import psutil as _psutil_mod # type: ignore[no-redef] +except ImportError: + pass + +try: # noqa: SIM105 - need module ref for check_dependencies + import atheris as _atheris_mod # type: ignore[no-redef] +except ImportError: + pass + +from fuzz_common import ( # noqa: E402 - after dependency capture # pylint: disable=C0413 + BaseFuzzerState, + build_base_stats_dict, + build_weighted_schedule, + check_dependencies, + emit_checkpoint_report, + emit_final_report, +) + +check_dependencies(["psutil", "atheris"], [_psutil_mod, _atheris_mod]) + +import atheris # noqa: E402 # pylint: disable=C0412,C0413 + +# --- Domain Metrics --- + +@dataclass +class BuiltinsMetrics: + """Domain-specific metrics for builtins fuzzer.""" + + # Per-function call counts + number_calls: int = 0 + datetime_calls: int = 0 + currency_calls: int = 0 + + # Precision tracking + precision_checks: int = 0 + precision_violations: int = 0 + + # Cross-locale tests + cross_locale_tests: int = 0 + cross_locale_empty_results: int = 0 + + # Type coercion tests + type_coercion_tests: int = 0 + + # Custom pattern tests + custom_pattern_tests: int = 0 + + # Edge value encounters + edge_nan_count: int = 0 + edge_inf_count: int = 0 + edge_zero_count: int = 0 + + # Rounding oracle: ROUND_HALF_EVEN verification (Babel default) + rounding_oracle_checks: int = 0 + rounding_oracle_violations: int = 0 + + # Input domain coverage: min_frac > max_frac cases + min_gt_max_tests: int = 0 + + +# --- Global State --- + +_state = BaseFuzzerState( + seed_corpus_max_size=500, + 
fuzzer_name="builtins", + fuzzer_target="NUMBER, DATETIME, CURRENCY (Babel boundary)", +) +_domain = BuiltinsMetrics() + +# Pattern weights: (name, weight) - focused on Babel boundary, no bridge overlap +_PATTERN_WEIGHTS: tuple[tuple[str, int], ...] = ( + ("number_basic", 12), + ("number_precision", 15), + ("number_edges", 8), + ("number_type_variety", 8), + ("datetime_styles", 10), + ("datetime_edges", 8), + ("datetime_timezone_stress", 6), + ("currency_codes", 12), + ("currency_precision", 10), + ("currency_cross_locale", 8), + ("custom_pattern", 8), + ("cross_locale_consistency", 8), + ("error_paths", 5), +) + +_PATTERN_SCHEDULE: tuple[str, ...] = build_weighted_schedule( + [name for name, _ in _PATTERN_WEIGHTS], + [weight for _, weight in _PATTERN_WEIGHTS], +) + +# Register intended weights for skew detection +_state.pattern_intended_weights = {name: float(weight) for name, weight in _PATTERN_WEIGHTS} + + +class BuiltinsFuzzError(Exception): + """Raised when a fuzzer invariant is violated.""" + + +# Allowed exceptions from Babel / formatting functions +ALLOWED_EXCEPTIONS = ( + ValueError, + TypeError, + OverflowError, + InvalidOperation, + OSError, + ArithmeticError, +) + + +# --- Reporting --- + +_REPORT_DIR = pathlib.Path(".fuzz_atheris_corpus") / "builtins" + + +def _build_stats_dict() -> dict[str, Any]: + """Build complete stats dictionary including domain metrics.""" + stats = build_base_stats_dict(_state) + + # Per-function call counts + stats["number_calls"] = _domain.number_calls + stats["datetime_calls"] = _domain.datetime_calls + stats["currency_calls"] = _domain.currency_calls + + # Precision tracking + stats["precision_checks"] = _domain.precision_checks + stats["precision_violations"] = _domain.precision_violations + + # Cross-locale + stats["cross_locale_tests"] = _domain.cross_locale_tests + stats["cross_locale_empty_results"] = _domain.cross_locale_empty_results + + # Type coercion + stats["type_coercion_tests"] = _domain.type_coercion_tests + + 
# Custom patterns + stats["custom_pattern_tests"] = _domain.custom_pattern_tests + + # Edge values + stats["edge_nan_count"] = _domain.edge_nan_count + stats["edge_inf_count"] = _domain.edge_inf_count + stats["edge_zero_count"] = _domain.edge_zero_count + + # Rounding oracle + stats["rounding_oracle_checks"] = _domain.rounding_oracle_checks + stats["rounding_oracle_violations"] = _domain.rounding_oracle_violations + + # Input domain coverage + stats["min_gt_max_tests"] = _domain.min_gt_max_tests + + return stats + + +_REPORT_FILENAME = "fuzz_builtins_report.json" + + +def _emit_checkpoint() -> None: + """Emit periodic checkpoint (uses checkpoint markers).""" + stats = _build_stats_dict() + emit_checkpoint_report( + _state, stats, _REPORT_DIR, _REPORT_FILENAME, + ) + + +def _emit_report() -> None: + """Emit comprehensive final report (crash-proof).""" + stats = _build_stats_dict() + emit_final_report(_state, stats, _REPORT_DIR, _REPORT_FILENAME) + + +atexit.register(_emit_report) + + +# --- Suppress logging and instrument imports --- +logging.getLogger("ftllexengine").setLevel(logging.CRITICAL) + +with atheris.instrument_imports(include=["ftllexengine"]): + pass + + +# --- Constants --- + +_LOCALES: Sequence[str] = ( + "en-US", "de-DE", "ar-EG", "zh-Hans-CN", "ja-JP", + "lv-LV", "fr-FR", "pt-BR", "hi-IN", "root", +) + +_VALID_ISO_CURRENCIES: Sequence[str] = ( + "USD", "EUR", "GBP", "JPY", "CHF", "CNY", "BRL", + "INR", "KRW", "BHD", "KWD", "OMR", +) + +_CURRENCY_DISPLAY_MODES: Sequence[str] = ("symbol", "code", "name") + +_DATE_STYLES: Sequence[str] = ("short", "medium", "long", "full") + +# Numbers that exercise precision boundary conditions +_PRECISION_NUMBERS: Sequence[Decimal] = ( + Decimal(0), Decimal(1), Decimal("1.0"), Decimal("1.00"), + Decimal("1.5"), Decimal("1.50"), Decimal("0.001"), + Decimal("1234567.89"), Decimal("-1.5"), Decimal("0.10"), + Decimal("999999999.999"), +) + +# Edge float values +_EDGE_FLOATS: Sequence[float] = ( + 0.0, -0.0, 1e-10, 1e10, 
1e100, 1e308, + float("inf"), float("-inf"), float("nan"), + -1.0, 0.1, 0.01, 0.001, +) + +# Timestamp boundaries for DATETIME +_MAX_TIMESTAMP = 253402300799.0 # 9999-12-31T23:59:59 UTC + + +# --- Helpers --- + +def _pick_locale(fdp: atheris.FuzzedDataProvider) -> str: + """Pick locale: 90% valid, 10% fuzzed.""" + if fdp.ConsumeIntInRange(0, 9) < 9: + return fdp.PickValueInList(list(_LOCALES)) + return fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(2, 15)) + + +def _make_decimal(fdp: atheris.FuzzedDataProvider) -> Decimal: + """Generate a Decimal from fuzzed float, including NaN and Infinity. + + Decimal(str(float('nan'))) -> Decimal('NaN') and + Decimal(str(float('inf'))) -> Decimal('Infinity') without raising; + no exception handler needed. + """ + return Decimal(str(fdp.ConsumeFloat())) + + +def _values_match(a: object, b: object) -> bool: + """NaN-safe value comparison for cross-locale invariant checks. + + IEEE 754 defines NaN != NaN, so naive != comparison falsely reports + value drift when both sides are NaN. This function treats two NaN + values of the same type as matching. + """ + if isinstance(a, Decimal) and isinstance(b, Decimal) and a.is_nan() and b.is_nan(): + return True + if isinstance(a, float) and isinstance(b, float) and isnan(a) and isnan(b): + return True + return a == b + + +def _extract_oracle_digits(formatted: str, locale: str) -> str | None: + """Extract absolute numeric digits from a formatted string for oracle comparison. + + Uses Babel to look up locale-specific decimal and grouping separators. + Returns None when digit extraction is not possible (non-ASCII digits, + ambiguous separators, or unknown locale). + + The extraction algorithm: + 1. Skip if any digit character is non-ASCII (e.g., ar-EG Arabic-Indic, + hi-IN Devanagari); these cannot be compared against ASCII oracle values. + 2. Look up locale decimal and group symbols via Babel. + 3. Remove group separators (critical for de-DE where group sep is '.'). + 4. 
Replace decimal separator with ASCII '.'. + 5. Strip all remaining non-digit, non-dot characters (currency codes, + whitespace, signs) via regex. Whitespace-based group separators + (lv-LV, fr-FR thin-space) are handled by this final strip. + """ + # Skip locales where any digit character is non-ASCII. + if any(c.isdigit() and not c.isascii() for c in formatted): + return None + try: + # Deferred import: Babel is optional at ftllexengine package level. + # At fuzzing time Babel is always present (required by the functions + # under test), but the import is deferred to match project conventions. + from babel.numbers import ( + get_decimal_symbol, + get_group_symbol, + ) + # Babel expects underscore-separated locale IDs ('en_US', 'de_DE'); + # ftllexengine uses BCP 47 hyphen-separated codes ('en-US', 'de-DE'). + babel_locale = locale.replace("-", "_") + decimal_sym = get_decimal_symbol(babel_locale) + group_sym = get_group_symbol(babel_locale) + except ValueError: + # Babel raises UnknownLocaleError (ValueError subclass) for invalid locales. + return None + # Guard: ambiguous separators (same symbol for both) cannot be parsed reliably. + if decimal_sym == group_sym: + return None + # Step 1: remove group separators before replacing decimal separator. + # This is critical when group_sym == '.' (e.g., de-DE): removing it first + # prevents '1.234,56' → '1.234.56' (two dots, wrong result). + normalized = formatted.replace(group_sym, "").replace(decimal_sym, ".") + # Step 2: strip all remaining non-digit, non-dot characters (currency codes, + # whitespace, signs). Handles whitespace-variant group seps (lv-LV, fr-FR). 
+ digits = re.sub(r"[^\d.]", "", normalized) + return digits or None + +__all__ = [name for name in globals() if not name.startswith("__")] diff --git a/fuzz_atheris/fuzz_cache.py b/fuzz_atheris/fuzz_cache.py index 8c09ddf2..f72f34a7 100644 --- a/fuzz_atheris/fuzz_cache.py +++ b/fuzz_atheris/fuzz_cache.py @@ -1,9 +1,4 @@ #!/usr/bin/env python3 -# FUZZ_PLUGIN_HEADER_START -# FUZZ_PLUGIN: cache - High-pressure Cache Race & Concurrency -# Intentional: This header is intentionally placed for dynamic plugin discovery. -# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH -# FUZZ_PLUGIN_HEADER_END """High-Pressure Cache Race and Integrity Fuzzer (Atheris). Targets: ftllexengine.runtime.cache (via FluentBundle public API) @@ -261,8 +256,9 @@ def _validate_cache_audit_entry( entry: WriteLogEntry, *, last_timestamp: float, -) -> float: - """Validate one audit-log entry and return its timestamp.""" + last_sequence: int, +) -> tuple[float, int]: + """Validate one audit-log entry and return timestamp plus audit sequence.""" if entry.operation not in _VALID_AUDIT_OPERATIONS: msg = f"Unexpected audit operation {entry.operation!r}" raise CacheFuzzError(msg) @@ -275,6 +271,12 @@ def _validate_cache_audit_entry( f"{last_timestamp} -> {entry.timestamp}" ) raise CacheFuzzError(msg) + if entry.sequence <= last_sequence: + msg = ( + "Audit log sequences must be strictly increasing: " + f"{last_sequence} -> {entry.sequence}" + ) + raise CacheFuzzError(msg) # wall_time_unix is a Unix timestamp (time.time()); must be a positive float. # It is the wall-clock companion to the monotonic timestamp field. 
if not isinstance(entry.wall_time_unix, float): @@ -288,18 +290,21 @@ def _validate_cache_audit_entry( raise CacheFuzzError(msg) if entry.operation == "MISS": - if entry.sequence != 0 or entry.checksum_hex != "": - msg = "MISS audit entries must have sequence=0 and empty checksum" + if entry.checksum_hex != "" or entry.cache_sequence < 0: + msg = ( + "MISS audit entries must have empty checksum and " + "non-negative cache_sequence" + ) raise CacheFuzzError(msg) - return entry.timestamp + return entry.timestamp, entry.sequence - if entry.sequence <= 0 or entry.checksum_hex == "": + if entry.checksum_hex == "" or entry.cache_sequence <= 0: msg = ( - f"{entry.operation} audit entries must carry a positive sequence " + f"{entry.operation} audit entries must carry a positive cache_sequence " "and non-empty checksum" ) raise CacheFuzzError(msg) - return entry.timestamp + return entry.timestamp, entry.sequence def _collect_cache_observability( @@ -354,13 +359,15 @@ def _collect_cache_observability( return last_timestamp = float("-inf") + last_sequence = 0 for entry in audit_log: if not isinstance(entry, WriteLogEntry): msg = "get_cache_audit_log() returned non-WriteLogEntry entries" raise CacheFuzzError(msg) - last_timestamp = _validate_cache_audit_entry( + last_timestamp, last_sequence = _validate_cache_audit_entry( entry, last_timestamp=last_timestamp, + last_sequence=last_sequence, ) diff --git a/fuzz_atheris/fuzz_common.py b/fuzz_atheris/fuzz_common.py index 5ed453c6..e68299cb 100644 --- a/fuzz_atheris/fuzz_common.py +++ b/fuzz_atheris/fuzz_common.py @@ -4,8 +4,8 @@ used by all fuzz targets. Each fuzzer imports from this module and composes domain-specific state alongside BaseFuzzerState. -Not a fuzz target itself -- no FUZZ_PLUGIN header, not discoverable by -fuzz_atheris.sh. +Not a fuzz target itself. Discoverable targets are owned by +fuzz_atheris/targets.tsv and loaded by scripts/fuzz_atheris.sh. 
""" from __future__ import annotations diff --git a/fuzz_atheris/fuzz_currency.py b/fuzz_atheris/fuzz_currency.py index 7d08ec26..3e063d68 100644 --- a/fuzz_atheris/fuzz_currency.py +++ b/fuzz_atheris/fuzz_currency.py @@ -1,9 +1,4 @@ #!/usr/bin/env python3 -# FUZZ_PLUGIN_HEADER_START -# FUZZ_PLUGIN: currency - CURRENCY Function Runtime Formatting (Oracle) -# Intentional: This header is intentionally placed for dynamic plugin discovery. -# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH -# FUZZ_PLUGIN_HEADER_END """CURRENCY Function Runtime Formatting Oracle Fuzzer (Atheris). Targets: ftllexengine.runtime.functions.currency_format diff --git a/fuzz_atheris/fuzz_cursor.py b/fuzz_atheris/fuzz_cursor.py index e3227a9a..6e0d7435 100644 --- a/fuzz_atheris/fuzz_cursor.py +++ b/fuzz_atheris/fuzz_cursor.py @@ -1,9 +1,4 @@ #!/usr/bin/env python3 -# FUZZ_PLUGIN_HEADER_START -# FUZZ_PLUGIN: cursor - Cursor, ParseError, and source position utilities -# Intentional: This header is intentionally placed for dynamic plugin discovery. -# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH -# FUZZ_PLUGIN_HEADER_END """Cursor and position utility fuzzer (Atheris). Targets: diff --git a/fuzz_atheris/fuzz_dates.py b/fuzz_atheris/fuzz_dates.py index 7a31b83a..590e1453 100644 --- a/fuzz_atheris/fuzz_dates.py +++ b/fuzz_atheris/fuzz_dates.py @@ -1,9 +1,4 @@ #!/usr/bin/env python3 -# FUZZ_PLUGIN_HEADER_START -# FUZZ_PLUGIN: dates - Date/Datetime Locale-aware Parsing -# Intentional: This header is intentionally placed for dynamic plugin discovery. -# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH -# FUZZ_PLUGIN_HEADER_END """Date and Datetime Locale-aware Parsing Fuzzer (Atheris). 
Targets: ftllexengine.parsing.dates (parse_date, parse_datetime) diff --git a/fuzz_atheris/fuzz_diagnostics_formatter.py b/fuzz_atheris/fuzz_diagnostics_formatter.py index fb8a425f..f852e147 100644 --- a/fuzz_atheris/fuzz_diagnostics_formatter.py +++ b/fuzz_atheris/fuzz_diagnostics_formatter.py @@ -1,9 +1,4 @@ #!/usr/bin/env python3 -# FUZZ_PLUGIN_HEADER_START -# FUZZ_PLUGIN: diagnostics_formatter - DiagnosticFormatter Output & Control-Char Escaping -# Intentional: This header is intentionally placed for dynamic plugin discovery. -# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH -# FUZZ_PLUGIN_HEADER_END """DiagnosticFormatter Fuzzer (Atheris). Targets: ftllexengine.diagnostics.formatter.DiagnosticFormatter diff --git a/fuzz_atheris/fuzz_graph.py b/fuzz_atheris/fuzz_graph.py index 6972adcf..d1ade5a6 100644 --- a/fuzz_atheris/fuzz_graph.py +++ b/fuzz_atheris/fuzz_graph.py @@ -1,9 +1,4 @@ #!/usr/bin/env python3 -# FUZZ_PLUGIN_HEADER_START -# FUZZ_PLUGIN: graph - Dependency Graph Algorithms -# Intentional: This header is intentionally placed for dynamic plugin discovery. -# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH -# FUZZ_PLUGIN_HEADER_END """Dependency Graph Algorithm Fuzzer (Atheris). Targets: ftllexengine.analysis.graph (detect_cycles, make_cycle_key, diff --git a/fuzz_atheris/fuzz_integrity.py b/fuzz_atheris/fuzz_integrity.py index c36310ec..62a7bae1 100644 --- a/fuzz_atheris/fuzz_integrity.py +++ b/fuzz_atheris/fuzz_integrity.py @@ -1,9 +1,4 @@ #!/usr/bin/env python3 -# FUZZ_PLUGIN_HEADER_START -# FUZZ_PLUGIN: integrity - Semantic Validation and Data Integrity -# Intentional: This header is intentionally placed for dynamic plugin discovery. -# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH -# FUZZ_PLUGIN_HEADER_END """Semantic Validation and Data Integrity Fuzzer (Atheris). 
Targets: @@ -59,19 +54,24 @@ _psutil_mod: Any = None _atheris_mod: Any = None -try: # noqa: SIM105 - captures module for check_dependencies - import psutil as _psutil_mod # type: ignore[no-redef] +try: + import psutil as _psutil_import except ImportError: pass +else: + _psutil_mod = _psutil_import -try: # noqa: SIM105 - captures module for check_dependencies - import atheris as _atheris_mod # type: ignore[no-redef] +try: + import atheris as _atheris_import except ImportError: pass +else: + _atheris_mod = _atheris_import from fuzz_common import ( # noqa: E402 - after dependency capture # pylint: disable=C0413 GC_INTERVAL, BaseFuzzerState, + FuzzStats, build_base_stats_dict, build_weighted_schedule, check_dependencies, @@ -89,9 +89,6 @@ import atheris # noqa: E402 # pylint: disable=C0412,C0413 -# --- Type Aliases (PEP 695) --- -type FuzzStats = dict[str, int | str | float] - # --- Suppress logging and instrument imports --- logging.getLogger("ftllexengine").setLevel(logging.CRITICAL) diff --git a/fuzz_atheris/fuzz_introspection.py b/fuzz_atheris/fuzz_introspection.py index 325ba197..bdf8f2d6 100644 --- a/fuzz_atheris/fuzz_introspection.py +++ b/fuzz_atheris/fuzz_introspection.py @@ -1,9 +1,4 @@ #!/usr/bin/env python3 -# FUZZ_PLUGIN_HEADER_START -# FUZZ_PLUGIN: introspection - MessageIntrospection Visitor & Reference Extraction -# Intentional: This header is intentionally placed for dynamic plugin discovery. -# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH -# FUZZ_PLUGIN_HEADER_END """MessageIntrospection Visitor and Reference Extraction Fuzzer (Atheris). 
Targets: ftllexengine.introspection.message (IntrospectionVisitor, diff --git a/fuzz_atheris/fuzz_iso.py b/fuzz_atheris/fuzz_iso.py index a9296064..351b3fc8 100644 --- a/fuzz_atheris/fuzz_iso.py +++ b/fuzz_atheris/fuzz_iso.py @@ -1,9 +1,4 @@ #!/usr/bin/env python3 -# FUZZ_PLUGIN_HEADER_START -# FUZZ_PLUGIN: iso - ISO 3166/4217 Introspection -# Intentional: This header is intentionally placed for dynamic plugin discovery. -# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH -# FUZZ_PLUGIN_HEADER_END """ISO Introspection Fuzzer (Atheris). Targets ISO 3166-1 territory and ISO 4217 currency introspection APIs. diff --git a/fuzz_atheris/fuzz_locale_context.py b/fuzz_atheris/fuzz_locale_context.py index c1d6d80d..a4ffa6f2 100644 --- a/fuzz_atheris/fuzz_locale_context.py +++ b/fuzz_atheris/fuzz_locale_context.py @@ -1,9 +1,4 @@ #!/usr/bin/env python3 -# FUZZ_PLUGIN_HEADER_START -# FUZZ_PLUGIN: locale_context - LocaleContext Direct Formatting API -# Intentional: This header is intentionally placed for dynamic plugin discovery. -# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH -# FUZZ_PLUGIN_HEADER_END """LocaleContext Direct Formatting API Fuzzer (Atheris). Targets: ftllexengine.runtime.locale_context.LocaleContext diff --git a/fuzz_atheris/fuzz_localization.py b/fuzz_atheris/fuzz_localization.py index 0e0df2d4..9df6c099 100644 --- a/fuzz_atheris/fuzz_localization.py +++ b/fuzz_atheris/fuzz_localization.py @@ -1,2244 +1,9 @@ #!/usr/bin/env python3 -# FUZZ_PLUGIN_HEADER_START -# FUZZ_PLUGIN: localization - FluentLocalization Multi-locale Orchestration -# Intentional: This header is intentionally placed for dynamic plugin discovery. -# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH -# FUZZ_PLUGIN_HEADER_END -"""FluentLocalization Multi-locale Orchestration Fuzzer (Atheris). 
- -Targets: ftllexengine.localization.orchestrator (FluentLocalization) - -Concern boundary: This fuzzer stress-tests the multi-locale orchestration layer. -FluentLocalization is the second top-level public API (alongside FluentBundle) and -has a completely distinct lifecycle: constructor locale boundary validation, -multi-locale fallback chains, RWLock-protected add_resource/add_function after -construction, lazy bundle creation for fallback locales, load summary tracking, -and on_fallback callback dispatch. fuzz_runtime covers FluentBundle only -- -zero FluentLocalization code paths are exercised by any other fuzzer. - -Unique coverage (not covered by other fuzzers): -- FluentLocalization constructor locale boundary validation, canonicalization/dedup rules, - and rejection contracts -- format_pattern fallback chain traversal across 2-5 locales -- add_resource() with RWLock write acquisition after initial construction -- Lazy bundle creation for fallback locales (_get_or_create_bundle) -- has_message()/has_attribute() cross-locale scan -- get_message()/get_term() AST lookup across fallback locales -- require_clean() boot validation over loader-backed LoadSummary state -- validate_message_variables() single-message integrity validation across fallback chains -- validate_message_schemas() exact-schema enforcement across fallback chains -- get_message_ids() aggregation across all locale bundles -- get_message_variables() / introspect_message() localization facade -- get_cache_audit_log() per-locale audit visibility without raw cache access -- on_fallback callback invocation and FallbackInfo contract -- validate_resource() via localization facade -- add_function() custom function registration and application to late bundles -- Strict/non-strict mode propagation to each per-locale bundle -- resource_loader + PathResourceLoader eager initialization path -- LoadSummary aggregation (success, not_found, error, junk) -- loader-backed source_path and path-validation error 
plumbing - -Patterns (24): -- single_locale_add_resource: 1 locale, add_resource, format -- multi_locale_fallback: 2 locales, message only in fallback locale -- chain_of_3_fallback: 3-locale chain, message in various positions -- format_value_missing: format non-existent message (fallback contract) -- format_with_variables: format message with variable args -- add_resource_mutation: add_resource after initial creation, re-format -- has_message_api: has_message/has_attribute contract verification -- ast_lookup_api: get_message/get_term precedence and namespace separation -- get_message_ids_api: get_message_ids deduplication and coverage -- validate_resource_api: validate_resource via localization facade -- validate_message_variables_api: single-message exact-schema validation and integrity errors -- validate_message_schemas_api: exact schema validation success/failure paths -- add_function_custom: add_function + FTL that calls custom function -- introspect_api: introspect_message/get_message_variables contracts -- cache_audit_api: per-locale cache audit accessor and aggregation -- locale_boundary_api: constructor locale canonicalization/dedup and rejection contracts -- on_fallback_callback: on_fallback callback fires on locale miss -- loader_init_success: eager load via PathResourceLoader succeeds for all locales -- loader_not_found_fallback: loader summary tracks primary miss + fallback success -- loader_junk_summary: eager load records Junk entries in LoadSummary -- loader_path_error: invalid resource_id is captured as loader error in summary -- require_clean_api: boot validation raises or returns based on LoadSummary cleanliness -- boot_config_api: LocalizationBootConfig strict-mode boot sequence, boot_simple(), boot() - 3-tuple primary API, required_messages enforcement, and one-shot call enforcement - -Metrics: -- Pattern coverage with weighted round-robin schedule -- Fallback trigger counts, messages found vs missing -- Custom function call tracking -- 
Performance profiling (min/mean/p95/p99/max) -- Real memory usage (RSS via psutil) - -Requires Python 3.13+ (uses PEP 695 type aliases). -""" +"""FluentLocalization Multi-locale Orchestration Fuzzer (Atheris).""" from __future__ import annotations -import argparse -import atexit -import gc -import logging -import pathlib -import sys -import time -from dataclasses import dataclass -from tempfile import TemporaryDirectory -from typing import TYPE_CHECKING, Any - -if TYPE_CHECKING: - from collections.abc import Sequence - -# --- Dependency Checks --- -_psutil_mod: Any = None -_atheris_mod: Any = None - -try: # noqa: SIM105 - need module ref for check_dependencies - import psutil as _psutil_mod # type: ignore[no-redef] -except ImportError: - pass - -try: # noqa: SIM105 - need module ref for check_dependencies - import atheris as _atheris_mod # type: ignore[no-redef] -except ImportError: - pass - -from fuzz_common import ( # noqa: E402 # pylint: disable=C0413 - GC_INTERVAL, - BaseFuzzerState, - build_base_stats_dict, - build_weighted_schedule, - check_dependencies, - emit_checkpoint_report, - emit_final_report, - gen_ftl_identifier, - gen_ftl_value, - get_process, - print_fuzzer_banner, - record_iteration_metrics, - record_memory, - run_fuzzer, - select_pattern_round_robin, -) - -check_dependencies(["psutil", "atheris"], [_psutil_mod, _atheris_mod]) - -import atheris # noqa: E402 # pylint: disable=C0412,C0413 - -# --- Domain Metrics --- - - -@dataclass -class LocalizationMetrics: - """Domain-specific metrics for localization fuzzer.""" - - fallback_triggered: int = 0 - messages_found: int = 0 - messages_missing: int = 0 - custom_function_calls: int = 0 - add_resource_mutations: int = 0 - has_message_checks: int = 0 - introspect_calls: int = 0 - ast_lookup_checks: int = 0 - validate_calls: int = 0 - message_variable_validation_checks: int = 0 - schema_validation_checks: int = 0 - cache_audit_checks: int = 0 - locale_boundary_checks: int = 0 - loader_init_checks: int = 0 
- loader_junk_checks: int = 0 - loader_error_checks: int = 0 - boot_validation_checks: int = 0 - boot_config_checks: int = 0 - - -class LocalizationFuzzError(Exception): - """Raised when an invariant breach is detected.""" - - -# --- Constants --- - -_ALLOWED_EXCEPTIONS = ( - ValueError, # empty locale list, locale not in chain, whitespace - TypeError, # invalid argument types - UnicodeEncodeError, # surrogate characters in FTL source -) - -# Pattern definitions with weights (name, weight) -_PATTERN_WEIGHTS: Sequence[tuple[str, int]] = ( - ("single_locale_add_resource", 10), - ("multi_locale_fallback", 10), - ("chain_of_3_fallback", 8), - ("format_value_missing", 7), - ("format_with_variables", 9), - ("add_resource_mutation", 7), - ("has_message_api", 7), - ("ast_lookup_api", 7), - ("get_message_ids_api", 6), - ("validate_resource_api", 7), - ("validate_message_variables_api", 6), - ("validate_message_schemas_api", 6), - ("add_function_custom", 6), - ("introspect_api", 7), - ("cache_audit_api", 6), - ("locale_boundary_api", 5), - ("on_fallback_callback", 6), - ("loader_init_success", 5), - ("loader_not_found_fallback", 5), - ("loader_junk_summary", 4), - ("loader_path_error", 4), - ("require_clean_api", 5), - ("boot_config_api", 6), -) - -_PATTERN_SCHEDULE: tuple[str, ...] 
= build_weighted_schedule( - [name for name, _ in _PATTERN_WEIGHTS], - [weight for _, weight in _PATTERN_WEIGHTS], -) -_PATTERN_INDEX: dict[str, int] = {name: i for i, (name, _) in enumerate(_PATTERN_WEIGHTS)} - -# Test locale sets (ordered by fallback priority) -_LOCALE_PAIRS: Sequence[tuple[str, str]] = ( - ("en-US", "en"), - ("de-DE", "de"), - ("fr-FR", "fr"), - ("ja-JP", "ja"), - ("ar-SA", "ar"), - ("zh-CN", "zh"), - ("ko-KR", "ko"), - ("pt-BR", "pt"), - ("es-ES", "es"), - ("sv-SE", "sv"), -) - -_LOCALE_TRIPLES: Sequence[tuple[str, str, str]] = ( - ("lv", "en-US", "en"), - ("lt", "en-GB", "en"), - ("pl", "de-AT", "de"), - ("uk", "ru-RU", "ru"), - ("zh-TW", "zh-CN", "zh"), -) - -_SINGLE_LOCALES: Sequence[str] = ( - "en-US", - "de-DE", - "fr-FR", - "ja-JP", - "ko-KR", - "ar-SA", - "zh-CN", - "pt-BR", - "es-ES", - "sv-SE", -) -_STRUCTURALLY_INVALID_LOCALES: Sequence[str] = ( - "en/US", - "en US", - "en@US", - "123_US", - "\x00\x01\x02", - "en-US" + "\x00" * 8, - "invalid!!", -) -_NON_STRING_LOCALES: Sequence[object] = ( - None, - 0, - 1.5, - ["en-US"], - {"locale": "en-US"}, -) -_VALID_AUDIT_OPERATIONS: frozenset[str] = frozenset( - { - "MISS", - "PUT", - "HIT", - "EVICT", - "CORRUPTION", - "WRITE_ONCE_IDEMPOTENT", - "WRITE_ONCE_CONFLICT", - } -) - -# --- Module State --- - -_state = BaseFuzzerState( - checkpoint_interval=500, - seed_corpus_max_size=500, - fuzzer_name="localization", - fuzzer_target=( - "FluentLocalization (locale boundary, multi-locale fallback chains, " - "add_resource, format_pattern, introspection)" - ), - pattern_intended_weights={name: float(w) for name, w in _PATTERN_WEIGHTS}, -) -_domain = LocalizationMetrics() - -_REPORT_DIR = pathlib.Path(".fuzz_atheris_corpus") / "localization" -_REPORT_FILENAME = "fuzz_localization_report.json" - - -def _build_stats_dict() -> dict[str, Any]: - """Build complete stats dictionary including domain metrics.""" - stats = build_base_stats_dict(_state) - stats["fallback_triggered"] = 
_domain.fallback_triggered - stats["messages_found"] = _domain.messages_found - stats["messages_missing"] = _domain.messages_missing - stats["custom_function_calls"] = _domain.custom_function_calls - stats["add_resource_mutations"] = _domain.add_resource_mutations - stats["has_message_checks"] = _domain.has_message_checks - stats["introspect_calls"] = _domain.introspect_calls - stats["ast_lookup_checks"] = _domain.ast_lookup_checks - stats["validate_calls"] = _domain.validate_calls - stats["message_variable_validation_checks"] = _domain.message_variable_validation_checks - stats["schema_validation_checks"] = _domain.schema_validation_checks - stats["cache_audit_checks"] = _domain.cache_audit_checks - stats["locale_boundary_checks"] = _domain.locale_boundary_checks - stats["loader_init_checks"] = _domain.loader_init_checks - stats["loader_junk_checks"] = _domain.loader_junk_checks - stats["loader_error_checks"] = _domain.loader_error_checks - stats["boot_validation_checks"] = _domain.boot_validation_checks - stats["boot_config_checks"] = _domain.boot_config_checks - total = _domain.messages_found + _domain.messages_missing - if total > 0: - stats["fallback_hit_ratio"] = round(_domain.fallback_triggered / total, 3) - return stats - - -def _emit_checkpoint() -> None: - """Emit periodic checkpoint (uses checkpoint markers).""" - stats = _build_stats_dict() - emit_checkpoint_report(_state, stats, _REPORT_DIR, _REPORT_FILENAME) - - -def _emit_report() -> None: - """Emit crash-proof final report.""" - stats = _build_stats_dict() - emit_final_report(_state, stats, _REPORT_DIR, _REPORT_FILENAME) - - -atexit.register(_emit_report) - -# --- Suppress logging and instrument imports --- -logging.getLogger("ftllexengine").setLevel(logging.CRITICAL) - -with atheris.instrument_imports(include=["ftllexengine"]): - from ftllexengine import validate_message_variables - from ftllexengine.constants import MAX_LOCALE_LENGTH_HARD_LIMIT - from ftllexengine.core.locale_utils import 
normalize_locale, require_locale_code - from ftllexengine.diagnostics.errors import FrozenFluentError - from ftllexengine.integrity import ( - DataIntegrityError, - FormattingIntegrityError, - IntegrityCheckFailedError, - SyntaxIntegrityError, - ) - from ftllexengine.localization import ( - CacheAuditLogEntry, - FluentLocalization, - LocalizationBootConfig, - LocalizationCacheStats, - ) - from ftllexengine.localization.loading import FallbackInfo, PathResourceLoader - from ftllexengine.runtime.cache_config import CacheConfig - from ftllexengine.syntax import Message, Term - - -# --- Pattern implementations --- - - -def _write_loader_resource( - root: pathlib.Path, - locale: str, - resource_id: str, - ftl_source: str, -) -> pathlib.Path: - """Write an FTL file for PathResourceLoader-backed tests.""" - locale_dir = root / normalize_locale(locale) - locale_dir.mkdir(parents=True, exist_ok=True) - resource_path = locale_dir / resource_id - resource_path.write_text(ftl_source, encoding="utf-8") - return resource_path - - -def _build_variable_message(message_id: str, variables: tuple[str, ...]) -> str: - """Build a simple message that references the given variable set.""" - placeables = " ".join(f"{{ ${variable} }}" for variable in variables) - return f"{message_id} = {placeables or 'value'}\n" - - -def _assert_integrity_failure( - err: IntegrityCheckFailedError, - *, - operation: str, - message_fragment: str | None = None, - key: str | None = None, - key_fragment: str | None = None, - actual_fragment: str | None = None, -) -> None: - """Validate localization-scoped IntegrityCheckFailedError context.""" - if message_fragment is not None and message_fragment not in str(err): - msg = f"Integrity error message missing {message_fragment!r}: {err!s}" - raise LocalizationFuzzError(msg) - - context = err.context - if context is None: - msg = "IntegrityCheckFailedError missing context" - raise LocalizationFuzzError(msg) - if context.component != "localization": - msg = 
f"Integrity error component={context.component!r}, expected 'localization'" - raise LocalizationFuzzError(msg) - if context.operation != operation: - msg = f"Integrity error operation={context.operation!r}, expected {operation!r}" - raise LocalizationFuzzError(msg) - if context.expected != "LoadSummary(all_clean=True)" and operation == "require_clean": - msg = f"require_clean context expected field mismatch: {context.expected!r}" - raise LocalizationFuzzError(msg) - if key is not None and context.key != key: - msg = f"Integrity error key={context.key!r}, expected {key!r}" - raise LocalizationFuzzError(msg) - if key_fragment is not None and (context.key is None or key_fragment not in context.key): - msg = f"Integrity error key={context.key!r} missing fragment {key_fragment!r}" - raise LocalizationFuzzError(msg) - if actual_fragment is not None and ( - context.actual is None or actual_fragment not in context.actual - ): - msg = f"Integrity error actual={context.actual!r} missing fragment {actual_fragment!r}" - raise LocalizationFuzzError(msg) - - -def _pattern_single_locale_add_resource( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Single-locale FluentLocalization: add_resource + format round-trip. - - Tests the minimal FluentLocalization configuration: one locale, one - resource added via add_resource(), one format call. Verifies the - basic construction-add-format lifecycle. 
- """ - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - msg_id = gen_ftl_identifier(fdp) - var = gen_ftl_identifier(fdp) - val = gen_ftl_value(fdp) - ftl = f"{msg_id} = {{ ${var} }}\n" - - l10n = FluentLocalization([locale], strict=False) - l10n.add_resource(locale, ftl) - - result, errors = l10n.format_pattern(msg_id, {var: val}) - - # Contract: no errors means result must contain the variable value - if not errors and val not in result: - msg = ( - f"Single locale: format_pattern('{msg_id}', {{'{var}': '{val}'}}) " - f"returned '{result}' without errors but value missing" - ) - raise LocalizationFuzzError(msg) - - if not errors: - _domain.messages_found += 1 - else: - _domain.messages_missing += 1 - - -def _pattern_multi_locale_fallback( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Two-locale chain: message present only in fallback locale. - - Tests the core fallback mechanism: the primary locale does NOT have the - message, the fallback locale does. Verifies that format_pattern traverses - the chain and returns the fallback locale's result. 
- """ - primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) - msg_id = gen_ftl_identifier(fdp) - val = gen_ftl_value(fdp) - ftl = f"{msg_id} = {val}\n" - - l10n = FluentLocalization([primary, fallback], strict=False) - # Add resource ONLY to fallback locale; primary stays empty - l10n.add_resource(fallback, ftl) - - fallback_seen: list[FallbackInfo] = [] - l10n_with_cb = FluentLocalization( - [primary, fallback], - strict=False, - on_fallback=fallback_seen.append, - ) - l10n_with_cb.add_resource(fallback, ftl) - - _, errors = l10n_with_cb.format_pattern(msg_id) - - if not errors: - _domain.messages_found += 1 - # Fallback callback must have fired (primary locale had no message) - if fallback_seen: - _domain.fallback_triggered += 1 - info = fallback_seen[0] - # Contract: FallbackInfo carries the correct resolved_locale - expected_fallback = normalize_locale(fallback) - if info.resolved_locale != expected_fallback: - msg = ( - "Fallback: expected " - f"resolved_locale='{expected_fallback}', " - f"got '{info.resolved_locale}'" - ) - raise LocalizationFuzzError(msg) - else: - _domain.messages_missing += 1 - - -def _pattern_chain_of_3_fallback( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Three-locale chain: message in a fuzz-chosen position. - - Tests fallback traversal depth. The message can be in locale 0, 1, or 2 - (or nowhere). Verifies the fallback chain visits locales in order. 
- """ - triple = fdp.PickValueInList(list(_LOCALE_TRIPLES)) - locale_a, locale_b, locale_c = triple - msg_id = gen_ftl_identifier(fdp) - val = gen_ftl_value(fdp) - ftl = f"{msg_id} = {val}\n" - target_locale_idx = fdp.ConsumeIntInRange(0, 3) # 3 = none - - l10n = FluentLocalization([locale_a, locale_b, locale_c], strict=False) - - target_locale = triple[target_locale_idx] if target_locale_idx < 3 else None - if target_locale: - l10n.add_resource(target_locale, ftl) - - result, errors = l10n.format_pattern(msg_id) - - if not errors: - _domain.messages_found += 1 - if target_locale and val in result: - return # Correct - if not target_locale: - # Message was in no locale - result is fallback text, errors expected - pass - else: - _domain.messages_missing += 1 - - -def _pattern_format_value_missing( - fdp: atheris.FuzzedDataProvider, -) -> None: - """format_value/format_pattern with non-existent message returns fallback. - - Tests the missing-message contract: format_pattern with a message ID that - does not exist in any locale must return a non-empty fallback string and - at least one error. strict=False to use soft-error return API. 
- """ - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - existing_id = gen_ftl_identifier(fdp) - missing_id = f"missing-{gen_ftl_identifier(fdp)}" - existing_ftl = f"{existing_id} = value\n" - - l10n = FluentLocalization([locale], strict=False) - l10n.add_resource(locale, existing_ftl) - - result, errors = l10n.format_pattern(missing_id) - - # Contract: missing message MUST produce errors and non-empty fallback - if not errors: - msg = f"Missing message '{missing_id}' produced no errors (result='{result}')" - raise LocalizationFuzzError(msg) - if not result: - msg = f"Missing message '{missing_id}' produced empty result with errors" - raise LocalizationFuzzError(msg) - - _domain.messages_missing += 1 - - -def _pattern_format_with_variables( - fdp: atheris.FuzzedDataProvider, -) -> None: - """format_pattern with multiple variable args across two locales. - - Tests that variable substitution works correctly with fallback. - Verifies the args dict propagates into the resolved bundle. - """ - primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) - msg_id = gen_ftl_identifier(fdp) - var_a = gen_ftl_identifier(fdp) - var_b = f"B-{gen_ftl_identifier(fdp)}" # B: gen_ftl_identifier always starts with a-z - val_a = gen_ftl_value(fdp, max_length=20) - val_b = gen_ftl_value(fdp, max_length=20) - ftl = f"{msg_id} = {{ ${var_a} }} {{ ${var_b} }}\n" - - l10n = FluentLocalization([primary, fallback], strict=False) - l10n.add_resource(fallback, ftl) - - result, errors = l10n.format_pattern(msg_id, {var_a: val_a, var_b: val_b}) - - if not errors: - _domain.messages_found += 1 - if val_a not in result or val_b not in result: - msg = f"Variables not found in result: expected '{val_a}' and '{val_b}', got '{result}'" - raise LocalizationFuzzError(msg) - - -def _pattern_add_resource_mutation( - fdp: atheris.FuzzedDataProvider, -) -> None: - """add_resource after initial format call; re-format sees new resource. 
- - Tests that RWLock correctly serializes post-construction add_resource - against concurrent format_pattern calls. The resource adds a new message - and the second format_pattern must see it. - """ - _domain.add_resource_mutations += 1 - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - msg_id_a = gen_ftl_identifier(fdp) - msg_id_b = f"B-{gen_ftl_identifier(fdp)}" # B: gen_ftl_identifier always starts with a-z - val_a = gen_ftl_value(fdp) - val_b = gen_ftl_value(fdp) - - l10n = FluentLocalization([locale], strict=False) - l10n.add_resource(locale, f"{msg_id_a} = {val_a}\n") - - # First format (before mutation) - l10n.format_pattern(msg_id_a) - _, errors_b1 = l10n.format_pattern(msg_id_b) - - # msg_b not yet added - must produce errors - if not errors_b1: - msg = f"Before mutation: '{msg_id_b}' found before add_resource" - raise LocalizationFuzzError(msg) - - # Add second message (mutation) - l10n.add_resource(locale, f"{msg_id_b} = {val_b}\n") - - # Re-format after mutation - result_b2, errors_b2 = l10n.format_pattern(msg_id_b) - - if not errors_b2 and val_b not in result_b2: - msg = f"After mutation: expected '{val_b}' in result, got '{result_b2}'" - raise LocalizationFuzzError(msg) - - -def _pattern_has_message_api( - fdp: atheris.FuzzedDataProvider, -) -> None: - """has_message/has_attribute cross-locale scan invariants. - - Tests: if format_pattern succeeds for a message ID, has_message must - return True. If has_message returns False, format_pattern must produce - errors. 
- """ - _domain.has_message_checks += 1 - primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) - msg_id = gen_ftl_identifier(fdp) - attr_name = fdp.PickValueInList(["tooltip", "label", "title"]) - val = gen_ftl_value(fdp) - ftl = f"{msg_id} = {val}\n .{attr_name} = hint\n" - - l10n = FluentLocalization([primary, fallback], strict=False) - l10n.add_resource(fallback, ftl) - - has_msg = l10n.has_message(msg_id) - has_attr = l10n.has_attribute(msg_id, attr_name) - has_missing_attr = l10n.has_attribute(msg_id, "nonexistent-attr") - - # Contract: has_message must be True (we added it to fallback) - if not has_msg: - msg = f"has_message('{msg_id}') returned False after add_resource" - raise LocalizationFuzzError(msg) - - # Contract: has_attribute(existing) must be True - if not has_attr: - msg = f"has_attribute('{msg_id}', '{attr_name}') returned False after add_resource" - raise LocalizationFuzzError(msg) - - # Contract: has_attribute(nonexistent) must be False - if has_missing_attr: - msg = f"has_attribute('{msg_id}', 'nonexistent-attr') returned True" - raise LocalizationFuzzError(msg) - - -def _validate_localization_message_lookup( - l10n: FluentLocalization, - message_id: str, - expected_variables: frozenset[str], -) -> None: - """Validate FluentLocalization.get_message() for one identifier.""" - message = l10n.get_message(message_id) - if message is None: - msg = f"get_message('{message_id}') returned None for an existing message" - raise LocalizationFuzzError(msg) - if not isinstance(message, Message): - msg = f"get_message('{message_id}') returned {type(message).__name__}" - raise LocalizationFuzzError(msg) - if message.id.name != message_id: - msg = f"get_message('{message_id}') returned node named '{message.id.name}'" - raise LocalizationFuzzError(msg) - - message_validation = validate_message_variables(message, expected_variables) - if not message_validation.is_valid: - msg = f"validate_message_variables() rejected localization message '{message_id}'" 
- raise LocalizationFuzzError(msg) - if message_validation.declared_variables != expected_variables: - msg = ( - f"get_message('{message_id}') resolved wrong locale variables: " - f"{message_validation.declared_variables!r} vs {expected_variables!r}" - ) - raise LocalizationFuzzError(msg) - - -def _validate_localization_term_lookup( - l10n: FluentLocalization, - term_id: str, - expected_variables: frozenset[str], -) -> None: - """Validate FluentLocalization.get_term() for one identifier.""" - term = l10n.get_term(term_id) - if term is None: - msg = f"get_term('{term_id}') returned None for an existing term" - raise LocalizationFuzzError(msg) - if not isinstance(term, Term): - msg = f"get_term('{term_id}') returned {type(term).__name__}" - raise LocalizationFuzzError(msg) - if term.id.name != term_id: - msg = f"get_term('{term_id}') returned node named '{term.id.name}'" - raise LocalizationFuzzError(msg) - - term_validation = validate_message_variables(term, expected_variables) - if not term_validation.is_valid: - msg = f"validate_message_variables() rejected localization term '{term_id}'" - raise LocalizationFuzzError(msg) - if term_validation.declared_variables != expected_variables: - msg = ( - f"get_term('{term_id}') resolved wrong locale variables: " - f"{term_validation.declared_variables!r} vs {expected_variables!r}" - ) - raise LocalizationFuzzError(msg) - - -def _pattern_ast_lookup_api( - fdp: atheris.FuzzedDataProvider, -) -> None: - """get_message/get_term honor fallback precedence and namespace boundaries.""" - _domain.ast_lookup_checks += 1 - primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) - msg_id = f"msg-{gen_ftl_identifier(fdp)}" - term_id = f"term-{gen_ftl_identifier(fdp)}" - primary_has_message = fdp.ConsumeBool() - primary_has_term = fdp.ConsumeBool() - - l10n = FluentLocalization([primary, fallback], strict=False) - l10n.add_resource( - fallback, - (f"{msg_id} = {{ $fallbackvar }}\n-{term_id} = {{ $fallbackterm }}\n"), - ) - - 
primary_parts: list[str] = [] - if primary_has_message: - primary_parts.append(f"{msg_id} = {{ $primaryvar }}\n") - if primary_has_term: - primary_parts.append(f"-{term_id} = {{ $primaryterm }}\n") - if primary_parts: - l10n.add_resource(primary, "".join(primary_parts)) - - expected_message_vars = frozenset({"primaryvar" if primary_has_message else "fallbackvar"}) - _validate_localization_message_lookup(l10n, msg_id, expected_message_vars) - - expected_term_vars = frozenset({"primaryterm" if primary_has_term else "fallbackterm"}) - _validate_localization_term_lookup(l10n, term_id, expected_term_vars) - - if l10n.get_term(f"-{term_id}") is not None: - msg = f"get_term('-{term_id}') bypassed the no-leading-dash contract" - raise LocalizationFuzzError(msg) - if l10n.get_message(term_id) is not None: - msg = f"get_message('{term_id}') crossed the term/message namespace boundary" - raise LocalizationFuzzError(msg) - if l10n.get_term(msg_id) is not None: - msg = f"get_term('{msg_id}') crossed the message/term namespace boundary" - raise LocalizationFuzzError(msg) - if l10n.get_message("__missing_localization_lookup__") is not None: - msg = "get_message() returned a node for a missing localization message" - raise LocalizationFuzzError(msg) - if l10n.get_term("__missing_localization_lookup__") is not None: - msg = "get_term() returned a node for a missing localization term" - raise LocalizationFuzzError(msg) - - -def _pattern_get_message_ids_api( - fdp: atheris.FuzzedDataProvider, -) -> None: - """get_message_ids returns superset of added message IDs. - - Tests deduplication: if the same message ID is added to two locales, it - must appear only once in get_message_ids(). Also checks that - get_message_ids() contains every message we added. 
- """ - locale_a, locale_b = fdp.PickValueInList(list(_LOCALE_PAIRS)) - n = fdp.ConsumeIntInRange(1, 5) - msg_ids = [gen_ftl_identifier(fdp) for _ in range(n)] - - l10n = FluentLocalization([locale_a, locale_b], strict=False) - - # Add same messages to both locales (deduplication test) - for mid in msg_ids: - l10n.add_resource(locale_a, f"{mid} = value-a\n") - l10n.add_resource(locale_b, f"{mid} = value-b\n") - - all_ids = l10n.get_message_ids() - all_ids_set = set(all_ids) - - # Contract: every added message ID must appear - for mid in msg_ids: - if mid not in all_ids_set: - msg = f"get_message_ids(): missing '{mid}' after add_resource" - raise LocalizationFuzzError(msg) - - # Contract: no duplicates - if len(all_ids) != len(all_ids_set): - msg = f"get_message_ids(): duplicates found: {sorted(all_ids)}" - raise LocalizationFuzzError(msg) - - -def _pattern_validate_resource_api( - fdp: atheris.FuzzedDataProvider, -) -> None: - """validate_resource via FluentLocalization facade. - - Tests that validate_resource returns a ValidationResult and that - its errors/warnings attributes are sequences (never crashes, never - returns None, always returns a structured result). 
- """ - _domain.validate_calls += 1 - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - ftl_choice = fdp.ConsumeIntInRange(0, 5) - - match ftl_choice: - case 0: - ftl = f"{gen_ftl_identifier(fdp)} = valid message\n" - case 1: - ftl = "invalid = { $x -> [one] singular *[other] plural }\n" - case 2: - ftl = "" # Empty - case 3: - ftl = "# Just a comment\n" - case 4: - # Duplicate message ID - mid = gen_ftl_identifier(fdp) - ftl = f"{mid} = first\n{mid} = second\n" - case _: - ftl = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(0, 200)) - - l10n = FluentLocalization([locale], strict=False) - result = l10n.validate_resource(ftl) - - # Contract: validate_resource always returns a structured result - if result is None: - msg = "validate_resource returned None" - raise LocalizationFuzzError(msg) - - # Contract: errors and warnings are tuples/sequences - if not hasattr(result, "errors") or not hasattr(result, "warnings"): - msg = "validate_resource result missing errors/warnings" - raise LocalizationFuzzError(msg) - - -def _check_message_schema_exact_success( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Exact schemas succeed and preserve input order.""" - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - message_count = fdp.ConsumeIntInRange(1, 3) - expected_schemas: dict[str, frozenset[str] | set[str]] = {} - resource_parts: list[str] = [] - - for index in range(message_count): - message_id = f"schema-{index}-{gen_ftl_identifier(fdp)}" - variable_count = fdp.ConsumeIntInRange(1, 2) - variables = tuple( - f"var{index}_{slot}_{gen_ftl_identifier(fdp)}" for slot in range(variable_count) - ) - expected = frozenset(variables) if fdp.ConsumeBool() else set(variables) - expected_schemas[message_id] = expected - resource_parts.append(_build_variable_message(message_id, variables)) - - l10n = FluentLocalization([locale], strict=False) - l10n.add_resource(locale, "".join(resource_parts)) - try: - results = l10n.validate_message_schemas(expected_schemas) - 
except IntegrityCheckFailedError as err: - msg = f"validate_message_schemas() raised on exact schemas: {err}" - raise LocalizationFuzzError(msg) from err - - if not isinstance(results, tuple): - msg = f"validate_message_schemas() returned {type(results).__name__}" - raise LocalizationFuzzError(msg) - if [result.message_id for result in results] != list(expected_schemas): - msg = ( - "validate_message_schemas() returned results out of input order: " - f"{[result.message_id for result in results]!r} vs {list(expected_schemas)!r}" - ) - raise LocalizationFuzzError(msg) - for result in results: - expected_variables = frozenset(expected_schemas[result.message_id]) - if not result.is_valid or result.declared_variables != expected_variables: - msg = ( - "validate_message_schemas() returned invalid exact-match result: " - f"{result!r} vs {expected_variables!r}" - ) - raise LocalizationFuzzError(msg) - - -def _assert_localization_message_validation_matches_lookup( - l10n: FluentLocalization, - message_id: str, - expected_variables: frozenset[str] | set[str], -) -> None: - """Single-message validation should match direct AST validation.""" - message = l10n.get_message(message_id) - if message is None: - msg = f"get_message('{message_id}') returned None during schema validation" - raise LocalizationFuzzError(msg) - - direct = validate_message_variables(message, frozenset(expected_variables)) - resolved = l10n.validate_message_variables(message_id, expected_variables) - if resolved != direct: - msg = ( - "validate_message_variables() diverged from direct AST validation: " - f"{resolved!r} vs {direct!r}" - ) - raise LocalizationFuzzError(msg) - - -def _check_single_message_validation_success( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Single-message exact-schema validation succeeds for direct hits.""" - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - message_id = f"single-{gen_ftl_identifier(fdp)}" - variable_count = fdp.ConsumeIntInRange(1, 2) - variables = 
tuple( - f"var_{slot}_{gen_ftl_identifier(fdp)}" for slot in range(variable_count) - ) - expected_variables: frozenset[str] | set[str] = ( - frozenset(variables) if fdp.ConsumeBool() else set(variables) - ) - - l10n = FluentLocalization([locale], strict=False) - l10n.add_resource(locale, _build_variable_message(message_id, variables)) - _assert_localization_message_validation_matches_lookup( - l10n, - message_id, - expected_variables, - ) - - -def _check_single_message_validation_fallback_success( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Single-message validation resolves through localization fallback.""" - primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) - message_id = f"fallback-single-{gen_ftl_identifier(fdp)}" - variable = f"fallback_{gen_ftl_identifier(fdp)}" - expected_variables: frozenset[str] | set[str] = ( - frozenset({variable}) if fdp.ConsumeBool() else {variable} - ) - - l10n = FluentLocalization([primary, fallback], strict=False) - l10n.add_resource(fallback, _build_variable_message(message_id, (variable,))) - _assert_localization_message_validation_matches_lookup( - l10n, - message_id, - expected_variables, - ) - - -def _check_single_message_validation_missing_message( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Missing messages fail the single-message localization validator.""" - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - missing_id = f"missing-single-{gen_ftl_identifier(fdp)}" - l10n = FluentLocalization([locale], strict=False) - l10n.add_resource(locale, "present = value\n") - - try: - l10n.validate_message_variables(missing_id, frozenset()) - except IntegrityCheckFailedError as err: - _assert_integrity_failure( - err, - operation="validate_message_variables", - message_fragment=f"{missing_id}: not found", - key=missing_id, - actual_fragment="missing_messages=1", - ) - else: - msg = "validate_message_variables() accepted a missing message" - raise LocalizationFuzzError(msg) - - -def 
_check_single_message_validation_extra_variable( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Extra declared variables fail exact single-message validation.""" - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - message_id = f"extra-single-{gen_ftl_identifier(fdp)}" - amount_var = f"amount_{gen_ftl_identifier(fdp)}" - customer_var = f"customer_{gen_ftl_identifier(fdp)}" - - l10n = FluentLocalization([locale], strict=False) - l10n.add_resource( - locale, - _build_variable_message(message_id, (amount_var, customer_var)), - ) - - try: - l10n.validate_message_variables(message_id, frozenset({amount_var})) - except IntegrityCheckFailedError as err: - _assert_integrity_failure( - err, - operation="validate_message_variables", - message_fragment=f"{message_id}: extra {{{customer_var}}}", - key=message_id, - actual_fragment="schema_mismatches=1", - ) - else: - msg = "validate_message_variables() accepted an extra-variable mismatch" - raise LocalizationFuzzError(msg) - - -def _check_single_message_validation_missing_variable( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Missing expected variables fail exact single-message validation.""" - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - message_id = f"missing-var-single-{gen_ftl_identifier(fdp)}" - amount_var = f"amount_{gen_ftl_identifier(fdp)}" - customer_var = f"customer_{gen_ftl_identifier(fdp)}" - - l10n = FluentLocalization([locale], strict=False) - l10n.add_resource(locale, _build_variable_message(message_id, (amount_var,))) - - try: - l10n.validate_message_variables(message_id, {amount_var, customer_var}) - except IntegrityCheckFailedError as err: - _assert_integrity_failure( - err, - operation="validate_message_variables", - message_fragment=f"{message_id}: missing {{{customer_var}}}", - key=message_id, - actual_fragment="schema_mismatches=1", - ) - else: - msg = "validate_message_variables() accepted a missing-variable mismatch" - raise LocalizationFuzzError(msg) - - -def 
_pattern_validate_message_variables_api( - fdp: atheris.FuzzedDataProvider, -) -> None: - """validate_message_variables enforces exact schemas per message.""" - _domain.message_variable_validation_checks += 1 - handlers = ( - _check_single_message_validation_success, - _check_single_message_validation_fallback_success, - _check_single_message_validation_missing_message, - _check_single_message_validation_extra_variable, - _check_single_message_validation_missing_variable, - ) - handler = handlers[fdp.ConsumeIntInRange(0, len(handlers) - 1)] - handler(fdp) - - -def _check_message_schema_fallback_success( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Fallback-resolved messages validate through the localization facade.""" - primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) - message_id = f"fallback-{gen_ftl_identifier(fdp)}" - variable = f"fallback_{gen_ftl_identifier(fdp)}" - - l10n = FluentLocalization([primary, fallback], strict=False) - l10n.add_resource(fallback, _build_variable_message(message_id, (variable,))) - try: - results = l10n.validate_message_schemas({message_id: frozenset({variable})}) - except IntegrityCheckFailedError as err: - msg = f"validate_message_schemas() rejected fallback-resolved schema: {err}" - raise LocalizationFuzzError(msg) from err - - if len(results) != 1 or not results[0].is_valid: - msg = f"Fallback schema validation returned {results!r}" - raise LocalizationFuzzError(msg) - - -def _check_message_schema_missing_message( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Missing messages fail exact schema validation.""" - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - missing_id = f"missing-{gen_ftl_identifier(fdp)}" - l10n = FluentLocalization([locale], strict=False) - l10n.add_resource(locale, "present = value\n") - - try: - l10n.validate_message_schemas({missing_id: frozenset()}) - except IntegrityCheckFailedError as err: - _assert_integrity_failure( - err, - operation="validate_message_schemas", - 
message_fragment=f"{missing_id}: not found", - key=missing_id, - actual_fragment="missing_messages=1", - ) - else: - msg = "validate_message_schemas() accepted a missing message" - raise LocalizationFuzzError(msg) - - -def _check_message_schema_extra_variable( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Extra variables in the message fail exact schema validation.""" - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - message_id = f"extra-{gen_ftl_identifier(fdp)}" - amount_var = f"amount_{gen_ftl_identifier(fdp)}" - customer_var = f"customer_{gen_ftl_identifier(fdp)}" - - l10n = FluentLocalization([locale], strict=False) - l10n.add_resource( - locale, - _build_variable_message(message_id, (amount_var, customer_var)), - ) - - try: - l10n.validate_message_schemas({message_id: frozenset({amount_var})}) - except IntegrityCheckFailedError as err: - _assert_integrity_failure( - err, - operation="validate_message_schemas", - message_fragment=f"{message_id}: extra {{{customer_var}}}", - key=message_id, - actual_fragment="schema_mismatches=1", - ) - else: - msg = "validate_message_schemas() accepted an extra-variable mismatch" - raise LocalizationFuzzError(msg) - - -def _check_message_schema_missing_variable( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Missing expected variables fail exact schema validation.""" - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - message_id = f"missing-var-{gen_ftl_identifier(fdp)}" - amount_var = f"amount_{gen_ftl_identifier(fdp)}" - customer_var = f"customer_{gen_ftl_identifier(fdp)}" - - l10n = FluentLocalization([locale], strict=False) - l10n.add_resource(locale, _build_variable_message(message_id, (amount_var,))) - - try: - l10n.validate_message_schemas({message_id: {amount_var, customer_var}}) - except IntegrityCheckFailedError as err: - _assert_integrity_failure( - err, - operation="validate_message_schemas", - message_fragment=f"{message_id}: missing {{{customer_var}}}", - key=message_id, - 
actual_fragment="schema_mismatches=1", - ) - else: - msg = "validate_message_schemas() accepted a missing-variable mismatch" - raise LocalizationFuzzError(msg) - - -def _pattern_validate_message_schemas_api( - fdp: atheris.FuzzedDataProvider, -) -> None: - """validate_message_schemas enforces exact schemas through localization.""" - _domain.schema_validation_checks += 1 - handlers = ( - _check_message_schema_exact_success, - _check_message_schema_fallback_success, - _check_message_schema_missing_message, - _check_message_schema_extra_variable, - _check_message_schema_missing_variable, - ) - handler = handlers[fdp.ConsumeIntInRange(0, len(handlers) - 1)] - handler(fdp) - - -def _pattern_add_function_custom( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Custom function registered via add_function and invoked in FTL. - - Tests the add_function pathway: a Python function is registered under a - SCREAMING_SNAKE_CASE name and invoked from an FTL message. Verifies that - function results appear in format_pattern output. 
- """ - _domain.custom_function_calls += 1 - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - func_name = "UPPER" - msg_id = gen_ftl_identifier(fdp) - val = gen_ftl_value(fdp, max_length=20) - ftl = f"{msg_id} = {{ {func_name}($val) }}\n" - - # use_isolating=False: result equality check must not include FSI/PDI BiDi marks - l10n = FluentLocalization([locale], strict=False, use_isolating=False) - l10n.add_resource(locale, ftl) - - # Register custom function that uppercases its argument - def upper_func(value: str) -> str: - return str(value).upper() - - l10n.add_function(func_name, upper_func) - - result, errors = l10n.format_pattern(msg_id, {"val": val}) - - if not errors: - expected = val.upper() - if result != expected: - msg = f"Custom UPPER function: expected '{expected}', got '{result}'" - raise LocalizationFuzzError(msg) - - -def _pattern_introspect_api( - fdp: atheris.FuzzedDataProvider, -) -> None: - """introspect_message and get_message_variables via localization facade. - - Tests the introspection delegation path: introspect_message() and - get_message_variables() both delegate through the fallback chain. - Verifies variable sets are consistent between the two APIs. 
- """ - _domain.introspect_calls += 1 - primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) - msg_id = gen_ftl_identifier(fdp) - var_a = gen_ftl_identifier(fdp) - var_b = f"B-{gen_ftl_identifier(fdp)}" # B: gen_ftl_identifier always starts with a-z - ftl = f"{msg_id} = {{ ${var_a} }} {{ ${var_b} }}\n" - - l10n = FluentLocalization([primary, fallback], strict=False) - l10n.add_resource(fallback, ftl) - - # introspect_message returns MessageIntrospection or None - info = l10n.introspect_message(msg_id) - variables = l10n.get_message_variables(msg_id) - - if info is not None: - # Contract: get_message_variables must be a subset of introspect result - introspect_vars = info.get_variable_names() - for var in variables: - if var not in introspect_vars: - msg = ( - f"get_message_variables returned '{var}' not in " - f"introspect result: {introspect_vars}" - ) - raise LocalizationFuzzError(msg) - - -def _validate_localization_audit_log( - locale: str, - audit_log: tuple[CacheAuditLogEntry, ...], - *, - enable_audit: bool, -) -> int: - """Validate one locale's audit log and return its entry count.""" - if not enable_audit and audit_log != (): - msg = f"Audit-disabled localization returned non-empty log for '{locale}'" - raise LocalizationFuzzError(msg) - - last_timestamp = float("-inf") - for entry in audit_log: - if entry.operation not in _VALID_AUDIT_OPERATIONS: - msg = f"Unexpected audit operation {entry.operation!r} for locale '{locale}'" - raise LocalizationFuzzError(msg) - if not entry.key_hash: - msg = f"Empty audit key hash for locale '{locale}'" - raise LocalizationFuzzError(msg) - if entry.timestamp < last_timestamp: - msg = ( - f"Audit timestamps regressed for locale '{locale}': " - f"{last_timestamp} -> {entry.timestamp}" - ) - raise LocalizationFuzzError(msg) - if entry.operation == "MISS": - if entry.sequence != 0 or entry.checksum_hex != "": - msg = ( - f"MISS audit entry for locale '{locale}' must have " - "sequence=0 and empty checksum" - ) - raise 
LocalizationFuzzError(msg) - elif entry.sequence <= 0 or entry.checksum_hex == "": - msg = ( - f"{entry.operation} audit entry for locale '{locale}' must carry " - "a positive sequence and non-empty checksum" - ) - raise LocalizationFuzzError(msg) - last_timestamp = entry.timestamp - - return len(audit_log) - - -def _validate_localization_cache_stats( - stats: LocalizationCacheStats, - *, - enable_audit: bool, - expected_locales: list[str], -) -> None: - """Validate aggregate localization cache stats against configuration.""" - if stats["audit_enabled"] != enable_audit: - msg = ( - "get_cache_stats()['audit_enabled'] disagrees with CacheConfig: " - f"{stats['audit_enabled']} vs {enable_audit}" - ) - raise LocalizationFuzzError(msg) - if stats["bundle_count"] != len(expected_locales): - msg = ( - "get_cache_stats()['bundle_count'] disagrees with initialized locales: " - f"{stats['bundle_count']} vs {len(expected_locales)}" - ) - raise LocalizationFuzzError(msg) - - -def _collect_localization_audit_entries( - audit_logs: dict[str, tuple[CacheAuditLogEntry, ...]], - *, - enable_audit: bool, -) -> int: - """Validate all per-locale audit logs and return their combined length.""" - total_audit_entries = 0 - for locale, audit_log in audit_logs.items(): - if not isinstance(audit_log, tuple): - msg = f"get_cache_audit_log()['{locale}'] returned {type(audit_log).__name__}" - raise LocalizationFuzzError(msg) - if any(not isinstance(entry, CacheAuditLogEntry) for entry in audit_log): - msg = f"get_cache_audit_log()['{locale}'] returned non-CacheAuditLogEntry data" - raise LocalizationFuzzError(msg) - total_audit_entries += _validate_localization_audit_log( - locale, - audit_log, - enable_audit=enable_audit, - ) - return total_audit_entries - - -def _pattern_cache_audit_api( - fdp: atheris.FuzzedDataProvider, -) -> None: - """get_cache_audit_log exposes per-locale immutable audit trails.""" - _domain.cache_audit_checks += 1 - primary, fallback = 
fdp.PickValueInList(list(_LOCALE_PAIRS)) - enable_audit = fdp.ConsumeBool() - initialize_fallback = fdp.ConsumeBool() - primary_msg_id = f"audit-{gen_ftl_identifier(fdp)}" - fallback_msg_id = f"fallback-{gen_ftl_identifier(fdp)}" - - l10n = FluentLocalization( - [primary, fallback], - cache=CacheConfig(enable_audit=enable_audit), - strict=False, - ) - l10n.add_resource(primary, f"{primary_msg_id} = primary\n") - - expected_locales = [normalize_locale(primary)] - if initialize_fallback: - l10n.add_resource(fallback, f"{fallback_msg_id} = fallback\n") - expected_locales.append(normalize_locale(fallback)) - - l10n.format_value(primary_msg_id) - l10n.format_value(primary_msg_id) - if initialize_fallback: - l10n.format_value(fallback_msg_id) - - audit_logs = l10n.get_cache_audit_log() - if audit_logs is None: - msg = "Cached FluentLocalization returned None from get_cache_audit_log()" - raise LocalizationFuzzError(msg) - if list(audit_logs) != expected_locales: - msg = ( - "get_cache_audit_log() returned wrong locale keys: " - f"{list(audit_logs)!r} vs {expected_locales!r}" - ) - raise LocalizationFuzzError(msg) - - stats = l10n.get_cache_stats() - if stats is None: - msg = "Cached FluentLocalization returned None from get_cache_stats()" - raise LocalizationFuzzError(msg) - _validate_localization_cache_stats( - stats, - enable_audit=enable_audit, - expected_locales=expected_locales, - ) - total_audit_entries = _collect_localization_audit_entries( - audit_logs, - enable_audit=enable_audit, - ) - - if total_audit_entries != int(stats.get("audit_entries", 0)): - msg = ( - "Localization audit log length disagrees with cache stats: " - f"{total_audit_entries} vs {stats.get('audit_entries')}" - ) - raise LocalizationFuzzError(msg) - - primary_locale = normalize_locale(primary) - fallback_locale = normalize_locale(fallback) - if enable_audit and len(audit_logs[primary_locale]) < 2: - msg = f"Primary locale '{primary_locale}' did not record expected audit entries" - raise 
LocalizationFuzzError(msg) - if initialize_fallback and enable_audit and len(audit_logs[fallback_locale]) < 2: - msg = f"Fallback locale '{fallback_locale}' did not record expected audit entries" - raise LocalizationFuzzError(msg) - - -def _assert_localization_locale_accepts( - raw_locales: list[str], - *, - expected_locales: tuple[str, ...], -) -> None: - """Accepted locale chains are canonicalized, deduplicated, and remain usable.""" - try: - l10n = FluentLocalization(raw_locales, strict=False) - except Exception as err: # pylint: disable=broad-exception-caught - msg = f"FluentLocalization rejected valid locales {raw_locales!r}: {err}" - raise LocalizationFuzzError(msg) from err - - if l10n.locales != expected_locales: - msg = ( - "FluentLocalization stored the wrong locale chain: " - f"{l10n.locales!r} vs {expected_locales!r}" - ) - raise LocalizationFuzzError(msg) - - l10n.add_resource(expected_locales[0], "msg = ready\n") - result, errors = l10n.format_pattern("msg") - if result != "ready" or errors: - msg = ( - f"FluentLocalization with accepted locales {expected_locales!r} " - f"failed basic formatting: result={result!r}, errors={errors!r}" - ) - raise LocalizationFuzzError(msg) - - -def _assert_localization_locale_rejected( - locales: list[object], - *, - expected_exception: type[ValueError | TypeError], - expected_fragment: str, -) -> None: - """Rejected locale chains surface the canonical constructor error contract.""" - locales_value: Any = locales - - try: - FluentLocalization(locales_value, strict=False) - except Exception as err: # pylint: disable=broad-exception-caught - if not isinstance(err, expected_exception): - msg = ( - "FluentLocalization raised the wrong locale-boundary exception for " - f"{locales!r}: {type(err).__name__}" - ) - raise LocalizationFuzzError(msg) from err - if expected_fragment not in str(err): - msg = ( - "FluentLocalization locale-boundary error message drifted for " - f"{locales!r}: {err}" - ) - raise 
LocalizationFuzzError(msg) from err - return - - msg = f"FluentLocalization accepted invalid locales {locales!r}" - raise LocalizationFuzzError(msg) - - -def _pattern_locale_boundary_api( - fdp: atheris.FuzzedDataProvider, -) -> None: - """FluentLocalization constructor shares the canonical locale boundary contract.""" - _domain.locale_boundary_checks += 1 - scenario = fdp.ConsumeIntInRange(0, 4) - boundary_locale = "a" + ("b" * (MAX_LOCALE_LENGTH_HARD_LIMIT - 2)) + "C" - - match scenario: - case 0: - if fdp.ConsumeBool(): - raw_locales = [" EN-us ", "\tEN-us\n", " de-DE "] - expected_locales = ( - require_locale_code(" EN-us ", "locale"), - require_locale_code(" de-DE ", "locale"), - ) - else: - raw_locales = [f" {boundary_locale} ", f"\n{boundary_locale}\t", " lv "] - expected_locales = ( - require_locale_code(f" {boundary_locale} ", "locale"), - require_locale_code(" lv ", "locale"), - ) - _assert_localization_locale_accepts( - raw_locales, - expected_locales=expected_locales, - ) - case 1: - blank_locale = fdp.PickValueInList(["", " ", "\t\n", " \r\n "]) - _assert_localization_locale_rejected( - ["en", blank_locale], - expected_exception=ValueError, - expected_fragment="locale cannot be blank", - ) - case 2: - invalid_locale = fdp.PickValueInList(list(_STRUCTURALLY_INVALID_LOCALES)) - _assert_localization_locale_rejected( - ["en", invalid_locale], - expected_exception=ValueError, - expected_fragment="Invalid locale:", - ) - case 3: - overshoot = fdp.ConsumeIntInRange(1, 32) - overlong_locale = "a" * (MAX_LOCALE_LENGTH_HARD_LIMIT + overshoot) - _assert_localization_locale_rejected( - ["en", overlong_locale], - expected_exception=ValueError, - expected_fragment="locale exceeds maximum length", - ) - case _: - non_string_locale = fdp.PickValueInList(list(_NON_STRING_LOCALES)) - _assert_localization_locale_rejected( - ["en", non_string_locale], - expected_exception=TypeError, - expected_fragment="locale must be str", - ) - - -def _pattern_on_fallback_callback( - 
fdp: atheris.FuzzedDataProvider, -) -> None: - """on_fallback callback fires when message resolved from fallback locale. - - Tests that the callback is invoked exactly once when the primary locale - lacks the message and the fallback locale has it. Verifies FallbackInfo - carries the correct requested and resolved locales. - """ - primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) - msg_id = gen_ftl_identifier(fdp) - val = gen_ftl_value(fdp) - ftl = f"{msg_id} = {val}\n" - - fallback_infos: list[FallbackInfo] = [] - - l10n = FluentLocalization( - [primary, fallback], - strict=False, - on_fallback=fallback_infos.append, - ) - # Add message only to fallback locale - l10n.add_resource(fallback, ftl) - - _, errors = l10n.format_pattern(msg_id) - - if not errors: - _domain.messages_found += 1 - if fallback_infos: - _domain.fallback_triggered += 1 - info = fallback_infos[0] - # Contract: requested_locale = primary, resolved_locale = fallback - expected_fallback = normalize_locale(fallback) - if info.resolved_locale != expected_fallback: - msg = ( - f"on_fallback: resolved_locale='{info.resolved_locale}' " - f"expected '{expected_fallback}'" - ) - raise LocalizationFuzzError(msg) - else: - _domain.messages_missing += 1 - - -def _pattern_loader_init_success( - fdp: atheris.FuzzedDataProvider, -) -> None: - """PathResourceLoader eager-init path records all-success summary data.""" - _domain.loader_init_checks += 1 - locale_a, locale_b = fdp.PickValueInList(list(_LOCALE_PAIRS)) - resource_id = "main.ftl" - msg_id = gen_ftl_identifier(fdp) - primary_val = gen_ftl_value(fdp) - fallback_val = gen_ftl_value(fdp) - - with TemporaryDirectory(prefix="ftllexengine-fuzz-loader-") as tmp_dir: - root = pathlib.Path(tmp_dir) - _write_loader_resource(root, locale_a, resource_id, f"{msg_id} = {primary_val}\n") - _write_loader_resource(root, locale_b, resource_id, f"{msg_id} = {fallback_val}\n") - - loader = PathResourceLoader(str(root / "{locale}")) - l10n = FluentLocalization( 
- [locale_a, locale_b], - [resource_id], - loader, - strict=False, - ) - summary = l10n.get_load_summary() - - if summary.successful != 2 or not summary.all_successful: - msg = ( - f"Expected two successful eager loads, got successful={summary.successful}, " - f"not_found={summary.not_found}, errors={summary.errors}" - ) - raise LocalizationFuzzError(msg) - if summary.has_errors or summary.has_junk: - msg = ( - f"Unexpected summary state: has_errors={summary.has_errors}, " - f"has_junk={summary.has_junk}" - ) - raise LocalizationFuzzError(msg) - if any(result.source_path is None for result in summary.results): - msg = "Loader summary missing source_path on successful result" - raise LocalizationFuzzError(msg) - - result, errors = l10n.format_pattern(msg_id) - if errors: - msg = f"Loader-backed localization unexpectedly returned errors: {errors!r}" - raise LocalizationFuzzError(msg) - if primary_val not in result: - msg = f"Primary locale value {primary_val!r} missing from result {result!r}" - raise LocalizationFuzzError(msg) - - -def _pattern_loader_not_found_fallback( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Primary miss is tracked as not_found while fallback still resolves.""" - _domain.loader_init_checks += 1 - primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) - resource_id = "main.ftl" - msg_id = gen_ftl_identifier(fdp) - val = gen_ftl_value(fdp) - - with TemporaryDirectory(prefix="ftllexengine-fuzz-loader-") as tmp_dir: - root = pathlib.Path(tmp_dir) - _write_loader_resource(root, fallback, resource_id, f"{msg_id} = {val}\n") - - loader = PathResourceLoader(str(root / "{locale}")) - l10n = FluentLocalization( - [primary, fallback], - [resource_id], - loader, - strict=False, - ) - summary = l10n.get_load_summary() - - if summary.successful != 1 or summary.not_found != 1 or summary.errors != 0: - msg = ( - f"Unexpected mixed summary: successful={summary.successful}, " - f"not_found={summary.not_found}, errors={summary.errors}" - ) - raise 
LocalizationFuzzError(msg) - - result, errors = l10n.format_pattern(msg_id) - if errors: - msg = f"Fallback load should resolve successfully, got errors={errors!r}" - raise LocalizationFuzzError(msg) - if val not in result: - msg = f"Fallback value {val!r} missing from result {result!r}" - raise LocalizationFuzzError(msg) - - -def _pattern_loader_junk_summary( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Junk entries discovered during eager load are preserved in LoadSummary.""" - _domain.loader_junk_checks += 1 - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - resource_id = "broken.ftl" - junk_source = f"{gen_ftl_identifier(fdp)} = {{\n" - - with TemporaryDirectory(prefix="ftllexengine-fuzz-loader-") as tmp_dir: - root = pathlib.Path(tmp_dir) - _write_loader_resource(root, locale, resource_id, junk_source) - - loader = PathResourceLoader(str(root / "{locale}")) - l10n = FluentLocalization([locale], [resource_id], loader, strict=False) - summary = l10n.get_load_summary() - - if summary.successful != 1 or not summary.has_junk or summary.junk_count < 1: - msg = ( - f"Expected junk-bearing successful load, got successful={summary.successful}, " - f"has_junk={summary.has_junk}, junk_count={summary.junk_count}" - ) - raise LocalizationFuzzError(msg) - if summary.all_clean: - msg = "LoadSummary.all_clean unexpectedly true for junk input" - raise LocalizationFuzzError(msg) - if not summary.get_with_junk(): - msg = "LoadSummary.get_with_junk() returned empty tuple" - raise LocalizationFuzzError(msg) - - -def _pattern_loader_path_error( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Invalid resource IDs surface as loader errors in the eager-load summary.""" - _domain.loader_error_checks += 1 - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - invalid_resource_id = fdp.PickValueInList( - [ - "../escape.ftl", - " main.ftl", - "/absolute.ftl", - ] - ) - - with TemporaryDirectory(prefix="ftllexengine-fuzz-loader-") as tmp_dir: - root = pathlib.Path(tmp_dir) 
- loader = PathResourceLoader(str(root / "{locale}")) - l10n = FluentLocalization( - [locale], - [invalid_resource_id], - loader, - strict=False, - ) - summary = l10n.get_load_summary() - - if summary.errors != 1 or not summary.has_errors: - msg = ( - f"Expected one loader error for invalid resource_id, got " - f"errors={summary.errors}, not_found={summary.not_found}" - ) - raise LocalizationFuzzError(msg) - - first_error = summary.get_errors()[0].error - if not isinstance(first_error, ValueError): - msg = ( - "Expected ValueError from PathResourceLoader validation, got " - f"{type(first_error).__name__}" - ) - raise LocalizationFuzzError(msg) - - -def _check_require_clean_empty_init( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Empty initialization is considered clean.""" - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - l10n = FluentLocalization([locale], strict=False) - try: - summary = l10n.require_clean() - except IntegrityCheckFailedError as err: - msg = f"require_clean() raised on empty initialization: {err}" - raise LocalizationFuzzError(msg) from err - - if not summary.all_clean or summary.total_attempted != 0: - msg = f"Empty initialization should be clean, got {summary!r}" - raise LocalizationFuzzError(msg) - - -def _check_require_clean_loader_success( - fdp: atheris.FuzzedDataProvider, -) -> None: - """All-success loader summaries return from require_clean().""" - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - resource_id = "main.ftl" - message_id = f"clean-{gen_ftl_identifier(fdp)}" - value = gen_ftl_value(fdp) - - with TemporaryDirectory(prefix="ftllexengine-fuzz-loader-") as tmp_dir: - root = pathlib.Path(tmp_dir) - _write_loader_resource(root, locale, resource_id, f"{message_id} = {value}\n") - loader = PathResourceLoader(str(root / "{locale}")) - l10n = FluentLocalization([locale], [resource_id], loader, strict=False) - try: - summary = l10n.require_clean() - except IntegrityCheckFailedError as err: - msg = f"require_clean() 
rejected an all-success summary: {err}" - raise LocalizationFuzzError(msg) from err - - if not summary.all_clean or summary.successful != 1 or summary.errors != 0: - msg = f"Clean loader initialization returned wrong summary: {summary!r}" - raise LocalizationFuzzError(msg) - - -def _check_require_clean_missing_loader( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Missing resources fail require_clean() with integrity context.""" - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - normalized_locale = normalize_locale(locale) - resource_id = "main.ftl" - - class MissingLoader: - def load(self, _locale: str, _resource_id: str) -> str: - msg = "missing" - raise FileNotFoundError(msg) - - def describe_path(self, locale: str, resource_id: str) -> str: - return f"{locale}/{resource_id}" - - l10n = FluentLocalization([locale], [resource_id], MissingLoader(), strict=False) - - try: - l10n.require_clean() - except IntegrityCheckFailedError as err: - _assert_integrity_failure( - err, - operation="require_clean", - message_fragment="not clean", - key=f"{normalized_locale}/{resource_id}", - actual_fragment="LoadSummary(", - ) - else: - msg = "require_clean() accepted a missing-resource summary" - raise LocalizationFuzzError(msg) - - -def _check_require_clean_junk_resource( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Junk-bearing resources fail require_clean().""" - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - resource_id = "broken.ftl" - junk_source = f"{gen_ftl_identifier(fdp)} = {{\n" - - with TemporaryDirectory(prefix="ftllexengine-fuzz-loader-") as tmp_dir: - root = pathlib.Path(tmp_dir) - _write_loader_resource(root, locale, resource_id, junk_source) - loader = PathResourceLoader(str(root / "{locale}")) - l10n = FluentLocalization([locale], [resource_id], loader, strict=False) - - try: - l10n.require_clean() - except IntegrityCheckFailedError as err: - _assert_integrity_failure( - err, - operation="require_clean", - message_fragment="junk", - 
key_fragment=resource_id, - actual_fragment="LoadSummary(", - ) - else: - msg = "require_clean() accepted a junk-bearing summary" - raise LocalizationFuzzError(msg) - - -def _check_require_clean_loader_error( - fdp: atheris.FuzzedDataProvider, -) -> None: - """Loader validation errors fail require_clean().""" - locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) - invalid_resource_id = fdp.PickValueInList( - [ - "../escape.ftl", - " main.ftl", - "/absolute.ftl", - ] - ) - - with TemporaryDirectory(prefix="ftllexengine-fuzz-loader-") as tmp_dir: - root = pathlib.Path(tmp_dir) - loader = PathResourceLoader(str(root / "{locale}")) - l10n = FluentLocalization( - [locale], - [invalid_resource_id], - loader, - strict=False, - ) - - try: - l10n.require_clean() - except IntegrityCheckFailedError as err: - _assert_integrity_failure( - err, - operation="require_clean", - message_fragment="load error", - key_fragment=invalid_resource_id, - actual_fragment="LoadSummary(", - ) - else: - msg = "require_clean() accepted a loader error summary" - raise LocalizationFuzzError(msg) - - -def _pattern_require_clean_api( - fdp: atheris.FuzzedDataProvider, -) -> None: - """require_clean returns only for clean initialization summaries.""" - _domain.boot_validation_checks += 1 - handlers = ( - _check_require_clean_empty_init, - _check_require_clean_loader_success, - _check_require_clean_missing_loader, - _check_require_clean_junk_resource, - _check_require_clean_loader_error, - ) - handler = handlers[fdp.ConsumeIntInRange(0, len(handlers) - 1)] - handler(fdp) - - -def _check_boot_config_validation(fdp: atheris.FuzzedDataProvider) -> None: - """__post_init__ rejects empty locales/resource_ids and missing loader/base_path.""" - choice = fdp.ConsumeIntInRange(0, 2) - try: - if choice == 0: - LocalizationBootConfig( - locales=(), - resource_ids=("ui.ftl",), - loader=_EmptyLoader(), - ) - msg = "Empty locales did not raise ValueError" - raise LocalizationFuzzError(msg) - if choice == 1: - 
LocalizationBootConfig( - locales=("en",), - resource_ids=(), - loader=_EmptyLoader(), - ) - msg = "Empty resource_ids did not raise ValueError" - raise LocalizationFuzzError(msg) - LocalizationBootConfig( - locales=("en",), - resource_ids=("ui.ftl",), - ) - msg = "Missing loader/base_path did not raise ValueError" - raise LocalizationFuzzError(msg) - except ValueError: - pass # expected - - -def _check_boot_config_boot_success(fdp: atheris.FuzzedDataProvider) -> None: - """boot_simple() returns FluentLocalization for a valid in-memory FTL resource.""" - locale = fdp.PickValueInList(["en", "de", "lv"]) - ftl = f"greeting = Hello {{ $name }}\nmsg{fdp.ConsumeIntInRange(0, 9)} = Value\n" - loader = _SingleResourceLoader(locale, "ui.ftl", ftl) - try: - cfg = LocalizationBootConfig( - locales=(locale,), - resource_ids=("ui.ftl",), - loader=loader, - ) - l10n = cfg.boot_simple() - if not isinstance(l10n, FluentLocalization): - msg = f"boot_simple() returned {type(l10n).__name__}, expected FluentLocalization" - raise LocalizationFuzzError(msg) - except IntegrityCheckFailedError: - pass # strict syntax errors in generated FTL are acceptable - except _ALLOWED_EXCEPTIONS: - pass - - -def _check_boot_config_boot_with_summary(fdp: atheris.FuzzedDataProvider) -> None: - """boot() returns a 3-tuple with correct types and clean LoadSummary.""" - locale = fdp.PickValueInList(["en", "de"]) - ftl = "msg = Value\n" - loader = _SingleResourceLoader(locale, "ui.ftl", ftl) - try: - cfg = LocalizationBootConfig( - locales=(locale,), - resource_ids=("ui.ftl",), - loader=loader, - ) - result = cfg.boot() - if not isinstance(result, tuple) or len(result) != 3: - msg = f"boot() returned wrong structure: {result!r}" - raise LocalizationFuzzError(msg) - l10n, summary, schema_results = result - if not isinstance(l10n, FluentLocalization): - msg = f"boot()[0] is {type(l10n).__name__}, not FluentLocalization" - raise LocalizationFuzzError(msg) - if not isinstance(schema_results, tuple): - msg = 
f"boot()[2] is {type(schema_results).__name__}, not tuple" - raise LocalizationFuzzError(msg) - if summary.errors != 0: - msg = f"LoadSummary.errors={summary.errors} for clean resource" - raise LocalizationFuzzError(msg) - if summary.total_attempted < 1: - msg = f"LoadSummary.total_attempted={summary.total_attempted}, expected >= 1" - raise LocalizationFuzzError(msg) - except IntegrityCheckFailedError: - pass - except _ALLOWED_EXCEPTIONS: - pass - - -def _check_boot_config_boot_failure(fdp: atheris.FuzzedDataProvider) -> None: - """boot() raises IntegrityCheckFailedError when a resource cannot be loaded.""" - locale = fdp.PickValueInList(["en", "de"]) - loader = _EmptyLoader() # no resources registered -> FileNotFoundError - try: - cfg = LocalizationBootConfig( - locales=(locale,), - resource_ids=("missing.ftl",), - loader=loader, - ) - cfg.boot() - msg = "boot() did not raise IntegrityCheckFailedError for missing resource" - raise LocalizationFuzzError(msg) - except IntegrityCheckFailedError: - pass # expected - except _ALLOWED_EXCEPTIONS: - pass - - -def _check_boot_config_required_messages_absent(fdp: atheris.FuzzedDataProvider) -> None: - """required_messages raises IntegrityCheckFailedError when an ID is absent.""" - locale = fdp.PickValueInList(["en", "de"]) - # Load a resource that has "greeting" but NOT "farewell" - ftl = "greeting = Hello\n" - loader = _SingleResourceLoader(locale, "ui.ftl", ftl) - try: - cfg = LocalizationBootConfig( - locales=(locale,), - resource_ids=("ui.ftl",), - loader=loader, - required_messages=frozenset({"greeting", "farewell"}), - ) - cfg.boot() - msg = "boot() did not raise IntegrityCheckFailedError for absent required message" - raise LocalizationFuzzError(msg) - except IntegrityCheckFailedError: - pass # expected: "farewell" is absent - except _ALLOWED_EXCEPTIONS: - pass - - -def _check_boot_config_required_messages_present(fdp: atheris.FuzzedDataProvider) -> None: - """required_messages succeeds when all IDs resolve in at 
least one locale.""" - locale = fdp.PickValueInList(["en", "de"]) - ftl = "greeting = Hello\nfarewell = Goodbye\n" - loader = _SingleResourceLoader(locale, "ui.ftl", ftl) - try: - cfg = LocalizationBootConfig( - locales=(locale,), - resource_ids=("ui.ftl",), - loader=loader, - required_messages=frozenset({"greeting", "farewell"}), - ) - l10n, summary, _ = cfg.boot() - if not isinstance(l10n, FluentLocalization): - msg = f"boot()[0] is {type(l10n).__name__}, expected FluentLocalization" - raise LocalizationFuzzError(msg) - if summary.errors != 0: - msg = f"LoadSummary.errors={summary.errors} for clean resource" - raise LocalizationFuzzError(msg) - except IntegrityCheckFailedError: - pass # generated FTL may have syntax issues - except _ALLOWED_EXCEPTIONS: - pass - - -def _check_boot_config_one_shot(fdp: atheris.FuzzedDataProvider) -> None: - """boot() and boot_simple() are one-shot: second call raises RuntimeError.""" - locale = fdp.PickValueInList(["en", "de"]) - ftl = "greeting = Hello\n" - loader = _SingleResourceLoader(locale, "ui.ftl", ftl) - use_simple = fdp.ConsumeBool() - try: - cfg = LocalizationBootConfig( - locales=(locale,), - resource_ids=("ui.ftl",), - loader=loader, - ) - # First call must succeed - if use_simple: - cfg.boot_simple() - else: - cfg.boot() - # Second call must raise RuntimeError (one-shot enforcement) - try: - if use_simple: - cfg.boot_simple() - else: - cfg.boot() - msg = ( - "boot() did not raise RuntimeError on second call " - "(one-shot enforcement missing)" - ) - raise LocalizationFuzzError(msg) - except RuntimeError: - pass # expected: one-shot enforcement - except IntegrityCheckFailedError: - pass # FTL may have syntax issues -- acceptable - except _ALLOWED_EXCEPTIONS: - pass - - -def _pattern_boot_config_api( - fdp: atheris.FuzzedDataProvider, -) -> None: - """LocalizationBootConfig strict-mode boot sequence and invariants.""" - _domain.boot_config_checks += 1 - handlers = ( - _check_boot_config_validation, - 
_check_boot_config_boot_success, - _check_boot_config_boot_with_summary, - _check_boot_config_boot_failure, - _check_boot_config_required_messages_absent, - _check_boot_config_required_messages_present, - _check_boot_config_one_shot, - ) - handler = handlers[fdp.ConsumeIntInRange(0, len(handlers) - 1)] - handler(fdp) - - -class _EmptyLoader: - """ResourceLoader with no resources — always raises FileNotFoundError.""" - - def load(self, locale: str, resource_id: str) -> str: - msg = f"No resource for ({locale!r}, {resource_id!r})" - raise FileNotFoundError(msg) - - def describe_path(self, locale: str, resource_id: str) -> str: - return f"empty://{locale}/{resource_id}" - - -class _SingleResourceLoader: - """ResourceLoader backed by a single (locale, resource_id) → FTL mapping.""" - - def __init__(self, locale: str, resource_id: str, ftl: str) -> None: - self._locale = locale - self._resource_id = resource_id - self._ftl = ftl - - def load(self, locale: str, resource_id: str) -> str: - if locale == self._locale and resource_id == self._resource_id: - return self._ftl - msg = f"No resource for ({locale!r}, {resource_id!r})" - raise FileNotFoundError(msg) - - def describe_path(self, locale: str, resource_id: str) -> str: - return f"memory://{locale}/{resource_id}" - - -# --- Pattern dispatch --- - -_PATTERN_DISPATCH = { - "single_locale_add_resource": _pattern_single_locale_add_resource, - "multi_locale_fallback": _pattern_multi_locale_fallback, - "chain_of_3_fallback": _pattern_chain_of_3_fallback, - "format_value_missing": _pattern_format_value_missing, - "format_with_variables": _pattern_format_with_variables, - "add_resource_mutation": _pattern_add_resource_mutation, - "has_message_api": _pattern_has_message_api, - "ast_lookup_api": _pattern_ast_lookup_api, - "get_message_ids_api": _pattern_get_message_ids_api, - "validate_resource_api": _pattern_validate_resource_api, - "validate_message_variables_api": _pattern_validate_message_variables_api, - 
"validate_message_schemas_api": _pattern_validate_message_schemas_api, - "add_function_custom": _pattern_add_function_custom, - "introspect_api": _pattern_introspect_api, - "cache_audit_api": _pattern_cache_audit_api, - "locale_boundary_api": _pattern_locale_boundary_api, - "on_fallback_callback": _pattern_on_fallback_callback, - "loader_init_success": _pattern_loader_init_success, - "loader_not_found_fallback": _pattern_loader_not_found_fallback, - "loader_junk_summary": _pattern_loader_junk_summary, - "loader_path_error": _pattern_loader_path_error, - "require_clean_api": _pattern_require_clean_api, - "boot_config_api": _pattern_boot_config_api, -} - - -# --- Main Entry Point --- - - -def test_one_input(data: bytes) -> None: - """Atheris entry point: Test FluentLocalization invariants.""" - if _state.iterations == 0: - _state.initial_memory_mb = get_process().memory_info().rss / (1024 * 1024) - - _state.iterations += 1 - _state.status = "running" - - if _state.iterations % _state.checkpoint_interval == 0: - _emit_checkpoint() - - start_time = time.perf_counter() - fdp = atheris.FuzzedDataProvider(data) - - pattern_name = select_pattern_round_robin(_state, _PATTERN_SCHEDULE) - _state.pattern_coverage[pattern_name] = _state.pattern_coverage.get(pattern_name, 0) + 1 - - try: - _PATTERN_DISPATCH[pattern_name](fdp) - - except ( - *_ALLOWED_EXCEPTIONS, - FrozenFluentError, - DataIntegrityError, - FormattingIntegrityError, - SyntaxIntegrityError, - ) as e: - error_type = f"{type(e).__name__}_{str(e)[:30]}" - _state.error_counts[error_type] = _state.error_counts.get(error_type, 0) + 1 - except Exception: - _state.findings += 1 - raise - finally: - is_interesting = ( - "fallback" in pattern_name - or "loader" in pattern_name - or pattern_name - in ( - "add_resource_mutation", - "introspect_api", - "ast_lookup_api", - "cache_audit_api", - "locale_boundary_api", - "validate_message_variables_api", - "validate_message_schemas_api", - "require_clean_api", - "boot_config_api", 
- ) - or (time.perf_counter() - start_time) * 1000 > 1.0 - ) - record_iteration_metrics( - _state, - pattern_name, - start_time, - data, - is_interesting=is_interesting, - ) - - if _state.iterations % GC_INTERVAL == 0: - gc.collect() - - if _state.iterations % 100 == 0: - record_memory(_state) - - -def main() -> None: - """Run the localization fuzzer with CLI support.""" - parser = argparse.ArgumentParser( - description="FluentLocalization multi-locale orchestration fuzzer", - epilog="All unrecognized arguments are passed to libFuzzer.", - ) - parser.add_argument( - "--checkpoint-interval", - type=int, - default=500, - help="Emit report every N iterations (default: 500)", - ) - parser.add_argument( - "--seed-corpus-size", - type=int, - default=500, - help="Maximum in-memory seed corpus size (default: 500)", - ) - - args, remaining = parser.parse_known_args() - _state.checkpoint_interval = args.checkpoint_interval - _state.seed_corpus_max_size = args.seed_corpus_size - sys.argv = [sys.argv[0], *remaining] - - print_fuzzer_banner( - title="FluentLocalization Multi-locale Orchestration Fuzzer (Atheris)", - target="ftllexengine.localization.orchestrator.FluentLocalization", - state=_state, - schedule_len=len(_PATTERN_SCHEDULE), - extra_lines=[ - f"Patterns: {len(_PATTERN_WEIGHTS)}" - f" ({sum(w for _, w in _PATTERN_WEIGHTS)} weighted slots)", - ], - ) - - run_fuzzer(_state, test_one_input=test_one_input) - +from fuzz_localization_entry import main if __name__ == "__main__": main() diff --git a/fuzz_atheris/fuzz_localization_entry.py b/fuzz_atheris/fuzz_localization_entry.py new file mode 100644 index 00000000..c0b0b693 --- /dev/null +++ b/fuzz_atheris/fuzz_localization_entry.py @@ -0,0 +1,193 @@ +# mypy: disable-error-code=name-defined +from fuzz_localization_patterns_basic import ( + _pattern_add_resource_mutation, + _pattern_ast_lookup_api, + _pattern_chain_of_3_fallback, + _pattern_format_value_missing, + _pattern_format_with_variables, + 
_pattern_get_message_ids_api, + _pattern_has_message_api, + _pattern_multi_locale_fallback, + _pattern_single_locale_add_resource, + _pattern_validate_resource_api, +) +from fuzz_localization_patterns_boot import _pattern_boot_config_api +from fuzz_localization_patterns_introspection import ( + _pattern_add_function_custom, + _pattern_cache_audit_api, + _pattern_introspect_api, + _pattern_locale_boundary_api, + _pattern_on_fallback_callback, +) +from fuzz_localization_patterns_loader import ( + _pattern_loader_init_success, + _pattern_loader_junk_summary, + _pattern_loader_not_found_fallback, + _pattern_loader_path_error, + _pattern_require_clean_api, +) +from fuzz_localization_patterns_validation import ( + _pattern_validate_message_schemas_api, + _pattern_validate_message_variables_api, +) +from fuzz_localization_support import ( + _ALLOWED_EXCEPTIONS, + _PATTERN_SCHEDULE, + _PATTERN_WEIGHTS, + GC_INTERVAL, + DataIntegrityError, + FormattingIntegrityError, + FrozenFluentError, + SyntaxIntegrityError, + _emit_checkpoint, + _state, + argparse, + atheris, + gc, + get_process, + print_fuzzer_banner, + record_iteration_metrics, + record_memory, + run_fuzzer, + select_pattern_round_robin, + sys, + time, +) + +# --- Pattern dispatch --- + +_PATTERN_DISPATCH = { + "single_locale_add_resource": _pattern_single_locale_add_resource, + "multi_locale_fallback": _pattern_multi_locale_fallback, + "chain_of_3_fallback": _pattern_chain_of_3_fallback, + "format_value_missing": _pattern_format_value_missing, + "format_with_variables": _pattern_format_with_variables, + "add_resource_mutation": _pattern_add_resource_mutation, + "has_message_api": _pattern_has_message_api, + "ast_lookup_api": _pattern_ast_lookup_api, + "get_message_ids_api": _pattern_get_message_ids_api, + "validate_resource_api": _pattern_validate_resource_api, + "validate_message_variables_api": _pattern_validate_message_variables_api, + "validate_message_schemas_api": _pattern_validate_message_schemas_api, + 
"add_function_custom": _pattern_add_function_custom, + "introspect_api": _pattern_introspect_api, + "cache_audit_api": _pattern_cache_audit_api, + "locale_boundary_api": _pattern_locale_boundary_api, + "on_fallback_callback": _pattern_on_fallback_callback, + "loader_init_success": _pattern_loader_init_success, + "loader_not_found_fallback": _pattern_loader_not_found_fallback, + "loader_junk_summary": _pattern_loader_junk_summary, + "loader_path_error": _pattern_loader_path_error, + "require_clean_api": _pattern_require_clean_api, + "boot_config_api": _pattern_boot_config_api, +} + + +# --- Main Entry Point --- + + +def test_one_input(data: bytes) -> None: + """Atheris entry point: Test FluentLocalization invariants.""" + if _state.iterations == 0: + _state.initial_memory_mb = get_process().memory_info().rss / (1024 * 1024) + + _state.iterations += 1 + _state.status = "running" + + if _state.iterations % _state.checkpoint_interval == 0: + _emit_checkpoint() + + start_time = time.perf_counter() + fdp = atheris.FuzzedDataProvider(data) + + pattern_name = select_pattern_round_robin(_state, _PATTERN_SCHEDULE) + _state.pattern_coverage[pattern_name] = _state.pattern_coverage.get(pattern_name, 0) + 1 + + try: + _PATTERN_DISPATCH[pattern_name](fdp) + + except ( + *_ALLOWED_EXCEPTIONS, + FrozenFluentError, + DataIntegrityError, + FormattingIntegrityError, + SyntaxIntegrityError, + ) as e: + error_type = f"{type(e).__name__}_{str(e)[:30]}" + _state.error_counts[error_type] = _state.error_counts.get(error_type, 0) + 1 + except Exception: + _state.findings += 1 + raise + finally: + is_interesting = ( + "fallback" in pattern_name + or "loader" in pattern_name + or pattern_name + in ( + "add_resource_mutation", + "introspect_api", + "ast_lookup_api", + "cache_audit_api", + "locale_boundary_api", + "validate_message_variables_api", + "validate_message_schemas_api", + "require_clean_api", + "boot_config_api", + ) + or (time.perf_counter() - start_time) * 1000 > 1.0 + ) + 
record_iteration_metrics( + _state, + pattern_name, + start_time, + data, + is_interesting=is_interesting, + ) + + if _state.iterations % GC_INTERVAL == 0: + gc.collect() + + if _state.iterations % 100 == 0: + record_memory(_state) + + +def main() -> None: + """Run the localization fuzzer with CLI support.""" + parser = argparse.ArgumentParser( + description="FluentLocalization multi-locale orchestration fuzzer", + epilog="All unrecognized arguments are passed to libFuzzer.", + ) + parser.add_argument( + "--checkpoint-interval", + type=int, + default=500, + help="Emit report every N iterations (default: 500)", + ) + parser.add_argument( + "--seed-corpus-size", + type=int, + default=500, + help="Maximum in-memory seed corpus size (default: 500)", + ) + + args, remaining = parser.parse_known_args() + _state.checkpoint_interval = args.checkpoint_interval + _state.seed_corpus_max_size = args.seed_corpus_size + sys.argv = [sys.argv[0], *remaining] + + print_fuzzer_banner( + title="FluentLocalization Multi-locale Orchestration Fuzzer (Atheris)", + target="ftllexengine.localization.orchestrator.FluentLocalization", + state=_state, + schedule_len=len(_PATTERN_SCHEDULE), + extra_lines=[ + f"Patterns: {len(_PATTERN_WEIGHTS)}" + f" ({sum(w for _, w in _PATTERN_WEIGHTS)} weighted slots)", + ], + ) + + run_fuzzer(_state, test_one_input=test_one_input) + + +if __name__ == "__main__": + main() diff --git a/fuzz_atheris/fuzz_localization_patterns_basic.py b/fuzz_atheris/fuzz_localization_patterns_basic.py new file mode 100644 index 00000000..97f18beb --- /dev/null +++ b/fuzz_atheris/fuzz_localization_patterns_basic.py @@ -0,0 +1,502 @@ +# mypy: disable-error-code=name-defined +from fuzz_localization_support import ( + _LOCALE_PAIRS, + _LOCALE_TRIPLES, + _SINGLE_LOCALES, + FallbackInfo, + FluentLocalization, + IntegrityCheckFailedError, + LocalizationFuzzError, + _domain, + atheris, + gen_ftl_identifier, + gen_ftl_value, + normalize_locale, + pathlib, + validate_message_variables, 
+) + + +def _write_loader_resource( + root: pathlib.Path, + locale: str, + resource_id: str, + ftl_source: str, +) -> pathlib.Path: + """Write an FTL file for PathResourceLoader-backed tests.""" + locale_dir = root / normalize_locale(locale) + locale_dir.mkdir(parents=True, exist_ok=True) + resource_path = locale_dir / resource_id + resource_path.write_text(ftl_source, encoding="utf-8") + return resource_path + + +def _build_variable_message(message_id: str, variables: tuple[str, ...]) -> str: + """Build a simple message that references the given variable set.""" + placeables = " ".join(f"{{ ${variable} }}" for variable in variables) + return f"{message_id} = {placeables or 'value'}\n" + + +def _assert_integrity_failure( + err: IntegrityCheckFailedError, + *, + operation: str, + message_fragment: str | None = None, + key: str | None = None, + key_fragment: str | None = None, + actual_fragment: str | None = None, +) -> None: + """Validate localization-scoped IntegrityCheckFailedError context.""" + if message_fragment is not None and message_fragment not in str(err): + msg = f"Integrity error message missing {message_fragment!r}: {err!s}" + raise LocalizationFuzzError(msg) + + context = err.context + if context is None: + msg = "IntegrityCheckFailedError missing context" + raise LocalizationFuzzError(msg) + if context.component != "localization": + msg = f"Integrity error component={context.component!r}, expected 'localization'" + raise LocalizationFuzzError(msg) + if context.operation != operation: + msg = f"Integrity error operation={context.operation!r}, expected {operation!r}" + raise LocalizationFuzzError(msg) + if context.expected != "LoadSummary(all_clean=True)" and operation == "require_clean": + msg = f"require_clean context expected field mismatch: {context.expected!r}" + raise LocalizationFuzzError(msg) + if key is not None and context.key != key: + msg = f"Integrity error key={context.key!r}, expected {key!r}" + raise LocalizationFuzzError(msg) + if 
key_fragment is not None and (context.key is None or key_fragment not in context.key): + msg = f"Integrity error key={context.key!r} missing fragment {key_fragment!r}" + raise LocalizationFuzzError(msg) + if actual_fragment is not None and ( + context.actual is None or actual_fragment not in context.actual + ): + msg = f"Integrity error actual={context.actual!r} missing fragment {actual_fragment!r}" + raise LocalizationFuzzError(msg) + + +def _pattern_single_locale_add_resource( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Single-locale FluentLocalization: add_resource + format round-trip. + + Tests the minimal FluentLocalization configuration: one locale, one + resource added via add_resource(), one format call. Verifies the + basic construction-add-format lifecycle. + """ + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + msg_id = gen_ftl_identifier(fdp) + var = gen_ftl_identifier(fdp) + val = gen_ftl_value(fdp) + ftl = f"{msg_id} = {{ ${var} }}\n" + + l10n = FluentLocalization([locale], strict=False) + l10n.add_resource(locale, ftl) + + result, errors = l10n.format_pattern(msg_id, {var: val}) + + # Contract: no errors means result must contain the variable value + if not errors and val not in result: + msg = ( + f"Single locale: format_pattern('{msg_id}', {{'{var}': '{val}'}}) " + f"returned '{result}' without errors but value missing" + ) + raise LocalizationFuzzError(msg) + + if not errors: + _domain.messages_found += 1 + else: + _domain.messages_missing += 1 + + +def _pattern_multi_locale_fallback( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Two-locale chain: message present only in fallback locale. + + Tests the core fallback mechanism: the primary locale does NOT have the + message, the fallback locale does. Verifies that format_pattern traverses + the chain and returns the fallback locale's result. 
+ """ + primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) + msg_id = gen_ftl_identifier(fdp) + val = gen_ftl_value(fdp) + ftl = f"{msg_id} = {val}\n" + + l10n = FluentLocalization([primary, fallback], strict=False) + # Add resource ONLY to fallback locale; primary stays empty + l10n.add_resource(fallback, ftl) + + fallback_seen: list[FallbackInfo] = [] + l10n_with_cb = FluentLocalization( + [primary, fallback], + strict=False, + on_fallback=fallback_seen.append, + ) + l10n_with_cb.add_resource(fallback, ftl) + + _, errors = l10n_with_cb.format_pattern(msg_id) + + if not errors: + _domain.messages_found += 1 + # Fallback callback must have fired (primary locale had no message) + if fallback_seen: + _domain.fallback_triggered += 1 + info = fallback_seen[0] + # Contract: FallbackInfo carries the correct resolved_locale + expected_fallback = normalize_locale(fallback) + if info.resolved_locale != expected_fallback: + msg = ( + "Fallback: expected " + f"resolved_locale='{expected_fallback}', " + f"got '{info.resolved_locale}'" + ) + raise LocalizationFuzzError(msg) + else: + _domain.messages_missing += 1 + + +def _pattern_chain_of_3_fallback( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Three-locale chain: message in a fuzz-chosen position. + + Tests fallback traversal depth. The message can be in locale 0, 1, or 2 + (or nowhere). Verifies the fallback chain visits locales in order. 
+ """ + triple = fdp.PickValueInList(list(_LOCALE_TRIPLES)) + locale_a, locale_b, locale_c = triple + msg_id = gen_ftl_identifier(fdp) + val = gen_ftl_value(fdp) + ftl = f"{msg_id} = {val}\n" + target_locale_idx = fdp.ConsumeIntInRange(0, 3) # 3 = none + + l10n = FluentLocalization([locale_a, locale_b, locale_c], strict=False) + + target_locale = triple[target_locale_idx] if target_locale_idx < 3 else None + if target_locale: + l10n.add_resource(target_locale, ftl) + + result, errors = l10n.format_pattern(msg_id) + + if not errors: + _domain.messages_found += 1 + if target_locale and val in result: + return # Correct + if not target_locale: + # Message was in no locale - result is fallback text, errors expected + pass + else: + _domain.messages_missing += 1 + + +def _pattern_format_value_missing( + fdp: atheris.FuzzedDataProvider, +) -> None: + """format_value/format_pattern with non-existent message returns fallback. + + Tests the missing-message contract: format_pattern with a message ID that + does not exist in any locale must return a non-empty fallback string and + at least one error. strict=False to use soft-error return API. 
+ """ + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + existing_id = gen_ftl_identifier(fdp) + missing_id = f"missing-{gen_ftl_identifier(fdp)}" + existing_ftl = f"{existing_id} = value\n" + + l10n = FluentLocalization([locale], strict=False) + l10n.add_resource(locale, existing_ftl) + + result, errors = l10n.format_pattern(missing_id) + + # Contract: missing message MUST produce errors and non-empty fallback + if not errors: + msg = f"Missing message '{missing_id}' produced no errors (result='{result}')" + raise LocalizationFuzzError(msg) + if not result: + msg = f"Missing message '{missing_id}' produced empty result with errors" + raise LocalizationFuzzError(msg) + + _domain.messages_missing += 1 + + +def _pattern_format_with_variables( + fdp: atheris.FuzzedDataProvider, +) -> None: + """format_pattern with multiple variable args across two locales. + + Tests that variable substitution works correctly with fallback. + Verifies the args dict propagates into the resolved bundle. + """ + primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) + msg_id = gen_ftl_identifier(fdp) + var_a = gen_ftl_identifier(fdp) + var_b = f"B-{gen_ftl_identifier(fdp)}" # B: gen_ftl_identifier always starts with a-z + val_a = gen_ftl_value(fdp, max_length=20) + val_b = gen_ftl_value(fdp, max_length=20) + ftl = f"{msg_id} = {{ ${var_a} }} {{ ${var_b} }}\n" + + l10n = FluentLocalization([primary, fallback], strict=False) + l10n.add_resource(fallback, ftl) + + result, errors = l10n.format_pattern(msg_id, {var_a: val_a, var_b: val_b}) + + if not errors: + _domain.messages_found += 1 + if val_a not in result or val_b not in result: + msg = f"Variables not found in result: expected '{val_a}' and '{val_b}', got '{result}'" + raise LocalizationFuzzError(msg) + + +def _pattern_add_resource_mutation( + fdp: atheris.FuzzedDataProvider, +) -> None: + """add_resource after initial format call; re-format sees new resource. 
+ + Tests that RWLock correctly serializes post-construction add_resource + against concurrent format_pattern calls. The resource adds a new message + and the second format_pattern must see it. + """ + _domain.add_resource_mutations += 1 + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + msg_id_a = gen_ftl_identifier(fdp) + msg_id_b = f"B-{gen_ftl_identifier(fdp)}" # B: gen_ftl_identifier always starts with a-z + val_a = gen_ftl_value(fdp) + val_b = gen_ftl_value(fdp) + + l10n = FluentLocalization([locale], strict=False) + l10n.add_resource(locale, f"{msg_id_a} = {val_a}\n") + + # First format (before mutation) + l10n.format_pattern(msg_id_a) + _, errors_b1 = l10n.format_pattern(msg_id_b) + + # msg_b not yet added - must produce errors + if not errors_b1: + msg = f"Before mutation: '{msg_id_b}' found before add_resource" + raise LocalizationFuzzError(msg) + + # Add second message (mutation) + l10n.add_resource(locale, f"{msg_id_b} = {val_b}\n") + + # Re-format after mutation + result_b2, errors_b2 = l10n.format_pattern(msg_id_b) + + if not errors_b2 and val_b not in result_b2: + msg = f"After mutation: expected '{val_b}' in result, got '{result_b2}'" + raise LocalizationFuzzError(msg) + + +def _pattern_has_message_api( + fdp: atheris.FuzzedDataProvider, +) -> None: + """has_message/has_attribute cross-locale scan invariants. + + Tests: if format_pattern succeeds for a message ID, has_message must + return True. If has_message returns False, format_pattern must produce + errors. 
+ """ + _domain.has_message_checks += 1 + primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) + msg_id = gen_ftl_identifier(fdp) + attr_name = fdp.PickValueInList(["tooltip", "label", "title"]) + val = gen_ftl_value(fdp) + ftl = f"{msg_id} = {val}\n .{attr_name} = hint\n" + + l10n = FluentLocalization([primary, fallback], strict=False) + l10n.add_resource(fallback, ftl) + + has_msg = l10n.has_message(msg_id) + has_attr = l10n.has_attribute(msg_id, attr_name) + has_missing_attr = l10n.has_attribute(msg_id, "nonexistent-attr") + + # Contract: has_message must be True (we added it to fallback) + if not has_msg: + msg = f"has_message('{msg_id}') returned False after add_resource" + raise LocalizationFuzzError(msg) + + # Contract: has_attribute(existing) must be True + if not has_attr: + msg = f"has_attribute('{msg_id}', '{attr_name}') returned False after add_resource" + raise LocalizationFuzzError(msg) + + # Contract: has_attribute(nonexistent) must be False + if has_missing_attr: + msg = f"has_attribute('{msg_id}', 'nonexistent-attr') returned True" + raise LocalizationFuzzError(msg) + + +def _validate_localization_message_lookup( + l10n: FluentLocalization, + message_id: str, + expected_variables: frozenset[str], +) -> None: + """Validate FluentLocalization.get_message() for one identifier.""" + message = l10n.get_message(message_id) + if message is None: + msg = f"get_message('{message_id}') returned None for an existing message" + raise LocalizationFuzzError(msg) + if message.id.name != message_id: + msg = f"get_message('{message_id}') returned node named '{message.id.name}'" + raise LocalizationFuzzError(msg) + + message_validation = validate_message_variables(message, expected_variables) + if not message_validation.is_valid: + msg = f"validate_message_variables() rejected localization message '{message_id}'" + raise LocalizationFuzzError(msg) + if message_validation.declared_variables != expected_variables: + msg = ( + f"get_message('{message_id}') 
resolved wrong locale variables: " + f"{message_validation.declared_variables!r} vs {expected_variables!r}" + ) + raise LocalizationFuzzError(msg) + + +def _validate_localization_term_lookup( + l10n: FluentLocalization, + term_id: str, + expected_variables: frozenset[str], +) -> None: + """Validate FluentLocalization.get_term() for one identifier.""" + term = l10n.get_term(term_id) + if term is None: + msg = f"get_term('{term_id}') returned None for an existing term" + raise LocalizationFuzzError(msg) + if term.id.name != term_id: + msg = f"get_term('{term_id}') returned node named '{term.id.name}'" + raise LocalizationFuzzError(msg) + + term_validation = validate_message_variables(term, expected_variables) + if not term_validation.is_valid: + msg = f"validate_message_variables() rejected localization term '{term_id}'" + raise LocalizationFuzzError(msg) + if term_validation.declared_variables != expected_variables: + msg = ( + f"get_term('{term_id}') resolved wrong locale variables: " + f"{term_validation.declared_variables!r} vs {expected_variables!r}" + ) + raise LocalizationFuzzError(msg) + + +def _pattern_ast_lookup_api( + fdp: atheris.FuzzedDataProvider, +) -> None: + """get_message/get_term honor fallback precedence and namespace boundaries.""" + _domain.ast_lookup_checks += 1 + primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) + msg_id = f"msg-{gen_ftl_identifier(fdp)}" + term_id = f"term-{gen_ftl_identifier(fdp)}" + primary_has_message = fdp.ConsumeBool() + primary_has_term = fdp.ConsumeBool() + + l10n = FluentLocalization([primary, fallback], strict=False) + l10n.add_resource( + fallback, + (f"{msg_id} = {{ $fallbackvar }}\n-{term_id} = {{ $fallbackterm }}\n"), + ) + + primary_parts: list[str] = [] + if primary_has_message: + primary_parts.append(f"{msg_id} = {{ $primaryvar }}\n") + if primary_has_term: + primary_parts.append(f"-{term_id} = {{ $primaryterm }}\n") + if primary_parts: + l10n.add_resource(primary, "".join(primary_parts)) + + 
expected_message_vars = frozenset({"primaryvar" if primary_has_message else "fallbackvar"}) + _validate_localization_message_lookup(l10n, msg_id, expected_message_vars) + + expected_term_vars = frozenset({"primaryterm" if primary_has_term else "fallbackterm"}) + _validate_localization_term_lookup(l10n, term_id, expected_term_vars) + + if l10n.get_term(f"-{term_id}") is not None: + msg = f"get_term('-{term_id}') bypassed the no-leading-dash contract" + raise LocalizationFuzzError(msg) + if l10n.get_message(term_id) is not None: + msg = f"get_message('{term_id}') crossed the term/message namespace boundary" + raise LocalizationFuzzError(msg) + if l10n.get_term(msg_id) is not None: + msg = f"get_term('{msg_id}') crossed the message/term namespace boundary" + raise LocalizationFuzzError(msg) + if l10n.get_message("__missing_localization_lookup__") is not None: + msg = "get_message() returned a node for a missing localization message" + raise LocalizationFuzzError(msg) + if l10n.get_term("__missing_localization_lookup__") is not None: + msg = "get_term() returned a node for a missing localization term" + raise LocalizationFuzzError(msg) + + +def _pattern_get_message_ids_api( + fdp: atheris.FuzzedDataProvider, +) -> None: + """get_message_ids returns superset of added message IDs. + + Tests deduplication: if the same message ID is added to two locales, it + must appear only once in get_message_ids(). Also checks that + get_message_ids() contains every message we added. 
+ """ + locale_a, locale_b = fdp.PickValueInList(list(_LOCALE_PAIRS)) + n = fdp.ConsumeIntInRange(1, 5) + msg_ids = [gen_ftl_identifier(fdp) for _ in range(n)] + + l10n = FluentLocalization([locale_a, locale_b], strict=False) + + # Add same messages to both locales (deduplication test) + for mid in msg_ids: + l10n.add_resource(locale_a, f"{mid} = value-a\n") + l10n.add_resource(locale_b, f"{mid} = value-b\n") + + all_ids = l10n.get_message_ids() + all_ids_set = set(all_ids) + + # Contract: every added message ID must appear + for mid in msg_ids: + if mid not in all_ids_set: + msg = f"get_message_ids(): missing '{mid}' after add_resource" + raise LocalizationFuzzError(msg) + + # Contract: no duplicates + if len(all_ids) != len(all_ids_set): + msg = f"get_message_ids(): duplicates found: {sorted(all_ids)}" + raise LocalizationFuzzError(msg) + + +def _pattern_validate_resource_api( + fdp: atheris.FuzzedDataProvider, +) -> None: + """validate_resource via FluentLocalization facade. + + Tests that validate_resource returns a ValidationResult and that + its errors/warnings attributes are sequences (never crashes, never + returns None, always returns a structured result). 
+ """ + _domain.validate_calls += 1 + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + ftl_choice = fdp.ConsumeIntInRange(0, 5) + + match ftl_choice: + case 0: + ftl = f"{gen_ftl_identifier(fdp)} = valid message\n" + case 1: + ftl = "invalid = { $x -> [one] singular *[other] plural }\n" + case 2: + ftl = "" # Empty + case 3: + ftl = "# Just a comment\n" + case 4: + # Duplicate message ID + mid = gen_ftl_identifier(fdp) + ftl = f"{mid} = first\n{mid} = second\n" + case _: + ftl = fdp.ConsumeUnicodeNoSurrogates(fdp.ConsumeIntInRange(0, 200)) + + l10n = FluentLocalization([locale], strict=False) + result = l10n.validate_resource(ftl) + + # Contract: result exposes both errors and warnings attributes + if not hasattr(result, "errors") or not hasattr(result, "warnings"): + msg = "validate_resource result missing errors/warnings" + raise LocalizationFuzzError(msg) + diff --git a/fuzz_atheris/fuzz_localization_patterns_boot.py b/fuzz_atheris/fuzz_localization_patterns_boot.py new file mode 100644 index 00000000..7797ad0c --- /dev/null +++ b/fuzz_atheris/fuzz_localization_patterns_boot.py @@ -0,0 +1,248 @@ +# mypy: disable-error-code=name-defined +from fuzz_localization_support import ( + _ALLOWED_EXCEPTIONS, + IntegrityCheckFailedError, + LocalizationBootConfig, + LocalizationFuzzError, + _domain, + atheris, +) + + +def _check_boot_config_validation(fdp: atheris.FuzzedDataProvider) -> None: + """__post_init__ rejects empty locales/resource_ids and missing loader/base_path.""" + choice = fdp.ConsumeIntInRange(0, 2) + try: + if choice == 0: + LocalizationBootConfig( + locales=(), + resource_ids=("ui.ftl",), + loader=_EmptyLoader(), + ) + msg = "Empty locales did not raise ValueError" + raise LocalizationFuzzError(msg) + if choice == 1: + LocalizationBootConfig( + locales=("en",), + resource_ids=(), + loader=_EmptyLoader(), + ) + msg = "Empty resource_ids did not raise ValueError" + raise LocalizationFuzzError(msg) + LocalizationBootConfig( + locales=("en",), + 
resource_ids=("ui.ftl",), + ) + msg = "Missing loader/base_path did not raise ValueError" + raise LocalizationFuzzError(msg) + except ValueError: + pass # expected + + +def _check_boot_config_boot_success(fdp: atheris.FuzzedDataProvider) -> None: + """boot_simple() returns FluentLocalization for a valid in-memory FTL resource.""" + locale = fdp.PickValueInList(["en", "de", "lv"]) + ftl = f"greeting = Hello {{ $name }}\nmsg{fdp.ConsumeIntInRange(0, 9)} = Value\n" + loader = _SingleResourceLoader(locale, "ui.ftl", ftl) + try: + cfg = LocalizationBootConfig( + locales=(locale,), + resource_ids=("ui.ftl",), + loader=loader, + ) + l10n = cfg.boot_simple() + result, errors = l10n.format_pattern("greeting", {"name": "bootstrap"}) + if errors or "bootstrap" not in result: + msg = ( + "boot_simple() returned unusable localization: " + f"result={result!r}, errors={errors!r}" + ) + raise LocalizationFuzzError(msg) + except IntegrityCheckFailedError: + pass # strict syntax errors in generated FTL are acceptable + except _ALLOWED_EXCEPTIONS: + pass + + +def _check_boot_config_boot_with_summary(fdp: atheris.FuzzedDataProvider) -> None: + """boot() returns a 3-tuple with correct types and clean LoadSummary.""" + locale = fdp.PickValueInList(["en", "de"]) + ftl = "msg = Value\n" + loader = _SingleResourceLoader(locale, "ui.ftl", ftl) + try: + cfg = LocalizationBootConfig( + locales=(locale,), + resource_ids=("ui.ftl",), + loader=loader, + ) + l10n, summary, schema_results = cfg.boot() + if summary.errors != 0: + msg = f"LoadSummary.errors={summary.errors} for clean resource" + raise LocalizationFuzzError(msg) + if summary.total_attempted < 1: + msg = f"LoadSummary.total_attempted={summary.total_attempted}, expected >= 1" + raise LocalizationFuzzError(msg) + result, errors = l10n.format_pattern("msg") + if errors or result != "Value": + msg = f"boot() returned unusable localization: result={result!r}, errors={errors!r}" + raise LocalizationFuzzError(msg) + if schema_results != (): 
+ msg = f"boot() returned schema results without message_schemas: {schema_results!r}" + raise LocalizationFuzzError(msg) + except IntegrityCheckFailedError: + pass + except _ALLOWED_EXCEPTIONS: + pass + + +def _check_boot_config_boot_failure(fdp: atheris.FuzzedDataProvider) -> None: + """boot() raises IntegrityCheckFailedError when a resource cannot be loaded.""" + locale = fdp.PickValueInList(["en", "de"]) + loader = _EmptyLoader() # no resources registered -> FileNotFoundError + try: + cfg = LocalizationBootConfig( + locales=(locale,), + resource_ids=("missing.ftl",), + loader=loader, + ) + cfg.boot() + msg = "boot() did not raise IntegrityCheckFailedError for missing resource" + raise LocalizationFuzzError(msg) + except IntegrityCheckFailedError: + pass # expected + except _ALLOWED_EXCEPTIONS: + pass + + +def _check_boot_config_required_messages_absent(fdp: atheris.FuzzedDataProvider) -> None: + """required_messages raises IntegrityCheckFailedError when an ID is absent.""" + locale = fdp.PickValueInList(["en", "de"]) + # Load a resource that has "greeting" but NOT "farewell" + ftl = "greeting = Hello\n" + loader = _SingleResourceLoader(locale, "ui.ftl", ftl) + try: + cfg = LocalizationBootConfig( + locales=(locale,), + resource_ids=("ui.ftl",), + loader=loader, + required_messages=frozenset({"greeting", "farewell"}), + ) + cfg.boot() + msg = "boot() did not raise IntegrityCheckFailedError for absent required message" + raise LocalizationFuzzError(msg) + except IntegrityCheckFailedError: + pass # expected: "farewell" is absent + except _ALLOWED_EXCEPTIONS: + pass + + +def _check_boot_config_required_messages_present(fdp: atheris.FuzzedDataProvider) -> None: + """required_messages succeeds when all IDs resolve in at least one locale.""" + locale = fdp.PickValueInList(["en", "de"]) + ftl = "greeting = Hello\nfarewell = Goodbye\n" + loader = _SingleResourceLoader(locale, "ui.ftl", ftl) + try: + cfg = LocalizationBootConfig( + locales=(locale,), + 
resource_ids=("ui.ftl",), + loader=loader, + required_messages=frozenset({"greeting", "farewell"}), + ) + l10n, summary, _ = cfg.boot() + if summary.errors != 0: + msg = f"LoadSummary.errors={summary.errors} for clean resource" + raise LocalizationFuzzError(msg) + farewell, errors = l10n.format_pattern("farewell") + if errors or farewell != "Goodbye": + msg = ( + "Required-message boot returned unusable localization: " + f"result={farewell!r}, errors={errors!r}" + ) + raise LocalizationFuzzError(msg) + except IntegrityCheckFailedError: + pass # generated FTL may have syntax issues + except _ALLOWED_EXCEPTIONS: + pass + + +def _check_boot_config_one_shot(fdp: atheris.FuzzedDataProvider) -> None: + """boot() and boot_simple() are one-shot: second call raises RuntimeError.""" + locale = fdp.PickValueInList(["en", "de"]) + ftl = "greeting = Hello\n" + loader = _SingleResourceLoader(locale, "ui.ftl", ftl) + use_simple = fdp.ConsumeBool() + try: + cfg = LocalizationBootConfig( + locales=(locale,), + resource_ids=("ui.ftl",), + loader=loader, + ) + # First call must succeed + if use_simple: + cfg.boot_simple() + else: + cfg.boot() + # Second call must raise RuntimeError (one-shot enforcement) + try: + if use_simple: + cfg.boot_simple() + else: + cfg.boot() + msg = ( + "boot() did not raise RuntimeError on second call " + "(one-shot enforcement missing)" + ) + raise LocalizationFuzzError(msg) + except RuntimeError: + pass # expected: one-shot enforcement + except IntegrityCheckFailedError: + pass # FTL may have syntax issues -- acceptable + except _ALLOWED_EXCEPTIONS: + pass + + +def _pattern_boot_config_api( + fdp: atheris.FuzzedDataProvider, +) -> None: + """LocalizationBootConfig strict-mode boot sequence and invariants.""" + _domain.boot_config_checks += 1 + handlers = ( + _check_boot_config_validation, + _check_boot_config_boot_success, + _check_boot_config_boot_with_summary, + _check_boot_config_boot_failure, + _check_boot_config_required_messages_absent, + 
_check_boot_config_required_messages_present, + _check_boot_config_one_shot, + ) + handler = handlers[fdp.ConsumeIntInRange(0, len(handlers) - 1)] + handler(fdp) + + +class _EmptyLoader: + """ResourceLoader with no resources — always raises FileNotFoundError.""" + + def load(self, locale: str, resource_id: str) -> str: + msg = f"No resource for ({locale!r}, {resource_id!r})" + raise FileNotFoundError(msg) + + def describe_path(self, locale: str, resource_id: str) -> str: + return f"empty://{locale}/{resource_id}" + + +class _SingleResourceLoader: + """ResourceLoader backed by a single (locale, resource_id) → FTL mapping.""" + + def __init__(self, locale: str, resource_id: str, ftl: str) -> None: + self._locale = locale + self._resource_id = resource_id + self._ftl = ftl + + def load(self, locale: str, resource_id: str) -> str: + if locale == self._locale and resource_id == self._resource_id: + return self._ftl + msg = f"No resource for ({locale!r}, {resource_id!r})" + raise FileNotFoundError(msg) + + def describe_path(self, locale: str, resource_id: str) -> str: + return f"memory://{locale}/{resource_id}" diff --git a/fuzz_atheris/fuzz_localization_patterns_introspection.py b/fuzz_atheris/fuzz_localization_patterns_introspection.py new file mode 100644 index 00000000..7e81a232 --- /dev/null +++ b/fuzz_atheris/fuzz_localization_patterns_introspection.py @@ -0,0 +1,420 @@ +# mypy: disable-error-code=name-defined +from fuzz_localization_support import ( + _LOCALE_PAIRS, + _NON_STRING_LOCALES, + _SINGLE_LOCALES, + _STRUCTURALLY_INVALID_LOCALES, + _VALID_AUDIT_OPERATIONS, + MAX_LOCALE_LENGTH_HARD_LIMIT, + Any, + CacheAuditLogEntry, + CacheConfig, + FallbackInfo, + FluentLocalization, + LocalizationCacheStats, + LocalizationFuzzError, + _domain, + atheris, + gen_ftl_identifier, + gen_ftl_value, + normalize_locale, + require_locale_code, +) + + +def _pattern_add_function_custom( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Custom function registered via 
add_function and invoked in FTL. + + Tests the add_function pathway: a Python function is registered under a + SCREAMING_SNAKE_CASE name and invoked from an FTL message. Verifies that + function results appear in format_pattern output. + """ + _domain.custom_function_calls += 1 + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + func_name = "UPPER" + msg_id = gen_ftl_identifier(fdp) + val = gen_ftl_value(fdp, max_length=20) + ftl = f"{msg_id} = {{ {func_name}($val) }}\n" + + # use_isolating=False: result equality check must not include FSI/PDI BiDi marks + l10n = FluentLocalization([locale], strict=False, use_isolating=False) + l10n.add_resource(locale, ftl) + # Register custom function that uppercases its argument + def upper_func(value: str) -> str: + return str(value).upper() + l10n.add_function(func_name, upper_func) + + result, errors = l10n.format_pattern(msg_id, {"val": val}) + + if not errors: + expected = val.upper() + if result != expected: + msg = f"Custom UPPER function: expected '{expected}', got '{result}'" + raise LocalizationFuzzError(msg) + + +def _pattern_introspect_api( + fdp: atheris.FuzzedDataProvider, +) -> None: + """introspect_message and get_message_variables via localization facade. + + Tests the introspection delegation path: introspect_message() and + get_message_variables() both delegate through the fallback chain. + Verifies variable sets are consistent between the two APIs. 
+ """ + _domain.introspect_calls += 1 + primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) + msg_id = gen_ftl_identifier(fdp) + var_a = gen_ftl_identifier(fdp) + var_b = f"B-{gen_ftl_identifier(fdp)}" # B: gen_ftl_identifier always starts with a-z + ftl = f"{msg_id} = {{ ${var_a} }} {{ ${var_b} }}\n" + + l10n = FluentLocalization([primary, fallback], strict=False) + l10n.add_resource(fallback, ftl) + + # introspect_message returns MessageIntrospection or None + info = l10n.introspect_message(msg_id) + variables = l10n.get_message_variables(msg_id) + + if info is not None: + # Contract: get_message_variables must be a subset of introspect result + introspect_vars = info.get_variable_names() + for var in variables: + if var not in introspect_vars: + msg = ( + f"get_message_variables returned '{var}' not in " + f"introspect result: {introspect_vars}" + ) + raise LocalizationFuzzError(msg) + + +def _validate_localization_audit_log( + locale: str, + audit_log: tuple[CacheAuditLogEntry, ...], + *, + enable_audit: bool, +) -> int: + """Validate one locale's audit log and return its entry count.""" + if not enable_audit and audit_log != (): + msg = f"Audit-disabled localization returned non-empty log for '{locale}'" + raise LocalizationFuzzError(msg) + + last_timestamp = float("-inf") + last_sequence = 0 + for entry in audit_log: + if entry.operation not in _VALID_AUDIT_OPERATIONS: + msg = f"Unexpected audit operation {entry.operation!r} for locale '{locale}'" + raise LocalizationFuzzError(msg) + if not entry.key_hash: + msg = f"Empty audit key hash for locale '{locale}'" + raise LocalizationFuzzError(msg) + if entry.timestamp < last_timestamp: + msg = ( + f"Audit timestamps regressed for locale '{locale}': " + f"{last_timestamp} -> {entry.timestamp}" + ) + raise LocalizationFuzzError(msg) + if entry.sequence <= last_sequence: + msg = ( + f"Audit sequence regressed for locale '{locale}': " + f"{last_sequence} -> {entry.sequence}" + ) + raise 
LocalizationFuzzError(msg) + if entry.operation == "MISS": + if entry.checksum_hex != "" or entry.cache_sequence < 0: + msg = ( + f"MISS audit entry for locale '{locale}' must have " + "empty checksum and non-negative cache_sequence" + ) + raise LocalizationFuzzError(msg) + elif entry.checksum_hex == "" or entry.cache_sequence <= 0: + msg = ( + f"{entry.operation} audit entry for locale '{locale}' must carry " + "a positive cache_sequence and non-empty checksum" + ) + raise LocalizationFuzzError(msg) + last_timestamp = entry.timestamp + last_sequence = entry.sequence + + return len(audit_log) + + +def _validate_localization_cache_stats( + stats: LocalizationCacheStats, + *, + enable_audit: bool, + expected_locales: list[str], +) -> None: + """Validate aggregate localization cache stats against configuration.""" + if stats["audit_enabled"] != enable_audit: + msg = ( + "get_cache_stats()['audit_enabled'] disagrees with CacheConfig: " + f"{stats['audit_enabled']} vs {enable_audit}" + ) + raise LocalizationFuzzError(msg) + if stats["bundle_count"] != len(expected_locales): + msg = ( + "get_cache_stats()['bundle_count'] disagrees with initialized locales: " + f"{stats['bundle_count']} vs {len(expected_locales)}" + ) + raise LocalizationFuzzError(msg) + + +def _collect_localization_audit_entries( + audit_logs: dict[str, tuple[CacheAuditLogEntry, ...]], + *, + enable_audit: bool, +) -> int: + """Validate all per-locale audit logs and return their combined length.""" + total_audit_entries = 0 + for locale, audit_log in audit_logs.items(): + if any(not isinstance(entry, CacheAuditLogEntry) for entry in audit_log): + msg = f"get_cache_audit_log()['{locale}'] returned non-CacheAuditLogEntry data" + raise LocalizationFuzzError(msg) + total_audit_entries += _validate_localization_audit_log( + locale, + audit_log, + enable_audit=enable_audit, + ) + return total_audit_entries + + +def _pattern_cache_audit_api( + fdp: atheris.FuzzedDataProvider, +) -> None: + 
"""get_cache_audit_log exposes per-locale immutable audit trails.""" + _domain.cache_audit_checks += 1 + primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) + enable_audit = fdp.ConsumeBool() + initialize_fallback = fdp.ConsumeBool() + primary_msg_id = f"audit-{gen_ftl_identifier(fdp)}" + fallback_msg_id = f"fallback-{gen_ftl_identifier(fdp)}" + + l10n = FluentLocalization( + [primary, fallback], + cache=CacheConfig(enable_audit=enable_audit), + strict=False, + ) + l10n.add_resource(primary, f"{primary_msg_id} = primary\n") + + expected_locales = [normalize_locale(primary)] + if initialize_fallback: + l10n.add_resource(fallback, f"{fallback_msg_id} = fallback\n") + expected_locales.append(normalize_locale(fallback)) + + l10n.format_value(primary_msg_id) + l10n.format_value(primary_msg_id) + if initialize_fallback: + l10n.format_value(fallback_msg_id) + + audit_logs = l10n.get_cache_audit_log() + if audit_logs is None: + msg = "Cached FluentLocalization returned None from get_cache_audit_log()" + raise LocalizationFuzzError(msg) + if list(audit_logs) != expected_locales: + msg = ( + "get_cache_audit_log() returned wrong locale keys: " + f"{list(audit_logs)!r} vs {expected_locales!r}" + ) + raise LocalizationFuzzError(msg) + + stats = l10n.get_cache_stats() + if stats is None: + msg = "Cached FluentLocalization returned None from get_cache_stats()" + raise LocalizationFuzzError(msg) + _validate_localization_cache_stats( + stats, + enable_audit=enable_audit, + expected_locales=expected_locales, + ) + total_audit_entries = _collect_localization_audit_entries( + audit_logs, + enable_audit=enable_audit, + ) + + if total_audit_entries != int(stats.get("audit_entries", 0)): + msg = ( + "Localization audit log length disagrees with cache stats: " + f"{total_audit_entries} vs {stats.get('audit_entries')}" + ) + raise LocalizationFuzzError(msg) + + primary_locale = normalize_locale(primary) + fallback_locale = normalize_locale(fallback) + if enable_audit and 
len(audit_logs[primary_locale]) < 2: + msg = f"Primary locale '{primary_locale}' did not record expected audit entries" + raise LocalizationFuzzError(msg) + if initialize_fallback and enable_audit and len(audit_logs[fallback_locale]) < 2: + msg = f"Fallback locale '{fallback_locale}' did not record expected audit entries" + raise LocalizationFuzzError(msg) + + +def _assert_localization_locale_accepts( + raw_locales: list[str], + *, + expected_locales: tuple[str, ...], +) -> None: + """Accepted locale chains are canonicalized, deduplicated, and remain usable.""" + try: + l10n = FluentLocalization(raw_locales, strict=False) + except Exception as err: # pylint: disable=broad-exception-caught + msg = f"FluentLocalization rejected valid locales {raw_locales!r}: {err}" + raise LocalizationFuzzError(msg) from err + + if l10n.locales != expected_locales: + msg = ( + "FluentLocalization stored the wrong locale chain: " + f"{l10n.locales!r} vs {expected_locales!r}" + ) + raise LocalizationFuzzError(msg) + + l10n.add_resource(expected_locales[0], "msg = ready\n") + result, errors = l10n.format_pattern("msg") + if result != "ready" or errors: + msg = ( + f"FluentLocalization with accepted locales {expected_locales!r} " + f"failed basic formatting: result={result!r}, errors={errors!r}" + ) + raise LocalizationFuzzError(msg) + + +def _assert_localization_locale_rejected( + locales: list[object], + *, + expected_exception: type[ValueError | TypeError], + expected_fragment: str, +) -> None: + """Rejected locale chains surface the canonical constructor error contract.""" + locales_value: Any = locales + + try: + FluentLocalization(locales_value, strict=False) + except Exception as err: # pylint: disable=broad-exception-caught + if not isinstance(err, expected_exception): + msg = ( + "FluentLocalization raised the wrong locale-boundary exception for " + f"{locales!r}: {type(err).__name__}" + ) + raise LocalizationFuzzError(msg) from err + if expected_fragment not in str(err): + msg 
= ( + "FluentLocalization locale-boundary error message drifted for " + f"{locales!r}: {err}" + ) + raise LocalizationFuzzError(msg) from err + return + + msg = f"FluentLocalization accepted invalid locales {locales!r}" + raise LocalizationFuzzError(msg) + + +def _pattern_locale_boundary_api( + fdp: atheris.FuzzedDataProvider, +) -> None: + """FluentLocalization constructor shares the canonical locale boundary contract.""" + _domain.locale_boundary_checks += 1 + scenario = fdp.ConsumeIntInRange(0, 5) + boundary_locale = "a" + ("b" * (MAX_LOCALE_LENGTH_HARD_LIMIT - 2)) + "C" + + match scenario: + case 0: + if fdp.ConsumeBool(): + raw_locales = [" EN-us ", "\tEN-us\n", " de-DE "] + expected_locales = ( + require_locale_code(" EN-us ", "locale"), + require_locale_code(" de-DE ", "locale"), + ) + else: + raw_locales = [" LV-lv ", "\nLV-lv\t", " en-US "] + expected_locales = ( + require_locale_code(" LV-lv ", "locale"), + require_locale_code(" en-US ", "locale"), + ) + _assert_localization_locale_accepts( + raw_locales, + expected_locales=expected_locales, + ) + case 1: + blank_locale = fdp.PickValueInList(["", " ", "\t\n", " \r\n "]) + _assert_localization_locale_rejected( + ["en", blank_locale], + expected_exception=ValueError, + expected_fragment="locale cannot be blank", + ) + case 2: + invalid_locale = fdp.PickValueInList(list(_STRUCTURALLY_INVALID_LOCALES)) + _assert_localization_locale_rejected( + ["en", invalid_locale], + expected_exception=ValueError, + expected_fragment="Invalid locale:", + ) + case 3: + _assert_localization_locale_rejected( + ["en", f" {boundary_locale} "], + expected_exception=ValueError, + expected_fragment="Unknown locale identifier", + ) + case 4: + overshoot = fdp.ConsumeIntInRange(1, 32) + overlong_locale = "a" * (MAX_LOCALE_LENGTH_HARD_LIMIT + overshoot) + _assert_localization_locale_rejected( + ["en", overlong_locale], + expected_exception=ValueError, + expected_fragment="locale exceeds maximum length", + ) + case _: + 
non_string_locale = fdp.PickValueInList(list(_NON_STRING_LOCALES)) + _assert_localization_locale_rejected( + ["en", non_string_locale], + expected_exception=TypeError, + expected_fragment="locale must be str", + ) + + +def _pattern_on_fallback_callback( + fdp: atheris.FuzzedDataProvider, +) -> None: + """on_fallback callback fires when message resolved from fallback locale. + + Checks the first FallbackInfo recorded when the primary locale lacks the + message and the fallback locale has it: it must carry the correct requested + and resolved locales. The invocation count is not asserted. + """ + primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) + msg_id = gen_ftl_identifier(fdp) + val = gen_ftl_value(fdp) + ftl = f"{msg_id} = {val}\n" + + fallback_infos: list[FallbackInfo] = [] + + l10n = FluentLocalization( + [primary, fallback], + strict=False, + on_fallback=fallback_infos.append, + ) + # Add message only to fallback locale + l10n.add_resource(fallback, ftl) + + _, errors = l10n.format_pattern(msg_id) + + if not errors: + _domain.messages_found += 1 + if fallback_infos: + _domain.fallback_triggered += 1 + info = fallback_infos[0] + # Contract: requested_locale = primary, resolved_locale = fallback + expected_fallback = normalize_locale(fallback) + if info.requested_locale != normalize_locale(primary): + msg = ( + "Fallback callback recorded the wrong requested locale: " + f"{info.requested_locale!r} vs {normalize_locale(primary)!r}" + ) + raise LocalizationFuzzError(msg) + if info.resolved_locale != expected_fallback: + msg = ( + "Fallback callback recorded the wrong resolved locale: " + f"{info.resolved_locale!r} vs {expected_fallback!r}" + ) + raise LocalizationFuzzError(msg) diff --git a/fuzz_atheris/fuzz_localization_patterns_loader.py b/fuzz_atheris/fuzz_localization_patterns_loader.py new file mode 100644 index 00000000..bff9552f --- /dev/null +++ b/fuzz_atheris/fuzz_localization_patterns_loader.py @@ -0,0 +1,341 @@ +# mypy: disable-error-code=name-defined 
+from fuzz_localization_patterns_basic import ( + _assert_integrity_failure, + _write_loader_resource, +) +from fuzz_localization_support import ( + _LOCALE_PAIRS, + _SINGLE_LOCALES, + FluentLocalization, + IntegrityCheckFailedError, + LocalizationFuzzError, + PathResourceLoader, + TemporaryDirectory, + _domain, + atheris, + gen_ftl_identifier, + gen_ftl_value, + normalize_locale, + pathlib, +) + + +def _pattern_loader_init_success( + fdp: atheris.FuzzedDataProvider, +) -> None: + """PathResourceLoader eager-init path records all-success summary data.""" + _domain.loader_init_checks += 1 + locale_a, locale_b = fdp.PickValueInList(list(_LOCALE_PAIRS)) + resource_id = "main.ftl" + msg_id = gen_ftl_identifier(fdp) + primary_val = gen_ftl_value(fdp) + fallback_val = gen_ftl_value(fdp) + + with TemporaryDirectory(prefix="ftllexengine-fuzz-loader-") as tmp_dir: + root = pathlib.Path(tmp_dir) + _write_loader_resource(root, locale_a, resource_id, f"{msg_id} = {primary_val}\n") + _write_loader_resource(root, locale_b, resource_id, f"{msg_id} = {fallback_val}\n") + + loader = PathResourceLoader(str(root / "{locale}")) + l10n = FluentLocalization( + [locale_a, locale_b], + [resource_id], + loader, + strict=False, + ) + summary = l10n.get_load_summary() + + if summary.successful != 2 or not summary.all_successful: + msg = ( + f"Expected two successful eager loads, got successful={summary.successful}, " + f"not_found={summary.not_found}, errors={summary.errors}" + ) + raise LocalizationFuzzError(msg) + if summary.has_errors or summary.has_junk: + msg = ( + f"Unexpected summary state: has_errors={summary.has_errors}, " + f"has_junk={summary.has_junk}" + ) + raise LocalizationFuzzError(msg) + if any(result.source_path is None for result in summary.results): + msg = "Loader summary missing source_path on successful result" + raise LocalizationFuzzError(msg) + + result, errors = l10n.format_pattern(msg_id) + if errors: + msg = f"Loader-backed localization unexpectedly returned 
errors: {errors!r}" + raise LocalizationFuzzError(msg) + if primary_val not in result: + msg = f"Primary locale value {primary_val!r} missing from result {result!r}" + raise LocalizationFuzzError(msg) + + +def _pattern_loader_not_found_fallback( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Primary miss is tracked as not_found while fallback still resolves.""" + _domain.loader_init_checks += 1 + primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) + resource_id = "main.ftl" + msg_id = gen_ftl_identifier(fdp) + val = gen_ftl_value(fdp) + + with TemporaryDirectory(prefix="ftllexengine-fuzz-loader-") as tmp_dir: + root = pathlib.Path(tmp_dir) + _write_loader_resource(root, fallback, resource_id, f"{msg_id} = {val}\n") + + loader = PathResourceLoader(str(root / "{locale}")) + l10n = FluentLocalization( + [primary, fallback], + [resource_id], + loader, + strict=False, + ) + summary = l10n.get_load_summary() + + if summary.successful != 1 or summary.not_found != 1 or summary.errors != 0: + msg = ( + f"Unexpected mixed summary: successful={summary.successful}, " + f"not_found={summary.not_found}, errors={summary.errors}" + ) + raise LocalizationFuzzError(msg) + + result, errors = l10n.format_pattern(msg_id) + if errors: + msg = f"Fallback load should resolve successfully, got errors={errors!r}" + raise LocalizationFuzzError(msg) + if val not in result: + msg = f"Fallback value {val!r} missing from result {result!r}" + raise LocalizationFuzzError(msg) + + +def _pattern_loader_junk_summary( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Junk entries discovered during eager load are preserved in LoadSummary.""" + _domain.loader_junk_checks += 1 + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + resource_id = "broken.ftl" + junk_source = f"{gen_ftl_identifier(fdp)} = {{\n" + + with TemporaryDirectory(prefix="ftllexengine-fuzz-loader-") as tmp_dir: + root = pathlib.Path(tmp_dir) + _write_loader_resource(root, locale, resource_id, junk_source) + + loader 
= PathResourceLoader(str(root / "{locale}")) + l10n = FluentLocalization([locale], [resource_id], loader, strict=False) + summary = l10n.get_load_summary() + + if summary.successful != 1 or not summary.has_junk or summary.junk_count < 1: + msg = ( + f"Expected junk-bearing successful load, got successful={summary.successful}, " + f"has_junk={summary.has_junk}, junk_count={summary.junk_count}" + ) + raise LocalizationFuzzError(msg) + if summary.all_clean: + msg = "LoadSummary.all_clean unexpectedly true for junk input" + raise LocalizationFuzzError(msg) + if not summary.get_with_junk(): + msg = "LoadSummary.get_with_junk() returned empty tuple" + raise LocalizationFuzzError(msg) + + +def _pattern_loader_path_error( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Invalid resource IDs surface as loader errors in the eager-load summary.""" + _domain.loader_error_checks += 1 + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + invalid_resource_id = fdp.PickValueInList( + [ + "../escape.ftl", + " main.ftl", + "/absolute.ftl", + ] + ) + + with TemporaryDirectory(prefix="ftllexengine-fuzz-loader-") as tmp_dir: + root = pathlib.Path(tmp_dir) + loader = PathResourceLoader(str(root / "{locale}")) + l10n = FluentLocalization( + [locale], + [invalid_resource_id], + loader, + strict=False, + ) + summary = l10n.get_load_summary() + + if summary.errors != 1 or not summary.has_errors: + msg = ( + f"Expected one loader error for invalid resource_id, got " + f"errors={summary.errors}, not_found={summary.not_found}" + ) + raise LocalizationFuzzError(msg) + + first_error = summary.get_errors()[0].error + if not isinstance(first_error, ValueError): + msg = ( + "Expected ValueError from PathResourceLoader validation, got " + f"{type(first_error).__name__}" + ) + raise LocalizationFuzzError(msg) + + +def _check_require_clean_empty_init( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Empty initialization is considered clean.""" + locale = 
fdp.PickValueInList(list(_SINGLE_LOCALES)) + l10n = FluentLocalization([locale], strict=False) + try: + summary = l10n.require_clean() + except IntegrityCheckFailedError as err: + msg = f"require_clean() raised on empty initialization: {err}" + raise LocalizationFuzzError(msg) from err + + if not summary.all_clean or summary.total_attempted != 0: + msg = f"Empty initialization should be clean, got {summary!r}" + raise LocalizationFuzzError(msg) + + +def _check_require_clean_loader_success( + fdp: atheris.FuzzedDataProvider, +) -> None: + """All-success loader summaries return from require_clean().""" + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + resource_id = "main.ftl" + message_id = f"clean-{gen_ftl_identifier(fdp)}" + value = gen_ftl_value(fdp) + + with TemporaryDirectory(prefix="ftllexengine-fuzz-loader-") as tmp_dir: + root = pathlib.Path(tmp_dir) + _write_loader_resource(root, locale, resource_id, f"{message_id} = {value}\n") + loader = PathResourceLoader(str(root / "{locale}")) + l10n = FluentLocalization([locale], [resource_id], loader, strict=False) + try: + summary = l10n.require_clean() + except IntegrityCheckFailedError as err: + msg = f"require_clean() rejected an all-success summary: {err}" + raise LocalizationFuzzError(msg) from err + + if not summary.all_clean or summary.successful != 1 or summary.errors != 0: + msg = f"Clean loader initialization returned wrong summary: {summary!r}" + raise LocalizationFuzzError(msg) + + +def _check_require_clean_missing_loader( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Missing resources fail require_clean() with integrity context.""" + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + normalized_locale = normalize_locale(locale) + resource_id = "main.ftl" + + class MissingLoader: + def load(self, _locale: str, _resource_id: str) -> str: + msg = "missing" + raise FileNotFoundError(msg) + + def describe_path(self, locale: str, resource_id: str) -> str: + return f"{locale}/{resource_id}" + + 
l10n = FluentLocalization([locale], [resource_id], MissingLoader(), strict=False) + + try: + l10n.require_clean() + except IntegrityCheckFailedError as err: + _assert_integrity_failure( + err, + operation="require_clean", + message_fragment="not clean", + key=f"{normalized_locale}/{resource_id}", + actual_fragment="LoadSummary(", + ) + else: + msg = "require_clean() accepted a missing-resource summary" + raise LocalizationFuzzError(msg) + + +def _check_require_clean_junk_resource( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Junk-bearing resources fail require_clean().""" + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + resource_id = "broken.ftl" + junk_source = f"{gen_ftl_identifier(fdp)} = {{\n" + + with TemporaryDirectory(prefix="ftllexengine-fuzz-loader-") as tmp_dir: + root = pathlib.Path(tmp_dir) + _write_loader_resource(root, locale, resource_id, junk_source) + loader = PathResourceLoader(str(root / "{locale}")) + l10n = FluentLocalization([locale], [resource_id], loader, strict=False) + + try: + l10n.require_clean() + except IntegrityCheckFailedError as err: + _assert_integrity_failure( + err, + operation="require_clean", + message_fragment="junk", + key_fragment=resource_id, + actual_fragment="LoadSummary(", + ) + else: + msg = "require_clean() accepted a junk-bearing summary" + raise LocalizationFuzzError(msg) + + +def _check_require_clean_loader_error( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Loader validation errors fail require_clean().""" + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + invalid_resource_id = fdp.PickValueInList( + [ + "../escape.ftl", + " main.ftl", + "/absolute.ftl", + ] + ) + + with TemporaryDirectory(prefix="ftllexengine-fuzz-loader-") as tmp_dir: + root = pathlib.Path(tmp_dir) + loader = PathResourceLoader(str(root / "{locale}")) + l10n = FluentLocalization( + [locale], + [invalid_resource_id], + loader, + strict=False, + ) + + try: + l10n.require_clean() + except IntegrityCheckFailedError as err: + 
_assert_integrity_failure( + err, + operation="require_clean", + message_fragment="load error", + key_fragment=invalid_resource_id, + actual_fragment="LoadSummary(", + ) + else: + msg = "require_clean() accepted a loader error summary" + raise LocalizationFuzzError(msg) + + +def _pattern_require_clean_api( + fdp: atheris.FuzzedDataProvider, +) -> None: + """require_clean returns only for clean initialization summaries.""" + _domain.boot_validation_checks += 1 + handlers = ( + _check_require_clean_empty_init, + _check_require_clean_loader_success, + _check_require_clean_missing_loader, + _check_require_clean_junk_resource, + _check_require_clean_loader_error, + ) + handler = handlers[fdp.ConsumeIntInRange(0, len(handlers) - 1)] + handler(fdp) + + diff --git a/fuzz_atheris/fuzz_localization_patterns_validation.py b/fuzz_atheris/fuzz_localization_patterns_validation.py new file mode 100644 index 00000000..76b44ed6 --- /dev/null +++ b/fuzz_atheris/fuzz_localization_patterns_validation.py @@ -0,0 +1,339 @@ +# mypy: disable-error-code=name-defined +from fuzz_localization_patterns_basic import ( + _assert_integrity_failure, + _build_variable_message, +) +from fuzz_localization_support import ( + _LOCALE_PAIRS, + _SINGLE_LOCALES, + FluentLocalization, + IntegrityCheckFailedError, + LocalizationFuzzError, + _domain, + atheris, + gen_ftl_identifier, + validate_message_variables, +) + + +def _check_message_schema_exact_success( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Exact schemas succeed and preserve input order.""" + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + message_count = fdp.ConsumeIntInRange(1, 3) + expected_schemas: dict[str, frozenset[str] | set[str]] = {} + resource_parts: list[str] = [] + + for index in range(message_count): + message_id = f"schema-{index}-{gen_ftl_identifier(fdp)}" + variable_count = fdp.ConsumeIntInRange(1, 2) + variables = tuple( + f"var{index}_{slot}_{gen_ftl_identifier(fdp)}" for slot in range(variable_count) + ) + 
expected = frozenset(variables) if fdp.ConsumeBool() else set(variables) + expected_schemas[message_id] = expected + resource_parts.append(_build_variable_message(message_id, variables)) + + l10n = FluentLocalization([locale], strict=False) + l10n.add_resource(locale, "".join(resource_parts)) + try: + results = l10n.validate_message_schemas(expected_schemas) + except IntegrityCheckFailedError as err: + msg = f"validate_message_schemas() raised on exact schemas: {err}" + raise LocalizationFuzzError(msg) from err + + if [result.message_id for result in results] != list(expected_schemas): + msg = ( + "validate_message_schemas() returned results out of input order: " + f"{[result.message_id for result in results]!r} vs {list(expected_schemas)!r}" + ) + raise LocalizationFuzzError(msg) + for result in results: + expected_variables = frozenset(expected_schemas[result.message_id]) + if not result.is_valid or result.declared_variables != expected_variables: + msg = ( + "validate_message_schemas() returned invalid exact-match result: " + f"{result!r} vs {expected_variables!r}" + ) + raise LocalizationFuzzError(msg) + + +def _assert_localization_message_validation_matches_lookup( + l10n: FluentLocalization, + message_id: str, + expected_variables: frozenset[str] | set[str], +) -> None: + """Single-message validation should match direct AST validation.""" + message = l10n.get_message(message_id) + if message is None: + msg = f"get_message('{message_id}') returned None during schema validation" + raise LocalizationFuzzError(msg) + + direct = validate_message_variables(message, frozenset(expected_variables)) + resolved = l10n.validate_message_variables(message_id, expected_variables) + if resolved != direct: + msg = ( + "validate_message_variables() diverged from direct AST validation: " + f"{resolved!r} vs {direct!r}" + ) + raise LocalizationFuzzError(msg) + + +def _check_single_message_validation_success( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Single-message 
exact-schema validation succeeds for direct hits.""" + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + message_id = f"single-{gen_ftl_identifier(fdp)}" + variable_count = fdp.ConsumeIntInRange(1, 2) + variables = tuple( + f"var_{slot}_{gen_ftl_identifier(fdp)}" for slot in range(variable_count) + ) + expected_variables: frozenset[str] | set[str] = ( + frozenset(variables) if fdp.ConsumeBool() else set(variables) + ) + + l10n = FluentLocalization([locale], strict=False) + l10n.add_resource(locale, _build_variable_message(message_id, variables)) + _assert_localization_message_validation_matches_lookup( + l10n, + message_id, + expected_variables, + ) + + +def _check_single_message_validation_fallback_success( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Single-message validation resolves through localization fallback.""" + primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) + message_id = f"fallback-single-{gen_ftl_identifier(fdp)}" + variable = f"fallback_{gen_ftl_identifier(fdp)}" + expected_variables: frozenset[str] | set[str] = ( + frozenset({variable}) if fdp.ConsumeBool() else {variable} + ) + + l10n = FluentLocalization([primary, fallback], strict=False) + l10n.add_resource(fallback, _build_variable_message(message_id, (variable,))) + _assert_localization_message_validation_matches_lookup( + l10n, + message_id, + expected_variables, + ) + + +def _check_single_message_validation_missing_message( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Missing messages fail the single-message localization validator.""" + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + missing_id = f"missing-single-{gen_ftl_identifier(fdp)}" + l10n = FluentLocalization([locale], strict=False) + l10n.add_resource(locale, "present = value\n") + + try: + l10n.validate_message_variables(missing_id, frozenset()) + except IntegrityCheckFailedError as err: + _assert_integrity_failure( + err, + operation="validate_message_variables", + 
message_fragment=f"{missing_id}: not found", + key=missing_id, + actual_fragment="missing_messages=1", + ) + else: + msg = "validate_message_variables() accepted a missing message" + raise LocalizationFuzzError(msg) + + +def _check_single_message_validation_extra_variable( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Extra declared variables fail exact single-message validation.""" + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + message_id = f"extra-single-{gen_ftl_identifier(fdp)}" + amount_var = f"amount_{gen_ftl_identifier(fdp)}" + customer_var = f"customer_{gen_ftl_identifier(fdp)}" + + l10n = FluentLocalization([locale], strict=False) + l10n.add_resource( + locale, + _build_variable_message(message_id, (amount_var, customer_var)), + ) + + try: + l10n.validate_message_variables(message_id, frozenset({amount_var})) + except IntegrityCheckFailedError as err: + _assert_integrity_failure( + err, + operation="validate_message_variables", + message_fragment=f"{message_id}: extra {{{customer_var}}}", + key=message_id, + actual_fragment="schema_mismatches=1", + ) + else: + msg = "validate_message_variables() accepted an extra-variable mismatch" + raise LocalizationFuzzError(msg) + + +def _check_single_message_validation_missing_variable( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Missing expected variables fail exact single-message validation.""" + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + message_id = f"missing-var-single-{gen_ftl_identifier(fdp)}" + amount_var = f"amount_{gen_ftl_identifier(fdp)}" + customer_var = f"customer_{gen_ftl_identifier(fdp)}" + + l10n = FluentLocalization([locale], strict=False) + l10n.add_resource(locale, _build_variable_message(message_id, (amount_var,))) + + try: + l10n.validate_message_variables(message_id, {amount_var, customer_var}) + except IntegrityCheckFailedError as err: + _assert_integrity_failure( + err, + operation="validate_message_variables", + message_fragment=f"{message_id}: missing 
{{{customer_var}}}", + key=message_id, + actual_fragment="schema_mismatches=1", + ) + else: + msg = "validate_message_variables() accepted a missing-variable mismatch" + raise LocalizationFuzzError(msg) + + +def _pattern_validate_message_variables_api( + fdp: atheris.FuzzedDataProvider, +) -> None: + """validate_message_variables enforces exact schemas per message.""" + _domain.message_variable_validation_checks += 1 + handlers = ( + _check_single_message_validation_success, + _check_single_message_validation_fallback_success, + _check_single_message_validation_missing_message, + _check_single_message_validation_extra_variable, + _check_single_message_validation_missing_variable, + ) + handler = handlers[fdp.ConsumeIntInRange(0, len(handlers) - 1)] + handler(fdp) + + +def _check_message_schema_fallback_success( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Fallback-resolved messages validate through the localization facade.""" + primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) + message_id = f"fallback-{gen_ftl_identifier(fdp)}" + variable = f"fallback_{gen_ftl_identifier(fdp)}" + + l10n = FluentLocalization([primary, fallback], strict=False) + l10n.add_resource(fallback, _build_variable_message(message_id, (variable,))) + try: + results = l10n.validate_message_schemas({message_id: frozenset({variable})}) + except IntegrityCheckFailedError as err: + msg = f"validate_message_schemas() rejected fallback-resolved schema: {err}" + raise LocalizationFuzzError(msg) from err + + if len(results) != 1 or not results[0].is_valid: + msg = f"Fallback schema validation returned {results!r}" + raise LocalizationFuzzError(msg) + + +def _check_message_schema_missing_message( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Missing messages fail exact schema validation.""" + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + missing_id = f"missing-{gen_ftl_identifier(fdp)}" + l10n = FluentLocalization([locale], strict=False) + l10n.add_resource(locale, 
"present = value\n") + + try: + l10n.validate_message_schemas({missing_id: frozenset()}) + except IntegrityCheckFailedError as err: + _assert_integrity_failure( + err, + operation="validate_message_schemas", + message_fragment=f"{missing_id}: not found", + key=missing_id, + actual_fragment="missing_messages=1", + ) + else: + msg = "validate_message_schemas() accepted a missing message" + raise LocalizationFuzzError(msg) + + +def _check_message_schema_extra_variable( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Extra variables in the message fail exact schema validation.""" + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + message_id = f"extra-{gen_ftl_identifier(fdp)}" + amount_var = f"amount_{gen_ftl_identifier(fdp)}" + customer_var = f"customer_{gen_ftl_identifier(fdp)}" + + l10n = FluentLocalization([locale], strict=False) + l10n.add_resource( + locale, + _build_variable_message(message_id, (amount_var, customer_var)), + ) + + try: + l10n.validate_message_schemas({message_id: frozenset({amount_var})}) + except IntegrityCheckFailedError as err: + _assert_integrity_failure( + err, + operation="validate_message_schemas", + message_fragment=f"{message_id}: extra {{{customer_var}}}", + key=message_id, + actual_fragment="schema_mismatches=1", + ) + else: + msg = "validate_message_schemas() accepted an extra-variable mismatch" + raise LocalizationFuzzError(msg) + + +def _check_message_schema_missing_variable( + fdp: atheris.FuzzedDataProvider, +) -> None: + """Missing expected variables fail exact schema validation.""" + locale = fdp.PickValueInList(list(_SINGLE_LOCALES)) + message_id = f"missing-var-{gen_ftl_identifier(fdp)}" + amount_var = f"amount_{gen_ftl_identifier(fdp)}" + customer_var = f"customer_{gen_ftl_identifier(fdp)}" + + l10n = FluentLocalization([locale], strict=False) + l10n.add_resource(locale, _build_variable_message(message_id, (amount_var,))) + + try: + l10n.validate_message_schemas({message_id: {amount_var, customer_var}}) + except 
IntegrityCheckFailedError as err: + _assert_integrity_failure( + err, + operation="validate_message_schemas", + message_fragment=f"{message_id}: missing {{{customer_var}}}", + key=message_id, + actual_fragment="schema_mismatches=1", + ) + else: + msg = "validate_message_schemas() accepted a missing-variable mismatch" + raise LocalizationFuzzError(msg) + + +def _pattern_validate_message_schemas_api( + fdp: atheris.FuzzedDataProvider, +) -> None: + """validate_message_schemas enforces exact schemas through localization.""" + _domain.schema_validation_checks += 1 + handlers = ( + _check_message_schema_exact_success, + _check_message_schema_fallback_success, + _check_message_schema_missing_message, + _check_message_schema_extra_variable, + _check_message_schema_missing_variable, + ) + handler = handlers[fdp.ConsumeIntInRange(0, len(handlers) - 1)] + handler(fdp) + diff --git a/fuzz_atheris/fuzz_localization_support.py b/fuzz_atheris/fuzz_localization_support.py new file mode 100644 index 00000000..441154fd --- /dev/null +++ b/fuzz_atheris/fuzz_localization_support.py @@ -0,0 +1,313 @@ +"""Shared state, imports, and constants for the localization Atheris fuzzer.""" + +from __future__ import annotations + +import argparse +import atexit +import gc +import logging +import pathlib +import sys +import time +from dataclasses import dataclass +from tempfile import TemporaryDirectory +from typing import TYPE_CHECKING, Any, cast + +if TYPE_CHECKING: + from collections.abc import Sequence + +# --- Dependency Checks --- +_psutil_mod: Any = None +_atheris_mod: Any = None + +try: + import psutil as _psutil_import +except ImportError: + pass +else: + _psutil_mod = _psutil_import + +try: + import atheris as _atheris_import +except ImportError: + pass +else: + _atheris_mod = _atheris_import + +from fuzz_common import ( # noqa: E402 # pylint: disable=C0413 + GC_INTERVAL, + BaseFuzzerState, + build_base_stats_dict, + build_weighted_schedule, + check_dependencies, + 
emit_checkpoint_report, + emit_final_report, + gen_ftl_identifier, + gen_ftl_value, + get_process, + print_fuzzer_banner, + record_iteration_metrics, + record_memory, + run_fuzzer, + select_pattern_round_robin, +) + +check_dependencies(["psutil", "atheris"], [_psutil_mod, _atheris_mod]) + +atheris = cast("Any", _atheris_mod) + +@dataclass +class LocalizationMetrics: + """Domain-specific metrics for localization fuzzer.""" + + fallback_triggered: int = 0 + messages_found: int = 0 + messages_missing: int = 0 + custom_function_calls: int = 0 + add_resource_mutations: int = 0 + has_message_checks: int = 0 + introspect_calls: int = 0 + ast_lookup_checks: int = 0 + validate_calls: int = 0 + message_variable_validation_checks: int = 0 + schema_validation_checks: int = 0 + cache_audit_checks: int = 0 + locale_boundary_checks: int = 0 + loader_init_checks: int = 0 + loader_junk_checks: int = 0 + loader_error_checks: int = 0 + boot_validation_checks: int = 0 + boot_config_checks: int = 0 + + +class LocalizationFuzzError(Exception): + """Raised when an invariant breach is detected.""" + +_ALLOWED_EXCEPTIONS = ( + ValueError, # empty locale list, locale not in chain, whitespace + TypeError, # invalid argument types + UnicodeEncodeError, # surrogate characters in FTL source +) +_PATTERN_WEIGHTS: Sequence[tuple[str, int]] = ( + ("single_locale_add_resource", 10), + ("multi_locale_fallback", 10), + ("chain_of_3_fallback", 8), + ("format_value_missing", 7), + ("format_with_variables", 9), + ("add_resource_mutation", 7), + ("has_message_api", 7), + ("ast_lookup_api", 7), + ("get_message_ids_api", 6), + ("validate_resource_api", 7), + ("validate_message_variables_api", 6), + ("validate_message_schemas_api", 6), + ("add_function_custom", 6), + ("introspect_api", 7), + ("cache_audit_api", 6), + ("locale_boundary_api", 5), + ("on_fallback_callback", 6), + ("loader_init_success", 5), + ("loader_not_found_fallback", 5), + ("loader_junk_summary", 4), + ("loader_path_error", 4), + 
("require_clean_api", 5), + ("boot_config_api", 6), +) + +_PATTERN_SCHEDULE: tuple[str, ...] = build_weighted_schedule( + [name for name, _ in _PATTERN_WEIGHTS], + [weight for _, weight in _PATTERN_WEIGHTS], +) +_PATTERN_INDEX: dict[str, int] = {name: i for i, (name, _) in enumerate(_PATTERN_WEIGHTS)} +_LOCALE_PAIRS: Sequence[tuple[str, str]] = ( + ("en-US", "en"), + ("de-DE", "de"), + ("fr-FR", "fr"), + ("ja-JP", "ja"), + ("ar-SA", "ar"), + ("zh-CN", "zh"), + ("ko-KR", "ko"), + ("pt-BR", "pt"), + ("es-ES", "es"), + ("sv-SE", "sv"), +) + +_LOCALE_TRIPLES: Sequence[tuple[str, str, str]] = ( + ("lv", "en-US", "en"), + ("lt", "en-GB", "en"), + ("pl", "de-AT", "de"), + ("uk", "ru-RU", "ru"), + ("zh-TW", "zh-CN", "zh"), +) + +_SINGLE_LOCALES: Sequence[str] = ( + "en-US", + "de-DE", + "fr-FR", + "ja-JP", + "ko-KR", + "ar-SA", + "zh-CN", + "pt-BR", + "es-ES", + "sv-SE", +) +_STRUCTURALLY_INVALID_LOCALES: Sequence[str] = ( + "en/US", + "en US", + "en@US", + "123_US", + "\x00\x01\x02", + "en-US" + "\x00" * 8, + "invalid!!", +) +_NON_STRING_LOCALES: Sequence[object] = ( + None, + 0, + 1.5, + ["en-US"], + {"locale": "en-US"}, +) +_VALID_AUDIT_OPERATIONS: frozenset[str] = frozenset( + { + "MISS", + "PUT", + "HIT", + "EVICT", + "CORRUPTION", + "WRITE_ONCE_IDEMPOTENT", + "WRITE_ONCE_CONFLICT", + } +) +_state = BaseFuzzerState( + checkpoint_interval=500, + seed_corpus_max_size=500, + fuzzer_name="localization", + fuzzer_target=( + "FluentLocalization (locale boundary, multi-locale fallback chains, " + "add_resource, format_pattern, introspection)" + ), + pattern_intended_weights={name: float(w) for name, w in _PATTERN_WEIGHTS}, +) +_domain = LocalizationMetrics() + +_REPORT_DIR = pathlib.Path(".fuzz_atheris_corpus") / "localization" +_REPORT_FILENAME = "fuzz_localization_report.json" + + +def _build_stats_dict() -> dict[str, Any]: + """Build complete stats dictionary including domain metrics.""" + stats = cast("dict[str, Any]", build_base_stats_dict(_state)) + 
stats["fallback_triggered"] = _domain.fallback_triggered + stats["messages_found"] = _domain.messages_found + stats["messages_missing"] = _domain.messages_missing + stats["custom_function_calls"] = _domain.custom_function_calls + stats["add_resource_mutations"] = _domain.add_resource_mutations + stats["has_message_checks"] = _domain.has_message_checks + stats["introspect_calls"] = _domain.introspect_calls + stats["ast_lookup_checks"] = _domain.ast_lookup_checks + stats["validate_calls"] = _domain.validate_calls + stats["message_variable_validation_checks"] = _domain.message_variable_validation_checks + stats["schema_validation_checks"] = _domain.schema_validation_checks + stats["cache_audit_checks"] = _domain.cache_audit_checks + stats["locale_boundary_checks"] = _domain.locale_boundary_checks + stats["loader_init_checks"] = _domain.loader_init_checks + stats["loader_junk_checks"] = _domain.loader_junk_checks + stats["loader_error_checks"] = _domain.loader_error_checks + stats["boot_validation_checks"] = _domain.boot_validation_checks + stats["boot_config_checks"] = _domain.boot_config_checks + total = _domain.messages_found + _domain.messages_missing + if total > 0: + stats["fallback_hit_ratio"] = round(_domain.fallback_triggered / total, 3) + return stats + + +def _emit_checkpoint() -> None: + """Emit periodic checkpoint (uses checkpoint markers).""" + stats = _build_stats_dict() + emit_checkpoint_report(_state, stats, _REPORT_DIR, _REPORT_FILENAME) + + +def _emit_report() -> None: + """Emit crash-proof final report.""" + stats = _build_stats_dict() + emit_final_report(_state, stats, _REPORT_DIR, _REPORT_FILENAME) + + +atexit.register(_emit_report) + +# --- Suppress logging and instrument imports --- +logging.getLogger("ftllexengine").setLevel(logging.CRITICAL) + +with atheris.instrument_imports(include=["ftllexengine"]): + from ftllexengine import validate_message_variables + from ftllexengine.constants import MAX_LOCALE_LENGTH_HARD_LIMIT + from 
ftllexengine.core.locale_utils import normalize_locale, require_locale_code + from ftllexengine.diagnostics.errors import FrozenFluentError + from ftllexengine.integrity import ( + DataIntegrityError, + FormattingIntegrityError, + IntegrityCheckFailedError, + SyntaxIntegrityError, + ) + from ftllexengine.localization import ( + CacheAuditLogEntry, + FluentLocalization, + LocalizationBootConfig, + LocalizationCacheStats, + ) + from ftllexengine.localization.loading import FallbackInfo, PathResourceLoader + from ftllexengine.runtime.cache_config import CacheConfig + from ftllexengine.syntax import Message, Term + + +__all__ = [ + "GC_INTERVAL", + "MAX_LOCALE_LENGTH_HARD_LIMIT", + "_ALLOWED_EXCEPTIONS", + "_LOCALE_PAIRS", + "_LOCALE_TRIPLES", + "_NON_STRING_LOCALES", + "_PATTERN_SCHEDULE", + "_PATTERN_WEIGHTS", + "_SINGLE_LOCALES", + "_STRUCTURALLY_INVALID_LOCALES", + "_VALID_AUDIT_OPERATIONS", + "Any", + "CacheAuditLogEntry", + "CacheConfig", + "DataIntegrityError", + "FallbackInfo", + "FluentLocalization", + "FormattingIntegrityError", + "FrozenFluentError", + "IntegrityCheckFailedError", + "LocalizationBootConfig", + "LocalizationCacheStats", + "LocalizationFuzzError", + "Message", + "PathResourceLoader", + "SyntaxIntegrityError", + "TemporaryDirectory", + "Term", + "_domain", + "_emit_checkpoint", + "_state", + "argparse", + "atheris", + "gc", + "gen_ftl_identifier", + "gen_ftl_value", + "get_process", + "normalize_locale", + "pathlib", + "print_fuzzer_banner", + "record_iteration_metrics", + "record_memory", + "require_locale_code", + "run_fuzzer", + "select_pattern_round_robin", + "sys", + "time", + "validate_message_variables", +] diff --git a/fuzz_atheris/fuzz_lock.py b/fuzz_atheris/fuzz_lock.py index 54465ee7..1bdb4adc 100644 --- a/fuzz_atheris/fuzz_lock.py +++ b/fuzz_atheris/fuzz_lock.py @@ -1,9 +1,4 @@ #!/usr/bin/env python3 -# FUZZ_PLUGIN_HEADER_START -# FUZZ_PLUGIN: lock - RWLock Concurrency & Contention -# Intentional: This header is intentionally placed 
for dynamic plugin discovery.
-# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH
-# FUZZ_PLUGIN_HEADER_END
 """RWLock Contention Fuzzer (Atheris).
 
 Targets:
diff --git a/fuzz_atheris/fuzz_numbers.py b/fuzz_atheris/fuzz_numbers.py
index df421d07..c78c29d8 100644
--- a/fuzz_atheris/fuzz_numbers.py
+++ b/fuzz_atheris/fuzz_numbers.py
@@ -1,9 +1,4 @@
 #!/usr/bin/env python3
-# FUZZ_PLUGIN_HEADER_START
-# FUZZ_PLUGIN: numbers - NUMBER Function Runtime Formatting (Oracle)
-# Intentional: This header is intentionally placed for dynamic plugin discovery.
-# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH
-# FUZZ_PLUGIN_HEADER_END
 """NUMBER Function Runtime Formatting Oracle Fuzzer (Atheris).
 
 Targets: ftllexengine.runtime.functions.number_format
diff --git a/fuzz_atheris/fuzz_oom.py b/fuzz_atheris/fuzz_oom.py
index 1f3459c6..3d43fdfd 100644
--- a/fuzz_atheris/fuzz_oom.py
+++ b/fuzz_atheris/fuzz_oom.py
@@ -1,9 +1,4 @@
 #!/usr/bin/env python3
-# FUZZ_PLUGIN_HEADER_START
-# FUZZ_PLUGIN: oom - Memory Density (Object Explosion)
-# Intentional: This header is intentionally placed for dynamic plugin discovery.
-# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH
-# FUZZ_PLUGIN_HEADER_END
 """Memory Density and Object Explosion Fuzzer (Atheris).
 
 Targets: ftllexengine.syntax.parser.FluentParserV1
diff --git a/fuzz_atheris/fuzz_parse_currency.py b/fuzz_atheris/fuzz_parse_currency.py
index 4b3dfe47..faa1ceaf 100644
--- a/fuzz_atheris/fuzz_parse_currency.py
+++ b/fuzz_atheris/fuzz_parse_currency.py
@@ -1,9 +1,4 @@
 #!/usr/bin/env python3
-# FUZZ_PLUGIN_HEADER_START
-# FUZZ_PLUGIN: parse_currency - Locale-aware currency parsing and symbol resolution
-# Intentional: This header is intentionally placed for dynamic plugin discovery.
-# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH
-# FUZZ_PLUGIN_HEADER_END
 """Locale-aware currency parsing fuzzer (Atheris).
 
 Targets:
diff --git a/fuzz_atheris/fuzz_parse_decimal.py b/fuzz_atheris/fuzz_parse_decimal.py
index 7265cfb0..8faaeab1 100644
--- a/fuzz_atheris/fuzz_parse_decimal.py
+++ b/fuzz_atheris/fuzz_parse_decimal.py
@@ -1,9 +1,4 @@
 #!/usr/bin/env python3
-# FUZZ_PLUGIN_HEADER_START
-# FUZZ_PLUGIN: parse_decimal - Locale-aware decimal parsing, FluentNumber parsing, and locale utils
-# Intentional: This header is intentionally placed for dynamic plugin discovery.
-# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH
-# FUZZ_PLUGIN_HEADER_END
 """Locale-aware decimal parsing fuzzer (Atheris).
 
 Targets:
diff --git a/fuzz_atheris/fuzz_plural.py b/fuzz_atheris/fuzz_plural.py
index 3bc01da6..6dbc1500 100644
--- a/fuzz_atheris/fuzz_plural.py
+++ b/fuzz_atheris/fuzz_plural.py
@@ -1,9 +1,4 @@
 #!/usr/bin/env python3
-# FUZZ_PLUGIN_HEADER_START
-# FUZZ_PLUGIN: plural - Plural Rule Boundary & CLDR
-# Intentional: This header is intentionally placed for dynamic plugin discovery.
-# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH
-# FUZZ_PLUGIN_HEADER_END
 """Plural Rule Boundary Fuzzer (Atheris).
 
 Targets: ftllexengine.runtime.plural_rules.select_plural_category
diff --git a/fuzz_atheris/fuzz_roundtrip.py b/fuzz_atheris/fuzz_roundtrip.py
index dfbd9bed..6c2c2a73 100644
--- a/fuzz_atheris/fuzz_roundtrip.py
+++ b/fuzz_atheris/fuzz_roundtrip.py
@@ -1,9 +1,4 @@
 #!/usr/bin/env python3
-# FUZZ_PLUGIN_HEADER_START
-# FUZZ_PLUGIN: roundtrip - Metamorphic roundtrip (Parser <-> Serializer)
-# Intentional: This header is intentionally placed for dynamic plugin discovery.
-# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH
-# FUZZ_PLUGIN_HEADER_END
 """Metamorphic Roundtrip Fuzzer (Atheris).
 
 Targets: ftllexengine.syntax.parser.FluentParserV1,
diff --git a/fuzz_atheris/fuzz_runtime.py b/fuzz_atheris/fuzz_runtime.py
index ea6091fa..223ab4d9 100644
--- a/fuzz_atheris/fuzz_runtime.py
+++ b/fuzz_atheris/fuzz_runtime.py
@@ -1,1405 +1,9 @@
 #!/usr/bin/env python3
-# FUZZ_PLUGIN_HEADER_START
-# FUZZ_PLUGIN: runtime - End-to-End Runtime & strict mode validation
-# Intentional: This header is intentionally placed for dynamic plugin discovery.
-# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH
-# FUZZ_PLUGIN_HEADER_END
-"""Runtime End-to-End Fuzzer (Atheris).
-
-Grammar-aware fuzzer targeting the full runtime stack: FluentBundle,
-IntegrityCache, Resolver, constructor locale boundaries, and strict mode
-integrity guarantees.
-
-Uses structured construction from fuzzed bytes so that libFuzzer mutations
-map to meaningful FTL grammar variations (message structure, selector types,
-function calls, term references, attribute access, nesting depth). This
-enables coverage-guided exploration of resolver dispatch paths, select
-expression matching, built-in function formatting, cache key construction,
-cycle detection, and error recovery.
-
-Metrics:
-- Scenario coverage (strict mode, caching, integrity, security, concurrent)
-- Weight skew detection (actual vs intended distribution)
-- Performance profiling (min/mean/median/p95/p99/max)
-- Real memory usage (RSS via psutil)
-- Corpus retention rate and eviction tracking
-- Error distribution and contract violations
-- Seed corpus management
-
-Requires Python 3.13+ (uses PEP 695 type aliases).
-""" +"""Runtime End-to-End Fuzzer (Atheris).""" from __future__ import annotations -import argparse -import atexit -import contextlib -import gc -import logging -import pathlib -import sys -import threading -import time -from dataclasses import dataclass -from datetime import UTC, datetime -from typing import TYPE_CHECKING, Any - -if TYPE_CHECKING: - from collections.abc import Sequence - -# --- Dependency Checks --- -_psutil_mod: Any = None -_atheris_mod: Any = None - -try: # noqa: SIM105 - need module ref for check_dependencies - import psutil as _psutil_mod # type: ignore[no-redef] -except ImportError: - pass - -try: # noqa: SIM105 - need module ref for check_dependencies - import atheris as _atheris_mod # type: ignore[no-redef] -except ImportError: - pass - -from fuzz_common import ( # noqa: E402 - after dependency capture # pylint: disable=C0413 - GC_INTERVAL, - BaseFuzzerState, - build_base_stats_dict, - build_weighted_schedule, - check_dependencies, - emit_checkpoint_report, - emit_final_report, - get_process, - print_fuzzer_banner, - record_iteration_metrics, - record_memory, - run_fuzzer, - select_pattern_round_robin, -) - -check_dependencies(["psutil", "atheris"], [_psutil_mod, _atheris_mod]) - -import atheris # noqa: E402 # pylint: disable=C0412,C0413 - -# --- PEP 695 Type Alias --- -type ComplexArgs = dict[str, Any] - - -# --- Domain Metrics --- - -@dataclass -class RuntimeMetrics: - """Domain-specific metrics for runtime fuzzer.""" - - strict_mode_tests: int = 0 - cache_operations: int = 0 - integrity_checks: int = 0 - security_tests: int = 0 - concurrent_tests: int = 0 - differential_tests: int = 0 - - # Contract validation - frozen_error_verifications: int = 0 - cache_stability_checks: int = 0 - corruption_simulations: int = 0 - ast_lookup_checks: int = 0 - locale_boundary_checks: int = 0 - - -# --- Global State --- - -_state = BaseFuzzerState( - seed_corpus_max_size=500, - fuzzer_name="runtime", - fuzzer_target="FluentBundle, IntegrityCache, 
Resolver, Strict Mode, Locale Boundary", -) -_domain = RuntimeMetrics() - - -# --- Test Configuration Constants --- -TEST_LOCALES: Sequence[str] = ( - "en-US", - "en-GB", - "lv-LV", - "ar-EG", - "ar-SA", - "pl-PL", - "zh-CN", - "ja-JP", - "de-DE", - "fr-FR", - "", # Empty locale - "C", # POSIX - "root", # CLDR root -) - -_STRUCTURALLY_INVALID_LOCALES: Sequence[str] = ( - "en/US", - "en US", - "en@US", - "123_US", - "\x00\x01\x02", - "en-US" + "\x00" * 8, - "invalid!!", -) - -_NON_STRING_LOCALES: Sequence[object] = ( - None, - 0, - 1.5, - ["en-US"], - {"locale": "en-US"}, -) - -TARGET_MESSAGE_IDS: Sequence[str] = ( - "msg", - "msg2", - "msg3", - "ref", - "tref", - "attr", - "cyclic", - "deep", - "func_call", - "num_sel", - "str_sel", - "nested", - "chain_a", - "chain_b", - "chain_c", - "nonexistent", -) - -# --- Grammar-Aware FTL Construction --- - -_IDENTIFIERS: Sequence[str] = ( - "msg", - "msg2", - "msg3", - "ref", - "tref", - "attr", - "func_call", - "num_sel", - "str_sel", - "nested", - "chain_a", - "chain_b", - "chain_c", - "deep", -) - -_TERM_IDENTIFIERS: Sequence[str] = ( - "-brand", - "-term", - "-os", - "-platform", - "-greeting", -) -_TERM_QUERY_IDS: Sequence[str] = tuple( - term.removeprefix("-") for term in _TERM_IDENTIFIERS -) - -_VAR_NAMES: Sequence[str] = ( - "$var", - "$name", - "$count", - "$amount", - "$date", - "$var_0", - "$var_1", - "$var_2", - "$var_3", -) - -_BUILTIN_FUNCTIONS: Sequence[str] = ( - "NUMBER", - "DATETIME", - "CURRENCY", -) - -_NUMBER_OPTS: Sequence[str] = ( - "minimumFractionDigits: 0", - "minimumFractionDigits: 2", - "maximumFractionDigits: 0", - "maximumFractionDigits: 5", - 'useGrouping: "true"', - 'useGrouping: "false"', -) - -_DATETIME_OPTS: Sequence[str] = ( - 'dateStyle: "short"', - 'dateStyle: "medium"', - 'dateStyle: "long"', - 'dateStyle: "full"', - 'timeStyle: "short"', - 'timeStyle: "long"', -) - -_CURRENCY_OPTS: Sequence[str] = ( - 'currency: "USD"', - 'currency: "EUR"', - 'currency: "JPY"', - 'currency: "BHD"', - 
'currencyDisplay: "symbol"', - 'currencyDisplay: "code"', - 'currencyDisplay: "name"', -) - -_SELECTOR_KEYS: Sequence[str] = ( - "one", - "two", - "few", - "many", - "other", - "zero", -) - -_UNICODE_TEXTS: Sequence[str] = ( - "Hello", - "© ® ™", - "😀 🌟 🚀", - "مرحبا عالم", - "c\u0308a\u0308f\u0308e\u0308", - "\u200b\u200e\u200f", - "边界条件", - "", -) - - -# Scenario weights: (name, weight) -_SCENARIO_WEIGHTS: tuple[tuple[str, int], ...] = ( - ("core_runtime", 40), - ("strict_mode", 20), - ("caching", 15), - ("security", 10), - ("concurrent", 10), - ("differential", 5), -) - -_SCENARIO_SCHEDULE: tuple[str, ...] = build_weighted_schedule( - [name for name, _ in _SCENARIO_WEIGHTS], - [weight for _, weight in _SCENARIO_WEIGHTS], -) - -# Register intended weights for skew detection -_state.pattern_intended_weights = {name: float(weight) for name, weight in _SCENARIO_WEIGHTS} - -# Security attack sub-schedule -_SECURITY_WEIGHTS: tuple[tuple[str, int], ...] = ( - ("security_recursion", 25), - ("security_memory", 20), - ("security_cache_poison", 15), - ("security_function_inject", 12), - ("security_locale_boundary", 8), - ("security_expansion_budget", 8), - ("security_dag_expansion", 7), - ("security_dict_functions", 5), -) - -_SECURITY_SCHEDULE: tuple[str, ...] 
= build_weighted_schedule( - [name for name, _ in _SECURITY_WEIGHTS], - [weight for _, weight in _SECURITY_WEIGHTS], -) - - -class RuntimeIntegrityError(Exception): - """Raised when a runtime invariant is breached.""" - - -# --- Reporting --- - -_REPORT_DIR = pathlib.Path(".fuzz_atheris_corpus") / "runtime" - - -def _build_stats_dict() -> dict[str, Any]: - """Build complete stats dictionary including domain metrics.""" - stats = build_base_stats_dict( - _state, - coverage_key="scenarios_tested", - coverage_prefix="scenario_", - ) - - # Domain-specific metrics - stats["strict_mode_tests"] = _domain.strict_mode_tests - stats["cache_operations"] = _domain.cache_operations - stats["integrity_checks"] = _domain.integrity_checks - stats["security_tests"] = _domain.security_tests - stats["concurrent_tests"] = _domain.concurrent_tests - stats["differential_tests"] = _domain.differential_tests - - # Contract validation metrics - stats["frozen_error_verifications"] = _domain.frozen_error_verifications - stats["cache_stability_checks"] = _domain.cache_stability_checks - stats["corruption_simulations"] = _domain.corruption_simulations - stats["ast_lookup_checks"] = _domain.ast_lookup_checks - stats["locale_boundary_checks"] = _domain.locale_boundary_checks - - return stats - - -_REPORT_FILENAME = "fuzz_runtime_report.json" - - -def _emit_checkpoint() -> None: - """Emit periodic checkpoint (uses checkpoint markers).""" - stats = _build_stats_dict() - emit_checkpoint_report( - _state, stats, _REPORT_DIR, _REPORT_FILENAME, - ) - - -def _emit_report() -> None: - """Emit comprehensive final report (crash-proof).""" - stats = _build_stats_dict() - emit_final_report(_state, stats, _REPORT_DIR, _REPORT_FILENAME) - - -atexit.register(_emit_report) - -# --- Suppress logging and instrument imports --- -logging.getLogger("ftllexengine").setLevel(logging.CRITICAL) - -# Enable string and regex comparison instrumentation for better coverage -# of message ID lookups, selector key matching, 
and pattern-based parsing -atheris.enabled_hooks.add("str") -atheris.enabled_hooks.add("RegEx") - -with atheris.instrument_imports(include=["ftllexengine"]): - from ftllexengine import validate_message_variables - from ftllexengine.constants import MAX_LOCALE_LENGTH_HARD_LIMIT - from ftllexengine.core.locale_utils import require_locale_code - from ftllexengine.diagnostics.errors import FrozenFluentError - from ftllexengine.integrity import ( - CacheCorruptionError, - FormattingIntegrityError, - WriteConflictError, - ) - from ftllexengine.runtime.bundle import FluentBundle - from ftllexengine.runtime.cache import IntegrityCacheEntry - from ftllexengine.runtime.cache_config import CacheConfig - from ftllexengine.syntax import Message, Term - - -# --- Grammar-Aware FTL Construction --- - - -def _build_expression( # noqa: PLR0911, PLR0912 - dispatch - fdp: atheris.FuzzedDataProvider, - depth: int = 0, -) -> str: - """Build a random FTL expression from fuzzed bytes. - - Maps byte values to grammar productions so mutations are meaningful. - High branch count mirrors FTL grammar production rules (10 expression types). 
- """ - if depth > 3 or fdp.remaining_bytes() < 2: - return fdp.PickValueInList(list(_VAR_NAMES)) - - expr_type = fdp.ConsumeIntInRange(0, 9) - match expr_type: - case 0: - # Variable reference - return fdp.PickValueInList(list(_VAR_NAMES)) - case 1: - # String literal - return f'"{fdp.PickValueInList(list(_UNICODE_TEXTS))}"' - case 2: - # Number literal - num = fdp.ConsumeIntInRange(-9999, 9999) - if fdp.ConsumeBool(): - return str(num) - frac = fdp.ConsumeIntInRange(0, 99) - return f"{num}.{frac:02d}" - case 3: - # Message reference - ref_id = fdp.PickValueInList(list(_IDENTIFIERS)) - if fdp.ConsumeBool(): - return f"{{ {ref_id}.title }}" - return f"{{ {ref_id} }}" - case 4: - # Term reference - term_id = fdp.PickValueInList(list(_TERM_IDENTIFIERS)) - if fdp.ConsumeBool(): - return f"{{ {term_id} }}" - return f"{{ {term_id}.attr }}" - case 5: - # NUMBER() call - var = fdp.PickValueInList(list(_VAR_NAMES)) - opts = "" - if fdp.ConsumeBool() and fdp.remaining_bytes() > 1: - opts = ", " + fdp.PickValueInList(list(_NUMBER_OPTS)) - return f"{{ NUMBER({var}{opts}) }}" - case 6: - # DATETIME() call - var = fdp.PickValueInList(list(_VAR_NAMES)) - opts = "" - if fdp.ConsumeBool() and fdp.remaining_bytes() > 1: - opts = ", " + fdp.PickValueInList(list(_DATETIME_OPTS)) - return f"{{ DATETIME({var}{opts}) }}" - case 7: - # CURRENCY() call - var = fdp.PickValueInList(list(_VAR_NAMES)) - opts = ", " + fdp.PickValueInList(list(_CURRENCY_OPTS)) - if fdp.ConsumeBool() and fdp.remaining_bytes() > 1: - opts += ", " + fdp.PickValueInList(list(_CURRENCY_OPTS)) - return f"{{ CURRENCY({var}{opts}) }}" - case 8: - # Nested placeable - inner = _build_expression(fdp, depth + 1) - return f"{{ {inner} }}" - case 9: - # Custom function - var = fdp.PickValueInList(list(_VAR_NAMES)) - return f'{{ FUZZ_FUNC({var}, key: "val") }}' - - return fdp.PickValueInList(list(_VAR_NAMES)) - - -def _build_select_expression(fdp: atheris.FuzzedDataProvider) -> str: - """Build a select expression with 
plural/string keys.""" - var = fdp.PickValueInList(list(_VAR_NAMES)) - - # Selector: raw var, NUMBER(), or CURRENCY() - selector_type = fdp.ConsumeIntInRange(0, 2) - match selector_type: - case 0: - selector = var - case 1: - opts = "" - if fdp.ConsumeBool() and fdp.remaining_bytes() > 1: - opts = ", " + fdp.PickValueInList(list(_NUMBER_OPTS)) - selector = f"NUMBER({var}{opts})" - case _: - opts = ", " + fdp.PickValueInList(list(_CURRENCY_OPTS)) - selector = f"CURRENCY({var}{opts})" - - # Build variants - num_variants = fdp.ConsumeIntInRange(1, 5) - variants: list[str] = [] - default_idx = fdp.ConsumeIntInRange(0, num_variants - 1) - - for i in range(num_variants): - # Key: plural category or number literal - if fdp.ConsumeBool(): - key = fdp.PickValueInList(list(_SELECTOR_KEYS)) - else: - key = str(fdp.ConsumeIntInRange(0, 100)) - - value = _build_expression(fdp, depth=1) if fdp.ConsumeBool() else "value" - prefix = "*" if i == default_idx else "" - variants.append(f" [{prefix}{key}] {value}") - - body = "\n".join(variants) - return f"{{ {selector} ->\n{body}\n}}" - - -def _build_message(fdp: atheris.FuzzedDataProvider, msg_id: str) -> str: # noqa: PLR0912 - dispatch - """Build a complete FTL message entry.""" - if fdp.remaining_bytes() < 2: - return f"{msg_id} = fallback\n" - - msg_type = fdp.ConsumeIntInRange(0, 5) - match msg_type: - case 0: - # Simple value with expressions - parts: list[str] = [] - num_parts = fdp.ConsumeIntInRange(1, 3) - for _ in range(num_parts): - if fdp.ConsumeBool(): - parts.append(_build_expression(fdp)) - else: - parts.append(fdp.PickValueInList(list(_UNICODE_TEXTS))) - value = " ".join(parts) - msg = f"{msg_id} = {value}\n" - case 1: - # Select expression - sel = _build_select_expression(fdp) - msg = f"{msg_id} =\n {sel}\n" - case 2: - # Message with attributes - value = _build_expression(fdp) - attrs: list[str] = [] - num_attrs = fdp.ConsumeIntInRange(1, 3) - for j in range(num_attrs): - attr_val = _build_expression(fdp, depth=1) - 
attrs.append(f" .attr{j} = {attr_val}") - attr_block = "\n".join(attrs) - msg = f"{msg_id} = {value}\n{attr_block}\n" - case 3: - # Cyclic reference - target = fdp.PickValueInList(list(_IDENTIFIERS)) - msg = f"{msg_id} = {{ {target} }}\n" - case 4: - # Reference chain - target = fdp.PickValueInList(list(_IDENTIFIERS)) - if fdp.ConsumeBool(): - msg = f"{msg_id} = prefix {{ {target} }} suffix\n" - else: - msg = f"{msg_id} = {{ {target}.title }}\n" - case _: - # Deep nesting - nesting = fdp.ConsumeIntInRange(1, 8) - expr = fdp.PickValueInList(list(_VAR_NAMES)) - for _ in range(nesting): - expr = f"{{ {expr} }}" - msg = f"{msg_id} = {expr}\n" - - # Optionally add attributes even to non-attribute messages - if fdp.ConsumeBool() and fdp.remaining_bytes() > 2: - msg = msg.rstrip("\n") + f"\n .title = {_build_expression(fdp)}\n" - - return msg - - -def _build_term(fdp: atheris.FuzzedDataProvider) -> str: - """Build a term definition.""" - term_id = fdp.PickValueInList(list(_TERM_IDENTIFIERS)) - - if fdp.ConsumeBool(): - # Term with select - sel = _build_select_expression(fdp) - term = f"{term_id} =\n {sel}\n" - else: - value = _build_expression(fdp) - term = f"{term_id} = {value}\n" - - # Optional attributes - if fdp.ConsumeBool() and fdp.remaining_bytes() > 1: - attr_val = _build_expression(fdp, depth=1) - term = term.rstrip("\n") + f"\n .attr = {attr_val}\n" - - return term - - -def _build_ftl_resource(fdp: atheris.FuzzedDataProvider) -> str: - """Build a complete FTL resource from fuzzed bytes. - - Grammar-aware: each byte decision maps to a structural choice in the FTL - grammar, so libFuzzer coverage feedback drives exploration of new resolver - code paths rather than random noise. 
- """ - parts: list[str] = [] - - # Always include some terms for term references to resolve - num_terms = fdp.ConsumeIntInRange(0, 3) - for _ in range(num_terms): - if fdp.remaining_bytes() < 2: - break - parts.append(_build_term(fdp)) - - # Build messages - use deterministic IDs so TARGET_MESSAGE_IDS can find them - ids_to_build = list(_IDENTIFIERS) - num_messages = fdp.ConsumeIntInRange(2, min(8, len(ids_to_build))) - for i in range(num_messages): - if fdp.remaining_bytes() < 2: - break - parts.append(_build_message(fdp, ids_to_build[i])) - - return "\n".join(parts) - - -def _generate_complex_args(fdp: atheris.FuzzedDataProvider) -> ComplexArgs: - """Generate fuzzed arguments matching grammar variable names. - - Uses the same variable names as _build_expression so that constructed - FTL messages can resolve their variable references. - """ - # Always provide the core variables so resolution paths are exercised - arg_keys = ("var", "name", "count", "amount", "date", "var_0", "var_1", "var_2", "var_3") - - args: ComplexArgs = {} - for key in arg_keys: - if fdp.remaining_bytes() < 2: - # Provide defaults for remaining keys - args[key] = 42 - continue - - val_type = fdp.ConsumeIntInRange(0, 9) - match val_type: - case 0: - args[key] = fdp.ConsumeUnicodeNoSurrogates(20) - case 1: - args[key] = fdp.ConsumeFloat() - case 2: - args[key] = fdp.ConsumeInt(4) - case 3: - args[key] = datetime.now(tz=UTC) - case 4: - args[key] = [fdp.ConsumeUnicodeNoSurrogates(5) for _ in range(3)] - case 5: - args[key] = {"nested": fdp.ConsumeInt(2)} - case 6: - args[key] = fdp.ConsumeBool() - case 7: - # Numeric edge cases for NUMBER/CURRENCY selectors - args[key] = fdp.PickValueInList([0, 1, 2, 3, 5, 10, 100, 1000000]) - case 8: - # Float edge cases - args[key] = fdp.PickValueInList( - [0.0, -0.0, 1.5, float("inf"), float("-inf"), float("nan")] - ) - case 9: - # Decimal-like for precision testing - args[key] = fdp.ConsumeIntInRange(-99999, 99999) / 100 - - return args - - -def 
_fuzzed_function(args: list[Any], kwargs: dict[str, Any]) -> str: - """Mock custom function for FunctionRegistry testing.""" - return f"PROCESSED_{len(args)}_{len(kwargs)}" - - -def _add_random_resources(fdp: atheris.FuzzedDataProvider, bundle: FluentBundle) -> None: - """Add grammar-aware FTL resources to bundle. - - Constructs structurally valid FTL from fuzzed bytes so that libFuzzer - mutations map to meaningful grammar variations rather than random noise. - """ - ftl = _build_ftl_resource(fdp) - - with contextlib.suppress(Exception): - bundle.add_resource(ftl) - - # Optionally add a second resource (tests message dedup / last-wins behavior) - if fdp.ConsumeBool() and fdp.remaining_bytes() > 4: - ftl2 = _build_ftl_resource(fdp) - with contextlib.suppress(Exception): - bundle.add_resource(ftl2) - - -def _validate_bundle_message_lookup( - bundle: FluentBundle, - message_id: str, -) -> None: - """Validate FluentBundle.get_message() for one message identifier.""" - message = bundle.get_message(message_id) - if message is None: - return - - if not isinstance(message, Message): - msg = f"get_message({message_id!r}) returned {type(message).__name__}" - raise RuntimeIntegrityError(msg) - if message.id.name != message_id: - msg = f"get_message({message_id!r}) returned node named {message.id.name!r}" - raise RuntimeIntegrityError(msg) - if bundle.get_term(message_id) is not None: - msg = f"get_term({message_id!r}) crossed the message/term namespace boundary" - raise RuntimeIntegrityError(msg) - - declared_variables = bundle.get_message_variables(message_id) - validation = validate_message_variables(message, declared_variables) - if not validation.is_valid: - msg = f"validate_message_variables() rejected bundle message {message_id!r}" - raise RuntimeIntegrityError(msg) - if validation.declared_variables != declared_variables: - msg = ( - "validate_message_variables() changed declared variables for " - f"{message_id!r}: {validation.declared_variables!r} vs 
{declared_variables!r}" - ) - raise RuntimeIntegrityError(msg) - - -def _validate_bundle_term_lookup( - bundle: FluentBundle, - term_id: str, -) -> None: - """Validate FluentBundle.get_term() for one term identifier.""" - term = bundle.get_term(term_id) - if term is None: - return - - if not isinstance(term, Term): - msg = f"get_term({term_id!r}) returned {type(term).__name__}" - raise RuntimeIntegrityError(msg) - if term.id.name != term_id: - msg = f"get_term({term_id!r}) returned node named {term.id.name!r}" - raise RuntimeIntegrityError(msg) - if bundle.get_term(f"-{term_id}") is not None: - msg = f"get_term('-{term_id}') bypassed the no-leading-dash contract" - raise RuntimeIntegrityError(msg) - if bundle.get_message(term_id) is not None: - msg = f"get_message({term_id!r}) crossed the term/message namespace boundary" - raise RuntimeIntegrityError(msg) - - declared_variables = bundle.introspect_term(term_id).get_variable_names() - validation = validate_message_variables(term, declared_variables) - if not validation.is_valid: - msg = f"validate_message_variables() rejected bundle term {term_id!r}" - raise RuntimeIntegrityError(msg) - if validation.declared_variables != declared_variables: - msg = ( - "validate_message_variables() changed declared term variables for " - f"{term_id!r}: {validation.declared_variables!r} vs {declared_variables!r}" - ) - raise RuntimeIntegrityError(msg) - - -def _verify_ast_lookup_accessors(bundle: FluentBundle) -> None: - """Validate FluentBundle AST lookup accessors on the public facade.""" - _domain.ast_lookup_checks += 1 - - missing_id = "__missing_bundle_lookup__" - if bundle.get_message(missing_id) is not None: - msg = f"get_message({missing_id!r}) returned a node for a missing message" - raise RuntimeIntegrityError(msg) - if bundle.get_term(missing_id) is not None: - msg = f"get_term({missing_id!r}) returned a node for a missing term" - raise RuntimeIntegrityError(msg) - - for message_id in _IDENTIFIERS: - 
_validate_bundle_message_lookup(bundle, message_id) - - for term_id in _TERM_QUERY_IDS: - _validate_bundle_term_lookup(bundle, term_id) - - -def _execute_runtime_invariants( # noqa: PLR0912, PLR0915 - dispatch - fdp: atheris.FuzzedDataProvider, - bundle: FluentBundle, - args: ComplexArgs, - strict: bool, - enable_cache: bool, - cache_write_once: bool, -) -> None: - """Verify core runtime invariants across operations.""" - target_ids = list(TARGET_MESSAGE_IDS) - fdp_sample = fdp.ConsumeIntInRange(3, len(target_ids)) - sampled_ids = target_ids[:fdp_sample] - - for msg_id in sampled_ids: - attribute = fdp.PickValueInList([None, "title", "nonexistent"]) - try: - # Primary formatting - res1, err1 = bundle.format_pattern(msg_id, args, attribute=attribute) - - # INVARIANT: Strict Mode Integrity - if strict and len(err1) > 0: - _domain.strict_mode_tests += 1 - msg = f"Strict mode breach: {len(err1)} errors for '{msg_id}'." - raise RuntimeIntegrityError(msg) - - # INVARIANT: Frozen Error Integrity - for e in err1: - _domain.frozen_error_verifications += 1 - if not e.verify_integrity(): - msg = "FrozenFluentError checksum verification failed." - raise RuntimeIntegrityError(msg) - - # INVARIANT: Cache Stability - if enable_cache and bundle._cache is not None: - _domain.cache_operations += 1 - res2, err2 = bundle.format_pattern(msg_id, args, attribute=attribute) - _domain.cache_stability_checks += 1 - - if res1 != res2 or len(err1) != len(err2): - msg = f"Cache stability breach: non-deterministic result for '{msg_id}'." - raise RuntimeIntegrityError(msg) - - # Corruption simulation (5% chance) - if fdp.ConsumeProbability() < 0.05: - _domain.corruption_simulations += 1 - _simulate_corruption(bundle) - try: - bundle.format_pattern(msg_id, args, attribute=attribute) - except CacheCorruptionError as exc: - if not strict: - msg = "Non-strict cache raised CacheCorruptionError." 
- raise RuntimeIntegrityError(msg) from exc - except Exception as e: # pylint: disable=broad-exception-caught - is_corruption = "corruption" in str(e).lower() - if is_corruption and not isinstance(e, CacheCorruptionError): - msg = f"Wrong exception type for corruption: {type(e)}" - raise RuntimeIntegrityError(msg) from e - - except FormattingIntegrityError as e: - _domain.integrity_checks += 1 - if not strict: - msg = "Non-strict bundle raised FormattingIntegrityError." - raise RuntimeIntegrityError(msg) from e - if not e.fluent_errors: - msg = "FormattingIntegrityError empty." - raise RuntimeIntegrityError(msg) from e - - except WriteConflictError as e: - if not cache_write_once: - msg = "WriteConflictError raised when write_once=False." - raise RuntimeIntegrityError(msg) from e - - except (RecursionError, MemoryError, FrozenFluentError): - # FrozenFluentError: depth guard fires MAX_DEPTH_EXCEEDED as a safety - # mechanism regardless of strict mode to prevent stack overflow - pass - - -def _simulate_corruption(bundle: FluentBundle) -> None: - """Simulate cache corruption for integrity testing.""" - if bundle._cache is None: - return - with bundle._cache._lock: - if not bundle._cache._cache: - return - key = next(iter(bundle._cache._cache)) - entry = bundle._cache._cache[key] - - corrupted = IntegrityCacheEntry( - formatted=entry.formatted + "CORRUPTION", - errors=entry.errors, - checksum=entry.checksum, - created_at=entry.created_at, - sequence=entry.sequence, - key_hash=entry.key_hash, - ) - bundle._cache._cache[key] = corrupted - - -def _perform_security_fuzzing(fdp: atheris.FuzzedDataProvider) -> str: - """Perform security fuzzing with attack vectors.""" - _domain.security_tests += 1 - - attack_idx = fdp.ConsumeIntInRange(0, len(_SECURITY_SCHEDULE) - 1) - attack = _SECURITY_SCHEDULE[attack_idx] - - match attack: - case "security_recursion": - _test_deep_recursion(fdp) - case "security_memory": - _test_memory_exhaustion(fdp) - case "security_cache_poison": - 
_test_cache_poisoning(fdp) - case "security_function_inject": - _test_function_injection(fdp) - case "security_locale_boundary": - _test_locale_boundary(fdp) - case "security_expansion_budget": - _test_expansion_budget(fdp) - case "security_dag_expansion": - _test_dag_expansion(fdp) - case "security_dict_functions": - _test_dict_functions(fdp) - - return attack - - -def _test_deep_recursion(fdp: atheris.FuzzedDataProvider) -> None: - """Test deep recursion via nested placeables and cyclic references.""" - attack_type = fdp.ConsumeIntInRange(0, 2) - try: - bundle = FluentBundle("en", strict=False) - match attack_type: - case 0: - # Deep nested placeables - depth = fdp.ConsumeIntInRange(50, 200) - ftl = "msg = " + "{ " * depth + "$var" + " }" * depth + "\n" - bundle.add_resource(ftl) - case 1: - # Cyclic reference chain - chain_len = fdp.ConsumeIntInRange(2, 20) - parts = [] - for i in range(chain_len): - next_id = f"c{(i + 1) % chain_len}" - parts.append(f"c{i} = {{ {next_id} }}\n") - bundle.add_resource("\n".join(parts)) - case _: - # Self-referencing term with select - ftl = "-self = { -self ->\n *[other] { -self }\n}\nmsg = { -self }\n" - bundle.add_resource(ftl) - bundle.format_pattern("msg" if attack_type == 0 else "c0", {"var": "test"}) - except (RecursionError, MemoryError, ValueError, FrozenFluentError): - # FrozenFluentError: depth guard fires MAX_DEPTH_EXCEEDED regardless of strict mode - pass - - -def _test_memory_exhaustion(fdp: atheris.FuzzedDataProvider) -> None: - """Test memory exhaustion via large values and many variants.""" - attack_type = fdp.ConsumeIntInRange(0, 2) - try: - bundle = FluentBundle("en", strict=False) - match attack_type: - case 0: - # Large string value - size = fdp.ConsumeIntInRange(10000, 100000) - bundle.add_resource(f"msg = {'x' * size}\n") - case 1: - # Many variants in select - n = fdp.ConsumeIntInRange(50, 200) - variants = "\n".join(f" [{'*' if i == 0 else ''}v{i}] val{i}" for i in range(n)) - bundle.add_resource(f"msg = 
{{ $var ->\n{variants}\n}}\n") - case _: - # Many attributes - n = fdp.ConsumeIntInRange(50, 200) - attrs = "\n".join(f" .a{i} = val{i}" for i in range(n)) - bundle.add_resource(f"msg = val\n{attrs}\n") - bundle.format_pattern("msg", {"var": "test"}) - except (MemoryError, ValueError, FrozenFluentError): - pass - - -def _test_cache_poisoning(fdp: atheris.FuzzedDataProvider) -> None: - """Test cache poisoning attack.""" - try: - bundle = FluentBundle("en", cache=CacheConfig(), strict=False) - bundle.add_resource("msg = Hello { $name }\n") - - malicious_args = [ - {"name": float("inf")}, - {"name": float("-inf")}, - {"name": float("nan")}, - {"name": None}, - {"name": []}, - ] - - for args in malicious_args[: fdp.ConsumeIntInRange(1, len(malicious_args))]: - with contextlib.suppress(Exception): - bundle.format_pattern("msg", args) # type: ignore[arg-type] - - except Exception: # pylint: disable=broad-exception-caught - pass - - -def _test_function_injection(fdp: atheris.FuzzedDataProvider) -> None: - """Test function injection and recursive custom function attacks. 
- - Two sub-patterns: - 0 - No-op custom function (baseline injection) - 1 - Recursive custom function that calls back into bundle.format_pattern(), - testing GlobalDepthGuard cross-context recursion protection - """ - attack_variant = fdp.ConsumeIntInRange(0, 1) - try: - bundle = FluentBundle("en", strict=False) - - if attack_variant == 0: - # Baseline: no-op custom function - def noop_func(*_args: Any, **_kwargs: Any) -> str: - return "safe_output" - - bundle.add_function("INJECT", noop_func) - bundle.add_resource("msg = { INJECT() }\n") - bundle.format_pattern("msg", {}) - else: - # Recursive: custom function calls back into format_pattern, - # exercising GlobalDepthGuard across function boundaries - call_depth = fdp.ConsumeIntInRange(1, 10) - counter = {"n": 0} - - def recursive_func(*_args: Any, **_kwargs: Any) -> str: - counter["n"] += 1 - if counter["n"] < call_depth: - result, _ = bundle.format_pattern("recurse", {}) - return str(result) - return "base" - - bundle.add_function("RECURSE_FN", recursive_func) - bundle.add_resource("recurse = { RECURSE_FN() }\nmsg = { RECURSE_FN() }\n") - bundle.format_pattern("msg", {}) - - except Exception: # pylint: disable=broad-exception-caught - pass - - -def _assert_bundle_locale_accepts(raw_locale: str) -> None: - """Accepted constructor locales are canonicalized to LocaleCode form.""" - try: - bundle = FluentBundle(raw_locale, strict=False) - except Exception as err: # pylint: disable=broad-exception-caught - msg = f"FluentBundle rejected valid locale {raw_locale!r}: {err}" - raise RuntimeIntegrityError(msg) from err - - expected_locale = require_locale_code(raw_locale, "locale") - if bundle.locale != expected_locale: - msg = ( - f"FluentBundle stored the wrong canonical locale for {raw_locale!r}: " - f"{bundle.locale!r} vs {expected_locale!r}" - ) - raise RuntimeIntegrityError(msg) - - bundle.add_resource("msg = ready\n") - result, errors = bundle.format_pattern("msg", {}) - if result != "ready" or errors: - msg = ( - 
f"FluentBundle with accepted locale {expected_locale!r} " - f"failed basic formatting: result={result!r}, errors={errors!r}" - ) - raise RuntimeIntegrityError(msg) - - -def _assert_bundle_locale_rejected( - locale: object, - *, - expected_exception: type[ValueError | TypeError], - expected_fragment: str, -) -> None: - """Rejected constructor locales surface the canonical boundary error model.""" - locale_value: Any = locale - - try: - FluentBundle(locale_value, strict=False) - except Exception as err: # pylint: disable=broad-exception-caught - if not isinstance(err, expected_exception): - msg = ( - "FluentBundle raised the wrong locale-boundary exception for " - f"{locale!r}: {type(err).__name__}" - ) - raise RuntimeIntegrityError(msg) from err - if expected_fragment not in str(err): - msg = ( - "FluentBundle locale-boundary error message drifted for " - f"{locale!r}: {err}" - ) - raise RuntimeIntegrityError(msg) from err - return - - msg = f"FluentBundle accepted invalid locale {locale!r}" - raise RuntimeIntegrityError(msg) - - -def _test_locale_boundary(fdp: atheris.FuzzedDataProvider) -> None: - """Test the FluentBundle constructor locale boundary contract.""" - _domain.locale_boundary_checks += 1 - scenario = fdp.ConsumeIntInRange(0, 4) - boundary_locale = "a" + ("b" * (MAX_LOCALE_LENGTH_HARD_LIMIT - 2)) + "C" - - match scenario: - case 0: - raw_locale = fdp.PickValueInList( - [ - " EN-us ", - "\tpt-BR\n", - f" {boundary_locale} ", - ] - ) - _assert_bundle_locale_accepts(raw_locale) - case 1: - blank_locale = fdp.PickValueInList(["", " ", "\t\n", " \r\n "]) - _assert_bundle_locale_rejected( - blank_locale, - expected_exception=ValueError, - expected_fragment="locale cannot be blank", - ) - case 2: - invalid_locale = fdp.PickValueInList(list(_STRUCTURALLY_INVALID_LOCALES)) - _assert_bundle_locale_rejected( - invalid_locale, - expected_exception=ValueError, - expected_fragment="Invalid locale:", - ) - case 3: - overshoot = fdp.ConsumeIntInRange(1, 32) - 
overlong_locale = "a" * (MAX_LOCALE_LENGTH_HARD_LIMIT + overshoot) - _assert_bundle_locale_rejected( - overlong_locale, - expected_exception=ValueError, - expected_fragment="locale exceeds maximum length", - ) - case _: - non_string_locale = fdp.PickValueInList(list(_NON_STRING_LOCALES)) - _assert_bundle_locale_rejected( - non_string_locale, - expected_exception=TypeError, - expected_fragment="locale must be str", - ) - - -def _test_expansion_budget(fdp: atheris.FuzzedDataProvider) -> None: - """Test Billion Laughs expansion budget. - - Constructs exponentially expanding message references: - m0={m1}{m1}, m1={m2}{m2}, ... so small FTL produces huge output. - The expansion budget (max_expansion_size) should halt resolution. - """ - depth = fdp.ConsumeIntInRange(5, 20) - # Use both default and small budgets to exercise the guard path - budget = fdp.PickValueInList([100, 1000, 10000, None]) - try: - kwargs: dict[str, Any] = {"strict": False} - if budget is not None: - kwargs["max_expansion_size"] = budget - bundle = FluentBundle("en", **kwargs) - parts = [] - for i in range(depth): - parts.append(f"m{i} = {{ m{i + 1} }}{{ m{i + 1} }}\n") - parts.append(f"m{depth} = payload\n") - bundle.add_resource("\n".join(parts)) - bundle.format_pattern("m0", {}) - except (RecursionError, MemoryError, FrozenFluentError, ValueError): - pass - - -def _test_dag_expansion(fdp: atheris.FuzzedDataProvider) -> None: - """Test _make_hashable DAG expansion DoS. - - Constructs deeply shared references as cache args to stress the - node budget in IntegrityCache._make_hashable(). - """ - try: - bundle = FluentBundle("en", cache=CacheConfig(), strict=False) - bundle.add_resource("msg = Hello { $name }\n") - - # Build DAG: l = [l, l] repeated N times. - # Cap at 20: depth 20 creates 2^20 logical nodes which is sufficient - # to trigger _make_hashable node budget (10,000). Higher depths cause - # exponential str() expansion in the resolver (2^30 = 1B nodes). 
- depth = fdp.ConsumeIntInRange(10, 20) - dag: list[Any] = ["leaf"] - for _ in range(depth): - dag = [dag, dag] - - with contextlib.suppress(Exception): - bundle.format_pattern("msg", {"name": dag}) # type: ignore[arg-type] - - # Lock must still be usable after DAG rejection - with contextlib.suppress(Exception): - bundle.format_pattern("msg", {"name": "safe"}) - - except Exception: # pylint: disable=broad-exception-caught - pass - - -def _test_dict_functions(_fdp: atheris.FuzzedDataProvider) -> None: - """Test FluentBundle rejects dict as functions parameter. - - Passing a raw dict should raise TypeError at construction time. - """ - try: - FluentBundle("en", functions={"NUMBER": lambda *_a, **_k: "x"}) # type: ignore[arg-type] - # If we get here, the guard didn't fire -- that's a finding - msg = "FluentBundle accepted dict as functions parameter" - raise RuntimeIntegrityError(msg) - except TypeError: - pass # Expected - except RuntimeIntegrityError: - raise - except Exception: # pylint: disable=broad-exception-caught - pass - - -def _perform_differential_testing( - fdp: atheris.FuzzedDataProvider, - bundle: FluentBundle, - args: ComplexArgs, -) -> None: - """Differential testing: same FTL, different configs must not crash differently.""" - _domain.differential_tests += 1 - - alt_locale = fdp.PickValueInList(["en-US", "de-DE", "ar-EG", "ja-JP", "C", ""]) - alt_strict = not bundle.strict if fdp.ConsumeBool() else bundle.strict - alt_cache = not bundle.cache_enabled if fdp.ConsumeBool() else bundle.cache_enabled - - try: - alt_bundle = FluentBundle( - alt_locale, - strict=alt_strict, - cache=CacheConfig() if alt_cache else None, - ) - - # Copy functions - for name in bundle._function_registry: - func = bundle._function_registry.get_callable(name) - if func: - alt_bundle.add_function(name, func) - - # Same FTL resource - ftl = _build_ftl_resource(fdp) - with contextlib.suppress(Exception): - alt_bundle.add_resource(ftl) - - # Format all reachable messages - for 
msg_id in TARGET_MESSAGE_IDS[:8]: - with contextlib.suppress(Exception): - alt_bundle.format_pattern(msg_id, args) - - except Exception: # pylint: disable=broad-exception-caught - pass - - -def _run_concurrent_test( - fdp: atheris.FuzzedDataProvider, - bundle: FluentBundle, - args: ComplexArgs, - strict: bool, - enable_cache: bool, - cache_write_once: bool, -) -> None: - """Run concurrent execution test.""" - _domain.concurrent_tests += 1 - - barrier = threading.Barrier(2) - - def worker() -> None: - with contextlib.suppress(threading.BrokenBarrierError): - barrier.wait(timeout=1.0) - try: - _execute_runtime_invariants(fdp, bundle, args, strict, enable_cache, cache_write_once) - except CacheCorruptionError: - # Expected from corruption simulation in strict mode - pass - except (RecursionError, MemoryError, FrozenFluentError): - # FrozenFluentError: depth guard (MAX_DEPTH_EXCEEDED) - pass - - threads = [threading.Thread(target=worker) for _ in range(2)] - for t in threads: - t.start() - for t in threads: - t.join(timeout=3.0) - if t.is_alive(): - msg = "RWLock deadlock detected." 
- raise RuntimeIntegrityError(msg) - - -def test_one_input(data: bytes) -> None: # noqa: PLR0912, PLR0915 - dispatch - """Atheris entry point: Test runtime invariants and contracts.""" - # Initialize memory baseline - if _state.iterations == 0: - _state.initial_memory_mb = get_process().memory_info().rss / (1024 * 1024) - - _state.iterations += 1 - _state.status = "running" - - # Periodic checkpoint - if _state.iterations % _state.checkpoint_interval == 0: - _emit_checkpoint() - - start_time = time.perf_counter() - fdp = atheris.FuzzedDataProvider(data) - - scenario = select_pattern_round_robin(_state, _SCENARIO_SCHEDULE) - _state.pattern_coverage[scenario] = _state.pattern_coverage.get(scenario, 0) + 1 - - if fdp.remaining_bytes() < 2: - return - - # Security fuzzing (separate path) - if scenario == "security": - security_scenario = _perform_security_fuzzing(fdp) - _state.pattern_coverage[security_scenario] = ( - _state.pattern_coverage.get(security_scenario, 0) + 1 - ) - record_iteration_metrics(_state, scenario, start_time, data, is_interesting=True) - return - - # Configuration - strict = scenario == "strict_mode" or fdp.ConsumeBool() - enable_cache = scenario == "caching" or fdp.ConsumeBool() - use_isolating = fdp.ConsumeBool() - cache_write_once = fdp.ConsumeBool() - - # Locale selection - locale = fdp.PickValueInList(list(TEST_LOCALES)) - - try: - try: - cache_cfg = CacheConfig(write_once=cache_write_once) if enable_cache else None - bundle = FluentBundle( - locale, - strict=strict, - cache=cache_cfg, - use_isolating=use_isolating, - ) - if fdp.ConsumeBool(): - bundle.add_function("FUZZ_FUNC", _fuzzed_function) - except (ValueError, TypeError): - return - - # Add resources - _add_random_resources(fdp, bundle) - _verify_ast_lookup_accessors(bundle) - - # Generate args - args = _generate_complex_args(fdp) - - if strict: - _domain.strict_mode_tests += 1 - - # Execute based on scenario - if scenario == "concurrent": - _run_concurrent_test(fdp, bundle, args, 
strict, enable_cache, cache_write_once) - elif scenario == "differential": - _perform_differential_testing(fdp, bundle, args) - else: - _execute_runtime_invariants(fdp, bundle, args, strict, enable_cache, cache_write_once) - - except CacheCorruptionError: - if strict: - return # Expected - _state.findings += 1 - raise - - except RuntimeIntegrityError: - _state.findings += 1 - raise - - except Exception as e: # pylint: disable=broad-exception-caught - error_key = f"{type(e).__name__}_{str(e)[:30]}" - _state.error_counts[error_key] = _state.error_counts.get(error_key, 0) + 1 - - finally: - is_interesting = "security" in scenario or "integrity" in scenario or ( - (time.perf_counter() - start_time) * 1000 > 50.0 - ) - record_iteration_metrics( - _state, scenario, start_time, data, is_interesting=is_interesting, - ) - - # Break reference cycles in AST/error objects to prevent RSS growth - if _state.iterations % GC_INTERVAL == 0: - gc.collect() - - # Memory tracking (every 100 iterations) - if _state.iterations % 100 == 0: - record_memory(_state) - - -def main() -> None: - """Run the runtime fuzzer with CLI support.""" - parser = argparse.ArgumentParser( - description="Runtime end-to-end fuzzer using Atheris/libFuzzer", - epilog="All unrecognized arguments are passed to libFuzzer.", - ) - parser.add_argument( - "--checkpoint-interval", - type=int, - default=500, - help="Emit report every N iterations (default: 500)", - ) - parser.add_argument( - "--seed-corpus-size", - type=int, - default=500, - help="Maximum size of in-memory seed corpus (default: 500)", - ) - parser.add_argument( - "--recursion-limit", - type=int, - default=2000, - help="Python recursion limit (default: 2000)", - ) - - args, remaining = parser.parse_known_args() - _state.checkpoint_interval = args.checkpoint_interval - _state.seed_corpus_max_size = args.seed_corpus_size - sys.setrecursionlimit(args.recursion_limit) - - # Inject -rss_limit_mb default if not already specified. 
- # AST reference cycles can accumulate between gc passes; 4096 MB provides - # headroom while still catching true leaks before system OOM-kill. - if not any(arg.startswith("-rss_limit_mb") for arg in remaining): - remaining.append("-rss_limit_mb=4096") - - sys.argv = [sys.argv[0], *remaining] - - print_fuzzer_banner( - title="Runtime End-to-End Fuzzer (Atheris)", - target="FluentBundle, IntegrityCache, Resolver, Strict Mode", - state=_state, - schedule_len=len(_SCENARIO_SCHEDULE), - extra_lines=[f"Recursion: {args.recursion_limit} limit"], - ) - - run_fuzzer(_state, test_one_input=test_one_input) - +from fuzz_runtime_entry import main if __name__ == "__main__": main() diff --git a/fuzz_atheris/fuzz_runtime_builders.py b/fuzz_atheris/fuzz_runtime_builders.py new file mode 100644 index 00000000..147c2b07 --- /dev/null +++ b/fuzz_atheris/fuzz_runtime_builders.py @@ -0,0 +1,388 @@ +# mypy: disable-error-code=name-defined +from fuzz_runtime_support import ( + _CURRENCY_OPTS, + _DATETIME_OPTS, + _IDENTIFIERS, + _NUMBER_OPTS, + _SELECTOR_KEYS, + _TERM_IDENTIFIERS, + _TERM_QUERY_IDS, + _UNICODE_TEXTS, + _VAR_NAMES, + UTC, + Any, + ComplexArgs, + FluentBundle, + RuntimeIntegrityError, + _domain, + atheris, + contextlib, + datetime, + validate_message_variables, +) + + +def _build_expression( # noqa: PLR0911, PLR0912 - dispatch + fdp: atheris.FuzzedDataProvider, + depth: int = 0, +) -> str: + """Build a random FTL expression from fuzzed bytes. + + Maps byte values to grammar productions so mutations are meaningful. + High branch count mirrors FTL grammar production rules (10 expression types). 
+ """ + if depth > 3 or fdp.remaining_bytes() < 2: + return str(fdp.PickValueInList(list(_VAR_NAMES))) + + expr_type = fdp.ConsumeIntInRange(0, 9) + match expr_type: + case 0: + # Variable reference + return str(fdp.PickValueInList(list(_VAR_NAMES))) + case 1: + # String literal + return f'"{fdp.PickValueInList(list(_UNICODE_TEXTS))}"' + case 2: + # Number literal + num = fdp.ConsumeIntInRange(-9999, 9999) + if fdp.ConsumeBool(): + return str(num) + frac = fdp.ConsumeIntInRange(0, 99) + return f"{num}.{frac:02d}" + case 3: + # Message reference + ref_id = fdp.PickValueInList(list(_IDENTIFIERS)) + if fdp.ConsumeBool(): + return f"{{ {ref_id}.title }}" + return f"{{ {ref_id} }}" + case 4: + # Term reference + term_id = fdp.PickValueInList(list(_TERM_IDENTIFIERS)) + if fdp.ConsumeBool(): + return f"{{ {term_id} }}" + return f"{{ {term_id}.attr }}" + case 5: + # NUMBER() call + var = fdp.PickValueInList(list(_VAR_NAMES)) + opts = "" + if fdp.ConsumeBool() and fdp.remaining_bytes() > 1: + opts = ", " + fdp.PickValueInList(list(_NUMBER_OPTS)) + return f"{{ NUMBER({var}{opts}) }}" + case 6: + # DATETIME() call + var = fdp.PickValueInList(list(_VAR_NAMES)) + opts = "" + if fdp.ConsumeBool() and fdp.remaining_bytes() > 1: + opts = ", " + fdp.PickValueInList(list(_DATETIME_OPTS)) + return f"{{ DATETIME({var}{opts}) }}" + case 7: + # CURRENCY() call + var = fdp.PickValueInList(list(_VAR_NAMES)) + opts = ", " + fdp.PickValueInList(list(_CURRENCY_OPTS)) + if fdp.ConsumeBool() and fdp.remaining_bytes() > 1: + opts += ", " + fdp.PickValueInList(list(_CURRENCY_OPTS)) + return f"{{ CURRENCY({var}{opts}) }}" + case 8: + # Nested placeable + inner = _build_expression(fdp, depth + 1) + return f"{{ {inner} }}" + case 9: + # Custom function + var = fdp.PickValueInList(list(_VAR_NAMES)) + return f'{{ FUZZ_FUNC({var}, key: "val") }}' + + return str(fdp.PickValueInList(list(_VAR_NAMES))) + + +def _build_select_expression(fdp: atheris.FuzzedDataProvider) -> str: + """Build a select 
expression with plural/string keys.""" + var = fdp.PickValueInList(list(_VAR_NAMES)) + + # Selector: raw var, NUMBER(), or CURRENCY() + selector_type = fdp.ConsumeIntInRange(0, 2) + match selector_type: + case 0: + selector = var + case 1: + opts = "" + if fdp.ConsumeBool() and fdp.remaining_bytes() > 1: + opts = ", " + fdp.PickValueInList(list(_NUMBER_OPTS)) + selector = f"NUMBER({var}{opts})" + case _: + opts = ", " + fdp.PickValueInList(list(_CURRENCY_OPTS)) + selector = f"CURRENCY({var}{opts})" + + # Build variants + num_variants = fdp.ConsumeIntInRange(1, 5) + variants: list[str] = [] + default_idx = fdp.ConsumeIntInRange(0, num_variants - 1) + + for i in range(num_variants): + # Key: plural category or number literal + if fdp.ConsumeBool(): + key = fdp.PickValueInList(list(_SELECTOR_KEYS)) + else: + key = str(fdp.ConsumeIntInRange(0, 100)) + + value = _build_expression(fdp, depth=1) if fdp.ConsumeBool() else "value" + prefix = "*" if i == default_idx else "" + variants.append(f" [{prefix}{key}] {value}") + + body = "\n".join(variants) + return f"{{ {selector} ->\n{body}\n}}" + + +def _build_message(fdp: atheris.FuzzedDataProvider, msg_id: str) -> str: # noqa: PLR0912 - dispatch + """Build a complete FTL message entry.""" + if fdp.remaining_bytes() < 2: + return f"{msg_id} = fallback\n" + + msg_type = fdp.ConsumeIntInRange(0, 5) + match msg_type: + case 0: + # Simple value with expressions + parts: list[str] = [] + num_parts = fdp.ConsumeIntInRange(1, 3) + for _ in range(num_parts): + if fdp.ConsumeBool(): + parts.append(_build_expression(fdp)) + else: + parts.append(fdp.PickValueInList(list(_UNICODE_TEXTS))) + value = " ".join(parts) + msg = f"{msg_id} = {value}\n" + case 1: + # Select expression + sel = _build_select_expression(fdp) + msg = f"{msg_id} =\n {sel}\n" + case 2: + # Message with attributes + value = _build_expression(fdp) + attrs: list[str] = [] + num_attrs = fdp.ConsumeIntInRange(1, 3) + for j in range(num_attrs): + attr_val = 
_build_expression(fdp, depth=1) + attrs.append(f" .attr{j} = {attr_val}") + attr_block = "\n".join(attrs) + msg = f"{msg_id} = {value}\n{attr_block}\n" + case 3: + # Cyclic reference + target = fdp.PickValueInList(list(_IDENTIFIERS)) + msg = f"{msg_id} = {{ {target} }}\n" + case 4: + # Reference chain + target = fdp.PickValueInList(list(_IDENTIFIERS)) + if fdp.ConsumeBool(): + msg = f"{msg_id} = prefix {{ {target} }} suffix\n" + else: + msg = f"{msg_id} = {{ {target}.title }}\n" + case _: + # Deep nesting + nesting = fdp.ConsumeIntInRange(1, 8) + expr = fdp.PickValueInList(list(_VAR_NAMES)) + for _ in range(nesting): + expr = f"{{ {expr} }}" + msg = f"{msg_id} = {expr}\n" + + # Optionally add attributes even to non-attribute messages + if fdp.ConsumeBool() and fdp.remaining_bytes() > 2: + msg = msg.rstrip("\n") + f"\n .title = {_build_expression(fdp)}\n" + + return msg + + +def _build_term(fdp: atheris.FuzzedDataProvider) -> str: + """Build a term definition.""" + term_id = fdp.PickValueInList(list(_TERM_IDENTIFIERS)) + + if fdp.ConsumeBool(): + # Term with select + sel = _build_select_expression(fdp) + term = f"{term_id} =\n {sel}\n" + else: + value = _build_expression(fdp) + term = f"{term_id} = {value}\n" + + # Optional attributes + if fdp.ConsumeBool() and fdp.remaining_bytes() > 1: + attr_val = _build_expression(fdp, depth=1) + term = term.rstrip("\n") + f"\n .attr = {attr_val}\n" + + return term + + +def _build_ftl_resource(fdp: atheris.FuzzedDataProvider) -> str: + """Build a complete FTL resource from fuzzed bytes. + + Grammar-aware: each byte decision maps to a structural choice in the FTL + grammar, so libFuzzer coverage feedback drives exploration of new resolver + code paths rather than random noise. 
+ """ + parts: list[str] = [] + + # Always include some terms for term references to resolve + num_terms = fdp.ConsumeIntInRange(0, 3) + for _ in range(num_terms): + if fdp.remaining_bytes() < 2: + break + parts.append(_build_term(fdp)) + + # Build messages - use deterministic IDs so TARGET_MESSAGE_IDS can find them + ids_to_build = list(_IDENTIFIERS) + num_messages = fdp.ConsumeIntInRange(2, min(8, len(ids_to_build))) + for i in range(num_messages): + if fdp.remaining_bytes() < 2: + break + parts.append(_build_message(fdp, ids_to_build[i])) + + return "\n".join(parts) + + +def _generate_complex_args(fdp: atheris.FuzzedDataProvider) -> ComplexArgs: + """Generate fuzzed arguments matching grammar variable names. + + Uses the same variable names as _build_expression so that constructed + FTL messages can resolve their variable references. + """ + # Always provide the core variables so resolution paths are exercised + arg_keys = ("var", "name", "count", "amount", "date", "var_0", "var_1", "var_2", "var_3") + + args: ComplexArgs = {} + for key in arg_keys: + if fdp.remaining_bytes() < 2: + # Provide defaults for remaining keys + args[key] = 42 + continue + + val_type = fdp.ConsumeIntInRange(0, 9) + match val_type: + case 0: + args[key] = fdp.ConsumeUnicodeNoSurrogates(20) + case 1: + args[key] = fdp.ConsumeFloat() + case 2: + args[key] = fdp.ConsumeInt(4) + case 3: + args[key] = datetime.now(tz=UTC) + case 4: + args[key] = [fdp.ConsumeUnicodeNoSurrogates(5) for _ in range(3)] + case 5: + args[key] = {"nested": fdp.ConsumeInt(2)} + case 6: + args[key] = fdp.ConsumeBool() + case 7: + # Numeric edge cases for NUMBER/CURRENCY selectors + args[key] = fdp.PickValueInList([0, 1, 2, 3, 5, 10, 100, 1000000]) + case 8: + # Float edge cases + args[key] = fdp.PickValueInList( + [0.0, -0.0, 1.5, float("inf"), float("-inf"), float("nan")] + ) + case 9: + # Decimal-like for precision testing + args[key] = fdp.ConsumeIntInRange(-99999, 99999) / 100 + + return args + + +def 
_fuzzed_function(args: list[Any], kwargs: dict[str, Any]) -> str: + """Mock custom function for FunctionRegistry testing.""" + return f"PROCESSED_{len(args)}_{len(kwargs)}" + + +def _add_random_resources(fdp: atheris.FuzzedDataProvider, bundle: FluentBundle) -> None: + """Add grammar-aware FTL resources to bundle. + + Constructs structurally valid FTL from fuzzed bytes so that libFuzzer + mutations map to meaningful grammar variations rather than random noise. + """ + ftl = _build_ftl_resource(fdp) + + with contextlib.suppress(Exception): + bundle.add_resource(ftl) + + # Optionally add a second resource (tests message dedup / last-wins behavior) + if fdp.ConsumeBool() and fdp.remaining_bytes() > 4: + ftl2 = _build_ftl_resource(fdp) + with contextlib.suppress(Exception): + bundle.add_resource(ftl2) + + +def _validate_bundle_message_lookup( + bundle: FluentBundle, + message_id: str, +) -> None: + """Validate FluentBundle.get_message() for one message identifier.""" + message = bundle.get_message(message_id) + if message is None: + return + + if message.id.name != message_id: + msg = f"get_message({message_id!r}) returned node named {message.id.name!r}" + raise RuntimeIntegrityError(msg) + if bundle.get_term(message_id) is not None: + msg = f"get_term({message_id!r}) crossed the message/term namespace boundary" + raise RuntimeIntegrityError(msg) + + declared_variables = bundle.get_message_variables(message_id) + validation = validate_message_variables(message, declared_variables) + if not validation.is_valid: + msg = f"validate_message_variables() rejected bundle message {message_id!r}" + raise RuntimeIntegrityError(msg) + if validation.declared_variables != declared_variables: + msg = ( + "validate_message_variables() changed declared variables for " + f"{message_id!r}: {validation.declared_variables!r} vs {declared_variables!r}" + ) + raise RuntimeIntegrityError(msg) + + +def _validate_bundle_term_lookup( + bundle: FluentBundle, + term_id: str, +) -> None: + 
"""Validate FluentBundle.get_term() for one term identifier.""" + term = bundle.get_term(term_id) + if term is None: + return + + if term.id.name != term_id: + msg = f"get_term({term_id!r}) returned node named {term.id.name!r}" + raise RuntimeIntegrityError(msg) + if bundle.get_term(f"-{term_id}") is not None: + msg = f"get_term('-{term_id}') bypassed the no-leading-dash contract" + raise RuntimeIntegrityError(msg) + if bundle.get_message(term_id) is not None: + msg = f"get_message({term_id!r}) crossed the term/message namespace boundary" + raise RuntimeIntegrityError(msg) + + declared_variables = bundle.introspect_term(term_id).get_variable_names() + validation = validate_message_variables(term, declared_variables) + if not validation.is_valid: + msg = f"validate_message_variables() rejected bundle term {term_id!r}" + raise RuntimeIntegrityError(msg) + if validation.declared_variables != declared_variables: + msg = ( + "validate_message_variables() changed declared term variables for " + f"{term_id!r}: {validation.declared_variables!r} vs {declared_variables!r}" + ) + raise RuntimeIntegrityError(msg) + + +def _verify_ast_lookup_accessors(bundle: FluentBundle) -> None: + """Validate FluentBundle AST lookup accessors on the public facade.""" + _domain.ast_lookup_checks += 1 + + missing_id = "__missing_bundle_lookup__" + if bundle.get_message(missing_id) is not None: + msg = f"get_message({missing_id!r}) returned a node for a missing message" + raise RuntimeIntegrityError(msg) + if bundle.get_term(missing_id) is not None: + msg = f"get_term({missing_id!r}) returned a node for a missing term" + raise RuntimeIntegrityError(msg) + + for message_id in _IDENTIFIERS: + _validate_bundle_message_lookup(bundle, message_id) + + for term_id in _TERM_QUERY_IDS: + _validate_bundle_term_lookup(bundle, term_id) + diff --git a/fuzz_atheris/fuzz_runtime_entry.py b/fuzz_atheris/fuzz_runtime_entry.py new file mode 100644 index 00000000..a058f758 --- /dev/null +++ 
b/fuzz_atheris/fuzz_runtime_entry.py @@ -0,0 +1,270 @@ +# mypy: disable-error-code=name-defined +from fuzz_runtime_builders import ( + _add_random_resources, + _build_ftl_resource, + _fuzzed_function, + _generate_complex_args, + _verify_ast_lookup_accessors, +) +from fuzz_runtime_scenarios import ( + _execute_runtime_invariants, + _perform_security_fuzzing, +) +from fuzz_runtime_support import ( + _SCENARIO_SCHEDULE, + GC_INTERVAL, + TARGET_MESSAGE_IDS, + TEST_LOCALES, + CacheConfig, + CacheCorruptionError, + ComplexArgs, + FluentBundle, + FrozenFluentError, + RuntimeIntegrityError, + _domain, + _emit_checkpoint, + _state, + argparse, + atheris, + contextlib, + gc, + get_process, + print_fuzzer_banner, + record_iteration_metrics, + record_memory, + run_fuzzer, + select_pattern_round_robin, + sys, + threading, + time, +) + + +def _perform_differential_testing( + fdp: atheris.FuzzedDataProvider, + bundle: FluentBundle, + args: ComplexArgs, +) -> None: + """Differential testing: same FTL, different configs must not crash differently.""" + _domain.differential_tests += 1 + + alt_locale = fdp.PickValueInList(["en-US", "de-DE", "ar-EG", "ja-JP", "C", ""]) + alt_strict = not bundle.strict if fdp.ConsumeBool() else bundle.strict + alt_cache = not bundle.cache_enabled if fdp.ConsumeBool() else bundle.cache_enabled + + try: + alt_bundle = FluentBundle( + alt_locale, + strict=alt_strict, + cache=CacheConfig() if alt_cache else None, + ) + + # Copy functions + for name in bundle._function_registry: + func = bundle._function_registry.get_callable(name) + if func: + alt_bundle.add_function(name, func) + + # Same FTL resource + ftl = _build_ftl_resource(fdp) + with contextlib.suppress(Exception): + alt_bundle.add_resource(ftl) + + # Format all reachable messages + for msg_id in TARGET_MESSAGE_IDS[:8]: + with contextlib.suppress(Exception): + alt_bundle.format_pattern(msg_id, args) + + except Exception: # pylint: disable=broad-exception-caught + pass + + +def _run_concurrent_test( 
+ fdp: atheris.FuzzedDataProvider, + bundle: FluentBundle, + args: ComplexArgs, + strict: bool, + enable_cache: bool, + cache_write_once: bool, +) -> None: + """Run concurrent execution test.""" + _domain.concurrent_tests += 1 + + barrier = threading.Barrier(2) + + def worker() -> None: + with contextlib.suppress(threading.BrokenBarrierError): + barrier.wait(timeout=1.0) + try: + _execute_runtime_invariants(fdp, bundle, args, strict, enable_cache, cache_write_once) + except CacheCorruptionError: + # Expected from corruption simulation in strict mode + pass + except (RecursionError, MemoryError, FrozenFluentError): + # FrozenFluentError: depth guard (MAX_DEPTH_EXCEEDED) + pass + + threads = [threading.Thread(target=worker) for _ in range(2)] + for t in threads: + t.start() + for t in threads: + t.join(timeout=3.0) + if t.is_alive(): + msg = "RWLock deadlock detected." + raise RuntimeIntegrityError(msg) + + +def test_one_input(data: bytes) -> None: # noqa: PLR0912, PLR0915 - dispatch + """Atheris entry point: Test runtime invariants and contracts.""" + # Initialize memory baseline + if _state.iterations == 0: + _state.initial_memory_mb = get_process().memory_info().rss / (1024 * 1024) + + _state.iterations += 1 + _state.status = "running" + + # Periodic checkpoint + if _state.iterations % _state.checkpoint_interval == 0: + _emit_checkpoint() + + start_time = time.perf_counter() + fdp = atheris.FuzzedDataProvider(data) + + scenario = select_pattern_round_robin(_state, _SCENARIO_SCHEDULE) + _state.pattern_coverage[scenario] = _state.pattern_coverage.get(scenario, 0) + 1 + + if fdp.remaining_bytes() < 2: + return + + # Security fuzzing (separate path) + if scenario == "security": + security_scenario = _perform_security_fuzzing(fdp) + _state.pattern_coverage[security_scenario] = ( + _state.pattern_coverage.get(security_scenario, 0) + 1 + ) + record_iteration_metrics(_state, scenario, start_time, data, is_interesting=True) + return + + # Configuration + strict = scenario 
== "strict_mode" or fdp.ConsumeBool() + enable_cache = scenario == "caching" or fdp.ConsumeBool() + use_isolating = fdp.ConsumeBool() + cache_write_once = fdp.ConsumeBool() + + # Locale selection + locale = fdp.PickValueInList(list(TEST_LOCALES)) + + try: + try: + cache_cfg = CacheConfig(write_once=cache_write_once) if enable_cache else None + bundle = FluentBundle( + locale, + strict=strict, + cache=cache_cfg, + use_isolating=use_isolating, + ) + if fdp.ConsumeBool(): + bundle.add_function("FUZZ_FUNC", _fuzzed_function) + except (ValueError, TypeError): + return + + # Add resources + _add_random_resources(fdp, bundle) + _verify_ast_lookup_accessors(bundle) + + # Generate args + args = _generate_complex_args(fdp) + + if strict: + _domain.strict_mode_tests += 1 + + # Execute based on scenario + if scenario == "concurrent": + _run_concurrent_test(fdp, bundle, args, strict, enable_cache, cache_write_once) + elif scenario == "differential": + _perform_differential_testing(fdp, bundle, args) + else: + _execute_runtime_invariants(fdp, bundle, args, strict, enable_cache, cache_write_once) + + except CacheCorruptionError: + if strict: + return # Expected + _state.findings += 1 + raise + + except RuntimeIntegrityError: + _state.findings += 1 + raise + + except Exception as e: # pylint: disable=broad-exception-caught + error_key = f"{type(e).__name__}_{str(e)[:30]}" + _state.error_counts[error_key] = _state.error_counts.get(error_key, 0) + 1 + + finally: + is_interesting = "security" in scenario or "integrity" in scenario or ( + (time.perf_counter() - start_time) * 1000 > 50.0 + ) + record_iteration_metrics( + _state, scenario, start_time, data, is_interesting=is_interesting, + ) + + # Break reference cycles in AST/error objects to prevent RSS growth + if _state.iterations % GC_INTERVAL == 0: + gc.collect() + + # Memory tracking (every 100 iterations) + if _state.iterations % 100 == 0: + record_memory(_state) + + +def main() -> None: + """Run the runtime fuzzer with CLI 
support.""" + parser = argparse.ArgumentParser( + description="Runtime end-to-end fuzzer using Atheris/libFuzzer", + epilog="All unrecognized arguments are passed to libFuzzer.", + ) + parser.add_argument( + "--checkpoint-interval", + type=int, + default=500, + help="Emit report every N iterations (default: 500)", + ) + parser.add_argument( + "--seed-corpus-size", + type=int, + default=500, + help="Maximum size of in-memory seed corpus (default: 500)", + ) + parser.add_argument( + "--recursion-limit", + type=int, + default=2000, + help="Python recursion limit (default: 2000)", + ) + + args, remaining = parser.parse_known_args() + _state.checkpoint_interval = args.checkpoint_interval + _state.seed_corpus_max_size = args.seed_corpus_size + sys.setrecursionlimit(args.recursion_limit) + + # Inject -rss_limit_mb default if not already specified. + # AST reference cycles can accumulate between gc passes; 4096 MB provides + # headroom while still catching true leaks before system OOM-kill. + if not any(arg.startswith("-rss_limit_mb") for arg in remaining): + remaining.append("-rss_limit_mb=4096") + + sys.argv = [sys.argv[0], *remaining] + + print_fuzzer_banner( + title="Runtime End-to-End Fuzzer (Atheris)", + target="FluentBundle, IntegrityCache, Resolver, Strict Mode", + state=_state, + schedule_len=len(_SCENARIO_SCHEDULE), + extra_lines=[f"Recursion: {args.recursion_limit} limit"], + ) + + run_fuzzer(_state, test_one_input=test_one_input) + + +if __name__ == "__main__": + main() diff --git a/fuzz_atheris/fuzz_runtime_scenarios.py b/fuzz_atheris/fuzz_runtime_scenarios.py new file mode 100644 index 00000000..d0f4115f --- /dev/null +++ b/fuzz_atheris/fuzz_runtime_scenarios.py @@ -0,0 +1,447 @@ +# mypy: disable-error-code=name-defined +from fuzz_runtime_support import ( + _NON_STRING_LOCALES, + _SECURITY_SCHEDULE, + _STRUCTURALLY_INVALID_LOCALES, + MAX_LOCALE_LENGTH_HARD_LIMIT, + TARGET_MESSAGE_IDS, + Any, + CacheConfig, + CacheCorruptionError, + ComplexArgs, + 
FluentBundle, + FormattingIntegrityError, + FrozenFluentError, + IntegrityCacheEntry, + RuntimeIntegrityError, + WriteConflictError, + _domain, + atheris, + contextlib, + require_locale_code, +) + + +def _test_dict_functions(_fdp: atheris.FuzzedDataProvider) -> None: + """Test FluentBundle rejects dict as functions parameter.""" + try: + invalid_functions: Any = {"NUMBER": lambda *_args, **_kwargs: "x"} + FluentBundle("en", functions=invalid_functions) + msg = "FluentBundle accepted dict as functions parameter" + raise RuntimeIntegrityError(msg) + except TypeError: + pass + except RuntimeIntegrityError: + raise + except Exception: # pylint: disable=broad-exception-caught + pass + + +def _execute_runtime_invariants( # noqa: PLR0912, PLR0915 - dispatch + fdp: atheris.FuzzedDataProvider, + bundle: FluentBundle, + args: ComplexArgs, + strict: bool, + enable_cache: bool, + cache_write_once: bool, +) -> None: + """Verify core runtime invariants across operations.""" + target_ids = list(TARGET_MESSAGE_IDS) + fdp_sample = fdp.ConsumeIntInRange(3, len(target_ids)) + sampled_ids = target_ids[:fdp_sample] + + for msg_id in sampled_ids: + attribute = fdp.PickValueInList([None, "title", "nonexistent"]) + try: + # Primary formatting + res1, err1 = bundle.format_pattern(msg_id, args, attribute=attribute) + + # INVARIANT: Strict Mode Integrity + if strict and len(err1) > 0: + _domain.strict_mode_tests += 1 + msg = f"Strict mode breach: {len(err1)} errors for '{msg_id}'." + raise RuntimeIntegrityError(msg) + + # INVARIANT: Frozen Error Integrity + for e in err1: + _domain.frozen_error_verifications += 1 + if not e.verify_integrity(): + msg = "FrozenFluentError checksum verification failed." 
+ raise RuntimeIntegrityError(msg) + + # INVARIANT: Cache Stability + if enable_cache and bundle._cache is not None: + _domain.cache_operations += 1 + res2, err2 = bundle.format_pattern(msg_id, args, attribute=attribute) + _domain.cache_stability_checks += 1 + + if res1 != res2 or len(err1) != len(err2): + msg = f"Cache stability breach: non-deterministic result for '{msg_id}'." + raise RuntimeIntegrityError(msg) + + # Corruption simulation (5% chance) + if fdp.ConsumeProbability() < 0.05: + _domain.corruption_simulations += 1 + _simulate_corruption(bundle) + try: + bundle.format_pattern(msg_id, args, attribute=attribute) + except CacheCorruptionError as exc: + if not strict: + msg = "Non-strict cache raised CacheCorruptionError." + raise RuntimeIntegrityError(msg) from exc + except Exception as e: # pylint: disable=broad-exception-caught + is_corruption = "corruption" in str(e).lower() + if is_corruption and not isinstance(e, CacheCorruptionError): + msg = f"Wrong exception type for corruption: {type(e)}" + raise RuntimeIntegrityError(msg) from e + + except FormattingIntegrityError as e: + _domain.integrity_checks += 1 + if not strict: + msg = "Non-strict bundle raised FormattingIntegrityError." + raise RuntimeIntegrityError(msg) from e + if not e.fluent_errors: + msg = "FormattingIntegrityError empty." + raise RuntimeIntegrityError(msg) from e + + except WriteConflictError as e: + if not cache_write_once: + msg = "WriteConflictError raised when write_once=False." 
+ raise RuntimeIntegrityError(msg) from e + + except (RecursionError, MemoryError, FrozenFluentError): + # FrozenFluentError: depth guard fires MAX_DEPTH_EXCEEDED as a safety + # mechanism regardless of strict mode to prevent stack overflow + pass + + +def _simulate_corruption(bundle: FluentBundle) -> None: + """Simulate cache corruption for integrity testing.""" + if bundle._cache is None: + return + with bundle._cache._lock: + if not bundle._cache._cache: + return + key = next(iter(bundle._cache._cache)) + entry = bundle._cache._cache[key] + + corrupted = IntegrityCacheEntry( + formatted=entry.formatted + "CORRUPTION", + errors=entry.errors, + checksum=entry.checksum, + created_at=entry.created_at, + sequence=entry.sequence, + key_hash=entry.key_hash, + ) + bundle._cache._cache[key] = corrupted + + +def _perform_security_fuzzing(fdp: atheris.FuzzedDataProvider) -> str: + """Perform security fuzzing with attack vectors.""" + _domain.security_tests += 1 + + attack_idx = fdp.ConsumeIntInRange(0, len(_SECURITY_SCHEDULE) - 1) + attack = str(_SECURITY_SCHEDULE[attack_idx]) + + match attack: + case "security_recursion": + _test_deep_recursion(fdp) + case "security_memory": + _test_memory_exhaustion(fdp) + case "security_cache_poison": + _test_cache_poisoning(fdp) + case "security_function_inject": + _test_function_injection(fdp) + case "security_locale_boundary": + _test_locale_boundary(fdp) + case "security_expansion_budget": + _test_expansion_budget(fdp) + case "security_dag_expansion": + _test_dag_expansion(fdp) + case "security_dict_functions": + _test_dict_functions(fdp) + + return attack + + +def _test_deep_recursion(fdp: atheris.FuzzedDataProvider) -> None: + """Test deep recursion via nested placeables and cyclic references.""" + attack_type = fdp.ConsumeIntInRange(0, 2) + try: + bundle = FluentBundle("en", strict=False) + match attack_type: + case 0: + # Deep nested placeables + depth = fdp.ConsumeIntInRange(50, 200) + ftl = "msg = " + "{ " * depth + "$var" + " 
}" * depth + "\n" + bundle.add_resource(ftl) + case 1: + # Cyclic reference chain + chain_len = fdp.ConsumeIntInRange(2, 20) + parts = [] + for i in range(chain_len): + next_id = f"c{(i + 1) % chain_len}" + parts.append(f"c{i} = {{ {next_id} }}\n") + bundle.add_resource("\n".join(parts)) + case _: + # Self-referencing term with select + ftl = "-self = { -self ->\n *[other] { -self }\n}\nmsg = { -self }\n" + bundle.add_resource(ftl) + bundle.format_pattern("c0" if attack_type == 1 else "msg", {"var": "test"}) + except (RecursionError, MemoryError, ValueError, FrozenFluentError): + # FrozenFluentError: depth guard fires MAX_DEPTH_EXCEEDED regardless of strict mode + pass + + +def _test_memory_exhaustion(fdp: atheris.FuzzedDataProvider) -> None: + """Test memory exhaustion via large values and many variants.""" + attack_type = fdp.ConsumeIntInRange(0, 2) + try: + bundle = FluentBundle("en", strict=False) + match attack_type: + case 0: + # Large string value + size = fdp.ConsumeIntInRange(10000, 100000) + bundle.add_resource(f"msg = {'x' * size}\n") + case 1: + # Many variants in select + n = fdp.ConsumeIntInRange(50, 200) + variants = "\n".join(f" [{'*' if i == 0 else ''}v{i}] val{i}" for i in range(n)) + bundle.add_resource(f"msg = {{ $var ->\n{variants}\n}}\n") + case _: + # Many attributes + n = fdp.ConsumeIntInRange(50, 200) + attrs = "\n".join(f" .a{i} = val{i}" for i in range(n)) + bundle.add_resource(f"msg = val\n{attrs}\n") + bundle.format_pattern("msg", {"var": "test"}) + except (MemoryError, ValueError, FrozenFluentError): + pass + + +def _test_cache_poisoning(fdp: atheris.FuzzedDataProvider) -> None: + """Test cache poisoning attack.""" + try: + bundle = FluentBundle("en", cache=CacheConfig(), strict=False) + bundle.add_resource("msg = Hello { $name }\n") + + malicious_args = [ + {"name": float("inf")}, + {"name": float("-inf")}, + {"name": float("nan")}, + {"name": None}, + {"name": []}, + ] + + for args in malicious_args[: fdp.ConsumeIntInRange(1, 
len(malicious_args))]: + with contextlib.suppress(Exception): + unsafe_args: Any = args + bundle.format_pattern("msg", unsafe_args) + + except Exception: # pylint: disable=broad-exception-caught + pass + + +def _test_function_injection(fdp: atheris.FuzzedDataProvider) -> None: + """Test function injection and recursive custom function attacks. + + Two sub-patterns: + 0 - No-op custom function (baseline injection) + 1 - Recursive custom function that calls back into bundle.format_pattern(), + testing GlobalDepthGuard cross-context recursion protection + """ + attack_variant = fdp.ConsumeIntInRange(0, 1) + try: + bundle = FluentBundle("en", strict=False) + + if attack_variant == 0: + # Baseline: no-op custom function + def noop_func(*_args: Any, **_kwargs: Any) -> str: + return "safe_output" + + bundle.add_function("INJECT", noop_func) + bundle.add_resource("msg = { INJECT() }\n") + bundle.format_pattern("msg", {}) + else: + # Recursive: custom function calls back into format_pattern, + # exercising GlobalDepthGuard across function boundaries + call_depth = fdp.ConsumeIntInRange(1, 10) + counter = {"n": 0} + + def recursive_func(*_args: Any, **_kwargs: Any) -> str: + counter["n"] += 1 + if counter["n"] < call_depth: + result, _ = bundle.format_pattern("recurse", {}) + return str(result) + return "base" + + bundle.add_function("RECURSE_FN", recursive_func) + bundle.add_resource("recurse = { RECURSE_FN() }\nmsg = { RECURSE_FN() }\n") + bundle.format_pattern("msg", {}) + + except Exception: # pylint: disable=broad-exception-caught + pass + + +def _assert_bundle_locale_accepts(raw_locale: str) -> None: + """Accepted constructor locales are canonicalized to LocaleCode form.""" + try: + bundle = FluentBundle(raw_locale, strict=False) + except Exception as err: # pylint: disable=broad-exception-caught + msg = f"FluentBundle rejected valid locale {raw_locale!r}: {err}" + raise RuntimeIntegrityError(msg) from err + + expected_locale = require_locale_code(raw_locale, "locale") 
+ if bundle.locale != expected_locale: + msg = ( + f"FluentBundle stored the wrong canonical locale for {raw_locale!r}: " + f"{bundle.locale!r} vs {expected_locale!r}" + ) + raise RuntimeIntegrityError(msg) + + bundle.add_resource("msg = ready\n") + result, errors = bundle.format_pattern("msg", {}) + if result != "ready" or errors: + msg = ( + f"FluentBundle with accepted locale {expected_locale!r} " + f"failed basic formatting: result={result!r}, errors={errors!r}" + ) + raise RuntimeIntegrityError(msg) + + +def _assert_bundle_locale_rejected( + locale: object, + *, + expected_exception: type[ValueError | TypeError], + expected_fragment: str, +) -> None: + """Rejected constructor locales surface the canonical boundary error model.""" + locale_value: Any = locale + + try: + FluentBundle(locale_value, strict=False) + except Exception as err: # pylint: disable=broad-exception-caught + if not isinstance(err, expected_exception): + msg = ( + "FluentBundle raised the wrong locale-boundary exception for " + f"{locale!r}: {type(err).__name__}" + ) + raise RuntimeIntegrityError(msg) from err + if expected_fragment not in str(err): + msg = ( + "FluentBundle locale-boundary error message drifted for " + f"{locale!r}: {err}" + ) + raise RuntimeIntegrityError(msg) from err + return + + msg = f"FluentBundle accepted invalid locale {locale!r}" + raise RuntimeIntegrityError(msg) + + +def _test_locale_boundary(fdp: atheris.FuzzedDataProvider) -> None: + """Test the FluentBundle constructor locale boundary contract.""" + _domain.locale_boundary_checks += 1 + scenario = fdp.ConsumeIntInRange(0, 5) + boundary_locale = "a" + ("b" * (MAX_LOCALE_LENGTH_HARD_LIMIT - 2)) + "C" + + match scenario: + case 0: + raw_locale = fdp.PickValueInList( + [ + " EN-us ", + "\tpt-BR\n", + " lv-LV ", + ] + ) + _assert_bundle_locale_accepts(raw_locale) + case 1: + blank_locale = fdp.PickValueInList(["", " ", "\t\n", " \r\n "]) + _assert_bundle_locale_rejected( + blank_locale, + 
expected_exception=ValueError, + expected_fragment="locale cannot be blank", + ) + case 2: + invalid_locale = fdp.PickValueInList(list(_STRUCTURALLY_INVALID_LOCALES)) + _assert_bundle_locale_rejected( + invalid_locale, + expected_exception=ValueError, + expected_fragment="Invalid locale:", + ) + case 3: + _assert_bundle_locale_rejected( + f" {boundary_locale} ", + expected_exception=ValueError, + expected_fragment="Unknown locale identifier", + ) + case 4: + overshoot = fdp.ConsumeIntInRange(1, 32) + overlong_locale = "a" * (MAX_LOCALE_LENGTH_HARD_LIMIT + overshoot) + _assert_bundle_locale_rejected( + overlong_locale, + expected_exception=ValueError, + expected_fragment="locale exceeds maximum length", + ) + case _: + non_string_locale = fdp.PickValueInList(list(_NON_STRING_LOCALES)) + _assert_bundle_locale_rejected( + non_string_locale, + expected_exception=TypeError, + expected_fragment="locale must be str", + ) + + +def _test_expansion_budget(fdp: atheris.FuzzedDataProvider) -> None: + """Test Billion Laughs expansion budget. + + Constructs exponentially expanding message references: + m0={m1}{m1}, m1={m2}{m2}, ... so small FTL produces huge output. + The expansion budget (max_expansion_size) should halt resolution. + """ + depth = fdp.ConsumeIntInRange(5, 20) + # Use both default and small budgets to exercise the guard path + budget = fdp.PickValueInList([100, 1000, 10000, None]) + try: + kwargs: dict[str, Any] = {"strict": False} + if budget is not None: + kwargs["max_expansion_size"] = budget + bundle = FluentBundle("en", **kwargs) + parts = [] + for i in range(depth): + parts.append(f"m{i} = {{ m{i + 1} }}{{ m{i + 1} }}\n") + parts.append(f"m{depth} = payload\n") + bundle.add_resource("\n".join(parts)) + bundle.format_pattern("m0", {}) + except (RecursionError, MemoryError, FrozenFluentError, ValueError): + pass + + +def _test_dag_expansion(fdp: atheris.FuzzedDataProvider) -> None: + """Test _make_hashable DAG expansion DoS. 
+ + Constructs deeply shared references as cache args to stress the + node budget in IntegrityCache._make_hashable(). + """ + try: + bundle = FluentBundle("en", cache=CacheConfig(), strict=False) + bundle.add_resource("msg = Hello { $name }\n") + + # Build DAG: l = [l, l] repeated N times. + # Cap at 20: depth 20 creates 2^20 logical nodes which is sufficient + # to trigger _make_hashable node budget (10,000). Higher depths cause + # exponential str() expansion in the resolver (2^30 = 1B nodes). + depth = fdp.ConsumeIntInRange(10, 20) + dag: list[Any] = ["leaf"] + for _ in range(depth): + dag = [dag, dag] + + with contextlib.suppress(Exception): + unsafe_args: Any = {"name": dag} + bundle.format_pattern("msg", unsafe_args) + + # Lock must still be usable after DAG rejection + with contextlib.suppress(Exception): + bundle.format_pattern("msg", {"name": "safe"}) + + except Exception: # pylint: disable=broad-exception-caught + pass diff --git a/fuzz_atheris/fuzz_runtime_support.py b/fuzz_atheris/fuzz_runtime_support.py new file mode 100644 index 00000000..f234f492 --- /dev/null +++ b/fuzz_atheris/fuzz_runtime_support.py @@ -0,0 +1,400 @@ +"""Shared state, imports, and constants for the runtime Atheris fuzzer.""" + +from __future__ import annotations + +import argparse +import atexit +import contextlib +import gc +import logging +import pathlib +import sys +import threading +import time +from dataclasses import dataclass +from datetime import UTC, datetime +from typing import TYPE_CHECKING, Any, cast + +if TYPE_CHECKING: + from collections.abc import Sequence + +# --- Dependency Checks --- +_psutil_mod: Any = None +_atheris_mod: Any = None + +try: + import psutil as _psutil_import +except ImportError: + pass +else: + _psutil_mod = _psutil_import + +try: + import atheris as _atheris_import +except ImportError: + pass +else: + _atheris_mod = _atheris_import + +from fuzz_common import ( # noqa: E402 - after dependency capture # pylint: disable=C0413 + GC_INTERVAL, + 
BaseFuzzerState, + build_base_stats_dict, + build_weighted_schedule, + check_dependencies, + emit_checkpoint_report, + emit_final_report, + get_process, + print_fuzzer_banner, + record_iteration_metrics, + record_memory, + run_fuzzer, + select_pattern_round_robin, +) + +check_dependencies(["psutil", "atheris"], [_psutil_mod, _atheris_mod]) + +atheris = cast("Any", _atheris_mod) + +type ComplexArgs = dict[str, Any] + +@dataclass +class RuntimeMetrics: + """Domain-specific metrics for runtime fuzzer.""" + + strict_mode_tests: int = 0 + cache_operations: int = 0 + integrity_checks: int = 0 + security_tests: int = 0 + concurrent_tests: int = 0 + differential_tests: int = 0 + + # Contract validation + frozen_error_verifications: int = 0 + cache_stability_checks: int = 0 + corruption_simulations: int = 0 + ast_lookup_checks: int = 0 + locale_boundary_checks: int = 0 + +_state = BaseFuzzerState( + seed_corpus_max_size=500, + fuzzer_name="runtime", + fuzzer_target="FluentBundle, IntegrityCache, Resolver, Strict Mode, Locale Boundary", +) +_domain = RuntimeMetrics() + +TEST_LOCALES: Sequence[str] = ( + "en-US", + "en-GB", + "lv-LV", + "ar-EG", + "ar-SA", + "pl-PL", + "zh-CN", + "ja-JP", + "de-DE", + "fr-FR", + "", # Empty locale + "C", # POSIX + "root", # CLDR root +) + +_STRUCTURALLY_INVALID_LOCALES: Sequence[str] = ( + "en/US", + "en US", + "en@US", + "123_US", + "\x00\x01\x02", + "en-US" + "\x00" * 8, + "invalid!!", +) + +_NON_STRING_LOCALES: Sequence[object] = ( + None, + 0, + 1.5, + ["en-US"], + {"locale": "en-US"}, +) + +TARGET_MESSAGE_IDS: Sequence[str] = ( + "msg", + "msg2", + "msg3", + "ref", + "tref", + "attr", + "cyclic", + "deep", + "func_call", + "num_sel", + "str_sel", + "nested", + "chain_a", + "chain_b", + "chain_c", + "nonexistent", +) +_IDENTIFIERS: Sequence[str] = ( + "msg", + "msg2", + "msg3", + "ref", + "tref", + "attr", + "func_call", + "num_sel", + "str_sel", + "nested", + "chain_a", + "chain_b", + "chain_c", + "deep", +) + +_TERM_IDENTIFIERS: 
Sequence[str] = ( + "-brand", + "-term", + "-os", + "-platform", + "-greeting", +) +_TERM_QUERY_IDS: Sequence[str] = tuple( + term.removeprefix("-") for term in _TERM_IDENTIFIERS +) + +_VAR_NAMES: Sequence[str] = ( + "$var", + "$name", + "$count", + "$amount", + "$date", + "$var_0", + "$var_1", + "$var_2", + "$var_3", +) + +_BUILTIN_FUNCTIONS: Sequence[str] = ( + "NUMBER", + "DATETIME", + "CURRENCY", +) + +_NUMBER_OPTS: Sequence[str] = ( + "minimumFractionDigits: 0", + "minimumFractionDigits: 2", + "maximumFractionDigits: 0", + "maximumFractionDigits: 5", + 'useGrouping: "true"', + 'useGrouping: "false"', +) + +_DATETIME_OPTS: Sequence[str] = ( + 'dateStyle: "short"', + 'dateStyle: "medium"', + 'dateStyle: "long"', + 'dateStyle: "full"', + 'timeStyle: "short"', + 'timeStyle: "long"', +) + +_CURRENCY_OPTS: Sequence[str] = ( + 'currency: "USD"', + 'currency: "EUR"', + 'currency: "JPY"', + 'currency: "BHD"', + 'currencyDisplay: "symbol"', + 'currencyDisplay: "code"', + 'currencyDisplay: "name"', +) + +_SELECTOR_KEYS: Sequence[str] = ( + "one", + "two", + "few", + "many", + "other", + "zero", +) + +_UNICODE_TEXTS: Sequence[str] = ( + "Hello", + "© ® ™", + "😀 🌟 🚀", + "مرحبا عالم", + "c\u0308a\u0308f\u0308e\u0308", + "\u200b\u200e\u200f", + "边界条件", + "", +) + + +# Scenario weights: (name, weight) +_SCENARIO_WEIGHTS: tuple[tuple[str, int], ...] = ( + ("core_runtime", 40), + ("strict_mode", 20), + ("caching", 15), + ("security", 10), + ("concurrent", 10), + ("differential", 5), +) + +_SCENARIO_SCHEDULE: tuple[str, ...] = build_weighted_schedule( + [name for name, _ in _SCENARIO_WEIGHTS], + [weight for _, weight in _SCENARIO_WEIGHTS], +) + +# Register intended weights for skew detection +_state.pattern_intended_weights = {name: float(weight) for name, weight in _SCENARIO_WEIGHTS} + +# Security attack sub-schedule +_SECURITY_WEIGHTS: tuple[tuple[str, int], ...] 
= ( + ("security_recursion", 25), + ("security_memory", 20), + ("security_cache_poison", 15), + ("security_function_inject", 12), + ("security_locale_boundary", 8), + ("security_expansion_budget", 8), + ("security_dag_expansion", 7), + ("security_dict_functions", 5), +) + +_SECURITY_SCHEDULE: tuple[str, ...] = build_weighted_schedule( + [name for name, _ in _SECURITY_WEIGHTS], + [weight for _, weight in _SECURITY_WEIGHTS], +) + + +class RuntimeIntegrityError(Exception): + """Raised when a runtime invariant is breached.""" + + +# --- Reporting --- + +_REPORT_DIR = pathlib.Path(".fuzz_atheris_corpus") / "runtime" + + +def _build_stats_dict() -> dict[str, Any]: + """Build complete stats dictionary including domain metrics.""" + stats = cast( + "dict[str, Any]", + build_base_stats_dict( + _state, + coverage_key="scenarios_tested", + coverage_prefix="scenario_", + ), + ) + + # Domain-specific metrics + stats["strict_mode_tests"] = _domain.strict_mode_tests + stats["cache_operations"] = _domain.cache_operations + stats["integrity_checks"] = _domain.integrity_checks + stats["security_tests"] = _domain.security_tests + stats["concurrent_tests"] = _domain.concurrent_tests + stats["differential_tests"] = _domain.differential_tests + + # Contract validation metrics + stats["frozen_error_verifications"] = _domain.frozen_error_verifications + stats["cache_stability_checks"] = _domain.cache_stability_checks + stats["corruption_simulations"] = _domain.corruption_simulations + stats["ast_lookup_checks"] = _domain.ast_lookup_checks + stats["locale_boundary_checks"] = _domain.locale_boundary_checks + + return stats + + +_REPORT_FILENAME = "fuzz_runtime_report.json" + + +def _emit_checkpoint() -> None: + """Emit periodic checkpoint (uses checkpoint markers).""" + stats = _build_stats_dict() + emit_checkpoint_report( + _state, stats, _REPORT_DIR, _REPORT_FILENAME, + ) + + +def _emit_report() -> None: + """Emit comprehensive final report (crash-proof).""" + stats = _build_stats_dict() 
+ emit_final_report(_state, stats, _REPORT_DIR, _REPORT_FILENAME) + + +atexit.register(_emit_report) + +# --- Suppress logging and instrument imports --- +logging.getLogger("ftllexengine").setLevel(logging.CRITICAL) + +# Enable string and regex comparison instrumentation for better coverage +# of message ID lookups, selector key matching, and pattern-based parsing +atheris.enabled_hooks.add("str") +atheris.enabled_hooks.add("RegEx") + +with atheris.instrument_imports(include=["ftllexengine"]): + from ftllexengine import validate_message_variables + from ftllexengine.constants import MAX_LOCALE_LENGTH_HARD_LIMIT + from ftllexengine.core.locale_utils import require_locale_code + from ftllexengine.diagnostics.errors import FrozenFluentError + from ftllexengine.integrity import ( + CacheCorruptionError, + FormattingIntegrityError, + WriteConflictError, + ) + from ftllexengine.runtime.bundle import FluentBundle + from ftllexengine.runtime.cache import IntegrityCacheEntry + from ftllexengine.runtime.cache_config import CacheConfig + from ftllexengine.syntax import Message, Term + + +__all__ = [ + "GC_INTERVAL", + "MAX_LOCALE_LENGTH_HARD_LIMIT", + "TARGET_MESSAGE_IDS", + "TEST_LOCALES", + "UTC", + "_CURRENCY_OPTS", + "_DATETIME_OPTS", + "_IDENTIFIERS", + "_NON_STRING_LOCALES", + "_NUMBER_OPTS", + "_SCENARIO_SCHEDULE", + "_SECURITY_SCHEDULE", + "_SELECTOR_KEYS", + "_STRUCTURALLY_INVALID_LOCALES", + "_TERM_IDENTIFIERS", + "_TERM_QUERY_IDS", + "_UNICODE_TEXTS", + "_VAR_NAMES", + "Any", + "CacheConfig", + "CacheCorruptionError", + "ComplexArgs", + "FluentBundle", + "FormattingIntegrityError", + "FrozenFluentError", + "IntegrityCacheEntry", + "Message", + "RuntimeIntegrityError", + "Term", + "WriteConflictError", + "_domain", + "_emit_checkpoint", + "_state", + "argparse", + "atheris", + "contextlib", + "datetime", + "gc", + "get_process", + "print_fuzzer_banner", + "record_iteration_metrics", + "record_memory", + "require_locale_code", + "run_fuzzer", + 
"select_pattern_round_robin", + "sys", + "threading", + "time", + "validate_message_variables", +] diff --git a/fuzz_atheris/fuzz_scope.py b/fuzz_atheris/fuzz_scope.py index e6d478cb..9651bf43 100644 --- a/fuzz_atheris/fuzz_scope.py +++ b/fuzz_atheris/fuzz_scope.py @@ -1,9 +1,4 @@ #!/usr/bin/env python3 -# FUZZ_PLUGIN_HEADER_START -# FUZZ_PLUGIN: scope - Variable Shadowing & Scoping Invariants -# Intentional: This header is intentionally placed for dynamic plugin discovery. -# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH -# FUZZ_PLUGIN_HEADER_END """Variable Scope & Resolution Context Fuzzer (Atheris). Targets: ftllexengine.runtime.resolver (FluentResolver, ResolutionContext, diff --git a/fuzz_atheris/fuzz_serializer.py b/fuzz_atheris/fuzz_serializer.py index 11763f7f..7f28de1d 100644 --- a/fuzz_atheris/fuzz_serializer.py +++ b/fuzz_atheris/fuzz_serializer.py @@ -1,1104 +1,9 @@ #!/usr/bin/env python3 -# FUZZ_PLUGIN_HEADER_START -# FUZZ_PLUGIN: serializer - AST-construction serializer roundtrip -# Intentional: This header is intentionally placed for dynamic plugin discovery. -# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH -# FUZZ_PLUGIN_HEADER_END -"""AST-Construction Serializer Fuzzer (Atheris). - -Targets: ftllexengine.syntax.serializer.serialize, - ftllexengine.syntax.parser.FluentParserV1, - ftllexengine.syntax.visitor.ASTVisitor / ASTTransformer - -Concern boundary: This fuzzer programmatically constructs AST nodes -(bypassing the parser) and feeds them to the serializer. This is the -ONLY Atheris fuzzer that can produce AST states the parser would never -emit -- e.g. TextElement values with leading whitespace, syntax characters -in pattern-initial positions, empty patterns, or structurally valid but -semantically unusual combinations. 
- -This directly addresses the blind spot where text-based fuzzers -(fuzz_roundtrip, fuzz_structured) start from the parser, which normalizes -inputs before the serializer ever sees them. - -The same AST-construction model is also ideal for visitor/transformer -coverage because it can construct trees and transformation results that -ordinary parser-driven fuzzers do not reach. - -Invariant: -- serialize(ast) must produce valid FTL (no Junk on reparse) -- Idempotence: serialize(parse(serialize(ast))) == serialize(ast) - -Pattern Routing: -Deterministic round-robin from a weighted schedule (same infrastructure -as fuzz_roundtrip). Pattern selection is independent of fuzzed bytes -to avoid coverage-guided mutation bias. - -Custom Mutator: -AST-level mutations applied to programmatically constructed ASTs: -inject leading/trailing whitespace, syntax characters, empty patterns, -deeply nested placeables. Byte-level mutation applied on top. - -Finding Artifacts: -Convergence failures write source/S1/S2/metadata to -.fuzz_atheris_corpus/serializer/findings/ for standalone reproduction. - -Requires Python 3.13+ (uses PEP 695 type aliases). 
-""" +"""AST-construction serializer Atheris entry wrapper.""" from __future__ import annotations -import argparse -import atexit -import gc -import logging -import pathlib -import random -import sys -import time -from dataclasses import dataclass -from dataclasses import replace as dc_replace -from typing import Any, cast - -# --- Dependency Checks --- -_psutil_mod: Any = None -_atheris_mod: Any = None - -try: # noqa: SIM105 - need module ref for check_dependencies - import psutil as _psutil_mod # type: ignore[no-redef] -except ImportError: - pass - -try: # noqa: SIM105 - need module ref for check_dependencies - import atheris as _atheris_mod # type: ignore[no-redef] -except ImportError: - pass - -from fuzz_common import ( # noqa: E402 - after dependency capture # pylint: disable=C0413 - GC_INTERVAL, - BaseFuzzerState, - build_base_stats_dict, - build_weighted_schedule, - check_dependencies, - emit_final_report, - gen_ftl_identifier, - gen_ftl_value, - get_process, - print_fuzzer_banner, - record_iteration_metrics, - record_memory, - run_fuzzer, - select_pattern_round_robin, - write_finding_artifact, -) - -check_dependencies(["psutil", "atheris"], [_psutil_mod, _atheris_mod]) - -import atheris # noqa: E402 # pylint: disable=C0412,C0413 - -# --- Domain Metrics --- - - -@dataclass -class SerializerMetrics: - """Domain-specific metrics for AST-construction serializer fuzzer.""" - - ast_construction_failures: int = 0 - convergence_failures: int = 0 - junk_on_reparse: int = 0 - validation_errors: int = 0 - visitor_runs: int = 0 - transformer_runs: int = 0 - - -# --- Global State --- - -_state = BaseFuzzerState( - seed_corpus_max_size=100, - fuzzer_name="serializer", - fuzzer_target="serialize (AST-constructed), FluentParserV1", -) -_domain = SerializerMetrics() - - -# Pattern weights: (name, weight) -_PATTERN_WEIGHTS: tuple[tuple[str, int], ...] 
= ( - ("leading_whitespace", 18), - ("trailing_whitespace", 8), - ("syntax_chars_value", 15), - ("simple_message", 8), - ("string_literal_placeable", 10), - ("attribute_edge_cases", 12), - ("term_edge_cases", 8), - ("select_expression", 8), - ("mixed_elements", 8), - ("multiline_value", 5), - ("visitor_dispatch", 8), - ("transformer_roundtrip", 8), - ("transformer_validation", 6), -) - -_PATTERN_SCHEDULE: tuple[str, ...] = build_weighted_schedule( - [name for name, _ in _PATTERN_WEIGHTS], - [weight for _, weight in _PATTERN_WEIGHTS], -) - -# Register intended weights for skew detection -_state.pattern_intended_weights = { - name: float(weight) for name, weight in _PATTERN_WEIGHTS -} - - -class SerializerFuzzError(Exception): - """Raised when a serializer roundtrip invariant is breached.""" - - -# Allowed exceptions from parser/serializer -ALLOWED_EXCEPTIONS = ( - ValueError, - TypeError, - RecursionError, - MemoryError, - UnicodeDecodeError, - UnicodeEncodeError, -) - - -# --- Reporting --- - -_REPORT_DIR = pathlib.Path(".fuzz_atheris_corpus") / "serializer" - - -def _build_stats_dict() -> dict[str, Any]: - """Build complete stats dictionary including domain metrics.""" - stats = build_base_stats_dict(_state) - - stats["ast_construction_failures"] = _domain.ast_construction_failures - stats["convergence_failures"] = _domain.convergence_failures - stats["junk_on_reparse"] = _domain.junk_on_reparse - stats["validation_errors"] = _domain.validation_errors - stats["visitor_runs"] = _domain.visitor_runs - stats["transformer_runs"] = _domain.transformer_runs - - return stats - - -def _emit_report() -> None: - """Emit comprehensive final report (crash-proof).""" - emit_final_report( - _state, _build_stats_dict(), _REPORT_DIR, - "fuzz_serializer_report.json", - ) - - -atexit.register(_emit_report) - - -# --- Finding Artifacts --- - -_FINDINGS_DIR = _REPORT_DIR / "findings" - - -# --- Instrumentation & Parser --- - 
-logging.getLogger("ftllexengine").setLevel(logging.CRITICAL) - -atheris.enabled_hooks.add("str") -# RegEx hook omitted: serializer fuzzer constructs ASTs programmatically, -# no regex in the hot path. The hook triggers spurious Atheris errors on -# transitively imported stdlib regex patterns (e.g., email.charset). - -with atheris.instrument_imports(include=["ftllexengine"]): - from ftllexengine.syntax.ast import ( - Attribute, - Identifier, - Junk, - Message, - NumberLiteral, - Pattern, - Placeable, - Resource, - SelectExpression, - StringLiteral, - Term, - TermReference, - TextElement, - VariableReference, - Variant, - ) - from ftllexengine.syntax.parser import FluentParserV1 - from ftllexengine.syntax.serializer import serialize - from ftllexengine.syntax.visitor import ASTTransformer, ASTVisitor - -_parser = FluentParserV1() - - -# --- AST Construction Helpers --- - -# Characters that are syntactically significant in FTL pattern positions -_FTL_SYNTAX_CHARS = "{}.#*[" - - -def _mk_id(fdp: atheris.FuzzedDataProvider) -> Identifier: - """Construct an Identifier AST node from fuzzed bytes.""" - return Identifier(name=gen_ftl_identifier(fdp)) - - -def _mk_pattern(text: str) -> Pattern: - """Construct a single-element text Pattern.""" - return Pattern(elements=(TextElement(value=text),)) - - -def _mk_attr( - fdp: atheris.FuzzedDataProvider, - value_text: str, -) -> Attribute: - """Construct an Attribute with the given value text.""" - return Attribute(id=_mk_id(fdp), value=_mk_pattern(value_text)) - - -def _mk_message( - fdp: atheris.FuzzedDataProvider, - *, - value: Pattern | None = None, - attributes: tuple[Attribute, ...] = (), -) -> Message: - """Construct a Message AST node.""" - return Message(id=_mk_id(fdp), value=value, attributes=attributes) - - -def _mk_term( - fdp: atheris.FuzzedDataProvider, - *, - value: Pattern, - attributes: tuple[Attribute, ...] 
= (), -) -> Term: - """Construct a Term AST node.""" - return Term(id=_mk_id(fdp), value=value, attributes=attributes) - - -def _mk_nonempty_value( - fdp: atheris.FuzzedDataProvider, - *, - max_length: int = 24, -) -> str: - """Generate a non-empty FTL-safe text fragment.""" - value = gen_ftl_value(fdp, max_length=max_length) - return value or "value" - - -def _build_visitor_resource(fdp: atheris.FuzzedDataProvider) -> Resource: - """Construct a small but structurally rich AST for visitor coverage.""" - selector = VariableReference(id=Identifier(name="count")) - variants = ( - Variant( - key=Identifier(name="one"), - value=_mk_pattern(_mk_nonempty_value(fdp)), - ), - Variant( - key=Identifier(name="other"), - value=_mk_pattern(_mk_nonempty_value(fdp)), - default=True, - ), - ) - select = SelectExpression(selector=selector, variants=variants) - message = Message( - id=Identifier(name=f"msg-{gen_ftl_identifier(fdp)}"), - value=Pattern( - elements=( - TextElement(value=_mk_nonempty_value(fdp)), - Placeable(expression=select), - ), - ), - attributes=( - Attribute( - id=Identifier(name="label"), - value=_mk_pattern(_mk_nonempty_value(fdp)), - ), - ), - ) - term = Term( - id=Identifier(name=f"term-{gen_ftl_identifier(fdp)}"), - value=Pattern( - elements=( - TextElement(value=_mk_nonempty_value(fdp)), - Placeable( - expression=StringLiteral(value=_mk_nonempty_value(fdp)), - ), - ), - ), - attributes=(), - ) - return Resource(entries=(message, term)) - - -# --- Roundtrip Verification --- - - -def _verify_serializer_roundtrip( - ast: Resource, - pattern: str, -) -> None: - """Verify serialize(ast) -> parse -> serialize convergence. - - Steps: - 1. Serialize constructed AST -> S1 - 2. Parse S1 -> AST2 - 3. Check no Junk entries - 4. Serialize AST2 -> S2 - 5. Assert S1 == S2 (idempotence) - - On failure, writes finding artifacts before raising. 
- """ - s1 = serialize(ast, validate=False) - - ast2 = _parser.parse(s1) - - if any(isinstance(e, Junk) for e in ast2.entries): - _domain.junk_on_reparse += 1 - write_finding_artifact( - findings_dir=_FINDINGS_DIR, state=_state, - source=f"[AST-constructed: {pattern}]", s1=s1, s2="", - pattern=pattern, - extra_meta={"failure_type": "junk_on_reparse"}, - ) - msg = ( - f"Serialized AST produced Junk on re-parse.\n" - f"Pattern: {pattern}\n" - f"S1 ({len(s1)} chars): {s1[:200]!r}" - ) - raise SerializerFuzzError(msg) - - s2 = serialize(ast2) - - if s1 != s2: - _domain.convergence_failures += 1 - write_finding_artifact( - findings_dir=_FINDINGS_DIR, state=_state, - source=f"[AST-constructed: {pattern}]", s1=s1, s2=s2, - pattern=pattern, - extra_meta={"failure_type": "convergence_failure"}, - ) - msg = ( - f"Convergence failure: S(AST) != S(P(S(AST)))\n" - f"Pattern: {pattern}\n" - f"S1 ({len(s1)} chars): {s1[:200]!r}\n" - f"S2 ({len(s2)} chars): {s2[:200]!r}" - ) - raise SerializerFuzzError(msg) - - -# --- Pattern Implementations --- - - -def _pattern_leading_whitespace( - fdp: atheris.FuzzedDataProvider, - pattern: str, -) -> None: - """TextElement values with leading whitespace. - - Targets BUG-SERIALIZER-LEADING-WS-001: the parser consumes post-= - whitespace as syntax, so leading spaces in TextElement values must - be wrapped in StringLiteral placeables by the serializer. 
- """ - num_spaces = fdp.ConsumeIntInRange(1, 8) - base_value = gen_ftl_value(fdp) - value_text = " " * num_spaces + base_value - - # Message with leading-whitespace value - msg = _mk_message(fdp, value=_mk_pattern(value_text)) - _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern) - - # Attribute with leading-whitespace value - attr = _mk_attr(fdp, value_text) - msg2 = _mk_message( - fdp, - value=_mk_pattern(gen_ftl_value(fdp)), - attributes=(attr,), - ) - _verify_serializer_roundtrip(Resource(entries=(msg2,)), pattern) - - -def _pattern_trailing_whitespace( - fdp: atheris.FuzzedDataProvider, - pattern: str, -) -> None: - """TextElement values with trailing whitespace.""" - num_spaces = fdp.ConsumeIntInRange(1, 8) - base_value = gen_ftl_value(fdp) - value_text = base_value + " " * num_spaces - - msg = _mk_message(fdp, value=_mk_pattern(value_text)) - _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern) - - -def _pattern_syntax_chars_value( - fdp: atheris.FuzzedDataProvider, - pattern: str, -) -> None: - """TextElement values containing FTL syntax characters. - - Tests that the serializer correctly escapes or wraps braces, - dots, hash, asterisk, and brackets at various positions. 
- """ - base_value = gen_ftl_value(fdp) - char = _FTL_SYNTAX_CHARS[ - fdp.ConsumeIntInRange(0, len(_FTL_SYNTAX_CHARS) - 1) - ] - pos = fdp.ConsumeIntInRange(0, len(base_value)) - value_text = base_value[:pos] + char + base_value[pos:] - - msg = _mk_message(fdp, value=_mk_pattern(value_text)) - _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern) - - # Also test in attribute value - attr = _mk_attr(fdp, value_text) - msg2 = _mk_message( - fdp, - value=_mk_pattern(gen_ftl_value(fdp)), - attributes=(attr,), - ) - _verify_serializer_roundtrip(Resource(entries=(msg2,)), pattern) - - -def _pattern_simple_message( - fdp: atheris.FuzzedDataProvider, - pattern: str, -) -> None: - """Baseline: simple message with clean text value.""" - value = gen_ftl_value(fdp) - msg = _mk_message(fdp, value=_mk_pattern(value)) - _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern) - - -def _pattern_string_literal_placeable( - fdp: atheris.FuzzedDataProvider, - pattern: str, -) -> None: - """Patterns with StringLiteral placeables containing edge-case content.""" - literal_value = gen_ftl_value(fdp, max_length=20) - - # Optionally inject special content - special = fdp.ConsumeIntInRange(0, 3) - match special: - case 0: - literal_value = " " * fdp.ConsumeIntInRange(1, 5) - case 1: - literal_value = "\\" + literal_value - case 2: - literal_value = '"' + literal_value + '"' - - placeable = Placeable(expression=StringLiteral(value=literal_value)) - text_before = TextElement(value=gen_ftl_value(fdp, max_length=15)) - elements: tuple[TextElement | Placeable, ...] 
= (text_before, placeable)
-    pat = Pattern(elements=elements)
-
-    msg = _mk_message(fdp, value=pat)
-    _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern)
-
-
-def _pattern_attribute_edge_cases(
-    fdp: atheris.FuzzedDataProvider,
-    pattern: str,
-) -> None:
-    """Attributes with edge-case values: leading/trailing spaces, syntax chars."""
-    num_attrs = fdp.ConsumeIntInRange(1, 4)
-    attrs: list[Attribute] = []
-    for _ in range(num_attrs):
-        edge_type = fdp.ConsumeIntInRange(0, 3)
-        base = gen_ftl_value(fdp)
-        match edge_type:
-            case 0:
-                val = " " * fdp.ConsumeIntInRange(1, 5) + base
-            case 1:
-                val = base + " " * fdp.ConsumeIntInRange(1, 5)
-            case 2:
-                ch = _FTL_SYNTAX_CHARS[
-                    fdp.ConsumeIntInRange(
-                        0, len(_FTL_SYNTAX_CHARS) - 1,
-                    )
-                ]
-                val = ch + base
-            case _:
-                val = base
-        attrs.append(_mk_attr(fdp, val))
-
-    msg = _mk_message(
-        fdp,
-        value=_mk_pattern(gen_ftl_value(fdp)),
-        attributes=tuple(attrs),
-    )
-    _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern)
-
-
-def _pattern_term_edge_cases(
-    fdp: atheris.FuzzedDataProvider,
-    pattern: str,
-) -> None:
-    """Terms with edge-case attribute and value content."""
-    num_spaces = fdp.ConsumeIntInRange(0, 5)
-    base = gen_ftl_value(fdp)
-    value_text = " " * num_spaces + base if num_spaces > 0 else base
-
-    attrs: tuple[Attribute, ...] = ()
-    if fdp.ConsumeBool():
-        attr_val = " " * fdp.ConsumeIntInRange(1, 3) + gen_ftl_value(fdp)
-        attrs = (_mk_attr(fdp, attr_val),)
-
-    term = _mk_term(fdp, value=_mk_pattern(value_text), attributes=attrs)
-    _verify_serializer_roundtrip(Resource(entries=(term,)), pattern)
-
-
-def _pattern_select_expression(
-    fdp: atheris.FuzzedDataProvider,
-    pattern: str,
-) -> None:
-    """Select expressions constructed from AST nodes."""
-    var_id = _mk_id(fdp)
-    selector = VariableReference(id=var_id)
-
-    num_variants = fdp.ConsumeIntInRange(1, 4)
-    variants: list[Variant] = []
-    for _ in range(num_variants):
-        key_is_number = fdp.ConsumeBool()
-        if key_is_number:
-            num = fdp.ConsumeIntInRange(0, 99)
-            key: Identifier | NumberLiteral = NumberLiteral(
-                value=num, raw=str(num),
-            )
-        else:
-            key = _mk_id(fdp)
-
-        val = gen_ftl_value(fdp)
-        # Optionally add leading whitespace to variant value
-        if fdp.ConsumeBool():
-            val = " " + val
-        variants.append(Variant(key=key, value=_mk_pattern(val)))
-
-    # Ensure exactly one default
-    variants.append(
-        Variant(
-            key=Identifier(name="other"),
-            value=_mk_pattern(gen_ftl_value(fdp)),
-            default=True,
-        ),
-    )
-
-    sel = SelectExpression(
-        selector=selector, variants=tuple(variants),
-    )
-    pat = Pattern(elements=(Placeable(expression=sel),))
-    msg = _mk_message(fdp, value=pat)
-    _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern)
-
-
-def _pattern_mixed_elements(
-    fdp: atheris.FuzzedDataProvider,
-    pattern: str,
-) -> None:
-    """Patterns with interleaved TextElement and Placeable nodes."""
-    num_elements = fdp.ConsumeIntInRange(2, 6)
-    elements: list[TextElement | Placeable] = []
-
-    for _ in range(num_elements):
-        is_placeable = fdp.ConsumeBool()
-        if is_placeable:
-            expr_type = fdp.ConsumeIntInRange(0, 2)
-            match expr_type:
-                case 0:
-                    expr: Any = StringLiteral(
-                        value=gen_ftl_value(fdp, max_length=10),
-                    )
-                case 1:
-                    expr = VariableReference(id=_mk_id(fdp))
-                case _:
-                    expr = TermReference(id=_mk_id(fdp))
-            elements.append(Placeable(expression=expr))
-        else:
-            val = gen_ftl_value(fdp, max_length=15)
-            # Optionally inject leading space
-            if fdp.ConsumeBool() and elements:
-                val = " " + val
-            elements.append(TextElement(value=val))
-
-    pat = Pattern(elements=tuple(elements))
-    msg = _mk_message(fdp, value=pat)
-    _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern)
-
-
-def _pattern_multiline_value(
-    fdp: atheris.FuzzedDataProvider,
-    pattern: str,
-) -> None:
-    """Multi-line TextElement values with newlines and indentation."""
-    num_lines = fdp.ConsumeIntInRange(2, 5)
-    lines: list[str] = []
-    for _ in range(num_lines):
-        line = gen_ftl_value(fdp, max_length=30)
-        # Optionally add leading spaces
-        if fdp.ConsumeBool():
-            line = " " * fdp.ConsumeIntInRange(1, 4) + line
-        lines.append(line)
-
-    # Join with newlines and indentation (4 spaces for FTL continuation)
-    value_text = ("\n    ").join(lines)
-    msg = _mk_message(fdp, value=_mk_pattern(value_text))
-    _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern)
-
-
-def _pattern_visitor_dispatch(
-    fdp: atheris.FuzzedDataProvider,
-    pattern: str,
-) -> None:
-    """ASTVisitor dispatch reaches custom handlers and generic traversal."""
-    del pattern  # pattern name unused beyond dispatch routing
-    _domain.visitor_runs += 1
-    resource = _build_visitor_resource(fdp)
-
-    class CountingVisitor(ASTVisitor):
-        """Count dispatch hits across a deliberately mixed AST."""
-
-        def __init__(self) -> None:
-            """Initialize node visit counters."""
-            super().__init__()
-            self.messages = 0
-            self.terms = 0
-            self.text_elements = 0
-            self.variables = 0
-            self.select_expressions = 0
-
-        def visit_Message(self, node: Message) -> Message:  # noqa: N802 - NodeName
-            """Count message visits and continue traversal."""
-            self.messages += 1
-            return cast("Message", self.generic_visit(node))
-
-        def visit_Term(self, node: Term) -> Term:  # noqa: N802 - NodeName
-            """Count term visits and continue traversal."""
-            self.terms += 1
-            return cast("Term", self.generic_visit(node))
-
-        def visit_TextElement(self, node: TextElement) -> TextElement:  # noqa: N802 - NodeName
-            """Count text-element visits and continue traversal."""
-            self.text_elements += 1
-            return cast("TextElement", self.generic_visit(node))
-
-        def visit_VariableReference(  # noqa: N802 - NodeName
-            self, node: VariableReference
-        ) -> VariableReference:
-            """Count variable-reference visits and continue traversal."""
-            self.variables += 1
-            return cast("VariableReference", self.generic_visit(node))
-
-        def visit_SelectExpression(  # noqa: N802 - NodeName
-            self, node: SelectExpression
-        ) -> SelectExpression:
-            """Count select-expression visits and continue traversal."""
-            self.select_expressions += 1
-            return cast("SelectExpression", self.generic_visit(node))
-
-    visitor = CountingVisitor()
-    result = visitor.visit(resource)
-
-    if result is not resource:
-        msg = "ASTVisitor.visit(Resource) did not return the visited node"
-        raise SerializerFuzzError(msg)
-    if visitor.messages < 1 or visitor.terms < 1:
-        msg = "ASTVisitor failed to dispatch to Message/Term handlers"
-        raise SerializerFuzzError(msg)
-    if visitor.text_elements < 4:
-        msg = f"Expected multiple TextElement visits, got {visitor.text_elements}"
-        raise SerializerFuzzError(
-            msg,
-        )
-    if visitor.variables < 1 or visitor.select_expressions < 1:
-        msg = (
-            "ASTVisitor failed to traverse nested "
-            "VariableReference/SelectExpression nodes"
-        )
-        raise SerializerFuzzError(
-            msg,
-        )
-
-
-def _pattern_transformer_roundtrip(
-    fdp: atheris.FuzzedDataProvider,
-    pattern: str,
-) -> None:
-    """ASTTransformer list expansion preserves serializer roundtrip invariants."""
-    _domain.transformer_runs += 1
-    resource = _build_visitor_resource(fdp)
-    duplicate_suffix = f"-copy-{fdp.ConsumeIntInRange(0, 99)}"
-
-    class ExpandingTransformer(ASTTransformer):
-        """Expand a single message into two messages via list return."""
-
-        def __init__(self, suffix: str) -> None:
-            """Store suffix used for the duplicated message ID."""
-            super().__init__()
-            self._suffix = suffix
-            self.expansions = 0
-
-        def visit_Message(self, node: Message) -> list[Message]:  # noqa: N802 - NodeName
-            """Duplicate visited messages after transforming their children."""
-            transformed = self.generic_visit(node)
-            if not isinstance(transformed, Message):
-                msg = (
-                    "ASTTransformer.generic_visit(Message) returned "
-                    f"{type(transformed).__name__}"
-                )
-                raise SerializerFuzzError(msg)
-
-            self.expansions += 1
-            duplicate = dc_replace(
-                transformed,
-                id=Identifier(name=f"{transformed.id.name}{self._suffix}"),
-            )
-            return [transformed, duplicate]
-
-    transformer = ExpandingTransformer(duplicate_suffix)
-    transformed = transformer.transform(resource)
-
-    if not isinstance(transformed, Resource):
-        msg = f"transform(Resource) returned {type(transformed).__name__}"
-        raise SerializerFuzzError(msg)
-
-    message_count = sum(
-        1 for entry in transformed.entries if isinstance(entry, Message)
-    )
-    if transformer.expansions < 1 or message_count < 2:
-        msg = "ASTTransformer list expansion did not duplicate message entries"
-        raise SerializerFuzzError(
-            msg,
-        )
-
-    _verify_serializer_roundtrip(transformed, pattern)
-
-
-def _pattern_transformer_validation(
-    fdp: atheris.FuzzedDataProvider,
-    pattern: str,
-) -> None:
-    """ASTTransformer rejects invalid scalar replacements for required fields."""
-    del pattern  # validation path does not serialize on success
-    resource = _build_visitor_resource(fdp)
-    invalid_mode = fdp.ConsumeIntInRange(0, 1)
-
-    class InvalidScalarTransformer(ASTTransformer):
-        """Return invalid scalar replacements to verify runtime validation."""
-
-        def __init__(self, mode: int) -> None:
-            """Select whether to return None or a list for Identifier fields."""
-            super().__init__()
-            self._mode = mode
-
-        def visit_Identifier(  # noqa: N802 - NodeName
-            self, node: Identifier
-        ) -> None | list[Identifier]:
-            """Break required scalar field contracts for Identifier nodes."""
-            if self._mode == 0:
-                return None
-            return [node, dc_replace(node, name=f"{node.name}-dup")]
-
-    transformer = InvalidScalarTransformer(invalid_mode)
-
-    try:
-        transformer.transform(resource)
-    except TypeError as exc:
-        if "Message.id" not in str(exc):
-            msg = (
-                "ASTTransformer raised unexpected TypeError during scalar "
-                f"validation: {exc}"
-            )
-            raise SerializerFuzzError(msg) from exc
-        _domain.validation_errors += 1
-        return
-
-    msg = "ASTTransformer accepted invalid scalar replacement for Message.id"
-    raise SerializerFuzzError(msg)
-
-
-# --- Pattern Dispatch ---
-
-_PATTERN_DISPATCH: dict[str, Any] = {
-    "leading_whitespace": _pattern_leading_whitespace,
-    "trailing_whitespace": _pattern_trailing_whitespace,
-    "syntax_chars_value": _pattern_syntax_chars_value,
-    "simple_message": _pattern_simple_message,
-    "string_literal_placeable": _pattern_string_literal_placeable,
-    "attribute_edge_cases": _pattern_attribute_edge_cases,
-    "term_edge_cases": _pattern_term_edge_cases,
-    "select_expression": _pattern_select_expression,
-    "mixed_elements": _pattern_mixed_elements,
-    "multiline_value": _pattern_multiline_value,
-    "visitor_dispatch": _pattern_visitor_dispatch,
-    "transformer_roundtrip": _pattern_transformer_roundtrip,
-    "transformer_validation": _pattern_transformer_validation,
-}
-
-
-# --- Custom Mutator ---
-
-
-def _mutate_constructed_ast(ast: Resource, seed: int) -> Resource:
-    """Apply mutations targeting serializer edge cases.
-
-    Mutations focus on whitespace injection and syntax character
-    insertion -- the exact bug classes that text-based fuzzers miss.
-    """
-    rng = random.Random(seed)
-    entries = list(ast.entries)
-    if not entries:
-        return ast
-
-    mut_type = rng.randint(0, 3)
-
-    match mut_type:
-        case 0:
-            entries = _mut_add_leading_spaces(entries, rng)
-        case 1:
-            entries = _mut_add_syntax_char(entries, rng)
-        case 2:
-            entries = _mut_add_attribute_ws(entries, rng)
-        case 3:
-            entries = _mut_nest_placeable(entries, rng)
-
-    return Resource(entries=tuple(entries))
-
-
-def _mut_add_leading_spaces(
-    entries: list[Any],
-    rng: random.Random,
-) -> list[Any]:
-    """Inject leading spaces into the first TextElement of a pattern."""
-    for i, entry in enumerate(entries):
-        if not isinstance(entry, (Message, Term)) or entry.value is None:
-            continue
-        elements = list(entry.value.elements)
-        for idx, elem in enumerate(elements):
-            if isinstance(elem, TextElement) and elem.value:
-                n = rng.randint(1, 6)
-                elements[idx] = dc_replace(
-                    elem, value=" " * n + elem.value,
-                )
-                new_pat = dc_replace(
-                    entry.value, elements=tuple(elements),
-                )
-                entries[i] = dc_replace(entry, value=new_pat)
-                return entries
-    return entries
-
-
-def _mut_add_syntax_char(
-    entries: list[Any],
-    rng: random.Random,
-) -> list[Any]:
-    """Insert a syntax character at a random position in a TextElement."""
-    for i, entry in enumerate(entries):
-        if not isinstance(entry, (Message, Term)) or entry.value is None:
-            continue
-        elements = list(entry.value.elements)
-        for idx, elem in enumerate(elements):
-            if isinstance(elem, TextElement) and elem.value:
-                ch = rng.choice(_FTL_SYNTAX_CHARS)
-                pos = rng.randint(0, len(elem.value))
-                new_val = elem.value[:pos] + ch + elem.value[pos:]
-                elements[idx] = dc_replace(elem, value=new_val)
-                new_pat = dc_replace(
-                    entry.value, elements=tuple(elements),
-                )
-                entries[i] = dc_replace(entry, value=new_pat)
-                return entries
-    return entries
-
-
-def _mut_add_attribute_ws(
-    entries: list[Any],
-    rng: random.Random,
-) -> list[Any]:
-    """Add leading whitespace to an attribute value."""
-    for i, entry in enumerate(entries):
-        if not isinstance(entry, (Message, Term)) or not entry.attributes:
-            continue
-        attr = rng.choice(entry.attributes)
-        if attr.value and attr.value.elements:
-            elem = attr.value.elements[0]
-            if isinstance(elem, TextElement) and elem.value:
-                n = rng.randint(1, 5)
-                new_elem = dc_replace(
-                    elem, value=" " * n + elem.value,
-                )
-                new_elements = (new_elem, *attr.value.elements[1:])
-                new_pat = dc_replace(
-                    attr.value, elements=new_elements,
-                )
-                new_attr = dc_replace(attr, value=new_pat)
-                new_attrs = tuple(
-                    new_attr if a is attr else a
-                    for a in entry.attributes
-                )
-                entries[i] = dc_replace(entry, attributes=new_attrs)
-                return entries
-    return entries
-
-
-def _mut_nest_placeable(
-    entries: list[Any],
-    _rng: random.Random,
-) -> list[Any]:
-    """Wrap a StringLiteral in an additional Placeable layer."""
-    for i, entry in enumerate(entries):
-        if not isinstance(entry, (Message, Term)) or entry.value is None:
-            continue
-        for idx, elem in enumerate(entry.value.elements):
-            if (
-                isinstance(elem, Placeable)
-                and isinstance(elem.expression, StringLiteral)
-            ):
-                inner = Placeable(expression=elem.expression)
-                new_elem = dc_replace(elem, expression=inner)
-                new_elements = list(entry.value.elements)
-                new_elements[idx] = new_elem
-                new_pat = dc_replace(
-                    entry.value, elements=tuple(new_elements),
-                )
-                entries[i] = dc_replace(entry, value=new_pat)
-                return entries
-    return entries
-
-
-def _custom_mutator(data: bytes, max_size: int, seed: int) -> bytes:
-    """Structure-aware mutator for AST-constructed inputs.
-
-    Parses the serialized output, applies AST-level mutations targeting
-    serializer edge cases, re-serializes, then applies byte-level mutation.
-    """
-    try:
-        source = data.decode("utf-8", errors="replace")
-        ast = _parser.parse(source)
-
-        if ast.entries and not any(
-            isinstance(e, Junk) for e in ast.entries
-        ):
-            mutated = _mutate_constructed_ast(ast, seed)
-            serialized = serialize(mutated, validate=False)
-            result = serialized.encode("utf-8")
-            if len(result) <= max_size:
-                return atheris.Mutate(result, max_size)
-    except KeyboardInterrupt:
-        _state.status = "stopped"
-        raise
-    except Exception:  # pylint: disable=broad-exception-caught
-        pass
-
-    return atheris.Mutate(data, max_size)
-
-
-def test_one_input(data: bytes) -> None:
-    """Atheris entry point: fuzz serializer via AST construction."""
-    if _state.iterations == 0:
-        _state.initial_memory_mb = (
-            get_process().memory_info().rss / (1024 * 1024)
-        )
-
-    _state.iterations += 1
-    _state.status = "running"
-
-    if _state.iterations % _state.checkpoint_interval == 0:
-        _emit_report()
-
-    start_time = time.perf_counter()
-    fdp = atheris.FuzzedDataProvider(data)
-
-    pattern = select_pattern_round_robin(_state, _PATTERN_SCHEDULE)
-    _state.pattern_coverage[pattern] = (
-        _state.pattern_coverage.get(pattern, 0) + 1
-    )
-
-    if fdp.remaining_bytes() < 4:
-        return
-
-    try:
-        handler = _PATTERN_DISPATCH[pattern]
-        handler(fdp, pattern)
-
-    except SerializerFuzzError:
-        _state.findings += 1
-        raise
-
-    except KeyboardInterrupt:
-        _state.status = "stopped"
-        raise
-
-    except ALLOWED_EXCEPTIONS:
-        pass
-
-    except Exception as e:  # pylint: disable=broad-exception-caught
-        error_key = f"{type(e).__name__}_{str(e)[:30]}"
-        _state.error_counts[error_key] = (
-            _state.error_counts.get(error_key, 0) + 1
-        )
-
-    finally:
-        is_interesting = pattern in (
-            "leading_whitespace", "syntax_chars_value",
-            "attribute_edge_cases", "visitor_dispatch",
-            "transformer_roundtrip", "transformer_validation",
-        ) or ((time.perf_counter() - start_time) * 1000 > 10.0)
-        record_iteration_metrics(
-            _state, pattern, start_time, data,
-            is_interesting=is_interesting,
-        )
-
-        if _state.iterations % GC_INTERVAL == 0:
-            gc.collect()
-
-        if _state.iterations % 100 == 0:
-            record_memory(_state)
-
-
-def main() -> None:
-    """Run the AST-construction serializer fuzzer with CLI support."""
-    parser = argparse.ArgumentParser(
-        description=(
-            "AST-construction serializer roundtrip fuzzer "
-            "using Atheris/libFuzzer"
-        ),
-        epilog="All unrecognized arguments are passed to libFuzzer.",
-    )
-    parser.add_argument(
-        "--checkpoint-interval", type=int, default=500,
-        help="Emit report every N iterations (default: 500)",
-    )
-    parser.add_argument(
-        "--seed-corpus-size", type=int, default=100,
-        help="Maximum size of in-memory seed corpus (default: 100)",
-    )
-
-    args, remaining = parser.parse_known_args()
-    _state.checkpoint_interval = args.checkpoint_interval
-    _state.seed_corpus_max_size = args.seed_corpus_size
-
-    if not any(arg.startswith("-rss_limit_mb") for arg in remaining):
-        remaining.append("-rss_limit_mb=4096")
-
-    sys.argv = [sys.argv[0], *remaining]
-
-    print_fuzzer_banner(
-        title="AST-Construction Serializer Fuzzer (Atheris)",
-        target="serialize (AST-constructed), FluentParserV1",
-        state=_state,
-        schedule_len=len(_PATTERN_SCHEDULE),
-        extra_lines=(
-            "Mode: AST-construction (bypasses parser normalization)",
-        ),
-    )
-
-    run_fuzzer(
-        _state,
-        test_one_input=test_one_input,
-        custom_mutator=_custom_mutator,
-    )
-
+from fuzz_serializer_entry import main

 if __name__ == "__main__":
     main()
diff --git a/fuzz_atheris/fuzz_serializer_entry.py b/fuzz_atheris/fuzz_serializer_entry.py
new file mode 100644
index 00000000..74076ebc
--- /dev/null
+++ b/fuzz_atheris/fuzz_serializer_entry.py
@@ -0,0 +1,164 @@
+from __future__ import annotations
+
+import argparse
+import gc
+import sys
+import time
+from typing import Any
+
+import atheris
+from fuzz_common import (
+    GC_INTERVAL,
+    get_process,
+    print_fuzzer_banner,
+    record_iteration_metrics,
+    record_memory,
+    run_fuzzer,
+    select_pattern_round_robin,
+)
+from fuzz_serializer_mutators import _custom_mutator
+from fuzz_serializer_patterns_text import (
+    _pattern_attribute_edge_cases,
+    _pattern_leading_whitespace,
+    _pattern_mixed_elements,
+    _pattern_multiline_value,
+    _pattern_select_expression,
+    _pattern_simple_message,
+    _pattern_string_literal_placeable,
+    _pattern_syntax_chars_value,
+    _pattern_term_edge_cases,
+    _pattern_trailing_whitespace,
+)
+from fuzz_serializer_patterns_transform import (
+    _pattern_transformer_roundtrip,
+    _pattern_transformer_validation,
+    _pattern_visitor_dispatch,
+)
+from fuzz_serializer_support import (
+    _PATTERN_SCHEDULE,
+    ALLOWED_EXCEPTIONS,
+    SerializerFuzzError,
+    _emit_report,
+    _state,
+)
+
+_PATTERN_DISPATCH: dict[str, Any] = {
+    "leading_whitespace": _pattern_leading_whitespace,
+    "trailing_whitespace": _pattern_trailing_whitespace,
+    "syntax_chars_value": _pattern_syntax_chars_value,
+    "simple_message": _pattern_simple_message,
+    "string_literal_placeable": _pattern_string_literal_placeable,
+    "attribute_edge_cases": _pattern_attribute_edge_cases,
+    "term_edge_cases": _pattern_term_edge_cases,
+    "select_expression": _pattern_select_expression,
+    "mixed_elements": _pattern_mixed_elements,
+    "multiline_value": _pattern_multiline_value,
+    "visitor_dispatch": _pattern_visitor_dispatch,
+    "transformer_roundtrip": _pattern_transformer_roundtrip,
+    "transformer_validation": _pattern_transformer_validation,
+}
+
+def test_one_input(data: bytes) -> None:
+    """Atheris entry point: fuzz serializer via AST construction."""
+    if _state.iterations == 0:
+        _state.initial_memory_mb = (
+            get_process().memory_info().rss / (1024 * 1024)
+        )
+
+    _state.iterations += 1
+    _state.status = "running"
+
+    if _state.iterations % _state.checkpoint_interval == 0:
+        _emit_report()
+
+    start_time = time.perf_counter()
+    fdp = atheris.FuzzedDataProvider(data)
+
+    pattern = select_pattern_round_robin(_state, _PATTERN_SCHEDULE)
+    _state.pattern_coverage[pattern] = (
+        _state.pattern_coverage.get(pattern, 0) + 1
+    )
+
+    if fdp.remaining_bytes() < 4:
+        return
+
+    try:
+        handler = _PATTERN_DISPATCH[pattern]
+        handler(fdp, pattern)
+
+    except SerializerFuzzError:
+        _state.findings += 1
+        raise
+
+    except KeyboardInterrupt:
+        _state.status = "stopped"
+        raise
+
+    except ALLOWED_EXCEPTIONS:
+        pass
+
+    except Exception as e:  # pylint: disable=broad-exception-caught
+        error_key = f"{type(e).__name__}_{str(e)[:30]}"
+        _state.error_counts[error_key] = (
+            _state.error_counts.get(error_key, 0) + 1
+        )
+
+    finally:
+        is_interesting = pattern in (
+            "leading_whitespace", "syntax_chars_value",
+            "attribute_edge_cases", "visitor_dispatch",
+            "transformer_roundtrip", "transformer_validation",
+        ) or ((time.perf_counter() - start_time) * 1000 > 10.0)
+        record_iteration_metrics(
+            _state, pattern, start_time, data,
+            is_interesting=is_interesting,
+        )
+
+        if _state.iterations % GC_INTERVAL == 0:
+            gc.collect()
+
+        if _state.iterations % 100 == 0:
+            record_memory(_state)
+
+def main() -> None:
+    """Run the AST-construction serializer fuzzer with CLI support."""
+    parser = argparse.ArgumentParser(
+        description=(
+            "AST-construction serializer roundtrip fuzzer "
+            "using Atheris/libFuzzer"
+        ),
+        epilog="All unrecognized arguments are passed to libFuzzer.",
+    )
+    parser.add_argument(
+        "--checkpoint-interval", type=int, default=500,
+        help="Emit report every N iterations (default: 500)",
+    )
+    parser.add_argument(
+        "--seed-corpus-size", type=int, default=100,
+        help="Maximum size of in-memory seed corpus (default: 100)",
+    )
+
+    args, remaining = parser.parse_known_args()
+    _state.checkpoint_interval = args.checkpoint_interval
+    _state.seed_corpus_max_size = args.seed_corpus_size
+
+    if not any(arg.startswith("-rss_limit_mb") for arg in remaining):
+        remaining.append("-rss_limit_mb=4096")
+
+    sys.argv = [sys.argv[0], *remaining]
+
+    print_fuzzer_banner(
+        title="AST-Construction Serializer Fuzzer (Atheris)",
+        target="serialize (AST-constructed), FluentParserV1",
+        state=_state,
+        schedule_len=len(_PATTERN_SCHEDULE),
+        extra_lines=(
+            "Mode: AST-construction (bypasses parser normalization)",
+        ),
+    )
+
+    run_fuzzer(
+        _state,
+        test_one_input=test_one_input,
+        custom_mutator=_custom_mutator,
+    )
diff --git a/fuzz_atheris/fuzz_serializer_mutators.py b/fuzz_atheris/fuzz_serializer_mutators.py
new file mode 100644
index 00000000..51536f36
--- /dev/null
+++ b/fuzz_atheris/fuzz_serializer_mutators.py
@@ -0,0 +1,172 @@
+from __future__ import annotations
+
+import random
+from dataclasses import replace as dc_replace
+from typing import Any
+
+import atheris
+from fuzz_serializer_support import (
+    _FTL_SYNTAX_CHARS,
+    _parser,
+    _state,
+)
+
+from ftllexengine.syntax.ast import (
+    Junk,
+    Message,
+    Placeable,
+    Resource,
+    StringLiteral,
+    Term,
+    TextElement,
+)
+from ftllexengine.syntax.serializer import serialize
+
+
+def _mutate_constructed_ast(ast: Resource, seed: int) -> Resource:
+    """Apply mutations targeting serializer edge cases.
+
+    Mutations focus on whitespace injection and syntax character
+    insertion -- the exact bug classes that text-based fuzzers miss.
+    """
+    rng = random.Random(seed)
+    entries = list(ast.entries)
+    if not entries:
+        return ast
+
+    mut_type = rng.randint(0, 3)
+
+    match mut_type:
+        case 0:
+            entries = _mut_add_leading_spaces(entries, rng)
+        case 1:
+            entries = _mut_add_syntax_char(entries, rng)
+        case 2:
+            entries = _mut_add_attribute_ws(entries, rng)
+        case 3:
+            entries = _mut_nest_placeable(entries, rng)
+
+    return Resource(entries=tuple(entries))
+
+def _mut_add_leading_spaces(
+    entries: list[Any],
+    rng: random.Random,
+) -> list[Any]:
+    """Inject leading spaces into the first TextElement of a pattern."""
+    for i, entry in enumerate(entries):
+        if not isinstance(entry, (Message, Term)) or entry.value is None:
+            continue
+        elements = list(entry.value.elements)
+        for idx, elem in enumerate(elements):
+            if isinstance(elem, TextElement) and elem.value:
+                n = rng.randint(1, 6)
+                elements[idx] = dc_replace(
+                    elem, value=" " * n + elem.value,
+                )
+                new_pat = dc_replace(
+                    entry.value, elements=tuple(elements),
+                )
+                entries[i] = dc_replace(entry, value=new_pat)
+                return entries
+    return entries
+
+def _mut_add_syntax_char(
+    entries: list[Any],
+    rng: random.Random,
+) -> list[Any]:
+    """Insert a syntax character at a random position in a TextElement."""
+    for i, entry in enumerate(entries):
+        if not isinstance(entry, (Message, Term)) or entry.value is None:
+            continue
+        elements = list(entry.value.elements)
+        for idx, elem in enumerate(elements):
+            if isinstance(elem, TextElement) and elem.value:
+                ch = rng.choice(_FTL_SYNTAX_CHARS)
+                pos = rng.randint(0, len(elem.value))
+                new_val = elem.value[:pos] + ch + elem.value[pos:]
+                elements[idx] = dc_replace(elem, value=new_val)
+                new_pat = dc_replace(
+                    entry.value, elements=tuple(elements),
+                )
+                entries[i] = dc_replace(entry, value=new_pat)
+                return entries
+    return entries
+
+def _mut_add_attribute_ws(
+    entries: list[Any],
+    rng: random.Random,
+) -> list[Any]:
+    """Add leading whitespace to an attribute value."""
+    for i, entry in enumerate(entries):
+        if not isinstance(entry, (Message, Term)) or not entry.attributes:
+            continue
+        attr = rng.choice(entry.attributes)
+        if attr.value and attr.value.elements:
+            elem = attr.value.elements[0]
+            if isinstance(elem, TextElement) and elem.value:
+                n = rng.randint(1, 5)
+                new_elem = dc_replace(
+                    elem, value=" " * n + elem.value,
+                )
+                new_elements = (new_elem, *attr.value.elements[1:])
+                new_pat = dc_replace(
+                    attr.value, elements=new_elements,
+                )
+                new_attr = dc_replace(attr, value=new_pat)
+                new_attrs = tuple(
+                    new_attr if a is attr else a
+                    for a in entry.attributes
+                )
+                entries[i] = dc_replace(entry, attributes=new_attrs)
+                return entries
+    return entries
+
+def _mut_nest_placeable(
+    entries: list[Any],
+    _rng: random.Random,
+) -> list[Any]:
+    """Wrap a StringLiteral in an additional Placeable layer."""
+    for i, entry in enumerate(entries):
+        if not isinstance(entry, (Message, Term)) or entry.value is None:
+            continue
+        for idx, elem in enumerate(entry.value.elements):
+            if (
+                isinstance(elem, Placeable)
+                and isinstance(elem.expression, StringLiteral)
+            ):
+                inner = Placeable(expression=elem.expression)
+                new_elem = dc_replace(elem, expression=inner)
+                new_elements = list(entry.value.elements)
+                new_elements[idx] = new_elem
+                new_pat = dc_replace(
+                    entry.value, elements=tuple(new_elements),
+                )
+                entries[i] = dc_replace(entry, value=new_pat)
+                return entries
+    return entries
+
+def _custom_mutator(data: bytes, max_size: int, seed: int) -> bytes:
+    """Structure-aware mutator for AST-constructed inputs.
+
+    Parses the serialized output, applies AST-level mutations targeting
+    serializer edge cases, re-serializes, then applies byte-level mutation.
+    """
+    try:
+        source = data.decode("utf-8", errors="replace")
+        ast = _parser.parse(source)
+
+        if ast.entries and not any(
+            isinstance(e, Junk) for e in ast.entries
+        ):
+            mutated = _mutate_constructed_ast(ast, seed)
+            serialized = serialize(mutated, validate=False)
+            result = serialized.encode("utf-8")
+            if len(result) <= max_size:
+                return atheris.Mutate(result, max_size)
+    except KeyboardInterrupt:
+        _state.status = "stopped"
+        raise
+    except Exception:  # pylint: disable=broad-exception-caught
+        pass
+
+    return atheris.Mutate(data, max_size)
diff --git a/fuzz_atheris/fuzz_serializer_patterns_text.py b/fuzz_atheris/fuzz_serializer_patterns_text.py
new file mode 100644
index 00000000..597696ef
--- /dev/null
+++ b/fuzz_atheris/fuzz_serializer_patterns_text.py
@@ -0,0 +1,277 @@
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Any
+
+if TYPE_CHECKING:
+    import atheris
+from fuzz_common import gen_ftl_value
+from fuzz_serializer_support import (
+    _FTL_SYNTAX_CHARS,
+    _mk_attr,
+    _mk_id,
+    _mk_message,
+    _mk_pattern,
+    _mk_term,
+    _verify_serializer_roundtrip,
+)
+
+from ftllexengine.syntax.ast import (
+    Attribute,
+    Identifier,
+    NumberLiteral,
+    Pattern,
+    Placeable,
+    Resource,
+    SelectExpression,
+    StringLiteral,
+    TermReference,
+    TextElement,
+    VariableReference,
+    Variant,
+)
+
+
+def _pattern_leading_whitespace(
+    fdp: atheris.FuzzedDataProvider,
+    pattern: str,
+) -> None:
+    """TextElement values with leading whitespace.
+
+    Targets BUG-SERIALIZER-LEADING-WS-001: the parser consumes post-=
+    whitespace as syntax, so leading spaces in TextElement values must
+    be wrapped in StringLiteral placeables by the serializer.
+    """
+    num_spaces = fdp.ConsumeIntInRange(1, 8)
+    base_value = gen_ftl_value(fdp)
+    value_text = " " * num_spaces + base_value
+
+    # Message with leading-whitespace value
+    msg = _mk_message(fdp, value=_mk_pattern(value_text))
+    _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern)
+
+    # Attribute with leading-whitespace value
+    attr = _mk_attr(fdp, value_text)
+    msg2 = _mk_message(
+        fdp,
+        value=_mk_pattern(gen_ftl_value(fdp)),
+        attributes=(attr,),
+    )
+    _verify_serializer_roundtrip(Resource(entries=(msg2,)), pattern)
+
+def _pattern_trailing_whitespace(
+    fdp: atheris.FuzzedDataProvider,
+    pattern: str,
+) -> None:
+    """TextElement values with trailing whitespace."""
+    num_spaces = fdp.ConsumeIntInRange(1, 8)
+    base_value = gen_ftl_value(fdp)
+    value_text = base_value + " " * num_spaces
+
+    msg = _mk_message(fdp, value=_mk_pattern(value_text))
+    _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern)
+
+def _pattern_syntax_chars_value(
+    fdp: atheris.FuzzedDataProvider,
+    pattern: str,
+) -> None:
+    """TextElement values containing FTL syntax characters.
+
+    Tests that the serializer correctly escapes or wraps braces,
+    dots, hash, asterisk, and brackets at various positions.
+    """
+    base_value = gen_ftl_value(fdp)
+    char = _FTL_SYNTAX_CHARS[
+        fdp.ConsumeIntInRange(0, len(_FTL_SYNTAX_CHARS) - 1)
+    ]
+    pos = fdp.ConsumeIntInRange(0, len(base_value))
+    value_text = base_value[:pos] + char + base_value[pos:]
+
+    msg = _mk_message(fdp, value=_mk_pattern(value_text))
+    _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern)
+
+    # Also test in attribute value
+    attr = _mk_attr(fdp, value_text)
+    msg2 = _mk_message(
+        fdp,
+        value=_mk_pattern(gen_ftl_value(fdp)),
+        attributes=(attr,),
+    )
+    _verify_serializer_roundtrip(Resource(entries=(msg2,)), pattern)
+
+def _pattern_simple_message(
+    fdp: atheris.FuzzedDataProvider,
+    pattern: str,
+) -> None:
+    """Baseline: simple message with clean text value."""
+    value = gen_ftl_value(fdp)
+    msg = _mk_message(fdp, value=_mk_pattern(value))
+    _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern)
+
+def _pattern_string_literal_placeable(
+    fdp: atheris.FuzzedDataProvider,
+    pattern: str,
+) -> None:
+    """Patterns with StringLiteral placeables containing edge-case content."""
+    literal_value = gen_ftl_value(fdp, max_length=20)
+
+    # Optionally inject special content
+    special = fdp.ConsumeIntInRange(0, 3)
+    match special:
+        case 0:
+            literal_value = " " * fdp.ConsumeIntInRange(1, 5)
+        case 1:
+            literal_value = "\\" + literal_value
+        case 2:
+            literal_value = '"' + literal_value + '"'
+
+    placeable = Placeable(expression=StringLiteral(value=literal_value))
+    text_before = TextElement(value=gen_ftl_value(fdp, max_length=15))
+    elements: tuple[TextElement | Placeable, ...] = (text_before, placeable)
+    pat = Pattern(elements=elements)
+
+    msg = _mk_message(fdp, value=pat)
+    _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern)
+
+def _pattern_attribute_edge_cases(
+    fdp: atheris.FuzzedDataProvider,
+    pattern: str,
+) -> None:
+    """Attributes with edge-case values: leading/trailing spaces, syntax chars."""
+    num_attrs = fdp.ConsumeIntInRange(1, 4)
+    attrs: list[Attribute] = []
+    for _ in range(num_attrs):
+        edge_type = fdp.ConsumeIntInRange(0, 3)
+        base = gen_ftl_value(fdp)
+        match edge_type:
+            case 0:
+                val = " " * fdp.ConsumeIntInRange(1, 5) + base
+            case 1:
+                val = base + " " * fdp.ConsumeIntInRange(1, 5)
+            case 2:
+                ch = _FTL_SYNTAX_CHARS[
+                    fdp.ConsumeIntInRange(
+                        0, len(_FTL_SYNTAX_CHARS) - 1,
+                    )
+                ]
+                val = ch + base
+            case _:
+                val = base
+        attrs.append(_mk_attr(fdp, val))
+
+    msg = _mk_message(
+        fdp,
+        value=_mk_pattern(gen_ftl_value(fdp)),
+        attributes=tuple(attrs),
+    )
+    _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern)
+
+def _pattern_term_edge_cases(
+    fdp: atheris.FuzzedDataProvider,
+    pattern: str,
+) -> None:
+    """Terms with edge-case attribute and value content."""
+    num_spaces = fdp.ConsumeIntInRange(0, 5)
+    base = gen_ftl_value(fdp)
+    value_text = " " * num_spaces + base if num_spaces > 0 else base
+
+    attrs: tuple[Attribute, ...] = ()
+    if fdp.ConsumeBool():
+        attr_val = " " * fdp.ConsumeIntInRange(1, 3) + gen_ftl_value(fdp)
+        attrs = (_mk_attr(fdp, attr_val),)
+
+    term = _mk_term(fdp, value=_mk_pattern(value_text), attributes=attrs)
+    _verify_serializer_roundtrip(Resource(entries=(term,)), pattern)
+
+def _pattern_select_expression(
+    fdp: atheris.FuzzedDataProvider,
+    pattern: str,
+) -> None:
+    """Select expressions constructed from AST nodes."""
+    var_id = _mk_id(fdp)
+    selector = VariableReference(id=var_id)
+
+    num_variants = fdp.ConsumeIntInRange(1, 4)
+    variants: list[Variant] = []
+    for _ in range(num_variants):
+        key_is_number = fdp.ConsumeBool()
+        if key_is_number:
+            num = fdp.ConsumeIntInRange(0, 99)
+            key: Identifier | NumberLiteral = NumberLiteral(
+                value=num, raw=str(num),
+            )
+        else:
+            key = _mk_id(fdp)
+
+        val = gen_ftl_value(fdp)
+        # Optionally add leading whitespace to variant value
+        if fdp.ConsumeBool():
+            val = " " + val
+        variants.append(Variant(key=key, value=_mk_pattern(val)))
+
+    # Ensure exactly one default
+    variants.append(
+        Variant(
+            key=Identifier(name="other"),
+            value=_mk_pattern(gen_ftl_value(fdp)),
+            default=True,
+        ),
+    )
+
+    sel = SelectExpression(
+        selector=selector, variants=tuple(variants),
+    )
+    pat = Pattern(elements=(Placeable(expression=sel),))
+    msg = _mk_message(fdp, value=pat)
+    _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern)
+
+def _pattern_mixed_elements(
+    fdp: atheris.FuzzedDataProvider,
+    pattern: str,
+) -> None:
+    """Patterns with interleaved TextElement and Placeable nodes."""
+    num_elements = fdp.ConsumeIntInRange(2, 6)
+    elements: list[TextElement | Placeable] = []
+
+    for _ in range(num_elements):
+        is_placeable = fdp.ConsumeBool()
+        if is_placeable:
+            expr_type = fdp.ConsumeIntInRange(0, 2)
+            match expr_type:
+                case 0:
+                    expr: Any = StringLiteral(
+                        value=gen_ftl_value(fdp, max_length=10),
+                    )
+                case 1:
+                    expr = VariableReference(id=_mk_id(fdp))
+                case _:
+                    expr = TermReference(id=_mk_id(fdp))
+            elements.append(Placeable(expression=expr))
+        else:
+            val = gen_ftl_value(fdp, max_length=15)
+            # Optionally inject leading space
+            if fdp.ConsumeBool() and elements:
+                val = " " + val
+            elements.append(TextElement(value=val))
+
+    pat = Pattern(elements=tuple(elements))
+    msg = _mk_message(fdp, value=pat)
+    _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern)
+
+def _pattern_multiline_value(
+    fdp: atheris.FuzzedDataProvider,
+    pattern: str,
+) -> None:
+    """Multi-line TextElement values with newlines and indentation."""
+    num_lines = fdp.ConsumeIntInRange(2, 5)
+    lines: list[str] = []
+    for _ in range(num_lines):
+        line = gen_ftl_value(fdp, max_length=30)
+        # Optionally add leading spaces
+        if fdp.ConsumeBool():
+            line = " " * fdp.ConsumeIntInRange(1, 4) + line
+        lines.append(line)
+
+    # Join with newlines and indentation (4 spaces for FTL continuation)
+    value_text = ("\n    ").join(lines)
+    msg = _mk_message(fdp, value=_mk_pattern(value_text))
+    _verify_serializer_roundtrip(Resource(entries=(msg,)), pattern)
diff --git a/fuzz_atheris/fuzz_serializer_patterns_transform.py b/fuzz_atheris/fuzz_serializer_patterns_transform.py
new file mode 100644
index 00000000..85548529
--- /dev/null
+++ b/fuzz_atheris/fuzz_serializer_patterns_transform.py
@@ -0,0 +1,193 @@
+from __future__ import annotations
+
+from dataclasses import replace as dc_replace
+from typing import TYPE_CHECKING, cast
+
+if TYPE_CHECKING:
+    import atheris
+from fuzz_serializer_support import (
+    SerializerFuzzError,
+    _build_visitor_resource,
+    _domain,
+    _verify_serializer_roundtrip,
+)
+
+from ftllexengine.syntax.ast import (
+    Identifier,
+    Message,
+    Resource,
+    SelectExpression,
+    Term,
+    TextElement,
+    VariableReference,
+)
+from ftllexengine.syntax.visitor import ASTTransformer, ASTVisitor
+
+
+def _pattern_visitor_dispatch(
+    fdp: atheris.FuzzedDataProvider,
+    pattern: str,
+) -> None:
+    """ASTVisitor dispatch reaches custom handlers and generic traversal."""
+    del pattern  # pattern name unused beyond dispatch routing
+    _domain.visitor_runs += 1
+    resource = _build_visitor_resource(fdp)
+
+    class CountingVisitor(ASTVisitor):
+        """Count dispatch hits across a deliberately mixed AST."""
+
+        def __init__(self) -> None:
+            """Initialize node visit counters."""
+            super().__init__()
+            self.messages = 0
+            self.terms = 0
+            self.text_elements = 0
+            self.variables = 0
+            self.select_expressions = 0
+
+        def visit_Message(self, node: Message) -> Message:  # noqa: N802 - NodeName
+            """Count message visits and continue traversal."""
+            self.messages += 1
+            return cast("Message", self.generic_visit(node))
+
+        def visit_Term(self, node: Term) -> Term:  # noqa: N802 - NodeName
+            """Count term visits and continue traversal."""
+            self.terms += 1
+            return cast("Term", self.generic_visit(node))
+
+        def visit_TextElement(self, node: TextElement) -> TextElement:  # noqa: N802 - NodeName
+            """Count text-element visits and continue traversal."""
+            self.text_elements += 1
+            return cast("TextElement", self.generic_visit(node))
+
+        def visit_VariableReference(  # noqa: N802 - NodeName
+            self, node: VariableReference
+        ) -> VariableReference:
+            """Count variable-reference visits and continue traversal."""
+            self.variables += 1
+            return cast("VariableReference", self.generic_visit(node))
+
+        def visit_SelectExpression(  # noqa: N802 - NodeName
+            self, node: SelectExpression
+        ) -> SelectExpression:
+            """Count select-expression visits and continue traversal."""
+            self.select_expressions += 1
+            return cast("SelectExpression", self.generic_visit(node))
+
+    visitor = CountingVisitor()
+    result = visitor.visit(resource)
+
+    if result is not resource:
+        msg = "ASTVisitor.visit(Resource) did not return the visited node"
+        raise SerializerFuzzError(msg)
+    if visitor.messages < 1 or visitor.terms < 1:
+        msg = "ASTVisitor failed to dispatch to Message/Term handlers"
+        raise SerializerFuzzError(msg)
+    if visitor.text_elements < 4:
+        msg = f"Expected multiple TextElement visits, got {visitor.text_elements}"
+        raise SerializerFuzzError(
+            msg,
+        )
+    if visitor.variables < 1 or visitor.select_expressions < 1:
+        msg = (
+            "ASTVisitor failed to traverse nested "
+            "VariableReference/SelectExpression nodes"
+        )
+        raise SerializerFuzzError(
+            msg,
+        )
+
+def _pattern_transformer_roundtrip(
+    fdp: atheris.FuzzedDataProvider,
+    pattern: str,
+) -> None:
+    """ASTTransformer list expansion preserves serializer roundtrip invariants."""
+    _domain.transformer_runs += 1
+    resource = _build_visitor_resource(fdp)
+    duplicate_suffix = f"-copy-{fdp.ConsumeIntInRange(0, 99)}"
+
+    class ExpandingTransformer(ASTTransformer):
+        """Expand a single message into two messages via list return."""
+
+        def __init__(self, suffix: str) -> None:
+            """Store suffix used for the duplicated message ID."""
+            super().__init__()
+            self._suffix = suffix
+            self.expansions = 0
+
+        def visit_Message(self, node: Message) -> list[Message]:  # noqa: N802 - NodeName
+            """Duplicate visited messages after transforming their children."""
+            transformed = self.generic_visit(node)
+            if not isinstance(transformed, Message):
+                msg = (
+                    "ASTTransformer.generic_visit(Message) returned "
+                    f"{type(transformed).__name__}"
+                )
+                raise SerializerFuzzError(msg)
+
+            self.expansions += 1
+            duplicate = dc_replace(
+                transformed,
+                id=Identifier(name=f"{transformed.id.name}{self._suffix}"),
+            )
+            return [transformed, duplicate]
+
+    transformer = ExpandingTransformer(duplicate_suffix)
+    transformed = transformer.transform(resource)
+
+    if not isinstance(transformed, Resource):
+        msg = f"transform(Resource) returned {type(transformed).__name__}"
+        raise SerializerFuzzError(msg)
+
+    message_count = sum(
+        1 for entry in transformed.entries if isinstance(entry, Message)
+    )
+    if transformer.expansions < 1 or message_count < 2:
+        msg = "ASTTransformer list expansion did not duplicate message entries"
+        raise SerializerFuzzError(
+            msg,
+        )
+
+    _verify_serializer_roundtrip(transformed, pattern)
+
+def
_pattern_transformer_validation( + fdp: atheris.FuzzedDataProvider, + pattern: str, +) -> None: + """ASTTransformer rejects invalid scalar replacements for required fields.""" + del pattern # validation path does not serialize on success + resource = _build_visitor_resource(fdp) + invalid_mode = fdp.ConsumeIntInRange(0, 1) + + class InvalidScalarTransformer(ASTTransformer): + """Return invalid scalar replacements to verify runtime validation.""" + + def __init__(self, mode: int) -> None: + """Select whether to return None or a list for Identifier fields.""" + super().__init__() + self._mode = mode + + def visit_Identifier( # noqa: N802 - NodeName + self, node: Identifier + ) -> None | list[Identifier]: + """Break required scalar field contracts for Identifier nodes.""" + if self._mode == 0: + return None + return [node, dc_replace(node, name=f"{node.name}-dup")] + + transformer = InvalidScalarTransformer(invalid_mode) + + try: + transformer.transform(resource) + except TypeError as exc: + if "Message.id" not in str(exc): + msg = ( + "ASTTransformer raised unexpected TypeError during scalar " + f"validation: {exc}" + ) + raise SerializerFuzzError(msg) from exc + _domain.validation_errors += 1 + return + + msg = "ASTTransformer accepted invalid scalar replacement for Message.id" + raise SerializerFuzzError(msg) diff --git a/fuzz_atheris/fuzz_serializer_support.py b/fuzz_atheris/fuzz_serializer_support.py new file mode 100644 index 00000000..8131a271 --- /dev/null +++ b/fuzz_atheris/fuzz_serializer_support.py @@ -0,0 +1,369 @@ +#!/usr/bin/env python3 +"""AST-Construction Serializer Fuzzer (Atheris). + +Targets: ftllexengine.syntax.serializer.serialize, + ftllexengine.syntax.parser.FluentParserV1, + ftllexengine.syntax.visitor.ASTVisitor / ASTTransformer + +Concern boundary: This fuzzer programmatically constructs AST nodes +(bypassing the parser) and feeds them to the serializer. 
This is the +ONLY Atheris fuzzer that can produce AST states the parser would never +emit -- e.g. TextElement values with leading whitespace, syntax characters +in pattern-initial positions, empty patterns, or structurally valid but +semantically unusual combinations. + +This directly addresses the blind spot where text-based fuzzers +(fuzz_roundtrip, fuzz_structured) start from the parser, which normalizes +inputs before the serializer ever sees them. + +The same AST-construction model is also ideal for visitor/transformer +coverage because it can construct trees and transformation results that +ordinary parser-driven fuzzers do not reach. + +Invariant: +- serialize(ast) must produce valid FTL (no Junk on reparse) +- Idempotence: serialize(parse(serialize(ast))) == serialize(ast) + +Pattern Routing: +Deterministic round-robin from a weighted schedule (same infrastructure +as fuzz_roundtrip). Pattern selection is independent of fuzzed bytes +to avoid coverage-guided mutation bias. + +Custom Mutator: +AST-level mutations applied to programmatically constructed ASTs: +inject leading/trailing whitespace, syntax characters, empty patterns, +deeply nested placeables. Byte-level mutation applied on top. + +Finding Artifacts: +Convergence failures write source/S1/S2/metadata to +.fuzz_atheris_corpus/serializer/findings/ for standalone reproduction. + +Requires Python 3.13+ (uses PEP 695 type aliases). 
+""" + +from __future__ import annotations + +import atexit +import logging +import pathlib +from dataclasses import dataclass +from typing import Any + +# --- Dependency Checks --- +_psutil_mod: Any = None +_atheris_mod: Any = None + +try: # noqa: SIM105 - need module ref for check_dependencies + import psutil as _psutil_mod # type: ignore[no-redef] +except ImportError: + pass + +try: # noqa: SIM105 - need module ref for check_dependencies + import atheris as _atheris_mod # type: ignore[no-redef] +except ImportError: + pass + +from fuzz_common import ( # noqa: E402 - after dependency capture # pylint: disable=C0413 + BaseFuzzerState, + build_base_stats_dict, + build_weighted_schedule, + check_dependencies, + emit_final_report, + gen_ftl_identifier, + gen_ftl_value, + write_finding_artifact, +) + +check_dependencies(["psutil", "atheris"], [_psutil_mod, _atheris_mod]) + +import atheris # noqa: E402 # pylint: disable=C0412,C0413 + +# --- Domain Metrics --- + + +@dataclass +class SerializerMetrics: + """Domain-specific metrics for AST-construction serializer fuzzer.""" + + ast_construction_failures: int = 0 + convergence_failures: int = 0 + junk_on_reparse: int = 0 + validation_errors: int = 0 + visitor_runs: int = 0 + transformer_runs: int = 0 + + +# --- Global State --- + +_state = BaseFuzzerState( + seed_corpus_max_size=100, + fuzzer_name="serializer", + fuzzer_target="serialize (AST-constructed), FluentParserV1", +) +_domain = SerializerMetrics() + + +# Pattern weights: (name, weight) +_PATTERN_WEIGHTS: tuple[tuple[str, int], ...] = ( + ("leading_whitespace", 18), + ("trailing_whitespace", 8), + ("syntax_chars_value", 15), + ("simple_message", 8), + ("string_literal_placeable", 10), + ("attribute_edge_cases", 12), + ("term_edge_cases", 8), + ("select_expression", 8), + ("mixed_elements", 8), + ("multiline_value", 5), + ("visitor_dispatch", 8), + ("transformer_roundtrip", 8), + ("transformer_validation", 6), +) + +_PATTERN_SCHEDULE: tuple[str, ...] 
= build_weighted_schedule( + [name for name, _ in _PATTERN_WEIGHTS], + [weight for _, weight in _PATTERN_WEIGHTS], +) + +# Register intended weights for skew detection +_state.pattern_intended_weights = { + name: float(weight) for name, weight in _PATTERN_WEIGHTS +} + + +class SerializerFuzzError(Exception): + """Raised when a serializer roundtrip invariant is breached.""" + + +# Allowed exceptions from parser/serializer +ALLOWED_EXCEPTIONS = ( + ValueError, + TypeError, + RecursionError, + MemoryError, + UnicodeDecodeError, + UnicodeEncodeError, +) + + +# --- Reporting --- + +_REPORT_DIR = pathlib.Path(".fuzz_atheris_corpus") / "serializer" + + +def _build_stats_dict() -> dict[str, Any]: + """Build complete stats dictionary including domain metrics.""" + stats = build_base_stats_dict(_state) + + stats["ast_construction_failures"] = _domain.ast_construction_failures + stats["convergence_failures"] = _domain.convergence_failures + stats["junk_on_reparse"] = _domain.junk_on_reparse + stats["validation_errors"] = _domain.validation_errors + stats["visitor_runs"] = _domain.visitor_runs + stats["transformer_runs"] = _domain.transformer_runs + + return stats + + +def _emit_report() -> None: + """Emit comprehensive final report (crash-proof).""" + emit_final_report( + _state, _build_stats_dict(), _REPORT_DIR, + "fuzz_serializer_report.json", + ) + + +atexit.register(_emit_report) + + +# --- Finding Artifacts --- + +_FINDINGS_DIR = _REPORT_DIR / "findings" + + +# --- Instrumentation & Parser --- + +logging.getLogger("ftllexengine").setLevel(logging.CRITICAL) + +atheris.enabled_hooks.add("str") +# RegEx hook omitted: serializer fuzzer constructs ASTs programmatically, +# no regex in the hot path. The hook triggers spurious Atheris errors on +# transitively imported stdlib regex patterns (e.g., email.charset). 
+ +with atheris.instrument_imports(include=["ftllexengine"]): + from ftllexengine.syntax.ast import ( + Attribute, + Identifier, + Junk, + Message, + Pattern, + Placeable, + Resource, + SelectExpression, + StringLiteral, + Term, + TextElement, + VariableReference, + Variant, + ) + from ftllexengine.syntax.parser import FluentParserV1 + from ftllexengine.syntax.serializer import serialize + +_parser = FluentParserV1() + + +# --- AST Construction Helpers --- + +# Characters that are syntactically significant in FTL pattern positions +_FTL_SYNTAX_CHARS = "{}.#*[" + + +def _mk_id(fdp: atheris.FuzzedDataProvider) -> Identifier: + """Construct an Identifier AST node from fuzzed bytes.""" + return Identifier(name=gen_ftl_identifier(fdp)) + + +def _mk_pattern(text: str) -> Pattern: + """Construct a single-element text Pattern.""" + return Pattern(elements=(TextElement(value=text),)) + + +def _mk_attr( + fdp: atheris.FuzzedDataProvider, + value_text: str, +) -> Attribute: + """Construct an Attribute with the given value text.""" + return Attribute(id=_mk_id(fdp), value=_mk_pattern(value_text)) + + +def _mk_message( + fdp: atheris.FuzzedDataProvider, + *, + value: Pattern | None = None, + attributes: tuple[Attribute, ...] = (), +) -> Message: + """Construct a Message AST node.""" + return Message(id=_mk_id(fdp), value=value, attributes=attributes) + + +def _mk_term( + fdp: atheris.FuzzedDataProvider, + *, + value: Pattern, + attributes: tuple[Attribute, ...] 
= (), +) -> Term: + """Construct a Term AST node.""" + return Term(id=_mk_id(fdp), value=value, attributes=attributes) + + +def _mk_nonempty_value( + fdp: atheris.FuzzedDataProvider, + *, + max_length: int = 24, +) -> str: + """Generate a non-empty FTL-safe text fragment.""" + value = gen_ftl_value(fdp, max_length=max_length) + return value or "value" + + +def _build_visitor_resource(fdp: atheris.FuzzedDataProvider) -> Resource: + """Construct a small but structurally rich AST for visitor coverage.""" + selector = VariableReference(id=Identifier(name="count")) + variants = ( + Variant( + key=Identifier(name="one"), + value=_mk_pattern(_mk_nonempty_value(fdp)), + ), + Variant( + key=Identifier(name="other"), + value=_mk_pattern(_mk_nonempty_value(fdp)), + default=True, + ), + ) + select = SelectExpression(selector=selector, variants=variants) + message = Message( + id=Identifier(name=f"msg-{gen_ftl_identifier(fdp)}"), + value=Pattern( + elements=( + TextElement(value=_mk_nonempty_value(fdp)), + Placeable(expression=select), + ), + ), + attributes=( + Attribute( + id=Identifier(name="label"), + value=_mk_pattern(_mk_nonempty_value(fdp)), + ), + ), + ) + term = Term( + id=Identifier(name=f"term-{gen_ftl_identifier(fdp)}"), + value=Pattern( + elements=( + TextElement(value=_mk_nonempty_value(fdp)), + Placeable( + expression=StringLiteral(value=_mk_nonempty_value(fdp)), + ), + ), + ), + attributes=(), + ) + return Resource(entries=(message, term)) + + +# --- Roundtrip Verification --- + + +def _verify_serializer_roundtrip( + ast: Resource, + pattern: str, +) -> None: + """Verify serialize(ast) -> parse -> serialize convergence. + + Steps: + 1. Serialize constructed AST -> S1 + 2. Parse S1 -> AST2 + 3. Check no Junk entries + 4. Serialize AST2 -> S2 + 5. Assert S1 == S2 (idempotence) + + On failure, writes finding artifacts before raising. 
+ """ + s1 = serialize(ast, validate=False) + + ast2 = _parser.parse(s1) + + if any(isinstance(e, Junk) for e in ast2.entries): + _domain.junk_on_reparse += 1 + write_finding_artifact( + findings_dir=_FINDINGS_DIR, state=_state, + source=f"[AST-constructed: {pattern}]", s1=s1, s2="", + pattern=pattern, + extra_meta={"failure_type": "junk_on_reparse"}, + ) + msg = ( + f"Serialized AST produced Junk on re-parse.\n" + f"Pattern: {pattern}\n" + f"S1 ({len(s1)} chars): {s1[:200]!r}" + ) + raise SerializerFuzzError(msg) + + s2 = serialize(ast2) + + if s1 != s2: + _domain.convergence_failures += 1 + write_finding_artifact( + findings_dir=_FINDINGS_DIR, state=_state, + source=f"[AST-constructed: {pattern}]", s1=s1, s2=s2, + pattern=pattern, + extra_meta={"failure_type": "convergence_failure"}, + ) + msg = ( + f"Convergence failure: S(AST) != S(P(S(AST)))\n" + f"Pattern: {pattern}\n" + f"S1 ({len(s1)} chars): {s1[:200]!r}\n" + f"S2 ({len(s2)} chars): {s2[:200]!r}" + ) + raise SerializerFuzzError(msg) + +__all__ = [name for name in globals() if not name.startswith("__")] diff --git a/fuzz_atheris/fuzz_structured.py b/fuzz_atheris/fuzz_structured.py index 14bfea86..63667ae2 100644 --- a/fuzz_atheris/fuzz_structured.py +++ b/fuzz_atheris/fuzz_structured.py @@ -1,9 +1,4 @@ #!/usr/bin/env python3 -# FUZZ_PLUGIN_HEADER_START -# FUZZ_PLUGIN: structured - Structure-aware fuzzing (Deep AST) -# Intentional: This header is intentionally placed for dynamic plugin discovery. -# CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH -# FUZZ_PLUGIN_HEADER_END """Structure-Aware Fuzzer (Atheris). 
Generates syntactically plausible FTL using grammar-aware construction,
diff --git a/fuzz_atheris/targets.tsv b/fuzz_atheris/targets.tsv
new file mode 100644
index 00000000..89024794
--- /dev/null
+++ b/fuzz_atheris/targets.tsv
@@ -0,0 +1,25 @@
+# target module description
+bridge fuzz_bridge.py FunctionRegistry bridge machinery
+builtins fuzz_builtins.py Built-in function Babel boundary
+cache fuzz_cache.py Cache concurrency and audit behavior
+currency fuzz_currency.py Currency formatting oracle
+cursor fuzz_cursor.py Cursor and parse-position helpers
+dates fuzz_dates.py Locale-aware date and datetime parsing
+diagnostics_formatter fuzz_diagnostics_formatter.py Diagnostic formatter output and escaping
+graph fuzz_graph.py Dependency graph algorithms
+integrity fuzz_integrity.py Semantic validation and data integrity
+introspection fuzz_introspection.py Message introspection and reference extraction
+iso fuzz_iso.py ISO lookup and introspection APIs
+locale_context fuzz_locale_context.py LocaleContext direct formatting API
+localization fuzz_localization.py FluentLocalization orchestration
+lock fuzz_lock.py RWLock contention behavior
+numbers fuzz_numbers.py Number formatting oracle
+oom fuzz_oom.py Parser object-density limits
+parse_currency fuzz_parse_currency.py Currency parsing and symbol resolution
+parse_decimal fuzz_parse_decimal.py Decimal parsing and FluentNumber parsing
+plural fuzz_plural.py CLDR plural category boundaries
+roundtrip fuzz_roundtrip.py Parser and serializer roundtrip
+runtime fuzz_runtime.py End-to-end runtime behavior and strict mode
+scope fuzz_scope.py Variable scoping invariants
+serializer fuzz_serializer.py AST-construction serializer paths
+structured fuzz_structured.py Structure-aware parser stress
diff --git a/images/FTLLexEngine.png b/images/FTLLexEngine.png
deleted file mode 100644
index 5b640eee..00000000
Binary files a/images/FTLLexEngine.png and /dev/null differ
diff --git a/pyproject.toml b/pyproject.toml index
24cc4345..8857bc2d 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -33,6 +33,13 @@ python_exec_globs = [ "docs/VALIDATION_GUIDE.md", "docs/WORKFLOW_TOUR.md", ] +# Markdown files whose bash/sh fences are the canonical quick-start shell workflows. +shell_exec_globs = [ + "docs/FUZZING_GUIDE.md", + "docs/FUZZING_GUIDE_ATHERIS.md", + "docs/FUZZING_GUIDE_HYPOFUZZ.md", + "fuzz_atheris/README.md", +] # Substrings that trigger skipping a specific FTL code block skip_markers = [ "# ←", @@ -50,10 +57,11 @@ skip_markers = [ ] # Path to the parser class used for validation parser_path = "ftllexengine.syntax.parser:FluentParserV1" +shell_exec_timeout_seconds = 180 [project] name = "ftllexengine" -version = "0.165.0" +version = "0.166.0" description = "Python runtime for the Fluent (FTL) specification: bidirectional parsing, CLDR-backed locale-aware formatting, and fail-fast boot validation with structured audit evidence." readme = "README.md" requires-python = ">=3.13" @@ -122,15 +130,16 @@ dev = [ "Babel>=2.18.0,<3.0.0", # Required for tests (locale formatting, parsing) "pytest>=9.0.3", "pytest-cov>=7.1.0", - "hypothesis>=6.152.1", + "hypothesis>=6.152.4", "mypy>=1.20.2", - "ruff>=0.15.11", + "ruff>=0.15.12", "pytest-benchmark>=5.2.3", "psutil>=7.2.2", "types-psutil>=7.2.2.20260408", ] fuzz = [ + "hypothesis[cli]>=6.152.4", "hypofuzz>=25.11.1", ] @@ -139,7 +148,7 @@ atheris = [ ] release = [ - "build>=1.4.3", + "build>=1.5.0", "twine>=6.2.0", ] @@ -475,6 +484,12 @@ ignore = [ "tests/test_integration_coverage.py" = ["N802", "PLC0415"] "tests/test_syntax_validator.py" = ["PLC0415"] "tests/test_validation_resource.py" = ["PLC0415"] +"tests/*_cases/*.py" = ["F405", "PLC0415"] +"tests/parsing_dates_cases/*.py" = ["DTZ001", "F405", "PLC0415"] +"tests/syntax_visitor_cases/*.py" = ["F405", "N802"] +"tests/syntax_visitor_transformer_cases/*.py" = ["F405", "N802", "ARG002"] +"tests/syntax_visitor_cases/__init__.py" = ["N802"] +"tests/syntax_visitor_transformer_cases/__init__.py" = 
["N802"] # Fuzzing: pytestmark precedes later imports for file-level marker application "tests/fuzz/test_syntax_parser_grammar.py" = ["E402"] "tests/fuzz/test_syntax_parser_whitespace.py" = ["E402"] diff --git a/scripts/benchmark.sh b/scripts/benchmark.sh index 8e75b559..374ff3db 100755 --- a/scripts/benchmark.sh +++ b/scripts/benchmark.sh @@ -73,7 +73,14 @@ shopt -s inherit_errexit # [SECTION: ENVIRONMENT_ISOLATION] PY_VERSION="${PY_VERSION:-3.13}" -TARGET_VENV=".venv-${PY_VERSION}" +if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" ]]; then + TARGET_VENV=".venv-devcontainer-${PY_VERSION}" +else + TARGET_VENV=".venv-${PY_VERSION}" +fi +if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" && -z "${UV_LINK_MODE:-}" ]]; then + export UV_LINK_MODE="copy" +fi if [[ "${UV_PROJECT_ENVIRONMENT:-}" != "$TARGET_VENV" ]]; then if [[ "${BENCHMARK_ALREADY_PIVOTED:-}" == "1" ]]; then @@ -252,7 +259,7 @@ fi set -e END_TIME="${EPOCHREALTIME}" -DURATION=$(printf "%.3f" "$(echo "$END_TIME - $START_TIME" | bc)") +DURATION=$(awk -v start="$START_TIME" -v end="$END_TIME" 'BEGIN { printf "%.3f", end - start }') log_group_end # [SECTION: ANALYSIS] diff --git a/scripts/devcontainer-prepare-user-home.sh b/scripts/devcontainer-prepare-user-home.sh new file mode 100755 index 00000000..71434fa2 --- /dev/null +++ b/scripts/devcontainer-prepare-user-home.sh @@ -0,0 +1,20 @@ +#!/usr/bin/env bash +# Ensure devcontainer-managed cache mounts stay writable for the remote user. + +set -euo pipefail + +current_user="$(id -un)" +current_group="$(id -gn)" +home_dir="${HOME:-/home/${current_user}}" + +repair_path() { + local target_path="$1" + + sudo install -d -o "${current_user}" -g "${current_group}" "${target_path}" + + if find "${target_path}" ! 
-user "${current_user}" -print -quit | grep -q .; then + sudo chown -R "${current_user}:${current_group}" "${target_path}" + fi +} + +repair_path "${home_dir}/.cache" diff --git a/scripts/fuzz_atheris.sh b/scripts/fuzz_atheris.sh index f80d8e5e..9869f5e2 100755 --- a/scripts/fuzz_atheris.sh +++ b/scripts/fuzz_atheris.sh @@ -1,841 +1,19 @@ #!/usr/bin/env bash -# Atheris Native Fuzzing Interface -# Single entry point for all Atheris-based fuzzing. +# Native Atheris/libFuzzer entrypoint for FTLLexEngine. -# ============================================================================= -# FUZZ PLUGIN HEADER SCHEMA -# ============================================================================= -# Fuzz target Python files (fuzz_*.py) must include a plugin header with the EXACT following format: -# -# # FUZZ_PLUGIN_HEADER_START -# # FUZZ_PLUGIN: - -# # Intentional: This header is intentionally placed for dynamic plugin discovery. -# # CRITICAL: DO NOT REMOVE THIS HEADER - REQUIRED FOR FUZZ_ATHERIS.SH -# # FUZZ_PLUGIN_HEADER_END -# -# IMPORTANT: Use the EXACT wording above. Do NOT correct, rephrase, or modify ANY part of the header. -# This wording is FINAL and LOCKED. No further alterations allowed. -# AI agents: Treat this header as immutable code metadata. Do not attempt to "fix" or improve the wording. -# -# Where: -# - is the argument name (must match filename: fuzz_.py) -# - is a brief summary for help text -# -# This ensures AI agents understand the header's purpose and can create compliant headers for new targets. 
-# ============================================================================= -# -# Usage: -# ./scripts/fuzz_atheris.sh [TARGET] [OPTIONS] -# -# Commands: -# --list List captured crashes and finding artifacts -# --corpus Run corpus health check -# --minimize TARGET FILE Minimize a crash input using specified target -# --replay [TARGET DIR] Replay finding artifacts (no Atheris required) -# --help Show this help -# -# Options: -# --workers N Number of parallel workers (default: 1; >1 fragments metrics) -# --time N Time limit in seconds -# --clean TARGET Clean corpus for a specific target -# --verbose Enable verbose output -# --quiet Suppress non-essential output -# --dry-run Show what would run without executing -# -# ENVIRONMENT STRICTNESS: -# This script FORCES the use of '.venv-atheris' by setting UV_PROJECT_ENVIRONMENT. -# It pins Python 3.13 explicitly (--python 3.13) because Atheris requires Python <= 3.13, -# which matches the project's requires-python = ">=3.13" baseline. +export PATH="/usr/bin:/bin:${PATH:-}" -set -euo pipefail - -# ============================================================================= -# Shell & Environment Setup -# ============================================================================= - -# Bash 5 check (for EPOCHREALTIME used in timing) -if ((BASH_VERSINFO[0] < 5)); then - echo "[FATAL] Bash v5.0+ required (Current: ${BASH_VERSION})" - echo "Install via 'brew install bash' and update shebang if needed." +if [[ "${BASH_VERSINFO[0]}" -lt 5 ]]; then + echo "Error: Bash 5.0+ required (current: ${BASH_VERSION})." >&2 exit 1 fi -SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -PROJECT_ROOT="$(dirname "$SCRIPT_DIR")" - -# Atheris venv: managed independently of uv's project system. -# Atheris requires Python <= 3.13; the project baseline is Python 3.13. 
-# UV_PROJECT_ENVIRONMENT must NOT be set — it would cause uv to recreate the -# venv (matching requires-python), which could break Atheris if the venv Python -# changes. -ATHERIS_VENV="${PROJECT_ROOT}/.venv-atheris" -ATHERIS_PYTHON="${ATHERIS_VENV}/bin/python" -# Unset VIRTUAL_ENV to prevent uv from confusing it with an active shell venv -unset VIRTUAL_ENV - -# Colors (disabled if not terminal or --quiet) -if [[ -t 1 ]]; then - RED='\033[0;31m' - GREEN='\033[0;32m' - YELLOW='\033[0;33m' - BLUE='\033[0;34m' - BOLD='\033[1m' - NC='\033[0m' -else - RED='' GREEN='' YELLOW='' BLUE='' BOLD='' NC='' -fi - -# Verbosity control -VERBOSE=0 -QUIET=0 -DRY_RUN=0 - -log_info() { - if [[ $QUIET -eq 0 ]]; then - echo -e "${BLUE}[INFO]${NC} $1" - fi -} - -log_verbose() { - if [[ $VERBOSE -eq 1 ]]; then - echo -e "${BLUE}[DEBUG]${NC} $1" - fi -} - -log_warn() { - echo -e "${YELLOW}[WARN]${NC} $1" >&2 -} - -log_error() { - echo -e "${RED}[ERROR]${NC} $1" >&2 -} - -# ============================================================================= -# Plugin / Target Definitions -# ============================================================================= - -declare -A PARAM_TARGETS -declare -A PARAM_DESCRIPTIONS -declare -a PARAM_ORDER # Preserve discovery order for deterministic help - -# Dynamic Plugin Discovery -discover_plugins() { - local fuzz_dir="$PROJECT_ROOT/fuzz_atheris" - for file in "$fuzz_dir"/fuzz_*.py; do - if [[ -f "$file" ]]; then - # Find the FUZZ_PLUGIN line - local header - if header=$(grep -m1 '^# FUZZ_PLUGIN:' "$file" 2>/dev/null); then - # Validate header structure with start and end tags - local start_line plugin_line end_line - start_line=$(grep -n '^# FUZZ_PLUGIN_HEADER_START' "$file" 2>/dev/null | head -1 | cut -d: -f1) - plugin_line=$(grep -n '^# FUZZ_PLUGIN:' "$file" 2>/dev/null | head -1 | cut -d: -f1) - end_line=$(grep -n '^# FUZZ_PLUGIN_HEADER_END' "$file" 2>/dev/null | head -1 | cut -d: -f1) - if [[ -n "$start_line" ]] && [[ -n "$plugin_line" ]] && [[ 
-n "$end_line" ]] && [[ "$start_line" -lt "$plugin_line" ]] && [[ "$plugin_line" -lt "$end_line" ]]; then - if [[ $header =~ ^#\ FUZZ_PLUGIN:\ (.+)\ -\ (.+)$ ]]; then - local name="${BASH_REMATCH[1]}" - local desc="${BASH_REMATCH[2]}" - local filename - filename=$(basename "$file" .py) - local expected_name="fuzz_${name}" - if [[ "$filename" = "$expected_name" ]]; then - PARAM_TARGETS["$name"]="$file" - PARAM_DESCRIPTIONS["$name"]="$desc" - PARAM_ORDER+=("$name") - log_verbose "Discovered plugin: $name -> $file" - else - log_warn "Plugin name '$name' does not match filename '$filename'" - fi - fi - fi - fi - fi - done - log_verbose "Discovered ${#PARAM_TARGETS[@]} plugins" -} - -# Discover plugins -discover_plugins - -# ============================================================================= -# Atheris Venv Bootstrap -# ============================================================================= - -_find_python313() { - # Prefer uv-managed interpreters in the current workspace. - if command -v uv &>/dev/null; then - local uv_python - uv_python=$(uv python find 3.13 2>/dev/null || echo "") - if [[ -n "$uv_python" ]] && [[ -x "$uv_python" ]]; then - echo "$uv_python" - return 0 - fi - fi - # Prefer pyenv (most reliable on macOS dev machines) - if command -v pyenv &>/dev/null; then - local pyenv_root resolved - pyenv_root=$(pyenv root 2>/dev/null) - resolved=$(pyenv latest 3.13 2>/dev/null || echo "") - if [[ -n "$resolved" ]] && [[ -f "${pyenv_root}/versions/${resolved}/bin/python3" ]]; then - echo "${pyenv_root}/versions/${resolved}/bin/python3" - return 0 - fi - fi - # Fall back to system python3.13 - if command -v python3.13 &>/dev/null; then - echo "python3.13" - return 0 - fi - return 1 -} - -ensure_atheris_venv() { - if [[ -d "$ATHERIS_VENV" ]] && [[ ! -x "$ATHERIS_PYTHON" ]]; then - echo -e "${YELLOW}[WARN] .venv-atheris exists but its Python is missing or broken. 
Recreating...${NC}" - rm -rf "$ATHERIS_VENV" - fi - - # If venv exists and is already Python 3.13, nothing to do. - if [[ -x "$ATHERIS_PYTHON" ]]; then - local venv_mm - venv_mm=$("$ATHERIS_PYTHON" --version 2>&1 | grep -oE '[0-9]+\.[0-9]+' | head -1) - if [[ "$venv_mm" == "3.13" ]]; then - if "$ATHERIS_PYTHON" -c "import atheris, ftllexengine, psutil" &>/dev/null; then - return 0 - fi - echo -e "${YELLOW}[WARN] .venv-atheris is missing required packages. Recreating...${NC}" - rm -rf "$ATHERIS_VENV" - else - echo -e "${YELLOW}[WARN] .venv-atheris has Python $venv_mm (need 3.13). Recreating...${NC}" - rm -rf "$ATHERIS_VENV" - fi - fi - - local python313 - if ! python313=$(_find_python313); then - log_error "Python 3.13 not found. Install it or make it discoverable via uv/python3.13/pyenv." - exit 1 - fi - - echo -e "${BOLD}Creating .venv-atheris with Python 3.13...${NC}" - "$python313" -m venv "$ATHERIS_VENV" - echo "Installing atheris + psutil..." - "$ATHERIS_PYTHON" -m pip install --quiet "atheris>=3.0.0" "psutil>=7.0.0" - echo "Installing ftllexengine[babel]..." - "$ATHERIS_PYTHON" -m pip install --quiet -e "${PROJECT_ROOT}[babel]" - echo -e "${GREEN}[OK] .venv-atheris ready (Python 3.13 + Atheris).${NC}" -} - -# ============================================================================= -# Pre-Flight Diagnostics & "Binary Surgery" (macOS Fix) -# ============================================================================= - -# This function ensures Atheris is installed, linked correctly, and running -# on the correct Python version. It auto-heals macOS dynamic linking issues. -run_diagnostics() { - ensure_atheris_venv - - if [[ $QUIET -eq 1 ]]; then - # Minimal diagnostics in quiet mode - if ! "$ATHERIS_PYTHON" -c "import atheris.core_with_libfuzzer" 2>/dev/null; then - log_error "Atheris is not properly installed. Run without --quiet for diagnostics." 
- exit 1 - fi - return 0 - fi - - echo -e "\n${BOLD}============================================================${NC}" - echo -e "${BOLD}Atheris Diagnostic Check${NC}" - echo -e "Env: ${BLUE}.venv-atheris${NC}" - echo -e "${BOLD}============================================================${NC}\n" - - # 1. Check Python Version (Must be < 3.14) - local python_bin="$ATHERIS_PYTHON" - local python_version - python_version=$("$python_bin" --version 2>&1 | grep -oE '[0-9]+\.[0-9]+' | head -1) - - echo -n "Python Version... " - # Semantic version comparison using sort -V - local min_supported="3.11" - local max_supported="3.13" - if ! printf '%s\n%s\n' "$min_supported" "$python_version" | sort -V -C 2>/dev/null; then - echo -e "${RED}$python_version (Too old, requires >= $min_supported)${NC}" - exit 3 - elif ! printf '%s\n%s\n' "$python_version" "$max_supported" | sort -V -C 2>/dev/null; then - echo -e "${RED}$python_version (Unsupported, Atheris requires <= $max_supported)${NC}" - echo -e "${RED}[FATAL] Python $python_version is not supported by Atheris.${NC}" - exit 3 - else - echo -e "${GREEN}$python_version${NC}" - fi - - # 2. Check ABI / Importability - echo -n "ABI Compatibility... " - if "$python_bin" -c "import atheris.core_with_libfuzzer" 2>/dev/null; then - echo -e "${GREEN}OK${NC}" - - echo -n "Fuzzing Capability... " - if "$python_bin" -c "import atheris; f=lambda d:None; atheris.Setup(['test'],f); print('OK')" 2>/dev/null | grep -q "OK"; then - echo -e "${GREEN}OK${NC}" - else - echo -e "${RED}FAILED${NC}" - exit 1 - fi - - echo -e "\n${BOLD}============================================================${NC}" - echo -e "${GREEN}[OK]${NC} Atheris is ready." - echo -e "${BOLD}============================================================${NC}\n" - return 0 - fi - - echo -e "${RED}FAILED${NC}" - echo -e "${YELLOW}[WARN] Atheris ABI check failed. Attempting Auto-Heal...${NC}" - - # 3. 
macOS Auto-Heal (Binary Surgery) - if [[ "$(uname)" == "Darwin" ]]; then - heal_macos_atheris "$python_bin" - - # Verify after healing - echo -n "ABI Compatibility (Post-Fix)... " - if "$python_bin" -c "import atheris.core_with_libfuzzer" 2>/dev/null; then - echo -e "${GREEN}OK${NC}" - echo -e "\n${BOLD}============================================================${NC}" - echo -e "${GREEN}[OK]${NC} Atheris was repaired and is ready." - echo -e "${BOLD}============================================================${NC}\n" - else - echo -e "${RED}STILL FAILING${NC}" - exit 1 - fi - else - log_error "Atheris setup is broken and this is not macOS (cannot auto-heal)." - echo "Try: rm -rf .venv-atheris && ./scripts/fuzz_atheris.sh --setup" - exit 1 - fi -} - - -heal_macos_atheris() { - local python_bin="$1" - - # Locate LLVM (support both ARM and Intel Macs) - local llvm_prefix - if command -v brew &>/dev/null; then - llvm_prefix=$(brew --prefix llvm 2>/dev/null || echo "") - fi - - # Fallback paths for common installations - if [[ -z "$llvm_prefix" ]] || [[ ! -d "$llvm_prefix" ]]; then - for candidate in "/opt/homebrew/opt/llvm" "/usr/local/opt/llvm" "/opt/llvm"; do - if [[ -d "$candidate" ]]; then - llvm_prefix="$candidate" - break - fi - done - fi - - if [[ -z "$llvm_prefix" ]] || [[ ! -d "$llvm_prefix" ]]; then - log_error "LLVM not found. Please run: brew install llvm" - exit 2 - fi - - log_info "Re-installing Atheris with custom flags from $llvm_prefix..." 
- - # Reinstall with special flags - ( - export CLANG_BIN="$llvm_prefix/bin/clang" - export CC="$llvm_prefix/bin/clang" - export CXX="$llvm_prefix/bin/clang++" - export LDFLAGS="-L$llvm_prefix/lib/c++ -L$llvm_prefix/lib -Wl,-rpath,$llvm_prefix/lib/c++" - export CPPFLAGS="-I$llvm_prefix/include" - - # Force reinstall in the specific environment - uv pip install --python "$python_bin" --reinstall --no-cache-dir --no-binary :all: atheris - ) - - # Validate again - if "$python_bin" -c "import atheris.core_with_libfuzzer" 2>/dev/null; then - echo -e "${GREEN}[SUCCESS] Atheris repaired successfully.${NC}" - else - log_error "Surgery failed. Could not fix Atheris ABI." - exit 1 - fi -} - - -# ============================================================================= -# Subroutines -# ============================================================================= - -run_setup() { - # Explicit environment setup/heal. Heals automatically on macOS; prints - # manual instructions on other platforms. - echo -e "${BOLD}Environment Setup / Heal${NC}" - if [[ -n "${TARGET:-}" ]]; then - echo "Target: $TARGET (target-specific checks enabled)" - else - echo "Target: (none) -- pass a target name for target-specific checks" - echo " Example: ./scripts/fuzz_atheris.sh --setup currency" - fi - echo "" - run_diagnostics - echo -e "${GREEN}[OK]${NC} Environment is ready." -} - - -show_help() { - cat << EOF -Atheris Fuzzing Interface - -USAGE: - ./scripts/fuzz_atheris.sh [TARGET] [OPTIONS] - -TARGETS: -EOF - # Use PARAM_ORDER for deterministic output - for key in "${PARAM_ORDER[@]}"; do - printf " %-15s %s\n" "$key" "${PARAM_DESCRIPTIONS[$key]}" - done - cat << EOF - -COMMANDS: - --setup [TARGET] Check and auto-heal the fuzzing environment. Heals - automatically on macOS; prints manual instructions - on other platforms. No fuzzing run is started. 
- --list List all crashes and finding artifacts - --corpus Run corpus health check (fuzz_atheris_corpus_health.py) - --minimize TARGET FILE Minimize a crash input using the specified target - --replay TARGET [DIR] Replay finding artifacts without Atheris - --report TARGET Show the report for the last run of a target - --clean TARGET Clean corpus for a specific target - -OPTIONS: - --workers N Number of workers (default: 1; >1 fragments metrics) - --time N Max time in seconds - --verbose Enable verbose output - --quiet Suppress non-essential output - --dry-run Show what would run without executing - --help Show this help - -AUTO-HEAL: - The script automatically checks and repairs the environment before each - fuzzing run. On macOS, one issue is healed automatically: - - 1. Atheris ABI mismatch (LLVM version): Atheris is rebuilt from source - using the system LLVM (brew install llvm required). - - To trigger heals without starting a fuzz run: - ./scripts/fuzz_atheris.sh --setup - -EXAMPLES: - ./scripts/fuzz_atheris.sh currency --time 60 - ./scripts/fuzz_atheris.sh runtime --workers 8 - ./scripts/fuzz_atheris.sh --setup - ./scripts/fuzz_atheris.sh --minimize currency .fuzz_atheris_corpus/crash_abc123 - ./scripts/fuzz_atheris.sh --replay structured - ./scripts/fuzz_atheris.sh --replay structured .fuzz_atheris_corpus/structured/findings/ - ./scripts/fuzz_atheris.sh --clean roundtrip -EOF -} - -run_list() { - local corpus_dir="$PROJECT_ROOT/.fuzz_atheris_corpus" - - if [[ ! -d "$corpus_dir" ]]; then - echo "No corpus directory found." 
- return 0 - fi - - # Use Python for robust JSON parsing and listing - "$ATHERIS_PYTHON" - "$corpus_dir" << 'EOF' -import sys -import os -import json -import glob - -corpus_dir = sys.argv[1] -RED = '\033[0;31m' -GREEN = '\033[0;32m' -BLUE = '\033[0;34m' -BOLD = '\033[1m' -NC = '\033[0m' - -if not os.isatty(1): - RED = GREEN = BLUE = BOLD = NC = '' - -# Section 1: Raw libFuzzer crash files -print(f"{BOLD}Crashes (raw libFuzzer artifacts){NC}") -crashes = glob.glob(os.path.join(corpus_dir, "crash_*")) -crash_count = len(crashes) - -if crash_count > 0: - print(f"Found {crash_count} crash(es):") - for crash in crashes[:10]: - print(f" {crash}") - if crash_count > 10: - print(f" ... and {crash_count - 10} more") - print(f"\nInspect: xxd {crashes[0]} | head -20") -else: - print(" No crashes found.") -print("") - -# Section 2: Finding artifacts (human-readable, actionable) -print(f"{BOLD}Findings (actionable artifacts){NC}") -finding_dirs = [] -for root, dirs, files in os.walk(corpus_dir): - if os.path.basename(root) == "findings": - finding_dirs.append(root) - -total_findings = 0 -for fdir in finding_dirs: - meta_files = sorted(glob.glob(os.path.join(fdir, "*_meta.json"))) - if meta_files: - print(f" {fdir}:") - for meta_file in meta_files[:10]: - total_findings += 1 - basename = os.path.basename(meta_file) - try: - with open(meta_file, 'r') as f: - data = json.load(f) - pattern = data.get('pattern', 'unknown') - diff_offset = data.get('diff_offset', '?') - source_len = data.get('source_len', '?') - print(f" {basename} pattern={pattern} source={source_len}chars diff@byte{diff_offset}") - except Exception: - print(f" {basename} (parse error)") - if len(meta_files) > 10: - print(f" ... and {len(meta_files) - 10} more") - -if total_findings == 0: - print(" No finding artifacts found.") -else: - print("\nReplay: ./scripts/fuzz_atheris.sh --replay ") -EOF -} - -run_corpus_health() { - local health_script="$PROJECT_ROOT/scripts/fuzz_atheris_corpus_health.py" - if [[ ! 
-f "$health_script" ]]; then - log_error "Corpus health script not found: $health_script" - exit 1 - fi - run_diagnostics - echo -e "${BOLD}Checking Corpus Health...${NC}" - "$ATHERIS_PYTHON" "$health_script" -} - -parse_and_display_report() { - local target_key="$1" - local corpus_dir="$PROJECT_ROOT/.fuzz_atheris_corpus/$target_key" - - # Try to read JSON report from file (written during fuzzing) - local report_file="$corpus_dir/fuzz_${target_key}_report.json" - - if [[ ! -f "$report_file" ]]; then - log_warn "No JSON summary found (fuzzer may not have completed enough iterations)" - return 0 - fi - - # Use Python for robust JSON parsing and display - "$ATHERIS_PYTHON" - "$report_file" << 'EOF' -import sys -import json -import os - -report_file = sys.argv[1] -RED = '\033[0;31m' -GREEN = '\033[0;32m' -YELLOW = '\033[0;33m' -BLUE = '\033[0;34m' -BOLD = '\033[1m' -NC = '\033[0m' - -if not os.isatty(1): - RED = GREEN = YELLOW = BLUE = BOLD = NC = '' - -def format_number(n): - return "{:,}".format(n) - -try: - with open(report_file, 'r') as f: - data = json.load(f) -except Exception: - # If JSON is invalid or empty, we can't do much - sys.exit(0) - -# Parse key metrics -status = data.get('status', 'unknown') -iterations = data.get('iterations', 0) -findings = data.get('findings', 0) - -# Parse optional fields -fuzzer_name = data.get('fuzzer_name') -fuzzer_target = data.get('fuzzer_target') -duration = data.get('campaign_duration_sec') -throughput = data.get('iterations_per_sec') - -# Display summary header -print(f"\n{BOLD}============================================================{NC}") -print(f"{BOLD}Fuzzing Campaign Summary{NC}") -print(f"{BOLD}============================================================{NC}") -if fuzzer_name: - print(f"Fuzzer: {fuzzer_name}") -if fuzzer_target: - print(f"Target: {fuzzer_target}") -print(f"Status: {status}") -print(f"Iterations: {format_number(iterations)}") -if duration is not None: - print(f"Duration: {duration}s") -if throughput 
is not None: - print(f"Throughput: {throughput:,.1f} iter/s") -print(f"Findings: {format_number(findings)}") - -# Performance metrics -perf_mean = data.get('perf_mean_ms') -perf_p95 = data.get('perf_p95_ms') -perf_p99 = data.get('perf_p99_ms') +set -o errexit +set -o nounset +set -o pipefail +shopt -s inherit_errexit -if perf_mean is not None: - print("") - print("Performance:") - print(f" Mean: {perf_mean}ms") - if perf_p95 is not None: - print(f" P95: {perf_p95}ms") - if perf_p99 is not None: - print(f" P99: {perf_p99}ms") - -# Memory metrics -mem_peak = data.get('memory_peak_mb') -mem_delta = data.get('memory_delta_mb') - -if mem_peak is not None: - print("") - print("Memory:") - print(f" Peak: {mem_peak}MB") - if mem_delta is not None: - print(f" Delta: {mem_delta}MB") - -# CRITICAL: Alert on findings -if findings > 0: - print(f"\n{RED}{BOLD}[WARNING] API Contract Violations Detected{NC}") - print(f"{RED}Found {findings} violations during fuzzing campaign{NC}") - print("") - - # Extract top error patterns - print("Top Error Patterns:") - error_patterns = {k: v for k, v in data.items() if k.startswith('error_') or k.startswith('contract_')} - sorted_patterns = sorted(error_patterns.items(), key=lambda item: item[1], reverse=True)[:10] - - if sorted_patterns: - for k, v in sorted_patterns: - print(f" {k}: {v}") - else: - print(" (No specific error patterns found)") - - print("") - print(f"{YELLOW}Action Required:{NC}") - print(" 1. Review error patterns above") - print(f" 2. Inspect full JSON report: {report_file}") - print(" 3. Fix API contract violations in source code") - print(" 4. 
Re-run fuzzer to verify fixes") - print(f"{BOLD}============================================================{NC}") - sys.exit(1) -else: - print(f"\n{GREEN}[OK] No API contract violations detected{NC}") - print(f"{BOLD}============================================================{NC}") - sys.exit(0) -EOF -} - -run_fuzz_target() { - local target_key="$1" - local target_script="${PARAM_TARGETS[$target_key]:-}" - - if [[ -z "$target_script" ]]; then - log_error "Unknown target: $target_key" - show_help - exit 1 - fi - - echo -e "${BOLD}Starting Fuzzing Campaign${NC}" - echo "Target: $target_key ($target_script)" - echo "Workers: $WORKERS" - if [[ "$WORKERS" -gt 1 ]]; then - log_warn "Workers > 1: libFuzzer uses fork(). Each worker has independent state." - log_warn "JSON report reflects the LAST-EXITING worker only, not aggregate stats." - log_warn "For reliable metrics, use --workers 1 (the default)." - fi - if [[ -n "$TIME_LIMIT" ]]; then - echo "Time: ${TIME_LIMIT}s" - else - echo "Time: Indefinite (Ctrl+C to stop)" - fi - echo "" - - if [[ $DRY_RUN -eq 1 ]]; then - echo "[DRY-RUN] Would execute fuzzer with above parameters" - return 0 - fi - - # Ensure Setup - run_diagnostics - - # Per-target corpus directory prevents cross-contamination between fuzzers - local corpus_dir="$PROJECT_ROOT/.fuzz_atheris_corpus/$target_key" - mkdir -p "$corpus_dir" - - # Build args array (proper quoting) - local -a fuzz_args=() - fuzz_args+=("-workers=$WORKERS") - fuzz_args+=("-jobs=0") - fuzz_args+=("-artifact_prefix=$corpus_dir/crash_") - - if [[ -n "$TIME_LIMIT" ]]; then - fuzz_args+=("-max_total_time=$TIME_LIMIT") - fi - - # Add corpus directory as first positional (read/write) - fuzz_args+=("$corpus_dir") - - # Add target-specific seed corpus (required; no fallback to generic seeds) - local seed_dir="$PROJECT_ROOT/fuzz_atheris/seeds/$target_key" - if [[ -d "$seed_dir" ]]; then - fuzz_args+=("$seed_dir") - log_verbose "Using target-specific seeds: $seed_dir" - else - log_warn "No 
seed directory found: $seed_dir (fuzzer will start from empty corpus)" - fi - - # Run fuzzer (report will be written to .fuzz_atheris_corpus/fuzz__report.json) - local exit_code=0 - "$ATHERIS_PYTHON" "$target_script" "${fuzz_args[@]}" || exit_code=$? - - # Parse and display report from file - local report_exit=0 - if ! parse_and_display_report "$target_key"; then - report_exit=1 - - # Auto-replay finding artifacts to check if they reproduce without Atheris - local findings_dir="$corpus_dir/findings" - local replay_script="$PROJECT_ROOT/fuzz_atheris/fuzz_atheris_replay_finding.py" - if [[ -d "$findings_dir" ]] && [[ -f "$replay_script" ]]; then - echo "" - echo -e "${BOLD}Auto-replaying findings without Atheris instrumentation...${NC}" - "$ATHERIS_PYTHON" "$replay_script" "$findings_dir" || true - fi - - log_error "Fuzzer detected API contract violations" - exit 1 - fi - - # Return fuzzer exit code if non-zero - if [[ $exit_code -ne 0 ]]; then - exit $exit_code - fi -} - -run_replay() { - local target_key="$1" - local findings_dir="${2:-}" - - # Default findings directory based on target - if [[ -z "$findings_dir" ]]; then - findings_dir="$PROJECT_ROOT/.fuzz_atheris_corpus/$target_key/findings" - fi - - if [[ ! -d "$findings_dir" ]]; then - log_error "Findings directory not found: $findings_dir" - echo "Run the fuzzer first, or specify a findings directory." - exit 1 - fi - - local replay_script="$PROJECT_ROOT/fuzz_atheris/fuzz_atheris_replay_finding.py" - if [[ ! 
-f "$replay_script" ]]; then - log_error "Replay script not found: $replay_script" - exit 1 - fi - - if [[ $DRY_RUN -eq 1 ]]; then - echo "[DRY-RUN] Would replay findings from $findings_dir" - return 0 - fi - - echo -e "${BOLD}Replaying Finding Artifacts${NC}" - echo "Target: $target_key" - echo "Directory: $findings_dir" - echo "Runner: Main project venv (NOT .venv-atheris)" - echo "" - - # Run replay in the main project venv (no Atheris instrumentation) - "$ATHERIS_PYTHON" "$replay_script" "$findings_dir" -} - -run_minimize() { - local target_key="$1" - local crash_file="$2" - - # Validate target - local target_script="${PARAM_TARGETS[$target_key]:-}" - if [[ -z "$target_script" ]]; then - log_error "Unknown target for minimization: $target_key" - echo "Available targets:" - for key in "${PARAM_ORDER[@]}"; do - echo " $key" - done - exit 1 - fi - - # Validate crash file - if [[ ! -f "$crash_file" ]]; then - log_error "Crash file not found: $crash_file" - exit 1 - fi - - if [[ $DRY_RUN -eq 1 ]]; then - echo "[DRY-RUN] Would minimize $crash_file using $target_key ($target_script)" - return 0 - fi - - run_diagnostics - echo -e "${BOLD}Minimizing Crash: $crash_file${NC}" - echo "Using target: $target_key ($target_script)" - - local minimized="${crash_file}.minimized" - - # Run Atheris with -minimize_crash=1 using the CORRECT target - "$ATHERIS_PYTHON" "$target_script" \ - -minimize_crash=1 \ - -exact_artifact_path="$minimized" \ - "$crash_file" - - if [[ -f "$minimized" ]]; then - # Cross-platform file size (prefer wc -c for portability) - local orig_size new_size - orig_size=$(wc -c < "$crash_file" | tr -d ' ') - new_size=$(wc -c < "$minimized" | tr -d ' ') - - echo -e "\n${BOLD}============================================================${NC}" - echo -e "${GREEN}[SUCCESS] Crash minimized.${NC}" - echo "Original size: $orig_size bytes" - echo "Minimized size: $new_size bytes" - echo "Saved to: $minimized" - echo -e 
"${BOLD}============================================================${NC}" - - echo -e "\n${YELLOW}Next Steps:${NC}" - echo " 1. Reproduce: $ATHERIS_PYTHON $target_script $minimized" - echo " 2. Debug: xxd $minimized | head -20" - echo " 3. Create regression test with minimized input" - else - log_error "Minimization failed (no output file generated)." - exit 1 - fi -} - - -# ============================================================================= -# Main Dispatch -# ============================================================================= - -# Defaults +PY_VERSION="${PY_VERSION:-3.13}" WORKERS=1 TIME_LIMIT="" TARGET="" @@ -844,116 +22,108 @@ MINIMIZE_TARGET="" MINIMIZE_FILE="" REPLAY_TARGET="" REPLAY_DIR="" +QUIET=0 +VERBOSE=0 +DRY_RUN=0 +ORIGINAL_ARGS=("$@") + +SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)" +PROJECT_ROOT="$(dirname "$SCRIPT_DIR")" +readonly FUZZ_LIB_DIR="$SCRIPT_DIR/lib/fuzz_atheris" + +require_fuzz_lib() { + local path="$1" + [[ -f "$path" ]] || { + echo "Error: Missing Atheris helper library: $path" >&2 + exit 1 + } +} -# Strict Argument Parser while [[ $# -gt 0 ]]; do case "$1" in --setup) MODE="setup" shift - # Optional TARGET argument (not prefixed with --) - if [[ $# -gt 0 ]] && [[ ! 
"$1" == --* ]]; then + if [[ $# -gt 0 && "$1" != --* ]]; then TARGET="$1" shift fi ;; - --list|--corpus) - if [[ "$MODE" != "fuzz" && "$MODE" != "${1#--}" ]]; then - log_error "Conflicting modes selected: $MODE vs ${1#--}" - exit 1 - fi - MODE="${1#--}" + --list) + MODE="list" shift ;; - --clean) - MODE="clean" - # --clean requires a TARGET (either already parsed or as next arg) - if [[ -z "$TARGET" ]]; then - if [[ $# -lt 2 ]] || [[ "$2" == --* ]]; then - log_error "--clean requires a TARGET argument" - echo "Usage: ./scripts/fuzz_atheris.sh TARGET --clean" - echo " ./scripts/fuzz_atheris.sh --clean TARGET" - exit 1 - fi - TARGET="$2" - shift - fi + --corpus) + MODE="corpus" + shift + ;; + --smoke-all) + MODE="smoke" shift ;; --minimize) - MODE="minimize" - # --minimize requires TARGET and FILE - if [[ $# -lt 3 ]]; then - log_error "--minimize requires TARGET and FILE arguments" - echo "Usage: ./scripts/fuzz_atheris.sh --minimize TARGET FILE" - echo "Example: ./scripts/fuzz_atheris.sh --minimize currency .fuzz_atheris_corpus/crash_abc123" + [[ $# -ge 3 ]] || { + echo "Error: --minimize requires TARGET and FILE." >&2 exit 1 - fi + } + MODE="minimize" MINIMIZE_TARGET="$2" MINIMIZE_FILE="$3" shift 3 ;; --replay) - MODE="replay" - # --replay requires TARGET, optional DIR - if [[ $# -lt 2 ]]; then - log_error "--replay requires at least a TARGET argument" - echo "Usage: ./scripts/fuzz_atheris.sh --replay TARGET [DIR]" - echo "Example: ./scripts/fuzz_atheris.sh --replay structured" + [[ $# -ge 2 ]] || { + echo "Error: --replay requires TARGET." >&2 exit 1 - fi + } + MODE="replay" REPLAY_TARGET="$2" shift 2 - # Optional directory argument - if [[ $# -gt 0 ]] && [[ ! "$1" == --* ]]; then + if [[ $# -gt 0 && "$1" != --* ]]; then REPLAY_DIR="$1" shift fi ;; --report) + [[ $# -ge 2 ]] || { + echo "Error: --report requires TARGET." 
>&2 + exit 1 + } MODE="report" - if [[ -z "$TARGET" ]]; then - if [[ $# -lt 2 ]] || [[ "$2" == --* ]]; then - log_error "--report requires a TARGET argument" - echo "Usage: ./scripts/fuzz_atheris.sh --report TARGET" - exit 1 - fi - TARGET="$2" - shift - fi - shift + TARGET="$2" + shift 2 ;; - --workers) - if [[ $# -lt 2 ]] || [[ "$2" == --* ]]; then - log_error "--workers requires a positive integer argument" + --clean) + [[ $# -ge 2 ]] || { + echo "Error: --clean requires TARGET." >&2 exit 1 - fi - if ! [[ "$2" =~ ^[1-9][0-9]*$ ]]; then - log_error "--workers must be a positive integer, got: $2" + } + MODE="clean" + TARGET="$2" + shift 2 + ;; + --workers) + [[ $# -ge 2 && "$2" =~ ^[1-9][0-9]*$ ]] || { + echo "Error: --workers requires a positive integer." >&2 exit 1 - fi + } WORKERS="$2" shift 2 ;; --time) - if [[ $# -lt 2 ]] || [[ "$2" == --* ]]; then - log_error "--time requires a positive integer argument (seconds)" - exit 1 - fi - if ! [[ "$2" =~ ^[1-9][0-9]*$ ]]; then - log_error "--time must be a positive integer (seconds), got: $2" + [[ $# -ge 2 && "$2" =~ ^[1-9][0-9]*$ ]] || { + echo "Error: --time requires a positive integer." >&2 exit 1 - fi + } TIME_LIMIT="$2" shift 2 ;; - --verbose) + --verbose|-v) VERBOSE=1 shift ;; --quiet) QUIET=1 - RED='' GREEN='' YELLOW='' BLUE='' BOLD='' NC='' shift ;; --dry-run) @@ -961,33 +131,43 @@ while [[ $# -gt 0 ]]; do shift ;; --help|-h) - show_help - exit 0 + MODE="help" + shift ;; -*) - log_error "Unknown option: $1" - echo "Run './scripts/fuzz_atheris.sh --help' for usage." 
+ echo "Error: Unknown option: $1" >&2 exit 1 ;; *) - if [[ -n "$TARGET" ]]; then - log_error "Multiple targets specified: $TARGET and $1" + [[ -z "$TARGET" && "$MODE" == "fuzz" ]] || { + echo "Error: Unexpected positional argument: $1" >&2 exit 1 - fi + } TARGET="$1" shift ;; esac done -# [SECTION: SIGNAL_HANDLING] -cleanup() { - # Cleanup on exit if needed (e.g., remove temp files) - : -} -trap cleanup EXIT INT TERM +require_fuzz_lib "$FUZZ_LIB_DIR/common.sh" +require_fuzz_lib "$FUZZ_LIB_DIR/commands.sh" + +# shellcheck source=scripts/lib/fuzz_atheris/common.sh +source "$FUZZ_LIB_DIR/common.sh" +# shellcheck source=scripts/lib/fuzz_atheris/commands.sh +source "$FUZZ_LIB_DIR/commands.sh" + +load_target_registry + +if mode_requires_atheris_env "$MODE"; then + require_devcontainer + pivot_to_atheris_env "${ORIGINAL_ARGS[@]}" +fi case "$MODE" in + help) + show_help + ;; setup) run_setup ;; @@ -997,6 +177,9 @@ case "$MODE" in corpus) run_corpus_health ;; + smoke) + run_smoke_all + ;; minimize) run_minimize "$MINIMIZE_TARGET" "$MINIMIZE_FILE" ;; @@ -1004,26 +187,14 @@ case "$MODE" in run_replay "$REPLAY_TARGET" "$REPLAY_DIR" ;; report) - if [[ -z "$TARGET" ]]; then - log_error "Target not specified for report." - show_help - exit 1 - fi + require_known_target "$TARGET" parse_and_display_report "$TARGET" ;; clean) - CLEAN_DIR="$PROJECT_ROOT/.fuzz_atheris_corpus/$TARGET" - if [[ ! -d "$CLEAN_DIR" ]]; then - echo -e "${YELLOW}[WARN]${NC} No corpus directory found for target '$TARGET': $CLEAN_DIR" - exit 0 - fi - echo -e "${BOLD}Cleaning corpus for target '$TARGET'...${NC}" - rm -rf "$CLEAN_DIR" - echo "Done. 
Removed: $CLEAN_DIR" + run_clean "$TARGET" ;; fuzz) if [[ -z "$TARGET" ]]; then - run_diagnostics show_help exit 0 fi diff --git a/scripts/fuzz_hypofuzz.sh b/scripts/fuzz_hypofuzz.sh index a033bef6..3211f6b9 100755 --- a/scripts/fuzz_hypofuzz.sh +++ b/scripts/fuzz_hypofuzz.sh @@ -15,7 +15,6 @@ # - [EXIT-CODE] N # ============================================================================== -# Bash Settings SCRIPT_VERSION="1.0.1" SCRIPT_NAME="fuzz_hypofuzz.sh" @@ -26,9 +25,15 @@ if [[ "${BASH_VERSINFO[0]}" -ge 5 ]]; then shopt -s inherit_errexit 2>/dev/null || true fi -# [SECTION: ENVIRONMENT_ISOLATION] PY_VERSION="${PY_VERSION:-3.13}" -TARGET_VENV=".venv-${PY_VERSION}" +if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" ]]; then + TARGET_VENV=".venv-devcontainer-${PY_VERSION}" +else + TARGET_VENV=".venv-${PY_VERSION}" +fi +if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" && -z "${UV_LINK_MODE:-}" ]]; then + export UV_LINK_MODE="copy" +fi if [[ "${UV_PROJECT_ENVIRONMENT:-}" != "$TARGET_VENV" ]]; then if [[ "${FUZZ_ALREADY_PIVOTED:-}" == "1" ]]; then @@ -46,15 +51,13 @@ else unset FUZZ_ALREADY_PIVOTED fi -# [SECTION: SETUP] -# REQUIRED: Force TMPDIR to /tmp to avoid "AF_UNIX path too long" on macOS with HypoFuzz export TMPDIR="/tmp" SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" PROJECT_ROOT="$(dirname "$SCRIPT_DIR")" IS_GHA="${GITHUB_ACTIONS:-false}" +readonly FUZZ_LIB_DIR="$SCRIPT_DIR/lib/fuzz_hypofuzz" -# Defaults MODE="check" VERBOSE=false METRICS=false @@ -64,154 +67,26 @@ TARGET="" REPRO_TEST="" HEARTBEAT_ENABLED=true HEARTBEAT_INTERVAL_SEC="${FUZZ_HEARTBEAT_INTERVAL_SEC:-30}" +FORCE=false -# Colors (respects NO_COLOR standard and non-terminal detection) -if [[ "${NO_COLOR:-}" == "1" ]]; then - RED=""; GREEN=""; YELLOW=""; BLUE=""; CYAN=""; BOLD=""; RESET="" -elif [[ ! 
-t 1 ]]; then - RED=""; GREEN=""; YELLOW=""; BLUE=""; CYAN=""; BOLD=""; RESET="" -else - RED="\033[31m"; GREEN="\033[32m"; YELLOW="\033[33m"; BLUE="\033[34m"; CYAN="\033[36m"; BOLD="\033[1m"; RESET="\033[0m" -fi - -# psutil availability: cached once for heartbeat CPU/memory stats. -HAS_PSUTIL=false -python -c "import psutil" 2>/dev/null && HAS_PSUTIL=true || true - -# Logging (consistent with lint.sh and test.sh) -log_group_start() { [[ "$IS_GHA" == "true" ]] && echo "::group::$1"; echo -e "\n${BOLD}${CYAN}=== $1 ===${RESET}"; } -log_group_end() { [[ "$IS_GHA" == "true" ]] && echo "::endgroup::"; return 0; } -log_info() { echo -e "${BLUE}[INFO]${RESET} $1"; } -log_warn() { echo -e "${YELLOW}[WARN]${RESET} $1"; } -log_pass() { echo -e "${GREEN}[PASS]${RESET} $1"; } -log_fail() { echo -e "${RED}[FAIL]${RESET} $1"; } -log_err() { echo -e "${RED}[ERROR]${RESET} $1" >&2; } - -format_bytes() { - local bytes="$1" - if (( bytes >= 1048576 )); then - printf "%d MiB" $((bytes / 1048576)) - elif (( bytes >= 1024 )); then - printf "%d KiB" $((bytes / 1024)) - else - printf "%d B" "$bytes" - fi -} -last_nonempty_log_line() { - local log_file="$1" - local last_line - last_line=$(awk 'NF { line = $0 } END { print line }' "$log_file" 2>/dev/null || true) - last_line=${last_line//$'\r'/} - if [[ -z "$last_line" ]]; then - echo "awaiting first output" - return 0 - fi - if (( ${#last_line} > 160 )); then - echo "${last_line:0:157}..." - return 0 +require_fuzz_lib() { + local path="$1" + if [[ ! -f "$path" ]]; then + echo "[ERROR] Missing HypoFuzz helper library: $path" >&2 + exit 1 fi - echo "$last_line" -} -_heartbeat_daemon() { - # Background subshell: emits [HEARTBEAT] lines to stderr while watched_pid is alive. - # First beat fires at T+5s (short runs stay silent); subsequent beats every - # HEARTBEAT_INTERVAL_SEC seconds. Uses psutil for CPU/memory when available. 
- # - # Delta tracking: if the last log line has not changed since the previous beat, - # shows "(no new output, Xs)" instead of repeating the stale line. This prevents - # expected-but-frequent log messages (e.g., soft-error warnings from test iterations) - # from making the heartbeat appear stuck. - local watched_pid="$1" log_file="$2" start_sec="$3" - local prev_last_line="" prev_change_sec=$SECONDS - sleep 5 - while kill -0 "$watched_pid" 2>/dev/null; do - local elapsed=$(( SECONDS - start_sec )) - local log_bytes=0 - [[ -f "$log_file" ]] && log_bytes=$(wc -c < "$log_file" | tr -d '[:space:]') - local raw_last_line last_display - raw_last_line=$(last_nonempty_log_line "$log_file") - if [[ "$raw_last_line" == "$prev_last_line" ]]; then - local unchanged_sec=$(( SECONDS - prev_change_sec )) - last_display="(no new output, ${unchanged_sec}s)" - else - last_display="$raw_last_line" - prev_last_line="$raw_last_line" - prev_change_sec=$SECONDS - fi - if [[ "$HAS_PSUTIL" == "true" ]]; then - local stats - stats=$(python -c " -import psutil -try: - p = psutil.Process(${watched_pid}) - all_procs = [p] + p.children(recursive=True) - cpu = sum(x.cpu_percent(interval=0.2) for x in all_procs) - mem_mb = sum(x.memory_info().rss for x in all_procs) // 1048576 - print(f'CPU={cpu:.0f}% MEM={mem_mb}MB procs={len(all_procs)}') -except Exception: - print('CPU=? MEM=? procs=?') -" 2>/dev/null || echo "CPU=? MEM=? procs=?") - echo "[HEARTBEAT] T+${elapsed}s | ${stats} | log=$(format_bytes "$log_bytes") | last: ${last_display}" >&2 - else - echo "[HEARTBEAT] T+${elapsed}s | log=$(format_bytes "$log_bytes") | last: ${last_display}" >&2 - fi - sleep "$HEARTBEAT_INTERVAL_SEC" - done } -_run_with_heartbeat() { - # Run a command with FIFO capture and heartbeat. - # Usage: _run_with_heartbeat LOG_FILE APPEND -- CMD [ARGS...] 
- # APPEND: "true" to append to log, "false" to overwrite - # [HEARTBEAT] lines go to stderr; command output goes to log and - # optionally to stdout (when VERBOSE=true). - local log_file="$1" append="$2"; shift 2 - if [[ "$1" == "--" ]]; then shift; fi - local fifo - fifo=$(mktemp -u) - mkfifo "$fifo" - - "$@" > "$fifo" 2>&1 & - local cmd_pid=$! - PID_LIST+=("$cmd_pid") - local hb_pid=0 - if [[ "$HEARTBEAT_ENABLED" == "true" && "$HEARTBEAT_INTERVAL_SEC" -gt 0 ]]; then - _heartbeat_daemon "$cmd_pid" "$log_file" "$SECONDS" & - hb_pid=$! - PID_LIST+=("$hb_pid") - fi - - if [[ "$VERBOSE" == "true" ]]; then - if [[ "$append" == "true" ]]; then - tee -a "$log_file" < "$fifo" || true - else - tee "$log_file" < "$fifo" || true - fi - else - if [[ "$append" == "true" ]]; then - cat < "$fifo" >> "$log_file" || true - else - cat < "$fifo" > "$log_file" || true - fi - fi +require_fuzz_lib "$FUZZ_LIB_DIR/common.sh" +require_fuzz_lib "$FUZZ_LIB_DIR/modes_check.sh" +require_fuzz_lib "$FUZZ_LIB_DIR/modes_fuzz.sh" - # wait returns the exit code of the process; 2>/dev/null suppresses - # "no such process" if _on_signal already killed cmd_pid. - wait "$cmd_pid" 2>/dev/null - local exit_code=$? - - if [[ "$hb_pid" -gt 0 ]]; then - kill "$hb_pid" 2>/dev/null || true - wait "$hb_pid" 2>/dev/null || true - fi - - # All managed processes are done; clear PID_LIST. - # (_on_signal may have already cleared it; this is a no-op in that case.) - PID_LIST=() - - rm -f "$fifo" - return "$exit_code" -} +# shellcheck source=scripts/lib/fuzz_hypofuzz/common.sh +source "$FUZZ_LIB_DIR/common.sh" +# shellcheck source=scripts/lib/fuzz_hypofuzz/modes_check.sh +source "$FUZZ_LIB_DIR/modes_check.sh" +# shellcheck source=scripts/lib/fuzz_hypofuzz/modes_fuzz.sh +source "$FUZZ_LIB_DIR/modes_fuzz.sh" show_help() { local project_name="Project" @@ -307,15 +182,11 @@ NOTE: All modes emit periodic [HEARTBEAT] lines to stderr (T+5s first beat, then every 30s). 
Each line shows elapsed time, CPU%, memory, log size, - and the last log line — letting agents distinguish working from hung. + and the last log line - letting agents distinguish working from hung. Suppress with --no-heartbeat or set FUZZ_HEARTBEAT_INTERVAL_SEC=0. HELPEOF } -# Global state modifications -FORCE=false - -# Strict Argument Parser while [[ $# -gt 0 ]]; do case "$1" in --deep|--list|--clean|--repro|--preflight) @@ -351,915 +222,12 @@ while [[ $# -gt 0 ]]; do esac done -# [SECTION: SIGNAL_HANDLING] -# Two-handler design: _on_exit fires only on EXIT (prints [EXIT-CODE] once); -# _on_signal fires on INT/TERM (kills managed PIDs, sets flag, returns without -# printing or exiting so bash resumes execution naturally). PID_LIST=() _SIGNAL_RECEIVED=false -_on_exit() { - local exit_code=$? - local pid - for pid in "${PID_LIST[@]}"; do - kill -TERM "$pid" 2>/dev/null || true - done - [[ ${#PID_LIST[@]} -gt 0 ]] && wait "${PID_LIST[@]}" 2>/dev/null || true - echo "[EXIT-CODE] $exit_code" >&2 -} - -_on_signal() { - _SIGNAL_RECEIVED=true - local pid - for pid in "${PID_LIST[@]}"; do - kill -TERM "$pid" 2>/dev/null || true - done - PID_LIST=() -} - trap '_on_exit' EXIT trap '_on_signal' INT TERM -# ============================================================================= -# Subroutines -# ============================================================================= - -# [SECTION: DIAGNOSTICS] -run_diagnostics() { - log_group_start "Pre-Flight Diagnostics" - - echo "[ INFO ] Script : $SCRIPT_NAME v$SCRIPT_VERSION" - - local python_version - python_version=$(python --version 2>&1 | grep -oE '[0-9]+\.[0-9]+' | head -1) - echo "[ OK ] Python : $python_version" - - if python -c "import hypothesis" &>/dev/null; then - local hypo_version - hypo_version=$(python -c "import hypothesis; print(hypothesis.__version__)") - echo "[ OK ] Hypothesis : $hypo_version" - else - echo "[ FAIL ] Hypothesis : MISSING" - log_err "Hypothesis not installed. 
Run 'uv sync' to install dependencies." - exit 1 - fi - - log_pass "System is ready." - log_group_end -} - -# ============================================================================= -# Preflight Infrastructure Audit -# ============================================================================= - -run_preflight() { - log_group_start "Preflight Infrastructure Audit" - - # Capture the Python audit exit code separately: the heredoc subprocess exits 0 or 1 - # but 'set +e' (active from main dispatch) prevents it from aborting the function. - # Without explicit capture, log_group_end's exit 0 overwrites the audit result. - local audit_exit=0 - - # AST-based per-test event checking via Python - python << PREFLIGHT_EOF || audit_exit=$? -import ast -import re -import sys -from pathlib import Path -from collections import defaultdict - -tests_dir = Path("$PROJECT_ROOT/tests") -strategies_dir = tests_dir / "strategies" - -# ---- Pass 1: File-level metrics ---- -given_count = 0 -given_by_file = defaultdict(int) -event_count = 0 -event_by_file = defaultdict(int) - -for py_file in tests_dir.rglob("*.py"): - try: - content = py_file.read_text() - g_matches = len(re.findall(r'@given\(', content)) - if g_matches > 0: - given_count += g_matches - given_by_file[py_file.relative_to(tests_dir)] = g_matches - e_matches = len(re.findall(r'(? 
0: - event_count += e_matches - event_by_file[py_file.relative_to(tests_dir)] = e_matches - except Exception: - pass - -# ---- Pass 2: Fuzz module identification ---- -fuzz_modules = [] -fuzz_modules_without_events = [] -for py_file in tests_dir.rglob("*.py"): - try: - # Skip infrastructure files (conftest contains marker registration, not tests) - if py_file.name == "conftest.py": - continue - content = py_file.read_text() - if "pytest.mark.fuzz" in content or "pytestmark = pytest.mark.fuzz" in content: - rel_path = py_file.relative_to(tests_dir) - fuzz_modules.append(str(rel_path)) - # Only flag as gap if the module has @given tests but no events - has_given = given_by_file.get(rel_path, 0) > 0 - has_events = rel_path in event_by_file - if has_given and not has_events: - fuzz_modules_without_events.append(str(rel_path)) - except Exception: - pass - -# ---- Pass 3: Per-test event checking (AST-based, ALL @given tests) ---- -# Every @given test discovered by HypoFuzz must emit event() for semantic guidance. 
-tests_without_events = [] -for py_file in tests_dir.rglob("*.py"): - try: - content = py_file.read_text() - if "@given" not in content: - continue - tree = ast.parse(content, filename=str(py_file)) - rel_path = str(py_file.relative_to(tests_dir)) - - for node in ast.walk(tree): - if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)): - continue - # Check for @given decorator (direct or via attribute) - has_given = any( - isinstance(dec, ast.Call) - and ( - (isinstance(dec.func, ast.Name) and dec.func.id == "given") - or ( - isinstance(dec.func, ast.Attribute) - and dec.func.attr == "given" - ) - ) - for dec in node.decorator_list - ) - if not has_given: - continue - # Check if function body contains event() call - has_event = any( - isinstance(child, ast.Call) - and isinstance(child.func, ast.Name) - and child.func.id == "event" - for child in ast.walk(node) - ) - if not has_event: - tests_without_events.append(f"{rel_path}::{node.name}") - except Exception: - pass - -# ---- Pass 4: Strategy analysis ---- -# __init__.py is a pure re-export aggregator; event() calls belong in domain modules. -_STRATEGY_REEXPORT_FILES = {"__init__.py"} -strategy_coverage = {} -strategy_gaps = [] -has_strategies_dir = strategies_dir.exists() - -if has_strategies_dir: - for strat_file in strategies_dir.glob("*.py"): - try: - if strat_file.name in _STRATEGY_REEXPORT_FILES: - continue - content = strat_file.read_text() - events = len(re.findall(r'(? 
0 else "[ FAIL ]" - print(f" {status} {name:<20} {count} events") - print() - - if strategy_gaps: - print("[FAIL] Strategy files without event() calls (HypoFuzz guidance gap):") - for name in sorted(strategy_gaps): - print(f" [ FAIL ] {name}") - print() -else: - print("[ INFO ] No tests/strategies directory found (skipped strategy audit)") - print() - -if fuzz_modules_without_events: - print("[WARN] Fuzz Modules WITHOUT Events (File-Level Gap):") - for mod in sorted(fuzz_modules_without_events): - given = given_by_file.get(Path(mod), 0) - print(f" [ WARN ] {mod} ({given} @given tests, 0 events)") - print() -else: - print("[ OK ] All fuzz modules have events (file-level)") - print() - -if tests_without_events: - print("[FAIL] @given Tests WITHOUT event() Calls (ALL test files):") - for test_id in sorted(tests_without_events): - print(f" [ FAIL ] {test_id}") - print() -else: - print("[ OK ] All @given tests emit events (per-test, all files)") - print() - -# Summary -gaps = len(fuzz_modules_without_events) + len(tests_without_events) + len(strategy_gaps) -if gaps > 0: - print(f"[FAIL] {gaps} gap(s) detected. Add hypothesis.event() calls for semantic guidance.") - sys.exit(1) -else: - print("[ OK ] Infrastructure audit passed. Run --deep for coverage-guided fuzzing.") -PREFLIGHT_EOF - - log_group_end - return "$audit_exit" -} - -# ============================================================================= -# Property Test Runner (check mode) -# ============================================================================= - -run_check() { - run_diagnostics - log_group_start "Property Tests" - - # Set profile based on verbose flag - if [[ "$VERBOSE" == "true" ]]; then - export HYPOTHESIS_PROFILE="verbose" - fi - - # Determine target - local test_target="${TARGET:-tests/}" - - # Verify target exists - if [[ ! 
-e "$test_target" ]]; then - log_err "Target not found: $test_target" - log_group_end - return 1 - fi - - log_info "Target: $test_target" - if [[ "$VERBOSE" == "true" ]]; then - log_info "Profile: verbose" - else - log_info "Profile: default (dev)" - fi - - # Log Capture - local temp_log - temp_log=$(mktemp) - - local cmd=(uv run --python "$PY_VERSION" pytest "$test_target" -v --tb=short) - - local exit_code=0 - set +e - _run_with_heartbeat "$temp_log" false -- "${cmd[@]}" - exit_code=$? - set -e - - # Log Parsing via Python - python << PYEOF -import json, re -from datetime import datetime, timezone -from pathlib import Path - -log_path = Path("$temp_log") -exit_code = $exit_code - -try: - log_content = log_path.read_text() if log_path.exists() else "" -except Exception: - log_content = "" - -# Parse metrics from the definitive summary line -# Example: "=== 1 failed, 123 passed, 2 skipped in 1.12s ===" -summary_match = re.search(r'=+ (.*?) =+', log_content) -summary_text = summary_match.group(1) if summary_match else "" - -passed_match = re.search(r'(\d+) passed', summary_text) -failed_match = re.search(r'(\d+) failed', summary_text) -skipped_match = re.search(r'(\d+) skipped', summary_text) - -tests_passed = int(passed_match.group(1)) if passed_match else 0 -tests_failed = int(failed_match.group(1)) if failed_match else 0 -tests_skipped = int(skipped_match.group(1)) if skipped_match else 0 - -hypo_count = log_content.count('Falsifying example') - -# Extract individual test failures -failures = [] -failed_test_pattern = r'FAILED (tests/.+?)(?: - |$)' -failed_tests = sorted(list(set(re.findall(failed_test_pattern, log_content)))) - -for test_path in failed_tests: - failure_entry = {"test": test_path} - test_section_start = log_content.find(test_path) - if test_section_start != -1: - test_section = log_content[test_section_start:test_section_start + 2000] - error_match = re.search(r'E\s+(\w+Error|\w+Exception):', test_section) - if error_match: - 
failure_entry["error_type"] = error_match.group(1) - if 'Falsifying example' in log_content: - test_func = test_path.split("::")[-1] if "::" in test_path else "" - example_pattern = rf'Falsifying example:\s*{re.escape(test_func)}\(([^\)]+)\)' - example_match = re.search(example_pattern, log_content, re.DOTALL) - if example_match: - failure_entry["example"] = example_match.group(1).strip()[:500] - failures.append(failure_entry) - -# Legacy field -fail_ex = "" -if 'Falsifying example' in log_content: - try: - fail_ex = log_content.split('Falsifying example')[1].split('\n')[0][:200].strip() - except IndexError: - pass - -# Status determination -if exit_code == 0: - status = 'pass' -elif exit_code in (130, 2): - status = 'stopped' -elif tests_failed > 0 or hypo_count > 0: - status = 'finding' -else: - status = 'error' - -report = { - 'script': '$SCRIPT_NAME', - 'script_version': '$SCRIPT_VERSION', - 'mode': 'check', - 'status': status, - 'timestamp': datetime.now(timezone.utc).isoformat(), - 'tests_passed': tests_passed, - 'tests_failed': tests_failed, - 'tests_skipped': tests_skipped, - 'hypothesis_failures': hypo_count, - 'falsifying_example': fail_ex, - 'failures': failures, - 'exit_code': exit_code -} -print('[SUMMARY-JSON-BEGIN]') -print(json.dumps(report, indent=2)) -print('[SUMMARY-JSON-END]') -PYEOF - - # Visual Feedback - if [[ $exit_code -eq 0 ]]; then - log_pass "All property tests passed." - elif [[ $exit_code -eq 130 || $exit_code -eq 2 ]]; then - log_info "Run interrupted by user." - elif [[ $exit_code -eq 1 ]]; then - log_fail "Failures detected. See JSON summary above." - if [[ "$VERBOSE" == "false" ]]; then - log_warn "Failure output:" - if [[ -s "$temp_log" ]]; then - grep -A 20 "Falsifying example" "$temp_log" || head -n 20 "$temp_log" - fi - fi - else - log_err "Test execution failed (code $exit_code)." 
- fi - - rm -f "$temp_log" - log_group_end - return "$exit_code" -} - -# ============================================================================= -# Continuous HypoFuzz (deep mode) -# ============================================================================= - -run_deep() { - run_diagnostics - - # Determine mode title: --metrics uses pytest (single-pass), else HypoFuzz (continuous) - if [[ "$METRICS" == "true" ]]; then - log_group_start "Deep Fuzzing (pytest with metrics)" - else - log_group_start "Continuous HypoFuzz" - fi - - # Activate hypofuzz profile: deadline=None, suppress health checks - export HYPOTHESIS_PROFILE="hypofuzz" - - # Log file for this session (append to preserve history) - local log_file="$PROJECT_ROOT/.hypothesis/hypofuzz.log" - mkdir -p "$PROJECT_ROOT/.hypothesis" - - # When --metrics is enabled, use pytest instead of HypoFuzz. - # HypoFuzz uses multiprocessing; STRATEGY_METRICS is only exported here, - # so each worker does not independently print zero-event summaries. - if [[ "$METRICS" == "true" ]]; then - export STRATEGY_METRICS="1" - export STRATEGY_METRICS_DETAILED="1" - export STRATEGY_METRICS_LIVE="1" - export STRATEGY_METRICS_INTERVAL="10" - log_info "Metrics: Per-strategy breakdown enabled (10s interval)" - log_info "Metrics: Using pytest (HypoFuzz multiprocessing incompatible with metrics)" - log_info "Profile: hypofuzz (deadline=None)" - - # Session header - { - echo "" - echo "================================================================================" - echo "Metrics Session (pytest -m fuzz): $(date '+%Y-%m-%d %H:%M:%S')" - echo "Profile: hypofuzz" - echo "================================================================================" - } >> "$log_file" - - local exit_code=0 - set +e - _run_with_heartbeat "$log_file" true -- uv run --python "$PY_VERSION" pytest tests/ -m fuzz -v --tb=short - exit_code=$? - set -e - - if [[ $exit_code -ne 0 ]]; then - log_fail "Metrics session failed (exit $exit_code). 
Last 80 lines of log:" - tail -n 80 "$log_file" - fi - - python << METRICS_PYEOF -import json, re -from datetime import datetime, timezone -from pathlib import Path - -log_path = Path("$log_file") -exit_code = $exit_code -try: - log_content = log_path.read_text() if log_path.exists() else "" -except Exception: - log_content = "" - -summary_match = re.search(r'=+ (.*?) =+\n*$', log_content, re.MULTILINE) -summary_text = summary_match.group(1) if summary_match else "" -passed_m = re.search(r'(\d+) passed', summary_text) -failed_m = re.search(r'(\d+) failed', summary_text) -skipped_m = re.search(r'(\d+) skipped', summary_text) -hypo_count = log_content.count('Falsifying example') - -report = { - 'script': '$SCRIPT_NAME', - 'script_version': '$SCRIPT_VERSION', - 'mode': 'deep_metrics', - 'status': 'pass' if exit_code == 0 else 'fail', - 'timestamp': datetime.now(timezone.utc).isoformat(), - 'tests_passed': int(passed_m.group(1)) if passed_m else 0, - 'tests_failed': int(failed_m.group(1)) if failed_m else 0, - 'tests_skipped': int(skipped_m.group(1)) if skipped_m else 0, - 'hypothesis_failures': hypo_count, - 'exit_code': exit_code, - 'log_file': str(log_path), -} -print('[SUMMARY-JSON-BEGIN]') -print(json.dumps(report, indent=2)) -print('[SUMMARY-JSON-END]') -METRICS_PYEOF - - log_group_end - return "$exit_code" - fi - - if [[ -n "$TIME_LIMIT" ]]; then - log_info "Time Limit: ${TIME_LIMIT}s" - else - log_info "Time Limit: Until Ctrl+C" - fi - log_info "Workers: $WORKERS" - log_info "Profile: hypofuzz (deadline=None)" - - # session_log_start: byte offset at the start of this --deep invocation. - # Used to count failures across ALL restarts in the session without - # double-counting evidence from prior sessions in the append-only log. 
- local session_log_start=0 - [[ -f "$log_file" ]] && session_log_start=$(wc -c < "$log_file" | tr -d ' ') - - local exit_code=0 - local teardown_race_detected=false - local restart_count=0 - local max_teardown_restarts=20 - - # Teardown race detection (HypoFuzz bug — hypofuzz.py FuzzWorkerHub.start): - # When HypoFuzz completes a full exploration pass, FuzzWorkerHub.start() - # breaks out of its poll loop and exits the `with Manager()` block without - # first terminating worker processes. Manager.__exit__() closes the IPC - # socket; workers crash on their next proxy access. This is a HypoFuzz bug, - # not a test failure. Failure mode differs by Python version: - # Python 3.13: BrokenPipeError (socket open at connect, write fails) - # Python 3.14: FileNotFoundError (Manager deletes socket file before - # worker _incref() reconnect; managers.py:863) - # - # Resolution: auto-restart. The Hypothesis database is preserved between - # restarts so exploration continues exactly where it left off. For --time N, - # single run only (user wants bounded total time; restarts would exceed it). - # For continuous --deep (no --time): loop until Ctrl+C, restarting on race. - # - # run_log_start is captured before each individual HypoFuzz invocation so - # teardown detection is scoped to that run's log window only — avoids - # false positives from prior runs' BrokenPipeError evidence. - - if [[ -n "$TIME_LIMIT" ]]; then - # Time-limited single run: no auto-restart (user wants bounded session). 
- { - echo "" - echo "================================================================================" - echo "HypoFuzz Session: $(date '+%Y-%m-%d %H:%M:%S')" - echo "Script: $SCRIPT_NAME v$SCRIPT_VERSION" - echo "Workers: $WORKERS" - echo "Profile: hypofuzz" - echo "================================================================================" - } >> "$log_file" - - local run_log_start=0 - [[ -f "$log_file" ]] && run_log_start=$(wc -c < "$log_file" | tr -d ' ') - - set +e - # timeout(1) sends SIGTERM after TIME_LIMIT seconds; exit 124 = time limit reached - _run_with_heartbeat "$log_file" true -- timeout "$TIME_LIMIT" uv run --python "$PY_VERSION" hypothesis fuzz --no-dashboard -n "$WORKERS" tests/fuzz/ - exit_code=$? - set -e - [[ $exit_code -eq 124 ]] && exit_code=0 # time limit reached is a clean stop - - if [[ "$_SIGNAL_RECEIVED" == "true" && $exit_code -ne 0 ]]; then exit_code=130; fi - - # Teardown race check (time-limited: report but don't restart). - # Detection: worker subprocess crash (_start_worker in traceback) that - # went through the multiprocessing.managers proxy layer (managers.py in - # traceback). The exception class varies by timing and Python version: - # BrokenPipeError — write to closed Manager socket (3.13 typical) - # FileNotFoundError — socket file deleted before connect (3.14) - # EOFError — Manager closed auth challenge mid-recv (3.14) - # Matching on exception class is fragile; _start_worker + managers.py - # is the invariant that covers all variants. - local _log_window - _log_window=$(tail -c "+$((run_log_start + 1))" "$log_file" 2>/dev/null || true) - if [[ $exit_code -ne 0 && $exit_code -ne 130 && $exit_code -ne 120 ]] \ - && [[ -f "$log_file" ]] \ - && echo "$_log_window" | grep -qF "_start_worker" 2>/dev/null \ - && echo "$_log_window" | grep -qF "managers.py" 2>/dev/null; then - log_warn "Worker teardown race detected (HypoFuzz bug, exit $exit_code)." 
- log_warn "Worker crashed on Manager proxy access after shutdown — no test failures." - log_warn "Re-run ./scripts/fuzz_hypofuzz.sh --deep to continue (database is preserved)." - teardown_race_detected=true - exit_code=0 - fi - else - # Continuous mode: auto-restart on teardown race until Ctrl+C. - while true; do - local run_log_start=0 - [[ -f "$log_file" ]] && run_log_start=$(wc -c < "$log_file" | tr -d ' ') - - { - echo "" - echo "================================================================================" - if [[ $restart_count -eq 0 ]]; then - echo "HypoFuzz Session: $(date '+%Y-%m-%d %H:%M:%S')" - else - echo "HypoFuzz Restart #${restart_count}: $(date '+%Y-%m-%d %H:%M:%S')" - fi - echo "Script: $SCRIPT_NAME v$SCRIPT_VERSION" - echo "Workers: $WORKERS" - echo "Profile: hypofuzz" - echo "================================================================================" - } >> "$log_file" - - set +e - _run_with_heartbeat "$log_file" true -- uv run --python "$PY_VERSION" hypothesis fuzz --no-dashboard -n "$WORKERS" tests/fuzz/ - exit_code=$? - set -e - - if [[ "$_SIGNAL_RECEIVED" == "true" ]]; then - [[ $exit_code -ne 0 ]] && exit_code=130 - break - fi - - [[ $exit_code -eq 0 || $exit_code -eq 120 ]] && break - - # Check teardown race scoped to THIS run's log window. - # Invariant: worker subprocess crash (_start_worker in traceback) - # via the multiprocessing.managers proxy (managers.py in traceback). - # Exception class varies by timing: BrokenPipeError (3.13 typical), - # FileNotFoundError (3.14, socket deleted before connect), EOFError - # (3.14, Manager closes auth challenge mid-recv). Matching on - # _start_worker + managers.py covers all variants. 
- local _log_window - _log_window=$(tail -c "+$((run_log_start + 1))" "$log_file" 2>/dev/null || true) - if [[ $exit_code -ne 130 ]] \ - && [[ -f "$log_file" ]] \ - && echo "$_log_window" | grep -qF "_start_worker" 2>/dev/null \ - && echo "$_log_window" | grep -qF "managers.py" 2>/dev/null; then - - teardown_race_detected=true - (( restart_count++ )) || true - - if [[ $restart_count -gt $max_teardown_restarts ]]; then - log_warn "Teardown race repeated $restart_count times — giving up (max $max_teardown_restarts)." - exit_code=1 - break - fi - - log_info "Teardown race (${restart_count}/${max_teardown_restarts}) — restarting automatically (database preserved)." - sleep 1 - continue - fi - - # Non-race error exit — don't restart - break - done - fi - - # Count failures across ALL runs in this session (from session_log_start) - local failure_count=0 - if [[ -f "$log_file" ]]; then - failure_count=$(tail -c "+$((session_log_start + 1))" "$log_file" | grep -c "Falsifying example" 2>/dev/null) || failure_count=0 - fi - - if [[ $exit_code -eq 0 || $exit_code -eq 130 || $exit_code -eq 120 ]]; then - # 0 = Done, 130 = SIGINT (Ctrl+C), 120 = HypoFuzz Interrupted - log_pass "Fuzzing session ended." - - if [[ "$failure_count" -gt 0 ]]; then - log_warn "$failure_count falsifying example(s) found in this session." - echo " View log: cat $log_file" - echo " List failures: ./scripts/fuzz_hypofuzz.sh --list" - fi - - # Event diversity summary - log_group_start "Event Infrastructure" - python << EVENTEOF -import re -from pathlib import Path - -tests_dir = Path("$PROJECT_ROOT/tests") - -event_count = 0 -for py_file in tests_dir.rglob("*.py"): - try: - content = py_file.read_text() - event_count += len(re.findall(r'(? 
0: - example_pattern = r'Falsifying example:\s*(\w+)\(([^)]+)\)' - for match in re.finditer(example_pattern, log_content): - test_name = match.group(1) - example_args = match.group(2).strip()[:500] - failures.append({"test": test_name, "example": example_args}) - -# teardown_race in status only when the FINAL exit was a race (max restarts -# exhausted). Transparent restarts show up in teardown_restarts only. -if teardown_race and exit_code != 0: - status = "teardown_race" -elif exit_code == 120: - status = "interrupted" -else: - status = "pass" - -report = { - "script": "$SCRIPT_NAME", - "script_version": "$SCRIPT_VERSION", - "mode": "deep", - "status": status, - "timestamp": datetime.now(timezone.utc).isoformat(), - "failures_count": failure_count, - "failures": failures[:50], - "exit_code": exit_code, - "teardown_restarts": restart_count, - "log_file": "$log_file" -} -print("[SUMMARY-JSON-BEGIN]") -print(json.dumps(report, indent=2)) -print("[SUMMARY-JSON-END]") -PYEOF - else - log_err "HypoFuzz exited with error code $exit_code." - - # Check for common macOS issue - if grep -q "AF_UNIX path too long" "$log_file"; then - log_warn "AF_UNIX path too long detected. TMPDIR is set to $TMPDIR." 
- fi - - python << PYEOF -import json -from datetime import datetime, timezone - -report = { - "script": "$SCRIPT_NAME", - "script_version": "$SCRIPT_VERSION", - "mode": "deep", - "status": "error", - "timestamp": datetime.now(timezone.utc).isoformat(), - "failures_count": $failure_count, - "exit_code": $exit_code, - "log_file": "$log_file" -} -print("[SUMMARY-JSON-BEGIN]") -print(json.dumps(report, indent=2)) -print("[SUMMARY-JSON-END]") -PYEOF - log_group_end - return "$exit_code" - fi - - log_group_end - return "$exit_code" -} - -# ============================================================================= -# List Failures -# ============================================================================= - -run_list() { - local examples_dir="$PROJECT_ROOT/.hypothesis/examples" - local fuzz_log="$PROJECT_ROOT/.hypothesis/hypofuzz.log" - - log_group_start "Hypothesis Failure Reproduction Info" - - log_info "How Hypothesis failures work:" - echo " 1. When a property test fails, Hypothesis shrinks to a minimal example" - echo " 2. The shrunk example is stored in .hypothesis/examples/ (SHA-384 hashed)" - echo " 3. On re-run, Hypothesis AUTOMATICALLY replays the stored failure" - echo " 4. Simply running 'uv run pytest tests/' will reproduce all known failures" - echo "" - - # Check if examples database exists - if [[ -d "$examples_dir" ]]; then - local count - count=$(find "$examples_dir" -type f 2>/dev/null | wc -l | tr -d ' ') - log_pass ".hypothesis/examples/ exists with $count entries" - else - log_warn "No .hypothesis/examples/ directory found." - echo " Run some Hypothesis tests first to populate the database." - fi - echo "" - - # Check for HypoFuzz log - if [[ -f "$fuzz_log" ]]; then - log_info "Recent HypoFuzz session log: $fuzz_log" - local failure_count=0 - failure_count=$(grep -c "Falsifying example" "$fuzz_log" 2>/dev/null) || failure_count=0 - if [[ "$failure_count" -gt 0 ]]; then - log_warn "Found $failure_count falsifying example(s) in log." 
- echo "" - echo "Recent failures:" - grep -B2 "Falsifying example" "$fuzz_log" | tail -20 - else - echo " No failures recorded in latest session." - fi - else - log_info "HypoFuzz log: Not found (run --deep to create)" - fi - echo "" - - echo "To reproduce a specific failing test:" - echo " ./scripts/fuzz_hypofuzz.sh --repro test_module::test_function" - echo "" - echo "To reproduce all failures:" - echo " uv run pytest tests/ -x -v" - echo "" - echo "To extract @example decorator:" - echo " uv run python scripts/fuzz_hypofuzz_repro.py --example test_module::test_function" - - log_group_end -} - -# ============================================================================= -# Clean Hypothesis Database -# ============================================================================= - -run_clean() { - local hypothesis_dir="$PROJECT_ROOT/.hypothesis" - local fuzz_log="$hypothesis_dir/hypofuzz.log" - - if [[ ! -d "$hypothesis_dir" ]]; then - log_info "No .hypothesis/ directory found. Nothing to clean." - return 0 - fi - - local example_count - example_count=$(find "$hypothesis_dir/examples" -type f 2>/dev/null | wc -l | tr -d ' ') - - log_group_start "Hypothesis Database Cleanup" - echo "Directory: $hypothesis_dir" - echo "Examples: $example_count cached entries" - if [[ -f "$fuzz_log" ]]; then - echo "Log: $(wc -l < "$fuzz_log" | tr -d ' ') lines" - fi - echo "" - if [[ "$FORCE" == "true" ]]; then - rm -rf "$hypothesis_dir" - log_pass "Removed .hypothesis/ directory (forced)." - else - # Prevent hanging in non-interactive CI environments - if [[ ! -t 0 ]]; then - log_err "Non-interactive environment detected. You must use --force to clean the database." - exit 1 - fi - - log_warn "Removing .hypothesis/ will:" - echo " - Delete all cached examples (regression database)" - echo " - Delete any shrunk failure examples" - echo " - Require tests to rediscover edge cases" - echo "" - read -r -p "Remove .hypothesis/ directory? 
(y/N): " response - case "$response" in - [yY][eE][sS]|[yY]) - rm -rf "$hypothesis_dir" - log_pass "Removed .hypothesis/ directory." - ;; - *) - log_info "Cancelled." - ;; - esac - fi - log_group_end -} - -# ============================================================================= -# Reproduce Failures -# ============================================================================= - -run_repro() { - if [[ -z "$REPRO_TEST" ]]; then - log_err "Missing test argument for --repro" - echo "Usage: ./scripts/fuzz_hypofuzz.sh --repro " - echo "" - echo "Examples:" - echo " ./scripts/fuzz_hypofuzz.sh --repro tests/fuzz/test_syntax_parser_property.py::test_roundtrip" - echo " ./scripts/fuzz_hypofuzz.sh --repro tests/fuzz/test_syntax_parser_property.py" - return 1 - fi - - log_group_start "Reproduce Hypothesis Failure" - log_info "Test: $REPRO_TEST" - - local exit_code=0 - set +e - uv run --python "$PY_VERSION" python scripts/fuzz_hypofuzz_repro.py --verbose --example "$REPRO_TEST" - exit_code=$? - set -e - - if [[ $exit_code -eq 0 ]]; then - log_pass "Test passed - no failure to reproduce." - echo "If you expected a failure, the bug may have been fixed or the" - echo ".hypothesis/examples/ database may need to be cleared." 
- fi - - log_group_end - return "$exit_code" -} - -# ============================================================================= -# Main Dispatch -# ============================================================================= - set +e case "$MODE" in check) run_check ;; diff --git a/scripts/lib/fuzz_atheris/commands.sh b/scripts/lib/fuzz_atheris/commands.sh new file mode 100755 index 00000000..2523df58 --- /dev/null +++ b/scripts/lib/fuzz_atheris/commands.sh @@ -0,0 +1,290 @@ +#!/usr/bin/env bash + +show_help() { + cat < [OPTIONS] + ./scripts/fuzz_atheris.sh --setup [TARGET] + ./scripts/fuzz_atheris.sh --list + ./scripts/fuzz_atheris.sh --corpus + ./scripts/fuzz_atheris.sh --smoke-all [--time N] + ./scripts/fuzz_atheris.sh --minimize TARGET FILE + ./scripts/fuzz_atheris.sh --replay TARGET [DIR] + ./scripts/fuzz_atheris.sh --report TARGET + ./scripts/fuzz_atheris.sh --clean TARGET + +TARGETS: +EOF + for target in "${TARGET_ORDER[@]}"; do + printf ' %-22s %s\n' "$target" "${TARGET_DESCRIPTIONS[$target]}" + done + cat < 20: + print(f" ... 
and {len(crashes) - 20} more") +else: + print(" none") + +print("\nFindings") +finding_dirs = sorted(corpus_root.glob("*/findings")) +found_any = False +for finding_dir in finding_dirs: + meta_files = sorted(finding_dir.glob("*_meta.json")) + if not meta_files: + continue + found_any = True + print(f" {finding_dir}:") + for meta_file in meta_files[:10]: + try: + payload = json.loads(meta_file.read_text(encoding="utf-8")) + except json.JSONDecodeError: + print(f" {meta_file.name}: unreadable metadata") + continue + pattern = payload.get("pattern", "unknown") + source_len = payload.get("source_len", "?") + diff_offset = payload.get("diff_offset", "?") + print( + f" {meta_file.name}: pattern={pattern} source={source_len}chars diff@byte{diff_offset}" + ) + +if not found_any: + print(" none") +PY +} + +run_corpus_health() { + require_file "$ATHERIS_HEALTH_SCRIPT" + check_atheris_environment + python "$ATHERIS_HEALTH_SCRIPT" +} + +run_smoke_all() { + local per_target_time="${TIME_LIMIT:-3}" + local target="" + + check_atheris_environment + log_info "Running bounded Atheris smoke sweep across ${#TARGET_ORDER[@]} targets (${per_target_time}s each)" + + for target in "${TARGET_ORDER[@]}"; do + TIME_LIMIT="$per_target_time" run_fuzz_target "$target" + done +} + +parse_and_display_report() { + local target="$1" + local report_file="$ATHERIS_CORPUS_ROOT/$target/fuzz_${target}_report.json" + + if [[ ! 
-f "$report_file" ]]; then + log_warn "no campaign summary found for target '$target'" + return 0 + fi + + python3 - "$report_file" <<'PY' +from __future__ import annotations + +import json +import sys +from pathlib import Path + +report_path = Path(sys.argv[1]) +payload = json.loads(report_path.read_text(encoding="utf-8")) + +def fmt_int(value: object) -> str: + return f"{int(value):,}" + +print("\nFuzzing Campaign Summary") +print(f"Status : {payload.get('status', 'unknown')}") +print(f"Iterations : {fmt_int(payload.get('iterations', 0))}") +print(f"Findings : {fmt_int(payload.get('findings', 0))}") +if payload.get("campaign_duration_sec") is not None: + print(f"Duration : {payload['campaign_duration_sec']}s") +if payload.get("iterations_per_sec") is not None: + print(f"Throughput : {payload['iterations_per_sec']:,.1f} iter/s") + +if payload.get("perf_mean_ms") is not None: + print(f"Mean latency: {payload['perf_mean_ms']}ms") +if payload.get("memory_peak_mb") is not None: + print(f"Peak memory : {payload['memory_peak_mb']}MB") + +if int(payload.get("findings", 0)) > 0: + print(f"Report : {report_path}") + raise SystemExit(1) +PY +} + +run_fuzz_target() { + local target="$1" + local target_script="" + local corpus_dir="" + local seed_dir="" + local -a fuzz_args=() + local exit_code=0 + + target_script="$(target_script_for "$target")" + corpus_dir="$ATHERIS_CORPUS_ROOT/$target" + seed_dir="$PROJECT_ROOT/fuzz_atheris/seeds/$target" + + printf 'Target : %s\n' "$target" + printf 'Script : %s\n' "$target_script" + printf 'Workers : %s\n' "$WORKERS" + if [[ -n "$TIME_LIMIT" ]]; then + printf 'Time : %ss\n' "$TIME_LIMIT" + else + printf 'Time : until interrupted\n' + fi + + if [[ "$WORKERS" -gt 1 ]]; then + log_warn "workers > 1 fragments libFuzzer metrics across processes" + fi + + if [[ "$DRY_RUN" -eq 1 ]]; then + return 0 + fi + + check_atheris_environment + mkdir -p "$corpus_dir" + + fuzz_args+=("-workers=$WORKERS") + fuzz_args+=("-jobs=0") + 
fuzz_args+=("-artifact_prefix=$corpus_dir/crash_") + if [[ -n "$TIME_LIMIT" ]]; then + fuzz_args+=("-max_total_time=$TIME_LIMIT") + fi + fuzz_args+=("$corpus_dir") + + if [[ -d "$seed_dir" ]]; then + fuzz_args+=("$seed_dir") + else + log_warn "no seed corpus directory found for target '$target'" + fi + + python "$target_script" "${fuzz_args[@]}" || exit_code=$? + + if ! parse_and_display_report "$target"; then + local findings_dir="$corpus_dir/findings" + if [[ -d "$findings_dir" ]]; then + python "$ATHERIS_REPLAY_SCRIPT" "$findings_dir" || true + fi + exit 1 + fi + + if [[ "$exit_code" -ne 0 ]]; then + exit "$exit_code" + fi +} + +run_replay() { + local target="$1" + local findings_dir="${2:-$ATHERIS_CORPUS_ROOT/$target/findings}" + + require_known_target "$target" + require_file "$ATHERIS_REPLAY_SCRIPT" + [[ -d "$findings_dir" ]] || die "findings directory not found: $findings_dir" + + if [[ "$DRY_RUN" -eq 1 ]]; then + printf 'Replay : %s\n' "$findings_dir" + return 0 + fi + + python "$ATHERIS_REPLAY_SCRIPT" "$findings_dir" +} + +run_minimize() { + local target="$1" + local crash_file="$2" + local target_script="" + local minimized="" + local original_size=0 + local minimized_size=0 + + require_known_target "$target" + [[ -f "$crash_file" ]] || die "crash file not found: $crash_file" + + target_script="$(target_script_for "$target")" + minimized="${crash_file}.minimized" + + if [[ "$DRY_RUN" -eq 1 ]]; then + printf 'Minimize : %s via %s\n' "$crash_file" "$target" + return 0 + fi + + check_atheris_environment + python "$target_script" -minimize_crash=1 -exact_artifact_path="$minimized" "$crash_file" + + [[ -f "$minimized" ]] || die "minimization did not produce $minimized" + + original_size=$(wc -c < "$crash_file" | tr -d '[:space:]') + minimized_size=$(wc -c < "$minimized" | tr -d '[:space:]') + + printf 'Original size: %s bytes\n' "$original_size" + printf 'Minimized : %s bytes\n' "$minimized_size" + printf 'Output : %s\n' "$minimized" +} + +run_clean() { + 
local target="$1" + local clean_dir="$ATHERIS_CORPUS_ROOT/$target" + + require_known_target "$target" + if [[ ! -d "$clean_dir" ]]; then + log_warn "no corpus directory found for target '$target'" + return 0 + fi + + rm -rf "$clean_dir" + log_pass "Removed $clean_dir" +} diff --git a/scripts/lib/fuzz_atheris/common.sh b/scripts/lib/fuzz_atheris/common.sh new file mode 100755 index 00000000..f368ed5a --- /dev/null +++ b/scripts/lib/fuzz_atheris/common.sh @@ -0,0 +1,171 @@ +#!/usr/bin/env bash + +readonly TARGET_VENV=".venv-devcontainer-atheris" +readonly ATHERIS_TARGETS_FILE="$PROJECT_ROOT/fuzz_atheris/targets.tsv" +# shellcheck disable=SC2034 +readonly ATHERIS_CORPUS_ROOT="$PROJECT_ROOT/.fuzz_atheris_corpus" +# shellcheck disable=SC2034 +readonly ATHERIS_REPLAY_SCRIPT="$PROJECT_ROOT/fuzz_atheris/fuzz_atheris_replay_finding.py" +# shellcheck disable=SC2034 +readonly ATHERIS_HEALTH_SCRIPT="$PROJECT_ROOT/scripts/fuzz_atheris_corpus_health.py" + +if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" && -z "${UV_LINK_MODE:-}" ]]; then + export UV_LINK_MODE="copy" +fi + +declare -A TARGET_SCRIPTS=() +# shellcheck disable=SC2034 +declare -A TARGET_DESCRIPTIONS=() +declare -a TARGET_ORDER=() + +if [[ -t 1 && "${QUIET:-0}" -eq 0 ]]; then + RED='\033[31m' + GREEN='\033[32m' + YELLOW='\033[33m' + BLUE='\033[34m' + RESET='\033[0m' +else + RED='' GREEN='' YELLOW='' BLUE='' RESET='' +fi + +die() { + printf '%serror:%s %s\n' "$RED" "$RESET" "$1" >&2 + exit 1 +} + +log_info() { + if [[ "${QUIET:-0}" -eq 0 ]]; then + printf '%s[INFO]%s %s\n' "$BLUE" "$RESET" "$1" + fi +} + +log_warn() { + printf '%s[WARN]%s %s\n' "$YELLOW" "$RESET" "$1" >&2 +} + +log_pass() { + if [[ "${QUIET:-0}" -eq 0 ]]; then + printf '%s[PASS]%s %s\n' "$GREEN" "$RESET" "$1" + fi +} + +require_file() { + local path="$1" + [[ -f "$path" ]] || die "missing required file: $path" +} + +require_command() { + command -v "$1" >/dev/null 2>&1 || die "required command not found: $1" +} + +load_target_registry() { + local name="" 
+ local module="" + local description="" + + require_file "$ATHERIS_TARGETS_FILE" + + while IFS=$'\t' read -r name module description; do + [[ -z "$name" ]] && continue + [[ "$name" == \#* ]] && continue + [[ -n "$module" ]] || die "malformed Atheris target row for $name" + [[ -n "$description" ]] || die "missing description for Atheris target $name" + + TARGET_ORDER+=("$name") + TARGET_SCRIPTS["$name"]="$PROJECT_ROOT/fuzz_atheris/$module" + # shellcheck disable=SC2034 + TARGET_DESCRIPTIONS["$name"]="$description" + done < "$ATHERIS_TARGETS_FILE" + + [[ "${#TARGET_ORDER[@]}" -gt 0 ]] || die "Atheris target registry is empty" + + for name in "${TARGET_ORDER[@]}"; do + require_file "${TARGET_SCRIPTS[$name]}" + done +} + +target_script_for() { + local target="$1" + local script_path="${TARGET_SCRIPTS[$target]:-}" + [[ -n "$script_path" ]] || die "unknown target: $target" + printf '%s\n' "$script_path" +} + +require_known_target() { + local target="$1" + [[ -n "${TARGET_SCRIPTS[$target]:-}" ]] || die "unknown target: $target" +} + +require_devcontainer() { + if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" != "1" ]]; then + die "Atheris runs only inside the committed contributor devcontainer. +Use: + npx --yes @devcontainers/cli up --workspace-folder . + npx --yes @devcontainers/cli exec --workspace-folder . 
./scripts/fuzz_atheris.sh --help" + fi +} + +mode_requires_atheris_env() { + case "$1" in + setup|corpus|smoke|minimize|replay|fuzz) + return 0 + ;; + *) + return 1 + ;; + esac +} + +pivot_to_atheris_env() { + if [[ "${UV_PROJECT_ENVIRONMENT:-}" == "$TARGET_VENV" ]]; then + unset ATHERIS_ALREADY_PIVOTED + return 0 + fi + + [[ "${ATHERIS_ALREADY_PIVOTED:-0}" != "1" ]] || die "recursive uv environment pivot detected" + require_command uv + + log_info "Pivoting to isolated Atheris environment: ${TARGET_VENV}" + export UV_PROJECT_ENVIRONMENT="$TARGET_VENV" + export ATHERIS_ALREADY_PIVOTED=1 + unset VIRTUAL_ENV + exec uv run --python "$PY_VERSION" --group dev --group atheris --locked "${BASH:-bash}" "$0" "$@" +} + +check_atheris_environment() { + local clang_bin="${CLANG_BIN:-$(command -v clang)}" + local clang_resource_dir="" + local libfuzzer_path="" + + [[ -n "$clang_bin" ]] || die "clang toolchain not found in contributor environment" + clang_resource_dir="$("$clang_bin" --print-resource-dir 2>/dev/null || true)" + [[ -n "$clang_resource_dir" ]] || die "unable to resolve clang resource directory for ${clang_bin}" + libfuzzer_path="$(find "$clang_resource_dir" -name 'libclang_rt.fuzzer*.a' | head -1)" + [[ -n "$libfuzzer_path" ]] || die "libFuzzer runtime archive not found under ${clang_resource_dir}" + + printf 'Clang : %s\n' "$clang_bin" + printf 'LibFuzzer : %s\n' "$libfuzzer_path" + python - <<'PY' +from __future__ import annotations + +import platform +import sys + +import atheris # type: ignore[import-not-found] +import ftllexengine +import psutil + +version = sys.version_info +if (version.major, version.minor) != (3, 13): + raise SystemExit( + "expected Python 3.13 in the dedicated Atheris environment, " + f"got {sys.version.split()[0]}" + ) + +print(f"Python : {sys.version.split()[0]}") +print(f"Platform : {platform.platform()}") +print(f"Atheris : {getattr(atheris, '__version__', 'unknown')}") +print(f"psutil : {psutil.__version__}") +print(f"ftllexengine: 
{ftllexengine.__version__}") +PY +} diff --git a/scripts/lib/fuzz_hypofuzz/common.sh b/scripts/lib/fuzz_hypofuzz/common.sh new file mode 100755 index 00000000..3c80958d --- /dev/null +++ b/scripts/lib/fuzz_hypofuzz/common.sh @@ -0,0 +1,174 @@ +#!/usr/bin/env bash + +if [[ "${NO_COLOR:-}" == "1" ]]; then + RED=""; GREEN=""; YELLOW=""; BLUE=""; CYAN=""; BOLD=""; RESET="" +elif [[ ! -t 1 ]]; then + RED=""; GREEN=""; YELLOW=""; BLUE=""; CYAN=""; BOLD=""; RESET="" +else + RED="\033[31m"; GREEN="\033[32m"; YELLOW="\033[33m"; BLUE="\033[34m"; CYAN="\033[36m"; BOLD="\033[1m"; RESET="\033[0m" +fi + +HAS_PSUTIL=false +python -c "import psutil" 2>/dev/null && HAS_PSUTIL=true || true + +log_group_start() { [[ "$IS_GHA" == "true" ]] && echo "::group::$1"; echo -e "\n${BOLD}${CYAN}=== $1 ===${RESET}"; } +log_group_end() { [[ "$IS_GHA" == "true" ]] && echo "::endgroup::"; return 0; } +log_info() { echo -e "${BLUE}[INFO]${RESET} $1"; } +log_warn() { echo -e "${YELLOW}[WARN]${RESET} $1"; } +log_pass() { echo -e "${GREEN}[PASS]${RESET} $1"; } +log_fail() { echo -e "${RED}[FAIL]${RESET} $1"; } +log_err() { echo -e "${RED}[ERROR]${RESET} $1" >&2; } + +format_bytes() { + local bytes="$1" + if (( bytes >= 1048576 )); then + printf "%d MiB" $((bytes / 1048576)) + elif (( bytes >= 1024 )); then + printf "%d KiB" $((bytes / 1024)) + else + printf "%d B" "$bytes" + fi +} + +last_nonempty_log_line() { + local log_file="$1" + local last_line + last_line=$(awk 'NF { line = $0 } END { print line }' "$log_file" 2>/dev/null || true) + last_line=${last_line//$'\r'/} + if [[ -z "$last_line" ]]; then + echo "awaiting first output" + return 0 + fi + if (( ${#last_line} > 160 )); then + echo "${last_line:0:157}..." 
+ return 0 + fi + echo "$last_line" +} + +_heartbeat_daemon() { + local watched_pid="$1" log_file="$2" start_sec="$3" + local prev_last_line="" prev_change_sec=$SECONDS + sleep 5 + while kill -0 "$watched_pid" 2>/dev/null; do + local elapsed=$(( SECONDS - start_sec )) + local log_bytes=0 + [[ -f "$log_file" ]] && log_bytes=$(wc -c < "$log_file" | tr -d '[:space:]') + local raw_last_line last_display + raw_last_line=$(last_nonempty_log_line "$log_file") + if [[ "$raw_last_line" == "$prev_last_line" ]]; then + local unchanged_sec=$(( SECONDS - prev_change_sec )) + last_display="(no new output, ${unchanged_sec}s)" + else + last_display="$raw_last_line" + prev_last_line="$raw_last_line" + prev_change_sec=$SECONDS + fi + if [[ "$HAS_PSUTIL" == "true" ]]; then + local stats + stats=$(python -c " +import psutil +try: + p = psutil.Process(${watched_pid}) + all_procs = [p] + p.children(recursive=True) + cpu = sum(x.cpu_percent(interval=0.2) for x in all_procs) + mem_mb = sum(x.memory_info().rss for x in all_procs) // 1048576 + print(f'CPU={cpu:.0f}% MEM={mem_mb}MB procs={len(all_procs)}') +except Exception: + print('CPU=? MEM=? procs=?') +" 2>/dev/null || echo "CPU=? MEM=? procs=?") + echo "[HEARTBEAT] T+${elapsed}s | ${stats} | log=$(format_bytes "$log_bytes") | last: ${last_display}" >&2 + else + echo "[HEARTBEAT] T+${elapsed}s | log=$(format_bytes "$log_bytes") | last: ${last_display}" >&2 + fi + sleep "$HEARTBEAT_INTERVAL_SEC" + done +} + +_run_with_heartbeat() { + local log_file="$1" append="$2"; shift 2 + if [[ "$1" == "--" ]]; then shift; fi + local fifo + fifo=$(mktemp -u) + mkfifo "$fifo" + + "$@" > "$fifo" 2>&1 & + local cmd_pid=$! + PID_LIST+=("$cmd_pid") + + local hb_pid=0 + if [[ "$HEARTBEAT_ENABLED" == "true" && "$HEARTBEAT_INTERVAL_SEC" -gt 0 ]]; then + _heartbeat_daemon "$cmd_pid" "$log_file" "$SECONDS" & + hb_pid=$! 
+ PID_LIST+=("$hb_pid") + fi + + if [[ "$VERBOSE" == "true" ]]; then + if [[ "$append" == "true" ]]; then + tee -a "$log_file" < "$fifo" || true + else + tee "$log_file" < "$fifo" || true + fi + else + if [[ "$append" == "true" ]]; then + cat < "$fifo" >> "$log_file" || true + else + cat < "$fifo" > "$log_file" || true + fi + fi + + wait "$cmd_pid" 2>/dev/null + local exit_code=$? + + if [[ "$hb_pid" -gt 0 ]]; then + kill "$hb_pid" 2>/dev/null || true + wait "$hb_pid" 2>/dev/null || true + fi + + PID_LIST=() + + rm -f "$fifo" + return "$exit_code" +} + +run_diagnostics() { + log_group_start "Pre-Flight Diagnostics" + + echo "[ INFO ] Script : $SCRIPT_NAME v$SCRIPT_VERSION" + + local python_version + python_version=$(python --version 2>&1 | grep -oE '[0-9]+\.[0-9]+' | head -1) + echo "[ OK ] Python : $python_version" + + if python -c "import hypothesis" &>/dev/null; then + local hypo_version + hypo_version=$(python -c "import hypothesis; print(hypothesis.__version__)") + echo "[ OK ] Hypothesis : $hypo_version" + else + echo "[ FAIL ] Hypothesis : MISSING" + log_err "Hypothesis not installed. Run 'uv sync' to install dependencies." + exit 1 + fi + + log_pass "System is ready." + log_group_end +} + +_on_exit() { + local exit_code=$? + local pid + for pid in "${PID_LIST[@]}"; do + kill -TERM "$pid" 2>/dev/null || true + done + [[ ${#PID_LIST[@]} -gt 0 ]] && wait "${PID_LIST[@]}" 2>/dev/null || true + echo "[EXIT-CODE] $exit_code" >&2 +} + +_on_signal() { + _SIGNAL_RECEIVED=true + local pid + for pid in "${PID_LIST[@]}"; do + kill -TERM "$pid" 2>/dev/null || true + done + PID_LIST=() +} diff --git a/scripts/lib/fuzz_hypofuzz/modes_check.sh b/scripts/lib/fuzz_hypofuzz/modes_check.sh new file mode 100755 index 00000000..207b1c86 --- /dev/null +++ b/scripts/lib/fuzz_hypofuzz/modes_check.sh @@ -0,0 +1,296 @@ +#!/usr/bin/env bash + +run_preflight() { + log_group_start "Preflight Infrastructure Audit" + + local audit_exit=0 + + python << PREFLIGHT_EOF || audit_exit=$? 
+import ast +import re +import sys +from pathlib import Path +from collections import defaultdict + +tests_dir = Path("$PROJECT_ROOT/tests") +strategies_dir = tests_dir / "strategies" + +given_count = 0 +given_by_file = defaultdict(int) +event_count = 0 +event_by_file = defaultdict(int) + +for py_file in tests_dir.rglob("*.py"): + try: + content = py_file.read_text() + g_matches = len(re.findall(r'@given\(', content)) + if g_matches > 0: + given_count += g_matches + given_by_file[py_file.relative_to(tests_dir)] = g_matches + e_matches = len(re.findall(r'(? 0: + event_count += e_matches + event_by_file[py_file.relative_to(tests_dir)] = e_matches + except Exception: + pass + +fuzz_modules = [] +fuzz_modules_without_events = [] +for py_file in tests_dir.rglob("*.py"): + try: + if py_file.name == "conftest.py": + continue + content = py_file.read_text() + if "pytest.mark.fuzz" in content or "pytestmark = pytest.mark.fuzz" in content: + rel_path = py_file.relative_to(tests_dir) + fuzz_modules.append(str(rel_path)) + has_given = given_by_file.get(rel_path, 0) > 0 + has_events = rel_path in event_by_file + if has_given and not has_events: + fuzz_modules_without_events.append(str(rel_path)) + except Exception: + pass + +tests_without_events = [] +for py_file in tests_dir.rglob("*.py"): + try: + content = py_file.read_text() + if "@given" not in content: + continue + tree = ast.parse(content, filename=str(py_file)) + rel_path = str(py_file.relative_to(tests_dir)) + + for node in ast.walk(tree): + if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)): + continue + has_given = any( + isinstance(dec, ast.Call) + and ( + (isinstance(dec.func, ast.Name) and dec.func.id == "given") + or ( + isinstance(dec.func, ast.Attribute) + and dec.func.attr == "given" + ) + ) + for dec in node.decorator_list + ) + if not has_given: + continue + has_event = any( + isinstance(child, ast.Call) + and isinstance(child.func, ast.Name) + and child.func.id == "event" + for child in 
ast.walk(node) + ) + if not has_event: + tests_without_events.append(f"{rel_path}::{node.name}") + except Exception: + pass + +_STRATEGY_SUPPORT_FILES = {"__init__.py", "ftl.py", "ftl_shared.py"} +strategy_coverage = {} +strategy_gaps = [] +has_strategies_dir = strategies_dir.exists() + +if has_strategies_dir: + for strat_file in strategies_dir.glob("*.py"): + try: + if strat_file.name in _STRATEGY_SUPPORT_FILES: + continue + content = strat_file.read_text() + events = len(re.findall(r'(? 0 else "[ FAIL ]" + print(f" {status} {name:<20} {count} events") + print() + + if strategy_gaps: + print("[FAIL] Strategy files without event() calls (HypoFuzz guidance gap):") + for name in sorted(strategy_gaps): + print(f" [ FAIL ] {name}") + print() +else: + print("[ INFO ] No tests/strategies directory found (skipped strategy audit)") + print() + +if fuzz_modules_without_events: + print("[WARN] Fuzz Modules WITHOUT Events (File-Level Gap):") + for mod in sorted(fuzz_modules_without_events): + given = given_by_file.get(Path(mod), 0) + print(f" [ WARN ] {mod} ({given} @given tests, 0 events)") + print() +else: + print("[ OK ] All fuzz modules have events (file-level)") + print() + +if tests_without_events: + print("[FAIL] @given Tests WITHOUT event() Calls (ALL test files):") + for test_id in sorted(tests_without_events): + print(f" [ FAIL ] {test_id}") + print() +else: + print("[ OK ] All @given tests emit events (per-test, all files)") + print() + +gaps = len(fuzz_modules_without_events) + len(tests_without_events) + len(strategy_gaps) +if gaps > 0: + print(f"[FAIL] {gaps} gap(s) detected. Add hypothesis.event() calls for semantic guidance.") + sys.exit(1) +else: + print("[ OK ] Infrastructure audit passed. 
Run --deep for coverage-guided fuzzing.") +PREFLIGHT_EOF + + log_group_end + return "$audit_exit" +} + +run_check() { + run_diagnostics + log_group_start "Property Tests" + + if [[ "$VERBOSE" == "true" ]]; then + export HYPOTHESIS_PROFILE="verbose" + fi + + local test_target="${TARGET:-tests/}" + + if [[ ! -e "$test_target" ]]; then + log_err "Target not found: $test_target" + log_group_end + return 1 + fi + + log_info "Target: $test_target" + if [[ "$VERBOSE" == "true" ]]; then + log_info "Profile: verbose" + else + log_info "Profile: default (dev)" + fi + + local temp_log + temp_log=$(mktemp) + + local cmd=(uv run --python "$PY_VERSION" pytest "$test_target" -v --tb=short) + + local exit_code=0 + set +e + _run_with_heartbeat "$temp_log" false -- "${cmd[@]}" + exit_code=$? + set -e + + python << PYEOF +import json, re +from datetime import datetime, timezone +from pathlib import Path + +log_path = Path("$temp_log") +exit_code = $exit_code + +try: + log_content = log_path.read_text() if log_path.exists() else "" +except Exception: + log_content = "" + +summary_match = re.search(r'=+ (.*?) 
=+', log_content) +summary_text = summary_match.group(1) if summary_match else "" + +passed_match = re.search(r'(\d+) passed', summary_text) +failed_match = re.search(r'(\d+) failed', summary_text) +skipped_match = re.search(r'(\d+) skipped', summary_text) + +tests_passed = int(passed_match.group(1)) if passed_match else 0 +tests_failed = int(failed_match.group(1)) if failed_match else 0 +tests_skipped = int(skipped_match.group(1)) if skipped_match else 0 + +hypo_count = log_content.count('Falsifying example') + +failures = [] +failed_test_pattern = r'FAILED (tests/.+?)(?: - |$)' +failed_tests = sorted(list(set(re.findall(failed_test_pattern, log_content)))) + +for test_path in failed_tests: + failure_entry = {"test": test_path} + test_section_start = log_content.find(test_path) + if test_section_start != -1: + test_section = log_content[test_section_start:test_section_start + 2000] + error_match = re.search(r'E\s+(\w+Error|\w+Exception):', test_section) + if error_match: + failure_entry["error_type"] = error_match.group(1) + if 'Falsifying example' in log_content: + test_func = test_path.split("::")[-1] if "::" in test_path else "" + example_pattern = rf'Falsifying example:\s*{re.escape(test_func)}\(([^\)]+)\)' + example_match = re.search(example_pattern, log_content, re.DOTALL) + if example_match: + failure_entry["example"] = example_match.group(1).strip()[:500] + failures.append(failure_entry) + +fail_ex = "" +if 'Falsifying example' in log_content: + try: + fail_ex = log_content.split('Falsifying example')[1].split('\n')[0][:200].strip() + except IndexError: + pass + +if exit_code == 0: + status = 'pass' +elif exit_code in (130, 2): + status = 'stopped' +elif tests_failed > 0 or hypo_count > 0: + status = 'finding' +else: + status = 'error' + +report = { + 'script': '$SCRIPT_NAME', + 'script_version': '$SCRIPT_VERSION', + 'mode': 'check', + 'status': status, + 'timestamp': datetime.now(timezone.utc).isoformat(), + 'tests_passed': tests_passed, + 'tests_failed': 
tests_failed, + 'tests_skipped': tests_skipped, + 'hypothesis_failures': hypo_count, + 'falsifying_example': fail_ex, + 'failures': failures, + 'exit_code': exit_code +} +print('[SUMMARY-JSON-BEGIN]') +print(json.dumps(report, indent=2)) +print('[SUMMARY-JSON-END]') +PYEOF + + if [[ $exit_code -eq 0 ]]; then + log_pass "All property tests passed." + elif [[ $exit_code -eq 130 || $exit_code -eq 2 ]]; then + log_info "Run interrupted by user." + elif [[ $exit_code -eq 1 ]]; then + log_fail "Failures detected. See JSON summary above." + if [[ "$VERBOSE" == "false" ]]; then + log_warn "Failure output:" + if [[ -s "$temp_log" ]]; then + grep -A 20 "Falsifying example" "$temp_log" || head -n 20 "$temp_log" + fi + fi + else + log_err "Test execution failed (code $exit_code)." + fi + + rm -f "$temp_log" + log_group_end + return "$exit_code" +} diff --git a/scripts/lib/fuzz_hypofuzz/modes_fuzz.sh b/scripts/lib/fuzz_hypofuzz/modes_fuzz.sh new file mode 100755 index 00000000..2becdcf9 --- /dev/null +++ b/scripts/lib/fuzz_hypofuzz/modes_fuzz.sh @@ -0,0 +1,459 @@ +#!/usr/bin/env bash + +run_deep() { + run_diagnostics + + local -a fuzz_uv=(uv run --group fuzz --python "$PY_VERSION") + local deep_tooling + if ! deep_tooling="$("${fuzz_uv[@]}" python <<'PYEOF' +import click +import hypofuzz +import hypothesis + +print(f"Hypothesis CLI : {hypothesis.__version__}") +print(f"HypoFuzz : {hypofuzz.__version__}") +print(f"Click : {click.__version__}") +PYEOF +)"; then + log_err "Deep fuzzing tooling is unavailable in the fuzz dependency group." + log_err "Run 'uv sync --group fuzz' or fix pyproject.toml dependency-groups.fuzz." 
+ return 1 + fi + + if [[ "$METRICS" == "true" ]]; then + log_group_start "Deep Fuzzing (pytest with metrics)" + else + log_group_start "Continuous HypoFuzz" + fi + + export HYPOTHESIS_PROFILE="hypofuzz" + + local log_file="$PROJECT_ROOT/.hypothesis/hypofuzz.log" + mkdir -p "$PROJECT_ROOT/.hypothesis" + log_info "Tooling:" + while IFS= read -r line; do + log_info " $line" + done <<< "$deep_tooling" + + if [[ "$METRICS" == "true" ]]; then + export STRATEGY_METRICS="1" + export STRATEGY_METRICS_DETAILED="1" + export STRATEGY_METRICS_LIVE="1" + export STRATEGY_METRICS_INTERVAL="10" + log_info "Metrics: Per-strategy breakdown enabled (10s interval)" + log_info "Metrics: Using pytest (HypoFuzz multiprocessing incompatible with metrics)" + log_info "Profile: hypofuzz (deadline=None)" + + { + echo "" + echo "================================================================================" + echo "Metrics Session (pytest -m fuzz): $(date '+%Y-%m-%d %H:%M:%S')" + echo "Profile: hypofuzz" + echo "================================================================================" + } >> "$log_file" + + local exit_code=0 + set +e + _run_with_heartbeat "$log_file" true -- "${fuzz_uv[@]}" pytest tests/ -m fuzz -v --tb=short + exit_code=$? + set -e + + if [[ $exit_code -ne 0 ]]; then + log_fail "Metrics session failed (exit $exit_code). Last 80 lines of log:" + tail -n 80 "$log_file" + fi + + python << METRICS_PYEOF +import json, re +from datetime import datetime, timezone +from pathlib import Path + +log_path = Path("$log_file") +exit_code = $exit_code +try: + log_content = log_path.read_text() if log_path.exists() else "" +except Exception: + log_content = "" + +summary_match = re.search(r'=+ (.*?) 
=+\n*$', log_content, re.MULTILINE) +summary_text = summary_match.group(1) if summary_match else "" +passed_m = re.search(r'(\d+) passed', summary_text) +failed_m = re.search(r'(\d+) failed', summary_text) +skipped_m = re.search(r'(\d+) skipped', summary_text) +hypo_count = log_content.count('Falsifying example') + +report = { + 'script': '$SCRIPT_NAME', + 'script_version': '$SCRIPT_VERSION', + 'mode': 'deep_metrics', + 'status': 'pass' if exit_code == 0 else 'fail', + 'timestamp': datetime.now(timezone.utc).isoformat(), + 'tests_passed': int(passed_m.group(1)) if passed_m else 0, + 'tests_failed': int(failed_m.group(1)) if failed_m else 0, + 'tests_skipped': int(skipped_m.group(1)) if skipped_m else 0, + 'hypothesis_failures': hypo_count, + 'exit_code': exit_code, + 'log_file': str(log_path), +} +print('[SUMMARY-JSON-BEGIN]') +print(json.dumps(report, indent=2)) +print('[SUMMARY-JSON-END]') +METRICS_PYEOF + + log_group_end + return "$exit_code" + fi + + if [[ -n "$TIME_LIMIT" ]]; then + log_info "Time Limit: ${TIME_LIMIT}s" + else + log_info "Time Limit: Until Ctrl+C" + fi + log_info "Workers: $WORKERS" + log_info "Profile: hypofuzz (deadline=None)" + + local session_log_start=0 + [[ -f "$log_file" ]] && session_log_start=$(wc -c < "$log_file" | tr -d ' ') + + local exit_code=0 + local teardown_race_detected=false + local restart_count=0 + local max_teardown_restarts=20 + + if [[ -n "$TIME_LIMIT" ]]; then + { + echo "" + echo "================================================================================" + echo "HypoFuzz Session: $(date '+%Y-%m-%d %H:%M:%S')" + echo "Script: $SCRIPT_NAME v$SCRIPT_VERSION" + echo "Workers: $WORKERS" + echo "Profile: hypofuzz" + echo "================================================================================" + } >> "$log_file" + + local run_log_start=0 + [[ -f "$log_file" ]] && run_log_start=$(wc -c < "$log_file" | tr -d ' ') + + set +e + _run_with_heartbeat "$log_file" true -- timeout "$TIME_LIMIT" "${fuzz_uv[@]}" 
hypothesis fuzz --no-dashboard -n "$WORKERS" tests/fuzz/ + exit_code=$? + set -e + [[ $exit_code -eq 124 ]] && exit_code=0 + + if [[ "$_SIGNAL_RECEIVED" == "true" && $exit_code -ne 0 ]]; then exit_code=130; fi + + local _log_window + _log_window=$(tail -c "+$((run_log_start + 1))" "$log_file" 2>/dev/null || true) + if [[ $exit_code -ne 0 && $exit_code -ne 130 && $exit_code -ne 120 ]] \ + && [[ -f "$log_file" ]] \ + && echo "$_log_window" | grep -qF "_start_worker" 2>/dev/null \ + && echo "$_log_window" | grep -qF "managers.py" 2>/dev/null; then + log_warn "Worker teardown race detected (HypoFuzz bug, exit $exit_code)." + log_warn "Worker crashed on Manager proxy access after shutdown - no test failures." + log_warn "Re-run ./scripts/fuzz_hypofuzz.sh --deep to continue (database is preserved)." + teardown_race_detected=true + exit_code=0 + fi + else + while true; do + local run_log_start=0 + [[ -f "$log_file" ]] && run_log_start=$(wc -c < "$log_file" | tr -d ' ') + + { + echo "" + echo "================================================================================" + if [[ $restart_count -eq 0 ]]; then + echo "HypoFuzz Session: $(date '+%Y-%m-%d %H:%M:%S')" + else + echo "HypoFuzz Restart #${restart_count}: $(date '+%Y-%m-%d %H:%M:%S')" + fi + echo "Script: $SCRIPT_NAME v$SCRIPT_VERSION" + echo "Workers: $WORKERS" + echo "Profile: hypofuzz" + echo "================================================================================" + } >> "$log_file" + + set +e + _run_with_heartbeat "$log_file" true -- "${fuzz_uv[@]}" hypothesis fuzz --no-dashboard -n "$WORKERS" tests/fuzz/ + exit_code=$? 
+ set -e + + if [[ "$_SIGNAL_RECEIVED" == "true" ]]; then + [[ $exit_code -ne 0 ]] && exit_code=130 + break + fi + + [[ $exit_code -eq 0 || $exit_code -eq 120 ]] && break + + local _log_window + _log_window=$(tail -c "+$((run_log_start + 1))" "$log_file" 2>/dev/null || true) + if [[ $exit_code -ne 130 ]] \ + && [[ -f "$log_file" ]] \ + && echo "$_log_window" | grep -qF "_start_worker" 2>/dev/null \ + && echo "$_log_window" | grep -qF "managers.py" 2>/dev/null; then + + teardown_race_detected=true + (( restart_count++ )) || true + + if [[ $restart_count -gt $max_teardown_restarts ]]; then + log_warn "Teardown race repeated $restart_count times - giving up (max $max_teardown_restarts)." + exit_code=1 + break + fi + + log_info "Teardown race (${restart_count}/${max_teardown_restarts}) - restarting automatically (database preserved)." + sleep 1 + continue + fi + + break + done + fi + + local failure_count=0 + if [[ -f "$log_file" ]]; then + failure_count=$(tail -c "+$((session_log_start + 1))" "$log_file" | grep -c "Falsifying example" 2>/dev/null) || failure_count=0 + fi + + if [[ $exit_code -eq 0 || $exit_code -eq 130 || $exit_code -eq 120 ]]; then + log_pass "Fuzzing session ended." + + if [[ "$failure_count" -gt 0 ]]; then + log_warn "$failure_count falsifying example(s) found in this session." + echo " View log: cat $log_file" + echo " List failures: ./scripts/fuzz_hypofuzz.sh --list" + fi + + log_group_start "Event Infrastructure" + python << EVENTEOF +import re +from pathlib import Path + +tests_dir = Path("$PROJECT_ROOT/tests") + +event_count = 0 +for py_file in tests_dir.rglob("*.py"): + try: + content = py_file.read_text() + event_count += len(re.findall(r'(? 
0: + example_pattern = r'Falsifying example:\s*(\w+)\(([^)]+)\)' + for match in re.finditer(example_pattern, log_content): + test_name = match.group(1) + example_args = match.group(2).strip()[:500] + failures.append({"test": test_name, "example": example_args}) + +if teardown_race and exit_code != 0: + status = "teardown_race" +elif exit_code == 120: + status = "interrupted" +else: + status = "pass" + +report = { + "script": "$SCRIPT_NAME", + "script_version": "$SCRIPT_VERSION", + "mode": "deep", + "status": status, + "timestamp": datetime.now(timezone.utc).isoformat(), + "failures_count": failure_count, + "failures": failures[:50], + "exit_code": exit_code, + "teardown_restarts": restart_count, + "log_file": "$log_file" +} +print("[SUMMARY-JSON-BEGIN]") +print(json.dumps(report, indent=2)) +print("[SUMMARY-JSON-END]") +PYEOF + else + log_err "HypoFuzz exited with error code $exit_code." + + if grep -q "AF_UNIX path too long" "$log_file"; then + log_warn "AF_UNIX path too long detected. TMPDIR is set to $TMPDIR." + fi + + python << PYEOF +import json +from datetime import datetime, timezone + +report = { + "script": "$SCRIPT_NAME", + "script_version": "$SCRIPT_VERSION", + "mode": "deep", + "status": "error", + "timestamp": datetime.now(timezone.utc).isoformat(), + "failures_count": $failure_count, + "exit_code": $exit_code, + "log_file": "$log_file" +} +print("[SUMMARY-JSON-BEGIN]") +print(json.dumps(report, indent=2)) +print("[SUMMARY-JSON-END]") +PYEOF + log_group_end + return "$exit_code" + fi + + log_group_end + return "$exit_code" +} + +run_list() { + local examples_dir="$PROJECT_ROOT/.hypothesis/examples" + local fuzz_log="$PROJECT_ROOT/.hypothesis/hypofuzz.log" + + log_group_start "Hypothesis Failure Reproduction Info" + + log_info "How Hypothesis failures work:" + echo " 1. When a property test fails, Hypothesis shrinks to a minimal example" + echo " 2. The shrunk example is stored in .hypothesis/examples/ (SHA-384 hashed)" + echo " 3. 
On re-run, Hypothesis AUTOMATICALLY replays the stored failure" + echo " 4. Simply running 'uv run pytest tests/' will reproduce all known failures" + echo "" + + if [[ -d "$examples_dir" ]]; then + local count + count=$(find "$examples_dir" -type f 2>/dev/null | wc -l | tr -d ' ') + log_pass ".hypothesis/examples/ exists with $count entries" + else + log_warn "No .hypothesis/examples/ directory found." + echo " Run some Hypothesis tests first to populate the database." + fi + echo "" + + if [[ -f "$fuzz_log" ]]; then + log_info "Recent HypoFuzz session log: $fuzz_log" + local failure_count=0 + failure_count=$(grep -c "Falsifying example" "$fuzz_log" 2>/dev/null) || failure_count=0 + if [[ "$failure_count" -gt 0 ]]; then + log_warn "Found $failure_count falsifying example(s) in log." + echo "" + echo "Recent failures:" + grep -B2 "Falsifying example" "$fuzz_log" | tail -20 + else + echo " No failures recorded in latest session." + fi + else + log_info "HypoFuzz log: Not found (run --deep to create)" + fi + echo "" + + echo "To reproduce a specific failing test:" + echo " ./scripts/fuzz_hypofuzz.sh --repro test_module::test_function" + echo "" + echo "To reproduce all failures:" + echo " uv run pytest tests/ -x -v" + echo "" + echo "To extract @example decorator:" + echo " uv run python scripts/fuzz_hypofuzz_repro.py --example test_module::test_function" + + log_group_end +} + +run_clean() { + local hypothesis_dir="$PROJECT_ROOT/.hypothesis" + local fuzz_log="$hypothesis_dir/hypofuzz.log" + + if [[ ! -d "$hypothesis_dir" ]]; then + log_info "No .hypothesis/ directory found. Nothing to clean." 
+ return 0 + fi + + local example_count + example_count=$(find "$hypothesis_dir/examples" -type f 2>/dev/null | wc -l | tr -d ' ') + + log_group_start "Hypothesis Database Cleanup" + echo "Directory: $hypothesis_dir" + echo "Examples: $example_count cached entries" + if [[ -f "$fuzz_log" ]]; then + echo "Log: $(wc -l < "$fuzz_log" | tr -d ' ') lines" + fi + echo "" + if [[ "$FORCE" == "true" ]]; then + rm -rf "$hypothesis_dir" + log_pass "Removed .hypothesis/ directory (forced)." + else + if [[ ! -t 0 ]]; then + log_err "Non-interactive environment detected. You must use --force to clean the database." + exit 1 + fi + + log_warn "Removing .hypothesis/ will:" + echo " - Delete all cached examples (regression database)" + echo " - Delete any shrunk failure examples" + echo " - Require tests to rediscover edge cases" + echo "" + read -r -p "Remove .hypothesis/ directory? (y/N): " response + case "$response" in + [yY][eE][sS]|[yY]) + rm -rf "$hypothesis_dir" + log_pass "Removed .hypothesis/ directory." + ;; + *) + log_info "Cancelled." + ;; + esac + fi + log_group_end +} + +run_repro() { + if [[ -z "$REPRO_TEST" ]]; then + log_err "Missing test argument for --repro" + echo "Usage: ./scripts/fuzz_hypofuzz.sh --repro " + echo "" + echo "Examples:" + echo " ./scripts/fuzz_hypofuzz.sh --repro tests/fuzz/test_syntax_parser_property.py::test_roundtrip" + echo " ./scripts/fuzz_hypofuzz.sh --repro tests/fuzz/test_syntax_parser_property.py" + return 1 + fi + + log_group_start "Reproduce Hypothesis Failure" + log_info "Test: $REPRO_TEST" + + local exit_code=0 + set +e + uv run --python "$PY_VERSION" python scripts/fuzz_hypofuzz_repro.py --verbose --example "$REPRO_TEST" + exit_code=$? + set -e + + if [[ $exit_code -eq 0 ]]; then + log_pass "Test passed - no failure to reproduce." + echo "If you expected a failure, the bug may have been fixed or the" + echo ".hypothesis/examples/ database may need to be cleared." 
+ fi + + log_group_end + return "$exit_code" +} diff --git a/scripts/lint.sh b/scripts/lint.sh index 8c012e16..38187af5 100755 --- a/scripts/lint.sh +++ b/scripts/lint.sh @@ -41,7 +41,14 @@ shopt -s inherit_errexit # [SECTION: ENVIRONMENT_ISOLATION] PY_VERSION="${PY_VERSION:-3.13}" -TARGET_VENV=".venv-${PY_VERSION}" +if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" ]]; then + TARGET_VENV=".venv-devcontainer-${PY_VERSION}" +else + TARGET_VENV=".venv-${PY_VERSION}" +fi +if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" && -z "${UV_LINK_MODE:-}" ]]; then + export UV_LINK_MODE="copy" +fi # Universal Pivot: Works with uv, or standard venvs if [[ "${UV_PROJECT_ENVIRONMENT:-}" != "$TARGET_VENV" ]]; then @@ -184,7 +191,8 @@ execute_tool() { local exit_code=$? set -e - local duration=$(printf "%.3f" "$(echo "${EPOCHREALTIME} - $start_time" | bc)") + local duration + duration=$(awk -v start="$start_time" -v end="$EPOCHREALTIME" 'BEGIN { printf "%.3f", end - start }') # Universal file counting (Pre-calc instead of parsing output) local file_count="0" @@ -263,52 +271,35 @@ run_mypy() { log_info " + Using ${config_source}: ${config}" # Flags: --no-color-output (agent), --no-error-summary (quiet) - local cmd=(mypy --config-file "$config" --python-version "$PY_VERSION" --no-color-output --no-error-summary) + local -a cmd=(mypy --config-file "$config" --python-version "$PY_VERSION" --no-color-output --no-error-summary) + if [[ "$target" == "fuzz_atheris" ]]; then + cmd=(env "MYPYPATH=$PWD/fuzz_atheris${MYPYPATH:+:$MYPYPATH}" "${cmd[@]}") + fi execute_tool "mypy" "$target" "${cmd[@]}" "$target" done log_group_end } -# [SECTION: PLUGINS] -run_plugins() { - if [[ -n "${LINT_PLUGIN_MODE:-}" ]]; then return 0; fi - export LINT_PLUGIN_MODE=1 - - declare -a plugin_files=() - set +e - while IFS= read -r file; do - if grep -q "# @lint-plugin:" "$file"; then - plugin_files+=("$file") - fi - done < <(find "$SCRIPT_DIR" -maxdepth 1 -type f ! -name "lint.sh" ! 
-name "for_testing_lint.sh" 2>/dev/null) - set -e - - if [[ ${#plugin_files[@]} -eq 0 ]]; then return 0; fi - - log_group_start "Plugins" - for file in "${plugin_files[@]}"; do - local name - # Extract name: Header format "# @lint-plugin: Name" (Strict start of line) - name=$(grep -m 1 "^# @lint-plugin:" "$file" | sed "s/^# @lint-plugin:[[:space:]]*//" | tr -d '\r\n') - - # Skip placeholders, invalid names, or empty strings - if [[ -z "$name" || "$name" == "" ]]; then continue; fi - - local cmd=() - # Use the venv Python explicitly so plugins always run with the correct Python - # version. Bare `python` resolves to the system Python on GitHub Actions Ubuntu - # runners (3.12.x), not the venv Python (3.13.x), causing ImportError on any - # module that uses Python 3.13+ features (e.g. TypeIs from PEP 742). - if [[ "$file" == *.py ]]; then cmd=("${TARGET_VENV}/bin/python" "$file") - elif [[ "$file" == *.sh ]]; then cmd=("bash" "$file") - elif [[ -x "$file" ]]; then cmd=("$file") - else cmd=("bash" "$file"); fi - - execute_tool "plugin:$name" "all" "${cmd[@]}" +# [SECTION: REPOSITORY_VALIDATORS] +# Explicit registry keeps lint surface auditable: adding a helper script does not +# silently change what "lint" means. 
+readonly -a SCRIPT_VALIDATORS=( + "PISync:$SCRIPT_DIR/validate_pyi_sync.py" + "ISO4217:$SCRIPT_DIR/verify_iso4217.py" +) + +run_script_validators() { + if [[ ${#SCRIPT_VALIDATORS[@]} -eq 0 ]]; then return 0; fi + + log_group_start "Repository Validators" + local validator_entry name script_path + for validator_entry in "${SCRIPT_VALIDATORS[@]}"; do + name="${validator_entry%%:*}" + script_path="${validator_entry#*:}" + execute_tool "validator:$name" "all" "${TARGET_VENV}/bin/python" "$script_path" done log_group_end - unset LINT_PLUGIN_MODE } # [SECTION: NOQA AUDIT] @@ -363,7 +354,7 @@ run_noqa_audit() { run_ruff || true run_mypy || true run_noqa_audit || true -run_plugins || true +run_script_validators || true # [SECTION: REPORT] log_group_start "Final Report" diff --git a/scripts/test.sh b/scripts/test.sh index 2b3a13fb..982ae242 100755 --- a/scripts/test.sh +++ b/scripts/test.sh @@ -40,24 +40,34 @@ shopt -s inherit_errexit # [SECTION: ENVIRONMENT_ISOLATION] PY_VERSION="${PY_VERSION:-3.13}" -TARGET_VENV=".venv-${PY_VERSION}" +if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" ]]; then + TARGET_VENV=".venv-devcontainer-${PY_VERSION}" +else + TARGET_VENV=".venv-${PY_VERSION}" +fi +if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" && -z "${UV_LINK_MODE:-}" ]]; then + export UV_LINK_MODE="copy" +fi + +PROJECT_ROOT="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")/.." && pwd)" +cd "$PROJECT_ROOT" if [[ "${UV_PROJECT_ENVIRONMENT:-}" != "$TARGET_VENV" ]]; then - if [[ "${TEST_ALREADY_PIVOTED:-}" == "1" ]]; then - echo "Error: Detected re-execution loop. Aborting." 
>&2 - exit 1 - fi - echo -e "\033[34m[INFO]\033[0m Pivoting to isolated environment: ${TARGET_VENV}" - export UV_PROJECT_ENVIRONMENT="$TARGET_VENV" - export TEST_ALREADY_PIVOTED=1 - unset VIRTUAL_ENV - exec uv run --python "$PY_VERSION" "${BASH:-bash}" "$0" "$@" -else - unset TEST_ALREADY_PIVOTED + echo -e "\033[34m[INFO]\033[0m Preparing isolated environment: ${TARGET_VENV}" fi +export UV_PROJECT_ENVIRONMENT="$TARGET_VENV" +unset VIRTUAL_ENV + +resolve_env_python() { + uv run --python "$PY_VERSION" python - <<'PYEOF' +import sys +print(sys.executable) +PYEOF +} + +ENV_PYTHON="$(resolve_env_python)" # [SECTION: SETUP] -DEFAULT_COV_LIMIT=100 QUICK_MODE=false CI_MODE=false CLEAN_CACHE=true @@ -79,7 +89,7 @@ fi # psutil availability: cached once for heartbeat CPU/memory stats. # psutil is a dev dependency — available in the test venv. HAS_PSUTIL=false -python -c "import psutil" 2>/dev/null && HAS_PSUTIL=true || true +"$ENV_PYTHON" -c "import psutil" 2>/dev/null && HAS_PSUTIL=true || true log_group_start() { [[ "$IS_GHA" == "true" ]] && echo "::group::$1"; echo -e "\n${BOLD}${CYAN}=== $1 ===${RESET}"; } log_group_end() { [[ "$IS_GHA" == "true" ]] && echo "::endgroup::"; return 0; } @@ -139,7 +149,7 @@ _heartbeat_daemon() { fi if [[ "$HAS_PSUTIL" == "true" ]]; then local stats - stats=$(python -c " + stats=$("$ENV_PYTHON" -c " import psutil try: p = psutil.Process(${watched_pid}) @@ -274,10 +284,10 @@ pre_flight_diagnostics() { else echo "[ INFO ] Environment : System/User ($VIRTUAL_ENV)" fi - echo "[ INFO ] Python : $(python --version)" + echo "[ INFO ] Python : $("$ENV_PYTHON" --version)" echo "[ INFO ] Import Mode : Installed package (PYTHONPATH unset)" - - if ! command -v pytest >/dev/null 2>&1; then + + if ! 
uv run --python "$PY_VERSION" python -c "import pytest" >/dev/null 2>&1; then echo "[ FAIL ] Tooling : Pytest missing (uv sync required)" exit 1 fi @@ -287,11 +297,21 @@ pre_flight_diagnostics() { pre_flight_diagnostics # Navigation -PROJECT_ROOT="$PWD" -while [[ "$PROJECT_ROOT" != "/" && ! -f "$PROJECT_ROOT/pyproject.toml" ]]; do - PROJECT_ROOT="$(dirname "$PROJECT_ROOT")" -done -cd "$PROJECT_ROOT" +PYPROJECT_CONFIG="$PROJECT_ROOT/pyproject.toml" + +read_coverage_threshold() { + "$ENV_PYTHON" - "$PYPROJECT_CONFIG" <<'PYEOF' +import sys +import tomllib +from pathlib import Path + +data = tomllib.loads(Path(sys.argv[1]).read_text(encoding="utf-8")) +threshold = data["tool"]["coverage"]["report"]["fail_under"] +print(int(threshold)) +PYEOF +} + +DEFAULT_COV_LIMIT="$(read_coverage_threshold)" # Clean Caches if [[ "$CLEAN_CACHE" == "true" ]]; then @@ -318,7 +338,7 @@ if [[ "$QUICK_MODE" == "false" && -d "src" ]]; then done fi -declare -a CMD=("pytest") +declare -a CMD=("uv" "run" "--python" "$PY_VERSION" "pytest") # forced agent-friendly defaults handled via env vars where possible export NO_COLOR=1 @@ -366,7 +386,7 @@ EXIT_CODE=$? 
set -e END_TIME="${EPOCHREALTIME}" -DURATION=$(printf "%.3f" "$(echo "$END_TIME - $START_TIME" | bc)") +DURATION=$(awk -v start="$START_TIME" -v end="$END_TIME" 'BEGIN { printf "%.3f", end - start }') # [SECTION: ANALYSIS] HYPOTHESIS_FAILURE="false" @@ -485,7 +505,7 @@ for item in "${FAILED_TEST_LIST[@]}"; do echo "$item" >> "$FAILED_TESTS_FILE" done -python -c "$PYTHON_JSON_SCRIPT" "$FAILED_TESTS_FILE" "$EXIT_CODE" "$DURATION" "$TESTS_PASSED" "$TESTS_FAILED" "$TESTS_SKIPPED" "$SKIPPED_FUZZ" "$SKIPPED_OTHER" "$COVERAGE_PCT" "$HYPOTHESIS_FAILURE" "$SCRIPT_NAME" "$SCRIPT_VERSION" +"$ENV_PYTHON" -c "$PYTHON_JSON_SCRIPT" "$FAILED_TESTS_FILE" "$EXIT_CODE" "$DURATION" "$TESTS_PASSED" "$TESTS_FAILED" "$TESTS_SKIPPED" "$SKIPPED_FUZZ" "$SKIPPED_OTHER" "$COVERAGE_PCT" "$HYPOTHESIS_FAILURE" "$SCRIPT_NAME" "$SCRIPT_VERSION" rm -f "$FAILED_TESTS_FILE" echo "[SUMMARY-JSON-END]" @@ -493,7 +513,7 @@ echo "[SUMMARY-JSON-END]" if [[ $EXIT_CODE -ne 0 && ${#FAILED_TEST_LIST[@]} -gt 0 ]]; then echo -e "\n${YELLOW}[DEBUG-SUGGESTION]${RESET}" echo "The following tests failed. Run this command to debug the first failure:" - echo " uv run --python \"$PY_VERSION\" pytest ${FAILED_TEST_LIST[0]} --pdb" + echo " UV_PROJECT_ENVIRONMENT=\"$TARGET_VENV\" uv run --python \"$PY_VERSION\" pytest ${FAILED_TEST_LIST[0]} --pdb" fi if [[ $EXIT_CODE -ne 0 ]]; then diff --git a/scripts/validate-devcontainer.sh b/scripts/validate-devcontainer.sh new file mode 100755 index 00000000..7b385142 --- /dev/null +++ b/scripts/validate-devcontainer.sh @@ -0,0 +1,149 @@ +#!/usr/bin/env bash +# Build-time and contract-level validation for the committed contributor devcontainer surface. 
+ +set -euo pipefail + +die() { + printf 'error: %s\n' "$1" >&2 + exit 1 +} + +resolve_script_dir() { + local source_path="${BASH_SOURCE[0]}" + while [[ -h "${source_path}" ]]; do + local source_dir + source_dir="$(cd -P -- "$(dirname -- "${source_path}")" && pwd)" + source_path="$(readlink "${source_path}")" + if [[ "${source_path}" != /* ]]; then + source_path="${source_dir}/${source_path}" + fi + done + cd -P -- "$(dirname -- "${source_path}")" && pwd +} + +repo_root="$(cd "$(resolve_script_dir)/.." && pwd)" +readonly repo_root +readonly dockerfile_path="${repo_root}/.devcontainer/Dockerfile" +readonly config_path="${repo_root}/.devcontainer/devcontainer.json" +readonly user_home_repair_script="${repo_root}/scripts/devcontainer-prepare-user-home.sh" + +command -v docker >/dev/null 2>&1 || die "docker is required to validate the contributor devcontainer" +command -v python3 >/dev/null 2>&1 || die "python3 is required to validate devcontainer.json" +[[ -f "${dockerfile_path}" ]] || die "missing ${dockerfile_path}" +[[ -f "${config_path}" ]] || die "missing ${config_path}" +[[ -f "${user_home_repair_script}" ]] || die "missing ${user_home_repair_script}" + +python3 - <<'PY' "${config_path}" +import json +import sys +from pathlib import Path + +config = json.loads(Path(sys.argv[1]).read_text()) + +expected_feature = "ghcr.io/devcontainers/features/docker-outside-of-docker:1" +features = config.get("features", {}) +if expected_feature not in features: + raise SystemExit(f"missing {expected_feature} feature") + +build = config.get("build", {}) +if build.get("dockerfile") != "Dockerfile": + raise SystemExit("devcontainer build.dockerfile must stay 'Dockerfile'") +if build.get("context") != ".": + raise SystemExit("devcontainer build.context must stay '.'") + +if config.get("remoteUser") != "vscode": + raise SystemExit("remoteUser must stay 'vscode'") + +if config.get("workspaceFolder") != "/workspaces/ftllexengine": + raise SystemExit("workspaceFolder must stay 
/workspaces/ftllexengine") + +workspace_mount = config.get("workspaceMount", "") +if "target=/workspaces/ftllexengine" not in workspace_mount: + raise SystemExit("workspaceMount must bind into /workspaces/ftllexengine") +if "consistency=cached" in workspace_mount: + raise SystemExit("workspaceMount must not use consistency=cached") + +mounts = config.get("mounts", []) +if not any("target=/home/vscode/.cache" in mount for mount in mounts): + raise SystemExit("devcontainer must keep a named general cache volume") + +env = config.get("containerEnv", {}) +if env.get("FTLLEXENGINE_DEVCONTAINER") != "1": + raise SystemExit("devcontainer must set FTLLEXENGINE_DEVCONTAINER=1") +if env.get("CLANG_BIN") != "/usr/local/bin/clang": + raise SystemExit("devcontainer must expose CLANG_BIN=/usr/local/bin/clang") +if env.get("UV_LINK_MODE") != "copy": + raise SystemExit("devcontainer must set UV_LINK_MODE=copy for bind-mounted workspace installs") + +if config.get("postStartCommand") != "./scripts/devcontainer-prepare-user-home.sh": + raise SystemExit("devcontainer must repair cache mounts on start") + +settings = config.get("customizations", {}).get("vscode", {}).get("settings", {}) +if settings.get("terminal.integrated.defaultProfile.linux") != "bash": + raise SystemExit("devcontainer must default terminals to bash") + +extensions = config.get("customizations", {}).get("vscode", {}).get("extensions", []) +for extension_id in ( + "ms-python.python", + "ms-python.mypy-type-checker", + "charliermarsh.ruff", +): + if extension_id not in extensions: + raise SystemExit(f"{extension_id} must remain installed in the devcontainer") +PY + +readonly image_tag="ftllexengine-devcontainer-validate:local" +readonly cache_volume="ftllexengine-devcontainer-validate-cache-$$" + +cleanup() { + docker volume rm -f "${cache_volume}" >/dev/null 2>&1 || true +} +trap cleanup EXIT + +docker build \ + --file "${dockerfile_path}" \ + --tag "${image_tag}" \ + "${repo_root}/.devcontainer" >/dev/null + 
+docker run --rm "${image_tag}" bash -lc ' + set -euo pipefail + python3.13 --version | grep -E "^Python 3\.13" >/dev/null + uv --version >/dev/null + git --version >/dev/null + bash --version | head -1 | grep -E "version 5" >/dev/null + shellcheck --version >/dev/null + clang --version >/dev/null + test -n "$(find "$(clang --print-resource-dir)"/lib/linux -maxdepth 1 -name "libclang_rt.fuzzer*.a" -print -quit)" +' + +docker run --rm \ + --env CLANG_BIN=/usr/local/bin/clang \ + --env UV_LINK_MODE=copy \ + "${image_tag}" bash -lc ' + set -euo pipefail + test "${CLANG_BIN}" = "/usr/local/bin/clang" + test "${UV_LINK_MODE}" = "copy" +' + +docker volume create "${cache_volume}" >/dev/null + +docker run --rm --user root \ + --volume "${cache_volume}:/home/vscode/.cache" \ + "${image_tag}" bash -lc ' + set -euo pipefail + install -d -o root -g root /home/vscode/.cache/uv + touch /home/vscode/.cache/uv/root-owned-marker + ' + +docker run --rm \ + --interactive \ + --volume "${cache_volume}:/home/vscode/.cache" \ + "${image_tag}" bash -lc ' + set -euo pipefail + cat > /tmp/devcontainer-prepare-user-home.sh + chmod +x /tmp/devcontainer-prepare-user-home.sh + /tmp/devcontainer-prepare-user-home.sh + touch /home/vscode/.cache/uv/user-writable-marker + ' < "${user_home_repair_script}" + +printf 'devcontainer validation: success\n' diff --git a/scripts/validate_docs.py b/scripts/validate_docs.py index 40a082c9..0dbfddfb 100755 --- a/scripts/validate_docs.py +++ b/scripts/validate_docs.py @@ -1,5 +1,4 @@ #!/usr/bin/env python3 -# @lint-plugin: DocValidator """Validate code examples in documentation files against project parsers. 
Ensures that documentation never "lies" by verifying that every code block @@ -35,6 +34,7 @@ import json import os import re +import shutil import subprocess import sys import tomllib @@ -59,6 +59,8 @@ class CheckConfig: parser_path: str language: str = "ftl" python_exec_globs: list[str] = field(default_factory=list) + shell_exec_globs: list[str] = field(default_factory=list) + shell_exec_timeout_seconds: int = 180 @classmethod def from_pyproject(cls, root: Path) -> Self: @@ -85,6 +87,8 @@ def from_pyproject(cls, root: Path) -> Self: parser_path=config.get("parser_path", ""), language=config.get("language", "ftl"), python_exec_globs=config.get("python_exec_globs", []), + shell_exec_globs=config.get("shell_exec_globs", []), + shell_exec_timeout_seconds=config.get("shell_exec_timeout_seconds", 180), ) @@ -157,9 +161,54 @@ def _python_env(root: Path) -> dict[str, str]: del root env = dict(**os.environ) env.pop("PYTHONPATH", None) + env.setdefault("NO_COLOR", "1") return env +def _resolve_shell_runner() -> tuple[str, str]: + """Return the shell executable and preamble used for shell snippet checks. + + Prefer the PATH-resolved Bash installation over the inherited ``BASH`` + environment variable. macOS login shells often export ``BASH=/bin/bash`` + even when a newer Bash is available on ``PATH``; using the PATH-resolved + interpreter matches how repository scripts execute via their shebangs. + """ + shell = ( + shutil.which("bash") + or shutil.which("zsh") + or os.environ.get("SHELL") + or "/bin/sh" + ) + shell_name = Path(shell).name + preamble = "set -euo pipefail" if shell_name in {"bash", "zsh", "ksh", "mksh"} else "set -eu" + return shell, preamble + + +def _normalize_shell_code_for_runtime(code: str) -> str: + """Normalize shell docs for the current runtime context. + + When validation already runs inside the contributor devcontainer, host-side + devcontainer wrapper commands are semantically redundant. 
Rewrite the + documented wrapper invocations to the inner command so the same published + quick-start remains verifiable both from the host and from inside the + container-owned `./check.sh` flow. + """ + if os.environ.get("FTLLEXENGINE_DEVCONTAINER") != "1": + return code + + normalized_lines: list[str] = [] + for line in code.splitlines(): + stripped = line.strip() + if stripped == "npx --yes @devcontainers/cli up --workspace-folder .": + continue + prefix = "npx --yes @devcontainers/cli exec --workspace-folder . " + if stripped.startswith(prefix): + normalized_lines.append(stripped.removeprefix(prefix)) + continue + normalized_lines.append(line) + return "\n".join(normalized_lines) + + def validate_python_code(code: str, root: Path) -> str | None: """Execute a Python documentation block in isolation.""" try: @@ -183,6 +232,33 @@ def validate_python_code(code: str, root: Path) -> str | None: return stderr or stdout or f"process exited with code {result.returncode}" +def validate_shell_code(code: str, root: Path, timeout_seconds: int) -> str | None: + """Execute a shell documentation block in isolation.""" + env = _python_env(root) + shell, preamble = _resolve_shell_runner() + normalized_code = _normalize_shell_code_for_runtime(code) + + try: + result = subprocess.run( + [shell, "-c", f"{preamble}\n{normalized_code}"], + cwd=root, + env=env, + text=True, + capture_output=True, + timeout=timeout_seconds, + check=False, + ) + except subprocess.TimeoutExpired as exc: + return f"TimeoutExpired: {exc!s}" + + if result.returncode == 0: + return None + + stderr = result.stderr.strip() + stdout = result.stdout.strip() + return stderr or stdout or f"process exited with code {result.returncode}" + + def validate_code(code: str, parser: Any) -> str | None: """Validate a code block using the provided parser. 
@@ -235,6 +311,7 @@ def process_file( report.files_checked += 1 rel_path = str(md_file.relative_to(root)) python_enabled = any(md_file.match(pattern) for pattern in config.python_exec_globs) + shell_enabled = any(md_file.match(pattern) for pattern in config.shell_exec_globs) for match in pattern.finditer(content): indent = match.group(1) @@ -243,7 +320,8 @@ def process_file( should_validate_ftl = language == config.language should_validate_python = python_enabled and language == "python" - if not should_validate_ftl and not should_validate_python: + should_validate_shell = shell_enabled and language in {"bash", "sh", "shell"} + if not should_validate_ftl and not should_validate_python and not should_validate_shell: continue report.examples_validated += 1 @@ -262,6 +340,10 @@ def process_file( error = validate_python_code(code_block, root) error_type = "PythonRuntimeError" failure_language = "python" + elif should_validate_shell: + error = validate_shell_code(code_block, root, config.shell_exec_timeout_seconds) + error_type = "ShellRuntimeError" + failure_language = language else: error = validate_code(code_block, parser) error_type = "SyntaxError" diff --git a/scripts/validate_env.py b/scripts/validate_env.py deleted file mode 100755 index 9aad71cd..00000000 --- a/scripts/validate_env.py +++ /dev/null @@ -1,146 +0,0 @@ -#!/usr/bin/env python3 -# @lint-plugin: PyEnv -"""Diagnostic plugin: validate the Python environment used by lint plugins. - -Philosophy: - Environment bleed (the system Python slipping into isolated CI runners) is - the #1 cause of flaky linting and tests. This plugin is a universal, - project-agnostic "dead-man's switch" that guarantees the environment - executing it matches the strict requirements of the project. - -Architecture & Specs: - This script relies on `pyproject.toml` as the single source of truth for - project metadata. It strictly requires: - 1. 
`[project].name`: Used to dynamically resolve the package for the - import test (verifying C-extensions, dependencies, and syntax). - 2. `[project].requires-python`: Used to dynamically set the floor version - for `sys.version_info` (e.g., `>=3.13`). - - If the environment falls below this floor, or if the package cannot imported, - the plugin deliberately catches the failure and emits a high-signal - diagnostic message explaining *exactly* how to fix the CI runner. - -Exit Codes: - 0: Environment is correct (Python >= required, package importable) - 1: Environment bleed detected (version too old or package not importable) -""" - -from __future__ import annotations - -import importlib -import os -import sys -import tomllib -from pathlib import Path - -_FALLBACK_MIN_PYTHON = (3, 8) -_SCRIPT_DIR = Path(__file__).parent -ROOT = _SCRIPT_DIR.parent -PYPROJECT = ROOT / "pyproject.toml" - - -def _read_project_metadata() -> tuple[tuple[int, int], str]: - """Read minimum python version and package name from pyproject.toml.""" - try: - with PYPROJECT.open("rb") as f: - data = tomllib.load(f) - project = data.get("project", {}) - - # Parse ">=3.14,<3.15" or ">=3.13" → (3, 14). - # Split on comma first to isolate the lower-bound specifier before stripping operators. 
- raw = project.get("requires-python", ">=3.8") - lower = raw.split(",")[0].strip().lstrip(">= ") - parts = lower.split(".") - min_py = (int(parts[0]), int(parts[1]) if len(parts) > 1 else 0) - - # Parse package name, falling back to guessing from src/ if missing - pkg_name = project.get("name", "") - if not pkg_name: - src_dir = ROOT / "src" - subdirs = [ - d.name for d in src_dir.iterdir() - if d.is_dir() and d.name != "__pycache__" and not d.name.endswith(".egg-info") - ] - pkg_name = subdirs[0] if len(subdirs) == 1 else "unknown_package" - - return min_py, pkg_name.replace("-", "_") - except Exception: # pylint: disable=broad-exception-caught - return _FALLBACK_MIN_PYTHON, "unknown_package" - - -def _try_import(pkg_name: str) -> tuple[bool, str]: - """Attempt to import the project package and return (success, detail).""" - if pkg_name == "unknown_package": - return False, "Could not determine package name from pyproject.toml or src/ directory" - - try: - module = importlib.import_module(pkg_name) - version = getattr(module, "__version__", "") - return True, f"version={version}" - except Exception as exc: # pylint: disable=broad-exception-caught - # Diagnostic tool: intentionally catches all import failures to report them. - return False, f"{type(exc).__name__}: {exc}" - - -def main() -> int: - """Run environment validation. 
Returns 0 on pass, 1 on failure.""" - failures: list[str] = [] - warnings: list[str] = [] - - required, pkg_name = _read_project_metadata() - current = sys.version_info[:2] - - print(f" Python binary : {sys.executable}") - print(f" Python version : {sys.version}") - print(f" Required : >={required[0]}.{required[1]}") - print(f" PYTHONPATH : {os.environ.get('PYTHONPATH', '')}") - print(f" VIRTUAL_ENV : {os.environ.get('VIRTUAL_ENV', '')}") - print(f" sys.prefix : {sys.prefix}") - - if current < required: - failures.append( - f"Python {current[0]}.{current[1]} is below required >={required[0]}.{required[1]}.\n" - f" The lint plugin runner (scripts/lint.sh) uses bare `python`, which\n" - f" resolved to the system Python ({sys.executable}) rather than the\n" - f" venv Python. Fix: change the plugin runner to use the venv Python:\n" - f' if [[ "$file" == *.py ]]; then cmd=("${{TARGET_VENV}}/bin/python" "$file")' - ) - else: - print(f" [PASS] Python {current[0]}.{current[1]} >= {required[0]}.{required[1]}") - - # 2. Package import check - importable, detail = _try_import(pkg_name) - if importable: - print(f" [PASS] import {pkg_name} succeeded ({detail})") - else: - failures.append( - f"import {pkg_name} failed: {detail}\n" - f" Likely cause: Python version incompatibility or PYTHONPATH misconfiguration." - ) - - # 3. venv consistency check - venv = os.environ.get("VIRTUAL_ENV", "") - if venv and not sys.executable.startswith(venv): - warnings.append( - f"VIRTUAL_ENV={venv!r} but sys.executable={sys.executable!r}.\n" - f" The plugin is NOT running in the expected venv. The system Python\n" - f" is being used instead. This is the root cause of version-related\n" - f" plugin failures." 
- ) - - if warnings: - for w in warnings: - print(f" [WARN] {w}") - - if failures: - print("\n[FAIL] PyEnv: environment check failed") - for f in failures: - print(f" {f}") - return 1 - - print("[PASS] PyEnv: environment is correct.") - return 0 - - -if __name__ == "__main__": - sys.exit(main()) diff --git a/scripts/validate_pyi_sync.py b/scripts/validate_pyi_sync.py index 6278e261..97e01b47 100755 --- a/scripts/validate_pyi_sync.py +++ b/scripts/validate_pyi_sync.py @@ -1,5 +1,4 @@ #!/usr/bin/env python3 -# @lint-plugin: PISync """Validate that src/ftllexengine/__init__.pyi is in sync with __init__.py. Enforces two invariants: @@ -11,10 +10,9 @@ __init__.pyi is the type-authoritative interface for external callers when py.typed is present. Mypy uses the stub exclusively — any symbol in __init__.py that is absent from the stub is invisible to typed callers and causes mypy errors. - More critically: CI lint plugins run `import ftllexengine` in a subprocess whose - venv may diverge from the local pre-built venv; when stub/__all__ diverge the - install metadata becomes inconsistent and the VersionSync plugin reports - "Package not installed or import failed". + More critically: any stub/__all__ divergence makes the installed package + contract inconsistent for typed callers, so the repository's static gate + must fail immediately instead of letting the mismatch leak to users. This plugin is a dead-man's switch: any __all__ change that is not reflected in __init__.pyi breaks the build immediately at the lint stage. diff --git a/scripts/validate_version.py b/scripts/validate_version.py index f4ed1eb4..e1ade8c9 100755 --- a/scripts/validate_version.py +++ b/scripts/validate_version.py @@ -1,5 +1,4 @@ #!/usr/bin/env python3 -# @lint-plugin: VersionSync """Validate version consistency across all project artifacts. 
Ensures pyproject.toml is the single source of truth for version information, diff --git a/scripts/verify_iso4217.py b/scripts/verify_iso4217.py index 942fb0bf..15c52c8b 100755 --- a/scripts/verify_iso4217.py +++ b/scripts/verify_iso4217.py @@ -1,5 +1,4 @@ #!/usr/bin/env python3 -# @lint-plugin: ISO4217 """Verify ISO 4217 decimal digits and active-code freshness against Babel CLDR. Compares ``ISO_4217_DECIMAL_DIGITS`` and ``ISO_4217_VALID_CODES`` against diff --git a/src/ftllexengine/__init__.py b/src/ftllexengine/__init__.py index efe4e960..f3299d93 100644 --- a/src/ftllexengine/__init__.py +++ b/src/ftllexengine/__init__.py @@ -197,19 +197,26 @@ def __getattr__(name: str) -> object: """Provide a helpful missing-symbol error for Babel-backed facade symbols.""" - if _BABEL_AVAILABLE and name in _BABEL_OPTIONAL_ATTRS: - value = load_babel_optional_export(__name__, name) + if name in _BABEL_OPTIONAL_ATTRS: + if _BABEL_AVAILABLE: + value = load_babel_optional_export(__name__, name) + else: + value = raise_missing_babel_symbol( + module_name=__name__, + name=name, + optional_attrs=_BABEL_OPTIONAL_ATTRS, + parser_only_hint=( + "For parser-only installs, use:\n" + " from ftllexengine.syntax import parse, serialize\n" + " from ftllexengine.syntax.ast import Message, Term, Pattern, ..." + ), + ) globals()[name] = value return value return raise_missing_babel_symbol( module_name=__name__, name=name, optional_attrs=_BABEL_OPTIONAL_ATTRS, - parser_only_hint=( - "For parser-only installs, use:\n" - " from ftllexengine.syntax import parse, serialize\n" - " from ftllexengine.syntax.ast import Message, Term, Pattern, ..." 
- ), ) diff --git a/src/ftllexengine/_optional_exports.py b/src/ftllexengine/_optional_exports.py index 059bb982..2174f44b 100644 --- a/src/ftllexengine/_optional_exports.py +++ b/src/ftllexengine/_optional_exports.py @@ -10,7 +10,8 @@ from dataclasses import dataclass from importlib import import_module -from typing import NoReturn + +from ftllexengine.core.babel_compat import BabelImportError __all__ = [ "OptionalFacadeExport", @@ -28,6 +29,7 @@ class OptionalFacadeExport: public_name: str source_module: str source_name: str + stub_kind: str = "class" _OPTIONAL_EXPORTS_BY_FACADE: dict[str, tuple[OptionalFacadeExport, ...]] = { @@ -85,16 +87,19 @@ class OptionalFacadeExport: public_name="create_default_registry", source_module="ftllexengine.runtime.functions", source_name="create_default_registry", + stub_kind="callable", ), OptionalFacadeExport( public_name="currency_format", source_module="ftllexengine.runtime.functions", source_name="currency_format", + stub_kind="callable", ), OptionalFacadeExport( public_name="datetime_format", source_module="ftllexengine.runtime.functions", source_name="datetime_format", + stub_kind="callable", ), OptionalFacadeExport( public_name="FluentBundle", @@ -105,16 +110,19 @@ class OptionalFacadeExport: public_name="get_shared_registry", source_module="ftllexengine.runtime.functions", source_name="get_shared_registry", + stub_kind="callable", ), OptionalFacadeExport( public_name="number_format", source_module="ftllexengine.runtime.functions", source_name="number_format", + stub_kind="callable", ), OptionalFacadeExport( public_name="select_plural_category", source_module="ftllexengine.runtime.plural_rules", source_name="select_plural_category", + stub_kind="callable", ), ), } @@ -141,35 +149,83 @@ def babel_optional_attr_set(module_name: str) -> frozenset[str]: def load_babel_optional_export(module_name: str, name: str) -> object: """Resolve one Babel-backed export from the canonical facade contract.""" + export = 
_optional_export(module_name, name) + module = import_module(export.source_module) + return getattr(module, export.source_name) + + +def _optional_export(module_name: str, name: str) -> OptionalFacadeExport: + """Return the export contract for one optional public name.""" for export in _optional_exports_for(module_name): if export.public_name == name: - module = import_module(export.source_module) - return getattr(module, export.source_name) + return export msg = f"module {module_name!r} has no optional Babel export {name!r}" raise AttributeError(msg) +def _missing_babel_message(name: str, parser_only_hint: str | None) -> str: + """Build the user-facing missing-Babel message for one symbol.""" + error = BabelImportError(name) + message = str(error) + if parser_only_hint is not None: + return f"{message}\n\n{parser_only_hint}" + return message + + +def _build_missing_babel_function(name: str, parser_only_hint: str | None) -> object: + """Create a callable placeholder that fails with BabelImportError when used.""" + message = _missing_babel_message(name, parser_only_hint) + + def _missing(*_args: object, **_kwargs: object) -> object: + error = BabelImportError(name) + error.args = (message,) + raise error + + _missing.__name__ = name + _missing.__qualname__ = name + _missing.__doc__ = message + return _missing + + +def _build_missing_babel_class(name: str, parser_only_hint: str | None) -> object: + """Create a class placeholder that fails with BabelImportError when instantiated.""" + message = _missing_babel_message(name, parser_only_hint) + + def _raise_on_new(_cls: type[object], *_args: object, **_kwargs: object) -> object: + error = BabelImportError(name) + error.args = (message,) + raise error + + return type( + name, + (), + { + "__doc__": message, + "__new__": staticmethod(_raise_on_new), + }, + ) + + def raise_missing_babel_symbol( *, module_name: str, name: str, optional_attrs: frozenset[str], parser_only_hint: str | None = None, -) -> NoReturn: - """Raise a 
helpful AttributeError for a Babel-backed optional symbol. +) -> object: + """Return a helpful placeholder for one Babel-backed optional symbol. Module attribute access uses ``AttributeError`` so Python feature probes - such as ``hasattr()`` and ``getattr(..., default)`` treat the symbol as - absent in parser-only installs. + such as ``hasattr()`` and ``getattr(..., default)`` treat unknown names as + absent. Optional runtime names resolve to explicit placeholders so import + statements can surface a useful Babel installation error when the symbol is + actually used. """ if name in optional_attrs: - message = ( - f"{name} requires the full runtime install (Babel + CLDR locale data). " - "Install with: pip install ftllexengine[babel]" - ) - if parser_only_hint is not None: - message = f"{message}\n\n{parser_only_hint}" - raise AttributeError(message) + export = _optional_export(module_name, name) + if export.stub_kind == "callable": + return _build_missing_babel_function(name, parser_only_hint) + return _build_missing_babel_class(name, parser_only_hint) message = f"module {module_name!r} has no attribute {name!r}" raise AttributeError(message) diff --git a/src/ftllexengine/core/babel_compat.py b/src/ftllexengine/core/babel_compat.py index 28697d2e..bd4f0058 100644 --- a/src/ftllexengine/core/babel_compat.py +++ b/src/ftllexengine/core/babel_compat.py @@ -46,6 +46,7 @@ def my_function(locale_code: str) -> None: __all__ = [ "BabelImportError", "get_babel_dates", + "get_babel_global_func", "get_babel_languages", "get_babel_numbers", "get_cldr_version", @@ -252,6 +253,21 @@ def get_babel_languages() -> Any: return languages +def get_babel_global_func() -> Any: + """Get the ``babel.core.get_global`` function. 
+ + Returns: + The ``babel.core.get_global`` callable + + Raises: + BabelImportError: If Babel is not installed + """ + require_babel("babel.core.get_global") + from babel.core import get_global # noqa: PLC0415 - Babel-optional + + return get_global + + def get_number_format_error_class() -> type[NumberFormatError]: """Get the babel.numbers.NumberFormatError exception class. diff --git a/src/ftllexengine/localization/__init__.py b/src/ftllexengine/localization/__init__.py index aecbb06e..372ad27d 100644 --- a/src/ftllexengine/localization/__init__.py +++ b/src/ftllexengine/localization/__init__.py @@ -59,18 +59,25 @@ def __getattr__(name: str) -> object: """Raise a targeted missing-symbol error for Babel-backed localization symbols.""" - if _BABEL_AVAILABLE and name in _BABEL_OPTIONAL_ATTRS: - value = load_babel_optional_export(__name__, name) + if name in _BABEL_OPTIONAL_ATTRS: + if _BABEL_AVAILABLE: + value = load_babel_optional_export(__name__, name) + else: + value = raise_missing_babel_symbol( + module_name=__name__, + name=name, + optional_attrs=_BABEL_OPTIONAL_ATTRS, + parser_only_hint=( + "Parser-only usage still supports ResourceLoader, PathResourceLoader, " + "FallbackInfo, ResourceLoadResult, LoadSummary, and CacheAuditLogEntry." + ), + ) globals()[name] = value return value return raise_missing_babel_symbol( module_name=__name__, name=name, optional_attrs=_BABEL_OPTIONAL_ATTRS, - parser_only_hint=( - "Parser-only usage still supports ResourceLoader, PathResourceLoader, " - "FallbackInfo, ResourceLoadResult, LoadSummary, and CacheAuditLogEntry." 
- ), ) diff --git a/src/ftllexengine/parsing/currency.py b/src/ftllexengine/parsing/currency.py index 321536d2..cadf19e9 100644 --- a/src/ftllexengine/parsing/currency.py +++ b/src/ftllexengine/parsing/currency.py @@ -64,6 +64,8 @@ from ftllexengine.parsing.currency_maps import ( clear_currency_caches as _clear_currency_maps_caches, ) +from ftllexengine.parsing.numbers import _parse_decimal_localized +from ftllexengine.parsing.text_normalization import strip_bidi_format_chars __all__ = [ "_FAST_TIER_UNAMBIGUOUS_SYMBOLS", @@ -274,6 +276,7 @@ def _detect_currency_symbol( def _parse_currency_amount( value: str, + normalized_value: str, match: re.Match[str], locale: Any, locale_code: str, @@ -299,14 +302,18 @@ def _parse_currency_amount( # Remove ONLY the matched occurrence, not all instances. # Prevents corruption if the symbol appears elsewhere in the string. number_str = ( - value[:match.start(1)] + value[match.end(1):] + normalized_value[:match.start(1)] + normalized_value[match.end(1):] ).strip() - try: - amount = parse_decimal_fn(number_str, locale=locale) - except number_format_error as e: + amount, failure_reason = _parse_decimal_localized( + number_str, + locale, + babel_parse_decimal=parse_decimal_fn, + number_format_error_class=number_format_error, + ) + if amount is None: diagnostic = ErrorTemplate.parse_amount_invalid( - number_str, value, str(e), + number_str, value, str(failure_reason), ) context = FrozenErrorContext( input_value=str(value), @@ -429,6 +436,7 @@ def parse_currency( diagnostic=diagnostic, context=context, ),)) + normalized_value = strip_bidi_format_chars(value) # Guard: Babel silently accepts locale codes containing non-BCP-47 characters # (e.g. 
'/', '\x00') instead of raising UnknownLocaleError, then uses default @@ -465,7 +473,7 @@ def parse_currency( ),)) # Phase 2: Detect currency symbol/code - match, detect_error = _detect_currency_symbol(value, locale_code) + match, detect_error = _detect_currency_symbol(normalized_value, locale_code) if detect_error is not None or match is None: if detect_error is not None: return (None, (detect_error,)) @@ -519,6 +527,7 @@ def parse_currency( # Phase 4: Parse numeric amount amount, amount_error = _parse_currency_amount( value, + normalized_value, match, locale, locale_code, diff --git a/src/ftllexengine/parsing/currency_maps.py b/src/ftllexengine/parsing/currency_maps.py index 2ff5a53e..a69f2c63 100644 --- a/src/ftllexengine/parsing/currency_maps.py +++ b/src/ftllexengine/parsing/currency_maps.py @@ -13,6 +13,7 @@ is_babel_available, ) from ftllexengine.core.locale_utils import normalize_locale +from ftllexengine.parsing.text_normalization import strip_bidi_format_chars ISO_CURRENCY_CODE_LENGTH: int = 3 @@ -321,7 +322,9 @@ def _build_symbol_mappings( for currency_code in all_currencies: for locale in symbol_lookup_locales: try: - symbol = get_currency_symbol(currency_code, locale=locale) + symbol = strip_bidi_format_chars( + get_currency_symbol(currency_code, locale=locale) + ) is_iso_format = ( len(symbol) == ISO_CURRENCY_CODE_LENGTH and symbol.isupper() diff --git a/src/ftllexengine/parsing/date_patterns.py b/src/ftllexengine/parsing/date_patterns.py index c453ef4b..2e8c2bfa 100644 --- a/src/ftllexengine/parsing/date_patterns.py +++ b/src/ftllexengine/parsing/date_patterns.py @@ -13,6 +13,7 @@ require_babel, ) from ftllexengine.core.locale_utils import normalize_locale +from ftllexengine.parsing.text_normalization import strip_bidi_format_chars __all__ = [ "_babel_to_strptime", @@ -293,6 +294,7 @@ def _preprocess_datetime_input( value: str, locale_code: str | None = None, *, has_era: bool ) -> str: """Strip era text when a pattern requires era preprocessing.""" + 
value = strip_bidi_format_chars(value) if has_era: return _strip_era(value, locale_code) return value diff --git a/src/ftllexengine/parsing/dates.py b/src/ftllexengine/parsing/dates.py index b9e8b87d..e2c4b327 100644 --- a/src/ftllexengine/parsing/dates.py +++ b/src/ftllexengine/parsing/dates.py @@ -43,6 +43,7 @@ from ftllexengine.diagnostics import ErrorCategory, FrozenErrorContext, FrozenFluentError from ftllexengine.diagnostics.templates import ErrorTemplate +from ftllexengine.parsing.text_normalization import strip_bidi_format_chars from .date_patterns import ( _BABEL_TOKEN_MAP, @@ -167,9 +168,11 @@ def parse_date( errors.append(error) return (None, tuple(errors)) + normalized_value = strip_bidi_format_chars(value) + # Try ISO 8601 first (fastest path) try: - return (datetime.fromisoformat(value).date(), tuple(errors)) + return (datetime.fromisoformat(normalized_value).date(), tuple(errors)) except ValueError: pass @@ -192,7 +195,9 @@ def parse_date( for pattern, has_era in patterns: try: # Preprocess for era tokens before strptime (with localized era names) - parse_value = _preprocess_datetime_input(value, locale_code, has_era=has_era) + parse_value = _preprocess_datetime_input( + normalized_value, locale_code, has_era=has_era + ) return (datetime.strptime(parse_value, pattern).date(), tuple(errors)) except ValueError: continue @@ -284,9 +289,11 @@ def parse_datetime( errors.append(error) return (None, tuple(errors)) + normalized_value = strip_bidi_format_chars(value) + # Try ISO 8601 first (fastest path) try: - parsed = datetime.fromisoformat(value) + parsed = datetime.fromisoformat(normalized_value) if tzinfo is not None and parsed.tzinfo is None: parsed = parsed.replace(tzinfo=tzinfo) return (parsed, tuple(errors)) @@ -312,7 +319,9 @@ def parse_datetime( for pattern, has_era in patterns: try: # Preprocess for era tokens before strptime (with localized era names) - parse_value = _preprocess_datetime_input(value, locale_code, has_era=has_era) + parse_value = 
_preprocess_datetime_input( + normalized_value, locale_code, has_era=has_era + ) parsed = datetime.strptime(parse_value, pattern) if tzinfo is not None and parsed.tzinfo is None: parsed = parsed.replace(tzinfo=tzinfo) diff --git a/src/ftllexengine/parsing/numbers.py b/src/ftllexengine/parsing/numbers.py index 6752c4bc..e3277adc 100644 --- a/src/ftllexengine/parsing/numbers.py +++ b/src/ftllexengine/parsing/numbers.py @@ -16,6 +16,7 @@ """ from decimal import Decimal, InvalidOperation +from typing import Any from ftllexengine.core.babel_compat import ( get_locale_class, @@ -36,6 +37,7 @@ ParseResult, ) from ftllexengine.diagnostics.templates import ErrorTemplate +from ftllexengine.parsing.text_normalization import strip_bidi_format_chars __all__ = ["parse_decimal", "parse_fluent_number"] @@ -80,6 +82,96 @@ def _validate_group_positions( return 1 <= len(groups[0]) <= secondary_group +def _iter_numbering_systems(locale: Any) -> tuple[str, ...]: + """Return numbering systems to try for locale-aware parsing.""" + default_system = getattr(locale, "default_numbering_system", "latn") + if not isinstance(default_system, str) or not default_system: + return ("latn",) + if default_system == "latn": + return ("latn",) + return (default_system, "latn") + + +def _get_grouping_profile( + locale: Any, + numbering_system: str, +) -> tuple[str, str, int, int]: + """Return grouping separators and sizes for one numbering system.""" + try: + symbols = locale.number_symbols[numbering_system] + group_sep: str = symbols.get("group", "") + decimal_sep: str = symbols.get("decimal", ".") + fmt = locale.decimal_formats[None] + raw_grouping = getattr(fmt, "grouping", (3, 3)) + primary_group: int = raw_grouping[0] if raw_grouping else 3 + secondary_group: int = ( + raw_grouping[1] + if len(raw_grouping) > 1 and raw_grouping[1] != 0 + else primary_group + ) + return (group_sep, decimal_sep, primary_group, secondary_group) + except (AttributeError, KeyError, IndexError, TypeError): + return ("", 
".", 3, 3) + + +def _parse_decimal_localized( + value: str, + locale: Any, + *, + babel_parse_decimal: Any, + number_format_error_class: type[Exception], +) -> tuple[Decimal | None, str | None]: + """Parse one numeric string across the locale's supported numbering systems.""" + last_reason: str | None = None + grouping_reason: str | None = None + + for numbering_system in _iter_numbering_systems(locale): + group_sep, decimal_sep, primary_group, secondary_group = _get_grouping_profile( + locale, numbering_system + ) + if ( + group_sep + and group_sep in value + and not _validate_group_positions( + value, group_sep, decimal_sep, primary_group, secondary_group + ) + ): + if grouping_reason is None: + grouping_reason = "group separators not at standard digit-boundary positions" + continue + + try: + amount = babel_parse_decimal( + value, + locale=locale, + numbering_system=numbering_system, + ) + return (amount, None) + except TypeError: + try: + amount = babel_parse_decimal(value, locale=locale) + return (amount, None) + except ( + number_format_error_class, + InvalidOperation, + ValueError, + AttributeError, + TypeError, + ) as exc: + last_reason = str(exc) + except ( + number_format_error_class, + InvalidOperation, + ValueError, + AttributeError, + ) as exc: + last_reason = str(exc) + + if grouping_reason is not None: + return (None, grouping_reason) + return (None, last_reason or f"{value!r} is not a valid decimal number") + + def parse_decimal( value: str, locale_code: str, @@ -138,6 +230,22 @@ def parse_decimal( number_format_error_class = get_number_format_error_class() babel_parse_decimal = get_parse_decimal_func() + if not isinstance(value, str): + diagnostic = ErrorTemplate.parse_decimal_failed( # type: ignore[unreachable] + str(value), locale_code, f"Expected string, got {type(value).__name__}" + ) + context = FrozenErrorContext( + input_value=str(value), + locale_code=locale_code, + parse_type="decimal", + ) + errors.append(FrozenFluentError( + 
str(diagnostic), ErrorCategory.PARSE, diagnostic=diagnostic, context=context + )) + return (None, tuple(errors)) + + normalized_value = strip_bidi_format_chars(value) + # Guard: Babel silently accepts locale codes containing non-BCP-47 characters # (e.g. '/', '\x00') instead of raising UnknownLocaleError, then uses default # number format settings and parses any valid-looking number successfully. @@ -169,61 +277,26 @@ def parse_decimal( errors.append(error) return (None, tuple(errors)) - # Guard: Babel strips group separators without validating their positions. - # Extract the locale's group/decimal symbols and expected group sizes, then - # reject inputs where non-leftmost groups have the wrong digit count (e.g., - # "1,2,3" for en_US becomes Decimal("123") without this check). - try: - _ns = locale.number_symbols[locale.default_numbering_system] - group_sep: str = _ns.get("group", "") - decimal_sep: str = _ns.get("decimal", ".") - fmt = locale.decimal_formats[None] - raw_grouping = getattr(fmt, "grouping", (3, 3)) - primary_group: int = raw_grouping[0] if raw_grouping else 3 - secondary_group: int = ( - raw_grouping[1] - if len(raw_grouping) > 1 and raw_grouping[1] != 0 - else primary_group - ) - except (AttributeError, KeyError, IndexError, TypeError): - group_sep, decimal_sep, primary_group, secondary_group = "", ".", 3, 3 - - if ( - group_sep - and group_sep in value - and not _validate_group_positions( - value, group_sep, decimal_sep, primary_group, secondary_group - ) - ): - diagnostic = ErrorTemplate.parse_decimal_failed( - value, locale_code, "group separators not at standard digit-boundary positions" - ) - context = FrozenErrorContext( - input_value=str(value), - locale_code=locale_code, - parse_type="decimal", - ) - error = FrozenFluentError( - str(diagnostic), ErrorCategory.PARSE, diagnostic=diagnostic, context=context - ) - return (None, (error,)) - - try: - return (babel_parse_decimal(value, locale=locale), tuple(errors)) - except ( - 
number_format_error_class, InvalidOperation, ValueError, AttributeError, TypeError, - ) as e: - diagnostic = ErrorTemplate.parse_decimal_failed(value, locale_code, str(e)) - context = FrozenErrorContext( - input_value=str(value), - locale_code=locale_code, - parse_type="decimal", - ) - error = FrozenFluentError( - str(diagnostic), ErrorCategory.PARSE, diagnostic=diagnostic, context=context - ) - errors.append(error) - return (None, tuple(errors)) + amount, failure_reason = _parse_decimal_localized( + normalized_value, + locale, + babel_parse_decimal=babel_parse_decimal, + number_format_error_class=number_format_error_class, + ) + if amount is not None: + return (amount, tuple(errors)) + + diagnostic = ErrorTemplate.parse_decimal_failed(value, locale_code, str(failure_reason)) + context = FrozenErrorContext( + input_value=str(value), + locale_code=locale_code, + parse_type="decimal", + ) + error = FrozenFluentError( + str(diagnostic), ErrorCategory.PARSE, diagnostic=diagnostic, context=context + ) + errors.append(error) + return (None, tuple(errors)) def parse_fluent_number( diff --git a/src/ftllexengine/parsing/text_normalization.py b/src/ftllexengine/parsing/text_normalization.py new file mode 100644 index 00000000..dae1fb87 --- /dev/null +++ b/src/ftllexengine/parsing/text_normalization.py @@ -0,0 +1,33 @@ +"""Helpers for normalizing human-entered parsing inputs.""" + +from __future__ import annotations + +_BIDI_FORMAT_TRANSLATION = str.maketrans( + "", + "", + ( + "\u061c" # ARABIC LETTER MARK + "\u200e" # LEFT-TO-RIGHT MARK + "\u200f" # RIGHT-TO-LEFT MARK + "\u202a" # LEFT-TO-RIGHT EMBEDDING + "\u202b" # RIGHT-TO-LEFT EMBEDDING + "\u202c" # POP DIRECTIONAL FORMATTING + "\u202d" # LEFT-TO-RIGHT OVERRIDE + "\u202e" # RIGHT-TO-LEFT OVERRIDE + "\u2066" # LEFT-TO-RIGHT ISOLATE + "\u2067" # RIGHT-TO-LEFT ISOLATE + "\u2068" # FIRST STRONG ISOLATE + "\u2069" # POP DIRECTIONAL ISOLATE + ), +) + + +def strip_bidi_format_chars(value: str) -> str: + """Remove invisible 
bidi-format controls from user-facing strings. + + Locale renderers and Fluent's isolation mode can legitimately inject + formatting-only directionality marks around otherwise parseable content. + Parsing APIs normalize them away so users can roundtrip copied UI text + without having to pre-clean invisible characters themselves. + """ + return value.translate(_BIDI_FORMAT_TRANSLATION) diff --git a/src/ftllexengine/runtime/__init__.py b/src/ftllexengine/runtime/__init__.py index e873f5c0..fd6b9b98 100644 --- a/src/ftllexengine/runtime/__init__.py +++ b/src/ftllexengine/runtime/__init__.py @@ -51,19 +51,26 @@ def __getattr__(name: str) -> object: """Raise a targeted missing-symbol error for Babel-backed runtime symbols.""" - if _BABEL_AVAILABLE and name in _BABEL_OPTIONAL_ATTRS: - value = load_babel_optional_export(__name__, name) + if name in _BABEL_OPTIONAL_ATTRS: + if _BABEL_AVAILABLE: + value = load_babel_optional_export(__name__, name) + else: + value = raise_missing_babel_symbol( + module_name=__name__, + name=name, + optional_attrs=_BABEL_OPTIONAL_ATTRS, + parser_only_hint=( + "Parser-only usage keeps CacheConfig, FluentNumber, FunctionRegistry, " + "fluent_function, make_fluent_number, ValidationResult, and cache entry " + "types importable. Locale-formatting helpers require the full runtime extra." + ), + ) globals()[name] = value return value return raise_missing_babel_symbol( module_name=__name__, name=name, optional_attrs=_BABEL_OPTIONAL_ATTRS, - parser_only_hint=( - "Parser-only usage keeps CacheConfig, FluentNumber, FunctionRegistry, " - "fluent_function, make_fluent_number, ValidationResult, and cache entry types " - "importable. Locale-formatting helpers require the full runtime extra." 
- ), ) diff --git a/src/ftllexengine/runtime/cache.py b/src/ftllexengine/runtime/cache.py index a6b7696b..cf353173 100644 --- a/src/ftllexengine/runtime/cache.py +++ b/src/ftllexengine/runtime/cache.py @@ -118,6 +118,7 @@ class IntegrityCache(_CacheStatsMixin, _CacheAuditMixin, _CacheKeyMixin): __slots__ = ( "_audit_log", + "_audit_sequence", "_cache", "_combined_weight_skips", "_corruption_detected", @@ -189,6 +190,7 @@ def __init__( self._audit_log: deque[WriteLogEntry] | None = ( deque(maxlen=max_audit_entries) if enable_audit else None ) + self._audit_sequence = 0 self._max_audit_entries = max_audit_entries # Statistics diff --git a/src/ftllexengine/runtime/cache_audit.py b/src/ftllexengine/runtime/cache_audit.py index 53ba7ec7..893ba980 100644 --- a/src/ftllexengine/runtime/cache_audit.py +++ b/src/ftllexengine/runtime/cache_audit.py @@ -32,6 +32,7 @@ def _audit( if self._audit_log is None: return + self._audit_sequence += 1 key_hash = hashlib.blake2b( str(key).encode("utf-8", errors="surrogatepass"), digest_size=8, @@ -41,7 +42,8 @@ def _audit( operation=operation, key_hash=key_hash, timestamp=time.monotonic(), - sequence=entry.sequence if entry is not None else 0, + sequence=self._audit_sequence, + cache_sequence=entry.sequence if entry is not None else self._sequence, checksum_hex=entry.checksum.hex() if entry is not None else "", wall_time_unix=time.time(), ) diff --git a/src/ftllexengine/runtime/cache_protocols.py b/src/ftllexengine/runtime/cache_protocols.py index 7de3ba63..2a4bc2f4 100644 --- a/src/ftllexengine/runtime/cache_protocols.py +++ b/src/ftllexengine/runtime/cache_protocols.py @@ -15,6 +15,7 @@ class CacheStateProtocol(Protocol): """Structural contract implemented by IntegrityCache.""" _audit_log: deque[WriteLogEntry] | None + _audit_sequence: int _cache: OrderedDict[_CacheKey, IntegrityCacheEntry] _combined_weight_skips: int _corruption_detected: int diff --git a/src/ftllexengine/runtime/cache_types.py 
b/src/ftllexengine/runtime/cache_types.py index b62692a2..01aeb316 100644 --- a/src/ftllexengine/runtime/cache_types.py +++ b/src/ftllexengine/runtime/cache_types.py @@ -209,6 +209,7 @@ class WriteLogEntry: key_hash: str timestamp: float sequence: int + cache_sequence: int checksum_hex: str wall_time_unix: float diff --git a/src/ftllexengine/runtime/locale_context.py b/src/ftllexengine/runtime/locale_context.py index 5a5295b7..6f142896 100644 --- a/src/ftllexengine/runtime/locale_context.py +++ b/src/ftllexengine/runtime/locale_context.py @@ -54,6 +54,13 @@ format_number_for_locale, get_iso_code_pattern_for_locale, ) +from ftllexengine.runtime.locale_resolution import ( + UNKNOWN_LOCALE_WARNING_LIMIT, + get_fallback_babel_locale, + is_definitely_unknown_locale, + log_fallback_warning, + reset_locale_resolution_state, +) if TYPE_CHECKING: from datetime import date, datetime @@ -67,6 +74,8 @@ logger = logging.getLogger(__name__) +_UNKNOWN_LOCALE_WARNING_LIMIT = UNKNOWN_LOCALE_WARNING_LIMIT + # Sentinel for factory method authorization. # Only create() and create_or_raise() pass this token to __init__. _FACTORY_TOKEN = object() @@ -99,7 +108,7 @@ class LocaleContext: >>> ctx.format_number(Decimal('1234.5'), use_grouping=True) # doctest: +SKIP '1 234,5' - Unknown locales fall back to `en_US` formatting rules with a warning: + Unknown locales fall back to `en_US` formatting rules with bounded warnings: >>> ctx = LocaleContext.create('xx-UNKNOWN') # doctest: +SKIP >>> ctx.locale_code # doctest: +SKIP 'xx_unknown' @@ -121,7 +130,6 @@ class LocaleContext: # Note: ClassVar is excluded from dataclass fields _cache: ClassVar[OrderedDict[str, LocaleContext]] = OrderedDict() _cache_lock: ClassVar[Lock] = Lock() - locale_code: LocaleCode _babel_locale: Locale is_fallback: bool = False @@ -143,6 +151,8 @@ def clear_cache(cls) -> None: """Clear the locale context cache. Use this method to free memory or reset state in tests. 
+ Locale metadata and fallback-warning state are reset alongside + cached instances so repeated test runs stay deterministic. Thread-safe via Lock. Example: @@ -155,6 +165,7 @@ def clear_cache(cls) -> None: """ with cls._cache_lock: cls._cache.clear() + reset_locale_resolution_state() @classmethod def cache_size(cls) -> int: @@ -202,8 +213,9 @@ def create(cls, locale_code: str) -> LocaleContext: Factory method that validates and canonicalizes the locale boundary before construction. Structurally invalid boundary values are rejected immediately. - Unknown but structurally valid locales log a warning and fall back to en_US - formatting rules. Use create_or_raise() if unknown locales must fail fast. + Unknown but structurally valid locales log bounded warnings and fall + back to en_US formatting rules. Use create_or_raise() if unknown locales + must fail fast. Thread Safety: Uses OrderedDict with Lock for thread-safe LRU caching. @@ -225,12 +237,13 @@ def create(cls, locale_code: str) -> LocaleContext: >>> ctx = LocaleContext.create('xx_UNKNOWN') # Unknown locale # doctest: +SKIP >>> ctx.locale_code # doctest: +SKIP 'xx_unknown' - Formatting still uses `en_US` rules, with a warning logged: + Formatting uses `en_US` rules, with a warning logged: """ normalized_locale = require_locale_code(locale_code, "locale_code") + exceeds_typical_length = len(normalized_locale) > MAX_LOCALE_CODE_LENGTH # Warn for locale codes exceeding typical BCP 47 limit - if len(normalized_locale) > MAX_LOCALE_CODE_LENGTH: + if exceeds_typical_length: logger.warning( "Locale code exceeds typical BCP 47 length of %d characters: " "'%s...' (%d characters). 
Attempting Babel validation.", @@ -252,60 +265,43 @@ def create(cls, locale_code: str) -> LocaleContext: # Create new instance (Locale.parse is thread-safe) used_fallback = False - try: - babel_locale = locale_class.parse(normalized_locale) - except unknown_locale_error_class as e: - if len(normalized_locale) > MAX_LOCALE_CODE_LENGTH: - logger.warning( - "Unknown locale '%s' (exceeds %d chars): %s. Falling back to en_US", - normalized_locale, - MAX_LOCALE_CODE_LENGTH, - e, - ) - else: - logger.warning( - "Unknown locale '%s': %s. Falling back to en_US", - normalized_locale, - e, - ) - babel_locale = locale_class.parse("en_US") + if is_definitely_unknown_locale(normalized_locale): + log_fallback_warning( + normalized_locale=normalized_locale, + exceeds_typical_length=exceeds_typical_length, + detail="not present in Babel locale data", + kind="unknown", + ) + babel_locale = get_fallback_babel_locale(locale_class) used_fallback = True - except ValueError as e: - if len(normalized_locale) > MAX_LOCALE_CODE_LENGTH: - logger.warning( - "Invalid locale format '%s' (exceeds %d chars): %s. Falling back to en_US", - normalized_locale, - MAX_LOCALE_CODE_LENGTH, - e, + else: + try: + babel_locale = locale_class.parse(normalized_locale) + except unknown_locale_error_class as e: + log_fallback_warning( + normalized_locale=normalized_locale, + exceeds_typical_length=exceeds_typical_length, + detail=str(e), + kind="unknown", ) - else: - logger.warning( - "Invalid locale format '%s': %s. 
Falling back to en_US", - normalized_locale, - e, + babel_locale = get_fallback_babel_locale(locale_class) + used_fallback = True + except ValueError as e: + log_fallback_warning( + normalized_locale=normalized_locale, + exceeds_typical_length=exceeds_typical_length, + detail=str(e), + kind="invalid", ) - babel_locale = locale_class.parse("en_US") - used_fallback = True + babel_locale = get_fallback_babel_locale(locale_class) + used_fallback = True - ctx = cls( - locale_code=normalized_locale, - _babel_locale=babel_locale, - is_fallback=used_fallback, - _factory_token=_FACTORY_TOKEN, + return cls._cache_context( + normalized_locale=normalized_locale, + babel_locale=babel_locale, + used_fallback=used_fallback, ) - # Add to cache with lock (double-check pattern for thread safety) - with cls._cache_lock: - if normalized_locale in cls._cache: - return cls._cache[normalized_locale] - - # Evict LRU if cache is full - if len(cls._cache) >= MAX_LOCALE_CACHE_SIZE: - cls._cache.popitem(last=False) - - cls._cache[normalized_locale] = ctx - return ctx - @classmethod def create_or_raise(cls, locale_code: str) -> LocaleContext: """Create LocaleContext or raise on validation failure. @@ -313,11 +309,12 @@ def create_or_raise(cls, locale_code: str) -> LocaleContext: Strict validation method that raises ValueError for invalid locales. Use this in tests or when silent fallback is not acceptable. - Validates the locale code strictly via Babel, then delegates to - ``create()`` for cache lookup and population. This ensures that: + Validates the locale code strictly via Babel, then caches the verified + locale instance. This ensures that: - Invalid locales raise ValueError immediately (no silent fallback). - Valid locales are cached and reused, matching ``create()`` semantics. - - Subsequent ``create()`` calls for the same locale hit the cache. + - Subsequent ``create()`` or ``create_or_raise()`` calls for the same + locale hit the cache. 
Args: locale_code: Locale identifier (e.g., 'en-US', 'lv-LV', 'de-DE') @@ -341,34 +338,60 @@ def create_or_raise(cls, locale_code: str) -> LocaleContext: ValueError: Unknown locale identifier 'invalid-locale' """ require_babel("LocaleContext.create_or_raise") + normalized_locale = require_locale_code(locale_code, "locale_code") + with cls._cache_lock: + cached = cls._cache.get(normalized_locale) + if cached is not None and not cached.is_fallback: + cls._cache.move_to_end(normalized_locale) + return cached + + if is_definitely_unknown_locale(normalized_locale): + msg = f"Unknown locale identifier '{normalized_locale}'" + raise ValueError(msg) from None + locale_class = get_locale_class() unknown_locale_error_class = get_unknown_locale_error_class() - # Validate strictly — raises on unknown or malformed locale. - # locale_class.parse() is called only for validation here; create() - # will use the cache or re-parse as needed. On the first call for a - # locale, parse() executes twice (once here, once inside create() on - # cache miss). On subsequent calls, create() returns the cached - # instance without re-parsing, making this effectively O(1) after - # the first invocation. This is the correct trade-off: correctness - # and cache coherence take precedence over avoiding one extra parse - # on first use. - normalized_locale = require_locale_code(locale_code, "locale_code") - try: - locale_class.parse(normalized_locale) - except unknown_locale_error_class as e: - msg = f"Unknown locale identifier '{normalized_locale}': {e}" + babel_locale = locale_class.parse(normalized_locale) + except unknown_locale_error_class: + msg = f"Unknown locale identifier '{normalized_locale}'" raise ValueError(msg) from None except ValueError as e: msg = f"Invalid locale format '{normalized_locale}': {e}" raise ValueError(msg) from None - # Locale is valid — delegate to create() for proper cache management. 
- # create() will find the key in cache (populated by the parse above - # if another thread raced) or re-parse and insert. Either way the - # result is identical to create(locale_code) for a valid locale. - return cls.create(normalized_locale) + return cls._cache_context( + normalized_locale=normalized_locale, + babel_locale=babel_locale, + used_fallback=False, + ) + + @classmethod + def _cache_context( + cls, + *, + normalized_locale: str, + babel_locale: Locale, + used_fallback: bool, + ) -> LocaleContext: + """Insert or reuse a LocaleContext in the shared LRU cache.""" + ctx = cls( + locale_code=normalized_locale, + _babel_locale=babel_locale, + is_fallback=used_fallback, + _factory_token=_FACTORY_TOKEN, + ) + + with cls._cache_lock: + if normalized_locale in cls._cache: + return cls._cache[normalized_locale] + + if len(cls._cache) >= MAX_LOCALE_CACHE_SIZE: + cls._cache.popitem(last=False) + + cls._cache[normalized_locale] = ctx + return ctx @property def babel_locale(self) -> Locale: diff --git a/src/ftllexengine/runtime/locale_resolution.py b/src/ftllexengine/runtime/locale_resolution.py new file mode 100644 index 00000000..c5ebf560 --- /dev/null +++ b/src/ftllexengine/runtime/locale_resolution.py @@ -0,0 +1,142 @@ +"""Locale resolution helpers shared by ``LocaleContext`` creation paths.""" + +from __future__ import annotations + +import logging +from dataclasses import dataclass +from threading import Lock +from typing import TYPE_CHECKING, Literal + +from ftllexengine.constants import MAX_LOCALE_CODE_LENGTH +from ftllexengine.core.babel_compat import ( + get_babel_global_func, + get_locale_identifiers_func, + require_babel, +) +from ftllexengine.core.locale_utils import normalize_locale + +if TYPE_CHECKING: + from babel import Locale + +UNKNOWN_LOCALE_WARNING_LIMIT = 8 + +logger = logging.getLogger("ftllexengine.runtime.locale_context") + + +@dataclass(slots=True) +class _LocaleResolutionState: + known_locales: frozenset[str] | None = None + known_languages: 
frozenset[str] | None = None + fallback_babel_locale: Locale | None = None + fallback_warning_count: int = 0 + fallback_warning_suppressed: bool = False + + +_state_lock = Lock() +_state = _LocaleResolutionState() + + +def reset_locale_resolution_state() -> None: + """Reset cached locale metadata and fallback-warning state.""" + with _state_lock: + _state.known_locales = None + _state.known_languages = None + _state.fallback_babel_locale = None + _state.fallback_warning_count = 0 + _state.fallback_warning_suppressed = False + + +def is_definitely_unknown_locale(normalized_locale: str) -> bool: + """Return True when Babel locale parsing cannot possibly succeed.""" + known_locales, known_languages = _get_locale_metadata() + if normalized_locale in known_locales: + return False + + primary_language = normalized_locale.split("_", 1)[0] + return primary_language not in known_languages + + +def get_fallback_babel_locale(locale_class: type[Locale]) -> Locale: + """Return the shared Babel fallback locale.""" + with _state_lock: + if _state.fallback_babel_locale is None: + _state.fallback_babel_locale = locale_class.parse("en_US") + return _state.fallback_babel_locale + + +def log_fallback_warning( + *, + normalized_locale: str, + exceeds_typical_length: bool, + detail: str, + kind: Literal["invalid", "unknown"], +) -> None: + """Emit a bounded warning for fallback locale handling.""" + emit_detail, emit_suppression = _reserve_fallback_warning_slot() + if emit_detail: + label = "Unknown locale" if kind == "unknown" else "Invalid locale format" + if exceeds_typical_length: + logger.warning( + "%s '%s' (exceeds %d chars): %s. Falling back to en_US", + label, + normalized_locale, + MAX_LOCALE_CODE_LENGTH, + detail, + ) + else: + logger.warning( + "%s '%s': %s. 
Falling back to en_US", + label, + normalized_locale, + detail, + ) + return + + if emit_suppression: + logger.warning( + "Additional locale fallback warnings suppressed after %d events; " + "most recent locale was '%s'.", + UNKNOWN_LOCALE_WARNING_LIMIT, + normalized_locale, + ) + + +def _get_locale_metadata() -> tuple[frozenset[str], frozenset[str]]: + """Load normalized locale and language metadata used by ``LocaleContext``.""" + with _state_lock: + if _state.known_locales is not None and _state.known_languages is not None: + return _state.known_locales, _state.known_languages + + require_babel("LocaleContext locale metadata") + locale_identifiers_fn = get_locale_identifiers_func() + get_global = get_babel_global_func() + + known_locales = frozenset( + normalize_locale(locale_id) for locale_id in locale_identifiers_fn() + ) + known_languages = { + locale_id.split("_", 1)[0] for locale_id in known_locales + } + known_languages.update( + normalize_locale(alias) + for alias in get_global("language_aliases") + ) + + with _state_lock: + _state.known_locales = known_locales + _state.known_languages = frozenset(known_languages) + return _state.known_locales, _state.known_languages + + +def _reserve_fallback_warning_slot() -> tuple[bool, bool]: + """Return whether to emit a detailed or suppression warning.""" + with _state_lock: + if _state.fallback_warning_count < UNKNOWN_LOCALE_WARNING_LIMIT: + _state.fallback_warning_count += 1 + return True, False + + if not _state.fallback_warning_suppressed: + _state.fallback_warning_suppressed = True + return False, True + + return False, False diff --git a/src/ftllexengine/syntax/serializer.py b/src/ftllexengine/syntax/serializer.py index c31b66f1..714e795e 100644 --- a/src/ftllexengine/syntax/serializer.py +++ b/src/ftllexengine/syntax/serializer.py @@ -27,32 +27,24 @@ CallArguments, Comment, Expression, - FunctionReference, - Identifier, Junk, Message, - MessageReference, - NamedArgument, - NumberLiteral, Pattern, - Placeable, 
Resource, SelectExpression, - StringLiteral, Term, - TermReference, - TextElement, - VariableReference, ) -from .serializer_lines import ( - _ATTR_INDENT, - _CHAR_PLACEABLE, - _CONT_INDENT, - _VARIANT_INDENT, - _classify_line, - _escape_text, - _LineKind, +from .serializer_engine import emit_classified_line as _emit_classified_line_impl +from .serializer_engine import ( + pattern_needs_separate_line as _pattern_needs_separate_line_impl, ) +from .serializer_engine import serialize_call_arguments as _serialize_call_arguments_impl +from .serializer_engine import serialize_expression as _serialize_expression_impl +from .serializer_engine import serialize_pattern as _serialize_pattern_impl +from .serializer_engine import ( + serialize_select_expression as _serialize_select_expression_impl, +) +from .serializer_lines import _ATTR_INDENT from .serializer_validation import ( SerializationDepthError, SerializationValidationError, @@ -179,8 +171,7 @@ def _serialize_resource( and isinstance(entry, Comment) and prev_entry.type == entry.type ) or ( - isinstance(prev_entry, Comment) - and isinstance(entry, (Message, Term)) + isinstance(prev_entry, Comment) and isinstance(entry, (Message, Term)) # Standalone Comment followed by Message/Term needs extra blank # to prevent the comment from becoming attached on re-parse ) @@ -199,9 +190,7 @@ def _serialize_resource( # (Junk's own line-end "\n" + one blank-line "\n"). # Adding an unconditional "\n" would grow the blank count # by one on every parse/serialize cycle. 
- trailing_n = len(prev_entry.content) - len( - prev_entry.content.rstrip("\n") - ) + trailing_n = len(prev_entry.content) - len(prev_entry.content.rstrip("\n")) if trailing_n < 2: output.append("\n" * (2 - trailing_n)) else: @@ -229,9 +218,7 @@ def _serialize_entry( case _ as unreachable: # pragma: no cover assert_never(unreachable) - def _serialize_message( - self, node: Message, output: list[str], depth_guard: DepthGuard - ) -> None: + def _serialize_message(self, node: Message, output: list[str], depth_guard: DepthGuard) -> None: """Serialize Message.""" # Comment if present (attached comment, no blank line before message) # Per Fluent spec, attached comments (#) should immediately precede their entry @@ -253,9 +240,7 @@ def _serialize_message( output.append("\n") - def _serialize_term( - self, node: Term, output: list[str], depth_guard: DepthGuard - ) -> None: + def _serialize_term(self, node: Term, output: list[str], depth_guard: DepthGuard) -> None: """Serialize Term.""" # Comment if present (attached comment, no blank line before term) # Per Fluent spec, attached comments (#) should immediately precede their entry @@ -316,261 +301,50 @@ def _serialize_junk(self, node: Junk, output: list[str]) -> None: output.append("\n") def _pattern_needs_separate_line(self, pattern: Pattern) -> bool: - """Check if pattern needs separate-line serialization for roundtrip correctness. - - Returns True when a continuation line with NORMAL content has leading - whitespace that would be consumed as structural indent during re-parse. - Two triggers: - - 1. Cross-element: A TextElement starting with whitespace is preceded by - an element ending with newline, AND its first line classifies as - NORMAL. WHITESPACE_ONLY and SYNTAX_LEADING lines are handled by - per-line wrapping in _emit_classified_line and do not need - separate-line mode. - - 2. 
Intra-element: A single TextElement contains an embedded newline - followed by whitespace on a NORMAL line (not WHITESPACE_ONLY or - SYNTAX_LEADING, which are handled by per-line wrapping). - - Both triggers use _classify_line to determine if separate-line mode is - actually needed. Per-line wrapping converts TextElements into Placeables, - changing the AST structure on re-parse; triggering separate-line mode for - content that wrapping already handles makes the mode decision unstable - across roundtrips. - """ - prev_ends_newline = False - for elem in pattern.elements: - if isinstance(elem, TextElement): - if prev_ends_newline and elem.value and elem.value[0] == " ": - first_nl = elem.value.find("\n") - first_line = ( - elem.value[:first_nl] if first_nl != -1 - else elem.value - ) - kind, _ = _classify_line(first_line) - if kind is _LineKind.NORMAL: - return True - # Check for embedded newlines followed by whitespace within - # a single TextElement. Only NORMAL lines trigger separate-line - # mode; WHITESPACE_ONLY and SYNTAX_LEADING are handled by - # per-line wrapping in _serialize_pattern. - value = elem.value - idx = value.find("\n") - while idx != -1 and idx + 1 < len(value): - if value[idx + 1] == " ": - next_nl = value.find("\n", idx + 1) - line = value[idx + 1 : next_nl] if next_nl != -1 else value[idx + 1 :] - kind, _ = _classify_line(line) - if kind is _LineKind.NORMAL: - return True - idx = value.find("\n", idx + 1) - prev_ends_newline = value.endswith("\n") - else: - prev_ends_newline = False - return False + """Return True when a pattern requires separate-line mode.""" + return _pattern_needs_separate_line_impl(pattern) - def _serialize_pattern( # noqa: PLR0912 # Branches required by FTL pattern grammar + def _serialize_pattern( # Branches required by FTL pattern grammar self, pattern: Pattern, output: list[str], depth_guard: DepthGuard ) -> None: - """Serialize Pattern elements. - - Handles three concerns in strict order: - 1. 
Pattern-level: separate-line mode, leading whitespace preservation - 2. Line-level: classify each continuation line via _classify_line, - dispatch to the appropriate wrapping strategy via match/case - 3. Character-level: escape braces via _escape_text - - Per Fluent Spec 1.0: Backslash has no escaping power in TextElements. - Literal braces MUST be expressed as StringLiterals within Placeables. - """ - # Pattern-level: determine separate-line serialization. - needs_separate_line = self._pattern_needs_separate_line(pattern) - if needs_separate_line: - output.append("\n" + _CONT_INDENT) - - # Pattern-level: handle leading whitespace in the first TextElement. - # In FTL, whitespace after '=' is syntactic. A TextElement starting - # with spaces at pattern start loses its whitespace during re-parse. - leading_ws_len = 0 - if ( - pattern.elements - and isinstance(pattern.elements[0], TextElement) - and pattern.elements[0].value - and pattern.elements[0].value[0] == " " - ): - first_value = pattern.elements[0].value - stripped = first_value.lstrip(" ") - leading_ws_len = len(first_value) - len(stripped) - output.append('{ "') - output.append(" " * leading_ws_len) - output.append('" }') - - # Track continuation line state for text elements. - at_line_start = needs_separate_line - - for element in pattern.elements: - if isinstance(element, TextElement): - text = element.value - - # Skip already-emitted leading whitespace on first element. - if leading_ws_len > 0: - text = text[leading_ws_len:] - leading_ws_len = 0 - if not text: - at_line_start = False - continue - - if "\n" in text: - lines = text.split("\n") - # First line segment: classify if at line start, - # otherwise just escape braces. - if at_line_start: - self._emit_classified_line(lines[0], output) - else: - _escape_text(lines[0], output) - # Continuation lines: classify-then-dispatch. 
- for line in lines[1:]: - output.append("\n ") - self._emit_classified_line(line, output) - # Track state: empty last line means text ended with \n. - at_line_start = not lines[-1] - else: - if at_line_start: - self._emit_classified_line(text, output) - else: - _escape_text(text, output) - at_line_start = False - - else: - output.append("{ ") - with depth_guard: - self._serialize_expression(element.expression, output, depth_guard) - output.append(" }") - at_line_start = False + """Serialize Pattern elements.""" + _serialize_pattern_impl( + pattern, + output, + depth_guard, + pattern_needs_separate_line_fn=self._pattern_needs_separate_line, + emit_classified_line_fn=self._emit_classified_line, + serialize_expression_fn=self._serialize_expression, + ) @staticmethod def _emit_classified_line(line: str, output: list[str]) -> None: - """Classify a line and emit with appropriate wrapping. + """Emit one continuation line after ambiguity classification.""" + _emit_classified_line_impl(line, output) - Single dispatch point for all continuation line ambiguity classes. - Each _LineKind maps to exactly one emission strategy. - """ - kind, ws_len = _classify_line(line) - match kind: - case _LineKind.EMPTY: - pass - case _LineKind.WHITESPACE_ONLY: - output.append('{ "') - output.append(line) - output.append('" }') - case _LineKind.SYNTAX_LEADING: - # Invariant: ALL content whitespace preceding the first - # non-whitespace character on a continuation line must be - # placeable-wrapped. Raw spaces here become indistinguishable - # from structural indent during common-indent stripping. 
- if ws_len: - output.append('{ "') - output.append(line[:ws_len]) - output.append('" }') - output.append(_CHAR_PLACEABLE[line[ws_len]]) - remaining = line[ws_len + 1 :] - if remaining: - _escape_text(remaining, output) - case _LineKind.NORMAL: - _escape_text(line, output) - case _ as unreachable: # pragma: no cover - assert_never(unreachable) - - def _serialize_expression( # noqa: PLR0912 # Branches required by Expression union type + def _serialize_expression( # Branches required by Expression union type self, expr: Expression, output: list[str], depth_guard: DepthGuard ) -> None: - """Serialize Expression nodes using structural pattern matching. - - Handles all Expression types including nested Placeables (valid per FTL spec). - """ - match expr: - case StringLiteral(value=value): - # Escape special characters per FTL spec. - # Uses \uHHHH for ALL control characters (< 0x20 and 0x7F) - # to produce robust output that works in all editors and parsers. - result: list[str] = [] - for char in value: - code = ord(char) - if char == "\\": - result.append("\\\\") - elif char == '"': - result.append('\\"') - elif code < 0x20 or code == 0x7F: - result.append(f"\\u{code:04X}") - else: - result.append(char) - output.append(f'"{"".join(result)}"') - - case NumberLiteral(raw=raw): - output.append(raw) - - case VariableReference(id=Identifier(name=name)): - output.append(f"${name}") - - case MessageReference(id=Identifier(name=name), attribute=attr): - output.append(name) - if attr: - output.append(f".{attr.name}") - - case TermReference( - id=Identifier(name=name), attribute=attr, arguments=args - ): - output.append(f"-{name}") - if attr: - output.append(f".{attr.name}") - if args: - self._serialize_call_arguments(args, output, depth_guard) - - case FunctionReference(id=Identifier(name=name), arguments=args): - output.append(name) - self._serialize_call_arguments(args, output, depth_guard) - - case Placeable(expression=inner): - # Nested Placeable: { { $var } } is valid per 
FTL spec - output.append("{ ") - with depth_guard: - self._serialize_expression(inner, output, depth_guard) - output.append(" }") - - case SelectExpression(): - self._serialize_select_expression(expr, output, depth_guard) - - case _ as unreachable: # pragma: no cover - assert_never(unreachable) + """Serialize Expression nodes using structural pattern matching.""" + _serialize_expression_impl( + expr, + output, + depth_guard, + serialize_call_arguments_fn=self._serialize_call_arguments, + serialize_expression_fn=self._serialize_expression, + serialize_select_expression_fn=self._serialize_select_expression, + ) def _serialize_call_arguments( self, args: CallArguments, output: list[str], depth_guard: DepthGuard ) -> None: - """Serialize CallArguments. - - Security: Argument expressions are wrapped in depth_guard to prevent - deeply nested term/function arguments from bypassing depth limits. - Without this, -term(arg: -term(arg: ...)) could cause stack overflow. - """ - output.append("(") - - # Positional arguments - protected by depth_guard to enforce depth limits - for i, arg in enumerate(args.positional): - if i > 0: - output.append(", ") - with depth_guard: - self._serialize_expression(arg, output, depth_guard) - - # Named arguments - protected by depth_guard to enforce depth limits - named_arg: NamedArgument - for i, named_arg in enumerate(args.named): - if i > 0 or args.positional: - output.append(", ") - output.append(f"{named_arg.name.name}: ") - with depth_guard: - self._serialize_expression(named_arg.value, output, depth_guard) - - output.append(")") + """Serialize CallArguments.""" + _serialize_call_arguments_impl( + args, + output, + depth_guard, + serialize_expression_fn=self._serialize_expression, + ) def _serialize_select_expression( self, @@ -579,31 +353,13 @@ def _serialize_select_expression( depth_guard: DepthGuard, ) -> None: """Serialize SelectExpression.""" - # Wrap selector serialization in depth_guard to track depth for DoS protection. 
- # Without this, a deeply nested selector could bypass depth limits. - with depth_guard: - self._serialize_expression(expr.selector, output, depth_guard) - output.append(" ->") - - for variant in expr.variants: - output.append(_VARIANT_INDENT) - if variant.default: - output.append("*") - output.append("[") - - # Variant key: explicit destructuring for exhaustiveness - match variant.key: - case Identifier(name=name): - output.append(name) - case NumberLiteral(raw=raw): - output.append(raw) - case _ as unreachable: # pragma: no cover - assert_never(unreachable) - - output.append("] ") - self._serialize_pattern(variant.value, output, depth_guard) - - output.append("\n") + _serialize_select_expression_impl( + expr, + output, + depth_guard, + serialize_expression_fn=self._serialize_expression, + serialize_pattern_fn=self._serialize_pattern, + ) def serialize( diff --git a/src/ftllexengine/syntax/serializer_engine.py b/src/ftllexengine/syntax/serializer_engine.py new file mode 100644 index 00000000..36e1d10b --- /dev/null +++ b/src/ftllexengine/syntax/serializer_engine.py @@ -0,0 +1,274 @@ +"""Shared serialization helpers for FluentSerializer.""" + +from __future__ import annotations + +from typing import TYPE_CHECKING, assert_never + +from .ast import ( + CallArguments, + Expression, + FunctionReference, + Identifier, + MessageReference, + NamedArgument, + NumberLiteral, + Pattern, + Placeable, + SelectExpression, + StringLiteral, + TermReference, + TextElement, + VariableReference, +) +from .serializer_lines import ( + _CHAR_PLACEABLE, + _CONT_INDENT, + _VARIANT_INDENT, + _classify_line, + _escape_text, + _LineKind, +) + +if TYPE_CHECKING: + from collections.abc import Callable + + from ftllexengine.core.depth_guard import DepthGuard + +__all__ = [ + "emit_classified_line", + "pattern_needs_separate_line", + "serialize_call_arguments", + "serialize_expression", + "serialize_pattern", + "serialize_select_expression", +] + + +def pattern_needs_separate_line(pattern: 
Pattern) -> bool: + """Return True when separate-line mode is required for roundtrip safety.""" + prev_ends_newline = False + for elem in pattern.elements: + if isinstance(elem, TextElement): + if prev_ends_newline and elem.value and elem.value[0] == " ": + first_nl = elem.value.find("\n") + first_line = elem.value[:first_nl] if first_nl != -1 else elem.value + kind, _ = _classify_line(first_line) + if kind is _LineKind.NORMAL: + return True + value = elem.value + idx = value.find("\n") + while idx != -1 and idx + 1 < len(value): + if value[idx + 1] == " ": + next_nl = value.find("\n", idx + 1) + line = value[idx + 1 : next_nl] if next_nl != -1 else value[idx + 1 :] + kind, _ = _classify_line(line) + if kind is _LineKind.NORMAL: + return True + idx = value.find("\n", idx + 1) + prev_ends_newline = value.endswith("\n") + else: + prev_ends_newline = False + return False + + +def serialize_pattern( # noqa: C901, PLR0912 - FTL pattern grammar needs explicit branching. + pattern: Pattern, + output: list[str], + depth_guard: DepthGuard, + *, + pattern_needs_separate_line_fn: Callable[[Pattern], bool], + emit_classified_line_fn: Callable[[str, list[str]], None], + serialize_expression_fn: Callable[[Expression, list[str], DepthGuard], None], +) -> None: + """Serialize Pattern elements with line and character ambiguity handling.""" + needs_separate_line = pattern_needs_separate_line_fn(pattern) + if needs_separate_line: + output.append("\n" + _CONT_INDENT) + + leading_ws_len = 0 + if ( + pattern.elements + and isinstance(pattern.elements[0], TextElement) + and pattern.elements[0].value + and pattern.elements[0].value[0] == " " + ): + first_value = pattern.elements[0].value + stripped = first_value.lstrip(" ") + leading_ws_len = len(first_value) - len(stripped) + output.append('{ "') + output.append(" " * leading_ws_len) + output.append('" }') + + at_line_start = needs_separate_line + + for element in pattern.elements: + if isinstance(element, TextElement): + text = 
element.value + + if leading_ws_len > 0: + text = text[leading_ws_len:] + leading_ws_len = 0 + if not text: + at_line_start = False + continue + + if "\n" in text: + lines = text.split("\n") + if at_line_start: + emit_classified_line_fn(lines[0], output) + else: + _escape_text(lines[0], output) + for line in lines[1:]: + output.append("\n ") + emit_classified_line_fn(line, output) + at_line_start = not lines[-1] + else: + if at_line_start: + emit_classified_line_fn(text, output) + else: + _escape_text(text, output) + at_line_start = False + + else: + output.append("{ ") + with depth_guard: + serialize_expression_fn(element.expression, output, depth_guard) + output.append(" }") + at_line_start = False + + +def emit_classified_line(line: str, output: list[str]) -> None: + """Emit one continuation line using the classifier's single dispatch point.""" + kind, ws_len = _classify_line(line) + match kind: + case _LineKind.EMPTY: + pass + case _LineKind.WHITESPACE_ONLY: + output.append('{ "') + output.append(line) + output.append('" }') + case _LineKind.SYNTAX_LEADING: + if ws_len: + output.append('{ "') + output.append(line[:ws_len]) + output.append('" }') + output.append(_CHAR_PLACEABLE[line[ws_len]]) + remaining = line[ws_len + 1 :] + if remaining: + _escape_text(remaining, output) + case _LineKind.NORMAL: + _escape_text(line, output) + case _ as unreachable: # pragma: no cover + assert_never(unreachable) + + +def serialize_expression( # noqa: C901, PLR0912 - Expression union dispatch is intentionally explicit. 
+ expr: Expression, + output: list[str], + depth_guard: DepthGuard, + *, + serialize_call_arguments_fn: Callable[[CallArguments, list[str], DepthGuard], None], + serialize_expression_fn: Callable[[Expression, list[str], DepthGuard], None], + serialize_select_expression_fn: Callable[[SelectExpression, list[str], DepthGuard], None], +) -> None: + """Serialize one Expression union member.""" + match expr: + case StringLiteral(value=value): + result: list[str] = [] + for char in value: + code = ord(char) + if char == "\\": + result.append("\\\\") + elif char == '"': + result.append('\\"') + elif code < 0x20 or code == 0x7F: + result.append(f"\\u{code:04X}") + else: + result.append(char) + output.append(f'"{"".join(result)}"') + case NumberLiteral(raw=raw): + output.append(raw) + case VariableReference(id=Identifier(name=name)): + output.append(f"${name}") + case MessageReference(id=Identifier(name=name), attribute=attr): + output.append(name) + if attr: + output.append(f".{attr.name}") + case TermReference(id=Identifier(name=name), attribute=attr, arguments=args): + output.append(f"-{name}") + if attr: + output.append(f".{attr.name}") + if args: + serialize_call_arguments_fn(args, output, depth_guard) + case FunctionReference(id=Identifier(name=name), arguments=args): + output.append(name) + serialize_call_arguments_fn(args, output, depth_guard) + case Placeable(expression=inner): + output.append("{ ") + with depth_guard: + serialize_expression_fn(inner, output, depth_guard) + output.append(" }") + case SelectExpression(): + serialize_select_expression_fn(expr, output, depth_guard) + case _ as unreachable: # pragma: no cover + assert_never(unreachable) + + +def serialize_call_arguments( + args: CallArguments, + output: list[str], + depth_guard: DepthGuard, + *, + serialize_expression_fn: Callable[[Expression, list[str], DepthGuard], None], +) -> None: + """Serialize positional and named call arguments with depth protection.""" + output.append("(") + + for i, arg in 
enumerate(args.positional): + if i > 0: + output.append(", ") + with depth_guard: + serialize_expression_fn(arg, output, depth_guard) + + named_arg: NamedArgument + for i, named_arg in enumerate(args.named): + if i > 0 or args.positional: + output.append(", ") + output.append(f"{named_arg.name.name}: ") + with depth_guard: + serialize_expression_fn(named_arg.value, output, depth_guard) + + output.append(")") + + +def serialize_select_expression( + expr: SelectExpression, + output: list[str], + depth_guard: DepthGuard, + *, + serialize_expression_fn: Callable[[Expression, list[str], DepthGuard], None], + serialize_pattern_fn: Callable[[Pattern, list[str], DepthGuard], None], +) -> None: + """Serialize a SelectExpression and its variants.""" + with depth_guard: + serialize_expression_fn(expr.selector, output, depth_guard) + output.append(" ->") + + for variant in expr.variants: + output.append(_VARIANT_INDENT) + if variant.default: + output.append("*") + output.append("[") + + match variant.key: + case Identifier(name=name): + output.append(name) + case NumberLiteral(raw=raw): + output.append(raw) + case _ as unreachable: # pragma: no cover + assert_never(unreachable) + + output.append("] ") + serialize_pattern_fn(variant.value, output, depth_guard) + + output.append("\n") diff --git a/src/ftllexengine/validation/resource.py b/src/ftllexengine/validation/resource.py index 27a9cd53..4c14153e 100644 --- a/src/ftllexengine/validation/resource.py +++ b/src/ftllexengine/validation/resource.py @@ -1,17 +1,9 @@ -"""FTL resource validation. +"""FTL resource validation orchestration. -Provides standalone validation for FTL resources without requiring -a FluentBundle instance. Useful for CI/CD pipelines, linters, and -tooling that needs to validate FTL files without runtime resolution. 
- -Architecture: - - validate_resource(): Main entry point, orchestrates validation passes - - _extract_syntax_errors(): Pass 1 - Convert Junk entries to ValidationError - - _collect_entries(): Pass 2 - Collect messages/terms, check duplicates - - _check_undefined_references(): Pass 3 - Validate message/term references - - _detect_circular_references(): Pass 4 - Check for reference cycles - - detect_long_chains(): Pass 5 - Check for chains exceeding MAX_DEPTH - - SemanticValidator: Pass 6 - Fluent spec compliance +Coordinates the validation passes used by CI/CD pipelines, linters, and +tooling that need resource checks without booting a full runtime bundle. +Focused helper modules own syntax extraction, entry collection, dependency +graph analysis, and undefined-reference checks. Python 3.13+. """ @@ -24,17 +16,13 @@ from ftllexengine.constants import MAX_DEPTH from ftllexengine.core.reference_graph import detect_cycles, make_cycle_key -from ftllexengine.diagnostics import ( - ValidationError, - ValidationResult, - ValidationWarning, - WarningSeverity, -) -from ftllexengine.diagnostics.codes import DiagnosticCode -from ftllexengine.syntax import Attribute, Junk, Message, Resource, Term +from ftllexengine.diagnostics import ValidationResult, ValidationWarning from ftllexengine.syntax.cursor import LineOffsetCache -from ftllexengine.syntax.reference_extraction import extract_references from ftllexengine.syntax.validator import SemanticValidator +from ftllexengine.validation.resource_entries import ( + check_undefined_references as _check_undefined_references, +) +from ftllexengine.validation.resource_entries import collect_entries as _collect_entries from ftllexengine.validation.resource_graph import ( build_dependency_graph, detect_long_chains, @@ -42,6 +30,9 @@ from ftllexengine.validation.resource_graph import ( detect_circular_references as _detect_circular_references_impl, ) +from ftllexengine.validation.resource_syntax import ( + extract_syntax_errors as 
_extract_syntax_errors, +) if TYPE_CHECKING: from collections.abc import Mapping @@ -53,395 +44,6 @@ logger = logging.getLogger(__name__) -def _get_entry_position( - entry: Message | Term, - line_cache: LineOffsetCache, -) -> tuple[int | None, int | None]: - """Get line/column from entry's span if available. - - Args: - entry: Message or Term with optional span - line_cache: Line offset cache for position lookup - - Returns: - (line, column) tuple, or (None, None) if no span - """ - if entry.span: - return line_cache.get_line_col(entry.span.start) - return None, None - - -def _annotation_to_diagnostic_code(annotation_code: str) -> DiagnosticCode: - """Resolve a parser annotation code string to a DiagnosticCode enum member. - - Parser annotations use DiagnosticCode member names as their code field - (e.g., "PARSE_JUNK", "PARSE_NESTING_DEPTH_EXCEEDED"). This function - performs a name-based lookup and falls back to PARSE_JUNK for any - annotation code that does not match a DiagnosticCode member. - - Args: - annotation_code: String code from an Annotation (e.g., "PARSE_JUNK") - - Returns: - Matching DiagnosticCode member, or DiagnosticCode.PARSE_JUNK if unknown - """ - try: - return DiagnosticCode[annotation_code] - except KeyError: - return DiagnosticCode.PARSE_JUNK - - -def _extract_syntax_errors( - resource: Resource, - line_cache: LineOffsetCache, -) -> list[ValidationError]: - """Extract syntax errors from Junk entries. - - Converts Junk AST nodes (unparseable content) to structured - ValidationError objects with line/column information. - - Propagates annotations from Junk nodes to preserve specific error codes - and messages from the parser. If a Junk entry has no annotations, falls - back to a generic parse error. 
- - Args: - resource: Parsed Resource AST (may contain Junk entries) - line_cache: Shared line offset cache for position lookups - - Returns: - List of ValidationError objects for each Junk entry - """ - errors: list[ValidationError] = [] - - for entry in resource.entries: - if isinstance(entry, Junk): - # Propagate annotations from Junk to preserve specific parser errors - if entry.annotations: - for annotation in entry.annotations: - # Use annotation's span if available, otherwise fall back to Junk span - ann_line: int | None = None - ann_column: int | None = None - if annotation.span: - ann_line, ann_column = line_cache.get_line_col( - annotation.span.start - ) - elif entry.span: - ann_line, ann_column = line_cache.get_line_col(entry.span.start) - - errors.append( - ValidationError( - code=_annotation_to_diagnostic_code(annotation.code), - message=annotation.message, - content=entry.content, - line=ann_line, - column=ann_column, - ) - ) - else: - # Fallback for Junk without annotations (shouldn't happen normally) - line: int | None = None - column: int | None = None - if entry.span: - line, column = line_cache.get_line_col(entry.span.start) - - errors.append( - ValidationError( - code=DiagnosticCode.VALIDATION_PARSE_ERROR, - message="Failed to parse FTL content", - content=entry.content, - line=line, - column=column, - ) - ) - - return errors - - -def _check_entry( - entry: Message | Term, - *, - kind: str, - entry_name: str, - attributes: tuple[Attribute, ...], - seen_ids: set[str], - known_ids: frozenset[str] | None, - line_cache: LineOffsetCache, - warnings: list[ValidationWarning], -) -> None: - """Check a single entry for duplicates, shadows, and attribute issues. - - Shared logic for both Message and Term validation in _collect_entries. 
- - Args: - entry: The Message or Term AST node - kind: Entry kind label ("message" or "term") - entry_name: The entry identifier name - attributes: The entry's attributes tuple - seen_ids: Mutable set of IDs already seen in this namespace - known_ids: Optional set of IDs already in bundle (for shadow detection) - line_cache: Shared line offset cache for position lookups - warnings: Mutable list to append warnings to - """ - # Check for duplicate IDs within namespace - if entry_name in seen_ids: - line, column = _get_entry_position(entry, line_cache) - warnings.append( - ValidationWarning( - code=DiagnosticCode.VALIDATION_DUPLICATE_ID, - message=( - f"Duplicate {kind} ID '{entry_name}' " - f"(later definition will overwrite earlier)" - ), - context=entry_name, - line=line, - column=column, - severity=WarningSeverity.WARNING, - ) - ) - seen_ids.add(entry_name) - - # Check for shadow conflict with known entries - if known_ids and entry_name in known_ids: - line, column = _get_entry_position(entry, line_cache) - warnings.append( - ValidationWarning( - code=DiagnosticCode.VALIDATION_SHADOW_WARNING, - message=( - f"{kind.capitalize()} '{entry_name}' shadows " - f"existing {kind} " - f"(this definition will override the earlier one)" - ), - context=entry_name, - line=line, - column=column, - severity=WarningSeverity.WARNING, - ) - ) - - # Check for duplicate attribute IDs within this entry - seen_attr_ids: set[str] = set() - for attr in attributes: - attr_name = attr.id.name - if attr_name in seen_attr_ids: - line, column = _get_entry_position(entry, line_cache) - warnings.append( - ValidationWarning( - code=DiagnosticCode.VALIDATION_DUPLICATE_ATTRIBUTE, - message=( - f"{kind.capitalize()} '{entry_name}' has " - f"duplicate attribute '{attr_name}' " - f"(later will override earlier)" - ), - context=f"{entry_name}.{attr_name}", - line=line, - column=column, - severity=WarningSeverity.WARNING, - ) - ) - seen_attr_ids.add(attr_name) - - -def _collect_entries( - resource: 
Resource, - line_cache: LineOffsetCache, - *, - known_messages: frozenset[str] | None = None, - known_terms: frozenset[str] | None = None, -) -> tuple[dict[str, Message], dict[str, Term], list[ValidationWarning]]: - """Collect message/term entries and check for structural issues. - - Performs the following checks: - - Duplicate message IDs (within message namespace) - - Duplicate term IDs (within term namespace) - - Messages without values or attributes - - Duplicate attribute IDs within entries - - Shadow warnings when entry ID conflicts with known entry - - Note: Per Fluent spec, messages and terms have separate namespaces. - A message named "foo" and a term named "foo" are NOT duplicates. - - Args: - resource: Parsed Resource AST - line_cache: Shared line offset cache for position lookups - known_messages: Optional set of message IDs already in bundle - known_terms: Optional set of term IDs already in bundle - - Returns: - Tuple of (messages_dict, terms_dict, warnings) - """ - warnings: list[ValidationWarning] = [] - # Per Fluent spec, messages and terms have separate namespaces. - # A message "foo" and a term "-foo" can coexist without conflict. - seen_message_ids: set[str] = set() - seen_term_ids: set[str] = set() - messages_dict: dict[str, Message] = {} - terms_dict: dict[str, Term] = {} - - for entry in resource.entries: - match entry: - case Message( - id=msg_id, value=value, attributes=attributes - ): - _check_entry( - entry, - kind="message", - entry_name=msg_id.name, - attributes=attributes, - seen_ids=seen_message_ids, - known_ids=known_messages, - line_cache=line_cache, - warnings=warnings, - ) - messages_dict[msg_id.name] = entry - - # Messages without values (defense-in-depth) - # NOTE: Unreachable - parser/AST prevent this. - # Kept for external AST construction scenarios. 
- if ( # pragma: no cover - value is None and len(attributes) == 0 - ): - line, column = _get_entry_position( # pragma: no cover - entry, line_cache - ) - warnings.append( # pragma: no cover - ValidationWarning( - code=DiagnosticCode.VALIDATION_NO_VALUE_OR_ATTRS, - message=( - f"Message '{msg_id.name}' has " - f"neither value nor attributes" - ), - context=msg_id.name, - line=line, - column=column, - severity=WarningSeverity.WARNING, - ) - ) - - case Term(id=term_id, attributes=attributes): - _check_entry( - entry, - kind="term", - entry_name=term_id.name, - attributes=attributes, - seen_ids=seen_term_ids, - known_ids=known_terms, - line_cache=line_cache, - warnings=warnings, - ) - terms_dict[term_id.name] = entry - - return messages_dict, terms_dict, warnings - - -def _check_undefined_references( - messages_dict: dict[str, Message], - terms_dict: dict[str, Term], - line_cache: LineOffsetCache, - *, - known_messages: frozenset[str] | None = None, - known_terms: frozenset[str] | None = None, -) -> list[ValidationWarning]: - """Check for undefined message and term references. - - Validates that all message and term references in the resource - point to defined entries. Optionally considers entries already - present in a bundle for cross-resource reference validation. 
-
-    Args:
-        messages_dict: Map of message IDs to Message nodes from current resource
-        terms_dict: Map of term IDs to Term nodes from current resource
-        line_cache: Shared line offset cache for position lookups
-        known_messages: Optional set of message IDs already in bundle
-        known_terms: Optional set of term IDs already in bundle
-
-    Returns:
-        List of warnings for undefined references
-    """
-    warnings: list[ValidationWarning] = []
-
-    # Combine current resource entries with known bundle entries
-    all_messages = set(messages_dict.keys())
-    all_terms = set(terms_dict.keys())
-    if known_messages is not None:
-        all_messages |= known_messages
-    if known_terms is not None:
-        all_terms |= known_terms
-
-    # Check message references
-    for msg_name, message in messages_dict.items():
-        msg_refs, term_refs = extract_references(message)
-        line, column = _get_entry_position(message, line_cache)
-
-        for ref in msg_refs:
-            # Strip attribute qualification for existence check
-            # "msg.tooltip" -> check if "msg" exists
-            base_ref = ref.split(".", 1)[0] if "." in ref else ref
-            if base_ref not in all_messages:
-                warnings.append(
-                    ValidationWarning(
-                        code=DiagnosticCode.VALIDATION_UNDEFINED_REFERENCE,
-                        message=f"Message '{msg_name}' references undefined message '{base_ref}'",
-                        context=base_ref,
-                        line=line,
-                        column=column,
-                        severity=WarningSeverity.CRITICAL,
-                    )
-                )
-
-        for ref in term_refs:
-            # Strip attribute qualification for existence check
-            # "term.attr" -> check if "term" exists
-            base_ref = ref.split(".", 1)[0] if "."
in ref else ref
-            if base_ref not in all_terms:
-                warnings.append(
-                    ValidationWarning(
-                        code=DiagnosticCode.VALIDATION_UNDEFINED_REFERENCE,
-                        message=f"Message '{msg_name}' references undefined term '-{base_ref}'",
-                        context=f"-{base_ref}",
-                        line=line,
-                        column=column,
-                        severity=WarningSeverity.CRITICAL,
-                    )
-                )
-
-    # Check term references
-    for term_name, term in terms_dict.items():
-        msg_refs, term_refs = extract_references(term)
-        line, column = _get_entry_position(term, line_cache)
-
-        for ref in msg_refs:
-            # Strip attribute qualification for existence check
-            base_ref = ref.split(".", 1)[0] if "." in ref else ref
-            if base_ref not in all_messages:
-                warnings.append(
-                    ValidationWarning(
-                        code=DiagnosticCode.VALIDATION_UNDEFINED_REFERENCE,
-                        message=f"Term '-{term_name}' references undefined message '{base_ref}'",
-                        context=base_ref,
-                        line=line,
-                        column=column,
-                        severity=WarningSeverity.CRITICAL,
-                    )
-                )
-
-        for ref in term_refs:
-            # Strip attribute qualification for existence check
-            base_ref = ref.split(".", 1)[0] if "."
in ref else ref
-            if base_ref not in all_terms:
-                warnings.append(
-                    ValidationWarning(
-                        code=DiagnosticCode.VALIDATION_UNDEFINED_REFERENCE,
-                        message=f"Term '-{term_name}' references undefined term '-{base_ref}'",
-                        context=f"-{base_ref}",
-                        line=line,
-                        column=column,
-                        severity=WarningSeverity.CRITICAL,
-                    )
-                )
-
-    return warnings
-
-
 def _detect_circular_references(graph: dict[str, set[str]]) -> list[ValidationWarning]:
     """Compatibility wrapper preserving patch points for cycle tests."""
     return _detect_circular_references_impl(
diff --git a/src/ftllexengine/validation/resource_common.py b/src/ftllexengine/validation/resource_common.py
new file mode 100644
index 00000000..1996a5af
--- /dev/null
+++ b/src/ftllexengine/validation/resource_common.py
@@ -0,0 +1,21 @@
+"""Shared helpers for resource validation passes."""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from ftllexengine.syntax import Junk, Message, Term
+    from ftllexengine.syntax.cursor import LineOffsetCache
+
+__all__ = ["get_entry_position"]
+
+
+def get_entry_position(
+    entry: Message | Term | Junk,
+    line_cache: LineOffsetCache,
+) -> tuple[int | None, int | None]:
+    """Return the line and column for a spanned resource entry when available."""
+    if entry.span:
+        return line_cache.get_line_col(entry.span.start)
+    return None, None
diff --git a/src/ftllexengine/validation/resource_entries.py b/src/ftllexengine/validation/resource_entries.py
new file mode 100644
index 00000000..ca284a00
--- /dev/null
+++ b/src/ftllexengine/validation/resource_entries.py
@@ -0,0 +1,245 @@
+"""Entry collection and reference checks for resource validation."""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+from ftllexengine.diagnostics import ValidationWarning, WarningSeverity
+from ftllexengine.diagnostics.codes import DiagnosticCode
+from ftllexengine.syntax import Attribute, Message, Resource, Term
+from
ftllexengine.syntax.reference_extraction import extract_references
+
+from .resource_common import get_entry_position
+
+__all__ = ["check_undefined_references", "collect_entries"]
+
+if TYPE_CHECKING:
+    from ftllexengine.syntax.cursor import LineOffsetCache
+
+
+def _check_entry(
+    entry: Message | Term,
+    *,
+    kind: str,
+    entry_name: str,
+    attributes: tuple[Attribute, ...],
+    seen_ids: set[str],
+    known_ids: frozenset[str] | None,
+    line_cache: LineOffsetCache,
+    warnings: list[ValidationWarning],
+) -> None:
+    """Check one message or term for duplicate, shadow, and attribute issues."""
+    if entry_name in seen_ids:
+        line, column = get_entry_position(entry, line_cache)
+        warnings.append(
+            ValidationWarning(
+                code=DiagnosticCode.VALIDATION_DUPLICATE_ID,
+                message=(
+                    f"Duplicate {kind} ID '{entry_name}' (later definition will overwrite earlier)"
+                ),
+                context=entry_name,
+                line=line,
+                column=column,
+                severity=WarningSeverity.WARNING,
+            )
+        )
+    seen_ids.add(entry_name)
+
+    if known_ids and entry_name in known_ids:
+        line, column = get_entry_position(entry, line_cache)
+        warnings.append(
+            ValidationWarning(
+                code=DiagnosticCode.VALIDATION_SHADOW_WARNING,
+                message=(
+                    f"{kind.capitalize()} '{entry_name}' shadows existing {kind} "
+                    "(this definition will override the earlier one)"
+                ),
+                context=entry_name,
+                line=line,
+                column=column,
+                severity=WarningSeverity.WARNING,
+            )
+        )
+
+    seen_attr_ids: set[str] = set()
+    for attr in attributes:
+        attr_name = attr.id.name
+        if attr_name in seen_attr_ids:
+            line, column = get_entry_position(entry, line_cache)
+            warnings.append(
+                ValidationWarning(
+                    code=DiagnosticCode.VALIDATION_DUPLICATE_ATTRIBUTE,
+                    message=(
+                        f"{kind.capitalize()} '{entry_name}' has duplicate attribute "
+                        f"'{attr_name}' (later will override earlier)"
+                    ),
+                    context=f"{entry_name}.{attr_name}",
+                    line=line,
+                    column=column,
+                    severity=WarningSeverity.WARNING,
+                )
+            )
+        seen_attr_ids.add(attr_name)
+
+
+def collect_entries(
+    resource:
Resource,
+    line_cache: LineOffsetCache,
+    *,
+    known_messages: frozenset[str] | None = None,
+    known_terms: frozenset[str] | None = None,
+) -> tuple[dict[str, Message], dict[str, Term], list[ValidationWarning]]:
+    """Collect message and term entries while recording structural warnings."""
+    warnings: list[ValidationWarning] = []
+    seen_message_ids: set[str] = set()
+    seen_term_ids: set[str] = set()
+    messages_dict: dict[str, Message] = {}
+    terms_dict: dict[str, Term] = {}
+
+    for entry in resource.entries:
+        match entry:
+            case Message(id=msg_id, value=value, attributes=attributes):
+                _check_entry(
+                    entry,
+                    kind="message",
+                    entry_name=msg_id.name,
+                    attributes=attributes,
+                    seen_ids=seen_message_ids,
+                    known_ids=known_messages,
+                    line_cache=line_cache,
+                    warnings=warnings,
+                )
+                messages_dict[msg_id.name] = entry
+
+                if value is None and len(attributes) == 0:  # pragma: no cover
+                    line, column = get_entry_position(entry, line_cache)  # pragma: no cover
+                    warnings.append(  # pragma: no cover
+                        ValidationWarning(
+                            code=DiagnosticCode.VALIDATION_NO_VALUE_OR_ATTRS,
+                            message=(f"Message '{msg_id.name}' has neither value nor attributes"),
+                            context=msg_id.name,
+                            line=line,
+                            column=column,
+                            severity=WarningSeverity.WARNING,
+                        )
+                    )
+            case Term(id=term_id, attributes=attributes):
+                _check_entry(
+                    entry,
+                    kind="term",
+                    entry_name=term_id.name,
+                    attributes=attributes,
+                    seen_ids=seen_term_ids,
+                    known_ids=known_terms,
+                    line_cache=line_cache,
+                    warnings=warnings,
+                )
+                terms_dict[term_id.name] = entry
+
+    return messages_dict, terms_dict, warnings
+
+
+def _base_reference(ref: str) -> str:
+    """Return the entry id portion of a possibly attribute-qualified reference."""
+    return ref.split(".", 1)[0] if "."
in ref else ref
+
+
+def _append_missing_reference_warnings(
+    refs: frozenset[str] | set[str],
+    *,
+    owner_label: str,
+    target_kind: str,
+    available_ids: frozenset[str] | set[str],
+    context_prefix: str,
+    line: int | None,
+    column: int | None,
+    warnings: list[ValidationWarning],
+) -> None:
+    """Append warnings for references that target ids absent from the known set."""
+    for ref in refs:
+        base_ref = _base_reference(ref)
+        if base_ref in available_ids:
+            continue
+
+        display_ref = f"{context_prefix}{base_ref}"
+        warnings.append(
+            ValidationWarning(
+                code=DiagnosticCode.VALIDATION_UNDEFINED_REFERENCE,
+                message=(f"{owner_label} references undefined {target_kind} '{display_ref}'"),
+                context=display_ref,
+                line=line,
+                column=column,
+                severity=WarningSeverity.CRITICAL,
+            )
+        )
+
+
+def check_undefined_references(
+    messages_dict: dict[str, Message],
+    terms_dict: dict[str, Term],
+    line_cache: LineOffsetCache,
+    *,
+    known_messages: frozenset[str] | None = None,
+    known_terms: frozenset[str] | None = None,
+) -> list[ValidationWarning]:
+    """Return warnings for references that point to unknown messages or terms."""
+    warnings: list[ValidationWarning] = []
+
+    all_messages = set(messages_dict)
+    all_terms = set(terms_dict)
+    if known_messages is not None:
+        all_messages |= known_messages
+    if known_terms is not None:
+        all_terms |= known_terms
+
+    for msg_name, message in messages_dict.items():
+        msg_refs, term_refs = extract_references(message)
+        line, column = get_entry_position(message, line_cache)
+
+        _append_missing_reference_warnings(
+            msg_refs,
+            owner_label=f"Message '{msg_name}'",
+            target_kind="message",
+            available_ids=all_messages,
+            context_prefix="",
+            line=line,
+            column=column,
+            warnings=warnings,
+        )
+        _append_missing_reference_warnings(
+            term_refs,
+            owner_label=f"Message '{msg_name}'",
+            target_kind="term",
+            available_ids=all_terms,
+            context_prefix="-",
+            line=line,
+            column=column,
+            warnings=warnings,
+        )
+
+    for term_name, term
in terms_dict.items():
+        msg_refs, term_refs = extract_references(term)
+        line, column = get_entry_position(term, line_cache)
+
+        _append_missing_reference_warnings(
+            msg_refs,
+            owner_label=f"Term '-{term_name}'",
+            target_kind="message",
+            available_ids=all_messages,
+            context_prefix="",
+            line=line,
+            column=column,
+            warnings=warnings,
+        )
+        _append_missing_reference_warnings(
+            term_refs,
+            owner_label=f"Term '-{term_name}'",
+            target_kind="term",
+            available_ids=all_terms,
+            context_prefix="-",
+            line=line,
+            column=column,
+            warnings=warnings,
+        )
+
+    return warnings
diff --git a/src/ftllexengine/validation/resource_syntax.py b/src/ftllexengine/validation/resource_syntax.py
new file mode 100644
index 00000000..a17833e1
--- /dev/null
+++ b/src/ftllexengine/validation/resource_syntax.py
@@ -0,0 +1,73 @@
+"""Syntax-error extraction helpers for resource validation."""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+from ftllexengine.diagnostics import ValidationError
+from ftllexengine.diagnostics.codes import DiagnosticCode
+from ftllexengine.syntax import Junk, Resource
+
+from .resource_common import get_entry_position
+
+__all__ = ["extract_syntax_errors"]
+
+if TYPE_CHECKING:
+    from ftllexengine.syntax.cursor import LineOffsetCache
+
+
+def _annotation_to_diagnostic_code(annotation_code: str) -> DiagnosticCode:
+    """Resolve one parser annotation code to the matching diagnostic enum."""
+    try:
+        return DiagnosticCode[annotation_code]
+    except KeyError:
+        return DiagnosticCode.PARSE_JUNK
+
+
+def extract_syntax_errors(
+    resource: Resource,
+    line_cache: LineOffsetCache,
+) -> list[ValidationError]:
+    """Convert Junk entries into structured validation errors."""
+    errors: list[ValidationError] = []
+
+    for entry in resource.entries:
+        if not isinstance(entry, Junk):
+            continue
+
+        if entry.annotations:
+            for annotation in entry.annotations:
+                ann_line: int | None = None
+                ann_column: int | None = None
+                if annotation.span:
+
ann_line, ann_column = line_cache.get_line_col(annotation.span.start)
+                elif entry.span:
+                    ann_line, ann_column = get_entry_position(entry, line_cache)
+
+                errors.append(
+                    ValidationError(
+                        code=_annotation_to_diagnostic_code(annotation.code),
+                        message=annotation.message,
+                        content=entry.content,
+                        line=ann_line,
+                        column=ann_column,
+                    )
+                )
+            continue
+
+        line: int | None = None
+        column: int | None = None
+        if entry.span:
+            line, column = get_entry_position(entry, line_cache)
+
+        errors.append(
+            ValidationError(
+                code=DiagnosticCode.VALIDATION_PARSE_ERROR,
+                message="Failed to parse FTL content",
+                content=entry.content,
+                line=line,
+                column=column,
+            )
+        )
+
+    return errors
diff --git a/tests/diagnostics_frozen_error_cases/__init__.py b/tests/diagnostics_frozen_error_cases/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/tests/diagnostics_frozen_error_cases/branch_coverage.py b/tests/diagnostics_frozen_error_cases/branch_coverage.py
new file mode 100644
index 00000000..98728224
--- /dev/null
+++ b/tests/diagnostics_frozen_error_cases/branch_coverage.py
@@ -0,0 +1,577 @@
+# mypy: ignore-errors
+# mypy: ignore-errors
+from __future__ import annotations
+
+import pytest
+from hypothesis import event, given, settings
+from hypothesis import strategies as st
+
+from ftllexengine.diagnostics import (
+    Diagnostic,
+    DiagnosticCode,
+    ErrorCategory,
+    FrozenErrorContext,
+    FrozenFluentError,
+    SourceSpan,
+)
+from ftllexengine.integrity import ImmutabilityViolationError
+from tests.strategies.diagnostics import error_categories
+
+# =============================================================================
+# Strategies for generating test data
+# =============================================================================
+
+
+@st.composite
+def error_messages(draw: st.DrawFn) -> str:
+    """Generate valid error messages."""
+    return draw(st.text(min_size=1, max_size=200))
+
+
+@st.composite
+def optional_diagnostics(draw: st.DrawFn) ->
Diagnostic | None:
+    """Generate optional Diagnostic objects."""
+    if draw(st.booleans()):
+        code = draw(st.sampled_from(list(DiagnosticCode)))
+        message = draw(st.text(min_size=1, max_size=100))
+        return Diagnostic(code=code, message=message, severity="error")
+    return None
+
+
+@st.composite
+def optional_contexts(draw: st.DrawFn) -> FrozenErrorContext | None:
+    """Generate optional FrozenErrorContext objects."""
+    if draw(st.booleans()):
+        return FrozenErrorContext(
+            input_value=draw(st.text(min_size=0, max_size=50)),
+            locale_code=draw(st.text(min_size=1, max_size=10)),
+            parse_type=draw(st.sampled_from(
+                ["", "currency", "date", "datetime", "decimal", "number"]
+            )),
+            fallback_value=draw(st.text(min_size=0, max_size=50)),
+        )
+    return None
+
+
+@st.composite
+def frozen_fluent_errors(draw: st.DrawFn) -> FrozenFluentError:
+    """Generate FrozenFluentError instances."""
+    return FrozenFluentError(
+        message=draw(error_messages()),
+        category=draw(error_categories()),
+        diagnostic=draw(optional_diagnostics()),
+        context=draw(optional_contexts()),
+    )
+
+
+# =============================================================================
+# Content Hash Properties
+# =============================================================================
+
+
+
+class TestCompleteBranchCoverage:
+    """Tests to achieve 100% branch coverage for errors.py."""
+
+    def test_setattr_unfrozen_branch(self) -> None:
+        """Test __setattr__ when _frozen is False (line 176 coverage).
+
+        This tests the defensive else branch in __setattr__ that allows
+        attribute setting when the object is not yet frozen. While this
+        branch is not normally reached (since __init__ uses object.__setattr__
+        directly), it exists as a defensive measure.
+
+        This test forcibly unfreezes an error to exercise the branch.
+        """
+        error = FrozenFluentError("test", ErrorCategory.REFERENCE)
+
+        # Verify object is initially frozen
+        assert error.verify_integrity() is True
+
+        # Forcibly unfreeze using object.__setattr__ to bypass immutability
+        object.__setattr__(error, "_frozen", False)
+
+        # Now call the instance's __setattr__ DIRECTLY - should reach line 176
+        # Must use the class method, not object.__setattr__
+        FrozenFluentError.__setattr__(error, "_message", "modified")
+
+        # Verify the change took effect (since we unfroze it)
+        assert error._message == "modified"
+
+        # Re-freeze for cleanup
+        object.__setattr__(error, "_frozen", True)
+
+    def test_eq_with_non_error_type_returns_not_implemented(self) -> None:
+        """Test __eq__ returns NotImplemented for non-FrozenFluentError types.
+
+        The __eq__ method should return NotImplemented (not False) when
+        comparing with objects that are not FrozenFluentError instances.
+        This allows Python to try the comparison from the other object's
+        perspective.
+        """
+        error = FrozenFluentError("test", ErrorCategory.REFERENCE)
+
+        # Test with various non-FrozenFluentError types
+        # Direct dunder call required to verify NotImplemented return value
+        # (using == operator would convert NotImplemented to False)
+        result = error.__eq__(42)  # pylint: disable=unnecessary-dunder-call
+        assert result is NotImplemented
+
+        result = error.__eq__("string")  # pylint: disable=unnecessary-dunder-call
+        assert result is NotImplemented
+
+        result = error.__eq__({"dict": "value"})  # pylint: disable=unnecessary-dunder-call
+        assert result is NotImplemented
+
+        result = error.__eq__([1, 2, 3])  # pylint: disable=unnecessary-dunder-call
+        assert result is NotImplemented
+
+        # The actual equality operator should return False (Python's default)
+        assert (error == 42) is False
+        assert (error == "string") is False
+
+    def test_compute_content_hash_with_all_fields(self) -> None:
+        """Test _compute_content_hash with all optional fields populated.
+
+        This ensures the hash computation includes all diagnostic and context
+        fields when present, achieving full branch coverage in the hash
+        computation logic.
+        """
+        diagnostic = Diagnostic(
+            code=DiagnosticCode.MESSAGE_NOT_FOUND,
+            message="Test diagnostic message",
+        )
+        context = FrozenErrorContext(
+            input_value="test input",
+            locale_code="en_US",
+            parse_type="number",
+            fallback_value="fallback",
+        )
+
+        error1 = FrozenFluentError(
+            "test message",
+            ErrorCategory.FORMATTING,
+            diagnostic=diagnostic,
+            context=context,
+        )
+
+        # Create another with same values
+        error2 = FrozenFluentError(
+            "test message",
+            ErrorCategory.FORMATTING,
+            diagnostic=diagnostic,
+            context=context,
+        )
+
+        # Hashes should be identical
+        assert error1.content_hash == error2.content_hash
+
+        # Verify hash includes all fields by changing each one
+        error3 = FrozenFluentError(
+            "different message",  # Changed
+            ErrorCategory.FORMATTING,
+            diagnostic=diagnostic,
+            context=context,
+        )
+        assert error1.content_hash != error3.content_hash
+
+        diagnostic2 = Diagnostic(
+            code=DiagnosticCode.TERM_NOT_FOUND,  # Different code
+            message="Test diagnostic message",
+        )
+        error4 = FrozenFluentError(
+            "test message",
+            ErrorCategory.FORMATTING,
+            diagnostic=diagnostic2,  # Changed
+            context=context,
+        )
+        assert error1.content_hash != error4.content_hash
+
+        context2 = FrozenErrorContext(
+            input_value="different input",  # Changed
+            locale_code="en_US",
+            parse_type="number",
+            fallback_value="fallback",
+        )
+        error5 = FrozenFluentError(
+            "test message",
+            ErrorCategory.FORMATTING,
+            diagnostic=diagnostic,
+            context=context2,  # Changed
+        )
+        assert error1.content_hash != error5.content_hash
+
+    def test_hash_with_surrogates_in_text(self) -> None:
+        """Test content hash computation with invalid Unicode surrogates.
+
+        The hash function uses surrogatepass error handling to ensure it can
+        hash any Python string, including those with unpaired surrogates from
+        malformed user input.
+        """
+        # Create error with unpaired surrogate (invalid Unicode)
+        # Python allows these in strings but they're not valid UTF-8
+        message_with_surrogate = "Error: \ud800 invalid"
+
+        error = FrozenFluentError(message_with_surrogate, ErrorCategory.PARSE)
+
+        # Should successfully compute hash without raising UnicodeEncodeError
+        assert len(error.content_hash) == 16
+        assert error.verify_integrity() is True
+
+        # Test with surrogate in context fields
+        context = FrozenErrorContext(
+            input_value="\ud800 surrogate input",
+            locale_code="en_US",
+            parse_type="currency",
+            fallback_value="\ud800\udc00 surrogate fallback",
+        )
+        error_with_context = FrozenFluentError(
+            "test",
+            ErrorCategory.FORMATTING,
+            context=context,
+        )
+        assert len(error_with_context.content_hash) == 16
+        assert error_with_context.verify_integrity() is True
+
+    @given(
+        message=st.text(),
+        category=error_categories(),
+    )
+    @settings(max_examples=50)
+    def test_repr_contains_all_constructor_args(
+        self, message: str, category: ErrorCategory
+    ) -> None:
+        """Property: __repr__ includes all constructor arguments for debugging."""
+        error = FrozenFluentError(message, category)
+        r = repr(error)
+
+        # Should contain class name
+        assert "FrozenFluentError" in r
+
+        # Should contain all field names
+        assert "message=" in r
+        assert "category=" in r
+        assert "diagnostic=" in r
+        assert "context=" in r
+
+        # Message should be represented (possibly truncated in repr)
+        # Category should be shown
+        assert category.name in r or str(category) in r
+        event(f"category={category.name}")
+
+    def test_hash_with_diagnostic_span(self) -> None:
+        """Test content hash computation with Diagnostic containing SourceSpan.
+
+        This exercises lines 196-199 in _compute_content_hash where span
+        fields are hashed when diagnostic.span is not None.
+        """
+        # Create diagnostic WITH span
+        diagnostic_with_span = Diagnostic(
+            code=DiagnosticCode.MESSAGE_NOT_FOUND,
+            message="Test message",
+            span=SourceSpan(start=10, end=20, line=5, column=3),
+            severity="error",
+        )
+
+        error1 = FrozenFluentError(
+            "test",
+            ErrorCategory.REFERENCE,
+            diagnostic=diagnostic_with_span,
+        )
+
+        # Create another with same span
+        error2 = FrozenFluentError(
+            "test",
+            ErrorCategory.REFERENCE,
+            diagnostic=diagnostic_with_span,
+        )
+
+        # Should have identical hashes
+        assert error1.content_hash == error2.content_hash
+
+        # Create diagnostic with different span values
+        diagnostic_different_span = Diagnostic(
+            code=DiagnosticCode.MESSAGE_NOT_FOUND,
+            message="Test message",
+            span=SourceSpan(start=100, end=200, line=10, column=15),
+            severity="error",
+        )
+
+        error3 = FrozenFluentError(
+            "test",
+            ErrorCategory.REFERENCE,
+            diagnostic=diagnostic_different_span,
+        )
+
+        # Should have different hash
+        assert error1.content_hash != error3.content_hash
+
+        # Verify integrity
+        assert error1.verify_integrity() is True
+        assert error3.verify_integrity() is True
+
+    def test_hash_with_diagnostic_optional_fields(self) -> None:
+        """Test content hash computation with all Diagnostic optional fields.
+
+        This exercises line 215 in _compute_content_hash where optional
+        string fields (hint, help_url, etc.) are hashed when not None.
+        """
+        # Create diagnostic with ALL optional string fields populated
+        diagnostic_full = Diagnostic(
+            code=DiagnosticCode.FUNCTION_FAILED,
+            message="Function error",
+            hint="Check your arguments",
+            help_url="https://example.com/help",
+            function_name="NUMBER",
+            argument_name="value",
+            expected_type="int | Decimal",
+            received_type="str",
+            ftl_location="messages.ftl:42",
+            severity="error",
+        )
+
+        error1 = FrozenFluentError(
+            "test",
+            ErrorCategory.RESOLUTION,
+            diagnostic=diagnostic_full,
+        )
+
+        # Create another with same fields
+        error2 = FrozenFluentError(
+            "test",
+            ErrorCategory.RESOLUTION,
+            diagnostic=diagnostic_full,
+        )
+
+        # Should have identical hashes
+        assert error1.content_hash == error2.content_hash
+
+        # Change one optional field
+        diagnostic_changed = Diagnostic(
+            code=DiagnosticCode.FUNCTION_FAILED,
+            message="Function error",
+            hint="Different hint",  # Changed
+            help_url="https://example.com/help",
+            function_name="NUMBER",
+            argument_name="value",
+            expected_type="int | Decimal",
+            received_type="str",
+            ftl_location="messages.ftl:42",
+            severity="error",
+        )
+
+        error3 = FrozenFluentError(
+            "test",
+            ErrorCategory.RESOLUTION,
+            diagnostic=diagnostic_changed,
+        )
+
+        # Should have different hash
+        assert error1.content_hash != error3.content_hash
+
+        # Verify integrity
+        assert error1.verify_integrity() is True
+        assert error3.verify_integrity() is True
+
+    def test_hash_with_diagnostic_resolution_path(self) -> None:
+        """Test content hash computation with Diagnostic resolution_path.
+
+        This exercises lines 225-228 in _compute_content_hash where
+        resolution_path tuple elements are hashed when not None.
+        """
+        # Create diagnostic with resolution_path
+        diagnostic_with_path = Diagnostic(
+            code=DiagnosticCode.CYCLIC_REFERENCE,
+            message="Cyclic reference detected",
+            resolution_path=("message1", "message2", "message3"),
+            severity="error",
+        )
+
+        error1 = FrozenFluentError(
+            "test",
+            ErrorCategory.CYCLIC,
+            diagnostic=diagnostic_with_path,
+        )
+
+        # Create another with same path
+        error2 = FrozenFluentError(
+            "test",
+            ErrorCategory.CYCLIC,
+            diagnostic=diagnostic_with_path,
+        )
+
+        # Should have identical hashes
+        assert error1.content_hash == error2.content_hash
+
+        # Create diagnostic with different resolution_path
+        diagnostic_different_path = Diagnostic(
+            code=DiagnosticCode.CYCLIC_REFERENCE,
+            message="Cyclic reference detected",
+            resolution_path=("message1", "message4", "message5"),  # Different
+            severity="error",
+        )
+
+        error3 = FrozenFluentError(
+            "test",
+            ErrorCategory.CYCLIC,
+            diagnostic=diagnostic_different_path,
+        )
+
+        # Should have different hash
+        assert error1.content_hash != error3.content_hash
+
+        # Create diagnostic with empty resolution_path
+        diagnostic_empty_path = Diagnostic(
+            code=DiagnosticCode.CYCLIC_REFERENCE,
+            message="Cyclic reference detected",
+            resolution_path=(),  # Empty tuple
+            severity="error",
+        )
+
+        error4 = FrozenFluentError(
+            "test",
+            ErrorCategory.CYCLIC,
+            diagnostic=diagnostic_empty_path,
+        )
+
+        # Should have different hash from non-empty path
+        assert error1.content_hash != error4.content_hash
+
+        # Verify integrity
+        assert error1.verify_integrity() is True
+        assert error3.verify_integrity() is True
+        assert error4.verify_integrity() is True
+
+    def test_setattr_allows_python_exception_attributes(self) -> None:
+        """Test __setattr__ allows Python exception mechanism attributes.
+
+        This exercises lines 254-255 in __setattr__ where Python's exception
+        handling attributes (__traceback__, __context__, __cause__,
+        __suppress_context__) are allowed even after freeze.
+        """
+        error = FrozenFluentError("test", ErrorCategory.REFERENCE)
+
+        # Python exception attributes should be settable even after freeze
+        # These are set by Python's exception handling mechanism
+        import sys
+
+        # Create a dummy traceback by raising and catching
+        tb = None
+        try:
+            msg = "dummy"
+            raise ValueError(msg)
+        except ValueError:
+            tb = sys.exc_info()[2]
+
+        # Should NOT raise ImmutabilityViolationError
+        error.__traceback__ = tb
+        assert error.__traceback__ is tb
+
+        # Test __context__ (exception chaining)
+        context_error = ValueError("context")
+        error.__context__ = context_error
+        assert error.__context__ is context_error
+
+        # Test __cause__ (explicit exception chaining)
+        cause_error = RuntimeError("cause")
+        error.__cause__ = cause_error
+        assert error.__cause__ is cause_error
+
+        # Test __suppress_context__
+        error.__suppress_context__ = True
+        assert error.__suppress_context__ is True
+
+        # Verify error is still frozen for other attributes
+        with pytest.raises(ImmutabilityViolationError):
+            error._message = "modified"
+
+        # Verify integrity is maintained
+        assert error.verify_integrity() is True
+
+    def test_notes_attribute_allowed_for_python_311_compatibility(self) -> None:
+        """__notes__ attribute can be set for Python 3.11+ exception groups.
+
+        Python 3.11 added __notes__ for Exception Groups (PEP 654/678).
+        FrozenFluentError must allow this attribute to be set even after freeze
+        to support exception enrichment via add_note() and exception groups.
+        """
+        error = FrozenFluentError("test", ErrorCategory.RESOLUTION)
+
+        # Simulate what Python's add_note() does internally
+        # (it sets __notes__ attribute if not present, then appends)
+        error.__notes__ = []
+        error.__notes__.append("additional context")
+        error.__notes__.append("more info")
+
+        # Verify notes were set
+        assert hasattr(error, "__notes__")
+        assert error.__notes__ == ["additional context", "more info"]
+
+        # Verify error is still frozen for other attributes
+        with pytest.raises(ImmutabilityViolationError):
+            error._message = "modified"
+
+        # Verify integrity is maintained
+        assert error.verify_integrity() is True
+
+    def test_delattr_raises_immutability_violation(self) -> None:
+        """__delattr__ rejects all attribute deletions after construction."""
+        error = FrozenFluentError("test", ErrorCategory.REFERENCE)
+        with pytest.raises(ImmutabilityViolationError):
+            del error._message
+        with pytest.raises(ImmutabilityViolationError):
+            del error._category
+
+    def test_hash_returns_int_from_content_hash(self) -> None:
+        """__hash__ derives from all 16 bytes of BLAKE2b-128 content hash.
+
+        Python's hash() protocol calls int.__hash__() on the returned integer,
+        reducing it via Mersenne prime modulus to a platform-sized hash value.
+        We verify the full 128-bit integer is used, not a truncated subset.
+        """
+        error = FrozenFluentError("test", ErrorCategory.REFERENCE)
+        h = hash(error)
+        # __hash__ returns int.from_bytes(content_hash, "big") (all 16 bytes);
+        # Python's hash() then applies int.__hash__() which reduces via modulus.
+        # Compute the same reduction independently to verify full-hash derivation.
+        expected = hash(int.from_bytes(error.content_hash, "big"))
+        assert h == expected
+
+    def test_eq_compares_content_hash_for_matching_errors(self) -> None:
+        """__eq__ returns True for errors with identical content."""
+        error1 = FrozenFluentError("test", ErrorCategory.REFERENCE)
+        error2 = FrozenFluentError("test", ErrorCategory.REFERENCE)
+        assert error1 == error2
+
+    def test_eq_compares_content_hash_for_different_errors(self) -> None:
+        """__eq__ returns False for errors with different content."""
+        error1 = FrozenFluentError("msg1", ErrorCategory.REFERENCE)
+        error2 = FrozenFluentError("msg2", ErrorCategory.REFERENCE)
+        assert error1 != error2
+
+    def test_convenience_properties_return_empty_without_context(
+        self,
+    ) -> None:
+        """Convenience properties return empty string when context is None."""
+        error = FrozenFluentError("test", ErrorCategory.REFERENCE)
+        assert error.context is None
+        assert error.fallback_value == ""
+        assert error.input_value == ""
+        assert error.locale_code == ""
+        assert error.parse_type == ""
+
+    def test_convenience_properties_delegate_to_context(self) -> None:
+        """Convenience properties return context field values when present."""
+        ctx = FrozenErrorContext(
+            input_value="42abc",
+            locale_code="de_DE",
+            parse_type="number",
+            fallback_value="{!NUMBER}",
+        )
+        error = FrozenFluentError(
+            "test", ErrorCategory.PARSE, context=ctx
+        )
+        assert error.fallback_value == "{!NUMBER}"
+        assert error.input_value == "42abc"
+        assert error.locale_code == "de_DE"
+        assert error.parse_type == "number"
diff --git a/tests/diagnostics_frozen_error_cases/core_behavior.py b/tests/diagnostics_frozen_error_cases/core_behavior.py
new file mode 100644
index 00000000..3c19f539
--- /dev/null
+++ b/tests/diagnostics_frozen_error_cases/core_behavior.py
@@ -0,0 +1,556 @@
+# mypy: ignore-errors
+from __future__ import annotations
+
+import pytest
+from hypothesis import assume, event, example, given, settings
+from hypothesis import strategies as st
+
+from ftllexengine.diagnostics import ( + Diagnostic, + DiagnosticCode, + ErrorCategory, + FrozenErrorContext, + FrozenFluentError, +) +from ftllexengine.integrity import ImmutabilityViolationError +from tests.strategies.diagnostics import error_categories + +# ============================================================================= +# Strategies for generating test data +# ============================================================================= + + +@st.composite +def error_messages(draw: st.DrawFn) -> str: + """Generate valid error messages.""" + return draw(st.text(min_size=1, max_size=200)) + + +@st.composite +def optional_diagnostics(draw: st.DrawFn) -> Diagnostic | None: + """Generate optional Diagnostic objects.""" + if draw(st.booleans()): + code = draw(st.sampled_from(list(DiagnosticCode))) + message = draw(st.text(min_size=1, max_size=100)) + return Diagnostic(code=code, message=message, severity="error") + return None + + +@st.composite +def optional_contexts(draw: st.DrawFn) -> FrozenErrorContext | None: + """Generate optional FrozenErrorContext objects.""" + if draw(st.booleans()): + return FrozenErrorContext( + input_value=draw(st.text(min_size=0, max_size=50)), + locale_code=draw(st.text(min_size=1, max_size=10)), + parse_type=draw(st.sampled_from( + ["", "currency", "date", "datetime", "decimal", "number"] + )), + fallback_value=draw(st.text(min_size=0, max_size=50)), + ) + return None + + +@st.composite +def frozen_fluent_errors(draw: st.DrawFn) -> FrozenFluentError: + """Generate FrozenFluentError instances.""" + return FrozenFluentError( + message=draw(error_messages()), + category=draw(error_categories()), + diagnostic=draw(optional_diagnostics()), + context=draw(optional_contexts()), + ) + + +# ============================================================================= +# Content Hash Properties +# ============================================================================= + + + +@pytest.mark.fuzz +class 
TestContentHashDeterminism: + """Content hash must be deterministic - same inputs always produce same hash.""" + + @given( + message=error_messages(), + category=error_categories(), + ) + @settings(max_examples=100) + def test_same_inputs_produce_same_hash( + self, message: str, category: ErrorCategory + ) -> None: + """Property: Identical errors have identical content hashes.""" + error1 = FrozenFluentError(message, category) + error2 = FrozenFluentError(message, category) + + event(f"msg_len={len(message)}") + assert error1.content_hash == error2.content_hash + assert error1 == error2 + event("outcome=hash_determinism_success") + + @given( + message=error_messages(), + category=error_categories(), + diagnostic=optional_diagnostics(), + context=optional_contexts(), + ) + @settings(max_examples=100) + def test_same_inputs_with_optional_fields( + self, + message: str, + category: ErrorCategory, + diagnostic: Diagnostic | None, + context: FrozenErrorContext | None, + ) -> None: + """Property: Identical errors with optional fields have identical hashes.""" + error1 = FrozenFluentError(message, category, diagnostic, context) + error2 = FrozenFluentError(message, category, diagnostic, context) + + has_diag = diagnostic is not None + has_ctx = context is not None + event(f"has_diagnostic={has_diag}") + event(f"has_context={has_ctx}") + assert error1.content_hash == error2.content_hash + assert error1 == error2 + + @given(error=frozen_fluent_errors()) + @settings(max_examples=100) + def test_hash_is_16_bytes(self, error: FrozenFluentError) -> None: + """Property: Content hash is always 16 bytes (BLAKE2b-128).""" + event(f"category={error.category.name}") + assert len(error.content_hash) == 16 + +@pytest.mark.fuzz +class TestContentHashCollisionResistance: + """Different inputs should produce different hashes (high probability).""" + + @given( + message1=error_messages(), + message2=error_messages(), + category=error_categories(), + ) + @settings(max_examples=100) + def 
test_different_messages_different_hashes( + self, message1: str, message2: str, category: ErrorCategory + ) -> None: + """Property: Different messages produce different hashes.""" + assume(message1 != message2) + + error1 = FrozenFluentError(message1, category) + error2 = FrozenFluentError(message2, category) + + event(f"msg1_len={len(message1)}") + event(f"msg2_len={len(message2)}") + assert error1.content_hash != error2.content_hash + assert error1 != error2 + event("outcome=hash_collision_resistance") + + @given( + message=error_messages(), + category1=error_categories(), + category2=error_categories(), + ) + @settings(max_examples=100) + def test_different_categories_different_hashes( + self, message: str, category1: ErrorCategory, category2: ErrorCategory + ) -> None: + """Property: Different categories produce different hashes.""" + assume(category1 != category2) + + error1 = FrozenFluentError(message, category1) + error2 = FrozenFluentError(message, category2) + + event(f"cat1={category1.name}") + event(f"cat2={category2.name}") + assert error1.content_hash != error2.content_hash + assert error1 != error2 + +@pytest.mark.fuzz +class TestImmutabilityEnforcement: + """FrozenFluentError must reject all mutations after construction.""" + + @given(error=frozen_fluent_errors()) + @settings(max_examples=50) + def test_cannot_modify_message(self, error: FrozenFluentError) -> None: + """Property: Cannot modify message after construction.""" + with pytest.raises(ImmutabilityViolationError): + error._message = "modified" + event(f"msg_len={len(error.message)}") + event("outcome=immutability_enforced") + + @given(error=frozen_fluent_errors()) + @settings(max_examples=50) + def test_cannot_modify_category(self, error: FrozenFluentError) -> None: + """Property: Cannot modify category after construction.""" + with pytest.raises(ImmutabilityViolationError): + error._category = ErrorCategory.PARSE + event(f"category={error.category.name}") + + 
@given(error=frozen_fluent_errors()) + @settings(max_examples=50) + def test_cannot_modify_diagnostic(self, error: FrozenFluentError) -> None: + """Property: Cannot modify diagnostic after construction.""" + with pytest.raises(ImmutabilityViolationError): + error._diagnostic = None + has_diag = error.diagnostic is not None + event(f"has_diagnostic={has_diag}") + + @given(error=frozen_fluent_errors()) + @settings(max_examples=50) + def test_cannot_modify_context(self, error: FrozenFluentError) -> None: + """Property: Cannot modify context after construction.""" + with pytest.raises(ImmutabilityViolationError): + error._context = None + has_ctx = error.context is not None + event(f"has_context={has_ctx}") + + @given(error=frozen_fluent_errors()) + @settings(max_examples=50) + def test_cannot_modify_content_hash(self, error: FrozenFluentError) -> None: + """Property: Cannot modify content hash after construction.""" + with pytest.raises(ImmutabilityViolationError): + error._content_hash = b"fake" + event(f"category={error.category.name}") + + @given(error=frozen_fluent_errors()) + @settings(max_examples=50) + def test_cannot_delete_attributes(self, error: FrozenFluentError) -> None: + """Property: Cannot delete any attributes.""" + with pytest.raises(ImmutabilityViolationError): + del error._message + event(f"category={error.category.name}") + +class TestSealedTypeEnforcement: + """FrozenFluentError must reject subclassing at runtime.""" + + def test_cannot_subclass(self) -> None: + """FrozenFluentError cannot be subclassed.""" + with pytest.raises(TypeError, match="cannot be subclassed"): + # pylint: disable=unused-variable,subclassed-final-class + class MaliciousError(FrozenFluentError): # type: ignore[misc] + pass + +@pytest.mark.fuzz +class TestVerifyIntegrity: + """verify_integrity() must correctly detect corruption.""" + + @given(error=frozen_fluent_errors()) + @settings(max_examples=100) + def test_fresh_error_passes_integrity_check( + self, error: 
FrozenFluentError + ) -> None: + """Property: Freshly constructed errors always pass integrity check.""" + event(f"category={error.category.name}") + assert error.verify_integrity() is True + event("outcome=integrity_check_passed") + + @given(error=frozen_fluent_errors()) + @settings(max_examples=100) + def test_integrity_is_idempotent(self, error: FrozenFluentError) -> None: + """Property: verify_integrity() can be called multiple times.""" + event(f"category={error.category.name}") + assert error.verify_integrity() is True + assert error.verify_integrity() is True + assert error.verify_integrity() is True + +@pytest.mark.fuzz +class TestHashability: + """FrozenFluentError must be usable in sets and as dict keys.""" + + @given(error=frozen_fluent_errors()) + @settings(max_examples=50) + def test_error_is_hashable(self, error: FrozenFluentError) -> None: + """Property: Errors are hashable (can use hash()).""" + h = hash(error) + assert isinstance(h, int) + event(f"category={error.category.name}") + + @given(error=frozen_fluent_errors()) + @settings(max_examples=50) + def test_hash_is_stable(self, error: FrozenFluentError) -> None: + """Property: Hash is stable across multiple calls.""" + h1 = hash(error) + h2 = hash(error) + h3 = hash(error) + assert h1 == h2 == h3 + event(f"category={error.category.name}") + + @given( + message=error_messages(), + category=error_categories(), + ) + @settings(max_examples=50) + def test_equal_errors_have_equal_hashes( + self, message: str, category: ErrorCategory + ) -> None: + """Property: Equal errors have equal hashes (hash contract).""" + error1 = FrozenFluentError(message, category) + error2 = FrozenFluentError(message, category) + + assert error1 == error2 + assert hash(error1) == hash(error2) + event(f"category={category.name}") + + @given( + errors=st.lists(frozen_fluent_errors(), min_size=1, max_size=20, unique=True) + ) + @settings(max_examples=50) + def test_errors_can_be_added_to_set( + self, errors: 
list[FrozenFluentError] + ) -> None: + """Property: Errors can be stored in sets.""" + error_set = set(errors) + assert len(error_set) <= len(errors) + event(f"set_size={len(error_set)}") + + @given( + errors=st.lists(frozen_fluent_errors(), min_size=1, max_size=20, unique=True) + ) + @settings(max_examples=50) + def test_errors_can_be_dict_keys( + self, errors: list[FrozenFluentError] + ) -> None: + """Property: Errors can be used as dict keys.""" + error_dict = {e: i for i, e in enumerate(errors)} + assert len(error_dict) <= len(errors) + event(f"dict_size={len(error_dict)}") + +@pytest.mark.fuzz +class TestEquality: + """FrozenFluentError equality must be based on content.""" + + @given(error=frozen_fluent_errors()) + @settings(max_examples=50) + def test_error_equals_itself(self, error: FrozenFluentError) -> None: + """Property: Errors are equal to themselves (reflexivity).""" + same_ref = error + assert error == same_ref + event(f"category={error.category.name}") + + @given( + message=error_messages(), + category=error_categories(), + ) + @settings(max_examples=50) + def test_identical_errors_are_equal( + self, message: str, category: ErrorCategory + ) -> None: + """Property: Identical errors are equal (symmetry).""" + error1 = FrozenFluentError(message, category) + error2 = FrozenFluentError(message, category) + + assert error1 == error2 + assert error2 == error1 + event(f"category={category.name}") + + @given(error=frozen_fluent_errors()) + @settings(max_examples=50) + def test_error_not_equal_to_string(self, error: FrozenFluentError) -> None: + """Property: Errors are not equal to strings.""" + assert (error == error.message) is False + event(f"category={error.category.name}") + + @given(error=frozen_fluent_errors()) + @settings(max_examples=50) + def test_error_not_equal_to_none(self, error: FrozenFluentError) -> None: + """Property: Errors are not equal to None (tests __eq__ method).""" + # pylint: disable=singleton-comparison + assert (error == None) is 
False # noqa: E711 - explicit None comparison intentional + event(f"category={error.category.name}") + +@pytest.mark.fuzz +class TestPropertyAccess: + """FrozenFluentError properties must be accessible.""" + + @given( + message=error_messages(), + category=error_categories(), + ) + @settings(max_examples=50) + def test_message_property(self, message: str, category: ErrorCategory) -> None: + """Property: message property returns the message.""" + error = FrozenFluentError(message, category) + assert error.message == message + event(f"msg_len={len(message)}") + + @given( + message=error_messages(), + category=error_categories(), + ) + @settings(max_examples=50) + def test_category_property(self, message: str, category: ErrorCategory) -> None: + """Property: category property returns the category.""" + error = FrozenFluentError(message, category) + assert error.category == category + event(f"category={category.name}") + + @given( + message=error_messages(), + category=error_categories(), + diagnostic=optional_diagnostics(), + ) + @settings(max_examples=50) + def test_diagnostic_property( + self, + message: str, + category: ErrorCategory, + diagnostic: Diagnostic | None, + ) -> None: + """Property: diagnostic property returns the diagnostic.""" + error = FrozenFluentError(message, category, diagnostic=diagnostic) + assert error.diagnostic == diagnostic + has_diag = diagnostic is not None + event(f"has_diagnostic={has_diag}") + + @given( + message=error_messages(), + category=error_categories(), + context=optional_contexts(), + ) + @settings(max_examples=50) + def test_context_property( + self, + message: str, + category: ErrorCategory, + context: FrozenErrorContext | None, + ) -> None: + """Property: context property returns the context.""" + error = FrozenFluentError(message, category, context=context) + assert error.context == context + has_ctx = context is not None + event(f"has_context={has_ctx}") + +@pytest.mark.fuzz +class TestContextConvenienceProperties: + 
"""FrozenFluentError convenience properties for context fields.""" + + @given( + message=error_messages(), + category=error_categories(), + ) + @settings(max_examples=50) + def test_context_properties_empty_without_context( + self, message: str, category: ErrorCategory + ) -> None: + """Property: Context convenience properties return empty strings without context.""" + error = FrozenFluentError(message, category) + + assert error.fallback_value == "" + assert error.input_value == "" + assert error.locale_code == "" + assert error.parse_type == "" + event(f"category={category.name}") + + @given( + message=error_messages(), + category=error_categories(), + ) + @settings(max_examples=50) + def test_context_properties_with_context( + self, message: str, category: ErrorCategory + ) -> None: + """Property: Context convenience properties return context values.""" + context = FrozenErrorContext( + input_value="test_input", + locale_code="en_US", + parse_type="number", + fallback_value="{!NUMBER}", + ) + error = FrozenFluentError(message, category, context=context) + + assert error.fallback_value == "{!NUMBER}" + assert error.input_value == "test_input" + assert error.locale_code == "en_US" + assert error.parse_type == "number" + event(f"category={category.name}") + +@pytest.mark.fuzz +class TestStringRepresentation: + """FrozenFluentError must have sensible string representation.""" + + @given(error=frozen_fluent_errors()) + @settings(max_examples=50) + def test_str_returns_message(self, error: FrozenFluentError) -> None: + """Property: str() returns the error message.""" + assert str(error) == error.message + event(f"msg_len={len(error.message)}") + + @given(error=frozen_fluent_errors()) + @settings(max_examples=50) + def test_repr_is_valid(self, error: FrozenFluentError) -> None: + """Property: repr() returns a valid representation.""" + r = repr(error) + assert isinstance(r, str) + assert "FrozenFluentError" in r + assert "message=" in r + assert "category=" in r + 
event(f"category={error.category.name}") + +class TestEdgeCases: + """Edge case tests for FrozenFluentError.""" + + def test_empty_message(self) -> None: + """FrozenFluentError accepts empty message.""" + error = FrozenFluentError("", ErrorCategory.REFERENCE) + assert error.message == "" + assert error.verify_integrity() is True + + def test_unicode_message(self) -> None: + """FrozenFluentError handles Unicode messages.""" + error = FrozenFluentError("Error: \u4e2d\u6587\u6587\u672c", ErrorCategory.PARSE) + assert error.verify_integrity() is True + + def test_emoji_message(self) -> None: + """FrozenFluentError handles emoji in messages.""" + error = FrozenFluentError("Error \U0001F44D occurred", ErrorCategory.FORMATTING) + assert error.verify_integrity() is True + + @example(message="Test") + @given(message=st.text()) + @settings(max_examples=100) + def test_arbitrary_text_messages(self, message: str) -> None: + """Property: FrozenFluentError handles arbitrary text.""" + error = FrozenFluentError(message, ErrorCategory.RESOLUTION) + assert error.verify_integrity() is True + assert error.message == message + event(f"msg_len={len(message)}") + + def test_all_categories_work(self) -> None: + """All ErrorCategory values can be used.""" + for category in ErrorCategory: + error = FrozenFluentError("test", category) + assert error.category == category + assert error.verify_integrity() is True + +@pytest.mark.fuzz +class TestExceptionBehavior: + """FrozenFluentError must behave like a proper exception.""" + + @given(error=frozen_fluent_errors()) + @settings(max_examples=50) + def test_can_be_raised(self, error: FrozenFluentError) -> None: + """Property: FrozenFluentError can be raised and caught.""" + with pytest.raises(FrozenFluentError) as exc_info: + raise error + assert exc_info.value is error + event(f"category={error.category.name}") + + @given(error=frozen_fluent_errors()) + @settings(max_examples=50) + def test_can_be_caught_as_exception(self, error: 
FrozenFluentError) -> None: + """Property: FrozenFluentError can be caught as Exception.""" + caught: Exception | None = None + try: + raise error + except Exception as exc_info: + caught = exc_info + assert caught is error + event(f"category={error.category.name}") + + @given(error=frozen_fluent_errors()) + @settings(max_examples=50) + def test_exception_args(self, error: FrozenFluentError) -> None: + """Property: Exception args contain the message.""" + assert error.args == (error.message,) + event(f"category={error.category.name}") diff --git a/tests/diagnostics_frozen_error_cases/formatting_and_hashes.py b/tests/diagnostics_frozen_error_cases/formatting_and_hashes.py new file mode 100644 index 00000000..fbb2c3a4 --- /dev/null +++ b/tests/diagnostics_frozen_error_cases/formatting_and_hashes.py @@ -0,0 +1,605 @@ +# mypy: ignore-errors +from __future__ import annotations + +from typing import Literal + +import pytest +from hypothesis import assume, event, given, settings +from hypothesis import strategies as st + +from ftllexengine.diagnostics import ( + Diagnostic, + DiagnosticCode, + ErrorCategory, + FrozenErrorContext, + FrozenFluentError, + SourceSpan, +) +from tests.strategies.diagnostics import error_categories + +# ============================================================================= +# Strategies for generating test data +# ============================================================================= + + +@st.composite +def error_messages(draw: st.DrawFn) -> str: + """Generate valid error messages.""" + return draw(st.text(min_size=1, max_size=200)) + + +@st.composite +def optional_diagnostics(draw: st.DrawFn) -> Diagnostic | None: + """Generate optional Diagnostic objects.""" + if draw(st.booleans()): + code = draw(st.sampled_from(list(DiagnosticCode))) + message = draw(st.text(min_size=1, max_size=100)) + return Diagnostic(code=code, message=message, severity="error") + return None + + +@st.composite +def optional_contexts(draw: st.DrawFn) -> 
FrozenErrorContext | None: + """Generate optional FrozenErrorContext objects.""" + if draw(st.booleans()): + return FrozenErrorContext( + input_value=draw(st.text(min_size=0, max_size=50)), + locale_code=draw(st.text(min_size=1, max_size=10)), + parse_type=draw(st.sampled_from( + ["", "currency", "date", "datetime", "decimal", "number"] + )), + fallback_value=draw(st.text(min_size=0, max_size=50)), + ) + return None + + +@st.composite +def frozen_fluent_errors(draw: st.DrawFn) -> FrozenFluentError: + """Generate FrozenFluentError instances.""" + return FrozenFluentError( + message=draw(error_messages()), + category=draw(error_categories()), + diagnostic=draw(optional_diagnostics()), + context=draw(optional_contexts()), + ) + + +# ============================================================================= +# Content Hash Properties +# ============================================================================= + + + +@st.composite +def rich_diagnostics(draw: st.DrawFn) -> Diagnostic: + """Generate Diagnostic objects with arbitrary optional field population. + + Produces diagnostics with varied combinations of span, hint, help_url, + function_name, argument_name, type info, ftl_location, severity, and + resolution_path. Provides broad input diversity for content hash and + format_error() fuzzing. 
+ """ + code = draw(st.sampled_from(list(DiagnosticCode))) + message = draw(st.text(min_size=1, max_size=100)) + + has_span = draw(st.booleans()) + span = None + if has_span: + start = draw(st.integers(min_value=0, max_value=10000)) + end = draw( + st.integers(min_value=start, max_value=start + 1000) + ) + line = draw(st.integers(min_value=1, max_value=5000)) + column = draw(st.integers(min_value=1, max_value=200)) + span = SourceSpan( + start=start, end=end, line=line, column=column + ) + + opt_str = st.one_of(st.none(), st.text(min_size=1, max_size=80)) + hint = draw(opt_str) + help_url = draw(opt_str) + function_name = draw(opt_str) + argument_name = draw(opt_str) + expected_type = draw(opt_str) + received_type = draw(opt_str) + ftl_location = draw(opt_str) + severity: Literal["error", "warning"] = draw( + st.sampled_from(["error", "warning"]) + ) + + has_path = draw(st.booleans()) + resolution_path = None + if has_path: + path_elems = draw( + st.lists( + st.text(min_size=1, max_size=20), + min_size=0, + max_size=5, + ) + ) + resolution_path = tuple(path_elems) + + return Diagnostic( + code=code, + message=message, + span=span, + hint=hint, + help_url=help_url, + function_name=function_name, + argument_name=argument_name, + expected_type=expected_type, + received_type=received_type, + ftl_location=ftl_location, + severity=severity, + resolution_path=resolution_path, + ) + +@pytest.mark.fuzz +class TestRichDiagnosticHashProperties: + """Content hash integrity with fully-populated diagnostic fields. + + Exercises hash computation paths for all diagnostic optional fields + (span, hint, help_url, function_name, resolution_path, etc.) + that are unreachable with the basic optional_diagnostics strategy. 
+ """ + + @given( + message=error_messages(), + category=error_categories(), + diagnostic=rich_diagnostics(), + context=optional_contexts(), + ) + @settings(max_examples=200) + def test_hash_determinism_rich_diagnostics( + self, + message: str, + category: ErrorCategory, + diagnostic: Diagnostic, + context: FrozenErrorContext | None, + ) -> None: + """Property: Hash is deterministic with fully-populated diagnostics.""" + error1 = FrozenFluentError( + message, category, diagnostic, context + ) + error2 = FrozenFluentError( + message, category, diagnostic, context + ) + + has_span = diagnostic.span is not None + has_path = diagnostic.resolution_path is not None + n_opt = sum( + 1 + for f in ( + diagnostic.hint, + diagnostic.help_url, + diagnostic.function_name, + diagnostic.argument_name, + diagnostic.expected_type, + diagnostic.received_type, + diagnostic.ftl_location, + ) + if f is not None + ) + event(f"has_span={has_span}") + event(f"has_resolution_path={has_path}") + event(f"optional_field_count={n_opt}") + event(f"severity={diagnostic.severity}") + + assert error1.content_hash == error2.content_hash + assert error1 == error2 + assert error1.verify_integrity() + event("outcome=rich_hash_determinism") + + @given( + message=error_messages(), + category=error_categories(), + diagnostic=rich_diagnostics(), + ) + @settings(max_examples=200) + def test_rich_diagnostic_integrity( + self, + message: str, + category: ErrorCategory, + diagnostic: Diagnostic, + ) -> None: + """Property: verify_integrity() passes for all diagnostic variants.""" + error = FrozenFluentError(message, category, diagnostic) + + has_span = diagnostic.span is not None + has_path = diagnostic.resolution_path is not None + event(f"has_span={has_span}") + event(f"has_resolution_path={has_path}") + event(f"severity={diagnostic.severity}") + event(f"code={diagnostic.code.name}") + + assert error.verify_integrity() + assert len(error.content_hash) == 16 + event("outcome=rich_integrity_verified") + + 
@given( + message=error_messages(), + category=error_categories(), + diagnostic=rich_diagnostics(), + ) + @settings(max_examples=100) + def test_rich_diagnostic_repr_contains_fields( + self, + message: str, + category: ErrorCategory, + diagnostic: Diagnostic, + ) -> None: + """Property: repr includes all constructor args for rich diagnostics.""" + error = FrozenFluentError(message, category, diagnostic) + r = repr(error) + + assert "FrozenFluentError" in r + assert "message=" in r + assert "category=" in r + assert "diagnostic=" in r + event(f"code={diagnostic.code.name}") + event("outcome=rich_repr_valid") + +_TEST_CONTROL_TRANSLATE = str.maketrans( + {c: f"\\x{c:02x}" for c in range(0x20)} + | {0x7F: "\\x7f", 0x1B: "\\x1b", 0x0D: "\\r", 0x0A: "\\n", 0x09: "\\t"} +) + +def _escape_control_chars(text: str) -> str: + """Mirror DiagnosticFormatter._escape_control_chars for test assertions.""" + return text.translate(_TEST_CONTROL_TRANSLATE) + +@pytest.mark.fuzz +class TestDiagnosticFormatProperties: + """Property tests for Diagnostic.format_error() output correctness. + + Tests the Rust-inspired diagnostic formatting in codes.py, ensuring + all field combinations produce well-structured output. 
+ """ + + @given(diagnostic=rich_diagnostics()) + @settings(max_examples=200) + def test_format_error_nonempty( + self, diagnostic: Diagnostic + ) -> None: + """Property: format_error() always returns non-empty string.""" + formatted = diagnostic.format_error() + assert isinstance(formatted, str) + assert len(formatted) > 0 + event(f"has_span={diagnostic.span is not None}") + event(f"severity={diagnostic.severity}") + event("outcome=format_nonempty") + + @given(diagnostic=rich_diagnostics()) + @settings(max_examples=200) + def test_format_error_contains_message( + self, diagnostic: Diagnostic + ) -> None: + """Property: format_error() always contains the escaped diagnostic message.""" + formatted = diagnostic.format_error() + assert _escape_control_chars(diagnostic.message) in formatted + event(f"code={diagnostic.code.name}") + event("outcome=format_contains_message") + + @given(diagnostic=rich_diagnostics()) + @settings(max_examples=200) + def test_format_error_contains_code_name( + self, diagnostic: Diagnostic + ) -> None: + """Property: format_error() always contains the diagnostic code name.""" + formatted = diagnostic.format_error() + assert diagnostic.code.name in formatted + event(f"code={diagnostic.code.name}") + + @given(diagnostic=rich_diagnostics()) + @settings(max_examples=200) + def test_format_error_severity_prefix( + self, diagnostic: Diagnostic + ) -> None: + """Property: format_error() starts with correct severity prefix.""" + formatted = diagnostic.format_error() + if diagnostic.severity == "warning": + assert formatted.startswith("warning[") + event("severity=warning") + else: + assert formatted.startswith("error[") + event("severity=error") + + @given(diagnostic=rich_diagnostics()) + @settings(max_examples=200) + def test_format_error_location_dispatch( + self, diagnostic: Diagnostic + ) -> None: + """Property: format_error() dispatches location correctly. + + Span takes precedence over ftl_location. 
When neither is + present, no location line appears. + """ + formatted = diagnostic.format_error() + if diagnostic.span is not None: + line_str = f"line {diagnostic.span.line}" + col_str = f"column {diagnostic.span.column}" + assert line_str in formatted + assert col_str in formatted + event("location=span") + elif diagnostic.ftl_location is not None: + assert _escape_control_chars(diagnostic.ftl_location) in formatted + event("location=ftl_location") + else: + assert "-->" not in formatted + event("location=none") + + @given(diagnostic=rich_diagnostics()) + @settings(max_examples=200) + def test_format_error_optional_field_inclusion( + self, diagnostic: Diagnostic + ) -> None: + """Property: format_error() includes all present optional fields.""" + formatted = diagnostic.format_error() + + if diagnostic.function_name: + escaped_fn = _escape_control_chars(diagnostic.function_name) + fn_line = f"function: {escaped_fn}" + assert fn_line in formatted + event("has_function=True") + else: + event("has_function=False") + + if diagnostic.argument_name: + escaped_arg = _escape_control_chars(diagnostic.argument_name) + arg_line = f"argument: {escaped_arg}" + assert arg_line in formatted + + if diagnostic.expected_type: + escaped_exp = _escape_control_chars(diagnostic.expected_type) + exp_line = f"expected: {escaped_exp}" + assert exp_line in formatted + + if diagnostic.received_type: + escaped_rcv = _escape_control_chars(diagnostic.received_type) + rcv_line = f"received: {escaped_rcv}" + assert rcv_line in formatted + + if diagnostic.resolution_path: + path_str = " -> ".join(diagnostic.resolution_path) + escaped_path = _escape_control_chars(path_str) + assert escaped_path in formatted + path_len = len(diagnostic.resolution_path) + event(f"path_len={path_len}") + + if diagnostic.hint: + escaped_hint = _escape_control_chars(diagnostic.hint) + hint_line = f"help: {escaped_hint}" + assert hint_line in formatted + event("has_hint=True") + else: + event("has_hint=False") + + if 
diagnostic.help_url: + escaped_url = _escape_control_chars(diagnostic.help_url) + url_line = f"note: see {escaped_url}" + assert url_line in formatted + + @given(diagnostic=rich_diagnostics()) + @settings(max_examples=100) + def test_format_error_idempotent( + self, diagnostic: Diagnostic + ) -> None: + """Property: format_error() is idempotent.""" + result1 = diagnostic.format_error() + result2 = diagnostic.format_error() + assert result1 == result2 + event(f"code={diagnostic.code.name}") + event("outcome=format_idempotent") + +def _make_diag_with_field( + field: str, val: str +) -> Diagnostic: + """Build a Diagnostic with exactly one optional string field set.""" + return Diagnostic( + code=DiagnosticCode.FUNCTION_FAILED, + message="base", + hint=val if field == "hint" else None, + help_url=val if field == "help_url" else None, + function_name=( + val if field == "function_name" else None + ), + argument_name=( + val if field == "argument_name" else None + ), + expected_type=( + val if field == "expected_type" else None + ), + received_type=( + val if field == "received_type" else None + ), + ftl_location=( + val if field == "ftl_location" else None + ), + ) + +_OPTIONAL_DIAG_FIELDS = [ + "hint", + "help_url", + "function_name", + "argument_name", + "expected_type", + "received_type", + "ftl_location", +] + +@pytest.mark.fuzz +class TestHashCollisionResistanceProperties: + """Advanced collision resistance for _compute_content_hash. + + Tests three structural integrity mechanisms: + 1. Length-prefix prevents field boundary ambiguity + 2. Each optional diagnostic field independently affects hash + 3. Section markers prevent diagnostic/context presence collisions + """ + + @given( + a=st.text(min_size=2, max_size=50), + b=st.text(min_size=1, max_size=50), + ) + @settings(max_examples=200) + def test_length_prefix_prevents_boundary_collision( + self, a: str, b: str + ) -> None: + """Property: Shifting one char across field boundary changes hash. 
+ + _hash_string uses 4-byte length prefix so ("ab","cd") and + ("a","bcd") produce different digests even though the raw + bytes concatenate identically without the prefix. + + Events emitted: + - a_len={n}: Length of first field + - b_len={n}: Length of second field + - outcome=length_prefix_collision_prevented + """ + # Shift last char of 'a' into 'b' + a_shifted = a[:-1] + b_shifted = a[-1] + b + + ctx1 = FrozenErrorContext( + input_value=a, + locale_code=b, + parse_type="date", + fallback_value="f", + ) + ctx2 = FrozenErrorContext( + input_value=a_shifted, + locale_code=b_shifted, + parse_type="date", + fallback_value="f", + ) + + error1 = FrozenFluentError( + "msg", ErrorCategory.REFERENCE, context=ctx1 + ) + error2 = FrozenFluentError( + "msg", ErrorCategory.REFERENCE, context=ctx2 + ) + + event(f"a_len={len(a)}") + event(f"b_len={len(b)}") + assert error1.content_hash != error2.content_hash + event("outcome=length_prefix_collision_prevented") + + @given( + field=st.sampled_from(_OPTIONAL_DIAG_FIELDS), + val1=st.text(min_size=1, max_size=50), + val2=st.text(min_size=1, max_size=50), + ) + @settings(max_examples=200) + def test_each_optional_field_affects_hash( + self, field: str, val1: str, val2: str + ) -> None: + """Property: Changing any single optional field changes the hash. + + Each of the 7 optional string fields in Diagnostic (hint, + help_url, function_name, argument_name, expected_type, + received_type, ftl_location) must independently affect the + content hash. 
+ + Events emitted: + - field={name}: Which field was varied + - outcome=field_sensitivity_verified + """ + assume(val1 != val2) + event(f"field={field}") + + diag1 = _make_diag_with_field(field, val1) + diag2 = _make_diag_with_field(field, val2) + + e1 = FrozenFluentError( + "m", ErrorCategory.RESOLUTION, diagnostic=diag1 + ) + e2 = FrozenFluentError( + "m", ErrorCategory.RESOLUTION, diagnostic=diag2 + ) + + assert e1.content_hash != e2.content_hash + event("outcome=field_sensitivity_verified") + + @given( + field=st.sampled_from(_OPTIONAL_DIAG_FIELDS), + val=st.text(min_size=1, max_size=50), + ) + @settings(max_examples=100) + def test_none_vs_present_field_affects_hash( + self, field: str, val: str + ) -> None: + """Property: None vs present for any optional field changes hash. + + The hash uses b"\\x00NONE" sentinel for absent fields. A + present field must always produce a different hash than the + sentinel. + + Events emitted: + - field={name}: Which field was toggled + - outcome=none_vs_present_verified + """ + event(f"field={field}") + + diag_with = _make_diag_with_field(field, val) + diag_without = Diagnostic( + code=DiagnosticCode.FUNCTION_FAILED, + message="base", + ) + + e1 = FrozenFluentError( + "m", ErrorCategory.RESOLUTION, diagnostic=diag_with + ) + e2 = FrozenFluentError( + "m", ErrorCategory.RESOLUTION, diagnostic=diag_without + ) + + assert e1.content_hash != e2.content_hash + event("outcome=none_vs_present_verified") + + @given( + message=error_messages(), + category=error_categories(), + ) + @settings(max_examples=100) + def test_section_markers_prevent_presence_collision( + self, message: str, category: ErrorCategory + ) -> None: + """Property: All 4 diagnostic/context presence permutations differ. + + Section markers (\\x01DIAG/\\x00NODIAG, \\x01CTX/\\x00NOCTX) + ensure that errors with different combinations of diagnostic + and context presence always produce different hashes. 
+ + Events emitted: + - category={name}: Error category + - outcome=section_markers_verified + """ + event(f"category={category.name}") + + diag = Diagnostic( + code=DiagnosticCode.MESSAGE_NOT_FOUND, + message="diag msg", + ) + ctx = FrozenErrorContext(input_value="ctx val") + + # All 4 presence permutations + e_nn = FrozenFluentError(message, category) + e_dn = FrozenFluentError( + message, category, diagnostic=diag + ) + e_nc = FrozenFluentError( + message, category, context=ctx + ) + e_dc = FrozenFluentError( + message, category, diagnostic=diag, context=ctx + ) + + hashes = { + e_nn.content_hash, + e_dn.content_hash, + e_nc.content_hash, + e_dc.content_hash, + } + assert len(hashes) == 4 + event("outcome=section_markers_verified") diff --git a/tests/fuzz/test_localization_property.py b/tests/fuzz/test_localization_property.py index fd424ed0..d4ba2280 100644 --- a/tests/fuzz/test_localization_property.py +++ b/tests/fuzz/test_localization_property.py @@ -1,1188 +1,14 @@ -"""Property-based tests for FluentLocalization orchestration layer. - -Covers multi-locale orchestration, data type invariants, fallback semantics, -and API surface completeness using Hypothesis strategies from -tests/strategies/localization. - -Fuzz module: all @given tests emit hypothesis.event() for HypoFuzz guidance. - -Python 3.13+. 
-""" - -from __future__ import annotations - -from decimal import Decimal -from pathlib import Path - -import pytest -from hypothesis import HealthCheck, event, given, settings -from hypothesis import strategies as st - -from ftllexengine.core.locale_utils import normalize_locale -from ftllexengine.localization import ( - FluentLocalization, - LoadStatus, - LoadSummary, - PathResourceLoader, - ResourceLoadResult, -) -from ftllexengine.runtime.cache_config import CacheConfig -from ftllexengine.syntax.ast import Junk, Span -from tests.strategies.ftl import ftl_simple_messages -from tests.strategies.localization import ( - DictResourceLoader, - FailingResourceLoader, - ftl_messages_with_attributes, - ftl_messages_with_terms, - ftl_resource_sets, - locale_chains, - message_ids, - resource_loaders, -) - -pytestmark = pytest.mark.fuzz - - -# --------------------------------------------------------------------------- -# ResourceLoadResult property invariants -# --------------------------------------------------------------------------- - - -class TestResourceLoadResultProperties: - """Property invariants for ResourceLoadResult data class.""" - - @given( - status=st.sampled_from(list(LoadStatus)), - locale=st.sampled_from(["en", "de", "fr", "lv"]), - resource_id=st.sampled_from(["main.ftl", "ui.ftl"]), - ) - def test_status_properties_are_mutually_exclusive( - self, status: LoadStatus, locale: str, resource_id: str, - ) -> None: - """Exactly one status property is True for any LoadStatus.""" - event(f"status={status.value}") - result = ResourceLoadResult( - locale=locale, resource_id=resource_id, status=status, - ) - flags = [result.is_success, result.is_not_found, result.is_error] - assert sum(flags) == 1 - - @given( - junk_count=st.integers(min_value=0, max_value=5), - ) - def test_has_junk_iff_junk_entries_nonempty( - self, junk_count: int, - ) -> None: - """has_junk is True iff junk_entries is non-empty.""" - event(f"junk_count={junk_count}") - junk_entries = tuple( - 
Junk( - content=f"invalid{i}", - span=Span(start=i * 10, end=i * 10 + 7), - ) - for i in range(junk_count) - ) - result = ResourceLoadResult( - locale="en", resource_id="test.ftl", - status=LoadStatus.SUCCESS, junk_entries=junk_entries, - ) - assert result.has_junk == (junk_count > 0) - - -# --------------------------------------------------------------------------- -# LoadSummary aggregation invariants -# --------------------------------------------------------------------------- - - -class TestLoadSummaryAggregation: - """Property invariants for LoadSummary post_init aggregation.""" - - @given( - success_n=st.integers(min_value=0, max_value=5), - not_found_n=st.integers(min_value=0, max_value=5), - error_n=st.integers(min_value=0, max_value=5), - ) - def test_status_counts_sum_to_total( - self, success_n: int, not_found_n: int, error_n: int, - ) -> None: - """successful + not_found + errors == total_attempted.""" - total = success_n + not_found_n + error_n - event(f"total={total}") - results: list[ResourceLoadResult] = [] - for i in range(success_n): - results.append(ResourceLoadResult( - f"en{i}", f"s{i}.ftl", LoadStatus.SUCCESS, - )) - for i in range(not_found_n): - results.append(ResourceLoadResult( - f"nf{i}", f"n{i}.ftl", LoadStatus.NOT_FOUND, - )) - for i in range(error_n): - results.append(ResourceLoadResult( - f"er{i}", f"e{i}.ftl", LoadStatus.ERROR, - error=OSError(f"fail{i}"), - )) - - summary = LoadSummary(results=tuple(results)) - assert summary.total_attempted == total - assert summary.successful == success_n - assert summary.not_found == not_found_n - assert summary.errors == error_n - assert summary.successful + summary.not_found + summary.errors == total - - @given( - junk_per_result=st.lists( - st.integers(min_value=0, max_value=3), - min_size=1, max_size=5, - ), - ) - def test_junk_count_is_total_across_results( - self, junk_per_result: list[int], - ) -> None: - """junk_count sums junk_entries lengths across all results.""" - expected_total = 
sum(junk_per_result) - event(f"total_junk={expected_total}") - results: list[ResourceLoadResult] = [] - for idx, jc in enumerate(junk_per_result): - junk = tuple( - Junk( - content=f"j{idx}_{j}", - span=Span(start=0, end=1), - ) - for j in range(jc) - ) - results.append(ResourceLoadResult( - "en", f"f{idx}.ftl", LoadStatus.SUCCESS, - junk_entries=junk, - )) - - summary = LoadSummary(results=tuple(results)) - assert summary.junk_count == expected_total - assert summary.has_junk == (expected_total > 0) - - @given( - success_n=st.integers(min_value=0, max_value=3), - not_found_n=st.integers(min_value=0, max_value=3), - error_n=st.integers(min_value=0, max_value=3), - ) - def test_filter_methods_partition_results( - self, success_n: int, not_found_n: int, error_n: int, - ) -> None: - """get_errors + get_not_found + get_successful == all results.""" - event(f"error_n={error_n}") - results: list[ResourceLoadResult] = [] - for i in range(success_n): - results.append(ResourceLoadResult( - "en", f"s{i}.ftl", LoadStatus.SUCCESS, - )) - for i in range(not_found_n): - results.append(ResourceLoadResult( - "de", f"n{i}.ftl", LoadStatus.NOT_FOUND, - )) - for i in range(error_n): - results.append(ResourceLoadResult( - "fr", f"e{i}.ftl", LoadStatus.ERROR, - error=OSError("fail"), - )) - - summary = LoadSummary(results=tuple(results)) - assert len(summary.get_successful()) == success_n - assert len(summary.get_not_found()) == not_found_n - assert len(summary.get_errors()) == error_n - - @given( - locale=st.sampled_from(["en", "de", "fr"]), - n=st.integers(min_value=0, max_value=4), - ) - def test_get_by_locale_filters_correctly( - self, locale: str, n: int, - ) -> None: - """get_by_locale returns only matching-locale results.""" - event(f"filter_count={n}") - results: list[ResourceLoadResult] = [] - for i in range(n): - results.append(ResourceLoadResult( - locale, f"f{i}.ftl", LoadStatus.SUCCESS, - )) - # Add results for other locales - results.append(ResourceLoadResult( - "xx", 
"other.ftl", LoadStatus.SUCCESS, - )) - - summary = LoadSummary(results=tuple(results)) - filtered = summary.get_by_locale(locale) - assert len(filtered) == n - assert all(r.locale == locale for r in filtered) - - @given( - junk_counts=st.lists( - st.integers(min_value=0, max_value=3), - min_size=1, max_size=4, - ), - ) - def test_get_all_junk_flattens_correctly( - self, junk_counts: list[int], - ) -> None: - """get_all_junk returns flattened tuple of all Junk entries.""" - expected_total = sum(junk_counts) - event(f"flatten_total={expected_total}") - results: list[ResourceLoadResult] = [] - all_junk: list[Junk] = [] - for idx, jc in enumerate(junk_counts): - junk_entries = tuple( - Junk( - content=f"j{idx}_{j}", - span=Span(start=0, end=1), - ) - for j in range(jc) - ) - all_junk.extend(junk_entries) - results.append(ResourceLoadResult( - "en", f"f{idx}.ftl", LoadStatus.SUCCESS, - junk_entries=junk_entries, - )) - - summary = LoadSummary(results=tuple(results)) - flattened = summary.get_all_junk() - assert len(flattened) == expected_total - for j in all_junk: - assert j in flattened - - @given( - has_errors=st.booleans(), - has_not_found=st.booleans(), - has_junk=st.booleans(), - ) - def test_all_successful_and_all_clean_semantics( - self, has_errors: bool, has_not_found: bool, has_junk: bool, - ) -> None: - """all_successful ignores junk; all_clean requires zero junk.""" - event(f"errors={has_errors}") - event(f"not_found={has_not_found}") - results: list[ResourceLoadResult] = [] - # Always add at least one success - junk = ( - (Junk(content="j", span=Span(start=0, end=1)),) - if has_junk else () - ) - results.append(ResourceLoadResult( - "en", "main.ftl", LoadStatus.SUCCESS, junk_entries=junk, - )) - if has_errors: - results.append(ResourceLoadResult( - "de", "err.ftl", LoadStatus.ERROR, error=OSError("f"), - )) - if has_not_found: - results.append(ResourceLoadResult( - "fr", "nf.ftl", LoadStatus.NOT_FOUND, - )) - - summary = LoadSummary(results=tuple(results)) 
- - expected_all_successful = not has_errors and not has_not_found - assert summary.all_successful == expected_all_successful - - expected_all_clean = ( - not has_errors and not has_not_found and not has_junk - ) - assert summary.all_clean == expected_all_clean - - @given( - has_errors=st.booleans(), - ) - def test_has_errors_property(self, has_errors: bool) -> None: - """has_errors is True iff errors > 0.""" - event(f"has_errors={has_errors}") - results: list[ResourceLoadResult] = [ - ResourceLoadResult("en", "ok.ftl", LoadStatus.SUCCESS), - ] - if has_errors: - results.append(ResourceLoadResult( - "de", "err.ftl", LoadStatus.ERROR, error=OSError("f"), - )) - summary = LoadSummary(results=tuple(results)) - assert summary.has_errors == has_errors - - -# --------------------------------------------------------------------------- -# PathResourceLoader invariants -# --------------------------------------------------------------------------- - - -class TestPathResourceLoaderInvariants: - """Property invariants for PathResourceLoader.""" - - @given( - prefix=st.text( - alphabet=st.characters( - whitelist_categories=("Ll", "Lu"), - ), - min_size=0, max_size=8, - ), - ) - def test_init_resolves_root_from_static_prefix( - self, prefix: str, - ) -> None: - """Root directory is derived from static prefix before {locale}.""" - base_path = ( - f"{prefix}/{{locale}}/resources" - if prefix - else "{locale}/resources" - ) - event(f"prefix_len={len(prefix)}") - loader = PathResourceLoader(base_path=base_path) - assert loader._resolved_root is not None - assert loader._resolved_root.is_absolute() - if not prefix: - assert loader._resolved_root == Path.cwd().resolve() - - @given(st.just("static/path")) - def test_missing_locale_placeholder_raises(self, path: str) -> None: - """base_path without {locale} raises ValueError.""" - event("outcome=validation_error") - with pytest.raises(ValueError, match="must contain"): - PathResourceLoader(base_path=path) - - @given( - 
root_dir=st.just("/tmp/test_root"), - ) - def test_explicit_root_dir_overrides_derivation( - self, root_dir: str, - ) -> None: - """Explicit root_dir takes precedence over base_path derivation.""" - event("outcome=root_override") - loader = PathResourceLoader( - base_path="any/{locale}/path", root_dir=root_dir, - ) - assert loader._resolved_root == Path(root_dir).resolve() - - @given( - locale=st.from_regex(r"[A-Za-z][A-Za-z0-9]*(?:[_-][A-Za-z0-9]+)*", fullmatch=True), - ) - def test_valid_locales_pass_validation(self, locale: str) -> None: - """Locale codes without path separators or .. pass validation.""" - event(f"locale_len={len(locale)}") - # Should not raise - PathResourceLoader._validate_locale(locale) - - @given( - locale=st.sampled_from([ - "../etc", "en/US", "en\\US", "..", "a/../b", - ]), - ) - def test_unsafe_locales_rejected(self, locale: str) -> None: - """Locales with path traversal or separators are rejected.""" - event("outcome=locale_rejected") - with pytest.raises(ValueError, match=r"Invalid locale:"): - PathResourceLoader._validate_locale(locale) - - @given(st.just("")) - def test_empty_locale_rejected(self, locale: str) -> None: - """Empty locale string is rejected.""" - event("outcome=empty_locale") - with pytest.raises(ValueError, match="locale cannot be blank"): - PathResourceLoader._validate_locale(locale) - - @given( - resource_id=st.sampled_from([ - " main.ftl", "main.ftl ", "\tmain.ftl", - ]), - ) - def test_whitespace_resource_id_rejected( - self, resource_id: str, - ) -> None: - """Resource IDs with leading/trailing whitespace are rejected.""" - event("outcome=whitespace_rejected") - with pytest.raises(ValueError, match="whitespace"): - PathResourceLoader._validate_resource_id(resource_id) - - @given( - resource_id=st.sampled_from([ - "/etc/passwd", "\\windows\\sys", "../secret.ftl", - ]), - ) - def test_unsafe_resource_id_rejected( - self, resource_id: str, - ) -> None: - """Resource IDs with traversal or absolute paths are 
rejected.""" - event("outcome=resource_rejected") - with pytest.raises(ValueError, match="not allowed in resource_id"): - PathResourceLoader._validate_resource_id(resource_id) - - @given( - filename=st.text( - alphabet=st.characters( - whitelist_categories=("Ll", "Nd"), - blacklist_characters="./\\ \t\n", - ), - min_size=1, max_size=15, - ), - ) - def test_valid_resource_ids_accepted(self, filename: str) -> None: - """Clean resource IDs pass validation.""" - rid = f"{filename}.ftl" - event(f"rid_len={len(rid)}") - PathResourceLoader._validate_resource_id(rid) - - @settings(deadline=None, suppress_health_check=[HealthCheck.function_scoped_fixture]) - @given( - locale=st.sampled_from(["en", "de", "fr"]), - content=st.text( - min_size=1, max_size=100, - alphabet=st.characters( - blacklist_categories=("Cc", "Cs"), - ), - ), - ) - def test_load_roundtrip_preserves_content( - self, tmp_path: Path, locale: str, content: str, - ) -> None: - """PathResourceLoader.load returns exact file content.""" - event(f"locale={locale}") - locale_dir = tmp_path / "locales" / locale - locale_dir.mkdir(parents=True, exist_ok=True) - (locale_dir / "test.ftl").write_text(content, encoding="utf-8") - - loader = PathResourceLoader( - str(tmp_path / "locales" / "{locale}"), - ) - loaded = loader.load(locale, "test.ftl") - assert loaded == content - - -# --------------------------------------------------------------------------- -# FluentLocalization orchestration invariants -# --------------------------------------------------------------------------- - - -class TestFluentLocalizationOrchestration: - """Property invariants for FluentLocalization fallback behavior.""" - - @given(locales=locale_chains(min_size=1, max_size=5)) - def test_deduplication_preserves_order( - self, locales: list[str], - ) -> None: - """Locale deduplication preserves first-occurrence order.""" - event(f"locale_count={len(locales)}") - l10n = FluentLocalization(locales) - expected = 
tuple(dict.fromkeys(normalize_locale(locale) for locale in locales)) - assert l10n.locales == expected - - @given(locales=locale_chains(min_size=1, max_size=3)) - def test_locales_property_returns_same_instance( - self, locales: list[str], - ) -> None: - """locales property is referentially identical across calls.""" - event("outcome=identity_check") - l10n = FluentLocalization(locales) - assert l10n.locales is l10n.locales - - @given( - locales=locale_chains(min_size=2, max_size=4), - mid=message_ids(), - ) - def test_primary_locale_takes_precedence( - self, locales: list[str], mid: str, - ) -> None: - """First locale with message wins in fallback chain.""" - event(f"locale_count={len(locales)}") - l10n = FluentLocalization(locales, use_isolating=False) - for locale in locales: - l10n.add_resource(locale, f"{mid} = from-{locale}") - result, errors = l10n.format_value(mid) - assert not errors - assert result == f"from-{locales[0]}" - - @given( - locales=locale_chains(min_size=1, max_size=3), - mid=message_ids(), - ) - def test_has_message_consistent_with_format_value( - self, locales: list[str], mid: str, - ) -> None: - """has_message True iff format_value finds the message.""" - event(f"locale_count={len(locales)}") - l10n = FluentLocalization(locales) - l10n.add_resource(locales[0], f"{mid} = test") - has = l10n.has_message(mid) - _, errors = l10n.format_value(mid) - if has: - assert not any( - "not found in any locale" in str(e) for e in errors - ) - else: - assert any( - "not found in any locale" in str(e) for e in errors - ) - - @given( - locales=locale_chains(min_size=1, max_size=3), - mid=message_ids(), - ) - def test_format_value_deterministic( - self, locales: list[str], mid: str, - ) -> None: - """Repeated format_value calls return identical results.""" - event("outcome=determinism") - l10n = FluentLocalization(locales) - l10n.add_resource(locales[0], f"{mid} = stable") - r1, _ = l10n.format_value(mid) - r2, _ = l10n.format_value(mid) - assert r1 == r2 - 
- @given(mid=message_ids()) - def test_missing_message_returns_braced_id(self, mid: str) -> None: - """Missing message returns {message_id} per Fluent convention. - - strict=False: missing-message error returned in tuple, not raised. - """ - event("outcome=missing_message") - l10n = FluentLocalization(["en"], strict=False) - result, errors = l10n.format_value(mid) - assert result == f"{{{mid}}}" - assert len(errors) == 1 - - @given(mid=st.just("")) - def test_empty_message_id_returns_fallback(self, mid: str) -> None: - """Empty message ID returns {???} fallback. - - strict=False: invalid-ID error returned in tuple, not raised. - """ - event("outcome=empty_id") - l10n = FluentLocalization(["en"], strict=False) - result, errors = l10n.format_value(mid) - assert result == "{???}" - assert len(errors) == 1 - - @given(locales=locale_chains(min_size=1, max_size=3)) - def test_repr_contains_locales_and_bundles( - self, locales: list[str], - ) -> None: - """__repr__ always includes locales and bundle count.""" - event(f"locale_count={len(locales)}") - l10n = FluentLocalization(locales) - r = repr(l10n) - assert "FluentLocalization" in r - assert "locales=" in r - assert "bundles=" in r - - -# --------------------------------------------------------------------------- -# FluentLocalization API methods (coverage targets) -# --------------------------------------------------------------------------- - - -class TestFluentLocalizationHasAttribute: - """Tests for has_attribute method (lines 1126-1130).""" - - @given( - locales=locale_chains(min_size=1, max_size=3), - ftl=ftl_messages_with_attributes(), - ) - def test_has_attribute_from_generated_resource( - self, locales: list[str], ftl: str, - ) -> None: - """has_attribute detects attributes in generated resources.""" - event(f"locale_count={len(locales)}") - l10n = FluentLocalization(locales) - l10n.add_resource(locales[0], ftl) - - # Extract message ID from generated FTL - first_line = ftl.split("\n", maxsplit=1)[0] - mid = 
first_line.split("=")[0].strip() - - # Check for attr0 (present if attributes were generated) - if ".attr0" in ftl: - assert l10n.has_attribute(mid, "attr0") is True - event("outcome=attribute_found") - else: - assert l10n.has_attribute(mid, "attr0") is False - event("outcome=no_attributes") - - @given(locales=locale_chains(min_size=2, max_size=4)) - def test_has_attribute_fallback_chain( - self, locales: list[str], - ) -> None: - """has_attribute searches across fallback chain.""" - event(f"locale_count={len(locales)}") - l10n = FluentLocalization(locales) - # Attribute only in last locale - l10n.add_resource( - locales[-1], "btn = Click\n .tooltip = Help text\n", - ) - assert l10n.has_attribute("btn", "tooltip") is True - - @given(locales=locale_chains(min_size=1, max_size=2)) - def test_has_attribute_missing_returns_false( - self, locales: list[str], - ) -> None: - """has_attribute returns False for nonexistent attributes.""" - event("outcome=not_found") - l10n = FluentLocalization(locales) - l10n.add_resource(locales[0], "msg = No attributes\n") - assert l10n.has_attribute("msg", "nonexistent") is False - assert l10n.has_attribute("missing", "attr") is False - - -class TestFluentLocalizationGetMessageIds: - """Tests for get_message_ids method (lines 1142-1150).""" - - @given( - locales=locale_chains(min_size=1, max_size=3), - resources=ftl_resource_sets(), - ) - def test_get_message_ids_returns_union( - self, locales: list[str], resources: dict[str, str], - ) -> None: - """get_message_ids returns union of IDs across all locales.""" - event(f"locale_count={len(locales)}") - l10n = FluentLocalization(locales) - all_expected: set[str] = set() - for locale in locales: - if locale in resources: - l10n.add_resource(locale, resources[locale]) - # Parse message IDs from FTL - for line in resources[locale].split("\n"): - if "=" in line and not line.startswith( - ("#", " ", "-"), - ): - mid = line.split("=")[0].strip() - if mid: - all_expected.add(mid) - - ids = 
l10n.get_message_ids() - assert set(ids) == all_expected - # No duplicates - assert len(ids) == len(set(ids)) - - @given(locales=locale_chains(min_size=2, max_size=3)) - def test_get_message_ids_primary_locale_first( - self, locales: list[str], - ) -> None: - """get_message_ids orders primary locale IDs first.""" - event(f"locale_count={len(locales)}") - l10n = FluentLocalization(locales) - l10n.add_resource(locales[0], "alpha = A\n") - l10n.add_resource( - locales[-1], "alpha = A2\nbeta = B\n", - ) - ids = l10n.get_message_ids() - # alpha from primary appears before beta from fallback - assert ids.index("alpha") < ids.index("beta") - - @given(locales=locale_chains(min_size=1, max_size=2)) - def test_get_message_ids_empty_when_no_resources( - self, locales: list[str], - ) -> None: - """get_message_ids returns empty list when no resources loaded.""" - event("outcome=empty") - l10n = FluentLocalization(locales) - assert l10n.get_message_ids() == [] - - -class TestFluentLocalizationGetMessageVariables: - """Tests for get_message_variables method (lines 1169-1174).""" - - @given(locales=locale_chains(min_size=1, max_size=2)) - def test_get_message_variables_returns_variable_names( - self, locales: list[str], - ) -> None: - """get_message_variables extracts variable names from message.""" - event(f"locale_count={len(locales)}") - l10n = FluentLocalization(locales) - l10n.add_resource( - locales[0], - "greeting = Hello { $firstName } { $lastName }!\n", - ) - variables = l10n.get_message_variables("greeting") - assert "firstName" in variables - assert "lastName" in variables - - @given(locales=locale_chains(min_size=2, max_size=3)) - def test_get_message_variables_fallback( - self, locales: list[str], - ) -> None: - """get_message_variables searches fallback chain.""" - event("outcome=fallback_search") - l10n = FluentLocalization(locales) - l10n.add_resource( - locales[-1], "msg = Value { $count }\n", - ) - variables = l10n.get_message_variables("msg") - assert "count" in 
variables - - @given(locales=locale_chains(min_size=1, max_size=2)) - def test_get_message_variables_raises_for_missing( - self, locales: list[str], - ) -> None: - """get_message_variables raises KeyError for missing message.""" - event("outcome=key_error") - l10n = FluentLocalization(locales) - with pytest.raises(KeyError, match="not found"): - l10n.get_message_variables("nonexistent") - - -class TestFluentLocalizationGetAllMessageVariables: - """Tests for get_all_message_variables (lines 1188-1196).""" - - @given(locales=locale_chains(min_size=1, max_size=3)) - def test_get_all_message_variables_returns_dict( - self, locales: list[str], - ) -> None: - """get_all_message_variables returns dict of msg_id -> variables.""" - event(f"locale_count={len(locales)}") - l10n = FluentLocalization(locales) - l10n.add_resource( - locales[0], - "msg1 = { $name }\nmsg2 = Static text\n", - ) - all_vars = l10n.get_all_message_variables() - assert isinstance(all_vars, dict) - assert "msg1" in all_vars - assert "name" in all_vars["msg1"] - assert "msg2" in all_vars - - @given(locales=locale_chains(min_size=2, max_size=3)) - def test_primary_locale_variables_take_precedence( - self, locales: list[str], - ) -> None: - """Primary locale's variables win for duplicate message IDs.""" - event("outcome=precedence") - l10n = FluentLocalization(locales) - l10n.add_resource(locales[0], "msg = { $primary }\n") - l10n.add_resource(locales[-1], "msg = { $fallback }\n") - all_vars = l10n.get_all_message_variables() - assert "primary" in all_vars["msg"] - - -class TestFluentLocalizationIntrospectTerm: - """Tests for introspect_term method (lines 1211-1217).""" - - @given(locales=locale_chains(min_size=1, max_size=2)) - def test_introspect_term_found( - self, locales: list[str], - ) -> None: - """introspect_term returns introspection for existing term.""" - event("outcome=term_found") - l10n = FluentLocalization(locales) - l10n.add_resource(locales[0], "-brand = Firefox\n") - info = 
l10n.introspect_term("brand") - assert info is not None - - @given(locales=locale_chains(min_size=2, max_size=3)) - def test_introspect_term_fallback( - self, locales: list[str], - ) -> None: - """introspect_term searches fallback chain.""" - event("outcome=term_fallback") - l10n = FluentLocalization(locales) - l10n.add_resource(locales[-1], "-product = App\n") - info = l10n.introspect_term("product") - assert info is not None - - @given(locales=locale_chains(min_size=1, max_size=2)) - def test_introspect_term_not_found( - self, locales: list[str], - ) -> None: - """introspect_term returns None for missing term.""" - event("outcome=term_not_found") - l10n = FluentLocalization(locales) - info = l10n.introspect_term("nonexistent") - assert info is None - - - -# --------------------------------------------------------------------------- -# Resource loading and load summary -# --------------------------------------------------------------------------- - - -class TestFluentLocalizationResourceLoading: - """Tests for resource loading and load summary.""" - - @given( - loader_tuple=resource_loaders(), - ) - def test_load_summary_tracks_all_attempts( - self, - loader_tuple: tuple[ - DictResourceLoader | FailingResourceLoader, - list[str], - list[str], - ], - ) -> None: - """get_load_summary reflects all load attempts from init.""" - loader, locales, resource_ids = loader_tuple - event(f"locale_count={len(locales)}") - l10n = FluentLocalization( - locales, resource_ids, loader, - ) - summary = l10n.get_load_summary() - assert summary.total_attempted == len(locales) * len(resource_ids) - - @given(locales=locale_chains(min_size=1, max_size=3)) - def test_custom_loader_source_path_format( - self, locales: list[str], - ) -> None: - """Non-PathResourceLoader uses locale/resource_id as source_path.""" - event("outcome=custom_loader_path") - resources = { - loc: {"main.ftl": f"msg = {loc}\n"} - for loc in locales - } - loader = DictResourceLoader(resources) - l10n = 
FluentLocalization(locales, ["main.ftl"], loader) - summary = l10n.get_load_summary() - for result in summary.results: - # Custom loader uses "locale/resource_id" format - assert "/" in result.source_path # type: ignore[operator] - - @given(locales=locale_chains(min_size=1, max_size=2)) - def test_oserror_during_load_recorded_as_error( - self, locales: list[str], - ) -> None: - """OSError during resource loading recorded with ERROR status.""" - event("outcome=oserror_recorded") - loader = FailingResourceLoader(OSError, "Permission denied") - l10n = FluentLocalization(locales, ["main.ftl"], loader) - summary = l10n.get_load_summary() - assert summary.errors > 0 - for result in summary.get_errors(): - assert isinstance(result.error, OSError) - - @given(locales=locale_chains(min_size=1, max_size=2)) - def test_valueerror_during_load_recorded_as_error( - self, locales: list[str], - ) -> None: - """ValueError during resource loading recorded with ERROR status.""" - event("outcome=valueerror_recorded") - loader = FailingResourceLoader(ValueError, "Path traversal") - l10n = FluentLocalization(locales, ["main.ftl"], loader) - summary = l10n.get_load_summary() - assert summary.errors > 0 - for result in summary.get_errors(): - assert isinstance(result.error, ValueError) - - -# --------------------------------------------------------------------------- -# Cache stats aggregation branch coverage -# --------------------------------------------------------------------------- - - -class TestCacheStatsAggregation: - """Tests for get_cache_stats aggregation (branch 1327->1325).""" - - @given( - locales=locale_chains(min_size=2, max_size=4), - ) - def test_cache_stats_aggregates_across_bundles( - self, locales: list[str], - ) -> None: - """get_cache_stats sums metrics across all initialized bundles.""" - event(f"bundle_count={len(locales)}") - l10n = FluentLocalization( - locales, cache=CacheConfig(), - ) - # Initialize all bundles with resources - for locale in locales: - 
l10n.add_resource(locale, f"msg = {locale}\n") - - # Format to create cache entries - l10n.format_value("msg") - - stats = l10n.get_cache_stats() - assert stats is not None - assert stats["bundle_count"] == len(locales) - assert l10n.cache_config is not None - assert stats["maxsize"] == l10n.cache_config.size * len(locales) - - @given( - locales=locale_chains(min_size=1, max_size=2), - ) - def test_cache_stats_none_when_disabled( - self, locales: list[str], - ) -> None: - """get_cache_stats returns None when caching disabled.""" - event("outcome=cache_disabled") - l10n = FluentLocalization(locales) - assert l10n.get_cache_stats() is None - - -# --------------------------------------------------------------------------- -# Fallback callback -# --------------------------------------------------------------------------- - - -class TestFallbackCallback: - """Tests for on_fallback callback with property-based inputs.""" - - @given( - locales=locale_chains(min_size=2, max_size=4), - mid=message_ids(), - ) - def test_fallback_callback_invoked_for_non_primary( - self, locales: list[str], mid: str, - ) -> None: - """on_fallback invoked when message resolved from non-primary.""" - event(f"locale_count={len(locales)}") - from ftllexengine.localization import FallbackInfo # noqa: PLC0415 - import inside function - events: list[FallbackInfo] = [] - l10n = FluentLocalization( - locales, on_fallback=events.append, - ) - # Only add to last locale - l10n.add_resource(locales[-1], f"{mid} = fallback\n") - l10n.format_value(mid) - if len(locales) > 1: - assert len(events) == 1 - assert events[0].requested_locale == normalize_locale(locales[0]) - assert events[0].resolved_locale == normalize_locale(locales[-1]) - assert events[0].message_id == mid - - @given( - locales=locale_chains(min_size=1, max_size=3), - mid=message_ids(), - ) - def test_no_fallback_when_primary_has_message( - self, locales: list[str], mid: str, - ) -> None: - """on_fallback not invoked when primary locale has 
message.""" - event("outcome=no_fallback") - from ftllexengine.localization import FallbackInfo # noqa: PLC0415 - import inside function - events: list[FallbackInfo] = [] - l10n = FluentLocalization( - locales, on_fallback=events.append, - ) - l10n.add_resource(locales[0], f"{mid} = primary\n") - l10n.format_value(mid) - assert len(events) == 0 - - -# --------------------------------------------------------------------------- -# add_function deferred application -# --------------------------------------------------------------------------- - - -class TestAddFunctionDeferred: - """Tests for add_function deferred/immediate application.""" - - @given(locales=locale_chains(min_size=1, max_size=3)) - def test_function_applied_to_existing_bundles( - self, locales: list[str], - ) -> None: - """add_function applies to already-created bundles.""" - event(f"locale_count={len(locales)}") - l10n = FluentLocalization(locales, use_isolating=False) - # Create bundles by adding resources - for locale in locales: - l10n.add_resource(locale, "msg = { UPPER($x) }\n") - - def upper_fn(value: str) -> str: - return value.upper() - - l10n.add_function("UPPER", upper_fn) - result, _ = l10n.format_value("msg", {"x": "test"}) - assert "TEST" in result - - @given(locales=locale_chains(min_size=2, max_size=3)) - def test_function_stored_for_lazy_bundles( - self, locales: list[str], - ) -> None: - """add_function stored for bundles created later.""" - event("outcome=deferred") - l10n = FluentLocalization(locales, use_isolating=False) - - def lower_fn(value: str) -> str: - return value.lower() - - l10n.add_function("LOWER", lower_fn) - # Add resource and format after function registration - l10n.add_resource(locales[0], "msg = { LOWER($x) }\n") - result, _ = l10n.format_value("msg", {"x": "HELLO"}) - assert "hello" in result - - -# --------------------------------------------------------------------------- -# Validation edge cases -# 
--------------------------------------------------------------------------- - - -class TestValidationEdgeCases: - """Validation and defensive checks.""" - - @given( - locale=st.sampled_from(["en", "de"]), - ws=st.sampled_from([" ", "\t", "\n"]), - position=st.sampled_from(["leading", "trailing"]), - ) - def test_add_resource_whitespace_locale_rejected( - self, locale: str, ws: str, position: str, - ) -> None: - """add_resource trims locale boundaries and resolves them canonically.""" - event(f"position={position}") - padded = ws + locale if position == "leading" else locale + ws - l10n = FluentLocalization([locale]) - l10n.add_resource(padded, "msg = test") - assert l10n.has_message("msg") - assert l10n.locales == (normalize_locale(locale),) - - @given( - locale=st.sampled_from(["en", "de"]), - invalid_args=st.sampled_from([42, "str", [1, 2], True]), - ) - def test_format_value_invalid_args_type( - self, locale: str, invalid_args: int | str | list[int] | bool, - ) -> None: - """format_value with non-Mapping args returns error. - - strict=False: invalid-args error returned in tuple, not raised. - """ - event("outcome=invalid_args") - l10n = FluentLocalization([locale], strict=False) - l10n.add_resource(locale, "msg = test") - result, errors = l10n.format_value( - "msg", invalid_args, # type: ignore[arg-type] - ) - assert result == "{???}" - assert len(errors) > 0 - - @given( - locale=st.sampled_from(["en", "de"]), - invalid_attr=st.sampled_from([42, Decimal("3.14"), ["a"], {"k": "v"}]), - ) - def test_format_pattern_invalid_attribute_type( - self, - locale: str, - invalid_attr: int | Decimal | list[str] | dict[str, str], - ) -> None: - """format_pattern with non-str attribute returns error. - - strict=False: invalid-attribute error returned in tuple, not raised. 
- """ - event("outcome=invalid_attr") - l10n = FluentLocalization([locale], strict=False) - l10n.add_resource(locale, "msg = test\n .a = v") - result, errors = l10n.format_pattern( - "msg", None, - attribute=invalid_attr, # type: ignore[arg-type] - ) - assert result == "{???}" - assert len(errors) > 0 - - -# --------------------------------------------------------------------------- -# Terms with Hypothesis strategies -# --------------------------------------------------------------------------- - - -class TestTermsWithStrategies: - """Tests using ftl_messages_with_terms strategy.""" - - @given( - locales=locale_chains(min_size=1, max_size=2), - ftl=ftl_messages_with_terms(), - ) - def test_terms_parsed_and_resolvable( - self, locales: list[str], ftl: str, - ) -> None: - """Generated terms are parsed without errors.""" - event(f"locale_count={len(locales)}") - l10n = FluentLocalization(locales, use_isolating=False) - junk = l10n.add_resource(locales[0], ftl) - # Should parse without junk - assert len(junk) == 0 - - -# --------------------------------------------------------------------------- -# add_resource_stream oracle: streaming == buffered for FluentLocalization -# --------------------------------------------------------------------------- - - -class TestAddResourceStreamLocalizationOracle: - """Oracle: FluentLocalization.add_resource_stream is equivalent to add_resource. - - Properties verified: - - Message IDs registered via stream match those registered via buffered load. - - format_pattern results are identical across both loading paths. - - Junk entry counts match for the same FTL content. - - Second call to add_resource_stream (bundle already exists) behaves correctly. 
- """ - - @given( - locales=locale_chains(min_size=1, max_size=3), - source=ftl_simple_messages(), - ) - def test_message_ids_match_add_resource( - self, locales: list[str], source: str - ) -> None: - """Stream-loaded message IDs equal buffered-loaded IDs for same FTL.""" - l_buf = FluentLocalization(locales, use_isolating=False, strict=False) - l_str = FluentLocalization(locales, use_isolating=False, strict=False) - locale = locales[0] - - l_buf.add_resource(locale, source) - l_str.add_resource_stream(locale, source.splitlines(keepends=True)) - - # Format the same message IDs from both — stream must produce same results - buf_ids: set[str] = set() - for msg_id in source.splitlines(): - if " = " in msg_id: - buf_ids.add(msg_id.split(" = ", 1)[0].strip()) - - event(f"l10n_stream_locale_count={len(locales)}") - for mid in buf_ids: - r_buf, e_buf = l_buf.format_pattern(mid) - r_str, e_str = l_str.format_pattern(mid) - event(f"outcome={'match' if r_buf == r_str else 'mismatch'}") - assert r_buf == r_str, ( - f"format_pattern mismatch for {mid!r}: " - f"buffered={r_buf!r}, stream={r_str!r}" - ) - assert len(e_buf) == len(e_str) - - @given( - locales=locale_chains(min_size=1, max_size=2), - source=ftl_simple_messages(), - ) - def test_junk_count_matches_add_resource( - self, locales: list[str], source: str - ) -> None: - """Junk count from stream load matches junk count from buffered load.""" - l_buf = FluentLocalization(locales, use_isolating=False, strict=False) - l_str = FluentLocalization(locales, use_isolating=False, strict=False) - locale = locales[0] - - junk_buf = l_buf.add_resource(locale, source) - junk_str = l_str.add_resource_stream( - locale, source.splitlines(keepends=True) - ) - - event(f"junk_buf={len(junk_buf)}") - event(f"junk_stream={len(junk_str)}") - assert len(junk_buf) == len(junk_str) - - @given( - locales=locale_chains(min_size=1, max_size=2), - source1=ftl_simple_messages(), - source2=ftl_simple_messages(), - ) - def 
test_second_stream_call_accumulates_messages( - self, locales: list[str], source1: str, source2: str - ) -> None: - """Two add_resource_stream calls accumulate messages on same bundle. - - The second call hits the pre-existing bundle path (orchestrator.py - line 734->736 False branch) — verifies correct state accumulation. - """ - l10n = FluentLocalization(locales, use_isolating=False, strict=False) - locale = locales[0] - - l10n.add_resource_stream(locale, source1.splitlines(keepends=True)) - l10n.add_resource_stream(locale, source2.splitlines(keepends=True)) - - event("l10n_two_stream_calls=done") - # At minimum the bundle must exist and be queryable - result, _errors = l10n.format_pattern("__nonexistent__") - event(f"fallback_result={result!r}") - # Missing message returns fallback string, not exception, in non-strict mode - assert "__nonexistent__" in result +"""Aggregated fuzz localization property test surface.""" + +from tests.fuzz_localization_property_cases.add_function_deferred_application import * # noqa: F403 - re-export split test surface +from tests.fuzz_localization_property_cases.add_resource_stream_oracle_streaming_buffered_for_fluent_localization import * # noqa: F403 - re-export split test surface +from tests.fuzz_localization_property_cases.cache_stats_aggregation_branch_coverage import * # noqa: F403 - re-export split test surface +from tests.fuzz_localization_property_cases.fallback_callback import * # noqa: F403 - re-export split test surface +from tests.fuzz_localization_property_cases.fluent_localization_api_methods_coverage_targets import * # noqa: F403 - re-export split test surface +from tests.fuzz_localization_property_cases.fluent_localization_orchestration_invariants import * # noqa: F403 - re-export split test surface +from tests.fuzz_localization_property_cases.load_summary_aggregation_invariants import * # noqa: F403 - re-export split test surface +from tests.fuzz_localization_property_cases.path_resource_loader_invariants import * 
# noqa: F403 - re-export split test surface +from tests.fuzz_localization_property_cases.resource_load_result_property_invariants import * # noqa: F403 - re-export split test surface +from tests.fuzz_localization_property_cases.resource_loading_and_load_summary import * # noqa: F403 - re-export split test surface +from tests.fuzz_localization_property_cases.terms_with_hypothesis_strategies import * # noqa: F403 - re-export split test surface +from tests.fuzz_localization_property_cases.validation_edge_cases import * # noqa: F403 - re-export split test surface diff --git a/tests/fuzz/test_parsing_numbers_property.py b/tests/fuzz/test_parsing_numbers_property.py index 50eba29b..22c9599d 100644 --- a/tests/fuzz/test_parsing_numbers_property.py +++ b/tests/fuzz/test_parsing_numbers_property.py @@ -128,6 +128,21 @@ def test_parse_decimal_invalid_returns_error(self, invalid_input: str) -> None: assert len(errors) > 0 assert result is None + @given( + value=st.one_of( + st.integers(), + st.decimals(allow_nan=False, allow_infinity=False), + st.lists(st.integers()), + st.dictionaries(st.text(), st.integers()), + ), + ) + def test_parse_decimal_type_error_returns_error(self, value: object) -> None: + """Non-string inputs return errors in tuple form instead of raising.""" + result, errors = parse_decimal(value, "en_US") + event(f"input_type={type(value).__name__}") + assert len(errors) > 0 + assert result is None + @given( locale=st.sampled_from(["en_US", "de_DE", "fr_FR", "lv_LV", "pl_PL", "ja_JP"]), value=st.decimals( diff --git a/tests/fuzz/test_runtime_resolver_state_machine.py b/tests/fuzz/test_runtime_resolver_state_machine.py index b5a90d3b..11139bdc 100644 --- a/tests/fuzz/test_runtime_resolver_state_machine.py +++ b/tests/fuzz/test_runtime_resolver_state_machine.py @@ -1,1198 +1,6 @@ -"""Stateful and advanced property-based tests for FluentResolver. 
+"""Aggregated fuzz runtime resolver state machine test surface.""" -Consolidates: -- test_resolver_state_machine.py: FluentResolverStateMachine (fuzz), TestResolverErrorPaths -- test_resolver_advanced_hypothesis.py: all classes -""" - -from __future__ import annotations - -from decimal import Decimal - -import pytest -from hypothesis import assume, event, given -from hypothesis import strategies as st -from hypothesis.stateful import Bundle, RuleBasedStateMachine, initialize, invariant, rule - -from ftllexengine.core.value_types import FluentValue -from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError -from ftllexengine.runtime.bundle import FluentBundle -from ftllexengine.runtime.function_bridge import FunctionRegistry -from ftllexengine.runtime.functions import create_default_registry -from ftllexengine.runtime.resolver import FluentResolver -from ftllexengine.syntax import ( - Attribute, - CallArguments, - FunctionReference, - Identifier, - Message, - MessageReference, - NumberLiteral, - Pattern, - Placeable, - SelectExpression, - Term, - TermReference, - TextElement, - VariableReference, - Variant, -) -from tests.strategies import ftl_identifiers, ftl_simple_text - -# ============================================================================ -# STRATEGY HELPERS -# ============================================================================ - - -def simple_pattern(text: str) -> Pattern: - """Create simple text pattern.""" - return Pattern(elements=(TextElement(value=text),)) - - -def variable_pattern(var_name: str) -> Pattern: - """Create pattern with variable reference.""" - return Pattern( - elements=( - Placeable(expression=VariableReference(id=Identifier(name=var_name))), - ) - ) - - -def term_reference_pattern(term_name: str) -> Pattern: - """Create pattern with term reference.""" - return Pattern( - elements=( - Placeable( - expression=TermReference(id=Identifier(name=term_name), attribute=None) - ), - ) - ) - - -def 
message_reference_pattern(msg_name: str) -> Pattern: - """Create pattern with message reference.""" - return Pattern( - elements=( - Placeable( - expression=MessageReference(id=Identifier(name=msg_name), attribute=None) - ), - ) - ) - - -# ============================================================================ -# STATE MACHINE -# ============================================================================ - - -class FluentResolverStateMachine(RuleBasedStateMachine): - """State machine for testing FluentResolver. - - Bundles: - - messages: Message IDs that have been added - - terms: Term IDs that have been added - - variables: Variable names used in patterns - - Invariants: - - Resolving same message twice produces same result (determinism) - - Resolver never crashes (robustness) - - All messages are resolvable with correct args - """ - - messages = Bundle("messages") - terms = Bundle("terms") - variables = Bundle("variables") - - @initialize() - def setup_resolver(self) -> None: - """Initialize resolver with empty registries.""" - self.message_registry: dict[str, Message] = {} - self.term_registry: dict[str, Term] = {} - self.locale = "en_US" - self.resolver = FluentResolver( - locale=self.locale, - messages=self.message_registry, - terms=self.term_registry, - function_registry=create_default_registry(), - use_isolating=False, - ) - - @rule(target=messages, msg_id=ftl_identifiers(), text=st.text(min_size=1, max_size=50)) - def add_simple_message(self, msg_id: str, text: str) -> str: - """Add simple text-only message.""" - message = Message( - id=Identifier(name=msg_id), - value=simple_pattern(text), - attributes=(), - comment=None, - ) - self.message_registry[msg_id] = message - event("rule=add_simple_message") - return msg_id - - @rule( - target=messages, - msg_id=ftl_identifiers(), - var_name=ftl_identifiers(), - ) - def add_message_with_variable(self, msg_id: str, var_name: str) -> str: - """Add message that requires variable argument.""" - message = 
Message( - id=Identifier(name=msg_id), - value=variable_pattern(var_name), - attributes=(), - comment=None, - ) - self.message_registry[msg_id] = message - event("rule=add_message_with_variable") - return msg_id - - @rule(target=terms, term_id=ftl_identifiers(), text=st.text(min_size=1, max_size=50)) - def add_simple_term(self, term_id: str, text: str) -> str: - """Add simple term.""" - term = Term( - id=Identifier(name=term_id), - value=simple_pattern(text), - attributes=(), - comment=None, - ) - self.term_registry[term_id] = term - event("rule=add_simple_term") - return term_id - - @rule( - target=messages, - msg_id=ftl_identifiers(), - term_id=terms, - ) - def add_message_referencing_term(self, msg_id: str, term_id: str) -> str: - """Add message that references a term.""" - message = Message( - id=Identifier(name=msg_id), - value=term_reference_pattern(term_id), - attributes=(), - comment=None, - ) - self.message_registry[msg_id] = message - event("rule=add_message_referencing_term") - return msg_id - - @rule(msg_id=messages) - def resolve_simple_message(self, msg_id: str) -> None: - """Resolve message without arguments. 
Checks determinism.""" - assume(msg_id in self.message_registry) - message = self.message_registry[msg_id] - - needs_vars = any( - isinstance(elem, Placeable) - and isinstance(elem.expression, VariableReference) - for elem in (message.value.elements if message.value else ()) - ) - - if needs_vars: - result, errors = self.resolver.resolve_message(message, args={}) - assert isinstance(result, str) - assert len(errors) >= 0 - else: - result1, _errors = self.resolver.resolve_message(message, args={}) - result2, _errors = self.resolver.resolve_message(message, args={}) - assert result1 == result2, f"Resolution should be deterministic for {msg_id}" - assert isinstance(result1, str) - event(f"rule=resolve_simple(vars={needs_vars})") - - @rule( - msg_id=messages, - var_name=ftl_identifiers(), - var_value=st.text(max_size=50), - ) - def resolve_message_with_args(self, msg_id: str, var_name: str, var_value: str) -> None: - """Resolve message with arguments.""" - assume(msg_id in self.message_registry) - message = self.message_registry[msg_id] - - args = {var_name: var_value} - - try: - result, _errors = self.resolver.resolve_message(message, args=args) - assert isinstance(result, str) - except FrozenFluentError: - pass - event("rule=resolve_message_with_args") - - @rule( - msg_id=ftl_identifiers(), - attr_name=ftl_identifiers(), - text=st.text(min_size=1, max_size=50), - ) - def add_message_with_attribute(self, msg_id: str, attr_name: str, text: str) -> None: - """Add message with attribute and resolve it.""" - attribute = Attribute( - id=Identifier(name=attr_name), - value=simple_pattern(text), - ) - message = Message( - id=Identifier(name=msg_id), - value=simple_pattern("default value"), - attributes=(attribute,), - comment=None, - ) - self.message_registry[msg_id] = message - - result, errors = self.resolver.resolve_message(message, args={}, attribute=attr_name) - assert text in result - assert errors == (), f"Unexpected errors: {errors}" - 
event("rule=add_message_with_attribute") - - @rule(msg_id=messages) - def resolve_nonexistent_attribute(self, msg_id: str) -> None: - """Try to resolve non-existent attribute - should give REFERENCE error.""" - assume(msg_id in self.message_registry) - message = self.message_registry[msg_id] - - _result, errors = self.resolver.resolve_message( - message, args={}, attribute="nonexistent_attr_xyz" - ) - assert len(errors) == 1 - assert isinstance(errors[0], FrozenFluentError) - assert errors[0].category == ErrorCategory.REFERENCE - assert "attribute" in str(errors[0]).lower() - event("rule=resolve_nonexistent_attribute") - - @rule() - def resolve_nonexistent_term(self) -> None: - """Try to resolve term reference to non-existent term.""" - msg_id = "msg_ref_bad_term" - message = Message( - id=Identifier(name=msg_id), - value=Pattern( - elements=( - Placeable( - expression=TermReference( - id=Identifier(name="nonexistent_term_xyz"), - attribute=None, - ) - ), - ) - ), - attributes=(), - comment=None, - ) - self.message_registry[msg_id] = message - - result, errors = self.resolver.resolve_message(message, args={}) - assert isinstance(result, str) - assert len(errors) > 0 - event("rule=resolve_nonexistent_term") - - @rule(term_id=terms) - def resolve_term_attribute_not_found(self, term_id: str) -> None: - """Try to resolve term attribute that doesn't exist.""" - assume(term_id in self.term_registry) - - msg_id = "msg_ref_term_attr" - message = Message( - id=Identifier(name=msg_id), - value=Pattern( - elements=( - Placeable( - expression=TermReference( - id=Identifier(name=term_id), - attribute=Identifier(name="nonexistent_attr"), - ) - ), - ) - ), - attributes=(), - comment=None, - ) - self.message_registry[msg_id] = message - - result, errors = self.resolver.resolve_message(message, args={}) - assert isinstance(result, str) - assert len(errors) > 0 - event("rule=resolve_term_attr_not_found") - - @rule() - def test_unknown_expression_type(self) -> None: - """Document 
architecturally unreachable expression type error path. - - The unknown expression error path is unreachable by design since all - AST node types are exhaustively handled. This rule documents the gap. - """ - event("rule=test_unknown_expression_type") - - @rule( - msg_id1=ftl_identifiers(), - msg_id2=ftl_identifiers(), - ) - def test_circular_reference_detection(self, msg_id1: str, msg_id2: str) -> None: - """Test circular reference detection produces graceful degradation.""" - assume(msg_id1 != msg_id2) - - message1 = Message( - id=Identifier(name=msg_id1), - value=message_reference_pattern(msg_id2), - attributes=(), - comment=None, - ) - message2 = Message( - id=Identifier(name=msg_id2), - value=message_reference_pattern(msg_id1), - attributes=(), - comment=None, - ) - - self.message_registry[msg_id1] = message1 - self.message_registry[msg_id2] = message2 - - result, _errors = self.resolver.resolve_message(message1, args={}) - assert isinstance(result, str) - event("rule=circular_reference_detection") - - @rule( - msg_id=ftl_identifiers(), - number=st.integers(min_value=0, max_value=100), - ) - def add_message_with_select_expression(self, msg_id: str, number: int) -> None: - """Add message with select expression (plural).""" - variants = ( - Variant( - key=Identifier(name="one"), - value=simple_pattern("singular"), - default=False, - ), - Variant( - key=Identifier(name="other"), - value=simple_pattern("plural"), - default=True, - ), - ) - - select_expr = SelectExpression( - selector=VariableReference(id=Identifier(name="count")), - variants=variants, - ) - - message = Message( - id=Identifier(name=msg_id), - value=Pattern(elements=(Placeable(expression=select_expr),)), - attributes=(), - comment=None, - ) - - self.message_registry[msg_id] = message - - result, errors = self.resolver.resolve_message(message, args={"count": number}) - assert result in ["singular", "plural"] - assert errors == (), f"Unexpected errors: {errors}" - 
event(f"rule=select_expression({result})") - - @rule() - def test_message_no_value(self) -> None: - """Test message without value (only attributes) produces REFERENCE error.""" - msg_id = "msg_no_value" - message = Message( - id=Identifier(name=msg_id), - value=None, - attributes=( - Attribute( - id=Identifier(name="attr1"), - value=simple_pattern("has attribute"), - ), - ), - comment=None, - ) - self.message_registry[msg_id] = message - - result, errors = self.resolver.resolve_message(message, args={}) - assert len(errors) == 1 - assert isinstance(errors[0], FrozenFluentError) - assert errors[0].category == ErrorCategory.REFERENCE - assert "no value" in str(errors[0]).lower() - assert isinstance(result, str) - event("rule=test_message_no_value") - - @rule( - msg_id=ftl_identifiers(), - func_name=st.sampled_from(["NUMBER", "NONEXISTENT"]), - ) - def test_function_reference(self, msg_id: str, func_name: str) -> None: - """Test function reference resolution (both successful and failed calls).""" - func_ref = FunctionReference( - id=Identifier(name=func_name), - arguments=CallArguments( - positional=(NumberLiteral(value=42, raw="42"),), - named=(), - ), - ) - - message = Message( - id=Identifier(name=msg_id), - value=Pattern(elements=(Placeable(expression=func_ref),)), - attributes=(), - comment=None, - ) - - self.message_registry[msg_id] = message - - result, errors = self.resolver.resolve_message(message, args={}) - assert isinstance(result, str) - - if func_name == "NUMBER": - assert "42" in result - assert errors == () - else: - assert len(errors) > 0 - event(f"rule=function_reference({func_name})") - - @invariant() - def resolver_state_consistent(self) -> None: - """Invariant: Resolver registries stay consistent.""" - assert self.resolver._messages == self.message_registry - assert self.resolver._terms == self.term_registry - assert self.resolver._locale == self.locale - msg_count = len(self.message_registry) - event(f"invariant=state_consistent({msg_count})") - 
- @invariant() - def resolution_uses_explicit_context(self) -> None: - """Invariant: Resolver properly initialized with explicit context pattern.""" - assert self.resolver._locale == self.locale - event("invariant=explicit_context") - - -# Stateful test runner -TestFluentResolverStateMachine = FluentResolverStateMachine.TestCase -TestFluentResolverStateMachine = pytest.mark.fuzz(TestFluentResolverStateMachine) - - -# ============================================================================ -# DIRECT ERROR PATH TESTS (from state machine module) -# ============================================================================ - - -class TestStatefulErrorPaths: - """Direct tests for specific error paths that are hard to reach via state machine.""" - - def test_term_not_found_direct(self) -> None: - """Term not found error (line 176).""" - resolver = FluentResolver( - locale="en_US", - messages={}, - terms={}, - function_registry=create_default_registry(), - use_isolating=False, - ) - - message = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=TermReference( - id=Identifier(name="nonexistent"), - attribute=None, - ) - ), - ) - ), - attributes=(), - comment=None, - ) - - result, errors = resolver.resolve_message(message, args={}) - assert len(errors) > 0 - assert "{-nonexistent}" in result - - def test_term_attribute_not_found_direct(self) -> None: - """Term attribute not found error (lines 182-185).""" - from ftllexengine.syntax import Term # noqa: PLC0415 - import inside function - - term = Term( - id=Identifier(name="brand"), - value=simple_pattern("Firefox"), - attributes=(), - comment=None, - ) - - resolver = FluentResolver( - locale="en_US", - messages={}, - terms={"brand": term}, - function_registry=create_default_registry(), - use_isolating=False, - ) - - message = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=TermReference( - id=Identifier(name="brand"), - 
attribute=Identifier(name="nonexistent"), - ) - ), - ) - ), - attributes=(), - comment=None, - ) - - result, errors = resolver.resolve_message(message, args={}) - assert len(errors) > 0 - assert "{-brand.nonexistent}" in result - - def test_message_not_found_reference(self) -> None: - """Message not found when referenced from another message (line 164).""" - message = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=MessageReference( - id=Identifier(name="nonexistent"), - attribute=None, - ) - ), - ) - ), - attributes=(), - comment=None, - ) - - resolver = FluentResolver( - locale="en_US", - messages={"test": message}, - terms={}, - function_registry=create_default_registry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, args={}) - assert len(errors) > 0 - assert "{nonexistent}" in result - - def test_variable_not_provided(self) -> None: - """Variable not provided in args (line 157).""" - message = Message( - id=Identifier(name="test"), - value=variable_pattern("missing_var"), - attributes=(), - comment=None, - ) - - resolver = FluentResolver( - locale="en_US", - messages={"test": message}, - terms={}, - function_registry=create_default_registry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, args={}) - assert len(errors) > 0 - assert "{$missing_var}" in result - - @given(st.data()) - def test_format_value_edge_cases(self, data: st.DataObject) -> None: - """Property: _format_value never crashes, always returns string (lines 268-278).""" - resolver = FluentResolver( - locale="en_US", - messages={}, - terms={}, - function_registry=create_default_registry(), - use_isolating=False, - ) - - test_values: list[FluentValue] = [ - data.draw(st.text()), - data.draw(st.integers()), - data.draw(st.decimals(allow_nan=False, allow_infinity=False)), - data.draw(st.booleans()), - None, - ] - - value = None - for value in test_values: - result = 
resolver._format_value(value) - assert isinstance(result, str), f"_format_value({value}) should return string" - val_type = type(value).__name__ - event(f"last_value_type={val_type}") - - def test_select_expression_no_variants(self) -> None: - """SelectExpression with no variants raises ValueError at construction.""" - with pytest.raises(ValueError, match="at least one variant"): - SelectExpression( - selector=NumberLiteral(value=1, raw="1"), - variants=(), - ) - - -# ============================================================================ -# ADVANCED PROPERTY-BASED TESTS -# ============================================================================ - - -class TestPatternResolution: - """Properties about pattern resolution.""" - - @given( - msg_id=ftl_identifiers(), - text_content=ftl_simple_text(), - ) - def test_simple_text_resolution(self, msg_id: str, text_content: str) -> None: - """Property: Simple text patterns resolve to their content.""" - event(f"text_len={len(text_content)}") - pattern = Pattern(elements=(TextElement(value=text_content),)) - message = Message(id=Identifier(name=msg_id), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en_US", - messages={msg_id: message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, {}) - assert not errors - assert result == text_content, f"Expected {text_content}, got {result}" - - @given( - msg_id=ftl_identifiers(), - parts=st.lists(ftl_simple_text(), min_size=2, max_size=5), - ) - def test_multiple_text_elements_concatenation( - self, msg_id: str, parts: list[str] - ) -> None: - """Property: Multiple text elements are concatenated in order.""" - event(f"part_count={len(parts)}") - elements = tuple(TextElement(value=p) for p in parts) - pattern = Pattern(elements=elements) - message = Message(id=Identifier(name=msg_id), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en_US", - 
messages={msg_id: message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, {}) - assert not errors - expected = "".join(parts) - assert result == expected, f"Concatenation mismatch: {result} != {expected}" - - -class TestVariableResolution: - """Properties about variable reference resolution.""" - - @given( - var_name=ftl_identifiers(), - var_value=st.one_of( - st.text(min_size=1, max_size=50), - st.integers(), - st.decimals(allow_nan=False, allow_infinity=False), - ), - ) - def test_variable_value_preservation( - self, var_name: str, var_value: str | int | Decimal - ) -> None: - """Property: Variable values are preserved in resolution.""" - val_type = type(var_value).__name__ - event(f"var_type={val_type}") - bundle = FluentBundle("en_US", use_isolating=False) - - ftl_source = f"msg = {{ ${var_name} }}" - bundle.add_resource(ftl_source) - - result, errors = bundle.format_pattern("msg", {var_name: var_value}) - assert not errors - assert str(var_value) in result, f"Variable value not in result: {result}" - - @given(var_name=ftl_identifiers()) - def test_missing_variable_error_handling(self, var_name: str) -> None: - """Property: Missing variables are handled gracefully.""" - event(f"var_name_len={len(var_name)}") - # strict=False: testing soft-error return semantics; missing-variable - # errors must be returned in the tuple, not raised. 
- bundle = FluentBundle("en_US", strict=False) - - ftl_source = f"msg = {{ ${var_name} }}" - bundle.add_resource(ftl_source) - - result, _errors = bundle.format_pattern("msg", {}) - assert isinstance(result, str), "Must return string even on missing variable" - - @given(var_count=st.integers(min_value=1, max_value=10)) - def test_multiple_variables_independent(self, var_count: int) -> None: - """Property: Multiple variables resolve independently.""" - event(f"var_count={var_count}") - bundle = FluentBundle("en_US", use_isolating=False) - - var_names = [f"v{i}" for i in range(var_count)] - placeholders = " ".join(f"{{ ${vn} }}" for vn in var_names) - ftl_source = f"msg = {placeholders}" - bundle.add_resource(ftl_source) - - args = {vn: f"val{i}" for i, vn in enumerate(var_names)} - result, errors = bundle.format_pattern("msg", args) - assert not errors - for value in args.values(): - assert value in result, f"Variable value {value} missing" - - -class TestMessageReferenceResolution: - """Properties about message reference resolution.""" - - @given( - ref_msg_id=ftl_identifiers(), - ref_value=ftl_simple_text(), - main_msg_id=ftl_identifiers(), - ) - def test_message_reference_resolution( - self, ref_msg_id: str, ref_value: str, main_msg_id: str - ) -> None: - """Property: Message references resolve to referenced message value.""" - event(f"ref_value_len={len(ref_value)}") - assume(ref_msg_id != main_msg_id) - - bundle = FluentBundle("en_US", use_isolating=False) - ftl_source = f""" -{ref_msg_id} = {ref_value} -{main_msg_id} = {{ {ref_msg_id} }} -""" - bundle.add_resource(ftl_source) - - result, errors = bundle.format_pattern(main_msg_id) - assert not errors - assert ref_value.strip() in result, f"Referenced message value not in result: {result}" - - @given( - nonexistent_id=ftl_identifiers(), - main_msg_id=ftl_identifiers(), - ) - def test_missing_message_reference_handling( - self, nonexistent_id: str, main_msg_id: str - ) -> None: - """Property: Missing message 
references handled gracefully.""" - event(f"id_len={len(nonexistent_id)}") - assume(nonexistent_id != main_msg_id) - - # strict=False: testing soft-error return semantics; missing-message- - # reference errors must be returned in the tuple, not raised. - bundle = FluentBundle("en_US", strict=False) - ftl_source = f"{main_msg_id} = {{ {nonexistent_id} }}" - bundle.add_resource(ftl_source) - - result, _errors = bundle.format_pattern(main_msg_id) - assert isinstance(result, str), "Must return string for missing reference" - - -class TestTermReferenceResolution: - """Properties about term reference resolution.""" - - @given( - term_id=ftl_identifiers(), - term_value=ftl_simple_text(), - msg_id=ftl_identifiers(), - ) - def test_term_reference_resolution( - self, term_id: str, term_value: str, msg_id: str - ) -> None: - """Property: Term references resolve to term value.""" - event(f"term_value_len={len(term_value)}") - bundle = FluentBundle("en_US", use_isolating=False) - ftl_source = f""" --{term_id} = {term_value} -{msg_id} = {{ -{term_id} }} -""" - bundle.add_resource(ftl_source) - - result, errors = bundle.format_pattern(msg_id) - assert not errors - assert term_value.strip() in result, f"Term value not in result: {result}" - - @given( - nonexistent_term=ftl_identifiers(), - msg_id=ftl_identifiers(), - ) - def test_missing_term_reference_handling( - self, nonexistent_term: str, msg_id: str - ) -> None: - """Property: Missing term references handled gracefully.""" - event(f"term_len={len(nonexistent_term)}") - # strict=False: testing soft-error return semantics; missing-term - # errors must be returned in the tuple, not raised. 
- bundle = FluentBundle("en_US", strict=False) - ftl_source = f"{msg_id} = {{ -{nonexistent_term} }}" - bundle.add_resource(ftl_source) - - result, _errors = bundle.format_pattern(msg_id) - assert isinstance(result, str), "Must return string for missing term" - - -class TestSelectExpressionResolution: - """Properties about select expression evaluation.""" - - @given( - var_name=ftl_identifiers(), - selector_value=st.one_of(st.text(min_size=1, max_size=20), st.integers(0, 100)), - variant1_key=ftl_identifiers(), - variant1_val=ftl_simple_text(), - variant2_val=ftl_simple_text(), - ) - def test_select_expression_matches_variant( - self, - var_name: str, - selector_value: str | int, - variant1_key: str, - variant1_val: str, - variant2_val: str, - ) -> None: - """Property: Select expressions match correct variant.""" - event(f"selector_type={type(selector_value).__name__}") - assume(variant1_key != "other") - assume(var_name != variant1_key) - - bundle = FluentBundle("en_US", use_isolating=False) - ftl_source = f""" -msg = {{ ${var_name} -> - [{variant1_key}] {variant1_val} - *[other] {variant2_val} -}} -""" - bundle.add_resource(ftl_source) - - if not bundle.has_message("msg"): - return - - result, errors = bundle.format_pattern("msg", {var_name: selector_value}) - assert not errors - - if str(selector_value) == variant1_key: - assert variant1_val.strip() in result, f"Expected {variant1_val} for matching key" - else: - assert ( - variant2_val.strip() in result or variant1_val.strip() in result - ), "Must match some variant" - - @given( - var_name=ftl_identifiers(), - numeric_value=st.integers(0, 10), - ) - def test_numeric_selector_matching(self, var_name: str, numeric_value: int) -> None: - """Property: Numeric selectors match correctly.""" - event(f"numeric_value={numeric_value}") - bundle = FluentBundle("en_US", use_isolating=False) - ftl_source = f""" -msg = {{ ${var_name} -> - [0] zero - [1] one - *[other] many -}} -""" - bundle.add_resource(ftl_source) - - 
result, errors = bundle.format_pattern("msg", {var_name: numeric_value}) - assert not errors - - if numeric_value == 0: - assert "zero" in result, "Should match [0] variant" - elif numeric_value == 1: - assert "one" in result, "Should match [1] variant" - else: - assert "many" in result or result, "Should match default variant" - - -class TestCircularReferenceDetection: - """Properties about circular reference detection.""" - - @given( - msg1_id=ftl_identifiers(), - msg2_id=ftl_identifiers(), - ) - def test_direct_circular_reference_detection( - self, msg1_id: str, msg2_id: str - ) -> None: - """Property: Direct circular references are detected.""" - event(f"id_len={len(msg1_id)}") - assume(msg1_id != msg2_id) - - # strict=False: testing soft-error return semantics; circular-reference - # errors must be returned in the tuple, not raised. - bundle = FluentBundle("en_US", strict=False) - ftl_source = f""" -{msg1_id} = {{ {msg2_id} }} -{msg2_id} = {{ {msg1_id} }} -""" - bundle.add_resource(ftl_source) - - result, _errors = bundle.format_pattern(msg1_id) - assert isinstance(result, str), "Must handle circular reference gracefully" - - @given(msg_ids=st.lists(ftl_identifiers(), min_size=3, max_size=5, unique=True)) - def test_indirect_circular_reference_detection(self, msg_ids: list[str]) -> None: - """Property: Indirect circular references (chains) are detected.""" - event(f"chain_len={len(msg_ids)}") - # strict=False: testing soft-error return semantics; circular-chain - # errors must be returned in the tuple, not raised. 
- bundle = FluentBundle("en_US", strict=False) - - msg_pairs = list(zip(msg_ids, [*msg_ids[1:], msg_ids[0]], strict=True)) - ftl_lines = [f"{m1} = {{ {m2} }}" for m1, m2 in msg_pairs] - ftl_source = "\n".join(ftl_lines) - - bundle.add_resource(ftl_source) - - result, _errors = bundle.format_pattern(msg_ids[0]) - assert isinstance(result, str), "Must handle circular chain gracefully" - - -class TestFunctionCallResolution: - """Properties about function call resolution.""" - - @given( - func_name=st.text( - alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ", min_size=3, max_size=10 - ), - return_value=ftl_simple_text(), - ) - def test_custom_function_called(self, func_name: str, return_value: str) -> None: - """Property: Custom functions are called and results used.""" - event(f"func_name_len={len(func_name)}") - assume(func_name not in ("NUMBER", "DATETIME")) - - bundle = FluentBundle("en_US", use_isolating=False) - - def custom_func() -> str: - return return_value - - bundle.add_function(func_name, custom_func) - bundle.add_resource(f"msg = {{ {func_name}() }}") - - result, errors = bundle.format_pattern("msg") - assert not errors - assert return_value.strip() in result, f"Function return value not in result: {result}" - - @given( - func_name=st.text( - alphabet=st.characters(whitelist_categories=["Lu"]), min_size=3, max_size=10 - ), - error_message=ftl_simple_text(), - ) - def test_function_exception_handling( - self, func_name: str, error_message: str - ) -> None: - """Property: Function exceptions are handled gracefully.""" - event(f"func_name_len={len(func_name)}") - assume(func_name not in ("NUMBER", "DATETIME")) - - # strict=False: testing soft-error return semantics; function-exception - # errors must be returned in the tuple, not raised. 
- bundle = FluentBundle("en_US", strict=False) - - def failing_func() -> str: - raise ValueError(error_message) - - bundle.add_function(func_name, failing_func) - bundle.add_resource(f"msg = {{ {func_name}() }}") - - result, _errors = bundle.format_pattern("msg") - assert isinstance(result, str), "Must return string even when function fails" - - -class TestResolverIsolatingMarks: - """Properties about Unicode bidi isolation marks.""" - - @given( - var_name=ftl_identifiers(), - var_value=ftl_simple_text(), - ) - def test_isolating_marks_added_when_enabled( - self, var_name: str, var_value: str - ) -> None: - """Property: Isolation marks added around interpolated values when enabled.""" - event(f"value_len={len(var_value)}") - bundle = FluentBundle("en_US", use_isolating=True) - ftl_source = f"msg = {{ ${var_name} }}" - bundle.add_resource(ftl_source) - - result, errors = bundle.format_pattern("msg", {var_name: var_value}) - assert not errors - assert "\u2068" in result, "FSI mark missing" - assert "\u2069" in result, "PDI mark missing" - assert var_value in result, "Variable value missing" - - @given( - var_name=ftl_identifiers(), - var_value=ftl_simple_text(), - ) - def test_no_isolating_marks_when_disabled( - self, var_name: str, var_value: str - ) -> None: - """Property: No isolation marks when use_isolating=False.""" - event(f"value_len={len(var_value)}") - bundle = FluentBundle("en_US", use_isolating=False) - ftl_source = f"msg = {{ ${var_name} }}" - bundle.add_resource(ftl_source) - - result, errors = bundle.format_pattern("msg", {var_name: var_value}) - assert not errors - assert "\u2068" not in result, "FSI mark should not be present" - assert "\u2069" not in result, "PDI mark should not be present" - - -class TestResolverValueFormatting: - """Properties about value formatting.""" - - @given( - var_name=ftl_identifiers(), - int_value=st.integers(), - ) - def test_integer_formatting(self, var_name: str, int_value: int) -> None: - """Property: Integers are 
formatted correctly.""" - event(f"int_value={int_value}") - bundle = FluentBundle("en_US", use_isolating=False) - ftl_source = f"msg = {{ ${var_name} }}" - bundle.add_resource(ftl_source) - - result, errors = bundle.format_pattern("msg", {var_name: int_value}) - assert not errors - assert str(int_value) in result, f"Integer {int_value} not formatted correctly" - - @given( - var_name=ftl_identifiers(), - bool_value=st.booleans(), - ) - def test_boolean_formatting(self, var_name: str, bool_value: bool) -> None: - """Property: Booleans are formatted as lowercase 'true'/'false'.""" - event(f"bool_value={bool_value}") - bundle = FluentBundle("en_US", use_isolating=False) - ftl_source = f"msg = {{ ${var_name} }}" - bundle.add_resource(ftl_source) - - result, errors = bundle.format_pattern("msg", {var_name: bool_value}) - assert not errors - expected = "true" if bool_value else "false" - assert expected in result, f"Boolean {bool_value} not formatted correctly" - - -class TestResolverMetamorphicProperties: - """Metamorphic properties relating different resolution operations.""" - - @given( - msg_id=ftl_identifiers(), - text1=ftl_simple_text(), - text2=ftl_simple_text(), - ) - def test_concatenation_order_preserved( - self, msg_id: str, text1: str, text2: str - ) -> None: - """Property: Multiple text elements appear in order.""" - event(f"text1_len={len(text1)}") - assume(text1 != text2) - - bundle = FluentBundle("en_US", use_isolating=False) - ftl_source = f"{msg_id} = {text1} {text2}" - bundle.add_resource(ftl_source) - - result, errors = bundle.format_pattern(msg_id) - assert not errors - assert text1.strip() in result, "First text element should be present" - assert text2.strip() in result, "Second text element should be present" - - idx1 = result.find(text1.strip()) - idx2 = result.find(text2.strip()) - if idx1 != idx2: - assert idx1 < idx2, "Text elements should appear in order" - - @given( - msg_id=ftl_identifiers(), - var_name=ftl_identifiers(), - 
value1=ftl_simple_text(), - value2=ftl_simple_text(), - ) - def test_variable_value_substitution( - self, msg_id: str, var_name: str, value1: str, value2: str - ) -> None: - """Property: Changing variable value changes result.""" - event(f"values_differ={value1 != value2}") - assume(value1 != value2) - assume(value1 not in value2 and value2 not in value1) - - bundle = FluentBundle("en_US", use_isolating=False) - ftl_source = f"{msg_id} = {{ ${var_name} }}" - bundle.add_resource(ftl_source) - - result1, errors = bundle.format_pattern(msg_id, {var_name: value1}) - assert not errors - result2, errors = bundle.format_pattern(msg_id, {var_name: value2}) - assert not errors - assert result1 != result2, "Different variable values should produce different results" - - -class TestResolverErrorRecovery: - """Properties about error recovery during resolution.""" - - @given( - msg_id=ftl_identifiers(), - partial_text=ftl_simple_text(), - var_name=ftl_identifiers(), - ) - def test_partial_resolution_on_error( - self, msg_id: str, partial_text: str, var_name: str - ) -> None: - """Property: Partial resolution continues after errors.""" - event(f"text_len={len(partial_text)}") - # strict=False: testing soft-error return semantics; missing-variable - # errors must be returned in the tuple, not raised. 
- bundle = FluentBundle("en_US", use_isolating=False, strict=False) - ftl_source = f"{msg_id} = {partial_text} {{ ${var_name} }}" - bundle.add_resource(ftl_source) - - result, _errors = bundle.format_pattern(msg_id, {}) - assert partial_text.strip() in result, "Static text should be present even with missing var" - - -class TestResolverCoverageEdgeCases: - """Coverage tests for resolver edge cases.""" - - @given( - msg_id=ftl_identifiers(), - text=ftl_simple_text(), - ) - def test_placeable_error_handling_in_pattern( - self, msg_id: str, text: str - ) -> None: - """Placeable error handling in _resolve_pattern (line 142->138).""" - event(f"text_len={len(text)}") - # strict=False: testing soft-error return semantics; missing-variable - # errors must be returned in the tuple, not raised. - bundle = FluentBundle("en_US", use_isolating=False, strict=False) - ftl_source = f"{msg_id} = {text} {{ $missing }}" - bundle.add_resource(ftl_source) - - result, errors = bundle.format_pattern(msg_id, {}) - assert len(errors) > 0 - assert "{$missing}" in result - - @given( - msg_id=ftl_identifiers(), - var_name=ftl_identifiers(), - value=st.integers(), - ) - def test_nested_placeable_expression_resolution( - self, msg_id: str, var_name: str, value: int - ) -> None: - """Placeable expression resolution (line 190).""" - event(f"value={value}") - bundle = FluentBundle("en_US", use_isolating=False) - ftl_source = f"{msg_id} = Value: {{ ${var_name} }}" - bundle.add_resource(ftl_source) - - result, errors = bundle.format_pattern(msg_id, {var_name: value}) - assert not errors - assert str(value) in result +from tests.fuzz_runtime_resolver_state_machine_cases.advanced_property_based_tests import * # noqa: F403 - re-export split test surface +from tests.fuzz_runtime_resolver_state_machine_cases.direct_error_path_tests_from_state_machine_module import * # noqa: F403 - re-export split test surface +from tests.fuzz_runtime_resolver_state_machine_cases.state_machine import * # noqa: F403 - 
re-export split test surface
+from tests.fuzz_runtime_resolver_state_machine_cases.strategy_helpers import * # noqa: F403 - re-export split test surface
diff --git a/tests/fuzz/test_syntax_serializer_property.py b/tests/fuzz/test_syntax_serializer_property.py
index be110914..3f2acf98 100644
--- a/tests/fuzz/test_syntax_serializer_property.py
+++ b/tests/fuzz/test_syntax_serializer_property.py
@@ -1,1505 +1,14 @@
-"""Property-based tests for ftllexengine.syntax.serializer module.
-
-Comprehensive test suite achieving 100% coverage using Hypothesis property-based
-testing with HypoFuzz semantic coverage events.
-
-Test Properties:
-- Roundtrip: parse(serialize(ast)) preserves structure
-- Idempotence: serialize(parse(serialize(ast))) == serialize(ast)
-- Validation: Invalid ASTs raise SerializationValidationError
-- Depth: Nested ASTs respect max_depth limits
-
-Coverage Targets:
-- Lines 117-118: SelectExpression with 0 defaults
-- Lines 121-125: SelectExpression with >1 defaults
-- Branch 238: FunctionReference without arguments
-- Branch 429: Junk serialization
-- Branch 616: Placeable in pattern
-- Branch 749: SelectExpression serialization
-- Branch 804: NumberLiteral variant keys
-
-Python 3.13+.
-""" - -from __future__ import annotations - -import typing -from typing import cast - -import pytest -from hypothesis import HealthCheck, event, given, settings -from hypothesis import strategies as st - -from ftllexengine.constants import MAX_DEPTH -from ftllexengine.enums import CommentType -from ftllexengine.syntax.ast import ( - CallArguments, - Comment, - FTLLiteral, - FunctionReference, - Identifier, - Junk, - Message, - NamedArgument, - Pattern, - Placeable, - Resource, - SelectExpression, - StringLiteral, - Term, - TermReference, - TextElement, - VariableReference, -) -from ftllexengine.syntax.parser import FluentParserV1 -from ftllexengine.syntax.serializer import ( - FluentSerializer, - SerializationDepthError, - SerializationValidationError, - _classify_line, - _escape_text, - _LineKind, # Private import for property tests - serialize, -) -from tests.strategies.ftl import ( - build_invalid_select_multiple_defaults, - build_invalid_select_no_defaults, - ftl_comment_nodes, - ftl_deep_placeables, - ftl_function_references_no_args, - ftl_junk_nodes, - ftl_message_nodes, - ftl_patterns, - ftl_placeables, - ftl_resources, - ftl_select_expressions, - ftl_select_expressions_with_number_keys, - ftl_term_nodes, -) - -# ============================================================================= -# Roundtrip Properties (Core Correctness) -# ============================================================================= - - -class TestRoundtripProperties: - """Test roundtrip correctness: parse(serialize(ast)) preserves structure.""" - - @given(resource=ftl_resources()) - @settings(deadline=None, suppress_health_check=[HealthCheck.too_slow]) - def test_resource_roundtrip_preserves_structure(self, resource: Resource) -> None: - """PROPERTY: Serialized resources can be parsed back to equivalent AST. 
- - Events emitted: - - entry_count={n}: Number of entries in resource - - entry_type={type}: Type of each entry encountered - """ - # Emit entry count for HypoFuzz coverage - event(f"entry_count={len(resource.entries)}") - - # Serialize the resource - serialized = serialize(resource, validate=True) - - # Parse the serialized output - parser = FluentParserV1() - reparsed = parser.parse(serialized) - - # Emit entry types for HypoFuzz coverage - for entry in resource.entries: - event(f"entry_type={type(entry).__name__}") - - # Verify entry count preserved (no parse errors mean no Junk entries added) - assert len(reparsed.entries) == len(resource.entries) - - @given(message=ftl_message_nodes()) - def test_message_roundtrip_idempotence(self, message: Message) -> None: - """PROPERTY: serialize(parse(serialize(ast))) == serialize(ast). - - Idempotence ensures serialization is stable across multiple cycles. - - Events emitted: - - has_attributes={bool}: Whether message has attributes - - attribute_count={n}: Number of attributes - - pattern_starts_with_space={bool}: Edge case tracking - """ - # Track leading-space edge case for HypoFuzz coverage guidance. 
- pattern_value = message.value - starts_with_space = False - if pattern_value and pattern_value.elements: - first_elem = pattern_value.elements[0] - if isinstance(first_elem, TextElement) and first_elem.value.startswith(" "): - starts_with_space = True - - event(f"pattern_starts_with_space={starts_with_space}") - - resource = Resource(entries=(message,)) - - # Emit attribute coverage events - event(f"has_attributes={len(message.attributes) > 0}") - if message.attributes: - event(f"attribute_count={len(message.attributes)}") - - # First serialization - serialized1 = serialize(resource, validate=True) - - # Parse and re-serialize - parser = FluentParserV1() - reparsed = parser.parse(serialized1) - serialized2 = serialize(reparsed, validate=True) - - # Idempotence: second serialization matches first - assert serialized1 == serialized2 - - @given(term=ftl_term_nodes()) - def test_term_roundtrip_idempotence(self, term: Term) -> None: - """PROPERTY: Terms serialize idempotently. - - Events emitted: - - has_attributes={bool}: Whether term has attributes - - pattern_starts_with_space={bool}: Edge case tracking - """ - # Track leading-space edge case for HypoFuzz coverage guidance. - pattern_value = term.value - starts_with_space = False - if pattern_value and pattern_value.elements: - first_elem = pattern_value.elements[0] - if isinstance(first_elem, TextElement) and first_elem.value.startswith(" "): - starts_with_space = True - - event(f"pattern_starts_with_space={starts_with_space}") - - resource = Resource(entries=(term,)) - - event(f"has_attributes={len(term.attributes) > 0}") - - serialized1 = serialize(resource, validate=True) - - parser = FluentParserV1() - reparsed = parser.parse(serialized1) - serialized2 = serialize(reparsed, validate=True) - - assert serialized1 == serialized2 - - @given(pattern=ftl_patterns()) - def test_pattern_roundtrip_preserves_elements(self, pattern: Pattern) -> None: - """PROPERTY: Pattern serialization preserves all elements. 
- - Events emitted: - - element_count={n}: Number of elements in pattern - - element_type={type}: Type of each element - - has_placeable={bool}: Whether pattern contains placeables - """ - # Wrap pattern in a message - message = Message( - id=Identifier(name="test"), - value=pattern, - attributes=(), - ) - resource = Resource(entries=(message,)) - - # Emit pattern structure events - event(f"element_count={len(pattern.elements)}") - has_placeable = any(isinstance(e, Placeable) for e in pattern.elements) - event(f"has_placeable={has_placeable}") - - for element in pattern.elements: - event(f"element_type={type(element).__name__}") - - serialized = serialize(resource, validate=True) - - parser = FluentParserV1() - reparsed = parser.parse(serialized) - - # Verify no parse errors (no Junk entries) and correct entry count - assert len(reparsed.entries) == 1 - - -# ============================================================================= -# Validation Properties (Error Handling) -# ============================================================================= - - -class TestValidationProperties: - """Test validation error detection for invalid ASTs.""" - - def test_select_no_defaults_raises_validation_error(self) -> None: - """COVERAGE: Lines 117-118 - SelectExpression with 0 defaults.""" - - # Build invalid SelectExpression with no defaults - invalid_select = build_invalid_select_no_defaults() - - # Wrap in a message - message = Message( - id=Identifier(name="test"), - value=Pattern(elements=(Placeable(expression=invalid_select),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - # Validation should catch the error - with pytest.raises(SerializationValidationError, match="no default variant"): - serialize(resource, validate=True) - - def test_select_multiple_defaults_raises_validation_error(self) -> None: - """COVERAGE: Lines 121-125 - SelectExpression with >1 defaults.""" - - # Build invalid SelectExpression with multiple defaults - invalid_select 
= build_invalid_select_multiple_defaults() - - # Wrap in a message - message = Message( - id=Identifier(name="test"), - value=Pattern(elements=(Placeable(expression=invalid_select),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - # Validation should catch the error - with pytest.raises(SerializationValidationError, match="2 default variants"): - serialize(resource, validate=True) - - @given(message=ftl_message_nodes()) - def test_valid_ast_passes_validation(self, message: Message) -> None: - """PROPERTY: Valid ASTs pass validation without error. - - Events emitted: - - validation=passed: Successful validation - """ - resource = Resource(entries=(message,)) - - event("validation=passed") - - # Should not raise - serialized = serialize(resource, validate=True) - assert isinstance(serialized, str) - - def test_validation_can_be_disabled(self) -> None: - """COVERAGE: validate=False parameter skips validation.""" - - # Build invalid SelectExpression - invalid_select = build_invalid_select_no_defaults() - message = Message( - id=Identifier(name="test"), - value=Pattern(elements=(Placeable(expression=invalid_select),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - # Should not raise when validate=False - serialized = serialize(resource, validate=False) - assert isinstance(serialized, str) - - def test_invalid_identifier_raises_validation_error(self) -> None: - """COVERAGE: Invalid identifier validation.""" - - # Create message with invalid identifier (empty string) - # Bypass validation by using object.__new__ - identifier = object.__new__(Identifier) - object.__setattr__(identifier, "name", "") # Invalid: empty - object.__setattr__(identifier, "span", None) - - message = Message( - id=identifier, - value=Pattern(elements=(TextElement(value="Test"),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - with pytest.raises(SerializationValidationError, match="Invalid identifier"): - serialize(resource, 
validate=True) - - def test_duplicate_named_arguments_raises_validation_error(self) -> None: - """COVERAGE: Duplicate named arguments validation.""" - - # Create function call with duplicate named arguments - func_ref = FunctionReference( - id=Identifier(name="NUMBER"), - arguments=CallArguments( - positional=(), - named=( - NamedArgument( - name=Identifier(name="style"), - value=StringLiteral(value="currency"), - ), - NamedArgument( - name=Identifier(name="style"), # Duplicate! - value=StringLiteral(value="percent"), - ), - ), - ), - ) - - message = Message( - id=Identifier(name="test"), - value=Pattern(elements=(Placeable(expression=func_ref),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - with pytest.raises(SerializationValidationError, match="Duplicate named argument"): - serialize(resource, validate=True) - - def test_invalid_named_argument_value_type_raises_error(self) -> None: - """COVERAGE: Named argument value type validation.""" - - # Create function call with invalid named argument value (not literal) - func_ref = FunctionReference( - id=Identifier(name="NUMBER"), - arguments=CallArguments( - positional=(), - named=( - NamedArgument( - name=Identifier(name="style"), - value=cast("FTLLiteral", VariableReference(id=Identifier(name="var"))), - ), - ), - ), - ) - - message = Message( - id=Identifier(name="test"), - value=Pattern(elements=(Placeable(expression=func_ref),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - with pytest.raises(SerializationValidationError, match="invalid value type"): - serialize(resource, validate=True) - - -# ============================================================================= -# Depth Properties (DoS Protection) -# ============================================================================= - - -class TestDepthProperties: - """Test max_depth protection against stack overflow.""" - - @given(deep_placeable=ftl_deep_placeables(depth=5)) - def test_moderate_depth_succeeds(self, 
deep_placeable: Placeable) -> None: - """PROPERTY: Moderately nested ASTs serialize successfully. - - Events emitted: - - depth=moderate: Depth category - """ - event("depth=moderate") - - message = Message( - id=Identifier(name="test"), - value=Pattern(elements=(deep_placeable,)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - # Should succeed with default max_depth - serialized = serialize(resource, validate=True, max_depth=MAX_DEPTH) - assert isinstance(serialized, str) - - def test_extreme_depth_raises_depth_error(self) -> None: - """COVERAGE: SerializationDepthError on overflow.""" - - # Build deeply nested structure exceeding limit - # Start with innermost expression - inner_expr = VariableReference(id=Identifier(name="x")) - - # Wrap in 150 nested placeables (exceeds default MAX_DEPTH=100) - current: Placeable | VariableReference = inner_expr - for _ in range(150): - current = Placeable(expression=current) - - # After loop, current is guaranteed to be Placeable - outermost_placeable = typing.cast("Placeable", current) - - message = Message( - id=Identifier(name="test"), - value=Pattern(elements=(outermost_placeable,)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - with pytest.raises(SerializationDepthError, match="depth limit exceeded"): - serialize(resource, validate=True, max_depth=MAX_DEPTH) - - def test_custom_max_depth_respected(self) -> None: - """COVERAGE: Custom max_depth parameter.""" - - # Build structure with 10 nested placeables - inner_expr = VariableReference(id=Identifier(name="x")) - current: Placeable | VariableReference = inner_expr - for _ in range(10): - current = Placeable(expression=current) - - # After loop, current is guaranteed to be Placeable - outermost_placeable = typing.cast("Placeable", current) - - message = Message( - id=Identifier(name="test"), - value=Pattern(elements=(outermost_placeable,)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - # Should fail with 
max_depth=5 - with pytest.raises(SerializationDepthError): - serialize(resource, validate=True, max_depth=5) - - # Should succeed with max_depth=15 - serialized = serialize(resource, validate=True, max_depth=15) - assert isinstance(serialized, str) - - -# ============================================================================= -# Coverage-Targeted Tests (Branch Coverage) -# ============================================================================= - - -class TestCoverageTargeted: - """Tests targeting specific coverage gaps.""" - - @given(func_ref=ftl_function_references_no_args()) - def test_function_reference_without_arguments(self, func_ref: FunctionReference) -> None: - """COVERAGE: Branch 238 - FunctionReference without arguments. - - Events emitted: - - coverage_target=function_no_args: Branch target - """ - event("coverage_target=function_no_args") - - message = Message( - id=Identifier(name="test"), - value=Pattern(elements=(Placeable(expression=func_ref),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - serialized = serialize(resource, validate=True) - - # Should contain function name followed by empty parens - assert f"{func_ref.id.name}()" in serialized - - @given(junk=ftl_junk_nodes()) - def test_junk_serialization(self, junk: Junk) -> None: - """COVERAGE: Branch 429 - Junk serialization. 
-
-        Events emitted:
-        - coverage_target=junk: Branch target
-        - junk_has_trailing_newline={bool}: Content structure
-        """
-        event("coverage_target=junk")
-        event(f"junk_has_trailing_newline={junk.content.endswith('\\n')}")
-
-        resource = Resource(entries=(junk,))
-
-        serialized = serialize(resource, validate=False)  # Junk may be invalid
-
-        # Junk content should be preserved as-is (with trailing newline added if missing)
-        if junk.content.endswith("\n"):
-            assert junk.content in serialized
-        else:
-            assert junk.content + "\n" in serialized
-
-    @given(select_expr=ftl_select_expressions_with_number_keys())
-    def test_select_expression_number_keys(self, select_expr: SelectExpression) -> None:
-        """COVERAGE: Branch 804 - NumberLiteral variant keys.
-
-        Events emitted:
-        - coverage_target=select_number_keys: Branch target
-        """
-        event("coverage_target=select_number_keys")
-
-        message = Message(
-            id=Identifier(name="test"),
-            value=Pattern(elements=(Placeable(expression=select_expr),)),
-            attributes=(),
-        )
-        resource = Resource(entries=(message,))
-
-        serialized = serialize(resource, validate=True)
-
-        # Should contain numeric variant keys
-        assert "[0]" in serialized or "[1]" in serialized
-
-    @given(placeable=ftl_placeables())
-    def test_placeable_in_pattern(self, placeable: Placeable) -> None:
-        """COVERAGE: Branch 616 - Placeable in pattern.
-
-        Events emitted:
-        - coverage_target=placeable_in_pattern: Branch target
-        - placeable_expr_type={type}: Expression type
-        """
-        event("coverage_target=placeable_in_pattern")
-        event(f"placeable_expr_type={type(placeable.expression).__name__}")
-
-        message = Message(
-            id=Identifier(name="test"),
-            value=Pattern(elements=(placeable,)),
-            attributes=(),
-        )
-        resource = Resource(entries=(message,))
-
-        serialized = serialize(resource, validate=True)
-
-        # Should contain placeable delimiters
-        assert "{ " in serialized
-        assert " }" in serialized
-
-    @given(select_expr=ftl_select_expressions())
-    def test_select_expression_serialization(self, select_expr: SelectExpression) -> None:
-        """COVERAGE: Branch 749 - SelectExpression serialization.
-
-        Events emitted:
-        - coverage_target=select_expression: Branch target
-        - variant_count={n}: Number of variants
-        """
-        event("coverage_target=select_expression")
-
-        message = Message(
-            id=Identifier(name="test"),
-            value=Pattern(elements=(Placeable(expression=select_expr),)),
-            attributes=(),
-        )
-        resource = Resource(entries=(message,))
-
-        serialized = serialize(resource, validate=True)
-
-        # Emit variant count for HypoFuzz
-        event(f"variant_count={len(select_expr.variants)}")
-
-        # Should contain select syntax
-        assert "->" in serialized
-        # Should contain at least one default variant marker
-        assert "*[" in serialized
-
-    @given(comment=ftl_comment_nodes())
-    def test_comment_serialization(self, comment: Comment) -> None:
-        """COVERAGE: Comment serialization.
-
-        Events emitted:
-        - coverage_target=comment: Branch target
-        - comment_type={type}: Comment type
-        """
-        event("coverage_target=comment")
-        event(f"comment_type={comment.type.name}")
-
-        resource = Resource(entries=(comment,))
-
-        serialized = serialize(resource, validate=False)
-
-        # Should contain comment prefix
-        assert "#" in serialized
-
-
-# =============================================================================
-# Serializer Class Tests (Direct Class Usage)
-# =============================================================================
-
-
-class TestFluentSerializerClass:
-    """Test FluentSerializer class directly (not just convenience function)."""
-
-    @given(resource=ftl_resources())
-    def test_serializer_instance_reusable(self, resource: Resource) -> None:
-        """PROPERTY: FluentSerializer instances are reusable (thread-safe).
-
-        Events emitted:
-        - serializer=reused: Reuse tracking
-        """
-        event("serializer=reused")
-
-        serializer = FluentSerializer()
-
-        # Use same instance twice
-        result1 = serializer.serialize(resource, validate=True)
-        result2 = serializer.serialize(resource, validate=True)
-
-        # Should produce identical results (no state mutation)
-        assert result1 == result2
-
-    @given(message=ftl_message_nodes())
-    def test_serializer_matches_convenience_function(self, message: Message) -> None:
-        """PROPERTY: FluentSerializer.serialize() == serialize().
-
-        Events emitted:
-        - serializer=class_vs_function: Comparison tracking
-        """
-        event("serializer=class_vs_function")
-
-        resource = Resource(entries=(message,))
-
-        serializer = FluentSerializer()
-        class_result = serializer.serialize(resource, validate=True)
-        func_result = serialize(resource, validate=True)
-
-        assert class_result == func_result
-
-
-# =============================================================================
-# Special Character Handling Tests
-# =============================================================================
-
-
-class TestSpecialCharacterHandling:
-    """Test proper escaping and handling of special characters."""
-
-    @given(
-        text=st.text(
-            alphabet=st.characters(
-                blacklist_categories=["Cs", "Cc"],  # Surrogates and control
-                blacklist_characters=["\x00"],  # Null
-            ),
-            min_size=1,
-            max_size=50,
-        )
-    )
-    def test_string_literal_escaping_roundtrip(self, text: str) -> None:
-        """PROPERTY: String literals with special chars roundtrip correctly.
-
-        Events emitted:
-        - has_backslash={bool}: Contains backslash
-        - has_quote={bool}: Contains quote
-        - has_newline={bool}: Contains newline
-        """
-        has_backslash = "\\\\" in text
-        has_quote = '"' in text
-        has_newline = "\\n" in text
-        event(f"has_backslash={has_backslash}")
-        event(f"has_quote={has_quote}")
-        event(f"has_newline={has_newline}")
-
-        string_lit = StringLiteral(value=text)
-        message = Message(
-            id=Identifier(name="test"),
-            value=Pattern(elements=(Placeable(expression=string_lit),)),
-            attributes=(),
-        )
-        resource = Resource(entries=(message,))
-
-        serialized = serialize(resource, validate=True)
-
-        parser = FluentParserV1()
-        reparsed = parser.parse(serialized)
-
-        # Verify no parse errors (no Junk entries means successful parse)
-        assert len(reparsed.entries) > 0
-
-    def test_brace_escaping_as_placeable(self) -> None:
-        """COVERAGE: Braces must be escaped as placeables."""
-
-        # Braces in text are represented as Placeable(StringLiteral)
-        pattern = Pattern(
-            elements=(
-                TextElement(value="Start "),
-                Placeable(expression=StringLiteral(value="{")),
-                TextElement(value=" middle "),
-                Placeable(expression=StringLiteral(value="}")),
-                TextElement(value=" end"),
-            )
-        )
-
-        message = Message(id=Identifier(name="test"), value=pattern, attributes=())
-        resource = Resource(entries=(message,))
-
-        serialized = serialize(resource, validate=True)
-
-        # Should contain escaped braces as placeables
-        assert '{ "{" }' in serialized
-        assert '{ "}" }' in serialized
-
-    def test_multiline_pattern_indentation(self) -> None:
-        """COVERAGE: Multiline patterns get proper indentation."""
-
-        # Pattern with embedded newline
-        pattern = Pattern(
-            elements=(
-                TextElement(value="Line 1\n"),
-                TextElement(value="Line 2"),
-            )
-        )
-
-        message = Message(id=Identifier(name="test"), value=pattern, attributes=())
-        resource = Resource(entries=(message,))
-
-        serialized = serialize(resource, validate=True)
-
-        # Should contain structural indentation after newline
-        assert "Line 1\n Line 2" in serialized
-
-
-# =============================================================================
-# _classify_line Property Tests
-# =============================================================================
-
-
-# Characters syntactically significant at continuation line start in FTL
-_SYNTAX_CHARS = ".[*"
-
-
-class TestClassifyLineProperties:
-    """Property-based tests for _classify_line pure function.
-
-    Properties verified:
-    - EMPTY iff empty string
-    - WHITESPACE_ONLY iff all spaces and non-empty
-    - SYNTAX_LEADING iff first non-ws char is in {., *, [}
-    - ws_len is always non-negative
-    - Classification is exhaustive (always one of 4 kinds)
-    """
-
-    @given(line=st.text(
-        alphabet=st.characters(
-            codec="utf-8", categories=("L", "N", "P", "S", "Z")
-        ),
-        min_size=0,
-        max_size=80,
-    ))
-    def test_output_is_valid_kind(self, line: str) -> None:
-        """_classify_line always returns a valid _LineKind."""
-        kind, ws_len = _classify_line(line)
-        kind_name = kind.name
-        event(f"kind={kind_name}")
-        assert isinstance(kind, _LineKind)
-        assert ws_len >= 0
-
-    @given(line=st.text(
-        alphabet=st.characters(
-            codec="utf-8", categories=("L", "N", "P", "S", "Z")
-        ),
-        min_size=0,
-        max_size=80,
-    ))
-    def test_empty_iff_empty_string(self, line: str) -> None:
-        """EMPTY kind iff input is the empty string."""
-        kind, _ = _classify_line(line)
-        is_empty = kind is _LineKind.EMPTY
-        event(f"empty={is_empty}")
-        assert is_empty == (line == "")
-
-    @given(n=st.integers(min_value=1, max_value=20))
-    def test_whitespace_only_for_space_strings(self, n: int) -> None:
-        """Strings of only spaces classify as WHITESPACE_ONLY."""
-        line = " " * n
-        kind, ws_len = _classify_line(line)
-        event(f"spaces={n}")
-        assert kind is _LineKind.WHITESPACE_ONLY
-        assert ws_len == 0
-
-    @given(
-        ws=st.integers(min_value=0, max_value=10),
-        syntax_char=st.sampled_from(list(_SYNTAX_CHARS)),
-        suffix=st.text(min_size=0, max_size=20),
-    )
-    def test_syntax_leading_classification(
-        self, ws: int, syntax_char: str, suffix: str
-    ) -> None:
-        """Lines starting with (optional ws + syntax char) are SYNTAX_LEADING."""
-        line = " " * ws + syntax_char + suffix
-        kind, ws_len = _classify_line(line)
-        event(f"syntax_char={syntax_char}")
-        event(f"ws_prefix={ws}")
-        assert kind is _LineKind.SYNTAX_LEADING
-        assert ws_len == ws
-
-    @given(
-        ws=st.integers(min_value=0, max_value=10),
-        first_char=st.characters(
-            codec="utf-8",
-            categories=("L", "N"),
-        ),
-        suffix=st.text(min_size=0, max_size=20),
-    )
-    def test_normal_for_non_syntax_first_char(
-        self, ws: int, first_char: str, suffix: str
-    ) -> None:
-        """Lines where first non-ws char is not syntax are NORMAL."""
-        line = " " * ws + first_char + suffix
-        kind, _ = _classify_line(line)
-        event(f"kind={kind.name}")
-        assert kind is _LineKind.NORMAL
-
-
-# =============================================================================
-# _escape_text Property Tests
-# =============================================================================
-
-
-class TestEscapeTextProperties:
-    """Property-based tests for _escape_text brace escaping.
-
-    Properties verified:
-    - Content preserved: unescaping the result recovers the original
-    - No raw braces in non-placeable positions
-    """
-
-    @given(text=st.text(min_size=0, max_size=100))
-    def test_content_roundtrip(self, text: str) -> None:
-        """Unescaping placeable wrappers recovers original text."""
-        output: list[str] = []
-        _escape_text(text, output)
-        result = "".join(output)
-        has_braces = "{" in text or "}" in text
-        event(f"has_braces={has_braces}")
-        event(f"length={len(text)}")
-        # Reverse the escaping
-        recovered = result.replace('{ "{" }', "{").replace('{ "}" }', "}")
-        assert recovered == text
-
-    @given(text=st.text(
-        alphabet=st.characters(
-            codec="utf-8",
-            exclude_characters="{}",
-        ),
-        min_size=0,
-        max_size=100,
-    ))
-    def test_no_transformation_without_braces(self, text: str) -> None:
-        """Text without braces passes through unchanged."""
-        output: list[str] = []
-        _escape_text(text, output)
-        result = "".join(output)
-        event(f"length={len(text)}")
-        assert result == text
-
-
-# =============================================================================
-# Call Argument Depth Properties (Depth Guard in Arguments)
-# =============================================================================
-
-
-class TestCallArgumentDepthProperties:
-    """Test depth guard enforcement within call arguments.
-
-    Serializer wraps each positional and named argument expression
-    in depth_guard. Nested term/function calls must respect limits.
-    """
-
-    @given(depth=st.integers(min_value=1, max_value=8))
-    def test_nested_call_arguments_serialize(
-        self, depth: int
-    ) -> None:
-        """PROPERTY: Nested call arguments within limits serialize.
-
-        Events emitted:
-        - call_arg_depth={n}: Nesting depth of call arguments
-        - outcome=nested_args_ok: Serialization succeeded
-        """
-        event(f"call_arg_depth={depth}")
-
-        # Build: NUMBER(-t0(-t1(-t2(...$x...))))
-        inner: VariableReference | TermReference
-        inner = VariableReference(id=Identifier(name="x"))
-        for i in range(depth):
-            inner = TermReference(
-                id=Identifier(name=f"t{i}"),
-                arguments=CallArguments(
-                    positional=(inner,), named=()
-                ),
-            )
-        func = FunctionReference(
-            id=Identifier(name="NUMBER"),
-            arguments=CallArguments(
-                positional=(inner,), named=()
-            ),
-        )
-        msg = Message(
-            id=Identifier(name="test"),
-            value=Pattern(
-                elements=(Placeable(expression=func),)
-            ),
-            attributes=(),
-        )
-        resource = Resource(entries=(msg,))
-
-        result = serialize(resource, validate=True)
-        event("outcome=nested_args_ok")
-        assert "-t0(" in result
-        assert "$x" in result
-
-    def test_deep_call_args_exceed_depth_limit(self) -> None:
-        """Deeply nested call arguments exceed depth limit."""
-        inner: VariableReference | TermReference
-        inner = VariableReference(id=Identifier(name="x"))
-        for i in range(20):
-            inner = TermReference(
-                id=Identifier(name=f"t{i}"),
-                arguments=CallArguments(
-                    positional=(inner,), named=()
-                ),
-            )
-        func = FunctionReference(
-            id=Identifier(name="NUMBER"),
-            arguments=CallArguments(
-                positional=(inner,), named=()
-            ),
-        )
-        msg = Message(
-            id=Identifier(name="test"),
-            value=Pattern(
-                elements=(Placeable(expression=func),)
-            ),
-            attributes=(),
-        )
-        resource = Resource(entries=(msg,))
-
-        with pytest.raises(SerializationDepthError):
-            serialize(resource, validate=True, max_depth=10)
-
-    @given(
-        depth=st.integers(min_value=1, max_value=5),
-        named_val=st.sampled_from(["decimal", "percent"]),
-    )
-    def test_named_args_in_nested_calls(
-        self, depth: int, named_val: str
-    ) -> None:
-        """PROPERTY: Named arguments in nested calls serialize.
-
-        Events emitted:
-        - call_arg_depth={n}: Nesting depth
-        - has_named_arg=True: Named argument present
-        """
-        event(f"call_arg_depth={depth}")
-        event("has_named_arg=True")
-
-        inner: VariableReference | TermReference
-        inner = VariableReference(id=Identifier(name="x"))
-        for i in range(depth):
-            named = NamedArgument(
-                name=Identifier(name="style"),
-                value=StringLiteral(value=named_val),
-            )
-            inner = TermReference(
-                id=Identifier(name=f"t{i}"),
-                arguments=CallArguments(
-                    positional=(inner,), named=(named,)
-                ),
-            )
-        func = FunctionReference(
-            id=Identifier(name="NUMBER"),
-            arguments=CallArguments(
-                positional=(inner,), named=()
-            ),
-        )
-        msg = Message(
-            id=Identifier(name="test"),
-            value=Pattern(
-                elements=(Placeable(expression=func),)
-            ),
-            attributes=(),
-        )
-        resource = Resource(entries=(msg,))
-
-        result = serialize(resource, validate=True)
-        assert f'style: "{named_val}"' in result
-
-
-# =============================================================================
-# Control Character StringLiteral Properties
-# =============================================================================
-
-
-class TestControlCharStringLiteralProperties:
-    """Test StringLiteral escaping for all control characters.
-
-    Serializer uses \\uHHHH for chars < 0x20 and 0x7F. Verify
-    this encoding for the full control character range.
-    """
-
-    @given(
-        code=st.integers(min_value=0, max_value=0x1F),
-    )
-    def test_c0_control_chars_escaped(self, code: int) -> None:
-        """PROPERTY: C0 control chars (0x00-0x1F) use \\uHHHH.
-
-        Events emitted:
-        - control_char_code={n}: Character code point
-        - outcome=control_char_escaped: Escape verified
-        """
-        event(f"control_char_code={code}")
-
-        char = chr(code)
-        lit = StringLiteral(value=f"a{char}b")
-        msg = Message(
-            id=Identifier(name="test"),
-            value=Pattern(
-                elements=(Placeable(expression=lit),)
-            ),
-            attributes=(),
-        )
-        resource = Resource(entries=(msg,))
-
-        result = serialize(resource, validate=True)
-        expected_escape = f"\\u{code:04X}"
-        assert expected_escape in result
-        event("outcome=control_char_escaped")
-
-    def test_del_char_escaped(self) -> None:
-        """DEL character (0x7F) uses \\u007F encoding."""
-        lit = StringLiteral(value="a\x7Fb")
-        msg = Message(
-            id=Identifier(name="test"),
-            value=Pattern(
-                elements=(Placeable(expression=lit),)
-            ),
-            attributes=(),
-        )
-        resource = Resource(entries=(msg,))
-
-        result = serialize(resource, validate=True)
-        assert "\\u007F" in result
-
-    @given(
-        code=st.sampled_from(
-            [0x00, 0x01, 0x08, 0x09, 0x0A, 0x0C, 0x0D,
-             0x1B, 0x1F, 0x7F]
-        ),
-    )
-    def test_control_char_roundtrip(self, code: int) -> None:
-        """PROPERTY: Control chars roundtrip through parse/serialize.
-
-        Events emitted:
-        - control_char_code={n}: Character code point
-        - outcome=control_roundtrip_ok: Roundtrip succeeded
-        """
-        event(f"control_char_code={code}")
-
-        char = chr(code)
-        lit = StringLiteral(value=f"x{char}y")
-        msg = Message(
-            id=Identifier(name="test"),
-            value=Pattern(
-                elements=(Placeable(expression=lit),)
-            ),
-            attributes=(),
-        )
-        resource = Resource(entries=(msg,))
-
-        serialized = serialize(resource, validate=True)
-        parser = FluentParserV1()
-        reparsed = parser.parse(serialized)
-        assert len(reparsed.entries) == 1
-        assert not any(
-            isinstance(e, Junk) for e in reparsed.entries
-        )
-        event("outcome=control_roundtrip_ok")
-
-
-# =============================================================================
-# Entry Sequencing Properties (Junk/Comment/Message ordering)
-# =============================================================================
-
-
-class TestEntrySequencingProperties:
-    """Test blank-line insertion logic for mixed entry sequences.
-
-    Serializer handles spacing between entries: extra blank lines
-    for adjacent comments of same type, Junk with leading
-    whitespace, Message/Term compact separation.
-    """
-
-    @given(
-        data=st.data(),
-        count=st.integers(min_value=2, max_value=5),
-    )
-    @settings(deadline=None, suppress_health_check=[HealthCheck.too_slow])
-    def test_mixed_entry_sequences_parseable(
-        self, data: st.DataObject, count: int
-    ) -> None:
-        """PROPERTY: Mixed entry sequences serialize to parseable FTL.
-
-        Events emitted:
-        - entry_count={n}: Number of entries
-        - has_junk={bool}: Whether Junk entries present
-        - has_comment={bool}: Whether Comment entries present
-        - outcome=sequence_parseable: Output parses without error
-        """
-        event(f"entry_count={count}")
-
-        entries: list[Message | Term | Comment | Junk] = []
-        seen_ids: set[str] = set()
-        has_junk = False
-        has_comment = False
-
-        for i in range(count):
-            choice = data.draw(
-                st.sampled_from(
-                    ["message", "term", "comment", "junk"]
-                )
-            )
-            if choice == "message":
-                name = f"msg{i}"
-                if name not in seen_ids:
-                    seen_ids.add(name)
-                    entries.append(
-                        Message(
-                            id=Identifier(name=name),
-                            value=Pattern(
-                                elements=(
-                                    TextElement(value="val"),
-                                )
-                            ),
-                            attributes=(),
-                        )
-                    )
-            elif choice == "term":
-                name = f"term{i}"
-                if name not in seen_ids:
-                    seen_ids.add(name)
-                    entries.append(
-                        Term(
-                            id=Identifier(name=name),
-                            value=Pattern(
-                                elements=(
-                                    TextElement(value="val"),
-                                )
-                            ),
-                            attributes=(),
-                        )
-                    )
-            elif choice == "comment":
-                has_comment = True
-                ctype = data.draw(
-                    st.sampled_from([
-                        CommentType.COMMENT,
-                        CommentType.GROUP,
-                        CommentType.RESOURCE,
-                    ])
-                )
-                entries.append(
-                    Comment(
-                        content=f"comment {i}",
-                        type=ctype,
-                    )
-                )
-            else:
-                has_junk = True
-                entries.append(
-                    Junk(content=f"junk line {i}\n")
-                )
-
-        event(f"has_junk={has_junk}")
-        event(f"has_comment={has_comment}")
-
-        if not entries:
-            return
-
-        resource = Resource(entries=tuple(entries))
-        result = serialize(resource, validate=False)
-
-        parser = FluentParserV1()
-        reparsed = parser.parse(result)
-        assert len(reparsed.entries) > 0
-        event("outcome=sequence_parseable")
-
-    @given(
-        junk_count=st.integers(min_value=1, max_value=3),
-        msg_count=st.integers(min_value=1, max_value=3),
-    )
-    def test_junk_between_messages(
-        self, junk_count: int, msg_count: int
-    ) -> None:
-        """PROPERTY: Junk interleaved with Messages serializes.
-
-        Events emitted:
-        - junk_count={n}: Number of Junk entries
-        - msg_count={n}: Number of Message entries
-        - outcome=junk_interleaved_ok: Serialization succeeded
-        """
-        event(f"junk_count={junk_count}")
-        event(f"msg_count={msg_count}")
-
-        entries: list[Message | Junk] = []
-        for i in range(msg_count):
-            entries.append(
-                Message(
-                    id=Identifier(name=f"m{i}"),
-                    value=Pattern(
-                        elements=(TextElement(value="v"),)
-                    ),
-                    attributes=(),
-                )
-            )
-            if i < junk_count:
-                entries.append(
-                    Junk(content=f"bad syntax {i}\n")
-                )
-
-        resource = Resource(entries=tuple(entries))
-        result = serialize(resource, validate=False)
-        assert isinstance(result, str)
-        assert len(result) > 0
-        event("outcome=junk_interleaved_ok")
-
-    def test_adjacent_same_type_comments_separated(
-        self,
-    ) -> None:
-        """Adjacent same-type comments get extra blank line."""
-        entries = (
-            Comment(content="first", type=CommentType.COMMENT),
-            Comment(content="second", type=CommentType.COMMENT),
-        )
-        resource = Resource(entries=entries)
-        result = serialize(resource, validate=False)
-        # Double newline separates same-type comments
-        assert "\n\n" in result
-
-
-# =============================================================================
-# SYNTAX_LEADING Roundtrip Properties (Full Path)
-# =============================================================================
-
-
-class TestSyntaxLeadingRoundtripProperties:
-    """Test full serialize-parse-serialize for syntax-leading lines.
-
-    Continuation lines starting with . * [ need wrapping as
-    StringLiteral placeables to prevent parser misinterpretation.
-    """
-
-    _parser = FluentParserV1()
-
-    @given(
-        syntax_char=st.sampled_from([".", "*", "["]),
-        ws=st.integers(min_value=0, max_value=6),
-        suffix=st.text(
-            alphabet=st.characters(
-                codec="utf-8",
-                categories=("L", "N"),
-            ),
-            min_size=0,
-            max_size=20,
-        ),
-    )
-    def test_syntax_leading_roundtrip(
-        self, syntax_char: str, ws: int, suffix: str
-    ) -> None:
-        """PROPERTY: Syntax-leading continuation lines roundtrip.
-
-        Events emitted:
-        - syntax_char={char}: Which syntax character
-        - ws_prefix={n}: Leading whitespace before syntax char
-        - has_suffix={bool}: Whether trailing text follows
-        - line_kind=SYNTAX_LEADING: Confirm classification
-        """
-        event(f"syntax_char={syntax_char}")
-        event(f"ws_prefix={ws}")
-        has_suffix = len(suffix) > 0
-        event(f"has_suffix={has_suffix}")
-
-        line = " " * ws + syntax_char + suffix
-        kind, _ = _classify_line(line)
-        event(f"line_kind={kind.name}")
-
-        text_val = f"line1\n{line}"
-        msg = Message(
-            id=Identifier(name="test"),
-            value=Pattern(
-                elements=(TextElement(value=text_val),)
-            ),
-            attributes=(),
-        )
-        resource = Resource(entries=(msg,))
-
-        result = serialize(resource, validate=True)
-
-        # Must contain the syntax char wrapped as placeable
-        escaped = f'{{ "{syntax_char}" }}'
-        assert escaped in result
-
-        # Parse: no Junk entries
-        reparsed = self._parser.parse(result)
-        assert not any(
-            isinstance(e, Junk)
-            for e in reparsed.entries
-        )
-
-    @given(
-        syntax_char=st.sampled_from([".", "*", "["]),
-    )
-    def test_syntax_char_only_roundtrip(
-        self, syntax_char: str
-    ) -> None:
-        """PROPERTY: Line with only syntax char roundtrips.
-
-        Events emitted:
-        - syntax_char={char}: Which syntax character
-        - line_kind=SYNTAX_LEADING: Classification
-        - has_suffix=False: No trailing text
-        """
-        event(f"syntax_char={syntax_char}")
-        event("has_suffix=False")
-
-        kind, _ = _classify_line(syntax_char)
-        event(f"line_kind={kind.name}")
-
-        text_val = f"first line\n{syntax_char}"
-        msg = Message(
-            id=Identifier(name="test"),
-            value=Pattern(
-                elements=(TextElement(value=text_val),)
-            ),
-            attributes=(),
-        )
-        resource = Resource(entries=(msg,))
-
-        result = serialize(resource, validate=True)
-        escaped = f'{{ "{syntax_char}" }}'
-        assert escaped in result
-
-        reparsed = self._parser.parse(result)
-        assert not any(
-            isinstance(e, Junk)
-            for e in reparsed.entries
-        )
-
-    @given(
-        n_spaces=st.integers(min_value=1, max_value=10),
-    )
-    def test_whitespace_only_continuation_roundtrip(
-        self, n_spaces: int
-    ) -> None:
-        """PROPERTY: Whitespace-only continuation lines roundtrip.
-
-        Events emitted:
-        - spaces={n}: Number of spaces
-        - line_kind=WHITESPACE_ONLY: Classification
-        """
-        event(f"spaces={n_spaces}")
-
-        ws_line = " " * n_spaces
-        kind, _ = _classify_line(ws_line)
-        event(f"line_kind={kind.name}")
-
-        text_val = f"first line\n{ws_line}\nthird line"
-        msg = Message(
-            id=Identifier(name="test"),
-            value=Pattern(
-                elements=(TextElement(value=text_val),)
-            ),
-            attributes=(),
-        )
-        resource = Resource(entries=(msg,))
-
-        result = serialize(resource, validate=True)
-        # Whitespace-only wrapped as placeable
-        assert f'{{ "{ws_line}" }}' in result
-
-        reparsed = self._parser.parse(result)
-        assert not any(
-            isinstance(e, Junk)
-            for e in reparsed.entries
-        )
-
-
-# =============================================================================
-# Separate-Line Trigger Discrimination
-# =============================================================================
-
-
-class TestSeparateLineTriggerProperties:
-    """Test separate-line mode trigger discrimination.
-
-    Two distinct triggers exist:
-    1. Cross-element: TextElement starts with space after
-       element ending with newline.
-    2. Intra-element: Single TextElement has embedded newline
-       followed by space on a NORMAL line.
-    """
-
-    @given(
-        n_spaces=st.integers(min_value=1, max_value=8),
-    )
-    def test_cross_element_trigger(
-        self, n_spaces: int
-    ) -> None:
-        """PROPERTY: Cross-element whitespace triggers separate-line.
-
-        Events emitted:
-        - trigger=cross_element: Trigger type
-        - leading_spaces={n}: Number of leading spaces
-        """
-        event("trigger=cross_element")
-        event(f"leading_spaces={n_spaces}")
-
-        # Element 1 ends with newline, element 2 starts with
-        # spaces — triggers separate-line mode.
-        elems = (
-            TextElement(value="line one\n"),
-            TextElement(value=" " * n_spaces + "line two"),
-        )
-        msg = Message(
-            id=Identifier(name="test"),
-            value=Pattern(elements=elems),
-            attributes=(),
-        )
-        resource = Resource(entries=(msg,))
-        result = serialize(resource, validate=True)
-        # Separate-line: pattern on new line after =
-        assert "test = \n " in result
-
-    @given(
-        n_spaces=st.integers(min_value=1, max_value=8),
-    )
-    def test_intra_element_trigger(
-        self, n_spaces: int
-    ) -> None:
-        """PROPERTY: Intra-element whitespace triggers separate-line.
-
-        Events emitted:
-        - trigger=intra_element: Trigger type
-        - leading_spaces={n}: Number of leading spaces
-        """
-        event("trigger=intra_element")
-        event(f"leading_spaces={n_spaces}")
-
-        # Single element with embedded \n + spaces + NORMAL char
-        text_val = f"line one\n{' ' * n_spaces}line two"
-        msg = Message(
-            id=Identifier(name="test"),
-            value=Pattern(
-                elements=(TextElement(value=text_val),)
-            ),
-            attributes=(),
-        )
-        resource = Resource(entries=(msg,))
-        result = serialize(resource, validate=True)
-        # Separate-line: pattern on new line after =
-        assert "test = \n " in result
-
-    @given(
-        syntax_char=st.sampled_from([".", "*", "["]),
-        n_spaces=st.integers(min_value=1, max_value=6),
-    )
-    def test_syntax_leading_does_not_trigger_separate_line(
-        self, syntax_char: str, n_spaces: int
-    ) -> None:
-        """PROPERTY: SYNTAX_LEADING lines DON'T trigger separate-line.
-
-        Events emitted:
-        - trigger=syntax_not_separate: Negative case
-        - syntax_char={char}: Which syntax char
-        """
-        event("trigger=syntax_not_separate")
-        event(f"syntax_char={syntax_char}")
-
-        # Embedded \n + spaces + syntax char => SYNTAX_LEADING,
-        # which is handled by per-line wrapping, NOT separate-line.
-        line = " " * n_spaces + syntax_char + "rest"
-        text_val = f"line one\n{line}"
-        msg = Message(
-            id=Identifier(name="test"),
-            value=Pattern(
-                elements=(TextElement(value=text_val),)
-            ),
-            attributes=(),
-        )
-        resource = Resource(entries=(msg,))
-        result = serialize(resource, validate=True)
-        # Should NOT use separate-line mode
-        assert result.startswith("test = ")
-        assert not result.startswith("test = \n")
-
-
-# =============================================================================
-# Mark as fuzz tests for selective execution
-# =============================================================================
-
-pytestmark = pytest.mark.fuzz
+"""Aggregated fuzz syntax serializer property test surface."""
+
+from tests.fuzz_syntax_serializer_property_cases.call_argument_depth_properties_depth_guard_in_arguments import *  # noqa: F403 - re-export split test surface
+from tests.fuzz_syntax_serializer_property_cases.control_character_string_literal_properties import *  # noqa: F403 - re-export split test surface
+from tests.fuzz_syntax_serializer_property_cases.coverage_targeted_tests_branch_coverage import *  # noqa: F403 - re-export split test surface
+from tests.fuzz_syntax_serializer_property_cases.depth_properties_do_s_protection import *  # noqa: F403 - re-export split test surface
+from tests.fuzz_syntax_serializer_property_cases.entry_sequencing_properties_junk_comment_message_ordering import *  # noqa: F403 - re-export split test surface
+from tests.fuzz_syntax_serializer_property_cases.escape_text_property_tests import *  # noqa: F403 - re-export split test surface
+from tests.fuzz_syntax_serializer_property_cases.roundtrip_properties_core_correctness import *  # noqa: F403 - re-export split test surface
+from tests.fuzz_syntax_serializer_property_cases.separate_line_trigger_discrimination import *  # noqa: F403 - re-export split test surface
+from tests.fuzz_syntax_serializer_property_cases.serializer_class_tests_direct_class_usage import *  # noqa: F403 - re-export split test surface
+from tests.fuzz_syntax_serializer_property_cases.special_character_handling_tests import *  # noqa: F403 - re-export split test surface
+from tests.fuzz_syntax_serializer_property_cases.syntax_leading_roundtrip_properties_full_path import *  # noqa: F403 - re-export split test surface
+from tests.fuzz_syntax_serializer_property_cases.validation_properties_error_handling import *  # noqa: F403 - re-export split test surface
diff --git a/tests/fuzz_localization_property_cases/__init__.py b/tests/fuzz_localization_property_cases/__init__.py
new file mode 100644
index 00000000..90c4a4d9
--- /dev/null
+++ b/tests/fuzz_localization_property_cases/__init__.py
@@ -0,0 +1,72 @@
+"""Property-based tests for FluentLocalization orchestration layer.
+
+Covers multi-locale orchestration, data type invariants, fallback semantics,
+and API surface completeness using Hypothesis strategies from
+tests/strategies/localization.
+
+Fuzz module: all @given tests emit hypothesis.event() for HypoFuzz guidance.
+
+Python 3.13+.
+"""
+
+from __future__ import annotations
+
+from decimal import Decimal
+from pathlib import Path
+
+import pytest
+from hypothesis import HealthCheck, event, given, settings
+from hypothesis import strategies as st
+
+from ftllexengine.core.locale_utils import normalize_locale
+from ftllexengine.localization import (
+    FluentLocalization,
+    LoadStatus,
+    LoadSummary,
+    PathResourceLoader,
+    ResourceLoadResult,
+)
+from ftllexengine.runtime.cache_config import CacheConfig
+from ftllexengine.syntax.ast import Junk, Span
+from tests.strategies.ftl import ftl_simple_messages
+from tests.strategies.localization import (
+    DictResourceLoader,
+    FailingResourceLoader,
+    ftl_messages_with_attributes,
+    ftl_messages_with_terms,
+    ftl_resource_sets,
+    locale_chains,
+    message_ids,
+    resource_loaders,
+)
+
+pytestmark = pytest.mark.fuzz
+
+__all__ = [
+    "CacheConfig",
+    "Decimal",
+    "DictResourceLoader",
+    "FailingResourceLoader",
+    "FluentLocalization",
+    "HealthCheck",
+    "Junk",
+    "LoadStatus",
+    "LoadSummary",
+    "Path",
+    "PathResourceLoader",
+    "ResourceLoadResult",
+    "Span",
+    "event",
+    "ftl_messages_with_attributes",
+    "ftl_messages_with_terms",
+    "ftl_resource_sets",
+    "ftl_simple_messages",
+    "given",
+    "locale_chains",
+    "message_ids",
+    "normalize_locale",
+    "pytest",
+    "resource_loaders",
+    "settings",
+    "st",
+]
diff --git a/tests/fuzz_localization_property_cases/add_function_deferred_application.py b/tests/fuzz_localization_property_cases/add_function_deferred_application.py
new file mode 100644
index 00000000..67a2c8ac
--- /dev/null
+++ b/tests/fuzz_localization_property_cases/add_function_deferred_application.py
@@ -0,0 +1,47 @@
+# mypy: ignore-errors
+"""Split test cases from tests/fuzz/test_localization_property.py."""
+
+from tests.fuzz_localization_property_cases import *  # noqa: F403 - shared split test support
+
+# ---------------------------------------------------------------------------
+# add_function deferred application
+# ---------------------------------------------------------------------------
+
+
+class TestAddFunctionDeferred:
+    """Tests for add_function deferred/immediate application."""
+
+    @given(locales=locale_chains(min_size=1, max_size=3))
+    def test_function_applied_to_existing_bundles(
+        self, locales: list[str],
+    ) -> None:
+        """add_function applies to already-created bundles."""
+        event(f"locale_count={len(locales)}")
+        l10n = FluentLocalization(locales, use_isolating=False)
+        # Create bundles by adding resources
+        for locale in locales:
+            l10n.add_resource(locale, "msg = { UPPER($x) }\n")
+
+        def upper_fn(value: str) -> str:
+            return value.upper()
+
+        l10n.add_function("UPPER", upper_fn)
+        result, _ = l10n.format_value("msg", {"x": "test"})
+        assert "TEST" in result
+
+    @given(locales=locale_chains(min_size=2, max_size=3))
+    def test_function_stored_for_lazy_bundles(
+        self, locales: list[str],
+    ) -> None:
+        """add_function stored for bundles created later."""
+        event("outcome=deferred")
+        l10n = FluentLocalization(locales, use_isolating=False)
+
+        def lower_fn(value: str) -> str:
+            return value.lower()
+
+        l10n.add_function("LOWER", lower_fn)
+        # Add resource and format after function registration
+        l10n.add_resource(locales[0], "msg = { LOWER($x) }\n")
+        result, _ = l10n.format_value("msg", {"x": "HELLO"})
+        assert "hello" in result
diff --git a/tests/fuzz_localization_property_cases/add_resource_stream_oracle_streaming_buffered_for_fluent_localization.py b/tests/fuzz_localization_property_cases/add_resource_stream_oracle_streaming_buffered_for_fluent_localization.py
new file mode 100644
index 00000000..67b1c2ef
--- /dev/null
+++ b/tests/fuzz_localization_property_cases/add_resource_stream_oracle_streaming_buffered_for_fluent_localization.py
@@ -0,0 +1,98 @@
+# mypy: ignore-errors
+"""Split test cases from tests/fuzz/test_localization_property.py."""
+
+from tests.fuzz_localization_property_cases import *  # noqa: F403 - shared split test support
+
+# ---------------------------------------------------------------------------
+# add_resource_stream oracle: streaming == buffered for FluentLocalization
+# ---------------------------------------------------------------------------
+
+
+class TestAddResourceStreamLocalizationOracle:
+    """Oracle: FluentLocalization.add_resource_stream is equivalent to add_resource.
+
+    Properties verified:
+    - Message IDs registered via stream match those registered via buffered load.
+    - format_pattern results are identical across both loading paths.
+    - Junk entry counts match for the same FTL content.
+    - Second call to add_resource_stream (bundle already exists) behaves correctly.
+    """
+
+    @given(
+        locales=locale_chains(min_size=1, max_size=3),
+        source=ftl_simple_messages(),
+    )
+    def test_message_ids_match_add_resource(
+        self, locales: list[str], source: str
+    ) -> None:
+        """Stream-loaded message IDs equal buffered-loaded IDs for same FTL."""
+        l_buf = FluentLocalization(locales, use_isolating=False, strict=False)
+        l_str = FluentLocalization(locales, use_isolating=False, strict=False)
+        locale = locales[0]
+
+        l_buf.add_resource(locale, source)
+        l_str.add_resource_stream(locale, source.splitlines(keepends=True))
+
+        # Format the same message IDs from both — stream must produce same results
+        buf_ids: set[str] = set()
+        for msg_id in source.splitlines():
+            if " = " in msg_id:
+                buf_ids.add(msg_id.split(" = ", 1)[0].strip())
+
+        event(f"l10n_stream_locale_count={len(locales)}")
+        for mid in buf_ids:
+            r_buf, e_buf = l_buf.format_pattern(mid)
+            r_str, e_str = l_str.format_pattern(mid)
+            event(f"outcome={'match' if r_buf == r_str else 'mismatch'}")
+            assert r_buf == r_str, (
+                f"format_pattern mismatch for {mid!r}: "
+                f"buffered={r_buf!r}, stream={r_str!r}"
+            )
+            assert len(e_buf) == len(e_str)
+
+    @given(
+        locales=locale_chains(min_size=1, max_size=2),
+        source=ftl_simple_messages(),
+    )
+    def test_junk_count_matches_add_resource(
+        self, locales: list[str], source: str
+
) -> None: + """Junk count from stream load matches junk count from buffered load.""" + l_buf = FluentLocalization(locales, use_isolating=False, strict=False) + l_str = FluentLocalization(locales, use_isolating=False, strict=False) + locale = locales[0] + + junk_buf = l_buf.add_resource(locale, source) + junk_str = l_str.add_resource_stream( + locale, source.splitlines(keepends=True) + ) + + event(f"junk_buf={len(junk_buf)}") + event(f"junk_stream={len(junk_str)}") + assert len(junk_buf) == len(junk_str) + + @given( + locales=locale_chains(min_size=1, max_size=2), + source1=ftl_simple_messages(), + source2=ftl_simple_messages(), + ) + def test_second_stream_call_accumulates_messages( + self, locales: list[str], source1: str, source2: str + ) -> None: + """Two add_resource_stream calls accumulate messages on same bundle. + + The second call hits the pre-existing bundle path (orchestrator.py + line 734->736 False branch) — verifies correct state accumulation. + """ + l10n = FluentLocalization(locales, use_isolating=False, strict=False) + locale = locales[0] + + l10n.add_resource_stream(locale, source1.splitlines(keepends=True)) + l10n.add_resource_stream(locale, source2.splitlines(keepends=True)) + + event("l10n_two_stream_calls=done") + # At minimum the bundle must exist and be queryable + result, _errors = l10n.format_pattern("__nonexistent__") + event(f"fallback_result={result!r}") + # Missing message returns fallback string, not exception, in non-strict mode + assert "__nonexistent__" in result diff --git a/tests/fuzz_localization_property_cases/cache_stats_aggregation_branch_coverage.py b/tests/fuzz_localization_property_cases/cache_stats_aggregation_branch_coverage.py new file mode 100644 index 00000000..2f5b9680 --- /dev/null +++ b/tests/fuzz_localization_property_cases/cache_stats_aggregation_branch_coverage.py @@ -0,0 +1,47 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_localization_property.py.""" + +from 
tests.fuzz_localization_property_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# Cache stats aggregation branch coverage +# --------------------------------------------------------------------------- + + +class TestCacheStatsAggregation: + """Tests for get_cache_stats aggregation (branch 1327->1325).""" + + @given( + locales=locale_chains(min_size=2, max_size=4), + ) + def test_cache_stats_aggregates_across_bundles( + self, locales: list[str], + ) -> None: + """get_cache_stats sums metrics across all initialized bundles.""" + event(f"bundle_count={len(locales)}") + l10n = FluentLocalization( + locales, cache=CacheConfig(), + ) + # Initialize all bundles with resources + for locale in locales: + l10n.add_resource(locale, f"msg = {locale}\n") + + # Format to create cache entries + l10n.format_value("msg") + + stats = l10n.get_cache_stats() + assert stats is not None + assert stats["bundle_count"] == len(locales) + assert l10n.cache_config is not None + assert stats["maxsize"] == l10n.cache_config.size * len(locales) + + @given( + locales=locale_chains(min_size=1, max_size=2), + ) + def test_cache_stats_none_when_disabled( + self, locales: list[str], + ) -> None: + """get_cache_stats returns None when caching disabled.""" + event("outcome=cache_disabled") + l10n = FluentLocalization(locales) + assert l10n.get_cache_stats() is None diff --git a/tests/fuzz_localization_property_cases/fallback_callback.py b/tests/fuzz_localization_property_cases/fallback_callback.py new file mode 100644 index 00000000..3c58c5c3 --- /dev/null +++ b/tests/fuzz_localization_property_cases/fallback_callback.py @@ -0,0 +1,53 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_localization_property.py.""" + +from tests.fuzz_localization_property_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# 
Fallback callback +# --------------------------------------------------------------------------- + + +class TestFallbackCallback: + """Tests for on_fallback callback with property-based inputs.""" + + @given( + locales=locale_chains(min_size=2, max_size=4), + mid=message_ids(), + ) + def test_fallback_callback_invoked_for_non_primary( + self, locales: list[str], mid: str, + ) -> None: + """on_fallback invoked when message resolved from non-primary.""" + event(f"locale_count={len(locales)}") + from ftllexengine.localization import FallbackInfo + events: list[FallbackInfo] = [] + l10n = FluentLocalization( + locales, on_fallback=events.append, + ) + # Only add to last locale + l10n.add_resource(locales[-1], f"{mid} = fallback\n") + l10n.format_value(mid) + if len(locales) > 1: + assert len(events) == 1 + assert events[0].requested_locale == normalize_locale(locales[0]) + assert events[0].resolved_locale == normalize_locale(locales[-1]) + assert events[0].message_id == mid + + @given( + locales=locale_chains(min_size=1, max_size=3), + mid=message_ids(), + ) + def test_no_fallback_when_primary_has_message( + self, locales: list[str], mid: str, + ) -> None: + """on_fallback not invoked when primary locale has message.""" + event("outcome=no_fallback") + from ftllexengine.localization import FallbackInfo + events: list[FallbackInfo] = [] + l10n = FluentLocalization( + locales, on_fallback=events.append, + ) + l10n.add_resource(locales[0], f"{mid} = primary\n") + l10n.format_value(mid) + assert len(events) == 0 diff --git a/tests/fuzz_localization_property_cases/fluent_localization_api_methods_coverage_targets.py b/tests/fuzz_localization_property_cases/fluent_localization_api_methods_coverage_targets.py new file mode 100644 index 00000000..2e6f4413 --- /dev/null +++ b/tests/fuzz_localization_property_cases/fluent_localization_api_methods_coverage_targets.py @@ -0,0 +1,227 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_localization_property.py.""" + 
+from tests.fuzz_localization_property_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# FluentLocalization API methods (coverage targets) +# --------------------------------------------------------------------------- + + +class TestFluentLocalizationHasAttribute: + """Tests for has_attribute method (lines 1126-1130).""" + + @given( + locales=locale_chains(min_size=1, max_size=3), + ftl=ftl_messages_with_attributes(), + ) + def test_has_attribute_from_generated_resource( + self, locales: list[str], ftl: str, + ) -> None: + """has_attribute detects attributes in generated resources.""" + event(f"locale_count={len(locales)}") + l10n = FluentLocalization(locales) + l10n.add_resource(locales[0], ftl) + + # Extract message ID from generated FTL + first_line = ftl.split("\n", maxsplit=1)[0] + mid = first_line.split("=")[0].strip() + + # Check for attr0 (present if attributes were generated) + if ".attr0" in ftl: + assert l10n.has_attribute(mid, "attr0") is True + event("outcome=attribute_found") + else: + assert l10n.has_attribute(mid, "attr0") is False + event("outcome=no_attributes") + + @given(locales=locale_chains(min_size=2, max_size=4)) + def test_has_attribute_fallback_chain( + self, locales: list[str], + ) -> None: + """has_attribute searches across fallback chain.""" + event(f"locale_count={len(locales)}") + l10n = FluentLocalization(locales) + # Attribute only in last locale + l10n.add_resource( + locales[-1], "btn = Click\n .tooltip = Help text\n", + ) + assert l10n.has_attribute("btn", "tooltip") is True + + @given(locales=locale_chains(min_size=1, max_size=2)) + def test_has_attribute_missing_returns_false( + self, locales: list[str], + ) -> None: + """has_attribute returns False for nonexistent attributes.""" + event("outcome=not_found") + l10n = FluentLocalization(locales) + l10n.add_resource(locales[0], "msg = No attributes\n") + assert l10n.has_attribute("msg", 
"nonexistent") is False + assert l10n.has_attribute("missing", "attr") is False + + +class TestFluentLocalizationGetMessageIds: + """Tests for get_message_ids method (lines 1142-1150).""" + + @given( + locales=locale_chains(min_size=1, max_size=3), + resources=ftl_resource_sets(), + ) + def test_get_message_ids_returns_union( + self, locales: list[str], resources: dict[str, str], + ) -> None: + """get_message_ids returns union of IDs across all locales.""" + event(f"locale_count={len(locales)}") + l10n = FluentLocalization(locales) + all_expected: set[str] = set() + for locale in locales: + if locale in resources: + l10n.add_resource(locale, resources[locale]) + # Parse message IDs from FTL + for line in resources[locale].split("\n"): + if "=" in line and not line.startswith( + ("#", " ", "-"), + ): + mid = line.split("=")[0].strip() + if mid: + all_expected.add(mid) + + ids = l10n.get_message_ids() + assert set(ids) == all_expected + # No duplicates + assert len(ids) == len(set(ids)) + + @given(locales=locale_chains(min_size=2, max_size=3)) + def test_get_message_ids_primary_locale_first( + self, locales: list[str], + ) -> None: + """get_message_ids orders primary locale IDs first.""" + event(f"locale_count={len(locales)}") + l10n = FluentLocalization(locales) + l10n.add_resource(locales[0], "alpha = A\n") + l10n.add_resource( + locales[-1], "alpha = A2\nbeta = B\n", + ) + ids = l10n.get_message_ids() + # alpha from primary appears before beta from fallback + assert ids.index("alpha") < ids.index("beta") + + @given(locales=locale_chains(min_size=1, max_size=2)) + def test_get_message_ids_empty_when_no_resources( + self, locales: list[str], + ) -> None: + """get_message_ids returns empty list when no resources loaded.""" + event("outcome=empty") + l10n = FluentLocalization(locales) + assert l10n.get_message_ids() == [] + + +class TestFluentLocalizationGetMessageVariables: + """Tests for get_message_variables method (lines 1169-1174).""" + + 
@given(locales=locale_chains(min_size=1, max_size=2)) + def test_get_message_variables_returns_variable_names( + self, locales: list[str], + ) -> None: + """get_message_variables extracts variable names from message.""" + event(f"locale_count={len(locales)}") + l10n = FluentLocalization(locales) + l10n.add_resource( + locales[0], + "greeting = Hello { $firstName } { $lastName }!\n", + ) + variables = l10n.get_message_variables("greeting") + assert "firstName" in variables + assert "lastName" in variables + + @given(locales=locale_chains(min_size=2, max_size=3)) + def test_get_message_variables_fallback( + self, locales: list[str], + ) -> None: + """get_message_variables searches fallback chain.""" + event("outcome=fallback_search") + l10n = FluentLocalization(locales) + l10n.add_resource( + locales[-1], "msg = Value { $count }\n", + ) + variables = l10n.get_message_variables("msg") + assert "count" in variables + + @given(locales=locale_chains(min_size=1, max_size=2)) + def test_get_message_variables_raises_for_missing( + self, locales: list[str], + ) -> None: + """get_message_variables raises KeyError for missing message.""" + event("outcome=key_error") + l10n = FluentLocalization(locales) + with pytest.raises(KeyError, match="not found"): + l10n.get_message_variables("nonexistent") + + +class TestFluentLocalizationGetAllMessageVariables: + """Tests for get_all_message_variables (lines 1188-1196).""" + + @given(locales=locale_chains(min_size=1, max_size=3)) + def test_get_all_message_variables_returns_dict( + self, locales: list[str], + ) -> None: + """get_all_message_variables returns dict of msg_id -> variables.""" + event(f"locale_count={len(locales)}") + l10n = FluentLocalization(locales) + l10n.add_resource( + locales[0], + "msg1 = { $name }\nmsg2 = Static text\n", + ) + all_vars = l10n.get_all_message_variables() + assert isinstance(all_vars, dict) + assert "msg1" in all_vars + assert "name" in all_vars["msg1"] + assert "msg2" in all_vars + + 
@given(locales=locale_chains(min_size=2, max_size=3)) + def test_primary_locale_variables_take_precedence( + self, locales: list[str], + ) -> None: + """Primary locale's variables win for duplicate message IDs.""" + event("outcome=precedence") + l10n = FluentLocalization(locales) + l10n.add_resource(locales[0], "msg = { $primary }\n") + l10n.add_resource(locales[-1], "msg = { $fallback }\n") + all_vars = l10n.get_all_message_variables() + assert "primary" in all_vars["msg"] + + +class TestFluentLocalizationIntrospectTerm: + """Tests for introspect_term method (lines 1211-1217).""" + + @given(locales=locale_chains(min_size=1, max_size=2)) + def test_introspect_term_found( + self, locales: list[str], + ) -> None: + """introspect_term returns introspection for existing term.""" + event("outcome=term_found") + l10n = FluentLocalization(locales) + l10n.add_resource(locales[0], "-brand = Firefox\n") + info = l10n.introspect_term("brand") + assert info is not None + + @given(locales=locale_chains(min_size=2, max_size=3)) + def test_introspect_term_fallback( + self, locales: list[str], + ) -> None: + """introspect_term searches fallback chain.""" + event("outcome=term_fallback") + l10n = FluentLocalization(locales) + l10n.add_resource(locales[-1], "-product = App\n") + info = l10n.introspect_term("product") + assert info is not None + + @given(locales=locale_chains(min_size=1, max_size=2)) + def test_introspect_term_not_found( + self, locales: list[str], + ) -> None: + """introspect_term returns None for missing term.""" + event("outcome=term_not_found") + l10n = FluentLocalization(locales) + info = l10n.introspect_term("nonexistent") + assert info is None diff --git a/tests/fuzz_localization_property_cases/fluent_localization_orchestration_invariants.py b/tests/fuzz_localization_property_cases/fluent_localization_orchestration_invariants.py new file mode 100644 index 00000000..2f680495 --- /dev/null +++ 
b/tests/fuzz_localization_property_cases/fluent_localization_orchestration_invariants.py @@ -0,0 +1,120 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_localization_property.py.""" + +from tests.fuzz_localization_property_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# FluentLocalization orchestration invariants +# --------------------------------------------------------------------------- + + +class TestFluentLocalizationOrchestration: + """Property invariants for FluentLocalization fallback behavior.""" + + @given(locales=locale_chains(min_size=1, max_size=5)) + def test_deduplication_preserves_order( + self, locales: list[str], + ) -> None: + """Locale deduplication preserves first-occurrence order.""" + event(f"locale_count={len(locales)}") + l10n = FluentLocalization(locales) + expected = tuple(dict.fromkeys(normalize_locale(locale) for locale in locales)) + assert l10n.locales == expected + + @given(locales=locale_chains(min_size=1, max_size=3)) + def test_locales_property_returns_same_instance( + self, locales: list[str], + ) -> None: + """locales property is referentially identical across calls.""" + event("outcome=identity_check") + l10n = FluentLocalization(locales) + assert l10n.locales is l10n.locales + + @given( + locales=locale_chains(min_size=2, max_size=4), + mid=message_ids(), + ) + def test_primary_locale_takes_precedence( + self, locales: list[str], mid: str, + ) -> None: + """First locale with message wins in fallback chain.""" + event(f"locale_count={len(locales)}") + l10n = FluentLocalization(locales, use_isolating=False) + for locale in locales: + l10n.add_resource(locale, f"{mid} = from-{locale}") + result, errors = l10n.format_value(mid) + assert not errors + assert result == f"from-{locales[0]}" + + @given( + locales=locale_chains(min_size=1, max_size=3), + mid=message_ids(), + ) + def 
test_has_message_consistent_with_format_value( + self, locales: list[str], mid: str, + ) -> None: + """has_message True iff format_value finds the message.""" + event(f"locale_count={len(locales)}") + l10n = FluentLocalization(locales) + l10n.add_resource(locales[0], f"{mid} = test") + has = l10n.has_message(mid) + _, errors = l10n.format_value(mid) + if has: + assert not any( + "not found in any locale" in str(e) for e in errors + ) + else: + assert any( + "not found in any locale" in str(e) for e in errors + ) + + @given( + locales=locale_chains(min_size=1, max_size=3), + mid=message_ids(), + ) + def test_format_value_deterministic( + self, locales: list[str], mid: str, + ) -> None: + """Repeated format_value calls return identical results.""" + event("outcome=determinism") + l10n = FluentLocalization(locales) + l10n.add_resource(locales[0], f"{mid} = stable") + r1, _ = l10n.format_value(mid) + r2, _ = l10n.format_value(mid) + assert r1 == r2 + + @given(mid=message_ids()) + def test_missing_message_returns_braced_id(self, mid: str) -> None: + """Missing message returns {message_id} per Fluent convention. + + strict=False: missing-message error returned in tuple, not raised. + """ + event("outcome=missing_message") + l10n = FluentLocalization(["en"], strict=False) + result, errors = l10n.format_value(mid) + assert result == f"{{{mid}}}" + assert len(errors) == 1 + + @given(mid=st.just("")) + def test_empty_message_id_returns_fallback(self, mid: str) -> None: + """Empty message ID returns {???} fallback. + + strict=False: invalid-ID error returned in tuple, not raised. 
+ """ + event("outcome=empty_id") + l10n = FluentLocalization(["en"], strict=False) + result, errors = l10n.format_value(mid) + assert result == "{???}" + assert len(errors) == 1 + + @given(locales=locale_chains(min_size=1, max_size=3)) + def test_repr_contains_locales_and_bundles( + self, locales: list[str], + ) -> None: + """__repr__ always includes locales and bundle count.""" + event(f"locale_count={len(locales)}") + l10n = FluentLocalization(locales) + r = repr(l10n) + assert "FluentLocalization" in r + assert "locales=" in r + assert "bundles=" in r diff --git a/tests/fuzz_localization_property_cases/load_summary_aggregation_invariants.py b/tests/fuzz_localization_property_cases/load_summary_aggregation_invariants.py new file mode 100644 index 00000000..6a8cddfa --- /dev/null +++ b/tests/fuzz_localization_property_cases/load_summary_aggregation_invariants.py @@ -0,0 +1,218 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_localization_property.py.""" + +from tests.fuzz_localization_property_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# LoadSummary aggregation invariants +# --------------------------------------------------------------------------- + + +class TestLoadSummaryAggregation: + """Property invariants for LoadSummary post_init aggregation.""" + + @given( + success_n=st.integers(min_value=0, max_value=5), + not_found_n=st.integers(min_value=0, max_value=5), + error_n=st.integers(min_value=0, max_value=5), + ) + def test_status_counts_sum_to_total( + self, success_n: int, not_found_n: int, error_n: int, + ) -> None: + """successful + not_found + errors == total_attempted.""" + total = success_n + not_found_n + error_n + event(f"total={total}") + results: list[ResourceLoadResult] = [] + for i in range(success_n): + results.append(ResourceLoadResult( + f"en{i}", f"s{i}.ftl", LoadStatus.SUCCESS, + )) + for i in range(not_found_n): + 
results.append(ResourceLoadResult( + f"nf{i}", f"n{i}.ftl", LoadStatus.NOT_FOUND, + )) + for i in range(error_n): + results.append(ResourceLoadResult( + f"er{i}", f"e{i}.ftl", LoadStatus.ERROR, + error=OSError(f"fail{i}"), + )) + + summary = LoadSummary(results=tuple(results)) + assert summary.total_attempted == total + assert summary.successful == success_n + assert summary.not_found == not_found_n + assert summary.errors == error_n + assert summary.successful + summary.not_found + summary.errors == total + + @given( + junk_per_result=st.lists( + st.integers(min_value=0, max_value=3), + min_size=1, max_size=5, + ), + ) + def test_junk_count_is_total_across_results( + self, junk_per_result: list[int], + ) -> None: + """junk_count sums junk_entries lengths across all results.""" + expected_total = sum(junk_per_result) + event(f"total_junk={expected_total}") + results: list[ResourceLoadResult] = [] + for idx, jc in enumerate(junk_per_result): + junk = tuple( + Junk( + content=f"j{idx}_{j}", + span=Span(start=0, end=1), + ) + for j in range(jc) + ) + results.append(ResourceLoadResult( + "en", f"f{idx}.ftl", LoadStatus.SUCCESS, + junk_entries=junk, + )) + + summary = LoadSummary(results=tuple(results)) + assert summary.junk_count == expected_total + assert summary.has_junk == (expected_total > 0) + + @given( + success_n=st.integers(min_value=0, max_value=3), + not_found_n=st.integers(min_value=0, max_value=3), + error_n=st.integers(min_value=0, max_value=3), + ) + def test_filter_methods_partition_results( + self, success_n: int, not_found_n: int, error_n: int, + ) -> None: + """get_errors + get_not_found + get_successful == all results.""" + event(f"error_n={error_n}") + results: list[ResourceLoadResult] = [] + for i in range(success_n): + results.append(ResourceLoadResult( + "en", f"s{i}.ftl", LoadStatus.SUCCESS, + )) + for i in range(not_found_n): + results.append(ResourceLoadResult( + "de", f"n{i}.ftl", LoadStatus.NOT_FOUND, + )) + for i in range(error_n): + 
results.append(ResourceLoadResult( + "fr", f"e{i}.ftl", LoadStatus.ERROR, + error=OSError("fail"), + )) + + summary = LoadSummary(results=tuple(results)) + assert len(summary.get_successful()) == success_n + assert len(summary.get_not_found()) == not_found_n + assert len(summary.get_errors()) == error_n + + @given( + locale=st.sampled_from(["en", "de", "fr"]), + n=st.integers(min_value=0, max_value=4), + ) + def test_get_by_locale_filters_correctly( + self, locale: str, n: int, + ) -> None: + """get_by_locale returns only matching-locale results.""" + event(f"filter_count={n}") + results: list[ResourceLoadResult] = [] + for i in range(n): + results.append(ResourceLoadResult( + locale, f"f{i}.ftl", LoadStatus.SUCCESS, + )) + # Add results for other locales + results.append(ResourceLoadResult( + "xx", "other.ftl", LoadStatus.SUCCESS, + )) + + summary = LoadSummary(results=tuple(results)) + filtered = summary.get_by_locale(locale) + assert len(filtered) == n + assert all(r.locale == locale for r in filtered) + + @given( + junk_counts=st.lists( + st.integers(min_value=0, max_value=3), + min_size=1, max_size=4, + ), + ) + def test_get_all_junk_flattens_correctly( + self, junk_counts: list[int], + ) -> None: + """get_all_junk returns flattened tuple of all Junk entries.""" + expected_total = sum(junk_counts) + event(f"flatten_total={expected_total}") + results: list[ResourceLoadResult] = [] + all_junk: list[Junk] = [] + for idx, jc in enumerate(junk_counts): + junk_entries = tuple( + Junk( + content=f"j{idx}_{j}", + span=Span(start=0, end=1), + ) + for j in range(jc) + ) + all_junk.extend(junk_entries) + results.append(ResourceLoadResult( + "en", f"f{idx}.ftl", LoadStatus.SUCCESS, + junk_entries=junk_entries, + )) + + summary = LoadSummary(results=tuple(results)) + flattened = summary.get_all_junk() + assert len(flattened) == expected_total + for j in all_junk: + assert j in flattened + + @given( + has_errors=st.booleans(), + has_not_found=st.booleans(), + 
has_junk=st.booleans(), + ) + def test_all_successful_and_all_clean_semantics( + self, has_errors: bool, has_not_found: bool, has_junk: bool, + ) -> None: + """all_successful ignores junk; all_clean requires zero junk.""" + event(f"errors={has_errors}") + event(f"not_found={has_not_found}") + results: list[ResourceLoadResult] = [] + # Always add at least one success + junk = ( + (Junk(content="j", span=Span(start=0, end=1)),) + if has_junk else () + ) + results.append(ResourceLoadResult( + "en", "main.ftl", LoadStatus.SUCCESS, junk_entries=junk, + )) + if has_errors: + results.append(ResourceLoadResult( + "de", "err.ftl", LoadStatus.ERROR, error=OSError("f"), + )) + if has_not_found: + results.append(ResourceLoadResult( + "fr", "nf.ftl", LoadStatus.NOT_FOUND, + )) + + summary = LoadSummary(results=tuple(results)) + + expected_all_successful = not has_errors and not has_not_found + assert summary.all_successful == expected_all_successful + + expected_all_clean = ( + not has_errors and not has_not_found and not has_junk + ) + assert summary.all_clean == expected_all_clean + + @given( + has_errors=st.booleans(), + ) + def test_has_errors_property(self, has_errors: bool) -> None: + """has_errors is True iff errors > 0.""" + event(f"has_errors={has_errors}") + results: list[ResourceLoadResult] = [ + ResourceLoadResult("en", "ok.ftl", LoadStatus.SUCCESS), + ] + if has_errors: + results.append(ResourceLoadResult( + "de", "err.ftl", LoadStatus.ERROR, error=OSError("f"), + )) + summary = LoadSummary(results=tuple(results)) + assert summary.has_errors == has_errors diff --git a/tests/fuzz_localization_property_cases/path_resource_loader_invariants.py b/tests/fuzz_localization_property_cases/path_resource_loader_invariants.py new file mode 100644 index 00000000..78de321b --- /dev/null +++ b/tests/fuzz_localization_property_cases/path_resource_loader_invariants.py @@ -0,0 +1,149 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_localization_property.py.""" + 
+from tests.fuzz_localization_property_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# PathResourceLoader invariants +# --------------------------------------------------------------------------- + + +class TestPathResourceLoaderInvariants: + """Property invariants for PathResourceLoader.""" + + @given( + prefix=st.text( + alphabet=st.characters( + whitelist_categories=("Ll", "Lu"), + ), + min_size=0, max_size=8, + ), + ) + def test_init_resolves_root_from_static_prefix( + self, prefix: str, + ) -> None: + """Root directory is derived from static prefix before {locale}.""" + base_path = ( + f"{prefix}/{{locale}}/resources" + if prefix + else "{locale}/resources" + ) + event(f"prefix_len={len(prefix)}") + loader = PathResourceLoader(base_path=base_path) + assert loader._resolved_root is not None + assert loader._resolved_root.is_absolute() + if not prefix: + assert loader._resolved_root == Path.cwd().resolve() + + @given(st.just("static/path")) + def test_missing_locale_placeholder_raises(self, path: str) -> None: + """base_path without {locale} raises ValueError.""" + event("outcome=validation_error") + with pytest.raises(ValueError, match="must contain"): + PathResourceLoader(base_path=path) + + @given( + root_dir=st.just("/tmp/test_root"), + ) + def test_explicit_root_dir_overrides_derivation( + self, root_dir: str, + ) -> None: + """Explicit root_dir takes precedence over base_path derivation.""" + event("outcome=root_override") + loader = PathResourceLoader( + base_path="any/{locale}/path", root_dir=root_dir, + ) + assert loader._resolved_root == Path(root_dir).resolve() + + @given( + locale=st.from_regex(r"[A-Za-z][A-Za-z0-9]*(?:[_-][A-Za-z0-9]+)*", fullmatch=True), + ) + def test_valid_locales_pass_validation(self, locale: str) -> None: + """Locale codes without path separators or .. 
pass validation.""" + event(f"locale_len={len(locale)}") + # Should not raise + PathResourceLoader._validate_locale(locale) + + @given( + locale=st.sampled_from([ + "../etc", "en/US", "en\\US", "..", "a/../b", + ]), + ) + def test_unsafe_locales_rejected(self, locale: str) -> None: + """Locales with path traversal or separators are rejected.""" + event("outcome=locale_rejected") + with pytest.raises(ValueError, match=r"Invalid locale:"): + PathResourceLoader._validate_locale(locale) + + @given(st.just("")) + def test_empty_locale_rejected(self, locale: str) -> None: + """Empty locale string is rejected.""" + event("outcome=empty_locale") + with pytest.raises(ValueError, match="locale cannot be blank"): + PathResourceLoader._validate_locale(locale) + + @given( + resource_id=st.sampled_from([ + " main.ftl", "main.ftl ", "\tmain.ftl", + ]), + ) + def test_whitespace_resource_id_rejected( + self, resource_id: str, + ) -> None: + """Resource IDs with leading/trailing whitespace are rejected.""" + event("outcome=whitespace_rejected") + with pytest.raises(ValueError, match="whitespace"): + PathResourceLoader._validate_resource_id(resource_id) + + @given( + resource_id=st.sampled_from([ + "/etc/passwd", "\\windows\\sys", "../secret.ftl", + ]), + ) + def test_unsafe_resource_id_rejected( + self, resource_id: str, + ) -> None: + """Resource IDs with traversal or absolute paths are rejected.""" + event("outcome=resource_rejected") + with pytest.raises(ValueError, match="not allowed in resource_id"): + PathResourceLoader._validate_resource_id(resource_id) + + @given( + filename=st.text( + alphabet=st.characters( + whitelist_categories=("Ll", "Nd"), + blacklist_characters="./\\ \t\n", + ), + min_size=1, max_size=15, + ), + ) + def test_valid_resource_ids_accepted(self, filename: str) -> None: + """Clean resource IDs pass validation.""" + rid = f"{filename}.ftl" + event(f"rid_len={len(rid)}") + PathResourceLoader._validate_resource_id(rid) + + @settings(deadline=None, 
suppress_health_check=[HealthCheck.function_scoped_fixture]) + @given( + locale=st.sampled_from(["en", "de", "fr"]), + content=st.text( + min_size=1, max_size=100, + alphabet=st.characters( + blacklist_categories=("Cc", "Cs"), + ), + ), + ) + def test_load_roundtrip_preserves_content( + self, tmp_path: Path, locale: str, content: str, + ) -> None: + """PathResourceLoader.load returns exact file content.""" + event(f"locale={locale}") + locale_dir = tmp_path / "locales" / locale + locale_dir.mkdir(parents=True, exist_ok=True) + (locale_dir / "test.ftl").write_text(content, encoding="utf-8") + + loader = PathResourceLoader( + str(tmp_path / "locales" / "{locale}"), + ) + loaded = loader.load(locale, "test.ftl") + assert loaded == content diff --git a/tests/fuzz_localization_property_cases/resource_load_result_property_invariants.py b/tests/fuzz_localization_property_cases/resource_load_result_property_invariants.py new file mode 100644 index 00000000..948ab989 --- /dev/null +++ b/tests/fuzz_localization_property_cases/resource_load_result_property_invariants.py @@ -0,0 +1,49 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_localization_property.py.""" + +from tests.fuzz_localization_property_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# ResourceLoadResult property invariants +# --------------------------------------------------------------------------- + + +class TestResourceLoadResultProperties: + """Property invariants for ResourceLoadResult data class.""" + + @given( + status=st.sampled_from(list(LoadStatus)), + locale=st.sampled_from(["en", "de", "fr", "lv"]), + resource_id=st.sampled_from(["main.ftl", "ui.ftl"]), + ) + def test_status_properties_are_mutually_exclusive( + self, status: LoadStatus, locale: str, resource_id: str, + ) -> None: + """Exactly one status property is True for any LoadStatus.""" + event(f"status={status.value}") + result = 
ResourceLoadResult( + locale=locale, resource_id=resource_id, status=status, + ) + flags = [result.is_success, result.is_not_found, result.is_error] + assert sum(flags) == 1 + + @given( + junk_count=st.integers(min_value=0, max_value=5), + ) + def test_has_junk_iff_junk_entries_nonempty( + self, junk_count: int, + ) -> None: + """has_junk is True iff junk_entries is non-empty.""" + event(f"junk_count={junk_count}") + junk_entries = tuple( + Junk( + content=f"invalid{i}", + span=Span(start=i * 10, end=i * 10 + 7), + ) + for i in range(junk_count) + ) + result = ResourceLoadResult( + locale="en", resource_id="test.ftl", + status=LoadStatus.SUCCESS, junk_entries=junk_entries, + ) + assert result.has_junk == (junk_count > 0) diff --git a/tests/fuzz_localization_property_cases/resource_loading_and_load_summary.py b/tests/fuzz_localization_property_cases/resource_loading_and_load_summary.py new file mode 100644 index 00000000..c413e2ee --- /dev/null +++ b/tests/fuzz_localization_property_cases/resource_loading_and_load_summary.py @@ -0,0 +1,75 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_localization_property.py.""" + +from tests.fuzz_localization_property_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# Resource loading and load summary +# --------------------------------------------------------------------------- + + +class TestFluentLocalizationResourceLoading: + """Tests for resource loading and load summary.""" + + @given( + loader_tuple=resource_loaders(), + ) + def test_load_summary_tracks_all_attempts( + self, + loader_tuple: tuple[ + DictResourceLoader | FailingResourceLoader, + list[str], + list[str], + ], + ) -> None: + """get_load_summary reflects all load attempts from init.""" + loader, locales, resource_ids = loader_tuple + event(f"locale_count={len(locales)}") + l10n = FluentLocalization( + locales, resource_ids, loader, + ) + summary = 
l10n.get_load_summary() + assert summary.total_attempted == len(locales) * len(resource_ids) + + @given(locales=locale_chains(min_size=1, max_size=3)) + def test_custom_loader_source_path_format( + self, locales: list[str], + ) -> None: + """Non-PathResourceLoader uses locale/resource_id as source_path.""" + event("outcome=custom_loader_path") + resources = { + loc: {"main.ftl": f"msg = {loc}\n"} + for loc in locales + } + loader = DictResourceLoader(resources) + l10n = FluentLocalization(locales, ["main.ftl"], loader) + summary = l10n.get_load_summary() + for result in summary.results: + # Custom loader uses "locale/resource_id" format + assert "/" in result.source_path # type: ignore[operator] + + @given(locales=locale_chains(min_size=1, max_size=2)) + def test_oserror_during_load_recorded_as_error( + self, locales: list[str], + ) -> None: + """OSError during resource loading recorded with ERROR status.""" + event("outcome=oserror_recorded") + loader = FailingResourceLoader(OSError, "Permission denied") + l10n = FluentLocalization(locales, ["main.ftl"], loader) + summary = l10n.get_load_summary() + assert summary.errors > 0 + for result in summary.get_errors(): + assert isinstance(result.error, OSError) + + @given(locales=locale_chains(min_size=1, max_size=2)) + def test_valueerror_during_load_recorded_as_error( + self, locales: list[str], + ) -> None: + """ValueError during resource loading recorded with ERROR status.""" + event("outcome=valueerror_recorded") + loader = FailingResourceLoader(ValueError, "Path traversal") + l10n = FluentLocalization(locales, ["main.ftl"], loader) + summary = l10n.get_load_summary() + assert summary.errors > 0 + for result in summary.get_errors(): + assert isinstance(result.error, ValueError) diff --git a/tests/fuzz_localization_property_cases/terms_with_hypothesis_strategies.py b/tests/fuzz_localization_property_cases/terms_with_hypothesis_strategies.py new file mode 100644 index 00000000..69e8dcb0 --- /dev/null +++ 
b/tests/fuzz_localization_property_cases/terms_with_hypothesis_strategies.py @@ -0,0 +1,26 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_localization_property.py.""" + +from tests.fuzz_localization_property_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# Terms with Hypothesis strategies +# --------------------------------------------------------------------------- + + +class TestTermsWithStrategies: + """Tests using ftl_messages_with_terms strategy.""" + + @given( + locales=locale_chains(min_size=1, max_size=2), + ftl=ftl_messages_with_terms(), + ) + def test_terms_parsed_and_resolvable( + self, locales: list[str], ftl: str, + ) -> None: + """Generated terms are parsed without errors.""" + event(f"locale_count={len(locales)}") + l10n = FluentLocalization(locales, use_isolating=False) + junk = l10n.add_resource(locales[0], ftl) + # Should parse without junk + assert len(junk) == 0 diff --git a/tests/fuzz_localization_property_cases/validation_edge_cases.py b/tests/fuzz_localization_property_cases/validation_edge_cases.py new file mode 100644 index 00000000..0b4807d3 --- /dev/null +++ b/tests/fuzz_localization_property_cases/validation_edge_cases.py @@ -0,0 +1,71 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_localization_property.py.""" + +from tests.fuzz_localization_property_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# Validation edge cases +# --------------------------------------------------------------------------- + + +class TestValidationEdgeCases: + """Validation and defensive checks.""" + + @given( + locale=st.sampled_from(["en", "de"]), + ws=st.sampled_from([" ", "\t", "\n"]), + position=st.sampled_from(["leading", "trailing"]), + ) + def test_add_resource_whitespace_locale_trimmed( + self, locale: str, ws: str, position: str, 
) -> None: + """add_resource trims locale boundaries and resolves them canonically.""" + event(f"position={position}") + padded = ws + locale if position == "leading" else locale + ws + l10n = FluentLocalization([locale]) + l10n.add_resource(padded, "msg = test") + assert l10n.has_message("msg") + assert l10n.locales == (normalize_locale(locale),) + + @given( + locale=st.sampled_from(["en", "de"]), + invalid_args=st.sampled_from([42, "str", [1, 2], True]), + ) + def test_format_value_invalid_args_type( + self, locale: str, invalid_args: int | str | list[int] | bool, + ) -> None: + """format_value with non-Mapping args returns error. + + strict=False: invalid-args error returned in tuple, not raised. + """ + event("outcome=invalid_args") + l10n = FluentLocalization([locale], strict=False) + l10n.add_resource(locale, "msg = test") + result, errors = l10n.format_value( + "msg", invalid_args, # type: ignore[arg-type] + ) + assert result == "{???}" + assert len(errors) > 0 + + @given( + locale=st.sampled_from(["en", "de"]), + invalid_attr=st.sampled_from([42, Decimal("3.14"), ["a"], {"k": "v"}]), + ) + def test_format_pattern_invalid_attribute_type( + self, + locale: str, + invalid_attr: int | Decimal | list[str] | dict[str, str], + ) -> None: + """format_pattern with non-str attribute returns error. + + strict=False: invalid-attribute error returned in tuple, not raised. 
+ """ + event("outcome=invalid_attr") + l10n = FluentLocalization([locale], strict=False) + l10n.add_resource(locale, "msg = test\n .a = v") + result, errors = l10n.format_pattern( + "msg", None, + attribute=invalid_attr, # type: ignore[arg-type] + ) + assert result == "{???}" + assert len(errors) > 0 diff --git a/tests/fuzz_runtime_resolver_state_machine_cases/__init__.py b/tests/fuzz_runtime_resolver_state_machine_cases/__init__.py new file mode 100644 index 00000000..c48dd38f --- /dev/null +++ b/tests/fuzz_runtime_resolver_state_machine_cases/__init__.py @@ -0,0 +1,122 @@ +"""Stateful and advanced property-based tests for FluentResolver. + +Consolidates: +- test_resolver_state_machine.py: FluentResolverStateMachine (fuzz), TestResolverErrorPaths +- test_resolver_advanced_hypothesis.py: all classes +""" + +from __future__ import annotations + +from decimal import Decimal + +import pytest +from hypothesis import assume, event, given +from hypothesis import strategies as st +from hypothesis.stateful import Bundle, RuleBasedStateMachine, initialize, invariant, rule + +from ftllexengine.core.value_types import FluentValue +from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError +from ftllexengine.runtime.bundle import FluentBundle +from ftllexengine.runtime.function_bridge import FunctionRegistry +from ftllexengine.runtime.functions import create_default_registry +from ftllexengine.runtime.resolver import FluentResolver +from ftllexengine.syntax import ( + Attribute, + CallArguments, + FunctionReference, + Identifier, + Message, + MessageReference, + NumberLiteral, + Pattern, + Placeable, + SelectExpression, + Term, + TermReference, + TextElement, + VariableReference, + Variant, +) +from tests.strategies import ftl_identifiers, ftl_simple_text + +# ============================================================================ +# STRATEGY HELPERS +# ============================================================================ + + +def simple_pattern(text: 
str) -> Pattern: + """Create simple text pattern.""" + return Pattern(elements=(TextElement(value=text),)) + + +def variable_pattern(var_name: str) -> Pattern: + """Create pattern with variable reference.""" + return Pattern( + elements=( + Placeable(expression=VariableReference(id=Identifier(name=var_name))), + ) + ) + + +def term_reference_pattern(term_name: str) -> Pattern: + """Create pattern with term reference.""" + return Pattern( + elements=( + Placeable( + expression=TermReference(id=Identifier(name=term_name), attribute=None) + ), + ) + ) + + +def message_reference_pattern(msg_name: str) -> Pattern: + """Create pattern with message reference.""" + return Pattern( + elements=( + Placeable( + expression=MessageReference(id=Identifier(name=msg_name), attribute=None) + ), + ) + ) + +__all__ = [ + "Attribute", + "Bundle", + "CallArguments", + "Decimal", + "ErrorCategory", + "FluentBundle", + "FluentResolver", + "FluentValue", + "FrozenFluentError", + "FunctionReference", + "FunctionRegistry", + "Identifier", + "Message", + "MessageReference", + "NumberLiteral", + "Pattern", + "Placeable", + "RuleBasedStateMachine", + "SelectExpression", + "Term", + "TermReference", + "TextElement", + "VariableReference", + "Variant", + "assume", + "create_default_registry", + "event", + "ftl_identifiers", + "ftl_simple_text", + "given", + "initialize", + "invariant", + "message_reference_pattern", + "pytest", + "rule", + "simple_pattern", + "st", + "term_reference_pattern", + "variable_pattern", +] diff --git a/tests/fuzz_runtime_resolver_state_machine_cases/advanced_property_based_tests.py b/tests/fuzz_runtime_resolver_state_machine_cases/advanced_property_based_tests.py new file mode 100644 index 00000000..75ee23b6 --- /dev/null +++ b/tests/fuzz_runtime_resolver_state_machine_cases/advanced_property_based_tests.py @@ -0,0 +1,572 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_runtime_resolver_state_machine.py.""" + +from 
tests.fuzz_runtime_resolver_state_machine_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# ADVANCED PROPERTY-BASED TESTS +# ============================================================================ + + +class TestPatternResolution: + """Properties about pattern resolution.""" + + @given( + msg_id=ftl_identifiers(), + text_content=ftl_simple_text(), + ) + def test_simple_text_resolution(self, msg_id: str, text_content: str) -> None: + """Property: Simple text patterns resolve to their content.""" + event(f"text_len={len(text_content)}") + pattern = Pattern(elements=(TextElement(value=text_content),)) + message = Message(id=Identifier(name=msg_id), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en_US", + messages={msg_id: message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, {}) + assert not errors + assert result == text_content, f"Expected {text_content}, got {result}" + + @given( + msg_id=ftl_identifiers(), + parts=st.lists(ftl_simple_text(), min_size=2, max_size=5), + ) + def test_multiple_text_elements_concatenation( + self, msg_id: str, parts: list[str] + ) -> None: + """Property: Multiple text elements are concatenated in order.""" + event(f"part_count={len(parts)}") + elements = tuple(TextElement(value=p) for p in parts) + pattern = Pattern(elements=elements) + message = Message(id=Identifier(name=msg_id), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en_US", + messages={msg_id: message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, {}) + assert not errors + expected = "".join(parts) + assert result == expected, f"Concatenation mismatch: {result} != {expected}" + + +class TestVariableResolution: + """Properties about variable reference 
resolution.""" + + @given( + var_name=ftl_identifiers(), + var_value=st.one_of( + st.text(min_size=1, max_size=50), + st.integers(), + st.decimals(allow_nan=False, allow_infinity=False), + ), + ) + def test_variable_value_preservation( + self, var_name: str, var_value: str | int | Decimal + ) -> None: + """Property: Variable values are preserved in resolution.""" + val_type = type(var_value).__name__ + event(f"var_type={val_type}") + bundle = FluentBundle("en_US", use_isolating=False) + + ftl_source = f"msg = {{ ${var_name} }}" + bundle.add_resource(ftl_source) + + result, errors = bundle.format_pattern("msg", {var_name: var_value}) + assert not errors + assert str(var_value) in result, f"Variable value not in result: {result}" + + @given(var_name=ftl_identifiers()) + def test_missing_variable_error_handling(self, var_name: str) -> None: + """Property: Missing variables are handled gracefully.""" + event(f"var_name_len={len(var_name)}") + # strict=False: testing soft-error return semantics; missing-variable + # errors must be returned in the tuple, not raised. 
+ bundle = FluentBundle("en_US", strict=False) + + ftl_source = f"msg = {{ ${var_name} }}" + bundle.add_resource(ftl_source) + + result, _errors = bundle.format_pattern("msg", {}) + assert isinstance(result, str), "Must return string even on missing variable" + + @given(var_count=st.integers(min_value=1, max_value=10)) + def test_multiple_variables_independent(self, var_count: int) -> None: + """Property: Multiple variables resolve independently.""" + event(f"var_count={var_count}") + bundle = FluentBundle("en_US", use_isolating=False) + + var_names = [f"v{i}" for i in range(var_count)] + placeholders = " ".join(f"{{ ${vn} }}" for vn in var_names) + ftl_source = f"msg = {placeholders}" + bundle.add_resource(ftl_source) + + args = {vn: f"val{i}" for i, vn in enumerate(var_names)} + result, errors = bundle.format_pattern("msg", args) + assert not errors + for value in args.values(): + assert value in result, f"Variable value {value} missing" + + +class TestMessageReferenceResolution: + """Properties about message reference resolution.""" + + @given( + ref_msg_id=ftl_identifiers(), + ref_value=ftl_simple_text(), + main_msg_id=ftl_identifiers(), + ) + def test_message_reference_resolution( + self, ref_msg_id: str, ref_value: str, main_msg_id: str + ) -> None: + """Property: Message references resolve to referenced message value.""" + event(f"ref_value_len={len(ref_value)}") + assume(ref_msg_id != main_msg_id) + + bundle = FluentBundle("en_US", use_isolating=False) + ftl_source = f""" +{ref_msg_id} = {ref_value} +{main_msg_id} = {{ {ref_msg_id} }} +""" + bundle.add_resource(ftl_source) + + result, errors = bundle.format_pattern(main_msg_id) + assert not errors + assert ref_value.strip() in result, f"Referenced message value not in result: {result}" + + @given( + nonexistent_id=ftl_identifiers(), + main_msg_id=ftl_identifiers(), + ) + def test_missing_message_reference_handling( + self, nonexistent_id: str, main_msg_id: str + ) -> None: + """Property: Missing message 
references handled gracefully.""" + event(f"id_len={len(nonexistent_id)}") + assume(nonexistent_id != main_msg_id) + + # strict=False: testing soft-error return semantics; missing-message- + # reference errors must be returned in the tuple, not raised. + bundle = FluentBundle("en_US", strict=False) + ftl_source = f"{main_msg_id} = {{ {nonexistent_id} }}" + bundle.add_resource(ftl_source) + + result, _errors = bundle.format_pattern(main_msg_id) + assert isinstance(result, str), "Must return string for missing reference" + + +class TestTermReferenceResolution: + """Properties about term reference resolution.""" + + @given( + term_id=ftl_identifiers(), + term_value=ftl_simple_text(), + msg_id=ftl_identifiers(), + ) + def test_term_reference_resolution( + self, term_id: str, term_value: str, msg_id: str + ) -> None: + """Property: Term references resolve to term value.""" + event(f"term_value_len={len(term_value)}") + bundle = FluentBundle("en_US", use_isolating=False) + ftl_source = f""" +-{term_id} = {term_value} +{msg_id} = {{ -{term_id} }} +""" + bundle.add_resource(ftl_source) + + result, errors = bundle.format_pattern(msg_id) + assert not errors + assert term_value.strip() in result, f"Term value not in result: {result}" + + @given( + nonexistent_term=ftl_identifiers(), + msg_id=ftl_identifiers(), + ) + def test_missing_term_reference_handling( + self, nonexistent_term: str, msg_id: str + ) -> None: + """Property: Missing term references handled gracefully.""" + event(f"term_len={len(nonexistent_term)}") + # strict=False: testing soft-error return semantics; missing-term + # errors must be returned in the tuple, not raised. 
+ bundle = FluentBundle("en_US", strict=False) + ftl_source = f"{msg_id} = {{ -{nonexistent_term} }}" + bundle.add_resource(ftl_source) + + result, _errors = bundle.format_pattern(msg_id) + assert isinstance(result, str), "Must return string for missing term" + + +class TestSelectExpressionResolution: + """Properties about select expression evaluation.""" + + @given( + var_name=ftl_identifiers(), + selector_value=st.one_of(st.text(min_size=1, max_size=20), st.integers(0, 100)), + variant1_key=ftl_identifiers(), + variant1_val=ftl_simple_text(), + variant2_val=ftl_simple_text(), + ) + def test_select_expression_matches_variant( + self, + var_name: str, + selector_value: str | int, + variant1_key: str, + variant1_val: str, + variant2_val: str, + ) -> None: + """Property: Select expressions match correct variant.""" + event(f"selector_type={type(selector_value).__name__}") + assume(variant1_key != "other") + assume(var_name != variant1_key) + + bundle = FluentBundle("en_US", use_isolating=False) + ftl_source = f""" +msg = {{ ${var_name} -> + [{variant1_key}] {variant1_val} + *[other] {variant2_val} +}} +""" + bundle.add_resource(ftl_source) + + if not bundle.has_message("msg"): + return + + result, errors = bundle.format_pattern("msg", {var_name: selector_value}) + assert not errors + + if str(selector_value) == variant1_key: + assert variant1_val.strip() in result, f"Expected {variant1_val} for matching key" + else: + assert ( + variant2_val.strip() in result or variant1_val.strip() in result + ), "Must match some variant" + + @given( + var_name=ftl_identifiers(), + numeric_value=st.integers(0, 10), + ) + def test_numeric_selector_matching(self, var_name: str, numeric_value: int) -> None: + """Property: Numeric selectors match correctly.""" + event(f"numeric_value={numeric_value}") + bundle = FluentBundle("en_US", use_isolating=False) + ftl_source = f""" +msg = {{ ${var_name} -> + [0] zero + [1] one + *[other] many +}} +""" + bundle.add_resource(ftl_source) + + 
result, errors = bundle.format_pattern("msg", {var_name: numeric_value}) + assert not errors + + if numeric_value == 0: + assert "zero" in result, "Should match [0] variant" + elif numeric_value == 1: + assert "one" in result, "Should match [1] variant" + else: + assert "many" in result, "Should match default variant" + + +class TestCircularReferenceDetection: + """Properties about circular reference detection.""" + + @given( + msg1_id=ftl_identifiers(), + msg2_id=ftl_identifiers(), + ) + def test_direct_circular_reference_detection( + self, msg1_id: str, msg2_id: str + ) -> None: + """Property: Direct circular references are detected.""" + event(f"id_len={len(msg1_id)}") + assume(msg1_id != msg2_id) + + # strict=False: testing soft-error return semantics; circular-reference + # errors must be returned in the tuple, not raised. + bundle = FluentBundle("en_US", strict=False) + ftl_source = f""" +{msg1_id} = {{ {msg2_id} }} +{msg2_id} = {{ {msg1_id} }} +""" + bundle.add_resource(ftl_source) + + result, _errors = bundle.format_pattern(msg1_id) + assert isinstance(result, str), "Must handle circular reference gracefully" + + @given(msg_ids=st.lists(ftl_identifiers(), min_size=3, max_size=5, unique=True)) + def test_indirect_circular_reference_detection(self, msg_ids: list[str]) -> None: + """Property: Indirect circular references (chains) are detected.""" + event(f"chain_len={len(msg_ids)}") + # strict=False: testing soft-error return semantics; circular-chain + # errors must be returned in the tuple, not raised. 
+ bundle = FluentBundle("en_US", strict=False) + + msg_pairs = list(zip(msg_ids, [*msg_ids[1:], msg_ids[0]], strict=True)) + ftl_lines = [f"{m1} = {{ {m2} }}" for m1, m2 in msg_pairs] + ftl_source = "\n".join(ftl_lines) + + bundle.add_resource(ftl_source) + + result, _errors = bundle.format_pattern(msg_ids[0]) + assert isinstance(result, str), "Must handle circular chain gracefully" + + +class TestFunctionCallResolution: + """Properties about function call resolution.""" + + @given( + func_name=st.text( + alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ", min_size=3, max_size=10 + ), + return_value=ftl_simple_text(), + ) + def test_custom_function_called(self, func_name: str, return_value: str) -> None: + """Property: Custom functions are called and results used.""" + event(f"func_name_len={len(func_name)}") + assume(func_name not in ("NUMBER", "DATETIME")) + + bundle = FluentBundle("en_US", use_isolating=False) + + def custom_func() -> str: + return return_value + + bundle.add_function(func_name, custom_func) + bundle.add_resource(f"msg = {{ {func_name}() }}") + + result, errors = bundle.format_pattern("msg") + assert not errors + assert return_value.strip() in result, f"Function return value not in result: {result}" + + @given( + func_name=st.text( + alphabet=st.characters(whitelist_categories=["Lu"]), min_size=3, max_size=10 + ), + error_message=ftl_simple_text(), + ) + def test_function_exception_handling( + self, func_name: str, error_message: str + ) -> None: + """Property: Function exceptions are handled gracefully.""" + event(f"func_name_len={len(func_name)}") + assume(func_name not in ("NUMBER", "DATETIME")) + + # strict=False: testing soft-error return semantics; function-exception + # errors must be returned in the tuple, not raised. 
+ bundle = FluentBundle("en_US", strict=False) + + def failing_func() -> str: + raise ValueError(error_message) + + bundle.add_function(func_name, failing_func) + bundle.add_resource(f"msg = {{ {func_name}() }}") + + result, _errors = bundle.format_pattern("msg") + assert isinstance(result, str), "Must return string even when function fails" + + +class TestResolverIsolatingMarks: + """Properties about Unicode bidi isolation marks.""" + + @given( + var_name=ftl_identifiers(), + var_value=ftl_simple_text(), + ) + def test_isolating_marks_added_when_enabled( + self, var_name: str, var_value: str + ) -> None: + """Property: Isolation marks added around interpolated values when enabled.""" + event(f"value_len={len(var_value)}") + bundle = FluentBundle("en_US", use_isolating=True) + ftl_source = f"msg = {{ ${var_name} }}" + bundle.add_resource(ftl_source) + + result, errors = bundle.format_pattern("msg", {var_name: var_value}) + assert not errors + assert "\u2068" in result, "FSI mark missing" + assert "\u2069" in result, "PDI mark missing" + assert var_value in result, "Variable value missing" + + @given( + var_name=ftl_identifiers(), + var_value=ftl_simple_text(), + ) + def test_no_isolating_marks_when_disabled( + self, var_name: str, var_value: str + ) -> None: + """Property: No isolation marks when use_isolating=False.""" + event(f"value_len={len(var_value)}") + bundle = FluentBundle("en_US", use_isolating=False) + ftl_source = f"msg = {{ ${var_name} }}" + bundle.add_resource(ftl_source) + + result, errors = bundle.format_pattern("msg", {var_name: var_value}) + assert not errors + assert "\u2068" not in result, "FSI mark should not be present" + assert "\u2069" not in result, "PDI mark should not be present" + + +class TestResolverValueFormatting: + """Properties about value formatting.""" + + @given( + var_name=ftl_identifiers(), + int_value=st.integers(), + ) + def test_integer_formatting(self, var_name: str, int_value: int) -> None: + """Property: Integers are 
formatted correctly.""" + event(f"int_value={int_value}") + bundle = FluentBundle("en_US", use_isolating=False) + ftl_source = f"msg = {{ ${var_name} }}" + bundle.add_resource(ftl_source) + + result, errors = bundle.format_pattern("msg", {var_name: int_value}) + assert not errors + assert str(int_value) in result, f"Integer {int_value} not formatted correctly" + + @given( + var_name=ftl_identifiers(), + bool_value=st.booleans(), + ) + def test_boolean_formatting(self, var_name: str, bool_value: bool) -> None: + """Property: Booleans are formatted as lowercase 'true'/'false'.""" + event(f"bool_value={bool_value}") + bundle = FluentBundle("en_US", use_isolating=False) + ftl_source = f"msg = {{ ${var_name} }}" + bundle.add_resource(ftl_source) + + result, errors = bundle.format_pattern("msg", {var_name: bool_value}) + assert not errors + expected = "true" if bool_value else "false" + assert expected in result, f"Boolean {bool_value} not formatted correctly" + + +class TestResolverMetamorphicProperties: + """Metamorphic properties relating different resolution operations.""" + + @given( + msg_id=ftl_identifiers(), + text1=ftl_simple_text(), + text2=ftl_simple_text(), + ) + def test_concatenation_order_preserved( + self, msg_id: str, text1: str, text2: str + ) -> None: + """Property: Multiple text elements appear in order.""" + event(f"text1_len={len(text1)}") + assume(text1 != text2) + + bundle = FluentBundle("en_US", use_isolating=False) + ftl_source = f"{msg_id} = {text1} {text2}" + bundle.add_resource(ftl_source) + + result, errors = bundle.format_pattern(msg_id) + assert not errors + assert text1.strip() in result, "First text element should be present" + assert text2.strip() in result, "Second text element should be present" + + idx1 = result.find(text1.strip()) + idx2 = result.find(text2.strip()) + if idx1 != idx2: + assert idx1 < idx2, "Text elements should appear in order" + + @given( + msg_id=ftl_identifiers(), + var_name=ftl_identifiers(), + 
value1=ftl_simple_text(), + value2=ftl_simple_text(), + ) + def test_variable_value_substitution( + self, msg_id: str, var_name: str, value1: str, value2: str + ) -> None: + """Property: Changing variable value changes result.""" + event(f"values_differ={value1 != value2}") + assume(value1 != value2) + assume(value1 not in value2 and value2 not in value1) + + bundle = FluentBundle("en_US", use_isolating=False) + ftl_source = f"{msg_id} = {{ ${var_name} }}" + bundle.add_resource(ftl_source) + + result1, errors = bundle.format_pattern(msg_id, {var_name: value1}) + assert not errors + result2, errors = bundle.format_pattern(msg_id, {var_name: value2}) + assert not errors + assert result1 != result2, "Different variable values should produce different results" + + +class TestResolverErrorRecovery: + """Properties about error recovery during resolution.""" + + @given( + msg_id=ftl_identifiers(), + partial_text=ftl_simple_text(), + var_name=ftl_identifiers(), + ) + def test_partial_resolution_on_error( + self, msg_id: str, partial_text: str, var_name: str + ) -> None: + """Property: Partial resolution continues after errors.""" + event(f"text_len={len(partial_text)}") + # strict=False: testing soft-error return semantics; missing-variable + # errors must be returned in the tuple, not raised. 
+ bundle = FluentBundle("en_US", use_isolating=False, strict=False) + ftl_source = f"{msg_id} = {partial_text} {{ ${var_name} }}" + bundle.add_resource(ftl_source) + + result, _errors = bundle.format_pattern(msg_id, {}) + assert partial_text.strip() in result, "Static text should be present even with missing var" + + +class TestResolverCoverageEdgeCases: + """Coverage tests for resolver edge cases.""" + + @given( + msg_id=ftl_identifiers(), + text=ftl_simple_text(), + ) + def test_placeable_error_handling_in_pattern( + self, msg_id: str, text: str + ) -> None: + """Placeable error handling in _resolve_pattern (line 142->138).""" + event(f"text_len={len(text)}") + # strict=False: testing soft-error return semantics; missing-variable + # errors must be returned in the tuple, not raised. + bundle = FluentBundle("en_US", use_isolating=False, strict=False) + ftl_source = f"{msg_id} = {text} {{ $missing }}" + bundle.add_resource(ftl_source) + + result, errors = bundle.format_pattern(msg_id, {}) + assert len(errors) > 0 + assert "{$missing}" in result + + @given( + msg_id=ftl_identifiers(), + var_name=ftl_identifiers(), + value=st.integers(), + ) + def test_nested_placeable_expression_resolution( + self, msg_id: str, var_name: str, value: int + ) -> None: + """Placeable expression resolution (line 190).""" + event(f"value={value}") + bundle = FluentBundle("en_US", use_isolating=False) + ftl_source = f"{msg_id} = Value: {{ ${var_name} }}" + bundle.add_resource(ftl_source) + + result, errors = bundle.format_pattern(msg_id, {var_name: value}) + assert not errors + assert str(value) in result diff --git a/tests/fuzz_runtime_resolver_state_machine_cases/direct_error_path_tests_from_state_machine_module.py b/tests/fuzz_runtime_resolver_state_machine_cases/direct_error_path_tests_from_state_machine_module.py new file mode 100644 index 00000000..a3690e0d --- /dev/null +++ b/tests/fuzz_runtime_resolver_state_machine_cases/direct_error_path_tests_from_state_machine_module.py @@ 
-0,0 +1,166 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_runtime_resolver_state_machine.py.""" + +from tests.fuzz_runtime_resolver_state_machine_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# DIRECT ERROR PATH TESTS (from state machine module) +# ============================================================================ + + +class TestStatefulErrorPaths: + """Direct tests for specific error paths that are hard to reach via state machine.""" + + def test_term_not_found_direct(self) -> None: + """Term not found error (line 176).""" + resolver = FluentResolver( + locale="en_US", + messages={}, + terms={}, + function_registry=create_default_registry(), + use_isolating=False, + ) + + message = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable( + expression=TermReference( + id=Identifier(name="nonexistent"), + attribute=None, + ) + ), + ) + ), + attributes=(), + comment=None, + ) + + result, errors = resolver.resolve_message(message, args={}) + assert len(errors) > 0 + assert "{-nonexistent}" in result + + def test_term_attribute_not_found_direct(self) -> None: + """Term attribute not found error (lines 182-185).""" + from ftllexengine.syntax import Term + + term = Term( + id=Identifier(name="brand"), + value=simple_pattern("Firefox"), + attributes=(), + comment=None, + ) + + resolver = FluentResolver( + locale="en_US", + messages={}, + terms={"brand": term}, + function_registry=create_default_registry(), + use_isolating=False, + ) + + message = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable( + expression=TermReference( + id=Identifier(name="brand"), + attribute=Identifier(name="nonexistent"), + ) + ), + ) + ), + attributes=(), + comment=None, + ) + + result, errors = resolver.resolve_message(message, args={}) + assert len(errors) > 0 + assert "{-brand.nonexistent}" in result + + def 
test_message_not_found_reference(self) -> None: + """Message not found when referenced from another message (line 164).""" + message = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable( + expression=MessageReference( + id=Identifier(name="nonexistent"), + attribute=None, + ) + ), + ) + ), + attributes=(), + comment=None, + ) + + resolver = FluentResolver( + locale="en_US", + messages={"test": message}, + terms={}, + function_registry=create_default_registry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, args={}) + assert len(errors) > 0 + assert "{nonexistent}" in result + + def test_variable_not_provided(self) -> None: + """Variable not provided in args (line 157).""" + message = Message( + id=Identifier(name="test"), + value=variable_pattern("missing_var"), + attributes=(), + comment=None, + ) + + resolver = FluentResolver( + locale="en_US", + messages={"test": message}, + terms={}, + function_registry=create_default_registry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, args={}) + assert len(errors) > 0 + assert "{$missing_var}" in result + + @given(st.data()) + def test_format_value_edge_cases(self, data: st.DataObject) -> None: + """Property: _format_value never crashes, always returns string (lines 268-278).""" + resolver = FluentResolver( + locale="en_US", + messages={}, + terms={}, + function_registry=create_default_registry(), + use_isolating=False, + ) + + test_values: list[FluentValue] = [ + data.draw(st.text()), + data.draw(st.integers()), + data.draw(st.decimals(allow_nan=False, allow_infinity=False)), + data.draw(st.booleans()), + None, + ] + + value = None + for value in test_values: + result = resolver._format_value(value) + assert isinstance(result, str), f"_format_value({value}) should return string" + val_type = type(value).__name__ + event(f"last_value_type={val_type}") + + def test_select_expression_no_variants(self) -> None: + 
"""SelectExpression with no variants raises ValueError at construction.""" + with pytest.raises(ValueError, match="at least one variant"): + SelectExpression( + selector=NumberLiteral(value=1, raw="1"), + variants=(), + ) diff --git a/tests/fuzz_runtime_resolver_state_machine_cases/state_machine.py b/tests/fuzz_runtime_resolver_state_machine_cases/state_machine.py new file mode 100644 index 00000000..76611d6a --- /dev/null +++ b/tests/fuzz_runtime_resolver_state_machine_cases/state_machine.py @@ -0,0 +1,389 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_runtime_resolver_state_machine.py.""" + +from tests.fuzz_runtime_resolver_state_machine_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# STATE MACHINE +# ============================================================================ + + +class FluentResolverStateMachine(RuleBasedStateMachine): + """State machine for testing FluentResolver. 
+ + Bundles: + - messages: Message IDs that have been added + - terms: Term IDs that have been added + - variables: Variable names used in patterns + + Invariants: + - Resolving same message twice produces same result (determinism) + - Resolver never crashes (robustness) + - All messages are resolvable with correct args + """ + + messages = Bundle("messages") + terms = Bundle("terms") + variables = Bundle("variables") + + @initialize() + def setup_resolver(self) -> None: + """Initialize resolver with empty registries.""" + self.message_registry: dict[str, Message] = {} + self.term_registry: dict[str, Term] = {} + self.locale = "en_US" + self.resolver = FluentResolver( + locale=self.locale, + messages=self.message_registry, + terms=self.term_registry, + function_registry=create_default_registry(), + use_isolating=False, + ) + + @rule(target=messages, msg_id=ftl_identifiers(), text=st.text(min_size=1, max_size=50)) + def add_simple_message(self, msg_id: str, text: str) -> str: + """Add simple text-only message.""" + message = Message( + id=Identifier(name=msg_id), + value=simple_pattern(text), + attributes=(), + comment=None, + ) + self.message_registry[msg_id] = message + event("rule=add_simple_message") + return msg_id + + @rule( + target=messages, + msg_id=ftl_identifiers(), + var_name=ftl_identifiers(), + ) + def add_message_with_variable(self, msg_id: str, var_name: str) -> str: + """Add message that requires variable argument.""" + message = Message( + id=Identifier(name=msg_id), + value=variable_pattern(var_name), + attributes=(), + comment=None, + ) + self.message_registry[msg_id] = message + event("rule=add_message_with_variable") + return msg_id + + @rule(target=terms, term_id=ftl_identifiers(), text=st.text(min_size=1, max_size=50)) + def add_simple_term(self, term_id: str, text: str) -> str: + """Add simple term.""" + term = Term( + id=Identifier(name=term_id), + value=simple_pattern(text), + attributes=(), + comment=None, + ) + 
self.term_registry[term_id] = term + event("rule=add_simple_term") + return term_id + + @rule( + target=messages, + msg_id=ftl_identifiers(), + term_id=terms, + ) + def add_message_referencing_term(self, msg_id: str, term_id: str) -> str: + """Add message that references a term.""" + message = Message( + id=Identifier(name=msg_id), + value=term_reference_pattern(term_id), + attributes=(), + comment=None, + ) + self.message_registry[msg_id] = message + event("rule=add_message_referencing_term") + return msg_id + + @rule(msg_id=messages) + def resolve_simple_message(self, msg_id: str) -> None: + """Resolve message without arguments. Checks determinism.""" + assume(msg_id in self.message_registry) + message = self.message_registry[msg_id] + + needs_vars = any( + isinstance(elem, Placeable) + and isinstance(elem.expression, VariableReference) + for elem in (message.value.elements if message.value else ()) + ) + + if needs_vars: + result, errors = self.resolver.resolve_message(message, args={}) + assert isinstance(result, str) + assert len(errors) >= 0 + else: + result1, _errors = self.resolver.resolve_message(message, args={}) + result2, _errors = self.resolver.resolve_message(message, args={}) + assert result1 == result2, f"Resolution should be deterministic for {msg_id}" + assert isinstance(result1, str) + event(f"rule=resolve_simple(vars={needs_vars})") + + @rule( + msg_id=messages, + var_name=ftl_identifiers(), + var_value=st.text(max_size=50), + ) + def resolve_message_with_args(self, msg_id: str, var_name: str, var_value: str) -> None: + """Resolve message with arguments.""" + assume(msg_id in self.message_registry) + message = self.message_registry[msg_id] + + args = {var_name: var_value} + + try: + result, _errors = self.resolver.resolve_message(message, args=args) + assert isinstance(result, str) + except FrozenFluentError: + pass + event("rule=resolve_message_with_args") + + @rule( + msg_id=ftl_identifiers(), + attr_name=ftl_identifiers(), + 
text=st.text(min_size=1, max_size=50), + ) + def add_message_with_attribute(self, msg_id: str, attr_name: str, text: str) -> None: + """Add message with attribute and resolve it.""" + attribute = Attribute( + id=Identifier(name=attr_name), + value=simple_pattern(text), + ) + message = Message( + id=Identifier(name=msg_id), + value=simple_pattern("default value"), + attributes=(attribute,), + comment=None, + ) + self.message_registry[msg_id] = message + + result, errors = self.resolver.resolve_message(message, args={}, attribute=attr_name) + assert text in result + assert errors == (), f"Unexpected errors: {errors}" + event("rule=add_message_with_attribute") + + @rule(msg_id=messages) + def resolve_nonexistent_attribute(self, msg_id: str) -> None: + """Try to resolve non-existent attribute - should give REFERENCE error.""" + assume(msg_id in self.message_registry) + message = self.message_registry[msg_id] + + _result, errors = self.resolver.resolve_message( + message, args={}, attribute="nonexistent_attr_xyz" + ) + assert len(errors) == 1 + assert isinstance(errors[0], FrozenFluentError) + assert errors[0].category == ErrorCategory.REFERENCE + assert "attribute" in str(errors[0]).lower() + event("rule=resolve_nonexistent_attribute") + + @rule() + def resolve_nonexistent_term(self) -> None: + """Try to resolve term reference to non-existent term.""" + msg_id = "msg_ref_bad_term" + message = Message( + id=Identifier(name=msg_id), + value=Pattern( + elements=( + Placeable( + expression=TermReference( + id=Identifier(name="nonexistent_term_xyz"), + attribute=None, + ) + ), + ) + ), + attributes=(), + comment=None, + ) + self.message_registry[msg_id] = message + + result, errors = self.resolver.resolve_message(message, args={}) + assert isinstance(result, str) + assert len(errors) > 0 + event("rule=resolve_nonexistent_term") + + @rule(term_id=terms) + def resolve_term_attribute_not_found(self, term_id: str) -> None: + """Try to resolve term attribute that doesn't 
exist.""" + assume(term_id in self.term_registry) + + msg_id = "msg_ref_term_attr" + message = Message( + id=Identifier(name=msg_id), + value=Pattern( + elements=( + Placeable( + expression=TermReference( + id=Identifier(name=term_id), + attribute=Identifier(name="nonexistent_attr"), + ) + ), + ) + ), + attributes=(), + comment=None, + ) + self.message_registry[msg_id] = message + + result, errors = self.resolver.resolve_message(message, args={}) + assert isinstance(result, str) + assert len(errors) > 0 + event("rule=resolve_term_attr_not_found") + + @rule() + def test_unknown_expression_type(self) -> None: + """Document architecturally unreachable expression type error path. + + The unknown expression error path is unreachable by design since all + AST node types are exhaustively handled. This rule documents the gap. + """ + event("rule=test_unknown_expression_type") + + @rule( + msg_id1=ftl_identifiers(), + msg_id2=ftl_identifiers(), + ) + def test_circular_reference_detection(self, msg_id1: str, msg_id2: str) -> None: + """Test circular reference detection produces graceful degradation.""" + assume(msg_id1 != msg_id2) + + message1 = Message( + id=Identifier(name=msg_id1), + value=message_reference_pattern(msg_id2), + attributes=(), + comment=None, + ) + message2 = Message( + id=Identifier(name=msg_id2), + value=message_reference_pattern(msg_id1), + attributes=(), + comment=None, + ) + + self.message_registry[msg_id1] = message1 + self.message_registry[msg_id2] = message2 + + result, _errors = self.resolver.resolve_message(message1, args={}) + assert isinstance(result, str) + event("rule=circular_reference_detection") + + @rule( + msg_id=ftl_identifiers(), + number=st.integers(min_value=0, max_value=100), + ) + def add_message_with_select_expression(self, msg_id: str, number: int) -> None: + """Add message with select expression (plural).""" + variants = ( + Variant( + key=Identifier(name="one"), + value=simple_pattern("singular"), + default=False, + ), + 
Variant( + key=Identifier(name="other"), + value=simple_pattern("plural"), + default=True, + ), + ) + + select_expr = SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=variants, + ) + + message = Message( + id=Identifier(name=msg_id), + value=Pattern(elements=(Placeable(expression=select_expr),)), + attributes=(), + comment=None, + ) + + self.message_registry[msg_id] = message + + result, errors = self.resolver.resolve_message(message, args={"count": number}) + assert result in ["singular", "plural"] + assert errors == (), f"Unexpected errors: {errors}" + event(f"rule=select_expression({result})") + + @rule() + def test_message_no_value(self) -> None: + """Test message without value (only attributes) produces REFERENCE error.""" + msg_id = "msg_no_value" + message = Message( + id=Identifier(name=msg_id), + value=None, + attributes=( + Attribute( + id=Identifier(name="attr1"), + value=simple_pattern("has attribute"), + ), + ), + comment=None, + ) + self.message_registry[msg_id] = message + + result, errors = self.resolver.resolve_message(message, args={}) + assert len(errors) == 1 + assert isinstance(errors[0], FrozenFluentError) + assert errors[0].category == ErrorCategory.REFERENCE + assert "no value" in str(errors[0]).lower() + assert isinstance(result, str) + event("rule=test_message_no_value") + + @rule( + msg_id=ftl_identifiers(), + func_name=st.sampled_from(["NUMBER", "NONEXISTENT"]), + ) + def test_function_reference(self, msg_id: str, func_name: str) -> None: + """Test function reference resolution (both successful and failed calls).""" + func_ref = FunctionReference( + id=Identifier(name=func_name), + arguments=CallArguments( + positional=(NumberLiteral(value=42, raw="42"),), + named=(), + ), + ) + + message = Message( + id=Identifier(name=msg_id), + value=Pattern(elements=(Placeable(expression=func_ref),)), + attributes=(), + comment=None, + ) + + self.message_registry[msg_id] = message + + result, errors = 
self.resolver.resolve_message(message, args={}) + assert isinstance(result, str) + + if func_name == "NUMBER": + assert "42" in result + assert errors == () + else: + assert len(errors) > 0 + event(f"rule=function_reference({func_name})") + + @invariant() + def resolver_state_consistent(self) -> None: + """Invariant: Resolver registries stay consistent.""" + assert self.resolver._messages == self.message_registry + assert self.resolver._terms == self.term_registry + assert self.resolver._locale == self.locale + msg_count = len(self.message_registry) + event(f"invariant=state_consistent({msg_count})") + + @invariant() + def resolution_uses_explicit_context(self) -> None: + """Invariant: Resolver properly initialized with explicit context pattern.""" + assert self.resolver._locale == self.locale + event("invariant=explicit_context") + + +# Stateful test runner +TestFluentResolverStateMachine = FluentResolverStateMachine.TestCase +TestFluentResolverStateMachine = pytest.mark.fuzz(TestFluentResolverStateMachine) diff --git a/tests/fuzz_runtime_resolver_state_machine_cases/strategy_helpers.py b/tests/fuzz_runtime_resolver_state_machine_cases/strategy_helpers.py new file mode 100644 index 00000000..c858b2e9 --- /dev/null +++ b/tests/fuzz_runtime_resolver_state_machine_cases/strategy_helpers.py @@ -0,0 +1,44 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_runtime_resolver_state_machine.py.""" + +from tests.fuzz_runtime_resolver_state_machine_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# STRATEGY HELPERS +# ============================================================================ + + +def simple_pattern(text: str) -> Pattern: + """Create simple text pattern.""" + return Pattern(elements=(TextElement(value=text),)) + + +def variable_pattern(var_name: str) -> Pattern: + """Create pattern with variable reference.""" + return Pattern( + elements=( + 
Placeable(expression=VariableReference(id=Identifier(name=var_name))), + ) + ) + + +def term_reference_pattern(term_name: str) -> Pattern: + """Create pattern with term reference.""" + return Pattern( + elements=( + Placeable( + expression=TermReference(id=Identifier(name=term_name), attribute=None) + ), + ) + ) + + +def message_reference_pattern(msg_name: str) -> Pattern: + """Create pattern with message reference.""" + return Pattern( + elements=( + Placeable( + expression=MessageReference(id=Identifier(name=msg_name), attribute=None) + ), + ) + ) diff --git a/tests/fuzz_syntax_serializer_property_cases/__init__.py b/tests/fuzz_syntax_serializer_property_cases/__init__.py new file mode 100644 index 00000000..4ce6c915 --- /dev/null +++ b/tests/fuzz_syntax_serializer_property_cases/__init__.py @@ -0,0 +1,131 @@ +"""Property-based tests for ftllexengine.syntax.serializer module. + +Comprehensive test suite achieving 100% coverage using Hypothesis property-based +testing with HypoFuzz semantic coverage events. + +Test Properties: +- Roundtrip: parse(serialize(ast)) preserves structure +- Idempotence: serialize(parse(serialize(ast))) == serialize(ast) +- Validation: Invalid ASTs raise SerializationValidationError +- Depth: Nested ASTs respect max_depth limits + +Coverage Targets: +- Lines 117-118: SelectExpression with 0 defaults +- Lines 121-125: SelectExpression with >1 defaults +- Branch 238: FunctionReference without arguments +- Branch 429: Junk serialization +- Branch 616: Placeable in pattern +- Branch 749: SelectExpression serialization +- Branch 804: NumberLiteral variant keys + +Python 3.13+. 
+""" + +from __future__ import annotations + +import typing +from typing import cast + +import pytest +from hypothesis import HealthCheck, event, given, settings +from hypothesis import strategies as st + +from ftllexengine.constants import MAX_DEPTH +from ftllexengine.enums import CommentType +from ftllexengine.syntax.ast import ( + CallArguments, + Comment, + FTLLiteral, + FunctionReference, + Identifier, + Junk, + Message, + NamedArgument, + Pattern, + Placeable, + Resource, + SelectExpression, + StringLiteral, + Term, + TermReference, + TextElement, + VariableReference, +) +from ftllexengine.syntax.parser import FluentParserV1 +from ftllexengine.syntax.serializer import ( + FluentSerializer, + SerializationDepthError, + SerializationValidationError, + serialize, +) +from ftllexengine.syntax.serializer_lines import ( + _classify_line, + _escape_text, + _LineKind, # Private import for property tests +) +from tests.strategies.ftl import ( + build_invalid_select_multiple_defaults, + build_invalid_select_no_defaults, + ftl_comment_nodes, + ftl_deep_placeables, + ftl_function_references_no_args, + ftl_junk_nodes, + ftl_message_nodes, + ftl_patterns, + ftl_placeables, + ftl_resources, + ftl_select_expressions, + ftl_select_expressions_with_number_keys, + ftl_term_nodes, +) + +__all__ = [ + "MAX_DEPTH", + "CallArguments", + "Comment", + "CommentType", + "FTLLiteral", + "FluentParserV1", + "FluentSerializer", + "FunctionReference", + "HealthCheck", + "Identifier", + "Junk", + "Message", + "NamedArgument", + "Pattern", + "Placeable", + "Resource", + "SelectExpression", + "SerializationDepthError", + "SerializationValidationError", + "StringLiteral", + "Term", + "TermReference", + "TextElement", + "VariableReference", + "_LineKind", + "_classify_line", + "_escape_text", + "build_invalid_select_multiple_defaults", + "build_invalid_select_no_defaults", + "cast", + "event", + "ftl_comment_nodes", + "ftl_deep_placeables", + "ftl_function_references_no_args", + 
"ftl_junk_nodes", + "ftl_message_nodes", + "ftl_patterns", + "ftl_placeables", + "ftl_resources", + "ftl_select_expressions", + "ftl_select_expressions_with_number_keys", + "ftl_term_nodes", + "given", + "pytest", + "serialize", + "settings", + "st", + "typing", +] diff --git a/tests/fuzz_syntax_serializer_property_cases/call_argument_depth_properties_depth_guard_in_arguments.py b/tests/fuzz_syntax_serializer_property_cases/call_argument_depth_properties_depth_guard_in_arguments.py new file mode 100644 index 00000000..7a7f5774 --- /dev/null +++ b/tests/fuzz_syntax_serializer_property_cases/call_argument_depth_properties_depth_guard_in_arguments.py @@ -0,0 +1,134 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_syntax_serializer_property.py.""" + +from tests.fuzz_syntax_serializer_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================= +# Call Argument Depth Properties (Depth Guard in Arguments) +# ============================================================================= + + +class TestCallArgumentDepthProperties: + """Test depth guard enforcement within call arguments. + + Serializer wraps each positional and named argument expression + in depth_guard. Nested term/function calls must respect limits. + """ + + @given(depth=st.integers(min_value=1, max_value=8)) + def test_nested_call_arguments_serialize( + self, depth: int + ) -> None: + """PROPERTY: Nested call arguments within limits serialize. 
+ + Events emitted: + - call_arg_depth={n}: Nesting depth of call arguments + - outcome=nested_args_ok: Serialization succeeded + """ + event(f"call_arg_depth={depth}") + + # Build: NUMBER(-t0(-t1(-t2(...$x...)))) + inner: VariableReference | TermReference + inner = VariableReference(id=Identifier(name="x")) + for i in range(depth): + inner = TermReference( + id=Identifier(name=f"t{i}"), + arguments=CallArguments( + positional=(inner,), named=() + ), + ) + func = FunctionReference( + id=Identifier(name="NUMBER"), + arguments=CallArguments( + positional=(inner,), named=() + ), + ) + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=(Placeable(expression=func),) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource, validate=True) + event("outcome=nested_args_ok") + assert "-t0(" in result + assert "$x" in result + + def test_deep_call_args_exceed_depth_limit(self) -> None: + """Deeply nested call arguments exceed depth limit.""" + inner: VariableReference | TermReference + inner = VariableReference(id=Identifier(name="x")) + for i in range(20): + inner = TermReference( + id=Identifier(name=f"t{i}"), + arguments=CallArguments( + positional=(inner,), named=() + ), + ) + func = FunctionReference( + id=Identifier(name="NUMBER"), + arguments=CallArguments( + positional=(inner,), named=() + ), + ) + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=(Placeable(expression=func),) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + with pytest.raises(SerializationDepthError): + serialize(resource, validate=True, max_depth=10) + + @given( + depth=st.integers(min_value=1, max_value=5), + named_val=st.sampled_from(["decimal", "percent"]), + ) + def test_named_args_in_nested_calls( + self, depth: int, named_val: str + ) -> None: + """PROPERTY: Named arguments in nested calls serialize. 
+ + Events emitted: + - call_arg_depth={n}: Nesting depth + - has_named_arg=True: Named argument present + """ + event(f"call_arg_depth={depth}") + event("has_named_arg=True") + + inner: VariableReference | TermReference + inner = VariableReference(id=Identifier(name="x")) + for i in range(depth): + named = NamedArgument( + name=Identifier(name="style"), + value=StringLiteral(value=named_val), + ) + inner = TermReference( + id=Identifier(name=f"t{i}"), + arguments=CallArguments( + positional=(inner,), named=(named,) + ), + ) + func = FunctionReference( + id=Identifier(name="NUMBER"), + arguments=CallArguments( + positional=(inner,), named=() + ), + ) + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=(Placeable(expression=func),) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource, validate=True) + assert f'style: "{named_val}"' in result diff --git a/tests/fuzz_syntax_serializer_property_cases/control_character_string_literal_properties.py b/tests/fuzz_syntax_serializer_property_cases/control_character_string_literal_properties.py new file mode 100644 index 00000000..c38725eb --- /dev/null +++ b/tests/fuzz_syntax_serializer_property_cases/control_character_string_literal_properties.py @@ -0,0 +1,94 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_syntax_serializer_property.py.""" + +from tests.fuzz_syntax_serializer_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================= +# Control Character StringLiteral Properties +# ============================================================================= + + +class TestControlCharStringLiteralProperties: + """Test StringLiteral escaping for all control characters. + + Serializer uses \\uHHHH for chars < 0x20 and 0x7F. Verify + this encoding for the full control character range. 
+ """ + + @given( + code=st.integers(min_value=0, max_value=0x1F), + ) + def test_c0_control_chars_escaped(self, code: int) -> None: + """PROPERTY: C0 control chars (0x00-0x1F) use \\uHHHH. + + Events emitted: + - control_char_code={n}: Character code point + - outcome=control_char_escaped: Escape verified + """ + event(f"control_char_code={code}") + + char = chr(code) + lit = StringLiteral(value=f"a{char}b") + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=(Placeable(expression=lit),) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource, validate=True) + expected_escape = f"\\u{code:04X}" + assert expected_escape in result + event("outcome=control_char_escaped") + + def test_del_char_escaped(self) -> None: + """DEL character (0x7F) uses \\u007F encoding.""" + lit = StringLiteral(value="a\x7Fb") + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=(Placeable(expression=lit),) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource, validate=True) + assert "\\u007F" in result + + @given( + code=st.sampled_from( + [0x00, 0x01, 0x08, 0x09, 0x0A, 0x0C, 0x0D, + 0x1B, 0x1F, 0x7F] + ), + ) + def test_control_char_roundtrip(self, code: int) -> None: + """PROPERTY: Control chars roundtrip through parse/serialize. 
+ + Events emitted: + - control_char_code={n}: Character code point + - outcome=control_roundtrip_ok: Roundtrip succeeded + """ + event(f"control_char_code={code}") + + char = chr(code) + lit = StringLiteral(value=f"x{char}y") + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=(Placeable(expression=lit),) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource, validate=True) + parser = FluentParserV1() + reparsed = parser.parse(serialized) + assert len(reparsed.entries) == 1 + assert not any( + isinstance(e, Junk) for e in reparsed.entries + ) + event("outcome=control_roundtrip_ok") diff --git a/tests/fuzz_syntax_serializer_property_cases/coverage_targeted_tests_branch_coverage.py b/tests/fuzz_syntax_serializer_property_cases/coverage_targeted_tests_branch_coverage.py new file mode 100644 index 00000000..413b4450 --- /dev/null +++ b/tests/fuzz_syntax_serializer_property_cases/coverage_targeted_tests_branch_coverage.py @@ -0,0 +1,144 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_syntax_serializer_property.py.""" + +from tests.fuzz_syntax_serializer_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================= +# Coverage-Targeted Tests (Branch Coverage) +# ============================================================================= + + +class TestCoverageTargeted: + """Tests targeting specific coverage gaps.""" + + @given(func_ref=ftl_function_references_no_args()) + def test_function_reference_without_arguments(self, func_ref: FunctionReference) -> None: + """COVERAGE: Branch 238 - FunctionReference without arguments. 
+ + Events emitted: + - coverage_target=function_no_args: Branch target + """ + event("coverage_target=function_no_args") + + message = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=func_ref),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + serialized = serialize(resource, validate=True) + + # Should contain function name followed by empty parens + assert f"{func_ref.id.name}()" in serialized + + @given(junk=ftl_junk_nodes()) + def test_junk_serialization(self, junk: Junk) -> None: + """COVERAGE: Branch 429 - Junk serialization. + + Events emitted: + - coverage_target=junk: Branch target + - junk_has_trailing_newline={bool}: Content structure + """ + event("coverage_target=junk") + event(f"junk_has_trailing_newline={junk.content.endswith('\n')}") + + resource = Resource(entries=(junk,)) + + serialized = serialize(resource, validate=False) # Junk may be invalid + + # Junk content should be preserved as-is (with trailing newline added if missing) + if junk.content.endswith("\n"): + assert junk.content in serialized + else: + assert junk.content + "\n" in serialized + + @given(select_expr=ftl_select_expressions_with_number_keys()) + def test_select_expression_number_keys(self, select_expr: SelectExpression) -> None: + """COVERAGE: Branch 804 - NumberLiteral variant keys. + + Events emitted: + - coverage_target=select_number_keys: Branch target + """ + event("coverage_target=select_number_keys") + + message = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=select_expr),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + serialized = serialize(resource, validate=True) + + # Should contain numeric variant keys + assert "[0]" in serialized or "[1]" in serialized + + @given(placeable=ftl_placeables()) + def test_placeable_in_pattern(self, placeable: Placeable) -> None: + """COVERAGE: Branch 616 - Placeable in pattern. 
+ + Events emitted: + - coverage_target=placeable_in_pattern: Branch target + - placeable_expr_type={type}: Expression type + """ + event("coverage_target=placeable_in_pattern") + event(f"placeable_expr_type={type(placeable.expression).__name__}") + + message = Message( + id=Identifier(name="test"), + value=Pattern(elements=(placeable,)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + serialized = serialize(resource, validate=True) + + # Should contain placeable delimiters + assert "{ " in serialized + assert " }" in serialized + + @given(select_expr=ftl_select_expressions()) + def test_select_expression_serialization(self, select_expr: SelectExpression) -> None: + """COVERAGE: Branch 749 - SelectExpression serialization. + + Events emitted: + - coverage_target=select_expression: Branch target + - variant_count={n}: Number of variants + """ + event("coverage_target=select_expression") + + message = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=select_expr),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + serialized = serialize(resource, validate=True) + + # Emit variant count for HypoFuzz + event(f"variant_count={len(select_expr.variants)}") + + # Should contain select syntax + assert "->" in serialized + # Should contain at least one default variant marker + assert "*[" in serialized + + @given(comment=ftl_comment_nodes()) + def test_comment_serialization(self, comment: Comment) -> None: + """COVERAGE: Comment serialization. 
+ + Events emitted: + - coverage_target=comment: Branch target + - comment_type={type}: Comment type + """ + event("coverage_target=comment") + event(f"comment_type={comment.type.name}") + + resource = Resource(entries=(comment,)) + + serialized = serialize(resource, validate=False) + + # Should contain comment prefix + assert "#" in serialized diff --git a/tests/fuzz_syntax_serializer_property_cases/depth_properties_do_s_protection.py b/tests/fuzz_syntax_serializer_property_cases/depth_properties_do_s_protection.py new file mode 100644 index 00000000..12fc1a02 --- /dev/null +++ b/tests/fuzz_syntax_serializer_property_cases/depth_properties_do_s_protection.py @@ -0,0 +1,84 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_syntax_serializer_property.py.""" + +from tests.fuzz_syntax_serializer_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================= +# Depth Properties (DoS Protection) +# ============================================================================= + + +class TestDepthProperties: + """Test max_depth protection against stack overflow.""" + + @given(deep_placeable=ftl_deep_placeables(depth=5)) + def test_moderate_depth_succeeds(self, deep_placeable: Placeable) -> None: + """PROPERTY: Moderately nested ASTs serialize successfully. 
+ + Events emitted: + - depth=moderate: Depth category + """ + event("depth=moderate") + + message = Message( + id=Identifier(name="test"), + value=Pattern(elements=(deep_placeable,)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + # Should succeed with default max_depth + serialized = serialize(resource, validate=True, max_depth=MAX_DEPTH) + assert isinstance(serialized, str) + + def test_extreme_depth_raises_depth_error(self) -> None: + """COVERAGE: SerializationDepthError on overflow.""" + + # Build deeply nested structure exceeding limit + # Start with innermost expression + inner_expr = VariableReference(id=Identifier(name="x")) + + # Wrap in 150 nested placeables (exceeds default MAX_DEPTH=100) + current: Placeable | VariableReference = inner_expr + for _ in range(150): + current = Placeable(expression=current) + + # After loop, current is guaranteed to be Placeable + outermost_placeable = typing.cast("Placeable", current) + + message = Message( + id=Identifier(name="test"), + value=Pattern(elements=(outermost_placeable,)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + with pytest.raises(SerializationDepthError, match="depth limit exceeded"): + serialize(resource, validate=True, max_depth=MAX_DEPTH) + + def test_custom_max_depth_respected(self) -> None: + """COVERAGE: Custom max_depth parameter.""" + + # Build structure with 10 nested placeables + inner_expr = VariableReference(id=Identifier(name="x")) + current: Placeable | VariableReference = inner_expr + for _ in range(10): + current = Placeable(expression=current) + + # After loop, current is guaranteed to be Placeable + outermost_placeable = typing.cast("Placeable", current) + + message = Message( + id=Identifier(name="test"), + value=Pattern(elements=(outermost_placeable,)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + # Should fail with max_depth=5 + with pytest.raises(SerializationDepthError): + serialize(resource, validate=True, 
max_depth=5) + + # Should succeed with max_depth=15 + serialized = serialize(resource, validate=True, max_depth=15) + assert isinstance(serialized, str) diff --git a/tests/fuzz_syntax_serializer_property_cases/entry_sequencing_properties_junk_comment_message_ordering.py b/tests/fuzz_syntax_serializer_property_cases/entry_sequencing_properties_junk_comment_message_ordering.py new file mode 100644 index 00000000..1e0cc98f --- /dev/null +++ b/tests/fuzz_syntax_serializer_property_cases/entry_sequencing_properties_junk_comment_message_ordering.py @@ -0,0 +1,163 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_syntax_serializer_property.py.""" + +from tests.fuzz_syntax_serializer_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================= +# Entry Sequencing Properties (Junk/Comment/Message ordering) +# ============================================================================= + + +class TestEntrySequencingProperties: + """Test blank-line insertion logic for mixed entry sequences. + + Serializer handles spacing between entries: extra blank lines + for adjacent comments of same type, Junk with leading + whitespace, Message/Term compact separation. + """ + + @given( + data=st.data(), + count=st.integers(min_value=2, max_value=5), + ) + @settings(deadline=None, suppress_health_check=[HealthCheck.too_slow]) + def test_mixed_entry_sequences_parseable( + self, data: st.DataObject, count: int + ) -> None: + """PROPERTY: Mixed entry sequences serialize to parseable FTL. 
+ + Events emitted: + - entry_count={n}: Number of entries + - has_junk={bool}: Whether Junk entries present + - has_comment={bool}: Whether Comment entries present + - outcome=sequence_parseable: Output parses without error + """ + event(f"entry_count={count}") + + entries: list[Message | Term | Comment | Junk] = [] + seen_ids: set[str] = set() + has_junk = False + has_comment = False + + for i in range(count): + choice = data.draw( + st.sampled_from( + ["message", "term", "comment", "junk"] + ) + ) + if choice == "message": + name = f"msg{i}" + if name not in seen_ids: + seen_ids.add(name) + entries.append( + Message( + id=Identifier(name=name), + value=Pattern( + elements=( + TextElement(value="val"), + ) + ), + attributes=(), + ) + ) + elif choice == "term": + name = f"term{i}" + if name not in seen_ids: + seen_ids.add(name) + entries.append( + Term( + id=Identifier(name=name), + value=Pattern( + elements=( + TextElement(value="val"), + ) + ), + attributes=(), + ) + ) + elif choice == "comment": + has_comment = True + ctype = data.draw( + st.sampled_from([ + CommentType.COMMENT, + CommentType.GROUP, + CommentType.RESOURCE, + ]) + ) + entries.append( + Comment( + content=f"comment {i}", + type=ctype, + ) + ) + else: + has_junk = True + entries.append( + Junk(content=f"junk line {i}\n") + ) + + event(f"has_junk={has_junk}") + event(f"has_comment={has_comment}") + + if not entries: + return + + resource = Resource(entries=tuple(entries)) + result = serialize(resource, validate=False) + + parser = FluentParserV1() + reparsed = parser.parse(result) + assert len(reparsed.entries) > 0 + event("outcome=sequence_parseable") + + @given( + junk_count=st.integers(min_value=1, max_value=3), + msg_count=st.integers(min_value=1, max_value=3), + ) + def test_junk_between_messages( + self, junk_count: int, msg_count: int + ) -> None: + """PROPERTY: Junk interleaved with Messages serializes. 
+ + Events emitted: + - junk_count={n}: Number of Junk entries + - msg_count={n}: Number of Message entries + - outcome=junk_interleaved_ok: Serialization succeeded + """ + event(f"junk_count={junk_count}") + event(f"msg_count={msg_count}") + + entries: list[Message | Junk] = [] + for i in range(msg_count): + entries.append( + Message( + id=Identifier(name=f"m{i}"), + value=Pattern( + elements=(TextElement(value="v"),) + ), + attributes=(), + ) + ) + if i < junk_count: + entries.append( + Junk(content=f"bad syntax {i}\n") + ) + + resource = Resource(entries=tuple(entries)) + result = serialize(resource, validate=False) + assert isinstance(result, str) + assert len(result) > 0 + event("outcome=junk_interleaved_ok") + + def test_adjacent_same_type_comments_separated( + self, + ) -> None: + """Adjacent same-type comments get extra blank line.""" + entries = ( + Comment(content="first", type=CommentType.COMMENT), + Comment(content="second", type=CommentType.COMMENT), + ) + resource = Resource(entries=entries) + result = serialize(resource, validate=False) + # Double newline separates same-type comments + assert "\n\n" in result diff --git a/tests/fuzz_syntax_serializer_property_cases/escape_text_property_tests.py b/tests/fuzz_syntax_serializer_property_cases/escape_text_property_tests.py new file mode 100644 index 00000000..f1e9db2a --- /dev/null +++ b/tests/fuzz_syntax_serializer_property_cases/escape_text_property_tests.py @@ -0,0 +1,46 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_syntax_serializer_property.py.""" + +from tests.fuzz_syntax_serializer_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================= +# _escape_text Property Tests +# ============================================================================= + + +class TestEscapeTextProperties: + """Property-based tests for _escape_text brace escaping. 
+ + Properties verified: + - Content preserved: unescaping the result recovers the original + - No raw braces in non-placeable positions + """ + + @given(text=st.text(min_size=0, max_size=100)) + def test_content_roundtrip(self, text: str) -> None: + """Unescaping placeable wrappers recovers original text.""" + output: list[str] = [] + _escape_text(text, output) + result = "".join(output) + has_braces = "{" in text or "}" in text + event(f"has_braces={has_braces}") + event(f"length={len(text)}") + # Reverse the escaping + recovered = result.replace('{ "{" }', "{").replace('{ "}" }', "}") + assert recovered == text + + @given(text=st.text( + alphabet=st.characters( + codec="utf-8", + exclude_characters="{}", + ), + min_size=0, + max_size=100, + )) + def test_no_transformation_without_braces(self, text: str) -> None: + """Text without braces passes through unchanged.""" + output: list[str] = [] + _escape_text(text, output) + result = "".join(output) + event(f"length={len(text)}") + assert result == text diff --git a/tests/fuzz_syntax_serializer_property_cases/roundtrip_properties_core_correctness.py b/tests/fuzz_syntax_serializer_property_cases/roundtrip_properties_core_correctness.py new file mode 100644 index 00000000..45dc03ba --- /dev/null +++ b/tests/fuzz_syntax_serializer_property_cases/roundtrip_properties_core_correctness.py @@ -0,0 +1,140 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_syntax_serializer_property.py.""" + +from tests.fuzz_syntax_serializer_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================= +# Roundtrip Properties (Core Correctness) +# ============================================================================= + + +class TestRoundtripProperties: + """Test roundtrip correctness: parse(serialize(ast)) preserves structure.""" + + @given(resource=ftl_resources()) + @settings(deadline=None, 
suppress_health_check=[HealthCheck.too_slow]) + def test_resource_roundtrip_preserves_structure(self, resource: Resource) -> None: + """PROPERTY: Serialized resources can be parsed back to equivalent AST. + + Events emitted: + - entry_count={n}: Number of entries in resource + - entry_type={type}: Type of each entry encountered + """ + # Emit entry count for HypoFuzz coverage + event(f"entry_count={len(resource.entries)}") + + # Serialize the resource + serialized = serialize(resource, validate=True) + + # Parse the serialized output + parser = FluentParserV1() + reparsed = parser.parse(serialized) + + # Emit entry types for HypoFuzz coverage + for entry in resource.entries: + event(f"entry_type={type(entry).__name__}") + + # Verify entry count preserved (no parse errors mean no Junk entries added) + assert len(reparsed.entries) == len(resource.entries) + + @given(message=ftl_message_nodes()) + def test_message_roundtrip_idempotence(self, message: Message) -> None: + """PROPERTY: serialize(parse(serialize(ast))) == serialize(ast). + + Idempotence ensures serialization is stable across multiple cycles. + + Events emitted: + - has_attributes={bool}: Whether message has attributes + - attribute_count={n}: Number of attributes + - pattern_starts_with_space={bool}: Edge case tracking + """ + # Track leading-space edge case for HypoFuzz coverage guidance. 
+ pattern_value = message.value + starts_with_space = False + if pattern_value and pattern_value.elements: + first_elem = pattern_value.elements[0] + if isinstance(first_elem, TextElement) and first_elem.value.startswith(" "): + starts_with_space = True + + event(f"pattern_starts_with_space={starts_with_space}") + + resource = Resource(entries=(message,)) + + # Emit attribute coverage events + event(f"has_attributes={len(message.attributes) > 0}") + if message.attributes: + event(f"attribute_count={len(message.attributes)}") + + # First serialization + serialized1 = serialize(resource, validate=True) + + # Parse and re-serialize + parser = FluentParserV1() + reparsed = parser.parse(serialized1) + serialized2 = serialize(reparsed, validate=True) + + # Idempotence: second serialization matches first + assert serialized1 == serialized2 + + @given(term=ftl_term_nodes()) + def test_term_roundtrip_idempotence(self, term: Term) -> None: + """PROPERTY: Terms serialize idempotently. + + Events emitted: + - has_attributes={bool}: Whether term has attributes + - pattern_starts_with_space={bool}: Edge case tracking + """ + # Track leading-space edge case for HypoFuzz coverage guidance. + pattern_value = term.value + starts_with_space = False + if pattern_value and pattern_value.elements: + first_elem = pattern_value.elements[0] + if isinstance(first_elem, TextElement) and first_elem.value.startswith(" "): + starts_with_space = True + + event(f"pattern_starts_with_space={starts_with_space}") + + resource = Resource(entries=(term,)) + + event(f"has_attributes={len(term.attributes) > 0}") + + serialized1 = serialize(resource, validate=True) + + parser = FluentParserV1() + reparsed = parser.parse(serialized1) + serialized2 = serialize(reparsed, validate=True) + + assert serialized1 == serialized2 + + @given(pattern=ftl_patterns()) + def test_pattern_roundtrip_preserves_elements(self, pattern: Pattern) -> None: + """PROPERTY: Pattern serialization preserves all elements. 
+ + Events emitted: + - element_count={n}: Number of elements in pattern + - element_type={type}: Type of each element + - has_placeable={bool}: Whether pattern contains placeables + """ + # Wrap pattern in a message + message = Message( + id=Identifier(name="test"), + value=pattern, + attributes=(), + ) + resource = Resource(entries=(message,)) + + # Emit pattern structure events + event(f"element_count={len(pattern.elements)}") + has_placeable = any(isinstance(e, Placeable) for e in pattern.elements) + event(f"has_placeable={has_placeable}") + + for element in pattern.elements: + event(f"element_type={type(element).__name__}") + + serialized = serialize(resource, validate=True) + + parser = FluentParserV1() + reparsed = parser.parse(serialized) + + # Verify no parse errors (no Junk entries) and correct entry count + assert len(reparsed.entries) == 1 diff --git a/tests/fuzz_syntax_serializer_property_cases/separate_line_trigger_discrimination.py b/tests/fuzz_syntax_serializer_property_cases/separate_line_trigger_discrimination.py new file mode 100644 index 00000000..f91b710f --- /dev/null +++ b/tests/fuzz_syntax_serializer_property_cases/separate_line_trigger_discrimination.py @@ -0,0 +1,119 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_syntax_serializer_property.py.""" + +from tests.fuzz_syntax_serializer_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================= +# Separate-Line Trigger Discrimination +# ============================================================================= + + +class TestSeparateLineTriggerProperties: + """Test separate-line mode trigger discrimination. + + Two distinct triggers exist: + 1. Cross-element: TextElement starts with space after + element ending with newline. + 2. Intra-element: Single TextElement has embedded newline + followed by space on a NORMAL line. 
+ """ + + @given( + n_spaces=st.integers(min_value=1, max_value=8), + ) + def test_cross_element_trigger( + self, n_spaces: int + ) -> None: + """PROPERTY: Cross-element whitespace triggers separate-line. + + Events emitted: + - trigger=cross_element: Trigger type + - leading_spaces={n}: Number of leading spaces + """ + event("trigger=cross_element") + event(f"leading_spaces={n_spaces}") + + # Element 1 ends with newline, element 2 starts with + # spaces — triggers separate-line mode. + elems = ( + TextElement(value="line one\n"), + TextElement(value=" " * n_spaces + "line two"), + ) + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=elems), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource, validate=True) + # Separate-line: pattern on new line after = + assert "test = \n " in result + + @given( + n_spaces=st.integers(min_value=1, max_value=8), + ) + def test_intra_element_trigger( + self, n_spaces: int + ) -> None: + """PROPERTY: Intra-element whitespace triggers separate-line. + + Events emitted: + - trigger=intra_element: Trigger type + - leading_spaces={n}: Number of leading spaces + """ + event("trigger=intra_element") + event(f"leading_spaces={n_spaces}") + + # Single element with embedded \n + spaces + NORMAL char + text_val = f"line one\n{' ' * n_spaces}line two" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=(TextElement(value=text_val),) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource, validate=True) + # Separate-line: pattern on new line after = + assert "test = \n " in result + + @given( + syntax_char=st.sampled_from([".", "*", "["]), + n_spaces=st.integers(min_value=1, max_value=6), + ) + def test_syntax_leading_does_not_trigger_separate_line( + self, syntax_char: str, n_spaces: int + ) -> None: + """PROPERTY: SYNTAX_LEADING lines DON'T trigger separate-line. 
+ + Events emitted: + - trigger=syntax_not_separate: Negative case + - syntax_char={char}: Which syntax char + """ + event("trigger=syntax_not_separate") + event(f"syntax_char={syntax_char}") + + # Embedded \n + spaces + syntax char => SYNTAX_LEADING, + # which is handled by per-line wrapping, NOT separate-line. + line = " " * n_spaces + syntax_char + "rest" + text_val = f"line one\n{line}" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=(TextElement(value=text_val),) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource, validate=True) + # Should NOT use separate-line mode + assert result.startswith("test = ") + assert not result.startswith("test = \n") + + +# ============================================================================= +# Mark as fuzz tests for selective execution +# ============================================================================= + +pytestmark = pytest.mark.fuzz diff --git a/tests/fuzz_syntax_serializer_property_cases/serializer_class_tests_direct_class_usage.py b/tests/fuzz_syntax_serializer_property_cases/serializer_class_tests_direct_class_usage.py new file mode 100644 index 00000000..8cce92d5 --- /dev/null +++ b/tests/fuzz_syntax_serializer_property_cases/serializer_class_tests_direct_class_usage.py @@ -0,0 +1,47 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_syntax_serializer_property.py.""" + +from tests.fuzz_syntax_serializer_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================= +# Serializer Class Tests (Direct Class Usage) +# ============================================================================= + + +class TestFluentSerializerClass: + """Test FluentSerializer class directly (not just convenience function).""" + + @given(resource=ftl_resources()) + def test_serializer_instance_reusable(self, resource: Resource) -> None: + """PROPERTY: 
FluentSerializer instances are reusable across sequential calls (no state mutation). + + Events emitted: + - serializer=reused: Reuse tracking + """ + event("serializer=reused") + + serializer = FluentSerializer() + + # Use same instance twice + result1 = serializer.serialize(resource, validate=True) + result2 = serializer.serialize(resource, validate=True) + + # Should produce identical results (no state mutation) + assert result1 == result2 + + @given(message=ftl_message_nodes()) + def test_serializer_matches_convenience_function(self, message: Message) -> None: + """PROPERTY: FluentSerializer.serialize() == serialize(). + + Events emitted: + - serializer=class_vs_function: Comparison tracking + """ + event("serializer=class_vs_function") + + resource = Resource(entries=(message,)) + + serializer = FluentSerializer() + class_result = serializer.serialize(resource, validate=True) + func_result = serialize(resource, validate=True) + + assert class_result == func_result diff --git a/tests/fuzz_syntax_serializer_property_cases/special_character_handling_tests.py b/tests/fuzz_syntax_serializer_property_cases/special_character_handling_tests.py new file mode 100644 index 00000000..9d819968 --- /dev/null +++ b/tests/fuzz_syntax_serializer_property_cases/special_character_handling_tests.py @@ -0,0 +1,187 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_syntax_serializer_property.py.""" + +from tests.fuzz_syntax_serializer_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================= +# Special Character Handling Tests +# ============================================================================= + + +class TestSpecialCharacterHandling: + """Test proper escaping and handling of special characters.""" + + @given( + text=st.text( + alphabet=st.characters( + blacklist_categories=["Cs", "Cc"], # Surrogates and control + blacklist_characters=["\x00"], # Null + ), + min_size=1, + max_size=50, + ) + ) + 
def test_string_literal_escaping_roundtrip(self, text: str) -> None: + """PROPERTY: String literals with special chars roundtrip correctly. + + Events emitted: + - has_backslash={bool}: Contains backslash + - has_quote={bool}: Contains quote + - has_newline={bool}: Contains newline + """ + has_backslash = "\\" in text + has_quote = '"' in text + has_newline = "\n" in text + event(f"has_backslash={has_backslash}") + event(f"has_quote={has_quote}") + event(f"has_newline={has_newline}") + + string_lit = StringLiteral(value=text) + message = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=string_lit),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + serialized = serialize(resource, validate=True) + + parser = FluentParserV1() + reparsed = parser.parse(serialized) + + # Verify the serialized output parses to at least one entry + assert len(reparsed.entries) > 0 + + def test_brace_escaping_as_placeable(self) -> None: + """COVERAGE: Braces must be escaped as placeables.""" + + # Braces in text are represented as Placeable(StringLiteral) + pattern = Pattern( + elements=( + TextElement(value="Start "), + Placeable(expression=StringLiteral(value="{")), + TextElement(value=" middle "), + Placeable(expression=StringLiteral(value="}")), + TextElement(value=" end"), + ) + ) + + message = Message(id=Identifier(name="test"), value=pattern, attributes=()) + resource = Resource(entries=(message,)) + + serialized = serialize(resource, validate=True) + + # Should contain escaped braces as placeables + assert '{ "{" }' in serialized + assert '{ "}" }' in serialized + + def test_multiline_pattern_indentation(self) -> None: + """COVERAGE: Multiline patterns get proper indentation.""" + + # Pattern with embedded newline + pattern = Pattern( + elements=( + TextElement(value="Line 1\n"), + TextElement(value="Line 2"), + ) + ) + + message = Message(id=Identifier(name="test"), value=pattern, attributes=()) + resource = 
Resource(entries=(message,)) + + serialized = serialize(resource, validate=True) + + # Should contain structural indentation after newline + assert "Line 1\n Line 2" in serialized + + +# ============================================================================= +# _classify_line Property Tests +# ============================================================================= + + +# Characters syntactically significant at continuation line start in FTL +_SYNTAX_CHARS = ".[*" + + +class TestClassifyLineProperties: + """Property-based tests for _classify_line pure function. + + Properties verified: + - EMPTY iff empty string + - WHITESPACE_ONLY iff all spaces and non-empty + - SYNTAX_LEADING iff first non-ws char is in {., *, [} + - ws_len is always non-negative + - Classification is exhaustive (always one of 4 kinds) + """ + + @given(line=st.text( + alphabet=st.characters( + codec="utf-8", categories=("L", "N", "P", "S", "Z") + ), + min_size=0, + max_size=80, + )) + def test_output_is_valid_kind(self, line: str) -> None: + """_classify_line always returns a valid _LineKind.""" + kind, ws_len = _classify_line(line) + kind_name = kind.name + event(f"kind={kind_name}") + assert isinstance(kind, _LineKind) + assert ws_len >= 0 + + @given(line=st.text( + alphabet=st.characters( + codec="utf-8", categories=("L", "N", "P", "S", "Z") + ), + min_size=0, + max_size=80, + )) + def test_empty_iff_empty_string(self, line: str) -> None: + """EMPTY kind iff input is the empty string.""" + kind, _ = _classify_line(line) + is_empty = kind is _LineKind.EMPTY + event(f"empty={is_empty}") + assert is_empty == (line == "") + + @given(n=st.integers(min_value=1, max_value=20)) + def test_whitespace_only_for_space_strings(self, n: int) -> None: + """Strings of only spaces classify as WHITESPACE_ONLY.""" + line = " " * n + kind, ws_len = _classify_line(line) + event(f"spaces={n}") + assert kind is _LineKind.WHITESPACE_ONLY + assert ws_len == 0 + + @given( + ws=st.integers(min_value=0, 
max_value=10), + syntax_char=st.sampled_from(list(_SYNTAX_CHARS)), + suffix=st.text(min_size=0, max_size=20), + ) + def test_syntax_leading_classification( + self, ws: int, syntax_char: str, suffix: str + ) -> None: + """Lines starting with (optional ws + syntax char) are SYNTAX_LEADING.""" + line = " " * ws + syntax_char + suffix + kind, ws_len = _classify_line(line) + event(f"syntax_char={syntax_char}") + event(f"ws_prefix={ws}") + assert kind is _LineKind.SYNTAX_LEADING + assert ws_len == ws + + @given( + ws=st.integers(min_value=0, max_value=10), + first_char=st.characters( + codec="utf-8", + categories=("L", "N"), + ), + suffix=st.text(min_size=0, max_size=20), + ) + def test_normal_for_non_syntax_first_char( + self, ws: int, first_char: str, suffix: str + ) -> None: + """Lines where first non-ws char is not syntax are NORMAL.""" + line = " " * ws + first_char + suffix + kind, _ = _classify_line(line) + event(f"kind={kind.name}") + assert kind is _LineKind.NORMAL diff --git a/tests/fuzz_syntax_serializer_property_cases/syntax_leading_roundtrip_properties_full_path.py b/tests/fuzz_syntax_serializer_property_cases/syntax_leading_roundtrip_properties_full_path.py new file mode 100644 index 00000000..b06e8245 --- /dev/null +++ b/tests/fuzz_syntax_serializer_property_cases/syntax_leading_roundtrip_properties_full_path.py @@ -0,0 +1,150 @@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_syntax_serializer_property.py.""" + +from tests.fuzz_syntax_serializer_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================= +# SYNTAX_LEADING Roundtrip Properties (Full Path) +# ============================================================================= + + +class TestSyntaxLeadingRoundtripProperties: + """Test full serialize-parse-serialize for syntax-leading lines. + + Continuation lines starting with . 
* [ need wrapping as + StringLiteral placeables to prevent parser misinterpretation. + """ + + _parser = FluentParserV1() + + @given( + syntax_char=st.sampled_from([".", "*", "["]), + ws=st.integers(min_value=0, max_value=6), + suffix=st.text( + alphabet=st.characters( + codec="utf-8", + categories=("L", "N"), + ), + min_size=0, + max_size=20, + ), + ) + def test_syntax_leading_roundtrip( + self, syntax_char: str, ws: int, suffix: str + ) -> None: + """PROPERTY: Syntax-leading continuation lines roundtrip. + + Events emitted: + - syntax_char={char}: Which syntax character + - ws_prefix={n}: Leading whitespace before syntax char + - has_suffix={bool}: Whether trailing text follows + - line_kind=SYNTAX_LEADING: Confirm classification + """ + event(f"syntax_char={syntax_char}") + event(f"ws_prefix={ws}") + has_suffix = len(suffix) > 0 + event(f"has_suffix={has_suffix}") + + line = " " * ws + syntax_char + suffix + kind, _ = _classify_line(line) + event(f"line_kind={kind.name}") + + text_val = f"line1\n{line}" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=(TextElement(value=text_val),) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource, validate=True) + + # Must contain the syntax char wrapped as placeable + escaped = f'{{ "{syntax_char}" }}' + assert escaped in result + + # Parse: no Junk entries + reparsed = self._parser.parse(result) + assert not any( + isinstance(e, Junk) + for e in reparsed.entries + ) + + @given( + syntax_char=st.sampled_from([".", "*", "["]), + ) + def test_syntax_char_only_roundtrip( + self, syntax_char: str + ) -> None: + """PROPERTY: Line with only syntax char roundtrips. 
+ + Events emitted: + - syntax_char={char}: Which syntax character + - line_kind=SYNTAX_LEADING: Classification + - has_suffix=False: No trailing text + """ + event(f"syntax_char={syntax_char}") + event("has_suffix=False") + + kind, _ = _classify_line(syntax_char) + event(f"line_kind={kind.name}") + + text_val = f"first line\n{syntax_char}" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=(TextElement(value=text_val),) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource, validate=True) + escaped = f'{{ "{syntax_char}" }}' + assert escaped in result + + reparsed = self._parser.parse(result) + assert not any( + isinstance(e, Junk) + for e in reparsed.entries + ) + + @given( + n_spaces=st.integers(min_value=1, max_value=10), + ) + def test_whitespace_only_continuation_roundtrip( + self, n_spaces: int + ) -> None: + """PROPERTY: Whitespace-only continuation lines roundtrip. + + Events emitted: + - spaces={n}: Number of spaces + - line_kind=WHITESPACE_ONLY: Classification + """ + event(f"spaces={n_spaces}") + + ws_line = " " * n_spaces + kind, _ = _classify_line(ws_line) + event(f"line_kind={kind.name}") + + text_val = f"first line\n{ws_line}\nthird line" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=(TextElement(value=text_val),) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource, validate=True) + # Whitespace-only wrapped as placeable + assert f'{{ "{ws_line}" }}' in result + + reparsed = self._parser.parse(result) + assert not any( + isinstance(e, Junk) + for e in reparsed.entries + ) diff --git a/tests/fuzz_syntax_serializer_property_cases/validation_properties_error_handling.py b/tests/fuzz_syntax_serializer_property_cases/validation_properties_error_handling.py new file mode 100644 index 00000000..27b6187c --- /dev/null +++ b/tests/fuzz_syntax_serializer_property_cases/validation_properties_error_handling.py @@ -0,0 +1,156 
@@ +# mypy: ignore-errors +"""Split test cases from tests/fuzz/test_syntax_serializer_property.py.""" + +from tests.fuzz_syntax_serializer_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================= +# Validation Properties (Error Handling) +# ============================================================================= + + +class TestValidationProperties: + """Test validation error detection for invalid ASTs.""" + + def test_select_no_defaults_raises_validation_error(self) -> None: + """COVERAGE: Lines 117-118 - SelectExpression with 0 defaults.""" + + # Build invalid SelectExpression with no defaults + invalid_select = build_invalid_select_no_defaults() + + # Wrap in a message + message = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=invalid_select),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + # Validation should catch the error + with pytest.raises(SerializationValidationError, match="no default variant"): + serialize(resource, validate=True) + + def test_select_multiple_defaults_raises_validation_error(self) -> None: + """COVERAGE: Lines 121-125 - SelectExpression with >1 defaults.""" + + # Build invalid SelectExpression with multiple defaults + invalid_select = build_invalid_select_multiple_defaults() + + # Wrap in a message + message = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=invalid_select),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + # Validation should catch the error + with pytest.raises(SerializationValidationError, match="2 default variants"): + serialize(resource, validate=True) + + @given(message=ftl_message_nodes()) + def test_valid_ast_passes_validation(self, message: Message) -> None: + """PROPERTY: Valid ASTs pass validation without error. 
+
+        Events emitted:
+        - validation=passed: Successful validation
+        """
+        resource = Resource(entries=(message,))
+
+        event("validation=passed")
+
+        # Should not raise
+        serialized = serialize(resource, validate=True)
+        assert isinstance(serialized, str)
+
+    def test_validation_can_be_disabled(self) -> None:
+        """COVERAGE: validate=False parameter skips validation."""
+
+        # Build invalid SelectExpression
+        invalid_select = build_invalid_select_no_defaults()
+        message = Message(
+            id=Identifier(name="test"),
+            value=Pattern(elements=(Placeable(expression=invalid_select),)),
+            attributes=(),
+        )
+        resource = Resource(entries=(message,))
+
+        # Should not raise when validate=False
+        serialized = serialize(resource, validate=False)
+        assert isinstance(serialized, str)
+
+    def test_invalid_identifier_raises_validation_error(self) -> None:
+        """COVERAGE: Invalid identifier validation."""
+
+        # Create message with invalid identifier (empty string)
+        # Bypass validation by using object.__new__
+        identifier = object.__new__(Identifier)
+        object.__setattr__(identifier, "name", "")  # Invalid: empty
+        object.__setattr__(identifier, "span", None)
+
+        message = Message(
+            id=identifier,
+            value=Pattern(elements=(TextElement(value="Test"),)),
+            attributes=(),
+        )
+        resource = Resource(entries=(message,))
+
+        with pytest.raises(SerializationValidationError, match="Invalid identifier"):
+            serialize(resource, validate=True)
+
+    def test_duplicate_named_arguments_raises_validation_error(self) -> None:
+        """COVERAGE: Duplicate named arguments validation."""
+
+        # Create function call with duplicate named arguments
+        func_ref = FunctionReference(
+            id=Identifier(name="NUMBER"),
+            arguments=CallArguments(
+                positional=(),
+                named=(
+                    NamedArgument(
+                        name=Identifier(name="style"),
+                        value=StringLiteral(value="currency"),
+                    ),
+                    NamedArgument(
+                        name=Identifier(name="style"),  # Duplicate!
+                        value=StringLiteral(value="percent"),
+                    ),
+                ),
+            ),
+        )
+
+        message = Message(
+            id=Identifier(name="test"),
+            value=Pattern(elements=(Placeable(expression=func_ref),)),
+            attributes=(),
+        )
+        resource = Resource(entries=(message,))
+
+        with pytest.raises(SerializationValidationError, match="Duplicate named argument"):
+            serialize(resource, validate=True)
+
+    def test_invalid_named_argument_value_type_raises_error(self) -> None:
+        """COVERAGE: Named argument value type validation."""
+
+        # Create function call with invalid named argument value (not literal)
+        func_ref = FunctionReference(
+            id=Identifier(name="NUMBER"),
+            arguments=CallArguments(
+                positional=(),
+                named=(
+                    NamedArgument(
+                        name=Identifier(name="style"),
+                        value=cast("FTLLiteral", VariableReference(id=Identifier(name="var"))),
+                    ),
+                ),
+            ),
+        )
+
+        message = Message(
+            id=Identifier(name="test"),
+            value=Pattern(elements=(Placeable(expression=func_ref),)),
+            attributes=(),
+        )
+        resource = Resource(entries=(message,))
+
+        with pytest.raises(SerializationValidationError, match="invalid value type"):
+            serialize(resource, validate=True)
diff --git a/tests/integration_e2e_cases/__init__.py b/tests/integration_e2e_cases/__init__.py
new file mode 100644
index 00000000..2f1afac5
--- /dev/null
+++ b/tests/integration_e2e_cases/__init__.py
@@ -0,0 +1,74 @@
+"""End-to-end tests for parse->format workflow integration.
+
+Tests the complete pipeline from FTL source to formatted output:
+- Parse FTL source with parse_ftl()
+- Add to FluentBundle via add_resource()
+- Format with format_pattern()
+- Verify round-trip produces expected results
+
+These tests validate that parsing and formatting work together correctly
+as an integrated system, not just as isolated components.
+
+Note: "Bidirectional" refers to the two-way workflow (parse->format), not
+bidirectional text handling or currency/number parsing from strings.
+
+Structure:
+    - TestParseFormatBasic: Essential round-trip tests (run in every CI build)
+    - TestParseFormatWithVariables: Variable interpolation round-trips
+    - TestParseFormatSelectExpressions: Select expression round-trips
+    - TestParseFormatReferences: Message/term reference round-trips
+    - TestParseFormatEdgeCases: Edge cases and unicode handling
+    - TestParseFormatWithFunctions: Built-in function integration
+    - TestParseFormatErrorHandling: Error paths in integration
+    - TestParseFormatIntrospection: Introspection API integration
+    - TestParseFormatValidation: Validation API integration
+    - TestParseFormatWithCache: Caching behavior integration
+    - TestParseFormatIsolation: Unicode isolation mark behavior
+    - TestSerializeParseRoundtrip: AST serialization round-trips
+    - TestMultiModuleIntegration: parse->validate->serialize->introspect pipeline
+    - TestValidationRuntimeConsistency: validation warnings predict runtime failures
+"""
+
+from __future__ import annotations
+
+from datetime import UTC, datetime
+from decimal import Decimal
+
+import pytest
+
+from ftllexengine import (
+    FluentBundle,
+    parse_ftl,
+    serialize_ftl,
+)
+from ftllexengine.constants import MAX_DEPTH
+from ftllexengine.diagnostics import DiagnosticCode, ErrorCategory, FrozenFluentError
+from ftllexengine.introspection import introspect_message
+from ftllexengine.runtime.cache_config import CacheConfig
+from ftllexengine.syntax.ast import Junk, Message, NumberLiteral, Term
+from ftllexengine.syntax.parser import FluentParserV1
+from ftllexengine.syntax.serializer import serialize
+from ftllexengine.validation.resource import validate_resource
+
+__all__ = [
+    "MAX_DEPTH",
+    "UTC",
+    "CacheConfig",
+    "Decimal",
+    "DiagnosticCode",
+    "ErrorCategory",
+    "FluentBundle",
+    "FluentParserV1",
+    "FrozenFluentError",
+    "Junk",
+    "Message",
+    "NumberLiteral",
+    "Term",
+    "datetime",
+    "introspect_message",
+    "parse_ftl",
+    "pytest",
+    "serialize",
+    "serialize_ftl",
+    "validate_resource",
+]
diff --git a/tests/integration_e2e_cases/essential_parse_format_tests_run_in_every_ci_build.py b/tests/integration_e2e_cases/essential_parse_format_tests_run_in_every_ci_build.py
new file mode 100644
index 00000000..2f83abe8
--- /dev/null
+++ b/tests/integration_e2e_cases/essential_parse_format_tests_run_in_every_ci_build.py
@@ -0,0 +1,694 @@
+# mypy: ignore-errors
+"""Split test cases from tests/test_integration_e2e.py."""
+
+from tests.integration_e2e_cases import *  # noqa: F403 - shared split test support
+
+# =============================================================================
+# Essential Parse->Format Tests (Run in every CI build)
+# =============================================================================
+
+
+class TestParseFormatBasic:
+    """Essential tests for parse->format round-trip."""
+
+    def test_simple_message_roundtrip(self) -> None:
+        """Simple message parses and formats correctly."""
+        ftl_source = "hello = Hello, World!"
+
+        # Verify parsing produces valid AST
+        resource = parse_ftl(ftl_source)
+        assert len(resource.entries) == 1
+        assert isinstance(resource.entries[0], Message)
+
+        # Verify formatting produces expected output
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, errors = bundle.format_pattern("hello")
+        assert result == "Hello, World!"
+        assert len(errors) == 0
+
+    def test_multiple_messages_roundtrip(self) -> None:
+        """Multiple messages parse and format correctly."""
+        ftl_source = """
+msg1 = First message
+msg2 = Second message
+msg3 = Third message
+"""
+        # Verify parsing
+        resource = parse_ftl(ftl_source)
+        messages = [e for e in resource.entries if isinstance(e, Message)]
+        assert len(messages) == 3
+
+        # Verify formatting
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result1, _ = bundle.format_pattern("msg1")
+        result2, _ = bundle.format_pattern("msg2")
+        result3, _ = bundle.format_pattern("msg3")
+
+        assert result1 == "First message"
+        assert result2 == "Second message"
+        assert result3 == "Third message"
+
+    def test_multiline_pattern_roundtrip(self) -> None:
+        """Multiline patterns parse and format correctly."""
+        ftl_source = """
+multi = First line
+    Second line
+    Third line
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, errors = bundle.format_pattern("multi")
+        assert "First line" in result
+        assert "Second line" in result
+        assert "Third line" in result
+        assert len(errors) == 0
+
+    def test_message_with_attribute_roundtrip(self) -> None:
+        """Messages with attributes parse and format correctly."""
+        ftl_source = """
+button = Click here
+    .accesskey = C
+    .title = Submit form
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        # Format main value
+        result, _ = bundle.format_pattern("button")
+        assert result == "Click here"
+
+        # Format attributes using the attribute parameter
+        accesskey, _ = bundle.format_pattern("button", attribute="accesskey")
+        title, _ = bundle.format_pattern("button", attribute="title")
+
+        assert accesskey == "C"
+        assert title == "Submit form"
+
+    def test_term_roundtrip(self) -> None:
+        """Terms parse and format correctly."""
+        ftl_source = """
+-brand = Firefox
+-version = 120.0
+about = { -brand } v{ -version }
+"""
+        resource = parse_ftl(ftl_source)
+        terms = [e for e in resource.entries if isinstance(e, Term)]
+        assert len(terms) == 2
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("about")
+        assert result == "Firefox v120.0"
+
+
+class TestParseFormatWithVariables:
+    """Tests for parse->format with variable interpolation."""
+
+    def test_single_variable_roundtrip(self) -> None:
+        """Single variable interpolation works correctly."""
+        ftl_source = "greeting = Hello, { $name }!"
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, errors = bundle.format_pattern("greeting", {"name": "Alice"})
+        assert result == "Hello, Alice!"
+        assert len(errors) == 0
+
+    def test_multiple_variables_roundtrip(self) -> None:
+        """Multiple variables interpolate correctly."""
+        ftl_source = "user = { $firstName } { $lastName } ({ $role })"
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern(
+            "user",
+            {"firstName": "John", "lastName": "Doe", "role": "Admin"},
+        )
+        assert result == "John Doe (Admin)"
+
+    def test_number_variable_roundtrip(self) -> None:
+        """Number variables format correctly."""
+        ftl_source = "count = You have { $n } items."
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("count", {"n": 42})
+        assert "42" in result
+
+    def test_decimal_variable_roundtrip(self) -> None:
+        """Decimal variables format correctly."""
+        ftl_source = "price = Total: { $amount }"
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("price", {"amount": Decimal("19.99")})
+        assert "19.99" in result
+
+    def test_missing_variable_fallback(self) -> None:
+        """Missing variables produce fallback with error."""
+        ftl_source = "greeting = Hello, { $name }!"
+
+        bundle = FluentBundle("en-US", strict=False, use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, errors = bundle.format_pattern("greeting")
+        assert "Hello" in result
+        assert len(errors) > 0  # Should report missing variable
+
+
+class TestParseFormatSelectExpressions:
+    """Tests for parse->format with select expressions."""
+
+    def test_simple_select_roundtrip(self) -> None:
+        """Simple select expression resolves correctly."""
+        ftl_source = """
+items = { $count ->
+    [one] One item
+    *[other] { $count } items
+}
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result_one, _ = bundle.format_pattern("items", {"count": 1})
+        result_many, _ = bundle.format_pattern("items", {"count": 5})
+
+        assert result_one == "One item"
+        assert "5" in result_many
+        assert "items" in result_many
+
+    def test_string_selector_roundtrip(self) -> None:
+        """String selector in select expression works correctly."""
+        ftl_source = """
+status = { $state ->
+    [active] Currently active
+    [inactive] Not active
+    *[unknown] Status unknown
+}
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        active, _ = bundle.format_pattern("status", {"state": "active"})
+        inactive, _ = bundle.format_pattern("status", {"state": "inactive"})
+        other, _ = bundle.format_pattern("status", {"state": "foo"})
+
+        assert active == "Currently active"
+        assert inactive == "Not active"
+        assert other == "Status unknown"
+
+    def test_nested_select_roundtrip(self) -> None:
+        """Nested select expressions resolve correctly."""
+        ftl_source = """
+response = { $gender ->
+    [male] { $count ->
+        [one] He has one item
+        *[other] He has { $count } items
+    }
+    *[other] { $count ->
+        [one] They have one item
+        *[other] They have { $count } items
+    }
+}
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("response", {"gender": "male", "count": 1})
+        assert "He has one item" in result
+
+    def test_number_literal_variant_roundtrip(self) -> None:
+        """Number literal variants in select expressions work correctly."""
+        ftl_source = """
+rating = { $stars ->
+    [1] Poor
+    [2] Fair
+    [3] Good
+    [4] Great
+    [5] Excellent
+    *[other] Unknown
+}
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("rating", {"stars": 5})
+        assert result == "Excellent"
+
+
+class TestParseFormatReferences:
+    """Tests for parse->format with message and term references."""
+
+    def test_message_reference_roundtrip(self) -> None:
+        """Message references resolve correctly."""
+        ftl_source = """
+base = World
+greeting = Hello, { base }!
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, errors = bundle.format_pattern("greeting")
+        assert result == "Hello, World!"
+        assert len(errors) == 0
+
+    def test_chained_reference_roundtrip(self) -> None:
+        """Chained message references resolve correctly."""
+        ftl_source = """
+level1 = Core
+level2 = { level1 } Extended
+level3 = { level2 } Final
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("level3")
+        assert result == "Core Extended Final"
+
+    def test_term_reference_roundtrip(self) -> None:
+        """Term references resolve correctly."""
+        ftl_source = """
+-brand = Firefox
+download = Download { -brand } now!
+about = About { -brand }
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        download, _ = bundle.format_pattern("download")
+        about, _ = bundle.format_pattern("about")
+
+        assert "Firefox" in download
+        assert "Firefox" in about
+
+    def test_term_attribute_reference_roundtrip(self) -> None:
+        """Term attribute references resolve correctly."""
+        ftl_source = """
+-brand = Firefox
+    .short = Fx
+full = { -brand }
+short = { -brand.short }
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        full, _ = bundle.format_pattern("full")
+        short, _ = bundle.format_pattern("short")
+
+        assert full == "Firefox"
+        assert short == "Fx"
+
+    def test_term_with_arguments_roundtrip(self) -> None:
+        """Term references with arguments resolve correctly."""
+        ftl_source = """
+-brand = { $case ->
+    [nominative] Firefox
+    [genitive] Firefoxu
+    *[other] Firefox
+}
+download = Download { -brand(case: "nominative") }
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("download")
+        assert "Firefox" in result
+
+
+class TestParseFormatEdgeCases:
+    """Tests for edge cases and unicode handling."""
+
+    def test_unicode_content_roundtrip(self) -> None:
+        """Unicode content parses and formats correctly."""
+        ftl_source = "greeting = Sveiki, pasaule!"
+
+        bundle = FluentBundle("lv-LV", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("greeting")
+        assert result == "Sveiki, pasaule!"
+
+    def test_emoji_content_roundtrip(self) -> None:
+        """Emoji content parses and formats correctly."""
+        ftl_source = "welcome = Welcome! \U0001F44B"
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("welcome")
+        assert "\U0001F44B" in result
+
+    def test_cjk_content_roundtrip(self) -> None:
+        """CJK (Japanese) content in pattern values parses and formats correctly."""
+        ftl_source = "hello = \u3053\u3093\u306b\u3061\u306f\u4e16\u754c"
+
+        bundle = FluentBundle("ja-JP", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("hello")
+        assert "\u3053\u3093\u306b\u3061\u306f" in result
+
+    def test_arabic_content_roundtrip(self) -> None:
+        """Arabic RTL script in pattern values parses and formats correctly."""
+        ftl_source = "greeting = \u0645\u0631\u062d\u0628\u0627"
+
+        bundle = FluentBundle("ar-SA", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("greeting")
+        assert "\u0645\u0631\u062d\u0628\u0627" in result
+
+    def test_hebrew_content_roundtrip(self) -> None:
+        """Hebrew RTL script in pattern values parses and formats correctly."""
+        ftl_source = "greeting = \u05e9\u05b8\u05dc\u05d5\u05b9\u05dd"
+
+        bundle = FluentBundle("he-IL", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("greeting")
+        assert "\u05e9\u05b8\u05dc\u05d5\u05b9\u05dd" in result
+
+    def test_backslash_in_text_roundtrip(self) -> None:
+        """Backslash in text (not StringLiteral) is preserved as-is per Fluent spec."""
+        ftl_source = r"path = C:\Users\file.txt"
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("path")
+        assert "\\" in result
+        assert "Users" in result
+
+    def test_literal_brace_via_string_literal(self) -> None:
+        """Literal braces via StringLiteral placeable."""
+        ftl_source = 'json = { "{" }key{ "}" }'
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("json")
+        assert "{" in result
+        assert "}" in result
+
+    def test_empty_pattern_roundtrip(self) -> None:
+        """Empty pattern value handled correctly."""
+        ftl_source = """
+msg =
+    .attr = Has attribute
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        # Main value is empty
+        result, errors = bundle.format_pattern("msg")
+
+        assert not errors
+        assert isinstance(result, str)
+
+        # Attribute should work
+        attr, _ = bundle.format_pattern("msg", attribute="attr")
+        assert attr == "Has attribute"
+
+    def test_whitespace_preservation_roundtrip(self) -> None:
+        """Significant whitespace in patterns is preserved."""
+        ftl_source = "spaced = Hello World"
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("spaced")
+        assert " " in result
+
+
+class TestParseFormatWithFunctions:
+    """Tests for parse->format with built-in functions."""
+
+    def test_number_function_roundtrip(self) -> None:
+        """NUMBER function formats correctly."""
+        ftl_source = "amount = { NUMBER($value, minimumFractionDigits: 2) }"
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("amount", {"value": Decimal("19.99")})
+        assert "19.99" in result or "19,99" in result
+
+    def test_datetime_function_roundtrip(self) -> None:
+        """DATETIME function formats correctly."""
+        ftl_source = 'date = Date: { DATETIME($when, dateStyle: "short") }'
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern(
+            "date", {"when": datetime(2024, 1, 15, tzinfo=UTC)}
+        )
+        assert "1" in result or "2024" in result
+
+    def test_custom_function_roundtrip(self) -> None:
+        """Custom functions work in parse->format workflow."""
+        ftl_source = "msg = Result: { DOUBLE($n) }"
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+
+        def double_func(n: int | Decimal) -> str:
+            return str(n * 2)
+
+        bundle.add_function("DOUBLE", double_func)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("msg", {"n": 21})
+        assert "42" in result
+
+
+class TestParseFormatErrorHandling:
+    """Tests for error handling in parse->format workflow."""
+
+    def test_missing_message_returns_fallback(self) -> None:
+        """Missing message returns fallback string with error."""
+        ftl_source = "hello = Hello!"
+
+        bundle = FluentBundle("en-US", strict=False, use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, errors = bundle.format_pattern("nonexistent")
+        assert "{nonexistent}" in result
+        assert len(errors) == 1
+        assert isinstance(errors[0], FrozenFluentError)
+        assert errors[0].category == ErrorCategory.REFERENCE
+
+    def test_missing_attribute_returns_fallback(self) -> None:
+        """Missing attribute returns fallback string with error."""
+        ftl_source = """
+button = Click
+    .title = Button title
+"""
+        bundle = FluentBundle("en-US", strict=False, use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        _, errors = bundle.format_pattern("button", attribute="nonexistent")
+        assert len(errors) == 1
+
+    def test_invalid_ftl_produces_junk(self) -> None:
+        """Invalid FTL syntax produces Junk entry."""
+        ftl_source = "invalid = { unclosed"
+
+        resource = parse_ftl(ftl_source)
+        assert any(isinstance(e, Junk) for e in resource.entries)
+
+    def test_resolution_error_propagates(self) -> None:
+        """Resolution errors are captured and returned."""
+        ftl_source = """
+msg = { missing-ref }
+"""
+        bundle = FluentBundle("en-US", strict=False, use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        _, errors = bundle.format_pattern("msg")
+        assert len(errors) > 0
+
+
+class TestParseFormatIntrospection:
+    """Tests for introspection API in parse->format workflow."""
+
+    def test_has_message_after_parse(self) -> None:
+        """has_message() works correctly after parsing."""
+        ftl_source = """
+hello = Hello
+world = World
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        assert bundle.has_message("hello") is True
+        assert bundle.has_message("world") is True
+        assert bundle.has_message("nonexistent") is False
+
+    def test_has_attribute_after_parse(self) -> None:
+        """has_attribute() works correctly after parsing."""
+        ftl_source = """
+button = Click
+    .title = Title
+    .accesskey = A
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        assert bundle.has_attribute("button", "title") is True
+        assert bundle.has_attribute("button", "accesskey") is True
+        assert bundle.has_attribute("button", "nonexistent") is False
+
+    def test_get_message_ids_after_parse(self) -> None:
+        """get_message_ids() returns all parsed message IDs."""
+        ftl_source = """
+msg1 = First
+msg2 = Second
+msg3 = Third
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        ids = bundle.get_message_ids()
+        assert "msg1" in ids
+        assert "msg2" in ids
+        assert "msg3" in ids
+        assert len(ids) == 3
+
+    def test_get_message_variables_after_parse(self) -> None:
+        """get_message_variables() extracts variables from parsed message."""
+        ftl_source = "greeting = Hello, { $name }! You have { $count } items."
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        variables = bundle.get_message_variables("greeting")
+        assert "name" in variables
+        assert "count" in variables
+        assert len(variables) == 2
+
+    def test_introspect_message_after_parse(self) -> None:
+        """introspect_message() provides detailed info after parsing."""
+        ftl_source = """
+msg = Hello, { $name }!
+select-msg = { $count ->
+    [one] One item
+    *[other] { $count } items
+}
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        # Introspect simple message
+        info = bundle.introspect_message("msg")
+        assert info.message_id == "msg"
+        assert "name" in info.get_variable_names()
+        assert info.has_selectors is False
+
+        # Introspect message with select expression
+        select_info = bundle.introspect_message("select-msg")
+        assert select_info.message_id == "select-msg"
+        assert "count" in select_info.get_variable_names()
+        assert select_info.has_selectors is True
+
+
+class TestParseFormatValidation:
+    """Tests for validation API in parse->format workflow."""
+
+    def test_validate_resource_valid_ftl(self) -> None:
+        """validate_resource() accepts valid FTL."""
+        ftl_source = """
+hello = Hello, World!
+greeting = Hello, { $name }!
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        result = bundle.validate_resource(ftl_source)
+
+        assert result.is_valid is True
+        assert len(result.errors) == 0
+
+    def test_validate_resource_invalid_ftl(self) -> None:
+        """validate_resource() rejects invalid FTL."""
+        ftl_source = "invalid = { unclosed"
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        result = bundle.validate_resource(ftl_source)
+
+        assert result.is_valid is False
+        assert len(result.errors) > 0
+
+
+class TestParseFormatWithCache:
+    """Tests for caching behavior in parse->format workflow."""
+
+    def test_cache_enabled_improves_repeated_calls(self) -> None:
+        """Cache improves performance on repeated format calls."""
+        ftl_source = "msg = Hello, { $name }!"
+
+        bundle = FluentBundle("en-US", use_isolating=False, cache=CacheConfig())
+        bundle.add_resource(ftl_source)
+
+        # First call - cache miss
+        result1, _ = bundle.format_pattern("msg", {"name": "Alice"})
+
+        # Second call with same args - cache hit
+        result2, _ = bundle.format_pattern("msg", {"name": "Alice"})
+
+        assert result1 == result2
+
+        stats = bundle.get_cache_stats()
+        assert stats is not None
+        assert stats["hits"] >= 1
+
+    def test_cache_stats_available_when_enabled(self) -> None:
+        """Cache statistics are available when caching enabled."""
+        ftl_source = "msg = Hello!"
+
+        bundle = FluentBundle("en-US", use_isolating=False, cache=CacheConfig())
+        bundle.add_resource(ftl_source)
+
+        bundle.format_pattern("msg")
+
+        stats = bundle.get_cache_stats()
+        assert stats is not None
+        assert "hits" in stats
+        assert "misses" in stats
+
+    def test_cache_stats_none_when_disabled(self) -> None:
+        """Cache statistics are None when caching disabled."""
+        ftl_source = "msg = Hello!"
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        bundle.format_pattern("msg")
+
+        stats = bundle.get_cache_stats()
+        assert stats is None
+
+    def test_clear_cache_preserves_stats(self) -> None:
+        """clear_cache() clears entries but metrics are cumulative (not reset)."""
+        ftl_source = "msg = Hello!"
+
+        bundle = FluentBundle("en-US", use_isolating=False, cache=CacheConfig())
+        bundle.add_resource(ftl_source)
+
+        bundle.format_pattern("msg")  # miss
+        bundle.format_pattern("msg")  # hit
+
+        bundle.clear_cache()
+        bundle.format_pattern("msg")  # miss (entries cleared, not metrics)
+
+        stats = bundle.get_cache_stats()
+        assert stats is not None
+        # 1 pre-clear miss + 1 post-clear miss = 2 cumulative misses
+        assert stats["misses"] == 2
diff --git a/tests/integration_e2e_cases/essential_parse_format_tests_run_in_every_ci_build_2.py b/tests/integration_e2e_cases/essential_parse_format_tests_run_in_every_ci_build_2.py
new file mode 100644
index 00000000..57671eb4
--- /dev/null
+++ b/tests/integration_e2e_cases/essential_parse_format_tests_run_in_every_ci_build_2.py
@@ -0,0 +1,57 @@
+# mypy: ignore-errors
+"""Split test cases from tests/test_integration_e2e.py."""
+
+from tests.integration_e2e_cases import *  # noqa: F403 - shared split test support
+
+
+class TestParseFormatIsolation:
+    """Tests for Unicode bidi isolation in parse->format workflow."""
+
+    def test_use_isolating_true_adds_marks(self) -> None:
+        """use_isolating=True wraps placeables in bidi isolation marks."""
+        ftl_source = "msg = Hello, { $name }!"
+
+        bundle = FluentBundle("en-US", use_isolating=True)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("msg", {"name": "World"})
+
+        # Should contain FSI (First Strong Isolate) and PDI (Pop Directional Isolate)
+        assert "\u2068" in result
+        assert "\u2069" in result
+
+    def test_use_isolating_false_no_marks(self) -> None:
+        """use_isolating=False does not add bidi isolation marks."""
+        ftl_source = "msg = Hello, { $name }!"
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        result, _ = bundle.format_pattern("msg", {"name": "World"})
+
+        # Should NOT contain isolation marks
+        assert "\u2068" not in result
+        assert "\u2069" not in result
+
+
+class TestCommentPreservation:
+    """Tests for comment handling in parse->format."""
+
+    def test_comments_dont_affect_formatting(self) -> None:
+        """Comments in FTL don't affect message formatting."""
+        ftl_source = """
+# This is a comment
+## Group comment
+### Resource comment
+hello = Hello!
+# Another comment
+world = World!
+"""
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(ftl_source)
+
+        hello, _ = bundle.format_pattern("hello")
+        world, _ = bundle.format_pattern("world")
+
+        assert hello == "Hello!"
+        assert world == "World!"
diff --git a/tests/integration_e2e_cases/intensive_round_trip_tests_fuzz_marked_run_with_pytest_m_fuzz.py b/tests/integration_e2e_cases/intensive_round_trip_tests_fuzz_marked_run_with_pytest_m_fuzz.py
new file mode 100644
index 00000000..b9f9fca4
--- /dev/null
+++ b/tests/integration_e2e_cases/intensive_round_trip_tests_fuzz_marked_run_with_pytest_m_fuzz.py
@@ -0,0 +1,96 @@
+# mypy: ignore-errors
+"""Split test cases from tests/test_integration_e2e.py."""
+
+from tests.integration_e2e_cases import *  # noqa: F403 - shared split test support
+
+# =============================================================================
+# Intensive Round-trip Tests (Fuzz-marked, run with pytest -m fuzz)
+# =============================================================================
+
+
+class TestSerializeParseRoundtrip:
+    """Example-based tests for AST serialization round-trips."""
+
+    def test_serialize_parse_simple_message(self) -> None:
+        """Serialize->parse round-trip preserves simple messages."""
+        ftl_source = "hello = Hello, World!"
+
+        resource = parse_ftl(ftl_source)
+        serialized = serialize_ftl(resource)
+        resource2 = parse_ftl(serialized)
+
+        assert len(resource.entries) == len(resource2.entries)
+
+    def test_serialize_parse_with_variables(self) -> None:
+        """Serialize->parse round-trip preserves variables."""
+        ftl_source = "greeting = Hello, { $name }!"
+
+        resource = parse_ftl(ftl_source)
+        serialized = serialize_ftl(resource)
+
+        bundle1 = FluentBundle("en-US", use_isolating=False)
+        bundle1.add_resource(ftl_source)
+
+        bundle2 = FluentBundle("en-US", use_isolating=False)
+        bundle2.add_resource(serialized)
+
+        result1, _ = bundle1.format_pattern("greeting", {"name": "Test"})
+        result2, _ = bundle2.format_pattern("greeting", {"name": "Test"})
+
+        assert result1 == result2
+
+    def test_serialize_preserves_select_expressions(self) -> None:
+        """Serialize->parse preserves select expression structure."""
+        ftl_source = """
+count = { $n ->
+    [one] One
+    *[other] Many
+}
+"""
+        resource = parse_ftl(ftl_source)
+        serialized = serialize_ftl(resource)
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(serialized)
+
+        one, _ = bundle.format_pattern("count", {"n": 1})
+        many, _ = bundle.format_pattern("count", {"n": 5})
+
+        assert "One" in one
+        assert "Many" in many
+
+    def test_serialize_preserves_term_attributes(self) -> None:
+        """Serialize->parse preserves term attributes."""
+        ftl_source = """
+-brand = Firefox
+    .short = Fx
+    .full = Mozilla Firefox
+msg = { -brand.short }
+"""
+        resource = parse_ftl(ftl_source)
+        serialized = serialize_ftl(resource)
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(serialized)
+
+        result, _ = bundle.format_pattern("msg")
+        assert "Fx" in result
+
+    def test_serialize_preserves_message_attributes(self) -> None:
+        """Serialize->parse preserves message attributes."""
+        ftl_source = """
+button = Click me
+    .accesskey = C
+    .title = Submit
+"""
+        resource = parse_ftl(ftl_source)
+        serialized = serialize_ftl(resource)
+
+        bundle = FluentBundle("en-US", use_isolating=False)
+        bundle.add_resource(serialized)
+
+        accesskey, _ = bundle.format_pattern("button", attribute="accesskey")
+        title, _ = bundle.format_pattern("button", attribute="title")
+
+        assert accesskey == "C"
+        assert title == "Submit"
diff --git a/tests/integration_e2e_cases/locale_code_validation.py b/tests/integration_e2e_cases/locale_code_validation.py
new file mode 100644
index 00000000..1aa8535d
--- /dev/null
+++ b/tests/integration_e2e_cases/locale_code_validation.py
@@ -0,0 +1,25 @@
+# mypy: ignore-errors
+"""Split test cases from tests/test_integration_e2e.py."""
+
+from tests.integration_e2e_cases import *  # noqa: F403 - shared split test support
+
+# =============================================================================
+# Locale Code Validation
+# =============================================================================
+
+
+class TestLocaleCodeValidation:
+    """FluentBundle validates locale codes against BCP 47 format."""
+
+    def test_posix_locale_with_charset_rejected(self) -> None:
+        """POSIX locale string with charset suffix is rejected with BCP 47 guidance."""
+        with pytest.raises(ValueError, match="Strip charset suffixes"):
+            FluentBundle("en_US.UTF-8")
+
+    def test_valid_bcp47_locales_accepted(self) -> None:
+        """Valid BCP 47 locale codes are accepted by FluentBundle."""
+        for locale in ("en-US", "de-DE", "zh-Hans-CN"):
+            bundle = FluentBundle(locale, use_isolating=False)
+            bundle.add_resource("hello = Hello")
+            result, _ = bundle.format_pattern("hello")
+            assert result == "Hello"
diff --git a/tests/integration_e2e_cases/multi_module_pipeline_tests.py b/tests/integration_e2e_cases/multi_module_pipeline_tests.py
new file mode 100644
index 00000000..2715c7fc
--- /dev/null
+++ b/tests/integration_e2e_cases/multi_module_pipeline_tests.py
@@ -0,0 +1,90 @@
+# mypy: ignore-errors
+"""Split test cases from tests/test_integration_e2e.py."""
+
+from tests.integration_e2e_cases import *  # noqa: F403 - shared split test support
+
+# =============================================================================
+# Multi-Module Pipeline Tests
+# =============================================================================
+
+
+class TestMultiModuleIntegration:
+    """Integration tests exercising
parse->validate->serialize->introspect pipeline.""" + + def test_parse_validate_serialize_roundtrip(self) -> None: + """Complete roundtrip: parse -> validate -> serialize -> re-parse preserves structure.""" + ftl = """ +msg = Hello { $name } + .title = Title + +-brand = Firefox + +plural = { $count -> + [one] One item + *[other] { $count } items +} +""" + parser = FluentParserV1() + resource = parser.parse(ftl) + + result = validate_resource(ftl) + assert result.is_valid + + serialized = serialize(resource) + resource2 = parser.parse(serialized) + + assert len(resource2.entries) == len(resource.entries) + + def test_introspect_complex_message(self) -> None: + """Introspect message with select expression, term reference, and function call.""" + ftl = """ +complex = { NUMBER($count) -> + [one] { -brand } has { $count } item + *[other] { -brand } has { NUMBER($count) } items +} + .hint = { $hint } +""" + parser = FluentParserV1() + resource = parser.parse(ftl) + + msg = resource.entries[0] + assert isinstance(msg, Message) + + info = introspect_message(msg) + + var_names = {v.name for v in info.variables} + func_names = {f.name for f in info.functions} + assert "count" in var_names + assert "hint" in var_names + assert info.has_selectors + assert "NUMBER" in func_names + + +class TestValidationRuntimeConsistency: + """Validation warnings predict runtime resolution failures.""" + + def test_chain_depth_warning_matches_runtime_error(self) -> None: + """VALIDATION_CHAIN_DEPTH_EXCEEDED warning implies MAX_DEPTH_EXCEEDED at runtime.""" + chain_length = MAX_DEPTH + 5 + messages = ["msg-0 = Base"] + for i in range(1, chain_length): + messages.append(f"msg-{i} = {{ msg-{i - 1} }}") + + ftl_source = "\n".join(messages) + + result = validate_resource(ftl_source) + has_chain_warning = any( + w.code == DiagnosticCode.VALIDATION_CHAIN_DEPTH_EXCEEDED + for w in result.warnings + ) + assert has_chain_warning + + bundle = FluentBundle("en", strict=False) + 
bundle.add_resource(ftl_source) + _, errors = bundle.format_pattern(f"msg-{chain_length - 1}") + has_depth_error = any( + e.diagnostic is not None + and e.diagnostic.code.name == "MAX_DEPTH_EXCEEDED" + for e in errors + ) + assert has_depth_error diff --git a/tests/integration_e2e_cases/number_literal_invariant_and_roundtrip.py b/tests/integration_e2e_cases/number_literal_invariant_and_roundtrip.py new file mode 100644 index 00000000..f58a7fe1 --- /dev/null +++ b/tests/integration_e2e_cases/number_literal_invariant_and_roundtrip.py @@ -0,0 +1,66 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_integration_e2e.py.""" + +from tests.integration_e2e_cases import * # noqa: F403 - shared split test support + +# ============================================================================= +# NumberLiteral Invariant and Roundtrip +# ============================================================================= + + +class TestNumberLiteralInvariant: + """NumberLiteral enforces raw/value consistency and rejects bool.""" + + def test_bool_value_rejected(self) -> None: + """NumberLiteral rejects bool for value (bool is int subclass, not a number literal).""" + with pytest.raises(TypeError, match="must be int or Decimal, not bool"): + NumberLiteral(value=True, raw="1") + + def test_raw_value_inconsistency_rejected(self) -> None: + """NumberLiteral rejects raw that parses to a different value than the value field.""" + with pytest.raises(ValueError, match=r"parses to.*but value is"): + NumberLiteral(value=Decimal("1.5"), raw="9.9") + + def test_integer_variant_key_exact_match_roundtrip(self) -> None: + """Integer number variant keys select the correct variant.""" + ftl = """ +rating = { $stars -> + [1] Poor + [3] Good + [5] Excellent + *[other] Unknown +} +""" + bundle = FluentBundle("en-US", use_isolating=False) + bundle.add_resource(ftl) + + poor, err1 = bundle.format_pattern("rating", {"stars": 1}) + excellent, err2 = bundle.format_pattern("rating", {"stars": 5}) 
+ fallback, err3 = bundle.format_pattern("rating", {"stars": 99}) + + assert not err1 + assert not err2 + assert not err3 + assert poor == "Poor" + assert excellent == "Excellent" + assert fallback == "Unknown" + + def test_decimal_variant_key_roundtrip(self) -> None: + """Decimal number variant keys in serialized FTL survive parse->format roundtrip.""" + ftl = """ +precision = { $level -> + [0.5] Half + [1.0] Full + *[other] Custom +} +""" + resource = parse_ftl(ftl) + serialized = serialize_ftl(resource) + resource2 = parse_ftl(serialized) + + bundle = FluentBundle("en-US", use_isolating=False) + bundle.add_resource(serialize_ftl(resource2)) + + # Default variant (string selector won't match numeric keys) + result, _ = bundle.format_pattern("precision", {"level": "other"}) + assert result == "Custom" diff --git a/tests/introspection_iso_cases/__init__.py b/tests/introspection_iso_cases/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/introspection_iso_cases/cache_and_babel.py b/tests/introspection_iso_cases/cache_and_babel.py new file mode 100644 index 00000000..0757dec2 --- /dev/null +++ b/tests/introspection_iso_cases/cache_and_babel.py @@ -0,0 +1,610 @@ +# mypy: ignore-errors +# mypy: ignore-errors +from __future__ import annotations + +import sys +from unittest.mock import patch + +import pytest + +import ftllexengine.core.babel_compat as _bc +from ftllexengine.introspection import ( + BabelImportError, + CurrencyCode, + CurrencyInfo, + TerritoryCode, + TerritoryInfo, + clear_iso_cache, + get_currency, + get_territory, + get_territory_currencies, + is_valid_currency_code, + is_valid_territory_code, + list_currencies, + list_territories, +) + +# Private member access permitted for integration tests +from ftllexengine.introspection.iso import ( + _get_babel_currencies, + _get_babel_currency_name, + _get_babel_currency_symbol, + _get_babel_official_languages, + _get_babel_territory_currencies, +) + + +class TestCaching: + """Tests for 
cache behavior.""" + + def setup_method(self) -> None: + """Clear cache before each test.""" + clear_iso_cache() + + def test_results_are_cached(self) -> None: + """Repeated calls return same cached objects.""" + result1 = get_territory("US") + result2 = get_territory("US") + + # Same object should be returned (cached) + assert result1 is result2 + + def test_clear_cache_works(self) -> None: + """clear_iso_cache clears all caches.""" + # Populate cache + result1 = get_territory("US") + result1_currency = get_currency("USD") + + # Clear cache + clear_iso_cache() + + # New objects should be returned + result2 = get_territory("US") + result2_currency = get_currency("USD") + + # Values should be equal + assert result1 == result2 + assert result1_currency == result2_currency + + def test_different_locales_cached_separately(self) -> None: + """Different locales have separate cache entries.""" + result_en = get_territory("DE", locale="en") + result_de = get_territory("DE", locale="de") + + # Different objects (different locales) + assert result_en != result_de + + # Repeat calls return cached objects + assert get_territory("DE", locale="en") is result_en + assert get_territory("DE", locale="de") is result_de + +class TestTypeAliases: + """Tests for TerritoryCode and CurrencyCode NewType wrappers.""" + + def test_territory_code_is_str_at_runtime(self) -> None: + """TerritoryCode is a NewType of str; transparent (identity) at runtime.""" + code = TerritoryCode("US") + assert isinstance(code, str) + assert code == "US" + + def test_currency_code_is_str_at_runtime(self) -> None: + """CurrencyCode is a NewType of str; transparent (identity) at runtime.""" + code = CurrencyCode("USD") + assert isinstance(code, str) + assert code == "USD" + + def test_territory_code_newtype_constructor_is_identity(self) -> None: + """TerritoryCode(...) 
returns the string value unchanged at runtime.""" + raw = "LV" + assert TerritoryCode(raw) == raw + + def test_currency_code_newtype_constructor_is_identity(self) -> None: + """CurrencyCode(...) returns the string value unchanged at runtime.""" + raw = "EUR" + assert CurrencyCode(raw) == raw + +class TestBabelImportError: + """Tests for BabelImportError exception.""" + + def test_exception_is_import_error_subclass(self) -> None: + """BabelImportError is a subclass of ImportError.""" + assert issubclass(BabelImportError, ImportError) + + def test_exception_message(self) -> None: + """BabelImportError has informative installation message.""" + exc = BabelImportError("ISO introspection") + message = str(exc) + assert "Babel" in message + assert "pip install ftllexengine[babel]" in message + assert "ISO introspection" in message + + def test_exception_can_be_raised_and_caught(self) -> None: + """BabelImportError can be raised and caught.""" + feature = "test feature" + with pytest.raises(BabelImportError) as exc_info: + raise BabelImportError(feature) + assert "Babel" in str(exc_info.value) + +class TestEdgeCases: + """Tests for edge cases and error handling.""" + + def setup_method(self) -> None: + """Clear cache before each test.""" + clear_iso_cache() + + def test_empty_string_territory(self) -> None: + """get_territory handles empty string gracefully.""" + result = get_territory("") + assert result is None + + def test_empty_string_currency(self) -> None: + """get_currency handles empty string gracefully.""" + result = get_currency("") + assert result is None + + def test_numeric_string_territory(self) -> None: + """get_territory handles numeric strings.""" + result = get_territory("12") + assert result is None + + def test_numeric_string_currency(self) -> None: + """get_currency handles numeric strings.""" + result = get_currency("123") + assert result is None + + def test_whitespace_territory(self) -> None: + """get_territory handles whitespace strings.""" + 
result = get_territory(" ") + assert result is None + + def test_whitespace_currency(self) -> None: + """get_currency handles whitespace strings.""" + result = get_currency(" ") + assert result is None + + def test_special_iso_codes(self) -> None: + """Test special ISO 4217 codes.""" + # XXX is "No currency" - a valid ISO 4217 code + xxx = get_currency("XXX") + assert xxx is not None + + # XAU is gold - a valid ISO 4217 code + xau = get_currency("XAU") + assert xau is not None + + def test_invalid_locale_territory(self) -> None: + """get_territory returns None for invalid locales.""" + result = get_territory("US", locale="invalid_LOCALE_123") + assert result is None + + def test_invalid_locale_currency(self) -> None: + """get_currency returns None for invalid locales.""" + result = get_currency("USD", locale="invalid_LOCALE_123") + assert result is None + + def test_malformed_locale_list_territories(self) -> None: + """list_territories returns empty frozenset for malformed locales.""" + result = list_territories(locale="xxx_YYY") + assert isinstance(result, frozenset) + assert len(result) == 0 + + def test_malformed_locale_list_currencies(self) -> None: + """list_currencies returns frozenset for malformed locales.""" + result = list_currencies(locale="xxx_YYY") + assert isinstance(result, frozenset) + + def test_currency_symbol_fallback(self) -> None: + """get_currency returns code as symbol fallback for unknown/problematic currencies.""" + # Test with a real currency but in a locale that might not have symbol data + result = get_currency("USD", locale="en") + assert result is not None + # Symbol should either be locale-specific or fall back to code + assert result.symbol in ("$", "US$", "USD") + + def test_territory_without_currency(self) -> None: + """Territories without currency data have empty currencies tuple.""" + # Antarctica (AQ) typically has no official currency + result = get_territory("AQ") + if result is not None: + # May have no currencies (empty 
tuple) + assert isinstance(result.currencies, tuple) + # May be empty or contain some currencies depending on CLDR data + assert all(isinstance(c, str) for c in result.currencies) + + def test_type_guard_non_string_territory(self) -> None: + """is_valid_territory_code returns False for non-string inputs.""" + assert is_valid_territory_code(None) is False # type: ignore[arg-type] + assert is_valid_territory_code(123) is False # type: ignore[arg-type] + assert is_valid_territory_code([]) is False # type: ignore[arg-type] + assert is_valid_territory_code({}) is False # type: ignore[arg-type] + + def test_type_guard_non_string_currency(self) -> None: + """is_valid_currency_code returns False for non-string inputs.""" + assert is_valid_currency_code(None) is False # type: ignore[arg-type] + assert is_valid_currency_code(123) is False # type: ignore[arg-type] + assert is_valid_currency_code([]) is False # type: ignore[arg-type] + assert is_valid_currency_code({}) is False # type: ignore[arg-type] + +class TestBabelExceptionHandling: + """Tests for Babel exception handling paths.""" + + def setup_method(self) -> None: + """Clear cache before each test.""" + clear_iso_cache() + + def test_currency_name_none_for_truly_invalid_code(self) -> None: + """get_currency returns None for codes not in CLDR.""" + # Use a code that's definitely not in CLDR + result = get_currency("ZZZ") + assert result is None + + # Another invalid code + result2 = get_currency("QQQ") + assert result2 is None + + def test_currency_symbol_with_unusual_locale(self) -> None: + """get_currency handles unusual locales gracefully.""" + # Test with rare locale that might not have full currency symbol data + result = get_currency("USD", locale="zu") # Zulu + if result is not None: + # Symbol should be present (may be fallback) + assert len(result.symbol) > 0 + + def test_territory_currencies_for_non_sovereign_territories(self) -> None: + """get_territory_currencies handles territories without unique 
currencies.""" + # Vatican City might have unusual currency data + result = get_territory_currencies("VA") + # May return EUR or empty tuple + assert isinstance(result, tuple) + assert all(isinstance(c, str) for c in result) + + # Antarctica has no official currency + result_aq = get_territory_currencies("AQ") + assert result_aq == () + + def test_get_currency_with_very_rare_locale(self) -> None: + """get_currency handles a locale with minimal CLDR data.""" + # Sichuan Yi (ii) is a valid but rare locale with limited data + result = get_currency("USD", locale="ii") + assert result is None or isinstance(result, CurrencyInfo) + + def test_get_territory_with_deprecated_locale_format(self) -> None: + """get_territory handles POSIX locale format variant.""" + result = get_territory("US", locale="en_US_POSIX") + assert result is None or isinstance(result, TerritoryInfo) + + def test_babel_import_error_propagation(self) -> None: + """BabelImportError is raised when Babel is not available.""" + # Temporarily hide babel modules to trigger ImportError + babel_modules = {k: v for k, v in sys.modules.items() if k.startswith("babel")} + saved_available = _bc._babel_available + try: + # Remove babel from sys.modules + for key in list(babel_modules.keys()): + sys.modules.pop(key, None) + + # Clear caches to force re-import + clear_iso_cache() + + # Prevent import by blocking it + sys.modules["babel"] = None # type: ignore[assignment] + + # Reset the availability sentinel so require_babel() re-evaluates against + # the patched sys.modules. Without this, a cached True value causes + # require_babel() to pass even though Babel is no longer importable, + # leading to a raw ModuleNotFoundError instead of BabelImportError. 
+ _bc._babel_available = None + + # Now try to use the functions - they should raise BabelImportError + # PLC0415: Runtime import needed to test ImportError path + from ftllexengine.introspection import iso + + with pytest.raises(BabelImportError): + iso.get_territory("US") + + finally: + # Restore babel modules and availability sentinel + for key, value in babel_modules.items(): + sys.modules[key] = value + _bc._babel_available = saved_available + # Clear cache again to restore normal operation + clear_iso_cache() + +class TestPrivateBabelWrappers: + """Tests for private Babel wrapper functions. + + Tests exception handling paths in internal functions. + Private member access permitted. + """ + + def test_get_babel_currency_name_with_invalid_code(self) -> None: + """_get_babel_currency_name returns None for invalid codes.""" + result = _get_babel_currency_name("ZZZ", "en") + assert result is None + + result2 = _get_babel_currency_name("QQQ", "en") + assert result2 is None + + def test_get_babel_currency_name_with_problematic_locale(self) -> None: + """_get_babel_currency_name returns None for malformed locales.""" + result = _get_babel_currency_name("USD", "invalid_LOCALE_123") + assert result is None + + def test_get_babel_currency_symbol_with_unknown_code(self) -> None: + """_get_babel_currency_symbol returns code as fallback for unknown codes.""" + # Test with an invalid code - should return the code itself as fallback + result = _get_babel_currency_symbol("ZZZ", "en") + # Should either work or fall back to the code + assert result == "ZZZ" or len(result) > 0 + + def test_get_babel_currency_symbol_with_problematic_locale(self) -> None: + """_get_babel_currency_symbol falls back to currency code for malformed locales.""" + result = _get_babel_currency_symbol("USD", "xxx_YYY_ZZZ") + assert result == "USD" # Falls back to code + + def test_get_babel_territory_currencies_with_invalid_territory(self) -> None: + """_get_babel_territory_currencies returns empty list 
for invalid territories.""" + result = _get_babel_territory_currencies("XX") + # Should return empty list for unknown territories + assert isinstance(result, list) + assert len(result) == 0 + + def test_get_babel_territory_currencies_with_antarctica(self) -> None: + """_get_babel_territory_currencies handles territories without currencies.""" + result = _get_babel_territory_currencies("AQ") # Antarctica + # Should return empty list (no official currency) + assert isinstance(result, list) + + def test_get_babel_currency_symbol_fallback_path(self) -> None: + """_get_babel_currency_symbol uses fallback when Babel raises exception.""" + # Use a code/locale combination that might trigger Babel errors + # XTS is a test currency code - might not have symbols in all locales + result = _get_babel_currency_symbol("XTS", "en") + # Should return either a valid symbol or the code as fallback + assert isinstance(result, str) + assert len(result) > 0 + + def test_get_babel_currency_name_import_error(self) -> None: + """_get_babel_currency_name raises BabelImportError when Babel unavailable.""" + _bc._babel_available = False + try: + with pytest.raises(BabelImportError): + _get_babel_currency_name("USD", "en") + finally: + _bc._babel_available = None + + def test_get_babel_currency_symbol_import_error(self) -> None: + """_get_babel_currency_symbol raises BabelImportError when Babel unavailable.""" + # Set sentinel to False to simulate Babel being unavailable. + # Direct sentinel manipulation avoids the recursive __import__ mock pattern. + _bc._babel_available = False + try: + with pytest.raises(BabelImportError): + _get_babel_currency_symbol("USD", "en") + finally: + # Reset so subsequent tests reinitialize with Babel available + _bc._babel_available = None + + def test_get_babel_territory_currencies_import_error(self) -> None: + """_get_babel_territory_currencies raises BabelImportError when Babel unavailable.""" + # Set sentinel to False to simulate Babel being unavailable. 
+ # Direct sentinel manipulation avoids the recursive __import__ mock pattern. + _bc._babel_available = False + try: + with pytest.raises(BabelImportError): + _get_babel_territory_currencies("US") + finally: + # Reset so subsequent tests reinitialize with Babel available + _bc._babel_available = None + + def test_get_babel_territory_currencies_exception_handling(self) -> None: + """_get_babel_territory_currencies returns empty list on Babel API errors. + + The production code calls babel.numbers.get_territory_currencies() directly. + Patching that function to raise ValueError exercises the defensive except clause. + """ + with patch( + "babel.numbers.get_territory_currencies", + side_effect=ValueError("simulated Babel data error"), + ): + result = _get_babel_territory_currencies("US") + assert result == [] + + def test_get_babel_official_languages_exception_handling(self) -> None: + """_get_babel_official_languages returns empty tuple on Babel API errors. + + The production code calls babel.languages.get_official_languages() directly. + Patching that function to raise ValueError exercises the defensive except clause. 
+ """ + with patch( + "babel.languages.get_official_languages", + side_effect=ValueError("simulated Babel data error"), + ): + result = _get_babel_official_languages("GB") + assert result == () + + def test_get_babel_official_languages_lookup_error(self) -> None: + """_get_babel_official_languages returns empty tuple on LookupError.""" + with patch( + "babel.languages.get_official_languages", + side_effect=LookupError("unknown territory"), + ): + result = _get_babel_official_languages("XX") + assert result == () + + def test_list_currencies_filters_invalid_codes(self) -> None: + """list_currencies filters out invalid currency codes from Babel data.""" + # This tests the branch where codes don't match ISO 4217 format + # Clear cache to ensure fresh call + clear_iso_cache() + + # Mock _get_babel_currencies to return invalid codes + original_get_babel_currencies = _get_babel_currencies + + def mock_get_babel_currencies() -> dict[str, str]: + real_currencies = original_get_babel_currencies() + # Add invalid codes to trigger the filter branch + return { + **real_currencies, + "US": "Invalid two-letter code", # Only 2 letters + "USDD": "Invalid four-letter code", # 4 letters + "usd": "Invalid lowercase code", # Lowercase + "12D": "Invalid numeric code", # Contains numbers + "": "Empty code", # Empty + } + + with patch( + "ftllexengine.introspection.iso_lookup._get_babel_currencies", + side_effect=mock_get_babel_currencies, + ): + result = list_currencies() + # Should still return valid currencies, filtering out invalid ones + assert isinstance(result, frozenset) + codes = {c.code for c in result} + # Invalid codes should not be in result + assert "US" not in codes # Two-letter code + assert "USDD" not in codes # Four-letter code + # Valid codes should be present + assert "USD" in codes + +class TestLocaleNormalization: + """Tests for locale input normalization.""" + + def setup_method(self) -> None: + """Clear cache before each test.""" + clear_iso_cache() + + def 
test_locale_format_variants_return_same_cached_object(self) -> None: + """Different locale formats should hit the same cache entry.""" + # Clear cache to start fresh + clear_iso_cache() + + # Call with BCP-47 format + result_bcp47 = get_territory("US", locale="en-US") + + # Call with POSIX format (should hit same cache) + result_posix = get_territory("US", locale="en_US") + + # Call with lowercase + result_lower = get_territory("US", locale="en_us") + + # All should return the same cached object + assert result_bcp47 is result_posix + assert result_posix is result_lower + + def test_locale_normalization_for_get_currency(self) -> None: + """get_currency normalizes locale formats to single cache entry.""" + clear_iso_cache() + + result1 = get_currency("EUR", locale="de-DE") + result2 = get_currency("EUR", locale="de_DE") + result3 = get_currency("EUR", locale="de_de") + + # Same cached object for all variants + assert result1 is result2 + assert result2 is result3 + + def test_locale_normalization_for_list_territories(self) -> None: + """list_territories normalizes locale formats to single cache entry.""" + clear_iso_cache() + + result1 = list_territories(locale="fr-FR") + result2 = list_territories(locale="fr_FR") + result3 = list_territories(locale="fr_fr") + + # Same cached object for all variants + assert result1 is result2 + assert result2 is result3 + + def test_locale_normalization_for_list_currencies(self) -> None: + """list_currencies normalizes locale formats to single cache entry.""" + clear_iso_cache() + + result1 = list_currencies(locale="ja-JP") + result2 = list_currencies(locale="ja_JP") + result3 = list_currencies(locale="ja_jp") + + # Same cached object for all variants + assert result1 is result2 + assert result2 is result3 + + def test_code_case_normalization(self) -> None: + """Territory and currency codes are case-normalized.""" + clear_iso_cache() + + # Territory code case variants should hit same cache + t_upper = get_territory("US") + t_lower 
= get_territory("us") + t_mixed = get_territory("Us") + + assert t_upper is t_lower + assert t_lower is t_mixed + + # Currency code case variants should hit same cache + c_upper = get_currency("USD") + c_lower = get_currency("usd") + c_mixed = get_currency("Usd") + + assert c_upper is c_lower + assert c_lower is c_mixed + +class TestBoundedCache: + """Tests for bounded LRU cache.""" + + def setup_method(self) -> None: + """Clear cache before each test.""" + clear_iso_cache() + + def test_cache_uses_lru_with_maxsize(self) -> None: + """Cache implementation should use bounded LRU cache.""" + # Import the internal cached functions to check their cache_info + + from ftllexengine.introspection.iso import ( + _get_currency_impl, + _get_territory_currencies_impl, + _get_territory_impl, + _list_currencies_impl, + _list_territories_impl, + ) + + # All internal cached functions should have cache_info method (lru_cache feature) + assert hasattr(_get_territory_impl, "cache_info") + assert hasattr(_get_currency_impl, "cache_info") + assert hasattr(_list_territories_impl, "cache_info") + assert hasattr(_list_currencies_impl, "cache_info") + assert hasattr(_get_territory_currencies_impl, "cache_info") + + # Check maxsize is set (bounded cache, not unbounded) + # pylint: disable=no-value-for-parameter + # Note: cache_info() is a method added by @lru_cache decorator, not + # related to the function's parameters. Pylint doesn't understand this. + info = _get_territory_impl.cache_info() + assert info.maxsize is not None + assert info.maxsize > 0 # Should be MAX_LOCALE_CACHE_SIZE (128) + + def test_cache_statistics_work(self) -> None: + """Cache statistics (hits, misses) should be tracked.""" + from ftllexengine.introspection.iso import ( + _get_territory_impl, + ) + + clear_iso_cache() + + # pylint: disable=no-value-for-parameter + # Note: cache_info() is a method added by @lru_cache decorator, not + # related to the function's parameters. Pylint doesn't understand this. 
+ + # Get initial stats + initial_info = _get_territory_impl.cache_info() + initial_hits = initial_info.hits + initial_misses = initial_info.misses + + # First call should be a miss + get_territory("US") + info_after_first = _get_territory_impl.cache_info() + assert info_after_first.misses == initial_misses + 1 + + # Second call should be a hit + get_territory("US") + info_after_second = _get_territory_impl.cache_info() + assert info_after_second.hits == initial_hits + 1 diff --git a/tests/introspection_iso_cases/defensive_branches.py b/tests/introspection_iso_cases/defensive_branches.py new file mode 100644 index 00000000..79a68b4b --- /dev/null +++ b/tests/introspection_iso_cases/defensive_branches.py @@ -0,0 +1,522 @@ +# mypy: ignore-errors +# ruff: noqa: ARG001 +# mypy: ignore-errors +from __future__ import annotations + +import builtins +from unittest.mock import MagicMock, patch + +import pytest + +from ftllexengine.introspection import ( + BabelImportError, +) + +# Private member access permitted for integration tests +from ftllexengine.introspection.iso import ( + _get_babel_currency_name, + _get_babel_currency_symbol, + _get_babel_territories, +) +from ftllexengine.introspection.iso_babel import _is_unknown_locale_error + + +class _UnexpectedTestError(Exception): + """Custom exception for testing defensive error handling. + + Defined at module level to avoid scoping issues with pytest.raises. + Used to verify that non-UnknownLocaleError exceptions propagate correctly. + """ + + def __str__(self) -> str: + return "Something went wrong - internal processing error" + +class _LocaleWordTestError(Exception): + """Exception whose message contains 'locale' but is NOT UnknownLocaleError. + + Tests type-based exception matching: this must propagate even though the + message contains the word 'locale'. The old substring-based matching would + have incorrectly suppressed this. 
+ """ + + def __str__(self) -> str: + return "Failed to process locale configuration data" + +class TestDefensiveExceptionPropagation: + """Tests for defensive exception re-raising in Babel wrappers. + + iso.py catches babel.core.UnknownLocaleError by type (isinstance check) + and re-raises all other exceptions. These tests verify that logic bugs + and unexpected exceptions propagate, including those whose messages + contain 'locale' or 'unknown' but are not UnknownLocaleError. + """ + + def test_currency_name_reraises_unexpected_exception(self) -> None: + """_get_babel_currency_name re-raises non-locale exceptions. + + Tests line 196: raise statement in defensive exception handler. + """ + # This test verifies that unexpected exceptions (anything that is not + # babel.core.UnknownLocaleError) are propagated rather than suppressed. + + call_count = [0] # Use list to allow modification in nested function + error_msg = "Internal error" + + def mock_locale_parse(locale_str: str) -> object: + """Mock Locale.parse to raise unexpected exception.""" + call_count[0] += 1 + raise _UnexpectedTestError(error_msg) + + # Patch Babel's Locale.parse to inject our test exception + with patch("babel.Locale.parse", side_effect=mock_locale_parse): + # The exception should propagate (not be suppressed) + exception_raised = False + result = None + try: + result = _get_babel_currency_name("USD", "en") + except _UnexpectedTestError: + exception_raised = True + except Exception as e: + pytest.fail(f"Unexpected exception type: {type(e).__name__}: {e}") + + if not exception_raised: + pytest.fail( + f"Expected _UnexpectedTestError to be raised. " + f"Mock called {call_count[0]} times. Result: {result}" + ) + + def test_currency_symbol_reraises_unexpected_exception(self) -> None: + """_get_babel_currency_symbol re-raises non-locale exceptions. + + Tests line 217: raise statement in defensive exception handler.
+ """ + error_msg = "Internal error" + + def mock_get_currency_symbol(code: str, locale: str | object = None) -> str: + """Mock that raises unexpected exception.""" + raise _UnexpectedTestError(error_msg) + + # Patch get_currency_symbol to trigger the exception path + with patch("babel.numbers.get_currency_symbol", side_effect=mock_get_currency_symbol): + # The exception should propagate (not be suppressed) + exception_raised = False + try: + _get_babel_currency_symbol("USD", "en") + except _UnexpectedTestError: + exception_raised = True + + assert exception_raised, "Expected _UnexpectedTestError to be raised" + + def test_territories_reraises_non_unknown_locale_error_with_locale_word( + self, + ) -> None: + """Non-UnknownLocaleError with 'locale' in message propagates. + + Verifies type-based matching: exceptions whose message contains + 'locale' propagate if not babel.core.UnknownLocaleError. + """ + from ftllexengine.introspection.iso import ( + _get_babel_territories, + ) + + def mock_locale_parse(locale_str: str) -> object: + raise _LocaleWordTestError + + with ( + patch("babel.Locale.parse", side_effect=mock_locale_parse), + pytest.raises(_LocaleWordTestError), + ): + _get_babel_territories("en") + + def test_currency_name_reraises_non_unknown_locale_error_with_locale_word( + self, + ) -> None: + """Non-UnknownLocaleError with 'locale' in message propagates. + + Verifies type-based matching replaces fragile substring matching. + """ + def mock_locale_parse(locale_str: str) -> object: + raise _LocaleWordTestError + + with ( + patch("babel.Locale.parse", side_effect=mock_locale_parse), + pytest.raises(_LocaleWordTestError), + ): + _get_babel_currency_name("USD", "en") + + def test_currency_symbol_reraises_non_unknown_locale_error_with_locale_word( + self, + ) -> None: + """Non-UnknownLocaleError with 'locale' in message propagates. + + Verifies type-based matching replaces fragile substring matching. 
+ """ + def mock_symbol( + code: str, + locale: str | object = None, + ) -> str: + raise _LocaleWordTestError + + with ( + patch( + "babel.numbers.get_currency_symbol", + side_effect=mock_symbol, + ), + pytest.raises(_LocaleWordTestError), + ): + _get_babel_currency_symbol("USD", "en") + +class TestUnknownLocaleErrorImportFailure: + """Tests for UnknownLocaleError import failure paths. + + These tests cover the edge case where: + 1. Babel raises a non-standard exception (not in the caught set) + 2. Attempting to import UnknownLocaleError fails with ImportError + 3. The original exception should be re-raised + """ + + def test_currency_name_reraises_when_import_fails(self) -> None: + """_get_babel_currency_name re-raises when UnknownLocaleError import fails.""" + + class CustomBabelError(Exception): + """Custom exception to simulate unexpected Babel error.""" + + custom_exc = CustomBabelError("Unexpected Babel error") + mock_get_currency_name = MagicMock(side_effect=custom_exc) + original_import = builtins.__import__ + + def mock_import( + name: str, + globals_arg: dict[str, object] | None = None, + locals_arg: dict[str, object] | None = None, + fromlist: tuple[str, ...] 
= (), + level: int = 0, + ) -> object: + if name in ("babel", "babel.numbers"): + return original_import( + name, globals_arg, locals_arg, fromlist, level + ) + if name == "babel.core" and "UnknownLocaleError" in fromlist: + msg = "Cannot import UnknownLocaleError" + raise ImportError(msg) + return original_import( + name, globals_arg, locals_arg, fromlist, level + ) + + with ( + patch( + "babel.numbers.get_currency_name", + mock_get_currency_name, + ), + patch("builtins.__import__", side_effect=mock_import), + pytest.raises(CustomBabelError) as exc_info, + ): + _get_babel_currency_name("USD", "en") + + assert exc_info.value is custom_exc + + def test_currency_symbol_reraises_when_import_fails(self) -> None: + """_get_babel_currency_symbol re-raises when UnknownLocaleError import fails.""" + + class CustomBabelError(Exception): + """Custom exception to simulate unexpected Babel error.""" + + custom_exc = CustomBabelError("Unexpected symbol error") + mock_get_currency_symbol = MagicMock(side_effect=custom_exc) + original_import = builtins.__import__ + + def mock_import( + name: str, + globals_arg: dict[str, object] | None = None, + locals_arg: dict[str, object] | None = None, + fromlist: tuple[str, ...] 
= (), + level: int = 0, + ) -> object: + if name == "babel.numbers": + return original_import( + name, globals_arg, locals_arg, fromlist, level + ) + if name == "babel.core" and "UnknownLocaleError" in fromlist: + msg = "Cannot import UnknownLocaleError" + raise ImportError(msg) + return original_import( + name, globals_arg, locals_arg, fromlist, level + ) + + with ( + patch( + "babel.numbers.get_currency_symbol", + mock_get_currency_symbol, + ), + patch("builtins.__import__", side_effect=mock_import), + pytest.raises(CustomBabelError) as exc_info, + ): + _get_babel_currency_symbol("USD", "en") + + assert exc_info.value is custom_exc + + def test_currency_name_chained_exception_propagation(self) -> None: + """Exception propagation when UnknownLocaleError import fails.""" + + class UnexpectedError(Exception): + """Simulates an unexpected Babel exception.""" + + original_exc = UnexpectedError("Original error") + mock_get_currency_name = MagicMock(side_effect=original_exc) + original_import = builtins.__import__ + + def mock_import( + name: str, + globals_arg: dict[str, object] | None = None, + locals_arg: dict[str, object] | None = None, + fromlist: tuple[str, ...] 
= (), + level: int = 0, + ) -> object: + if name in ("babel", "babel.numbers"): + return original_import( + name, globals_arg, locals_arg, fromlist, level + ) + if name == "babel.core" and "UnknownLocaleError" in fromlist: + msg = "UnknownLocaleError unavailable" + raise ImportError(msg) + return original_import( + name, globals_arg, locals_arg, fromlist, level + ) + + with ( + patch( + "babel.numbers.get_currency_name", + mock_get_currency_name, + ), + patch("builtins.__import__", side_effect=mock_import), + pytest.raises(UnexpectedError) as exc_info, + ): + _get_babel_currency_name("USD", "en") + + assert exc_info.value is original_exc + + def test_currency_symbol_chained_exception_propagation(self) -> None: + """Exception propagation when UnknownLocaleError import fails.""" + + class UnexpectedError(Exception): + """Simulates an unexpected Babel exception.""" + + original_exc = UnexpectedError("Original symbol error") + mock_get_currency_symbol = MagicMock(side_effect=original_exc) + original_import = builtins.__import__ + + def mock_import( + name: str, + globals_arg: dict[str, object] | None = None, + locals_arg: dict[str, object] | None = None, + fromlist: tuple[str, ...] 
= (), + level: int = 0, + ) -> object: + if name == "babel.numbers": + return original_import( + name, globals_arg, locals_arg, fromlist, level + ) + if name == "babel.core" and "UnknownLocaleError" in fromlist: + msg = "UnknownLocaleError unavailable" + raise ImportError(msg) + return original_import( + name, globals_arg, locals_arg, fromlist, level + ) + + with ( + patch( + "babel.numbers.get_currency_symbol", + mock_get_currency_symbol, + ), + patch("builtins.__import__", side_effect=mock_import), + pytest.raises(UnexpectedError) as exc_info, + ): + _get_babel_currency_symbol("USD", "en") + + assert exc_info.value is original_exc + +class TestIsoBabelDefensiveBranches: + """Direct coverage for defensive helper branches in iso_babel.py.""" + + def test_is_unknown_locale_error_returns_false_when_babel_is_unavailable(self) -> None: + """BabelImportError while resolving the error class yields False.""" + with patch( + "ftllexengine.introspection.iso_babel.get_unknown_locale_error_class", + side_effect=BabelImportError("UnknownLocaleError"), + ): + assert _is_unknown_locale_error(ValueError("not a locale error")) is False + + def test_is_unknown_locale_error_returns_true_for_matching_exception(self) -> None: + """The helper returns True when the exception matches Babel's error class.""" + + class FakeUnknownLocaleError(Exception): + """Stand-in for babel.core.UnknownLocaleError.""" + + with patch( + "ftllexengine.introspection.iso_babel.get_unknown_locale_error_class", + return_value=FakeUnknownLocaleError, + ): + assert _is_unknown_locale_error(FakeUnknownLocaleError("bad locale")) is True + + def test_get_babel_territories_without_unknown_locale_class_success(self) -> None: + """The no-UnknownLocaleError branch still returns territory data when lookup succeeds.""" + + class FakeLocale: + def __init__(self) -> None: + self.territories = {"US": "United States"} + + with ( + patch( + "ftllexengine.introspection.iso_babel._maybe_unknown_locale_error_class", + 
return_value=None, + ), + patch( + "ftllexengine.introspection.iso_babel._get_babel_locale", + return_value=FakeLocale(), + ), + ): + assert _get_babel_territories("en") == {"US": "United States"} + + def test_get_babel_territories_without_unknown_locale_class_failure(self) -> None: + """The no-UnknownLocaleError branch returns an empty mapping on locale lookup errors.""" + with ( + patch( + "ftllexengine.introspection.iso_babel._maybe_unknown_locale_error_class", + return_value=None, + ), + patch( + "ftllexengine.introspection.iso_babel._get_babel_locale", + side_effect=ValueError("bad locale"), + ), + ): + assert _get_babel_territories("en") == {} + + def test_get_babel_currency_name_without_unknown_locale_class_success(self) -> None: + """The no-UnknownLocaleError branch returns the localized currency name.""" + + class FakeLocale: + def __init__(self) -> None: + self.currencies = {"USD": "US Dollar"} + + class FakeLocaleClass: + @staticmethod + def parse(_locale_str: str) -> FakeLocale: + return FakeLocale() + + class FakeNumbers: + @staticmethod + def get_currency_name(_code: str, *, locale: str) -> str: + assert locale == "en" + return "US Dollar" + + with ( + patch( + "ftllexengine.introspection.iso_babel._maybe_unknown_locale_error_class", + return_value=None, + ), + patch( + "ftllexengine.introspection.iso_babel.get_locale_class", + return_value=FakeLocaleClass, + ), + patch( + "ftllexengine.introspection.iso_babel.get_babel_numbers", + return_value=FakeNumbers, + ), + ): + assert _get_babel_currency_name("USD", "en") == "US Dollar" + + def test_get_babel_currency_name_without_unknown_locale_class_failure(self) -> None: + """The no-UnknownLocaleError branch returns None on locale parse errors.""" + + class FakeLocaleClass: + @staticmethod + def parse(_locale_str: str) -> object: + msg = "bad locale" + raise ValueError(msg) + + with ( + patch( + "ftllexengine.introspection.iso_babel._maybe_unknown_locale_error_class", + return_value=None, + ), + patch( + 
"ftllexengine.introspection.iso_babel.get_locale_class", + return_value=FakeLocaleClass, + ), + patch( + "ftllexengine.introspection.iso_babel.get_babel_numbers", + return_value=MagicMock(), + ), + ): + assert _get_babel_currency_name("USD", "en") is None + + def test_get_babel_currency_name_without_unknown_locale_class_missing_code(self) -> None: + """The no-UnknownLocaleError branch returns None for absent currency codes.""" + + class FakeLocale: + def __init__(self) -> None: + self.currencies = {"EUR": "Euro"} + + class FakeLocaleClass: + @staticmethod + def parse(_locale_str: str) -> FakeLocale: + return FakeLocale() + + with ( + patch( + "ftllexengine.introspection.iso_babel._maybe_unknown_locale_error_class", + return_value=None, + ), + patch( + "ftllexengine.introspection.iso_babel.get_locale_class", + return_value=FakeLocaleClass, + ), + patch( + "ftllexengine.introspection.iso_babel.get_babel_numbers", + return_value=MagicMock(), + ), + ): + assert _get_babel_currency_name("USD", "en") is None + + def test_get_babel_currency_symbol_without_unknown_locale_class_success(self) -> None: + """The no-UnknownLocaleError branch returns the localized symbol when lookup succeeds.""" + + class FakeNumbers: + @staticmethod + def get_currency_symbol(_code: str, *, locale: str) -> str: + assert locale == "en" + return "$" + + with ( + patch( + "ftllexengine.introspection.iso_babel._maybe_unknown_locale_error_class", + return_value=None, + ), + patch( + "ftllexengine.introspection.iso_babel.get_babel_numbers", + return_value=FakeNumbers, + ), + ): + assert _get_babel_currency_symbol("USD", "en") == "$" + + def test_get_babel_currency_symbol_without_unknown_locale_class_failure(self) -> None: + """The no-UnknownLocaleError branch falls back to the code on lookup errors.""" + + class FakeNumbers: + @staticmethod + def get_currency_symbol(_code: str, *, locale: str) -> str: + _ = locale + msg = "bad locale" + raise ValueError(msg) + + with ( + patch( + 
"ftllexengine.introspection.iso_babel._maybe_unknown_locale_error_class", + return_value=None, + ), + patch( + "ftllexengine.introspection.iso_babel.get_babel_numbers", + return_value=FakeNumbers, + ), + ): + assert _get_babel_currency_symbol("USD", "en") == "USD" diff --git a/tests/introspection_iso_cases/error_paths.py b/tests/introspection_iso_cases/error_paths.py new file mode 100644 index 00000000..8b4deeec --- /dev/null +++ b/tests/introspection_iso_cases/error_paths.py @@ -0,0 +1,295 @@ +# mypy: ignore-errors +# ruff: noqa: ARG001 +# mypy: ignore-errors +from __future__ import annotations + +from unittest.mock import patch + +import pytest + +from ftllexengine.introspection import ( + CurrencyInfo, + TerritoryInfo, + clear_iso_cache, + get_currency, + get_territory, + get_territory_currencies, + list_currencies, + list_territories, +) + +# Private member access permitted for integration tests +from ftllexengine.introspection.iso import ( + _get_babel_currency_name, + _get_babel_currency_symbol, +) + + +class TestExceptionNarrowing: + """Tests for narrowed exception handling in Babel wrappers.""" + + def setup_method(self) -> None: + """Clear cache before each test.""" + clear_iso_cache() + + def test_value_error_is_caught(self) -> None: + """ValueError from Babel should be caught and handled gracefully.""" + # Invalid locale formats trigger ValueError in Babel + # The function should return None rather than propagating + result = get_territory("US", locale="invalid") + # Should either work or return None, not raise + assert result is None or isinstance(result, TerritoryInfo) + + def test_lookup_error_is_caught(self) -> None: + """LookupError (UnknownLocaleError) from Babel should be handled.""" + # Test with a locale that doesn't exist in CLDR + try: + result = get_currency("USD", locale="xyz_ABC") + # Should return None or result, not raise + assert result is None or isinstance(result, CurrencyInfo) + except LookupError: + pytest.fail("LookupError should be 
caught, not propagated") + + def test_attribute_key_error_handled(self) -> None: + """AttributeError and KeyError from data access should be handled.""" + # These are handled internally; we verify by checking edge case inputs + # that might trigger such errors in Babel's data access + result = get_territory("XX") # Unknown territory + assert result is None + + result2 = get_currency("ZZZ") # Unknown currency + assert result2 is None + + def test_name_error_propagates(self) -> None: + """NameError (programming bug) propagates rather than being suppressed. + + The narrowed exception catch list (ValueError, LookupError, KeyError, + AttributeError) excludes NameError; it must propagate uncaught. + """ + def mock_locale_parse(locale_str: str) -> object: + msg = "name 'undefined_var' is not defined" + raise NameError(msg) + + with ( + patch("babel.Locale.parse", side_effect=mock_locale_parse), + pytest.raises(NameError), + ): + _get_babel_currency_name("USD", "en") + +class TestUnknownLocaleErrorHandling: + """Tests for UnknownLocaleError handling (fuzzer-discovered regression). + + Babel's UnknownLocaleError inherits from Exception, not LookupError. + These tests verify the defensive exception handling catches it properly. + """ + + def setup_method(self) -> None: + """Clear cache before each test.""" + clear_iso_cache() + + def test_very_long_invalid_locale_get_currency(self) -> None: + """get_currency handles very long invalid locales gracefully. + + Regression test: fuzzer discovered UnknownLocaleError leak with + locale='x' * 100. Previously raised babel.core.UnknownLocaleError. + """ + # Fuzzer-discovered input + long_locale = "x" * 100 + result = get_currency("USD", locale=long_locale) + # Should return None (graceful degradation), not raise + assert result is None + + def test_very_long_invalid_locale_get_territory(self) -> None: + """get_territory handles very long invalid locales gracefully. + + Regression test for defensive exception handling. 
+ """ + long_locale = "x" * 100 + result = get_territory("US", locale=long_locale) + # Should return None (graceful degradation), not raise + assert result is None + + def test_garbage_locale_get_currency(self) -> None: + """get_currency handles garbage locale strings gracefully.""" + garbage_locales = [ + "!@#$%^", + "123456789", + "\x00\x01\x02", + "a" * 500, + "xx_YY_ZZ_AA_BB", + ] + for locale in garbage_locales: + result = get_currency("USD", locale=locale) + # Should return None, not raise + assert result is None, f"Failed for locale: {locale!r}" + + def test_garbage_locale_get_territory(self) -> None: + """get_territory handles garbage locale strings gracefully.""" + garbage_locales = [ + "!@#$%^", + "123456789", + "\x00\x01\x02", + "a" * 500, + "xx_YY_ZZ_AA_BB", + ] + for locale in garbage_locales: + result = get_territory("US", locale=locale) + # Should return None, not raise + assert result is None, f"Failed for locale: {locale!r}" + + def test_currency_symbol_fallback_on_invalid_locale(self) -> None: + """_get_babel_currency_symbol returns code as fallback for invalid locale.""" + # When locale is invalid, the function should return the code as fallback + result = _get_babel_currency_symbol("USD", "x" * 100) + assert result == "USD" # Falls back to code + + def test_currency_name_none_on_invalid_locale(self) -> None: + """_get_babel_currency_name returns None for invalid locale.""" + result = _get_babel_currency_name("USD", "x" * 100) + assert result is None + + def test_list_territories_empty_on_invalid_locale(self) -> None: + """list_territories returns empty set for invalid locales.""" + long_locale = "x" * 100 + result = list_territories(locale=long_locale) + # Should return empty frozenset, not raise + assert isinstance(result, frozenset) + assert len(result) == 0 + + def test_list_currencies_with_invalid_locale(self) -> None: + """list_currencies handles invalid locales gracefully.""" + long_locale = "x" * 100 + result = 
list_currencies(locale=long_locale) + # Should return frozenset (may be empty), not raise + assert isinstance(result, frozenset) + +class TestClearAllCachesIntegration: + """Tests for clear_module_caches integration with ISO caches.""" + + def test_clear_module_caches_includes_iso_cache(self) -> None: + """clear_module_caches should clear ISO introspection caches.""" + from ftllexengine import clear_module_caches + from ftllexengine.introspection.iso import ( + _get_territory_impl, + ) + + # Populate ISO cache + get_territory("US") + get_currency("USD") + list_territories() + + # pylint: disable=no-value-for-parameter + # Note: cache_info() is a method added by @lru_cache decorator, not + # related to the function's parameters. Pylint doesn't understand this. + + # Verify cache is populated + info_before = _get_territory_impl.cache_info() + assert info_before.currsize > 0 + + # Clear ALL caches (not just ISO) + clear_module_caches() + + # Verify ISO cache is now empty + info_after = _get_territory_impl.cache_info() + assert info_after.currsize == 0 + +class TestListCurrenciesConsistency: + """Tests for list_currencies() consistency across locales.""" + + def setup_method(self) -> None: + """Clear cache before each test.""" + clear_iso_cache() + + def test_same_currency_count_across_locales(self) -> None: + """list_currencies returns same number of currencies for all locales. + + Currencies without localized names fall back to English names rather + than being excluded, ensuring consistent result sets across locales. 
+ """ + result_en = list_currencies(locale="en") + result_de = list_currencies(locale="de") + result_fr = list_currencies(locale="fr") + + # All locales should return the same number of currencies + assert len(result_en) == len(result_de), ( + f"Currency count differs: en={len(result_en)}, de={len(result_de)}" + ) + assert len(result_en) == len(result_fr), ( + f"Currency count differs: en={len(result_en)}, fr={len(result_fr)}" + ) + + def test_same_currency_codes_across_locales(self) -> None: + """list_currencies returns same currency codes regardless of locale. + + The code set is identical across locales; only names/symbols differ. + """ + codes_en = {c.code for c in list_currencies(locale="en")} + codes_de = {c.code for c in list_currencies(locale="de")} + codes_ja = {c.code for c in list_currencies(locale="ja")} + + assert codes_en == codes_de, "Codes differ: en vs de" + assert codes_en == codes_ja, "Codes differ: en vs ja" + + def test_fallback_name_for_rare_locale(self) -> None: + """Currencies with no localized name use English name as fallback. + + For locales with incomplete CLDR coverage, the English name should + be used rather than excluding the currency. 
+ """ + # Use a rare locale that might have incomplete coverage + result = list_currencies(locale="zu") # Zulu + + # Should still include major currencies + codes = {c.code for c in result} + assert "USD" in codes + assert "EUR" in codes + assert "JPY" in codes + +class TestTerritoryCacheSize: + """Tests for territory cache bounded by MAX_TERRITORY_CACHE_SIZE.""" + + def setup_method(self) -> None: + """Clear cache before each test.""" + clear_iso_cache() + + def test_territory_currencies_cache_size(self) -> None: + """Territory currencies cache uses correct MAX_TERRITORY_CACHE_SIZE.""" + from ftllexengine.constants import ( + MAX_TERRITORY_CACHE_SIZE, + ) + from ftllexengine.introspection.iso import ( + _get_territory_currencies_impl, + ) + + # pylint: disable=no-value-for-parameter + info = _get_territory_currencies_impl.cache_info() + assert info.maxsize == MAX_TERRITORY_CACHE_SIZE + # Should be 300 (enough for all ~249 territories) + assert info.maxsize >= 249 + + def test_no_cache_thrashing_on_full_iteration(self) -> None: + """Iterating all territories should not cause cache thrashing. + + With MAX_TERRITORY_CACHE_SIZE >= 249, all territories fit in cache. 
+ """ + from ftllexengine.introspection.iso import ( + _get_territory_currencies_impl, + ) + + clear_iso_cache() + + # Iterate all territories + territories = list_territories() + for t in territories: + _ = get_territory_currencies(t.alpha2) + + # pylint: disable=no-value-for-parameter + info = _get_territory_currencies_impl.cache_info() + + # No evictions should have occurred (all fit in cache) + # Eviction count is misses - currsize when cache is full + assert info.maxsize is not None # This cache is bounded + assert info.currsize <= info.maxsize + # All unique territories should be cached + unique_territories = {t.alpha2 for t in territories} + assert info.currsize >= len(unique_territories) - 1 # Allow small margin diff --git a/tests/introspection_iso_cases/lookup.py b/tests/introspection_iso_cases/lookup.py new file mode 100644 index 00000000..ed9355a9 --- /dev/null +++ b/tests/introspection_iso_cases/lookup.py @@ -0,0 +1,533 @@ +# mypy: ignore-errors +from __future__ import annotations + +import pytest + +from ftllexengine.introspection import ( + CurrencyCode, + CurrencyInfo, + TerritoryCode, + TerritoryInfo, + clear_iso_cache, + get_currency, + get_territory, + get_territory_currencies, + is_valid_currency_code, + is_valid_territory_code, + list_currencies, + list_territories, +) + +# Private member access permitted for integration tests + + +class TestTerritoryInfo: + """Tests for TerritoryInfo dataclass.""" + + def test_immutable(self) -> None: + """TerritoryInfo is immutable (frozen).""" + info = TerritoryInfo( + alpha2=TerritoryCode("US"), name="United States", + currencies=(CurrencyCode("USD"),), official_languages=("en",), + ) + with pytest.raises(AttributeError): + info.alpha2 = TerritoryCode("CA") # type: ignore[misc] + + def test_hashable(self) -> None: + """TerritoryInfo is hashable (can be used in sets/dicts).""" + info = TerritoryInfo( + alpha2=TerritoryCode("US"), name="United States", + currencies=(CurrencyCode("USD"),), 
official_languages=("en",), + ) + assert hash(info) is not None + territories = {info} + assert len(territories) == 1 + + def test_equality(self) -> None: + """TerritoryInfo instances with same values are equal.""" + info1 = TerritoryInfo( + alpha2=TerritoryCode("US"), name="United States", + currencies=(CurrencyCode("USD"),), official_languages=("en",), + ) + info2 = TerritoryInfo( + alpha2=TerritoryCode("US"), name="United States", + currencies=(CurrencyCode("USD"),), official_languages=("en",), + ) + assert info1 == info2 + + def test_slots(self) -> None: + """TerritoryInfo uses __slots__ for memory efficiency.""" + info = TerritoryInfo( + alpha2=TerritoryCode("US"), name="United States", + currencies=(CurrencyCode("USD"),), official_languages=("en",), + ) + assert not hasattr(info, "__dict__") or info.__dict__ == {} + + def test_multi_currency_territory(self) -> None: + """TerritoryInfo supports multiple currencies for multi-currency territories.""" + info = TerritoryInfo( + alpha2=TerritoryCode("PA"), name="Panama", + currencies=(CurrencyCode("PAB"), CurrencyCode("USD")), + official_languages=("es",), + ) + assert len(info.currencies) == 2 + assert CurrencyCode("PAB") in info.currencies + assert CurrencyCode("USD") in info.currencies + + def test_empty_currencies_tuple(self) -> None: + """TerritoryInfo supports empty currencies tuple for territories without currency data.""" + info = TerritoryInfo( + alpha2=TerritoryCode("AQ"), name="Antarctica", + currencies=(), official_languages=(), + ) + assert info.currencies == () + assert len(info.currencies) == 0 + + def test_official_languages_field(self) -> None: + """TerritoryInfo stores official_languages as tuple of BCP-47 codes.""" + info = TerritoryInfo( + alpha2=TerritoryCode("BE"), name="Belgium", + currencies=(CurrencyCode("EUR"),), + official_languages=("fr", "nl", "de"), + ) + assert info.official_languages == ("fr", "nl", "de") + assert isinstance(info.official_languages, tuple) + + def 
test_official_languages_empty(self) -> None: + """TerritoryInfo accepts empty official_languages tuple.""" + info = TerritoryInfo( + alpha2=TerritoryCode("AQ"), name="Antarctica", + currencies=(), official_languages=(), + ) + assert info.official_languages == () + +class TestCurrencyInfo: + """Tests for CurrencyInfo dataclass.""" + + def test_immutable(self) -> None: + """CurrencyInfo is immutable (frozen).""" + info = CurrencyInfo(code=CurrencyCode("USD"), name="US Dollar", symbol="$", decimal_digits=2) + with pytest.raises(AttributeError): + info.code = CurrencyCode("EUR") # type: ignore[misc] + + def test_hashable(self) -> None: + """CurrencyInfo is hashable (can be used in sets/dicts).""" + info = CurrencyInfo(code=CurrencyCode("USD"), name="US Dollar", symbol="$", decimal_digits=2) + assert hash(info) is not None + currencies = {info} + assert len(currencies) == 1 + + def test_equality(self) -> None: + """CurrencyInfo instances with same values are equal.""" + info1 = CurrencyInfo(code=CurrencyCode("USD"), name="US Dollar", symbol="$", decimal_digits=2) + info2 = CurrencyInfo(code=CurrencyCode("USD"), name="US Dollar", symbol="$", decimal_digits=2) + assert info1 == info2 + + def test_slots(self) -> None: + """CurrencyInfo uses __slots__ for memory efficiency.""" + info = CurrencyInfo(code=CurrencyCode("USD"), name="US Dollar", symbol="$", decimal_digits=2) + assert not hasattr(info, "__dict__") or info.__dict__ == {} + +class TestGetTerritory: + """Tests for get_territory() function.""" + + def setup_method(self) -> None: + """Clear cache before each test.""" + clear_iso_cache() + + def test_returns_territory_info_for_valid_code(self) -> None: + """get_territory returns TerritoryInfo for known codes.""" + result = get_territory("US") + assert result is not None + assert isinstance(result, TerritoryInfo) + assert result.alpha2 == "US" + assert "United States" in result.name or "USA" in result.name + + def test_returns_none_for_unknown_code(self) -> None: + 
"""get_territory returns None for unknown codes.""" + result = get_territory("XX") + assert result is None + + def test_case_insensitive(self) -> None: + """get_territory accepts lowercase codes.""" + result_upper = get_territory("US") + result_lower = get_territory("us") + result_mixed = get_territory("Us") + + assert result_upper is not None + assert result_lower is not None + assert result_mixed is not None + assert result_upper.alpha2 == result_lower.alpha2 == result_mixed.alpha2 + + def test_localized_names(self) -> None: + """get_territory returns localized names based on locale.""" + result_en = get_territory("DE", locale="en") + result_de = get_territory("DE", locale="de") + + assert result_en is not None + assert result_de is not None + + # English name should contain "Germany" + assert "Germany" in result_en.name + # German name should be "Deutschland" + assert "Deutschland" in result_de.name + + def test_includes_currencies(self) -> None: + """get_territory includes currencies when available.""" + result = get_territory("US") + assert result is not None + assert "USD" in result.currencies + + result_jp = get_territory("JP") + assert result_jp is not None + assert "JPY" in result_jp.currencies + + def test_includes_official_languages(self) -> None: + """get_territory populates official_languages from CLDR data.""" + # GB has English as official language per CLDR + result_gb = get_territory("GB") + assert result_gb is not None + assert isinstance(result_gb.official_languages, tuple) + assert "en" in result_gb.official_languages + + # Belgium has three official languages per CLDR + result_be = get_territory("BE") + assert result_be is not None + assert isinstance(result_be.official_languages, tuple) + assert len(result_be.official_languages) >= 2 + for lang in result_be.official_languages: + assert isinstance(lang, str) + assert len(lang) > 0 + + # official_languages is always a tuple (may be empty for some territories) + result_us = get_territory("US") + 
assert result_us is not None + assert isinstance(result_us.official_languages, tuple) + + def test_various_territories(self) -> None: + """get_territory works for various territory codes.""" + test_cases = ["US", "CA", "GB", "DE", "FR", "JP", "AU", "BR", "IN", "CN"] + + for code in test_cases: + result = get_territory(code) + assert result is not None, f"Failed for {code}" + assert result.alpha2 == code + assert len(result.name) > 0 + + def test_casefold_expansion_returns_none(self) -> None: + """get_territory returns None for inputs that expand via str.upper(). + + 'ß' (U+00DF, LATIN SMALL LETTER SHARP S) has len 1 but upper() returns + 'SS' (len 2), which is the valid ISO 3166-1 code for South Sudan. The + raw input 'ß' is not a valid territory code and must return None. + Regression for FIX-ISO-CASEFOLD-001. + """ + # 'ß'.upper() == 'SS' (South Sudan) — must not be returned + assert get_territory("ß") is None + # Confirm 'SS' itself IS valid (South Sudan exists in CLDR) + assert get_territory("SS") is not None + +class TestGetCurrency: + """Tests for get_currency() function.""" + + def setup_method(self) -> None: + """Clear cache before each test.""" + clear_iso_cache() + + def test_returns_currency_info_for_valid_code(self) -> None: + """get_currency returns CurrencyInfo for known codes.""" + result = get_currency("USD") + assert result is not None + assert isinstance(result, CurrencyInfo) + assert result.code == "USD" + assert "$" in result.symbol or "USD" in result.symbol + + def test_returns_none_for_unknown_code(self) -> None: + """get_currency returns None for truly unknown codes.""" + # Use a code that's definitely not in any currency database + result = get_currency("ZZZ") + assert result is None + + def test_case_insensitive(self) -> None: + """get_currency accepts lowercase codes.""" + result_upper = get_currency("USD") + result_lower = get_currency("usd") + result_mixed = get_currency("Usd") + + assert result_upper is not None + assert result_lower is 
not None + assert result_mixed is not None + assert result_upper.code == result_lower.code == result_mixed.code + + def test_localized_symbols(self) -> None: + """get_currency returns localized symbols based on locale.""" + result_en = get_currency("EUR", locale="en") + result_de = get_currency("EUR", locale="de") + + assert result_en is not None + assert result_de is not None + + def test_decimal_digits_standard(self) -> None: + """get_currency returns correct decimal digits for standard currencies.""" + usd = get_currency("USD") + eur = get_currency("EUR") + gbp = get_currency("GBP") + + assert usd is not None + assert usd.decimal_digits == 2 + assert eur is not None + assert eur.decimal_digits == 2 + assert gbp is not None + assert gbp.decimal_digits == 2 + + def test_decimal_digits_zero(self) -> None: + """get_currency returns 0 decimal digits for zero-decimal currencies.""" + jpy = get_currency("JPY") + krw = get_currency("KRW") + vnd = get_currency("VND") + + assert jpy is not None + assert jpy.decimal_digits == 0 + assert krw is not None + assert krw.decimal_digits == 0 + assert vnd is not None + assert vnd.decimal_digits == 0 + + def test_decimal_digits_three(self) -> None: + """get_currency returns 3 decimal digits for three-decimal currencies.""" + kwd = get_currency("KWD") + bhd = get_currency("BHD") + omr = get_currency("OMR") + + assert kwd is not None + assert kwd.decimal_digits == 3 + assert bhd is not None + assert bhd.decimal_digits == 3 + assert omr is not None + assert omr.decimal_digits == 3 + + def test_decimal_digits_four(self) -> None: + """get_currency returns 4 decimal digits for accounting units.""" + clf = get_currency("CLF") + uyw = get_currency("UYW") + + assert clf is not None + assert clf.decimal_digits == 4 + assert uyw is not None + assert uyw.decimal_digits == 4 + + def test_casefold_expansion_returns_none(self) -> None: + """get_currency returns None for inputs that expand via str.upper(). 
+ + A 2-char input whose upper() produces a valid 3-char currency code + must return None — the raw input is not a valid currency code. + Regression for FIX-ISO-CASEFOLD-001. + """ + # 'ßD' has len 2; 'ßD'.upper() == 'SSD' (not a valid code, but the + # pattern is guarded). Verify the length guard returns None for any + # wrong-length input regardless of what upper() produces. + assert get_currency("ß") is None # len 1 + assert get_currency("ßD") is None # len 2, 'ßD'.upper() = 'SSD' + +class TestListTerritories: + """Tests for list_territories() function.""" + + def setup_method(self) -> None: + """Clear cache before each test.""" + clear_iso_cache() + + def test_returns_frozenset(self) -> None: + """list_territories returns a frozenset.""" + result = list_territories() + assert isinstance(result, frozenset) + + def test_contains_major_territories(self) -> None: + """list_territories includes major world territories.""" + result = list_territories() + codes = {t.alpha2 for t in result} + + major_codes = ["US", "CA", "GB", "DE", "FR", "JP", "AU", "BR", "IN", "CN"] + for code in major_codes: + assert code in codes, f"Missing {code}" + + def test_all_have_two_letter_codes(self) -> None: + """All returned territories have valid 2-letter alpha codes.""" + result = list_territories() + + for territory in result: + assert len(territory.alpha2) == 2 + assert territory.alpha2.isalpha() + assert territory.alpha2.isupper() + + def test_localized_names(self) -> None: + """list_territories returns localized names based on locale.""" + result_en = list_territories(locale="en") + result_de = list_territories(locale="de") + + # Find Germany in both results + de_en = next((t for t in result_en if t.alpha2 == "DE"), None) + de_de = next((t for t in result_de if t.alpha2 == "DE"), None) + + assert de_en is not None + assert de_de is not None + assert "Germany" in de_en.name + assert "Deutschland" in de_de.name + +class TestListCurrencies: + """Tests for list_currencies() 
function.""" + + def setup_method(self) -> None: + """Clear cache before each test.""" + clear_iso_cache() + + def test_returns_frozenset(self) -> None: + """list_currencies returns a frozenset.""" + result = list_currencies() + assert isinstance(result, frozenset) + + def test_contains_major_currencies(self) -> None: + """list_currencies includes major world currencies.""" + result = list_currencies() + codes = {c.code for c in result} + + major_codes = ["USD", "EUR", "GBP", "JPY", "CHF", "CAD", "AUD"] + for code in major_codes: + assert code in codes, f"Missing {code}" + + def test_all_have_three_letter_codes(self) -> None: + """All returned currencies have valid 3-letter codes.""" + result = list_currencies() + + for currency in result: + assert len(currency.code) == 3 + assert currency.code.isalpha() + assert currency.code.isupper() + +class TestGetTerritoryCurrencies: + """Tests for get_territory_currencies() function.""" + + def setup_method(self) -> None: + """Clear cache before each test.""" + clear_iso_cache() + + def test_returns_currencies_for_known_territory(self) -> None: + """get_territory_currencies returns currencies for known territories.""" + us_currencies = get_territory_currencies("US") + assert isinstance(us_currencies, tuple) + assert "USD" in us_currencies + + jp_currencies = get_territory_currencies("JP") + assert "JPY" in jp_currencies + + gb_currencies = get_territory_currencies("GB") + assert "GBP" in gb_currencies + + def test_returns_empty_tuple_for_unknown_territory(self) -> None: + """get_territory_currencies returns empty tuple for unknown territories.""" + result = get_territory_currencies("XX") + assert result == () + + def test_case_insensitive(self) -> None: + """get_territory_currencies accepts lowercase codes.""" + assert "USD" in get_territory_currencies("us") + assert "JPY" in get_territory_currencies("jp") + + def test_eurozone_countries(self) -> None: + """get_territory_currencies returns EUR for eurozone countries.""" + 
eurozone = ["DE", "FR", "IT", "ES", "NL", "BE", "AT", "LV", "LT", "EE"] + + for code in eurozone: + result = get_territory_currencies(code) + assert "EUR" in result, f"Expected EUR for {code}, got {result}" + + def test_multi_currency_territories(self) -> None: + """get_territory_currencies returns all currencies for multi-currency territories.""" + # Panama uses both PAB and USD + pa_currencies = get_territory_currencies("PA") + # CLDR data should include at least one currency + assert len(pa_currencies) >= 1 + + def test_returns_tuple_for_immutability(self) -> None: + """get_territory_currencies returns an immutable tuple per architectural requirement.""" + result = get_territory_currencies("US") + assert isinstance(result, tuple) + # Verify it's immutable (tuple cannot be modified) + # Callers can convert to list if mutation is needed: list(result) + + def test_casefold_expansion_returns_empty(self) -> None: + """get_territory_currencies returns () for inputs that expand via str.upper(). + + 'ß' (len 1) uppercases to 'SS' (South Sudan, valid), but the raw + input is not a valid territory code. Must return empty tuple. + Regression for FIX-ISO-CASEFOLD-001. 
+ """ + assert get_territory_currencies("ß") == () + # Confirm 'SS' itself returns currencies (South Sudan uses SSP) + assert get_territory_currencies("SS") != () + +class TestTypeGuards: + """Tests for type guard functions.""" + + def setup_method(self) -> None: + """Clear cache before each test.""" + clear_iso_cache() + + def test_is_valid_territory_code_valid(self) -> None: + """is_valid_territory_code returns True for valid codes.""" + assert is_valid_territory_code("US") is True + assert is_valid_territory_code("GB") is True + assert is_valid_territory_code("JP") is True + + def test_is_valid_territory_code_invalid(self) -> None: + """is_valid_territory_code returns False for invalid codes.""" + # XX is not in CLDR; ZZ is (represents "Unknown Region") + assert is_valid_territory_code("XX") is False + assert is_valid_territory_code("QQ") is False + + def test_is_valid_territory_code_wrong_length(self) -> None: + """is_valid_territory_code returns False for wrong-length strings.""" + assert is_valid_territory_code("U") is False + assert is_valid_territory_code("USA") is False + assert is_valid_territory_code("") is False + + def test_is_valid_territory_code_case_insensitive(self) -> None: + """is_valid_territory_code is case insensitive.""" + assert is_valid_territory_code("us") is True + assert is_valid_territory_code("Us") is True + + def test_is_valid_currency_code_valid(self) -> None: + """is_valid_currency_code returns True for valid codes.""" + assert is_valid_currency_code("USD") is True + assert is_valid_currency_code("EUR") is True + assert is_valid_currency_code("JPY") is True + + def test_is_valid_currency_code_invalid(self) -> None: + """is_valid_currency_code returns False for invalid codes.""" + # ZZZ and QQQ are not in CLDR; XXX is (represents "No currency") + assert is_valid_currency_code("ZZZ") is False + assert is_valid_currency_code("QQQ") is False + + def test_is_valid_currency_code_wrong_length(self) -> None: + """is_valid_currency_code 
returns False for wrong-length strings.""" + assert is_valid_currency_code("US") is False + assert is_valid_currency_code("USDD") is False + assert is_valid_currency_code("") is False + + def test_is_valid_currency_code_case_insensitive(self) -> None: + """is_valid_currency_code is case insensitive.""" + assert is_valid_currency_code("usd") is True + assert is_valid_currency_code("Usd") is True + + def test_type_guard_lookup_consistency_casefold(self) -> None: + """Type guard and lookup agree for inputs that expand under str.upper(). + + If is_valid_territory_code(v) is False, get_territory(v) must be None. + 'ß' (len 1, upper() = 'SS') violated this invariant before FIX-ISO-CASEFOLD-001. + """ + assert is_valid_territory_code("ß") is False + assert get_territory("ß") is None + + assert is_valid_currency_code("ß") is False + assert get_currency("ß") is None + assert is_valid_currency_code("ßD") is False + assert get_currency("ßD") is None diff --git a/tests/introspection_iso_cases/requirements.py b/tests/introspection_iso_cases/requirements.py new file mode 100644 index 00000000..d5128e70 --- /dev/null +++ b/tests/introspection_iso_cases/requirements.py @@ -0,0 +1,319 @@ +# mypy: ignore-errors +from __future__ import annotations + +import pytest + +from ftllexengine.introspection import ( + CurrencyCode, + TerritoryCode, + get_currency, + get_currency_decimal_digits, + require_currency_code, + require_territory_code, +) + +# Private member access permitted for integration tests + + +class TestGetCurrencyDecimalDigits: + """Tests for get_currency_decimal_digits() convenience function. + + Decimal precision is locale-independent (ISO 4217 standard). + The function must not require a locale parameter. 
+ """ + + def test_standard_two_decimal_currencies(self) -> None: + """Common 2-decimal currencies return 2.""" + for code in ("EUR", "USD", "GBP", "CHF", "CAD", "AUD", "NZD"): + assert get_currency_decimal_digits(code) == 2, ( + f"{code} should have 2 decimal digits" + ) + + def test_zero_decimal_currencies(self) -> None: + """Zero-decimal currencies return 0.""" + for code in ("JPY", "KRW", "VND", "ISK", "CLP"): + assert get_currency_decimal_digits(code) == 0, ( + f"{code} should have 0 decimal digits" + ) + + def test_three_decimal_currencies(self) -> None: + """Three-decimal currencies return 3.""" + for code in ("KWD", "JOD", "OMR", "BHD", "TND"): + assert get_currency_decimal_digits(code) == 3, ( + f"{code} should have 3 decimal digits" + ) + + def test_four_decimal_currencies(self) -> None: + """Four-decimal currencies return 4.""" + assert get_currency_decimal_digits("CLF") == 4 + assert get_currency_decimal_digits("UYW") == 4 + + def test_unknown_code_returns_none(self) -> None: + """Unknown ISO code returns None.""" + assert get_currency_decimal_digits("XYZ") is None + assert get_currency_decimal_digits("FOO") is None + + def test_case_insensitive(self) -> None: + """Currency code lookup is case-insensitive.""" + assert get_currency_decimal_digits("eur") == 2 + assert get_currency_decimal_digits("Eur") == 2 + assert get_currency_decimal_digits("EUR") == 2 + assert get_currency_decimal_digits("jpy") == 0 + + def test_wrong_length_returns_none(self) -> None: + """Codes of wrong length return None without Babel call.""" + assert get_currency_decimal_digits("") is None + assert get_currency_decimal_digits("EU") is None + assert get_currency_decimal_digits("EURO") is None + + def test_consistent_with_get_currency(self) -> None: + """Result matches get_currency(code).decimal_digits for all known codes.""" + for code in ("USD", "EUR", "JPY", "KWD", "CLF", "GBP"): + info = get_currency(code) + assert info is not None + digits = get_currency_decimal_digits(code) + 
assert digits == info.decimal_digits, ( + f"Inconsistency for {code}: get_currency_decimal_digits={digits}, " + f"get_currency().decimal_digits={info.decimal_digits}" + ) + + def test_latvian_lats_historical(self) -> None: + """Historical currency LVL (Latvian Lats) returns None, or 2 if the data still lists it.""" + # LVL was withdrawn from ISO 4217; Babel no longer includes it in active CLDR data. + # None is the expected result for withdrawn/unknown codes. + result = get_currency_decimal_digits("LVL") + # Also accept 2 in case the code is still present in the data. + assert result in (None, 2), f"LVL should be None or 2, got {result!r}" + + def test_precious_metal_x_codes_return_zero(self) -> None: + """ISO 4217 precious-metal X-codes return 0 decimal digits.""" + for code in ("XAG", "XAU", "XPD", "XPT"): + assert get_currency_decimal_digits(code) == 0, ( + f"{code} (precious metal) should have 0 decimal digits" + ) + + def test_special_x_codes_return_zero(self) -> None: + """ISO 4217 special X-codes (bond units, SDR, testing, no-currency) return 0.""" + for code in ("XBA", "XBB", "XBC", "XBD", "XDR", "XSU", "XTS", "XUA", "XXX"): + assert get_currency_decimal_digits(code) == 0, ( + f"{code} should have 0 decimal digits" + ) + + def test_xcd_eastern_caribbean_is_two_decimal(self) -> None: + """XCD (Eastern Caribbean Dollar) uses default 2 decimal digits.""" + assert get_currency_decimal_digits("XCD") == 2 + + def test_babel_free_no_babel_install_required(self) -> None: + """get_currency_decimal_digits works without Babel installed. + + Validates the Babel-free contract: result must not depend on any + Babel import path. We verify by confirming standard codes work and + that the returned value is a plain int (not a Babel-derived object). 
+ """ + result = get_currency_decimal_digits("USD") + assert result == 2 + assert type(result) is int + + def test_known_invalid_codes_return_none(self) -> None: + """Non-ISO codes return None without fallback to default.""" + for code in ("XYZ", "FOO", "ZZZ", "AAA", "TST"): + assert get_currency_decimal_digits(code) is None, ( + f"Unknown code {code!r} should return None" + ) + + def test_casefold_expansion_guard(self) -> None: + """Single-char inputs that expand via .upper() return None (no casefold confusion). + + Verifies the raw-length guard prevents the 'ß' -> 'SS' casefold expansion + from matching 'SS' or any other 2-char result of uppercasing a 1-char input. + """ + assert get_currency_decimal_digits("ß") is None + assert get_currency_decimal_digits("a") is None + + def test_fund_codes_return_correct_precision(self) -> None: + """ISO 4217 fund codes are valid and return correct precision.""" + # BOV (Bolivian Mvdol), MXV (Mexican Unidad), USN (US Next Day): 2 decimal + for code in ("BOV", "MXV", "USN"): + result = get_currency_decimal_digits(code) + assert result == 2, f"{code} (fund code) should have 2 decimal digits" + # UYI (Uruguay Peso en Unidades Indexadas): 0 decimal + assert get_currency_decimal_digits("UYI") == 0 + + def test_recently_added_active_codes(self) -> None: + """Codes added by recent ISO 4217 amendments are active and return precision. + + VED (Amendment 169, 2021), ZWG (Amendment 171+, 2024), and XCG + (Amendment 17x, 2025) are active ISO 4217 codes with default 2 decimal digits. + """ + for code in ("VED", "ZWG", "XCG"): + result = get_currency_decimal_digits(code) + assert result == 2, ( + f"{code} (recently-added active code) should have 2 decimal digits, " + f"got {result!r}" + ) + + def test_recently_retired_codes_return_none(self) -> None: + """Codes retired by recent ISO 4217 amendments return None. 
+ + SLL (Sierra Leone Leone, retired Amendment 170, 2022) and ZWL + (Zimbabwean Dollar, retired Amendment 171+, 2024) are no longer active + and must not appear in ISO_4217_VALID_CODES. + """ + for code in ("SLL", "ZWL"): + result = get_currency_decimal_digits(code) + assert result is None, ( + f"{code} (retired code) should return None, got {result!r}" + ) + + def test_iqd_iso_standard_value(self) -> None: + """IQD (Iraqi Dinar) returns ISO 4217 standard value of 3 decimal digits. + + ISO 4217 specifies IQD with 3 decimal places (fils subdivision). + Babel CLDR reports 0 because fils are not used in practice. + This library follows the ISO standard, not CLDR practical usage. + """ + assert get_currency_decimal_digits("IQD") == 3 + +class TestRequireCurrencyCode: + """Tests for require_currency_code boundary validator.""" + + def test_valid_uppercase_code_returns_currency_code(self) -> None: + """Valid uppercase ISO 4217 code returns CurrencyCode.""" + result = require_currency_code("USD", "price") + assert result == CurrencyCode("USD") + assert type(result) is str # CurrencyCode is a str alias + + def test_valid_lowercase_code_is_normalized(self) -> None: + """Lowercase code is normalised to uppercase CurrencyCode.""" + result = require_currency_code("eur", "amount") + assert result == CurrencyCode("EUR") + + def test_valid_mixed_case_code_is_normalized(self) -> None: + """Mixed-case code is normalised to uppercase.""" + result = require_currency_code("Jpy", "fee") + assert result == CurrencyCode("JPY") + + def test_leading_trailing_whitespace_is_stripped(self) -> None: + """Whitespace around a valid code is stripped before validation.""" + result = require_currency_code(" GBP ", "price") + assert result == CurrencyCode("GBP") + + def test_invalid_code_raises_value_error(self) -> None: + """Unrecognised currency code raises ValueError.""" + with pytest.raises(ValueError, match="currency code"): + require_currency_code("XYZ", "amount") + + def 
test_empty_string_raises_value_error(self) -> None: + """Empty string raises ValueError (not a valid ISO 4217 code).""" + with pytest.raises(ValueError, match="currency code"): + require_currency_code("", "amount") + + def test_whitespace_only_raises_value_error(self) -> None: + """Whitespace-only string raises ValueError after stripping.""" + with pytest.raises(ValueError, match="currency code"): + require_currency_code(" ", "amount") + + def test_non_str_raises_type_error(self) -> None: + """Non-str value raises TypeError with field_name in message.""" + with pytest.raises(TypeError, match="price"): + require_currency_code(123, "price") + + def test_none_raises_type_error(self) -> None: + """None raises TypeError.""" + with pytest.raises(TypeError, match="currency_code"): + require_currency_code(None, "currency_code") + + def test_field_name_in_error_message(self) -> None: + """field_name appears in both TypeError and ValueError messages.""" + with pytest.raises(TypeError, match="my_field"): + require_currency_code(42, "my_field") + with pytest.raises(ValueError, match="my_field"): + require_currency_code("BADCODE", "my_field") + + def test_valid_codes_cover_major_currencies(self) -> None: + """Major ISO 4217 codes are accepted.""" + for code in ("USD", "EUR", "GBP", "JPY", "CHF", "CAD", "AUD"): + result = require_currency_code(code, "amount") + assert result == CurrencyCode(code) + + def test_returns_currency_code_type(self) -> None: + """Return value is CurrencyCode (str subtype).""" + result = require_currency_code("USD", "amount") + assert isinstance(result, str) + +class TestRequireTerritoryCode: + """Tests for require_territory_code boundary validator.""" + + def test_valid_uppercase_code_returns_territory_code(self) -> None: + """Valid uppercase ISO 3166-1 alpha-2 code returns TerritoryCode.""" + result = require_territory_code("US", "region") + assert result == TerritoryCode("US") + + def test_valid_lowercase_code_is_normalized(self) -> None: + 
"""Lowercase code is normalised to uppercase TerritoryCode.""" + result = require_territory_code("de", "country") + assert result == TerritoryCode("DE") + + def test_valid_mixed_case_code_is_normalized(self) -> None: + """Mixed-case code is normalised to uppercase.""" + result = require_territory_code("Gb", "territory") + assert result == TerritoryCode("GB") + + def test_leading_trailing_whitespace_is_stripped(self) -> None: + """Whitespace around a valid code is stripped before validation.""" + result = require_territory_code(" FR ", "country") + assert result == TerritoryCode("FR") + + def test_invalid_code_raises_value_error(self) -> None: + """Unrecognised territory code raises ValueError.""" + # "99"/"X9" contain digits — not valid ISO 3166-1 alpha-2 codes + with pytest.raises(ValueError, match="territory code"): + require_territory_code("99", "region") + + def test_empty_string_raises_value_error(self) -> None: + """Empty string raises ValueError.""" + with pytest.raises(ValueError, match="territory code"): + require_territory_code("", "region") + + def test_whitespace_only_raises_value_error(self) -> None: + """Whitespace-only string raises ValueError after stripping.""" + with pytest.raises(ValueError, match="territory code"): + require_territory_code(" ", "region") + + def test_three_char_code_raises_value_error(self) -> None: + """3-char string is not a valid alpha-2 code and raises ValueError.""" + with pytest.raises(ValueError, match="territory code"): + require_territory_code("USA", "country") + + def test_non_str_raises_type_error(self) -> None: + """Non-str value raises TypeError with field_name in message.""" + with pytest.raises(TypeError, match="region"): + require_territory_code(42, "region") + + def test_none_raises_type_error(self) -> None: + """None raises TypeError.""" + with pytest.raises(TypeError, match="territory"): + require_territory_code(None, "territory") + + def test_field_name_in_error_message(self) -> None: + """field_name appears 
in both TypeError and ValueError messages.""" + with pytest.raises(TypeError, match="my_field"): + require_territory_code(99, "my_field") + with pytest.raises(ValueError, match="my_field"): + require_territory_code("XX", "my_field") + + def test_valid_codes_cover_major_territories(self) -> None: + """Major ISO 3166-1 alpha-2 codes are accepted.""" + for code in ("US", "DE", "GB", "FR", "JP", "CA", "AU"): + result = require_territory_code(code, "region") + assert result == TerritoryCode(code) + + def test_casefold_expansion_guard(self) -> None: + """Single-char inputs that expand via .upper() (e.g. 'ß'->'SS') are rejected.""" + with pytest.raises(ValueError, match="territory code"): + require_territory_code("ß", "region") + + def test_returns_territory_code_type(self) -> None: + """Return value is TerritoryCode (str subtype).""" + result = require_territory_code("US", "region") + assert isinstance(result, str) diff --git a/tests/introspection_message_cases/__init__.py b/tests/introspection_message_cases/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/introspection_message_cases/cache_and_validation.py b/tests/introspection_message_cases/cache_and_validation.py new file mode 100644 index 00000000..00de7a88 --- /dev/null +++ b/tests/introspection_message_cases/cache_and_validation.py @@ -0,0 +1,320 @@ +# mypy: ignore-errors +from __future__ import annotations + +from unittest.mock import patch + +import pytest +from hypothesis import event, given, settings +from hypothesis import strategies as st + +import ftllexengine.introspection.message as _introspection_msg_mod +from ftllexengine.introspection import ( + MessageVariableValidationResult, + clear_introspection_cache, + introspect_message, + validate_message_variables, +) +from ftllexengine.syntax.ast import ( + Attribute, + Identifier, + Message, + Pattern, + Placeable, + Term, + TextElement, +) +from ftllexengine.syntax.parser import FluentParserV1 + +# 
=========================================================================== +# HELPERS +# =========================================================================== + + +def _parse_message(ftl: str) -> Message: + """Parse FTL source and return first Message entry.""" + resource = FluentParserV1().parse(ftl) + entry = resource.entries[0] + assert isinstance(entry, Message) + return entry + + +def _parse_term(ftl: str) -> Term: + """Parse FTL source and return first Term entry.""" + resource = FluentParserV1().parse(ftl) + entry = resource.entries[0] + assert isinstance(entry, Term) + return entry + + +def _make_message( + name: str, + *, + value: Pattern | None = None, + attributes: tuple[Attribute, ...] = (), +) -> Message: + """Construct a Message programmatically (bypasses parser).""" + return Message(id=Identifier(name=name), value=value, attributes=attributes) + + +def _make_pattern(*elements: TextElement | Placeable) -> Pattern: + """Construct a Pattern from elements.""" + return Pattern(elements=elements) + + +# =========================================================================== +# VARIABLE EXTRACTION +# =========================================================================== + + + +class TestCacheDoubleCheckHit: + """Covers introspect_message line 674: the locked double-check cache hit. + + Line 674 fires only when another thread stores the result between step 1 + (initial pre-lock miss check) and step 3 (locked store). The test uses + a mock lock that pre-fills the cache before the double-check code runs, + exactly simulating the winning-race scenario. + """ + + def test_double_check_returns_preexisting_result(self) -> None: + """Line 674: double-check inside lock returns pre-filled entry. + + The mock lock pre-fills _introspection_cache[msg] on __enter__, + simulating another thread winning the race. introspect_message must + return the pre-filled result rather than overwriting it. 
+ """ + msg = _parse_message("dc-test = { $var }") + clear_introspection_cache() + + # Compute reference result (no cache interaction) + expected = introspect_message(msg, use_cache=False) + clear_introspection_cache() + + # Capture original lock before patching + orig_lock = _introspection_msg_mod._introspection_cache_lock + + class _RaceLock: + """Simulates a concurrent thread winning the race at Step 3. + + introspect_message acquires the lock TWICE per call with use_cache=True: + - First acquisition: Step 1 read-check (cache is empty, should miss) + - Second acquisition: Step 3 write-check (pre-fill simulates the race) + Pre-filling on the first acquisition would cause an early return at the + Step 1 hit (line 641), bypassing the double-check at line 674 entirely. + """ + + def __init__(self) -> None: + self._call_count = 0 + + def __enter__(self) -> object: + orig_lock.acquire() + self._call_count += 1 + if self._call_count == 2: # Step 3 write-check only + _introspection_msg_mod._introspection_cache[msg] = expected + return self + + def __exit__( + self, + exc_type: type[BaseException] | None, + exc_val: BaseException | None, + exc_tb: object, + ) -> None: + orig_lock.release() + + with patch.object( + _introspection_msg_mod, "_introspection_cache_lock", _RaceLock() + ): + # Step 1: first __enter__ — cache is empty, miss, continue to Step 2 + # Step 2: computation proceeds normally + # Step 3: second __enter__ pre-fills cache — double-check hits line 674 + result = introspect_message(msg, use_cache=True) + + assert result.message_id == expected.message_id + assert result.get_variable_names() == expected.get_variable_names() + clear_introspection_cache() + +class TestMessageVariableValidationResult: + """Tests for the MessageVariableValidationResult frozen dataclass.""" + + def test_immutable(self) -> None: + """MessageVariableValidationResult is frozen (immutable).""" + result = MessageVariableValidationResult( + message_id="greeting", + is_valid=True, + 
declared_variables=frozenset({"name"}), + missing_variables=frozenset(), + extra_variables=frozenset(), + ) + with pytest.raises(AttributeError): + result.is_valid = False # type: ignore[misc] + + def test_valid_result_fields(self) -> None: + """is_valid=True when missing and extra are both empty.""" + result = MessageVariableValidationResult( + message_id="msg", + is_valid=True, + declared_variables=frozenset({"a", "b"}), + missing_variables=frozenset(), + extra_variables=frozenset(), + ) + assert result.is_valid is True + assert result.declared_variables == frozenset({"a", "b"}) + assert result.missing_variables == frozenset() + assert result.extra_variables == frozenset() + + def test_invalid_with_missing(self) -> None: + """is_valid=False when missing_variables is non-empty.""" + result = MessageVariableValidationResult( + message_id="msg", + is_valid=False, + declared_variables=frozenset({"a"}), + missing_variables=frozenset({"b"}), + extra_variables=frozenset(), + ) + assert result.is_valid is False + assert "b" in result.missing_variables + + def test_hashable(self) -> None: + """MessageVariableValidationResult is hashable (frozen dataclass).""" + r1 = MessageVariableValidationResult( + message_id="greeting", + is_valid=True, + declared_variables=frozenset({"name"}), + missing_variables=frozenset(), + extra_variables=frozenset(), + ) + assert hash(r1) is not None + s: set[MessageVariableValidationResult] = {r1} + assert len(s) == 1 + +class TestValidateMessageVariables: + """Tests for validate_message_variables().""" + + def test_exact_match_is_valid(self) -> None: + """Message declaring exactly the expected variables returns is_valid=True.""" + msg = _parse_message("greeting = Hello, { $name }! 
You have { $count } items.") + result = validate_message_variables(msg, {"name", "count"}) + assert result.is_valid is True + assert result.declared_variables == frozenset({"name", "count"}) + assert result.missing_variables == frozenset() + assert result.extra_variables == frozenset() + + def test_missing_variable_detected(self) -> None: + """Expected variable absent from FTL message is reported in missing_variables.""" + msg = _parse_message("greeting = Hello, { $name }!") + result = validate_message_variables(msg, {"name", "count"}) + assert result.is_valid is False + assert result.missing_variables == frozenset({"count"}) + assert result.extra_variables == frozenset() + + def test_extra_variable_detected(self) -> None: + """Variable declared in FTL but absent from expected is reported in extra_variables.""" + msg = _parse_message("greeting = Hello, { $name }! You have { $count } items.") + result = validate_message_variables(msg, {"name"}) + assert result.is_valid is False + assert result.extra_variables == frozenset({"count"}) + assert result.missing_variables == frozenset() + + def test_both_missing_and_extra_detected(self) -> None: + """Both missing and extra variables reported independently.""" + msg = _parse_message("msg = { $actual } value") + result = validate_message_variables(msg, {"expected"}) + assert result.is_valid is False + assert "expected" in result.missing_variables + assert "actual" in result.extra_variables + + def test_empty_expected_all_extra(self) -> None: + """Expected set is empty: all declared variables are extra.""" + msg = _parse_message("msg = Hello { $name }!") + result = validate_message_variables(msg, frozenset()) + assert result.is_valid is False + assert result.extra_variables == frozenset({"name"}) + assert result.missing_variables == frozenset() + + def test_message_with_no_variables_and_empty_expected(self) -> None: + """Static message with no variables and empty expected is valid.""" + msg = _parse_message("static = Hello 
World") + result = validate_message_variables(msg, frozenset()) + assert result.is_valid is True + assert result.declared_variables == frozenset() + + def test_message_id_extracted_from_ast_node(self) -> None: + """result.message_id matches the FTL message identifier.""" + msg = _parse_message("my-message = { $var }") + result = validate_message_variables(msg, {"var"}) + assert result.message_id == "my-message" + + def test_frozenset_and_set_expected_equivalent(self) -> None: + """frozenset and set inputs for expected_variables produce identical results.""" + msg = _parse_message("greeting = Hello, { $name }!") + result_set = validate_message_variables(msg, {"name"}) + result_frozen = validate_message_variables(msg, frozenset({"name"})) + assert result_set.is_valid == result_frozen.is_valid + assert result_set.declared_variables == result_frozen.declared_variables + assert result_set.missing_variables == result_frozen.missing_variables + assert result_set.extra_variables == result_frozen.extra_variables + + def test_validate_term(self) -> None: + """validate_message_variables works on Term AST nodes.""" + resource = FluentParserV1().parse("-brand = { $edition } Edition") + term = next(e for e in resource.entries if isinstance(e, Term)) + result = validate_message_variables(term, {"edition"}) + assert result.is_valid is True + assert result.message_id == "brand" + + @given( + var_names=st.frozensets( + st.from_regex(r"[a-z][a-z]{0,9}", fullmatch=True), + min_size=0, + max_size=5, + ), + extra_vars=st.frozensets( + st.from_regex(r"[a-z][a-z]{0,9}", fullmatch=True), + min_size=0, + max_size=3, + ), + ) + @settings(max_examples=200) + def test_property_validity_iff_exact_match( + self, var_names: frozenset[str], extra_vars: frozenset[str] + ) -> None: + """is_valid iff declared == expected (exact set equality). + + Constructs a message with exactly var_names as variables, validates + against expected = var_names | extra_vars. 
Result is valid only when + extra_vars is empty. + """ + event(f"declared_count={len(var_names)}") + event(f"extra_count={len(extra_vars)}") + + # Filter out names that overlap between the two sets + safe_names = list(var_names) + safe_extra = [n for n in extra_vars if n not in var_names] + + if not safe_names and not safe_extra: + event("outcome=empty_skip") + return + + placeable_ftl = " ".join(f"{{ ${n} }}" for n in safe_names) + ftl_source = f"msg = {placeable_ftl or 'static'}" + + resource = FluentParserV1().parse(ftl_source) + messages = [e for e in resource.entries if isinstance(e, Message)] + if not messages: + event("outcome=parse_failed") + return + + declared = frozenset(safe_names) + expected = declared | frozenset(safe_extra) + result = validate_message_variables(messages[0], expected) + + assert result.declared_variables == declared + assert result.missing_variables == frozenset(safe_extra) + assert result.extra_variables == frozenset() + + if safe_extra: + event("outcome=missing_detected") + assert result.is_valid is False + else: + event("outcome=exact_match") + assert result.is_valid is True diff --git a/tests/introspection_message_cases/contracts_and_spans.py b/tests/introspection_message_cases/contracts_and_spans.py new file mode 100644 index 00000000..c319e101 --- /dev/null +++ b/tests/introspection_message_cases/contracts_and_spans.py @@ -0,0 +1,507 @@ +# mypy: ignore-errors +from __future__ import annotations + +import pytest +from hypothesis import event, given, settings +from hypothesis import strategies as st + +from ftllexengine import FluentBundle, parse_ftl +from ftllexengine.enums import ReferenceKind, VariableContext +from ftllexengine.introspection import ( + VariableInfo, + introspect_message, +) +from ftllexengine.introspection.message import ( + IntrospectionVisitor, + ReferenceExtractor, +) +from ftllexengine.syntax.ast import ( + Attribute, + Identifier, + Junk, + Message, + Pattern, + Placeable, + Term, + TextElement, + 
VariableReference, +) +from ftllexengine.syntax.parser import FluentParserV1 + +# =========================================================================== +# HELPERS +# =========================================================================== + + +def _parse_message(ftl: str) -> Message: + """Parse FTL source and return first Message entry.""" + resource = FluentParserV1().parse(ftl) + entry = resource.entries[0] + assert isinstance(entry, Message) + return entry + + +def _parse_term(ftl: str) -> Term: + """Parse FTL source and return first Term entry.""" + resource = FluentParserV1().parse(ftl) + entry = resource.entries[0] + assert isinstance(entry, Term) + return entry + + +def _make_message( + name: str, + *, + value: Pattern | None = None, + attributes: tuple[Attribute, ...] = (), +) -> Message: + """Construct a Message programmatically (bypasses parser).""" + return Message(id=Identifier(name=name), value=value, attributes=attributes) + + +def _make_pattern(*elements: TextElement | Placeable) -> Pattern: + """Construct a Pattern from elements.""" + return Pattern(elements=elements) + + +# =========================================================================== +# VARIABLE EXTRACTION +# =========================================================================== + + + +class TestIntrospectMessageNoneValue: + """introspect_message with Message(value=None) - covers line 609->613.""" + + def test_introspect_message_value_none_no_crash(self) -> None: + """Message with value=None is introspected without error. 
+ + Covers line 609->613: False branch of ``if message.value is not None:`` + """ + attr = Attribute( + id=Identifier(name="label"), + value=_make_pattern(Placeable(expression=VariableReference(id=Identifier("x")))), + ) + msg = _make_message("test", value=None, attributes=(attr,)) + result = introspect_message(msg, use_cache=False) + assert result.message_id == "test" + assert "x" in result.get_variable_names() + + def test_introspect_message_value_none_only_attributes(self) -> None: + """Attribute variables are still extracted when value is None.""" + attr1 = Attribute( + id=Identifier(name="formal"), + value=_make_pattern(Placeable(expression=VariableReference(id=Identifier("name")))), + ) + attr2 = Attribute( + id=Identifier(name="casual"), + value=_make_pattern(TextElement(value="Hi there")), + ) + msg = _make_message("greet", value=None, attributes=(attr1, attr2)) + result = introspect_message(msg, use_cache=False) + assert "name" in result.get_variable_names() + assert result.message_id == "greet" + +class TestNestedPlaceableExpression: + """Nested Placeable inside Placeable (lines 363-364 branch coverage).""" + + def test_nested_placeable_extracts_inner_variable(self) -> None: + """Placeable wrapping another Placeable extracts the inner variable. + + Covers lines 363-364: ``elif Placeable.guard(expr):`` branch in + _visit_expression when the expression is itself a Placeable node. 
+ """ + inner_var = VariableReference(id=Identifier(name="inner")) + inner_placeable = Placeable(expression=inner_var) + outer_placeable = Placeable(expression=inner_placeable) + msg = _make_message("test", value=_make_pattern(outer_placeable)) + + visitor = IntrospectionVisitor() + assert msg.value is not None + visitor.visit(msg.value) + names = {v.name for v in visitor.variables} + assert "inner" in names + + def test_nested_placeable_via_introspect_message(self) -> None: + """introspect_message handles doubly-nested Placeable.""" + inner_var = VariableReference(id=Identifier(name="deep")) + msg = _make_message( + "test", + value=_make_pattern(Placeable(expression=Placeable(expression=inner_var))), + ) + result = introspect_message(msg, use_cache=False) + assert "deep" in result.get_variable_names() + +class TestPatternElementExhaustiveness: + """_visit_pattern_element assert_never guard for unexpected element types.""" + + def test_unknown_pattern_element_raises_assertion_error(self) -> None: + """assert_never raises AssertionError for non-TextElement non-Placeable. + + Covers the ``case _ as unreachable: assert_never(unreachable)`` branch. 
+ """ + visitor = IntrospectionVisitor() + # Pass an object that is neither TextElement nor Placeable + sentinel = object() + with pytest.raises(AssertionError): + visitor._visit_pattern_element(sentinel) # type: ignore[arg-type] + +class TestMessageIntrospectionContracts: + """MessageIntrospection immutability, accessor, and consistency contracts.""" + + def test_frozen_immutability(self) -> None: + """MessageIntrospection cannot be mutated.""" + info = introspect_message(_parse_message("test = { $var }")) + with pytest.raises(AttributeError): + info.message_id = "modified" # type: ignore[misc] + + def test_variable_info_immutability(self) -> None: + """VariableInfo is frozen.""" + var_info = VariableInfo(name="test", context=VariableContext.PATTERN) + with pytest.raises(AttributeError): + var_info.name = "modified" # type: ignore[misc] + + def test_requires_variable_true(self) -> None: + """requires_variable returns True for present variable.""" + info = introspect_message(_parse_message("greeting = Hello, { $name }!")) + assert info.requires_variable("name") + + def test_requires_variable_false(self) -> None: + """requires_variable returns False for absent variable.""" + info = introspect_message(_parse_message("greeting = Hello, { $name }!")) + assert not info.requires_variable("age") + + def test_get_variable_names_returns_frozenset(self) -> None: + """get_variable_names returns frozenset.""" + info = introspect_message(_parse_message("msg = { $x }")) + assert isinstance(info.get_variable_names(), frozenset) + + def test_get_function_names_returns_frozenset(self) -> None: + """get_function_names returns frozenset.""" + info = introspect_message(_parse_message("msg = { NUMBER($x) }")) + assert isinstance(info.get_function_names(), frozenset) + + def test_variables_field_is_frozenset(self) -> None: + """variables field is a frozenset of VariableInfo.""" + info = introspect_message(_parse_message("msg = { $x }")) + assert isinstance(info.variables, frozenset) + + 
def test_message_id_preserved(self) -> None: + """introspect_message preserves message_id.""" + msg = _parse_message("greet-user = Hello") + assert introspect_message(msg).message_id == "greet-user" + +class TestAttributeIntrospection: + """Variables in message attributes are extracted.""" + + def test_attribute_variable_extracted(self) -> None: + """Variable in attribute is extracted from message.""" + bundle = FluentBundle("en") + bundle.add_resource( + "login-button = Sign In\n .title = Click to sign in as { $username }\n" + ) + info = bundle.introspect_message("login-button") + assert "username" in info.get_variable_names() + + def test_multiple_attributes_all_extracted(self) -> None: + """Variables from all attributes are collected.""" + bundle = FluentBundle("en") + bundle.add_resource( + "button = Action\n" + " .tooltip = { $action } for { $user }\n" + " .aria-label = { $role }\n" + ) + info = bundle.introspect_message("button") + assert info.get_variable_names() == frozenset({"action", "user", "role"}) + + def test_attribute_only_message(self) -> None: + """Message with no value but attributes is introspected.""" + resource = FluentParserV1().parse("msg =\n .attr1 = Value 1\n .attr2 = Value 2\n") + msg = resource.entries[0] + assert isinstance(msg, Message) + result = introspect_message(msg) + assert result.message_id == "msg" + + def test_attribute_only_message_with_variables(self) -> None: + """Variables in attributes of value-less message are extracted.""" + resource = FluentParserV1().parse( + "msg =\n .formal = Hello { $name }\n .casual = Hi { $name }\n" + ) + msg = resource.entries[0] + assert isinstance(msg, Message) + assert "name" in introspect_message(msg).get_variable_names() + +class TestTermIntrospection: + """Introspection of Term AST nodes.""" + + def test_introspect_term_direct(self) -> None: + """introspect_message accepts Term nodes.""" + term = _parse_term("-brand = { $companyName }") + info = introspect_message(term) + assert 
info.message_id == "brand" + assert "companyName" in info.get_variable_names() + + def test_introspect_term_via_bundle(self) -> None: + """FluentBundle.introspect_term() introspects a term.""" + bundle = FluentBundle("en") + bundle.add_resource("-brand = { $companyName }") + info = bundle.introspect_term("brand") + assert info.message_id == "brand" + assert "companyName" in info.get_variable_names() + + def test_introspect_term_not_found(self) -> None: + """KeyError raised for non-existent term.""" + bundle = FluentBundle("en") + with pytest.raises(KeyError, match=r"Term 'nonexistent' not found"): + bundle.introspect_term("nonexistent") + + def test_term_reference_positional_args(self) -> None: + """Term reference with positional arguments extracts nested variables.""" + msg = _parse_message("greeting = { -brand($platform) }") + assert isinstance(msg, (Message, Term)) + info = introspect_message(msg) + assert "platform" in info.get_variable_names() + + def test_term_reference_named_args(self) -> None: + """Term reference with named arguments extracts variable values.""" + msg = _parse_message('app-name = { -brand($userCase, case: "nominative") }') + assert isinstance(msg, (Message, Term)) + info = introspect_message(msg) + assert "userCase" in info.get_variable_names() + + def test_term_reference_both_arg_types(self) -> None: + """Term reference with positional and named arguments captures all variables.""" + msg = _parse_message('msg = { -term($pos1, $pos2, style: "formal") }') + assert isinstance(msg, (Message, Term)) + info = introspect_message(msg) + assert "pos1" in info.get_variable_names() + assert "pos2" in info.get_variable_names() + +class TestVariableContexts: + """Variable context tracking in IntrospectionVisitor.""" + + def test_function_arg_context(self) -> None: + """Variables in function arguments have FUNCTION_ARG context.""" + msg = _parse_message("msg = { NUMBER($value, minimumFractionDigits: 2) }") + visitor = IntrospectionVisitor() + assert 
msg.value is not None + visitor.visit(msg.value) + value_vars = [v for v in visitor.variables if v.name == "value"] + assert len(value_vars) == 1 + assert value_vars[0].context == VariableContext.FUNCTION_ARG + + def test_selector_context(self) -> None: + """Variables in selectors have SELECTOR context.""" + msg = _parse_message("msg = { $count -> [one] one *[other] many }") + visitor = IntrospectionVisitor() + assert msg.value is not None + visitor.visit(msg.value) + count_vars = [v for v in visitor.variables if v.name == "count"] + selector_contexts = [v for v in count_vars if v.context == VariableContext.SELECTOR] + assert len(selector_contexts) >= 1 + + def test_variant_context(self) -> None: + """Variables in variant values have VARIANT context.""" + msg = _parse_message("msg = { $sel -> [key] Value is { $value } *[other] none }") + visitor = IntrospectionVisitor() + assert msg.value is not None + visitor.visit(msg.value) + value_vars = [v for v in visitor.variables if v.name == "value"] + variant_contexts = [v for v in value_vars if v.context == VariableContext.VARIANT] + assert len(variant_contexts) >= 1 + + def test_context_restored_after_selector(self) -> None: + """Variable context is correctly restored after visiting selector.""" + msg = _parse_message( + "emails = { $count ->\n" + " [one] { $name } has one email\n" + " *[other] { $name } has { $count } emails\n" + "}" + ) + visitor = IntrospectionVisitor() + assert msg.value is not None + visitor.visit(msg.value) + var_contexts = {v.name: v.context for v in visitor.variables} + assert "count" in var_contexts + assert "name" in var_contexts + +class TestSpanTracking: + """Source position spans are attached to introspection results.""" + + def test_variable_reference_span(self) -> None: + """Variable references include correct source spans.""" + msg = _parse_message("greeting = Hello, { $name }!") + info = introspect_message(msg) + assert len(info.variables) == 1 + var_info = next(iter(info.variables)) + 
assert var_info.name == "name" + assert var_info.span is not None + assert var_info.span.start == 20 + assert var_info.span.end == 25 + + def test_function_reference_span(self) -> None: + """Function references include correct source spans.""" + msg = _parse_message("price = { NUMBER($amount) }") + info = introspect_message(msg) + assert len(info.functions) == 1 + func_info = next(iter(info.functions)) + assert func_info.name == "NUMBER" + assert func_info.span is not None + assert func_info.span.start == 10 + assert func_info.span.end == 25 + + def test_message_reference_span(self) -> None: + """Message references include correct source spans.""" + msg = _parse_message("ref = { other-msg }") + info = introspect_message(msg) + refs = [r for r in info.references if r.kind == ReferenceKind.MESSAGE] + assert len(refs) == 1 + assert refs[0].id == "other-msg" + assert refs[0].span is not None + assert refs[0].span.start == 8 + assert refs[0].span.end == 17 + + def test_term_reference_span(self) -> None: + """Term references include correct source spans.""" + msg = _parse_message("msg = { -brand }") + info = introspect_message(msg) + refs = [r for r in info.references if r.kind == ReferenceKind.TERM] + assert len(refs) == 1 + assert refs[0].id == "brand" + assert refs[0].span is not None + assert refs[0].span.start == 8 + assert refs[0].span.end == 15 + + def test_term_reference_with_attribute_span(self) -> None: + """Term references with attributes have correct spans.""" + msg = _parse_message("msg = { -brand.short }") + info = introspect_message(msg) + refs = [r for r in info.references if r.kind == ReferenceKind.TERM] + assert len(refs) == 1 + assert refs[0].attribute == "short" + assert refs[0].span is not None + assert refs[0].span.start == 8 + assert refs[0].span.end == 21 + + def test_multiple_variables_distinct_spans(self) -> None: + """Multiple variables each have distinct spans.""" + msg = _parse_message("msg = { $first } and { $second }") + info = 
introspect_message(msg) + assert len(info.variables) == 2 + vars_by_name = {v.name: v for v in info.variables} + assert vars_by_name["first"].span is not None + assert vars_by_name["first"].span.start == 8 + assert vars_by_name["second"].span is not None + assert vars_by_name["second"].span.start == 23 + + def test_message_reference_with_attribute_span(self) -> None: + """Message references with attributes have correct spans.""" + msg = _parse_message("msg = { other.attr }") + info = introspect_message(msg) + refs = [r for r in info.references if r.kind == ReferenceKind.MESSAGE] + assert len(refs) == 1 + assert refs[0].attribute == "attr" + assert refs[0].span is not None + assert refs[0].span.start == 8 + assert refs[0].span.end == 18 + +class TestDepthLimits: + """Depth guard prevents stack overflow on deeply nested ASTs.""" + + def test_introspection_visitor_depth_limit(self) -> None: + """IntrospectionVisitor respects max_depth configuration.""" + msg = _parse_message( + "msg = { $a -> [x] { $b -> [y] { $c -> [z] value *[o] v } *[o] v } *[o] v }" + ) + visitor = IntrospectionVisitor(max_depth=100) + assert msg.value is not None + visitor.visit(msg.value) + names = {v.name for v in visitor.variables} + assert "a" in names + assert "b" in names + assert "c" in names + + def test_reference_extractor_depth_limit(self) -> None: + """ReferenceExtractor respects max_depth configuration.""" + msg = _parse_message("msg = { -term1(-term2(-term3)) }") + extractor = ReferenceExtractor(max_depth=100) + assert msg.value is not None + extractor.visit(msg.value) + assert "term1" in extractor.term_refs + assert "term2" in extractor.term_refs + assert "term3" in extractor.term_refs + +class TestIntrospectMessageTypeErrors: + """introspect_message raises TypeError for non-Message/Term inputs.""" + + def test_raises_for_junk(self) -> None: + """Junk entry raises TypeError.""" + resource = parse_ftl("invalid syntax here !!!") + assert resource.entries + junk = resource.entries[0] + 
assert isinstance(junk, Junk) + with pytest.raises(TypeError, match="Expected Message or Term"): + introspect_message(junk) # type: ignore[arg-type] + + def test_raises_for_string(self) -> None: + """String input raises TypeError.""" + with pytest.raises(TypeError, match="Expected Message or Term"): + introspect_message("not a message") # type: ignore[arg-type] + + def test_raises_for_none(self) -> None: + """None input raises TypeError.""" + with pytest.raises(TypeError, match="Expected Message or Term"): + introspect_message(None) # type: ignore[arg-type] + + def test_raises_for_dict(self) -> None: + """Dict input raises TypeError.""" + with pytest.raises(TypeError, match="Expected Message or Term"): + introspect_message({"not": "a message"}) # type: ignore[arg-type] + + @given( + st.one_of( + st.integers(), + st.decimals(allow_nan=False, allow_infinity=False), + st.booleans(), + st.lists(st.text()), + ) + ) + @settings(max_examples=30) + def test_raises_for_arbitrary_types(self, invalid_input: object) -> None: + """Arbitrary non-Message types raise TypeError.""" + event(f"input_type={type(invalid_input).__name__}") + with pytest.raises(TypeError, match="Expected Message or Term"): + introspect_message(invalid_input) # type: ignore[arg-type] + +class TestRealWorldScenarios: + """Integration tests for practical use cases.""" + + def test_ui_message_validation(self) -> None: + """CI/CD variable validation for UI messages.""" + bundle = FluentBundle("en") + bundle.add_resource( + "home-subtitle = Welcome to { $country }\n" + "money-with-vat = Gross: { $gross }, Net: { $net }, VAT: { $vat } ({ $rate }%)\n" + ) + assert "country" in bundle.get_message_variables("home-subtitle") + assert bundle.get_message_variables("money-with-vat") == frozenset( + {"gross", "net", "vat", "rate"} + ) + + def test_function_usage_analysis(self) -> None: + """Analyze function usage in financial messages.""" + bundle = FluentBundle("en") + bundle.add_resource( + 'timestamp = Last updated: 
{ DATETIME($time, dateStyle: "medium") }\n' + "price = Total: { NUMBER($amount, minimumFractionDigits: 2," + " maximumFractionDigits: 2) }\n" + ) + ts_info = bundle.introspect_message("timestamp") + assert "DATETIME" in ts_info.get_function_names() + assert "time" in ts_info.get_variable_names() + + price_info = bundle.introspect_message("price") + number_funcs = [f for f in price_info.functions if f.name == "NUMBER"] + assert len(number_funcs) == 1 + assert "minimumFractionDigits" in number_funcs[0].named_args + assert "maximumFractionDigits" in number_funcs[0].named_args diff --git a/tests/introspection_message_cases/extraction_and_references.py b/tests/introspection_message_cases/extraction_and_references.py new file mode 100644 index 00000000..d16fbe13 --- /dev/null +++ b/tests/introspection_message_cases/extraction_and_references.py @@ -0,0 +1,545 @@ +# mypy: ignore-errors +from __future__ import annotations + +import pytest + +from ftllexengine import FluentBundle +from ftllexengine.enums import ReferenceKind +from ftllexengine.introspection import ( + extract_references, + extract_references_by_attribute, + extract_variables, + introspect_message, +) +from ftllexengine.introspection.message import ( + IntrospectionVisitor, + ReferenceExtractor, +) +from ftllexengine.syntax.ast import ( + Attribute, + CallArguments, + FunctionReference, + Identifier, + Message, + NamedArgument, + NumberLiteral, + Pattern, + Placeable, + StringLiteral, + Term, + TermReference, + TextElement, + VariableReference, +) +from ftllexengine.syntax.parser import FluentParserV1 + +# =========================================================================== +# HELPERS +# =========================================================================== + + +def _parse_message(ftl: str) -> Message: + """Parse FTL source and return first Message entry.""" + resource = FluentParserV1().parse(ftl) + entry = resource.entries[0] + assert isinstance(entry, Message) + return entry + + +def 
_parse_term(ftl: str) -> Term: + """Parse FTL source and return first Term entry.""" + resource = FluentParserV1().parse(ftl) + entry = resource.entries[0] + assert isinstance(entry, Term) + return entry + + +def _make_message( + name: str, + *, + value: Pattern | None = None, + attributes: tuple[Attribute, ...] = (), +) -> Message: + """Construct a Message programmatically (bypasses parser).""" + return Message(id=Identifier(name=name), value=value, attributes=attributes) + + +def _make_pattern(*elements: TextElement | Placeable) -> Pattern: + """Construct a Pattern from elements.""" + return Pattern(elements=elements) + + +# =========================================================================== +# VARIABLE EXTRACTION +# =========================================================================== + + + +class TestVariableExtraction: + """Variable extraction from various message patterns.""" + + def test_simple_variable(self) -> None: + """Extract single variable from simple message.""" + bundle = FluentBundle("en") + bundle.add_resource("greeting = Hello, { $name }!") + assert bundle.get_message_variables("greeting") == frozenset({"name"}) + + def test_multiple_variables(self) -> None: + """Extract multiple variables from message.""" + bundle = FluentBundle("en") + bundle.add_resource("user-info = { $firstName } { $lastName } (Age: { $age })") + assert bundle.get_message_variables("user-info") == frozenset( + {"firstName", "lastName", "age"} + ) + + def test_duplicate_variables(self) -> None: + """Duplicate variable references appear once (frozenset deduplication).""" + bundle = FluentBundle("en") + bundle.add_resource("greeting = { $name }, nice to meet you { $name }!") + assert bundle.get_message_variables("greeting") == frozenset({"name"}) + + def test_no_variables(self) -> None: + """Message with no variables returns empty frozenset.""" + bundle = FluentBundle("en") + bundle.add_resource("hello = Hello, World!") + assert 
bundle.get_message_variables("hello") == frozenset() + + def test_message_not_found(self) -> None: + """KeyError raised for non-existent message.""" + bundle = FluentBundle("en") + with pytest.raises(KeyError, match=r"Message 'nonexistent' not found"): + bundle.get_message_variables("nonexistent") + + def test_plain_text_pattern_has_no_variables(self) -> None: + """TextElement branch: patterns with only text extract nothing.""" + msg = _parse_message("msg = Plain text without any placeables") + result = introspect_message(msg) + assert len(result.get_variable_names()) == 0 + assert len(result.get_function_names()) == 0 + assert not result.has_selectors + + def test_text_element_branch_in_visitor(self) -> None: + """TextElement case in _visit_pattern_element executes without effect.""" + msg = _parse_message("msg = just text") + visitor = IntrospectionVisitor() + assert msg.value is not None + visitor.visit(msg.value) + assert visitor.variables == set() + + def test_extract_variables_direct_api(self) -> None: + """extract_variables() convenience function delegates correctly.""" + msg = _parse_message("greeting = Hello, { $name }!") + assert extract_variables(msg) == frozenset({"name"}) + + def test_extract_variables_from_select_with_variants(self) -> None: + """All variant-local variables are captured.""" + msg = _parse_message( + "msg = { $count ->\n" + " [one] You have { $count } item from { $source }\n" + " [few] You have { $count } items from { $source }\n" + " *[other] You have { $count } items from { $source }\n" + "}" + ) + vars_ = extract_variables(msg) + assert "count" in vars_ + assert "source" in vars_ + +class TestSelectExpressions: + """Variable extraction from select expressions.""" + + def test_selector_variable(self) -> None: + """Variable used in selector is extracted.""" + bundle = FluentBundle("en") + bundle.add_resource( + "emails = { $count ->\n [one] one email\n *[other] { $count } emails\n}\n" + ) + assert "count" in 
bundle.get_message_variables("emails") + + def test_variant_variables(self) -> None: + """Variables in variants are all extracted.""" + bundle = FluentBundle("en") + bundle.add_resource( + "message = { $userType ->\n" + " [admin] Hello { $name }, you are an admin\n" + " *[user] Welcome { $name }\n" + "}\n" + ) + assert bundle.get_message_variables("message") == frozenset({"userType", "name"}) + + def test_nested_selectors(self) -> None: + """Nested select expressions extract all variables.""" + bundle = FluentBundle("en") + bundle.add_resource( + "complex = { $gender ->\n" + " [male] { $count ->\n" + " [one] one item\n" + " *[other] { $count } items\n" + " }\n" + " *[female] { $count } things\n" + "}\n" + ) + assert bundle.get_message_variables("complex") == frozenset({"gender", "count"}) + + def test_has_selectors_flag_set(self) -> None: + """MessageIntrospection.has_selectors is True for select expressions.""" + msg = _parse_message( + "msg = { $count ->\n [0] No items\n [1] One item\n *[other] Many items\n}\n" + ) + result = introspect_message(msg) + assert result.has_selectors is True + assert "count" in result.get_variable_names() + + def test_has_selectors_flag_false_for_plain(self) -> None: + """has_selectors is False for messages without select expressions.""" + msg = _parse_message("simple = Hello") + assert not introspect_message(msg).has_selectors + +class TestFunctionIntrospection: + """Function call detection and metadata extraction.""" + + def test_function_detection(self) -> None: + """Function calls are detected and named correctly.""" + info = introspect_message(_parse_message("price = { NUMBER($amount) }")) + assert "NUMBER" in info.get_function_names() + assert "amount" in info.get_variable_names() + + def test_function_with_named_args(self) -> None: + """Named argument keys are captured in FunctionCallInfo.""" + info = introspect_message( + _parse_message("price = { NUMBER($amount, minimumFractionDigits: 2) }") + ) + funcs = list(info.functions) 
+ assert len(funcs) == 1 + assert funcs[0].name == "NUMBER" + assert "amount" in funcs[0].positional_arg_vars + assert "minimumFractionDigits" in funcs[0].named_args + + def test_multiple_functions(self) -> None: + """Multiple distinct function calls are all detected.""" + info = introspect_message( + _parse_message("ts = { NUMBER($value) } at { DATETIME($time) }") + ) + assert info.get_function_names() == frozenset({"NUMBER", "DATETIME"}) + + def test_function_without_arguments(self) -> None: + """Function with empty argument list (FUNC()) is detected.""" + msg = _parse_message("msg = Result: { BUILTIN() }") + result = introspect_message(msg) + assert "BUILTIN" in result.get_function_names() + + def test_function_with_empty_arguments(self) -> None: + """FunctionReference with empty CallArguments is detected and has no variables. + + Verifies that a function call with no positional or named arguments + produces a FunctionCallInfo with empty variable sets. + """ + func_ref = FunctionReference( + id=Identifier(name="NOOP"), + arguments=CallArguments(positional=(), named=()), + ) + msg = _make_message( + "test", value=_make_pattern(Placeable(expression=func_ref)) + ) + info = introspect_message(msg, use_cache=False) + assert "NOOP" in info.get_function_names() + assert len(info.get_variable_names()) == 0 + + def test_function_multiple_positional_args(self) -> None: + """Multiple positional arguments are all extracted.""" + msg = _parse_message("msg = { FUNC($a, $b, $c) }") + result = introspect_message(msg) + assert result.get_variable_names() == frozenset({"a", "b", "c"}) + + def test_function_variable_in_positional_arg_with_literal_named_arg(self) -> None: + """Variable reference in positional arg is extracted; named arg literals are not. + + Per FTL spec, named argument values are constrained to StringLiteral or + NumberLiteral. They cannot be VariableReferences. Only positional arguments + contribute variable names when they contain VariableReference nodes. 
+ """ + func_ref = FunctionReference( + id=Identifier(name="CUSTOM"), + arguments=CallArguments( + positional=(VariableReference(id=Identifier(name="x")),), + named=( + NamedArgument( + name=Identifier(name="opt"), + value=StringLiteral(value="opt_value"), + ), + ), + ), + ) + msg = _make_message("test", value=_make_pattern(Placeable(expression=func_ref))) + info = introspect_message(msg, use_cache=False) + # Only "x" from positional arg; named arg literal value contributes nothing + assert info.get_variable_names() == frozenset({"x"}) + + def test_function_named_args_with_literals_do_not_contribute_variable_names( + self, + ) -> None: + """Named argument literal values do not contribute to variable_names. + + Per FTL spec, named argument values are always literals (StringLiteral or + NumberLiteral), never VariableReferences. Variables from positional args + are extracted; named arg literal values are not variable references. + """ + func_ref = FunctionReference( + id=Identifier(name="FUNC"), + arguments=CallArguments( + positional=(VariableReference(id=Identifier(name="val")),), + named=( + NamedArgument( + name=Identifier(name="a"), + value=StringLiteral(value="first"), + ), + NamedArgument( + name=Identifier(name="b"), + value=StringLiteral(value="second"), + ), + NamedArgument( + name=Identifier(name="n"), + value=NumberLiteral(value=42, raw="42"), + ), + ), + ), + ) + msg = _make_message("test", value=_make_pattern(Placeable(expression=func_ref))) + info = introspect_message(msg, use_cache=False) + # Only "val" from positional arg; named arg literal values contribute nothing + assert info.get_variable_names() == frozenset({"val"}) + assert "FUNC" in info.get_function_names() + + def test_nested_message_reference_in_function_arg(self) -> None: + """MessageReference in function positional arg is extracted.""" + bundle = FluentBundle("en") + bundle.add_resource("base-value = 42\nformatted = { NUMBER(base-value) }\n") + info = 
bundle.introspect_message("formatted") + assert any(r.id == "base-value" for r in info.references) + + def test_variable_in_complex_nested_expression(self) -> None: + """Variables in function inside select expression are captured.""" + bundle = FluentBundle("en") + bundle.add_resource( + "complex = { $type ->\n" + " [currency] { NUMBER($amount, minimumFractionDigits: 2) }\n" + " *[plain] { $amount }\n" + "}\n" + ) + info = bundle.introspect_message("complex") + assert "type" in info.get_variable_names() + assert "amount" in info.get_variable_names() + +class TestReferenceIntrospection: + """Message and term reference tracking.""" + + def test_message_reference(self) -> None: + """MessageReference is captured in ReferenceInfo.""" + bundle = FluentBundle("en") + bundle.add_resource("brand = FTLLexEngine\ngreeting = Welcome to { brand }\n") + info = bundle.introspect_message("greeting") + refs = list(info.references) + assert len(refs) == 1 + assert refs[0].id == "brand" + assert refs[0].kind == ReferenceKind.MESSAGE + assert refs[0].attribute is None + + def test_term_reference(self) -> None: + """TermReference is captured in ReferenceInfo.""" + bundle = FluentBundle("en") + bundle.add_resource("-brand = FTLLexEngine\ngreeting = Welcome to { -brand }\n") + info = bundle.introspect_message("greeting") + refs = list(info.references) + assert len(refs) == 1 + assert refs[0].id == "brand" + assert refs[0].kind == ReferenceKind.TERM + + def test_attribute_message_reference(self) -> None: + """MessageReference with attribute is captured correctly.""" + bundle = FluentBundle("en") + bundle.add_resource( + "message = Message\n .tooltip = Tooltip\ngreeting = Hover for { message.tooltip }\n" + ) + info = bundle.introspect_message("greeting") + refs = list(info.references) + assert len(refs) == 1 + assert refs[0].id == "message" + assert refs[0].attribute == "tooltip" + +class TestReferenceExtractor: + """ReferenceExtractor specialized visitor for dependency analysis.""" + + 
def test_message_reference_collected(self) -> None: + """MessageReference is added to message_refs without attribute.""" + msg = _parse_message("msg = { other-message }") + extractor = ReferenceExtractor() + assert msg.value is not None + extractor.visit(msg.value) + assert "other-message" in extractor.message_refs + + def test_message_reference_with_attribute(self) -> None: + """MessageReference with attribute uses qualified form.""" + msg = _parse_message("msg = { other.attr }") + extractor = ReferenceExtractor() + assert msg.value is not None + extractor.visit(msg.value) + assert "other.attr" in extractor.message_refs + + def test_term_reference_no_attribute(self) -> None: + """TermReference without attribute uses unqualified form.""" + msg = _parse_message("msg = { -brand }") + extractor = ReferenceExtractor() + assert msg.value is not None + extractor.visit(msg.value) + assert "brand" in extractor.term_refs + + def test_term_reference_with_attribute(self) -> None: + """TermReference with attribute uses qualified form (line 482 branch).""" + msg = _parse_message("msg = { -brand.short }") + extractor = ReferenceExtractor() + assert msg.value is not None + extractor.visit(msg.value) + # Covers line 482: self.term_refs.add(f"{node.id.name}.{node.attribute.name}") + assert "brand.short" in extractor.term_refs + + def test_nested_term_references_via_arguments(self) -> None: + """Nested term arguments are traversed by generic_visit.""" + msg = _parse_message("msg = { -outer(-inner($var)) }") + assert isinstance(msg, (Message, Term)) + _msg_refs, term_refs = extract_references(msg) + assert "outer" in term_refs + assert "inner" in term_refs + + def test_depth_guard_in_deeply_nested_terms(self) -> None: + """ReferenceExtractor respects max_depth.""" + msg = _parse_message("msg = { -term1(-term2(-term3)) }") + extractor = ReferenceExtractor(max_depth=100) + assert msg.value is not None + extractor.visit(msg.value) + assert "term1" in extractor.term_refs + assert "term2" 
in extractor.term_refs + assert "term3" in extractor.term_refs + +class TestExtractReferences: + """Tests for extract_references() public function.""" + + def test_extract_message_and_term_refs(self) -> None: + """extract_references returns both message and term ref sets.""" + msg = _parse_message("msg = { welcome } uses { -brand }") + msg_refs, term_refs = extract_references(msg) + assert "welcome" in msg_refs + assert "brand" in term_refs + + def test_term_reference_with_args_tracked(self) -> None: + """Term references in arguments are captured.""" + msg = _parse_message('msg = { -brand($var, case: "nominative") }') + assert isinstance(msg, (Message, Term)) + _msg_refs, term_refs = extract_references(msg) + assert "brand" in term_refs + + def test_extract_references_message_with_no_value(self) -> None: + """extract_references handles Message(value=None) correctly. + + Covers line 518->522: False branch of ``if entry.value is not None:`` + when message has only attributes (no value pattern). + """ + attr = Attribute( + id=Identifier(name="attr"), + value=_make_pattern(Placeable(expression=TermReference(id=Identifier("brand")))), + ) + msg = _make_message("test", value=None, attributes=(attr,)) + msg_refs, term_refs = extract_references(msg) + # Value is None so no refs from value; attribute has term ref + assert "brand" in term_refs + assert len(msg_refs) == 0 + + def test_extract_references_message_with_empty_value_no_attrs(self) -> None: + """extract_references with empty pattern value returns empty sets.""" + msg = _make_message("test", value=_make_pattern()) + msg_refs, term_refs = extract_references(msg) + assert msg_refs == frozenset() + assert term_refs == frozenset() + +class TestExtractReferencesByAttribute: + """Tests for extract_references_by_attribute() public function. + + This function was previously untested (0% coverage). Tests cover all + branches: value pattern, per-attribute patterns, and None-value messages. 
+ """ + + def test_value_pattern_refs_under_none_key(self) -> None: + """Value pattern references are stored under key None.""" + msg = _parse_message("msg = { welcome } uses { -brand }") + result = extract_references_by_attribute(msg) + assert None in result + msg_refs, term_refs = result[None] + assert "welcome" in msg_refs + assert "brand" in term_refs + + def test_attribute_refs_under_attribute_name_key(self) -> None: + """Attribute references are stored under the attribute name key.""" + msg = _parse_message( + "msg = Base text\n .tooltip = { -brand }\n .label = { other }\n" + ) + result = extract_references_by_attribute(msg) + assert "tooltip" in result + assert "label" in result + _m, term_refs = result["tooltip"] + assert "brand" in term_refs + msg_refs2, _t = result["label"] + assert "other" in msg_refs2 + + def test_value_and_attributes_separated(self) -> None: + """Value and attribute references are separate entries.""" + msg = _parse_message( + "msg = { value-ref }\n .attr = { -term-ref }\n" + ) + result = extract_references_by_attribute(msg) + assert None in result + assert "attr" in result + # Value has message ref + assert "value-ref" in result[None][0] + # Attr has term ref + assert "term-ref" in result["attr"][1] + + def test_message_with_no_value(self) -> None: + """Message with value=None has no None key in result.""" + attr = Attribute( + id=Identifier(name="tooltip"), + value=_make_pattern(Placeable(expression=TermReference(id=Identifier("brand")))), + ) + msg = _make_message("btn", value=None, attributes=(attr,)) + result = extract_references_by_attribute(msg) + # No None key (no value pattern) + assert None not in result + assert "tooltip" in result + assert "brand" in result["tooltip"][1] + + def test_message_with_only_value(self) -> None: + """Message with value but no attributes returns single entry.""" + msg = _parse_message("msg = { other }") + result = extract_references_by_attribute(msg) + assert set(result.keys()) == {None} + assert 
"other" in result[None][0] + + def test_empty_message_no_refs(self) -> None: + """Message with empty value and no attributes returns empty result.""" + msg = _make_message("test", value=_make_pattern()) + result = extract_references_by_attribute(msg) + # Empty Pattern creates a None key with empty sets + assert None in result + msg_refs, term_refs = result[None] + assert msg_refs == frozenset() + assert term_refs == frozenset() + + def test_multiple_attributes_all_present(self) -> None: + """All attributes appear as separate keys.""" + msg = _parse_message( + "btn = Base\n .a1 = { -t1 }\n .a2 = { -t2 }\n .a3 = { -t3 }\n" + ) + result = extract_references_by_attribute(msg) + assert "a1" in result + assert "a2" in result + assert "a3" in result + assert "t1" in result["a1"][1] + assert "t2" in result["a2"][1] + assert "t3" in result["a3"][1] diff --git a/tests/introspection_message_cases/properties_and_branches.py b/tests/introspection_message_cases/properties_and_branches.py new file mode 100644 index 00000000..9e1d5eba --- /dev/null +++ b/tests/introspection_message_cases/properties_and_branches.py @@ -0,0 +1,489 @@ +# mypy: ignore-errors +from __future__ import annotations + +import threading + +from hypothesis import event, given, settings +from hypothesis import strategies as st + +from ftllexengine import parse_ftl +from ftllexengine.introspection import ( + clear_introspection_cache, + extract_references, + extract_variables, + introspect_message, +) +from ftllexengine.introspection.message import ( + _introspection_cache, + _introspection_cache_lock, +) +from ftllexengine.syntax.ast import ( + Attribute, + CallArguments, + FunctionReference, + Identifier, + Junk, + Message, + Pattern, + Placeable, + Term, + TextElement, + VariableReference, +) +from ftllexengine.syntax.parser import FluentParserV1 + +# =========================================================================== +# HELPERS +# 
=========================================================================== + + +def _parse_message(ftl: str) -> Message: + """Parse FTL source and return first Message entry.""" + resource = FluentParserV1().parse(ftl) + entry = resource.entries[0] + assert isinstance(entry, Message) + return entry + + +def _parse_term(ftl: str) -> Term: + """Parse FTL source and return first Term entry.""" + resource = FluentParserV1().parse(ftl) + entry = resource.entries[0] + assert isinstance(entry, Term) + return entry + + +def _make_message( + name: str, + *, + value: Pattern | None = None, + attributes: tuple[Attribute, ...] = (), +) -> Message: + """Construct a Message programmatically (bypasses parser).""" + return Message(id=Identifier(name=name), value=value, attributes=attributes) + + +def _make_pattern(*elements: TextElement | Placeable) -> Pattern: + """Construct a Pattern from elements.""" + return Pattern(elements=elements) + + +# =========================================================================== +# VARIABLE EXTRACTION +# =========================================================================== + + +_var_names = st.from_regex(r"[a-z]+", fullmatch=True) + +_msg_ids = st.from_regex(r"[a-z]+", fullmatch=True) + +class TestVariableExtractionProperties: + """Property-based invariants for variable extraction.""" + + @given(var_name=_var_names) + @settings(max_examples=200) + def test_simple_variable_always_extracted(self, var_name: str) -> None: + """{ $var } always extracts var.""" + event(f"var_name={var_name}") + msg = _parse_message(f"msg = Hello {{ ${var_name} }}") + assert var_name in extract_variables(msg) + + @given(var_name=_var_names) + @settings(max_examples=200) + def test_duplicate_variables_deduplicated(self, var_name: str) -> None: + """{ $var } { $var } extracts var once.""" + event(f"var_name={var_name}") + msg = _parse_message(f"msg = Hello {{ ${var_name} }} {{ ${var_name} }}") + variables = extract_variables(msg) + assert var_name in 
variables + assert len([v for v in variables if v == var_name]) == 1 + + @given(var1=_var_names, var2=_var_names) + @settings(max_examples=200) + def test_multiple_variables_all_extracted(self, var1: str, var2: str) -> None: + """{ $a } { $b } extracts both a and b.""" + event(f"same_vars={var1 == var2}") + msg = _parse_message(f"msg = Hello {{ ${var1} }} {{ ${var2} }}") + variables = extract_variables(msg) + assert var1 in variables + if var1 != var2: + assert var2 in variables + + @given(msg_id=_msg_ids) + @settings(max_examples=100) + def test_no_variables_returns_empty_set(self, msg_id: str) -> None: + """Message with no variables returns empty frozenset.""" + event(f"msg_id={msg_id}") + msg = _parse_message(f"{msg_id} = Hello World") + assert len(extract_variables(msg)) == 0 + + @given(var_name=_var_names) + @settings(max_examples=100) + def test_variable_in_function_extracted(self, var_name: str) -> None: + """NUMBER($var) extracts var.""" + event(f"var_name={var_name}") + msg = _parse_message(f"msg = {{ NUMBER(${var_name}) }}") + assert var_name in extract_variables(msg) + + @given(var_name=_var_names, attr_name=st.from_regex(r"[a-z]+", fullmatch=True)) + @settings(max_examples=100) + def test_attribute_variable_extracted(self, var_name: str, attr_name: str) -> None: + """Variables in attributes are extracted.""" + event(f"var_name={var_name}") + msg = _parse_message(f"msg = Hello\n .{attr_name} = {{ ${var_name} }}") + assert var_name in introspect_message(msg).get_variable_names() + +class TestIntrospectionResultProperties: + """Properties of MessageIntrospection result objects.""" + + @given(msg_id=_msg_ids) + @settings(max_examples=200) + def test_message_id_preserved(self, msg_id: str) -> None: + """introspect_message preserves message ID.""" + event(f"msg_id={msg_id}") + msg = _parse_message(f"{msg_id} = Hello") + assert introspect_message(msg).message_id == msg_id + + @given(var_name=_var_names) + @settings(max_examples=200) + def 
test_get_variable_names_consistent(self, var_name: str) -> None: + """get_variable_names() and variables field are consistent.""" + event(f"var_name={var_name}") + msg = _parse_message(f"msg = Hello {{ ${var_name} }}") + info = introspect_message(msg) + var_names = info.get_variable_names() + assert var_name in var_names + assert len(info.variables) == len(var_names) + + @given(var_name=_var_names) + @settings(max_examples=200) + def test_requires_variable_matches_extraction(self, var_name: str) -> None: + """requires_variable(x) iff x in get_variable_names().""" + event(f"var_name={var_name}") + msg = _parse_message(f"msg = Hello {{ ${var_name} }}") + info = introspect_message(msg) + if info.requires_variable(var_name): + assert var_name in info.get_variable_names() + if var_name in info.get_variable_names(): + assert info.requires_variable(var_name) + + @given(msg_id=_msg_ids) + @settings(max_examples=100) + def test_no_selectors_for_simple_message(self, msg_id: str) -> None: + """Simple message has has_selectors=False.""" + event(f"msg_id={msg_id}") + msg = _parse_message(f"{msg_id} = Hello") + assert introspect_message(msg).has_selectors is False + + @given(var_name=_var_names) + @settings(max_examples=100) + def test_select_expression_sets_has_selectors(self, var_name: str) -> None: + """Message with select expression has has_selectors=True.""" + event(f"var_name={var_name}") + msg = _parse_message( + f"msg = {{ ${var_name} ->\n [one] One item\n *[other] Many items\n}}" + ) + assert introspect_message(msg).has_selectors is True + + @given(var_name=_var_names) + @settings(max_examples=100) + def test_number_function_detected(self, var_name: str) -> None: + """NUMBER($var) is detected as a function call.""" + event(f"var_name={var_name}") + msg = _parse_message(f"msg = {{ NUMBER(${var_name}) }}") + assert "NUMBER" in introspect_message(msg).get_function_names() + + @given(msg_id=_msg_ids) + @settings(max_examples=100) + def 
test_no_functions_returns_empty_set(self, msg_id: str) -> None: + """Message with no functions returns empty frozenset.""" + event(f"msg_id={msg_id}") + msg = _parse_message(f"{msg_id} = Hello World") + assert len(introspect_message(msg).get_function_names()) == 0 + +class TestIntrospectionIdempotence: + """Idempotence: repeated calls return same results.""" + + @given(var_name=_var_names) + @settings(max_examples=100) + def test_extract_variables_idempotent(self, var_name: str) -> None: + """Multiple extract_variables() calls return the same result.""" + event(f"var_name={var_name}") + msg = _parse_message(f"msg = Hello {{ ${var_name} }}") + r1 = extract_variables(msg) + r2 = extract_variables(msg) + assert r1 == r2 + + @given(var_name=_var_names) + @settings(max_examples=100) + def test_introspect_message_idempotent(self, var_name: str) -> None: + """Multiple introspect_message() calls return equivalent results.""" + event(f"var_name={var_name}") + msg = _parse_message(f"msg = Hello {{ ${var_name} }}") + r1 = introspect_message(msg) + r2 = introspect_message(msg) + assert r1.message_id == r2.message_id + assert r1.variables == r2.variables + assert r1.functions == r2.functions + assert r1.references == r2.references + assert r1.has_selectors == r2.has_selectors + + @given(vars_list=st.lists(_var_names, min_size=1, max_size=10, unique=True)) + @settings(max_examples=50) + def test_multiple_variables_all_captured(self, vars_list: list[str]) -> None: + """All variables in message are captured in extract_variables.""" + event(f"var_count={len(vars_list)}") + placeables = " ".join(f"{{ ${v} }}" for v in vars_list) + msg = _parse_message(f"msg = {placeables}") + variables = extract_variables(msg) + for var in vars_list: + assert var in variables + assert len(variables) == len(vars_list) + + @given( + var_names_list=st.lists( + st.text( + alphabet=st.characters(min_codepoint=97, max_codepoint=122), + min_size=1, + max_size=10, + ), + min_size=1, + max_size=5, + ) + ) + 
@settings(max_examples=30) + def test_arbitrary_variable_named_args(self, var_names_list: list[str]) -> None: + """Functions with arbitrary variable names in named args extract all vars.""" + var_names_list = list(dict.fromkeys(var_names_list)) + if not var_names_list: + return + event(f"var_count={len(var_names_list)}") + var_list = ", ".join(f"{name}: ${name}" for name in var_names_list) + ftl = f"test = {{ NUMBER($value, {var_list}) }}" + resource = parse_ftl(ftl) + if not resource.entries or isinstance(resource.entries[0], Junk): + return + msg = resource.entries[0] + if not isinstance(msg, Message): + return + info = introspect_message(msg) + assert "value" in info.get_variable_names() + for name in var_names_list: + assert name in info.get_variable_names() + +class TestIntrospectionNestedPlaceable: + """Test introspection of nested Placeable expressions.""" + + def test_nested_placeable_extraction(self) -> None: + """Nested Placeable (Placeable containing Placeable) visits inner expression.""" + inner_var = VariableReference(id=Identifier("innerVar")) + inner_placeable = Placeable(expression=inner_var) + outer_placeable = Placeable(expression=inner_placeable) + + message = Message( + id=Identifier("nested"), + value=Pattern(elements=(outer_placeable,)), + attributes=(), + ) + + result = introspect_message(message) + + var_names = {v.name for v in result.variables} + assert "innerVar" in var_names + + def test_deeply_nested_placeables(self) -> None: + """Multiple levels of nested Placeables are fully traversed.""" + var = VariableReference(id=Identifier("deep")) + level1 = Placeable(expression=var) + level2 = Placeable(expression=level1) + level3 = Placeable(expression=level2) + + message = Message( + id=Identifier("deepNest"), + value=Pattern(elements=(level3,)), + attributes=(), + ) + + result = introspect_message(message) + var_names = {v.name for v in result.variables} + assert "deep" in var_names + + def test_message_without_value_extract_references(self) 
-> None: + """Message with value=None but with attributes extracts from attributes.""" + attr_pattern = Pattern( + elements=(Placeable(expression=VariableReference(id=Identifier("attrVar"))),) + ) + message = Message( + id=Identifier("attrsOnly"), + value=None, + attributes=(Attribute(id=Identifier("hint"), value=attr_pattern),), + ) + + msg_refs, term_refs = extract_references(message) + + assert isinstance(msg_refs, frozenset) + assert isinstance(term_refs, frozenset) + + def test_introspect_message_without_value(self) -> None: + """introspect_message extracts from attributes when message.value is None.""" + attr_pattern = Pattern( + elements=( + TextElement("Hint: "), + Placeable(expression=VariableReference(id=Identifier("hintVar"))), + ) + ) + message = Message( + id=Identifier("noValue"), + value=None, + attributes=(Attribute(id=Identifier("tooltip"), value=attr_pattern),), + ) + + result = introspect_message(message) + + var_names = {v.name for v in result.variables} + assert "hintVar" in var_names + +class TestIntrospectionBranchCoverage: + """Tests for introspection branch coverage.""" + + def test_function_without_arguments(self) -> None: + """Function reference with empty arguments visits function node correctly.""" + func_ref = FunctionReference( + id=Identifier("NOARGS"), + arguments=CallArguments(positional=(), named=()), + ) + + message = Message( + id=Identifier("noArgsFunc"), + value=Pattern(elements=(Placeable(expression=func_ref),)), + attributes=(), + ) + + result = introspect_message(message) + + func_names = {f.name for f in result.functions} + assert "NOARGS" in func_names + + def test_text_element_only_pattern(self) -> None: + """Pattern with only TextElement yields no variables or functions.""" + message = Message( + id=Identifier("textOnly"), + value=Pattern(elements=(TextElement("Just plain text"),)), + attributes=(), + ) + + result = introspect_message(message) + + assert len(result.variables) == 0 + assert len(result.functions) == 0 + + 
def test_function_with_empty_call_arguments(self) -> None: + """Function with empty positional and named arguments is still recorded.""" + func_ref = FunctionReference( + id=Identifier("EMPTY"), + arguments=CallArguments(positional=(), named=()), + ) + + message = Message( + id=Identifier("emptyArgs"), + value=Pattern(elements=(Placeable(expression=func_ref),)), + attributes=(), + ) + + result = introspect_message(message) + + func_names = {f.name for f in result.functions} + assert "EMPTY" in func_names + +class TestIntrospectionThreadSafety: + """Verify the cache lock prevents data corruption under concurrent access. + + These tests exercise the check-compute-store pattern introduced with the + threading.Lock that replaced the GIL-reliant lock-free WeakKeyDictionary + access. They run in CI (no @pytest.mark.fuzz) because the thread counts + are small and the wall-clock cost is negligible. + """ + + def test_concurrent_introspection_same_message(self) -> None: + """Concurrent introspection of the same Message yields identical results. + + All threads must see the same MessageIntrospection (equal by content), + and the cache must contain exactly one entry for the shared message. + """ + message = Message( + id=Identifier("sharedMsg"), + value=Pattern(elements=( + TextElement("Hello "), + Placeable(expression=VariableReference(id=Identifier("name"))), + )), + attributes=(), + ) + + # Clear cache to ensure a fresh start for this test. + with _introspection_cache_lock: + _introspection_cache.clear() + + results: list[object] = [] + errors: list[BaseException] = [] + + def worker() -> None: + try: + results.append(introspect_message(message)) + except Exception as exc: + errors.append(exc) + + threads = [threading.Thread(target=worker) for _ in range(20)] + for t in threads: + t.start() + for t in threads: + t.join() + + assert not errors, f"Thread errors: {errors}" + assert len(results) == 20 + + # All results must be equal (same content, immutable). 
+ first = results[0] + assert all(r == first for r in results) + + def test_concurrent_clear_and_introspect(self) -> None: + """Concurrent clear + introspect does not corrupt the cache. + + After all operations complete, any surviving cached entry must be + a valid MessageIntrospection (no partially-written garbage). + """ + message = Message( + id=Identifier("racyMsg"), + value=Pattern(elements=(TextElement("race"),)), + attributes=(), + ) + + errors: list[BaseException] = [] + + def introspector() -> None: + try: + for _ in range(10): + introspect_message(message) + except Exception as exc: + errors.append(exc) + + def clearer() -> None: + try: + for _ in range(5): + clear_introspection_cache() + except Exception as exc: + errors.append(exc) + + threads = ( + [threading.Thread(target=introspector) for _ in range(8)] + + [threading.Thread(target=clearer) for _ in range(2)] + ) + for t in threads: + t.start() + for t in threads: + t.join() + + assert not errors, f"Thread errors: {errors}" + + # Final cache state must be consistent: either empty or holding a valid result. 
+ result = introspect_message(message) + assert result.message_id == "racyMsg" diff --git a/tests/localization_cases/__init__.py b/tests/localization_cases/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/localization_cases/basics_and_fallback.py b/tests/localization_cases/basics_and_fallback.py new file mode 100644 index 00000000..6369a194 --- /dev/null +++ b/tests/localization_cases/basics_and_fallback.py @@ -0,0 +1,288 @@ +# mypy: ignore-errors +from __future__ import annotations + +import pytest + +from ftllexengine.localization import ( + FluentLocalization, +) + + +class TestFluentLocalizationBasics: + """Test basic FluentLocalization initialization and API.""" + + def test_single_locale_initialization(self) -> None: + """Initialize with single locale.""" + l10n = FluentLocalization(["en"]) + + assert l10n.locales == ("en",) + + def test_multiple_locales_initialization(self) -> None: + """Initialize with multiple locales in fallback order.""" + l10n = FluentLocalization(["lv", "en", "lt"]) + + assert l10n.locales == ("lv", "en", "lt") + + def test_empty_locales_raises_error(self) -> None: + """Empty locale list raises ValueError.""" + with pytest.raises(ValueError, match="At least one locale is required"): + FluentLocalization([]) + + def test_resource_ids_without_loader_raises_error(self) -> None: + """Providing resource_ids without loader raises ValueError.""" + with pytest.raises( + ValueError, match="resource_loader required when resource_ids provided" + ): + FluentLocalization(["en"], resource_ids=["main.ftl"]) + + def test_invalid_locale_format_rejected_at_init(self) -> None: + """Invalid locale format raises ValueError at initialization (fail-fast). + + Locale format errors are caught at construction time rather than + propagating out of format_value during lazy bundle creation. 
+ """ + with pytest.raises(ValueError, match=r"Invalid locale: 'invalid locale with spaces'"): + FluentLocalization(["en", "invalid locale with spaces"]) + + def test_unknown_locale_rejected_at_init(self) -> None: + """Unknown but well-formed locales are rejected before localization starts.""" + with pytest.raises(ValueError, match="Unknown locale identifier"): + FluentLocalization(["en", "xx-UNKNOWN"]) + + def test_locales_property_immutable(self) -> None: + """Locales property returns immutable tuple.""" + l10n = FluentLocalization(["en", "fr"]) + + assert isinstance(l10n.locales, tuple) + assert l10n.locales == ("en", "fr") + +class TestAddResource: + """Test dynamic resource addition.""" + + def test_add_resource_single_locale(self) -> None: + """Add FTL resource to single locale.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "hello = Hello, World!") + + result, errors = l10n.format_value("hello") + + assert not errors + assert result == "Hello, World!" + + def test_add_resource_multiple_locales(self) -> None: + """Add different resources to different locales.""" + l10n = FluentLocalization(["lv", "en"]) + l10n.add_resource("lv", "hello = Sveiki, pasaule!") + l10n.add_resource("en", "hello = Hello, World!") + + result, errors = l10n.format_value("hello") + + assert not errors + # Should use first locale (lv) + assert result == "Sveiki, pasaule!" 
+ + def test_add_resource_invalid_locale_raises_error(self) -> None: + """Adding resource for locale not in chain raises ValueError.""" + l10n = FluentLocalization(["en"]) + + with pytest.raises(ValueError, match="Locale 'fr' not in fallback chain"): + l10n.add_resource("fr", "hello = Bonjour!") + +class TestFallbackChain: + """Test locale fallback chain logic.""" + + def test_fallback_to_second_locale(self) -> None: + """Falls back to second locale when message missing in first.""" + l10n = FluentLocalization(["lv", "en"]) + # Add message only to English (not Latvian) + l10n.add_resource("en", "greeting = Hello!") + + result, errors = l10n.format_value("greeting") + + assert not errors + assert result == "Hello!" + + def test_fallback_to_third_locale(self) -> None: + """Falls back through chain to third locale.""" + l10n = FluentLocalization(["lv", "en", "lt"]) + # Add message only to Lithuanian + l10n.add_resource("lt", "welcome = Labas!") + + result, errors = l10n.format_value("welcome") + + assert not errors + assert result == "Labas!" 
+ + def test_first_locale_takes_precedence(self) -> None: + """First locale in chain takes precedence over later locales.""" + l10n = FluentLocalization(["lv", "en"]) + l10n.add_resource("lv", "msg = Latvian version") + l10n.add_resource("en", "msg = English version") + + result, errors = l10n.format_value("msg") + + assert not errors + # Should use first locale (lv), not fallback to en + assert result == "Latvian version" + + def test_partial_translations(self) -> None: + """Handles partial translations with different messages per locale.""" + l10n = FluentLocalization(["lv", "en"]) + l10n.add_resource("lv", "home = Mājas") + l10n.add_resource("en", "home = Home\nabout = About") + + home_result, _ = l10n.format_value("home") + about_result, _ = l10n.format_value("about") + + assert home_result == "Mājas" # From lv + assert about_result == "About" # Falls back to en + + def test_message_not_found_in_any_locale(self) -> None: + """Message not found in any locale returns fallback.""" + l10n = FluentLocalization(["lv", "en"], strict=False) + l10n.add_resource("lv", "hello = Sveiki!") + l10n.add_resource("en", "hello = Hello!") + + result, errors = l10n.format_value("nonexistent") + + assert result == "{nonexistent}" + assert len(errors) == 1 + # Check error message contains 'nonexistent' + assert "nonexistent" in str(errors[0]) + +class TestFormatValue: + """Test format_value method.""" + + def test_format_simple_message(self) -> None: + """Format simple message without variables.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "hello = Hello, World!") + + result, errors = l10n.format_value("hello") + + assert result == "Hello, World!" 
+ assert errors == () + + def test_format_message_with_variables(self) -> None: + """Format message with variable interpolation.""" + l10n = FluentLocalization(["en"], use_isolating=False) + l10n.add_resource("en", "greeting = Hello, { $name }!") + + result, errors = l10n.format_value("greeting", {"name": "Anna"}) + + assert not errors + + assert result == "Hello, Anna!" + + def test_format_message_with_multiple_variables(self) -> None: + """Format message with multiple variables.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "user-info = { $firstName } { $lastName } (Age: { $age })") + + result, errors = l10n.format_value( + "user-info", {"firstName": "John", "lastName": "Doe", "age": 30} + ) + + assert not errors + + assert "John" in result + assert "Doe" in result + assert "30" in result + + def test_format_propagates_bundle_errors(self) -> None: + """Format propagates errors from FluentBundle.""" + l10n = FluentLocalization(["en"], strict=False) + l10n.add_resource("en", "msg = Hello, { $name }!") + + # Missing required variable + result, errors = l10n.format_value("msg") + + assert "Hello" in result + assert len(errors) > 0 # Bundle should report missing variable + + def test_empty_message_id_returns_fallback(self) -> None: + """Empty message ID returns graceful fallback.""" + l10n = FluentLocalization(["en"], strict=False) + l10n.add_resource("en", "hello = Hello!") + + result, errors = l10n.format_value("") + + assert result == "{???}" + assert len(errors) == 1 + assert "Empty or invalid message ID" in str(errors[0]) + +class TestHasMessage: + """Test has_message method.""" + + def test_has_message_in_first_locale(self) -> None: + """Returns True if message in first locale.""" + l10n = FluentLocalization(["lv", "en"]) + l10n.add_resource("lv", "hello = Sveiki!") + + assert l10n.has_message("hello") is True + + def test_has_message_in_fallback_locale(self) -> None: + """Returns True if message in fallback locale.""" + l10n = 
FluentLocalization(["lv", "en"]) + l10n.add_resource("en", "hello = Hello!") + + assert l10n.has_message("hello") is True + + def test_has_message_not_found(self) -> None: + """Returns False if message not in any locale.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "hello = Hello!") + + assert l10n.has_message("goodbye") is False + +class TestGetBundles: + """Test get_bundles generator.""" + + def test_get_bundles_returns_generator(self) -> None: + """get_bundles returns a generator.""" + l10n = FluentLocalization(["en", "fr"]) + + bundles_gen = l10n.get_bundles() + + # Generator should be iterable + bundles = list(bundles_gen) + assert len(bundles) == 2 + + def test_get_bundles_respects_locale_order(self) -> None: + """get_bundles yields bundles in locale priority order.""" + l10n = FluentLocalization(["lv", "en", "lt"]) + + bundles = list(l10n.get_bundles()) + + assert bundles[0].locale == "lv" + assert bundles[1].locale == "en" + assert bundles[2].locale == "lt" + +class TestUseIsolating: + """Test use_isolating parameter.""" + + def test_use_isolating_true(self) -> None: + """use_isolating=True wraps placeables in isolation marks.""" + l10n = FluentLocalization(["en"], use_isolating=True) + l10n.add_resource("en", "msg = Hello, { $name }!") + + result, errors = l10n.format_value("msg", {"name": "Anna"}) + + assert not errors + + # Should contain Unicode bidi isolation marks + assert "\u2068" in result # FSI (First Strong Isolate) + assert "\u2069" in result # PDI (Pop Directional Isolate) + + def test_use_isolating_false(self) -> None: + """use_isolating=False does not wrap placeables.""" + l10n = FluentLocalization(["en"], use_isolating=False) + l10n.add_resource("en", "msg = Hello, { $name }!") + + result, errors = l10n.format_value("msg", {"name": "Anna"}) + + assert not errors + + # Should NOT contain isolation marks + assert "\u2068" not in result + assert "\u2069" not in result diff --git a/tests/localization_cases/loaders_and_cache.py 
b/tests/localization_cases/loaders_and_cache.py new file mode 100644 index 00000000..84b66d2a --- /dev/null +++ b/tests/localization_cases/loaders_and_cache.py @@ -0,0 +1,317 @@ +# mypy: ignore-errors +from __future__ import annotations + +from pathlib import Path + +import pytest + +from ftllexengine.localization import ( + FluentLocalization, + PathResourceLoader, + ResourceLoader, +) +from ftllexengine.runtime.cache_config import CacheConfig + + +class TestPathResourceLoader: + """Test PathResourceLoader implementation.""" + + def test_path_resource_loader_load(self, tmp_path: Path) -> None: + """PathResourceLoader loads FTL files from disk.""" + # Create test FTL files + locales_dir = tmp_path / "locales" + en_dir = locales_dir / "en" + en_dir.mkdir(parents=True) + + main_ftl = en_dir / "main.ftl" + main_ftl.write_text("hello = Hello, World!", encoding="utf-8") + + # Load resource + loader = PathResourceLoader(str(locales_dir / "{locale}")) + ftl_source = loader.load("en", "main.ftl") + + assert ftl_source == "hello = Hello, World!" 
+ + def test_path_resource_loader_missing_locale_placeholder_raises(self) -> None: + """PathResourceLoader raises ValueError when {locale} placeholder is missing.""" + # Fail-fast: Missing placeholder would cause silent data corruption + # where all locales load from the same static path + with pytest.raises(ValueError, match=r"must contain '\{locale\}' placeholder"): + PathResourceLoader("locales/en") # Missing {locale} + + with pytest.raises(ValueError, match=r"must contain '\{locale\}' placeholder"): + PathResourceLoader("/absolute/path/to/locales") # Missing {locale} + + # Valid: Contains {locale} placeholder + loader = PathResourceLoader("locales/{locale}") # Should not raise + assert "{locale}" in loader.base_path + + def test_path_resource_loader_file_not_found(self, tmp_path: Path) -> None: + """PathResourceLoader raises FileNotFoundError for missing files.""" + loader = PathResourceLoader(str(tmp_path / "{locale}")) + + with pytest.raises(FileNotFoundError): + loader.load("en", "nonexistent.ftl") + + def test_path_resource_loader_with_localization(self, tmp_path: Path) -> None: + """PathResourceLoader integrates with FluentLocalization.""" + # Create test structure: locales/en/main.ftl, locales/lv/main.ftl + locales_dir = tmp_path / "locales" + + en_dir = locales_dir / "en" + en_dir.mkdir(parents=True) + (en_dir / "main.ftl").write_text("hello = Hello!", encoding="utf-8") + + lv_dir = locales_dir / "lv" + lv_dir.mkdir(parents=True) + (lv_dir / "main.ftl").write_text("hello = Sveiki!", encoding="utf-8") + + # Create localization with loader + loader = PathResourceLoader(str(locales_dir / "{locale}")) + l10n = FluentLocalization(["lv", "en"], ["main.ftl"], loader) + + result, errors = l10n.format_value("hello") + + assert not errors + assert result == "Sveiki!" 
# From lv + + def test_path_resource_loader_missing_locale_file_uses_fallback( + self, tmp_path: Path + ) -> None: + """Missing locale file falls back to next locale.""" + # Create only English file (no Latvian) + locales_dir = tmp_path / "locales" + en_dir = locales_dir / "en" + en_dir.mkdir(parents=True) + (en_dir / "main.ftl").write_text("hello = Hello!", encoding="utf-8") + + # Latvian directory doesn't exist - will fall back to English + loader = PathResourceLoader(str(locales_dir / "{locale}")) + l10n = FluentLocalization(["lv", "en"], ["main.ftl"], loader) + + result, errors = l10n.format_value("hello") + + assert not errors + assert result == "Hello!" # Fell back to English + + def test_resource_loader_describe_path_default(self) -> None: + """ResourceLoader.describe_path default returns locale/resource_id.""" + + class _MinimalLoader(ResourceLoader): + def load(self, _locale: str, _resource_id: str) -> str: + return "" + + loader = _MinimalLoader() + result = loader.describe_path("en", "main.ftl") + assert result == "en/main.ftl" + + def test_resource_loader_describe_path_default_no_override(self) -> None: + """ResourceLoader.describe_path default is used when subclass does not override.""" + + class _BareLoader(ResourceLoader): + def load(self, _locale: str, _resource_id: str) -> str: + return "" + + loader = _BareLoader() + assert loader.describe_path("de_DE", "errors.ftl") == "de_DE/errors.ftl" + +class TestRealWorldScenarios: + """Test real-world usage patterns.""" + + def test_e_commerce_site_partial_translations(self) -> None: + """E-commerce site with partial Latvian translations.""" + l10n = FluentLocalization(["lv", "en"], use_isolating=False) + + # Latvian has only some translations + l10n.add_resource( + "lv", + """ +welcome = Sveiki, { $name }! +cart = Grozs +""", + ) + + # English has full translations + l10n.add_resource( + "en", + """ +welcome = Hello, { $name }! 
+cart = Cart +checkout = Checkout +payment-error = Payment failed: { $reason } +""", + ) + + # Messages in Latvian use lv + welcome, _ = l10n.format_value("welcome", {"name": "Anna"}) + assert welcome == "Sveiki, Anna!" + + cart, _ = l10n.format_value("cart") + assert cart == "Grozs" + + # Missing messages fall back to English + checkout, _ = l10n.format_value("checkout") + assert checkout == "Checkout" + + payment, _ = l10n.format_value("payment-error", {"reason": "Invalid card"}) + assert payment == "Payment failed: Invalid card" + + def test_fallback_chain_three_locales(self) -> None: + """Complex fallback: lv → en → lt.""" + l10n = FluentLocalization(["lv", "en", "lt"]) + + l10n.add_resource("lv", "home = Mājas") + l10n.add_resource("en", "home = Home\nabout = About") + l10n.add_resource("lt", "home = Namai\nabout = Apie\ncontact = Kontaktai") + + home, _ = l10n.format_value("home") + assert home == "Mājas" # From lv + + about, _ = l10n.format_value("about") + assert about == "About" # Falls back to en (skips lv) + + contact, _ = l10n.format_value("contact") + assert contact == "Kontaktai" # Falls back to lt (skips lv, en) + + def test_multiple_resource_files(self, tmp_path: Path) -> None: + """Multiple FTL files per locale (ui.ftl, errors.ftl).""" + # Create directory structure + locales_dir = tmp_path / "locales" + en_dir = locales_dir / "en" + en_dir.mkdir(parents=True) + + (en_dir / "ui.ftl").write_text("hello = Hello!\nwelcome = Welcome!", encoding="utf-8") + (en_dir / "errors.ftl").write_text("error-404 = Page not found", encoding="utf-8") + + loader = PathResourceLoader(str(locales_dir / "{locale}")) + l10n = FluentLocalization(["en"], ["ui.ftl", "errors.ftl"], loader) + + # Should load from both files + hello, _ = l10n.format_value("hello") + error, _ = l10n.format_value("error-404") + + assert hello == "Hello!" 
+ assert error == "Page not found" + +class TestCacheConfiguration: + """Test cache configuration in FluentLocalization.""" + + def test_cache_disabled_by_default(self) -> None: + """Cache is disabled by default.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "msg = Hello") + + # Format twice + l10n.format_value("msg") + l10n.format_value("msg") + + # Get stats from first bundle + bundles = list(l10n.get_bundles()) + stats = bundles[0].get_cache_stats() + + # Cache disabled - stats should be None + assert stats is None + + def test_cache_enabled_with_parameter(self) -> None: + """Cache can be enabled via constructor parameter.""" + l10n = FluentLocalization(["en"], cache=CacheConfig()) + l10n.add_resource("en", "msg = Hello") + + # Format twice - should hit cache on second call + l10n.format_value("msg") + l10n.format_value("msg") + + # Get stats from first bundle + bundles = list(l10n.get_bundles()) + stats = bundles[0].get_cache_stats() + + # Cache enabled - should have stats + assert stats is not None + assert stats["hits"] == 1 + assert stats["misses"] == 1 + + def test_cache_size_configurable(self) -> None: + """Cache size can be configured via constructor parameter.""" + l10n = FluentLocalization(["en"], cache=CacheConfig(size=500)) + l10n.add_resource("en", "msg = Hello") + + # Format message + l10n.format_value("msg") + + # Verify cache is enabled (size configuration is internal) + bundles = list(l10n.get_bundles()) + stats = bundles[0].get_cache_stats() + assert stats is not None + + def test_cache_works_across_multiple_locales(self) -> None: + """Cache enabled for all bundles in multi-locale setup.""" + l10n = FluentLocalization(["lv", "en"], cache=CacheConfig()) + l10n.add_resource("lv", "msg = Sveiki") + l10n.add_resource("en", "msg = Hello") + + # Format from primary locale (lv) + l10n.format_value("msg") + l10n.format_value("msg") + + # Verify lv bundle has cache hits + bundles = list(l10n.get_bundles()) + lv_stats = 
bundles[0].get_cache_stats() + assert lv_stats is not None + assert lv_stats["hits"] == 1 + + def test_clear_cache_on_all_bundles(self) -> None: + """clear_cache() clears cache on all bundles.""" + l10n = FluentLocalization(["lv", "en"], cache=CacheConfig()) + l10n.add_resource("lv", "msg = Sveiki") + l10n.add_resource("en", "msg = Hello") + + # Format messages to populate cache + l10n.format_value("msg") + l10n.format_value("msg") + + # Clear cache + l10n.clear_cache() + + # Format again - should be cache miss + l10n.format_value("msg") + + # Verify cache was cleared; metrics are cumulative (not reset on clear). + # 1 miss before clear + 1 miss after clear = 2 cumulative misses. + bundles = list(l10n.get_bundles()) + lv_stats = bundles[0].get_cache_stats() + assert lv_stats is not None + assert lv_stats["misses"] == 2 # Pre-clear miss + post-clear miss + +class TestCacheIntrospection: + """Test cache introspection properties.""" + + def test_cache_enabled_property_when_enabled(self) -> None: + """cache_enabled property returns True when caching enabled.""" + l10n = FluentLocalization(["en"], cache=CacheConfig()) + assert l10n.cache_enabled is True + + def test_cache_enabled_property_when_disabled(self) -> None: + """cache_enabled property returns False when no CacheConfig is provided.""" + l10n = FluentLocalization(["en"]) + assert l10n.cache_enabled is False + + def test_cache_config_property_when_enabled(self) -> None: + """cache_config property returns CacheConfig when caching enabled.""" + l10n = FluentLocalization(["en"], cache=CacheConfig(size=500)) + assert l10n.cache_config is not None + assert l10n.cache_config.size == 500 + + def test_cache_config_property_when_disabled(self) -> None: + """cache_config returns None when caching disabled.""" + l10n = FluentLocalization(["en"]) + assert l10n.cache_config is None + + def test_bundle_cache_properties_reflect_localization_config(self) -> None: + """Individual bundles reflect FluentLocalization cache 
config.""" + l10n = FluentLocalization(["lv", "en"], cache=CacheConfig(size=250)) + + # Check all bundles have matching config + for bundle in l10n.get_bundles(): + assert bundle.cache_enabled is True + assert bundle.cache_config is not None + assert bundle.cache_config.size == 250 diff --git a/tests/localization_cases/multilocale_and_callbacks.py b/tests/localization_cases/multilocale_and_callbacks.py new file mode 100644 index 00000000..3e9dcd6c --- /dev/null +++ b/tests/localization_cases/multilocale_and_callbacks.py @@ -0,0 +1,491 @@ +# mypy: ignore-errors +from __future__ import annotations + +from pathlib import Path + +from ftllexengine import FluentBundle +from ftllexengine.core.locale_utils import normalize_locale +from ftllexengine.localization import ( + FallbackInfo, + FluentLocalization, + PathResourceLoader, +) + + +class TestMultiLocaleFileLoading: + """Tests for multi-locale file loading workflows. + + These tests verify the end-to-end workflow of loading FTL files + from disk across multiple locales with proper fallback behavior. + """ + + def test_load_multiple_files_per_locale(self, tmp_path: Path) -> None: + """Multiple FTL files per locale are loaded and merged correctly.""" + locales_dir = tmp_path / "locales" + + # Create en locale with multiple files + en_dir = locales_dir / "en" + en_dir.mkdir(parents=True) + (en_dir / "main.ftl").write_text("welcome = Welcome!", encoding="utf-8") + (en_dir / "errors.ftl").write_text("error-404 = Not Found", encoding="utf-8") + (en_dir / "buttons.ftl").write_text("submit = Submit", encoding="utf-8") + + loader = PathResourceLoader(str(locales_dir / "{locale}")) + l10n = FluentLocalization( + ["en"], ["main.ftl", "errors.ftl", "buttons.ftl"], loader + ) + + # All messages from all files should be available + welcome, _ = l10n.format_value("welcome") + error, _ = l10n.format_value("error-404") + submit, _ = l10n.format_value("submit") + + assert welcome == "Welcome!" 
+ assert error == "Not Found" + assert submit == "Submit" + + def test_fallback_across_multiple_files(self, tmp_path: Path) -> None: + """Fallback works correctly across multiple files and locales.""" + locales_dir = tmp_path / "locales" + + # Create en locale (complete) + en_dir = locales_dir / "en" + en_dir.mkdir(parents=True) + (en_dir / "main.ftl").write_text("home = Home\nabout = About", encoding="utf-8") + (en_dir / "errors.ftl").write_text("error-404 = Not Found", encoding="utf-8") + + # Create de locale (partial - missing errors.ftl) + de_dir = locales_dir / "de" + de_dir.mkdir(parents=True) + (de_dir / "main.ftl").write_text("home = Startseite\nabout = Uber uns", encoding="utf-8") + # Note: de/errors.ftl intentionally missing + + loader = PathResourceLoader(str(locales_dir / "{locale}")) + l10n = FluentLocalization(["de", "en"], ["main.ftl", "errors.ftl"], loader) + + # de messages should come from de + home, _ = l10n.format_value("home") + assert home == "Startseite" + + # error should fall back to en (de/errors.ftl missing) + error, _ = l10n.format_value("error-404") + assert error == "Not Found" + + def test_partial_translation_within_file(self, tmp_path: Path) -> None: + """Partial translations within a file fall back correctly.""" + locales_dir = tmp_path / "locales" + + # Create en locale (complete) + en_dir = locales_dir / "en" + en_dir.mkdir(parents=True) + (en_dir / "main.ftl").write_text( + "home = Home\nabout = About\ncontact = Contact", encoding="utf-8" + ) + + # Create fr locale (partial translations) + fr_dir = locales_dir / "fr" + fr_dir.mkdir(parents=True) + (fr_dir / "main.ftl").write_text("home = Accueil", encoding="utf-8") + # Note: about and contact missing in fr + + loader = PathResourceLoader(str(locales_dir / "{locale}")) + l10n = FluentLocalization(["fr", "en"], ["main.ftl"], loader) + + # fr message from fr + home, _ = l10n.format_value("home") + assert home == "Accueil" + + # missing fr messages fall back to en + about, _ = 
l10n.format_value("about") + contact, _ = l10n.format_value("contact") + assert about == "About" + assert contact == "Contact" + + def test_three_locale_fallback_chain(self, tmp_path: Path) -> None: + """Three-locale fallback chain works correctly.""" + locales_dir = tmp_path / "locales" + + # en has all messages + en_dir = locales_dir / "en" + en_dir.mkdir(parents=True) + (en_dir / "main.ftl").write_text( + "level1 = English One\nlevel2 = English Two\nlevel3 = English Three", + encoding="utf-8" + ) + + # de has two messages + de_dir = locales_dir / "de" + de_dir.mkdir(parents=True) + (de_dir / "main.ftl").write_text( + "level1 = Deutsch Eins\nlevel2 = Deutsch Zwei", + encoding="utf-8" + ) + + # fr has one message + fr_dir = locales_dir / "fr" + fr_dir.mkdir(parents=True) + (fr_dir / "main.ftl").write_text("level1 = Francais Un", encoding="utf-8") + + loader = PathResourceLoader(str(locales_dir / "{locale}")) + l10n = FluentLocalization(["fr", "de", "en"], ["main.ftl"], loader) + + # level1 from fr (first locale) + level1, _ = l10n.format_value("level1") + assert level1 == "Francais Un" + + # level2 from de (second locale, fr doesn't have it) + level2, _ = l10n.format_value("level2") + assert level2 == "Deutsch Zwei" + + # level3 from en (third locale, fr and de don't have it) + level3, _ = l10n.format_value("level3") + assert level3 == "English Three" + + def test_unicode_content_in_files(self, tmp_path: Path) -> None: + """FTL files containing CJK and accented Unicode characters load correctly.""" + locales_dir = tmp_path / "locales" + + ja_dir = locales_dir / "ja" + ja_dir.mkdir(parents=True) + (ja_dir / "main.ftl").write_text( + "greeting = \u3053\u3093\u306b\u3061\u306f\u4e16\u754c", encoding="utf-8" + ) + + lv_dir = locales_dir / "lv" + lv_dir.mkdir(parents=True) + (lv_dir / "main.ftl").write_text("greeting = Sveiki, pasaule!", encoding="utf-8") + + loader = PathResourceLoader(str(locales_dir / "{locale}")) + + l10n_ja = FluentLocalization(["ja"], 
["main.ftl"], loader) + l10n_lv = FluentLocalization(["lv"], ["main.ftl"], loader) + + ja_greeting, _ = l10n_ja.format_value("greeting") + lv_greeting, _ = l10n_lv.format_value("greeting") + + assert "\u3053\u3093\u306b\u3061\u306f" in ja_greeting + assert lv_greeting == "Sveiki, pasaule!" + + def test_missing_locale_directory_falls_back(self, tmp_path: Path) -> None: + """Missing locale directory gracefully falls back to next locale.""" + locales_dir = tmp_path / "locales" + + # Only create en directory (no de) + en_dir = locales_dir / "en" + en_dir.mkdir(parents=True) + (en_dir / "main.ftl").write_text("greeting = Hello!", encoding="utf-8") + + loader = PathResourceLoader(str(locales_dir / "{locale}")) + # de is first but doesn't exist + l10n = FluentLocalization(["de", "en"], ["main.ftl"], loader) + + # Should fall back to en + greeting, _ = l10n.format_value("greeting") + assert greeting == "Hello!" + + def test_empty_file_handled_gracefully(self, tmp_path: Path) -> None: + """Empty FTL files are handled without errors.""" + locales_dir = tmp_path / "locales" + + en_dir = locales_dir / "en" + en_dir.mkdir(parents=True) + (en_dir / "empty.ftl").write_text("", encoding="utf-8") + (en_dir / "main.ftl").write_text("greeting = Hello!", encoding="utf-8") + + loader = PathResourceLoader(str(locales_dir / "{locale}")) + l10n = FluentLocalization(["en"], ["empty.ftl", "main.ftl"], loader) + + # Should still work - empty file just adds no messages + greeting, _ = l10n.format_value("greeting") + assert greeting == "Hello!" 
+ + def test_file_with_only_comments(self, tmp_path: Path) -> None: + """FTL files with only comments are handled correctly.""" + locales_dir = tmp_path / "locales" + + en_dir = locales_dir / "en" + en_dir.mkdir(parents=True) + (en_dir / "comments.ftl").write_text( + "# This file has only comments\n## Section comment\n### Resource comment", + encoding="utf-8" + ) + (en_dir / "main.ftl").write_text("greeting = Hello!", encoding="utf-8") + + loader = PathResourceLoader(str(locales_dir / "{locale}")) + l10n = FluentLocalization(["en"], ["comments.ftl", "main.ftl"], loader) + + # Should work - comments file adds no messages + greeting, _ = l10n.format_value("greeting") + assert greeting == "Hello!" + + def test_variables_in_file_loaded_messages(self, tmp_path: Path) -> None: + """Variables work correctly in file-loaded messages.""" + locales_dir = tmp_path / "locales" + + en_dir = locales_dir / "en" + en_dir.mkdir(parents=True) + (en_dir / "main.ftl").write_text( + "greeting = Hello, { $name }!\ncount = You have { $n } items.", + encoding="utf-8" + ) + + loader = PathResourceLoader(str(locales_dir / "{locale}")) + l10n = FluentLocalization(["en"], ["main.ftl"], loader, use_isolating=False) + + greeting, _ = l10n.format_value("greeting", {"name": "World"}) + count, _ = l10n.format_value("count", {"n": 42}) + + assert greeting == "Hello, World!" 
+ assert "42" in count + +class TestOnFallbackCallback: + """on_fallback callback is invoked when a message resolves from a fallback locale.""" + + def test_on_fallback_invoked_on_format_value(self) -> None: + """on_fallback callback invoked when message resolved from fallback locale.""" + fallback_events: list[FallbackInfo] = [] + + def record_fallback(info: FallbackInfo) -> None: + fallback_events.append(info) + + l10n = FluentLocalization(["lv", "en"], on_fallback=record_fallback) + + # Add message only to fallback locale (en) + l10n.add_resource("en", "fallback-msg = English fallback") + + # Request message - should trigger fallback + result, _ = l10n.format_value("fallback-msg") + + assert result == "English fallback" + assert len(fallback_events) == 1 + assert fallback_events[0].requested_locale == normalize_locale("lv") + assert fallback_events[0].resolved_locale == normalize_locale("en") + assert fallback_events[0].message_id == "fallback-msg" + + def test_on_fallback_invoked_on_format_pattern(self) -> None: + """on_fallback callback invoked in format_pattern when using fallback locale.""" + fallback_events: list[FallbackInfo] = [] + + def record_fallback(info: FallbackInfo) -> None: + fallback_events.append(info) + + l10n = FluentLocalization(["de", "en"], on_fallback=record_fallback) + + # Add message only to fallback locale (en) + l10n.add_resource("en", "pattern-msg = Pattern from fallback") + + # Request message via format_pattern - should trigger fallback + result, _ = l10n.format_pattern("pattern-msg") + + assert result == "Pattern from fallback" + assert len(fallback_events) == 1 + assert fallback_events[0].requested_locale == normalize_locale("de") + assert fallback_events[0].resolved_locale == normalize_locale("en") + assert fallback_events[0].message_id == "pattern-msg" + + def test_on_fallback_not_invoked_for_primary_locale(self) -> None: + """on_fallback not invoked when message found in primary locale.""" + fallback_events: list[FallbackInfo] 
= [] + + def record_fallback(info: FallbackInfo) -> None: + fallback_events.append(info) + + l10n = FluentLocalization(["fr", "en"], on_fallback=record_fallback) + + # Add message to primary locale (fr) + l10n.add_resource("fr", "french-msg = Message en francais") + + result, _ = l10n.format_value("french-msg") + + assert result == "Message en francais" + assert len(fallback_events) == 0 # No fallback occurred + + def test_on_fallback_none_does_not_raise(self) -> None: + """on_fallback=None (default) works without errors.""" + l10n = FluentLocalization(["lv", "en"]) + + l10n.add_resource("en", "msg = No callback") + + # Should not raise even without callback + result, _ = l10n.format_value("msg") + assert result == "No callback" + + def test_on_fallback_multiple_calls(self) -> None: + """on_fallback invoked for each fallback resolution.""" + fallback_events: list[FallbackInfo] = [] + + def record_fallback(info: FallbackInfo) -> None: + fallback_events.append(info) + + l10n = FluentLocalization(["it", "en"], on_fallback=record_fallback) + + l10n.add_resource("en", "msg1 = First\nmsg2 = Second") + + l10n.format_value("msg1") + l10n.format_value("msg2") + + assert len(fallback_events) == 2 + assert fallback_events[0].message_id == "msg1" + assert fallback_events[1].message_id == "msg2" + + def test_on_fallback_with_format_pattern_and_attribute(self) -> None: + """on_fallback invoked in format_pattern with attribute access.""" + fallback_events: list[FallbackInfo] = [] + + def record_fallback(info: FallbackInfo) -> None: + fallback_events.append(info) + + l10n = FluentLocalization(["es", "en"], on_fallback=record_fallback) + + l10n.add_resource( + "en", + """ +button = Click + .tooltip = Button tooltip +""", + ) + + # Request attribute via format_pattern + result, _ = l10n.format_pattern("button", attribute="tooltip") + + assert result == "Button tooltip" + assert len(fallback_events) == 1 + assert fallback_events[0].message_id == "button" + +class 
TestCrossFileDepthValidation: + """Reference depth limits are enforced even when the chain spans multiple add_resource calls.""" + + def test_deep_reference_chain_across_resources(self) -> None: + """Reference chains spanning multiple resources respect depth limits. + + When messages reference each other across multiple add_resource calls, + the total depth limit should still be enforced. + """ + l10n = FluentLocalization(["en"], use_isolating=False) + + # Add resources separately - chain: level5 -> level4 -> level3 -> level2 -> level1 -> base + l10n.add_resource("en", "base = Base value") + l10n.add_resource("en", "level1 = L1: { base }") + l10n.add_resource("en", "level2 = L2: { level1 }") + l10n.add_resource("en", "level3 = L3: { level2 }") + l10n.add_resource("en", "level4 = L4: { level3 }") + l10n.add_resource("en", "level5 = L5: { level4 }") + + # Should resolve successfully (depth 6 is within default limit of 50) + result, errors = l10n.format_value("level5") + + assert not errors + assert "Base value" in result + assert "L1:" in result + assert "L5:" in result + + def test_very_deep_reference_chain_is_limited(self) -> None: + """Reference chains exceeding max_nesting_depth produce errors, not stack overflow.""" + bundle = FluentBundle("en", use_isolating=False, max_nesting_depth=10, strict=False) + + # Build a chain deeper than max_nesting_depth + bundle.add_resource("level0 = Base") + for i in range(1, 15): # 15 levels, exceeds max_depth of 10 + bundle.add_resource(f"level{i} = Chain {{ level{i-1} }}") + + # Resolving the deepest level should hit depth limit + result, errors = bundle.format_pattern("level14") + + # Depth limit exceeded must produce resolution errors + assert len(errors) > 0, f"Expected depth limit errors; got result={result!r}" + + def test_cross_file_term_reference_depth(self) -> None: + """Term references across resources are tracked for depth. 
+ + Terms (-name syntax) referenced across multiple resources + should also respect depth limits. + """ + l10n = FluentLocalization(["en"], use_isolating=False) + + # Add terms and messages across resources + l10n.add_resource("en", "-brand = Firefox") + l10n.add_resource("en", "-product = { -brand } Browser") + l10n.add_resource("en", "title = Welcome to { -product }") + l10n.add_resource("en", "subtitle = { title } - Get Started") + + result, errors = l10n.format_value("subtitle") + + assert not errors + assert "Firefox" in result + assert "Browser" in result + assert "Welcome" in result + + def test_cross_locale_depth_isolation(self) -> None: + """Depth limits are applied per-resolution, not accumulating across locales. + + When falling back through locales, each resolution attempt + should have its own depth counter, not share state. + """ + l10n = FluentLocalization(["de", "en"], use_isolating=False) + + # German has deep chain + l10n.add_resource("de", "a = DE-A") + l10n.add_resource("de", "b = DE-B: { a }") + l10n.add_resource("de", "c = DE-C: { b }") + + # English has different chain (also deep) + l10n.add_resource("en", "x = EN-X") + l10n.add_resource("en", "y = EN-Y: { x }") + + # Resolve German chain + result_c, errors_c = l10n.format_value("c") + assert not errors_c + assert "DE-A" in result_c + + # Resolve English chain (falls back since not in de) + result_y, errors_y = l10n.format_value("y") + assert not errors_y + assert "EN-X" in result_y + + def test_circular_reference_detection_across_resources(self) -> None: + """Circular references across resources are detected. + + Even when circular references are created by adding resources + separately, the resolver should detect and break the cycle. 
+ """ + l10n = FluentLocalization(["en"], use_isolating=False, strict=False) + + # Create circular reference: msg1 -> msg2 -> msg3 -> msg1 + l10n.add_resource("en", "msg1 = Start: { msg2 }") + l10n.add_resource("en", "msg2 = Middle: { msg3 }") + l10n.add_resource("en", "msg3 = End: { msg1 }") # Circular! + + # Resolution should detect cycle and not stack overflow + result, errors = l10n.format_value("msg1") + + # Should have errors (cycle detected) or produce partial result + # The key is it doesn't hang or crash + assert isinstance(result, str) + assert len(errors) > 0 or "{" in result # Either errors or unresolved placeholder + + def test_select_expression_depth_across_resources(self) -> None: + """Select expressions in cross-resource chains respect depth. + + Complex patterns with select expressions referenced across + resources should not bypass depth limits. + """ + l10n = FluentLocalization(["en"], use_isolating=False) + + l10n.add_resource( + "en", + """ +base = { $type -> + [a] Type A + [b] Type B + *[other] Unknown +} +""", + ) + l10n.add_resource("en", "wrapper = Result: { base }") + l10n.add_resource("en", "outer = Final: { wrapper }") + + result, errors = l10n.format_value("outer", {"type": "a"}) + + assert not errors + assert "Type A" in result + assert "Final:" in result diff --git a/tests/localization_cases/validation_and_streams.py b/tests/localization_cases/validation_and_streams.py new file mode 100644 index 00000000..72068835 --- /dev/null +++ b/tests/localization_cases/validation_and_streams.py @@ -0,0 +1,411 @@ +# mypy: ignore-errors +from __future__ import annotations + +import tempfile +from pathlib import Path + +import pytest + +import ftllexengine +from ftllexengine.core.locale_utils import normalize_locale +from ftllexengine.enums import LoadStatus +from ftllexengine.localization import ( + FallbackInfo, + FluentLocalization, + LoadSummary, + LocalizationBootConfig, + LocalizationCacheStats, + PathResourceLoader, + ResourceLoader, + 
ResourceLoadResult, +) +from ftllexengine.syntax.ast import Message + + +class TestPathResourceLoaderResolvedRoot: + """PathResourceLoader._resolved_root falls back to cwd when no static prefix.""" + + def test_resolved_root_fallback_to_cwd(self) -> None: + """Pattern with no static path prefix resolves root to current working directory.""" + loader = PathResourceLoader("{locale}") + expected = Path.cwd().resolve() + assert loader._resolved_root == expected # pylint: disable=protected-access + +class TestPathResourceLoaderSecurity: + """PathResourceLoader rejects path traversal and absolute path inputs.""" + + def test_load_rejects_absolute_path(self) -> None: + """Absolute path resource_id raises ValueError.""" + loader = PathResourceLoader("locales/{locale}") + + with pytest.raises(ValueError, match="Absolute paths not allowed"): + loader.load("en", "/etc/passwd") + + def test_load_rejects_absolute_path_posix_style(self) -> None: + """POSIX absolute path resource_id raises ValueError.""" + loader = PathResourceLoader("locales/{locale}") + + with pytest.raises(ValueError, match="Absolute paths not allowed"): + loader.load("en", "/usr/local/etc/passwd") + + def test_load_rejects_parent_directory_traversal(self) -> None: + """'..' sequences in resource_id raise ValueError.""" + loader = PathResourceLoader("locales/{locale}") + + with pytest.raises(ValueError, match="Path traversal sequences not allowed"): + loader.load("en", "../../../etc/passwd") + + def test_load_rejects_parent_directory_in_middle(self) -> None: + """'..' in the middle of a resource_id path raises ValueError.""" + loader = PathResourceLoader("locales/{locale}") + + with pytest.raises(ValueError, match="Path traversal sequences not allowed"): + loader.load("en", "foo/../bar/../secrets.ftl") + + def test_load_rejects_path_starting_with_forward_slash(self) -> None: + """resource_id starting with '/' is rejected as absolute or separator-prefixed. 
+ + On Unix, /messages.ftl is caught as an absolute path first. + On Windows with forward slash it may be caught by the separator check. + """ + loader = PathResourceLoader("locales/{locale}") + + with pytest.raises(ValueError, match=r"(Absolute|separator)"): + loader.load("en", "/messages.ftl") + + def test_load_rejects_path_starting_with_backslash(self) -> None: + """resource_id starting with '\\' is rejected.""" + loader = PathResourceLoader("locales/{locale}") + + with pytest.raises(ValueError, match="not allowed in resource_id"): + loader.load("en", "\\messages.ftl") + + def test_load_detects_symlink_escape_via_is_safe_path(self) -> None: + """Symlink pointing outside the base directory is rejected by _is_safe_path.""" + + with tempfile.TemporaryDirectory() as tmpdir: + base_path = Path(tmpdir) + locale_dir = base_path / "locales" / "en" + locale_dir.mkdir(parents=True) + + outside_dir = base_path / "outside" + outside_dir.mkdir() + secret_file = outside_dir / "secret.ftl" + secret_file.write_text("secret = Secret data") + + symlink_path = locale_dir / "escape.ftl" + try: + symlink_path.symlink_to(secret_file) + + loader = PathResourceLoader(str(base_path / "locales" / "{locale}")) + + with pytest.raises(ValueError, match="Path traversal detected"): + loader.load("en", "escape.ftl") + except OSError: + pytest.skip("Symlink creation not supported on this system") + +class TestPathResourceLoaderValidation: + """PathResourceLoader accepts valid resource_ids and rejects malformed ones.""" + + def test_load_with_valid_resource_id(self) -> None: + """Valid resource_id loads file content correctly.""" + + with tempfile.TemporaryDirectory() as tmpdir: + base = Path(tmpdir) + locale_dir = base / "locales" / "en" + locale_dir.mkdir(parents=True) + + test_file = locale_dir / "messages.ftl" + test_file.write_text("hello = Hello, World!") + + loader = PathResourceLoader(str(base / "locales" / "{locale}")) + content = loader.load("en", "messages.ftl") + + assert "Hello, 
World!" in content + + def test_load_with_subdirectory_resource_id(self) -> None: + """Subdirectory in resource_id resolves to nested path correctly.""" + + with tempfile.TemporaryDirectory() as tmpdir: + base = Path(tmpdir) + locale_dir = base / "locales" / "en" / "ui" + locale_dir.mkdir(parents=True) + + test_file = locale_dir / "buttons.ftl" + test_file.write_text("save = Save") + + loader = PathResourceLoader(str(base / "locales" / "{locale}")) + content = loader.load("en", "ui/buttons.ftl") + + assert "Save" in content + + def test_validate_resource_id_validates_before_path_resolution(self) -> None: + """Validation rejects malformed resource_ids before any filesystem operations.""" + loader = PathResourceLoader("locales/{locale}") + + invalid_ids = [ + "/absolute/path.ftl", + "..\\parent\\path.ftl", + "..\\..\\..\\escape.ftl", + "\\windows\\path.ftl", + ] + + for invalid_id in invalid_ids: + with pytest.raises(ValueError, match=r"(Absolute|traversal|separator)"): + loader.load("en", invalid_id) + +class TestPathResourceLoaderLocaleValidation: + """PathResourceLoader rejects locale codes containing path traversal sequences.""" + + def test_load_rejects_locale_with_parent_traversal(self) -> None: + """'..' in locale code raises ValueError.""" + loader = PathResourceLoader("locales/{locale}") + + with pytest.raises(ValueError, match=r"Invalid locale: '../../../etc'"): + loader.load("../../../etc", "messages.ftl") + + def test_load_rejects_locale_with_embedded_traversal(self) -> None: + """'..' 
embedded within locale code raises ValueError.""" + loader = PathResourceLoader("locales/{locale}") + + with pytest.raises(ValueError, match=r"Invalid locale: 'en/\.\./de'"): + loader.load("en/../de", "messages.ftl") + + def test_load_rejects_locale_with_forward_slash(self) -> None: + """'/' in locale code raises ValueError.""" + loader = PathResourceLoader("locales/{locale}") + + with pytest.raises(ValueError, match=r"Invalid locale: 'en/attack'"): + loader.load("en/attack", "messages.ftl") + + def test_load_rejects_locale_with_backslash(self) -> None: + """'\\' in locale code raises ValueError.""" + loader = PathResourceLoader("locales/{locale}") + + with pytest.raises(ValueError, match=r"Invalid locale: 'en\\\\attack'"): + loader.load("en\\attack", "messages.ftl") + + def test_load_rejects_empty_locale(self) -> None: + """Empty locale code raises ValueError.""" + loader = PathResourceLoader("locales/{locale}") + + with pytest.raises(ValueError, match="locale cannot be blank"): + loader.load("", "messages.ftl") + + def test_load_accepts_valid_locale_codes(self) -> None: + """Standard BCP 47-style locale codes are accepted.""" + + with tempfile.TemporaryDirectory() as tmpdir: + base = Path(tmpdir) + + valid_locales = ["en", "en_US", "de_DE", "lv_LV", "zh_Hans_CN"] + + for locale in valid_locales: + locale_dir = base / "locales" / normalize_locale(locale) + locale_dir.mkdir(parents=True, exist_ok=True) + test_file = locale_dir / "test.ftl" + test_file.write_text(f"msg = Test for {locale}") + + loader = PathResourceLoader(str(base / "locales" / "{locale}")) + + for locale in valid_locales: + content = loader.load(locale, "test.ftl") + assert f"Test for {locale}" in content + + def test_root_dir_parameter_provides_fixed_anchor(self) -> None: + """root_dir anchors path validation independently of the locale parameter.""" + + with tempfile.TemporaryDirectory() as tmpdir: + base = Path(tmpdir) + locale_dir = base / "locales" / "en" + locale_dir.mkdir(parents=True) + 
test_file = locale_dir / "test.ftl" + test_file.write_text("msg = Test") + + loader = PathResourceLoader( + str(base / "locales" / "{locale}"), + root_dir=str(base), + ) + + content = loader.load("en", "test.ftl") + assert "Test" in content + + def test_root_dir_prevents_locale_escape_attempt(self) -> None: + """root_dir constrains path validation to a fixed boundary. + + When a symlink inside the locale directory resolves to a file + outside root_dir, the loader raises ValueError even though the + resource_id itself contains no traversal sequences. + """ + with tempfile.TemporaryDirectory() as tmpdir: + base = Path(tmpdir) + locale_dir = base / "locales" / "en" + locale_dir.mkdir(parents=True) + (locale_dir / "test.ftl").write_text("msg = Test") + + outside = base / "outside" + outside.mkdir() + secret = outside / "secret.ftl" + secret.write_text("secret = Should not access") + + loader = PathResourceLoader( + str(base / "locales" / "{locale}"), + root_dir=str(base / "locales"), + ) + + # Normal load within root_dir succeeds + content = loader.load("en", "test.ftl") + assert "Test" in content + + # Symlink from within locale dir to a file outside root_dir + escape_link = locale_dir / "escape.ftl" + try: + escape_link.symlink_to(secret) + # The resource_id has no '..' 
but the resolved path escapes root_dir + with pytest.raises(ValueError, match="Path traversal detected"): + loader.load("en", "escape.ftl") + except OSError: + pytest.skip("Symlink creation not supported on this system") + +class TestLocalizationBootTypesFacadeExport: + """Boot evidence types and loaders are accessible from the root facade.""" + + def test_load_status_accessible_from_root_facade(self) -> None: + """LoadStatus enum is exported from ftllexengine root facade.""" + assert ftllexengine.LoadStatus is LoadStatus + + def test_load_status_in_root_all(self) -> None: + """LoadStatus is listed in ftllexengine.__all__.""" + assert "LoadStatus" in ftllexengine.__all__ + + def test_fallback_info_accessible_from_root_facade(self) -> None: + """FallbackInfo is exported from ftllexengine root facade.""" + assert ftllexengine.FallbackInfo is FallbackInfo + + def test_load_summary_accessible_from_root_facade(self) -> None: + """LoadSummary is exported from ftllexengine root facade.""" + assert ftllexengine.LoadSummary is LoadSummary + + def test_resource_load_result_accessible_from_root_facade(self) -> None: + """ResourceLoadResult is exported from ftllexengine root facade.""" + assert ftllexengine.ResourceLoadResult is ResourceLoadResult + + def test_resource_loader_accessible_from_root_facade(self) -> None: + """ResourceLoader Protocol is exported from ftllexengine root facade.""" + assert ftllexengine.ResourceLoader is ResourceLoader + + def test_path_resource_loader_accessible_from_root_facade(self) -> None: + """PathResourceLoader is exported from ftllexengine root facade.""" + assert ftllexengine.PathResourceLoader is PathResourceLoader + + def test_localization_boot_config_accessible_from_root_facade(self) -> None: + """LocalizationBootConfig is exported from ftllexengine root facade.""" + assert ftllexengine.LocalizationBootConfig is LocalizationBootConfig + + def test_localization_cache_stats_accessible_from_root_facade(self) -> None: + 
"""LocalizationCacheStats is exported from ftllexengine root facade.""" + assert ftllexengine.LocalizationCacheStats is LocalizationCacheStats + + def test_boot_types_in_root_all(self) -> None: + """All boot evidence types are listed in ftllexengine.__all__.""" + for name in ( + "FallbackInfo", + "LoadSummary", + "LocalizationBootConfig", + "LocalizationCacheStats", + "PathResourceLoader", + "ResourceLoadResult", + "ResourceLoader", + ): + assert name in ftllexengine.__all__, f"{name!r} missing from ftllexengine.__all__" + +class TestFluentLocalizationAddResourceStream: + """FluentLocalization.add_resource_stream incremental resource loading.""" + + def test_loads_message_from_line_list(self) -> None: + """add_resource_stream registers messages for a locale.""" + l10n = FluentLocalization( + locales=("en",), + resource_ids=(), + ) + l10n.add_resource_stream("en", ["greeting = Hello\n"]) + result, errors = l10n.format_pattern("greeting") + assert errors == () + assert result == "Hello" + + def test_invalid_locale_raises(self) -> None: + """Locale not in fallback chain raises ValueError.""" + l10n = FluentLocalization(locales=("en",), resource_ids=()) + with pytest.raises(ValueError, match="not in fallback chain"): + l10n.add_resource_stream("de", ["msg = Value\n"]) + + def test_returns_empty_junk_on_clean_source(self) -> None: + """Clean stream returns empty junk tuple.""" + l10n = FluentLocalization(locales=("en",), resource_ids=()) + junk = l10n.add_resource_stream("en", ["msg = Value\n"]) + assert junk == () + + def test_source_path_accepted(self) -> None: + """source_path kwarg threads through without error.""" + l10n = FluentLocalization(locales=("en",), resource_ids=()) + l10n.add_resource_stream( + "en", ["msg = Value\n"], source_path="locales/en/ui.ftl" + ) + result, _ = l10n.format_pattern("msg") + assert result == "Value" + + def test_multiple_messages_from_stream(self) -> None: + """Multiple messages from a stream are all registered.""" + l10n = 
FluentLocalization(locales=("en",), resource_ids=()) + l10n.add_resource_stream("en", ["msg1 = One\n", "\n", "msg2 = Two\n"]) + r1, _ = l10n.format_pattern("msg1") + r2, _ = l10n.format_pattern("msg2") + assert r1 == "One" + assert r2 == "Two" + + def test_equivalence_with_add_resource(self) -> None: + """add_resource_stream produces same result as add_resource for same content.""" + source = "msg = Hello\n" + l1 = FluentLocalization(locales=("en",), resource_ids=()) + l1.add_resource("en", source) + l2 = FluentLocalization(locales=("en",), resource_ids=()) + l2.add_resource_stream("en", source.splitlines(keepends=True)) + r1, e1 = l1.format_pattern("msg") + r2, e2 = l2.format_pattern("msg") + assert r1 == r2 + assert e1 == e2 + + def test_second_call_reuses_existing_bundle(self) -> None: + """Second add_resource_stream call for same locale reuses the existing bundle. + + The first call creates the bundle lazily; the second call must take the + branch where the bundle already exists in _bundles (line 734->736 coverage). 
+ """ + l10n = FluentLocalization(locales=("en",), resource_ids=()) + l10n.add_resource_stream("en", ["msg1 = First\n"]) + l10n.add_resource_stream("en", ["msg2 = Second\n"]) + r1, e1 = l10n.format_pattern("msg1") + r2, e2 = l10n.format_pattern("msg2") + assert r1 == "First" + assert r2 == "Second" + assert e1 == () + assert e2 == () + +class TestParseStreamFtlFacade: + """parse_stream_ftl is accessible from root facade.""" + + def test_accessible_from_root(self) -> None: + """parse_stream_ftl is importable from ftllexengine.""" + assert hasattr(ftllexengine, "parse_stream_ftl") + assert callable(ftllexengine.parse_stream_ftl) + + def test_in_root_all(self) -> None: + """parse_stream_ftl is listed in ftllexengine.__all__.""" + assert "parse_stream_ftl" in ftllexengine.__all__ + + def test_yields_entries_from_lines(self) -> None: + """parse_stream_ftl yields Message entries from line list.""" + entries = list(ftllexengine.parse_stream_ftl(["greeting = Hello\n"])) + assert len(entries) == 1 + assert isinstance(entries[0], Message) + assert entries[0].id.name == "greeting" diff --git a/tests/localization_orchestration_cases/__init__.py b/tests/localization_orchestration_cases/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/localization_orchestration_cases/ast_and_cleanup.py b/tests/localization_orchestration_cases/ast_and_cleanup.py new file mode 100644 index 00000000..c29be61f --- /dev/null +++ b/tests/localization_orchestration_cases/ast_and_cleanup.py @@ -0,0 +1,395 @@ +# mypy: ignore-errors +from __future__ import annotations + +import pytest + +from ftllexengine import validate_message_variables +from ftllexengine.integrity import ( + FormattingIntegrityError, + IntegrityCheckFailedError, +) +from ftllexengine.localization import ( + FluentLocalization, + LoadStatus, + ResourceLoadResult, +) +from ftllexengine.runtime.bundle import FluentBundle +from ftllexengine.runtime.cache_config import CacheConfig +from ftllexengine.syntax import 
Message, Term +from ftllexengine.syntax.ast import Junk, Span + + +class TestFormattingIntegrityErrorReraise: + """FluentLocalization re-raises FormattingIntegrityError with corrected component. + + Lines 690-703: the except FormattingIntegrityError block in format_pattern + fires when the bundle raises in strict mode and the message exists in the + bundle. The orchestrator must re-raise with component='localization'. + """ + + def test_strict_localization_reraises_with_localization_component(self) -> None: + """Strict FluentLocalization re-raises FormattingIntegrityError. + + Covers lines 690-703: the except block that replaces the 'bundle' + component with 'localization' in the IntegrityContext before re-raising. + """ + l10n = FluentLocalization(["en"], strict=True) + l10n.add_resource("en", "test-msg = Hello { $name }!") + + # Calling format_pattern without the required $name argument causes + # VARIABLE_NOT_PROVIDED error. In strict mode the bundle raises + # FormattingIntegrityError, which the orchestrator catches and re-raises. 
+ with pytest.raises(FormattingIntegrityError) as exc_info: + l10n.format_pattern("test-msg", {}) + + exc = exc_info.value + assert exc.context is not None + assert exc.context.component == "localization" + assert len(exc.fluent_errors) > 0 + assert exc.message_id == "test-msg" + +class TestGetMessageAST: + """FluentLocalization.get_message() returns the parsed Message AST from the fallback chain.""" + + def test_existing_message_primary_locale(self) -> None: + """get_message returns the Message from the primary locale.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "greeting = Hello, { $name }!") + + msg = l10n.get_message("greeting") + + assert msg is not None + assert isinstance(msg, Message) + assert msg.id.name == "greeting" + + def test_missing_message_returns_none(self) -> None: + """get_message returns None when no locale contains the message.""" + l10n = FluentLocalization(["en", "lv"]) + l10n.add_resource("en", "hello = Hello!") + + assert l10n.get_message("nonexistent") is None + + def test_fallback_chain_used_when_primary_missing(self) -> None: + """get_message falls back to secondary locale when primary lacks the message.""" + l10n = FluentLocalization(["lv", "en"]) + l10n.add_resource("en", "greeting = Hello!") + # lv has no "greeting" resource + + msg = l10n.get_message("greeting") + + assert msg is not None + assert isinstance(msg, Message) + assert msg.id.name == "greeting" + + def test_primary_locale_wins_when_both_have_message(self) -> None: + """Primary locale's Message is returned when multiple locales have the message.""" + l10n = FluentLocalization(["lv", "en"]) + l10n.add_resource("lv", "greeting = Sveiki!") + l10n.add_resource("en", "greeting = Hello!") + + msg = l10n.get_message("greeting") + + assert msg is not None + assert msg.id.name == "greeting" + # Verify it's the primary locale's message by checking a separate bundle + lv_bundle = FluentBundle("lv", use_isolating=False) + lv_bundle.add_resource("greeting = 
Sveiki!") + lv_msg = lv_bundle.get_message("greeting") + assert lv_msg is not None + assert msg is not lv_msg # Different bundle instances, same message id + + def test_empty_localization_returns_none(self) -> None: + """get_message returns None when no resources have been added.""" + l10n = FluentLocalization(["en"]) + + assert l10n.get_message("anything") is None + + def test_get_message_result_usable_with_validate_message_variables(self) -> None: + """get_message result can be passed to validate_message_variables().""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "greeting = Hello, { $name }!") + + msg = l10n.get_message("greeting") + assert msg is not None + + result = validate_message_variables(msg, frozenset({"name"})) + assert result.is_valid + assert result.declared_variables == frozenset({"name"}) + +class TestGetTermAST: + """FluentLocalization.get_term() returns the parsed Term AST from the fallback chain.""" + + def test_existing_term_primary_locale(self) -> None: + """get_term returns the Term from the primary locale.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "-brand = Firefox") + + term = l10n.get_term("brand") + + assert term is not None + assert isinstance(term, Term) + assert term.id.name == "brand" + + def test_missing_term_returns_none(self) -> None: + """get_term returns None when no locale contains the term.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "hello = Hello!") + + assert l10n.get_term("nonexistent") is None + + def test_fallback_chain_used_for_term(self) -> None: + """get_term falls back to secondary locale when primary lacks the term.""" + l10n = FluentLocalization(["lv", "en"]) + l10n.add_resource("en", "-brand = Firefox") + # lv has no "-brand" resource + + term = l10n.get_term("brand") + + assert term is not None + assert isinstance(term, Term) + assert term.id.name == "brand" + + def test_term_id_without_leading_dash(self) -> None: + """-brand is accessed as 
get_term('brand'), not get_term('-brand').""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "-brand = Firefox") + + assert l10n.get_term("brand") is not None + assert l10n.get_term("-brand") is None + + def test_get_message_does_not_return_terms(self) -> None: + """get_message does not return terms (separate namespaces).""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "-brand = Firefox") + + assert l10n.get_message("brand") is None + + def test_get_term_does_not_return_messages(self) -> None: + """get_term does not return messages (separate namespaces).""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "brand = Firefox") + + assert l10n.get_term("brand") is None + +class TestDescribeUncleanLoadResult: + """Tests for _describe_unclean_load_result private helper. + + Called by require_clean() to build the error detail string. Tested + directly to cover the error=None (UnknownError) and junk branches. + """ + + def test_error_result_with_none_error_uses_unknown_error(self) -> None: + """When result.is_error is True but error is None, name is 'UnknownError'.""" + result = ResourceLoadResult("en", "bad.ftl", LoadStatus.ERROR, error=None) + l10n = FluentLocalization(["en"]) + + key, detail = l10n._describe_unclean_load_result(result) + + assert key == "en/bad.ftl" + assert detail == "load error (UnknownError)" + + def test_error_result_with_actual_error_uses_type_name(self) -> None: + """When result.error is not None, type name is used in the description.""" + result = ResourceLoadResult( + "en", "bad.ftl", LoadStatus.ERROR, error=OSError("disk fail"), + ) + l10n = FluentLocalization(["en"]) + + _key, detail = l10n._describe_unclean_load_result(result) + + assert "OSError" in detail + + def test_junk_result_describes_junk_entry_count(self) -> None: + """Junk branch returns description with junk entry count.""" + junk = Junk(content="bad syntax", span=Span(start=0, end=10)) + result = ResourceLoadResult( + "en", 
"partial.ftl", LoadStatus.SUCCESS, junk_entries=(junk,), + ) + l10n = FluentLocalization(["en"]) + + key, detail = l10n._describe_unclean_load_result(result) + + assert key == "en/partial.ftl" + assert "1 junk entry" in detail + + def test_junk_plural_with_two_entries(self) -> None: + """Two junk entries use 'entries' plural noun.""" + junk1 = Junk(content="bad1", span=Span(start=0, end=4)) + junk2 = Junk(content="bad2", span=Span(start=5, end=9)) + result = ResourceLoadResult( + "en", "partial.ftl", LoadStatus.SUCCESS, + junk_entries=(junk1, junk2), + ) + l10n = FluentLocalization(["en"]) + + _key, detail = l10n._describe_unclean_load_result(result) + + assert "2 junk entries" in detail + +class TestRequireCleanCleanBeforeProblematic: + """Tests for require_clean when the first result in summary is clean. + + The for-loop in require_clean iterates summary.results looking for the + first non-clean result. When results[0] is clean, the inner if-condition + is False for that iteration (the loop-continue branch), and iteration + advances to the next element. 
+ """ + + def test_first_clean_second_not_found_raises_with_correct_key(self) -> None: + """require_clean iterates past a clean first result to find the bad one.""" + + class PartialLoader: + def load(self, _locale: str, resource_id: str) -> str: + if resource_id == "first.ftl": + return "msg = Hello\n" + msg = "missing" + raise FileNotFoundError(msg) + + def describe_path(self, locale: str, resource_id: str) -> str: + return f"{locale}/{resource_id}" + + l10n = FluentLocalization( + ["en"], ["first.ftl", "second.ftl"], PartialLoader(), + ) + + with pytest.raises(IntegrityCheckFailedError) as exc_info: + l10n.require_clean() + + ctx = exc_info.value.context + assert ctx is not None + # second.ftl is the first non-clean result; first.ftl was clean + assert "second.ftl" in (ctx.key or "") + +class TestRequireCleanJunkBranch: + """Tests for require_clean that trigger the junk description branch.""" + + def test_require_clean_raises_with_junk_detail(self) -> None: + """require_clean raises when the loader produces a resource with junk entries. + + strict=False: testing load summary junk tracking; junk entries must be + captured in the ResourceLoadResult, not raised as SyntaxIntegrityError. + """ + + class JunkLoader: + def load(self, _locale: str, _resource_id: str) -> str: + # "bad-junk" is not valid FTL syntax; produces a Junk AST node + return "bad-junk\n" + + def describe_path(self, locale: str, resource_id: str) -> str: + return f"{locale}/{resource_id}" + + l10n = FluentLocalization( + ["en"], ["main.ftl"], JunkLoader(), strict=False, + ) + + with pytest.raises(IntegrityCheckFailedError) as exc_info: + l10n.require_clean() + + assert "junk" in str(exc_info.value).lower() + +class TestFormatSchemaDifferenceMissingVariables: + """Tests for _format_schema_difference when only missing_variables is set. + + Existing tests cover the extra_variables path (message declares more vars + than expected). 
These tests cover the missing_variables path (expected vars + not found in message) and the False branch of 'if validation.extra_variables'. + """ + + def test_missing_variables_only_reported(self) -> None: + """Schema diff reports missing variables when message uses fewer than expected.""" + l10n = FluentLocalization(["en"]) + # Message uses no variables; expected schema requires $amount + l10n.add_resource("en", "invoice = Static total\n") + + with pytest.raises(IntegrityCheckFailedError) as exc_info: + l10n.validate_message_schemas({ + "invoice": frozenset({"amount"}), + }) + + err = exc_info.value + # Must describe the missing variable + assert "missing {amount}" in str(err) + + def test_validate_message_variables_missing_variable_raises(self) -> None: + """Single-message validation reports missing variable in error message.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "price = Free\n") + + with pytest.raises(IntegrityCheckFailedError) as exc_info: + l10n.validate_message_variables("price", frozenset({"cost"})) + + assert "missing {cost}" in str(exc_info.value) + +class TestValidateMessageSchemasTruncation: + """Tests for validate_message_schemas 'N more issues' truncation. + + When 4 or more messages fail validation, mismatches[:3] is taken and + the remaining count is appended as '... N more issue(s)'. 
+ """ + + def test_four_mismatches_appends_remaining_count(self) -> None: + """Four schema mismatches trigger 'N more issue' truncation.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource( + "en", + "m1 = { $a }\nm2 = { $a }\nm3 = { $a }\nm4 = { $a }\n", + ) + + with pytest.raises(IntegrityCheckFailedError) as exc_info: + # All four messages have $a extra (expected empty schema) + l10n.validate_message_schemas({ + "m1": frozenset(), + "m2": frozenset(), + "m3": frozenset(), + "m4": frozenset(), + }) + + err_str = str(exc_info.value) + assert "more issue" in err_str + + def test_five_mismatches_pluralises_noun(self) -> None: + """Five mismatches produce '2 more issues' (plural noun).""" + l10n = FluentLocalization(["en"]) + l10n.add_resource( + "en", + "m1 = { $a }\nm2 = { $a }\nm3 = { $a }\nm4 = { $a }\nm5 = { $a }\n", + ) + + with pytest.raises(IntegrityCheckFailedError) as exc_info: + l10n.validate_message_schemas({ + "m1": frozenset(), + "m2": frozenset(), + "m3": frozenset(), + "m4": frozenset(), + "m5": frozenset(), + }) + + err_str = str(exc_info.value) + assert "more issues" in err_str + +class TestGetCacheAuditLogBundleWithoutCache: + """Tests for get_cache_audit_log when a bundle in _bundles has no cache. + + When bundle.get_cache_audit_log() returns None (bundle has no cache + configured), that bundle's locale is excluded from the audit_logs dict. + This exercises the ``if audit_log is not None:`` False branch. 
+ """ + + def test_bundle_without_cache_excluded_from_audit_log(self) -> None: + """Locale with a no-cache bundle is absent from the audit log mapping.""" + l10n = FluentLocalization( + ["en", "de"], cache=CacheConfig(enable_audit=True), + ) + l10n.add_resource("en", "msg = Hello\n") + l10n.format_value("msg") + + # Inject a bundle with no cache for "de"; get_cache_audit_log() returns None + no_cache_bundle = FluentBundle("de") + no_cache_bundle.add_resource("msg = Hallo\n") + l10n._bundles["de"] = no_cache_bundle + + audit_logs = l10n.get_cache_audit_log() + + assert audit_logs is not None + assert "en" in audit_logs + assert "de" not in audit_logs diff --git a/tests/localization_orchestration_cases/cache_and_properties.py b/tests/localization_orchestration_cases/cache_and_properties.py new file mode 100644 index 00000000..b68da2ec --- /dev/null +++ b/tests/localization_orchestration_cases/cache_and_properties.py @@ -0,0 +1,371 @@ +# mypy: ignore-errors +# mypy: ignore-errors +from __future__ import annotations + +from pathlib import Path + +from hypothesis import HealthCheck, event, given, settings +from hypothesis import strategies as st + +from ftllexengine.core.locale_utils import normalize_locale +from ftllexengine.localization import ( + CacheAuditLogEntry, + FluentLocalization, + PathResourceLoader, +) +from ftllexengine.runtime.bundle import FluentBundle +from ftllexengine.runtime.cache_config import CacheConfig +from tests.strategies.localization import locale_chains, message_ids + + +class TestCacheStatsBranch: + """Tests for get_cache_stats aggregation branch.""" + + def test_aggregates_across_multiple_bundles(self) -> None: + """get_cache_stats sums metrics across all bundles.""" + l10n = FluentLocalization( + ["en", "de"], cache=CacheConfig(size=500), + ) + l10n.add_resource("en", "msg = Hello\n") + l10n.add_resource("de", "msg = Hallo\n") + + # Format to create cache entries + l10n.format_value("msg") + + stats = l10n.get_cache_stats() + assert stats 
is not None + assert stats["bundle_count"] == 2 + assert stats["maxsize"] == 1000 # 500 * 2 + + def test_empty_bundles_returns_zero_stats(self) -> None: + """get_cache_stats returns zero stats with no initialized bundles.""" + l10n = FluentLocalization(["en"], cache=CacheConfig()) + stats = l10n.get_cache_stats() + assert stats is not None + assert stats["bundle_count"] == 0 + assert stats["size"] == 0 + + def test_hit_rate_calculated_correctly(self) -> None: + """Hit rate is hits/(hits+misses)*100.""" + l10n = FluentLocalization(["en"], cache=CacheConfig()) + l10n.add_resource("en", "msg = Hello\n") + l10n.format_value("msg") # miss + l10n.format_value("msg") # hit + stats = l10n.get_cache_stats() + assert stats is not None + assert stats["hit_rate"] == 50.0 + + def test_skips_bundle_with_no_cache(self) -> None: + """Bundles returning None from get_cache_stats are skipped.""" + l10n = FluentLocalization( + ["en", "de"], cache=CacheConfig(size=100), + ) + # Create cached bundle for "en" + l10n.add_resource("en", "msg = Hello\n") + l10n.format_value("msg") + + # Inject a no-cache bundle for "de" directly + no_cache_bundle = FluentBundle("de") + no_cache_bundle.add_resource("msg = Hallo\n") + l10n._bundles["de"] = no_cache_bundle + + stats = l10n.get_cache_stats() + assert stats is not None + # Only the "en" bundle contributes cache metrics; bundle_count still counts both bundles + assert stats["bundle_count"] == 2 + assert stats["maxsize"] == 100 # Only en's maxsize + +class TestCacheAuditLogBranch: + """Tests for get_cache_audit_log per-locale audit access.""" + + def test_returns_none_when_caching_disabled(self) -> None: + """get_cache_audit_log() returns None when localization caching is disabled.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "msg = Hello\n") + + assert l10n.get_cache_audit_log() is None + + def test_returns_empty_mapping_when_no_bundles_initialized(self) -> None: + """get_cache_audit_log() does not create bundles during inspection.""" + l10n = FluentLocalization(["en", "de"], 
cache=CacheConfig(enable_audit=True)) + + audit_logs = l10n.get_cache_audit_log() + assert audit_logs == {} + + def test_returns_per_locale_write_log_entries(self) -> None: + """get_cache_audit_log() returns immutable CacheAuditLogEntry tuples per locale.""" + l10n = FluentLocalization(["en", "de"], cache=CacheConfig(enable_audit=True)) + l10n.add_resource("en", "msg = Hello\n") + l10n.add_resource("de", "msg = Hallo\n") + + l10n.format_value("msg") + l10n.format_value("msg") + + audit_logs = l10n.get_cache_audit_log() + assert audit_logs is not None + assert list(audit_logs) == ["en", "de"] + assert [entry.operation for entry in audit_logs["en"]] == ["MISS", "PUT", "HIT"] + assert audit_logs["de"] == () + assert all(isinstance(entry, CacheAuditLogEntry) for entry in audit_logs["en"]) + + @given(enable_audit=st.booleans(), locales=locale_chains(min_size=1, max_size=3)) + @settings(max_examples=20, suppress_health_check=[HealthCheck.function_scoped_fixture]) + def test_property_audit_log_tracks_initialized_locales( + self, enable_audit: bool, locales: list[str] + ) -> None: + """PROPERTY: get_cache_audit_log() uses canonical locale keys.""" + l10n = FluentLocalization(locales, cache=CacheConfig(enable_audit=enable_audit)) + for locale in locales: + l10n.add_resource(locale, "msg = Hello\n") + + l10n.format_value("msg") + + audit_logs = l10n.get_cache_audit_log() + assert audit_logs is not None + normalized_locales = [normalize_locale(locale) for locale in locales] + assert list(audit_logs) == normalized_locales + + event(f"audit={'enabled' if enable_audit else 'disabled'}") + event(f"locale_count={len(locales)}") + + if enable_audit: + assert len(audit_logs[normalized_locales[0]]) >= 2 + assert all( + isinstance(entry, CacheAuditLogEntry) + for entry in audit_logs[normalized_locales[0]] + ) + else: + assert all(log == () for log in audit_logs.values()) + +class TestFormatPattern: + """Tests for format_pattern fallback chain edge cases.""" + + def 
test_format_pattern_not_found_returns_braced_id(self) -> None: + """format_pattern returns {message_id} when not found in any locale.""" + l10n = FluentLocalization(["en", "de"], strict=False) + result, errors = l10n.format_pattern("missing") + assert result == "{missing}" + assert len(errors) == 1 + + def test_format_pattern_primary_locale_skips_fallback_callback( + self, + ) -> None: + """format_pattern does not invoke on_fallback for primary locale.""" + from ftllexengine.localization import FallbackInfo + events: list[FallbackInfo] = [] + l10n = FluentLocalization( + ["en", "de"], on_fallback=events.append, + use_isolating=False, + ) + l10n.add_resource("en", "msg = Primary") + result, errors = l10n.format_pattern("msg") + assert result == "Primary" + assert errors == () + assert len(events) == 0 + +class TestRepr: + """Tests for __repr__ format.""" + + def test_includes_locales_and_bundle_count(self) -> None: + """__repr__ shows locales and initialized/total bundles.""" + l10n = FluentLocalization(["en", "de"]) + r = repr(l10n) + assert "FluentLocalization" in r + assert "locales=('en', 'de')" in r + assert "bundles=0/2" in r + + def test_bundle_count_updates_after_access(self) -> None: + """__repr__ bundle count reflects initialized bundles.""" + l10n = FluentLocalization(["en", "de"]) + l10n.add_resource("en", "msg = test") + r = repr(l10n) + assert "bundles=1/2" in r + +class TestOrchestrationProperties: + """Property-based tests for orchestration invariants. + + Standard @given tests with bounded strategies. Run in CI (no fuzz marker). 
+ """ + + @given( + locales=locale_chains(min_size=1, max_size=3), + message_id=message_ids(), + message_value=st.text(min_size=1, max_size=100), + ) + def test_format_value_never_crashes( + self, + locales: list[str], + message_id: str, + message_value: str, + ) -> None: + """format_value never crashes regardless of input (robustness).""" + event(f"locale_count={len(locales)}") + val_class = ( + "short" if len(message_value) <= 10 + else "medium" if len(message_value) <= 50 + else "long" + ) + event(f"value_len={val_class}") + l10n = FluentLocalization(locales, strict=False) + ftl_source = f"{message_id} = {message_value}" + l10n.add_resource(locales[0], ftl_source) + result, errors = l10n.format_value(message_id) + assert isinstance(result, str) + assert isinstance(errors, tuple) + + @given( + locales=locale_chains(min_size=2, max_size=5), + message_id=message_ids(), + target_locale_idx=st.integers(min_value=0, max_value=4), + ) + def test_fallback_uses_first_available_locale( + self, + locales: list[str], + message_id: str, + target_locale_idx: int, + ) -> None: + """Fallback resolves from first locale in chain that has message.""" + idx = min(target_locale_idx, len(locales) - 1) + event(f"target_idx={idx}") + l10n = FluentLocalization(locales) + target_locale = locales[idx] + l10n.add_resource( + target_locale, f"{message_id} = From {target_locale}", + ) + result, errors = l10n.format_value(message_id) + assert f"From {target_locale}" in result + assert not any( + "not found in any locale" in str(e) for e in errors + ) + + @given( + locales=locale_chains(min_size=1, max_size=3), + num_messages=st.integers(min_value=1, max_value=10), + ) + def test_partial_translations_use_correct_fallback( + self, locales: list[str], num_messages: int, + ) -> None: + """Partial translations correctly fall back per message.""" + event(f"num_messages={num_messages}") + has_fallback = len(locales) > 1 + event(f"has_fallback={has_fallback}") + l10n = FluentLocalization(locales, 
strict=False) + + msg_ids = [f"msg-{i}" for i in range(num_messages)] + + first_msgs = [m for i, m in enumerate(msg_ids) if i % 2 == 0] + if first_msgs: + ftl = "\n".join(f"{m} = First locale" for m in first_msgs) + l10n.add_resource(locales[0], ftl) + + if has_fallback: + last_msgs = [ + m for i, m in enumerate(msg_ids) if i % 2 == 1 + ] + if last_msgs: + ftl = "\n".join( + f"{m} = Last locale" for m in last_msgs + ) + l10n.add_resource(locales[-1], ftl) + + for idx, mid in enumerate(msg_ids): + result, errors = l10n.format_value(mid) + missing = any( + "not found in any locale" in str(e) + for e in errors + ) + if idx % 2 == 0: + assert "First locale" in result or missing + elif has_fallback: + assert "Last locale" in result or missing + + @settings(suppress_health_check=[HealthCheck.function_scoped_fixture]) + @given( + locales=locale_chains(min_size=1, max_size=3), + message_id=message_ids(), + ) + def test_loader_integration_deterministic( + self, + tmp_path: Path, + locales: list[str], + message_id: str, + ) -> None: + """Loader integration produces identical results across instances.""" + event(f"locale_count={len(locales)}") + locales_dir = tmp_path / "locales" + for idx, locale in enumerate(locales): + locale_dir = locales_dir / normalize_locale(locale) + locale_dir.mkdir(parents=True, exist_ok=True) + (locale_dir / "main.ftl").write_text( + f"{message_id} = Value {idx}", encoding="utf-8", + ) + + loader = PathResourceLoader(str(locales_dir / "{locale}")) + + l10n1 = FluentLocalization(locales, ["main.ftl"], loader) + result1, _ = l10n1.format_value(message_id) + + l10n2 = FluentLocalization(locales, ["main.ftl"], loader) + result2, _ = l10n2.format_value(message_id) + + assert result1 == result2 + + @given( + locales=locale_chains(min_size=2, max_size=4), + message_id=message_ids(), + ) + def test_locale_order_affects_resolution( + self, locales: list[str], message_id: str, + ) -> None: + """Reversing locale order changes which bundle resolves message.""" 
+ event(f"locale_count={len(locales)}") + l10n_fwd = FluentLocalization(locales) + l10n_rev = FluentLocalization(list(reversed(locales))) + + first_msg = f"{message_id} = From {locales[0]}" + last_msg = f"{message_id} = From {locales[-1]}" + + l10n_fwd.add_resource(locales[0], first_msg) + l10n_fwd.add_resource(locales[-1], last_msg) + + l10n_rev.add_resource(locales[0], first_msg) + l10n_rev.add_resource(locales[-1], last_msg) + + result_fwd, _ = l10n_fwd.format_value(message_id) + result_rev, _ = l10n_rev.format_value(message_id) + + if len(locales) > 1: + assert result_fwd != result_rev + + @given( + locales=locale_chains(min_size=1, max_size=1), + message_id=message_ids(), + value1=st.text( + alphabet=st.characters(whitelist_categories=("L", "N")), + min_size=1, max_size=50, + ), + value2=st.text( + alphabet=st.characters(whitelist_categories=("L", "N")), + min_size=1, max_size=50, + ), + ) + def test_add_resource_twice_uses_latest( + self, + locales: list[str], + message_id: str, + value1: str, + value2: str, + ) -> None: + """Adding resource twice uses latest value (override property).""" + event("outcome=override") + locale = locales[0] + l10n = FluentLocalization([locale]) + + l10n.add_resource(locale, f"{message_id} = {value1}") + result1, _ = l10n.format_value(message_id) + + l10n.add_resource(locale, f"{message_id} = {value2}") + result2, _ = l10n.format_value(message_id) + + assert value1 in result1 or value2 in result1 + assert value2 in result2 diff --git a/tests/localization_orchestration_cases/load_and_lookup.py b/tests/localization_orchestration_cases/load_and_lookup.py new file mode 100644 index 00000000..943e5dc9 --- /dev/null +++ b/tests/localization_orchestration_cases/load_and_lookup.py @@ -0,0 +1,367 @@ +# mypy: ignore-errors +from __future__ import annotations + +from pathlib import Path + +import pytest + +from ftllexengine.localization import ( + FluentLocalization, + LoadStatus, + LoadSummary, + PathResourceLoader, + ResourceLoadResult, 
+) +from ftllexengine.syntax.ast import Junk, Span + + +class TestResourceLoadResultStatusProperties: + """ResourceLoadResult status predicates are mutually exclusive.""" + + @pytest.mark.parametrize("status", list(LoadStatus)) + def test_status_properties_exclusive(self, status: LoadStatus) -> None: + """Exactly one of is_success/is_not_found/is_error is True.""" + result = ResourceLoadResult("en", "main.ftl", status) + flags = [result.is_success, result.is_not_found, result.is_error] + assert sum(flags) == 1 + + def test_has_junk_true_when_junk_present(self) -> None: + """has_junk is True when junk_entries is non-empty.""" + junk = Junk(content="bad", span=Span(start=0, end=3)) + result = ResourceLoadResult( + "en", "test.ftl", LoadStatus.SUCCESS, + junk_entries=(junk,), + ) + assert result.has_junk is True + + def test_has_junk_false_when_empty(self) -> None: + """has_junk is False when junk_entries is empty.""" + result = ResourceLoadResult( + "en", "test.ftl", LoadStatus.SUCCESS, junk_entries=(), + ) + assert result.has_junk is False + +class TestLoadSummaryStatistics: + """LoadSummary post_init and filtering methods.""" + + def _make_summary(self) -> LoadSummary: + """Build a LoadSummary with all three status types and junk.""" + junk = Junk(content="j", span=Span(start=0, end=1)) + results = ( + ResourceLoadResult("en", "ok.ftl", LoadStatus.SUCCESS), + ResourceLoadResult( + "en", "junk.ftl", LoadStatus.SUCCESS, + junk_entries=(junk,), + ), + ResourceLoadResult("de", "nf.ftl", LoadStatus.NOT_FOUND), + ResourceLoadResult( + "fr", "err.ftl", LoadStatus.ERROR, + error=OSError("fail"), + ), + ) + return LoadSummary(results=results) + + def test_post_init_calculates_counts(self) -> None: + """__post_init__ calculates all aggregate counts.""" + summary = self._make_summary() + assert summary.total_attempted == 4 + assert summary.successful == 2 + assert summary.not_found == 1 + assert summary.errors == 1 + assert summary.junk_count == 1 + + def 
test_get_errors_returns_error_results(self) -> None: + """get_errors returns only ERROR status results.""" + summary = self._make_summary() + errors = summary.get_errors() + assert len(errors) == 1 + assert errors[0].locale == "fr" + + def test_get_not_found_returns_not_found_results(self) -> None: + """get_not_found returns only NOT_FOUND status results.""" + summary = self._make_summary() + not_found = summary.get_not_found() + assert len(not_found) == 1 + assert not_found[0].locale == "de" + + def test_get_successful_returns_success_results(self) -> None: + """get_successful returns only SUCCESS status results.""" + summary = self._make_summary() + successful = summary.get_successful() + assert len(successful) == 2 + + def test_get_by_locale_filters_correctly(self) -> None: + """get_by_locale returns results for specified locale only.""" + summary = self._make_summary() + en_results = summary.get_by_locale("en") + assert len(en_results) == 2 + assert all(r.locale == "en" for r in en_results) + + def test_get_with_junk_returns_junk_results(self) -> None: + """get_with_junk returns results with non-empty junk_entries.""" + summary = self._make_summary() + junk_results = summary.get_with_junk() + assert len(junk_results) == 1 + assert junk_results[0].resource_id == "junk.ftl" + + def test_get_all_junk_flattens_entries(self) -> None: + """get_all_junk returns flattened tuple of all Junk entries.""" + summary = self._make_summary() + all_junk = summary.get_all_junk() + assert len(all_junk) == 1 + + def test_has_errors_property(self) -> None: + """has_errors reflects errors count.""" + summary = self._make_summary() + assert summary.has_errors is True + + clean = LoadSummary(results=( + ResourceLoadResult("en", "ok.ftl", LoadStatus.SUCCESS), + )) + assert clean.has_errors is False + + def test_has_junk_property(self) -> None: + """has_junk reflects junk_count.""" + summary = self._make_summary() + assert summary.has_junk is True + + def 
test_all_successful_ignores_junk(self) -> None: + """all_successful is True even with junk, if no errors/not_found.""" + junk = Junk(content="j", span=Span(start=0, end=1)) + summary = LoadSummary(results=( + ResourceLoadResult( + "en", "ok.ftl", LoadStatus.SUCCESS, + junk_entries=(junk,), + ), + )) + assert summary.all_successful is True + + def test_all_clean_requires_zero_junk(self) -> None: + """all_clean is False when junk exists even if all_successful.""" + junk = Junk(content="j", span=Span(start=0, end=1)) + summary = LoadSummary(results=( + ResourceLoadResult( + "en", "ok.ftl", LoadStatus.SUCCESS, + junk_entries=(junk,), + ), + )) + assert summary.all_successful is True + assert summary.all_clean is False + + def test_all_clean_true_when_no_issues(self) -> None: + """all_clean is True when no errors, not_found, or junk.""" + summary = LoadSummary(results=( + ResourceLoadResult("en", "ok.ftl", LoadStatus.SUCCESS), + )) + assert summary.all_clean is True + + def test_setattr_raises_attribute_error(self) -> None: + """LoadSummary rejects attribute mutation (frozen dataclass).""" + summary = self._make_summary() + with pytest.raises(AttributeError): + summary.results = () # type: ignore[misc] + + def test_delattr_raises_attribute_error(self) -> None: + """LoadSummary rejects attribute deletion (frozen dataclass).""" + summary = self._make_summary() + with pytest.raises(AttributeError): + del summary.results + + def test_eq_same_results_tuple(self) -> None: + """LoadSummary instances with same results tuple compare equal.""" + results = ( + ResourceLoadResult("en", "ok.ftl", LoadStatus.SUCCESS), + ) + s1 = LoadSummary(results=results) + s2 = LoadSummary(results=results) + assert s1 == s2 + + def test_eq_not_implemented_for_other_types(self) -> None: + """LoadSummary equality returns NotImplemented for non-LoadSummary.""" + summary = self._make_summary() + assert summary != "not a summary" # type: ignore[comparison-overlap] + # Direct dunder call required to 
test NotImplemented sentinel + result = LoadSummary.__eq__(summary, "not a summary") # pylint: disable=unnecessary-dunder-call # type: ignore[arg-type] + assert result is NotImplemented + + def test_hash_consistent_with_eq(self) -> None: + """Equal LoadSummary instances sharing results have equal hashes.""" + results = ( + ResourceLoadResult("en", "ok.ftl", LoadStatus.SUCCESS), + ) + s1 = LoadSummary(results=results) + s2 = LoadSummary(results=results) + assert hash(s1) == hash(s2) + + def test_repr_includes_counts(self) -> None: + """LoadSummary repr includes all aggregate counts.""" + summary = self._make_summary() + r = repr(summary) + assert "LoadSummary(" in r + assert "total=4" in r + assert "ok=2" in r + +class TestPathResourceLoaderInit: + """PathResourceLoader initialization edge cases.""" + + def test_empty_static_prefix_uses_cwd(self) -> None: + """base_path starting with {locale} uses cwd as root.""" + loader = PathResourceLoader("{locale}/resources") + assert loader._resolved_root == Path.cwd().resolve() + + def test_explicit_root_dir_overrides(self) -> None: + """Explicit root_dir overrides base_path derivation.""" + loader = PathResourceLoader( + "any/{locale}/path", root_dir="/tmp", + ) + assert loader._resolved_root == Path("/tmp").resolve() + + def test_trailing_separators_stripped(self) -> None: + """Trailing separators stripped from static prefix.""" + loader = PathResourceLoader("locales/{locale}////") + assert loader._resolved_root == Path("locales").resolve() + + def test_multiple_locale_placeholders(self) -> None: + """Multiple {locale} placeholders use first split part.""" + loader = PathResourceLoader("root/{locale}/sub/{locale}") + assert loader._resolved_root == Path("root").resolve() + +class TestHasAttribute: + """Tests for has_attribute fallback chain search.""" + + def test_attribute_in_primary_locale(self) -> None: + """has_attribute finds attribute in primary locale.""" + l10n = FluentLocalization(["en", "de"]) + 
l10n.add_resource("en", "btn = Click\n .tooltip = Help\n") + assert l10n.has_attribute("btn", "tooltip") is True + + def test_attribute_in_fallback_locale(self) -> None: + """has_attribute finds attribute in fallback locale.""" + l10n = FluentLocalization(["de", "en"]) + l10n.add_resource("en", "btn = Click\n .tooltip = Help\n") + assert l10n.has_attribute("btn", "tooltip") is True + + def test_attribute_not_found(self) -> None: + """has_attribute returns False for nonexistent attribute.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "msg = No attrs\n") + assert l10n.has_attribute("msg", "nonexistent") is False + + def test_message_not_found(self) -> None: + """has_attribute returns False for nonexistent message.""" + l10n = FluentLocalization(["en"]) + assert l10n.has_attribute("missing", "attr") is False + +class TestGetMessageIds: + """Tests for get_message_ids union across locales.""" + + def test_returns_union_of_ids(self) -> None: + """get_message_ids returns union across all locales.""" + l10n = FluentLocalization(["en", "de"]) + l10n.add_resource("en", "msg-a = A\nmsg-b = B\n") + l10n.add_resource("de", "msg-b = B2\nmsg-c = C\n") + ids = l10n.get_message_ids() + assert set(ids) == {"msg-a", "msg-b", "msg-c"} + + def test_no_duplicates(self) -> None: + """get_message_ids has no duplicate IDs.""" + l10n = FluentLocalization(["en", "de"]) + l10n.add_resource("en", "msg = A\n") + l10n.add_resource("de", "msg = B\n") + ids = l10n.get_message_ids() + assert len(ids) == len(set(ids)) + + def test_primary_locale_ids_first(self) -> None: + """Primary locale IDs appear before fallback IDs.""" + l10n = FluentLocalization(["en", "de"]) + l10n.add_resource("en", "alpha = A\n") + l10n.add_resource("de", "alpha = A2\nbeta = B\n") + ids = l10n.get_message_ids() + assert ids.index("alpha") < ids.index("beta") + + def test_empty_when_no_resources(self) -> None: + """get_message_ids is empty when no resources loaded.""" + l10n = FluentLocalization(["en"]) + 
assert l10n.get_message_ids() == [] + +class TestGetMessageVariables: + """Tests for get_message_variables with fallback.""" + + def test_returns_variable_names(self) -> None: + """get_message_variables returns frozenset of variable names.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource( + "en", "greeting = Hello { $firstName } { $lastName }!\n", + ) + variables = l10n.get_message_variables("greeting") + assert isinstance(variables, frozenset) + assert "firstName" in variables + assert "lastName" in variables + + def test_fallback_chain_search(self) -> None: + """get_message_variables searches fallback chain.""" + l10n = FluentLocalization(["de", "en"]) + l10n.add_resource("en", "msg = Value { $count }\n") + variables = l10n.get_message_variables("msg") + assert "count" in variables + + def test_raises_for_missing_message(self) -> None: + """get_message_variables raises KeyError for missing message.""" + l10n = FluentLocalization(["en"]) + with pytest.raises(KeyError, match="not found"): + l10n.get_message_variables("nonexistent") + +class TestGetAllMessageVariables: + """Tests for get_all_message_variables merged map.""" + + def test_returns_dict_of_variable_sets(self) -> None: + """get_all_message_variables returns dict mapping id -> frozenset.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource( + "en", "msg1 = { $name }\nmsg2 = Static\n", + ) + all_vars = l10n.get_all_message_variables() + assert isinstance(all_vars, dict) + assert "msg1" in all_vars + assert "name" in all_vars["msg1"] + assert "msg2" in all_vars + + def test_primary_locale_takes_precedence(self) -> None: + """Primary locale variables win for duplicate message IDs.""" + l10n = FluentLocalization(["en", "de"]) + l10n.add_resource("en", "msg = { $primary }\n") + l10n.add_resource("de", "msg = { $fallback }\n") + all_vars = l10n.get_all_message_variables() + assert "primary" in all_vars["msg"] + + def test_includes_fallback_only_messages(self) -> None: + """Messages only in 
fallback locales are included.""" + l10n = FluentLocalization(["en", "de"]) + l10n.add_resource("en", "en-only = { $x }\n") + l10n.add_resource("de", "de-only = { $y }\n") + all_vars = l10n.get_all_message_variables() + assert "en-only" in all_vars + assert "de-only" in all_vars + +class TestIntrospectTerm: + """Tests for introspect_term with fallback chain.""" + + def test_found_in_primary(self) -> None: + """introspect_term returns introspection from primary locale.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "-brand = Firefox\n") + info = l10n.introspect_term("brand") + assert info is not None + + def test_found_in_fallback(self) -> None: + """introspect_term searches fallback chain.""" + l10n = FluentLocalization(["de", "en"]) + l10n.add_resource("en", "-product = App\n") + info = l10n.introspect_term("product") + assert info is not None + + def test_not_found_returns_none(self) -> None: + """introspect_term returns None for missing term.""" + l10n = FluentLocalization(["en"]) + info = l10n.introspect_term("nonexistent") + assert info is None diff --git a/tests/localization_orchestration_cases/strict_and_boot.py b/tests/localization_orchestration_cases/strict_and_boot.py new file mode 100644 index 00000000..e3e3c088 --- /dev/null +++ b/tests/localization_orchestration_cases/strict_and_boot.py @@ -0,0 +1,360 @@ +# mypy: ignore-errors +from __future__ import annotations + +import pytest + +from ftllexengine.integrity import ( + FormattingIntegrityError, + IntegrityCheckFailedError, +) +from ftllexengine.localization import ( + FluentLocalization, + LoadSummary, +) + + +class TestStrictMode: + """Tests for FluentLocalization strict mode (fail-fast on errors).""" + + def test_strict_property_reflects_constructor(self) -> None: + """strict property returns constructor value.""" + l10n_strict = FluentLocalization(["en"], strict=True) + l10n_default = FluentLocalization(["en"]) + assert l10n_strict.strict is True + assert l10n_default.strict is 
True + + def test_strict_raises_on_missing_message(self) -> None: + """Strict mode raises FormattingIntegrityError for missing messages.""" + l10n = FluentLocalization(["en"], strict=True) + l10n.add_resource("en", "hello = Hello\n") + + with pytest.raises(FormattingIntegrityError) as exc_info: + l10n.format_value("nonexistent") + + err = exc_info.value + assert err.message_id == "nonexistent" + assert err.fallback_value is not None + assert len(err.fluent_errors) == 1 + ctx = err.context + assert ctx is not None + assert ctx.component == "localization" + assert ctx.operation == "format_pattern" + + def test_strict_raises_on_empty_message_id(self) -> None: + """Strict mode raises for empty/invalid message ID.""" + l10n = FluentLocalization(["en"], strict=True) + l10n.add_resource("en", "hello = Hello\n") + + with pytest.raises(FormattingIntegrityError) as exc_info: + l10n.format_value("") + + err = exc_info.value + assert err.message_id == "" + assert len(err.fluent_errors) == 1 + + def test_strict_format_pattern_raises_on_missing(self) -> None: + """Strict mode raises via format_pattern path.""" + l10n = FluentLocalization(["en"], strict=True) + l10n.add_resource("en", "hello = Hello\n") + + with pytest.raises(FormattingIntegrityError) as exc_info: + l10n.format_pattern("nonexistent") + + assert exc_info.value.message_id == "nonexistent" + + def test_strict_error_context_fields(self) -> None: + """Strict error includes component, operation, and count metadata.""" + l10n = FluentLocalization(["en"], strict=True) + + with pytest.raises(FormattingIntegrityError) as exc_info: + l10n.format_value("missing") + + err = exc_info.value + assert "failed:" in str(err) + ctx = err.context + assert ctx is not None + assert ctx.actual == "<1 error>" + assert ctx.expected == "" + + def test_strict_raises_on_invalid_args_type(self) -> None: + """Strict mode raises FormattingIntegrityError for invalid args type.""" + l10n = FluentLocalization(["en"], strict=True) + 
l10n.add_resource("en", "hello = Hello\n") + + with pytest.raises(FormattingIntegrityError) as exc_info: + l10n.format_pattern("hello", "not-a-mapping") # type: ignore[arg-type] + + err = exc_info.value + assert len(err.fluent_errors) == 1 + ctx = err.context + assert ctx is not None + assert ctx.component == "localization" + + def test_strict_raises_on_invalid_attribute_type(self) -> None: + """Strict mode raises FormattingIntegrityError for invalid attribute type.""" + l10n = FluentLocalization(["en"], strict=True) + l10n.add_resource("en", "hello = Hello\n") + + with pytest.raises(FormattingIntegrityError) as exc_info: + l10n.format_pattern( + "hello", attribute=42 # type: ignore[arg-type] + ) + + err = exc_info.value + assert len(err.fluent_errors) == 1 + ctx = err.context + assert ctx is not None + assert ctx.component == "localization" + + def test_non_strict_returns_fallback_on_invalid_args_type(self) -> None: + """Non-strict mode returns fallback for invalid args type without raising.""" + l10n = FluentLocalization(["en"], strict=False) + l10n.add_resource("en", "hello = Hello\n") + + _, errors = l10n.format_pattern("hello", "not-a-mapping") # type: ignore[arg-type] + assert len(errors) == 1 + + def test_strict_non_strict_returns_fallback(self) -> None: + """Non-strict mode returns fallback value without raising.""" + l10n = FluentLocalization(["en"], strict=False) + + result, errors = l10n.format_value("nonexistent") + assert "nonexistent" in result + assert len(errors) == 1 + +class TestResourceLoadingErrors: + """Tests for error handling during resource loading.""" + + def test_custom_loader_source_path_format(self) -> None: + """Non-PathResourceLoader uses locale/resource_id format.""" + + class DictLoader: + def load(self, locale: str, _resource_id: str) -> str: + return f"msg = Hello from {locale}\n" + + def describe_path(self, locale: str, resource_id: str) -> str: + return f"{locale}/{resource_id}" + + l10n = FluentLocalization( + ["en", "de"], 
["main.ftl"], DictLoader(), + ) + summary = l10n.get_load_summary() + assert summary.total_attempted == 2 + for result in summary.results: + assert result.source_path is not None + assert "/" in result.source_path + + def test_oserror_recorded_as_error(self) -> None: + """OSError during loading recorded with ERROR status.""" + + class FailLoader: + def load( + self, _locale: str, _resource_id: str, + ) -> str: + msg = "Permission denied" + raise OSError(msg) + + def describe_path(self, locale: str, resource_id: str) -> str: + return f"{locale}/{resource_id}" + + l10n = FluentLocalization(["en"], ["main.ftl"], FailLoader()) + summary = l10n.get_load_summary() + assert summary.errors == 1 + assert isinstance(summary.get_errors()[0].error, OSError) + + def test_valueerror_recorded_as_error(self) -> None: + """ValueError during loading recorded with ERROR status.""" + + class FailLoader: + def load( + self, _locale: str, _resource_id: str, + ) -> str: + msg = "Path traversal" + raise ValueError(msg) + + def describe_path(self, locale: str, resource_id: str) -> str: + return f"{locale}/{resource_id}" + + l10n = FluentLocalization(["en"], ["main.ftl"], FailLoader()) + summary = l10n.get_load_summary() + assert summary.errors == 1 + assert isinstance(summary.get_errors()[0].error, ValueError) + + def test_file_not_found_recorded_as_not_found(self) -> None: + """FileNotFoundError recorded as NOT_FOUND status.""" + + class MissingLoader: + def load( + self, _locale: str, _resource_id: str, + ) -> str: + msg = "Not found" + raise FileNotFoundError(msg) + + def describe_path(self, locale: str, resource_id: str) -> str: + return f"{locale}/{resource_id}" + + l10n = FluentLocalization(["en"], ["main.ftl"], MissingLoader()) + summary = l10n.get_load_summary() + assert summary.not_found == 1 + + def test_get_load_summary_returns_summary(self) -> None: + """get_load_summary returns LoadSummary from init phase.""" + l10n = FluentLocalization(["en"]) + summary = 
l10n.get_load_summary() + assert isinstance(summary, LoadSummary) + assert summary.total_attempted == 0 # No resource_ids provided + +class TestBootValidation: + """Tests for FluentLocalization boot-time validation helpers.""" + + def test_require_clean_returns_summary_when_all_resources_are_clean(self) -> None: + """require_clean returns the immutable load summary on success.""" + l10n = FluentLocalization(["en"]) + + summary = l10n.require_clean() + + assert isinstance(summary, LoadSummary) + assert summary.all_clean is True + assert summary.total_attempted == 0 + + def test_require_clean_raises_integrity_error_for_unclean_summary(self) -> None: + """require_clean raises IntegrityCheckFailedError with structured context.""" + + class MissingLoader: + def load(self, _locale: str, _resource_id: str) -> str: + msg = "missing" + raise FileNotFoundError(msg) + + def describe_path(self, locale: str, resource_id: str) -> str: + return f"{locale}/{resource_id}" + + l10n = FluentLocalization(["en"], ["main.ftl"], MissingLoader()) + + with pytest.raises(IntegrityCheckFailedError) as exc_info: + l10n.require_clean() + + err = exc_info.value + assert "not clean" in str(err) + ctx = err.context + assert ctx is not None + assert ctx.component == "localization" + assert ctx.operation == "require_clean" + assert ctx.key == "en/main.ftl" + assert ctx.expected == "LoadSummary(all_clean=True)" + + def test_validate_message_schemas_returns_results_in_input_order(self) -> None: + """validate_message_schemas returns immutable validation results on success.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource( + "en", + "first = Hello { $name }\n" + "second = Balance { $amount }\n", + ) + + results = l10n.validate_message_schemas({ + "first": frozenset({"name"}), + "second": frozenset({"amount"}), + }) + + assert [result.message_id for result in results] == ["first", "second"] + assert all(result.is_valid for result in results) + + def 
test_validate_message_schemas_uses_fallback_chain(self) -> None: + """Schema validation resolves messages from fallback locales.""" + l10n = FluentLocalization(["lv", "en"]) + l10n.add_resource("en", "welcome = Hello { $name }\n") + + results = l10n.validate_message_schemas({ + "welcome": frozenset({"name"}), + }) + + assert len(results) == 1 + assert results[0].is_valid is True + + def test_validate_message_variables_returns_single_result(self) -> None: + """Single-message boot validation returns the exact validation result.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "invoice = Total { $amount } for { $customer }\n") + + result = l10n.validate_message_variables( + "invoice", + frozenset({"amount", "customer"}), + ) + + assert result.message_id == "invoice" + assert result.is_valid is True + + def test_validate_message_variables_uses_fallback_chain(self) -> None: + """Single-message validation resolves through localization fallback.""" + l10n = FluentLocalization(["lv", "en"]) + l10n.add_resource("en", "welcome = Hello { $name }\n") + + result = l10n.validate_message_variables("welcome", frozenset({"name"})) + + assert result.message_id == "welcome" + assert result.is_valid is True + + def test_validate_message_schemas_raises_for_missing_message(self) -> None: + """Missing messages fail boot validation with an integrity exception.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "present = Hello\n") + + with pytest.raises(IntegrityCheckFailedError) as exc_info: + l10n.validate_message_schemas({"missing": frozenset()}) + + err = exc_info.value + assert "missing: not found" in str(err) + ctx = err.context + assert ctx is not None + assert ctx.operation == "validate_message_schemas" + assert ctx.key == "missing" + assert ctx.actual == "missing_messages=1" + + def test_validate_message_variables_raises_for_missing_message(self) -> None: + """Missing single-message validation raises IntegrityCheckFailedError.""" + l10n = 
FluentLocalization(["en"]) + + with pytest.raises(IntegrityCheckFailedError) as exc_info: + l10n.validate_message_variables("missing", frozenset()) + + err = exc_info.value + assert "missing: not found" in str(err) + ctx = err.context + assert ctx is not None + assert ctx.operation == "validate_message_variables" + assert ctx.key == "missing" + assert ctx.actual == "missing_messages=1" + + def test_validate_message_schemas_raises_for_exact_schema_mismatch(self) -> None: + """Extra or missing variables fail exact boot schema validation.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "checkout = Total { $amount } for { $customer }\n") + + with pytest.raises(IntegrityCheckFailedError) as exc_info: + l10n.validate_message_schemas({ + "checkout": frozenset({"amount"}), + }) + + err = exc_info.value + assert "checkout: extra {customer}" in str(err) + ctx = err.context + assert ctx is not None + assert ctx.operation == "validate_message_schemas" + assert ctx.key == "checkout" + assert ctx.actual == "schema_mismatches=1" + + def test_validate_message_variables_raises_for_exact_schema_mismatch(self) -> None: + """Single-message validation raises on exact-schema mismatch.""" + l10n = FluentLocalization(["en"]) + l10n.add_resource("en", "checkout = Total { $amount } for { $customer }\n") + + with pytest.raises(IntegrityCheckFailedError) as exc_info: + l10n.validate_message_variables("checkout", frozenset({"amount"})) + + err = exc_info.value + assert "checkout: extra {customer}" in str(err) + ctx = err.context + assert ctx is not None + assert ctx.operation == "validate_message_variables" + assert ctx.key == "checkout" + assert ctx.actual == "schema_mismatches=1" diff --git a/tests/parsing_currency_cases/__init__.py b/tests/parsing_currency_cases/__init__.py new file mode 100644 index 00000000..31b7b30b --- /dev/null +++ b/tests/parsing_currency_cases/__init__.py @@ -0,0 +1,68 @@ +"""Tests for currency parsing: parse_currency(), symbol resolution, CLDR maps. 
+ +Property-based tests using Hypothesis cover: +- Roundtrip: format -> parse -> verify for unambiguous/ISO inputs +- Locale resilience: arbitrary locales never crash +- Invalid input: no-digit strings always fail +- Ambiguous resolution: locale-aware symbol disambiguation +- CLDR map integrity: type contracts and coverage invariants + +Unit tests cover specification examples and targeted edge cases. + +parse_currency() returns tuple[tuple[Decimal, str] | None, tuple[FrozenFluentError, ...]]. +Functions never raise exceptions (errors returned in tuple) except +BabelImportError when Babel is not installed. + +Python 3.13+. +""" + +from __future__ import annotations + +import builtins +import re +from decimal import Decimal +from typing import Any +from unittest.mock import MagicMock, patch + +import pytest +from babel import UnknownLocaleError +from hypothesis import event, given, settings +from hypothesis import strategies as st + +from ftllexengine.parsing import currency as currency_module +from ftllexengine.parsing.currency import ( + _build_currency_maps_from_cldr, + _get_currency_maps, + parse_currency, + resolve_ambiguous_symbol, +) +from tests.strategies.currency import ( + ambiguous_currency_inputs, + invalid_currency_inputs, + iso_code_currency_inputs, + unambiguous_currency_inputs, +) + +__all__ = [ + "Any", + "Decimal", + "MagicMock", + "UnknownLocaleError", + "_build_currency_maps_from_cldr", + "_get_currency_maps", + "ambiguous_currency_inputs", + "builtins", + "currency_module", + "event", + "given", + "invalid_currency_inputs", + "iso_code_currency_inputs", + "parse_currency", + "patch", + "pytest", + "re", + "resolve_ambiguous_symbol", + "settings", + "st", + "unambiguous_currency_inputs", +] diff --git a/tests/parsing_currency_cases/babel_import_error_handling.py b/tests/parsing_currency_cases/babel_import_error_handling.py new file mode 100644 index 00000000..c9979ff4 --- /dev/null +++ b/tests/parsing_currency_cases/babel_import_error_handling.py 
@@ -0,0 +1,79 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# BabelImportError handling +# --------------------------------------------------------------------------- + + +class TestBabelImportError: + """Test Babel import error handling.""" + + def test_build_maps_returns_empty_when_babel_missing( + self, + ) -> None: + """_build_currency_maps_from_cldr returns empty without Babel.""" + import ftllexengine.core.babel_compat as _bc + + _build_currency_maps_from_cldr.cache_clear() + + original_import = builtins.__import__ + + def mock_import( + name: str, *args: object, **kwargs: object + ) -> object: + if name == "babel" or name.startswith("babel."): + msg = f"No module named '{name}'" + raise ImportError(msg) + return original_import(name, *args, **kwargs) # type: ignore[arg-type] + + # Reset sentinel so is_babel_available() re-evaluates under the mock + _bc._babel_available = None + + try: + with patch( + "builtins.__import__", side_effect=mock_import + ): + sym, amb, loc, codes = ( + _build_currency_maps_from_cldr() + ) + assert sym == {} + assert amb == set() + assert loc == {} + assert codes == frozenset() + finally: + _build_currency_maps_from_cldr.cache_clear() + # Reset sentinel so subsequent tests reinitialize with Babel available + _bc._babel_available = None + + def test_parse_currency_raises_babel_import_error( + self, + ) -> None: + """parse_currency raises BabelImportError without Babel.""" + import ftllexengine.core.babel_compat as _bc + from ftllexengine.core.babel_compat import BabelImportError + + _bc._babel_available = None + original_import = builtins.__import__ + + def mock_import( + name: str, *args: object, **kwargs: object + ) -> object: + if name == "babel" or name.startswith("babel."): + msg = f"No module named '{name}'" + raise 
ImportError(msg) + return original_import(name, *args, **kwargs) # type: ignore[arg-type] + + try: + with patch( + "builtins.__import__", side_effect=mock_import + ): + with pytest.raises(BabelImportError) as exc_info: + parse_currency("\u20ac100", "en_US") + + error_msg = str(exc_info.value) + assert "parse_currency" in error_msg + finally: + _bc._babel_available = None diff --git a/tests/parsing_currency_cases/build_currency_maps_from_cldr_exception_paths.py b/tests/parsing_currency_cases/build_currency_maps_from_cldr_exception_paths.py new file mode 100644 index 00000000..ca207858 --- /dev/null +++ b/tests/parsing_currency_cases/build_currency_maps_from_cldr_exception_paths.py @@ -0,0 +1,307 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# _build_currency_maps_from_cldr exception paths +# --------------------------------------------------------------------------- + + +class TestBuildCurrencyMapsExceptions: + """Test _build_currency_maps_from_cldr exception handling.""" + + @pytest.fixture(autouse=True) + def _clear_cache(self) -> None: + _build_currency_maps_from_cldr.cache_clear() + _get_currency_maps.cache_clear() + + def test_locale_parse_exception_handled(self) -> None: + """Locale.parse exceptions are caught gracefully.""" + from babel import Locale + + original_parse = Locale.parse + + def mock_parse(locale_id: str) -> Any: + if "broken" in locale_id.lower(): + msg = "Mocked parse failure" + raise ValueError(msg) + return original_parse(locale_id) + + with ( + patch.object(Locale, "parse", side_effect=mock_parse), + patch( + "babel.localedata.locale_identifiers", + return_value=["en_US", "broken_locale", "de_DE"], + ), + ): + sym, amb, loc, _ = _build_currency_maps_from_cldr() + + assert isinstance(sym, dict) + assert isinstance(amb, set) + assert 
isinstance(loc, dict) + + def test_key_error_in_currencies_access(self) -> None: + """KeyError when accessing locale.currencies is caught.""" + mock_locale = MagicMock() + mock_locale.currencies.keys.side_effect = KeyError("Mock") + + with ( + patch("babel.Locale.parse", return_value=mock_locale), + patch( + "babel.localedata.locale_identifiers", + return_value=["test_locale"], + ), + ): + sym, _, _, codes = _build_currency_maps_from_cldr() + + assert isinstance(sym, dict) + assert isinstance(codes, frozenset) + + def test_locale_with_currencies_none(self) -> None: + """Locale with currencies=None is handled.""" + mock_locale = MagicMock() + mock_locale.currencies = None + + with ( + patch("babel.Locale.parse", return_value=mock_locale), + patch( + "babel.localedata.locale_identifiers", + return_value=["test_locale"], + ), + ): + sym, amb, loc, _ = _build_currency_maps_from_cldr() + + assert isinstance(sym, dict) + assert isinstance(amb, set) + assert isinstance(loc, dict) + + def test_get_currency_symbol_exception(self) -> None: + """get_currency_symbol exceptions are caught.""" + + def mock_symbol( + currency_code: str, + locale: object = None, # noqa: ARG001 - unused + ) -> str: + if currency_code == "FAIL": + msg = "Mock symbol failure" + raise ValueError(msg) + return "$" if currency_code == "USD" else currency_code + + mock_locale = MagicMock() + mock_locale.currencies = {"USD": "Dollar", "FAIL": "Bad"} + mock_locale.territory = "US" + + with ( + patch( + "babel.numbers.get_currency_symbol", + side_effect=mock_symbol, + ), + patch( + "babel.localedata.locale_identifiers", + return_value=["en_US"], + ), + patch("babel.Locale.parse", return_value=mock_locale), + ): + sym, amb, _, _ = _build_currency_maps_from_cldr() + + assert isinstance(sym, dict) + assert isinstance(amb, set) + + def test_attribute_error_in_symbol_lookup(self) -> None: + """AttributeError in get_currency_symbol is caught.""" + + def mock_raises( + currency_code: str, # noqa: ARG001 - unused + 
locale: object = None, # noqa: ARG001 - unused + ) -> str: + msg = "Mock attribute error" + raise AttributeError(msg) + + mock_locale = MagicMock() + mock_locale.currencies = {"USD": "Dollar"} + mock_locale.territory = "US" + mock_locale.configure_mock( + **{"__str__.return_value": "en_US"}, + ) + + with ( + patch( + "babel.numbers.get_currency_symbol", + side_effect=mock_raises, + ), + patch("babel.Locale.parse", return_value=mock_locale), + patch( + "babel.localedata.locale_identifiers", + return_value=["en_US"], + ), + ): + sym, _, _, codes = _build_currency_maps_from_cldr() + + assert isinstance(sym, dict) + assert isinstance(codes, frozenset) + + def test_territory_currencies_exception(self) -> None: + """get_territory_currencies exception is caught.""" + + def mock_territory(territory: str) -> list[str]: + if territory == "XX": + msg = "Unknown territory" + raise ValueError(msg) + return ["USD"] + + mock_us = MagicMock() + mock_us.territory = "US" + mock_us.currencies = {} + mock_us.configure_mock( + **{"__str__.return_value": "en_US"}, + ) + + mock_xx = MagicMock() + mock_xx.territory = "XX" + mock_xx.currencies = {} + mock_xx.configure_mock( + **{"__str__.return_value": "xx_XX"}, + ) + + def mock_parse(locale_id: str) -> MagicMock: + return mock_xx if locale_id == "xx_XX" else mock_us + + with ( + patch( + "babel.numbers.get_territory_currencies", + side_effect=mock_territory, + ), + patch( + "babel.localedata.locale_identifiers", + return_value=["en_US", "xx_XX"], + ), + patch("babel.Locale.parse", side_effect=mock_parse), + ): + _, _, loc, _ = _build_currency_maps_from_cldr() + + assert isinstance(loc, dict) + + def test_unknown_locale_error_in_territory_lookup(self) -> None: + """UnknownLocaleError in get_territory_currencies is caught.""" + + def mock_raises( + territory: str, # noqa: ARG001 - unused + ) -> list[str]: + msg = "Mock unknown locale" + raise UnknownLocaleError(msg) + + mock_locale = MagicMock() + mock_locale.territory = "XX" + 
mock_locale.currencies = {} + mock_locale.configure_mock( + **{"__str__.return_value": "xx_XX"}, + ) + + with ( + patch( + "babel.numbers.get_territory_currencies", + side_effect=mock_raises, + ), + patch("babel.Locale.parse", return_value=mock_locale), + patch( + "babel.localedata.locale_identifiers", + return_value=["xx_XX"], + ), + ): + _, _, _, codes = _build_currency_maps_from_cldr() + + assert isinstance(codes, frozenset) + + def test_locale_without_territory(self) -> None: + """Locale without territory is handled.""" + mock_locale = MagicMock() + mock_locale.territory = None + mock_locale.currencies = {} + + with ( + patch("babel.Locale.parse", return_value=mock_locale), + patch( + "babel.localedata.locale_identifiers", + return_value=["en"], + ), + ): + _, _, loc, _ = _build_currency_maps_from_cldr() + + assert isinstance(loc, dict) + + def test_locale_str_without_underscore_excluded(self) -> None: + """Locale str without underscore is not in locale_to_currency.""" + mock_locale = MagicMock() + mock_locale.territory = "XX" + mock_locale.currencies = {} + mock_locale.configure_mock( + **{"__str__.return_value": "en"}, + ) + + with ( + patch("babel.Locale.parse", return_value=mock_locale), + patch( + "babel.localedata.locale_identifiers", + return_value=["en"], + ), + patch( + "babel.numbers.get_territory_currencies", + return_value=["GBP"], + ), + ): + _, _, loc, _ = _build_currency_maps_from_cldr() + + assert "en" not in loc + + def test_empty_territory_currencies(self) -> None: + """get_territory_currencies returning empty list is handled.""" + mock_locale = MagicMock() + mock_locale.territory = "US" + mock_locale.currencies = {} + mock_locale.configure_mock( + **{"__str__.return_value": "en_US"}, + ) + + with ( + patch("babel.Locale.parse", return_value=mock_locale), + patch( + "babel.localedata.locale_identifiers", + return_value=["en_US"], + ), + patch( + "babel.numbers.get_territory_currencies", + return_value=[], + ), + ): + _, _, loc, _ = 
_build_currency_maps_from_cldr() + + assert isinstance(loc, dict) + + @given(locale_count=st.integers(min_value=1, max_value=5)) + @settings(max_examples=10) + def test_handles_various_locale_counts( + self, locale_count: int + ) -> None: + """PROPERTY: Function handles any number of locales.""" + event(f"locale_count={locale_count}") + + _build_currency_maps_from_cldr.cache_clear() + mock_locales = [f"mock_{i}" for i in range(locale_count)] + + mock_locale = MagicMock() + mock_locale.territory = None + mock_locale.currencies = {} + + with ( + patch("babel.Locale.parse", return_value=mock_locale), + patch( + "babel.localedata.locale_identifiers", + return_value=mock_locales, + ), + ): + sym, amb, loc, _ = _build_currency_maps_from_cldr() + + assert isinstance(sym, dict) + assert isinstance(amb, set) + assert isinstance(loc, dict) diff --git a/tests/parsing_currency_cases/cache_management.py b/tests/parsing_currency_cases/cache_management.py new file mode 100644 index 00000000..b3927b24 --- /dev/null +++ b/tests/parsing_currency_cases/cache_management.py @@ -0,0 +1,35 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# Cache management +# --------------------------------------------------------------------------- + + +class TestClearCurrencyCaches: + """Test clear_currency_caches function.""" + + def test_executes_without_error(self) -> None: + """clear_currency_caches executes without error.""" + from ftllexengine.parsing.currency import clear_currency_caches + + clear_currency_caches() + + def test_invalidates_caches(self) -> None: + """clear_currency_caches clears cached data.""" + from ftllexengine.parsing.currency import clear_currency_caches + + maps1 = _get_currency_maps() + clear_currency_caches() + maps2 = _get_currency_maps() + assert len(maps1[0]) == 
len(maps2[0]) + + def test_idempotent(self) -> None: + """Multiple calls are safe.""" + from ftllexengine.parsing.currency import clear_currency_caches + + clear_currency_caches() + clear_currency_caches() + clear_currency_caches() diff --git a/tests/parsing_currency_cases/cldr_map_integrity.py b/tests/parsing_currency_cases/cldr_map_integrity.py new file mode 100644 index 00000000..8c65e251 --- /dev/null +++ b/tests/parsing_currency_cases/cldr_map_integrity.py @@ -0,0 +1,91 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# CLDR map integrity +# --------------------------------------------------------------------------- + + +class TestCLDRMapIntegrity: + """Test CLDR currency map structural invariants.""" + + REQUIRED_CURRENCIES: frozenset[str] = frozenset({ + "USD", "EUR", "JPY", "GBP", "CHF", "AUD", "NZD", "CAD", + "CNY", "HKD", "SGD", "SEK", "NOK", "DKK", "KRW", + "INR", "RUB", "TRY", "ZAR", "MXN", "BRL", + "PLN", "CZK", "HUF", "RON", "BGN", + }) + + def test_symbol_lookup_locales_discover_major_currencies( + self, + ) -> None: + """Hardcoded locale list discovers major currency symbols.""" + symbol_map, _, _, _ = _get_currency_maps() + discovered: set[str] = set(symbol_map.values()) + missing = self.REQUIRED_CURRENCIES - discovered + max_missing = len(self.REQUIRED_CURRENCIES) // 5 + assert len(missing) <= max_missing, ( + f"Too many major currencies missing: {sorted(missing)}. 
" + f"Max allowed: {max_missing}, got: {len(missing)}" + ) + + def test_locale_to_currency_covers_major_territories( + self, + ) -> None: + """Locale-to-currency mapping covers major territories.""" + _, _, locale_to_currency, _ = _get_currency_maps() + expected_locales = { + "en_US", "en_GB", "en_CA", "en_AU", + "de_DE", "de_AT", "de_CH", + "fr_FR", "fr_CA", + "ja_JP", "zh_CN", "ko_KR", + "es_ES", "es_MX", "pt_BR", + "lv_LV", "et_EE", "lt_LT", + } + found = expected_locales & set(locale_to_currency.keys()) + missing = expected_locales - found + min_coverage = len(expected_locales) * 0.8 + assert len(found) >= min_coverage, ( + f"Insufficient: {len(found)}/{len(expected_locales)}. " + f"Missing: {sorted(missing)}" + ) + + def test_symbol_map_normalizes_bidi_wrapped_arabic_symbols(self) -> None: + """CLDR symbol map stores Arabic symbols without formatting-only marks.""" + symbol_map, _, _, _ = _get_currency_maps() + assert symbol_map["ج.م."] == "EGP" + assert symbol_map["د.إ."] == "AED" + + def test_returns_correct_types(self) -> None: + """_build_currency_maps_from_cldr returns correct types.""" + sym, amb, loc, codes = _build_currency_maps_from_cldr() + for s, c in sym.items(): + assert isinstance(s, str) + assert isinstance(c, str) + for s in amb: + assert isinstance(s, str) + for l_key, l_val in loc.items(): + assert isinstance(l_key, str) + assert isinstance(l_val, str) + assert isinstance(codes, frozenset) + + def test_euro_is_unambiguous(self) -> None: + """EUR symbol is in the unambiguous map.""" + sym, amb, _, _ = _build_currency_maps_from_cldr() + assert "\u20ac" in sym or "\u20ac" not in amb + if "\u20ac" in sym: + assert sym["\u20ac"] == "EUR" + + def test_dollar_is_ambiguous(self) -> None: + """$ symbol is in the ambiguous set.""" + _, amb, _, _ = _build_currency_maps_from_cldr() + assert "$" in amb + + def test_currency_maps_caching(self) -> None: + """_get_currency_maps_full returns same cached object.""" + result1 = 
currency_module._get_currency_maps_full() + result2 = currency_module._get_currency_maps_full() + assert result1 is result2 + assert len(result1) == 4 diff --git a/tests/parsing_currency_cases/fast_tier_operations.py b/tests/parsing_currency_cases/fast_tier_operations.py new file mode 100644 index 00000000..04fa57f8 --- /dev/null +++ b/tests/parsing_currency_cases/fast_tier_operations.py @@ -0,0 +1,59 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# Fast tier operations +# --------------------------------------------------------------------------- + + +class TestFastTierOperations: + """Test fast tier currency operations (no CLDR scan).""" + + def test_fast_tier_symbols_available(self) -> None: + """Fast tier unambiguous symbols always available.""" + from ftllexengine.parsing.currency import ( + _FAST_TIER_UNAMBIGUOUS_SYMBOLS, + _get_currency_maps_fast, + ) + + symbols, _, _, _ = _get_currency_maps_fast() + assert len(symbols) > 0 + assert "\u20ac" in symbols + assert symbols["\u20ac"] == "EUR" + assert symbols == _FAST_TIER_UNAMBIGUOUS_SYMBOLS + + def test_currency_pattern_compiles_and_matches(self) -> None: + """Currency regex pattern compiles and matches.""" + from ftllexengine.parsing.currency import ( + _get_currency_pattern, + ) + + _get_currency_pattern.cache_clear() + try: + pattern = _get_currency_pattern() + assert pattern.search("\u20ac100") is not None + assert pattern.search("USD 100") is not None + finally: + _get_currency_pattern.cache_clear() + + def test_currency_pattern_longest_match_first(self) -> None: + """Currency pattern matches multi-char symbols before prefixes.""" + from ftllexengine.parsing.currency import ( + _get_currency_pattern, + ) + + _get_currency_pattern.cache_clear() + try: + pattern = _get_currency_pattern() + # Rs must 
match before R + m = pattern.search("Rs100") + assert m is not None + assert m.group() == "Rs" + # kr. must match before kr + m = pattern.search("kr.500") + assert m is not None + assert m.group() == "kr." + finally: + _get_currency_pattern.cache_clear() diff --git a/tests/parsing_currency_cases/locale_to_currency_fallback.py b/tests/parsing_currency_cases/locale_to_currency_fallback.py new file mode 100644 index 00000000..6dac27e7 --- /dev/null +++ b/tests/parsing_currency_cases/locale_to_currency_fallback.py @@ -0,0 +1,89 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# Locale-to-currency fallback +# --------------------------------------------------------------------------- + + +class TestLocaleToCurrencyFallback: + """Test locale-to-currency inference fallback.""" + + def test_dollar_inferred_from_en_us(self) -> None: + """$ inferred as USD from en_US.""" + result, errors = parse_currency( + "$100", "en_US", infer_from_locale=True, + ) + assert errors == () + assert result is not None + assert result[1] == "USD" + + def test_dollar_resolves_to_usd_in_de_de(self) -> None: + """$ resolves to USD in de_DE (dollar sign is unambiguous).""" + result, errors = parse_currency( + "$100", "de_DE", infer_from_locale=True, + ) + assert errors == () + assert result is not None + assert result[1] == "USD" + + def test_cldr_only_ambiguous_symbol_locale_fallback(self) -> None: + """CLDR-only ambiguous symbol resolves via locale-to-currency map. + + Rs is ambiguous in CLDR (INR, PKR, etc.) but not in the fast-tier + ambiguous set. resolve_ambiguous_symbol returns None, so resolution + falls through to the CLDR locale-to-currency mapping. 
+ """ + result, errors = parse_currency( + "Rs 500", "hi_IN", infer_from_locale=True, + ) + assert errors == () + assert result is not None + assert result == (Decimal(500), "INR") + + def test_cldr_only_ambiguous_kr_dot_locale_fallback(self) -> None: + """kr. (Nordic krona with period) resolves via locale-to-currency map. + + kr. is ambiguous in CLDR (DKK, NOK, SEK, ISK) but not in the fast-tier + ambiguous set. Falls through to locale-to-currency mapping. + """ + result, errors = parse_currency( + "kr.500", "da_DK", infer_from_locale=True, + ) + assert errors == () + assert result is not None + assert result == (Decimal(500), "DKK") + + def test_no_resolution_available(self) -> None: + """Empty currency maps cause resolution failure.""" + with ( + patch( + "ftllexengine.parsing.currency.resolve_ambiguous_symbol", + return_value=None, + ), + patch( + "ftllexengine.parsing.currency._get_currency_maps", + return_value=( + {}, + {"$"}, + {}, + frozenset({"USD"}), + ), + ), + ): + result, errors = parse_currency( + "$100", "en_US", infer_from_locale=True, + ) + + assert result is None + assert len(errors) == 1 + + def test_kr_unknown_locale_defaults_to_sek(self) -> None: + """kr symbol with unknown locale defaults to SEK.""" + result, error = currency_module._resolve_currency_code( + "kr", "xx_UNKNOWN", "kr 100", + default_currency=None, infer_from_locale=True, + ) + assert result == "SEK" or error is not None diff --git a/tests/parsing_currency_cases/parse_currency_error_paths.py b/tests/parsing_currency_cases/parse_currency_error_paths.py new file mode 100644 index 00000000..aa360d87 --- /dev/null +++ b/tests/parsing_currency_cases/parse_currency_error_paths.py @@ -0,0 +1,63 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# parse_currency: Error paths +# 
--------------------------------------------------------------------------- + + +class TestParseCurrencyErrors: + """Test error handling in parse_currency.""" + + def test_no_symbol_returns_error(self) -> None: + """Missing currency symbol returns error.""" + result, errors = parse_currency("1,234.56", "en_US") + assert result is None + assert len(errors) == 1 + + def test_invalid_input_returns_error(self) -> None: + """Non-parseable input returns error.""" + result, errors = parse_currency("invalid", "en_US") + assert result is None + assert len(errors) == 1 + + def test_invalid_number_with_symbol(self) -> None: + """Invalid number with currency symbol returns error.""" + result, errors = parse_currency("\u20acinvalid", "en_US") + assert result is None + assert len(errors) == 1 + + def test_empty_string(self) -> None: + """Empty string returns error.""" + result, errors = parse_currency("", "en_US") + assert result is None + assert len(errors) == 1 + + def test_only_symbol(self) -> None: + """Symbol without number returns error.""" + result, errors = parse_currency("\u20ac", "en_US") + assert result is None + assert len(errors) == 1 + + def test_invalid_locale(self) -> None: + """Invalid locale returns error with locale info.""" + result, errors = parse_currency( + "\u20ac10.50", "invalid_LOCALE_CODE", + ) + assert result is None + assert len(errors) == 1 + assert any("locale" in str(err).lower() for err in errors) + + def test_malformed_locale(self) -> None: + """Malformed locale returns error.""" + result, errors = parse_currency("$100", "!!!invalid@@@") + assert result is None + assert len(errors) == 1 + + def test_ambiguous_without_default_returns_error(self) -> None: + """$ without default_currency or inference returns error.""" + result, errors = parse_currency("$100", "en_US") + assert result is None + assert len(errors) == 1 diff --git a/tests/parsing_currency_cases/parse_currency_specification_examples.py 
b/tests/parsing_currency_cases/parse_currency_specification_examples.py new file mode 100644 index 00000000..6e96503b --- /dev/null +++ b/tests/parsing_currency_cases/parse_currency_specification_examples.py @@ -0,0 +1,102 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# parse_currency: Specification examples +# --------------------------------------------------------------------------- + + +class TestParseCurrencySpecificationExamples: + """Specification examples for parse_currency behavior.""" + + def test_eur_symbol_prefix(self) -> None: + """EUR symbol prefix: EUR100.50 -> (100.50, EUR).""" + result, errors = parse_currency("\u20ac100.50", "en_US") + assert not errors + assert result is not None + assert result == (Decimal("100.50"), "EUR") + + def test_eur_symbol_suffix_latvian(self) -> None: + """EUR symbol suffix: 100,50 EUR -> (100.50, EUR) in lv_LV.""" + result, errors = parse_currency("100,50 \u20ac", "lv_LV") + assert not errors + assert result is not None + assert result == (Decimal("100.50"), "EUR") + + def test_usd_with_default_currency(self) -> None: + """$ with default_currency=USD resolves correctly.""" + result, errors = parse_currency( + "$1,234.56", "en_US", default_currency="USD", + ) + assert not errors + assert result is not None + assert result[0] == Decimal("1234.56") + assert result[1] == "USD" + + def test_iso_code_prefix(self) -> None: + """ISO code prefix: USD 1,234.56 -> (1234.56, USD).""" + result, errors = parse_currency("USD 1,234.56", "en_US") + assert not errors + assert result is not None + assert result == (Decimal("1234.56"), "USD") + + def test_iso_code_german_format(self) -> None: + """German format: EUR 1.234,56 -> (1234.56, EUR).""" + result, errors = parse_currency("EUR 1.234,56", "de_DE") + assert not errors + assert 
result is not None + assert result == (Decimal("1234.56"), "EUR") + + def test_rupee_unambiguous(self) -> None: + """Indian Rupee symbol is unambiguous.""" + result, errors = parse_currency("\u20b91000", "hi_IN") + assert not errors + assert result is not None + assert result[1] == "INR" + + def test_arabic_indic_digits_ar_eg(self) -> None: + """Arabic-Indic digits parse for locales with non-Latin defaults.""" + result, errors = parse_currency( + "US$ \u0661\u0662\u066c\u0663\u0664\u0665\u066b\u0666\u0667", + "ar_EG", + ) + assert not errors + assert result is not None + assert result[0] == Decimal("12345.67") + assert result[1] == "USD" + + def test_swiss_franc_iso(self) -> None: + """Swiss Franc via ISO code.""" + result, errors = parse_currency("CHF 100", "de_CH") + assert not errors + assert result is not None + assert result == (Decimal(100), "CHF") + + def test_cny_chinese_locale(self) -> None: + """Yen symbol resolves to CNY in Chinese locales.""" + result, errors = parse_currency( + "\u00a51000", "zh_CN", infer_from_locale=True, + ) + assert not errors + assert result is not None + assert result[1] == "CNY" + + def test_jpy_japanese_locale(self) -> None: + """Yen symbol resolves to JPY in Japanese locales.""" + result, errors = parse_currency( + "\u00a512,345", "ja_JP", infer_from_locale=True, + ) + assert not errors + assert result is not None + assert result[1] == "JPY" + + def test_gbp_british_locale(self) -> None: + """Pound symbol resolves to GBP in British locales.""" + result, errors = parse_currency( + "\u00a3999.99", "en_GB", infer_from_locale=True, + ) + assert not errors + assert result is not None + assert result == (Decimal("999.99"), "GBP") diff --git a/tests/parsing_currency_cases/pattern_compilation_fallback.py b/tests/parsing_currency_cases/pattern_compilation_fallback.py new file mode 100644 index 00000000..066fa0c4 --- /dev/null +++ b/tests/parsing_currency_cases/pattern_compilation_fallback.py @@ -0,0 +1,34 @@ +# mypy: ignore-errors 
+"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# Pattern compilation fallback +# --------------------------------------------------------------------------- + + +class TestPatternCompilationFallback: + """Test pattern compilation with empty symbol maps.""" + + def test_pattern_fallback_with_empty_symbols(self) -> None: + """Pattern falls back to ISO-code-only when no symbols.""" + from ftllexengine.parsing.currency import ( + _get_currency_pattern, + ) + + _get_currency_pattern.cache_clear() + + with patch( + "ftllexengine.parsing.currency._get_currency_maps", + return_value=({}, set(), {}, frozenset()), + ): + _get_currency_pattern.cache_clear() + pattern = _get_currency_pattern() + + assert isinstance(pattern, re.Pattern) + assert pattern.search("USD") is not None + assert pattern.search("\u20ac") is None + + _get_currency_pattern.cache_clear() + _get_currency_maps.cache_clear() diff --git a/tests/parsing_currency_cases/property_ambiguous_symbols_with_locale_inference.py b/tests/parsing_currency_cases/property_ambiguous_symbols_with_locale_inference.py new file mode 100644 index 00000000..708b9f68 --- /dev/null +++ b/tests/parsing_currency_cases/property_ambiguous_symbols_with_locale_inference.py @@ -0,0 +1,52 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# Property: Ambiguous symbols with locale inference +# --------------------------------------------------------------------------- + + +class TestAmbiguousSymbolResolution: + """Property-based tests for ambiguous symbol resolution.""" + + @given(data=ambiguous_currency_inputs()) + def test_ambiguous_with_default_resolves( + 
self, data: tuple[str, str, str, str] + ) -> None: + """PROPERTY: Ambiguous symbols with default_currency resolve.""" + value, locale, default_currency, expected = data + event(f"locale={locale}") + + result, errors = parse_currency( + value, locale, default_currency=default_currency, + ) + if result is not None: + _, code = result + assert code == expected + assert errors == () + + @given( + locale_currency=st.sampled_from([ + ("en_US", "USD"), ("en_CA", "CAD"), + ("en_AU", "AUD"), ("en_NZ", "NZD"), + ("es_MX", "MXN"), ("es_AR", "ARS"), + ]) + ) + def test_dollar_locale_inference( + self, locale_currency: tuple[str, str] + ) -> None: + """PROPERTY: $ with infer_from_locale resolves per locale.""" + locale, expected = locale_currency + event(f"dollar_locale={locale}") + + result, errors = parse_currency( + "$100", locale, infer_from_locale=True, + ) + assert result is not None, ( + f"$ should resolve via locale {locale}" + ) + _, code = result + assert code == expected + assert errors == () diff --git a/tests/parsing_currency_cases/property_arbitrary_locales_never_crash.py b/tests/parsing_currency_cases/property_arbitrary_locales_never_crash.py new file mode 100644 index 00000000..cad88d66 --- /dev/null +++ b/tests/parsing_currency_cases/property_arbitrary_locales_never_crash.py @@ -0,0 +1,33 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# Property: Arbitrary locales never crash +# --------------------------------------------------------------------------- + + +class TestLocaleResilience: + """Property-based tests for locale robustness.""" + + @given( + bad_locale=st.text( + alphabet=st.characters(blacklist_categories=["Cs"]), + min_size=1, + max_size=20, + ).filter(lambda x: x not in ["en", "en_US", "de_DE", "fr_FR"]) + ) + def 
test_arbitrary_locales_never_crash( + self, bad_locale: str + ) -> None: + """PROPERTY: Invalid locales never crash currency parsing.""" + locale_len = "short" if len(bad_locale) <= 5 else "long" + event(f"locale_length={locale_len}") + has_underscore = "_" in bad_locale + event(f"has_underscore={has_underscore}") + + result, errors = parse_currency("\u20ac50", bad_locale) + assert result is None or isinstance(result, tuple) + if result is None: + assert len(errors) > 0 diff --git a/tests/parsing_currency_cases/property_invalid_inputs_never_crash_always_return_errors.py b/tests/parsing_currency_cases/property_invalid_inputs_never_crash_always_return_errors.py new file mode 100644 index 00000000..3fcbd48c --- /dev/null +++ b/tests/parsing_currency_cases/property_invalid_inputs_never_crash_always_return_errors.py @@ -0,0 +1,44 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# Property: Invalid inputs never crash, always return errors +# --------------------------------------------------------------------------- + + +class TestInvalidCurrencyInputs: + """Property-based tests for invalid currency input handling.""" + + @given(data=invalid_currency_inputs()) + def test_invalid_input_returns_error( + self, data: tuple[str, str] + ) -> None: + """PROPERTY: Invalid inputs return error tuple, never crash.""" + value, locale = data + is_empty = value == "" + event(f"is_empty={is_empty}") + + result, errors = parse_currency(value, locale) + assert result is None + assert len(errors) > 0 + + @given( + invalid_value=st.text(min_size=1, max_size=30).filter( + lambda x: not any(c.isdigit() for c in x) + ) + ) + def test_no_digits_always_fails( + self, invalid_value: str + ) -> None: + """PROPERTY: Values without digits always fail to parse.""" + has_currency_char = any( + c 
in invalid_value for c in "\u20ac$\u00a3\u00a5\u20b9" + ) + event(f"has_currency_char={has_currency_char}") + val_len = "short" if len(invalid_value) <= 5 else "long" + event(f"value_length={val_len}") + + result, _ = parse_currency(invalid_value, "en_US") + assert result is None diff --git a/tests/parsing_currency_cases/property_iso_code_inputs_always_resolve_correctly.py b/tests/parsing_currency_cases/property_iso_code_inputs_always_resolve_correctly.py new file mode 100644 index 00000000..6d6cfa25 --- /dev/null +++ b/tests/parsing_currency_cases/property_iso_code_inputs_always_resolve_correctly.py @@ -0,0 +1,28 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# Property: ISO code inputs always resolve correctly +# --------------------------------------------------------------------------- + + +class TestISOCodeParsing: + """Property-based tests for ISO code currency parsing.""" + + @settings(deadline=None) + @given(data=iso_code_currency_inputs()) + def test_iso_code_parses_to_correct_currency( + self, data: tuple[str, str, str] + ) -> None: + """PROPERTY: ISO codes resolve to the correct currency.""" + value, locale, expected_code = data + event(f"iso_code={expected_code}") + + result, errors = parse_currency(value, locale) + assert result is not None, f"Failed to parse: {value!r} ({locale})" + amount, code = result + assert code == expected_code + assert isinstance(amount, Decimal) + assert errors == () diff --git a/tests/parsing_currency_cases/property_unambiguous_symbols_always_parse_successfully.py b/tests/parsing_currency_cases/property_unambiguous_symbols_always_parse_successfully.py new file mode 100644 index 00000000..65ca8018 --- /dev/null +++ b/tests/parsing_currency_cases/property_unambiguous_symbols_always_parse_successfully.py @@ -0,0 +1,29 @@ 
+# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# Property: Unambiguous symbols always parse successfully +# --------------------------------------------------------------------------- + + +class TestUnambiguousCurrencyParsing: + """Property-based tests for unambiguous currency parsing.""" + + @settings(deadline=None) # CLDR + numbering-system warmup varies on first call + @given(data=unambiguous_currency_inputs()) + def test_unambiguous_symbol_parses( + self, data: tuple[str, str, str] + ) -> None: + """PROPERTY: Unambiguous symbols and ISO codes always parse.""" + value, locale, expected_code = data + event(f"expected_code={expected_code}") + + result, errors = parse_currency(value, locale) + # Unambiguous symbols should parse without error + if result is not None: + amount, code = result + assert code == expected_code + assert isinstance(amount, Decimal) + assert errors == () diff --git a/tests/parsing_currency_cases/resolve_ambiguous_symbol_locale_prefix_fallback.py b/tests/parsing_currency_cases/resolve_ambiguous_symbol_locale_prefix_fallback.py new file mode 100644 index 00000000..78ed437c --- /dev/null +++ b/tests/parsing_currency_cases/resolve_ambiguous_symbol_locale_prefix_fallback.py @@ -0,0 +1,87 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# resolve_ambiguous_symbol: Locale prefix fallback +# --------------------------------------------------------------------------- + + +class TestResolveAmbiguousSymbolLocalePrefix: + """Test resolve_ambiguous_symbol locale prefix matching.""" + + def test_yen_sign_with_zh_cn_uses_prefix(self) -> None: + """Yen 
sign resolves to CNY via zh prefix for zh_CN.""" + result = resolve_ambiguous_symbol("\u00a5", "zh_CN") + assert result == "CNY" + + def test_yen_sign_with_zh_tw_uses_prefix(self) -> None: + """Yen sign resolves to CNY via zh prefix for zh_TW.""" + result = resolve_ambiguous_symbol("\u00a5", "zh_TW") + assert result == "CNY" + + def test_yen_sign_with_zh_hk_uses_prefix(self) -> None: + """Yen sign resolves to CNY via zh prefix for zh_HK.""" + result = resolve_ambiguous_symbol("\u00a5", "zh_HK") + assert result == "CNY" + + def test_pound_sign_with_en_gb_exact_match(self) -> None: + """Pound sign resolves to GBP via exact en_gb match.""" + result = resolve_ambiguous_symbol("\u00a3", "en_GB") + assert result == "GBP" + + def test_pound_sign_with_ar_eg_exact_match(self) -> None: + """Pound sign resolves to EGP via exact ar_eg match.""" + result = resolve_ambiguous_symbol("\u00a3", "ar_EG") + assert result == "EGP" + + def test_pound_sign_with_ar_sa_uses_prefix(self) -> None: + """Pound sign resolves to EGP via ar prefix for ar_SA.""" + # ar_SA is not in exact match but ar prefix maps to EGP + result = resolve_ambiguous_symbol("\u00a3", "ar_SA") + assert result == "EGP" + + def test_non_ambiguous_returns_none(self) -> None: + """Non-ambiguous symbols return None.""" + result = resolve_ambiguous_symbol("\u20ac", "en_US") + assert result is None + + def test_no_locale_uses_default(self) -> None: + """Ambiguous symbol without locale uses default.""" + result = resolve_ambiguous_symbol("\u00a5", None) + assert result == "JPY" + + def test_empty_locale_uses_default(self) -> None: + """Ambiguous symbol with empty locale uses default.""" + result = resolve_ambiguous_symbol("$", "") + assert result == "USD" + + def test_unknown_locale_with_underscore_uses_default(self) -> None: + """Unknown locale with underscore falls through to default.""" + result = resolve_ambiguous_symbol("$", "xx_YY") + assert result == "USD" + + def 
test_unknown_locale_without_underscore_uses_default(self) -> None: + """Unknown locale without underscore skips prefix match.""" + result = resolve_ambiguous_symbol("$", "xx") + assert result == "USD" + + @given( + symbol_locale=st.sampled_from([ + ("\u00a5", "zh_CN", "CNY"), + ("\u00a5", "zh_TW", "CNY"), + ("\u00a5", "zh_HK", "CNY"), + ("\u00a3", "ar_SA", "EGP"), + ("\u00a3", "ar_DZ", "EGP"), + ]) + ) + def test_prefix_resolution_property( + self, symbol_locale: tuple[str, str, str] + ) -> None: + """PROPERTY: Locale prefix resolution matches expected currency.""" + symbol, locale, expected = symbol_locale + event(f"prefix_symbol={symbol}") + event(f"prefix_locale={locale}") + result = resolve_ambiguous_symbol(symbol, locale) + assert result == expected diff --git a/tests/parsing_currency_cases/resolve_currency_code_internal_paths.py b/tests/parsing_currency_cases/resolve_currency_code_internal_paths.py new file mode 100644 index 00000000..8ad74b80 --- /dev/null +++ b/tests/parsing_currency_cases/resolve_currency_code_internal_paths.py @@ -0,0 +1,88 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# _resolve_currency_code internal paths +# --------------------------------------------------------------------------- + + +class TestResolveCurrencyCode: + """Test _resolve_currency_code edge cases.""" + + def test_unknown_symbol_returns_error(self) -> None: + """Unknown symbol returns error.""" + result, error = currency_module._resolve_currency_code( + "ZZZZZ", "en_US", "ZZZZZ 100", + default_currency=None, infer_from_locale=False, + ) + assert result is None + assert error is not None + + def test_invalid_default_currency_format(self) -> None: + """Ambiguous symbol with invalid default_currency returns error.""" + result, error = 
currency_module._resolve_currency_code( + "$", "en_US", "$100", + default_currency="invalid", infer_from_locale=False, + ) + assert result is None + assert error is not None + + def test_lowercase_default_currency_rejected(self) -> None: + """Lowercase default_currency is rejected (ISO requires uppercase).""" + result, error = currency_module._resolve_currency_code( + "$", "en_US", "$100", + default_currency="usd", infer_from_locale=False, + ) + assert result is None + assert error is not None + + def test_short_default_currency_rejected(self) -> None: + """2-letter default_currency is rejected (ISO requires 3).""" + result, error = currency_module._resolve_currency_code( + "$", "en_US", "$100", + default_currency="US", infer_from_locale=False, + ) + assert result is None + assert error is not None + + def test_long_default_currency_rejected(self) -> None: + """4-letter default_currency is rejected (ISO requires 3).""" + result, error = currency_module._resolve_currency_code( + "$", "en_US", "$100", + default_currency="USDD", infer_from_locale=False, + ) + assert result is None + assert error is not None + + def test_numeric_default_currency_rejected(self) -> None: + """Numeric default_currency is rejected (ISO requires letters).""" + result, error = currency_module._resolve_currency_code( + "$", "en_US", "$100", + default_currency="123", infer_from_locale=False, + ) + assert result is None + assert error is not None + + def test_invalid_iso_code_not_in_cldr(self) -> None: + """3-letter uppercase code not in CLDR returns error.""" + result, errors = parse_currency("AAA 100", "en_US") + assert result is None + assert len(errors) == 1 + + @given( + default=st.from_regex(r"[a-z]{3}", fullmatch=True) + ) + @settings(max_examples=20) + def test_lowercase_codes_always_rejected( + self, default: str + ) -> None: + """PROPERTY: Lowercase 3-letter codes always rejected.""" + event(f"code_sample={default[:2]}") + result, error = currency_module._resolve_currency_code( + "$", 
"en_US", "$100", + default_currency=default, infer_from_locale=False, + ) + assert result is None + assert error is not None diff --git a/tests/parsing_currency_cases/roundtrip_format_parse_verify.py b/tests/parsing_currency_cases/roundtrip_format_parse_verify.py new file mode 100644 index 00000000..9e7eceac --- /dev/null +++ b/tests/parsing_currency_cases/roundtrip_format_parse_verify.py @@ -0,0 +1,80 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# Roundtrip: format -> parse -> verify +# --------------------------------------------------------------------------- + + +class TestRoundtripCurrency: + """Test format -> parse -> verify roundtrip.""" + + def test_roundtrip_usd_en_us(self) -> None: + """Currency roundtrip for US English.""" + from ftllexengine.runtime.functions import currency_format + + original = Decimal("1234.56") + formatted = currency_format( + original, "en-US", + currency="USD", currency_display="symbol", + ) + result, errors = parse_currency( + str(formatted), "en_US", default_currency="USD", + ) + assert not errors + assert result is not None + assert result[0] == original + assert result[1] == "USD" + + def test_roundtrip_eur_lv_lv(self) -> None: + """Currency roundtrip for Latvian EUR.""" + from ftllexengine.runtime.functions import currency_format + + original = Decimal("1234.56") + formatted = currency_format( + original, "lv-LV", + currency="EUR", currency_display="symbol", + ) + result, errors = parse_currency(str(formatted), "lv_LV") + assert not errors + assert result is not None + assert result[0] == original + assert result[1] == "EUR" + + def test_roundtrip_usd_ar_eg_with_rtl_marks(self) -> None: + """RTL locale currency output roundtrips through parse_currency().""" + from ftllexengine.runtime.functions import currency_format + 
+ original = Decimal("1234.56") + formatted = currency_format( + original, "ar-EG", + currency="USD", currency_display="symbol", + ) + result, errors = parse_currency(str(formatted), "ar_EG") + assert not errors + assert result is not None + assert result[0] == original + assert result[1] == "USD" + + def test_roundtrip_egp_ar_eg_local_symbol(self) -> None: + """Localized Arabic EGP symbol roundtrips through parse_currency().""" + from ftllexengine.runtime.functions import currency_format + + original = Decimal("1234.56") + formatted = currency_format( + original, "ar-EG", + currency="EGP", currency_display="symbol", + ) + result, errors = parse_currency(str(formatted), "ar_EG", default_currency="EGP") + assert not errors + assert result is not None + assert result[0] == original + assert result[1] == "EGP" + + def test_parse_currency_ignores_bidi_isolation_marks(self) -> None: + """Invisible bidi controls are ignored at the parsing boundary.""" + result, errors = parse_currency("\u2068$123.45\u2069", "en_US", default_currency="USD") + assert not errors + assert result == (Decimal("123.45"), "USD") diff --git a/tests/parsing_currency_cases/thread_safe_caching_behavior.py b/tests/parsing_currency_cases/thread_safe_caching_behavior.py new file mode 100644 index 00000000..33d59eda --- /dev/null +++ b/tests/parsing_currency_cases/thread_safe_caching_behavior.py @@ -0,0 +1,57 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_currency.py.""" + +from tests.parsing_currency_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# Thread-safe caching behavior +# --------------------------------------------------------------------------- + + +class TestCurrencyCachingConcurrency: + """Test thread-safe caching via functools.cache.""" + + def test_concurrent_currency_maps_access(self) -> None: + """Concurrent calls to _get_currency_maps_full return cached object. 
+ + functools.cache provides thread-safe cache access, but does NOT + prevent thundering herd on cold cache (multiple threads may compute + simultaneously). This test verifies that AFTER cache is populated, + concurrent access returns the same cached object. + """ + import threading + + # Pre-warm cache to ensure it's populated + _ = currency_module._get_currency_maps_full() + + barrier = threading.Barrier(4) + results: list[object] = [] + + def get_with_barrier() -> None: + barrier.wait() + data = currency_module._get_currency_maps_full() + results.append(data) + + threads = [ + threading.Thread(target=get_with_barrier) + for _ in range(4) + ] + for t in threads: + t.start() + for t in threads: + t.join() + + assert len(results) == 4 + assert all(r is results[0] for r in results) + + def test_currency_maps_structure(self) -> None: + """Cached currency maps have expected 4-tuple structure.""" + data = currency_module._get_currency_maps_full() + + assert len(data) == 4 + symbol_map, ambiguous, locale_to_currency, valid_codes = data + + assert isinstance(symbol_map, dict) + assert isinstance(ambiguous, set) + assert isinstance(locale_to_currency, dict) + assert isinstance(valid_codes, frozenset) diff --git a/tests/parsing_dates_cases/__init__.py b/tests/parsing_dates_cases/__init__.py new file mode 100644 index 00000000..2d5a74d4 --- /dev/null +++ b/tests/parsing_dates_cases/__init__.py @@ -0,0 +1,62 @@ +"""Tests for date and datetime parsing functions. + +Core parsing tests, internal function edge cases, tokenizer, separator +extraction, BabelImportError paths, datetime ordering, and property-based +roundtrip tests for parse_date() and parse_datetime(). + +Functions return tuple[value, errors]: +- parse_date() returns tuple[date | None, list[FluentParseError]] +- parse_datetime() returns tuple[datetime | None, list[FluentParseError]] +- Functions never raise exceptions; errors returned in list + +Python 3.13+. 
+""" + +from __future__ import annotations + +import builtins +import sys +from datetime import UTC, date, datetime +from unittest.mock import MagicMock, Mock, patch + +import pytest +from babel import Locale +from hypothesis import event, given +from hypothesis import strategies as st + +import ftllexengine.core.babel_compat as _bc +from ftllexengine.parsing.dates import ( + _babel_to_strptime, + _extract_datetime_separator, + _get_date_patterns, + _get_datetime_patterns, + _preprocess_datetime_input, + _tokenize_babel_pattern, + parse_date, + parse_datetime, +) + +__all__ = [ + "UTC", + "Locale", + "MagicMock", + "Mock", + "_babel_to_strptime", + "_bc", + "_extract_datetime_separator", + "_get_date_patterns", + "_get_datetime_patterns", + "_preprocess_datetime_input", + "_tokenize_babel_pattern", + "builtins", + "date", + "datetime", + "event", + "given", + "parse_date", + "parse_datetime", + "patch", + "pytest", + "st", + "sys", +] diff --git a/tests/parsing_dates_cases/babel_datetime_format_conversion_mock.py b/tests/parsing_dates_cases/babel_datetime_format_conversion_mock.py new file mode 100644 index 00000000..9ad50042 --- /dev/null +++ b/tests/parsing_dates_cases/babel_datetime_format_conversion_mock.py @@ -0,0 +1,41 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_dates.py.""" + +from tests.parsing_dates_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Babel Datetime Format Conversion (Mock) +# ============================================================================ + + +class TestBabelDatetimeFormatConversion: + """Test Babel datetime format conversion with mock pattern objects.""" + + def test_babel_datetime_format_with_mock(self) -> None: + """Mock Babel to return pattern object for datetime_formats.""" + from ftllexengine.parsing import dates + + dates._get_datetime_patterns.cache_clear() + dates._get_date_patterns.cache_clear() + + 
try: + mock_pattern = Mock() + mock_pattern.pattern = "M/d/yy, h:mm a" + + mock_locale = Mock() + mock_locale.datetime_formats = { + "short": mock_pattern, "medium": mock_pattern, + } + mock_date_format = Mock() + mock_date_format.pattern = "M/d/yy" + mock_locale.date_formats = {"short": mock_date_format} + + with patch("babel.Locale") as mock_locale_class: + mock_locale_class.parse.return_value = mock_locale + patterns = dates._get_datetime_patterns( + "test_mock_locale" + ) + assert len(patterns) > 0 + finally: + dates._get_datetime_patterns.cache_clear() + dates._get_date_patterns.cache_clear() diff --git a/tests/parsing_dates_cases/babel_import_error_structure.py b/tests/parsing_dates_cases/babel_import_error_structure.py new file mode 100644 index 00000000..1661e076 --- /dev/null +++ b/tests/parsing_dates_cases/babel_import_error_structure.py @@ -0,0 +1,63 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_dates.py.""" + +from tests.parsing_dates_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# BabelImportError Structure +# ============================================================================ + + +class TestBabelImportErrorBehavior: + """Test BabelImportError structure and message format.""" + + def test_babel_import_error_structure(self) -> None: + """BabelImportError has correct structure and message.""" + from ftllexengine.core.babel_compat import BabelImportError + + error = BabelImportError("parse_date") + assert error.feature == "parse_date" + assert "parse_date" in str(error) + assert "pip install ftllexengine[babel]" in str(error) + assert isinstance(error, ImportError) + + def test_get_date_patterns_returns_valid_patterns(self) -> None: + """_get_date_patterns returns valid (pattern, has_era) tuples.""" + from ftllexengine.parsing import dates + + dates._get_date_patterns.cache_clear() + patterns = dates._get_date_patterns("en_US") + 
+ assert isinstance(patterns, tuple) + assert len(patterns) > 0 + for pattern, has_era in patterns: + assert isinstance(pattern, str) + assert isinstance(has_era, bool) + + def test_get_datetime_patterns_returns_valid_patterns(self) -> None: + """_get_datetime_patterns returns valid (pattern, has_era) tuples.""" + from ftllexengine.parsing import dates + + dates._get_datetime_patterns.cache_clear() + patterns = dates._get_datetime_patterns("en_US") + + assert isinstance(patterns, tuple) + assert len(patterns) > 0 + for pattern, has_era in patterns: + assert isinstance(pattern, str) + assert isinstance(has_era, bool) + + def test_parse_date_works(self) -> None: + """parse_date works correctly when Babel is installed.""" + result, errors = parse_date("2025-01-28", "en_US") + assert not errors + assert result is not None + assert result.year == 2025 + + def test_parse_datetime_works(self) -> None: + """parse_datetime works correctly when Babel is installed.""" + result, errors = parse_datetime("2025-01-28 14:30", "en_US") + assert not errors + assert result is not None + assert result.year == 2025 + assert result.hour == 14 diff --git a/tests/parsing_dates_cases/babel_to_strptime_timezone_token_handling.py b/tests/parsing_dates_cases/babel_to_strptime_timezone_token_handling.py new file mode 100644 index 00000000..204b4bd1 --- /dev/null +++ b/tests/parsing_dates_cases/babel_to_strptime_timezone_token_handling.py @@ -0,0 +1,87 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_dates.py.""" + +from tests.parsing_dates_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# _babel_to_strptime: Timezone Token Handling +# ============================================================================ + + +class TestBabelToStrptimeTimezoneToken: + """Test _babel_to_strptime timezone token handling.""" + + def test_timezone_z(self) -> None: + """Timezone token 'z' is removed 
from pattern.""" + pattern, has_era = _babel_to_strptime("d MMM y HH:mm z") + assert has_era is False + assert "z" not in pattern + + def test_timezone_zzzz(self) -> None: + """Timezone token 'zzzz' is removed.""" + pattern, has_era = _babel_to_strptime( + "MMMM d, y 'at' h:mm a zzzz" + ) + assert has_era is False + assert "zzzz" not in pattern + + def test_timezone_v(self) -> None: + """Timezone token 'v' is removed.""" + pattern, has_era = _babel_to_strptime("d MMM y HH:mm v") + assert has_era is False + assert "v" not in pattern + + def test_timezone_vvvv(self) -> None: + """Timezone token 'vvvv' is removed.""" + pattern, has_era = _babel_to_strptime("d MMM y HH:mm vvvv") + assert has_era is False + assert "vvvv" not in pattern + + def test_timezone_o(self) -> None: + """Timezone token 'O' is removed.""" + pattern, has_era = _babel_to_strptime("d MMM y HH:mm O") + assert has_era is False + assert "O" not in pattern + + def test_both_era_and_timezone(self) -> None: + """Both era and timezone tokens handled correctly.""" + pattern, has_era = _babel_to_strptime("d MMM y G HH:mm z") + assert has_era is True + assert "G" not in pattern + assert "z" not in pattern + + def test_none_token_fallthrough(self) -> None: + """None-mapped token that is not era is silently dropped.""" + from ftllexengine.parsing import dates as dates_module + + original_map = dates_module._BABEL_TOKEN_MAP.copy() + modified_map = original_map.copy() + modified_map["QQQ"] = None + + with patch.object( + dates_module, "_BABEL_TOKEN_MAP", modified_map + ): + pattern, has_era = _babel_to_strptime( + "d MMM y QQQ HH:mm" + ) + assert has_era is False + assert "QQQ" not in pattern + + def test_zzzz_localized_gmt_skipped(self) -> None: + """ZZZZ (localized GMT) is skipped entirely.""" + pattern, has_era = _babel_to_strptime("d MMM y HH:mm ZZZZ") + assert has_era is False + assert "ZZZZ" not in pattern + assert "%z" not in pattern + + def test_trailing_whitespace_normalized(self) -> None: + """Trailing 
whitespace from skipped tokens is stripped.""" + pattern, has_era = _babel_to_strptime("HH:mm zzzz") + assert has_era is False + assert pattern == "%H:%M" + + def test_multiple_trailing_spaces_normalized(self) -> None: + """Multiple trailing spaces from skipped tokens stripped.""" + pattern, has_era = _babel_to_strptime("HH:mm zzzz") + assert has_era is False + assert pattern == "%H:%M" diff --git a/tests/parsing_dates_cases/datetime_separator_and_babel_pattern_tokenizer_coverage.py b/tests/parsing_dates_cases/datetime_separator_and_babel_pattern_tokenizer_coverage.py new file mode 100644 index 00000000..7d30d435 --- /dev/null +++ b/tests/parsing_dates_cases/datetime_separator_and_babel_pattern_tokenizer_coverage.py @@ -0,0 +1,89 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_dates.py.""" + +from tests.parsing_dates_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# DATETIME SEPARATOR AND BABEL PATTERN TOKENIZER COVERAGE +# ============================================================================ + + +class TestTokenizeBabelPatternEdgeCases: + """_tokenize_babel_pattern: patterns starting with a quote and unclosed sections.""" + + def test_quoted_section_with_escaped_quote(self) -> None: + """Escaped quote '' inside a quoted literal is unescaped to a single quote.""" + pattern = "'It''s a test'" + tokens = _tokenize_babel_pattern(pattern) + assert any("It's a test" in t for t in tokens) + + def test_unclosed_quoted_section(self) -> None: + """Unclosed quoted literal collects remaining characters.""" + pattern = "'unclosed" + tokens = _tokenize_babel_pattern(pattern) + assert any("unclosed" in t for t in tokens) + + +class TestDatesQuotedLiteral: + """Non-empty quoted literal in Babel date pattern tokenizes correctly.""" + + def test_quoted_literal_in_pattern(self) -> None: + """Spanish-style quoted separator 'de' is extracted as a token.""" + pattern 
= "d 'de' MMMM 'de' y" + tokens = _tokenize_babel_pattern(pattern) + assert "de" in tokens + + +class TestParseDateFourDigitYear: + """4-digit year inputs are accepted for locales whose CLDR short format uses yy. + + CLDR short patterns often specify a 2-digit year (e.g. lv-LV: dd.MM.yy, + en-US: M/d/yy). Documents commonly write dates with a 4-digit year for + clarity and unambiguity. Both forms must parse successfully. + """ + + def test_lv_lv_two_digit_year_parses(self) -> None: + """lv-LV short format (dd.MM.yy) parses 2-digit year correctly.""" + result, errors = parse_date("15.01.26", "lv_LV") + assert not errors + assert result == date(2026, 1, 15) + + def test_lv_lv_four_digit_year_parses(self) -> None: + """lv-LV common form (dd.MM.yyyy) parses 4-digit year correctly.""" + result, errors = parse_date("15.01.2026", "lv_LV") + assert not errors + assert result == date(2026, 1, 15) + + def test_lv_lv_four_digit_year_roundtrip_identity(self) -> None: + """Parse("15.01.2026", lv_LV) yields the same date as parse("15.01.26", lv_LV).""" + result_2, _ = parse_date("15.01.26", "lv_LV") + result_4, _ = parse_date("15.01.2026", "lv_LV") + assert result_2 == result_4 + + def test_de_de_four_digit_year_parses(self) -> None: + """de-DE short format (dd.MM.yy) accepts 4-digit year variant.""" + result, errors = parse_date("28.01.2025", "de_DE") + assert not errors + assert result == date(2025, 1, 28) + + def test_pl_pl_four_digit_year_parses(self) -> None: + """pl-PL short format accepts 4-digit year variant.""" + result, errors = parse_date("28.01.2025", "pl_PL") + assert not errors + assert result == date(2025, 1, 28) + + def test_two_digit_year_still_expands_via_cldr_semantics(self) -> None: + """2-digit input still matches the CLDR yy pattern first (strptime %y expansion: 00-68 -> 2000-2068).""" + # %y in Python strptime: 00-68 -> 2000-2068, 69-99 -> 1969-1999 + result_short, _ = parse_date("28.01.68", "lv_LV") + assert result_short is not None + assert result_short.year == 2068 # %y 
expansion + + def test_extract_cldr_patterns_includes_four_digit_variant(self) -> None: + """_get_date_patterns for lv_LV includes both %y and %Y variants for short style.""" + patterns = _get_date_patterns("lv_LV") + strptime_patterns = [p for p, _ in patterns] + has_two_digit = any("%y" in p for p in strptime_patterns) + has_four_digit = any("%Y" in p for p in strptime_patterns) + assert has_two_digit, "2-digit year pattern (%y) must be present for lv_LV" + assert has_four_digit, "4-digit year variant (%Y) must be generated for lv_LV" diff --git a/tests/parsing_dates_cases/extract_datetime_separator.py b/tests/parsing_dates_cases/extract_datetime_separator.py new file mode 100644 index 00000000..d3b7f037 --- /dev/null +++ b/tests/parsing_dates_cases/extract_datetime_separator.py @@ -0,0 +1,63 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_dates.py.""" + +from tests.parsing_dates_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# _extract_datetime_separator +# ============================================================================ + + +class TestExtractDatetimeSeparator: + """Test _extract_datetime_separator edge cases.""" + + def test_normal_order(self) -> None: + """en_US uses date-first order.""" + locale = Locale.parse("en_US") + separator, is_time_first = _extract_datetime_separator(locale) + assert isinstance(separator, str) + assert is_time_first is False + + def test_fallback_on_missing(self) -> None: + """Missing datetime_format returns fallback space.""" + mock_locale = MagicMock() + mock_locale.datetime_formats.get.return_value = None + separator, is_time_first = _extract_datetime_separator(mock_locale) + assert separator == " " + assert is_time_first is False + + def test_missing_placeholders(self) -> None: + """Pattern without placeholders returns fallback.""" + mock_locale = MagicMock() + 
mock_locale.datetime_formats.get.return_value = ( + "no placeholders here" + ) + separator, is_time_first = _extract_datetime_separator(mock_locale) + assert separator == " " + assert is_time_first is False + + def test_reversed_order(self) -> None: + """Pattern with {0} before {1} detects time-first.""" + mock_locale = MagicMock() + mock_locale.datetime_formats.get.return_value = "{0} at {1}" + separator, is_time_first = _extract_datetime_separator(mock_locale) + assert separator == " at " + assert is_time_first is True + + def test_adjacent_placeholders(self) -> None: + """Adjacent placeholders return fallback separator.""" + mock_locale = MagicMock() + mock_locale.datetime_formats.get.return_value = "{1}{0}" + separator, is_time_first = _extract_datetime_separator(mock_locale) + assert separator == " " + assert is_time_first is False + + def test_exception_handling(self) -> None: + """AttributeError returns fallback.""" + mock_locale = MagicMock() + mock_locale.datetime_formats.get.side_effect = AttributeError( + "mock error" + ) + separator, is_time_first = _extract_datetime_separator(mock_locale) + assert separator == " " + assert is_time_first is False diff --git a/tests/parsing_dates_cases/get_date_patterns_exception_handling.py b/tests/parsing_dates_cases/get_date_patterns_exception_handling.py new file mode 100644 index 00000000..61c72cb5 --- /dev/null +++ b/tests/parsing_dates_cases/get_date_patterns_exception_handling.py @@ -0,0 +1,123 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_dates.py.""" + +from tests.parsing_dates_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# _get_date_patterns Exception Handling +# ============================================================================ + + +class TestGetDatePatternsExceptions: + """Test _get_date_patterns exception handling.""" + + def test_unknown_locale_returns_empty(self) -> None: + 
"""Unknown locale returns empty tuple.""" + _get_date_patterns.cache_clear() + assert _get_date_patterns("xx-UNKNOWN") == () + + def test_invalid_format_returns_empty(self) -> None: + """Invalid format returns empty tuple.""" + _get_date_patterns.cache_clear() + assert _get_date_patterns("not-valid-at-all-xyz-123") == () + + def test_valid_locale_returns_patterns(self) -> None: + """Valid locale returns non-empty patterns.""" + _get_date_patterns.cache_clear() + assert len(_get_date_patterns("en-US")) > 0 + + def test_attribute_error_in_pattern(self) -> None: + """AttributeError accessing pattern falls back to str(fmt).""" + _get_date_patterns.cache_clear() + + mock_format = MagicMock() + del mock_format.pattern + + with patch.object(Locale, "parse") as mock_parse: + mock_locale = MagicMock() + mock_locale.date_formats = { + "short": mock_format, "medium": mock_format, + "long": mock_format, "full": mock_format, + } + mock_parse.return_value = mock_locale + _get_date_patterns.cache_clear() + patterns = _get_date_patterns("mock-locale-attr-err") + + assert len(patterns) > 0 + + def test_raises_babel_import_error_when_babel_missing(self) -> None: + """Raises BabelImportError when Babel unavailable.""" + _get_date_patterns.cache_clear() + _bc._babel_available = None + + original_import = builtins.__import__ + + def mock_import( + name: str, + globals_: dict[str, object] | None = None, + locals_: dict[str, object] | None = None, + fromlist: tuple[str, ...] 
= (), + level: int = 0, + ) -> object: + if name == "babel": + msg = "No module named 'babel'" + raise ImportError(msg) + return original_import(name, globals_, locals_, fromlist, level) + + try: + with patch.object( + builtins, "__import__", side_effect=mock_import + ): + with pytest.raises( + ImportError, match="parse" + ) as exc_info: + _get_date_patterns("en_US") + assert exc_info.typename == "BabelImportError" + assert "parse_date" in str(exc_info.value) + finally: + _bc._babel_available = None + + def test_babel_import_error_feature_name(self) -> None: + """BabelImportError contains correct feature name.""" + _get_date_patterns.cache_clear() + _bc._babel_available = None + + babel_modules_backup = {} + babel_keys = [ + k for k in sys.modules + if k == "babel" or k.startswith("babel.") + ] + for key in babel_keys: + babel_modules_backup[key] = sys.modules.pop(key, None) + + try: + original_import = builtins.__import__ + + def mock_import( + name: str, + globals_: dict[str, object] | None = None, + locals_: dict[str, object] | None = None, + fromlist: tuple[str, ...] 
= (), + level: int = 0, + ) -> object: + if name == "babel" or name.startswith("babel."): + msg = f"No module named '{name}'" + raise ImportError(msg) + return original_import( + name, globals_, locals_, fromlist, level + ) + + with patch.object( + builtins, "__import__", side_effect=mock_import + ): + with pytest.raises( + ImportError, match="parse" + ) as exc_info: + _get_date_patterns("en_US") + assert "parse_date" in str(exc_info.value) + finally: + for key, value in babel_modules_backup.items(): + if value is not None: + sys.modules[key] = value + _get_date_patterns.cache_clear() + _bc._babel_available = None diff --git a/tests/parsing_dates_cases/get_datetime_patterns_exception_handling.py b/tests/parsing_dates_cases/get_datetime_patterns_exception_handling.py new file mode 100644 index 00000000..32f0772e --- /dev/null +++ b/tests/parsing_dates_cases/get_datetime_patterns_exception_handling.py @@ -0,0 +1,219 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_dates.py.""" + +from tests.parsing_dates_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# _get_datetime_patterns Exception Handling +# ============================================================================ + + +class TestGetDatetimePatternsExceptions: + """Test _get_datetime_patterns exception handling.""" + + def test_unknown_locale_returns_empty(self) -> None: + """Unknown locale returns empty tuple.""" + _get_datetime_patterns.cache_clear() + assert _get_datetime_patterns("xx-UNKNOWN") == () + + def test_invalid_format_returns_empty(self) -> None: + """Invalid format returns empty tuple.""" + _get_datetime_patterns.cache_clear() + assert _get_datetime_patterns("invalid-locale-format-xyz") == () + + def test_valid_locale_returns_patterns(self) -> None: + """Valid locale returns non-empty patterns.""" + _get_datetime_patterns.cache_clear() + assert len(_get_datetime_patterns("en-US")) > 0 
+ + def test_cldr_pattern_success_path(self) -> None: + """Successful CLDR datetime pattern extraction via mock.""" + _get_datetime_patterns.cache_clear() + _get_date_patterns.cache_clear() + + class MockDateTimeFormat: + def __init__(self, pattern_str: str) -> None: + self._pattern = pattern_str + + @property + def pattern(self) -> str: + return self._pattern + + mock_short = MockDateTimeFormat("M/d/yy, h:mm a") + mock_medium = MockDateTimeFormat("MMM d, yyyy, h:mm:ss a") + mock_long = MockDateTimeFormat("MMMM d, yyyy 'at' h:mm:ss a") + + with patch.object(Locale, "parse") as mock_parse: + mock_locale = MagicMock() + mock_datetime_formats = MagicMock() + mock_datetime_formats.__getitem__ = MagicMock( + side_effect=lambda k: { + "short": mock_short, + "medium": mock_medium, + "long": mock_long, + }.get(k, mock_short) + ) + mock_datetime_formats.get = MagicMock( + return_value="{1}, {0}" + ) + mock_locale.datetime_formats = mock_datetime_formats + + mock_date_format = MockDateTimeFormat("M/d/yy") + mock_date_formats = MagicMock() + mock_date_formats.__getitem__ = MagicMock( + return_value=mock_date_format + ) + mock_locale.date_formats = mock_date_formats + mock_parse.return_value = mock_locale + + _get_datetime_patterns.cache_clear() + _get_date_patterns.cache_clear() + patterns = _get_datetime_patterns("mock-cldr-success-v1") + + assert len(patterns) > 0 + pattern_str = " ".join(p[0] for p in patterns) + assert "%" in pattern_str + + def test_attribute_error_in_pattern(self) -> None: + """AttributeError accessing datetime pattern handled gracefully.""" + _get_datetime_patterns.cache_clear() + _get_date_patterns.cache_clear() + + class RaisingFormat: + @property + def pattern(self) -> str: + msg = "no pattern attribute" + raise AttributeError(msg) + + mock_format = RaisingFormat() + + with patch.object(Locale, "parse") as mock_parse: + mock_locale = MagicMock() + mock_datetime_formats = MagicMock() + mock_datetime_formats.__getitem__ = MagicMock( + 
return_value=mock_format + ) + mock_datetime_formats.get = MagicMock(return_value=None) + mock_locale.datetime_formats = mock_datetime_formats + mock_date_formats = MagicMock() + mock_date_formats.__getitem__ = MagicMock( + return_value=mock_format + ) + mock_locale.date_formats = mock_date_formats + mock_parse.return_value = mock_locale + + _get_datetime_patterns.cache_clear() + _get_date_patterns.cache_clear() + patterns = _get_datetime_patterns( + "mock-locale-datetime-attr-err-v3" + ) + + assert len(patterns) > 0 + + def test_key_error_via_missing_key(self) -> None: + """KeyError accessing datetime style handled gracefully.""" + _get_datetime_patterns.cache_clear() + _get_date_patterns.cache_clear() + + with patch.object(Locale, "parse") as mock_parse: + mock_locale = MagicMock() + mock_datetime_formats = MagicMock() + mock_datetime_formats.__getitem__ = MagicMock( + side_effect=KeyError("No format") + ) + mock_datetime_formats.get = MagicMock(return_value=None) + mock_locale.datetime_formats = mock_datetime_formats + mock_date_formats = MagicMock() + mock_date_formats.__getitem__ = MagicMock( + side_effect=KeyError("No format") + ) + mock_locale.date_formats = mock_date_formats + mock_parse.return_value = mock_locale + + _get_datetime_patterns.cache_clear() + _get_date_patterns.cache_clear() + patterns = _get_datetime_patterns( + "mock-locale-keyerror-v2" + ) + + assert patterns == () + + def test_raises_babel_import_error_when_babel_missing(self) -> None: + """Raises BabelImportError when Babel unavailable.""" + _get_datetime_patterns.cache_clear() + _get_date_patterns.cache_clear() + _bc._babel_available = None + + original_import = builtins.__import__ + + def mock_import( + name: str, + globals_: dict[str, object] | None = None, + locals_: dict[str, object] | None = None, + fromlist: tuple[str, ...] 
= (), + level: int = 0, + ) -> object: + if name == "babel": + msg = "No module named 'babel'" + raise ImportError(msg) + return original_import(name, globals_, locals_, fromlist, level) + + try: + with patch.object( + builtins, "__import__", side_effect=mock_import + ): + with pytest.raises( + ImportError, match="parse" + ) as exc_info: + _get_datetime_patterns("en_US") + assert exc_info.typename == "BabelImportError" + assert "parse_datetime" in str(exc_info.value) + finally: + _bc._babel_available = None + + def test_babel_import_error_feature_name(self) -> None: + """BabelImportError contains correct feature name.""" + _get_datetime_patterns.cache_clear() + _get_date_patterns.cache_clear() + _bc._babel_available = None + + babel_modules_backup = {} + babel_keys = [ + k for k in sys.modules + if k == "babel" or k.startswith("babel.") + ] + for key in babel_keys: + babel_modules_backup[key] = sys.modules.pop(key, None) + + try: + original_import = builtins.__import__ + + def mock_import( + name: str, + globals_: dict[str, object] | None = None, + locals_: dict[str, object] | None = None, + fromlist: tuple[str, ...] 
= (), + level: int = 0, + ) -> object: + if name == "babel" or name.startswith("babel."): + msg = f"No module named '{name}'" + raise ImportError(msg) + return original_import( + name, globals_, locals_, fromlist, level + ) + + with patch.object( + builtins, "__import__", side_effect=mock_import + ): + with pytest.raises( + ImportError, match="parse" + ) as exc_info: + _get_datetime_patterns("en_US") + assert "parse_datetime" in str(exc_info.value) + finally: + for key, value in babel_modules_backup.items(): + if value is not None: + sys.modules[key] = value + _get_datetime_patterns.cache_clear() + _get_date_patterns.cache_clear() + _bc._babel_available = None diff --git a/tests/parsing_dates_cases/hypothesis_property_tests.py b/tests/parsing_dates_cases/hypothesis_property_tests.py new file mode 100644 index 00000000..9112aa13 --- /dev/null +++ b/tests/parsing_dates_cases/hypothesis_property_tests.py @@ -0,0 +1,62 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_dates.py.""" + +from tests.parsing_dates_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Hypothesis Property Tests +# ============================================================================ + + +class TestDatetimeProperties: + """Property-based tests for datetime parsing.""" + + @given( + hour=st.integers(min_value=0, max_value=23), + minute=st.integers(min_value=0, max_value=59), + ) + def test_parse_datetime_various_times( + self, hour: int, minute: int + ) -> None: + """PROPERTY: Datetime patterns handle various times.""" + time_of_day = "morning" if hour < 12 else "afternoon" + event(f"time_of_day={time_of_day}") + + date_str = f"28.01.25, {hour:02d}:{minute:02d}" + result, errors = parse_datetime(date_str, "de_DE") + assert not errors + if result is not None: + assert result.hour == hour + assert result.minute == minute + + @given( + year=st.integers(min_value=2020, max_value=2030), 
+ month=st.integers(min_value=1, max_value=12), + day=st.integers(min_value=1, max_value=28), + hour=st.integers(min_value=0, max_value=23), + minute=st.integers(min_value=0, max_value=59), + ) + def test_datetime_roundtrip( + self, + year: int, + month: int, + day: int, + hour: int, + minute: int, + ) -> None: + """PROPERTY: Datetime ISO formatted then parsed preserves values.""" + event(f"year={year}") + time_of_day = "morning" if hour < 12 else "afternoon" + event(f"time_of_day={time_of_day}") + + dt = datetime(year, month, day, hour, minute, 0, tzinfo=UTC) + iso_str = dt.strftime("%Y-%m-%d %H:%M:%S") + result, errors = parse_datetime(iso_str, "en_US") + + assert not errors + if result is not None: + assert result.year == year + assert result.month == month + assert result.day == day + assert result.hour == hour + assert result.minute == minute diff --git a/tests/parsing_dates_cases/integration_full_coverage_verification.py b/tests/parsing_dates_cases/integration_full_coverage_verification.py new file mode 100644 index 00000000..a021be49 --- /dev/null +++ b/tests/parsing_dates_cases/integration_full_coverage_verification.py @@ -0,0 +1,28 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_dates.py.""" + +from tests.parsing_dates_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Integration: Full Coverage Verification +# ============================================================================ + + +class TestIntegrationFullCoverage: + """Integration test exercising multiple code branches.""" + + def test_parse_datetime_exercises_all_branches(self) -> None: + """Exercise ISO, CLDR, error, and empty paths.""" + test_cases = [ + ("2025-01-28T14:30:00", "en_US", True), + ("1/28/25, 2:30 PM", "en_US", True), + ("not-a-datetime", "en_US", False), + ("", "en_US", False), + ] + for datetime_str, locale, should_succeed in test_cases: + result, errors = 
parse_datetime(datetime_str, locale) + if should_succeed: + assert result is not None or len(errors) > 0 + else: + assert len(errors) > 0 + assert result is None diff --git a/tests/parsing_dates_cases/parse_date_cases.py b/tests/parsing_dates_cases/parse_date_cases.py new file mode 100644 index 00000000..0c6b607e --- /dev/null +++ b/tests/parsing_dates_cases/parse_date_cases.py @@ -0,0 +1,54 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_dates.py.""" + +from tests.parsing_dates_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# parse_date +# --------------------------------------------------------------------------- + + +class TestParseDate: + """Test parse_date() function.""" + + def test_parse_date_us_format(self) -> None: + """Parse US date format (M/d/yy - CLDR short format).""" + result, errors = parse_date("1/28/25", "en_US") + assert not errors + assert result == date(2025, 1, 28) + + def test_parse_date_european_format(self) -> None: + """Parse European date format (d.M.yy - CLDR short format).""" + result, errors = parse_date("28.1.25", "lv_LV") + assert not errors + assert result == date(2025, 1, 28) + + result, errors = parse_date("28.01.25", "de_DE") + assert not errors + assert result == date(2025, 1, 28) + + def test_parse_date_iso_format(self) -> None: + """Parse ISO 8601 date format.""" + result, errors = parse_date("2025-01-28", "en_US") + assert not errors + assert result == date(2025, 1, 28) + + def test_parse_date_ignores_bidi_isolation_marks(self) -> None: + """Invisible bidi controls do not block ISO parsing.""" + result, errors = parse_date("\u20682025-01-28\u2069", "en_US") + assert not errors + assert result == date(2025, 1, 28) + + def test_parse_date_invalid_returns_error(self) -> None: + """Invalid input returns error in tuple; function never raises.""" + result, errors = parse_date("invalid", "en_US") + assert len(errors) > 0 
+ assert result is None + assert errors[0].parse_type == "date" + assert errors[0].input_value == "invalid" + + def test_parse_date_empty_returns_error(self) -> None: + """Empty input returns error in list.""" + result, errors = parse_date("", "en_US") + assert len(errors) > 0 + assert result is None diff --git a/tests/parsing_dates_cases/parse_datetime_cases.py b/tests/parsing_dates_cases/parse_datetime_cases.py new file mode 100644 index 00000000..0709b346 --- /dev/null +++ b/tests/parsing_dates_cases/parse_datetime_cases.py @@ -0,0 +1,90 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_dates.py.""" + +from tests.parsing_dates_cases import * # noqa: F403 - shared split test support + +# --------------------------------------------------------------------------- +# parse_datetime +# --------------------------------------------------------------------------- + + +class TestParseDatetime: + """Test parse_datetime() function.""" + + def test_parse_datetime_us_format(self) -> None: + """Parse US datetime format (M/d/yy + time - CLDR).""" + result, errors = parse_datetime("1/28/25, 14:30", "en_US") + assert not errors + assert result == datetime(2025, 1, 28, 14, 30) + + def test_parse_datetime_european_format(self) -> None: + """Parse European datetime format (d.M.yy + time - CLDR).""" + result, errors = parse_datetime("28.1.25 14:30", "lv_LV") + assert not errors + assert result == datetime(2025, 1, 28, 14, 30) + + def test_parse_datetime_with_timezone(self) -> None: + """Parse datetime and apply timezone.""" + result, errors = parse_datetime( + "2025-01-28 14:30", "en_US", tzinfo=UTC + ) + assert not errors + assert result == datetime(2025, 1, 28, 14, 30, tzinfo=UTC) + + def test_parse_datetime_ignores_bidi_isolation_marks(self) -> None: + """Invisible bidi controls do not block ISO datetime parsing.""" + result, errors = parse_datetime("\u20682025-01-28 14:30:00\u2069", "en_US") + assert not errors + assert result == datetime(2025, 1, 28, 14, 
30) + + def test_parse_datetime_invalid_returns_error(self) -> None: + """Invalid input returns error in tuple; function never raises.""" + result, errors = parse_datetime("invalid", "en_US") + assert len(errors) > 0 + assert result is None + assert errors[0].parse_type == "datetime" + + def test_parse_datetime_empty_returns_error(self) -> None: + """Empty input returns error in list.""" + result, errors = parse_datetime("", "en_US") + assert len(errors) > 0 + assert result is None + + def test_parse_datetime_with_seconds(self) -> None: + """Datetime parsing with seconds component.""" + result, errors = parse_datetime("28.01.25, 14:30:45", "de_DE") + assert not errors + assert result is not None + assert result.hour == 14 + assert result.minute == 30 + assert result.second == 45 + + def test_parse_datetime_iso_format_all_locales(self) -> None: + """ISO format works across all locales.""" + iso_str = "2025-01-28 14:30:00" + for locale in [ + "en_US", "de_DE", "fr_FR", "es_ES", "ja_JP", "zh_CN" + ]: + result, errors = parse_datetime(iso_str, locale) + assert not errors + assert result is not None, f"ISO format failed for {locale}" + assert result.year == 2025 + assert result.month == 1 + assert result.day == 28 + + def test_parse_datetime_with_working_formats(self) -> None: + """Datetime parsing with CLDR locale-specific separators.""" + test_cases = [ + ("01/28/25, 14:30", "en_US"), + ("01/28/25, 02:30 PM", "en_US"), + ("28.01.25, 14:30", "de_DE"), + ] + for date_str, locale in test_cases: + result, errors = parse_datetime(date_str, locale) + assert not errors + assert result is not None, ( + f"Failed to parse '{date_str}' for {locale}" + ) + assert result.year == 2025 + assert result.month == 1 + assert result.day == 28 diff --git a/tests/parsing_dates_cases/preprocess_datetime_input.py b/tests/parsing_dates_cases/preprocess_datetime_input.py new file mode 100644 index 00000000..239c9579 --- /dev/null +++ b/tests/parsing_dates_cases/preprocess_datetime_input.py @@ 
-0,0 +1,31 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_dates.py.""" + +from tests.parsing_dates_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# _preprocess_datetime_input +# ============================================================================ + + +class TestPreprocessDatetimeInput: + """Test _preprocess_datetime_input function.""" + + def test_with_has_era_true(self) -> None: + """has_era=True triggers _strip_era.""" + result = _preprocess_datetime_input("28 Jan 2025 AD", has_era=True) + assert "AD" not in result + assert result == "28 Jan 2025" + + def test_with_has_era_false(self) -> None: + """has_era=False returns value unchanged.""" + value = "2025-01-28 14:30:00" + assert _preprocess_datetime_input(value, has_era=False) == value + + def test_with_era_and_timezone(self) -> None: + """Era is stripped but timezone preserved.""" + result = _preprocess_datetime_input( + "28 Jan 2025 AD PST", has_era=True + ) + assert "AD" not in result + assert "PST" in result diff --git a/tests/parsing_dates_cases/quoted_literals_in_cldr_patterns.py b/tests/parsing_dates_cases/quoted_literals_in_cldr_patterns.py new file mode 100644 index 00000000..d08b3e2f --- /dev/null +++ b/tests/parsing_dates_cases/quoted_literals_in_cldr_patterns.py @@ -0,0 +1,33 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_dates.py.""" + +from tests.parsing_dates_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Quoted Literals in CLDR Patterns +# ============================================================================ + + +class TestQuotedLiteralsInCLDRPatterns: + """Test non-empty quoted literals in CLDR date patterns.""" + + def test_parse_date_russian(self) -> None: + """Russian date parsing with short format.""" + result, errors = parse_date("28.01.2025", 
"ru_RU") + assert not errors + assert result is not None + assert result.year == 2025 + + def test_parse_date_spanish(self) -> None: + """Spanish short format d/M/yy.""" + result, errors = parse_date("28/01/25", "es_ES") + assert not errors + assert result is not None + assert result.year == 2025 + + def test_parse_date_portuguese(self) -> None: + """Portuguese date format.""" + result, errors = parse_date("28/01/2025", "pt_PT") + assert not errors + assert result is not None + assert result.year == 2025 diff --git a/tests/parsing_dates_cases/time_first_datetime_ordering.py b/tests/parsing_dates_cases/time_first_datetime_ordering.py new file mode 100644 index 00000000..e34b2cc4 --- /dev/null +++ b/tests/parsing_dates_cases/time_first_datetime_ordering.py @@ -0,0 +1,109 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_dates.py.""" + +from tests.parsing_dates_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Time-First Datetime Ordering +# ============================================================================ + + +class TestDatetimeTimeFirstOrdering: + """Test time-first datetime ordering (mock locales).""" + + def test_time_first_ordering(self) -> None: + """Mock locale with time-first ordering generates patterns.""" + _get_datetime_patterns.cache_clear() + + original_parse = Locale.parse + + def mock_parse_time_first(locale_str: str) -> MagicMock: + real_locale = original_parse(locale_str) + mock_locale = MagicMock(spec=Locale) + + time_first_pattern = "{0} {1}" + mock_datetime_format = MagicMock( + return_value=time_first_pattern + ) + mock_datetime_format.__str__ = MagicMock( # type: ignore[method-assign] + return_value=time_first_pattern + ) + mock_datetime_format.pattern = time_first_pattern + + mock_locale.datetime_formats = { + "short": mock_datetime_format, + "medium": mock_datetime_format, + "long": mock_datetime_format, + } + 
mock_locale.date_formats = real_locale.date_formats + return mock_locale + + with patch( + "babel.Locale.parse", side_effect=mock_parse_time_first + ): + patterns = _get_datetime_patterns("en_US") + + assert len(patterns) > 0 + + time_first_found = False + for pattern, _has_era in patterns: + time_pos = min( + ( + pattern.find(t) + for t in ["%H", "%I"] + if pattern.find(t) != -1 + ), + default=-1, + ) + date_pos = min( + ( + pattern.find(d) + for d in ["%d", "%m", "%Y"] + if pattern.find(d) != -1 + ), + default=-1, + ) + if ( + time_pos != -1 + and date_pos != -1 + and time_pos < date_pos + ): + time_first_found = True + break + + assert time_first_found + _get_datetime_patterns.cache_clear() + + def test_parse_datetime_with_time_first_locale(self) -> None: + """Integration: parse datetime with time-first mock locale.""" + _get_datetime_patterns.cache_clear() + + original_parse = Locale.parse + + def mock_parse_time_first(locale_str: str) -> MagicMock: + real_locale = original_parse(locale_str) + mock_locale = MagicMock(spec=Locale) + + time_first_pattern = "{0} {1}" + mock_datetime_format = MagicMock( + return_value=time_first_pattern + ) + mock_datetime_format.__str__ = MagicMock( # type: ignore[method-assign] + return_value=time_first_pattern + ) + mock_locale.datetime_formats = { + "short": mock_datetime_format, + "medium": mock_datetime_format, + } + mock_locale.date_formats = real_locale.date_formats + return mock_locale + + with patch( + "babel.Locale.parse", side_effect=mock_parse_time_first + ): + result, _errors = parse_datetime( + "14:30 28.01.2025", "de_DE" + ) + + assert result is None or result.year in (2025, 1925) + _get_datetime_patterns.cache_clear() diff --git a/tests/parsing_dates_cases/tokenize_babel_pattern.py b/tests/parsing_dates_cases/tokenize_babel_pattern.py new file mode 100644 index 00000000..10a9d2e8 --- /dev/null +++ b/tests/parsing_dates_cases/tokenize_babel_pattern.py @@ -0,0 +1,114 @@ +# mypy: ignore-errors +"""Split test cases 
from tests/test_parsing_dates.py.""" + +from tests.parsing_dates_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# _tokenize_babel_pattern +# ============================================================================ + + +class TestTokenizeBabelPattern: + """Test CLDR pattern tokenizer quote handling.""" + + def test_simple_quoted_literal(self) -> None: + """Simple quoted literal is extracted as single token.""" + tokens = _tokenize_babel_pattern("h 'at' a") + assert "at" in tokens + + def test_escaped_quote_outside(self) -> None: + """Two quotes '' outside a quoted section produce literal quote.""" + tokens = _tokenize_babel_pattern("h''mm") + assert "'" in tokens + + def test_escaped_quote_inside(self) -> None: + """Two quotes '' inside quoted text produce literal quote.""" + tokens = _tokenize_babel_pattern("h 'o''clock' a") + assert "o'clock" in tokens + + def test_irish_locale_pattern(self) -> None: + """Quoted literals in locale patterns.""" + tokens = _tokenize_babel_pattern("d MMMM 'de' yyyy") + assert "de" in tokens + assert "d" in tokens + assert "yyyy" in tokens + + def test_standard_pattern_unchanged(self) -> None: + """Standard patterns without quotes work correctly.""" + tokens = _tokenize_babel_pattern("yyyy-MM-dd") + assert tokens == ["yyyy", "-", "MM", "-", "dd"] + + def test_latvian_pattern(self) -> None: + """Latvian date pattern d.MM.yyyy.""" + tokens = _tokenize_babel_pattern("d.MM.yyyy") + assert tokens == ["d", ".", "MM", ".", "yyyy"] + + def test_empty_pattern(self) -> None: + """Empty pattern produces empty token list.""" + assert _tokenize_babel_pattern("") == [] + + def test_unclosed_quote(self) -> None: + """Unclosed quote at end is handled gracefully.""" + tokens = _tokenize_babel_pattern("h 'unclosed") + assert "h" in tokens + assert "unclosed" in tokens + + def test_empty_quoted_section(self) -> None: + """Empty quotes '' produce single quote, 
not empty token.""" + tokens = _tokenize_babel_pattern("a''b") + assert "'" in tokens + assert "a" in tokens + assert "b" in tokens + + def test_adjacent_quoted_sections(self) -> None: + """Multiple adjacent quotes produce multiple literal quotes.""" + tokens = _tokenize_babel_pattern("''''") + assert tokens.count("'") == 2 + + def test_just_two_quotes(self) -> None: + """Just '' produces single quote.""" + tokens = _tokenize_babel_pattern("''") + assert "'" in tokens + + def test_three_quotes(self) -> None: + """Three quotes: first two produce quote, third starts section.""" + tokens = _tokenize_babel_pattern("'''") + assert "'" in tokens + + def test_real_world_german_pattern(self) -> None: + """German pattern with quoted 'um' literal.""" + tokens = _tokenize_babel_pattern("d. MMMM yyyy 'um' HH:mm") + assert "um" in tokens + assert "d" in tokens + assert "MMMM" in tokens + + def test_real_world_at_pattern(self) -> None: + """Pattern with 'at' literal.""" + tokens = _tokenize_babel_pattern( + "EEEE, MMMM d, y 'at' h:mm a" + ) + assert "at" in tokens + + def test_pattern_ending_in_quote(self) -> None: + """Pattern ending with unclosed quote handled gracefully.""" + tokens = _tokenize_babel_pattern("yyyy 'test") + assert "yyyy" in tokens + assert "test" in tokens + + def test_russian_quoted_literal(self) -> None: + """Russian pattern with quoted Cyrillic year marker.""" + pattern = "d MMMM y '\u0433'." + tokens = _tokenize_babel_pattern(pattern) + assert "\u0433" in tokens + assert "d" in tokens + assert "MMMM" in tokens + assert "y" in tokens + assert "." 
in tokens + + def test_spanish_quoted_de(self) -> None: + """Spanish pattern d 'de' MMMM 'de' y with quoted 'de'.""" + tokens = _tokenize_babel_pattern("d 'de' MMMM 'de' y") + assert "de" in tokens + assert "d" in tokens + assert "MMMM" in tokens + assert "y" in tokens diff --git a/tests/parsing_dates_cases/unknown_locale_handling.py b/tests/parsing_dates_cases/unknown_locale_handling.py new file mode 100644 index 00000000..6c81768d --- /dev/null +++ b/tests/parsing_dates_cases/unknown_locale_handling.py @@ -0,0 +1,54 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_parsing_dates.py.""" + +from tests.parsing_dates_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Unknown Locale Handling +# ============================================================================ + + +class TestParseDateUnknownLocale: + """Test parse_date with unknown locale.""" + + def test_iso_format_succeeds(self) -> None: + """ISO format succeeds even with unknown locale.""" + result, errors = parse_date("2025-01-01", "xx-INVALID") + assert result is not None + assert len(errors) == 0 + + def test_non_iso_format_fails(self) -> None: + """Non-ISO format with unknown locale returns error.""" + result, errors = parse_date("01/28/2025", "xx-INVALID") + assert result is None + assert len(errors) == 1 + assert errors[0].parse_type == "date" + + def test_malformed_locale(self) -> None: + """Malformed locale returns error for non-ISO format.""" + result, errors = parse_date( + "28.01.2025", "not-a-valid-locale-format" + ) + assert result is None + assert len(errors) == 1 + + +class TestParseDatetimeUnknownLocale: + """Test parse_datetime with unknown locale.""" + + def test_iso_format_succeeds(self) -> None: + """ISO format succeeds even with unknown locale.""" + result, errors = parse_datetime( + "2025-01-28T14:30:00", "xx-INVALID" + ) + assert result is not None + assert len(errors) == 0 + + def 
test_non_iso_format_fails(self) -> None: + """Non-ISO format with unknown locale returns error.""" + result, errors = parse_datetime( + "01/28/2025 2:30 PM", "xx-INVALID" + ) + assert result is None + assert len(errors) == 1 + assert errors[0].parse_type == "datetime" diff --git a/tests/runtime_bundle_cases/__init__.py b/tests/runtime_bundle_cases/__init__.py new file mode 100644 index 00000000..81d4e037 --- /dev/null +++ b/tests/runtime_bundle_cases/__init__.py @@ -0,0 +1,29 @@ +"""Tests for runtime.bundle: FluentBundle resource loading, formatting, branch coverage.""" + +from __future__ import annotations + +import logging +from typing import Any +from unittest.mock import Mock, patch + +import pytest +from hypothesis import assume, event, example, given +from hypothesis import strategies as st + +from ftllexengine.constants import MAX_LOCALE_LENGTH_HARD_LIMIT, MAX_SOURCE_SIZE +from ftllexengine.core.locale_utils import normalize_locale +from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError, ValidationError +from ftllexengine.integrity import FormattingIntegrityError, SyntaxIntegrityError +from ftllexengine.runtime import FluentBundle +from ftllexengine.runtime.cache_config import CacheConfig +from ftllexengine.runtime.function_bridge import FunctionRegistry +from ftllexengine.runtime.functions import create_default_registry +from ftllexengine.validation.resource import validate_resource + +__all__ = [ + "MAX_LOCALE_LENGTH_HARD_LIMIT", "MAX_SOURCE_SIZE", "Any", "CacheConfig", + "ErrorCategory", "FluentBundle", "FormattingIntegrityError", "FrozenFluentError", + "FunctionRegistry", "Mock", "SyntaxIntegrityError", "ValidationError", + "assume", "create_default_registry", "event", "example", "given", "logging", + "normalize_locale", "patch", "pytest", "st", "validate_resource", +] diff --git a/tests/runtime_bundle_cases/basic.py b/tests/runtime_bundle_cases/basic.py new file mode 100644 index 00000000..3b0a31a2 --- /dev/null +++ 
b/tests/runtime_bundle_cases/basic.py @@ -0,0 +1,778 @@ +# mypy: ignore-errors +from tests.runtime_bundle_cases import ( + Any, + CacheConfig, + ErrorCategory, + FluentBundle, + FrozenFluentError, + Mock, + patch, + pytest, +) + + +class TestFluentBundleCreation: + """Test FluentBundle initialization.""" + + def test_create_bundle_with_locale(self) -> None: + """Create bundle with locale code.""" + bundle = FluentBundle("lv_LV") + + assert bundle.locale == "lv_lv" + + def test_create_bundle_initializes_empty_registries(self) -> None: + """Bundle starts with empty message/term registries.""" + bundle = FluentBundle("en_US") + + assert len(bundle.get_message_ids()) == 0 + assert not bundle.has_message("any-message") + + +class TestFluentBundleAddResource: + """Test FluentBundle add_resource method.""" + + @pytest.fixture + def bundle(self) -> Any: + """Create bundle for testing.""" + return FluentBundle("lv_LV", strict=False) + + def test_add_resource_simple_message(self, bundle: Any) -> None: + """add_resource parses and registers simple message.""" + bundle.add_resource("hello = Sveiki, pasaule!") + + assert bundle.has_message("hello") + assert "hello" in bundle.get_message_ids() + + def test_add_resource_multiple_messages(self, bundle: Any) -> None: + """add_resource registers all messages from source.""" + source = """ +hello = Sveiki! +goodbye = Uz redzēšanos! +thanks = Paldies! 
+""" + bundle.add_resource(source) + + assert bundle.has_message("hello") + assert bundle.has_message("goodbye") + assert bundle.has_message("thanks") + assert len(bundle.get_message_ids()) == 3 + + def test_add_resource_message_with_variable(self, bundle: Any) -> None: + """add_resource handles messages with variables.""" + bundle.add_resource("welcome = Laipni lūdzam, { $name }!") + + assert bundle.has_message("welcome") + + def test_add_resource_message_with_attribute(self, bundle: Any) -> None: + """add_resource handles messages with attributes.""" + source = """ +button-save = Saglabāt + .tooltip = Saglabā ierakstu +""" + bundle.add_resource(source) + + assert bundle.has_message("button-save") + + def test_add_resource_with_junk_entries_continues(self, bundle: Any) -> None: + """add_resource with non-critical syntax errors creates junk but continues.""" + # Parser is robust - creates Junk entries for invalid syntax but doesn't crash + bundle.add_resource("invalid message syntax") + + # Bundle should still work, junk is just ignored + assert len(bundle.get_message_ids()) == 0 # No valid messages parsed + + def test_add_multiple_resources_accumulates(self, bundle: Any) -> None: + """Multiple add_resource calls accumulate messages.""" + bundle.add_resource("msg1 = First") + bundle.add_resource("msg2 = Second") + + assert bundle.has_message("msg1") + assert bundle.has_message("msg2") + assert len(bundle.get_message_ids()) == 2 + + +class TestFluentBundleFormatPattern: + """Test FluentBundle format_pattern method.""" + + @pytest.fixture + def bundle(self) -> Any: + """Create bundle with sample messages.""" + bundle = FluentBundle("lv_LV", strict=False) + bundle.add_resource(""" +hello = Sveiki, pasaule! +welcome = Laipni lūdzam, { $name }! 
+greeting = { $name } saka { $message } +button-save = Saglabāt + .tooltip = Saglabā ierakstu datubāzē +""") + return bundle + + def test_format_pattern_simple_message(self, bundle: Any) -> None: + """format_pattern returns simple message text.""" + result, errors = bundle.format_pattern("hello") + + assert result == "Sveiki, pasaule!" + assert errors == (), f"Unexpected errors: {errors}" + + def test_format_pattern_with_variable(self, bundle: Any) -> None: + """format_pattern substitutes variable from args.""" + result, errors = bundle.format_pattern("welcome", {"name": "Jānis"}) + + assert "Jānis" in result + assert "Laipni lūdzam" in result + assert errors == (), f"Unexpected errors: {errors}" + + def test_format_pattern_with_multiple_variables(self, bundle: Any) -> None: + """format_pattern substitutes multiple variables.""" + result, errors = bundle.format_pattern("greeting", {"name": "Anna", "message": "Sveiki"}) + + assert "Anna" in result + assert "Sveiki" in result + assert errors == (), f"Unexpected errors: {errors}" + + def test_format_pattern_missing_variable_uses_placeholder(self, bundle: Any) -> None: + """format_pattern handles missing variable gracefully.""" + result, errors = bundle.format_pattern("welcome", {}) + + # Should not crash, returns some fallback + assert isinstance(result, str) + assert len(errors) == 1, ( + f"Expected 1 error for missing variable, got {len(errors)}: {errors}" + ) + assert isinstance(errors[0], FrozenFluentError) + assert errors[0].category == ErrorCategory.REFERENCE + assert "variable" in str(errors[0]).lower() or "name" in str(errors[0]).lower() + + def test_format_pattern_with_attribute_parameter(self, bundle: Any) -> None: + """format_pattern accepts attribute parameter.""" + result, errors = bundle.format_pattern("button-save", attribute="tooltip") + + # Should successfully retrieve the .tooltip attribute + assert result == "Saglabā ierakstu datubāzē" + assert errors == (), f"Unexpected errors: {errors}" + + def 
test_format_pattern_missing_message_raises_error(self, bundle: Any) -> None: + """format_pattern for non-existent message raises FrozenFluentError.""" + result, errors = bundle.format_pattern("nonexistent-message") + assert len(errors) == 1 + assert isinstance(errors[0], FrozenFluentError) + assert errors[0].category == ErrorCategory.REFERENCE + assert "not found" in str(errors[0]).lower() + assert result == "{nonexistent-message}" + + def test_format_pattern_none_args(self, bundle: Any) -> None: + """format_pattern with args=None works for messages without variables.""" + result, errors = bundle.format_pattern("hello", None) + + assert result == "Sveiki, pasaule!" + assert errors == (), f"Unexpected errors: {errors}" + + def test_format_pattern_empty_args(self, bundle: Any) -> None: + """format_pattern with empty dict works.""" + result, errors = bundle.format_pattern("hello", {}) + + assert result == "Sveiki, pasaule!" + assert errors == (), f"Unexpected errors: {errors}" + + +class TestFluentBundleHasMessage: + """Test FluentBundle has_message method.""" + + @pytest.fixture + def bundle(self) -> Any: + """Create bundle with messages.""" + bundle = FluentBundle("en_US") + bundle.add_resource("existing = This message exists") + return bundle + + def test_has_message_returns_true_when_exists(self, bundle: Any) -> None: + """has_message returns True for existing message.""" + assert bundle.has_message("existing") is True + + def test_has_message_returns_false_when_not_exists(self, bundle: Any) -> None: + """has_message returns False for non-existent message.""" + assert bundle.has_message("nonexistent") is False + + +class TestFluentBundleGetMessageIds: + """Test FluentBundle get_message_ids method.""" + + def test_get_message_ids_empty_bundle(self) -> None: + """get_message_ids returns empty list for new bundle.""" + bundle = FluentBundle("de_DE") + + assert bundle.get_message_ids() == [] + + def test_get_message_ids_returns_all_ids(self) -> None: + 
"""get_message_ids returns all registered message IDs.""" + bundle = FluentBundle("pl_PL") + bundle.add_resource(""" +msg1 = First +msg2 = Second +msg3 = Third +""") + + ids = bundle.get_message_ids() + + assert len(ids) == 3 + assert "msg1" in ids + assert "msg2" in ids + assert "msg3" in ids + + +class TestFluentBundleAddFunction: + """Test FluentBundle add_function method.""" + + @pytest.fixture + def bundle(self) -> Any: + """Create bundle.""" + return FluentBundle("en_US") + + def test_add_function_registers_custom_function(self) -> None: + """add_function adds custom function to bundle.""" + bundle = FluentBundle("en", use_isolating=False) + + def custom(value: object) -> str: + return str(value).upper() + + bundle.add_function("CUSTOM", custom) + + # Verify function works by using it in a message + bundle.add_resource("msg = { CUSTOM($val) }") + result, _ = bundle.format_pattern("msg", {"val": "test"}) + assert result == "TEST" + + def test_add_function_with_callable(self) -> None: + """add_function accepts any callable.""" + bundle = FluentBundle("en", use_isolating=False) + + # Function must return string per spec + bundle.add_function("LAMBDA", lambda x: str(int(x) * 2)) + bundle.add_resource("msg = { LAMBDA($n) }") + result, _ = bundle.format_pattern("msg", {"n": "5"}) + assert result == "10" + + +class TestFluentBundleErrorHandling: + """Test FluentBundle error handling and edge cases.""" + + @pytest.fixture + def bundle(self) -> Any: + """Create bundle with test message.""" + bundle = FluentBundle("en_US", strict=False) + bundle.add_resource("test = Test message") + return bundle + + def test_format_pattern_handles_resolver_errors_gracefully(self, bundle: Any) -> None: + """format_pattern returns fallback on resolver errors.""" + # Add message that references undefined variable + bundle.add_resource("broken-msg = Value is { $undefined }") + + result, errors = bundle.format_pattern("broken-msg", {}) + + # Should return result with variable fallback, 
plus error + assert isinstance(result, str) + assert "{$undefined}" in result # Variable fallback + assert len(errors) >= 1, f"Expected at least 1 error for undefined variable, got {errors}" + assert isinstance(errors[0], FrozenFluentError) + assert errors[0].category == ErrorCategory.REFERENCE + + def test_format_pattern_handles_key_error_gracefully(self, bundle: Any) -> None: + """format_pattern handles KeyError (missing variable) gracefully.""" + bundle.add_resource("needs-var = Hello { $name }") + + # Call without providing required variable + result, errors = bundle.format_pattern("needs-var", {}) + + # Should return result with variable fallback, plus error + assert isinstance(result, str) + assert "{$name}" in result + assert len(errors) >= 1, f"Expected error for missing variable, got {errors}" + assert isinstance(errors[0], FrozenFluentError) + assert errors[0].category == ErrorCategory.REFERENCE + + def test_format_pattern_handles_attribute_error_gracefully(self, bundle: Any) -> None: + """format_pattern handles AttributeError gracefully.""" + bundle.add_resource("attr-msg = Test") + + # Try to access non-existent attribute + result, errors = bundle.format_pattern("attr-msg", attribute="nonexistent") + + # Should handle gracefully with fallback + error + assert isinstance(result, str) + assert len(errors) >= 1, f"Expected error for nonexistent attribute, got {errors}" + assert isinstance(errors[0], FrozenFluentError) + assert errors[0].category == ErrorCategory.REFERENCE + assert "attribute" in str(errors[0]).lower() + + def test_format_pattern_handles_unexpected_errors_gracefully(self, bundle: Any) -> None: + """format_pattern catches unexpected exceptions.""" + # Even if something goes really wrong, bundle should not crash + result, errors = bundle.format_pattern("test", {}) + + assert result == "Test message" + assert errors == (), f"Unexpected errors: {errors}" + + def test_add_resource_with_terms_and_junk(self) -> None: + """add_resource handles mix 
of messages, terms, and junk.""" + bundle = FluentBundle("en_US", strict=False) + + source = """ +message1 = Hello +-term1 = Brand Name +message2 = Goodbye +invalid syntax here +-term2 = Another Term +""" + bundle.add_resource(source) + + # Messages should be registered + assert bundle.has_message("message1") + assert bundle.has_message("message2") + + # Terms should not appear in messages + assert not bundle.has_message("-term1") + + # Should have exactly 2 messages + assert len(bundle.get_message_ids()) == 2 + + +class TestFluentBundleIntegration: + """Integration tests for FluentBundle with complex scenarios.""" + + def test_complete_workflow_simple(self) -> None: + """Full workflow: create, add resource, format.""" + bundle = FluentBundle("lv_LV") + bundle.add_resource("greeting = Sveiki, { $name }!") + + result, errors = bundle.format_pattern("greeting", {"name": "Pēteris"}) + + assert "Sveiki" in result + assert "Pēteris" in result + assert errors == (), f"Unexpected errors: {errors}" + + def test_multiple_locales_independent(self) -> None: + """Multiple bundles for different locales are independent.""" + bundle_lv = FluentBundle("lv_LV") + bundle_en = FluentBundle("en_US") + + bundle_lv.add_resource("hello = Sveiki!") + bundle_en.add_resource("hello = Hello!") + + result_lv, errors_lv = bundle_lv.format_pattern("hello") + assert result_lv == "Sveiki!" + assert errors_lv == () + result_en, errors_en = bundle_en.format_pattern("hello") + assert result_en == "Hello!" 
+ assert errors_en == () + + def test_overwrite_message_with_new_resource(self) -> None: + """Adding resource with same message ID overwrites.""" + bundle = FluentBundle("en_US") + + bundle.add_resource("msg = Original") + result1, errors1 = bundle.format_pattern("msg") + assert result1 == "Original" + assert errors1 == () + + bundle.add_resource("msg = Updated") + result2, errors2 = bundle.format_pattern("msg") + assert result2 == "Updated" + assert errors2 == () + + +class TestFluentBundleEdgeCases: + """Test edge cases and additional coverage paths.""" + + def test_add_resource_with_terms_only(self) -> None: + """Bundle handles resources with only terms (no messages).""" + bundle = FluentBundle("en_US") + + # Add resource with only terms (lines 76-77) + bundle.add_resource(""" +-brand = MyApp +-version = 3.0 +-company = MyCompany +""") + + # No messages should be registered + assert len(bundle.get_message_ids()) == 0 + + # But terms are registered internally (can't query them directly) + # This exercises lines 76-77 (term registration) + + def test_format_pattern_with_recursion_error(self) -> None: + """Bundle handles RecursionError gracefully (line 152-155).""" + bundle = FluentBundle("en_US", strict=False) + + # While we can't easily create a RecursionError through normal means, + # we can test that other error types return fallback + bundle.add_resource("test-msg = Hello { $name }") + + # Missing variable triggers error path + result, errors = bundle.format_pattern("test-msg", {}) + + # Should return result with variable fallback, plus error + assert isinstance(result, str) + assert "{$name}" in result + assert len(errors) >= 1, f"Expected error for missing variable, got {errors}" + assert isinstance(errors[0], FrozenFluentError) + assert errors[0].category == ErrorCategory.REFERENCE + + def test_format_pattern_with_exception_in_resolver(self) -> None: + """Bundle catches unexpected exceptions in resolver (lines 156-160).""" + bundle = FluentBundle("en_US") + 
bundle.add_resource("msg = Test value") + + # Normal case works + result, errors = bundle.format_pattern("msg", {}) + assert result == "Test value" + assert errors == (), f"Unexpected errors: {errors}" + + # Even with weird args, should not crash + result, errors = bundle.format_pattern( + "msg", {"weird": object()} # type: ignore[dict-item] + ) + assert isinstance(result, str) + assert errors == (), f"Unexpected errors: {errors}" + + def test_add_resource_with_invalid_fluent_syntax(self) -> None: + """Bundle handles completely invalid Fluent syntax.""" + bundle = FluentBundle("en_US", strict=False) + + # This would trigger parser error recovery + source = """ +valid-msg = This works +{ invalid { nested { braces +another-valid = Also works +""" + bundle.add_resource(source) + + # Valid messages should still be registered + assert bundle.has_message("valid-msg") + assert bundle.has_message("another-valid") + + def test_format_pattern_with_keyerror_from_resolver(self) -> None: + """Bundle handles KeyError from resolver (lines 148-151).""" + bundle = FluentBundle("en_US", strict=False) + bundle.add_resource("needs-var = Value: { $required }") + + # Missing required variable triggers KeyError path + result, errors = bundle.format_pattern("needs-var", {}) + + # Should return fallback with variable reference + assert result == "Value: {$required}" + assert len(errors) == 1 + assert isinstance(errors[0], FrozenFluentError) + assert errors[0].category == ErrorCategory.REFERENCE + + def test_format_pattern_with_attribute_error_from_resolver(self) -> None: + """Bundle handles AttributeError from resolver (lines 148-151).""" + bundle = FluentBundle("en_US", strict=False) + bundle.add_resource(""" +msg = Test message + .tooltip = Tooltip text +""") + + # Request non-existent attribute triggers AttributeError path + result, errors = bundle.format_pattern("msg", attribute="nonexistent") + + # Should return fallback with attribute reference + assert result == "{msg.nonexistent}" 
+ assert len(errors) == 1 + assert isinstance(errors[0], FrozenFluentError) + assert errors[0].category == ErrorCategory.REFERENCE + + def test_add_function_registers_successfully(self) -> None: + """Bundle can register custom functions.""" + bundle = FluentBundle("en_US") + + # Add custom function + def uppercase(text: object) -> str: + return str(text).upper() + + bundle.add_function("UPPERCASE", uppercase) + + # Function is registered (can't easily test usage without full parser support) + # This exercises the add_function method + bundle.add_resource("msg = Test message") + result, errors = bundle.format_pattern("msg", {}) + assert result == "Test message" + assert errors == (), f"Unexpected errors: {errors}" + + def test_get_message_ids_with_terms_excluded(self) -> None: + """get_message_ids returns only messages, not terms.""" + bundle = FluentBundle("en_US") + + bundle.add_resource(""" +message1 = First message +-term1 = A term +message2 = Second message +-term2 = Another term +""") + + ids = bundle.get_message_ids() + + # Should have exactly 2 messages + assert len(ids) == 2 + assert "message1" in ids + assert "message2" in ids + + # Terms should NOT be in message IDs + assert "-term1" not in ids + assert "-term2" not in ids + + +class TestFluentBundleMockedErrors: + """Test FluentBundle error handlers using mocking.""" + + def test_format_pattern_with_keyerror_exception(self) -> None: + """Bundle propagates KeyError from resolver (fail-fast behavior). + + Internal errors (KeyError, AttributeError, etc.) are no longer + caught. This ensures bugs are detected immediately rather than hidden + behind fallback values. + """ + bundle = FluentBundle("en_US") + bundle.add_resource("msg = Hello { $name }") + + # Patch the resolver instance directly; resolver is eagerly initialized + # so patching the FluentResolver class does not affect existing bundles. 
+ mock_resolver = Mock() + mock_resolver.resolve_message.side_effect = KeyError("name") + # KeyError propagates (fail-fast) + with ( + patch.object(bundle, "_resolver", mock_resolver), + pytest.raises(KeyError, match="name"), + ): + bundle.format_pattern("msg", {}) + + def test_format_pattern_with_attribute_error_exception(self) -> None: + """Bundle propagates AttributeError from resolver (fail-fast behavior). + + Internal errors are no longer caught. + """ + bundle = FluentBundle("en_US") + bundle.add_resource("msg = Hello") + + # Patch the resolver instance directly; resolver is eagerly initialized. + mock_resolver = Mock() + mock_resolver.resolve_message.side_effect = AttributeError("Invalid attribute") + # AttributeError propagates (fail-fast) + with ( + patch.object(bundle, "_resolver", mock_resolver), + pytest.raises(AttributeError, match="Invalid attribute"), + ): + bundle.format_pattern("msg", {}) + + def test_format_pattern_with_recursion_error_exception(self) -> None: + """Bundle propagates RecursionError from resolver (fail-fast behavior). + + Internal errors are no longer caught. + """ + bundle = FluentBundle("en_US") + bundle.add_resource("msg = Hello") + + # Patch the resolver instance directly; resolver is eagerly initialized. + mock_resolver = Mock() + mock_resolver.resolve_message.side_effect = RecursionError("Maximum recursion") + # RecursionError propagates (fail-fast) + with ( + patch.object(bundle, "_resolver", mock_resolver), + pytest.raises(RecursionError, match="Maximum recursion"), + ): + bundle.format_pattern("msg", {}) + + def test_format_pattern_with_unexpected_exception(self) -> None: + """Bundle propagates unexpected exceptions from resolver (fail-fast behavior). + + Internal errors are no longer caught. Only FluentError subclasses + are part of the normal error handling flow. + """ + bundle = FluentBundle("en_US") + bundle.add_resource("msg = Hello") + + # Patch the resolver instance directly; resolver is eagerly initialized. 
+ mock_resolver = Mock() + mock_resolver.resolve_message.side_effect = RuntimeError("Unexpected error") + # RuntimeError propagates (fail-fast) + with ( + patch.object(bundle, "_resolver", mock_resolver), + pytest.raises(RuntimeError, match="Unexpected error"), + ): + bundle.format_pattern("msg", {}) + + # Note: Lines 76-77 (term debug logging) are unreachable with current parser + # Parser doesn't support Term syntax (-term = value), so isinstance(entry, Term) + # is never True. This is acceptable dead code for future parser enhancement. + + +class TestFluentBundleValidateResource: + """Test FluentBundle.validate_resource() method (Phase 4: Validation API).""" + + @pytest.fixture + def bundle(self) -> FluentBundle: + """Create bundle for testing.""" + return FluentBundle("en_US") + + def test_validate_valid_resource(self, bundle: FluentBundle) -> None: + """validate_resource returns success for valid FTL.""" + source = """hello = Hello, world! +goodbye = Goodbye!""" + result = bundle.validate_resource(source) + + assert result.is_valid + assert result.error_count == 0 + assert result.warning_count == 0 + assert len(result.errors) == 0 + assert len(result.warnings) == 0 + + def test_validate_empty_resource(self, bundle: FluentBundle) -> None: + """validate_resource handles empty string.""" + result = bundle.validate_resource("") + + assert result.is_valid + assert result.error_count == 0 + + def test_validate_resource_with_variables(self, bundle: FluentBundle) -> None: + """validate_resource handles messages with variables.""" + source = "welcome = Hello, { $name }!" 
+ result = bundle.validate_resource(source) + + assert result.is_valid + assert result.error_count == 0 + + def test_validate_resource_with_select(self, bundle: FluentBundle) -> None: + """validate_resource handles SELECT expressions.""" + source = """emails = { $count -> + [one] 1 email + *[other] { $count } emails +}""" + result = bundle.validate_resource(source) + + assert result.is_valid + assert result.error_count == 0 + + def test_validate_invalid_syntax_returns_errors(self, bundle: FluentBundle) -> None: + """validate_resource returns errors for invalid syntax.""" + source = "invalid syntax without equals sign" + result = bundle.validate_resource(source) + + assert not result.is_valid + assert result.error_count == 1 + assert len(result.errors) == 1 + + def test_validate_multiple_errors(self, bundle: FluentBundle) -> None: + """validate_resource returns all errors found.""" + source = """hello = Hello +invalid line 1 +goodbye = Goodbye +invalid line 2""" + result = bundle.validate_resource(source) + + assert not result.is_valid + assert result.error_count == 2 + assert len(result.errors) == 2 + + def test_validate_does_not_modify_bundle(self, bundle: FluentBundle) -> None: + """validate_resource does not add messages to bundle.""" + source = "hello = Hello, world!" 
+ + # Validate first + result = bundle.validate_resource(source) + assert result.is_valid + + # Bundle should still be empty + assert len(bundle.get_message_ids()) == 0 + assert not bundle.has_message("hello") + + def test_validation_result_properties(self, bundle: FluentBundle) -> None: + """ValidationResult properties work correctly.""" + # Valid resource + valid_result = bundle.validate_resource("hello = Hello") + assert valid_result.is_valid is True + assert valid_result.error_count == 0 + assert valid_result.warning_count == 0 + + # Invalid resource + invalid_result = bundle.validate_resource("invalid") + assert invalid_result.is_valid is False + assert invalid_result.error_count >= 1 + assert invalid_result.warning_count == 0 + + +def test_use_isolating_enabled_by_default(): + """Bidi isolation should be enabled by default per Fluent spec.""" + bundle = FluentBundle("ar") + bundle.add_resource("msg = مرحبا { $name }!") + result, errors = bundle.format_pattern("msg", {"name": "Alice"}) + + # Should contain FSI (U+2068) and PDI (U+2069) marks + assert "\u2068Alice\u2069" in result + assert result == "مرحبا \u2068Alice\u2069!" + assert errors == (), f"Unexpected errors: {errors}" + + +def test_use_isolating_can_be_disabled(): + """Bidi isolation can be disabled for LTR-only applications.""" + bundle = FluentBundle("en", use_isolating=False) + bundle.add_resource("msg = Hello { $name }!") + result, errors = bundle.format_pattern("msg", {"name": "Alice"}) + + # Should NOT contain isolation marks + assert "\u2068" not in result + assert "\u2069" not in result + assert result == "Hello Alice!" 
+ assert errors == (), f"Unexpected errors: {errors}" + + +def test_use_isolating_with_multiple_placeables(): + """Bidi isolation wraps each placeable independently.""" + bundle = FluentBundle("ar", use_isolating=True) + bundle.add_resource("msg = { $first } و { $second }") + result, errors = bundle.format_pattern("msg", {"first": "Alice", "second": "Bob"}) + + # Each placeable wrapped independently + assert result == "\u2068Alice\u2069 و \u2068Bob\u2069" + assert errors == (), f"Unexpected errors: {errors}" + + +def test_cache_enabled_property_when_enabled(): + """cache_enabled property returns True when caching enabled.""" + bundle = FluentBundle("en", cache=CacheConfig()) + assert bundle.cache_enabled is True + + +def test_cache_enabled_property_when_disabled(): + """cache_enabled property returns False when caching disabled.""" + bundle = FluentBundle("en") + assert bundle.cache_enabled is False + + +def test_cache_enabled_property_default(): + """cache_enabled property returns False by default.""" + bundle = FluentBundle("en") + assert bundle.cache_enabled is False + + +def test_cache_config_size_when_enabled(): + """cache_config.size returns configured size when caching enabled.""" + bundle = FluentBundle("en", cache=CacheConfig(size=500)) + assert bundle.cache_config is not None + assert bundle.cache_config.size == 500 + + +def test_cache_config_is_none_when_disabled(): + """cache_config returns None when caching is disabled.""" + bundle = FluentBundle("en") + assert bundle.cache_config is None + assert bundle.cache_enabled is False + + +# ============================================================================ +# Branch Coverage Classes (from test_bundle_branch_coverage) +# ============================================================================ + +# ============================================================================= +# Property Accessors +# ============================================================================= diff --git 
a/tests/runtime_bundle_cases/introspection.py b/tests/runtime_bundle_cases/introspection.py new file mode 100644 index 00000000..2478c1cd --- /dev/null +++ b/tests/runtime_bundle_cases/introspection.py @@ -0,0 +1,276 @@ +# mypy: ignore-errors +from tests.runtime_bundle_cases import ( + Any, + CacheConfig, + FluentBundle, + create_default_registry, + pytest, +) + + +class TestBundleIntrospection: + """Test introspection and query methods.""" + + def test_get_message_variables_returns_frozenset(self) -> None: + """get_message_variables returns frozenset of variable names.""" + bundle = FluentBundle("en") + bundle.add_resource("greeting = Hello, { $name }!") + variables = bundle.get_message_variables("greeting") + assert "name" in variables + assert isinstance(variables, frozenset) + + def test_get_message_variables_raises_keyerror(self) -> None: + """get_message_variables raises KeyError for missing message.""" + bundle = FluentBundle("en") + with pytest.raises(KeyError, match="not found"): + bundle.get_message_variables("nonexistent") + + def test_get_all_message_variables(self) -> None: + """get_all_message_variables returns dict of variable sets.""" + bundle = FluentBundle("en") + bundle.add_resource( + "greeting = Hello, { $name }!\n" + "farewell = Bye, { $first } { $last }!\n" + "simple = No variables\n" + ) + all_vars = bundle.get_all_message_variables() + assert all_vars["greeting"] == frozenset({"name"}) + assert all_vars["farewell"] == frozenset({"first", "last"}) + assert all_vars["simple"] == frozenset() + + def test_get_all_message_variables_empty_bundle(self) -> None: + """get_all_message_variables returns empty dict when empty.""" + bundle = FluentBundle("en") + assert bundle.get_all_message_variables() == {} + + def test_introspect_message_returns_metadata(self) -> None: + """introspect_message returns MessageIntrospection with metadata.""" + bundle = FluentBundle("en") + bundle.add_resource( + "price = { NUMBER($amount, minimumFractionDigits: 2) }" + 
)
+        info = bundle.introspect_message("price")
+        assert "amount" in info.get_variable_names()
+        assert "NUMBER" in info.get_function_names()
+
+    def test_introspect_message_raises_keyerror(self) -> None:
+        """introspect_message raises KeyError for missing message."""
+        bundle = FluentBundle("en")
+        with pytest.raises(KeyError, match="not found"):
+            bundle.introspect_message("nonexistent")
+
+    def test_introspect_term_returns_metadata(self) -> None:
+        """introspect_term returns MessageIntrospection for term."""
+        bundle = FluentBundle("en")
+        bundle.add_resource(
+            "-brand = { $case ->\n"
+            " [nominative] Firefox\n"
+            " *[other] Firefox\n}\n"
+        )
+        info = bundle.introspect_term("brand")
+        assert "case" in info.get_variable_names()
+
+    def test_introspect_term_raises_keyerror(self) -> None:
+        """introspect_term raises KeyError for missing term."""
+        bundle = FluentBundle("en")
+        with pytest.raises(KeyError, match="Term 'nonexistent' not found"):
+            bundle.introspect_term("nonexistent")
+
+    def test_introspect_term_success(self) -> None:
+        """introspect_term returns valid data for existing term."""
+        bundle = FluentBundle("en")
+        bundle.add_resource(
+            "-brand = Firefox\n .gender = masculine"
+        )
+        info = bundle.introspect_term("brand")
+        assert info is not None
+
+    def test_has_attribute_true(self) -> None:
+        """has_attribute returns True when attribute exists."""
+        bundle = FluentBundle("en")
+        bundle.add_resource("button = Click\n .tooltip = Save\n")
+        assert bundle.has_attribute("button", "tooltip") is True
+
+    def test_has_attribute_false_missing_attribute(self) -> None:
+        """has_attribute returns False when attribute missing."""
+        bundle = FluentBundle("en")
+        bundle.add_resource("button = Click\n .tooltip = Save\n")
+        assert bundle.has_attribute("button", "nonexistent") is False
+
+    def test_has_attribute_false_missing_message(self) -> None:
+        """has_attribute returns False when message missing."""
+        bundle = FluentBundle("en")
+        bundle.add_resource("msg = Hello")
+        assert bundle.has_attribute("nonexistent", "tooltip") is False
+
+    def test_has_attribute_multiple_attributes(self) -> None:
+        """has_attribute correctly checks among multiple attributes."""
+        bundle = FluentBundle("en")
+        bundle.add_resource(
+            "button = Click\n"
+            " .tooltip = Tooltip\n"
+            " .aria-label = Label\n"
+            " .placeholder = Enter\n"
+        )
+        assert bundle.has_attribute("button", "tooltip") is True
+        assert bundle.has_attribute("button", "aria-label") is True
+        assert bundle.has_attribute("button", "placeholder") is True
+        assert bundle.has_attribute("button", "missing") is False
+
+
+# =============================================================================
+# Formatting (format_pattern error paths)
+# =============================================================================
+
+
+class TestBundleFormatting:
+    """Test formatting methods and error handling."""
+
+    def test_format_pattern_formats_message(self) -> None:
+        """format_pattern formats message without attribute access."""
+        bundle = FluentBundle("en", use_isolating=False)
+        bundle.add_resource("welcome = Hello, { $name }!")
+        result, errors = bundle.format_pattern("welcome", {"name": "Alice"})
+        assert result == "Hello, Alice!"
+        assert errors == ()
+
+    def test_format_pattern_handles_recursion_error(self) -> None:
+        """format_pattern catches RecursionError from circular refs."""
+        bundle = FluentBundle("en", strict=False)
+        bundle.add_resource("msg1 = { msg2 }\nmsg2 = { msg1 }\n")
+        _result, errors = bundle.format_pattern("msg1")
+        assert len(errors) > 0
+
+
+# =============================================================================
+# Custom Functions
+# =============================================================================
+
+
+class TestBundleCustomFunctions:
+    """Test custom function registration and registry isolation."""
+
+    def test_custom_function_registered_and_works(self) -> None:
+        """add_function registers custom function successfully."""
+        bundle = FluentBundle("en")
+
+        def custom(value: Any) -> str:
+            return str(value).upper()
+
+        bundle.add_function("CUSTOM", custom)
+        bundle.add_resource("msg = { CUSTOM($val) }")
+        result, _ = bundle.format_pattern("msg", {"val": "hello"})
+        assert "HELLO" in result
+
+    def test_add_function_clears_cache(self) -> None:
+        """add_function clears cache after registration."""
+        bundle = FluentBundle("en", cache=CacheConfig())
+        bundle.add_resource("msg = Hello")
+        bundle.format_pattern("msg")
+        assert bundle.cache_usage == 1
+
+        def custom(v: Any) -> str:
+            return str(v)
+
+        bundle.add_function("CUSTOM", custom)
+        assert bundle.cache_usage == 0
+
+    def test_add_function_without_cache(self) -> None:
+        """add_function works when cache is disabled."""
+        bundle = FluentBundle("en", use_isolating=False)
+
+        def custom(val: str) -> str:
+            return val.upper()
+
+        bundle.add_function("CUSTOM", custom)
+        bundle.add_resource("msg = { CUSTOM($val) }")
+        result, _ = bundle.format_pattern("msg", {"val": "test"})
+        assert result == "TEST"
+
+    def test_init_with_custom_registry(self) -> None:
+        """FluentBundle accepts custom FunctionRegistry."""
+        registry = create_default_registry()
+
+        def my_func(_val: int) -> str:
+            return "custom"
+
+        registry.register(my_func, ftl_name="CUSTOM")
+        bundle = FluentBundle("en", functions=registry)
+        bundle.add_resource("test = { CUSTOM(123) }")
+        result, errors = bundle.format_pattern("test")
+        assert not errors
+        assert "custom" in result
+
+    def test_init_copies_registry_for_isolation(self) -> None:
+        """FluentBundle creates copy of registry for isolation."""
+        original = create_default_registry()
+        bundle = FluentBundle("en", strict=False, functions=original)
+
+        def new_func(_val: int) -> str:
+            return "new"
+
+        original.register(new_func, ftl_name="NEWFUNC")
+        bundle.add_resource("test = { NEWFUNC(1) }")
+        result, errors = bundle.format_pattern("test")
+        assert len(errors) > 0 or "NEWFUNC" not in result
+
+
+# =============================================================================
+# get_babel_locale Method
+# =============================================================================
+
+
+class TestBundleGetBabelLocale:
+    """Test get_babel_locale introspection method."""
+
+    def test_returns_locale_identifier(self) -> None:
+        """get_babel_locale returns Babel locale identifier."""
+        assert FluentBundle("lv").get_babel_locale() == "lv"
+
+    def test_handles_underscore_locale(self) -> None:
+        """get_babel_locale handles underscore-separated locales."""
+        assert FluentBundle("en_US").get_babel_locale() == "en_US"
+
+    def test_handles_hyphen_locale(self) -> None:
+        """get_babel_locale handles hyphen-separated locales."""
+        result = FluentBundle("en-GB").get_babel_locale()
+        assert "en" in result
+
+    def test_invalid_locale_is_rejected_at_construction(self) -> None:
+        """Unknown locales are rejected before a bundle can be created."""
+        with pytest.raises(ValueError, match="Unknown locale identifier"):
+            FluentBundle("xx-INVALID")
+
+
+# =============================================================================
+# Thread Safety
+# =============================================================================
+
+
+class TestBundleThreadSafety:
+    """Test always-on thread safety via readers-writer lock."""
+
+    def test_add_resource_is_thread_safe(self) -> None:
+        """add_resource acquires lock (always-on thread safety)."""
+        bundle = FluentBundle("en")
+        bundle.add_resource("msg = Hello")
+        assert bundle.has_message("msg")
+        result, errors = bundle.format_pattern("msg")
+        assert result == "Hello"
+        assert errors == ()
+
+    def test_format_pattern_is_thread_safe(self) -> None:
+        """format_pattern acquires lock (always-on thread safety)."""
+        bundle = FluentBundle("en", use_isolating=False)
+        bundle.add_resource("greeting = Hello, { $name }!")
+        result, errors = bundle.format_pattern(
+            "greeting", {"name": "World"}
+        )
+        assert result == "Hello, World!"
+        assert errors == ()
+
+
+# =============================================================================
+# Hypothesis Property-Based Tests
+# =============================================================================
+
diff --git a/tests/runtime_bundle_cases/properties.py b/tests/runtime_bundle_cases/properties.py
new file mode 100644
index 00000000..9ad05634
--- /dev/null
+++ b/tests/runtime_bundle_cases/properties.py
@@ -0,0 +1,676 @@
+# mypy: ignore-errors
+from tests.runtime_bundle_cases import (
+    CacheConfig,
+    ErrorCategory,
+    FluentBundle,
+    FormattingIntegrityError,
+    FunctionRegistry,
+    SyntaxIntegrityError,
+    assume,
+    create_default_registry,
+    event,
+    example,
+    given,
+    logging,
+    normalize_locale,
+    pytest,
+    st,
+    validate_resource,
+)
+
+
+class TestBundleHypothesisProperties:
+    """Property-based tests for FluentBundle boundary exploration."""
+
+    # --- Init type validation (from test_bundle_100pct_final_coverage) ---
+
+    @given(
+        invalid_functions=st.one_of(
+            st.dictionaries(
+                st.text(min_size=1, max_size=10), st.integers()
+            ),
+            st.lists(st.text()),
+            st.integers(),
+            st.text(),
+            st.none(),
+        )
+    )
+    def test_init_rejects_non_function_registry(
+        self, invalid_functions: object
+    ) -> None:
+        """FluentBundle.__init__ rejects non-FunctionRegistry functions."""
+        if invalid_functions is None:
+            event("type=NoneType_valid")
+            return
+
+        type_name = type(invalid_functions).__name__
+        event(f"type={type_name}")
+
+        with pytest.raises(
+            TypeError,
+            match="functions must be FunctionRegistry, not",
+        ):
+            FluentBundle(
+                "en_US", functions=invalid_functions  # type: ignore[arg-type]
+            )
+
+    @example(invalid_functions={"NUMBER": lambda x: x})
+    @example(invalid_functions=[])
+    @example(invalid_functions=42)
+    @example(invalid_functions="not_a_registry")
+    @given(
+        invalid_functions=st.one_of(
+            st.dictionaries(
+                st.text(min_size=1, max_size=5),
+                st.integers(),
+                min_size=1,
+            ),
+            st.lists(st.integers(), min_size=1),
+        )
+    )
+    def test_init_type_error_message_includes_type_name(
+        self, invalid_functions: object
+    ) -> None:
+        """TypeError message includes actual type name."""
+        type_name = type(invalid_functions).__name__
+        event(f"type={type_name}")
+
+        with pytest.raises(TypeError) as exc_info:
+            FluentBundle(
+                "en_US", functions=invalid_functions  # type: ignore[arg-type]
+            )
+
+        assert type_name in str(exc_info.value)
+        assert "FunctionRegistry" in str(exc_info.value)
+        assert "create_default_registry" in str(exc_info.value)
+
+    # --- Property getters (from test_bundle_100pct_final_coverage) ---
+
+    @given(
+        max_expansion_size=st.integers(
+            min_value=1000, max_value=10_000_000
+        ),
+        locale=st.sampled_from(["en_US", "de_DE", "lv_LV", "ja_JP"]),
+    )
+    def test_max_expansion_size_preserved(
+        self, max_expansion_size: int, locale: str
+    ) -> None:
+        """max_expansion_size property returns configured value."""
+        if max_expansion_size < 10_000:
+            event("boundary=small")
+        elif max_expansion_size > 1_000_000:
+            event("boundary=large")
+        else:
+            event("boundary=medium")
+
+        bundle = FluentBundle(
+            locale, max_expansion_size=max_expansion_size
+        )
+        assert bundle.max_expansion_size == max_expansion_size
+
+    @given(
+        locale=st.sampled_from(["en", "de", "lv", "pl", "ar", "ja"]),
+        provide_custom_registry=st.booleans(),
+    )
+    def test_function_registry_preserved(
+        self, locale: str, provide_custom_registry: bool
+    ) -> None:
+        """function_registry property returns valid registry."""
+        if provide_custom_registry:
+            event("registry_type=custom")
+            custom_registry = create_default_registry()
+            bundle = FluentBundle(locale, functions=custom_registry)
+        else:
+            event("registry_type=shared")
+            bundle = FluentBundle(locale)
+
+        registry = bundle.function_registry
+        assert isinstance(registry, FunctionRegistry)
+        assert "NUMBER" in registry
+
+    # --- Comment handling (from test_bundle_100pct_final_coverage) ---
+
+    @given(
+        num_comments=st.integers(min_value=1, max_value=10),
+        comment_style=st.sampled_from(
+            ["single", "double", "triple"]
+        ),
+    )
+    def test_comments_handled_correctly(
+        self, num_comments: int, comment_style: str
+    ) -> None:
+        """Comment entries handled during resource registration."""
+        event(f"comment_count={num_comments}")
+        event(f"comment_style={comment_style}")
+
+        marker = {"single": "#", "double": "##", "triple": "###"}[
+            comment_style
+        ]
+        lines = [f"{marker} Comment {i}" for i in range(num_comments)]
+        lines.extend(["", "msg = Hello"])
+
+        bundle = FluentBundle("en_US")
+        junk = bundle.add_resource("\n".join(lines))
+        assert len(junk) == 0
+        assert bundle.has_message("msg")
+
+    @example(num_standalone=1)
+    @example(num_standalone=3)
+    @example(num_standalone=10)
+    @given(num_standalone=st.integers(min_value=1, max_value=20))
+    def test_comments_do_not_create_junk(
+        self, num_standalone: int
+    ) -> None:
+        """Comments are skipped without creating Junk entries."""
+        event(f"standalone_comments={num_standalone}")
+
+        lines = ["### Section Header"]
+        lines.extend(
+            f"# Comment line {i}" for i in range(num_standalone)
+        )
+        lines.extend(["", "message = Value", "## Trailing comment"])
+
+        bundle = FluentBundle("en_US")
+        junk = bundle.add_resource("\n".join(lines))
+        assert len(junk) == 0
+        assert bundle.has_message("message")
+
+    # --- Strict mode cache interaction ---
+    # (from test_bundle_100pct_final_coverage)
+
+    @given(
+        locale=st.sampled_from(["en", "de", "lv", "pl"]),
+        missing_var_name=st.text(
+            alphabet=st.characters(
+                min_codepoint=ord("a"), max_codepoint=ord("z")
+            ),
+            min_size=1,
+            max_size=20,
+        ),
+    )
+    def test_strict_mode_raises_on_cached_error(
+        self, locale: str, missing_var_name: str
+    ) -> None:
+        """Strict mode raises FormattingIntegrityError on cached errors."""
+        bundle = FluentBundle(
+            locale, strict=True, cache=CacheConfig()
+        )
+        bundle.add_resource(
+            f"msg = Hello {{ ${missing_var_name} }}"
+        )
+
+        with pytest.raises(FormattingIntegrityError) as exc1:
+            bundle.format_pattern("msg", {})
+
+        event("cache_hit_type=error")
+        assert exc1.value.message_id == "msg"
+        assert len(exc1.value.fluent_errors) == 1
+        assert (
+            exc1.value.fluent_errors[0].category
+            == ErrorCategory.REFERENCE
+        )
+
+        with pytest.raises(FormattingIntegrityError) as exc2:
+            bundle.format_pattern("msg", {})
+        assert exc2.value.message_id == "msg"
+
+    @given(
+        locale=st.sampled_from(["en_US", "de_DE", "lv_LV"]),
+        message_text=st.text(
+            alphabet=st.characters(
+                min_codepoint=ord("A"),
+                max_codepoint=ord("z"),
+                blacklist_categories=("Cc", "Cs"),
+            ),
+            min_size=1,
+            max_size=50,
+        ),
+    )
+    def test_strict_mode_cache_hit_without_errors(
+        self, locale: str, message_text: str
+    ) -> None:
+        """Strict mode cached success result returns normally."""
+        safe = "".join(
+            c for c in message_text if c.isprintable() and c not in "{}#"
+        ).strip()
+        if not safe:
+            safe = "Hello"
+
+        bundle = FluentBundle(
+            locale, strict=True, cache=CacheConfig()
+        )
+        bundle.add_resource(f"msg = {safe}")
+
+        r1, e1 = bundle.format_pattern("msg")
+        assert r1 == safe
+        assert e1 == ()
+
+        event("cache_hit_type=success")
+
+        r2, e2 = bundle.format_pattern("msg")
+        assert r2 == safe
+        assert e2 == ()
+
+    # --- Configuration preservation properties ---
+    # (from test_bundle_complete_final_coverage, events added)
+
+    @given(
+        st.text(
+            alphabet=st.sampled_from(["a", "b", "c", "_", "-"]),
+            min_size=1,
+            max_size=50,
+        )
+    )
+    def test_valid_locale_accepted(self, locale: str) -> None:
+        """Valid locale formats are accepted by FluentBundle."""
+        if not locale or not locale[0].isalnum():
+            event("outcome=filtered")
+            return
+
+        try:
+            bundle = FluentBundle(locale)
+            event("outcome=accepted")
+            assert bundle.locale == normalize_locale(locale)
+        except ValueError:
+            event("outcome=rejected")
+
+    @given(st.booleans())
+    def test_use_isolating_preserved(
+        self, use_isolating: bool
+    ) -> None:
+        """use_isolating configuration is preserved."""
+        kind = "isolating" if use_isolating else "non_isolating"
+        event(f"outcome={kind}")
+        bundle = FluentBundle("en", use_isolating=use_isolating)
+        assert bundle.use_isolating == use_isolating
+
+    @given(st.booleans())
+    def test_strict_mode_preserved(self, strict: bool) -> None:
+        """strict mode configuration is preserved."""
+        kind = "strict" if strict else "lenient"
+        event(f"outcome={kind}")
+        bundle = FluentBundle("en", strict=strict)
+        assert bundle.strict == strict
+
+    @given(st.integers(min_value=1, max_value=10000))
+    def test_cache_config_size_preserved(self, cache_size: int) -> None:
+        """cache_config.size is preserved from CacheConfig constructor."""
+        if cache_size < 100:
+            event("boundary=small")
+        elif cache_size < 5000:
+            event("boundary=medium")
+        else:
+            event("boundary=large")
+        bundle = FluentBundle("en", cache=CacheConfig(size=cache_size))
+        assert bundle.cache_config is not None
+        assert bundle.cache_config.size == cache_size
+
+    # --- Validation properties (from test_bundle_coverage, events added) ---
+
+    @given(
+        term_name=st.from_regex(
+            r"[a-z][a-z0-9-]{0,10}", fullmatch=True
+        )
+    )
+    def test_duplicate_term_generates_warning(
+        self, term_name: str
+    ) -> None:
+        """Duplicate term IDs always generate warnings."""
+        event("outcome=duplicate_warned")
+        bundle = FluentBundle("en_US", use_isolating=False)
+        ftl = f"-{term_name} = First\n-{term_name} = Second\n"
+        result = bundle.validate_resource(ftl)
+        assert any(
+            "Duplicate term ID" in w.message for w in result.warnings
+        )
+
+    @given(
+        term_a=st.from_regex(
+            r"[a-z][a-z0-9-]{0,10}", fullmatch=True
+        ),
+        term_b=st.from_regex(
+            r"[a-z][a-z0-9-]{0,10}", fullmatch=True
+        ),
+    )
+    def test_undefined_term_ref_generates_warning(
+        self, term_a: str, term_b: str
+    ) -> None:
+        """Undefined term references always generate warnings."""
+        assume(term_a != term_b)
+        event("outcome=undefined_warned")
+        bundle = FluentBundle("en_US", use_isolating=False)
+        ftl = f"-{term_a} = {{ -{term_b} }}"
+        result = bundle.validate_resource(ftl)
+        assert any(
+            f"undefined term '-{term_b}'" in w.message
+            for w in result.warnings
+        )
+
+
+# ============================================================================
+# LOCALE VALIDATION AND BUNDLE INTEGRATION COVERAGE
+# ============================================================================
+
+
+class TestLocaleValidationAsciiOnly:
+    """Locale codes must be ASCII alphanumeric with underscore or hyphen separators."""
+
+    def test_valid_ascii_locales_accepted(self) -> None:
+        """Valid ASCII locale codes are accepted without error."""
+        valid_locales = [
+            "en",
+            "en_US",
+            "en-US",
+            "de_DE",
+            "lv_LV",
+            "zh_Hans_CN",
+            "pt_BR",
+            "ja_JP",
+            "ar_EG",
+        ]
+        for locale in valid_locales:
+            bundle = FluentBundle(locale)
+            assert bundle.locale == normalize_locale(locale)
+
+    def test_unicode_locale_rejected(self) -> None:
+        """Locale codes with non-ASCII characters raise ValueError."""
+        invalid_locales = [
+            "\xe9_FR",
+            "\u65e5\u672c\u8a9e",
+            "en_\xfc",
+            "\xe4\xf6\xfc",
+        ]
+        for locale in invalid_locales:
+            with pytest.raises(ValueError, match="must be ASCII alphanumeric"):
+                FluentBundle(locale)
+
+    def test_empty_locale_rejected(self) -> None:
+        """Empty locale code raises ValueError."""
+        with pytest.raises(ValueError, match="locale cannot be blank"):
+            FluentBundle("")
+
+    def test_invalid_format_rejected(self) -> None:
+        """Invalid locale code formats raise ValueError."""
+        invalid_formats = [
+            "_en",
+            "en_",
+            "en__US",
+            "en US",
+            "en.US",
+            "en@US",
+        ]
+        for locale in invalid_formats:
+            with pytest.raises(ValueError, match=r"Invalid locale:"):
+                FluentBundle(locale)
+
+    @given(
+        st.builds(
+            lambda first, rest: first + rest,
+            first=st.text(
+                alphabet="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ",
+                min_size=1,
+                max_size=1,
+            ),
+            rest=st.text(
+                alphabet="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
+                min_size=0,
+                max_size=9,
+            ),
+        )
+    )
+    def test_ascii_alphanumeric_input_is_canonicalized_or_rejected(self, locale: str) -> None:
+        """PROPERTY: ASCII locale-like input either canonicalizes or fails explicitly."""
+        event(f"locale_len={len(locale)}")
+        try:
+            bundle = FluentBundle(locale)
+        except ValueError:
+            with pytest.raises(ValueError, match=r"Unknown locale identifier|Invalid locale format"):
+                FluentBundle(locale)
+            event("outcome=rejected")
+        else:
+            assert bundle.locale == normalize_locale(locale)
+            event("outcome=accepted")
+
+
+class TestBundleOverwriteWarning:
+    """Overwriting an existing message or term in add_resource logs a WARNING."""
+
+    def test_message_overwrite_logs_warning(self, caplog: pytest.LogCaptureFixture) -> None:
+        """Overwriting a message logs a warning with the message ID."""
+        bundle = FluentBundle("en")
+
+        with caplog.at_level(logging.WARNING):
+            bundle.add_resource("greeting = Hello")
+            bundle.add_resource("greeting = Goodbye")
+
+        warning_messages = [
+            record.message for record in caplog.records
+            if record.levelno == logging.WARNING
+        ]
+        assert any("Overwriting existing message 'greeting'" in msg for msg in warning_messages)
+
+    def test_term_overwrite_logs_warning(self, caplog: pytest.LogCaptureFixture) -> None:
+        """Overwriting a term logs a warning with the term ID."""
+        bundle = FluentBundle("en")
+
+        with caplog.at_level(logging.WARNING):
+            bundle.add_resource("-brand = Acme")
+            bundle.add_resource("-brand = NewCorp")
+
+        warning_messages = [
+            record.message for record in caplog.records
+            if record.levelno == logging.WARNING
+        ]
+        assert any("Overwriting existing term '-brand'" in msg for msg in warning_messages)
+
+    def test_no_warning_for_new_entries(self, caplog: pytest.LogCaptureFixture) -> None:
+        """No overwrite warning when adding distinct entries."""
+        bundle = FluentBundle("en")
+
+        with caplog.at_level(logging.WARNING):
+            bundle.add_resource("greeting = Hello")
+            bundle.add_resource("farewell = Goodbye")
+
+        overwrite_warnings = [
+            record.message for record in caplog.records
+            if record.levelno == logging.WARNING and "Overwriting" in record.message
+        ]
+        assert len(overwrite_warnings) == 0
+
+    def test_last_write_wins_behavior_preserved(self) -> None:
+        """Last Write Wins behavior: last added resource wins on repeated key."""
+        bundle = FluentBundle("en")
+        bundle.add_resource("greeting = First")
+        bundle.add_resource("greeting = Second")
+        bundle.add_resource("greeting = Third")
+
+        result, _ = bundle.format_pattern("greeting")
+        assert result == "Third"
+
+
+class TestBundleIntegration:
+    """Integration tests via FluentBundle for multi-module coverage."""
+
+    def test_variant_key_failed_number_parse(self) -> None:
+        """Number-like variant key that fails parse falls through to identifier."""
+        bundle = FluentBundle("en_US", strict=False)
+        bundle.add_resource(
+            "msg = { $val ->\n"
+            " [-.test] Match\n"
+            " *[other] Other\n"
+            "}\n"
+        )
+        result, _ = bundle.format_pattern(
+            "msg", {"val": "-.test"}
+        )
+        assert result is not None
+
+    def test_identifier_as_function_argument(self) -> None:
+        """Identifier becomes MessageReference in function call arguments."""
+        bundle = FluentBundle("en_US")
+
+        def test_func(val: str | int) -> str:
+            return str(val)
+
+        bundle.add_function("TEST", test_func)
+        bundle.add_resource("ref = value")
+        bundle.add_resource("msg = { TEST(ref) }")
+        result, errors = bundle.format_pattern("msg")
+        assert not errors
+        assert result is not None
+
+    def test_comment_with_crlf_ending(self) -> None:
+        """Comment with CRLF line ending is parsed correctly."""
+        bundle = FluentBundle("en_US")
+        bundle.add_resource("# Comment\r\nmsg = value")
+        result, errors = bundle.format_pattern("msg")
+        assert not errors
+        assert "value" in result
+
+    def test_full_coverage_integration(self) -> None:
+        """Integration test exercising parser, resolver, and validator together."""
+        bundle = FluentBundle("en_US")
+        bundle.add_resource(
+            "# Comment\n"
+            "msg1 = { $val }\n"
+            "msg2 = { NUMBER($val) }\n"
+            "msg3 = { -term }\n"
+            "msg4 = { other.attr }\n"
+            "sel = { 42 ->\n"
+            " [42] Match\n"
+            " *[other] Other\n"
+            "}\n"
+            "-brand = Firefox\n"
+            " .version = 1.0\n"
+            "empty =\n"
+            " .attr = Value\n"
+        )
+        r1, _ = bundle.format_pattern("msg1", {"val": "t"})
+        r2, _ = bundle.format_pattern("msg2", {"val": 42})
+        r3, _ = bundle.format_pattern("sel")
+        assert all(r is not None for r in [r1, r2, r3])
+
+        validation = validate_resource(
+            "msg = { $val }\n-term = Firefox\n"
+        )
+        assert validation is not None
+
+
+class TestBundleLocaleValidationBeforeLoading:
+    """Locale validation happens before any resource loading attempt."""
+
+    def test_locale_validation_before_resource_loading(self) -> None:
+        """Invalid locale raises ValueError immediately, before resource loading."""
+        with pytest.raises(ValueError, match="must be ASCII alphanumeric"):
+            FluentBundle("\xe9_FR")
+
+
+# ============================================================================
+# TestAddResourceStream
+# ============================================================================
+
+
+class TestAddResourceStream:
+    """FluentBundle.add_resource_stream incremental resource loading."""
+
+    def test_single_message_from_lines(self) -> None:
+        """add_resource_stream loads a single message from a line list."""
+        bundle = FluentBundle("en")
+        bundle.add_resource_stream(["greeting = Hello\n"])
+        assert bundle.has_message("greeting")
+
+    def test_multiple_messages_blank_separated(self) -> None:
+        """Multiple messages separated by blank lines are all registered."""
+        bundle = FluentBundle("en")
+        bundle.add_resource_stream(["msg1 = One\n", "\n", "msg2 = Two\n"])
+        assert bundle.has_message("msg1")
+        assert bundle.has_message("msg2")
+
+    def test_empty_stream_registers_nothing(self) -> None:
+        """Empty line iterable registers no messages."""
+        bundle = FluentBundle("en")
+        bundle.add_resource_stream([])
+        assert not bundle.has_message("anything")
+
+    def test_returns_empty_junk_tuple_on_clean_source(self) -> None:
+        """Clean FTL stream returns empty junk tuple."""
+        bundle = FluentBundle("en")
+        junk = bundle.add_resource_stream(["msg = Value\n"])
+        assert junk == ()
+
+    def test_returns_junk_on_parse_error(self) -> None:
+        """Junk entries from invalid FTL are returned (not raised) in non-strict mode."""
+        bundle = FluentBundle("en", strict=False)
+        junk = bundle.add_resource_stream([" invalid = indented\n"])
+        assert len(junk) >= 1
+
+    def test_strict_mode_raises_on_junk(self) -> None:
+        """Strict mode raises SyntaxIntegrityError when the stream contains junk."""
+        bundle = FluentBundle("en", strict=True)
+        with pytest.raises(SyntaxIntegrityError):
+            bundle.add_resource_stream([" invalid = indented\n"])
+
+    def test_source_path_threads_through(self) -> None:
+        """source_path kwarg is accepted without error."""
+        bundle = FluentBundle("en")
+        bundle.add_resource_stream(
+            ["greeting = Hello\n"], source_path="locales/en/ui.ftl"
+        )
+        assert bundle.has_message("greeting")
+
+    def test_format_works_after_stream_load(self) -> None:
+        """Messages loaded via add_resource_stream are formattable."""
+        bundle = FluentBundle("en")
+        bundle.add_resource_stream(["greeting = Hello, { $name }!\n"])
+        result, errors = bundle.format_pattern("greeting", {"name": "World"})
+        assert errors == ()
+        assert result == "Hello, \u2068World\u2069!"
+
+    def test_generator_input_accepted(self) -> None:
+        """Generator (not just list) is accepted as lines argument."""
+        bundle = FluentBundle("en")
+
+        def gen() -> object:
+            yield "msg = From generator\n"
+
+        bundle.add_resource_stream(gen())  # type: ignore[arg-type]
+        assert bundle.has_message("msg")
+
+    def test_equivalence_with_add_resource(self) -> None:
+        """add_resource_stream produces same messages as add_resource for same content."""
+        source = "msg1 = One\n\nmsg2 = Two\n"
+        b1 = FluentBundle("en")
+        b1.add_resource(source)
+        b2 = FluentBundle("en")
+        b2.add_resource_stream(source.splitlines(keepends=True))
+        assert b1.has_message("msg1") == b2.has_message("msg1")
+        assert b1.has_message("msg2") == b2.has_message("msg2")
+        r1, _ = b1.format_pattern("msg1")
+        r2, _ = b2.format_pattern("msg1")
+        assert r1 == r2
+
+    @given(
+        names=st.lists(
+            st.text(
+                min_size=1,
+                max_size=20,
+                alphabet=st.characters(
+                    min_codepoint=ord("a"),
+                    max_codepoint=ord("z"),
+                ),
+            ),
+            min_size=1,
+            max_size=10,
+        )
+    )
+    def test_all_messages_reachable_after_stream_load(
+        self, names: list[str]
+    ) -> None:
+        """All messages loaded via stream are reachable via has_message."""
+        event(f"msg_count={len(names)}")
+        unique_names = list(dict.fromkeys(names))
+        source = "\n\n".join(f"{name} = Value" for name in unique_names) + "\n"
+        bundle = FluentBundle("en")
+        bundle.add_resource_stream(source.splitlines(keepends=True))
+        for name in unique_names:
+            assert bundle.has_message(name), f"Missing: {name}"
diff --git a/tests/runtime_bundle_cases/state.py b/tests/runtime_bundle_cases/state.py
new file mode 100644
index 00000000..4b7e9db9
--- /dev/null
+++ b/tests/runtime_bundle_cases/state.py
@@ -0,0 +1,772 @@
+# mypy: ignore-errors
+from tests.runtime_bundle_cases import (
+    MAX_LOCALE_LENGTH_HARD_LIMIT,
+    MAX_SOURCE_SIZE,
+    Any,
+    CacheConfig,
+    FluentBundle,
+    FormattingIntegrityError,
+    SyntaxIntegrityError,
+    ValidationError,
+    logging,
+    patch,
+    pytest,
+)
+
+
+class TestBundlePropertyAccessors:
+    """Test all property accessors for complete coverage."""
+
+    def test_locale_property_returns_configured_locale(self) -> None:
+        """locale property returns the canonical locale code."""
+        bundle = FluentBundle("lv_LV")
+        assert bundle.locale == "lv_lv"
+
+        bundle_ar = FluentBundle("ar_EG")
+        assert bundle_ar.locale == "ar_eg"
+
+    def test_use_isolating_property_true(self) -> None:
+        """use_isolating property returns True when enabled."""
+        bundle = FluentBundle("en", use_isolating=True)
+        assert bundle.use_isolating is True
+
+    def test_use_isolating_property_false(self) -> None:
+        """use_isolating property returns False when disabled."""
+        bundle = FluentBundle("en", use_isolating=False)
+        assert bundle.use_isolating is False
+
+    def test_strict_property_returns_configured_value(self) -> None:
+        """strict property returns the strict mode boolean."""
+        assert FluentBundle("en", strict=True).strict is True
+        assert FluentBundle("en", strict=False).strict is False
+        assert FluentBundle("en").strict is True
+
+    def test_cache_enabled_property(self) -> None:
+        """cache_enabled property reflects configuration."""
+        assert FluentBundle("en", cache=CacheConfig()).cache_enabled is True
+        assert FluentBundle("en").cache_enabled is False
+
+    def test_cache_config_size_property(self) -> None:
+        """cache_config.size returns configured maximum."""
+        bundle = FluentBundle("en", cache=CacheConfig(size=500))
+        assert bundle.cache_config is not None
+        assert bundle.cache_config.size == 500
+
+    def test_cache_usage_property_tracks_entries(self) -> None:
+        """cache_usage property tracks current cached entries."""
+        bundle = FluentBundle("en", cache=CacheConfig())
+        bundle.add_resource("msg1 = Hello\nmsg2 = World")
+
+        assert bundle.cache_usage == 0
+        bundle.format_pattern("msg1")
+        assert bundle.cache_usage == 1
+        bundle.format_pattern("msg2")
+        assert bundle.cache_usage == 2
+
+    def test_cache_usage_returns_zero_when_disabled(self) -> None:
+        """cache_usage returns 0 when caching is disabled."""
+        bundle = FluentBundle("en")
+        bundle.add_resource("msg = Hello")
+        bundle.format_pattern("msg")
+        assert bundle.cache_usage == 0
+
+    def test_cache_write_once_config(self) -> None:
+        """cache_config.write_once reflects configured boolean."""
+        on = FluentBundle("en", cache=CacheConfig(write_once=True))
+        assert on.cache_config is not None
+        assert on.cache_config.write_once is True
+        off = FluentBundle("en", cache=CacheConfig(write_once=False))
+        assert off.cache_config is not None
+        assert off.cache_config.write_once is False
+
+    def test_cache_enable_audit_config(self) -> None:
+        """cache_config.enable_audit reflects configured boolean."""
+        on = FluentBundle("en", cache=CacheConfig(enable_audit=True))
+        assert on.cache_config is not None
+        assert on.cache_config.enable_audit is True
+        off = FluentBundle("en", cache=CacheConfig(enable_audit=False))
+        assert off.cache_config is not None
+        assert off.cache_config.enable_audit is False
+
+    def test_cache_max_audit_entries_config(self) -> None:
+        """cache_config.max_audit_entries reflects configured maximum."""
+        bundle = FluentBundle(
+            "en", cache=CacheConfig(max_audit_entries=5000)
+        )
+        assert bundle.cache_config is not None
+        assert bundle.cache_config.max_audit_entries == 5000
+
+    def test_cache_max_entry_weight_config(self) -> None:
+        """cache_config.max_entry_weight reflects configured maximum."""
+        bundle = FluentBundle(
+            "en", cache=CacheConfig(max_entry_weight=8000)
+        )
+        assert bundle.cache_config is not None
+        assert bundle.cache_config.max_entry_weight == 8000
+
+    def test_cache_max_errors_per_entry_config(self) -> None:
+        """cache_config.max_errors_per_entry reflects configured maximum."""
+        bundle = FluentBundle(
+            "en", cache=CacheConfig(max_errors_per_entry=25)
+        )
+        assert bundle.cache_config is not None
+        assert bundle.cache_config.max_errors_per_entry == 25
+
+    def test_max_source_size_property(self) -> None:
+        """max_source_size property returns configured or default value."""
+        assert FluentBundle("en", max_source_size=500_000).max_source_size == 500_000
+        assert FluentBundle("en").max_source_size == MAX_SOURCE_SIZE
+
+    def test_max_nesting_depth_property(self) -> None:
+        """max_nesting_depth property returns configured or default value."""
+        assert FluentBundle("en", max_nesting_depth=50).max_nesting_depth == 50
+        assert FluentBundle("en").max_nesting_depth == 100
+
+
+# =============================================================================
+# Locale Validation
+# =============================================================================
+
+
+class TestBundleLocaleValidation:
+    """Test locale code validation in __init__."""
+
+    def test_rejects_invalid_characters(self) -> None:
+        """Locale with special characters raises ValueError."""
+        with pytest.raises(ValueError, match=r"Invalid locale: 'en@invalid'"):
+            FluentBundle("en@invalid")
+
+    def test_rejects_spaces(self) -> None:
+        """Locale with spaces raises ValueError."""
+        with pytest.raises(ValueError, match=r"Invalid locale: 'en US'"):
+            FluentBundle("en US")
+
+    def test_rejects_non_ascii(self) -> None:
+        """Locale with non-ASCII characters raises ValueError."""
+        with pytest.raises(ValueError, match=r"Invalid locale: 'ën_FR'"):
+            FluentBundle("\u00ebn_FR")
+
+    def test_accepts_hyphen_separator(self) -> None:
+        """Locale with hyphen separator accepted."""
+        assert FluentBundle("en-US").locale == "en_us"
+
+    def test_accepts_underscore_separator(self) -> None:
+        """Locale with underscore separator accepted."""
+        assert FluentBundle("en_US").locale == "en_us"
+
+    def test_exceeding_max_length_rejected(self) -> None:
+        """Locale exceeding MAX_LOCALE_LENGTH_HARD_LIMIT raises ValueError."""
+        long_locale = "a" * (MAX_LOCALE_LENGTH_HARD_LIMIT + 1)
+        with pytest.raises(ValueError, match="locale exceeds maximum length"):
+            FluentBundle(long_locale)
+
+    def test_exceeding_max_length_shows_truncated(self) -> None:
+        """Error message includes truncated locale and actual length."""
+        long_locale = "X" * (MAX_LOCALE_LENGTH_HARD_LIMIT + 100)
+        with pytest.raises(
+            ValueError, match="locale exceeds maximum length"
+        ) as exc_info:
+            FluentBundle(long_locale)
+        error_msg = str(exc_info.value)
+        assert long_locale[:50] in error_msg
+        assert str(len(long_locale)) in error_msg
+
+
+# =============================================================================
+# Special Methods (__repr__)
+# =============================================================================
+
+
+class TestBundleSpecialMethods:
+    """Test __repr__ for complete coverage."""
+
+    def test_repr_shows_locale_and_counts(self) -> None:
+        """__repr__ returns string with locale and message/term counts."""
+        bundle = FluentBundle("lv_LV")
+        repr_str = repr(bundle)
+        assert "FluentBundle" in repr_str
+        assert "lv_lv" in repr_str
+        assert "messages=0" in repr_str
+        assert "terms=0" in repr_str
+
+    def test_repr_reflects_counts_after_adding_resources(self) -> None:
+        """__repr__ shows accurate counts after adding resources."""
+        bundle = FluentBundle("en")
+        bundle.add_resource("msg1 = Hello\nmsg2 = World\n-brand = Firefox")
+        repr_str = repr(bundle)
+        assert "messages=2" in repr_str
+        assert "terms=1" in repr_str
+
+
+# =============================================================================
+# for_system_locale Factory Method
+# =============================================================================
+
+
+class TestBundleForSystemLocale:
+    """Test for_system_locale classmethod."""
+
+    def test_creates_bundle_with_detected_locale(self) -> None:
+        """for_system_locale creates bundle with system locale."""
+        with patch(
+            "ftllexengine.runtime.bundle_lifecycle.get_system_locale",
+            return_value="en_US",
+        ):
+            bundle = FluentBundle.for_system_locale()
+            assert bundle.locale == "en_us"
+
+    def test_passes_configuration_parameters(self) -> None:
+        """for_system_locale passes all configuration parameters."""
+        with patch(
+            "ftllexengine.runtime.bundle_lifecycle.get_system_locale",
+            return_value="de_DE",
+        ):
+            bundle = FluentBundle.for_system_locale(
+                use_isolating=False,
+                cache=CacheConfig(size=2000),
+                strict=True,
+                max_source_size=500_000,
+            )
+            assert bundle.locale == "de_de"
+            assert bundle.use_isolating is False
+            assert bundle.cache_enabled is True
+            assert bundle.cache_config is not None
+            assert bundle.cache_config.size == 2000
+            assert bundle.strict is True
+            assert bundle.max_source_size == 500_000
+
+    def test_raises_when_locale_unavailable(self) -> None:
+        """for_system_locale raises RuntimeError when locale unavailable."""
+        with patch(
+            "ftllexengine.runtime.bundle_lifecycle.get_system_locale",
+            side_effect=RuntimeError("Cannot determine system locale"),
+        ), pytest.raises(RuntimeError, match="Cannot determine"):
+            FluentBundle.for_system_locale()
+
+    def test_falls_back_to_env_vars_when_getlocale_fails(self) -> None:
+        """for_system_locale uses env vars when getlocale() returns None."""
+        with patch("locale.getlocale", return_value=(None, None)), patch.dict(
+            "os.environ", {"LC_ALL": "de_DE"}, clear=False
+        ):
+            bundle = FluentBundle.for_system_locale()
+            assert bundle.locale == "de_de"
+
+    def test_tries_lc_messages_when_lc_all_missing(self) -> None:
+        """for_system_locale tries LC_MESSAGES when LC_ALL not set."""
+        with patch("locale.getlocale", return_value=(None, None)), patch.dict(
+            "os.environ", {"LC_MESSAGES": "fr_FR"}, clear=True
+        ):
+            bundle = FluentBundle.for_system_locale()
+            assert bundle.locale == "fr_fr"
+
+    def test_tries_lang_when_others_missing(self) -> None:
+        """for_system_locale tries LANG as final fallback."""
+        with patch("locale.getlocale", return_value=(None, None)), patch.dict(
+            "os.environ", {"LANG": "es_ES"}, clear=True
+        ):
+            bundle = FluentBundle.for_system_locale()
+            assert bundle.locale == "es_es"
+
+    def test_raises_when_no_locale_found(self) -> None:
+ """for_system_locale raises RuntimeError with no locale.""" + with ( + patch("locale.getlocale", return_value=(None, None)), + patch.dict("os.environ", {}, clear=True), + pytest.raises( + RuntimeError, match="Could not determine system locale" + ), + ): + FluentBundle.for_system_locale() + + def test_normalizes_posix_format(self) -> None: + """for_system_locale strips encoding suffix and normalizes.""" + with patch("locale.getlocale", return_value=("en_US.UTF-8", None)): + bundle = FluentBundle.for_system_locale() + assert bundle.locale == "en_us" + assert "UTF-8" not in bundle.locale + + def test_handles_locale_without_encoding(self) -> None: + """for_system_locale handles locale without encoding suffix.""" + with patch("locale.getlocale", return_value=("pl_PL", None)): + bundle = FluentBundle.for_system_locale() + assert bundle.locale == "pl_pl" + + +# ============================================================================= +# Resource Management (add_resource, comments, terms) +# ============================================================================= + + +class TestBundleResourceManagement: + """Test add_resource edge cases, comment handling, term attributes.""" + + def test_add_resource_with_comments(self) -> None: + """Comments are parsed but not registered as messages.""" + bundle = FluentBundle("en") + ftl_source = ( + "# Standalone comment\nmsg1 = Hello\n\n" + "## Section comment\nmsg2 = World\n\n" + "### Resource comment\n-term = Value\n" + ) + junk = bundle.add_resource(ftl_source) + assert len(junk) == 0 + assert bundle.has_message("msg1") + assert bundle.has_message("msg2") + assert len(bundle.get_message_ids()) == 2 + + def test_standalone_comment_only_resource(self) -> None: + """Resource containing only comments is valid.""" + bundle = FluentBundle("en") + junk = bundle.add_resource( + "# Comment\n## Section\n### Resource\n" + ) + assert len(junk) == 0 + assert len(bundle.get_message_ids()) == 0 + + def test_consecutive_comments(self) -> 
None: + """Multiple consecutive comments hit Comment->loop branch.""" + bundle = FluentBundle("en") + ftl = "## Section 1\n## Section 2\n### Resource\nmsg = Value\n" + junk = bundle.add_resource(ftl) + assert len(junk) == 0 + assert bundle.has_message("msg") + + def test_message_without_value_only_attributes(self) -> None: + """Message with no value, only attributes, is registered.""" + bundle = FluentBundle("en_US", use_isolating=False) + bundle.add_resource("msg =\n .attr1 = Value 1\n .attr2 = Value 2\n") + assert bundle.has_message("msg") + + def test_term_with_multiple_attributes(self) -> None: + """Term with attributes is registered successfully.""" + bundle = FluentBundle("en_US", use_isolating=False) + bundle.add_resource( + "-brand = Firefox\n .gender = masculine\n" + " .case = nominative\n" + ) + assert bundle is not None + + def test_add_resource_clears_cache(self) -> None: + """add_resource clears cache when enabled.""" + bundle = FluentBundle("en", cache=CacheConfig()) + bundle.add_resource("first = First") + bundle.format_pattern("first") + assert bundle.get_cache_stats()["size"] > 0 # type: ignore[index] + bundle.add_resource("second = Second") + assert bundle.get_cache_stats()["size"] == 0 # type: ignore[index] + + def test_duplicate_terms_overwrite(self, caplog: Any) -> None: + """Duplicate term definitions produce overwrite warning.""" + bundle = FluentBundle("en") + bundle.add_resource("-brand = Firefox\n-brand = Chrome\n") + assert any( + "Overwriting existing term '-brand'" in r.message + for r in caplog.records + ) + + def test_multiple_duplicate_terms(self, caplog: Any) -> None: + """Multiple duplicate terms each produce warnings.""" + bundle = FluentBundle("en") + bundle.add_resource( + "-brand = First\n-version = First\n" + "-brand = Second\n-version = Second\n" + ) + warnings = [ + r for r in caplog.records + if "Overwriting existing term" in r.message + ] + assert len(warnings) == 2 + + def test_comments_with_debug_logging(self, caplog: 
Any) -> None: + """Comments are processed at debug level without errors.""" + caplog.set_level(logging.DEBUG) + bundle = FluentBundle("en") + ftl = ( + "# Comment before term\n" + "-brand = Firefox\n" + ) + junk = bundle.add_resource(ftl) + assert len(junk) == 0 + + +# ============================================================================= +# Type Validation (add_resource, validate_resource, format_pattern) +# ============================================================================= + + +class TestBundleTypeValidation: + """Test type validation at API boundaries.""" + + def test_add_resource_rejects_bytes(self) -> None: + """add_resource raises TypeError for bytes with decode suggestion.""" + bundle = FluentBundle("en") + with pytest.raises(TypeError, match=r"source must be str, not bytes"): + bundle.add_resource(b"msg = Hello") # type: ignore[arg-type] + with pytest.raises(TypeError, match=r"source.decode\('utf-8'\)"): + bundle.add_resource(b"msg = Hello") # type: ignore[arg-type] + + def test_add_resource_rejects_int(self) -> None: + """add_resource raises TypeError for non-string types.""" + bundle = FluentBundle("en") + with pytest.raises(TypeError, match=r"source must be str"): + bundle.add_resource(42) # type: ignore[arg-type] + + def test_validate_resource_rejects_bytes(self) -> None: + """validate_resource raises TypeError for bytes.""" + bundle = FluentBundle("en") + with pytest.raises(TypeError, match=r"source must be str, not bytes"): + bundle.validate_resource(b"msg = Hello") # type: ignore[arg-type] + + def test_format_pattern_empty_message_id(self) -> None: + """format_pattern with empty message ID returns fallback.""" + bundle = FluentBundle("en", strict=False) + result, errors = bundle.format_pattern("") + assert result == "{???}" + assert len(errors) == 1 + + def test_format_pattern_invalid_args_type(self) -> None: + """format_pattern with non-Mapping args returns fallback.""" + bundle = FluentBundle("en", strict=False) + 
bundle.add_resource("msg = Hello") + result, errors = bundle.format_pattern("msg", []) # type: ignore[arg-type] + assert result == "{???}" + assert len(errors) == 1 + + def test_format_pattern_invalid_attribute_type(self) -> None: + """format_pattern with non-string attribute returns fallback.""" + bundle = FluentBundle("en", strict=False) + bundle.add_resource("msg = Hello") + result, errors = bundle.format_pattern( + "msg", {}, attribute=123 # type: ignore[arg-type] + ) + assert result == "{???}" + assert len(errors) == 1 + + def test_strict_mode_raises_on_empty_message_id(self) -> None: + """format_pattern in strict mode raises on empty message ID.""" + bundle = FluentBundle("en", strict=True) + with pytest.raises(FormattingIntegrityError): + bundle.format_pattern("") + + def test_strict_mode_raises_on_invalid_args_type(self) -> None: + """format_pattern in strict mode raises on invalid args type.""" + bundle = FluentBundle("en", strict=True) + bundle.add_resource("msg = Hello") + with pytest.raises(FormattingIntegrityError): + bundle.format_pattern("msg", []) # type: ignore[arg-type] + + def test_strict_mode_raises_on_invalid_attribute_type(self) -> None: + """format_pattern in strict mode raises on invalid attribute type.""" + bundle = FluentBundle("en", strict=True) + bundle.add_resource("msg = Hello") + with pytest.raises(FormattingIntegrityError): + bundle.format_pattern( + "msg", {}, attribute=123 # type: ignore[arg-type] + ) + + +# ============================================================================= +# Strict Mode (syntax errors, formatting errors, caching) +# ============================================================================= + + +class TestBundleStrictMode: + """Test strict mode syntax and formatting error handling.""" + + def test_raises_syntax_integrity_error_on_junk(self) -> None: + """Strict mode raises SyntaxIntegrityError for junk entries.""" + bundle = FluentBundle("en", strict=True) + with pytest.raises( + 
SyntaxIntegrityError, match=r"Strict mode: .* syntax error" + ): + bundle.add_resource("msg = \n!!invalid!!") + + def test_error_includes_source_path(self) -> None: + """Strict mode error includes source_path when provided.""" + bundle = FluentBundle("en", strict=True) + with pytest.raises( + SyntaxIntegrityError, match=r"locales/en/messages.ftl" + ) as exc_info: + bundle.add_resource( + "msg = \n!!invalid!!", + source_path="locales/en/messages.ftl", + ) + assert exc_info.value.source_path == "locales/en/messages.ftl" + + def test_error_truncates_long_summary(self) -> None: + """Strict mode truncates to first 3 junk entries.""" + bundle = FluentBundle("en", strict=True) + invalid_ftl = ( + "msg1 =\n!!e1!!\nmsg2 =\n!!e2!!\n" + "msg3 =\n!!e3!!\nmsg4 =\n!!e4!!\n" + ) + with pytest.raises( + SyntaxIntegrityError, match=r"and \d+ more" + ): + bundle.add_resource(invalid_ftl) + + def test_does_not_mutate_bundle_on_error(self) -> None: + """Strict mode does not partially populate bundle on syntax error.""" + bundle = FluentBundle("en", strict=True) + bundle.add_resource("msg1 = Hello") + assert len(bundle.get_message_ids()) == 1 + + with pytest.raises(SyntaxIntegrityError): + bundle.add_resource("msg2 = World\n!!invalid!!") + assert len(bundle.get_message_ids()) == 1 + + def test_formatting_integrity_error_on_missing_var(self) -> None: + """Strict mode raises FormattingIntegrityError for missing vars.""" + bundle = FluentBundle("en", strict=True) + bundle.add_resource("msg = Hello { $name }") + with pytest.raises(FormattingIntegrityError, match=r"Strict mode"): + bundle.format_pattern("msg", {}) + + def test_formatting_error_includes_message_id(self) -> None: + """Strict mode formatting error includes message ID.""" + bundle = FluentBundle("en", strict=True) + bundle.add_resource("greeting = Hello { $name }") + with pytest.raises( + FormattingIntegrityError, match=r"greeting" + ) as exc_info: + bundle.format_pattern("greeting", {}) + assert exc_info.value.message_id == 
"greeting" + + def test_formatting_error_truncates_multiple_errors(self) -> None: + """Strict mode error truncates to first 3 formatting errors.""" + bundle = FluentBundle("en", strict=True) + bundle.add_resource("msg = { $a } { $b } { $c } { $d }") + with pytest.raises(FormattingIntegrityError, match=r"and \d+ more"): + bundle.format_pattern("msg", {}) + + +# ============================================================================= +# Validation (circular refs, undefined refs, duplicates, syntax errors) +# ============================================================================= + + +class TestBundleValidation: + """Test validate_resource warning and error detection.""" + + def test_detects_circular_message_refs(self) -> None: + """Circular message references generate warnings.""" + bundle = FluentBundle("en") + result = bundle.validate_resource( + "msg1 = { msg2 }\nmsg2 = { msg1 }\n" + ) + assert any( + "Circular message reference" in w.message + for w in result.warnings + ) + + def test_detects_self_referencing_message(self) -> None: + """Message referencing itself detected as circular.""" + bundle = FluentBundle("en") + result = bundle.validate_resource("msg = { msg }\n") + assert len(result.warnings) > 0 + + def test_detects_circular_term_refs(self) -> None: + """Circular term references generate warnings.""" + bundle = FluentBundle("en") + result = bundle.validate_resource( + "-term1 = { -term2 }\n-term2 = { -term1 }\n" + ) + assert any( + "Circular term reference" in w.message + for w in result.warnings + ) + + def test_detects_self_referencing_term(self) -> None: + """Term referencing itself detected as circular.""" + bundle = FluentBundle("en") + result = bundle.validate_resource("-term = { -term }\n") + assert len(result.warnings) > 0 + + def test_detects_term_attribute_circular_ref(self) -> None: + """Circular reference in term attribute detected.""" + bundle = FluentBundle("en") + result = bundle.validate_resource( + "-term = Value\n .attr = { 
-term.attr }\n" + ) + assert len(result.warnings) > 0 + + def test_detects_nested_term_circular_ref(self) -> None: + """Three-way circular term reference detected.""" + bundle = FluentBundle("en") + result = bundle.validate_resource( + "-t1 = { -t2 }\n-t2 = { -t3 }\n-t3 = { -t1 }\n" + ) + assert len(result.warnings) > 0 + + def test_detects_undefined_message_ref(self) -> None: + """Undefined message reference generates warning.""" + bundle = FluentBundle("en") + result = bundle.validate_resource("msg = { undefined }\n") + assert any( + "undefined" in w.message.lower() for w in result.warnings + ) + + def test_detects_undefined_term_ref_from_message(self) -> None: + """Message referencing undefined term generates warning.""" + bundle = FluentBundle("en") + result = bundle.validate_resource("msg = { -undefined_term }\n") + assert len(result.warnings) > 0 + + def test_detects_undefined_term_ref_from_term(self) -> None: + """Term referencing undefined term generates warning.""" + bundle = FluentBundle("en_US", use_isolating=False) + result = bundle.validate_resource("-term-a = { -term-b }\n") + assert any( + "undefined term '-term-b'" in w.message + for w in result.warnings + ) + + def test_detects_undefined_message_ref_from_term(self) -> None: + """Term referencing undefined message generates warning.""" + bundle = FluentBundle("en") + result = bundle.validate_resource("-term = { undefined_msg }\n") + assert len(result.warnings) > 0 + + def test_term_referencing_defined_message_no_warning(self) -> None: + """Term referencing a defined message does not warn.""" + bundle = FluentBundle("en_US", use_isolating=False) + result = bundle.validate_resource( + "greeting = Hello\n-term = { greeting }\n" + ) + assert not any( + "undefined message" in w.message for w in result.warnings + ) + + def test_detects_duplicate_term_id(self) -> None: + """Duplicate term ID generates warning.""" + bundle = FluentBundle("en_US", use_isolating=False) + result = bundle.validate_resource( + 
"-brand = Firefox\n-brand = Chrome\n" + ) + assert any( + "Duplicate term ID" in w.message for w in result.warnings + ) + + def test_message_without_value_validates(self) -> None: + """Message with only attributes validates successfully.""" + bundle = FluentBundle("en_US", use_isolating=False) + result = bundle.validate_resource("msg =\n .attr = Value\n") + assert result.is_valid + + def test_term_with_attributes_validates(self) -> None: + """Term with attributes validates successfully.""" + bundle = FluentBundle("en_US", use_isolating=False) + result = bundle.validate_resource( + "-term = Base\n .attr1 = A1\n .attr2 = A2\n" + ) + assert result.is_valid + + def test_handles_critical_syntax_error(self) -> None: + """Critical syntax errors produce validation errors.""" + bundle = FluentBundle("en") + result = bundle.validate_resource("msg = {{ invalid") + assert not result.is_valid + assert len(result.errors) > 0 + + def test_critical_error_returns_validation_error(self) -> None: + """Critical errors are ValidationError instances.""" + bundle = FluentBundle("en_US", use_isolating=False) + result = bundle.validate_resource("msg = {{ broken") + assert all( + isinstance(e, ValidationError) for e in result.errors + ) + + def test_integration_all_warning_types(self) -> None: + """Resource with all warning types produces correct warnings.""" + bundle = FluentBundle("en_US", use_isolating=False) + ftl = ( + "msg-dup = First\nmsg-dup = Second\n" + "-term-dup = First\n-term-dup = Second\n" + "circ-a = { circ-b }\ncirc-b = { circ-a }\n" + "-tc-a = { -tc-b }\n-tc-b = { -tc-a }\n" + "msg-undef = { missing-msg }\n" + "-term-undef = { -missing-term }\n" + "msg-attrs =\n .attr = Value\n" + "-term-attrs = Base\n .attr = Attribute\n" + ) + result = bundle.validate_resource(ftl) + warnings = " ".join(w.message for w in result.warnings) + assert "Duplicate message ID" in warnings + assert "Duplicate term ID" in warnings + assert "Circular message reference" in warnings + assert 
"Circular term reference" in warnings + assert "undefined message" in warnings + assert "undefined term" in warnings + + def test_message_without_value_no_crash(self) -> None: + """Validation doesn't crash on empty-value message.""" + bundle = FluentBundle("en") + result = bundle.validate_resource("empty =\n") + assert result is not None + + +# ============================================================================= +# Cache Management +# ============================================================================= + + +class TestBundleCacheManagement: + """Test clear_cache, get_cache_stats, cache invalidation.""" + + def test_clear_cache_when_enabled(self) -> None: + """clear_cache removes all cached format results.""" + bundle = FluentBundle("en", cache=CacheConfig()) + bundle.add_resource("msg1 = Hello\nmsg2 = World") + bundle.format_pattern("msg1") + bundle.format_pattern("msg2") + assert bundle.cache_usage == 2 + bundle.clear_cache() + assert bundle.cache_usage == 0 + + def test_clear_cache_when_disabled(self) -> None: + """clear_cache succeeds when cache is disabled.""" + bundle = FluentBundle("en") + bundle.clear_cache() + assert bundle.get_cache_stats() is None + + def test_clear_cache_resets_to_empty(self) -> None: + """clear_cache resets the format cache to empty state.""" + bundle = FluentBundle("en", cache=CacheConfig()) + bundle.add_resource("msg = Hello") + bundle.clear_cache() + assert bundle.cache_usage == 0 + + def test_get_cache_stats_returns_dict_when_enabled(self) -> None: + """get_cache_stats returns dict with hits/misses when enabled.""" + bundle = FluentBundle("en", cache=CacheConfig()) + bundle.add_resource("msg = Hello") + bundle.format_pattern("msg", {}) + bundle.format_pattern("msg", {}) + stats = bundle.get_cache_stats() + assert stats is not None + assert stats["hits"] == 1 + assert stats["misses"] == 1 + + def test_get_cache_stats_returns_none_when_disabled(self) -> None: + """get_cache_stats returns None when caching is 
disabled.""" + bundle = FluentBundle("en") + assert bundle.get_cache_stats() is None + + def test_format_pattern_caches_result(self) -> None: + """format_pattern caches results when cache enabled.""" + bundle = FluentBundle("en", cache=CacheConfig()) + bundle.add_resource("msg = Hello") + result1, _ = bundle.format_pattern("msg") + stats1 = bundle.get_cache_stats() + assert stats1 is not None + assert stats1["misses"] == 1 + result2, _ = bundle.format_pattern("msg") + stats2 = bundle.get_cache_stats() + assert stats2 is not None + assert stats2["hits"] == 1 + assert result1 == result2 + + +# -- Introspection (variables, introspect_message/term, has_attribute) ------- + + diff --git a/tests/runtime_cache_hashable_cases/__init__.py b/tests/runtime_cache_hashable_cases/__init__.py new file mode 100644 index 00000000..fe5177c2 --- /dev/null +++ b/tests/runtime_cache_hashable_cases/__init__.py @@ -0,0 +1,52 @@ +"""Tests for IntegrityCache hashable key construction, NaN normalization, and +unhashable argument handling. 
+ +Covers: +- __init__ parameter validation +- _make_hashable type-tagged conversions (bool/int/Decimal/datetime/date/ + FluentNumber/list/dict/set/tuple) for collision-free cache keys +- Depth limiting to prevent O(N) key computation on adversarial inputs +- _make_key integration and error recovery (RecursionError, TypeError) +- NaN normalization (Decimal) to prevent cache pollution DoS vectors +- Hashable conversion of list/dict/set/tuple args for full cache coverage +- Unhashable argument graceful bypass (skips caching, increments counter) +- Error bloat protection (max_entry_weight, max_errors_per_entry) +- LRU eviction and move-to-end behavior +- Property accessors (size, hits, misses, unhashable_skips, oversize_skips) +""" + +from __future__ import annotations + +from datetime import UTC, date, datetime +from decimal import Decimal +from typing import Any, NoReturn + +import pytest +from hypothesis import event, example, given, settings +from hypothesis import strategies as st + +from ftllexengine.constants import MAX_DEPTH +from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError +from ftllexengine.runtime.cache import IntegrityCache +from ftllexengine.runtime.function_bridge import FluentNumber, FluentValue + +__all__ = [ + "MAX_DEPTH", + "UTC", + "Any", + "Decimal", + "ErrorCategory", + "FluentNumber", + "FluentValue", + "FrozenFluentError", + "IntegrityCache", + "NoReturn", + "date", + "datetime", + "event", + "example", + "given", + "pytest", + "settings", + "st", +] diff --git a/tests/runtime_cache_hashable_cases/section_10_property_accessors.py b/tests/runtime_cache_hashable_cases/section_10_property_accessors.py new file mode 100644 index 00000000..2764649d --- /dev/null +++ b/tests/runtime_cache_hashable_cases/section_10_property_accessors.py @@ -0,0 +1,186 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_cache_hashable.py.""" + +from tests.runtime_cache_hashable_cases import * # noqa: F403 - shared split test 
support + +# ============================================================================ +# SECTION 10: PROPERTY ACCESSORS +# ============================================================================ + + +class TestIntegrityCacheProperties: + """Test IntegrityCache property accessors for size, hit/miss counters, and limits.""" + + def test_len_and_size_consistent(self) -> None: + """len(cache) and cache.size return the same current entry count.""" + cache = IntegrityCache(strict=False) + assert len(cache) == 0 + cache.put("msg1", None, None, "en", use_isolating=True, formatted="result1", errors=()) + assert len(cache) == 1 + assert cache.size == 1 + cache.put("msg2", None, None, "en", use_isolating=True, formatted="result2", errors=()) + assert len(cache) == 2 + assert cache.size == 2 + + def test_maxsize_property(self) -> None: + """maxsize property returns the configured maximum size.""" + cache = IntegrityCache(strict=False, maxsize=500) + assert cache.maxsize == 500 + + def test_max_entry_weight_property(self) -> None: + """max_entry_weight property returns the configured weight limit.""" + cache = IntegrityCache(strict=False, max_entry_weight=5000) + assert cache.max_entry_weight == 5000 + + def test_hits_increments_on_cache_hit(self) -> None: + """hits property increments each time get() finds an entry.""" + cache = IntegrityCache(strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="result", errors=()) + cache.get("msg", None, None, "en", use_isolating=True) + assert cache.hits == 1 + cache.get("msg", None, None, "en", use_isolating=True) + assert cache.hits == 2 + + def test_misses_increments_on_cache_miss(self) -> None: + """misses increments only for true cache misses, not unhashable bypasses.""" + cache = IntegrityCache(strict=False) + cache.get("msg1", None, None, "en", use_isolating=True) + assert cache.misses == 1 + cache.get("msg2", None, None, "en", use_isolating=True) + assert cache.misses == 2 + + def 
test_misses_not_incremented_for_unhashable_bypass(self) -> None: + """Unhashable args bypass the cache entirely; misses is not incremented. + + An unhashable bypass is not a cache miss: no key was constructed or + looked up. Only unhashable_skips reflects the event. Conflating them + would deflate hit_rate and mislead operators about cache efficiency. + """ + cache = IntegrityCache(strict=False) + + class UnknownType: + pass + + cache.get("msg", {"x": UnknownType()}, None, "en", use_isolating=True) # type: ignore[dict-item] + assert cache.unhashable_skips == 1 + assert cache.misses == 0 + + def test_hit_rate_excludes_unhashable_bypasses(self) -> None: + """hit_rate is computed over hashable interactions only: hits / (hits + misses). + + Unhashable bypasses do not count as misses, so they do not dilute the + rate. A cache with one hashable hit and one unhashable bypass reports + hit_rate=100.0, not 50.0. + """ + cache = IntegrityCache(strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + cache.get("msg", None, None, "en", use_isolating=True) # hit + + class UnknownType: + pass + + cache.get("msg", {"x": UnknownType()}, None, "en", use_isolating=True) # type: ignore[dict-item] + + stats = cache.get_stats() + assert stats["hits"] == 1 + assert stats["misses"] == 0 + assert stats["unhashable_skips"] == 1 + assert stats["hit_rate"] == 100.0 + + def test_hit_rate_zero_on_all_true_misses(self) -> None: + """hit_rate is 0.0 when all interactions are true misses (no unhashable).""" + cache = IntegrityCache(strict=False) + cache.get("absent", None, None, "en", use_isolating=True) + stats = cache.get_stats() + assert stats["hits"] == 0 + assert stats["misses"] == 1 + assert stats["hit_rate"] == 0.0 + + def test_hit_rate_correct_mixed_hits_and_misses(self) -> None: + """hit_rate is accurate across a mix of hits, misses, and unhashable bypasses.""" + cache = IntegrityCache(strict=False) + cache.put("msg", None, None, "en", 
use_isolating=True, formatted="Hello", errors=()) + cache.get("msg", None, None, "en", use_isolating=True) # hit + cache.get("msg", None, None, "en", use_isolating=True) # hit + cache.get("absent", None, None, "en", use_isolating=True) # miss + + class UnknownType: + pass + + cache.get("msg", {"x": UnknownType()}, None, "en", use_isolating=True) # type: ignore[dict-item] + + stats = cache.get_stats() + assert stats["hits"] == 2 + assert stats["misses"] == 1 + assert stats["unhashable_skips"] == 1 + # hit_rate = 2 / (2 + 1) * 100 = 66.67% + assert stats["hit_rate"] == round(2 / 3 * 100, 2) + + def test_unhashable_skips_increments_on_skip(self) -> None: + """unhashable_skips increments for both get() and put() skips.""" + cache = IntegrityCache(strict=False) + + class UnknownType: + pass + + get_args: dict[str, object] = {"data": UnknownType()} + cache.get("msg", get_args, None, "en", use_isolating=True) # type: ignore[arg-type] + assert cache.unhashable_skips == 1 + put_args: dict[str, object] = {"data": UnknownType()} + cache.put("msg", put_args, None, "en", use_isolating=True, formatted="result", errors=()) # type: ignore[arg-type] + assert cache.unhashable_skips == 2 + + def test_oversize_skips_increments_on_oversize_entry(self) -> None: + """oversize_skips increments when formatted string exceeds max_entry_weight.""" + cache = IntegrityCache(strict=False, max_entry_weight=10) + cache.put("msg1", None, None, "en", use_isolating=True, formatted="x" * 100, errors=()) + assert cache.oversize_skips == 1 + cache.put("msg2", None, None, "en", use_isolating=True, formatted="y" * 50, errors=()) + assert cache.oversize_skips == 2 + + @given( + st.integers(min_value=1, max_value=1000), + st.integers(min_value=1, max_value=10000), + st.integers(min_value=1, max_value=100), + ) + @settings(max_examples=50) + def test_property_constructor_parameters_stored_correctly( + self, + maxsize: int, + max_entry_weight: int, + max_errors_per_entry: int, + ) -> None: + """PROPERTY: 
Constructor parameters are stored and reflected by properties.""" + cache = IntegrityCache( + strict=False, + maxsize=maxsize, + max_entry_weight=max_entry_weight, + max_errors_per_entry=max_errors_per_entry, + ) + assert cache.maxsize == maxsize + assert cache.max_entry_weight == max_entry_weight + assert cache.size == 0 + assert cache.hits == 0 + assert cache.misses == 0 + event(f"maxsize={maxsize}") + + @given(st.text(min_size=0, max_size=100)) + @settings(max_examples=50) + def test_property_primitive_args_always_cacheable(self, text: str) -> None: + """PROPERTY: All primitive FluentValue types produce valid, retrievable entries.""" + cache = IntegrityCache(strict=False) + + args_list: list[dict[str, FluentValue]] = [ + {"text": text}, + {"num": 42}, + {"decimal": Decimal("3.14")}, + {"flag": True}, + {"val": None}, + ] + for args in args_list: + cache.put("msg", args, None, "en", use_isolating=True, formatted="result", errors=()) + entry = cache.get("msg", args, None, "en", use_isolating=True) + assert entry is not None + assert entry.as_result() == ("result", ()) + + event(f"text_len={len(text)}") diff --git a/tests/runtime_cache_hashable_cases/section_1_initialization_validation.py b/tests/runtime_cache_hashable_cases/section_1_initialization_validation.py new file mode 100644 index 00000000..603eb9c1 --- /dev/null +++ b/tests/runtime_cache_hashable_cases/section_1_initialization_validation.py @@ -0,0 +1,42 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_cache_hashable.py.""" + +from tests.runtime_cache_hashable_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# SECTION 1: INITIALIZATION VALIDATION +# ============================================================================ + + +class TestIntegrityCacheInitValidation: + """Test IntegrityCache.__init__ parameter validation.""" + + def test_maxsize_zero_rejected(self) -> None: + """IntegrityCache 
rejects maxsize=0.""" + with pytest.raises(ValueError, match="maxsize must be positive"): + IntegrityCache(maxsize=0) + + def test_maxsize_negative_rejected(self) -> None: + """IntegrityCache rejects negative maxsize.""" + with pytest.raises(ValueError, match="maxsize must be positive"): + IntegrityCache(maxsize=-1) + + def test_max_entry_weight_zero_rejected(self) -> None: + """IntegrityCache rejects max_entry_weight=0.""" + with pytest.raises(ValueError, match="max_entry_weight must be positive"): + IntegrityCache(max_entry_weight=0) + + def test_max_entry_weight_negative_rejected(self) -> None: + """IntegrityCache rejects negative max_entry_weight.""" + with pytest.raises(ValueError, match="max_entry_weight must be positive"): + IntegrityCache(max_entry_weight=-1) + + def test_max_errors_per_entry_zero_rejected(self) -> None: + """IntegrityCache rejects max_errors_per_entry=0.""" + with pytest.raises(ValueError, match="max_errors_per_entry must be positive"): + IntegrityCache(max_errors_per_entry=0) + + def test_max_errors_per_entry_negative_rejected(self) -> None: + """IntegrityCache rejects negative max_errors_per_entry.""" + with pytest.raises(ValueError, match="max_errors_per_entry must be positive"): + IntegrityCache(max_errors_per_entry=-1) diff --git a/tests/runtime_cache_hashable_cases/section_2_make_hashable_type_tagged_conversions.py b/tests/runtime_cache_hashable_cases/section_2_make_hashable_type_tagged_conversions.py new file mode 100644 index 00000000..ef919df8 --- /dev/null +++ b/tests/runtime_cache_hashable_cases/section_2_make_hashable_type_tagged_conversions.py @@ -0,0 +1,170 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_cache_hashable.py.""" + +from tests.runtime_cache_hashable_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# SECTION 2: MAKE HASHABLE - TYPE-TAGGED CONVERSIONS +# 
============================================================================ + + +class TestMakeHashableTypes: + """Test IntegrityCache._make_hashable type-tagged conversions. + + Python's hash equality (hash(1) == hash(True)) would cause cache collisions. + Type-tagging ensures distinct cache keys per type. + """ + + def test_make_hashable_primitives(self) -> None: + """_make_hashable type-tags bool/int to prevent hash collisions. + + str and None are not tagged (no collision risk). + bool/int are type-tagged so hash(1) == hash(True) does not cause + cache key collisions. + """ + assert IntegrityCache._make_hashable("text") == "text" + assert IntegrityCache._make_hashable(None) is None + assert IntegrityCache._make_hashable(42) == ("__int__", 42) + assert IntegrityCache._make_hashable(True) == ("__bool__", True) + assert IntegrityCache._make_hashable(False) == ("__bool__", False) + + def test_make_hashable_decimal(self) -> None: + """_make_hashable type-tags Decimal with str() to preserve scale. + + Decimal("1.0") and Decimal("1") are equal in Python but produce + different plural forms in CLDR (visible fraction digits differ). + Type-tagging with str() preserves scale for correct cache keys. + """ + result = IntegrityCache._make_hashable(Decimal("123.45")) + assert result == ("__decimal__", "123.45") + assert isinstance(result, tuple) + + def test_make_hashable_datetime_naive(self) -> None: + """_make_hashable type-tags naive datetime with isoformat and '__naive__'. + + Two datetimes representing the same UTC instant with different tzinfo + compare equal but format differently. Including tz_key prevents collision. + Naive datetime gets '__naive__' sentinel as tz_key. 
+ """ + dt = datetime(2024, 1, 1, 12, 0, 0) + result = IntegrityCache._make_hashable(dt) + assert result == ("__datetime__", "2024-01-01T12:00:00", "__naive__") + assert isinstance(result, tuple) + + def test_make_hashable_datetime_aware(self) -> None: + """_make_hashable type-tags aware datetime with UTC timezone string. + + Aware datetime includes the tzinfo string to prevent collisions between + identical times expressed in different timezones. + """ + dt = datetime(2024, 1, 1, 12, 0, 0, tzinfo=UTC) + result = IntegrityCache._make_hashable(dt) + assert result == ("__datetime__", "2024-01-01T12:00:00+00:00", "UTC") + assert isinstance(result, tuple) + + def test_make_hashable_date(self) -> None: + """_make_hashable type-tags date with isoformat.""" + d = date(2024, 1, 1) + result = IntegrityCache._make_hashable(d) + assert result == ("__date__", "2024-01-01") + assert isinstance(result, tuple) + + def test_make_hashable_fluent_number(self) -> None: + """_make_hashable type-tags FluentNumber with underlying type info for precision. + + FluentNumber wraps numeric values with formatting options. The inner value + is recursively normalized to handle NaN consistency. + """ + value = FluentNumber(value=42, formatted="42") + result = IntegrityCache._make_hashable(value) + assert result == ("__fluentnumber__", "int", ("__int__", 42), "42", None) + + def test_make_hashable_list_to_tuple(self) -> None: + """_make_hashable type-tags list distinctly from tuple. + + str([1,2]) = "[1, 2]" but str((1,2)) = "(1, 2)". Type-tagging with + '__list__' ensures lists and tuples produce different cache keys even + after both are converted to tuples internally. 
+ """ + result = IntegrityCache._make_hashable([1, 2, [3, 4]]) + inner_list = ("__list__", (("__int__", 3), ("__int__", 4))) + expected = ("__list__", (("__int__", 1), ("__int__", 2), inner_list)) + assert result == expected + assert isinstance(result, tuple) + + def test_make_hashable_dict_to_sorted_tuples(self) -> None: + """_make_hashable converts dict to type-tagged sorted tuple of tuples.""" + result = IntegrityCache._make_hashable({"b": 2, "a": 1}) + assert isinstance(result, tuple) + assert result[0] == "__dict__" + inner = result[1] + assert isinstance(inner, tuple) + assert inner == (("a", ("__int__", 1)), ("b", ("__int__", 2))) + + def test_make_hashable_set_to_frozenset(self) -> None: + """_make_hashable converts set to type-tagged frozenset with type-tagged ints.""" + result = IntegrityCache._make_hashable({1, 2, 3}) + assert isinstance(result, tuple) + assert result[0] == "__set__" + inner = result[1] + expected_inner = frozenset({("__int__", 1), ("__int__", 2), ("__int__", 3)}) + assert inner == expected_inner + + def test_make_hashable_tuple_simple(self) -> None: + """_make_hashable type-tags tuples to distinguish from lists.""" + result = IntegrityCache._make_hashable((1, 2, 3)) + expected = ("__tuple__", (("__int__", 1), ("__int__", 2), ("__int__", 3))) + assert result == expected + assert isinstance(result, tuple) + + def test_make_hashable_tuple_with_nested_list(self) -> None: + """_make_hashable type-tags nested lists within tuples distinctly.""" + result = IntegrityCache._make_hashable((1, [2, 3], 4)) + inner_list = ("__list__", (("__int__", 2), ("__int__", 3))) + expected = ("__tuple__", (("__int__", 1), inner_list, ("__int__", 4))) + assert result == expected + assert isinstance(result, tuple) + hash(result) # Must be hashable end-to-end + + def test_make_hashable_tuple_with_nested_dict(self) -> None: + """_make_hashable type-tags tuples with nested dicts.""" + result = IntegrityCache._make_hashable((1, {"b": 2, "a": 1}, 3)) + inner_dict = 
("__dict__", (("a", ("__int__", 1)), ("b", ("__int__", 2)))) + expected = ("__tuple__", (("__int__", 1), inner_dict, ("__int__", 3))) + assert result == expected + hash(result) + + def test_make_hashable_tuple_with_nested_set(self) -> None: + """_make_hashable type-tags tuples with nested sets.""" + result = IntegrityCache._make_hashable((1, {2, 3}, 4)) + inner_set = ("__set__", frozenset({("__int__", 2), ("__int__", 3)})) + expected = ("__tuple__", (("__int__", 1), inner_set, ("__int__", 4))) + assert result == expected + hash(result) + + def test_make_hashable_deeply_nested_tuple(self) -> None: + """_make_hashable type-tags all nested tuples, lists, and dicts.""" + result = IntegrityCache._make_hashable((1, (2, [3, {"a": 4}]), 5)) + inner_dict = ("__dict__", (("a", ("__int__", 4)),)) + inner_list = ("__list__", (("__int__", 3), inner_dict)) + inner_tuple = ("__tuple__", (("__int__", 2), inner_list)) + expected = ("__tuple__", (("__int__", 1), inner_tuple, ("__int__", 5))) + assert result == expected + hash(result) + + def test_make_hashable_nested_mixed_structures(self) -> None: + """_make_hashable handles mixed nested list/dict/set structures.""" + result = IntegrityCache._make_hashable([{"a": [1, 2]}, {3, 4}]) + assert isinstance(result, tuple) + assert result[0] == "__list__" + # Result must be fully hashable + hash(result) + + def test_make_hashable_unknown_type_raises(self) -> None: + """_make_hashable raises TypeError for unrecognized types.""" + + class CustomType: + pass + + with pytest.raises(TypeError, match="Unknown type in cache key"): + IntegrityCache._make_hashable(CustomType()) diff --git a/tests/runtime_cache_hashable_cases/section_3_make_hashable_depth_limiting.py b/tests/runtime_cache_hashable_cases/section_3_make_hashable_depth_limiting.py new file mode 100644 index 00000000..6ba2b5ab --- /dev/null +++ b/tests/runtime_cache_hashable_cases/section_3_make_hashable_depth_limiting.py @@ -0,0 +1,80 @@ +# mypy: ignore-errors +"""Split test cases from 
tests/test_runtime_cache_hashable.py.""" + +from tests.runtime_cache_hashable_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# SECTION 3: MAKE HASHABLE - DEPTH LIMITING +# ============================================================================ + + +class TestMakeHashableDepth: + """Test depth limiting in _make_hashable. + + Prevents O(N) key computation on adversarially nested inputs and guards + against stack overflow via RecursionError transformation. + """ + + def test_shallow_nesting_succeeds(self) -> None: + """Shallow nested structures convert successfully.""" + shallow = {"a": [1, 2, {"b": 3}]} + result = IntegrityCache._make_hashable(shallow) + assert result is not None + + def test_moderate_nesting_succeeds(self) -> None: + """Moderately nested structures (50 levels) convert successfully.""" + # 50 levels well under MAX_DEPTH + value: dict[str, Any] | int = 42 + for _ in range(50): + value = {"nested": value} + result = IntegrityCache._make_hashable(value) + assert result is not None + + def test_excessive_nesting_raises_type_error(self) -> None: + """Excessively nested structures raise TypeError with descriptive message.""" + value: dict[str, Any] | int = 42 + for _ in range(MAX_DEPTH + 10): + value = {"nested": value} + with pytest.raises(TypeError, match="Maximum nesting depth exceeded"): + IntegrityCache._make_hashable(value) + + def test_custom_depth_parameter_respected(self) -> None: + """Custom depth parameter overrides default MAX_DEPTH.""" + value: dict[str, Any] | int = 42 + for _ in range(15): + value = {"nested": value} + + # Should fail at depth=10 + with pytest.raises(TypeError, match="Maximum nesting depth exceeded"): + IntegrityCache._make_hashable(value, depth=10) + + # Should succeed at depth=20 + result = IntegrityCache._make_hashable(value, depth=20) + assert result is not None + + def test_list_nesting_depth_limited(self) -> None: + """List 
nesting respects depth limit.""" + value: list[Any] | int = 42 + for _ in range(MAX_DEPTH + 10): + value = [value] + with pytest.raises(TypeError, match="Maximum nesting depth exceeded"): + IntegrityCache._make_hashable(value) + + def test_set_nesting_handled(self) -> None: + """Sets with simple values are converted; they cannot nest further. + + Sets cannot contain other sets (sets are unhashable), so depth is + naturally bounded. Simple sets should convert correctly. + """ + result = IntegrityCache._make_hashable({1, 2, 3}) + assert isinstance(result, tuple) + assert result[0] == "__set__" + assert isinstance(result[1], frozenset) + + def test_mixed_nesting_depth_limited(self) -> None: + """Mixed dict/list alternating nesting respects depth limit.""" + value: dict[str, Any] | list[Any] | int = 42 + for i in range(MAX_DEPTH + 10): + value = {"nested": value} if i % 2 == 0 else [value] + with pytest.raises(TypeError, match="Maximum nesting depth exceeded"): + IntegrityCache._make_hashable(value) diff --git a/tests/runtime_cache_hashable_cases/section_4_make_key_integration.py b/tests/runtime_cache_hashable_cases/section_4_make_key_integration.py new file mode 100644 index 00000000..342fb053 --- /dev/null +++ b/tests/runtime_cache_hashable_cases/section_4_make_key_integration.py @@ -0,0 +1,119 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_cache_hashable.py.""" + +from tests.runtime_cache_hashable_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# SECTION 4: MAKE KEY INTEGRATION +# ============================================================================ + + +class TestMakeKey: + """Test _make_key integration with _make_hashable. + + _make_key builds a cache key tuple from (message_id, args, attribute, + locale_code, use_isolating). Returns None on any hashing failure, + allowing cache bypass without raising to the caller. 
+ """ + + def test_make_key_with_none_args(self) -> None: + """_make_key with None args returns key with empty tuple for args component.""" + key = IntegrityCache._make_key("msg-id", None, None, "en-US", use_isolating=True) + assert key is not None + assert key == ("msg-id", (), None, "en-US", True) + + def test_make_key_with_simple_args(self) -> None: + """_make_key handles simple string/int arguments.""" + key = IntegrityCache._make_key( + message_id="test", + args={"name": "Alice", "count": 42}, + attribute=None, + locale_code="en", + use_isolating=True, + ) + assert key is not None + + def test_make_key_with_nested_args(self) -> None: + """_make_key handles nested list arguments via _make_hashable.""" + key = IntegrityCache._make_key( + message_id="test", + args={"items": [1, 2, 3]}, + attribute=None, + locale_code="en", + use_isolating=True, + ) + assert key is not None + + def test_make_key_with_all_fluent_value_types(self) -> None: + """_make_key accepts all valid FluentValue types.""" + key = IntegrityCache._make_key( + message_id="test", + args={ + "string": "hello", + "int": 42, + "decimal": Decimal("3.14"), + "decimal2": Decimal("99.99"), + "datetime": datetime(2024, 1, 1, tzinfo=UTC), + "date": date(2024, 1, 1), + "fluent_number": FluentNumber(value=100, formatted="100"), + }, + attribute=None, + locale_code="en", + use_isolating=True, + ) + assert key is not None + + def test_make_key_with_deeply_nested_returns_none(self) -> None: + """_make_key returns None for excessively nested args (graceful bypass).""" + deep: dict[str, Any] | int = 42 + for _ in range(MAX_DEPTH + 10): + deep = {"nested": deep} + key = IntegrityCache._make_key( + message_id="test", + args={"deep": deep}, + attribute=None, + locale_code="en", + use_isolating=True, + ) + assert key is None # Cache bypass, not a crash + + def test_make_key_with_unknown_type_returns_none(self) -> None: + """_make_key returns None for unknown types (graceful bypass).""" + + class CustomObject: + pass + 
+ key = IntegrityCache._make_key( + message_id="test", + args={"custom": CustomObject()}, # type: ignore[dict-item] + attribute=None, + locale_code="en", + use_isolating=True, + ) + assert key is None + + def test_make_key_catches_recursion_error(self) -> None: + """_make_key returns None when RecursionError occurs (circular reference).""" + circular_list: list[object] = [] + circular_list.append(circular_list) + args: dict[str, object] = {"data": circular_list} + result = IntegrityCache._make_key( + "msg", args, None, "en", use_isolating=True # type: ignore[arg-type] + ) + assert result is None + + def test_make_key_catches_type_error_in_hash(self) -> None: + """_make_key returns None when TypeError occurs during hash verification.""" + + class UnhashableAfterConversion: + """Passes _make_hashable type dispatch but fails hash().""" + + def __hash__(self) -> int: # pylint: disable=invalid-hash-returned + msg = "cannot hash" + raise TypeError(msg) + + args: dict[str, object] = {"data": UnhashableAfterConversion()} + result = IntegrityCache._make_key( + "msg", args, None, "en", use_isolating=True # type: ignore[arg-type] + ) + assert result is None diff --git a/tests/runtime_cache_hashable_cases/section_5_na_n_normalization.py b/tests/runtime_cache_hashable_cases/section_5_na_n_normalization.py new file mode 100644 index 00000000..96db8094 --- /dev/null +++ b/tests/runtime_cache_hashable_cases/section_5_na_n_normalization.py @@ -0,0 +1,176 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_cache_hashable.py.""" + +from tests.runtime_cache_hashable_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# SECTION 5: NaN NORMALIZATION +# ============================================================================ + + +class TestNaNDecimalNormalization: + """Test that Decimal NaN values are normalized in cache keys.""" + + def 
test_decimal_nan_cache_key_consistency(self) -> None: + """Decimal NaN produces consistent cache key across independent instances.""" + cache = IntegrityCache(strict=False) + cache.put("msg", {"val": Decimal("NaN")}, None, "en", use_isolating=True, formatted="Decimal Result", errors=()) + entry = cache.get("msg", {"val": Decimal("NaN")}, None, "en", use_isolating=True) + assert entry is not None + assert entry.formatted == "Decimal Result" + + def test_decimal_nan_does_not_pollute_cache(self) -> None: + """Multiple puts with Decimal NaN update the same entry.""" + cache = IntegrityCache(strict=False, maxsize=100) + for i in range(10): + cache.put("msg", {"val": Decimal("NaN")}, None, "en", use_isolating=True, formatted=f"Value {i}", errors=()) + stats = cache.get_stats() + assert stats["size"] == 1, ( + f"Expected 1 entry but got {stats['size']}. " + "Decimal NaN normalization may not be working." + ) + + def test_decimal_snan_normalized_same_as_qnan(self) -> None: + """Signaling NaN and quiet NaN both normalize to the same canonical key.""" + cache = IntegrityCache(strict=False) + cache.put("msg", {"val": Decimal("NaN")}, None, "en", use_isolating=True, formatted="QNaN", errors=()) + # sNaN should resolve to same cache key as qNaN + entry = cache.get("msg", {"val": Decimal("sNaN")}, None, "en", use_isolating=True) + assert entry is not None + + def test_decimal_nan_different_from_regular_decimal(self) -> None: + """Decimal NaN has different cache key from regular Decimal values.""" + cache = IntegrityCache(strict=False) + cache.put("msg", {"val": Decimal("NaN")}, None, "en", use_isolating=True, formatted="NaN Result", errors=()) + cache.put("msg", {"val": Decimal("1.0")}, None, "en", use_isolating=True, formatted="Regular Result", errors=()) + + nan_entry = cache.get("msg", {"val": Decimal("NaN")}, None, "en", use_isolating=True) + regular_entry = cache.get("msg", {"val": Decimal("1.0")}, None, "en", use_isolating=True) + + assert nan_entry is not None + assert 
nan_entry.formatted == "NaN Result" + assert regular_entry is not None + assert regular_entry.formatted == "Regular Result" + assert cache.get_stats()["size"] == 2 + + +class TestNaNInNestedStructures: + """Test NaN normalization in nested data structures.""" + + def test_nan_in_list_normalized(self) -> None: + """NaN values within lists are normalized for cache key consistency.""" + cache = IntegrityCache(strict=False) + items = [Decimal(1), Decimal("NaN"), Decimal(3)] + cache.put("msg", {"items": items}, None, "en", use_isolating=True, formatted="List Result", errors=()) + entry = cache.get("msg", {"items": items}, None, "en", use_isolating=True) + assert entry is not None + assert entry.formatted == "List Result" + + def test_nan_in_dict_normalized(self) -> None: + """NaN values within dicts are normalized for cache key consistency.""" + cache = IntegrityCache(strict=False) + args: dict[str, FluentValue] = {"data": {"a": Decimal(1), "b": Decimal("NaN")}} + cache.put("msg", args, None, "en", use_isolating=True, formatted="Dict Result", errors=()) + data = {"a": Decimal(1), "b": Decimal("NaN")} + entry = cache.get("msg", {"data": data}, None, "en", use_isolating=True) + assert entry is not None + assert entry.formatted == "Dict Result" + + def test_deeply_nested_nan_normalized(self) -> None: + """NaN values in deeply nested structures are normalized consistently.""" + cache = IntegrityCache(strict=False) + deep_args: dict[str, FluentValue] = { + "outer": { + "inner": [ + {"value": Decimal("NaN")}, + {"value": Decimal("sNaN")}, + ] + } + } + cache.put("msg", deep_args, None, "en", use_isolating=True, formatted="Deep Result", errors=()) + fresh_args: dict[str, FluentValue] = { + "outer": { + "inner": [ + {"value": Decimal("NaN")}, + {"value": Decimal("sNaN")}, + ] + } + } + entry = cache.get("msg", fresh_args, None, "en", use_isolating=True) + assert entry is not None + assert entry.formatted == "Deep Result" + + +class TestNaNSecurityProperties: + """Test security 
properties of NaN normalization.""" + + def test_nan_cache_pollution_prevented(self) -> None: + """NaN-based cache pollution attack is prevented by normalization. + + Attack scenario: 100 NaN-containing requests without normalization would + create 100 unique, unretrievable entries, evicting all legitimate entries. + With normalization all NaN entries collapse to a single key. + """ + cache = IntegrityCache(strict=False, maxsize=10) + for i in range(5): + cache.put(f"legit{i}", None, None, "en", use_isolating=True, formatted=f"Legit {i}", errors=()) + for i in range(100): + cache.put("attack", {"val": Decimal("NaN")}, None, "en", use_isolating=True, formatted=f"Attack {i}", errors=()) + + # 5 legit + 1 attack = 6 entries (attack collapses to 1 due to normalization) + assert cache.get_stats()["size"] == 6 + for i in range(5): + entry = cache.get(f"legit{i}", None, None, "en", use_isolating=True) + assert entry is not None, f"Legitimate entry legit{i} was evicted!" + + @given(st.decimals(allow_nan=True)) + @settings(max_examples=100) + @example(Decimal("NaN")) + @example(Decimal("sNaN")) + @example(Decimal("Inf")) + @example(Decimal("-Inf")) + def test_all_decimal_special_values_produce_retrievable_keys( + self, value: Decimal + ) -> None: + """PROPERTY: For any Decimal value, put followed by get returns the entry.""" + cache = IntegrityCache(strict=False) + args = {"val": value} + cache.put("msg", args, None, "en", use_isolating=True, formatted=f"Value: {value}", errors=()) + entry = cache.get("msg", args, None, "en", use_isolating=True) + assert entry is not None, f"Entry for value {value!r} was not retrievable" + is_nan = value.is_nan() or value.is_snan() + event(f"is_nan={is_nan}") + + +class TestNaNHashableValue: + """Test _make_hashable NaN handling directly.""" + + def test_make_hashable_decimal_nan_returns_canonical(self) -> None: + """_make_hashable returns canonical ('__decimal__', '__NaN__') for Decimal NaN.""" + result = 
IntegrityCache._make_hashable(Decimal("NaN")) + assert result == ("__decimal__", "__NaN__") + + def test_make_hashable_decimal_snan_returns_canonical(self) -> None: + """_make_hashable returns canonical ('__decimal__', '__NaN__') for Decimal sNaN.""" + result = IntegrityCache._make_hashable(Decimal("sNaN")) + assert result == ("__decimal__", "__NaN__") + + def test_make_hashable_regular_decimal_uses_str(self) -> None: + """_make_hashable returns tagged str for regular Decimal values.""" + result = IntegrityCache._make_hashable(Decimal("1.50")) + assert result == ("__decimal__", "1.50") + + def test_make_hashable_decimal_infinity_uses_str_not_nan_sentinel(self) -> None: + """Decimal Infinity uses str() representation, not the NaN sentinel. + + Infinity satisfies Inf == Inf (unlike NaN), so no special normalization + is needed. Both +Inf and -Inf produce distinct, retrievable keys. + """ + pos_inf = IntegrityCache._make_hashable(Decimal("Inf")) + neg_inf = IntegrityCache._make_hashable(Decimal("-Inf")) + nan_result = IntegrityCache._make_hashable(Decimal("NaN")) + + assert pos_inf == ("__decimal__", "Infinity") + assert neg_inf == ("__decimal__", "-Infinity") + assert pos_inf != nan_result + assert neg_inf != nan_result diff --git a/tests/runtime_cache_hashable_cases/section_6_hashable_conversion_cache_roundtrip_tests.py b/tests/runtime_cache_hashable_cases/section_6_hashable_conversion_cache_roundtrip_tests.py new file mode 100644 index 00000000..5a2e3fa7 --- /dev/null +++ b/tests/runtime_cache_hashable_cases/section_6_hashable_conversion_cache_roundtrip_tests.py @@ -0,0 +1,176 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_cache_hashable.py.""" + +from tests.runtime_cache_hashable_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# SECTION 6: HASHABLE CONVERSION - CACHE ROUNDTRIP TESTS +# 
============================================================================ + + +class TestCacheHashableConversion: # pylint: disable=too-many-public-methods + """Test IntegrityCache automatic conversion of unhashable args to hashable keys. + + Lists, dicts, sets, and tuples are converted to hashable equivalents + (type-tagged tuples, sorted tuples, frozensets) enabling caching for these + types without requiring callers to pre-convert their arguments. + """ + + def test_get_with_list_value_now_cacheable(self) -> None: + """get() with list args succeeds: lists are converted to type-tagged tuples.""" + cache = IntegrityCache(strict=False, maxsize=100) + args = {"key": [1, 2, 3]} + cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) + cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) + assert cached is not None + assert cached.as_result() == ("formatted", ()) + assert len(cache) == 1 + assert cache.unhashable_skips == 0 + + def test_get_with_dict_value_now_cacheable(self) -> None: + """get() with nested dict args succeeds: dicts are converted to sorted tuples.""" + cache = IntegrityCache(strict=False, maxsize=100) + args = {"key": {"nested": "value"}} + cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) + cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) + assert cached is not None + assert cached.as_result() == ("formatted", ()) + assert len(cache) == 1 + assert cache.unhashable_skips == 0 + + def test_get_with_set_value_now_cacheable(self) -> None: + """get() with set args succeeds: sets are converted to type-tagged frozensets.""" + cache = IntegrityCache(strict=False, maxsize=100) + args: dict[str, object] = {"key": {1, 2, 3}} + cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) # type: ignore[arg-type] + cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) # type: 
ignore[arg-type] + assert cached is not None + assert cached.as_result() == ("formatted", ()) + assert len(cache) == 1 + assert cache.unhashable_skips == 0 + + def test_put_with_list_value_now_caches(self) -> None: + """put() with list args stores entry: lists are converted at key build time.""" + cache = IntegrityCache(strict=False, maxsize=100) + cache.put("msg-id", {"items": [1, 2, 3]}, None, "en-US", use_isolating=True, formatted="formatted", errors=()) + assert len(cache) == 1 + assert cache.unhashable_skips == 0 + + def test_put_with_dict_value_now_caches(self) -> None: + """put() with nested dict args stores entry: dicts are converted at key build.""" + cache = IntegrityCache(strict=False, maxsize=100) + cache.put("msg-id", {"config": {"option": "value"}}, None, "en-US", use_isolating=True, formatted="fmt", errors=()) + assert len(cache) == 1 + assert cache.unhashable_skips == 0 + + def test_make_key_converts_list_to_valid_key(self) -> None: + """_make_key returns a non-None key when args contain lists.""" + args: dict[str, object] = {"list_value": [1, 2, 3]} + key = IntegrityCache._make_key( + "msg-id", args, None, "en-US", use_isolating=True # type: ignore[arg-type] + ) + assert key is not None + + def test_make_key_converts_nested_structures_to_valid_key(self) -> None: + """_make_key returns a non-None key when args contain nested structures.""" + args: dict[str, object] = {"list": [1, 2], "dict": {"nested": "value"}} + key = IntegrityCache._make_key( + "msg-id", args, None, "en-US", use_isolating=True # type: ignore[arg-type] + ) + assert key is not None + + def test_get_with_tuple_value_cacheable(self) -> None: + """get() caches tuple-valued args correctly via type-tagged conversion.""" + cache = IntegrityCache(strict=False, maxsize=100) + args = {"coords": (10, 20, 30)} + cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) + cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) + assert cached 
is not None + assert cached.as_result() == ("formatted", ()) + assert len(cache) == 1 + assert cache.unhashable_skips == 0 + + def test_get_with_tuple_containing_list_cacheable(self) -> None: + """get() caches tuple-with-nested-list args: nested list is converted.""" + cache = IntegrityCache(strict=False, maxsize=100) + args: dict[str, object] = {"data": (1, [2, 3], 4)} + cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) # type: ignore[arg-type] + cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) # type: ignore[arg-type] + assert cached is not None + assert cached.as_result() == ("formatted", ()) + assert len(cache) == 1 + assert cache.unhashable_skips == 0 + + @given(st.tuples(st.integers(), st.integers(), st.integers())) + def test_get_with_various_tuples_cacheable( + self, tuple_value: tuple[int, int, int] + ) -> None: + """PROPERTY: Tuple-valued args cache and retrieve correctly.""" + cache = IntegrityCache(strict=False, maxsize=100) + args = {"tuple_arg": tuple_value} + cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) + cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) + assert cached is not None + assert cached.as_result() == ("formatted", ()) + assert cache.unhashable_skips == 0 + event(f"tuple_len={len(tuple_value)}") + + @given(st.lists(st.integers(), min_size=1, max_size=10)) + def test_get_with_various_lists_cacheable(self, list_value: list[int]) -> None: + """PROPERTY: List-valued args cache and retrieve correctly.""" + cache = IntegrityCache(strict=False, maxsize=100) + args = {"list_arg": list_value} + cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) + cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) + assert cached is not None + assert cached.as_result() == ("formatted", ()) + assert cache.unhashable_skips == 0 + event(f"list_len={len(list_value)}") + + 
@given( + st.dictionaries( + st.text(min_size=1, max_size=10), st.integers(), min_size=1, max_size=5 + ) + ) + def test_put_with_various_dicts_cacheable(self, dict_value: dict[str, int]) -> None: + """PROPERTY: Dict-valued args cache correctly.""" + cache = IntegrityCache(strict=False, maxsize=100) + args = {"dict_arg": dict_value} + cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) + assert len(cache) == 1 + assert cache.unhashable_skips == 0 + event(f"dict_len={len(dict_value)}") + + def test_mixed_hashable_and_convertible_args(self) -> None: + """Cache handles mixed hashable/convertible args in the same call.""" + cache = IntegrityCache(strict=False, maxsize=100) + args: dict[str, object] = { + "str_arg": "value", + "int_arg": 42, + "list_arg": [1, 2, 3], + } + cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) # type: ignore[arg-type] + cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) # type: ignore[arg-type] + assert cached is not None + assert cached.as_result() == ("formatted", ()) + assert cache.unhashable_skips == 0 + + def test_empty_list_cacheable(self) -> None: + """Empty lists are converted and cached correctly.""" + cache = IntegrityCache(strict=False, maxsize=100) + args: dict[str, list[object]] = {"empty_list": []} + cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) # type: ignore[arg-type] + cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) # type: ignore[arg-type] + assert cached is not None + assert cached.as_result() == ("formatted", ()) + assert len(cache) == 1 + + def test_empty_dict_cacheable(self) -> None: + """Empty dicts are converted and cached correctly.""" + cache = IntegrityCache(strict=False, maxsize=100) + args: dict[str, dict[object, object]] = {"empty_dict": {}} + cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) # 
type: ignore[arg-type] + cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) # type: ignore[arg-type] + assert cached is not None + assert cached.as_result() == ("formatted", ()) + assert len(cache) == 1 diff --git a/tests/runtime_cache_hashable_cases/section_7_unhashable_argument_handling.py b/tests/runtime_cache_hashable_cases/section_7_unhashable_argument_handling.py new file mode 100644 index 00000000..9576a26d --- /dev/null +++ b/tests/runtime_cache_hashable_cases/section_7_unhashable_argument_handling.py @@ -0,0 +1,180 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_cache_hashable.py.""" + +from tests.runtime_cache_hashable_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# SECTION 7: UNHASHABLE ARGUMENT HANDLING +# ============================================================================ + + +class TestUnhashableHandling: + """Test graceful bypass for arguments that cannot be hashed. + + Covers three bypass mechanisms: + 1. Unknown type in _make_hashable (case _ branch) + 2. Python's hash() raising TypeError + 3. RecursionError from circular references + In all cases: entry is not cached, unhashable_skips increments. + """ + + def test_get_with_unknown_type_skips_cache(self) -> None: + """get() with unknown type arg bypasses cache and increments unhashable_skips. + + UnknownType is not recognized by _make_hashable's match/case dispatch, + triggering TypeError("Unknown type in cache key") → _make_key returns None. + An unhashable bypass is not a cache miss: no key was looked up, so misses + is not incremented. Only unhashable_skips reflects the event. 
+ """ + cache = IntegrityCache(strict=False) + + class UnknownType: + pass + + args: dict[str, object] = {"data": UnknownType()} + result = cache.get("msg", args, None, "en", use_isolating=True) # type: ignore[arg-type] + assert result is None + assert cache.unhashable_skips == 1 + assert cache.misses == 0 + assert cache.hits == 0 + + def test_put_with_unhashable_hash_raises_skips_cache(self) -> None: + """put() with arg whose __hash__ raises TypeError skips caching.""" + cache = IntegrityCache(strict=False) + + class CustomObject: + def __hash__(self) -> int: # pylint: disable=invalid-hash-returned + msg = "unhashable" + raise TypeError(msg) + + args: dict[str, object] = {"obj": CustomObject()} + cache.put("msg", args, None, "en", use_isolating=True, formatted="result", errors=()) # type: ignore[arg-type] + assert cache.size == 0 + assert cache.unhashable_skips == 1 + + def test_unhashable_custom_object_in_get_skipped(self) -> None: + """Custom unhashable objects in get() args bypass caching gracefully.""" + cache = IntegrityCache(strict=False, maxsize=100) + + class UnhashableClass: + def __init__(self) -> None: + self.data = [1, 2, 3] + + def __hash__(self) -> NoReturn: # pylint: disable=invalid-hash-returned + msg = "unhashable type" + raise TypeError(msg) + + custom_args: dict[str, object] = {"custom": UnhashableClass()} + result = cache.get("msg-id", custom_args, None, "en-US", use_isolating=True) # type: ignore[arg-type] + assert result is None + assert cache.unhashable_skips == 1 + + def test_unhashable_skips_not_incremented_for_convertible_types(self) -> None: + """unhashable_skips only counts truly unhashable objects; lists/dicts do not.""" + cache = IntegrityCache(strict=False, maxsize=100) + assert cache.unhashable_skips == 0 + + cache.get("msg1", {"list": [1]}, None, "en-US", use_isolating=True) + assert cache.unhashable_skips == 0 # Lists are convertible, not skipped + + cache.put("msg2", {"dict": {}}, None, "en-US", use_isolating=True, 
formatted="result", errors=()) + assert cache.unhashable_skips == 0 # Dicts are convertible, not skipped + + def test_unhashable_skips_preserved_on_clear(self) -> None: + """clear() does not reset unhashable_skips; counter is cumulative.""" + cache = IntegrityCache(strict=False, maxsize=100) + + class UnhashableClass: + def __hash__(self) -> NoReturn: # pylint: disable=invalid-hash-returned + msg = "unhashable type" + raise TypeError(msg) + + cache.get("msg", {"obj": UnhashableClass()}, None, "en-US", use_isolating=True) # type: ignore[dict-item] + assert cache.unhashable_skips == 1 + # clear() removes entries but preserves cumulative observability metrics. + cache.clear() + assert cache.unhashable_skips == 1 + + def test_get_stats_includes_unhashable_skips(self) -> None: + """get_stats() reflects unhashable bypasses in unhashable_skips, not misses. + + Unhashable args bypass the cache entirely; no key lookup occurs. + misses counts only true cache misses (key looked up, not found). + """ + cache = IntegrityCache(strict=False, maxsize=100) + + class UnhashableClass: + def __hash__(self) -> NoReturn: # pylint: disable=invalid-hash-returned + msg = "unhashable type" + raise TypeError(msg) + + cache.get("msg", {"obj": UnhashableClass()}, None, "en-US", use_isolating=True) # type: ignore[dict-item] + stats = cache.get_stats() + assert "unhashable_skips" in stats + assert stats["unhashable_skips"] == 1 + assert stats["misses"] == 0 + + def test_hashable_args_do_not_increment_unhashable_skips(self) -> None: + """Fully hashable primitive args never increment unhashable_skips.""" + cache = IntegrityCache(strict=False, maxsize=100) + args: dict[str, FluentValue] = {"str": "value", "int": 42, "decimal": Decimal("3.14")} + cache.get("msg1", args, None, "en-US", use_isolating=True) + cache.put("msg2", args, None, "en-US", use_isolating=True, formatted="result", errors=()) + assert cache.unhashable_skips == 0 + + def 
test_put_with_circular_reference_increments_skip_counter(self) -> None: + """Circular reference in args increments unhashable_skips and skips storage.""" + cache = IntegrityCache(strict=False, maxsize=100) + circular: dict[str, object] = {} + circular["self"] = circular # Circular reference + assert cache.unhashable_skips == 0 + cache.put( + message_id="test", + args=circular, # type: ignore[arg-type] + attribute=None, + locale_code="en", + use_isolating=True, + formatted="output", + errors=(), + ) + assert cache.unhashable_skips == 1 + assert len(cache) == 0 + + def test_put_with_nested_circular_reference_increments_skip(self) -> None: + """Nested circular reference also triggers unhashable_skips increment.""" + cache = IntegrityCache(strict=False, maxsize=50) + nested: dict[str, object] = {"level1": {}} + nested["level1"]["back"] = nested # type: ignore[index] + initial_skips = cache.unhashable_skips + cache.put( + message_id="nested_test", + args=nested, # type: ignore[arg-type] + attribute=None, + locale_code="lv", + use_isolating=True, + formatted="result", + errors=(), + ) + assert cache.unhashable_skips == initial_skips + 1 + assert len(cache) == 0 + + def test_put_with_custom_unhashable_in_args_dict(self) -> None: + """Custom unhashable object as a dict value triggers skip.""" + cache = IntegrityCache(strict=False, maxsize=100) + + class UnhashableObject: + __hash__ = None # type: ignore[assignment] + + unhashable_args = {"obj": UnhashableObject()} + initial_skips = cache.unhashable_skips + cache.put( + message_id="custom_obj", + args=unhashable_args, # type: ignore[arg-type] + attribute="attr", + locale_code="en_US", + use_isolating=True, + formatted="value", + errors=(), + ) + assert cache.unhashable_skips == initial_skips + 1 + assert len(cache) == 0 diff --git a/tests/runtime_cache_hashable_cases/section_8_error_bloat_protection.py b/tests/runtime_cache_hashable_cases/section_8_error_bloat_protection.py new file mode 100644 index 00000000..6d8051e0 --- 
/dev/null +++ b/tests/runtime_cache_hashable_cases/section_8_error_bloat_protection.py @@ -0,0 +1,57 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_cache_hashable.py.""" + +from tests.runtime_cache_hashable_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# SECTION 8: ERROR BLOAT PROTECTION +# ============================================================================ + + +class TestIntegrityCacheErrorBloatProtection: + """Test IntegrityCache error collection memory bounding. + + Prevents unbounded memory use when a single message generates many errors. + Two limits: max_errors_per_entry (count) and max_entry_weight (bytes). + """ + + def test_put_rejects_excessive_error_count(self) -> None: + """put() skips caching when error count exceeds max_errors_per_entry.""" + cache = IntegrityCache(strict=False, max_errors_per_entry=10) + errors = tuple( + FrozenFluentError(f"Error {i}", ErrorCategory.REFERENCE) for i in range(15) + ) + cache.put("msg", None, None, "en", use_isolating=True, formatted="formatted text", errors=errors) + assert cache.size == 0 + assert cache.get_stats()["error_bloat_skips"] == 1 + assert cache.get("msg", None, None, "en", use_isolating=True) is None + + def test_put_rejects_excessive_error_weight(self) -> None: + """put() skips caching when total weight exceeds max_entry_weight. + + Dynamic weight: base (100) + string len + per-error weights. + 10 errors with 100-char messages + 100-char formatted string exceeds 2000. 
+ """ + cache = IntegrityCache(strict=False, max_entry_weight=2000, max_errors_per_entry=50) + errors = tuple( + FrozenFluentError("E" * 100, ErrorCategory.REFERENCE) for _ in range(10) + ) + cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=errors) + assert cache.size == 0 + # 10 errors pass the count check (10 <= 50), but combined weight + # (100 formatted + 10 * 200 per error = 2100) exceeds max_entry_weight=2000. + assert cache.get_stats()["combined_weight_skips"] == 1 + assert cache.get_stats()["error_bloat_skips"] == 0 + + def test_put_accepts_reasonable_error_collections(self) -> None: + """put() caches results with error counts and weights within limits.""" + cache = IntegrityCache(strict=False, max_entry_weight=15000, max_errors_per_entry=50) + errors = tuple( + FrozenFluentError(f"Error {i}", ErrorCategory.REFERENCE) for i in range(10) + ) + cache.put("msg", None, None, "en", use_isolating=True, formatted="formatted text", errors=errors) + assert cache.size == 1 + assert cache.get_stats()["error_bloat_skips"] == 0 + cached = cache.get("msg", None, None, "en", use_isolating=True) + assert cached is not None + assert cached.as_result() == ("formatted text", errors) diff --git a/tests/runtime_cache_hashable_cases/section_9_lru_eviction_behavior.py b/tests/runtime_cache_hashable_cases/section_9_lru_eviction_behavior.py new file mode 100644 index 00000000..dfc30ec7 --- /dev/null +++ b/tests/runtime_cache_hashable_cases/section_9_lru_eviction_behavior.py @@ -0,0 +1,47 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_cache_hashable.py.""" + +from tests.runtime_cache_hashable_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# SECTION 9: LRU EVICTION BEHAVIOR +# ============================================================================ + + +class TestIntegrityCacheLRUBehavior: + """Test IntegrityCache LRU eviction and 
move-to-end behavior.""" + + def test_put_moves_existing_key_to_end_of_lru(self) -> None: + """put() on existing key marks it as recently used (moves to LRU tail).""" + cache = IntegrityCache(strict=False, maxsize=3) + cache.put("msg1", None, None, "en", use_isolating=True, formatted="result1", errors=()) + cache.put("msg2", None, None, "en", use_isolating=True, formatted="result2", errors=()) + cache.put("msg3", None, None, "en", use_isolating=True, formatted="result3", errors=()) + assert cache.size == 3 + + # Updating msg1 moves it to the LRU tail (recently used) + cache.put("msg1", None, None, "en", use_isolating=True, formatted="updated1", errors=()) + + # Adding msg4 should evict msg2 (now the oldest) + cache.put("msg4", None, None, "en", use_isolating=True, formatted="result4", errors=()) + assert cache.size == 3 + + assert cache.get("msg2", None, None, "en", use_isolating=True) is None + entry1 = cache.get("msg1", None, None, "en", use_isolating=True) + assert entry1 is not None + assert entry1.as_result() == ("updated1", ()) + assert cache.get("msg3", None, None, "en", use_isolating=True) is not None + assert cache.get("msg4", None, None, "en", use_isolating=True) is not None + + def test_put_evicts_lru_entry_when_cache_full(self) -> None: + """put() evicts the least recently used entry when capacity is reached.""" + cache = IntegrityCache(strict=False, maxsize=2) + cache.put("msg1", None, None, "en", use_isolating=True, formatted="result1", errors=()) + cache.put("msg2", None, None, "en", use_isolating=True, formatted="result2", errors=()) + assert cache.size == 2 + + cache.put("msg3", None, None, "en", use_isolating=True, formatted="result3", errors=()) + assert cache.size == 2 + assert cache.get("msg1", None, None, "en", use_isolating=True) is None + assert cache.get("msg2", None, None, "en", use_isolating=True) is not None + assert cache.get("msg3", None, None, "en", use_isolating=True) is not None diff --git 
a/tests/runtime_cache_integrity_cases/__init__.py b/tests/runtime_cache_integrity_cases/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/runtime_cache_integrity_cases/checksums.py b/tests/runtime_cache_integrity_cases/checksums.py new file mode 100644 index 00000000..f711602b --- /dev/null +++ b/tests/runtime_cache_integrity_cases/checksums.py @@ -0,0 +1,274 @@ +# mypy: ignore-errors +from __future__ import annotations + +import contextlib + +import pytest +from hypothesis import event, given, settings +from hypothesis import strategies as st + +from ftllexengine.diagnostics import ( + ErrorCategory, + FrozenFluentError, +) +from ftllexengine.integrity import CacheCorruptionError +from ftllexengine.runtime.cache import ( + IntegrityCache, + IntegrityCacheEntry, +) + +# Sentinel key_hash for unit tests that verify checksum mechanics but do not +# need meaningful key binding (all-zeros = "unbound test entry"). +_NO_KEY_HASH: bytes = b"\x00" * 8 + +# ============================================================================ +# CHECKSUM VERIFICATION TESTS +# ============================================================================ + + + +class TestChecksumComputation: + """Test BLAKE2b-128 checksum computation.""" + + def test_checksum_computed_on_create(self) -> None: + """IntegrityCacheEntry.create() computes checksum.""" + entry = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) + assert entry.checksum is not None + assert len(entry.checksum) == 16 # BLAKE2b-128 = 16 bytes + + def test_different_metadata_different_checksum(self) -> None: + """Different metadata (sequence, timestamp) produces different checksums. + + Checksums now include created_at and sequence for complete audit trail integrity. + Identical content with different metadata produces different checksums. 
+ """ + entry1 = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) + entry2 = IntegrityCacheEntry.create("Hello", (), sequence=2, key_hash=_NO_KEY_HASH) + # Checksums differ because sequence is different (and created_at likely differs) + assert entry1.checksum != entry2.checksum + + def test_different_content_different_checksum(self) -> None: + """Different content produces different checksums.""" + entry1 = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) + entry2 = IntegrityCacheEntry.create("World", (), sequence=1, key_hash=_NO_KEY_HASH) + assert entry1.checksum != entry2.checksum + + def test_errors_affect_checksum(self) -> None: + """Errors are included in checksum computation.""" + error = FrozenFluentError("Test error", ErrorCategory.REFERENCE) + entry_no_errors = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) + entry_with_errors = IntegrityCacheEntry.create( + "Hello", (error,), sequence=1, key_hash=_NO_KEY_HASH + ) + assert entry_no_errors.checksum != entry_with_errors.checksum + + def test_verify_returns_true_for_valid_entry(self) -> None: + """verify() returns True for uncorrupted entry.""" + entry = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) + assert entry.verify() is True + + def test_entry_as_result_preserves_content(self) -> None: + """as_result() returns correct (formatted, errors) pair.""" + errors = (FrozenFluentError("Test", ErrorCategory.REFERENCE),) + entry = IntegrityCacheEntry.create("Hello", errors, sequence=1, key_hash=_NO_KEY_HASH) + assert entry.as_result() == ("Hello", errors) + + @given(st.text(min_size=0, max_size=1000)) + @settings(max_examples=50) + def test_checksum_validates_correctly(self, text: str) -> None: + """PROPERTY: Checksum validation is deterministic for same entry. + + Checksums now include metadata (created_at, sequence) for complete audit + trail integrity. 
Different entries with same content will have different + checksums due to different timestamps. We verify that each entry's + checksum validates correctly. + """ + entry = IntegrityCacheEntry.create(text, (), sequence=1, key_hash=_NO_KEY_HASH) + # Each entry should validate its own checksum correctly + assert entry.verify() is True + event(f"text_len={len(text)}") + +class TestCorruptionDetectionStrictMode: + """Test corruption detection in strict mode (fail-fast).""" + + def test_strict_mode_raises_on_corruption(self) -> None: + """strict=True raises CacheCorruptionError on checksum mismatch.""" + cache = IntegrityCache(strict=True) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + # Simulate corruption by directly modifying internal state + key = next(iter(cache._cache.keys())) + original_entry = cache._cache[key] + + # Create corrupted entry with wrong checksum + corrupted = IntegrityCacheEntry( + formatted="Corrupted!", + errors=original_entry.errors, + checksum=original_entry.checksum, # Wrong checksum for new content + created_at=original_entry.created_at, + sequence=original_entry.sequence, + key_hash=original_entry.key_hash, + ) + cache._cache[key] = corrupted + + with pytest.raises(CacheCorruptionError) as exc_info: + cache.get("msg", None, None, "en", use_isolating=True) + + assert "corruption detected" in str(exc_info.value).lower() + assert exc_info.value.context is not None + assert exc_info.value.context.component == "cache" + + def test_strict_mode_corruption_counter_incremented(self) -> None: + """Corruption detection increments corruption_detected counter.""" + cache = IntegrityCache(strict=True) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + # Corrupt entry + key = next(iter(cache._cache.keys())) + entry = cache._cache[key] + corrupted = IntegrityCacheEntry( + formatted="Corrupted", + errors=entry.errors, + checksum=entry.checksum, + created_at=entry.created_at, 
+ sequence=entry.sequence, + key_hash=entry.key_hash, + ) + cache._cache[key] = corrupted + + with contextlib.suppress(CacheCorruptionError): + cache.get("msg", None, None, "en", use_isolating=True) + + stats = cache.get_stats() + assert stats["corruption_detected"] == 1 + +class TestCorruptionDetectionNonStrictMode: + """Test corruption detection in non-strict mode (silent eviction).""" + + def test_non_strict_evicts_corrupted_entry(self) -> None: + """strict=False silently evicts corrupted entry.""" + cache = IntegrityCache(strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + # Verify entry exists + assert cache.get("msg", None, None, "en", use_isolating=True) is not None + + # Corrupt entry + key = next(iter(cache._cache.keys())) + entry = cache._cache[key] + corrupted = IntegrityCacheEntry( + formatted="Corrupted", + errors=entry.errors, + checksum=entry.checksum, + created_at=entry.created_at, + sequence=entry.sequence, + key_hash=entry.key_hash, + ) + cache._cache[key] = corrupted + + # Get returns None (not an exception) + result = cache.get("msg", None, None, "en", use_isolating=True) + assert result is None + + # Entry was evicted + stats = cache.get_stats() + assert stats["size"] == 0 + assert stats["corruption_detected"] == 1 + + def test_non_strict_records_miss_on_corruption(self) -> None: + """Corrupted entry results in cache miss.""" + cache = IntegrityCache(strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + # First get is a hit + cache.get("msg", None, None, "en", use_isolating=True) + stats = cache.get_stats() + assert stats["hits"] == 1 + assert stats["misses"] == 0 + + # Corrupt entry + key = next(iter(cache._cache.keys())) + entry = cache._cache[key] + corrupted = IntegrityCacheEntry( + formatted="Corrupted", + errors=entry.errors, + checksum=entry.checksum, + created_at=entry.created_at, + sequence=entry.sequence, + key_hash=entry.key_hash, 
+ ) + cache._cache[key] = corrupted + + # Second get is a miss (corruption detected, entry evicted) + cache.get("msg", None, None, "en", use_isolating=True) + stats = cache.get_stats() + assert stats["misses"] == 1 # Corruption triggers miss + +class TestKeyBindingConfusion: + """Cover the key-binding confusion check (lines 652-670). + + The key-binding check fires when an entry's stored key_hash doesn't match + the hash of the lookup key. This is distinct from a checksum mismatch: + the entry is internally consistent (verify() passes) but is stored under + the wrong key slot — a sign of active tampering or memory corruption. + + Strategy: put an entry under key B, inject it into the slot for key A, + then call get(key A). verify() passes (entry_b is internally valid) but + the key_hash bound to key B != _compute_key_hash(key A). + """ + + @staticmethod + def _inject_key_confused_entry(cache: IntegrityCache) -> None: + """Put msg-b, then move its entry into the msg-a slot.""" + cache.put("msg-b", None, None, "en", use_isolating=True, formatted="Hello B", errors=()) + key_b: tuple = ("msg-b", (), None, "en", True) + key_a: tuple = ("msg-a", (), None, "en", True) + # Inject entry_b under key_a — checksum is valid but key_hash is wrong + cache._cache[key_a] = cache._cache[key_b] + + def test_key_confusion_strict_raises(self) -> None: + """strict=True raises CacheCorruptionError on key-binding mismatch.""" + cache = IntegrityCache(strict=True) + self._inject_key_confused_entry(cache) + + with pytest.raises(CacheCorruptionError) as exc_info: + cache.get("msg-a", None, None, "en", use_isolating=True) + + assert "key confusion" in str(exc_info.value).lower() + assert exc_info.value.context is not None + assert exc_info.value.context.component == "cache" + assert exc_info.value.context.operation == "get" + + def test_key_confusion_strict_increments_counter(self) -> None: + """Key-binding confusion increments corruption_detected counter.""" + cache = 
IntegrityCache(strict=True) + self._inject_key_confused_entry(cache) + + with contextlib.suppress(CacheCorruptionError): + cache.get("msg-a", None, None, "en", use_isolating=True) + + assert cache.get_stats()["corruption_detected"] == 1 + + def test_key_confusion_non_strict_returns_none(self) -> None: + """strict=False evicts the confused entry and returns None.""" + cache = IntegrityCache(strict=False) + self._inject_key_confused_entry(cache) + + result = cache.get("msg-a", None, None, "en", use_isolating=True) + + assert result is None + stats = cache.get_stats() + assert stats["corruption_detected"] == 1 + assert stats["misses"] == 1 + + def test_key_confusion_non_strict_evicts_entry(self) -> None: + """Non-strict key confusion removes the confused entry from the cache.""" + cache = IntegrityCache(strict=False) + self._inject_key_confused_entry(cache) + + key_a: tuple = ("msg-a", (), None, "en", True) + assert key_a in cache._cache # Injected entry is present + + cache.get("msg-a", None, None, "en", use_isolating=True) + + assert key_a not in cache._cache diff --git a/tests/runtime_cache_integrity_cases/idempotence_and_hashes.py b/tests/runtime_cache_integrity_cases/idempotence_and_hashes.py new file mode 100644 index 00000000..2e45bc6b --- /dev/null +++ b/tests/runtime_cache_integrity_cases/idempotence_and_hashes.py @@ -0,0 +1,383 @@ +# mypy: ignore-errors +# mypy: ignore-errors +from __future__ import annotations + +import threading +from datetime import UTC +from decimal import Decimal + +import pytest +from hypothesis import event, given, settings +from hypothesis import strategies as st + +from ftllexengine.diagnostics import ( + ErrorCategory, + FrozenFluentError, +) +from ftllexengine.integrity import WriteConflictError +from ftllexengine.runtime.cache import ( + IntegrityCache, + IntegrityCacheEntry, +) + +# Sentinel key_hash for unit tests that verify checksum mechanics but do not +# need meaningful key binding (all-zeros = "unbound test entry"). 
+_NO_KEY_HASH: bytes = b"\x00" * 8 + +# ============================================================================ +# CHECKSUM VERIFICATION TESTS +# ============================================================================ + + + +class TestContentHash: + """Test content-only hash computation for idempotent write detection.""" + + def test_content_hash_computed(self) -> None: + """IntegrityCacheEntry has content_hash property.""" + entry = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) + content_hash = entry.content_hash + assert content_hash is not None + assert len(content_hash) == 16 # BLAKE2b-128 + + def test_identical_content_same_hash(self) -> None: + """Entries with identical content have identical content hashes. + + This is critical for idempotent write detection: concurrent threads + computing the same formatted result should produce matching content hashes. + """ + entry1 = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) + entry2 = IntegrityCacheEntry.create("Hello", (), sequence=2, key_hash=_NO_KEY_HASH) + + # Full checksums differ (include metadata) + assert entry1.checksum != entry2.checksum + + # Content hashes are identical + assert entry1.content_hash == entry2.content_hash + + def test_different_content_different_hash(self) -> None: + """Entries with different content have different content hashes.""" + entry1 = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) + entry2 = IntegrityCacheEntry.create("World", (), sequence=1, key_hash=_NO_KEY_HASH) + + assert entry1.content_hash != entry2.content_hash + + def test_errors_affect_content_hash(self) -> None: + """Errors are included in content hash computation.""" + error = FrozenFluentError("Test error", ErrorCategory.REFERENCE) + entry_no_errors = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) + entry_with_errors = IntegrityCacheEntry.create( + "Hello", (error,), sequence=1, 
key_hash=_NO_KEY_HASH + ) + + assert entry_no_errors.content_hash != entry_with_errors.content_hash + + @given(st.text(min_size=0, max_size=500)) + @settings(max_examples=30) + def test_content_hash_deterministic(self, text: str) -> None: + """PROPERTY: Content hash is deterministic for same content.""" + entry1 = IntegrityCacheEntry.create(text, (), sequence=1, key_hash=_NO_KEY_HASH) + entry2 = IntegrityCacheEntry.create(text, (), sequence=999, key_hash=_NO_KEY_HASH) + + assert entry1.content_hash == entry2.content_hash + event(f"text_len={len(text)}") + +class TestIdempotentWrites: + """Test idempotent write detection for thundering herd scenarios. + + In write_once mode, concurrent writes with identical content (formatted + errors) + are treated as idempotent operations, not conflicts. This prevents false-positive + WriteConflictError during thundering herds where multiple threads resolve the + same message simultaneously. + """ + + def test_idempotent_write_succeeds_in_strict_mode(self) -> None: + """Identical content is allowed in write_once + strict mode. + + Thundering herd scenario: Multiple threads resolve same message, + all compute identical results. Second thread should succeed silently. 
+ """ + cache = IntegrityCache(write_once=True, strict=True) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + # Second put with IDENTICAL content should succeed (idempotent) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + # Verify entry unchanged + entry = cache.get("msg", None, None, "en", use_isolating=True) + assert entry is not None + assert entry.formatted == "Hello" + assert entry.sequence == 1 # Original sequence preserved + + def test_different_content_raises_conflict(self) -> None: + """Different content raises WriteConflictError in strict mode.""" + cache = IntegrityCache(write_once=True, strict=True) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + with pytest.raises(WriteConflictError): + cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) + + def test_idempotent_write_counter_incremented(self) -> None: + """Idempotent writes increment the idempotent_writes counter.""" + cache = IntegrityCache(write_once=True, strict=True) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + # Perform idempotent writes + for _ in range(5): + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + stats = cache.get_stats() + assert stats["idempotent_writes"] == 5 + + def test_idempotent_writes_property(self) -> None: + """idempotent_writes property returns correct count.""" + cache = IntegrityCache(write_once=True, strict=True) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + assert cache.idempotent_writes == 0 + + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + assert cache.idempotent_writes == 1 + + def test_idempotent_with_errors(self) -> None: + """Idempotent detection includes errors in comparison.""" + error = FrozenFluentError("Test error", 
ErrorCategory.REFERENCE) + cache = IntegrityCache(write_once=True, strict=True) + + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=(error,)) + + # Same content WITH same error = idempotent + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=(error,)) + assert cache.idempotent_writes == 1 + + # Same text but WITHOUT error = conflict + with pytest.raises(WriteConflictError): + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + def test_idempotent_non_strict_mode(self) -> None: + """Idempotent writes also work in non-strict mode.""" + cache = IntegrityCache(write_once=True, strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + # Idempotent write + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + # Different content silently ignored (non-strict) + cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) + + stats = cache.get_stats() + assert stats["idempotent_writes"] == 1 # Only one idempotent + + # Original value preserved + entry = cache.get("msg", None, None, "en", use_isolating=True) + assert entry is not None + assert entry.formatted == "Hello" + + def test_idempotent_counter_preserved_on_clear(self) -> None: + """Idempotent counter is cumulative across clear() calls.""" + cache = IntegrityCache(write_once=True, strict=True) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) # Idempotent + + assert cache.idempotent_writes == 1 + + # clear() removes entries but does NOT reset cumulative metrics. 
+ cache.clear() + + assert cache.idempotent_writes == 1 + + def test_audit_records_idempotent_writes(self) -> None: + """Audit log records WRITE_ONCE_IDEMPOTENT operations.""" + cache = IntegrityCache(write_once=True, strict=True, enable_audit=True) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) # Idempotent + + audit_log = cache._audit_log + assert audit_log is not None + + # pylint: disable=not-an-iterable + operations = [entry.operation for entry in audit_log] + assert "WRITE_ONCE_IDEMPOTENT" in operations + + def test_audit_records_conflict(self) -> None: + """Audit log records WRITE_ONCE_CONFLICT for different content.""" + cache = IntegrityCache(write_once=True, strict=False, enable_audit=True) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) # Conflict (non-strict) + + audit_log = cache._audit_log + assert audit_log is not None + + # pylint: disable=not-an-iterable + operations = [entry.operation for entry in audit_log] + assert "WRITE_ONCE_CONFLICT" in operations + +class TestIdempotentWritesConcurrency: + """Test idempotent writes under concurrent access (thundering herd).""" + + def test_concurrent_identical_writes_no_exceptions(self) -> None: + """Concurrent writes with identical content all succeed (no exceptions). + + This is the thundering herd scenario: multiple threads resolve same + message simultaneously, all compute identical results. Without idempotent + detection, N-1 threads would crash with WriteConflictError. 
+ """ + cache = IntegrityCache(write_once=True, strict=True) + errors: list[Exception] = [] + + def put_identical() -> None: + try: + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + except Exception as e: # pylint: disable=broad-exception-caught + errors.append(e) + + # 20 threads all trying to cache same value + threads = [threading.Thread(target=put_identical) for _ in range(20)] + for thread in threads: + thread.start() + for thread in threads: + thread.join() + + # NO exceptions should occur (all are idempotent or first write) + assert len(errors) == 0, f"Got {len(errors)} exceptions: {errors}" + + # Only one entry should exist + stats = cache.get_stats() + assert stats["size"] == 1 + + # Idempotent counter should reflect concurrent writes minus first + assert stats["idempotent_writes"] == 19 # 20 threads - 1 first write + + def test_concurrent_different_writes_raises_conflicts(self) -> None: + """Concurrent writes with DIFFERENT content raise conflicts.""" + cache = IntegrityCache(write_once=True, strict=True) + conflict_count = 0 + lock = threading.Lock() + + def put_different(i: int) -> None: + nonlocal conflict_count + try: + cache.put("msg", None, None, "en", use_isolating=True, formatted=f"Value {i}", errors=()) + except WriteConflictError: + with lock: + conflict_count += 1 + + # 10 threads all trying to cache DIFFERENT values + threads = [threading.Thread(target=put_different, args=(i,)) for i in range(10)] + for thread in threads: + thread.start() + for thread in threads: + thread.join() + + # Most writes should fail (conflict) + assert conflict_count >= 9 # At least 9 conflicts (1 succeeds) + + # Only one entry should exist + stats = cache.get_stats() + assert stats["size"] == 1 + +class TestDatetimeTimezoneCollisionPrevention: + """Test that datetime objects with different timezones produce distinct cache keys. + + Two datetime objects can represent the same UTC instant but have different tzinfo. 
+ Python's datetime equality considers them equal, but they format to different + local time strings. The cache must distinguish them. + """ + + def test_same_utc_instant_different_timezone_distinct_keys(self) -> None: + """Datetimes with same UTC instant but different tzinfo produce distinct keys.""" + from datetime import datetime, timedelta, timezone + + # 12:00 UTC + dt_utc = datetime(2024, 1, 1, 12, 0, 0, tzinfo=UTC) + # 07:00 EST (UTC-5) = 12:00 UTC - SAME INSTANT + dt_est = datetime(2024, 1, 1, 7, 0, 0, tzinfo=timezone(timedelta(hours=-5))) + + # Verify they represent the same instant (Python equality) + assert dt_utc == dt_est + + # But they should produce DIFFERENT cache keys + key_utc = IntegrityCache._make_hashable(dt_utc) + key_est = IntegrityCache._make_hashable(dt_est) + assert key_utc != key_est + + def test_naive_datetime_distinguished_from_aware(self) -> None: + """Naive datetime is distinguished from aware datetime.""" + from datetime import datetime + + dt_naive = datetime(2024, 1, 1, 12, 0, 0) # noqa: DTZ001 - naive datetime by design + dt_aware = datetime(2024, 1, 1, 12, 0, 0, tzinfo=UTC) + + key_naive = IntegrityCache._make_hashable(dt_naive) + key_aware = IntegrityCache._make_hashable(dt_aware) + + # Different tz_key means different cache keys + assert key_naive != key_aware + assert isinstance(key_naive, tuple) + assert isinstance(key_aware, tuple) + assert key_naive[2] == "__naive__" + assert key_aware[2] == "UTC" + +class TestDecimalNegativeZeroCollisionPrevention: + """Test that Decimal("0") and Decimal("-0") produce distinct cache keys. + + Python's Decimal("0") == Decimal("-0"), but locale-aware formatting may + distinguish them (e.g., "-0" vs "0"). The cache must treat them as distinct. 
+ """ + + def test_zero_and_negative_zero_distinct_keys(self) -> None: + """Decimal("0") and Decimal("-0") produce distinct cache keys.""" + key_pos = IntegrityCache._make_hashable(Decimal(0)) + key_neg = IntegrityCache._make_hashable(Decimal("-0")) + + # They're equal in Python + assert Decimal(0) == Decimal("-0") + + # But distinct in cache keys (via str representation) + assert key_pos != key_neg + assert key_pos == ("__decimal__", "0") + assert key_neg == ("__decimal__", "-0") + +class TestSequenceMappingABCSupport: + """Test that Sequence and Mapping ABCs are supported, not just list/tuple/dict.""" + + def test_userlist_accepted(self) -> None: + """UserList (Sequence ABC) is accepted and type-tagged.""" + from collections import UserList + + values = UserList([1, 2, 3]) + result = IntegrityCache._make_hashable(values) + + # Should be tagged as __seq__ (generic Sequence) + assert isinstance(result, tuple) + assert result[0] == "__seq__" + # Inner values are type-tagged + assert result[1] == (("__int__", 1), ("__int__", 2), ("__int__", 3)) + + def test_chainmap_accepted(self) -> None: + """ChainMap (Mapping ABC) is accepted with __mapping__ tag.""" + from collections import ChainMap + + values: ChainMap[str, int] = ChainMap({"a": 1}, {"b": 2}) + result = IntegrityCache._make_hashable(values) + + # Should be tagged tuple with __mapping__ prefix + assert isinstance(result, tuple) + assert result[0] == "__mapping__" + # ChainMap flattens to view of first-found keys + inner = result[1] + assert isinstance(inner, tuple) + assert ("a", ("__int__", 1)) in inner + assert ("b", ("__int__", 2)) in inner + + def test_list_still_tagged_as_list(self) -> None: + """Regular list still uses __list__ tag, not __seq__.""" + result = IntegrityCache._make_hashable([1, 2]) + assert isinstance(result, tuple) + assert result[0] == "__list__" + + def test_tuple_still_tagged_as_tuple(self) -> None: + """Regular tuple still uses __tuple__ tag, not __seq__.""" + result = 
IntegrityCache._make_hashable((1, 2)) + assert isinstance(result, tuple) + assert result[0] == "__tuple__" diff --git a/tests/runtime_cache_integrity_cases/integrity_edges.py b/tests/runtime_cache_integrity_cases/integrity_edges.py new file mode 100644 index 00000000..93301a90 --- /dev/null +++ b/tests/runtime_cache_integrity_cases/integrity_edges.py @@ -0,0 +1,561 @@ +# mypy: ignore-errors +from __future__ import annotations + +from hypothesis import event, given, settings +from hypothesis import strategies as st + +from ftllexengine.diagnostics import ( + Diagnostic, + DiagnosticCode, + ErrorCategory, + FrozenErrorContext, + FrozenFluentError, +) +from ftllexengine.runtime.cache import ( + IntegrityCache, + IntegrityCacheEntry, + _estimate_error_weight, +) + +# Sentinel key_hash for unit tests that verify checksum mechanics but do not +# need meaningful key binding (all-zeros = "unbound test entry"). +_NO_KEY_HASH: bytes = b"\x00" * 8 + +# ============================================================================ +# CHECKSUM VERIFICATION TESTS +# ============================================================================ + + + +class TestIntegrityCacheEntryContentHash: + """Test IntegrityCacheEntry checksum computation with error.content_hash.""" + + def test_compute_checksum_uses_error_content_hash(self) -> None: + """_compute_checksum uses error.content_hash when available.""" + error = FrozenFluentError("Test error", ErrorCategory.REFERENCE) + entry = IntegrityCacheEntry.create( + "formatted text", (error,), sequence=1, key_hash=_NO_KEY_HASH + ) + assert entry.checksum is not None + assert len(entry.checksum) == 16 # BLAKE2b-128 + assert entry.verify() is True + + def test_compute_checksum_with_multiple_errors_content_hash(self) -> None: + """_compute_checksum uses content_hash for multiple errors.""" + errors = ( + FrozenFluentError("Error 1", ErrorCategory.REFERENCE), + FrozenFluentError("Error 2", ErrorCategory.RESOLUTION), + FrozenFluentError("Error 3", 
ErrorCategory.CYCLIC), + ) + entry = IntegrityCacheEntry.create( + "formatted text", errors, sequence=1, key_hash=_NO_KEY_HASH + ) + assert entry.checksum is not None + assert entry.verify() is True + + @given(st.integers(min_value=1, max_value=10)) + @settings(max_examples=50) + def test_property_checksum_deterministic_with_errors(self, error_count: int) -> None: + """PROPERTY: Checksum is deterministic; each entry validates against itself. + + Checksums include metadata (created_at, sequence) for complete audit trail + integrity, so two independently created entries with the same content will + have different checksums. Each entry does self-validate correctly. + """ + errors = tuple( + FrozenFluentError(f"Error {i}", ErrorCategory.REFERENCE) + for i in range(error_count) + ) + entry = IntegrityCacheEntry.create("formatted", errors, sequence=1, key_hash=_NO_KEY_HASH) + assert entry.verify() is True + entry2 = IntegrityCacheEntry.create("formatted", errors, sequence=1, key_hash=_NO_KEY_HASH) + assert entry2.verify() is True + event(f"error_count={error_count}") + + def test_cache_put_get_with_frozen_errors(self) -> None: + """Cache operations work correctly with FrozenFluentError.content_hash.""" + cache = IntegrityCache(strict=False) + errors = ( + FrozenFluentError("Reference error", ErrorCategory.REFERENCE), + FrozenFluentError("Resolution error", ErrorCategory.RESOLUTION), + ) + cache.put("msg", None, None, "en", use_isolating=True, formatted="formatted text", errors=errors) + entry = cache.get("msg", None, None, "en", use_isolating=True) + assert entry is not None + assert entry.formatted == "formatted text" + assert entry.errors == errors + assert entry.verify() is True + +class TestIntegrityCacheAuditLogDisabled: + """Test get_audit_log() returns empty tuple when audit logging is disabled.""" + + def test_get_audit_log_returns_empty_when_disabled_by_default(self) -> None: + """get_audit_log() returns empty tuple when audit disabled (default).""" + cache = 
IntegrityCache(strict=False) + cache.put("msg1", None, None, "en", use_isolating=True, formatted="result1", errors=()) + cache.get("msg1", None, None, "en", use_isolating=True) + cache.put("msg2", None, None, "en", use_isolating=True, formatted="result2", errors=()) + audit_log = cache.get_audit_log() + assert audit_log == () + assert isinstance(audit_log, tuple) + + def test_get_audit_log_returns_empty_when_disabled_explicit(self) -> None: + """get_audit_log() returns empty tuple when enable_audit=False explicitly.""" + cache = IntegrityCache(enable_audit=False, strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="result", errors=()) + cache.get("msg", None, None, "en", use_isolating=True) + assert cache.get_audit_log() == () + + @given( + st.integers(min_value=1, max_value=20), + st.integers(min_value=1, max_value=10), + ) + @settings(max_examples=30) + def test_property_audit_log_always_empty_when_disabled( + self, put_count: int, get_count: int + ) -> None: + """PROPERTY: get_audit_log() always returns empty tuple when disabled.""" + cache = IntegrityCache(enable_audit=False, strict=False) + for i in range(put_count): + cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"result{i}", errors=()) + for i in range(get_count): + cache.get(f"msg{i % put_count}", None, None, "en", use_isolating=True) + audit_log = cache.get_audit_log() + assert audit_log == () + assert len(audit_log) == 0 + event(f"put_count={put_count}") + +class TestIntegrityCacheAuditLogEnabled: + """Test get_audit_log() returns tuple of entries when audit logging is enabled.""" + + def test_get_audit_log_returns_tuple_when_enabled(self) -> None: + """get_audit_log() returns tuple with entries when enable_audit=True.""" + cache = IntegrityCache(enable_audit=True, strict=False) + cache.put("msg1", None, None, "en", use_isolating=True, formatted="result1", errors=()) + cache.get("msg1", None, None, "en", use_isolating=True) + cache.get("msg2", None, 
None, "en", use_isolating=True) # Miss + audit_log = cache.get_audit_log() + assert isinstance(audit_log, tuple) + assert len(audit_log) >= 3 # PUT + HIT + MISS + + @given(st.integers(min_value=1, max_value=10)) + @settings(max_examples=20) + def test_property_audit_log_returns_tuple_when_enabled(self, op_count: int) -> None: + """PROPERTY: get_audit_log() returns tuple of at least op_count entries.""" + cache = IntegrityCache(enable_audit=True, strict=False) + for i in range(op_count): + cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"result{i}", errors=()) + audit_log = cache.get_audit_log() + assert isinstance(audit_log, tuple) + assert len(audit_log) >= op_count + event(f"op_count={op_count}") + +class TestIntegrityCachePropertyGetters: + """Test property getters for complete coverage.""" + + def test_corruption_detected_property(self) -> None: + """corruption_detected property reflects detected corruption count.""" + cache = IntegrityCache(strict=False) + assert cache.corruption_detected == 0 + + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + key = next(iter(cache._cache.keys())) + original_entry = cache._cache[key] + corrupted = IntegrityCacheEntry( + formatted="Corrupted!", + errors=original_entry.errors, + checksum=original_entry.checksum, + created_at=original_entry.created_at, + sequence=original_entry.sequence, + key_hash=original_entry.key_hash, + ) + cache._cache[key] = corrupted + cache.get("msg", None, None, "en", use_isolating=True) + assert cache.corruption_detected == 1 + + def test_write_once_property(self) -> None: + """write_once property reflects constructor argument.""" + assert IntegrityCache(write_once=False, strict=False).write_once is False + assert IntegrityCache(write_once=True, strict=False).write_once is True + + def test_strict_property(self) -> None: + """strict property reflects constructor argument.""" + assert IntegrityCache(strict=False).strict is False + assert 
IntegrityCache(strict=True).strict is True + + @given(st.booleans(), st.booleans()) + @settings(max_examples=4) + def test_property_write_once_strict_reflect_constructor( + self, write_once: bool, strict: bool + ) -> None: + """PROPERTY: write_once and strict properties reflect constructor args.""" + cache = IntegrityCache(write_once=write_once, strict=strict) + assert cache.write_once == write_once + assert cache.strict == strict + wo = "write_once" if write_once else "normal" + event(f"mode={wo}") + + def test_corruption_detected_accumulates_across_multiple(self) -> None: + """corruption_detected accumulates across multiple corruption events.""" + cache = IntegrityCache(strict=False) + cache.put("msg1", None, None, "en", use_isolating=True, formatted="One", errors=()) + cache.put("msg2", None, None, "en", use_isolating=True, formatted="Two", errors=()) + cache.put("msg3", None, None, "en", use_isolating=True, formatted="Three", errors=()) + for key in list(cache._cache.keys()): + entry = cache._cache[key] + cache._cache[key] = IntegrityCacheEntry( + formatted="Corrupted", + errors=entry.errors, + checksum=entry.checksum, + created_at=entry.created_at, + sequence=entry.sequence, + key_hash=entry.key_hash, + ) + cache.get("msg1", None, None, "en", use_isolating=True) + assert cache.corruption_detected == 1 + cache.get("msg2", None, None, "en", use_isolating=True) + assert cache.corruption_detected == 2 + cache.get("msg3", None, None, "en", use_isolating=True) + assert cache.corruption_detected == 3 + + def test_error_bloat_skips_property(self) -> None: + """error_bloat_skips property reflects excess-error-count skip count.""" + cache = IntegrityCache(strict=False, max_errors_per_entry=2) + errors = tuple( + FrozenFluentError(f"err-{i}", ErrorCategory.REFERENCE) for i in range(3) + ) + assert cache.error_bloat_skips == 0 + + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=errors) + assert cache.error_bloat_skips == 1 + + def 
test_combined_weight_skips_property_initial_zero(self) -> None: + """combined_weight_skips property starts at zero.""" + cache = IntegrityCache(strict=False) + assert cache.combined_weight_skips == 0 + + def test_combined_weight_skips_property_incremented(self) -> None: + """combined_weight_skips property reflects combined-weight skip count.""" + # max_entry_weight=200: formatted (100 chars) passes check 1, + # but combined with error overhead (100 base + 150 msg = 250), total=350 fails. + cache = IntegrityCache(strict=False, max_entry_weight=200) + error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) + assert cache.combined_weight_skips == 0 + + cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(error,)) + assert cache.combined_weight_skips == 1 + + def test_write_once_conflicts_property_initial_zero(self) -> None: + """write_once_conflicts property starts at zero.""" + cache = IntegrityCache(write_once=True, strict=False) + assert cache.write_once_conflicts == 0 + + def test_write_once_conflicts_property_incremented(self) -> None: + """write_once_conflicts property reflects true conflict count.""" + cache = IntegrityCache(write_once=True, strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + assert cache.write_once_conflicts == 0 + + cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) + assert cache.write_once_conflicts == 1 + +class TestIntegrityCacheEdgeCases: + """Additional edge cases for complete coverage.""" + + def test_entry_with_empty_errors_differs_from_entry_with_error(self) -> None: + """Entries with empty vs non-empty errors tuples have distinct checksums.""" + error = FrozenFluentError("Test", ErrorCategory.REFERENCE) + entry1 = IntegrityCacheEntry.create("text", (), sequence=1, key_hash=_NO_KEY_HASH) + entry2 = IntegrityCacheEntry.create("text", (error,), sequence=2, key_hash=_NO_KEY_HASH) + assert entry1.checksum != 
entry2.checksum + + def test_cache_stats_includes_all_integrity_fields(self) -> None: + """get_stats() includes corruption_detected, write_once, strict, audit_enabled.""" + cache = IntegrityCache(write_once=True, strict=True, enable_audit=False) + stats = cache.get_stats() + assert "corruption_detected" in stats + assert "write_once" in stats + assert "strict" in stats + assert "audit_enabled" in stats + assert stats["corruption_detected"] == 0 + assert stats["write_once"] is True + assert stats["strict"] is True + assert stats["audit_enabled"] is False + + def test_multiple_operations_exercise_all_properties(self) -> None: + """Exercise all properties through multiple cache operations.""" + cache = IntegrityCache( + maxsize=10, write_once=False, strict=False, enable_audit=False + ) + for i in range(5): + cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"result{i}", errors=()) + assert cache.size == 5 + assert cache.maxsize == 10 + assert cache.hits == 0 + assert cache.misses == 0 + assert cache.corruption_detected == 0 + assert cache.write_once is False + assert cache.strict is False + for i in range(5): + entry = cache.get(f"msg{i}", None, None, "en", use_isolating=True) + assert entry is not None + assert cache.hits == 5 + assert cache.get_audit_log() == () + +class TestEstimateErrorWeightWithContext: + """Test _estimate_error_weight with errors containing FrozenErrorContext. + + Covers the branch where error.context fields are processed. 
+ """ + + def test_error_weight_with_context(self) -> None: + """Error with context includes all context field lengths in weight.""" + context = FrozenErrorContext( + input_value="test_input_value", + locale_code="en_US", + parse_type="number", + fallback_value="{!NUMBER}", + ) + error = FrozenFluentError( + "Parse error", ErrorCategory.FORMATTING, context=context + ) + weight = _estimate_error_weight(error) + expected_weight = ( + 100 # _ERROR_BASE_OVERHEAD + + len("Parse error") + + len("test_input_value") + + len("en_US") + + len("number") + + len("{!NUMBER}") + ) + assert weight == expected_weight + + def test_error_weight_without_context(self) -> None: + """Error without context only includes base overhead plus message length.""" + error = FrozenFluentError("Simple error", ErrorCategory.REFERENCE) + weight = _estimate_error_weight(error) + assert weight == 100 + len("Simple error") + + @given( + input_val=st.text(min_size=0, max_size=100), + locale=st.text(min_size=0, max_size=20), + parse_type=st.sampled_from( + ["", "currency", "date", "datetime", "decimal", "number"] + ), + fallback=st.text(min_size=0, max_size=50), + ) + @settings(max_examples=50) + def test_property_error_weight_accounts_for_all_context_fields( + self, + input_val: str, + locale: str, + parse_type: str, + fallback: str, + ) -> None: + """PROPERTY: Error weight correctly accounts for all context field lengths.""" + context = FrozenErrorContext( + input_value=input_val, + locale_code=locale, + parse_type=parse_type, + fallback_value=fallback, + ) + error = FrozenFluentError("Test", ErrorCategory.FORMATTING, context=context) + weight = _estimate_error_weight(error) + expected = ( + 100 + + len("Test") + + len(input_val) + + len(locale) + + len(parse_type) + + len(fallback) + ) + assert weight == expected + event(f"context_len={len(input_val) + len(locale)}") + +class TestEstimateErrorWeightDiagnosticBranches: + """Test _estimate_error_weight with diagnostic fields including 
resolution_path.""" + + def test_error_weight_diagnostic_without_resolution_path(self) -> None: + """Error with diagnostic but no resolution_path skips path length processing.""" + diagnostic = Diagnostic( + code=DiagnosticCode.MESSAGE_NOT_FOUND, + message="Reference error", + ) + error = FrozenFluentError( + "Message not found", ErrorCategory.REFERENCE, diagnostic=diagnostic + ) + weight = _estimate_error_weight(error) + expected = 100 + len("Message not found") + len("Reference error") + assert weight == expected + + def test_error_weight_diagnostic_with_resolution_path(self) -> None: + """Error with diagnostic and resolution_path includes path element lengths.""" + diagnostic = Diagnostic( + code=DiagnosticCode.CYCLIC_REFERENCE, + message="Reference error", + resolution_path=("message1", "term1", "message2"), + ) + error = FrozenFluentError( + "Circular reference", ErrorCategory.CYCLIC, diagnostic=diagnostic + ) + weight = _estimate_error_weight(error) + expected = ( + 100 + + len("Circular reference") + + len("Reference error") + + len("message1") + + len("term1") + + len("message2") + ) + assert weight == expected + + def test_error_weight_diagnostic_with_all_optional_fields(self) -> None: + """Error with diagnostic containing all optional fields includes them in weight.""" + diagnostic = Diagnostic( + code=DiagnosticCode.INVALID_ARGUMENT, + message="Invalid argument", + hint="Use NUMBER() function", + help_url="https://example.com/help", + function_name="CURRENCY", + argument_name="minimumFractionDigits", + expected_type="int", + received_type="str", + ftl_location="message.ftl:42", + ) + error = FrozenFluentError( + "Function call error", ErrorCategory.FORMATTING, diagnostic=diagnostic + ) + weight = _estimate_error_weight(error) + expected = ( + 100 + + len("Function call error") + + len("Invalid argument") + + len("Use NUMBER() function") + + len("https://example.com/help") + + len("CURRENCY") + + len("minimumFractionDigits") + + len("int") + + len("str") 
+ + len("message.ftl:42") + ) + assert weight == expected + +class TestCacheEntryVerifyWithCorruptedError: + """Test IntegrityCacheEntry.verify() when error.verify_integrity() returns False. + + Exercises the defense-in-depth check where entry verification recurses into + each contained error's own verify_integrity() method. + """ + + def test_verify_returns_false_when_error_message_corrupted(self) -> None: + """IntegrityCacheEntry.verify() returns False when error is memory-corrupted. + + Simulates memory corruption: error._message is changed without updating + the stored _content_hash, causing verify_integrity() to return False. + """ + error = FrozenFluentError("Test error 2", ErrorCategory.REFERENCE) + entry = IntegrityCacheEntry.create("Result", (error,), sequence=1, key_hash=_NO_KEY_HASH) + object.__setattr__(error, "_frozen", False) + object.__setattr__(error, "_message", "corrupted message") + object.__setattr__(error, "_frozen", True) + assert error.verify_integrity() is False + assert entry.verify() is False + + def test_verify_detects_corruption_defense_in_depth(self) -> None: + """IntegrityCacheEntry.verify() provides defense-in-depth error verification.""" + error = FrozenFluentError("Original message", ErrorCategory.REFERENCE) + entry = IntegrityCacheEntry.create("Result", (error,), sequence=1, key_hash=_NO_KEY_HASH) + assert entry.verify() is True + object.__setattr__(error, "_frozen", False) + object.__setattr__(error, "_message", "Corrupted by memory error") + object.__setattr__(error, "_frozen", True) + assert error.verify_integrity() is False + assert entry.verify() is False + + def test_verify_returns_true_when_all_errors_valid(self) -> None: + """IntegrityCacheEntry.verify() returns True when all errors pass integrity.""" + errors = ( + FrozenFluentError("Error 1", ErrorCategory.REFERENCE), + FrozenFluentError("Error 2", ErrorCategory.FORMATTING), + FrozenFluentError("Error 3", ErrorCategory.CYCLIC), + ) + entry = 
IntegrityCacheEntry.create("Result", errors, sequence=1, key_hash=_NO_KEY_HASH) + assert entry.verify() is True + + def test_verify_returns_false_if_any_error_corrupted(self) -> None: + """IntegrityCacheEntry.verify() returns False if any single error is corrupted.""" + error1 = FrozenFluentError("Error 1", ErrorCategory.REFERENCE) + error2 = FrozenFluentError("Error 2", ErrorCategory.FORMATTING) + error3 = FrozenFluentError("Error 3", ErrorCategory.CYCLIC) + entry = IntegrityCacheEntry.create( + "Result", (error1, error2, error3), sequence=1, key_hash=_NO_KEY_HASH + ) + object.__setattr__(error2, "_frozen", False) + object.__setattr__(error2, "_content_hash", b"bad_hash_xxxxxxx") + object.__setattr__(error2, "_frozen", True) + assert entry.verify() is False + +class TestErrorWeightAndVerifyIntegration: + """Integration tests combining error weight estimation and verification.""" + + def test_large_error_with_context_and_diagnostic(self) -> None: + """Error with both context and diagnostic computes correct weight.""" + context = FrozenErrorContext( + input_value="very long input value that would increase weight significantly", + locale_code="en_US", + parse_type="currency", + fallback_value="{!CURRENCY}", + ) + diagnostic = Diagnostic( + code=DiagnosticCode.PARSE_DECIMAL_FAILED, + message="Failed to parse number", + hint="Check number format", + resolution_path=("step1", "step2", "step3"), + ) + error = FrozenFluentError( + "Complex error message", + ErrorCategory.FORMATTING, + diagnostic=diagnostic, + context=context, + ) + weight = _estimate_error_weight(error) + expected = ( + 100 + + len("Complex error message") + + len("Failed to parse number") + + len("Check number format") + + len("step1") + len("step2") + len("step3") + + len("very long input value that would increase weight significantly") + + len("en_US") + + len("currency") + + len("{!CURRENCY}") + ) + assert weight == expected + assert error.verify_integrity() is True + entry = 
IntegrityCacheEntry.create("Result", (error,), sequence=1, key_hash=_NO_KEY_HASH) + assert entry.verify() is True + + @given( + message=st.text(min_size=1, max_size=100), + input_val=st.text(min_size=0, max_size=50), + locale=st.text(min_size=0, max_size=10), + ) + @settings(max_examples=50) + def test_property_weight_estimation_deterministic( + self, message: str, input_val: str, locale: str + ) -> None: + """PROPERTY: Weight estimation is deterministic and positive.""" + context = FrozenErrorContext( + input_value=input_val, + locale_code=locale, + parse_type="number", + fallback_value="fallback", + ) + error = FrozenFluentError(message, ErrorCategory.FORMATTING, context=context) + weight1 = _estimate_error_weight(error) + weight2 = _estimate_error_weight(error) + assert weight1 == weight2 + assert weight1 > 0 + min_weight = len(message) + len(input_val) + len(locale) + len("number") + len("fallback") + assert weight1 >= min_weight + event(f"weight={weight1}") diff --git a/tests/runtime_cache_integrity_cases/limits_and_timing.py b/tests/runtime_cache_integrity_cases/limits_and_timing.py new file mode 100644 index 00000000..c093f796 --- /dev/null +++ b/tests/runtime_cache_integrity_cases/limits_and_timing.py @@ -0,0 +1,303 @@ +# mypy: ignore-errors +from __future__ import annotations + +import time + +import pytest +from hypothesis import event, given +from hypothesis import strategies as st + +from ftllexengine.constants import DEFAULT_MAX_ENTRY_WEIGHT +from ftllexengine.diagnostics import ( + ErrorCategory, + FrozenFluentError, +) +from ftllexengine.integrity import CacheCorruptionError, IntegrityContext +from ftllexengine.runtime import FluentBundle +from ftllexengine.runtime.cache import ( + IntegrityCache, +) +from ftllexengine.runtime.cache_config import CacheConfig + +# Sentinel key_hash for unit tests that verify checksum mechanics but do not +# need meaningful key binding (all-zeros = "unbound test entry"). 
+_NO_KEY_HASH: bytes = b"\x00" * 8 + +# ============================================================================ +# CHECKSUM VERIFICATION TESTS +# ============================================================================ + + + +class TestCacheEntrySizeLimit: + """IntegrityCache max_entry_weight prevents caching of oversized results.""" + + def test_default_max_entry_weight(self) -> None: + """Default max_entry_weight is DEFAULT_MAX_ENTRY_WEIGHT (10,000 characters).""" + cache = IntegrityCache(strict=False) + assert cache.max_entry_weight == DEFAULT_MAX_ENTRY_WEIGHT + assert cache.max_entry_weight == 10_000 + + def test_custom_max_entry_weight(self) -> None: + """Custom max_entry_weight is stored and returned correctly.""" + cache = IntegrityCache(strict=False, max_entry_weight=1000) + assert cache.max_entry_weight == 1000 + + def test_invalid_max_entry_weight_rejected(self) -> None: + """Zero and negative max_entry_weight raise ValueError.""" + with pytest.raises(ValueError, match="max_entry_weight must be positive"): + IntegrityCache(strict=False, max_entry_weight=0) + + with pytest.raises(ValueError, match="max_entry_weight must be positive"): + IntegrityCache(strict=False, max_entry_weight=-1) + + def test_small_entries_cached(self) -> None: + """Entries below max_entry_weight are stored and retrievable.""" + cache = IntegrityCache(strict=False, max_entry_weight=1000) + + cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=()) + + assert cache.size == 1 + assert cache.oversize_skips == 0 + + cached = cache.get("msg", None, None, "en", use_isolating=True) + assert cached is not None + assert cached.as_result() == ("x" * 100, ()) + + def test_large_entries_not_cached(self) -> None: + """Entries exceeding max_entry_weight are skipped and counted.""" + cache = IntegrityCache(strict=False, max_entry_weight=100) + + cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 200, errors=()) + + assert cache.size == 0 + 
assert cache.oversize_skips == 1 + + cached = cache.get("msg", None, None, "en", use_isolating=True) + assert cached is None + + def test_boundary_entry_size(self) -> None: + """Entry exactly at max_entry_weight is cached (inclusive boundary).""" + cache = IntegrityCache(strict=False, max_entry_weight=100) + + cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=()) + + assert cache.size == 1 + assert cache.oversize_skips == 0 + + def test_get_stats_includes_oversize_skips(self) -> None: + """get_stats() reports oversize_skips and max_entry_weight.""" + cache = IntegrityCache(strict=False, max_entry_weight=50) + + for i in range(5): + cache.put(f"msg-{i}", None, None, "en", use_isolating=True, formatted="x" * 100, errors=()) + + stats = cache.get_stats() + assert stats["oversize_skips"] == 5 + assert stats["max_entry_weight"] == 50 + assert stats["size"] == 0 + + def test_clear_preserves_oversize_skips(self) -> None: + """clear() removes entries but preserves cumulative oversize_skips counter.""" + cache = IntegrityCache(strict=False, max_entry_weight=50) + + cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=()) + assert cache.oversize_skips == 1 + + cache.clear() + assert cache.oversize_skips == 1 + + def test_bundle_cache_uses_default_max_entry_weight(self) -> None: + """FluentBundle's internal cache uses default max_entry_weight.""" + bundle = FluentBundle("en", cache=CacheConfig()) + bundle.add_resource("msg = { $data }") + + small_data = "x" * 100 + bundle.format_pattern("msg", {"data": small_data}) + + stats = bundle.get_cache_stats() + assert stats is not None + assert stats["size"] == 1 + + @given(st.integers(min_value=1, max_value=1000)) + def test_max_entry_weight_property(self, size: int) -> None: + """PROPERTY: max_entry_weight is correctly stored and returned.""" + event(f"weight_size={size}") + cache = IntegrityCache(strict=False, max_entry_weight=size) + assert cache.max_entry_weight == 
size + + def test_combined_weight_skips_counter_incremented(self) -> None: + """Entries skipped due to combined weight increment combined_weight_skips. + + Scenario: formatted string (100 chars) passes check 1 (len <= max_entry_weight=200). + Error overhead = 100 (base) + 150 (message) = 250. Total = 350 > 200 fails check 3. + """ + cache = IntegrityCache(strict=False, max_entry_weight=200) + error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) + + cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(error,)) + + stats = cache.get_stats() + assert stats["combined_weight_skips"] == 1 + assert stats["oversize_skips"] == 0 + assert stats["error_bloat_skips"] == 0 + assert stats["size"] == 0 + + def test_combined_weight_skips_distinct_from_oversize_skips(self) -> None: + """oversize_skips and combined_weight_skips are separate, distinct counters.""" + cache = IntegrityCache(strict=False, max_entry_weight=200) + heavy_error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) + + # Check 1 (oversize): formatted string alone exceeds max_entry_weight + cache.put("over-msg", None, None, "en", use_isolating=True, formatted="x" * 201, errors=()) + + # Check 3 (combined_weight): formatted OK, but combined total exceeds limit + cache.put("combined-msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(heavy_error,)) + + stats = cache.get_stats() + assert stats["oversize_skips"] == 1 + assert stats["combined_weight_skips"] == 1 + + def test_combined_weight_skips_distinct_from_error_bloat_skips(self) -> None: + """error_bloat_skips and combined_weight_skips are separate, distinct counters.""" + cache = IntegrityCache(strict=False, max_entry_weight=200, max_errors_per_entry=2) + heavy_error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) + + # Check 2 (error_bloat): too many errors by count + many_errors = tuple( + FrozenFluentError(f"e-{i}", ErrorCategory.REFERENCE) for i in range(3) + ) + 
cache.put("bloat-msg", None, None, "en", use_isolating=True, formatted="Hello", errors=many_errors) + + # Check 3 (combined_weight): error count OK (1 <= 2), combined weight fails + cache.put("combined-msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(heavy_error,)) + + stats = cache.get_stats() + assert stats["error_bloat_skips"] == 1 + assert stats["combined_weight_skips"] == 1 + + def test_combined_weight_skips_preserved_on_clear(self) -> None: + """clear() preserves cumulative combined_weight_skips counter.""" + cache = IntegrityCache(strict=False, max_entry_weight=200) + error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) + + cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(error,)) + assert cache.combined_weight_skips == 1 + + cache.clear() + assert cache.combined_weight_skips == 1 + + def test_get_stats_includes_combined_weight_skips(self) -> None: + """get_stats() reports combined_weight_skips alongside related skip counters.""" + cache = IntegrityCache(strict=False, max_entry_weight=200) + error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) + + cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(error,)) + + stats = cache.get_stats() + assert "combined_weight_skips" in stats + assert stats["combined_weight_skips"] == 1 + +class TestWriteLogEntryWallTime: + """WriteLogEntry carries both monotonic timestamp and wall_time_unix.""" + + def test_write_log_entry_has_wall_time_unix_field(self) -> None: + """WriteLogEntry.wall_time_unix field exists and is a float.""" + before = time.time() + cache = IntegrityCache(enable_audit=True, strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="hello", errors=()) + after = time.time() + + log = cache.get_audit_log() + assert len(log) >= 1 + entry = log[0] + assert isinstance(entry.wall_time_unix, float) + assert isinstance(entry.cache_sequence, int) + # Wall time should be bracketed 
between the before/after calls + assert before <= entry.wall_time_unix <= after + + def test_write_log_entry_timestamp_is_monotonic(self) -> None: + """WriteLogEntry.timestamp (monotonic) is distinct from wall_time_unix.""" + + cache = IntegrityCache(enable_audit=True, strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="hello", errors=()) + + log = cache.get_audit_log() + entry = log[0] + # Monotonic and wall clock are different clocks — values may differ + assert isinstance(entry.timestamp, float) + assert isinstance(entry.wall_time_unix, float) + # Both should be positive + assert entry.timestamp > 0 + assert entry.wall_time_unix > 0 + + def test_audit_log_multiple_entries_wall_time_non_decreasing(self) -> None: + """wall_time_unix values across audit entries are non-decreasing.""" + cache = IntegrityCache(enable_audit=True, strict=False) + cache.put("a", None, None, "en", use_isolating=True, formatted="A", errors=()) + cache.put("b", None, None, "en", use_isolating=True, formatted="B", errors=()) + cache.put("c", None, None, "en", use_isolating=True, formatted="C", errors=()) + + log = cache.get_audit_log() + wall_times = [e.wall_time_unix for e in log] + for i in range(len(wall_times) - 1): + assert wall_times[i] <= wall_times[i + 1], ( + f"wall_time_unix not non-decreasing at index {i}: " + f"{wall_times[i]} > {wall_times[i + 1]}" + ) + + def test_audit_log_sequence_is_monotonic_even_with_misses(self) -> None: + """Audit-event sequence increases monotonically across misses and hits.""" + cache = IntegrityCache(enable_audit=True, strict=False) + cache.get("missing", None, None, "en", use_isolating=True) + cache.put("msg", None, None, "en", use_isolating=True, formatted="hello", errors=()) + cache.get("msg", None, None, "en", use_isolating=True) + + log = cache.get_audit_log() + sequences = [entry.sequence for entry in log] + assert sequences == sorted(sequences) + assert [entry.operation for entry in log] == ["MISS", "PUT", "HIT"] 
+ assert [entry.cache_sequence for entry in log] == [0, 1, 1] + +class TestIntegrityContextWallTime: + """IntegrityContext.wall_time_unix is populated at integrity error sites.""" + + def test_integrity_context_wall_time_unix_field_exists(self) -> None: + """IntegrityContext accepts wall_time_unix and stores it correctly.""" + t = time.time() + ctx = IntegrityContext( + component="test", + operation="check", + timestamp=time.monotonic(), + wall_time_unix=t, + ) + assert ctx.wall_time_unix == t + + def test_integrity_context_wall_time_unix_defaults_to_none(self) -> None: + """IntegrityContext.wall_time_unix defaults to None for backwards compat.""" + ctx = IntegrityContext(component="test", operation="check") + assert ctx.wall_time_unix is None + + def test_cache_corruption_error_context_has_wall_time(self) -> None: + """CacheCorruptionError raised by strict cache carries wall_time_unix.""" + cache = IntegrityCache(enable_audit=True, strict=True) + cache.put("msg", None, None, "en", use_isolating=True, formatted="ok", errors=()) + + # Corrupt the checksum by manipulating the stored entry directly + key = next(iter(cache._cache)) + entry = cache._cache[key] + + # Corrupt the checksum in-place via object.__setattr__ (frozen dataclass). + # content_hash is field(init=False), so we cannot pass it to __init__. 
+ object.__setattr__(entry, "checksum", b"\x00" * 16) # deliberately invalid + cache._cache[key] = entry + + before = time.time() + with pytest.raises(CacheCorruptionError) as exc_info: + cache.get("msg", None, None, "en", use_isolating=True) + after = time.time() + + ctx = exc_info.value.context + assert ctx is not None + assert ctx.wall_time_unix is not None + assert before <= ctx.wall_time_unix <= after diff --git a/tests/runtime_cache_integrity_cases/write_once_audit.py b/tests/runtime_cache_integrity_cases/write_once_audit.py new file mode 100644 index 00000000..01bcf620 --- /dev/null +++ b/tests/runtime_cache_integrity_cases/write_once_audit.py @@ -0,0 +1,439 @@ +# mypy: ignore-errors +from __future__ import annotations + +import contextlib +import threading +from concurrent.futures import ThreadPoolExecutor, as_completed + +import pytest + +from ftllexengine.integrity import WriteConflictError +from ftllexengine.runtime.cache import ( + IntegrityCache, + IntegrityCacheEntry, + WriteLogEntry, +) + +# Sentinel key_hash for unit tests that verify checksum mechanics but do not +# need meaningful key binding (all-zeros = "unbound test entry"). 
+_NO_KEY_HASH: bytes = b"\x00" * 8 + +# ============================================================================ +# CHECKSUM VERIFICATION TESTS +# ============================================================================ + + + +class TestWriteOnceStrictMode: + """Test write-once semantics in strict mode.""" + + def test_write_once_allows_first_write(self) -> None: + """First write to a key succeeds.""" + cache = IntegrityCache(write_once=True, strict=True) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + entry = cache.get("msg", None, None, "en", use_isolating=True) + assert entry is not None + assert entry.formatted == "Hello" + + def test_write_once_strict_raises_on_second_write(self) -> None: + """Second write to same key raises WriteConflictError in strict mode.""" + cache = IntegrityCache(write_once=True, strict=True) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + with pytest.raises(WriteConflictError) as exc_info: + cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) + + assert "write-once violation" in str(exc_info.value).lower() + assert exc_info.value.existing_seq == 1 + assert exc_info.value.new_seq == 2 # Would-be sequence of rejected entry + + def test_write_once_preserves_original_value(self) -> None: + """Write-once rejection preserves original cached value.""" + cache = IntegrityCache(write_once=True, strict=True) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Original", errors=()) + + with contextlib.suppress(WriteConflictError): + cache.put("msg", None, None, "en", use_isolating=True, formatted="Updated", errors=()) + + entry = cache.get("msg", None, None, "en", use_isolating=True) + assert entry is not None + assert entry.formatted == "Original" + + def test_write_once_conflict_counter_incremented_before_raise(self) -> None: + """write_once_conflicts is incremented before WriteConflictError is 
raised.""" + cache = IntegrityCache(write_once=True, strict=True) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + with contextlib.suppress(WriteConflictError): + cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) + + # Counter must be observable even after an exception was raised + assert cache.write_once_conflicts == 1 + +class TestWriteOnceNonStrictMode: + """Test write-once semantics in non-strict mode.""" + + def test_write_once_non_strict_silently_skips(self) -> None: + """Second write silently skipped in non-strict mode.""" + cache = IntegrityCache(write_once=True, strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + # No exception raised + cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) + + # Original value preserved + entry = cache.get("msg", None, None, "en", use_isolating=True) + assert entry is not None + assert entry.formatted == "Hello" + + def test_write_once_allows_different_keys(self) -> None: + """Write-once allows writes to different keys.""" + cache = IntegrityCache(write_once=True, strict=False) + cache.put("msg1", None, None, "en", use_isolating=True, formatted="First", errors=()) + cache.put("msg2", None, None, "en", use_isolating=True, formatted="Second", errors=()) + + entry1 = cache.get("msg1", None, None, "en", use_isolating=True) + entry2 = cache.get("msg2", None, None, "en", use_isolating=True) + assert entry1 is not None + assert entry1.formatted == "First" + assert entry2 is not None + assert entry2.formatted == "Second" + + def test_write_once_conflict_counter_incremented(self) -> None: + """True write-once conflicts increment write_once_conflicts counter.""" + cache = IntegrityCache(write_once=True, strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + # Different content for same key = true conflict + cache.put("msg", 
None, None, "en", use_isolating=True, formatted="World", errors=()) + + stats = cache.get_stats() + assert stats["write_once_conflicts"] == 1 + + def test_write_once_conflict_counter_multiple(self) -> None: + """write_once_conflicts accumulates across repeated true conflicts.""" + cache = IntegrityCache(write_once=True, strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + for i in range(5): + cache.put("msg", None, None, "en", use_isolating=True, formatted=f"World-{i}", errors=()) + + assert cache.write_once_conflicts == 5 + + def test_write_once_conflict_not_incremented_for_idempotent(self) -> None: + """Idempotent writes do NOT increment write_once_conflicts.""" + cache = IntegrityCache(write_once=True, strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) # Idempotent + + assert cache.write_once_conflicts == 0 + assert cache.idempotent_writes == 1 + + def test_write_once_conflict_counter_preserved_on_clear(self) -> None: + """clear() preserves cumulative write_once_conflicts counter.""" + cache = IntegrityCache(write_once=True, strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) # Conflict + + assert cache.write_once_conflicts == 1 + cache.clear() + assert cache.write_once_conflicts == 1 + +class TestWriteOnceDisabled: + """Test behavior when write-once is disabled (default).""" + + def test_default_allows_overwrites(self) -> None: + """Default cache allows overwriting entries.""" + cache = IntegrityCache(write_once=False, strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) + + entry = cache.get("msg", None, 
None, "en", use_isolating=True) + assert entry is not None + assert entry.formatted == "World" + +class TestAuditLogging: + """Test audit logging functionality.""" + + def test_audit_disabled_by_default(self) -> None: + """Audit logging is disabled by default.""" + cache = IntegrityCache() + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + cache.get("msg", None, None, "en", use_isolating=True) + + stats = cache.get_stats() + assert stats["audit_enabled"] is False + assert stats["audit_entries"] == 0 + + def test_audit_enabled_records_operations(self) -> None: + """Audit logging records operations when enabled.""" + cache = IntegrityCache(enable_audit=True, strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + cache.get("msg", None, None, "en", use_isolating=True) + cache.get("msg2", None, None, "en", use_isolating=True) # Miss + + stats = cache.get_stats() + assert stats["audit_enabled"] is True + assert stats["audit_entries"] >= 3 # PUT + HIT + MISS + + def test_audit_log_entry_structure(self) -> None: + """Audit log entries have correct structure.""" + cache = IntegrityCache(enable_audit=True, strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + # Access internal audit log for verification + audit_log = cache._audit_log + assert audit_log is not None + assert len(audit_log) >= 1 + + entry = audit_log[0] # pylint: disable=unsubscriptable-object + assert isinstance(entry, WriteLogEntry) + assert entry.operation == "PUT" + assert isinstance(entry.key_hash, str) + assert isinstance(entry.timestamp, float) + assert entry.sequence >= 0 + assert isinstance(entry.checksum_hex, str) + + def test_audit_log_records_all_operation_types(self) -> None: + """Audit log records HIT, MISS, PUT, EVICT operations.""" + cache = IntegrityCache(maxsize=2, enable_audit=True, strict=False) + + # PUT 3 entries to trigger eviction + cache.put("msg1", 
None, None, "en", use_isolating=True, formatted="One", errors=()) + cache.put("msg2", None, None, "en", use_isolating=True, formatted="Two", errors=()) + cache.put("msg3", None, None, "en", use_isolating=True, formatted="Three", errors=()) # Evicts msg1 + + # HIT + cache.get("msg2", None, None, "en", use_isolating=True) + + # MISS + cache.get("nonexistent", None, None, "en", use_isolating=True) + + audit_log = cache._audit_log + assert audit_log is not None + + # pylint: disable=not-an-iterable + operations = {entry.operation for entry in audit_log} + assert "PUT" in operations + assert "EVICT" in operations + assert "HIT" in operations + assert "MISS" in operations + + def test_audit_log_max_entries_enforced(self) -> None: + """Audit log respects max_audit_entries limit.""" + cache = IntegrityCache(enable_audit=True, max_audit_entries=5, strict=False) + + # Generate more operations than max_audit_entries + for i in range(10): + cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"Value {i}", errors=()) + + audit_log = cache._audit_log + assert audit_log is not None + assert len(audit_log) <= 5 + + def test_audit_log_not_cleared_on_cache_clear(self) -> None: + """Audit log preserved when cache is cleared (historical record).""" + cache = IntegrityCache(enable_audit=True, strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + audit_log_before = len(cache._audit_log or []) + cache.clear() + audit_log_after = len(cache._audit_log or []) + + assert audit_log_after >= audit_log_before + + def test_audit_records_write_once_rejection(self) -> None: + """Audit log records WRITE_ONCE_CONFLICT for different content writes.""" + cache = IntegrityCache(write_once=True, enable_audit=True, strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="First", errors=()) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Second", errors=()) # Conflict (different content) + + 
audit_log = cache._audit_log + assert audit_log is not None + + # pylint: disable=not-an-iterable + operations = [entry.operation for entry in audit_log] + assert "WRITE_ONCE_CONFLICT" in operations + +class TestAuditLoggingCorruption: + """Test audit logging of corruption events.""" + + def test_audit_records_corruption(self) -> None: + """Audit log records CORRUPTION operations.""" + cache = IntegrityCache(enable_audit=True, strict=False) + cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + + # Corrupt entry + key = next(iter(cache._cache.keys())) + entry = cache._cache[key] + corrupted = IntegrityCacheEntry( + formatted="Corrupted", + errors=entry.errors, + checksum=entry.checksum, + created_at=entry.created_at, + sequence=entry.sequence, + key_hash=entry.key_hash, + ) + cache._cache[key] = corrupted + + # Trigger corruption detection + cache.get("msg", None, None, "en", use_isolating=True) + + audit_log = cache._audit_log + assert audit_log is not None + + # pylint: disable=not-an-iterable + operations = [entry.operation for entry in audit_log] + assert "CORRUPTION" in operations + +class TestSequenceNumbers: + """Test monotonically increasing sequence numbers.""" + + def test_sequence_increments_on_put(self) -> None: + """Sequence number increments with each put.""" + cache = IntegrityCache(strict=False) + cache.put("msg1", None, None, "en", use_isolating=True, formatted="One", errors=()) + cache.put("msg2", None, None, "en", use_isolating=True, formatted="Two", errors=()) + cache.put("msg3", None, None, "en", use_isolating=True, formatted="Three", errors=()) + + entry1 = cache.get("msg1", None, None, "en", use_isolating=True) + entry2 = cache.get("msg2", None, None, "en", use_isolating=True) + entry3 = cache.get("msg3", None, None, "en", use_isolating=True) + + assert entry1 is not None + assert entry1.sequence == 1 + assert entry2 is not None + assert entry2.sequence == 2 + assert entry3 is not None + assert 
entry3.sequence == 3 + + def test_sequence_not_reset_on_clear(self) -> None: + """Sequence number continues after cache clear (audit trail integrity).""" + cache = IntegrityCache(strict=False) + cache.put("msg1", None, None, "en", use_isolating=True, formatted="One", errors=()) + cache.put("msg2", None, None, "en", use_isolating=True, formatted="Two", errors=()) + + stats_before = cache.get_stats() + assert stats_before["sequence"] == 2 + + cache.clear() + + cache.put("msg3", None, None, "en", use_isolating=True, formatted="Three", errors=()) + + entry = cache.get("msg3", None, None, "en", use_isolating=True) + assert entry is not None + assert entry.sequence == 3 + +class TestConcurrentIntegrity: + """Test integrity under concurrent access.""" + + def test_concurrent_puts_maintain_integrity(self) -> None: + """Concurrent puts produce valid checksums.""" + cache = IntegrityCache(maxsize=100, strict=False) + + def put_entry(i: int) -> None: + cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"Value {i}", errors=()) + + with ThreadPoolExecutor(max_workers=10) as executor: + futures = [executor.submit(put_entry, i) for i in range(100)] + for future in as_completed(futures): + future.result() + + # All entries should have valid checksums + for i in range(100): + entry = cache.get(f"msg{i}", None, None, "en", use_isolating=True) + if entry is not None: + assert entry.verify(), f"Entry msg{i} failed checksum verification" + + def test_write_once_thread_safety(self) -> None: + """Write-once semantics are thread-safe.""" + cache = IntegrityCache(write_once=True, strict=False) + success_count = 0 + lock = threading.Lock() + + def try_put() -> None: + nonlocal success_count + try: + cache.put("msg", None, None, "en", use_isolating=True, formatted="Value", errors=()) + with lock: + success_count += 1 + except WriteConflictError: + pass # Expected for some threads + + threads = [threading.Thread(target=try_put) for _ in range(20)] + for thread in threads: + 
thread.start() + for thread in threads: + thread.join() + + # Only one entry should exist + stats = cache.get_stats() + assert stats["size"] == 1 + +class TestIntegrityStats: + """Test integrity-related statistics.""" + + def test_stats_includes_integrity_fields(self) -> None: + """get_stats() includes all integrity-related fields.""" + cache = IntegrityCache( + write_once=True, + strict=True, + enable_audit=True, + ) + + stats = cache.get_stats() + + # Verify integrity-specific fields exist + assert "corruption_detected" in stats + assert "sequence" in stats + assert "write_once" in stats + assert "strict" in stats + assert "audit_enabled" in stats + assert "audit_entries" in stats + assert "write_once_conflicts" in stats + assert "combined_weight_skips" in stats + + # Verify types + assert isinstance(stats["corruption_detected"], int) + assert isinstance(stats["sequence"], int) + assert isinstance(stats["write_once"], bool) + assert isinstance(stats["strict"], bool) + assert isinstance(stats["audit_enabled"], bool) + assert isinstance(stats["audit_entries"], int) + assert isinstance(stats["write_once_conflicts"], int) + assert isinstance(stats["combined_weight_skips"], int) + + # Verify values reflect configuration + assert stats["write_once"] is True + assert stats["strict"] is True + assert stats["audit_enabled"] is True + assert stats["write_once_conflicts"] == 0 + assert stats["combined_weight_skips"] == 0 + + def test_corruption_counter_accumulates(self) -> None: + """corruption_detected counter accumulates across multiple corruptions.""" + cache = IntegrityCache(strict=False) + + for i in range(3): + cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"Value {i}", errors=()) + + # Corrupt all entries + for key in list(cache._cache.keys()): + entry = cache._cache[key] + corrupted = IntegrityCacheEntry( + formatted="Corrupted", + errors=entry.errors, + checksum=entry.checksum, + created_at=entry.created_at, + sequence=entry.sequence, + 
key_hash=entry.key_hash, + ) + cache._cache[key] = corrupted + + # Trigger corruption detection for each + for i in range(3): + cache.get(f"msg{i}", None, None, "en", use_isolating=True) + + stats = cache.get_stats() + assert stats["corruption_detected"] == 3 diff --git a/tests/runtime_cache_property_cases/__init__.py b/tests/runtime_cache_property_cases/__init__.py new file mode 100644 index 00000000..4cec6ad1 --- /dev/null +++ b/tests/runtime_cache_property_cases/__init__.py @@ -0,0 +1,82 @@ +"""Property-based (Hypothesis) tests for FormatCache and IntegrityCache. + +All classes are marked with @pytest.mark.fuzz and run only via: + ./scripts/fuzz_hypofuzz.sh --deep + pytest -m fuzz + +Covers: +- IntegrityCache invariants: maxsize enforced, get-after-put, clear, hit/miss counters +- IntegrityCache LRU eviction patterns +- IntegrityCache key handling: locale, attribute, args dict stability +- IntegrityCache robustness: various arg types, duplicate puts, non-negative stats +- IntegrityCache statistics: hit_rate consistency, size matches entry count +- IntegrityCache init parameters stored correctly +- IntegrityCache primitives: all FluentValue types produce valid cache keys +- FormatCache invariants: transparency, isolation, LRU eviction, stats consistency +- FormatCache invalidation: add_resource, add_function +- FormatCache internals: __len__, properties, key uniqueness, attribute isolation +- FormatCache type collision prevention: bool/int, int/Decimal +""" + +from __future__ import annotations + +from decimal import Decimal + +import pytest +from hypothesis import assume, event, given, settings +from hypothesis import strategies as st + +from ftllexengine import FluentBundle +from ftllexengine.runtime.cache import IntegrityCache +from ftllexengine.runtime.cache_config import CacheConfig + +# ============================================================================ +# MODULE-LEVEL STRATEGIES (used by IntegrityCache tests) +# 
============================================================================ + +# Strategy for message IDs - use st.from_regex per hypothesis.md +message_ids = st.from_regex(r"[a-z]+", fullmatch=True) + +# Strategy for locale codes +locale_codes = st.sampled_from(["en_US", "de_DE", "lv_LV", "fr_FR", "ja_JP"]) + +# Strategy for attributes - remove arbitrary max_size +attributes = st.one_of(st.none(), st.text(min_size=1)) + +# Strategy for cache values (result, errors) - remove arbitrary max_size +cache_values: st.SearchStrategy[tuple[str, tuple[()]]] = st.tuples( + st.text(min_size=0), + st.just(()), # Empty error tuple for simplicity +) + +# Strategy for message arguments - keep collection bound, remove text max_size +args_strategy = st.one_of( + st.none(), + st.dictionaries( + st.text(min_size=1), + st.one_of( + st.integers(), + st.decimals(allow_nan=False, allow_infinity=False), + st.text(), + ), + max_size=5, # Keep practical bound for dict size + ), +) + +__all__ = [ + "CacheConfig", + "Decimal", + "FluentBundle", + "IntegrityCache", + "args_strategy", + "assume", + "attributes", + "cache_values", + "event", + "given", + "locale_codes", + "message_ids", + "pytest", + "settings", + "st", +] diff --git a/tests/runtime_cache_property_cases/formatcache_properties_via_fluent_bundle.py b/tests/runtime_cache_property_cases/formatcache_properties_via_fluent_bundle.py new file mode 100644 index 00000000..7788a490 --- /dev/null +++ b/tests/runtime_cache_property_cases/formatcache_properties_via_fluent_bundle.py @@ -0,0 +1,553 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_cache_property.py.""" + +from tests.runtime_cache_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# FORMATCACHE PROPERTIES (via FluentBundle) +# ============================================================================ + + +@st.composite +def message_args(draw: st.DrawFn) -> 
dict[str, str | int]: + """Generate valid message arguments.""" + num_args = draw(st.integers(min_value=0, max_value=5)) + args = {} + for _ in range(num_args): + key = draw(st.text( + alphabet=st.characters(min_codepoint=97, max_codepoint=122), + min_size=1, max_size=10, + )) + value = draw(st.one_of(st.text(min_size=0, max_size=20), st.integers())) + args[key] = value + return args + + +@pytest.mark.fuzz +class TestCacheProperties: + """Property-based tests for FormatCache behavior.""" + + @given(args=message_args()) + def test_cache_transparency(self, args: dict[str, str | int]) -> None: + """Cache hit returns same result as cache miss. + + Property: format_pattern(msg, args) with cache enabled should return + identical results to format_pattern(msg, args) without cache. + """ + ftl_vars = " ".join([f"{{ ${k} }}" for k in args]) + ftl_source = f"msg = Hello {ftl_vars}!" + + # Bundle without cache + bundle_no_cache = FluentBundle("en", use_isolating=False) + bundle_no_cache.add_resource(ftl_source) + result_no_cache, errors_no_cache = bundle_no_cache.format_pattern("msg", args) + + # Bundle with cache + bundle_with_cache = FluentBundle("en", cache=CacheConfig(), use_isolating=False) + bundle_with_cache.add_resource(ftl_source) + + # First call (cache miss) + result_miss, errors_miss = bundle_with_cache.format_pattern("msg", args) + assert result_miss == result_no_cache + assert len(errors_miss) == len(errors_no_cache) + + # Second call (cache hit) + result_hit, errors_hit = bundle_with_cache.format_pattern("msg", args) + assert result_hit == result_no_cache + assert len(errors_hit) == len(errors_no_cache) + + # Cache hit and miss must return identical results + assert result_miss == result_hit + assert len(errors_miss) == len(errors_hit) + event(f"arg_count={len(args)}") + + @given( + args1=message_args(), + args2=message_args(), + ) + def test_cache_isolation( + self, args1: dict[str, str | int], args2: dict[str, str | int] + ) -> None: + """Different args 
produce different cache entries. + + Property: format_pattern(msg, args1) and format_pattern(msg, args2) + should be cached separately if args differ. + """ + # Only test if args actually differ + if args1 == args2: + return + + ftl_vars = set(args1.keys()) | set(args2.keys()) + ftl_placeholders = " ".join([f"{{ ${k} }}" for k in ftl_vars]) + ftl_source = f"msg = Test {ftl_placeholders}" + + bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False, strict=False) + bundle.add_resource(ftl_source) + + # Format with args1 + _result1, _ = bundle.format_pattern("msg", args1) + + # Format with args2 + _result2, _ = bundle.format_pattern("msg", args2) + + # Differing args must occupy separate cache entries + stats = bundle.get_cache_stats() + assert stats is not None + assert stats["size"] == 2 # Two separate cache entries + event(f"key_count={len(args1)}") + + @given( + cache_size=st.integers(min_value=1, max_value=100), + num_messages=st.integers(min_value=1, max_value=200), + ) + def test_lru_eviction_property(self, cache_size: int, num_messages: int) -> None: + """Cache size never exceeds limit. + + Property: No matter how many format calls, cache size <= maxsize. + """ + bundle = FluentBundle("en", cache=CacheConfig(size=cache_size)) + + # Add many messages + ftl_source = "\n".join([f"msg{i} = Message {i}" for i in range(num_messages)]) + bundle.add_resource(ftl_source) + + # Format all messages + for i in range(num_messages): + bundle.format_pattern(f"msg{i}") + + # Cache size must respect limit + stats = bundle.get_cache_stats() + assert stats is not None + assert stats["size"] <= cache_size + assert stats["size"] == min(num_messages, cache_size) + evicted = num_messages > cache_size + event(f"eviction={evicted}") + + @given( + num_calls=st.integers(min_value=1, max_value=100), + ) + def test_stats_consistency_property(self, num_calls: int) -> None: + """Cache stats are always consistent. + + Property: hits + misses = total calls. 
+ """ + bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) + bundle.add_resource("msg = Hello") + + # Make num_calls format calls + for _ in range(num_calls): + bundle.format_pattern("msg") + + # Stats must be consistent + stats = bundle.get_cache_stats() + assert stats is not None + assert stats["hits"] + stats["misses"] == num_calls + assert stats["hits"] == num_calls - 1 # All but first are hits + assert stats["misses"] == 1 # Only first is miss + event(f"num_calls={num_calls}") + + +@pytest.mark.fuzz +class TestCacheInvalidationProperties: + """Property-based tests for cache invalidation.""" + + @given( + num_resources=st.integers(min_value=1, max_value=10), + ) + def test_invalidation_on_add_resource(self, num_resources: int) -> None: + """Cache entries are cleared every time add_resource is called. + + Property: After add_resource(), cache size = 0 and cumulative + hits/misses are unchanged (add_resource itself does not format + anything). Stats are cumulative by design — they are NOT reset on + clear so production observability is preserved across invalidations. + """ + bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) + bundle.add_resource("msg = Hello") + + # Warm up cache + bundle.format_pattern("msg") + + # Add resources multiple times + for i in range(num_resources): + stats_before = bundle.get_cache_stats() + assert stats_before is not None + + bundle.add_resource(f"msg{i} = World {i}") + + stats_after = bundle.get_cache_stats() + assert stats_after is not None + assert stats_after["size"] == 0 # Cache entries cleared + # Cumulative stats are preserved across clear (design intent: production + # observability must not be reset by routine cache invalidation). 
+ assert stats_after["hits"] == stats_before["hits"] + assert stats_after["misses"] == stats_before["misses"] + event(f"num_resources={num_resources}") + + @given( + num_functions=st.integers(min_value=1, max_value=10), + ) + def test_invalidation_on_add_function(self, num_functions: int) -> None: + """Cache entries are cleared every time add_function is called. + + Property: After add_function(), cache size = 0. + """ + bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) + bundle.add_resource("msg = Hello") + + # Warm up cache + bundle.format_pattern("msg") + + # Add functions multiple times + for i in range(num_functions): + stats_before = bundle.get_cache_stats() + assert stats_before is not None + + def func(value: str) -> str: + return value.upper() + + bundle.add_function(f"FUNC{i}", func) + + stats_after = bundle.get_cache_stats() + assert stats_after is not None + assert stats_after["size"] == 0 # Cache entries cleared + event(f"num_functions={num_functions}") + + +@pytest.mark.fuzz +class TestCacheInternalProperties: + """Property-based tests for cache internals.""" + + @given( + cache_size=st.integers(min_value=1, max_value=100), + num_operations=st.integers(min_value=0, max_value=200), + ) + def test_cache_len_property(self, cache_size: int, num_operations: int) -> None: + """Cache __len__ always returns correct size. + + Property: len(cache) <= maxsize and len(cache) = stats["size"].
+ """ + bundle = FluentBundle("en", cache=CacheConfig(size=cache_size)) + + # Add messages + ftl_source = "\n".join([f"msg{i} = Message {i}" for i in range(num_operations)]) + bundle.add_resource(ftl_source) + + # Format messages + for i in range(num_operations): + bundle.format_pattern(f"msg{i}") + + # len() should match stats + cache = bundle._cache + assert cache is not None # Type narrowing for mypy + stats = bundle.get_cache_stats() + assert stats is not None + assert len(cache) == stats["size"] + assert len(cache) <= cache_size + event(f"maxsize={cache_size}") + + @given( + cache_size=st.integers(min_value=1, max_value=50), + ) + def test_cache_properties_consistent(self, cache_size: int) -> None: + """Cache properties (maxsize, hits, misses) are consistent. + + Property: Properties always match internal state. + """ + bundle = FluentBundle("en", cache=CacheConfig(size=cache_size)) + bundle.add_resource("msg = Hello") + cache = bundle._cache + assert cache is not None # Type narrowing for mypy + + # maxsize property matches constructor + assert cache.maxsize == cache_size + + # hits and misses start at zero + assert cache.hits == 0 + assert cache.misses == 0 + + # After one call: 1 miss, 0 hits + bundle.format_pattern("msg") + assert cache.hits == 0 + assert cache.misses == 1 + + # After second call: 1 miss, 1 hit + bundle.format_pattern("msg") + assert cache.hits == 1 + assert cache.misses == 1 + event(f"maxsize={cache_size}") + + @given( + num_updates=st.integers(min_value=1, max_value=50), + ) + def test_cache_update_existing_key_property(self, num_updates: int) -> None: + """Updating existing cache entry doesn't increase size. + + Property: Repeatedly formatting same message keeps cache size at 1. 
+ """ + bundle = FluentBundle("en", cache=CacheConfig(size=10)) + bundle.add_resource("msg = Hello") + cache = bundle._cache + assert cache is not None # Type narrowing for mypy + + # Format same message multiple times + for _ in range(num_updates): + bundle.format_pattern("msg") + + # Cache size should be 1 (same entry updated) + assert len(cache) == 1 + assert cache.hits == num_updates - 1 + assert cache.misses == 1 + event(f"updates={num_updates}") + + @given( + args_list=st.lists( + st.dictionaries( + keys=st.text(alphabet="abcdefghij", min_size=1, max_size=3), + values=st.integers(min_value=0, max_value=100), + min_size=0, + max_size=3, + ), + min_size=1, + max_size=20, + ) + ) + def test_cache_key_uniqueness_property(self, args_list: list[dict[str, int]]) -> None: + """Each unique args dict creates separate cache entry. + + Property: Distinct args → distinct cache keys → separate entries. + """ + bundle = FluentBundle("en", cache=CacheConfig(size=100), use_isolating=False, strict=False) + bundle.add_resource("msg = { $a } { $b } { $c }") + cache = bundle._cache + assert cache is not None # Type narrowing for mypy + + # Format with different args + for args in args_list: + bundle.format_pattern("msg", args) + + # Cache size equals number of unique args + unique_args = len({tuple(sorted(args.items())) for args in args_list}) + assert len(cache) == min(unique_args, 100) # Min with cache_size + event(f"unique_args={unique_args}") + + @given( + message_ids=st.lists( + st.text(alphabet="abcdefghij", min_size=3, max_size=10), + min_size=1, + max_size=20, + unique=True, + ) + ) + def test_cache_message_id_isolation_property( + self, message_ids: list[str] + ) -> None: + """Different message IDs create separate cache entries. + + Property: Each message_id → separate cache entry. 
+ """ + bundle = FluentBundle("en", cache=CacheConfig(size=100)) + + # Add all messages + ftl_source = "\n".join([f"{msg_id} = Message {i}" for i, msg_id in enumerate(message_ids)]) + bundle.add_resource(ftl_source) + cache = bundle._cache + assert cache is not None # Type narrowing for mypy + + # Format all messages + for msg_id in message_ids: + bundle.format_pattern(msg_id) + + # Cache should have one entry per message + assert len(cache) == min(len(message_ids), 100) + event(f"msg_count={len(message_ids)}") + + @given( + attributes=st.lists( + st.one_of(st.none(), st.text(alphabet="abcdefghij", min_size=1, max_size=10)), + min_size=1, + max_size=10, + ) + ) + def test_cache_attribute_isolation_property( + self, attributes: list[str | None] + ) -> None: + """Different attributes create separate cache entries. + + Property: Each attribute → separate cache entry. + """ + bundle = FluentBundle("en", cache=CacheConfig(size=100), use_isolating=False) + + # Create message with multiple attributes + attrs_ftl = "\n ".join([f".{attr} = Attr {attr}" for attr in attributes if attr]) + bundle.add_resource(f"msg = Value\n {attrs_ftl}") + cache = bundle._cache + assert cache is not None # Type narrowing for mypy + + # Format with different attributes + seen_attrs = set() + for attr in attributes: + bundle.format_pattern("msg", attribute=attr) + seen_attrs.add(attr) + + # Cache should have one entry per unique attribute + assert len(cache) == len(seen_attrs) + event(f"attr_count={len(seen_attrs)}") + + @given( + num_operations=st.integers(min_value=0, max_value=100), + ) + def test_cache_size_property_consistency(self, num_operations: int) -> None: + """Cache size property matches internal state. + + Property: cache.size == len(cache._cache). 
+ """ + bundle = FluentBundle("en", cache=CacheConfig(size=100)) + + # Add messages + ftl_source = "\n".join([f"msg{i} = Message {i}" for i in range(num_operations)]) + bundle.add_resource(ftl_source) + cache = bundle._cache + assert cache is not None # Type narrowing for mypy + + # Format messages + for i in range(num_operations): + bundle.format_pattern(f"msg{i}") + + # size property should match len() and stats + assert cache.size == len(cache) + stats = bundle.get_cache_stats() + assert stats is not None + assert cache.size == stats["size"] + event(f"entries={num_operations}") + + +@pytest.mark.fuzz +class TestCacheTypeCollisionPrevention: + """Tests for type collision prevention in cache keys. + + Python's hash equality means hash(1) == hash(True) == hash(1.0), which would + cause cache collisions when these values produce different formatted outputs. + The cache uses type-tagged tuples to prevent this. + """ + + def test_bool_int_produce_different_cache_entries(self) -> None: + """Boolean True and integer 1 produce distinct cache entries. + + In Fluent, True formats as "true" while 1 formats as "1". Without type + tagging, Python's hash equality would cause cache collision. + """ + bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) + bundle.add_resource("msg = { $v }") + + # Format with True first + result_bool, _ = bundle.format_pattern("msg", {"v": True}) + # Format with 1 (would collide without type tagging) + result_int, _ = bundle.format_pattern("msg", {"v": 1}) + + # Results must differ - bool formats as "true", int as "1" + assert result_bool == "true" + assert result_int == "1" + + # Cache should have 2 entries (not 1 due to collision) + stats = bundle.get_cache_stats() + assert stats is not None + assert stats["size"] == 2 + + def test_int_decimal_produce_different_cache_entries(self) -> None: + """Integer 1 and Decimal('1') produce distinct cache entries. 
+ + Without type tagging, hash(1) == hash(Decimal('1')) would cause collision. + """ + bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) + bundle.add_resource("msg = { $v }") + + # Format with int first + _result_int, _ = bundle.format_pattern("msg", {"v": 1}) + # Format with Decimal (would collide without type tagging) + _result_decimal, _ = bundle.format_pattern("msg", {"v": Decimal(1)}) + + # Cache should have 2 entries + stats = bundle.get_cache_stats() + assert stats is not None + assert stats["size"] == 2 + + def test_bool_false_int_zero_distinct(self) -> None: + """Boolean False and integer 0 produce distinct cache entries. + + hash(False) == hash(0) in Python. + """ + bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) + bundle.add_resource("msg = { $v }") + + result_bool, _ = bundle.format_pattern("msg", {"v": False}) + result_int, _ = bundle.format_pattern("msg", {"v": 0}) + + # bool formats as "false", int as "0" + assert result_bool == "false" + assert result_int == "0" + + stats = bundle.get_cache_stats() + assert stats is not None + assert stats["size"] == 2 + + def test_cache_hit_returns_correct_typed_value(self) -> None: + """Cache hit returns value for correct type, not hash-equivalent type. + + After caching with int 1, looking up with bool True must NOT return + the cached "1", but cache miss and format "true". 
+ """ + bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) + bundle.add_resource("msg = { $v }") + + # Cache with int 1 + bundle.format_pattern("msg", {"v": 1}) + + # Look up with bool True - must NOT be a cache hit for the int entry + result, _ = bundle.format_pattern("msg", {"v": True}) + + # If type tagging works, this returns "true" not "1" + assert result == "true" + + @given(st.booleans(), st.integers()) + def test_bool_int_always_distinct(self, b: bool, i: int) -> None: + """PROPERTY: Any bool and int pair with same Python hash produce distinct cache entries.""" + # Only test when hash would collide + if hash(b) != hash(i): + return + + bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) + bundle.add_resource("msg = { $v }") + + # Format both + bundle.format_pattern("msg", {"v": b}) + bundle.format_pattern("msg", {"v": i}) + + # Should be 2 entries despite hash equality + stats = bundle.get_cache_stats() + assert stats is not None + assert stats["size"] == 2 + event(f"bool={b}") + + @given(st.integers(), st.decimals(allow_nan=False, allow_infinity=False)) + def test_int_decimal_always_distinct_when_equal(self, i: int, d: Decimal) -> None: + """PROPERTY: Int and Decimal with same numeric value produce distinct cache entries.""" + # Only test when values are hash-equal (hash(n) == hash(Decimal(n)) in Python) + try: + if hash(i) != hash(d): + return + except (TypeError, ValueError): + return + + bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) + bundle.add_resource("msg = { $v }") + + # Format both + bundle.format_pattern("msg", {"v": i}) + bundle.format_pattern("msg", {"v": d}) + + # Should be 2 entries despite hash equality + stats = bundle.get_cache_stats() + assert stats is not None + assert stats["size"] == 2 + event(f"int_value={i}") diff --git a/tests/runtime_cache_property_cases/property_tests_basic_invariants.py b/tests/runtime_cache_property_cases/property_tests_basic_invariants.py new file 
mode 100644 index 00000000..785391c0 --- /dev/null +++ b/tests/runtime_cache_property_cases/property_tests_basic_invariants.py @@ -0,0 +1,148 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_cache_property.py.""" + +from tests.runtime_cache_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PROPERTY TESTS - BASIC INVARIANTS +# ============================================================================ + + +@pytest.mark.fuzz + +class TestCacheInvariants: + """Test fundamental IntegrityCache invariants.""" + + @given(maxsize=st.integers(min_value=1, max_value=10000)) + @settings(max_examples=100) + def test_cache_maxsize_enforced(self, maxsize: int) -> None: + """INVARIANT: Cache never exceeds maxsize.""" + cache = IntegrityCache(maxsize=maxsize, strict=False) + + # Add more than maxsize entries + for i in range(maxsize + 10): + cache.put( + f"msg_{i}", + None, + None, + "en_US", + use_isolating=True, + formatted=f"result_{i}", + errors=(), + ) + + # Cache should not exceed maxsize + assert cache.get_stats()["size"] <= maxsize + event(f"maxsize={maxsize}") + + @given( + msg_id=message_ids, + locale=locale_codes, + args=args_strategy, + attr=attributes, + value=cache_values, + ) + @settings(max_examples=200) + def test_get_after_put_returns_value( + self, + msg_id: str, + locale: str, + args: dict[str, int | Decimal | str] | None, + attr: str | None, + value: tuple[str, tuple[()]], + ) -> None: + """PROPERTY: get(k) after put(k, v) returns v.""" + cache = IntegrityCache(maxsize=100, strict=False) + + formatted, errors = value + cache.put(msg_id, args, attr, locale, use_isolating=True, formatted=formatted, errors=errors) + entry = cache.get(msg_id, args, attr, locale, use_isolating=True) + + assert entry is not None + assert entry.as_result() == value + has_args = args is not None + event(f"has_args={has_args}") + + @given( + msg_id=message_ids, + 
locale=locale_codes, + ) + @settings(max_examples=100) + def test_get_without_put_returns_none( + self, + msg_id: str, + locale: str, + ) -> None: + """PROPERTY: get(k) without put(k) returns None.""" + cache = IntegrityCache(maxsize=100, strict=False) + + result = cache.get(msg_id, None, None, locale, use_isolating=True) + + assert result is None + event(f"locale={locale}") + + @given(maxsize=st.integers(min_value=1, max_value=100)) + @settings(max_examples=50) + def test_clear_resets_cache_to_empty(self, maxsize: int) -> None: + """PROPERTY: clear() empties cache and resets counters.""" + cache = IntegrityCache(maxsize=maxsize, strict=False) + + # Add some entries + for i in range(min(10, maxsize)): + cache.put(f"msg_{i}", None, None, "en_US", use_isolating=True, formatted=f"result_{i}", errors=()) + + # Clear + cache.clear() + + # Cache should be empty + stats = cache.get_stats() + assert stats["size"] == 0 + assert stats["hits"] == 0 + assert stats["misses"] == 0 + event(f"maxsize={maxsize}") + + @given( + msg_id=message_ids, + locale=locale_codes, + value=cache_values, + ) + @settings(max_examples=100) + def test_hit_counter_increments_on_cache_hit( + self, + msg_id: str, + locale: str, + value: tuple[str, tuple[()]], + ) -> None: + """PROPERTY: Cache hits increment hit counter.""" + cache = IntegrityCache(maxsize=100, strict=False) + + formatted, errors = value + cache.put(msg_id, None, None, locale, use_isolating=True, formatted=formatted, errors=errors) + + # First get - cache hit + initial_stats = cache.get_stats() + cache.get(msg_id, None, None, locale, use_isolating=True) + + stats_after_hit = cache.get_stats() + assert stats_after_hit["hits"] == initial_stats["hits"] + 1 + event(f"locale={locale}") + + @given( + msg_id=message_ids, + locale=locale_codes, + ) + @settings(max_examples=100) + def test_miss_counter_increments_on_cache_miss( + self, + msg_id: str, + locale: str, + ) -> None: + """PROPERTY: Cache misses increment miss counter.""" + cache = 
IntegrityCache(maxsize=100, strict=False) + + initial_stats = cache.get_stats() + cache.get(msg_id, None, None, locale, use_isolating=True) # Cache miss + + stats_after_miss = cache.get_stats() + assert stats_after_miss["misses"] == initial_stats["misses"] + 1 + event(f"locale={locale}") diff --git a/tests/runtime_cache_property_cases/property_tests_init_parameters.py b/tests/runtime_cache_property_cases/property_tests_init_parameters.py new file mode 100644 index 00000000..edbdd874 --- /dev/null +++ b/tests/runtime_cache_property_cases/property_tests_init_parameters.py @@ -0,0 +1,77 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_cache_property.py.""" + +from tests.runtime_cache_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PROPERTY TESTS - INIT PARAMETERS +# ============================================================================ + + +@pytest.mark.fuzz +class TestIntegrityCacheHypothesisProperties: + """Property-based tests for IntegrityCache using Hypothesis.""" + + @given( + st.integers(min_value=1, max_value=1000), + st.integers(min_value=1, max_value=10000), + st.integers(min_value=1, max_value=100), + ) + @settings(max_examples=50) + def test_property_init_parameters_stored_correctly( + self, + maxsize: int, + max_entry_weight: int, + max_errors_per_entry: int, + ) -> None: + """PROPERTY: Constructor parameters are stored correctly.""" + cache = IntegrityCache( + strict=False, + maxsize=maxsize, + max_entry_weight=max_entry_weight, + max_errors_per_entry=max_errors_per_entry, + ) + + assert cache.maxsize == maxsize + assert cache.max_entry_weight == max_entry_weight + assert cache.size == 0 + assert cache.hits == 0 + assert cache.misses == 0 + event(f"maxsize={maxsize}") + + @given(st.text(min_size=0, max_size=100)) + @settings(max_examples=50) + def test_property_primitives_hashable(self, text: str) -> None: + """PROPERTY: All 
primitive types produce valid cache keys.""" + cache = IntegrityCache(strict=False) + + # String + cache.put("msg", {"text": text}, None, "en", use_isolating=True, formatted="result", errors=()) + entry = cache.get("msg", {"text": text}, None, "en", use_isolating=True) + assert entry is not None + assert entry.as_result() == ("result", ()) + + # Integer + cache.put("msg", {"num": 42}, None, "en", use_isolating=True, formatted="result", errors=()) + entry = cache.get("msg", {"num": 42}, None, "en", use_isolating=True) + assert entry is not None + assert entry.as_result() == ("result", ()) + + # Decimal + cache.put("msg", {"decimal": Decimal("3.14")}, None, "en", use_isolating=True, formatted="result", errors=()) + entry = cache.get("msg", {"decimal": Decimal("3.14")}, None, "en", use_isolating=True) + assert entry is not None + assert entry.as_result() == ("result", ()) + + # Bool + cache.put("msg", {"bool": True}, None, "en", use_isolating=True, formatted="result", errors=()) + entry = cache.get("msg", {"bool": True}, None, "en", use_isolating=True) + assert entry is not None + assert entry.as_result() == ("result", ()) + + # None + cache.put("msg", {"val": None}, None, "en", use_isolating=True, formatted="result", errors=()) + entry = cache.get("msg", {"val": None}, None, "en", use_isolating=True) + assert entry is not None + assert entry.as_result() == ("result", ()) + event(f"text_len={len(text)}") diff --git a/tests/runtime_cache_property_cases/property_tests_key_handling.py b/tests/runtime_cache_property_cases/property_tests_key_handling.py new file mode 100644 index 00000000..50211192 --- /dev/null +++ b/tests/runtime_cache_property_cases/property_tests_key_handling.py @@ -0,0 +1,129 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_cache_property.py.""" + +from tests.runtime_cache_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PROPERTY 
TESTS - KEY HANDLING +# ============================================================================ + + +@pytest.mark.fuzz +class TestCacheKeyHandling: + """Test cache key construction and equality.""" + + @given( + msg_id=message_ids, + locale=locale_codes, + value=cache_values, + ) + @settings(max_examples=100) + def test_same_key_retrieves_same_value( + self, + msg_id: str, + locale: str, + value: tuple[str, tuple[()]], + ) -> None: + """PROPERTY: Same key components retrieve same cached value.""" + cache = IntegrityCache(maxsize=100, strict=False) + + formatted, errors = value + # Put with specific key + cache.put(msg_id, None, None, locale, use_isolating=True, formatted=formatted, errors=errors) + + # Get with same key components + entry = cache.get(msg_id, None, None, locale, use_isolating=True) + + assert entry is not None + assert entry.as_result() == value + event(f"locale={locale}") + + @given( + msg_id=message_ids, + locale1=locale_codes, + locale2=locale_codes, + value=cache_values, + ) + @settings(max_examples=100) + def test_different_locale_creates_different_key( + self, + msg_id: str, + locale1: str, + locale2: str, + value: tuple[str, tuple[()]], + ) -> None: + """PROPERTY: Different locales create different cache keys.""" + assume(locale1 != locale2) + + cache = IntegrityCache(maxsize=100, strict=False) + + formatted, errors = value + # Put with locale1 + cache.put(msg_id, None, None, locale1, use_isolating=True, formatted=formatted, errors=errors) + + # Get with locale2 should miss + result = cache.get(msg_id, None, None, locale2, use_isolating=True) + + assert result is None + event(f"locale_pair={locale1}_{locale2}") + + @given( + msg_id=message_ids, + locale=locale_codes, + attr1=attributes, + attr2=attributes, + value=cache_values, + ) + @settings(max_examples=100) + def test_different_attribute_creates_different_key( + self, + msg_id: str, + locale: str, + attr1: str | None, + attr2: str | None, + value: tuple[str, tuple[()]], + ) -> None: 
+ """PROPERTY: Different attributes create different cache keys.""" + assume(attr1 != attr2) + + cache = IntegrityCache(maxsize=100, strict=False) + + formatted, errors = value + # Put with attr1 + cache.put(msg_id, None, attr1, locale, use_isolating=True, formatted=formatted, errors=errors) + + # Get with attr2 should miss + result = cache.get(msg_id, None, attr2, locale, use_isolating=True) + + assert result is None + has_attr1 = attr1 is not None + event(f"has_attr={has_attr1}") + + @given( + msg_id=message_ids, + locale=locale_codes, + value=cache_values, + ) + @settings(max_examples=100) + def test_args_dict_key_stability( + self, + msg_id: str, + locale: str, + value: tuple[str, tuple[()]], + ) -> None: + """PROPERTY: Equivalent args dicts produce same cache key.""" + cache = IntegrityCache(maxsize=100, strict=False) + + formatted, errors = value + # Put with args dict + args = {"x": 1, "y": 2} + cache.put(msg_id, args, None, locale, use_isolating=True, formatted=formatted, errors=errors) + + # Get with equivalent dict (different order) + args_reordered = {"y": 2, "x": 1} + entry = cache.get(msg_id, args_reordered, None, locale, use_isolating=True) + + # Should hit cache (dict key normalized) + assert entry is not None + assert entry.as_result() == value + event(f"locale={locale}") diff --git a/tests/runtime_cache_property_cases/property_tests_lru_eviction.py b/tests/runtime_cache_property_cases/property_tests_lru_eviction.py new file mode 100644 index 00000000..a85236c3 --- /dev/null +++ b/tests/runtime_cache_property_cases/property_tests_lru_eviction.py @@ -0,0 +1,70 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_cache_property.py.""" + +from tests.runtime_cache_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PROPERTY TESTS - LRU EVICTION +# ============================================================================ + + 
+@pytest.mark.fuzz +class TestLRUEviction: + """Test LRU (Least Recently Used) eviction behavior.""" + + @given(maxsize=st.integers(min_value=2, max_value=10)) + @settings(max_examples=50) + def test_lru_evicts_least_recently_used(self, maxsize: int) -> None: + """PROPERTY: LRU eviction removes oldest entry.""" + cache = IntegrityCache(maxsize=maxsize, strict=False) + + # Fill cache to capacity + for i in range(maxsize): + cache.put(f"msg_{i}", None, None, "en_US", use_isolating=True, formatted=f"result_{i}", errors=()) + + # Access first entry to make it recently used + cache.get("msg_0", None, None, "en_US", use_isolating=True) + + # Add one more entry (should evict msg_1, not msg_0) + cache.put("msg_new", None, None, "en_US", use_isolating=True, formatted="result_new", errors=()) + + # msg_0 should still be in cache (recently accessed) + assert cache.get("msg_0", None, None, "en_US", use_isolating=True) is not None + + # msg_1 should be evicted (oldest unreferenced) + assert cache.get("msg_1", None, None, "en_US", use_isolating=True) is None + event(f"maxsize={maxsize}") + + @given( + maxsize=st.integers(min_value=3, max_value=10), + access_pattern=st.lists( + st.integers(min_value=0, max_value=9), + min_size=5, + max_size=20, + ), + ) + @settings(max_examples=50) + def test_lru_access_pattern_eviction( + self, + maxsize: int, + access_pattern: list[int], + ) -> None: + """PROPERTY: LRU eviction respects access patterns.""" + cache = IntegrityCache(maxsize=maxsize, strict=False) + + # Fill cache + for i in range(maxsize): + cache.put(f"msg_{i}", None, None, "en_US", use_isolating=True, formatted=f"result_{i}", errors=()) + + # Access entries according to pattern + for idx in access_pattern: + if idx < maxsize: + cache.get(f"msg_{idx}", None, None, "en_US", use_isolating=True) + + # Add new entries (will trigger evictions) + for i in range(maxsize, maxsize + 3): + cache.put(f"msg_{i}", None, None, "en_US", use_isolating=True, formatted=f"result_{i}", errors=()) + 
+ # Whatever the access pattern, the cache must still respect maxsize after evictions + assert cache.get_stats()["size"] <= maxsize + event(f"pattern_len={len(access_pattern)}") diff --git a/tests/runtime_cache_property_cases/property_tests_robustness.py b/tests/runtime_cache_property_cases/property_tests_robustness.py new file mode 100644 index 00000000..587ba4dc --- /dev/null +++ b/tests/runtime_cache_property_cases/property_tests_robustness.py @@ -0,0 +1,85 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_cache_property.py.""" + +from tests.runtime_cache_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PROPERTY TESTS - ROBUSTNESS +# ============================================================================ + + +@pytest.mark.fuzz +class TestCacheRobustness: + """Test cache robustness with various input types.""" + + @given( + args=st.dictionaries( + st.text(min_size=1), + st.one_of( + st.integers(), + st.decimals(allow_nan=False, allow_infinity=False), + st.text(), + st.booleans(), + st.none(), + ), + max_size=10, # Keep practical bound for dict size + ), + ) + @settings(max_examples=200) + def test_cache_handles_various_arg_types( + self, args: dict[str, int | Decimal | str | bool | None] + ) -> None: + """ROBUSTNESS: Cache handles various argument types.""" + cache = IntegrityCache(maxsize=100, strict=False) + + # Should not crash with various arg types + try: + cache.put("msg", args, None, "en_US", use_isolating=True, formatted="result", errors=()) + entry = cache.get("msg", args, None, "en_US", use_isolating=True) + # If put succeeded, get should return the value + if entry is not None: + assert entry.as_result() == ("result", ()) + except (TypeError, ValueError): + # Some types may not be hashable - acceptable + pass + event(f"arg_types={len(args)}") + + @given( + msg_ids=st.lists(message_ids, min_size=1, max_size=50), + maxsize=st.integers(min_value=1,
max_value=10), + ) + @settings(max_examples=50) + def test_cache_handles_duplicate_puts( + self, + msg_ids: list[str], + maxsize: int, + ) -> None: + """ROBUSTNESS: Cache handles duplicate puts gracefully.""" + cache = IntegrityCache(maxsize=maxsize, strict=False) + + # Put same message multiple times + for msg_id in msg_ids: + cache.put(msg_id, None, None, "en_US", use_isolating=True, formatted=f"result_{msg_id}", errors=()) + + # Cache should still respect maxsize + assert cache.get_stats()["size"] <= maxsize + event(f"duplicates={len(msg_ids)}") + + @given(maxsize=st.integers(min_value=1, max_value=100)) + @settings(max_examples=50) + def test_cache_stats_never_negative(self, maxsize: int) -> None: + """ROBUSTNESS: Cache stats are never negative.""" + cache = IntegrityCache(maxsize=maxsize, strict=False) + + # Perform various operations + cache.put("msg", None, None, "en_US", use_isolating=True, formatted="result", errors=()) + cache.get("msg", None, None, "en_US", use_isolating=True) + cache.get("missing", None, None, "en_US", use_isolating=True) + cache.clear() + + stats = cache.get_stats() + assert stats["size"] >= 0 + assert stats["hits"] >= 0 + assert stats["misses"] >= 0 + assert stats["maxsize"] > 0 + event(f"maxsize={maxsize}") diff --git a/tests/runtime_cache_property_cases/property_tests_statistics.py b/tests/runtime_cache_property_cases/property_tests_statistics.py new file mode 100644 index 00000000..fc757205 --- /dev/null +++ b/tests/runtime_cache_property_cases/property_tests_statistics.py @@ -0,0 +1,72 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_cache_property.py.""" + +from tests.runtime_cache_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PROPERTY TESTS - STATISTICS +# ============================================================================ + + +@pytest.mark.fuzz +class TestCacheStatistics: + """Test cache 
statistics tracking.""" + + @given( + operations=st.lists( + st.tuples( + st.sampled_from(["put", "get"]), + message_ids, + ), + min_size=1, + max_size=50, + ), + ) + @settings(max_examples=50) + def test_hit_rate_consistency( + self, + operations: list[tuple[str, str]], + ) -> None: + """PROPERTY: hit_rate = hits / (hits + misses).""" + cache = IntegrityCache(maxsize=20, strict=False) + + for op, msg_id in operations: + if op == "put": + cache.put(msg_id, None, None, "en_US", use_isolating=True, formatted=f"result_{msg_id}", errors=()) + elif op == "get": + cache.get(msg_id, None, None, "en_US", use_isolating=True) + + stats = cache.get_stats() + total = stats["hits"] + stats["misses"] + + if total > 0: + expected_hit_rate = stats["hits"] / total + # hit_rate might be percentage (0-100) or decimal (0.0-1.0) + actual_rate: float = float(stats["hit_rate"]) + if actual_rate > 1.0: # Percentage format + actual_rate = actual_rate / 100.0 + assert abs(actual_rate - expected_hit_rate) < 0.01 + event(f"op_count={len(operations)}") + + @given( + num_entries=st.integers(min_value=0, max_value=50), + maxsize=st.integers(min_value=10, max_value=100), + ) + @settings(max_examples=50) + def test_size_equals_entry_count( + self, + num_entries: int, + maxsize: int, + ) -> None: + """PROPERTY: size stat equals actual number of cached entries.""" + cache = IntegrityCache(maxsize=maxsize, strict=False) + + # Add entries + for i in range(num_entries): + cache.put(f"msg_{i}", None, None, "en_US", use_isolating=True, formatted=f"result_{i}", errors=()) + + stats = cache.get_stats() + expected_size = min(num_entries, maxsize) + + assert stats["size"] == expected_size + event(f"entries={num_entries}") diff --git a/tests/runtime_function_bridge_cases/__init__.py b/tests/runtime_function_bridge_cases/__init__.py new file mode 100644 index 00000000..df7a72ee --- /dev/null +++ b/tests/runtime_function_bridge_cases/__init__.py @@ -0,0 +1,63 @@ +"""Tests for runtime.function_bridge: 
FunctionRegistry, FunctionSignature, edge cases.""" + +from __future__ import annotations + +from decimal import Decimal +from typing import Any + +import pytest + +from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError +from ftllexengine.runtime.function_bridge import ( + _FTL_REQUIRES_LOCALE_ATTR, + FluentValue, + FunctionRegistry, + FunctionSignature, + fluent_function, +) + +# ============================================================================ +# HELPER FUNCTIONS FOR TESTING +# ============================================================================ + + +def sample_function(value: int, *, minimum_fraction_digits: int = 0) -> str: + """Sample function with snake_case parameters.""" + return f"{value:.{minimum_fraction_digits}f}" + + +def simple_function(text: str) -> str: + """Simple function with single parameter.""" + return text.upper() + + +def positional_only_function(value: int, /) -> str: + """Function with positional-only parameter.""" + return str(value * 2) + + +def mixed_params_function( + value: int, /, *, use_grouping: bool = False, date_style: str = "short" +) -> str: + """Function with mixed parameter types.""" + result = str(value) + if use_grouping: + result = f"{value:,}" + return f"{result} ({date_style})" + +__all__ = [ + "_FTL_REQUIRES_LOCALE_ATTR", + "Any", + "Decimal", + "ErrorCategory", + "FluentValue", + "FrozenFluentError", + "FunctionRegistry", + "FunctionSignature", + "fluent_function", + "mixed_params_function", + "positional_only_function", + "pytest", + "sample_function", + "simple_function", +] diff --git a/tests/runtime_function_bridge_cases/auto_generation_parameter_mapping_tests.py b/tests/runtime_function_bridge_cases/auto_generation_parameter_mapping_tests.py new file mode 100644 index 00000000..2980ba09 --- /dev/null +++ b/tests/runtime_function_bridge_cases/auto_generation_parameter_mapping_tests.py @@ -0,0 +1,66 @@ +# mypy: ignore-errors +"""Split test cases from 
tests/test_runtime_function_bridge.py.""" + +from tests.runtime_function_bridge_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# AUTO-GENERATION PARAMETER MAPPING TESTS +# ============================================================================ + + +class TestAutoParameterMapping: + """Test automatic parameter mapping generation.""" + + def test_auto_map_snake_case_params(self) -> None: + """Auto-generate mappings for snake_case parameters.""" + registry = FunctionRegistry() + + def func(*, minimum_value: int = 0, maximum_value: int = 100) -> str: + return f"{minimum_value}:{maximum_value}" + + registry.register(func, ftl_name="FUNC") + + # Should auto-map: minimumValue -> minimum_value, maximumValue -> maximum_value + result = registry.call("FUNC", [], {"minimumValue": 1, "maximumValue": 10}) + assert result == "1:10" + + def test_auto_map_skips_self_parameter(self) -> None: + """Auto-mapping skips 'self' parameter.""" + + class TestClass: + def method(self, value: int) -> str: + return str(value) + + registry = FunctionRegistry() + obj = TestClass() + registry.register(obj.method, ftl_name="METHOD") + + result = registry.call("METHOD", [42], {}) + assert result == "42" + + def test_auto_map_with_positional_only_marker(self) -> None: + """Auto-mapping skips positional-only marker '/'.""" + registry = FunctionRegistry() + + registry.register(positional_only_function, ftl_name="POS") + + result = registry.call("POS", [21], {}) + assert result == "42" + + def test_custom_param_map_overrides_auto_map(self) -> None: + """Custom parameter mapping overrides auto-generated mapping.""" + registry = FunctionRegistry() + + def func(*, minimum_value: int = 0) -> str: + return str(minimum_value) + + # Auto would create: minimumValue -> minimum_value + # Custom override: minVal -> minimum_value + registry.register( + func, + ftl_name="FUNC", + param_map={"minVal": "minimum_value"}, + 
) + + result = registry.call("FUNC", [], {"minVal": 42}) + assert result == "42" diff --git a/tests/runtime_function_bridge_cases/decorator_and_registry_coverage.py b/tests/runtime_function_bridge_cases/decorator_and_registry_coverage.py new file mode 100644 index 00000000..79ed8d83 --- /dev/null +++ b/tests/runtime_function_bridge_cases/decorator_and_registry_coverage.py @@ -0,0 +1,98 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_function_bridge.py.""" + +from tests.runtime_function_bridge_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# DECORATOR AND REGISTRY COVERAGE +# ============================================================================ + + +class TestFunctionBridgeCoverage: + """Test fluent_function decorator and FunctionRegistry coverage.""" + + def test_fluent_function_no_parentheses_usage(self) -> None: + """Using @fluent_function without parentheses applies decorator directly.""" + + @fluent_function + def my_upper(value: str) -> FluentValue: + return value.upper() + + result = my_upper("hello") + assert result == "HELLO" + + def test_fluent_function_with_parentheses_usage(self) -> None: + """Using @fluent_function() with parentheses works as factory.""" + + @fluent_function() + def my_lower(value: str) -> FluentValue: + return value.lower() + + result = my_lower("HELLO") + assert result == "hello" + + def test_fluent_function_with_locale_injection(self) -> None: + """Using @fluent_function(inject_locale=True) sets locale attribute.""" + + @fluent_function(inject_locale=True) + def locale_aware(value: str, locale: str) -> FluentValue: + return f"{value}@{locale}" + + assert hasattr(locale_aware, _FTL_REQUIRES_LOCALE_ATTR) + assert getattr(locale_aware, _FTL_REQUIRES_LOCALE_ATTR) is True + + def test_fluent_function_wrapper_returns_value(self) -> None: + """Wrapper function passes through the decorated function's return value.""" + + 
@fluent_function + def add_suffix(value: str, suffix: str = "!") -> FluentValue: + return f"{value}{suffix}" + + result = add_suffix("Hello", suffix="?") + assert result == "Hello?" + + def test_get_builtin_metadata_exists(self) -> None: + """get_builtin_metadata returns metadata for known built-in function.""" + registry = FunctionRegistry() + + meta = registry.get_builtin_metadata("NUMBER") + assert meta is not None + assert meta.requires_locale is True + + def test_get_builtin_metadata_not_exists(self) -> None: + """get_builtin_metadata returns None for unknown function name.""" + registry = FunctionRegistry() + + meta = registry.get_builtin_metadata("NONEXISTENT") + assert meta is None + + +class TestFunctionBridgeLeadingUnderscore: + """Test function parameter with leading underscore is preserved in mapping.""" + + def test_parameter_with_leading_underscore(self) -> None: + """Parameter with leading underscore is kept in param_mapping.""" + registry = FunctionRegistry() + + def test_func(_internal: str, public: str) -> str: # noqa: PT019 - intentional + return f"{_internal}:{public}" + + registry.register(test_func, ftl_name="TEST") + + sig = registry._functions["TEST"] # pylint: disable=protected-access + param_values = [v for _, v in sig.param_mapping] + assert "_internal" in param_values + + +class TestFunctionMetadataCallable: + """Test should_inject_locale returns False for unknown function names.""" + + def test_should_inject_locale_not_found(self) -> None: + """should_inject_locale returns False for unregistered function name.""" + registry = FunctionRegistry() + + def custom(val: str) -> str: + return val + + registry.register(custom, ftl_name="CUSTOM") + assert registry.should_inject_locale("NOTFOUND") is False diff --git a/tests/runtime_function_bridge_cases/edge_cases_and_integration_tests.py b/tests/runtime_function_bridge_cases/edge_cases_and_integration_tests.py new file mode 100644 index 00000000..b9c25d57 --- /dev/null +++ 
b/tests/runtime_function_bridge_cases/edge_cases_and_integration_tests.py @@ -0,0 +1,69 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_function_bridge.py.""" + +from tests.runtime_function_bridge_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# EDGE CASES AND INTEGRATION TESTS +# ============================================================================ + + +class TestFunctionBridgeEdgeCases: + """Test edge cases and corner scenarios.""" + + def test_register_multiple_functions(self) -> None: + """Register multiple functions in same registry.""" + registry = FunctionRegistry() + + def func1(x: int) -> str: + return str(x) + + def func2(x: int) -> str: + return str(x * 2) + + registry.register(func1, ftl_name="F1") + registry.register(func2, ftl_name="F2") + + assert registry.has_function("F1") + assert registry.has_function("F2") + assert registry.call("F1", [5], {}) == "5" + assert registry.call("F2", [5], {}) == "10" + + def test_overwrite_registered_function(self) -> None: + """Registering same FTL name twice overwrites previous.""" + registry = FunctionRegistry() + + def func1(_x: int) -> str: + return "first" + + def func2(_x: int) -> str: + return "second" + + registry.register(func1, ftl_name="FUNC") + registry.register(func2, ftl_name="FUNC") + + result = registry.call("FUNC", [1], {}) + assert result == "second" + + def test_empty_parameter_name(self) -> None: + """Handle empty parameter names gracefully.""" + result = FunctionRegistry._to_camel_case("") + assert result == "" + + def test_parameter_with_numbers(self) -> None: + """Handle parameter names with numbers.""" + result = FunctionRegistry._to_camel_case("param_123_test") + assert result == "param123Test" + + def test_call_with_unmapped_parameter(self) -> None: + """Call with parameter not in mapping passes through unchanged.""" + registry = FunctionRegistry() + + def func(**kwargs: 
Any) -> str: + return str(kwargs.get("unknownParam", "default")) + + registry.register(func, ftl_name="FUNC") + + # unknownParam not in auto-mapping, but should pass through + result = registry.call("FUNC", [], {"unknownParam": "custom"}) + assert result == "custom" diff --git a/tests/runtime_function_bridge_cases/edge_cases_from_test_function_bridge_edge_cases_py.py b/tests/runtime_function_bridge_cases/edge_cases_from_test_function_bridge_edge_cases_py.py new file mode 100644 index 00000000..b29cfc4c --- /dev/null +++ b/tests/runtime_function_bridge_cases/edge_cases_from_test_function_bridge_edge_cases_py.py @@ -0,0 +1,373 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_function_bridge.py.""" + +from tests.runtime_function_bridge_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# EDGE CASES (from test_function_bridge_edge_cases.py) +# ============================================================================ + + +class TestFrozenRegistryLines160To164: + """Test lines 160-164: TypeError when registering on frozen registry.""" + + def test_register_on_frozen_registry_raises_type_error(self) -> None: + """Test register() raises TypeError on frozen registry (lines 160-164).""" + registry = FunctionRegistry() + + # Freeze the registry + registry.freeze() + + # Try to register a function on frozen registry + def my_func(value: str, locale_code: str, /) -> str: # noqa: ARG001 - unused + return value + + # Should raise TypeError with specific message + with pytest.raises( + TypeError, + match=r"Cannot modify frozen registry.*create_default_registry", + ): + registry.register(my_func, ftl_name="MYFUNC") + + +class TestParameterCollisionLines188To193: + """Test lines 188-193: ValueError on parameter name collision.""" + + def test_register_with_parameter_collision_raises_value_error(self) -> None: + """Test register() raises ValueError on parameter collision 
(lines 188-193).""" + registry = FunctionRegistry() + + # Create a function with parameters that will collide after stripping underscores + # Both `_value` and `value` would map to camelCase `value` + def colliding_func( + val: str, + locale_code: str, # noqa: ARG001 - unused + /, + _test_param: int = 0, # Will strip to `test_param` -> `testParam` + test_param: int = 0, # Also maps to `testParam` # noqa: ARG001 - unused + ) -> str: + return val + + # Should raise ValueError about parameter collision + with pytest.raises(ValueError, match=r"Parameter name collision.*testParam"): + registry.register(colliding_func, ftl_name="COLLIDE") + + +class TestFreezeMethodLine285: + """Test line 285: freeze() method.""" + + def test_freeze_sets_frozen_flag(self) -> None: + """Test freeze() sets _frozen = True (line 285).""" + registry = FunctionRegistry() + + # Initially not frozen + assert not registry.frozen + + # Freeze it + registry.freeze() + + # Should now be frozen + assert registry.frozen + + def test_freeze_prevents_registration(self) -> None: + """Test freeze() actually prevents further registration.""" + registry = FunctionRegistry() + + # Register a function before freezing + def func1(value: str, locale_code: str, /) -> str: # noqa: ARG001 - unused + return value + + registry.register(func1, ftl_name="FUNC1") + assert "FUNC1" in registry + + # Freeze the registry + registry.freeze() + + # Try to register another function + def func2(value: str, locale_code: str, /) -> str: # noqa: ARG001 - unused + return value + + # Should fail + with pytest.raises(TypeError): + registry.register(func2, ftl_name="FUNC2") + + # Original function still there + assert "FUNC1" in registry + # New function not added + assert "FUNC2" not in registry + + +class TestFrozenPropertyLine294: + """Test line 294: frozen property getter.""" + + def test_frozen_property_returns_false_initially(self) -> None: + """Test frozen property returns False for new registry (line 294).""" + registry = 
FunctionRegistry() + + # Should not be frozen initially + result = registry.frozen + + assert result is False + + def test_frozen_property_returns_true_after_freeze(self) -> None: + """Test frozen property returns True after freeze() (line 294).""" + registry = FunctionRegistry() + + # Freeze it + registry.freeze() + + # Property should return True + result = registry.frozen + + assert result is True + + def test_frozen_property_is_readonly(self) -> None: + """Test frozen property cannot be set directly.""" + registry = FunctionRegistry() + + # Should not be able to set frozen property + with pytest.raises(AttributeError): + registry.frozen = True # type: ignore[misc] + + +class TestFrozenRegistryCopyIntegration: + """Integration tests for frozen registry and copy().""" + + def test_copy_of_frozen_registry_is_mutable(self) -> None: + """Test copy() of frozen registry creates mutable copy.""" + registry = FunctionRegistry() + + # Register and freeze + def func1(value: str, locale_code: str, /) -> str: # noqa: ARG001 - unused + return value + + registry.register(func1, ftl_name="FUNC1") + registry.freeze() + + # Create copy + copy = registry.copy() + + # Copy should not be frozen + assert not copy.frozen + + # Should be able to register on copy + def func2(value: str, locale_code: str, /) -> str: # noqa: ARG001 - unused + return value + + copy.register(func2, ftl_name="FUNC2") + + # Copy has both functions + assert "FUNC1" in copy + assert "FUNC2" in copy + + # Original only has first function + assert "FUNC1" in registry + assert "FUNC2" not in registry + + +class TestFluentFunctionDecoratorWithParentheses: + """Test lines 134-148: fluent_function decorator WITH parentheses.""" + + def test_fluent_function_decorator_with_inject_locale_true(self) -> None: + """Test @fluent_function(inject_locale=True) decorator path (lines 134-148).""" + from ftllexengine.runtime.function_bridge import ( + fluent_function, + ) + + # Use decorator WITH parentheses + 
@fluent_function(inject_locale=True) + def my_format(value: str, locale_code: str, /) -> str: + return f"{value}_{locale_code}" + + # Verify the function works + result = my_format("test", "en_US") + assert result == "test_en_US" + + # Verify the locale injection marker was set + assert hasattr(my_format, "_ftl_requires_locale") + assert my_format._ftl_requires_locale is True + + def test_fluent_function_decorator_with_inject_locale_false(self) -> None: + """Test @fluent_function(inject_locale=False) decorator path.""" + from ftllexengine.runtime.function_bridge import ( + fluent_function, + ) + + # Use decorator WITH parentheses but inject_locale=False + @fluent_function(inject_locale=False) + def my_upper(value: str) -> str: + return value.upper() + + # Verify the function works + result = my_upper("test") + assert result == "TEST" + + # Verify the locale injection marker was NOT set + assert not getattr(my_upper, "_ftl_requires_locale", False) + + def test_fluent_function_decorator_without_parentheses(self) -> None: + """Test @fluent_function decorator WITHOUT parentheses (line 147).""" + from ftllexengine.runtime.function_bridge import ( + fluent_function, + ) + + # Use decorator WITHOUT parentheses + @fluent_function + def my_simple(value: str) -> str: + return value.lower() + + # Verify the function works + result = my_simple("TEST") + assert result == "test" + + # When used without parentheses and without inject_locale, should not set marker + assert not getattr(my_simple, "_ftl_requires_locale", False) + + +class TestRegisterWithUninspectableCallable: + """Test lines 258-264: ValueError when callable has no inspectable signature.""" + + def test_register_uninspectable_callable_raises_type_error(self) -> None: + """Test register() raises TypeError for callables without signatures (lines 258-264).""" + registry = FunctionRegistry() + + # Create a mock callable that signature() cannot inspect + class UninspectableCallable: + def __call__(self, *args: object, 
**kwargs: object) -> str: # noqa: ARG002 - unused + return "test" + + # Manually break signature inspection by making it raise ValueError + from unittest.mock import patch + + uninspectable = UninspectableCallable() + + with ( + patch( + "ftllexengine.runtime.function_registry_helpers.signature", + side_effect=ValueError("No signature"), + ), + pytest.raises( + TypeError, + match=r"Cannot register.*no inspectable signature.*param_mapping", + ), + ): + registry.register(uninspectable, ftl_name="UNINSPECTABLE") + + +class TestShouldInjectLocaleWithMissingFunction: + """Test lines 575-579: should_inject_locale when function not in registry.""" + + def test_should_inject_locale_returns_false_for_missing_function(self) -> None: + """Test should_inject_locale returns False for non-existent function (lines 575-576).""" + registry = FunctionRegistry() + + # Function doesn't exist in registry + result = registry.should_inject_locale("NONEXISTENT") + + # Should return False (not raise) + assert result is False + + def test_should_inject_locale_returns_false_for_function_without_marker(self) -> None: + """Test should_inject_locale when function has no marker. + + Returns False when function exists but has no marker (lines 578-579). 
+ """ + registry = FunctionRegistry() + + # Register a function without locale injection marker + def my_func(value: str, locale_code: str, /) -> str: # noqa: ARG001 - unused + return value + + registry.register(my_func, ftl_name="CUSTOM") + + # Function exists, but doesn't have _ftl_requires_locale marker + result = registry.should_inject_locale("CUSTOM") + + # Should return False (lines 578-579: getattr returns False) + assert result is False + + def test_should_inject_locale_returns_true_for_function_with_marker(self) -> None: + """Test should_inject_locale returns True when function has marker set.""" + from ftllexengine.runtime.function_bridge import ( + fluent_function, + ) + + registry = FunctionRegistry() + + # Register a function with locale injection marker + @fluent_function(inject_locale=True) + def my_format(value: str, locale_code: str, /) -> str: + return f"{value}_{locale_code}" + + registry.register(my_format, ftl_name="MYFORMAT") + + # Function has marker, should return True + result = registry.should_inject_locale("MYFORMAT") + + assert result is True + + +class TestGetExpectedPositionalArgs: + """Test lines 605-608: get_expected_positional_args method.""" + + def test_get_expected_positional_args_for_builtin_function(self) -> None: + """Test get_expected_positional_args returns count for built-in (lines 605-608).""" + from ftllexengine.runtime.functions import ( + create_default_registry, + ) + + registry = create_default_registry() + + # NUMBER is a built-in function with 1 positional arg + result = registry.get_expected_positional_args("NUMBER") + + assert result == 1 + + def test_get_expected_positional_args_for_custom_function(self) -> None: + """Test get_expected_positional_args returns None for custom function.""" + registry = FunctionRegistry() + + # Register a custom function + def my_func(value: str, locale_code: str, /) -> str: # noqa: ARG001 - unused + return value + + registry.register(my_func, ftl_name="CUSTOM") + + # Custom 
function should return None (not in BUILTIN_FUNCTIONS) + result = registry.get_expected_positional_args("CUSTOM") + + assert result is None + + +class TestGetBuiltinMetadata: + """Test lines 626-628: get_builtin_metadata method.""" + + def test_get_builtin_metadata_for_builtin_function(self) -> None: + """Test get_builtin_metadata returns metadata for built-in (lines 626-628).""" + from ftllexengine.runtime.functions import ( + create_default_registry, + ) + + registry = create_default_registry() + + # NUMBER is a built-in function + metadata = registry.get_builtin_metadata("NUMBER") + + # Should return metadata object + assert metadata is not None + assert metadata.requires_locale is True + + def test_get_builtin_metadata_for_custom_function(self) -> None: + """Test get_builtin_metadata returns None for custom function.""" + registry = FunctionRegistry() + + # Register a custom function + def my_func(value: str, locale_code: str, /) -> str: # noqa: ARG001 - unused + return value + + registry.register(my_func, ftl_name="CUSTOM") + + # Custom function should return None + metadata = registry.get_builtin_metadata("CUSTOM") + + assert metadata is None diff --git a/tests/runtime_function_bridge_cases/function_calling_tests.py b/tests/runtime_function_bridge_cases/function_calling_tests.py new file mode 100644 index 00000000..c736c7be --- /dev/null +++ b/tests/runtime_function_bridge_cases/function_calling_tests.py @@ -0,0 +1,77 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_function_bridge.py.""" + +from tests.runtime_function_bridge_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# FUNCTION CALLING TESTS +# ============================================================================ + + +class TestFunctionCalling: + """Test calling registered functions.""" + + def test_call_function_with_positional_args(self) -> None: + """Call function with only 
positional arguments.""" + registry = FunctionRegistry() + registry.register(simple_function, ftl_name="UPPER") + + result = registry.call("UPPER", ["hello"], {}) + + assert result == "HELLO" + + def test_call_function_with_named_args(self) -> None: + """Call function with named arguments.""" + registry = FunctionRegistry() + registry.register(sample_function, ftl_name="FORMAT") + + # FTL: FORMAT($value, minimumFractionDigits: 2) + result = registry.call("FORMAT", [42], {"minimumFractionDigits": 2}) + + assert result == "42.00" + + def test_call_function_with_mixed_args(self) -> None: + """Call function with both positional and named arguments.""" + registry = FunctionRegistry() + registry.register(mixed_params_function, ftl_name="MIX") + + result = registry.call("MIX", [1000], {"useGrouping": True, "dateStyle": "long"}) + assert isinstance(result, str) + assert "1,000" in result + assert "long" in result + + def test_call_function_auto_converts_camel_to_snake(self) -> None: + """Function call auto-converts FTL camelCase to Python snake_case.""" + registry = FunctionRegistry() + + def test_func(*, minimum_value: int = 0, maximum_value: int = 100) -> str: + return f"{minimum_value}-{maximum_value}" + + registry.register(test_func, ftl_name="RANGE") + + # FTL uses camelCase: minimumValue, maximumValue + result = registry.call("RANGE", [], {"minimumValue": 5, "maximumValue": 50}) + + assert result == "5-50" + + def test_call_nonexistent_function_raises_error(self) -> None: + """Calling non-existent function raises FrozenFluentError with RESOLUTION category.""" + registry = FunctionRegistry() + + with pytest.raises(FrozenFluentError, match="Function 'NONEXISTENT' not found") as exc_info: + registry.call("NONEXISTENT", [], {}) + assert exc_info.value.category == ErrorCategory.RESOLUTION + + def test_call_function_that_raises_exception(self) -> None: + """Function that raises exception is wrapped in FrozenFluentError.""" + registry = FunctionRegistry() + + def 
failing_func(_value: int) -> str: + msg = "Something went wrong" + raise ValueError(msg) + + registry.register(failing_func, ftl_name="FAIL") + + with pytest.raises(FrozenFluentError, match="Function 'FAIL' failed") as exc_info: + registry.call("FAIL", [42], {}) + assert exc_info.value.category == ErrorCategory.RESOLUTION diff --git a/tests/runtime_function_bridge_cases/function_registry_basic_tests.py b/tests/runtime_function_bridge_cases/function_registry_basic_tests.py new file mode 100644 index 00000000..ef93faea --- /dev/null +++ b/tests/runtime_function_bridge_cases/function_registry_basic_tests.py @@ -0,0 +1,93 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_function_bridge.py.""" + +from tests.runtime_function_bridge_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# FUNCTION REGISTRY BASIC TESTS +# ============================================================================ + + +class TestFunctionRegistryBasic: + """Test basic FunctionRegistry functionality.""" + + def test_create_registry(self) -> None: + """Create empty function registry.""" + registry = FunctionRegistry() + + assert not registry.has_function("NUMBER") + + def test_register_function_with_default_name(self) -> None: + """Register function with auto-generated FTL name.""" + registry = FunctionRegistry() + + def number(value: int) -> str: + return str(value) + + registry.register(number) + + assert registry.has_function("NUMBER") + assert registry.get_python_name("NUMBER") == "number" + + def test_register_function_with_custom_ftl_name(self) -> None: + """Register function with custom FTL name.""" + registry = FunctionRegistry() + + registry.register(sample_function, ftl_name="NUM_FORMAT") + + assert registry.has_function("NUM_FORMAT") + assert not registry.has_function("SAMPLE_FUNCTION") + + def test_register_function_with_custom_param_map(self) -> None: + """Register function 
with custom parameter mappings.""" + registry = FunctionRegistry() + + def custom_func(arg1: int, *, special_arg: str = "") -> str: + return f"{arg1}:{special_arg}" + + registry.register( + custom_func, + ftl_name="CUSTOM", + param_map={"customArg": "special_arg"}, + ) + + result = registry.call("CUSTOM", [42], {"customArg": "test"}) + assert result == "42:test" + + def test_register_inject_locale_function_with_incompatible_signature(self) -> None: + """Register function with inject_locale=True but wrong signature raises TypeError. + + Regression test for API-REGISTRY-SIG-MISMATCH-001. + Functions marked with inject_locale=True must have at least 2 positional + parameters to receive (value, locale_code). Registration should fail-fast + rather than allowing runtime errors. + """ + from ftllexengine.runtime.function_bridge import ( + fluent_function, + ) + + @fluent_function(inject_locale=True) + def bad_func(value: int) -> str: + """Only 1 positional param - incompatible with locale injection.""" + return str(value) + + registry = FunctionRegistry() + + with pytest.raises(TypeError, match="inject_locale=True requires at least 2 positional"): + registry.register(bad_func, ftl_name="BAD") + + def test_register_inject_locale_function_with_compatible_signature(self) -> None: + """Register function with inject_locale=True and correct signature succeeds.""" + from ftllexengine.runtime.function_bridge import ( + fluent_function, + ) + + @fluent_function(inject_locale=True) + def good_func(value: int, locale_code: str) -> str: + """2 positional params - compatible with locale injection.""" + return f"{value}@{locale_code}" + + registry = FunctionRegistry() + registry.register(good_func, ftl_name="GOOD") + + assert registry.has_function("GOOD") diff --git a/tests/runtime_function_bridge_cases/function_signature_tests.py b/tests/runtime_function_bridge_cases/function_signature_tests.py new file mode 100644 index 00000000..e3f8eac7 --- /dev/null +++ 
b/tests/runtime_function_bridge_cases/function_signature_tests.py @@ -0,0 +1,37 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_function_bridge.py.""" + +from tests.runtime_function_bridge_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# FUNCTION SIGNATURE TESTS +# ============================================================================ + + +class TestFunctionSignature: + """Test FunctionSignature dataclass.""" + + def test_create_function_signature(self) -> None: + """Create FunctionSignature with all fields.""" + sig = FunctionSignature( + python_name="test_func", + ftl_name="TEST", + param_mapping=(("minimumValue", "minimum_value"),), + callable=str, + ) + + assert sig.python_name == "test_func" + assert sig.ftl_name == "TEST" + assert sig.param_mapping == (("minimumValue", "minimum_value"),) + + def test_function_signature_immutable(self) -> None: + """FunctionSignature is immutable.""" + sig = FunctionSignature( + python_name="test", + ftl_name="TEST", + param_mapping=(), + callable=lambda: "test", + ) + + with pytest.raises(AttributeError): + sig.python_name = "new_name" # type: ignore[misc] diff --git a/tests/runtime_function_bridge_cases/helper_functions_for_testing.py b/tests/runtime_function_bridge_cases/helper_functions_for_testing.py new file mode 100644 index 00000000..117a51d3 --- /dev/null +++ b/tests/runtime_function_bridge_cases/helper_functions_for_testing.py @@ -0,0 +1,33 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_function_bridge.py.""" + +from tests.runtime_function_bridge_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# HELPER FUNCTIONS FOR TESTING +# ============================================================================ + + +def sample_function(value: int, *, minimum_fraction_digits: int = 0) -> str: + 
"""Sample function with snake_case parameters.""" + return f"{value:.{minimum_fraction_digits}f}" + + +def simple_function(text: str) -> str: + """Simple function with single parameter.""" + return text.upper() + + +def positional_only_function(value: int, /) -> str: + """Function with positional-only parameter.""" + return str(value * 2) + + +def mixed_params_function( + value: int, /, *, use_grouping: bool = False, date_style: str = "short" +) -> str: + """Function with mixed parameter types.""" + result = str(value) + if use_grouping: + result = f"{value:,}" + return f"{result} ({date_style})" diff --git a/tests/runtime_function_bridge_cases/introspection_api_tests.py b/tests/runtime_function_bridge_cases/introspection_api_tests.py new file mode 100644 index 00000000..05f60f6c --- /dev/null +++ b/tests/runtime_function_bridge_cases/introspection_api_tests.py @@ -0,0 +1,194 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_function_bridge.py.""" + +from tests.runtime_function_bridge_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# INTROSPECTION API TESTS +# ============================================================================ + + +class TestFunctionRegistryIntrospection: + """Test FunctionRegistry introspection methods.""" + + def test_list_functions_empty_registry(self) -> None: + """list_functions returns empty list for empty registry.""" + registry = FunctionRegistry() + + functions = registry.list_functions() + + assert functions == [] + + def test_list_functions_single_function(self) -> None: + """list_functions returns single function name.""" + registry = FunctionRegistry() + registry.register(simple_function, ftl_name="UPPER") + + functions = registry.list_functions() + + assert functions == ["UPPER"] + + def test_list_functions_multiple_functions(self) -> None: + """list_functions returns all registered function names.""" + registry = 
FunctionRegistry() + registry.register(simple_function, ftl_name="FUNC1") + registry.register(sample_function, ftl_name="FUNC2") + registry.register(positional_only_function, ftl_name="FUNC3") + + functions = registry.list_functions() + + assert set(functions) == {"FUNC1", "FUNC2", "FUNC3"} + assert len(functions) == 3 + + def test_get_function_info_existing_function(self) -> None: + """get_function_info returns metadata for registered function.""" + registry = FunctionRegistry() + registry.register(sample_function, ftl_name="FORMAT") + + info = registry.get_function_info("FORMAT") + + assert info is not None + assert info.python_name == "sample_function" + assert info.ftl_name == "FORMAT" + assert isinstance(info.param_mapping, tuple) + assert "minimumFractionDigits" in info.param_dict + assert info.param_dict["minimumFractionDigits"] == "minimum_fraction_digits" + assert callable(info.callable) + + def test_get_function_info_nonexistent_function(self) -> None: + """get_function_info returns None for unregistered function.""" + registry = FunctionRegistry() + + info = registry.get_function_info("NONEXISTENT") + + assert info is None + + def test_iter_empty_registry(self) -> None: + """Iterating empty registry yields no names.""" + registry = FunctionRegistry() + + names = list(registry) + + assert names == [] + + def test_iter_single_function(self) -> None: + """Iterating registry yields function names.""" + registry = FunctionRegistry() + registry.register(simple_function, ftl_name="UPPER") + + names = list(registry) + + assert names == ["UPPER"] + + def test_iter_multiple_functions(self) -> None: + """Iterating registry yields all function names.""" + registry = FunctionRegistry() + registry.register(simple_function, ftl_name="FUNC1") + registry.register(sample_function, ftl_name="FUNC2") + registry.register(positional_only_function, ftl_name="FUNC3") + + names = list(registry) + + assert set(names) == {"FUNC1", "FUNC2", "FUNC3"} + + def test_iter_for_loop(self) 
-> None: + """Can iterate registry in for loop.""" + registry = FunctionRegistry() + registry.register(simple_function, ftl_name="A") + registry.register(sample_function, ftl_name="B") + + collected_names = [] + for name in registry: + collected_names.append(name) + + assert set(collected_names) == {"A", "B"} + + def test_len_empty_registry(self) -> None: + """len() returns 0 for empty registry.""" + registry = FunctionRegistry() + + assert len(registry) == 0 + + def test_len_single_function(self) -> None: + """len() returns 1 for registry with one function.""" + registry = FunctionRegistry() + registry.register(simple_function, ftl_name="FUNC") + + assert len(registry) == 1 + + def test_len_multiple_functions(self) -> None: + """len() returns correct count for multiple functions.""" + registry = FunctionRegistry() + registry.register(simple_function, ftl_name="F1") + registry.register(sample_function, ftl_name="F2") + registry.register(positional_only_function, ftl_name="F3") + + assert len(registry) == 3 + + def test_len_after_overwrite(self) -> None: + """len() doesn't double-count after overwriting function.""" + registry = FunctionRegistry() + registry.register(simple_function, ftl_name="FUNC") + registry.register(sample_function, ftl_name="FUNC") + + assert len(registry) == 1 + + def test_contains_registered_function(self) -> None: + """'in' operator returns True for registered function.""" + registry = FunctionRegistry() + registry.register(simple_function, ftl_name="UPPER") + + assert "UPPER" in registry + + def test_contains_unregistered_function(self) -> None: + """'in' operator returns False for unregistered function.""" + registry = FunctionRegistry() + + assert "NONEXISTENT" not in registry + + def test_contains_case_sensitive(self) -> None: + """'in' operator is case-sensitive.""" + registry = FunctionRegistry() + registry.register(simple_function, ftl_name="UPPER") + + assert "UPPER" in registry + assert "upper" not in registry + assert "Upper" not 
in registry + + def test_introspection_integration(self) -> None: + """Combine introspection methods for function discovery.""" + registry = FunctionRegistry() + registry.register(simple_function, ftl_name="FUNC1") + registry.register(sample_function, ftl_name="FUNC2") + + # Check count + assert len(registry) == 2 + + # List all functions + functions = registry.list_functions() + assert len(functions) == 2 + + # Iterate and inspect each function + for name in registry: + assert name in registry + info = registry.get_function_info(name) + assert info is not None + assert info.ftl_name == name + + def test_copy_preserves_introspection(self) -> None: + """Copied registry preserves introspection capabilities.""" + original = FunctionRegistry() + original.register(simple_function, ftl_name="FUNC1") + original.register(sample_function, ftl_name="FUNC2") + + copied = original.copy() + + # Both registries have same functions + assert len(original) == len(copied) + assert set(original) == set(copied) + assert original.list_functions() == copied.list_functions() + + # Modifying copy doesn't affect original + copied.register(positional_only_function, ftl_name="FUNC3") + assert len(copied) == 3 + assert len(original) == 2 diff --git a/tests/runtime_function_bridge_cases/parameter_name_conversion_tests.py b/tests/runtime_function_bridge_cases/parameter_name_conversion_tests.py new file mode 100644 index 00000000..0746dd7f --- /dev/null +++ b/tests/runtime_function_bridge_cases/parameter_name_conversion_tests.py @@ -0,0 +1,36 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_function_bridge.py.""" + +from tests.runtime_function_bridge_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PARAMETER NAME CONVERSION TESTS +# ============================================================================ + + +class TestParameterNameConversion: + """Test snake_case <-> camelCase 
conversion.""" + + def test_to_camel_case_single_word(self) -> None: + """Convert single word (no change).""" + result = FunctionRegistry._to_camel_case("value") + + assert result == "value" + + def test_to_camel_case_two_words(self) -> None: + """Convert two_words to twoWords.""" + result = FunctionRegistry._to_camel_case("minimum_value") + + assert result == "minimumValue" + + def test_to_camel_case_multiple_words(self) -> None: + """Convert multiple_word_name to multipleWordName.""" + result = FunctionRegistry._to_camel_case("minimum_fraction_digits") + + assert result == "minimumFractionDigits" + + def test_to_camel_case_already_camel(self) -> None: + """Convert camelCase (no underscores) stays same.""" + result = FunctionRegistry._to_camel_case("alreadyCamel") + + assert result == "alreadyCamel" diff --git a/tests/runtime_function_bridge_cases/real_world_usage_tests.py b/tests/runtime_function_bridge_cases/real_world_usage_tests.py new file mode 100644 index 00000000..2a2d9382 --- /dev/null +++ b/tests/runtime_function_bridge_cases/real_world_usage_tests.py @@ -0,0 +1,62 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_function_bridge.py.""" + +from tests.runtime_function_bridge_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# REAL-WORLD USAGE TESTS +# ============================================================================ + + +class TestRealWorldUsage: + """Test realistic usage scenarios.""" + + def test_number_formatting_function(self) -> None: + """Test NUMBER-like function with real parameters.""" + registry = FunctionRegistry() + + def number_format( + value: object, + *, + minimum_fraction_digits: int = 0, # noqa: ARG001 - unused + maximum_fraction_digits: int = 3, + use_grouping: bool = False, + ) -> str: + formatted = f"{Decimal(str(value)):.{maximum_fraction_digits}f}" + if use_grouping: + # Simple grouping simulation + parts = 
formatted.split(".") + parts[0] = f"{int(parts[0]):,}" + formatted = ".".join(parts) + return formatted + + registry.register(number_format, ftl_name="NUMBER") + + # FTL: { NUMBER($price, minimumFractionDigits: 2, useGrouping: true) } + result = registry.call( + "NUMBER", + [Decimal("1234.5")], + {"minimumFractionDigits": 2, "useGrouping": True}, + ) + assert isinstance(result, str) + assert "1,234" in result + + def test_datetime_formatting_function(self) -> None: + """Test DATETIME-like function with style parameters.""" + registry = FunctionRegistry() + + def datetime_format( + value: str, *, date_style: str = "short", time_style: str = "short" + ) -> str: + return f"{value} ({date_style}/{time_style})" + + registry.register(datetime_format, ftl_name="DATETIME") + + # FTL: { DATETIME($date, dateStyle: "long", timeStyle: "medium") } + result = registry.call( + "DATETIME", + ["2024-01-15"], + {"dateStyle": "long", "timeStyle": "medium"}, + ) + + assert result == "2024-01-15 (long/medium)" diff --git a/tests/runtime_function_bridge_cases/registry_query_tests.py b/tests/runtime_function_bridge_cases/registry_query_tests.py new file mode 100644 index 00000000..80392f51 --- /dev/null +++ b/tests/runtime_function_bridge_cases/registry_query_tests.py @@ -0,0 +1,42 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_function_bridge.py.""" + +from tests.runtime_function_bridge_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# REGISTRY QUERY TESTS +# ============================================================================ + + +class TestRegistryQueries: + """Test registry query methods.""" + + def test_has_function_returns_true_when_registered(self) -> None: + """has_function returns True for registered function.""" + registry = FunctionRegistry() + registry.register(simple_function, ftl_name="UPPER") + + assert registry.has_function("UPPER") + + def 
test_has_function_returns_false_when_not_registered(self) -> None: + """has_function returns False for unregistered function.""" + registry = FunctionRegistry() + + assert not registry.has_function("UNKNOWN") + + def test_get_python_name_returns_name_when_registered(self) -> None: + """get_python_name returns Python function name.""" + registry = FunctionRegistry() + registry.register(simple_function, ftl_name="UPPER") + + python_name = registry.get_python_name("UPPER") + + assert python_name == "simple_function" + + def test_get_python_name_returns_none_when_not_registered(self) -> None: + """get_python_name returns None for unregistered function.""" + registry = FunctionRegistry() + + python_name = registry.get_python_name("UNKNOWN") + + assert python_name is None diff --git a/tests/runtime_locale_context_cases/__init__.py b/tests/runtime_locale_context_cases/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/runtime_locale_context_cases/boundaries_and_extras.py b/tests/runtime_locale_context_cases/boundaries_and_extras.py new file mode 100644 index 00000000..a834d929 --- /dev/null +++ b/tests/runtime_locale_context_cases/boundaries_and_extras.py @@ -0,0 +1,458 @@ +# mypy: ignore-errors +# mypy: ignore-errors +from __future__ import annotations + +import logging +from datetime import UTC, datetime +from decimal import Decimal +from typing import Literal, get_args +from unittest.mock import MagicMock + +import pytest +from babel import numbers as babel_numbers + +from ftllexengine.constants import MAX_LOCALE_CACHE_SIZE +from ftllexengine.core.locale_utils import normalize_locale +from ftllexengine.runtime.locale_context import LocaleContext + +# ============================================================================ +# Construction Guard Tests +# ============================================================================ + + + +class TestLongLocaleCodeCoverage: + """Tests for long locale codes exceeding BCP 47 length.""" + + def 
test_long_valid_locale_code_warns( + self, caplog: pytest.LogCaptureFixture + ) -> None: + """Long valid locale code triggers warning.""" + from ftllexengine.constants import ( + MAX_LOCALE_CODE_LENGTH, + ) + + LocaleContext.clear_cache() + + long_locale = "en-US-x-" + "a" * 30 + assert len(long_locale) > MAX_LOCALE_CODE_LENGTH + + with caplog.at_level(logging.WARNING): + ctx = LocaleContext.create(long_locale) + + assert any( + "exceeds typical BCP 47 length" in r.message + for r in caplog.records + ) + assert isinstance(ctx, LocaleContext) + + def test_long_unknown_locale_code_warns( + self, caplog: pytest.LogCaptureFixture + ) -> None: + """Long unknown locale code triggers specific warning.""" + from ftllexengine.constants import ( + MAX_LOCALE_CODE_LENGTH, + ) + + LocaleContext.clear_cache() + + long_unknown = ( + "xyz-verylongvariantthatshouldexceedlimit" + ) + assert len(long_unknown) > MAX_LOCALE_CODE_LENGTH + + with caplog.at_level(logging.WARNING): + ctx = LocaleContext.create(long_unknown) + + relevant = [ + r.message + for r in caplog.records + if "Unknown locale" in r.message + ] + assert any("exceeds" in msg for msg in relevant) + assert ctx.is_fallback is True + + def test_long_invalid_format_locale_code_rejected(self) -> None: + """Long structurally invalid locale code is rejected before runtime fallback.""" + from ftllexengine.constants import ( + MAX_LOCALE_CODE_LENGTH, + ) + + LocaleContext.clear_cache() + + long_invalid = ( + "!!!INVALID@@@FORMAT###TOOLONG###LOCALE" + ) + assert len(long_invalid) > MAX_LOCALE_CODE_LENGTH + + with pytest.raises(ValueError, match=r"Invalid locale_code:"): + LocaleContext.create(long_invalid) + +class TestCurrencyBoundaryValues: + """Regression tests for currency formatting boundaries.""" + + @pytest.mark.parametrize("value", [ + Decimal(999), + Decimal("999.99"), + Decimal(1000), + Decimal("1000.00"), + Decimal("1000.01"), + Decimal(1001), + ]) + def test_currency_around_1000_boundary( + self, value: Decimal + ) 
-> None: + """Currency formatting works around 1000 boundary.""" + ctx = LocaleContext.create("en_US") + result = ctx.format_currency(value, currency="USD") + assert isinstance(result, str) + assert result + assert "$" in result or "USD" in result + + @pytest.mark.parametrize("locale", [ + "en_US", "de_DE", "fr_FR", "es_ES", "ja_JP", + "zh_CN", "ar_SA", "ru_RU", "pt_BR", "ko_KR", + "it_IT", "nl_NL", + ]) + def test_currency_1000_across_locales( + self, locale: str + ) -> None: + """Currency formatting for 1000 across locales.""" + ctx = LocaleContext.create(locale) + result = ctx.format_currency( + Decimal(1000), currency="USD" + ) + assert isinstance(result, str) + assert result + assert any(c.isdigit() for c in result) + + @pytest.mark.parametrize("value", [ + Decimal(-1000), + Decimal("-1000.00"), + ]) + def test_negative_1000_currency( + self, value: Decimal + ) -> None: + """Negative 1000 currency values format correctly.""" + ctx = LocaleContext.create("en_US") + result = ctx.format_currency(value, currency="USD") + assert isinstance(result, str) + assert result + assert "-" in result or "(" in result + + @pytest.mark.parametrize("currency", [ + "USD", "EUR", "GBP", "JPY", "CNY", "CHF", "CAD", + "AUD", + ]) + def test_currency_1000_multiple_currencies( + self, currency: str + ) -> None: + """Currency formatting for 1000 with currencies.""" + ctx = LocaleContext.create("en_US") + result = ctx.format_currency( + Decimal(1000), currency=currency + ) + assert isinstance(result, str) + assert result + assert any(c.isdigit() for c in result) + + def test_currency_1000_all_display_modes(self) -> None: + """Currency formatting 1000 with all display modes.""" + ctx = LocaleContext.create("en_US") + value = Decimal(1000) + + result_symbol = ctx.format_currency( + value, currency="USD", currency_display="symbol" + ) + assert "$" in result_symbol + + result_code = ctx.format_currency( + value, currency="USD", currency_display="code" + ) + assert "USD" in result_code + + 
result_name = ctx.format_currency( + value, currency="USD", currency_display="name" + ) + assert "dollar" in result_name.lower() + + def test_currency_integer_1000(self) -> None: + """Currency formatting handles int 1000.""" + ctx = LocaleContext.create("en_US") + result = ctx.format_currency(1000, currency="USD") + assert isinstance(result, str) + assert "$" in result or "USD" in result + + def test_currency_decimal_1000(self) -> None: + """Currency formatting handles Decimal 1000.""" + ctx = LocaleContext.create("en_US") + result = ctx.format_currency(Decimal(1000), currency="USD") + assert isinstance(result, str) + assert "$" in result or "USD" in result + +class TestLocaleContextCoverageExtra: + """Test LocaleContext cache_info and datetime formatting branches.""" + + def test_cache_info_returns_dict(self) -> None: + """cache_info() returns dict with size, max_size, and locales keys.""" + LocaleContext.clear_cache() + + LocaleContext.create("en-US") + LocaleContext.create("de-DE") + + info = LocaleContext.cache_info() + + assert isinstance(info, dict) + assert "size" in info + assert "max_size" in info + assert "locales" in info + assert info["size"] == 2 + locales = info["locales"] + assert isinstance(locales, tuple) + assert "en_us" in locales or "de_de" in locales + + def test_format_datetime_combined_styles(self) -> None: + """format_datetime with both date and time styles produces non-empty string.""" + ctx = LocaleContext.create("en-US") + + dt = datetime(2024, 6, 15, 14, 30, 0, tzinfo=UTC) + result = ctx.format_datetime(dt, date_style="medium", time_style="short") + + assert isinstance(result, str) + assert len(result) > 0 + + def test_format_datetime_date_only(self) -> None: + """format_datetime with date_style only produces non-empty string.""" + ctx = LocaleContext.create("en-US") + + dt = datetime(2024, 12, 25, 0, 0, 0, tzinfo=UTC) + result = ctx.format_datetime(dt, date_style="long") + + assert isinstance(result, str) + assert "2024" in result or 
"December" in result + +class TestLocaleContextBranchCoverageExtra: + """Additional tests for locale_context formatting branch coverage.""" + + def test_format_datetime_with_string_pattern(self) -> None: + """format_datetime with both date and time styles covers combined-pattern path.""" + ctx = LocaleContext.create("en-US") + + dt = datetime(2024, 1, 15, 10, 30, 0, tzinfo=UTC) + + result1 = ctx.format_datetime(dt, date_style="short", time_style="short") + assert isinstance(result1, str) + + result2 = ctx.format_datetime(dt, date_style="full", time_style="full") + assert isinstance(result2, str) + + def test_format_datetime_varied_styles(self) -> None: + """All standard datetime style combinations produce non-empty strings.""" + type _DateTimeStyle = Literal["short", "medium", "long", "full"] + styles: tuple[_DateTimeStyle, ...] = get_args(_DateTimeStyle) + + ctx = LocaleContext.create("en-US") + dt = datetime(2024, 3, 15, 10, 30, 0, tzinfo=UTC) + + for date_style in styles: + for time_style in styles: + result = ctx.format_datetime( + dt, date_style=date_style, time_style=time_style + ) + assert isinstance(result, str) + assert len(result) > 0 + +class TestLocaleContextCacheRaceCondition: + """LocaleContext cache handles the double-check locking pattern.""" + + def test_cache_hit_in_double_check_pattern(self) -> None: + """Cache hit during the inner lock check returns the cached instance.""" + LocaleContext.clear_cache() + + locale_code = "en_US" + ctx = LocaleContext.create(locale_code) + + cache_key = normalize_locale(locale_code) + with LocaleContext._cache_lock: # pylint: disable=protected-access + LocaleContext._cache.clear() # pylint: disable=protected-access + LocaleContext._cache[cache_key] = ctx # pylint: disable=protected-access + + result = LocaleContext.create(locale_code) + assert result is ctx + + LocaleContext.clear_cache() + +class TestLocaleContextDatetimePattern: + """LocaleContext formats datetime values using the locale's pattern.""" + + def 
test_datetime_pattern_without_format_method(self) -> None: + """format_datetime produces a non-empty string for en_US short styles.""" + LocaleContext.clear_cache() + + ctx = LocaleContext.create("en_US") + dt = datetime(2025, 6, 15, 14, 30, 0, tzinfo=UTC) + + result = ctx.format_datetime(dt, date_style="short", time_style="short") + assert result is not None + assert len(result) > 0 + + LocaleContext.clear_cache() + +class TestLocaleContextCacheLimitCoverage: + """LocaleContext LRU cache eviction at MAX_LOCALE_CACHE_SIZE.""" + + def test_cache_at_limit_evicts_lru_entry(self) -> None: + """When cache reaches MAX_LOCALE_CACHE_SIZE, LRU entry is evicted on next create.""" + LocaleContext.clear_cache() + + locales_to_fill = [f"en_TEST{i:04d}" for i in range(MAX_LOCALE_CACHE_SIZE)] + + for locale_code in locales_to_fill: + ctx = LocaleContext.create(locale_code) + assert ctx is not None + + cache_size = LocaleContext.cache_size() + assert cache_size >= MAX_LOCALE_CACHE_SIZE + + size_before = cache_size + + ctx = LocaleContext.create("de_TESTOVERFLOW") + assert ctx is not None + + cache_size_after = LocaleContext.cache_size() + assert cache_size_after <= MAX_LOCALE_CACHE_SIZE + assert cache_size_after <= size_before + 1 + + LocaleContext.clear_cache() + +class TestLocaleContextUnexpectedErrorPropagation: + """Unexpected errors propagate instead of being silently caught.""" + + def test_format_number_unexpected_error_propagates( + self, monkeypatch: pytest.MonkeyPatch + ) -> None: + """RuntimeError in format_number propagates for debugging.""" + ctx = LocaleContext.create_or_raise("en_US") + + def mock_format_decimal(*_args: object, **_kwargs: object) -> str: + msg = "Mocked RuntimeError for testing" + raise RuntimeError(msg) + + monkeypatch.setattr(babel_numbers, "format_decimal", mock_format_decimal) + + with pytest.raises(RuntimeError, match="Mocked RuntimeError"): + ctx.format_number(Decimal("123.45")) + + def test_format_currency_unexpected_error_propagates( + self, 
monkeypatch: pytest.MonkeyPatch + ) -> None: + """RuntimeError in format_currency propagates for debugging.""" + ctx = LocaleContext.create_or_raise("en_US") + + def mock_format_currency(*_args: object, **_kwargs: object) -> str: + msg = "Mocked RuntimeError for testing" + raise RuntimeError(msg) + + monkeypatch.setattr(babel_numbers, "format_currency", mock_format_currency) + + with pytest.raises(RuntimeError, match="Mocked RuntimeError"): + ctx.format_currency(Decimal(100), currency="USD") + +class TestLocaleContextCustomPatternCoverage: + """Custom pattern and currency code fallback branches in format_currency.""" + + def test_format_currency_with_custom_pattern(self) -> None: + """Custom pattern in format_currency is applied to the result.""" + ctx = LocaleContext.create_or_raise("en_US") + + result = ctx.format_currency(Decimal("1234.56"), currency="USD", pattern="#,##0.00 \xa4") + + assert isinstance(result, str) + assert "1,234.56" in result or "1234.56" in result + + def test_format_currency_code_display_fallback( + self, caplog: pytest.LogCaptureFixture, monkeypatch: pytest.MonkeyPatch + ) -> None: + """format_currency logs debug when locale pattern lacks currency placeholder.""" + mock_locale = MagicMock() + mock_pattern = MagicMock() + mock_pattern.pattern = "#,##0.00" # No currency placeholder (missing \xa4) + mock_locale.currency_formats = {"standard": mock_pattern} + + ctx = LocaleContext.create_or_raise("en_US") + original_babel_locale = ctx._babel_locale # pylint: disable=protected-access + + object.__setattr__(ctx, "_babel_locale", mock_locale) + + monkeypatch.setattr( + babel_numbers, + "format_currency", + lambda *_args, **_kwargs: "$100.00", + ) + + try: + with caplog.at_level(logging.DEBUG): + result = ctx.format_currency( + Decimal(100), currency="USD", currency_display="code" + ) + + assert isinstance(result, str) + + assert any( + "lacks placeholder" in record.message + for record in caplog.records + ) + finally: + object.__setattr__(ctx, 
"_babel_locale", original_babel_locale) + +class TestLocaleContextCurrencyCodeFallback: + """Currency code display fallback when standard_pattern is None or lacks attributes.""" + + def test_format_currency_code_no_standard_pattern( + self, monkeypatch: pytest.MonkeyPatch + ) -> None: + """format_currency falls through to default when standard_pattern is None.""" + ctx = LocaleContext.create_or_raise("en_US") + original_babel_locale = ctx._babel_locale # pylint: disable=protected-access + + mock_locale = MagicMock() + mock_locale.currency_formats = {"standard": None} + + object.__setattr__(ctx, "_babel_locale", mock_locale) + + monkeypatch.setattr( + babel_numbers, + "format_currency", + lambda *_args, **_kwargs: "$100.00", + ) + + try: + result = ctx.format_currency(Decimal(100), currency="USD", currency_display="code") + assert isinstance(result, str) + finally: + object.__setattr__(ctx, "_babel_locale", original_babel_locale) + + def test_format_currency_code_pattern_no_attr( + self, monkeypatch: pytest.MonkeyPatch + ) -> None: + """format_currency falls through to default when standard_pattern lacks 'pattern' attr.""" + ctx = LocaleContext.create_or_raise("en_US") + original_babel_locale = ctx._babel_locale # pylint: disable=protected-access + + mock_locale = MagicMock() + mock_pattern = object() # Plain object with no attributes + mock_locale.currency_formats = {"standard": mock_pattern} + + object.__setattr__(ctx, "_babel_locale", mock_locale) + + monkeypatch.setattr( + babel_numbers, + "format_currency", + lambda *_args, **_kwargs: "$100.00", + ) + + try: + result = ctx.format_currency(Decimal(100), currency="USD", currency_display="code") + assert isinstance(result, str) + finally: + object.__setattr__(ctx, "_babel_locale", original_babel_locale) diff --git a/tests/runtime_locale_context_cases/construction_and_cache.py b/tests/runtime_locale_context_cases/construction_and_cache.py new file mode 100644 index 00000000..c862e7eb --- /dev/null +++ 
b/tests/runtime_locale_context_cases/construction_and_cache.py @@ -0,0 +1,452 @@ +# mypy: ignore-errors +# mypy: ignore-errors +from __future__ import annotations + +import logging +import sys +import threading +from typing import Any +from unittest.mock import patch + +import pytest +from babel import Locale + +import ftllexengine.core.babel_compat as _bc +from ftllexengine.constants import MAX_LOCALE_CACHE_SIZE +from ftllexengine.core.babel_compat import BabelImportError +from ftllexengine.core.locale_utils import normalize_locale +from ftllexengine.runtime.locale_context import LocaleContext + +# ============================================================================ +# Construction Guard Tests +# ============================================================================ + + + +class TestLocaleContextConstructionGuard: + """Test __post_init__ validation prevents direct construction.""" + + def test_direct_construction_without_token_raises(self) -> None: + """Direct construction without factory token raises TypeError.""" + babel_locale = Locale.parse("en_US") + + with pytest.raises(TypeError) as exc_info: + LocaleContext( + locale_code="en-US", + _babel_locale=babel_locale, + ) + + error_msg = str(exc_info.value) + assert "LocaleContext.create()" in error_msg + assert "LocaleContext.create_or_raise()" in error_msg + assert "direct construction" in error_msg + + def test_direct_construction_with_wrong_token_raises(self) -> None: + """Direct construction with invalid token raises TypeError.""" + babel_locale = Locale.parse("en_US") + wrong_token = object() + + with pytest.raises(TypeError) as exc_info: + LocaleContext( + locale_code="en-US", + _babel_locale=babel_locale, + _factory_token=wrong_token, + ) + + assert "LocaleContext.create()" in str(exc_info.value) + + def test_direct_construction_with_none_token_raises(self) -> None: + """Direct construction with None token raises TypeError.""" + babel_locale = Locale.parse("en_US") + + with 
pytest.raises(TypeError) as exc_info: + LocaleContext( + locale_code="en-US", + _babel_locale=babel_locale, + _factory_token=None, + ) + + error_msg = str(exc_info.value) + assert "LocaleContext.create()" in error_msg + assert "direct construction" in error_msg + + def test_factory_methods_bypass_guard(self) -> None: + """Factory methods bypass __post_init__ guard successfully.""" + ctx1 = LocaleContext.create("en-US") + assert isinstance(ctx1, LocaleContext) + + ctx2 = LocaleContext.create_or_raise("de-DE") + assert isinstance(ctx2, LocaleContext) + +class TestLocaleContextCacheManagement: + """Test LocaleContext cache operations.""" + + def test_clear_cache_empties_cache(self) -> None: + """clear_cache() empties the cache.""" + LocaleContext.clear_cache() + LocaleContext.create("en-US") + LocaleContext.create("de-DE") + assert LocaleContext.cache_size() > 0 + + LocaleContext.clear_cache() + assert LocaleContext.cache_size() == 0 + + def test_cache_size_returns_count(self) -> None: + """cache_size() returns number of cached instances.""" + LocaleContext.clear_cache() + assert LocaleContext.cache_size() == 0 + + LocaleContext.create("en-US") + assert LocaleContext.cache_size() == 1 + + LocaleContext.create("de-DE") + assert LocaleContext.cache_size() == 2 + + def test_cache_info_returns_dict(self) -> None: + """cache_info() returns dictionary with expected keys.""" + LocaleContext.clear_cache() + LocaleContext.create("en-US") + LocaleContext.create("de-DE") + + info = LocaleContext.cache_info() + + assert isinstance(info, dict) + assert "size" in info + assert "max_size" in info + assert "locales" in info + assert isinstance(info["locales"], tuple) + assert info["size"] == 2 + + def test_cache_info_after_clear(self) -> None: + """cache_info() returns empty after clearing.""" + LocaleContext.clear_cache() + LocaleContext.create("en-US") + + LocaleContext.clear_cache() + info = LocaleContext.cache_info() + + assert info["size"] == 0 + assert info["locales"] == () + + 
def test_cache_returns_same_instance(self) -> None: + """Cache returns the same instance for same locale.""" + LocaleContext.clear_cache() + + ctx1 = LocaleContext.create("en-US") + ctx2 = LocaleContext.create("en-US") + + assert ctx1 is ctx2 + + def test_cache_double_check_pattern(self) -> None: + """Cache double-check pattern returns existing instance.""" + from ftllexengine.core.locale_utils import ( + normalize_locale, + ) + from ftllexengine.runtime.locale_context import ( + _FACTORY_TOKEN, + ) + + LocaleContext.clear_cache() + + cache_key = normalize_locale("en-RACE-TEST") + pre_inserted_ctx = LocaleContext( + locale_code="en-RACE-TEST", + _babel_locale=Locale.parse("en_US"), + _factory_token=_FACTORY_TOKEN, + ) + + original_parse = Locale.parse + + def parse_with_insertion( + code: str, *args: Any, **kwargs: Any + ) -> Locale: + with LocaleContext._cache_lock: + if cache_key not in LocaleContext._cache: + LocaleContext._cache[cache_key] = ( + pre_inserted_ctx + ) + return original_parse(code, *args, **kwargs) + + with patch.object( + Locale, "parse", side_effect=parse_with_insertion + ): + result = LocaleContext.create("en-RACE-TEST") + + assert result is pre_inserted_ctx + + def test_cache_thread_safety(self) -> None: + """Cache is thread-safe under concurrent access.""" + LocaleContext.clear_cache() + + results: list[LocaleContext] = [] + + def create_context() -> None: + ctx = LocaleContext.create("en-US") + results.append(ctx) + + thread1 = threading.Thread(target=create_context) + thread2 = threading.Thread(target=create_context) + + thread1.start() + thread2.start() + thread1.join() + thread2.join() + + assert len(results) == 2 + assert results[0] is results[1] + + def test_cache_eviction_on_max_size(self) -> None: + """Cache evicts LRU entry when max size reached.""" + LocaleContext.clear_cache() + + locales = ["en-US"] + [ + f"de-DE-x-variant{i}" + for i in range(MAX_LOCALE_CACHE_SIZE) + ] + + for locale in locales[:MAX_LOCALE_CACHE_SIZE]: + 
LocaleContext.create(locale) + + assert ( + LocaleContext.cache_size() == MAX_LOCALE_CACHE_SIZE + ) + + LocaleContext.create(locales[MAX_LOCALE_CACHE_SIZE]) + + assert ( + LocaleContext.cache_size() == MAX_LOCALE_CACHE_SIZE + ) + + info = LocaleContext.cache_info() + locales_tuple = info["locales"] + assert isinstance(locales_tuple, tuple) + assert "en_US" not in locales_tuple + + def test_clear_cache_and_recreate(self) -> None: + """Cache clearing and recreation works correctly.""" + LocaleContext.clear_cache() + + ctx1 = LocaleContext.create("fr-FR") + assert ctx1.locale_code == "fr_fr" + + ctx2 = LocaleContext.create("fr-FR") + assert ctx1 is ctx2 + + LocaleContext.clear_cache() + ctx3 = LocaleContext.create("fr-FR") + assert ctx1 is not ctx3 + +class TestLocaleContextCreate: + """Test LocaleContext.create() factory with graceful fallback.""" + + def test_create_valid_locale(self) -> None: + """create() returns LocaleContext for valid locale.""" + ctx = LocaleContext.create("en-US") + assert isinstance(ctx, LocaleContext) + assert ctx.locale_code == "en_us" + + def test_create_unknown_locale_returns_context(self) -> None: + """create() returns LocaleContext for unknown locale.""" + LocaleContext.clear_cache() + result = LocaleContext.create("xx-UNKNOWN") + + assert isinstance(result, LocaleContext) + assert result.locale_code == "xx_unknown" + assert result.is_fallback is True + + def test_create_unknown_locale_warns( + self, caplog: pytest.LogCaptureFixture + ) -> None: + """create() logs warning for unknown locale.""" + LocaleContext.clear_cache() + + with caplog.at_level(logging.WARNING): + LocaleContext.create("xx_INVALID") + + assert any( + "Unknown locale" in r.message + or "xx_INVALID" in r.message + for r in caplog.records + ) + + def test_create_invalid_format_raises(self) -> None: + """create() rejects structurally invalid locale boundary values.""" + LocaleContext.clear_cache() + + with pytest.raises(ValueError, match=r"Invalid locale_code: 
'!!!INVALID@@@'"): + LocaleContext.create("!!!INVALID@@@") + + def test_create_unknown_locale_uses_en_us(self) -> None: + """create() uses en_US formatting for unknown locales.""" + ctx = LocaleContext.create("invalid-locale-xyz") + locale = ctx.babel_locale + + assert locale.language == "en" + +class TestLocaleContextCreateOrRaise: + """Test create_or_raise() factory with strict validation.""" + + def test_create_or_raise_valid_locale(self) -> None: + """create_or_raise() returns LocaleContext for valid locale.""" + ctx = LocaleContext.create_or_raise("en-US") + assert isinstance(ctx, LocaleContext) + assert ctx.locale_code == "en_us" + assert ctx.is_fallback is False + + def test_create_or_raise_unknown_locale_raises(self) -> None: + """create_or_raise() raises ValueError for unknown locale.""" + with pytest.raises( + ValueError, match=r"Unknown locale identifier" + ): + LocaleContext.create_or_raise("xx-INVALID") + + def test_create_or_raise_invalid_format_raises(self) -> None: + """create_or_raise() raises ValueError for invalid format.""" + with pytest.raises(ValueError, match=r"Invalid locale_code: 'not a valid locale'"): + LocaleContext.create_or_raise( + "not a valid locale" + ) + + def test_create_or_raise_error_contains_locale_code( + self, + ) -> None: + """create_or_raise() error message includes locale code.""" + test_locales = ["bad-locale", "xyz-123"] + + for locale_code in test_locales: + with pytest.raises( + ValueError, match="locale" + ) as exc_info: + LocaleContext.create_or_raise(locale_code) + + assert normalize_locale(locale_code) in str(exc_info.value) + +class TestLocaleContextBabelImportErrors: + """Test ImportError paths when Babel is not installed.""" + + def test_create_raises_babel_import_error(self) -> None: + """create() raises BabelImportError when Babel unavailable.""" + LocaleContext.clear_cache() + + babel_module = sys.modules.pop("babel", None) + babel_core = sys.modules.pop("babel.core", None) + babel_dates_mod = 
sys.modules.pop("babel.dates", None) + babel_nums = sys.modules.pop("babel.numbers", None) + + # Reset sentinel so _check_babel_available() re-evaluates under the mock + _bc._babel_available = None + + try: + with patch.dict(sys.modules, {"babel": None}): + original_import = __import__ + + def mock_import( + name: str, + globals_dict: ( + dict[str, object] | None + ) = None, + locals_dict: ( + dict[str, object] | None + ) = None, + fromlist: tuple[str, ...] = (), + level: int = 0, + ) -> object: + if name == "babel": + err = ModuleNotFoundError("No module named 'babel'") + err.name = "babel" + raise err + return original_import( + name, + globals_dict, + locals_dict, + fromlist, + level, + ) + + with patch( + "builtins.__import__", + side_effect=mock_import, + ): + with pytest.raises( + BabelImportError + ) as exc_info: + LocaleContext.create("en-US") + + assert "LocaleContext.create" in str( + exc_info.value + ) + finally: + if babel_module is not None: + sys.modules["babel"] = babel_module + if babel_core is not None: + sys.modules["babel.core"] = babel_core + if babel_dates_mod is not None: + sys.modules["babel.dates"] = babel_dates_mod + if babel_nums is not None: + sys.modules["babel.numbers"] = babel_nums + # Reset sentinel so subsequent tests reinitialize with Babel available + _bc._babel_available = None + LocaleContext.clear_cache() + + def test_create_or_raise_raises_babel_import_error( + self, + ) -> None: + """create_or_raise() raises BabelImportError.""" + babel_module = sys.modules.pop("babel", None) + babel_core = sys.modules.pop("babel.core", None) + babel_dates_mod = sys.modules.pop("babel.dates", None) + babel_nums = sys.modules.pop("babel.numbers", None) + + # Reset sentinel so _check_babel_available() re-evaluates under the mock + _bc._babel_available = None + + try: + with patch.dict(sys.modules, {"babel": None}): + original_import = __import__ + + def mock_import( + name: str, + globals_dict: ( + dict[str, object] | None + ) = None, + 
locals_dict: ( + dict[str, object] | None + ) = None, + fromlist: tuple[str, ...] = (), + level: int = 0, + ) -> object: + if name == "babel": + err = ModuleNotFoundError("No module named 'babel'") + err.name = "babel" + raise err + return original_import( + name, + globals_dict, + locals_dict, + fromlist, + level, + ) + + with patch( + "builtins.__import__", + side_effect=mock_import, + ): + with pytest.raises( + BabelImportError + ) as exc_info: + LocaleContext.create_or_raise("en-US") + + assert "create_or_raise" in str( + exc_info.value + ) + finally: + if babel_module is not None: + sys.modules["babel"] = babel_module + if babel_core is not None: + sys.modules["babel.core"] = babel_core + if babel_dates_mod is not None: + sys.modules["babel.dates"] = babel_dates_mod + if babel_nums is not None: + sys.modules["babel.numbers"] = babel_nums + # Reset sentinel so subsequent tests reinitialize with Babel available + _bc._babel_available = None diff --git a/tests/runtime_locale_context_cases/datetime_and_currency.py b/tests/runtime_locale_context_cases/datetime_and_currency.py new file mode 100644 index 00000000..be255fb5 --- /dev/null +++ b/tests/runtime_locale_context_cases/datetime_and_currency.py @@ -0,0 +1,397 @@ +# mypy: ignore-errors +from __future__ import annotations + +import logging +from datetime import UTC, datetime +from decimal import Decimal +from unittest.mock import MagicMock, PropertyMock, patch + +import pytest +from babel import dates as babel_dates +from babel import numbers as babel_numbers + +from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError +from ftllexengine.runtime.locale_context import LocaleContext + +# ============================================================================ +# Construction Guard Tests +# ============================================================================ + + + +class TestFormatDatetime: + """Test format_datetime() with various locales and parameters.""" + + def 
test_format_datetime_en_us_short(self) -> None: + """format_datetime() with short style for en-US.""" + ctx = LocaleContext.create("en-US") + dt = datetime(2025, 10, 27, 14, 30, tzinfo=UTC) + result = ctx.format_datetime(dt, date_style="short") + assert "10" in result or "27" in result + + def test_format_datetime_de_de_short(self) -> None: + """format_datetime() with short style for de-DE.""" + ctx = LocaleContext.create("de-DE") + dt = datetime(2025, 10, 27, 14, 30, tzinfo=UTC) + result = ctx.format_datetime(dt, date_style="short") + assert "27" in result or "10" in result + + def test_format_datetime_custom_pattern(self) -> None: + """format_datetime() respects custom pattern.""" + ctx = LocaleContext.create("en-US") + dt = datetime(2025, 10, 27, 14, 30, tzinfo=UTC) + result = ctx.format_datetime(dt, pattern="yyyy-MM-dd") + assert "2025" in result + assert "10" in result + assert "27" in result + + def test_format_datetime_from_iso_string(self) -> None: + """format_datetime() accepts ISO 8601 string.""" + ctx = LocaleContext.create("en-US") + result = ctx.format_datetime( + "2025-10-27", date_style="short" + ) + assert "10" in result or "27" in result + + def test_format_datetime_invalid_string_raises( + self, + ) -> None: + """format_datetime() raises for invalid datetime string.""" + ctx = LocaleContext.create("en-US") + with pytest.raises(FrozenFluentError) as exc_info: + ctx.format_datetime( + "not-a-date", date_style="short" + ) + assert ( + exc_info.value.category == ErrorCategory.FORMATTING + ) + assert "not ISO 8601 format" in str(exc_info.value) + + def test_format_datetime_with_time_style(self) -> None: + """format_datetime() formats date and time together.""" + ctx = LocaleContext.create("en-US") + dt = datetime(2025, 10, 27, 14, 30, tzinfo=UTC) + result = ctx.format_datetime( + dt, date_style="short", time_style="short" + ) + assert "10" in result or "27" in result + has_time = ( + "14" in result + or "2" in result + or "30" in result + ) + assert 
has_time + + def test_format_datetime_string_pattern(self) -> None: + """format_datetime() handles string datetime_pattern.""" + ctx = LocaleContext.create("en-US") + dt = datetime(2025, 10, 27, 14, 30, tzinfo=UTC) + + with patch.object( + ctx.babel_locale.datetime_formats, "get" + ) as mock_get: + mock_get.return_value = "{1} at {0}" + result = ctx.format_datetime( + dt, date_style="medium", time_style="short" + ) + assert "at" in result + + def test_format_datetime_object_without_format_method( + self, + ) -> None: + """format_datetime() when pattern lacks format().""" + ctx = LocaleContext.create("en-US") + dt = datetime(2025, 7, 15, 10, 30, 0, tzinfo=UTC) + + class PatternWithoutFormat: + """Mock pattern without format() method.""" + + def __str__(self) -> str: + return "{1} @ {0}" + + mock_pattern = PatternWithoutFormat() + assert not hasattr(mock_pattern, "format") + + with patch.object( + ctx.babel_locale.datetime_formats, + "get", + return_value=mock_pattern, + ): + result = ctx.format_datetime( + dt, date_style="medium", time_style="short" + ) + assert " @ " in result + + def test_format_datetime_error_raises_formatting_error( + self, monkeypatch: pytest.MonkeyPatch + ) -> None: + """format_datetime() raises FrozenFluentError on error.""" + def mock_format_date( + *_args: object, **_kwargs: object + ) -> None: + msg = "Mocked format error" + raise ValueError(msg) + + monkeypatch.setattr( + babel_dates, "format_date", mock_format_date + ) + + ctx = LocaleContext.create("en-US") + dt = datetime(2025, 10, 27, 14, 30, tzinfo=UTC) + + with pytest.raises(FrozenFluentError) as exc_info: + ctx.format_datetime(dt, date_style="short") + assert ( + exc_info.value.category == ErrorCategory.FORMATTING + ) + +class TestFormatCurrency: + """Test format_currency() with various locales and parameters.""" + + def test_format_currency_en_us_symbol(self) -> None: + """format_currency() with symbol for en-US.""" + ctx = LocaleContext.create("en-US") + result = 
ctx.format_currency( + Decimal("123.45"), currency="EUR" + ) + assert "123" in result + + def test_format_currency_lv_lv_symbol(self) -> None: + """format_currency() with symbol for lv-LV.""" + ctx = LocaleContext.create("lv-LV") + result = ctx.format_currency( + Decimal("123.45"), currency="EUR" + ) + assert "123" in result + + def test_format_currency_code_display(self) -> None: + """format_currency() displays currency code.""" + ctx = LocaleContext.create("en-US") + result = ctx.format_currency( + Decimal("123.45"), + currency="USD", + currency_display="code", + ) + assert "USD" in result + assert "123.45" in result + + def test_format_currency_name_display(self) -> None: + """format_currency() displays currency name.""" + ctx = LocaleContext.create("en-US") + result = ctx.format_currency( + Decimal("123.45"), + currency="USD", + currency_display="name", + ) + assert isinstance(result, str) + + def test_format_currency_symbol_display_standard( + self, + ) -> None: + """format_currency() with explicit symbol display.""" + ctx = LocaleContext.create("en-US") + result = ctx.format_currency( + Decimal("123.45"), + currency="EUR", + currency_display="symbol", + ) + assert "123.45" in result + + def test_format_currency_custom_pattern(self) -> None: + """format_currency() respects custom pattern.""" + ctx = LocaleContext.create("en-US") + result = ctx.format_currency( + Decimal("1234.56"), + currency="USD", + pattern="#,##0.00 \xa4", + ) + assert "1,234.56" in result or "1234.56" in result + + def test_format_currency_error_raises_formatting_error( + self, monkeypatch: pytest.MonkeyPatch + ) -> None: + """format_currency() raises FrozenFluentError on error.""" + def mock_format_currency( + *_args: object, **_kwargs: object + ) -> None: + msg = "Mocked format error" + raise ValueError(msg) + + monkeypatch.setattr( + babel_numbers, + "format_currency", + mock_format_currency, + ) + + ctx = LocaleContext.create("en-US") + with pytest.raises(FrozenFluentError) as 
exc_info: + ctx.format_currency(Decimal("123.45"), currency="USD") + + assert ( + exc_info.value.category == ErrorCategory.FORMATTING + ) + assert "USD 123.45" in exc_info.value.fallback_value + +class TestGetIsoCodePattern: + """Test _get_iso_code_pattern() internal helper.""" + + def test_returns_string_or_none(self) -> None: + """_get_iso_code_pattern() returns string or None.""" + ctx = LocaleContext.create("en-US") + result = ctx._get_iso_code_pattern() + assert result is None or isinstance(result, str) + + def test_doubles_currency_sign(self) -> None: + """Doubles currency sign per CLDR spec.""" + ctx = LocaleContext.create("en-US") + result = ctx._get_iso_code_pattern() + if result is not None: + assert "\xa4\xa4" in result + + def test_none_when_no_standard(self) -> None: + """Returns None when standard pattern missing.""" + ctx = LocaleContext.create("en-US") + + mock_formats: dict[str, None] = {"standard": None} + mock_locale = MagicMock() + type(mock_locale).currency_formats = PropertyMock( + return_value=mock_formats + ) + + original_locale = ctx._babel_locale + object.__setattr__(ctx, "_babel_locale", mock_locale) + + try: + result = ctx._get_iso_code_pattern() + assert result is None + finally: + object.__setattr__( + ctx, "_babel_locale", original_locale + ) + + def test_none_when_no_pattern_attribute(self) -> None: + """Returns None when pattern attribute missing.""" + ctx = LocaleContext.create("en-US") + + mock_pattern = MagicMock(spec=[]) + mock_formats = {"standard": mock_pattern} + mock_locale = MagicMock() + type(mock_locale).currency_formats = PropertyMock( + return_value=mock_formats + ) + + original_locale = ctx._babel_locale + object.__setattr__(ctx, "_babel_locale", mock_locale) + + try: + result = ctx._get_iso_code_pattern() + assert result is None + finally: + object.__setattr__( + ctx, "_babel_locale", original_locale + ) + + def test_none_when_no_currency_placeholder( + self, caplog: pytest.LogCaptureFixture + ) -> None: + """Returns 
None and logs when no placeholder.""" + ctx = LocaleContext.create("en-US") + + mock_pattern = MagicMock() + mock_pattern.pattern = "#,##0.00" + mock_formats = {"standard": mock_pattern} + mock_locale = MagicMock() + type(mock_locale).currency_formats = PropertyMock( + return_value=mock_formats + ) + + original_locale = ctx._babel_locale + object.__setattr__(ctx, "_babel_locale", mock_locale) + + try: + with caplog.at_level(logging.DEBUG): + result = ctx._get_iso_code_pattern() + + assert result is None + assert any( + "lacks placeholder" in r.message + for r in caplog.records + ) + finally: + object.__setattr__( + ctx, "_babel_locale", original_locale + ) + +class TestCurrencyPatternFallback: + """Test currency code display fallback paths.""" + + def test_code_display_with_invalid_pattern(self) -> None: + """Code display when pattern lacks placeholder.""" + ctx = LocaleContext.create("en-US") + + class MockPattern: + """Mock pattern without currency placeholder.""" + + pattern = "#,##0.00" + + with ( + patch.object( + ctx.babel_locale.currency_formats, + "get", + return_value=MockPattern(), + ), + patch( + "ftllexengine.runtime.locale_context.logger" + ) as mock_logger, + ): + result = ctx.format_currency( + Decimal("123.45"), + currency="USD", + currency_display="code", + ) + + assert isinstance(result, str) + assert "123" in result + mock_logger.debug.assert_called() + + def test_code_display_with_no_pattern_attribute( + self, + ) -> None: + """Code display when pattern lacks attribute.""" + ctx = LocaleContext.create("en-US") + + class MockPatternWithoutAttr: + """Mock pattern without pattern attribute.""" + + mock_obj = MockPatternWithoutAttr() + assert not hasattr(mock_obj, "pattern") + + with patch.object( + ctx.babel_locale.currency_formats, + "get", + return_value=mock_obj, + ): + result = ctx.format_currency( + Decimal("123.45"), + currency="USD", + currency_display="code", + ) + assert isinstance(result, str) + assert "123" in result + + def 
test_code_display_with_none_pattern(self) -> None: + """Code display when standard pattern is None.""" + ctx = LocaleContext.create("en-US") + + with patch.object( + ctx.babel_locale.currency_formats, + "get", + return_value=None, + ): + result = ctx.format_currency( + Decimal("123.45"), + currency="USD", + currency_display="code", + ) + assert isinstance(result, str) + assert "123" in result diff --git a/tests/runtime_locale_context_cases/fallback_and_strict.py b/tests/runtime_locale_context_cases/fallback_and_strict.py new file mode 100644 index 00000000..75220a81 --- /dev/null +++ b/tests/runtime_locale_context_cases/fallback_and_strict.py @@ -0,0 +1,179 @@ +# mypy: ignore-errors +from __future__ import annotations + +import logging +from unittest.mock import patch + +import pytest +from babel import Locale, UnknownLocaleError + +from ftllexengine.runtime.locale_context import ( + _UNKNOWN_LOCALE_WARNING_LIMIT, + LocaleContext, +) + + +class TestLocaleContextFallbackBehavior: + """Regression coverage for fallback-locale resolution paths.""" + + def test_create_accepts_babel_language_alias_locale(self) -> None: + """Alias locales accepted by Babel remain valid.""" + LocaleContext.clear_cache() + + ctx = LocaleContext.create("iw") + + assert ctx.is_fallback is False + assert ctx.babel_locale.language == "he" + + def test_create_or_raise_rejects_cached_fallback_locale(self) -> None: + """Strict creation never reuses a fallback cache entry as valid.""" + LocaleContext.clear_cache() + LocaleContext.create("xx-INVALID") + + with pytest.raises(ValueError, match="Unknown locale identifier 'xx_invalid'"): + LocaleContext.create_or_raise("xx-INVALID") + + def test_create_unknown_locale_flood_suppresses_extra_warnings( + self, caplog: pytest.LogCaptureFixture + ) -> None: + """Repeated fallback locales emit bounded warnings.""" + LocaleContext.clear_cache() + + with caplog.at_level(logging.WARNING): + for i in range(_UNKNOWN_LOCALE_WARNING_LIMIT + 5): + 
LocaleContext.create(f"xx-TEST{i:04d}") + + unknown_messages = [ + record.message + for record in caplog.records + if "Unknown locale" in record.message + ] + suppression_messages = [ + record.message + for record in caplog.records + if "suppressed after" in record.message + ] + + assert len(unknown_messages) == _UNKNOWN_LOCALE_WARNING_LIMIT + assert suppression_messages == [ + "Additional locale fallback warnings suppressed after " + f"{_UNKNOWN_LOCALE_WARNING_LIMIT} events; most recent locale was " + "'xx_test0008'." + ] + + def test_create_skips_unknown_parse_for_impossible_language(self) -> None: + """Impossible language tags fall back without parsing the unknown code.""" + LocaleContext.clear_cache() + parse_calls: list[str] = [] + + class FakeLocaleClass: + @staticmethod + def parse(code: str) -> Locale: + parse_calls.append(code) + return Locale.parse("en_US") + + with ( + patch( + "ftllexengine.runtime.locale_resolution.get_locale_identifiers_func", + return_value=lambda: ("en_US", "de_DE"), + ), + patch( + "ftllexengine.runtime.locale_resolution.get_babel_global_func", + return_value=lambda name: {"iw": "he"} + if name == "language_aliases" + else {}, + ), + patch( + "ftllexengine.runtime.locale_context.get_locale_class", + return_value=FakeLocaleClass, + ), + ): + ctx = LocaleContext.create("xx-TEST0000") + + assert ctx.is_fallback is True + assert parse_calls == ["en_US"] + + def test_create_uses_unknown_locale_error_fallback_path( + self, caplog: pytest.LogCaptureFixture + ) -> None: + """Known-language unknown locales fall back through Babel's error path.""" + LocaleContext.clear_cache() + + class FakeLocaleClass: + @staticmethod + def parse(code: str) -> Locale: + if code == "en_test": + raise UnknownLocaleError(code) + return Locale.parse("en_US") + + with ( + patch( + "ftllexengine.runtime.locale_context.get_locale_class", + return_value=FakeLocaleClass, + ), + patch( + "ftllexengine.runtime.locale_context.get_unknown_locale_error_class", + 
return_value=UnknownLocaleError, + ), + patch( + "ftllexengine.runtime.locale_context.is_definitely_unknown_locale", + return_value=False, + ), + caplog.at_level(logging.WARNING), + ): + ctx = LocaleContext.create("en-TEST") + + assert ctx.is_fallback is True + assert any("Unknown locale 'en_test'" in record.message for record in caplog.records) + + def test_create_or_raise_uses_unknown_locale_error_branch(self) -> None: + """Strict creation raises when Babel rejects a known-language locale.""" + LocaleContext.clear_cache() + + class FakeLocaleClass: + @staticmethod + def parse(code: str) -> Locale: + raise UnknownLocaleError(code) + + with ( + patch( + "ftllexengine.runtime.locale_context.get_locale_class", + return_value=FakeLocaleClass, + ), + patch( + "ftllexengine.runtime.locale_context.get_unknown_locale_error_class", + return_value=UnknownLocaleError, + ), + patch( + "ftllexengine.runtime.locale_context.is_definitely_unknown_locale", + return_value=False, + ), + pytest.raises(ValueError, match="Unknown locale identifier 'en_test'"), + ): + LocaleContext.create_or_raise("en-TEST") + + def test_create_or_raise_uses_invalid_locale_format_branch(self) -> None: + """Strict creation surfaces Babel ValueError messages unchanged.""" + LocaleContext.clear_cache() + + class FakeLocaleClass: + @staticmethod + def parse(code: str) -> Locale: + msg = f"synthetic parse failure for {code}" + raise ValueError(msg) + + with ( + patch( + "ftllexengine.runtime.locale_context.get_locale_class", + return_value=FakeLocaleClass, + ), + patch( + "ftllexengine.runtime.locale_context.is_definitely_unknown_locale", + return_value=False, + ), + pytest.raises( + ValueError, + match="Invalid locale format 'en_test': synthetic parse failure for en_test", + ), + ): + LocaleContext.create_or_raise("en-TEST") diff --git a/tests/runtime_locale_context_cases/number_formatting.py b/tests/runtime_locale_context_cases/number_formatting.py new file mode 100644 index 00000000..bf1d5ec3 --- 
/dev/null +++ b/tests/runtime_locale_context_cases/number_formatting.py @@ -0,0 +1,233 @@ +# mypy: ignore-errors +# mypy: ignore-errors +from __future__ import annotations + +from decimal import Decimal + +import pytest +from babel import numbers as babel_numbers + +from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError +from ftllexengine.runtime.locale_context import LocaleContext + +# ============================================================================ +# Construction Guard Tests +# ============================================================================ + + + +class TestFormatNumber: + """Test format_number() with various locales and parameters.""" + + def test_format_number_en_us_grouping(self) -> None: + """format_number() formats with grouping for en-US.""" + ctx = LocaleContext.create("en-US") + result = ctx.format_number(Decimal("1234.5"), use_grouping=True) + assert "1,234" in result or "1234" in result + + def test_format_number_de_de_grouping(self) -> None: + """format_number() formats with grouping for de-DE.""" + ctx = LocaleContext.create("de-DE") + result = ctx.format_number(Decimal("1234.5"), use_grouping=True) + assert "1.234" in result or "1234" in result + + def test_format_number_fixed_decimals(self) -> None: + """format_number() formats with fixed decimal places.""" + ctx = LocaleContext.create("en-US") + + result = ctx.format_number( + Decimal("1234.5"), + minimum_fraction_digits=2, + maximum_fraction_digits=2, + ) + assert result == "1,234.50" + + result = ctx.format_number( + Decimal("1234.567"), + minimum_fraction_digits=0, + maximum_fraction_digits=0, + ) + assert result == "1,235" + assert "." 
not in result + + def test_format_number_fixed_three_decimals(self) -> None: + """format_number() with fixed 3 decimal places.""" + ctx = LocaleContext.create("en-US") + result = ctx.format_number( + Decimal("123.4"), + minimum_fraction_digits=3, + maximum_fraction_digits=3, + ) + assert result == "123.400" + + def test_format_number_custom_pattern(self) -> None: + """format_number() respects custom pattern.""" + ctx = LocaleContext.create("en-US") + result = ctx.format_number( + Decimal("-1234.56"), pattern="#,##0.00;(#,##0.00)" + ) + assert "1,234.56" in result or "1234.56" in result + + def test_format_number_preserves_decimal_precision( + self, + ) -> None: + """format_number() preserves large decimal precision.""" + ctx = LocaleContext.create("en-US") + + large_decimal = Decimal("123456789.123456789") + result = ctx.format_number( + large_decimal, + minimum_fraction_digits=2, + maximum_fraction_digits=2, + ) + + assert result == "123,456,789.12" + assert result.count(".") == 1 + decimal_part = result.split(".")[-1] + assert len(decimal_part) == 2 + + def test_format_number_with_decimal_type(self) -> None: + """format_number() with Decimal type for fixed decimals.""" + ctx = LocaleContext.create("de-DE") + + value = Decimal("1234.5") + result = ctx.format_number( + value, + minimum_fraction_digits=2, + maximum_fraction_digits=2, + ) + + assert "," in result + assert result == "1.234,50" + + def test_format_number_error_raises_formatting_error( + self, monkeypatch: pytest.MonkeyPatch + ) -> None: + """format_number() raises FrozenFluentError on error.""" + def mock_format_decimal( + *_args: object, **_kwargs: object + ) -> None: + msg = "Mocked format error" + raise ValueError(msg) + + monkeypatch.setattr( + babel_numbers, + "format_decimal", + mock_format_decimal, + ) + + ctx = LocaleContext.create("en-US") + with pytest.raises(FrozenFluentError) as exc_info: + ctx.format_number(Decimal("123.45")) + + assert ( + exc_info.value.category == 
ErrorCategory.FORMATTING + ) + assert exc_info.value.fallback_value == "123.45" + +class TestFormatNumberDigitValidation: + """Test format_number() digit parameter validation.""" + + def test_minimum_fraction_digits_negative_raises( + self, + ) -> None: + """Raises ValueError for negative minimum.""" + ctx = LocaleContext.create("en-US") + with pytest.raises( + ValueError, + match=r"minimum_fraction_digits must be", + ): + ctx.format_number( + Decimal("123.45"), minimum_fraction_digits=-1 + ) + + def test_minimum_fraction_digits_exceeds_max_raises( + self, + ) -> None: + """Raises ValueError when exceeding MAX_FORMAT_DIGITS.""" + from ftllexengine.constants import ( + MAX_FORMAT_DIGITS, + ) + + ctx = LocaleContext.create("en-US") + with pytest.raises( + ValueError, + match=r"minimum_fraction_digits must be", + ): + ctx.format_number( + Decimal("123.45"), + minimum_fraction_digits=MAX_FORMAT_DIGITS + 1, + ) + + def test_maximum_fraction_digits_negative_raises( + self, + ) -> None: + """Raises ValueError for negative maximum.""" + ctx = LocaleContext.create("en-US") + with pytest.raises( + ValueError, + match=r"maximum_fraction_digits must be", + ): + ctx.format_number( + Decimal("123.45"), maximum_fraction_digits=-1 + ) + + def test_maximum_fraction_digits_exceeds_max_raises( + self, + ) -> None: + """Raises ValueError when exceeding MAX_FORMAT_DIGITS.""" + from ftllexengine.constants import ( + MAX_FORMAT_DIGITS, + ) + + ctx = LocaleContext.create("en-US") + with pytest.raises( + ValueError, + match=r"maximum_fraction_digits must be", + ): + ctx.format_number( + Decimal("123.45"), + maximum_fraction_digits=MAX_FORMAT_DIGITS + 1, + ) + +class TestFormatNumberSpecialValues: + """Test format_number() with special Decimal values.""" + + def test_format_number_positive_infinity(self) -> None: + """format_number() handles positive infinity.""" + ctx = LocaleContext.create("en-US") + result = ctx.format_number(Decimal("Infinity")) + assert isinstance(result, str) + assert 
len(result) > 0 + + def test_format_number_negative_infinity(self) -> None: + """format_number() handles negative infinity.""" + ctx = LocaleContext.create("en-US") + result = ctx.format_number(Decimal("-Infinity")) + assert isinstance(result, str) + assert len(result) > 0 + + def test_format_number_nan(self) -> None: + """format_number() handles NaN.""" + ctx = LocaleContext.create("en-US") + result = ctx.format_number(Decimal("NaN")) + assert isinstance(result, str) + assert len(result) > 0 + + def test_format_number_infinity_with_grouping(self) -> None: + """format_number() handles infinity with use_grouping.""" + ctx = LocaleContext.create("en-US") + result = ctx.format_number( + Decimal("Infinity"), use_grouping=False + ) + assert isinstance(result, str) + assert len(result) > 0 + + def test_format_number_nan_with_custom_pattern(self) -> None: + """format_number() handles NaN with custom pattern.""" + ctx = LocaleContext.create("en-US") + result = ctx.format_number( + Decimal("NaN"), pattern="#,##0.00" + ) + assert isinstance(result, str) + assert len(result) > 0 diff --git a/tests/runtime_plural_rules_cases/__init__.py b/tests/runtime_plural_rules_cases/__init__.py new file mode 100644 index 00000000..e98216c2 --- /dev/null +++ b/tests/runtime_plural_rules_cases/__init__.py @@ -0,0 +1,82 @@ +"""Tests for plural_rules.py - CLDR plural category selection using Babel. + +Comprehensive property-based tests ensuring plural rule correctness across all locales +and number ranges. Critical for multilingual applications with proper pluralization. + +Property-Based Testing Strategy: + Uses Hypothesis to verify mathematical properties and CLDR compliance across + locale families (Germanic, Slavic, Romance, Semitic, etc.). 
+ +Coverage: + - All CLDR plural categories (zero, one, two, few, many, other) + - 30+ representative locales across language families + - Edge cases (unknown locales, large numbers, decimals) + - Babel ImportError path for parser-only installations +""" + +from __future__ import annotations + +import sys +from decimal import Decimal +from unittest.mock import patch + +import pytest +from babel.core import UnknownLocaleError +from hypothesis import assume, event, example, given +from hypothesis import strategies as st + +import ftllexengine.core.babel_compat as _bc +from ftllexengine.runtime.plural_rules import select_plural_category + +# ============================================================================ +# Hypothesis Strategies +# ============================================================================ + +# Valid locale codes across language families +LOCALE_CODES = st.sampled_from([ + "en", "en_US", "en_GB", + "lv", "lv_LV", + "de", "de_DE", + "pl", "pl_PL", + "ru", "ru_RU", + "ar", "ar_SA", + "fr", "fr_FR", + "es", "es_ES", + "it", "it_IT", + "pt", "pt_PT", "pt_BR", + "zh", "zh_CN", + "ja", "ja_JP", + "ko", "ko_KR", + "hi", "hi_IN", + "bn", "bn_BD", + "vi", "vi_VN", + "tr", "tr_TR", + "th", "th_TH", + "uk", "uk_UA", +]) + +# Numbers strategy (integers and decimals) +NUMBERS = st.one_of( + st.integers(min_value=0, max_value=1000000), + st.decimals( + min_value=Decimal(0), max_value=Decimal(1000000), + allow_nan=False, allow_infinity=False, + ), +) + +__all__ = [ + "LOCALE_CODES", + "NUMBERS", + "Decimal", + "UnknownLocaleError", + "_bc", + "assume", + "event", + "example", + "given", + "patch", + "pytest", + "select_plural_category", + "st", + "sys", +] diff --git a/tests/runtime_plural_rules_cases/babel_import_error_tests_lines_67_70.py b/tests/runtime_plural_rules_cases/babel_import_error_tests_lines_67_70.py new file mode 100644 index 00000000..13c000f8 --- /dev/null +++ b/tests/runtime_plural_rules_cases/babel_import_error_tests_lines_67_70.py 
@@ -0,0 +1,64 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_plural_rules.py.""" + +from tests.runtime_plural_rules_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Babel ImportError Tests (lines 67-70) +# ============================================================================ + + +class TestPluralRulesBabelImportError: + """Test ImportError path when Babel is not installed (lines 67-70).""" + + def test_select_plural_category_raises_babel_import_error_when_babel_unavailable( + self, + ) -> None: + """select_plural_category raises BabelImportError when Babel unavailable.""" + from ftllexengine.core.babel_compat import ( + BabelImportError, + ) + + # Temporarily hide babel from sys.modules + babel_module = sys.modules.pop("babel", None) + babel_core = sys.modules.pop("babel.core", None) + babel_dates = sys.modules.pop("babel.dates", None) + babel_numbers = sys.modules.pop("babel.numbers", None) + + # Reset sentinel so _check_babel_available() re-evaluates under the mock + _bc._babel_available = None + + try: + with patch.dict(sys.modules, {"babel": None, "babel.core": None}): + original_import = __import__ + + def mock_import_babel( + name: str, + globals_dict: dict[str, object] | None = None, + locals_dict: dict[str, object] | None = None, + fromlist: tuple[str, ...] 
= (), + level: int = 0, + ) -> object: + if name == "babel" or name.startswith("babel."): + err = ModuleNotFoundError("No module named 'babel'") + err.name = "babel" + raise err + return original_import(name, globals_dict, locals_dict, fromlist, level) + + with patch("builtins.__import__", side_effect=mock_import_babel): + with pytest.raises(BabelImportError) as exc_info: + select_plural_category(42, "en-US") + + assert "select_plural_category" in str(exc_info.value) + finally: + # Restore babel modules + if babel_module is not None: + sys.modules["babel"] = babel_module + if babel_core is not None: + sys.modules["babel.core"] = babel_core + if babel_dates is not None: + sys.modules["babel.dates"] = babel_dates + if babel_numbers is not None: + sys.modules["babel.numbers"] = babel_numbers + # Reset sentinel so subsequent tests reinitialize with Babel available + _bc._babel_available = None diff --git a/tests/runtime_plural_rules_cases/decimal_support_tests.py b/tests/runtime_plural_rules_cases/decimal_support_tests.py new file mode 100644 index 00000000..df8592b4 --- /dev/null +++ b/tests/runtime_plural_rules_cases/decimal_support_tests.py @@ -0,0 +1,42 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_plural_rules.py.""" + +from tests.runtime_plural_rules_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Decimal Support Tests +# ============================================================================ + + +class TestDecimalSupport: + """Test Decimal type support in plural category selection.""" + + @given(n=st.integers(min_value=0, max_value=1000)) + @example(n=0) + @example(n=1) + @example(n=5) + def test_decimal_matches_integer(self, n: int) -> None: + """Decimal and integer with same value produce same category. 
+ + Property: For all n in Z, f(n) = f(Decimal(n)) + """ + int_result = select_plural_category(n, "en_US") + decimal_result = select_plural_category(Decimal(n), "en_US") + + event(f"category={int_result}") + assert int_result == decimal_result + + def test_decimal_one_is_one(self) -> None: + """Decimal(1) matches 'one' category in English.""" + result = select_plural_category(Decimal(1), "en_US") + assert result == "one" + + def test_decimal_zero_is_other(self) -> None: + """Decimal(0) matches 'other' category in English.""" + result = select_plural_category(Decimal(0), "en_US") + assert result == "other" + + def test_decimal_fractional_is_other(self) -> None: + """Decimal fractional values match 'other' category in English.""" + result = select_plural_category(Decimal("1.5"), "en_US") + assert result == "other" diff --git a/tests/runtime_plural_rules_cases/locale_format_tests.py b/tests/runtime_plural_rules_cases/locale_format_tests.py new file mode 100644 index 00000000..91eaa2fc --- /dev/null +++ b/tests/runtime_plural_rules_cases/locale_format_tests.py @@ -0,0 +1,35 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_plural_rules.py.""" + +from tests.runtime_plural_rules_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Locale Format Tests +# ============================================================================ + + +class TestLocaleFormats: + """Test various locale code formats.""" + + def test_locale_case_insensitive(self) -> None: + """Locale code is case-insensitive.""" + result_upper = select_plural_category(0, "LV_LV") + result_lower = select_plural_category(0, "lv_lv") + result_mixed = select_plural_category(0, "Lv_LV") + + assert result_upper == "zero" + assert result_lower == "zero" + assert result_mixed == "zero" + + def test_short_locale_code_without_region(self) -> None: + """Short locale codes (without region) work correctly.""" + 
result = select_plural_category(0, "lv") + assert result == "zero" + + def test_bcp47_hyphen_format_supported(self) -> None: + """BCP-47 format with hyphens (en-US) works correctly.""" + result = select_plural_category(1, "en-US") + assert result == "one" + + result = select_plural_category(0, "lv-LV") + assert result == "zero" diff --git a/tests/runtime_plural_rules_cases/ordinal_plural_rule_tests.py b/tests/runtime_plural_rules_cases/ordinal_plural_rule_tests.py new file mode 100644 index 00000000..ca25b210 --- /dev/null +++ b/tests/runtime_plural_rules_cases/ordinal_plural_rule_tests.py @@ -0,0 +1,120 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_plural_rules.py.""" + +from tests.runtime_plural_rules_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Ordinal Plural Rule Tests +# ============================================================================ + + +class TestOrdinalPluralRules: + """Tests for the ordinal=True parameter using CLDR ordinal plural rules. + + Ordinal rules apply to rank/position contexts (1st, 2nd, 3rd, ...). + English ordinal rules: 1->one (1st), 2->two (2nd), 3->few (3rd), + 4+->other unless ends in 1/2/3 with specific exceptions. 
+ """ + + def test_english_ordinal_one(self) -> None: + """English ordinal: 1 -> 'one' (1st).""" + assert select_plural_category(1, "en_US", ordinal=True) == "one" + + def test_english_ordinal_two(self) -> None: + """English ordinal: 2 -> 'two' (2nd).""" + assert select_plural_category(2, "en_US", ordinal=True) == "two" + + def test_english_ordinal_few(self) -> None: + """English ordinal: 3 -> 'few' (3rd).""" + assert select_plural_category(3, "en_US", ordinal=True) == "few" + + def test_english_ordinal_other(self) -> None: + """English ordinal: 4+ (no suffix rule) -> 'other' (4th).""" + assert select_plural_category(4, "en_US", ordinal=True) == "other" + + def test_english_ordinal_eleven(self) -> None: + """English ordinal: 11 -> 'other' (11th, not 11st).""" + assert select_plural_category(11, "en_US", ordinal=True) == "other" + + def test_english_ordinal_twelve(self) -> None: + """English ordinal: 12 -> 'other' (12th, not 12nd).""" + assert select_plural_category(12, "en_US", ordinal=True) == "other" + + def test_english_ordinal_thirteen(self) -> None: + """English ordinal: 13 -> 'other' (13th, not 13rd).""" + assert select_plural_category(13, "en_US", ordinal=True) == "other" + + def test_english_ordinal_twenty_one(self) -> None: + """English ordinal: 21 -> 'one' (21st).""" + assert select_plural_category(21, "en_US", ordinal=True) == "one" + + def test_english_ordinal_twenty_two(self) -> None: + """English ordinal: 22 -> 'two' (22nd).""" + assert select_plural_category(22, "en_US", ordinal=True) == "two" + + def test_english_ordinal_twenty_three(self) -> None: + """English ordinal: 23 -> 'few' (23rd).""" + assert select_plural_category(23, "en_US", ordinal=True) == "few" + + def test_ordinal_false_is_default(self) -> None: + """ordinal=False (default) uses cardinal rules, same as omitting the parameter.""" + assert ( + select_plural_category(1, "en_US") + == select_plural_category(1, "en_US", ordinal=False) + ) + # Ordinal and cardinal differ for n=2 in 
English: + # cardinal: 2 -> "other"; ordinal: 2 -> "two" + assert select_plural_category(2, "en_US") == "other" + assert select_plural_category(2, "en_US", ordinal=True) == "two" + + @given(n=st.integers(min_value=0, max_value=1000), locale=LOCALE_CODES) + @example(n=1, locale="en_US") + @example(n=2, locale="en_US") + @example(n=3, locale="en_US") + @example(n=11, locale="en_US") + def test_ordinal_always_returns_valid_category( + self, n: int, locale: str + ) -> None: + """Ordinal selection always returns a valid CLDR category. + + Property: For all n and locale, ordinal result in valid_categories + """ + result = select_plural_category(n, locale, ordinal=True) + + valid_categories = {"zero", "one", "two", "few", "many", "other"} + assert result in valid_categories + + event(f"category={result}") + event(f"locale={locale}") + + @given(n=st.integers(min_value=0, max_value=1000), locale=LOCALE_CODES) + def test_ordinal_deterministic(self, n: int, locale: str) -> None: + """Ordinal selection is deterministic: same inputs produce same result.""" + result1 = select_plural_category(n, locale, ordinal=True) + result2 = select_plural_category(n, locale, ordinal=True) + + assert result1 == result2 + event(f"category={result1}") + + @given(n=NUMBERS, locale=LOCALE_CODES) + def test_ordinal_never_crashes(self, n: int | Decimal, locale: str) -> None: + """Ordinal selection never crashes for any valid n/locale combination.""" + result = select_plural_category(n, locale, ordinal=True) + + assert isinstance(result, str) + event(f"category={result}") + event(f"n_type={type(n).__name__}") + + def test_ordinal_unknown_locale_falls_back_to_other(self) -> None: + """Unknown locale falls back to 'other' for ordinal rules.""" + result = select_plural_category(1, "xx_XX", ordinal=True) + assert result == "other" + + def test_ordinal_welsh_has_multiple_categories(self) -> None: + """Welsh (cy) ordinal rules use multiple categories (zero, one, two, few, many, other).""" + # Welsh 
ordinals use all 6 categories + results = {select_plural_category(n, "cy", ordinal=True) for n in range(10)} + # Welsh ordinals produce at least 2 distinct categories + assert len(results) >= 2 + valid = {"zero", "one", "two", "few", "many", "other"} + assert results <= valid diff --git a/tests/runtime_plural_rules_cases/precision_parameter_tests.py b/tests/runtime_plural_rules_cases/precision_parameter_tests.py new file mode 100644 index 00000000..db234e7a --- /dev/null +++ b/tests/runtime_plural_rules_cases/precision_parameter_tests.py @@ -0,0 +1,212 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_plural_rules.py.""" + +from tests.runtime_plural_rules_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Precision Parameter Tests +# ============================================================================ + + +class TestPrecisionParameter: + """Test precision parameter for CLDR v operand handling (lines 118-121). + + The precision parameter is critical for NUMBER() formatting. It controls + the CLDR v operand (fraction digit count), which affects plural category + selection in many locales. + + Key property: 1 (integer) vs 1.00 (precision=2) may have different plural + categories because they have different v values (v=0 vs v=2). + """ + + def test_precision_changes_english_one_to_other(self) -> None: + """English: precision converts 'one' to 'other' (lines 118-121). + + Critical case: 1 is "one" but 1.00 (with v=2) is "other" in English. + This is the primary use case for the precision parameter. 
+ """ + result_no_precision = select_plural_category(1, "en_US") + result_with_precision = select_plural_category(1, "en_US", precision=2) + + assert result_no_precision == "one" + assert result_with_precision == "other" + + @given( + n=st.integers(min_value=0, max_value=1000), + precision=st.integers(min_value=1, max_value=10), + ) + @example(n=1, precision=1) + @example(n=1, precision=2) + @example(n=42, precision=5) + def test_precision_always_returns_valid_category( + self, n: int, precision: int + ) -> None: + """Precision parameter always returns valid CLDR category (lines 118-121). + + Property: For all n, precision, and locale, result in valid_categories + """ + result = select_plural_category(n, "en_US", precision=precision) + + event(f"category={result}") + event(f"precision={precision}") + valid_categories = {"zero", "one", "two", "few", "many", "other"} + assert result in valid_categories + + @given( + n=st.decimals( + min_value=Decimal(0), max_value=Decimal(100), allow_nan=False, allow_infinity=False + ), + precision=st.integers(min_value=1, max_value=6), + ) + @example(n=Decimal("1.5"), precision=2) + @example(n=Decimal("42.7"), precision=1) + def test_precision_with_fractional_decimals(self, n: Decimal, precision: int) -> None: + """Precision works correctly with Decimal inputs (lines 118-121). + + Property: Decimal values are quantized correctly for plural selection + """ + result = select_plural_category(n, "en_US", precision=precision) + + event(f"category={result}") + event(f"precision={precision}") + valid_categories = {"zero", "one", "two", "few", "many", "other"} + assert result in valid_categories + + @given( + n=st.integers(min_value=0, max_value=100), + precision=st.integers(min_value=1, max_value=8), + ) + @example(n=1, precision=1) + @example(n=1, precision=5) + def test_precision_with_decimals(self, n: int, precision: int) -> None: + """Precision works correctly with Decimal inputs (lines 118-121). 
+ + Property: Decimal(n) with precision is handled correctly + """ + decimal_n = Decimal(n) + result = select_plural_category(decimal_n, "en_US", precision=precision) + + event(f"category={result}") + event(f"precision={precision}") + valid_categories = {"zero", "one", "two", "few", "many", "other"} + assert result in valid_categories + + def test_precision_one_formats_to_one_decimal_place(self) -> None: + """Precision=1 formats to one decimal place (lines 118-121).""" + result = select_plural_category(1, "en_US", precision=1) + assert result == "other" + + result = select_plural_category(5, "en_US", precision=1) + assert result == "other" + + def test_precision_zero_ignored(self) -> None: + """Precision=0 is ignored (condition precision > 0 on line 111). + + When precision=0, the code takes the else branch (line 124), not lines 118-121. + """ + result_no_precision = select_plural_category(1, "en_US") + result_precision_zero = select_plural_category(1, "en_US", precision=0) + + assert result_no_precision == "one" + assert result_precision_zero == "one" + + def test_precision_none_ignored(self) -> None: + """Precision=None is ignored (condition precision is not None on line 111). + + When precision=None, the code takes the else branch (line 124), not lines 118-121. + """ + result_no_precision = select_plural_category(1, "en_US") + result_precision_none = select_plural_category(1, "en_US", precision=None) + + assert result_no_precision == "one" + assert result_precision_none == "one" + + @given( + n=st.integers(min_value=0, max_value=100), + precision=st.integers(min_value=1, max_value=5), + locale=LOCALE_CODES, + ) + @example(n=1, precision=2, locale="en_US") + @example(n=1, precision=2, locale="ru_RU") + @example(n=0, precision=1, locale="lv_LV") + def test_precision_consistency_across_locales( + self, n: int, precision: int, locale: str + ) -> None: + """Precision produces consistent results across locales (lines 118-121). 
+ + Property: Same (n, precision, locale) always returns same category + """ + result1 = select_plural_category(n, locale, precision=precision) + result2 = select_plural_category(n, locale, precision=precision) + + event(f"locale={locale}") + event(f"precision={precision}") + event(f"category={result1}") + assert result1 == result2 + + def test_precision_large_value(self) -> None: + """Precision handles large precision values correctly (lines 118-121).""" + result = select_plural_category(1, "en_US", precision=10) + assert result == "other" + + result = select_plural_category(42, "en_US", precision=15) + assert result == "other" + + @given( + n=st.integers(min_value=1, max_value=100), + precision=st.integers(min_value=1, max_value=6), + ) + @example(n=1, precision=1) + @example(n=21, precision=2) + @example(n=11, precision=1) + def test_precision_affects_slavic_rules( + self, n: int, precision: int + ) -> None: + """Precision affects Slavic plural rules (lines 118-121). + + In Slavic languages, integers have complex rules, but formatted decimals + typically fall into the "other" category. + """ + result_no_precision = select_plural_category(n, "ru_RU") + result_with_precision = select_plural_category(n, "ru_RU", precision=precision) + + event(f"category_no_precision={result_no_precision}") + event(f"category_with_precision={result_with_precision}") + event(f"precision={precision}") + valid_categories = {"zero", "one", "two", "few", "many", "other"} + assert result_no_precision in valid_categories + assert result_with_precision in valid_categories + + @given( + n=st.integers(min_value=0, max_value=10), + precision=st.integers(min_value=1, max_value=4), + ) + @example(n=0, precision=1) + @example(n=2, precision=2) + def test_precision_with_arabic_complex_rules( + self, n: int, precision: int + ) -> None: + """Precision works with Arabic's complex 6-category system (lines 118-121). 
+ + Property: Precision affects category selection in all locale systems + """ + result = select_plural_category(n, "ar_SA", precision=precision) + + event(f"category={result}") + event(f"precision={precision}") + valid_categories = {"zero", "one", "two", "few", "many", "other"} + assert result in valid_categories + + def test_precision_quantization_correctness(self) -> None: + """Precision quantizes numbers correctly (lines 118-121). + + Verifies the Decimal quantization logic produces expected v operand. + """ + result = select_plural_category(5, "en_US", precision=2) + assert result == "other" + + result = select_plural_category(0, "en_US", precision=3) + assert result == "other" + + result = select_plural_category(100, "en_US", precision=1) + assert result == "other" diff --git a/tests/runtime_plural_rules_cases/property_tests_edge_cases.py b/tests/runtime_plural_rules_cases/property_tests_edge_cases.py new file mode 100644 index 00000000..cee4fc97 --- /dev/null +++ b/tests/runtime_plural_rules_cases/property_tests_edge_cases.py @@ -0,0 +1,53 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_plural_rules.py.""" + +from tests.runtime_plural_rules_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Property Tests - Edge Cases +# ============================================================================ + + +class TestPluralRuleEdgeCases: + """Property-based tests for edge cases.""" + + @given(locale=st.text(min_size=1, max_size=10)) + @example(locale="invalid") + @example(locale="xx_YY") + def test_arbitrary_locale_never_crashes(self, locale: str) -> None: + """Arbitrary locale never crashes. 
+ + Property: For all locale strings, select_plural_category does not raise + """ + result = select_plural_category(42, locale) + event(f"locale_len={len(locale)}") + event(f"category={result}") + assert isinstance(result, str) + + @given(n=st.decimals( + min_value=Decimal(-1000), max_value=Decimal(0), + allow_nan=False, allow_infinity=False, + )) + @example(n=Decimal(-1)) + @example(n=Decimal(-100)) + def test_negative_numbers_return_valid_category(self, n: Decimal) -> None: + """Negative numbers return valid category. + + Property: For all n ≤ 0, category ∈ valid_categories + """ + result = select_plural_category(n, "en") + event(f"category={result}") + assert result in {"zero", "one", "two", "few", "many", "other"} + + @given(locale=LOCALE_CODES) + @example(locale="en_US") + @example(locale="ru_RU") + def test_very_large_numbers(self, locale: str) -> None: + """Very large numbers work correctly. + + Property: For all locales, large numbers return valid category + """ + result = select_plural_category(10**9, locale) + event(f"locale={locale}") + event(f"category={result}") + assert result in {"zero", "one", "two", "few", "many", "other"} diff --git a/tests/runtime_plural_rules_cases/property_tests_invariants.py b/tests/runtime_plural_rules_cases/property_tests_invariants.py new file mode 100644 index 00000000..6d8b19cc --- /dev/null +++ b/tests/runtime_plural_rules_cases/property_tests_invariants.py @@ -0,0 +1,73 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_plural_rules.py.""" + +from tests.runtime_plural_rules_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Property Tests - Invariants +# ============================================================================ + + +class TestPluralRuleInvariants: + """Property-based tests for invariants that must hold for all plural rules.""" + + @given(n=NUMBERS, locale=LOCALE_CODES) + @example(n=0, 
locale="en_US") + @example(n=1, locale="en_US") + @example(n=2, locale="ar_SA") + def test_always_returns_valid_category(self, n: int | Decimal, locale: str) -> None: + """Plural selection always returns valid CLDR category. + + Property: For all n and locale, result ∈ {zero, one, two, few, many, other} + """ + result = select_plural_category(n, locale) + + valid_categories = {"zero", "one", "two", "few", "many", "other"} + assert result in valid_categories + + n_type = type(n).__name__ + event(f"category={result}") + event(f"n_type={n_type}") + event(f"locale={locale}") + + @given(n=NUMBERS, locale=LOCALE_CODES) + @example(n=42, locale="lv_LV") + def test_never_returns_none(self, n: int | Decimal, locale: str) -> None: + """Plural selection never returns None. + + Property: For all n and locale, result is not None + """ + result = select_plural_category(n, locale) + + assert result is not None + event(f"category={result}") + + @given(n=st.integers(min_value=0, max_value=1000), locale=LOCALE_CODES) + @example(n=1, locale="en_US") + @example(n=5, locale="ru_RU") + def test_integer_consistency(self, n: int, locale: str) -> None: + """Same integer always returns same category for same locale. + + Property: repeated calls f(n, locale) return the same category (determinism) + """ + result1 = select_plural_category(n, locale) + result2 = select_plural_category(n, locale) + + assert result1 == result2 + event(f"category={result1}") + event(f"locale={locale}") + + @given(n=NUMBERS) + @example(n=0) + @example(n=1) + @example(n=42) + def test_unknown_locale_defaults_to_cldr_root(self, n: int | Decimal) -> None: + """Unknown locale uses CLDR root rules (always 'other'). 
+ + Property: For all n, select_plural_category(n, unknown) = "other" + """ + result = select_plural_category(n, "xx_XX") + + assert result == "other" + n_type = type(n).__name__ + event(f"n_type={n_type}") diff --git a/tests/runtime_plural_rules_cases/property_tests_locale_specific_rules.py b/tests/runtime_plural_rules_cases/property_tests_locale_specific_rules.py new file mode 100644 index 00000000..07c6376f --- /dev/null +++ b/tests/runtime_plural_rules_cases/property_tests_locale_specific_rules.py @@ -0,0 +1,218 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_plural_rules.py.""" + +from tests.runtime_plural_rules_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Property Tests - Locale-Specific Rules +# ============================================================================ + + +class TestEnglishPluralRules: + """Property-based tests for English plural rules (one/other).""" + + @given(n=st.integers(min_value=2, max_value=1000)) + @example(n=2) + @example(n=100) + def test_integers_not_one_are_other(self, n: int) -> None: + """English: integers != 1 are 'other'. + + Property: For all n in Z where n != 1, category = "other" + """ + assume(n != 1) + + result = select_plural_category(n, "en") + + assert result == "other" + event(f"n={n}") + + def test_one_is_one(self) -> None: + """English: 1 is 'one'.""" + assert select_plural_category(1, "en") == "one" + + def test_zero_is_other(self) -> None: + """English: 0 is 'other'.""" + assert select_plural_category(0, "en") == "other" + + @given(n=st.decimals( + min_value=Decimal("0.1"), max_value=Decimal(1000), + allow_nan=False, allow_infinity=False, + )) + @example(n=Decimal("0.5")) + @example(n=Decimal("2.5")) + def test_decimals_are_other(self, n: Decimal) -> None: + """English: Decimals not equal to 1 are 'other'. 
+ + Property: For all n in Q where n != 1, category = "other" + """ + assume(n != Decimal(1)) + + result = select_plural_category(n, "en") + + assert result == "other" + is_whole = n % 1 == 0 + event(f"decimal_is_whole={is_whole}") + + +class TestLatvianPluralRules: + """Property-based tests for Latvian plural rules (zero/one/other).""" + + def test_zero_is_zero(self) -> None: + """Latvian: 0 is 'zero'.""" + assert select_plural_category(0, "lv") == "zero" + + @given(n=st.integers(min_value=1, max_value=1000)) + @example(n=1) + @example(n=21) + @example(n=11) + def test_rules_consistency(self, n: int) -> None: + """Latvian: rules are consistent with CLDR. + + Property: Category determined by modulo operations per CLDR spec + """ + result = select_plural_category(n, "lv") + + i_mod_10 = n % 10 + i_mod_100 = n % 100 + + event(f"category={result}") + event(f"n_mod_10={i_mod_10}") + if i_mod_10 == 0: + assert result in {"zero", "other"} + elif i_mod_10 == 1 and i_mod_100 != 11: + assert result == "one" + else: + assert result in {"zero", "other"} + + +class TestSlavicPluralRules: + """Property-based tests for Slavic languages (Russian, Polish).""" + + @given(n=st.integers(min_value=1, max_value=1000)) + @example(n=1) + @example(n=21) + @example(n=11) + def test_one_rule(self, n: int) -> None: + """Slavic: numbers ending in 1 (but not 11) are 'one'. + + Property: n % 10 = 1 AND n % 100 ≠ 11 => category = "one" + """ + i_mod_10 = n % 10 + i_mod_100 = n % 100 + + result = select_plural_category(n, "ru") + + event(f"category={result}") + event(f"n_mod_10={i_mod_10}") + if i_mod_10 == 1 and i_mod_100 != 11: + assert result == "one" + + @given(n=st.integers(min_value=2, max_value=1000)) + @example(n=2) + @example(n=22) + @example(n=12) + def test_few_rule(self, n: int) -> None: + """Slavic: numbers ending in 2-4 (but not 12-14) are 'few'. 
+ + Property: 2 ≤ n % 10 ≤ 4 AND NOT 12 ≤ n % 100 ≤ 14 => category = "few" + """ + i_mod_10 = n % 10 + i_mod_100 = n % 100 + + result = select_plural_category(n, "ru") + + event(f"category={result}") + event(f"n_mod_10={i_mod_10}") + if 2 <= i_mod_10 <= 4 and not 12 <= i_mod_100 <= 14: + assert result == "few" + + @given(n=st.integers(min_value=5, max_value=1000)) + @example(n=5) + @example(n=15) + @example(n=100) + def test_many_rule(self, n: int) -> None: + """Slavic: specific patterns are 'many'. + + Property: (n % 10 = 0) OR (5 ≤ n % 10 ≤ 9) OR (11 ≤ n % 100 ≤ 14) => category = "many" + """ + i_mod_10 = n % 10 + i_mod_100 = n % 100 + + result = select_plural_category(n, "ru") + + event(f"category={result}") + event(f"n_mod_10={i_mod_10}") + if i_mod_10 == 0 or 5 <= i_mod_10 <= 9 or 11 <= i_mod_100 <= 14: + assert result == "many" + + @given( + fraction=st.decimals( + min_value=Decimal("0.01"), max_value=Decimal("999.99"), + allow_nan=False, allow_infinity=False, + ) + ) + @example(fraction=Decimal("0.5")) + @example(fraction=Decimal("1.5")) + def test_fractional_numbers_return_other(self, fraction: Decimal) -> None: + """Slavic: fractional numbers return 'other'. 
+ + Property: For all n in Q where n not in Z, category = "other" + """ + assume(fraction % 1 != 0) + + category = select_plural_category(fraction, "ru_RU") + + event(f"category={category}") + assert category == "other" + + +class TestArabicPluralRules: + """Property-based tests for Arabic plural rules (all 6 categories).""" + + def test_zero_is_zero(self) -> None: + """Arabic: 0 is 'zero'.""" + assert select_plural_category(0, "ar") == "zero" + + def test_one_is_one(self) -> None: + """Arabic: 1 is 'one'.""" + assert select_plural_category(1, "ar") == "one" + + def test_two_is_two(self) -> None: + """Arabic: 2 is 'two'.""" + assert select_plural_category(2, "ar") == "two" + + @given(n=st.integers(min_value=3, max_value=10)) + @example(n=3) + @example(n=10) + def test_three_to_ten_are_few(self, n: int) -> None: + """Arabic: 3-10 are 'few'. + + Property: 3 ≤ n ≤ 10 => category = "few" + """ + result = select_plural_category(n, "ar") + event(f"n={n}") + assert result == "few" + + @given(n=st.integers(min_value=11, max_value=99)) + @example(n=11) + @example(n=99) + def test_eleven_to_ninetynine_are_many(self, n: int) -> None: + """Arabic: 11-99 are 'many'. + + Property: 11 ≤ n ≤ 99 => category = "many" + """ + result = select_plural_category(n, "ar") + event(f"n={n}") + assert result == "many" + + @given(n=st.integers(min_value=100, max_value=1000)) + @example(n=100) + @example(n=500) + def test_hundreds_valid_category(self, n: int) -> None: + """Arabic: 100+ return valid category based on remainder. 
+ + Property: For all n ≥ 100, category ∈ valid_categories + """ + result = select_plural_category(n, "ar") + event(f"category={result}") + assert result in {"zero", "one", "two", "few", "many", "other"} diff --git a/tests/runtime_plural_rules_cases/property_tests_metamorphic_properties.py b/tests/runtime_plural_rules_cases/property_tests_metamorphic_properties.py new file mode 100644 index 00000000..dbca8299 --- /dev/null +++ b/tests/runtime_plural_rules_cases/property_tests_metamorphic_properties.py @@ -0,0 +1,52 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_plural_rules.py.""" + +from tests.runtime_plural_rules_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Property Tests - Metamorphic Properties +# ============================================================================ + + +class TestPluralRuleMetamorphic: + """Metamorphic property tests.""" + + @given( + n=st.integers(min_value=0, max_value=1000), + locale=st.sampled_from(["fr_FR", "it_IT", "pt_PT", "pt_BR"]), + ) + @example(n=1, locale="fr_FR") + @example(n=50, locale="it_IT") + def test_adding_hundred_preserves_validity_for_romance( + self, n: int, locale: str + ) -> None: + """For Romance languages, adding 100 preserves category validity. + + Metamorphic property: If f(n) is valid, then f(n+100) is also valid + """ + result1 = select_plural_category(n, locale) + result2 = select_plural_category(n + 100, locale) + + event(f"locale={locale}") + event(f"category_n={result1}") + valid = {"zero", "one", "two", "few", "many", "other"} + assert result1 in valid + assert result2 in valid + + @given(n=st.integers(min_value=1, max_value=100)) + @example(n=1) + @example(n=50) + def test_english_german_similarity_for_small_numbers(self, n: int) -> None: + """English and German have similar rules for small numbers. 
+ + Metamorphic property: Both use only one/other categories + """ + en_result = select_plural_category(n, "en") + de_result = select_plural_category(n, "de") + + event(f"category_en={en_result}") + assert en_result in {"one", "other"} + assert de_result in {"one", "other"} + + if n == 1: + assert en_result == de_result == "one" diff --git a/tests/runtime_plural_rules_cases/rounding_consistency_tests_round_half_up.py b/tests/runtime_plural_rules_cases/rounding_consistency_tests_round_half_up.py new file mode 100644 index 00000000..bc21ade8 --- /dev/null +++ b/tests/runtime_plural_rules_cases/rounding_consistency_tests_round_half_up.py @@ -0,0 +1,104 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_plural_rules.py.""" + +from tests.runtime_plural_rules_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Rounding Consistency Tests (ROUND_HALF_EVEN) +# ============================================================================ + + +class TestRoundingConsistency: + """Tests that plural selection rounding matches formatting rounding. + + Both plural_rules.py and locale_context.py use ROUND_HALF_EVEN (Babel default) + so that the displayed number and its plural form always agree. + Half-values (x.5) round to the nearest even digit in both paths. 
+ """ + + def test_half_value_rounds_even_for_plural(self) -> None: + """2.5 with precision=0 rounds to 2, selecting 'other' in English.""" + # 2.5 -> 2 (ROUND_HALF_EVEN: 2 is even), which is 'other' in English + result = select_plural_category(Decimal("2.5"), "en_US", precision=0) + assert result == "other" + + def test_half_value_3_5_rounds_up_for_plural(self) -> None: + """3.5 with precision=0 rounds to 4, selecting 'other' in English.""" + # 3.5 -> 4 (ROUND_HALF_EVEN: 4 is even) + result = select_plural_category(Decimal("3.5"), "en_US", precision=0) + assert result == "other" + + def test_half_value_0_5_rounds_to_zero_for_plural(self) -> None: + """0.5 with precision=0 rounds to 0, selecting 'other' in English.""" + # 0.5 -> 0 (ROUND_HALF_EVEN: 0 is even), which is 'other' in English + result = select_plural_category(Decimal("0.5"), "en_US", precision=0) + assert result == "other" + + def test_half_value_1_5_rounds_up_for_plural(self) -> None: + """1.5 with precision=0 rounds to 2, selecting 'other' in English.""" + # 1.5 -> 2 (ROUND_HALF_EVEN: 2 is even) + result = select_plural_category(Decimal("1.5"), "en_US", precision=0) + assert result == "other" + + def test_rounding_matches_formatting_at_half_values(self) -> None: + """Verify that Decimal quantization uses ROUND_HALF_EVEN, matching Babel. + + This is the core consistency property: the number displayed to the user + and the plural category selected must agree on rounding direction. 
+ """ + from decimal import ROUND_HALF_EVEN + + test_cases = [ + (Decimal("0.5"), 0, Decimal(0)), + (Decimal("1.5"), 0, Decimal(2)), + (Decimal("2.5"), 0, Decimal(2)), + (Decimal("3.5"), 0, Decimal(4)), + (Decimal("1.005"), 2, Decimal("1.00")), + (Decimal("1.015"), 2, Decimal("1.02")), + (Decimal("2.445"), 2, Decimal("2.44")), + ] + + for value, precision, expected_rounded in test_cases: + quantizer = Decimal(10) ** -precision + rounded = value.quantize(quantizer, rounding=ROUND_HALF_EVEN) + assert rounded == expected_rounded, ( + f"Expected {value} with precision={precision} to round to " + f"{expected_rounded}, got {rounded}" + ) + + @given( + n=st.decimals( + min_value=Decimal(0), max_value=Decimal(100), allow_nan=False, allow_infinity=False + ), + precision=st.integers(min_value=0, max_value=4), + ) + @example(n=Decimal("0.5"), precision=0) + @example(n=Decimal("2.5"), precision=0) + @example(n=Decimal("3.5"), precision=0) + @example(n=Decimal("1.005"), precision=2) + def test_plural_rounding_direction_property( + self, n: Decimal, precision: int + ) -> None: + """Plural rounding direction matches ROUND_HALF_EVEN for all inputs. + + Property: The Decimal value used for plural selection must equal the + value obtained by ROUND_HALF_EVEN quantization. + """ + from decimal import ROUND_HALF_EVEN + + quantizer = Decimal(10) ** -precision + expected = n.quantize(quantizer, rounding=ROUND_HALF_EVEN) + + # The plural category must correspond to the ROUND_HALF_EVEN result. + # We verify indirectly: call select_plural_category with precision, + # then call again with the explicitly-rounded value (no precision). 
+ category_via_precision = select_plural_category(n, "en_US", precision=precision) + category_via_rounded = select_plural_category(expected, "en_US") + + event(f"category_via_precision={category_via_precision}") + event(f"precision={precision}") + assert category_via_precision == category_via_rounded, ( + f"Rounding mismatch for n={n}, precision={precision}: " + f"precision path gave '{category_via_precision}', " + f"explicitly rounded {expected} gave '{category_via_rounded}'" + ) diff --git a/tests/runtime_plural_rules_cases/slavic_plural_rule_coverage.py b/tests/runtime_plural_rules_cases/slavic_plural_rule_coverage.py new file mode 100644 index 00000000..92e55c5e --- /dev/null +++ b/tests/runtime_plural_rules_cases/slavic_plural_rule_coverage.py @@ -0,0 +1,22 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_plural_rules.py.""" + +from tests.runtime_plural_rules_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# SLAVIC PLURAL RULE COVERAGE +# ============================================================================ + + +class TestSlavicRuleReturnOther: + """Slavic plural rules return 'other' for numbers not matching one/few/many.""" + + def test_slavic_rule_return_other(self) -> None: + """Polish plural rules return 'many' or 'other' for 111 (ends in 1 but mod 100 == 11).""" + # 111 % 10 = 1, 111 % 100 = 11 + # Polish 'one' (CLDR) requires i == 1 exactly, so 111 skips 'one' + # Polish 'few' requires i % 10 in 2-4, so 111 skips 'few' + # Polish 'many' covers i % 10 in 0-1 (i != 1), 5-9, and i % 100 in 12-14 + # So CLDR gives 'many' for 111; 'other' is only for fractions in Polish + result = select_plural_category(111, "pl") + assert result in ["many", "other"] diff --git a/tests/runtime_plural_rules_cases/ultimate_fallback_tests.py b/tests/runtime_plural_rules_cases/ultimate_fallback_tests.py new file mode 100644 index 00000000..8ffeeb09 --- /dev/null +++ 
b/tests/runtime_plural_rules_cases/ultimate_fallback_tests.py @@ -0,0 +1,37 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_plural_rules.py.""" + +from tests.runtime_plural_rules_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Ultimate Fallback Tests +# ============================================================================ + + +class TestUltimateFallback: + """Test ultimate fallback when both locale and root fail.""" + + def test_ultimate_fallback_when_root_locale_also_fails(self) -> None: + """Return 'other' when even root locale loading fails (lines 83-87). + + This is defensive programming - should never happen with valid Babel installation. + """ + with patch("ftllexengine.core.locale_utils.get_babel_locale") as mock_get: + mock_get.side_effect = UnknownLocaleError("mocked failure") + + result = select_plural_category(42, "completely_invalid_locale") + assert result == "other" + + def test_ultimate_fallback_with_value_error(self) -> None: + """Return 'other' when get_babel_locale raises ValueError (lines 83-87).""" + with patch("ftllexengine.core.locale_utils.get_babel_locale") as mock_get: + mock_get.side_effect = ValueError("mocked failure") + + result = select_plural_category(1, "invalid") + assert result == "other" + + result = select_plural_category(0, "invalid") + assert result == "other" + + result = select_plural_category(100, "invalid") + assert result == "other" diff --git a/tests/runtime_resolver_depth_cycles_cases/__init__.py b/tests/runtime_resolver_depth_cycles_cases/__init__.py new file mode 100644 index 00000000..594ff793 --- /dev/null +++ b/tests/runtime_resolver_depth_cycles_cases/__init__.py @@ -0,0 +1,70 @@ +"""Resolver depth limiting and cycle detection tests. 
+ +Consolidates: +- test_resolver_cycles.py (direct/indirect/deep cycles, cycle detection properties) +- test_resolver_depth_limit.py (MAX_DEPTH enforcement, attribute chains) +- test_resolver_depth_guard_and_variants.py (guard edge cases, multi-placeables, + malformed NumberLiteral, fallback depth protection) +- test_resolver_expression_depth.py (SelectExpression depth, Placeable depth, mixed) +- test_resolver_expression_depth_and_select.py (ResolutionContext expression depth) +- test_resolver_expansion_budget.py (expansion budget DoS protection) +""" + +from __future__ import annotations + +import pytest +from hypothesis import event, given, settings +from hypothesis import strategies as st + +from ftllexengine.constants import FALLBACK_INVALID, MAX_DEPTH +from ftllexengine.diagnostics import DiagnosticCode, ErrorCategory, FrozenFluentError +from ftllexengine.runtime.bundle import FluentBundle +from ftllexengine.runtime.function_bridge import FunctionRegistry +from ftllexengine.runtime.resolution_context import GlobalDepthGuard, ResolutionContext +from ftllexengine.runtime.resolver import FluentResolver +from ftllexengine.syntax import ( + CallArguments, + FunctionReference, + Identifier, + Message, + NumberLiteral, + Pattern, + Placeable, + SelectExpression, + StringLiteral, + TextElement, + VariableReference, + Variant, +) +from ftllexengine.syntax.ast import InlineExpression + +__all__ = [ + "FALLBACK_INVALID", + "MAX_DEPTH", + "CallArguments", + "DiagnosticCode", + "ErrorCategory", + "FluentBundle", + "FluentResolver", + "FrozenFluentError", + "FunctionReference", + "FunctionRegistry", + "GlobalDepthGuard", + "Identifier", + "InlineExpression", + "Message", + "NumberLiteral", + "Pattern", + "Placeable", + "ResolutionContext", + "SelectExpression", + "StringLiteral", + "TextElement", + "VariableReference", + "Variant", + "event", + "given", + "pytest", + "settings", + "st", +] diff --git a/tests/runtime_resolver_depth_cycles_cases/cycle_detection.py 
b/tests/runtime_resolver_depth_cycles_cases/cycle_detection.py new file mode 100644 index 00000000..e1bd0c17 --- /dev/null +++ b/tests/runtime_resolver_depth_cycles_cases/cycle_detection.py @@ -0,0 +1,136 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_resolver_depth_cycles.py.""" + +from tests.runtime_resolver_depth_cycles_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Cycle Detection +# ============================================================================ + + +class TestDirectCycles: + """Tests for direct self-referential cycles.""" + + def test_message_references_itself(self) -> None: + """Direct cycle: message references itself.""" + bundle = FluentBundle("en-US", strict=False) + bundle.add_resource("self = { self }") + + result, errors = bundle.format_pattern("self") + + assert isinstance(result, str) + assert len(errors) > 0 + cyclic_errors = [ + e for e in errors + if isinstance(e, FrozenFluentError) and e.category == ErrorCategory.CYCLIC + ] + assert len(cyclic_errors) > 0 + + def test_term_references_itself(self) -> None: + """Direct cycle: term references itself.""" + bundle = FluentBundle("en-US", strict=False) + bundle.add_resource( + """ +-self = { -self } +msg = { -self } +""" + ) + + result, errors = bundle.format_pattern("msg") + + assert isinstance(result, str) + assert len(errors) > 0 + + +class TestIndirectCycles: + """Tests for indirect cycles through chains.""" + + def test_two_message_cycle(self) -> None: + """Indirect cycle: a -> b -> a.""" + bundle = FluentBundle("en-US", strict=False) + bundle.add_resource( + """ +msg-a = { msg-b } +msg-b = { msg-a } +""" + ) + + result, errors = bundle.format_pattern("msg-a") + + assert isinstance(result, str) + assert len(errors) > 0 + cyclic_errors = [ + e for e in errors + if isinstance(e, FrozenFluentError) and e.category == ErrorCategory.CYCLIC + ] + assert len(cyclic_errors) > 0 
+ + def test_three_message_cycle(self) -> None: + """Indirect cycle: a -> b -> c -> a.""" + bundle = FluentBundle("en-US", strict=False) + bundle.add_resource( + """ +msg-a = { msg-b } +msg-b = { msg-c } +msg-c = { msg-a } +""" + ) + + result, errors = bundle.format_pattern("msg-a") + + assert isinstance(result, str) + assert len(errors) > 0 + + def test_term_to_message_cycle(self) -> None: + """Mixed cycle: term -> message -> term.""" + bundle = FluentBundle("en-US", strict=False) + bundle.add_resource( + """ +-brand = { product } +product = { -brand } Browser +""" + ) + + result, _ = bundle.format_pattern("product") + + assert isinstance(result, str) + + +class TestDeepChains: + """Tests for deep non-cyclic chains.""" + + def test_chain_at_depth_limit(self) -> None: + """Chain shorter than MAX_DEPTH resolves to leaf value.""" + depth = min(MAX_DEPTH - 1, 50) + messages = [] + for i in range(depth): + if i < depth - 1: + messages.append(f"msg{i} = {{ msg{i + 1} }}") + else: + messages.append(f"msg{i} = End") + + bundle = FluentBundle("en-US") + bundle.add_resource("\n".join(messages)) + + result, _ = bundle.format_pattern("msg0") + + assert isinstance(result, str) + assert "End" in result + + def test_chain_exceeding_depth_limit(self) -> None: + """Chain exceeding MAX_DEPTH produces error.""" + depth = MAX_DEPTH + 10 + messages = [] + for i in range(depth): + if i < depth - 1: + messages.append(f"msg{i} = {{ msg{i + 1} }}") + else: + messages.append(f"msg{i} = End") + + bundle = FluentBundle("en-US", strict=False) + bundle.add_resource("\n".join(messages)) + + result, errors = bundle.format_pattern("msg0") + + assert isinstance(result, str) + assert len(errors) > 0 diff --git a/tests/runtime_resolver_depth_cycles_cases/fallback_depth_protection.py b/tests/runtime_resolver_depth_cycles_cases/fallback_depth_protection.py new file mode 100644 index 00000000..a5042a70 --- /dev/null +++ b/tests/runtime_resolver_depth_cycles_cases/fallback_depth_protection.py @@ -0,0 
+1,91 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_resolver_depth_cycles.py.""" + +from tests.runtime_resolver_depth_cycles_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Fallback Depth Protection +# ============================================================================ + + +class TestGetFallbackForPlaceableDepthProtection: + """Coverage for depth protection in _get_fallback_for_placeable.""" + + def _make_resolver(self) -> FluentResolver: + return FluentResolver( + locale="en", + messages={}, + terms={}, + function_registry=FunctionRegistry(), + ) + + def test_fallback_depth_zero_returns_invalid(self) -> None: + """Fallback with depth=0 returns FALLBACK_INVALID immediately.""" + resolver = self._make_resolver() + select_expr = SelectExpression( + selector=VariableReference(id=Identifier("x")), + variants=( + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="v"),)), + default=True, + ), + ), + ) + + result = resolver._get_fallback_for_placeable(select_expr, depth=0) + + assert result == FALLBACK_INVALID + + def test_fallback_negative_depth_returns_invalid(self) -> None: + """Fallback with negative depth returns FALLBACK_INVALID.""" + resolver = self._make_resolver() + + result = resolver._get_fallback_for_placeable( + VariableReference(id=Identifier("x")), depth=-1 + ) + + assert result == FALLBACK_INVALID + + @given(depth=st.integers(max_value=0)) + def test_fallback_non_positive_depth_property(self, depth: int) -> None: + """Property: Any non-positive depth returns FALLBACK_INVALID immediately.""" + event(f"depth={depth}") + resolver = self._make_resolver() + + result = resolver._get_fallback_for_placeable( + StringLiteral(value="test"), depth=depth + ) + + assert result == FALLBACK_INVALID + + def test_fallback_depth_one_processes_normally(self) -> None: + """Fallback with depth=1 processes expression 
normally.""" + resolver = self._make_resolver() + + result = resolver._get_fallback_for_placeable( + VariableReference(id=Identifier("count")), depth=1 + ) + + assert result == "{$count}" + + def test_fallback_select_expression_depth_decremented(self) -> None: + """SelectExpression fallback decrements depth for recursive call.""" + resolver = self._make_resolver() + select_expr = SelectExpression( + selector=VariableReference(id=Identifier("count")), + variants=( + Variant( + key=Identifier("x"), + value=Pattern(elements=(TextElement(value="variant"),)), + default=True, + ), + ), + ) + + # depth=1 → outer select processes, recursive selector call uses depth=0 + # which returns FALLBACK_INVALID; result should contain "{???} -> ..." + result = resolver._get_fallback_for_placeable(select_expr, depth=1) + + assert FALLBACK_INVALID in result + assert " -> ..." in result diff --git a/tests/runtime_resolver_depth_cycles_cases/global_depth_guard_edge_cases.py b/tests/runtime_resolver_depth_cycles_cases/global_depth_guard_edge_cases.py new file mode 100644 index 00000000..b015fa57 --- /dev/null +++ b/tests/runtime_resolver_depth_cycles_cases/global_depth_guard_edge_cases.py @@ -0,0 +1,24 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_resolver_depth_cycles.py.""" + +from tests.runtime_resolver_depth_cycles_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# GlobalDepthGuard Edge Cases +# ============================================================================ + + +class TestGlobalDepthGuardEdgeCases: + """Coverage for GlobalDepthGuard.__exit__ defensive branch.""" + + def test_exit_without_enter(self) -> None: + """Guard exit without enter does not crash (defensive branch).""" + guard = GlobalDepthGuard(max_depth=100) + # _token remains None; __exit__ defensive branch covered. 
+ guard.__exit__(None, None, None) + + def test_exit_returns_none(self) -> None: + """Guard __exit__ does not suppress exceptions.""" + guard = GlobalDepthGuard(max_depth=100) + with guard: + pass diff --git a/tests/runtime_resolver_depth_cycles_cases/malformed_number_literal_in_variant_keys.py b/tests/runtime_resolver_depth_cycles_cases/malformed_number_literal_in_variant_keys.py new file mode 100644 index 00000000..d4a51f43 --- /dev/null +++ b/tests/runtime_resolver_depth_cycles_cases/malformed_number_literal_in_variant_keys.py @@ -0,0 +1,29 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_resolver_depth_cycles.py.""" + +from tests.runtime_resolver_depth_cycles_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Malformed NumberLiteral in Variant Keys +# ============================================================================ + + +class TestVariantMatchingMalformedNumberLiteral: + """NumberLiteral.__post_init__ prevents construction with invalid raw strings. + + Previously, programmatically constructed ASTs could contain invalid + NumberLiteral.raw strings that bypassed the parser. NumberLiteral.__post_init__ + now enforces the invariant at construction time, making the resolver's + former InvalidOperation handler unreachable via normal API usage. 
+ """ + + def test_malformed_raw_rejected_at_construction(self) -> None: + """NumberLiteral rejects raw string that does not parse as a number.""" + with pytest.raises(ValueError, match="not a valid number literal"): + NumberLiteral(value=42, raw="not_a_number") + + def test_multiple_malformed_raws_all_rejected(self) -> None: + """NumberLiteral rejects each invalid raw string at construction time.""" + for bad_raw in ("invalid1", "also_invalid", "not-a-number", "[1,2,3]"): + with pytest.raises(ValueError, match="not a valid number literal"): + NumberLiteral(value=1, raw=bad_raw) diff --git a/tests/runtime_resolver_depth_cycles_cases/max_depth_enforcement.py b/tests/runtime_resolver_depth_cycles_cases/max_depth_enforcement.py new file mode 100644 index 00000000..5875195d --- /dev/null +++ b/tests/runtime_resolver_depth_cycles_cases/max_depth_enforcement.py @@ -0,0 +1,157 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_resolver_depth_cycles.py.""" + +from tests.runtime_resolver_depth_cycles_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# MAX_DEPTH Enforcement +# ============================================================================ + + +class TestMaxDepthLimit: + """Tests for maximum resolution depth enforcement.""" + + def test_max_depth_constant_exists(self) -> None: + """MAX_DEPTH constant is defined and reasonable.""" + assert MAX_DEPTH == 100 + + def test_shallow_chain_succeeds(self) -> None: + """Chain of 5 messages resolves without error.""" + bundle = FluentBundle("en") + bundle.add_resource( + """ +m0 = { m1 } +m1 = { m2 } +m2 = { m3 } +m3 = { m4 } +m4 = Final value +""" + ) + + result, errors = bundle.format_pattern("m0") + + assert errors == () + assert "\u2068" in result or "Final value" in result + + def test_moderate_chain_succeeds(self) -> None: + """Chain of 50 messages resolves without error.""" + bundle = FluentBundle("en") + 
lines = [] + for i in range(49): + lines.append(f"m{i} = {{ m{i+1} }}") + lines.append("m49 = Done") + bundle.add_resource("\n".join(lines)) + + result, errors = bundle.format_pattern("m0") + + assert errors == () + assert "Done" in result + + def test_deep_chain_hits_limit(self) -> None: + """Chain exceeding MAX_DEPTH returns error.""" + bundle = FluentBundle("en", strict=False) + depth = MAX_DEPTH + 10 + lines = [] + for i in range(depth - 1): + lines.append(f"m{i} = {{ m{i+1} }}") + lines.append(f"m{depth-1} = Final") + bundle.add_resource("\n".join(lines)) + + _, errors = bundle.format_pattern("m0") + + assert len(errors) > 0 + depth_errors = [e for e in errors if isinstance(e, FrozenFluentError)] + assert len(depth_errors) > 0 + + def test_exactly_at_limit_succeeds(self) -> None: + """Chain of exactly MAX_DEPTH - 1 nesting levels succeeds.""" + bundle = FluentBundle("en") + depth = MAX_DEPTH - 1 + lines = [] + for i in range(depth - 1): + lines.append(f"m{i} = {{ m{i+1} }}") + lines.append(f"m{depth-1} = End") + bundle.add_resource("\n".join(lines)) + + result, _ = bundle.format_pattern("m0") + + assert "End" in result + + def test_depth_limit_error_message_contains_depth_info(self) -> None: + """Error message for depth limit references depth.""" + bundle = FluentBundle("en", strict=False) + depth = MAX_DEPTH + 5 + lines = [] + for i in range(depth - 1): + lines.append(f"msg{i} = {{ msg{i+1} }}") + lines.append(f"msg{depth-1} = End") + bundle.add_resource("\n".join(lines)) + + _, errors = bundle.format_pattern("msg0") + + assert len(errors) > 0 + error_str = str(errors[0]) + assert "depth" in error_str.lower() or "Maximum" in error_str + + def test_cyclic_detected_before_depth(self) -> None: + """Cyclic reference is detected before hitting depth limit.""" + bundle = FluentBundle("en", strict=False) + bundle.add_resource( + """ +a = { b } +b = { c } +c = { a } +""" + ) + + result, errors = bundle.format_pattern("a") + + assert len(errors) > 0 + assert "{" in 
result # Fallback format + + def test_independent_resolutions_dont_share_depth(self) -> None: + """Separate format_pattern calls have independent depth tracking.""" + bundle = FluentBundle("en") + bundle.add_resource( + """ +a1 = { a2 } +a2 = { a3 } +a3 = A Done + +b1 = { b2 } +b2 = B Done +""" + ) + + result_a, errors_a = bundle.format_pattern("a1") + result_b, errors_b = bundle.format_pattern("b1") + + assert errors_a == () + assert errors_b == () + assert "A Done" in result_a + assert "B Done" in result_b + + +class TestMaxDepthWithAttributes: + """Tests for depth limit with attribute access.""" + + def test_attribute_chain_counts_toward_depth(self) -> None: + """Message.attribute references count toward depth.""" + bundle = FluentBundle("en") + bundle.add_resource( + """ +m0 = Value + .attr = { m1.attr } +m1 = Value + .attr = { m2.attr } +m2 = Value + .attr = { m3.attr } +m3 = Value + .attr = Final +""" + ) + + result, errors = bundle.format_pattern("m0", attribute="attr") + + assert errors == () + assert "Final" in result diff --git a/tests/runtime_resolver_depth_cycles_cases/multi_placeable_pattern_resolution.py b/tests/runtime_resolver_depth_cycles_cases/multi_placeable_pattern_resolution.py new file mode 100644 index 00000000..005033d3 --- /dev/null +++ b/tests/runtime_resolver_depth_cycles_cases/multi_placeable_pattern_resolution.py @@ -0,0 +1,74 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_resolver_depth_cycles.py.""" + +from tests.runtime_resolver_depth_cycles_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Multi-Placeable Pattern Resolution +# ============================================================================ + + +class TestPatternMultiplePlaceables: + """Coverage for pattern with multiple consecutive placeables.""" + + def test_pattern_with_two_placeables_in_sequence(self) -> None: + """Pattern with consecutive placeables 
resolves all correctly.""" + pattern = Pattern( + elements=( + Placeable(expression=VariableReference(id=Identifier("first"))), + TextElement(value=" and "), + Placeable(expression=VariableReference(id=Identifier("second"))), + ) + ) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message( + message, {"first": "A", "second": "B"} + ) + + assert result == "A and B" + assert errors == () + + @given( + count=st.integers(min_value=2, max_value=10), + values=st.lists(st.text(min_size=1, max_size=10), min_size=2, max_size=10), + ) + def test_pattern_with_multiple_placeables_property( + self, count: int, values: list[str] + ) -> None: + """Property: Pattern with N placeables resolves all correctly.""" + event(f"count={count}") + values = values[:count] + if len(values) < count: + values.extend(["X"] * (count - len(values))) + + elements: list[TextElement | Placeable] = [] + for i in range(count): + if i > 0: + elements.append(TextElement(value=" ")) + elements.append( + Placeable(expression=VariableReference(id=Identifier(f"v{i}"))) + ) + + pattern = Pattern(elements=tuple(elements)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + args = {f"v{i}": values[i] for i in range(count)} + result, errors = resolver.resolve_message(message, args) + + assert errors == () + assert result == " ".join(values) diff --git a/tests/runtime_resolver_depth_cycles_cases/pattern_loop_expansion_budget.py b/tests/runtime_resolver_depth_cycles_cases/pattern_loop_expansion_budget.py new file mode 100644 index 00000000..4a68e9e4 --- /dev/null +++ 
b/tests/runtime_resolver_depth_cycles_cases/pattern_loop_expansion_budget.py @@ -0,0 +1,144 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_resolver_depth_cycles.py.""" + +from tests.runtime_resolver_depth_cycles_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Pattern Loop Expansion Budget +# ============================================================================ + + +class TestPatternLoopEarlyExit: + """Tests for pattern loop early-exit when budget exceeded.""" + + def test_pattern_loop_defensive_check_with_context_over_budget(self) -> None: + """Pattern loop defensive check triggers when total_chars > budget.""" + pattern = Pattern( + elements=( + TextElement(value="A" * 10), + TextElement(value="B" * 10), + ) + ) + message = Message(id=Identifier(name="test"), value=pattern, attributes=()) + registry = FunctionRegistry() + resolver = FluentResolver( + locale="en_US", + messages={"test": message}, + terms={}, + function_registry=registry, + max_expansion_size=50, + ) + + context = ResolutionContext(max_expansion_size=50) + context._total_chars = 60 # Simulate budget already exceeded + + result, errors = resolver.resolve_message(message, args={}, context=context) + + has_budget_error = any( + e.diagnostic is not None + and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED + for e in errors + ) + assert has_budget_error + assert len(result) == 0 or result == "{test}" + + def test_pattern_loop_exits_when_budget_already_exceeded(self) -> None: + """Pattern loop exits early if budget exceeded before next element.""" + pattern = Pattern( + elements=( + TextElement(value="A" * 50), + TextElement(value="B" * 50), + TextElement(value="C" * 50), + ) + ) + message = Message(id=Identifier(name="test"), value=pattern, attributes=()) + registry = FunctionRegistry() + resolver = FluentResolver( + locale="en_US", + messages={"test": message}, + 
terms={}, + function_registry=registry, + max_expansion_size=75, + ) + + result, errors = resolver.resolve_message(message, args={}) + + assert len(errors) > 0 + has_budget_error = any( + e.diagnostic is not None + and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED + for e in errors + ) + assert has_budget_error + assert "C" not in result + + def test_pattern_loop_early_exit_on_boundary(self) -> None: + """Pattern loop exits when total_chars exactly equals budget.""" + pattern = Pattern( + elements=( + TextElement(value="X" * 10), + TextElement(value="Y" * 10), + ) + ) + message = Message(id=Identifier(name="boundary"), value=pattern, attributes=()) + registry = FunctionRegistry() + resolver = FluentResolver( + locale="en_US", + messages={"boundary": message}, + terms={}, + function_registry=registry, + max_expansion_size=10, + ) + + _result, errors = resolver.resolve_message(message, args={}) + + assert len(errors) > 0 + has_budget_error = any( + e.diagnostic is not None + and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED + for e in errors + ) + assert has_budget_error + + @given( + element_count=st.integers(min_value=2, max_value=10), + chars_per_element=st.integers(min_value=5, max_value=20), + ) + @settings(max_examples=50) + def test_pattern_loop_early_exit_property( + self, element_count: int, chars_per_element: int + ) -> None: + """Property: Pattern loop always exits when budget exceeded.""" + event(f"element_count={element_count}") + + elements = tuple( + TextElement(value=f"{chr(65 + i)}" * chars_per_element) + for i in range(element_count) + ) + pattern = Pattern(elements=elements) + message = Message(id=Identifier(name="prop"), value=pattern, attributes=()) + + total_chars = element_count * chars_per_element + budget = total_chars // 2 + + event("budget_scenario=exceeded") + registry = FunctionRegistry() + resolver = FluentResolver( + locale="en_US", + messages={"prop": message}, + terms={}, + function_registry=registry, + 
max_expansion_size=budget, + ) + + result, errors = resolver.resolve_message(message, args={}) + + has_budget_error = any( + e.diagnostic is not None + and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED + for e in errors + ) + if has_budget_error: + event("error_path=early_exit_detected") + assert len(result) < total_chars + event("result_type=partial") diff --git a/tests/runtime_resolver_depth_cycles_cases/placeable_expansion_budget_break.py b/tests/runtime_resolver_depth_cycles_cases/placeable_expansion_budget_break.py new file mode 100644 index 00000000..ed9ee00e --- /dev/null +++ b/tests/runtime_resolver_depth_cycles_cases/placeable_expansion_budget_break.py @@ -0,0 +1,246 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_resolver_depth_cycles.py.""" + +from tests.runtime_resolver_depth_cycles_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Placeable Expansion Budget Break +# ============================================================================ + + +class TestPlaceableExpansionBudgetBreak: + """Tests for Placeable exception handler break on expansion budget error.""" + + def test_placeable_expansion_budget_breaks_pattern_loop(self) -> None: + """Expansion budget error from Placeable breaks pattern resolution.""" + outer_pattern = Pattern( + elements=( + TextElement(value="Before"), + Placeable( + expression=VariableReference(id=Identifier(name="big_value")) + ), + TextElement(value="After"), # Must not be processed. 
+ ) + ) + outer_message = Message( + id=Identifier(name="outer"), value=outer_pattern, attributes=() + ) + registry = FunctionRegistry() + resolver = FluentResolver( + locale="en_US", + messages={"outer": outer_message}, + terms={}, + function_registry=registry, + max_expansion_size=50, + ) + + result, errors = resolver.resolve_message( + outer_message, args={"big_value": "Z" * 100} + ) + + has_budget_error = any( + e.diagnostic is not None + and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED + for e in errors + ) + assert has_budget_error + assert "After" not in result + + def test_placeable_budget_error_via_select_expression(self) -> None: + """Expansion budget error from SelectExpression in Placeable breaks loop.""" + variants = ( + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="A" * 60),)), + default=True, + ), + ) + select_expr = SelectExpression( + selector=VariableReference(id=Identifier(name="count")), variants=variants + ) + pattern = Pattern( + elements=( + TextElement(value="Start"), + Placeable(expression=select_expr), + TextElement(value="End"), # Must not be processed. 
+ ) + ) + message = Message(id=Identifier(name="select"), value=pattern, attributes=()) + registry = FunctionRegistry() + resolver = FluentResolver( + locale="en_US", + messages={"select": message}, + terms={}, + function_registry=registry, + max_expansion_size=40, + ) + + result, errors = resolver.resolve_message(message, args={"count": 1}) + + has_budget_error = any( + e.diagnostic is not None + and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED + for e in errors + ) + assert has_budget_error + assert "End" not in result + + def test_placeable_budget_error_via_function_call(self) -> None: + """Expansion budget error from function result in Placeable breaks loop.""" + def large_output() -> str: + return "LARGE" * 100 + + registry = FunctionRegistry() + registry.register(large_output, ftl_name="BIGFUNC") + + func_call = FunctionReference( + id=Identifier(name="BIGFUNC"), + arguments=CallArguments(positional=(), named=()), + ) + pattern = Pattern( + elements=( + TextElement(value="Prefix"), + Placeable(expression=func_call), + TextElement(value="Suffix"), # Must not be processed. 
+ ) + ) + message = Message(id=Identifier(name="func"), value=pattern, attributes=()) + resolver = FluentResolver( + locale="en_US", + messages={"func": message}, + terms={}, + function_registry=registry, + max_expansion_size=100, + ) + + result, errors = resolver.resolve_message(message, args={}) + + has_budget_error = any( + e.diagnostic is not None + and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED + for e in errors + ) + assert has_budget_error + assert "Suffix" not in result + + @given( + variant_size=st.integers(min_value=50, max_value=200), + budget=st.integers(min_value=10, max_value=100), + ) + @settings(max_examples=30) + def test_placeable_budget_break_property( + self, variant_size: int, budget: int + ) -> None: + """Property: Placeable budget errors always break pattern loop.""" + event(f"variant_size={variant_size}") + event(f"budget={budget}") + + if variant_size <= budget: + event("skip=variant_fits_budget") + return + + variants = ( + Variant( + key=Identifier(name="key"), + value=Pattern(elements=(TextElement(value="X" * variant_size),)), + default=True, + ), + ) + select = SelectExpression( + selector=VariableReference(id=Identifier(name="var")), variants=variants + ) + pattern = Pattern( + elements=( + Placeable(expression=select), + TextElement(value="Marker"), # Must not appear. 
+ ) + ) + message = Message(id=Identifier(name="test"), value=pattern, attributes=()) + registry = FunctionRegistry() + resolver = FluentResolver( + locale="en_US", + messages={"test": message}, + terms={}, + function_registry=registry, + max_expansion_size=budget, + ) + + result, errors = resolver.resolve_message(message, args={"var": "key"}) + + has_budget_error = any( + e.diagnostic is not None + and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED + for e in errors + ) + if has_budget_error: + event("error_path=budget_break") + assert "Marker" not in result + event("result_type=partial") + + +class TestExpansionBudgetIntegration: + """Integration tests for expansion budget across resolver components.""" + + def test_expansion_budget_with_isolating_marks(self) -> None: + """Expansion budget accounts for Unicode isolating marks.""" + pattern = Pattern( + elements=( + Placeable(expression=VariableReference(id=Identifier(name="v1"))), + Placeable(expression=VariableReference(id=Identifier(name="v2"))), + ) + ) + message = Message(id=Identifier(name="iso"), value=pattern, attributes=()) + registry = FunctionRegistry() + resolver = FluentResolver( + locale="en_US", + messages={"iso": message}, + terms={}, + function_registry=registry, + use_isolating=True, + max_expansion_size=15, + ) + + # Each variable: 5 chars content + 2 chars marks (FSI + PDI) = 7 chars + # Total: 14 chars (just under budget of 15) + _result, errors = resolver.resolve_message( + message, args={"v1": "AAAAA", "v2": "BBBBB"} + ) + assert len(errors) == 0 + + # 8-char values: 10 + 10 = 20 > 15 + _result2, errors2 = resolver.resolve_message( + message, + args={"v1": "AAAAAAAA", "v2": "BBBBBBBB"}, + ) + has_budget_error = any( + e.diagnostic is not None + and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED + for e in errors2 + ) + assert has_budget_error + + def test_expansion_budget_error_diagnostic_includes_counts(self) -> None: + """Expansion budget error diagnostic 
includes actual and limit values.""" + pattern = Pattern(elements=(TextElement(value="X" * 100),)) + message = Message(id=Identifier(name="err"), value=pattern, attributes=()) + registry = FunctionRegistry() + resolver = FluentResolver( + locale="en_US", + messages={"err": message}, + terms={}, + function_registry=registry, + max_expansion_size=50, + ) + + _result, errors = resolver.resolve_message(message, args={}) + + assert len(errors) > 0 + budget_error = next( + e + for e in errors + if e.diagnostic and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED + ) + assert budget_error.diagnostic is not None + diagnostic_str = str(budget_error.diagnostic) + assert "50" in diagnostic_str + assert "100" in diagnostic_str or "exceeded" in diagnostic_str.lower() diff --git a/tests/runtime_resolver_depth_cycles_cases/resolution_context_tests.py b/tests/runtime_resolver_depth_cycles_cases/resolution_context_tests.py new file mode 100644 index 00000000..ba3fe64c --- /dev/null +++ b/tests/runtime_resolver_depth_cycles_cases/resolution_context_tests.py @@ -0,0 +1,144 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_resolver_depth_cycles.py.""" + +from tests.runtime_resolver_depth_cycles_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# ResolutionContext Tests +# ============================================================================ + + +class TestResolutionContext: + """Tests for ResolutionContext cycle detection.""" + + def test_push_pop_balance(self) -> None: + """Context push/pop maintains balanced state.""" + ctx = ResolutionContext() + + ctx.push("a") + ctx.push("b") + ctx.push("c") + + assert ctx.depth == 3 + assert ctx.contains("a") + assert ctx.contains("b") + assert ctx.contains("c") + + assert ctx.pop() == "c" + assert ctx.pop() == "b" + assert ctx.pop() == "a" + + assert ctx.depth == 0 + assert not ctx.contains("a") + + def 
test_cycle_detection_o1(self) -> None: + """Cycle detection is O(1) via set.""" + ctx = ResolutionContext() + + for i in range(100): + ctx.push(f"msg{i}") + + assert ctx.contains("msg0") + assert ctx.contains("msg50") + assert ctx.contains("msg99") + assert not ctx.contains("msg100") + + def test_get_cycle_path(self) -> None: + """Cycle path includes full resolution stack.""" + ctx = ResolutionContext() + + ctx.push("a") + ctx.push("b") + ctx.push("c") + + path = ctx.get_cycle_path("a") + + assert path == ["a", "b", "c", "a"] + + +class TestResolutionContextExpressionDepth: + """Test ResolutionContext.expression_depth property.""" + + def test_expression_depth_property_initial(self) -> None: + """expression_depth property returns 0 initially.""" + context = ResolutionContext() + + assert context.expression_depth == 0 + + def test_expression_depth_property_after_increment(self) -> None: + """expression_depth property reflects guard depth after increment.""" + context = ResolutionContext() + + with context.expression_guard: + assert context.expression_depth == 1 + with context.expression_guard: + assert context.expression_depth == 2 + + assert context.expression_depth == 0 + + +class TestResolutionContextTrackExpansion: + """Direct tests for ResolutionContext.track_expansion() accumulation. + + Targets the expansion budget DoS protection: track_expansion() accumulates + character counts without raising. Callers check + ``total_chars > max_expansion_size`` after each call and generate + FrozenFluentError themselves (separation of state tracking from error policy). 
+ """ + + def test_track_expansion_accumulates_correctly(self) -> None: + """track_expansion() accumulates total_chars without raising.""" + context = ResolutionContext(max_expansion_size=100) + + context.track_expansion(99) + assert context.total_chars == 99 + assert context.total_chars <= context.max_expansion_size + + # Exceeding budget is detectable by caller; no exception raised here + context.track_expansion(2) + assert context.total_chars == 101 + assert context.total_chars > context.max_expansion_size + + def test_track_expansion_exact_budget_limit_detectable(self) -> None: + """Exact budget limit is detectable by caller after track_expansion.""" + context = ResolutionContext(max_expansion_size=100) + + context.track_expansion(100) + assert context.total_chars == 100 + # At exactly the budget: caller may allow or deny based on policy + assert context.total_chars <= context.max_expansion_size + + # One more char pushes over the limit — caller detects via comparison + context.track_expansion(1) + assert context.total_chars == 101 + assert context.total_chars > context.max_expansion_size + + @given( + budget=st.integers(min_value=1, max_value=1000), + first_chunk=st.integers(min_value=0, max_value=500), + ) + @settings(max_examples=50) + def test_track_expansion_accumulates_accurately( + self, budget: int, first_chunk: int + ) -> None: + """Property: track_expansion() always accumulates total_chars precisely. + + For any budget and chunk sizes, total_chars must equal the exact sum of + all chunk arguments passed. The caller detects budget exhaustion via + ``total_chars > max_expansion_size``. 
+ """ + context = ResolutionContext(max_expansion_size=budget) + + context.track_expansion(first_chunk) + assert context.total_chars == first_chunk + + at_or_over = first_chunk >= budget + event("boundary=at_or_over_budget" if at_or_over else "boundary=under_budget") + + # If not already over budget, add the chunk that pushes the total to budget + 1 + second_chunk = budget - first_chunk + 1 + if second_chunk > 0: + context.track_expansion(second_chunk) + assert context.total_chars == first_chunk + second_chunk + assert context.total_chars > context.max_expansion_size + event("error_path=budget_exceeded") diff --git a/tests/runtime_resolver_depth_cycles_cases/select_expression_placeable_mixed_depth_limits.py b/tests/runtime_resolver_depth_cycles_cases/select_expression_placeable_mixed_depth_limits.py new file mode 100644 index 00000000..1a0c583b --- /dev/null +++ b/tests/runtime_resolver_depth_cycles_cases/select_expression_placeable_mixed_depth_limits.py @@ -0,0 +1,258 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_runtime_resolver_depth_cycles.py.""" + +from tests.runtime_resolver_depth_cycles_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# SelectExpression / Placeable / Mixed Depth Limits +# ============================================================================ + + +class TestSelectExpressionDepthLimit: + """Verify depth limiting for SelectExpression recursion through variants. + + Regression: SEC-RESOLVE-RECURSION-6. 
+ """ + + def _create_nested_select_ast(self, depth: int) -> Message: + """Create a Message with SelectExpression nested to specified depth.""" + inner_pattern = Pattern(elements=(TextElement(value="innermost"),)) + current_pattern = inner_pattern + + for _ in range(depth): + select_expr = SelectExpression( + selector=VariableReference(id=Identifier(name="var")), + variants=( + Variant( + key=Identifier(name="one"), + value=current_pattern, + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="other"),)), + default=True, + ), + ), + ) + current_pattern = Pattern(elements=(Placeable(expression=select_expr),)) + + return Message( + id=Identifier(name="nested"), + value=current_pattern, + attributes=(), + comment=None, + ) + + def test_shallow_nesting_resolves_successfully(self) -> None: + """SelectExpression with shallow nesting resolves normally.""" + bundle = FluentBundle("en_US") + message = self._create_nested_select_ast(depth=5) + bundle._messages["nested"] = message + + result, errors = bundle.format_pattern("nested", {"var": "one"}) + + assert "innermost" in result + assert errors == () + + def test_deep_nesting_triggers_depth_limit(self) -> None: + """SelectExpression nested beyond MAX_DEPTH triggers depth limit.""" + bundle = FluentBundle("en_US", strict=False) + message = self._create_nested_select_ast(depth=MAX_DEPTH + 10) + bundle._messages["nested"] = message + + _result, errors = bundle.format_pattern("nested", {"var": "one"}) + + assert len(errors) >= 1 + error_messages = [str(e) for e in errors] + assert any("depth" in msg.lower() for msg in error_messages) + + def test_exact_max_depth_boundary(self) -> None: + """Behavior at exactly MAX_DEPTH does not crash.""" + bundle = FluentBundle("en_US", strict=False) + message = self._create_nested_select_ast(depth=MAX_DEPTH) + bundle._messages["nested"] = message + + result, _errors = bundle.format_pattern("nested", {"var": "one"}) + + assert result is 
not None + + def test_just_under_max_depth(self) -> None: + """Nesting just under MAX_DEPTH produces no depth errors.""" + bundle = FluentBundle("en_US") + message = self._create_nested_select_ast(depth=MAX_DEPTH - 5) + bundle._messages["nested"] = message + + _result, errors = bundle.format_pattern("nested", {"var": "one"}) + + depth_errors = [e for e in errors if "depth" in str(e).lower()] + assert len(depth_errors) == 0 + + +class TestNestedPlaceableDepthLimit: + """Verify depth limiting for nested Placeables like { { { x } } }.""" + + def _create_nested_placeable_ast(self, depth: int) -> Message: + """Create a Message with Placeables nested to specified depth.""" + inner_expr: InlineExpression = VariableReference(id=Identifier(name="var")) + current_expr: InlineExpression = inner_expr + + for _ in range(depth): + current_expr = Placeable(expression=current_expr) + + return Message( + id=Identifier(name="nested"), + value=Pattern(elements=(Placeable(expression=current_expr),)), + attributes=(), + comment=None, + ) + + def test_shallow_placeable_nesting_resolves(self) -> None: + """Shallow placeable nesting resolves normally.""" + bundle = FluentBundle("en_US") + message = self._create_nested_placeable_ast(depth=5) + bundle._messages["nested"] = message + + result, errors = bundle.format_pattern("nested", {"var": "hello"}) + + assert "hello" in result + assert errors == () + + def test_deep_placeable_nesting_triggers_limit(self) -> None: + """Deep placeable nesting triggers depth limit.""" + bundle = FluentBundle("en_US", strict=False) + message = self._create_nested_placeable_ast(depth=MAX_DEPTH + 10) + bundle._messages["nested"] = message + + _result, errors = bundle.format_pattern("nested", {"var": "hello"}) + + assert len(errors) >= 1 + + +class TestMixedNestingDepthLimit: + """Verify depth limiting for mixed SelectExpression and Placeable nesting.""" + + def _create_mixed_nesting_ast(self, select_depth: int, placeable_depth: int) -> Message: + """Create a 
Message mixing SelectExpression and Placeable nesting.""" + inner_expr: InlineExpression = VariableReference(id=Identifier(name="var")) + current_expr: InlineExpression = inner_expr + + for _ in range(placeable_depth): + current_expr = Placeable(expression=current_expr) + + current_pattern = Pattern(elements=(Placeable(expression=current_expr),)) + + for _ in range(select_depth): + select_expr = SelectExpression( + selector=VariableReference(id=Identifier(name="sel")), + variants=( + Variant( + key=Identifier(name="a"), + value=current_pattern, + default=False, + ), + Variant( + key=Identifier(name="b"), + value=Pattern(elements=(TextElement(value="b"),)), + default=True, + ), + ), + ) + current_pattern = Pattern(elements=(Placeable(expression=select_expr),)) + + return Message( + id=Identifier(name="mixed"), + value=current_pattern, + attributes=(), + comment=None, + ) + + def test_combined_nesting_exceeds_limit(self) -> None: + """Combined nesting exceeding MAX_DEPTH produces depth error.""" + bundle = FluentBundle("en_US", strict=False) + message = self._create_mixed_nesting_ast( + select_depth=MAX_DEPTH // 2 + 10, + placeable_depth=MAX_DEPTH // 2 + 10, + ) + bundle._messages["mixed"] = message + + _result, errors = bundle.format_pattern("mixed", {"var": "x", "sel": "a"}) + + assert len(errors) >= 1 + + +class TestDepthLimitWithCustomLimit: + """Verify custom depth limit configuration.""" + + def test_custom_lower_depth_limit(self) -> None: + """Custom lower depth limit triggers earlier than default.""" + bundle = FluentBundle("en_US", max_nesting_depth=10, strict=False) + + inner_pattern = Pattern(elements=(TextElement(value="inner"),)) + current_pattern = inner_pattern + + for _ in range(15): # 15 > 10 custom limit, < 100 default + select_expr = SelectExpression( + selector=NumberLiteral(value=1, raw="1"), + variants=( + Variant( + key=NumberLiteral(value=1, raw="1"), + value=current_pattern, + default=True, + ), + ), + ) + current_pattern = 
Pattern(elements=(Placeable(expression=select_expr),)) + + message = Message( + id=Identifier(name="test"), + value=current_pattern, + attributes=(), + comment=None, + ) + bundle._messages["test"] = message + + result, _errors = bundle.format_pattern("test", {}) + + assert result is not None + + +class TestDepthLimitPropertyBased: + """Property-based tests for depth limiting.""" + + @given(st.integers(min_value=1, max_value=50)) + @settings(max_examples=20) + def test_depth_under_limit_never_errors_on_depth(self, depth: int) -> None: + """Nesting under MAX_DEPTH produces no depth errors.""" + event(f"depth={depth}") + bundle = FluentBundle("en_US") + + inner_pattern = Pattern(elements=(TextElement(value="ok"),)) + current_pattern = inner_pattern + + for _ in range(depth): + select_expr = SelectExpression( + selector=NumberLiteral(value=1, raw="1"), + variants=( + Variant( + key=NumberLiteral(value=1, raw="1"), + value=current_pattern, + default=True, + ), + ), + ) + current_pattern = Pattern(elements=(Placeable(expression=select_expr),)) + + message = Message( + id=Identifier(name="test"), + value=current_pattern, + attributes=(), + comment=None, + ) + bundle._messages["test"] = message + + result, errors = bundle.format_pattern("test", {}) + + depth_errors = [e for e in errors if "depth" in str(e).lower()] + assert len(depth_errors) == 0 + assert "ok" in result diff --git a/tests/runtime_resolver_selection_cases/__init__.py b/tests/runtime_resolver_selection_cases/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/runtime_resolver_selection_cases/fallback_and_errors.py b/tests/runtime_resolver_selection_cases/fallback_and_errors.py new file mode 100644 index 00000000..70af67c2 --- /dev/null +++ b/tests/runtime_resolver_selection_cases/fallback_and_errors.py @@ -0,0 +1,312 @@ +# mypy: ignore-errors +# mypy: ignore-errors +from __future__ import annotations + +from datetime import UTC, datetime +from decimal import Decimal + +import pytest 
+from hypothesis import event, given +from hypothesis import strategies as st + +from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError +from ftllexengine.runtime.bundle import FluentBundle +from ftllexengine.runtime.function_bridge import FunctionRegistry +from ftllexengine.runtime.resolver import FluentResolver +from ftllexengine.syntax.ast import ( + CallArguments, + FunctionReference, + Identifier, + Message, + Pattern, + Placeable, + SelectExpression, + StringLiteral, + TextElement, + VariableReference, + Variant, +) + +# ============================================================================ +# PATTERN LOOP CONTINUATION +# ============================================================================ + + + +class TestFallbackVariantNoVariants: + """Empty variant list and missing default error paths (lines 645-648).""" + + def test_select_expression_with_no_variants_rejected_at_construction(self) -> None: + """SelectExpression with empty variants is rejected by __post_init__.""" + selector = VariableReference(id=Identifier("count")) + with pytest.raises(ValueError, match="requires at least one variant"): + SelectExpression(selector=selector, variants=()) + + def test_select_expression_without_default_rejected_at_construction(self) -> None: + """SelectExpression without a default variant is rejected by __post_init__.""" + selector = VariableReference(id=Identifier("count")) + variant = Variant( + key=Identifier("one"), + value=Pattern(elements=(TextElement(value="one"),)), + default=False, + ) + with pytest.raises(ValueError, match="exactly one default variant"): + SelectExpression(selector=selector, variants=(variant,)) + +class TestSelectExpressionFallbackPaths: + """Test fallback variant selection logic.""" + + def test_selector_error_uses_default_variant(self) -> None: + """When selector fails due to missing variable, uses default variant.""" + selector = VariableReference(id=Identifier("missing")) + variants = ( + Variant( + 
key=Identifier("one"), + value=Pattern(elements=(TextElement(value="variant one"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="default variant"),)), + default=True, + ), + ) + select_expr = SelectExpression(selector=selector, variants=variants) + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + ) + + result, errors = resolver.resolve_message(message, {}) + assert "default variant" in result + assert len(errors) > 0 + + def test_selector_error_uses_default_variant_fallback(self) -> None: + """When selector fails, the marked default variant is selected.""" + selector = VariableReference(id=Identifier("missing")) + variants = ( + Variant( + key=Identifier("first"), + value=Pattern(elements=(TextElement(value="first variant"),)), + default=False, + ), + Variant( + key=Identifier("second"), + value=Pattern(elements=(TextElement(value="default variant"),)), + default=True, + ), + ) + select_expr = SelectExpression(selector=selector, variants=variants) + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + ) + + result, _errors = resolver.resolve_message(message, {}) + assert "default variant" in result + +class TestResolverFluentNumberVariantMatching: + """Test FluentNumber handling in variant selection.""" + + def test_fluent_number_matches_numeric_variant_key(self) -> None: + """FluentNumber value extraction for numeric variant matching (line 502).""" + bundle = FluentBundle("en", use_isolating=False) + bundle.add_resource( + """ +msg = { NUMBER($count) -> + [1000] Exactly one 
thousand + *[other] Other value +} +""" + ) + + result, errors = bundle.format_pattern("msg", {"count": 1000}) + assert len(errors) == 0 + assert "Exactly one thousand" in result + + def test_fluent_number_plural_category_selection(self) -> None: + """FluentNumber value extraction for CLDR plural matching (line 608).""" + bundle = FluentBundle("en", use_isolating=False) + bundle.add_resource( + """ +msg = { NUMBER($count) -> + [one] One item + *[other] Many items +} +""" + ) + + result, errors = bundle.format_pattern("msg", {"count": 1}) + assert len(errors) == 0 + assert "One item" in result + + def test_fluent_number_with_formatted_display(self) -> None: + """FluentNumber preserves numeric value for matching while showing formatted string.""" + bundle = FluentBundle("en", use_isolating=False) + bundle.add_resource( + """ +msg = { NUMBER($amount, minimumFractionDigits: 2) -> + [1000] Exactly one thousand + *[other] Other +} +""" + ) + + result, errors = bundle.format_pattern("msg", {"amount": 1000}) + assert len(errors) == 0 + assert "Exactly one thousand" in result + +class TestFormatValueComprehensive: + """Test _format_value with all FluentValue types.""" + + def _make_resolver(self) -> FluentResolver: + return FluentResolver( + locale="en_US", + messages={}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + def test_format_value_with_string(self) -> None: + """Verify _format_value handles strings.""" + resolver = self._make_resolver() + assert resolver._format_value("test") == "test" + assert resolver._format_value("") == "" + + def test_format_value_with_bool_true(self) -> None: + """Verify _format_value handles True as 'true'.""" + assert self._make_resolver()._format_value(True) == "true" + + def test_format_value_with_bool_false(self) -> None: + """Verify _format_value handles False as 'false'.""" + assert self._make_resolver()._format_value(False) == "false" + + def test_format_value_with_int(self) -> None: + """Verify 
_format_value handles integers.""" + resolver = self._make_resolver() + assert resolver._format_value(42) == "42" + assert resolver._format_value(0) == "0" + assert resolver._format_value(-100) == "-100" + + def test_format_value_with_decimal(self) -> None: + """Verify _format_value handles Decimal values.""" + resolver = self._make_resolver() + assert resolver._format_value(Decimal("3.14")) == "3.14" + assert resolver._format_value(Decimal(0)) == "0" + assert resolver._format_value(Decimal("123.45")) == "123.45" + + def test_format_value_with_none(self) -> None: + """Verify _format_value handles None as empty string.""" + assert self._make_resolver()._format_value(None) == "" + + def test_format_value_with_datetime(self) -> None: + """Verify _format_value handles datetime via str().""" + dt = datetime(2025, 12, 11, 15, 30, 45, tzinfo=UTC) + result = self._make_resolver()._format_value(dt) + assert "2025" in result + assert "12" in result + assert "11" in result + + @given( + value=st.one_of( + st.text(), + st.integers(), + st.decimals(allow_nan=False, allow_infinity=False), + st.booleans(), + st.none(), + ) + ) + def test_format_value_never_raises(self, value: str | int | Decimal | bool | None) -> None: + """Property: _format_value never raises exceptions.""" + event(f"value_type={type(value).__name__}") + result = self._make_resolver()._format_value(value) + assert isinstance(result, str) + +class TestResolverErrorPaths: + """Test error handling paths in resolver.""" + + def test_missing_variable_returns_error_message(self) -> None: + """Missing variable in select expression returns error with fallback.""" + ftl = """test = { $x -> + [a] Value A + *[b] Default +} +""" + bundle = FluentBundle("en", use_isolating=False, strict=False) + bundle.add_resource(ftl) + + result, errors = bundle.format_pattern("test", {}) + assert len(errors) > 0 + assert isinstance(errors[0], FrozenFluentError) + assert errors[0].category == ErrorCategory.REFERENCE + assert 
errors[0].diagnostic is not None + assert errors[0].diagnostic.code.name == "VARIABLE_NOT_PROVIDED" + assert result == "Default" + +class TestPlaceableWithFormattingError: + """Coverage for Placeable exception path with FrozenFluentError FORMATTING.""" + + def test_placeable_formatting_error_with_fallback(self) -> None: + """Placeable that raises FrozenFluentError (FORMATTING) uses fallback value.""" + from ftllexengine.diagnostics import ( + FrozenErrorContext, + ) + + def raise_formatting_error(_value: str) -> str: + context = FrozenErrorContext( + input_value="test", + locale_code="en", + parse_type="number", + fallback_value="FALLBACK", + ) + msg = "Custom formatting error" + raise FrozenFluentError( + msg, + ErrorCategory.FORMATTING, + context=context, + ) + + registry = FunctionRegistry() + registry.register(raise_formatting_error, ftl_name="ERROR_FUNC") + + func_call = FunctionReference( + id=Identifier("ERROR_FUNC"), + arguments=CallArguments( + positional=(StringLiteral(value="test"),), + named=(), + ), + ) + + pattern = Pattern( + elements=( + TextElement(value="Before "), + Placeable(expression=func_call), + TextElement(value=" After"), + ) + ) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=registry, + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, {}) + assert result == "Before FALLBACK After" + assert len(errors) == 1 + assert isinstance(errors[0], FrozenFluentError) + assert errors[0].category == ErrorCategory.FORMATTING diff --git a/tests/runtime_resolver_selection_cases/number_literal_edges.py b/tests/runtime_resolver_selection_cases/number_literal_edges.py new file mode 100644 index 00000000..40da635f --- /dev/null +++ b/tests/runtime_resolver_selection_cases/number_literal_edges.py @@ -0,0 +1,435 @@ +# mypy: ignore-errors +from __future__ import annotations + +from decimal import 
Decimal + +from hypothesis import event, given +from hypothesis import strategies as st + +from ftllexengine.runtime.bundle import FluentBundle +from ftllexengine.runtime.function_bridge import FunctionRegistry +from ftllexengine.runtime.resolver import FluentResolver +from ftllexengine.syntax.ast import ( + Identifier, + Message, + NumberLiteral, + Pattern, + Placeable, + SelectExpression, + TextElement, + VariableReference, + Variant, +) + +# ============================================================================ +# PATTERN LOOP CONTINUATION +# ============================================================================ + + + +class TestNumericVariantEdgeCases: + """Edge cases for numeric variant matching.""" + + def test_boolean_does_not_match_number_variant(self) -> None: + """Boolean values do not match numeric variants (isinstance guard).""" + selector = VariableReference(id=Identifier("flag")) + variants = ( + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="numeric one"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="default"),)), + default=True, + ), + ) + select_expr = SelectExpression(selector=selector, variants=variants) + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + ) + + result, errors = resolver.resolve_message(message, {"flag": True}) + assert not errors + assert "default" in result + + def test_none_selector_uses_default(self) -> None: + """None selector value falls through to default.""" + selector = VariableReference(id=Identifier("value")) + variants = ( + Variant( + key=Identifier("none"), + value=Pattern(elements=(TextElement(value="none variant"),)), + default=False, + ), + Variant( + key=Identifier("other"), + 
value=Pattern(elements=(TextElement(value="default variant"),)), + default=True, + ), + ) + select_expr = SelectExpression(selector=selector, variants=variants) + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + ) + + result, errors = resolver.resolve_message(message, {"value": None}) + assert not errors + assert "default variant" in result + + @given( + decimal_str=st.decimals( + min_value=Decimal("-100.00"), + max_value=Decimal("100.00"), + allow_nan=False, + allow_infinity=False, + places=2, + ) + ) + def test_decimal_variant_matching_property(self, decimal_str: Decimal) -> None: + """Property: Decimal values match exactly when variant key matches.""" + sign = "negative" if decimal_str.is_signed() else "positive" + event(f"decimal_sign={sign}") + selector = VariableReference(id=Identifier("amount")) + str_repr = str(decimal_str) + variants = ( + Variant( + key=NumberLiteral(value=decimal_str, raw=str_repr), + value=Pattern(elements=(TextElement(value="exact"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="default"),)), + default=True, + ), + ) + select_expr = SelectExpression(selector=selector, variants=variants) + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + ) + + result, errors = resolver.resolve_message(message, {"amount": decimal_str}) + assert not errors + assert "exact" in result + +class TestNumberLiteralNonMatchingValue: + """Coverage for NumberLiteral with non-matching value (line 616->611).""" + + def 
test_number_literal_variants_first_no_match_second_matches(self) -> None: + """Multiple NumberLiteral variants where first doesn't match, second does.""" + select_expr = SelectExpression( + selector=VariableReference(id=Identifier("count")), + variants=( + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="one"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=2, raw="2"), + value=Pattern(elements=(TextElement(value="two"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=3, raw="3"), + value=Pattern(elements=(TextElement(value="three"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="fallback"),)), + default=True, + ), + ), + ) + + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, {"count": 2}) + assert result == "two" + assert errors == () + + def test_number_literal_variants_all_no_match_uses_default(self) -> None: + """NumberLiteral variants all fail to match, use default.""" + select_expr = SelectExpression( + selector=VariableReference(id=Identifier("count")), + variants=( + Variant( + key=NumberLiteral(value=10, raw="10"), + value=Pattern(elements=(TextElement(value="ten"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=20, raw="20"), + value=Pattern(elements=(TextElement(value="twenty"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="default"),)), + default=True, + ), + ), + ) + + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + 
messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, {"count": 5}) + assert result == "default" + assert errors == () + + def test_number_literal_with_decimal_no_match(self) -> None: + """NumberLiteral variants with Decimal selector that doesn't match.""" + select_expr = SelectExpression( + selector=VariableReference(id=Identifier("amount")), + variants=( + Variant( + key=NumberLiteral(value=100, raw="100"), + value=Pattern(elements=(TextElement(value="hundred"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=200, raw="200"), + value=Pattern(elements=(TextElement(value="two_hundred"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="other_amount"),)), + default=True, + ), + ), + ) + + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, {"amount": Decimal("150.50")}) + assert result == "other_amount" + assert errors == () + + def test_number_literal_decimal_no_exact_match(self) -> None: + """NumberLiteral variants with Decimal that doesn't exactly match.""" + select_expr = SelectExpression( + selector=VariableReference(id=Identifier("val")), + variants=( + Variant( + key=NumberLiteral(value=Decimal("1.0"), raw="1.0"), + value=Pattern(elements=(TextElement(value="one_point_oh"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=Decimal("2.5"), raw="2.5"), + value=Pattern(elements=(TextElement(value="two_point_five"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="other_decimal"),)), + default=True, + ), + ), + ) + + pattern = 
Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, {"val": Decimal("3.7")}) + assert result == "other_decimal" + assert errors == () + +class TestNumberLiteralSelectorCoverage: + """Test NumberLiteral selector branch in _find_exact_variant (branch 400->395).""" + + def test_number_literal_selector_exact_match(self) -> None: + """Branch 400->395 - Number literal variant exact matching.""" + bundle = FluentBundle("en_US", use_isolating=False) + + bundle.add_resource( + """ +items = { $count -> + [0] No items + [1] One item + [42] The answer + *[other] { $count } items +} +""" + ) + + result, _ = bundle.format_pattern("items", {"count": 0}) + assert "No items" in result + + result, _ = bundle.format_pattern("items", {"count": 1}) + assert "One item" in result + + result, _ = bundle.format_pattern("items", {"count": 42}) + assert "The answer" in result + + def test_number_literal_selector_no_match(self) -> None: + """Branch 400->395 - Number literal no match falls through to default.""" + bundle = FluentBundle("en_US", use_isolating=False) + + bundle.add_resource( + """ +level = { $num -> + [1] Level 1 + [2] Level 2 + *[other] Level unknown +} +""" + ) + + result, _ = bundle.format_pattern("level", {"num": 99}) + assert "Level unknown" in result + + def test_number_literal_with_float_selector(self) -> None: + """Branch 400->395 - Float selector matching number literals.""" + bundle = FluentBundle("en_US", use_isolating=False) + + bundle.add_resource( + """ +rating = { $stars -> + [1] Poor + [2] Fair + [3] Good + [4] Great + [5] Excellent + *[other] Unrated +} +""" + ) + + result, _ = bundle.format_pattern("rating", {"stars": Decimal(5)}) + assert "Excellent" in result + + result, _ = 
bundle.format_pattern("rating", {"stars": Decimal("3.5")}) + assert "Unrated" in result + + def test_number_literal_match_second_key(self) -> None: + """Branch 400->395 - Number literal match on second+ key (loop continuation).""" + bundle = FluentBundle("en_US", use_isolating=False) + + bundle.add_resource( + """ +score = { $points -> + [10] Ten points + [20] Twenty points + [30] Thirty points + *[other] Unknown +} +""" + ) + + result, _ = bundle.format_pattern("score", {"points": 20}) + assert "Twenty points" in result + + result, _ = bundle.format_pattern("score", {"points": 30}) + assert "Thirty points" in result + +class TestNumberLiteralVariantMatching: + """Test exact number literal matching in select expressions.""" + + def test_exact_number_literal_match_with_integer(self) -> None: + """Exact match with integer NumberLiteral (line 479).""" + bundle = FluentBundle("en_US", use_isolating=False) + bundle.add_resource( + """ +msg = { $count -> + [0] zero items + [1] one item + [42] exactly forty-two + *[other] many items +} +""" + ) + + result, errors = bundle.format_pattern("msg", {"count": 42}) + assert result == "exactly forty-two" + assert errors == () + + def test_exact_number_literal_match_with_decimal_pi(self) -> None: + """Exact match with Decimal NumberLiteral value (pi example).""" + bundle = FluentBundle("en_US", use_isolating=False) + bundle.add_resource( + """ +msg = { $value -> + [3.14] pi + [2.71] euler + *[other] unknown +} +""" + ) + + result, errors = bundle.format_pattern("msg", {"value": Decimal("3.14")}) + assert result == "pi" + assert errors == () + + def test_exact_number_literal_match_with_decimal(self) -> None: + """Exact match with Decimal NumberLiteral (financial value precision).""" + bundle = FluentBundle("en_US", use_isolating=False) + bundle.add_resource( + """ +msg = { $amount -> + [99.99] special price + *[other] regular price +} +""" + ) + + result, errors = bundle.format_pattern("msg", {"amount": Decimal("99.99")}) + assert 
result == "special price" + assert errors == () diff --git a/tests/runtime_resolver_selection_cases/numeric_matching.py b/tests/runtime_resolver_selection_cases/numeric_matching.py new file mode 100644 index 00000000..2d937908 --- /dev/null +++ b/tests/runtime_resolver_selection_cases/numeric_matching.py @@ -0,0 +1,452 @@ +# mypy: ignore-errors +# mypy: ignore-errors +from __future__ import annotations + +from decimal import Decimal + +from hypothesis import event, given +from hypothesis import strategies as st + +from ftllexengine.runtime.function_bridge import FunctionRegistry +from ftllexengine.runtime.resolver import FluentResolver +from ftllexengine.syntax.ast import ( + Identifier, + Message, + NumberLiteral, + Pattern, + Placeable, + SelectExpression, + TextElement, + VariableReference, + Variant, +) + +# ============================================================================ +# PATTERN LOOP CONTINUATION +# ============================================================================ + + + +class TestNumberLiteralVariantWithNonNumericSelector: + """Coverage for NumberLiteral variant key with non-numeric selector (line 616->611).""" + + def test_number_literal_variant_with_string_selector(self) -> None: + """SelectExpression with NumberLiteral variants but string selector falls to default.""" + select_expr = SelectExpression( + selector=VariableReference(id=Identifier("val")), + variants=( + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="one"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=2, raw="2"), + value=Pattern(elements=(TextElement(value="two"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="fallback"),)), + default=True, + ), + ), + ) + + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + 
messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, {"val": "not_a_number"}) + assert result == "fallback" + assert errors == () + + def test_number_literal_variant_with_none_selector(self) -> None: + """SelectExpression with NumberLiteral variant but None selector falls to default.""" + select_expr = SelectExpression( + selector=VariableReference(id=Identifier("val")), + variants=( + Variant( + key=NumberLiteral(value=42, raw="42"), + value=Pattern(elements=(TextElement(value="forty-two"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="default"),)), + default=True, + ), + ), + ) + + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, {"val": None}) + assert result == "default" + assert errors == () + + def test_number_literal_variant_with_bool_selector(self) -> None: + """Bool selector matches identifier variant, not NumberLiteral. + + Booleans are excluded from numeric matching (even though isinstance(True, int)) + because they should match [true]/[false] identifier variants, not number literals. 
+ """ + select_expr = SelectExpression( + selector=VariableReference(id=Identifier("val")), + variants=( + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="number_one"),)), + default=False, + ), + Variant( + key=Identifier("true"), + value=Pattern(elements=(TextElement(value="bool_true"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="fallback"),)), + default=True, + ), + ), + ) + + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, {"val": True}) + assert result == "bool_true" + assert errors == () + + def test_number_literal_variants_with_date_selector(self) -> None: + """SelectExpression with NumberLiteral variants but date selector falls to default.""" + from datetime import date + + select_expr = SelectExpression( + selector=VariableReference(id=Identifier("val")), + variants=( + Variant( + key=NumberLiteral(value=3, raw="3"), + value=Pattern(elements=(TextElement(value="three"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="not_numeric"),)), + default=True, + ), + ), + ) + + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, {"val": date(2024, 1, 1)}) + assert result == "not_numeric" + assert errors == () + +class TestVariantMatchingBranches: + """Test variant matching loop continuation branches.""" + + def 
test_select_with_non_matching_number_literals_covers_loop_continuation( + self, + ) -> None: + """SelectExpression with non-matching NumberLiterals covers 634->629.""" + select_expr = SelectExpression( + selector=VariableReference(id=Identifier("num")), + variants=( + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="one"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=2, raw="2"), + value=Pattern(elements=(TextElement(value="two"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=3, raw="3"), + value=Pattern(elements=(TextElement(value="three"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="default"),)), + default=True, + ), + ), + ) + + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, {"num": 99}) + assert result == "default" + assert errors == () + + def test_select_with_string_matching_identifier_after_number_literals(self) -> None: + """String selector skips NumberLiteral variants to match Identifier (634->629).""" + select_expr = SelectExpression( + selector=VariableReference(id=Identifier("status")), + variants=( + Variant( + key=NumberLiteral(value=100, raw="100"), + value=Pattern(elements=(TextElement(value="hundred"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=200, raw="200"), + value=Pattern(elements=(TextElement(value="two_hundred"),)), + default=False, + ), + Variant( + key=Identifier("active"), + value=Pattern(elements=(TextElement(value="Active"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="Other"),)), + default=True, + ), + ), + ) + + pattern = 
Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, {"status": "active"}) + assert result == "Active" + assert errors == () + + def test_select_with_bool_selector_skips_number_literals(self) -> None: + """Bool selector skips NumberLiterals, matches Identifier (634->629).""" + select_expr = SelectExpression( + selector=VariableReference(id=Identifier("flag")), + variants=( + Variant( + key=NumberLiteral(value=0, raw="0"), + value=Pattern(elements=(TextElement(value="zero"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="one"),)), + default=False, + ), + Variant( + key=Identifier("true"), + value=Pattern(elements=(TextElement(value="yes"),)), + default=False, + ), + Variant( + key=Identifier("false"), + value=Pattern(elements=(TextElement(value="no"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="unknown"),)), + default=True, + ), + ), + ) + + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, {"flag": True}) + assert result == "yes" + assert errors == () + +class TestVariantNumericMatching: + """Numeric variant matching (line 479->474 coverage).""" + + def test_exact_number_literal_match(self) -> None: + """Exact number match with NumberLiteral variant key.""" + selector = VariableReference(id=Identifier("count")) + variants = ( + Variant( + 
key=NumberLiteral(value=0, raw="0"), + value=Pattern(elements=(TextElement(value="zero items"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="one item"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="many items"),)), + default=True, + ), + ) + select_expr = SelectExpression(selector=selector, variants=variants) + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + ) + + result, errors = resolver.resolve_message(message, {"count": 0}) + assert not errors + assert "zero items" in result + + result, errors = resolver.resolve_message(message, {"count": 1}) + assert not errors + assert "one item" in result + + def test_decimal_exact_match_in_variant(self) -> None: + """Decimal value matches NumberLiteral variant key.""" + selector = VariableReference(id=Identifier("amount")) + variants = ( + Variant( + key=NumberLiteral(value=Decimal("1.5"), raw="1.5"), + value=Pattern(elements=(TextElement(value="exact match"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="default"),)), + default=True, + ), + ) + select_expr = SelectExpression(selector=selector, variants=variants) + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + ) + + result, errors = resolver.resolve_message(message, {"amount": Decimal("1.5")}) + assert not errors + assert "exact match" in result + + def test_float_exact_match_in_variant(self) -> None: + """Float value matches NumberLiteral variant 
key.""" + selector = VariableReference(id=Identifier("price")) + variants = ( + Variant( + key=NumberLiteral(value=Decimal("9.99"), raw="9.99"), + value=Pattern(elements=(TextElement(value="special price"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="regular price"),)), + default=True, + ), + ) + select_expr = SelectExpression(selector=selector, variants=variants) + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + ) + + result, errors = resolver.resolve_message(message, {"price": Decimal("9.99")}) + assert not errors + assert "special price" in result + + @given(number=st.integers(min_value=-100, max_value=100)) + def test_integer_exact_matching_property(self, number: int) -> None: + """Property: Integer selectors match NumberLiteral variants exactly.""" + event(f"number={number}") + selector = VariableReference(id=Identifier("n")) + variants = ( + Variant( + key=NumberLiteral(value=number, raw=str(number)), + value=Pattern(elements=(TextElement(value="matched"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement(value="not matched"),)), + default=True, + ), + ) + select_expr = SelectExpression(selector=selector, variants=variants) + pattern = Pattern(elements=(Placeable(expression=select_expr),)) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + ) + + result, errors = resolver.resolve_message(message, {"n": number}) + assert not errors + assert "matched" in result diff --git a/tests/runtime_resolver_selection_cases/pattern_resolution.py 
b/tests/runtime_resolver_selection_cases/pattern_resolution.py new file mode 100644 index 00000000..b7332740 --- /dev/null +++ b/tests/runtime_resolver_selection_cases/pattern_resolution.py @@ -0,0 +1,392 @@ +# mypy: ignore-errors +# mypy: ignore-errors +from __future__ import annotations + +from decimal import Decimal + +import pytest + +from ftllexengine.runtime.bundle import FluentBundle +from ftllexengine.runtime.function_bridge import FunctionRegistry +from ftllexengine.runtime.resolution_context import ResolutionContext +from ftllexengine.runtime.resolver import FluentResolver +from ftllexengine.syntax.ast import ( + Identifier, + Message, + NumberLiteral, + Pattern, + Placeable, + SelectExpression, + TextElement, + VariableReference, + Variant, +) + +# ============================================================================ +# PATTERN LOOP CONTINUATION +# ============================================================================ + + + +class TestPatternLoopContinuation: + """Coverage for pattern loop continuation (line 390->386).""" + + def test_empty_pattern_no_elements(self) -> None: + """Pattern with no elements exits loop immediately.""" + pattern = Pattern(elements=()) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + ) + + result, errors = resolver.resolve_message(message, {}) + assert result == "" + assert errors == () + + def test_pattern_text_then_placeable_then_text(self) -> None: + """Pattern with alternating Text/Placeable/Text elements.""" + pattern = Pattern( + elements=( + TextElement(value="Start "), + Placeable(expression=VariableReference(id=Identifier("var"))), + TextElement(value=" End"), + ) + ) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + 
function_registry=FunctionRegistry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, {"var": "X"}) + assert result == "Start X End" + assert errors == () + + def test_pattern_only_text_elements(self) -> None: + """Pattern with only TextElements (no Placeables).""" + pattern = Pattern( + elements=( + TextElement(value="First "), + TextElement(value="Second "), + TextElement(value="Third"), + ) + ) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + ) + + result, errors = resolver.resolve_message(message, {}) + assert result == "First Second Third" + assert errors == () + +class TestPatternResolutionBranches: + """Test pattern resolution loop continuation branches.""" + + def test_pattern_with_multiple_text_elements_covers_loop_continuation(self) -> None: + """Pattern with TextElement followed by another TextElement covers 404->400.""" + pattern = Pattern( + elements=( + TextElement(value="Hello "), + TextElement(value="World"), + ) + ) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + result, errors = resolver.resolve_message(message, {}) + assert result == "Hello World" + assert errors == () + + def test_pattern_text_then_placeable_covers_loop_continuation(self) -> None: + """Pattern with TextElement followed by Placeable covers 404->400.""" + pattern = Pattern( + elements=( + TextElement(value="Value: "), + Placeable(expression=VariableReference(id=Identifier("x"))), + ) + ) + message = Message(id=Identifier("msg"), value=pattern, attributes=()) + + resolver = FluentResolver( + locale="en", + messages={"msg": message}, + terms={}, + function_registry=FunctionRegistry(), + use_isolating=False, + ) + + 
result, errors = resolver.resolve_message(message, {"x": "42"}) + assert "Value: " in result + assert "42" in result + assert errors == () + + def test_pattern_three_elements_ensures_multiple_loop_iterations(self) -> None: + """Pattern with three elements ensures loop continuation branch is hit.""" + ftl = """msg = Start { $var } End""" + bundle = FluentBundle("en", use_isolating=False) + bundle.add_resource(ftl) + + result, _ = bundle.format_pattern("msg", {"var": "middle"}) + assert result == "Start middle End" + +class TestMatchCaseBranchCoverage: + """Test match/case control flow branches in resolver.""" + + def test_placeable_followed_by_text_in_pattern(self) -> None: + """Pattern with Placeable followed by TextElement tests 404->400 branch.""" + ftl = """msg = { $x } text""" + bundle = FluentBundle("en", use_isolating=False) + bundle.add_resource(ftl) + + result, _ = bundle.format_pattern("msg", {"x": "value"}) + assert result == "value text" + + def test_multiple_placeables_in_pattern(self) -> None: + """Pattern with multiple Placeables ensures loop continuation.""" + ftl = """msg = { $a }{ $b }""" + bundle = FluentBundle("en", use_isolating=False) + bundle.add_resource(ftl) + + result, _ = bundle.format_pattern("msg", {"a": "A", "b": "B"}) + assert result == "AB" + + def test_select_with_number_literal_then_identifier_variant(self) -> None: + """SelectExpression with NumberLiteral followed by Identifier variant covers 634->629.""" + ftl = """ +msg = { $val -> + [1] one + [2] two + *[other] default +} +""" + bundle = FluentBundle("en", use_isolating=False) + bundle.add_resource(ftl) + + result, _ = bundle.format_pattern("msg", {"val": "other"}) + assert result == "default" + + def test_select_number_literal_no_match_continues_to_next(self) -> None: + """SelectExpression where first NumberLiteral doesn't match, second does.""" + ftl = """ +msg = { $count -> + [10] ten + [20] twenty + [30] thirty + *[other] default +} +""" + bundle = FluentBundle("en", 
use_isolating=False) + bundle.add_resource(ftl) + + result, _ = bundle.format_pattern("msg", {"count": 20}) + assert result == "twenty" + + def test_select_with_isolating_enabled_exercises_placeable_branch(self) -> None: + """Pattern with use_isolating=True covers Placeable branch with isolation.""" + ftl = """msg = Prefix { $val } Suffix""" + bundle = FluentBundle("en", use_isolating=True) + bundle.add_resource(ftl) + + result, _ = bundle.format_pattern("msg", {"val": "middle"}) + assert "Prefix" in result + assert "middle" in result + assert "Suffix" in result + +class TestTextElementBranch: + """Test TextElement branch in pattern resolution.""" + + def test_pattern_with_only_text_no_placeables(self) -> None: + """Pattern with only TextElement, no Placeable (line 286->282).""" + bundle = FluentBundle("en_US") + bundle.add_resource("simple = This is plain text with no variables") + + result, errors = bundle.format_pattern("simple") + assert result == "This is plain text with no variables" + assert errors == () + +class TestSelectExpressionEdgeCases: + """Test edge cases in select expression resolution.""" + + def test_select_with_no_matching_variant_uses_default(self) -> None: + """Select with no match uses default variant.""" + ftl = """ +test = { $value -> + [one] One + *[other] Other +} +""" + bundle = FluentBundle("en", use_isolating=False) + bundle.add_resource(ftl) + + result, _ = bundle.format_pattern("test", {"value": "unknown"}) + assert "Other" in result + + def test_select_with_number_tries_plural_category(self) -> None: + """Select with number value tries plural category matching.""" + ftl = """ +test = { $count -> + [one] One item + *[other] Many items +} +""" + bundle = FluentBundle("en", use_isolating=False) + bundle.add_resource(ftl) + + result, _ = bundle.format_pattern("test", {"count": 1}) + assert "One item" in result + + result, _ = bundle.format_pattern("test", {"count": 5}) + assert "Many items" in result + + def 
test_select_with_no_default_raises_at_construction(self) -> None: + """SelectExpression with no default variant raises ValueError at construction.""" + with pytest.raises(ValueError, match="exactly one default variant"): + SelectExpression( + selector=VariableReference(id=Identifier(name="x")), + variants=( + Variant( + key=Identifier(name="a"), + value=Pattern(elements=(TextElement(value="A"),)), + default=False, + ), + Variant( + key=Identifier(name="b"), + value=Pattern(elements=(TextElement(value="B"),)), + default=False, + ), + ), + ) + + def test_select_with_empty_variants_raises_at_construction(self) -> None: + """SelectExpression with no variants raises ValueError at construction.""" + with pytest.raises(ValueError, match="at least one variant"): + SelectExpression( + selector=VariableReference(id=Identifier(name="x")), + variants=(), + ) + + def test_number_literal_rejects_invalid_raw(self) -> None: + """NumberLiteral.__post_init__ prevents construction with invalid raw strings. + + Previously, the resolver handled programmatically constructed ASTs where + NumberLiteral.raw was unparseable as Decimal. NumberLiteral now enforces + the invariant at construction time, making such ASTs impossible via the + normal API. 
+ """ + with pytest.raises(ValueError, match="not a valid number literal"): + NumberLiteral(value=Decimal("0.0"), raw="invalid") + + def test_deeply_nested_select_expression_fallback(self) -> None: + """Deeply nested SelectExpression in fallback generation doesn't overflow.""" + from ftllexengine.runtime.functions import ( + create_default_registry, + ) + from ftllexengine.syntax.ast import Expression + + nested_select: Expression = VariableReference(id=Identifier(name="missing")) + for _ in range(100): + nested_select = SelectExpression( + selector=nested_select, # type: ignore[arg-type] + variants=( + Variant( + key=Identifier(name="key"), + value=Pattern(elements=(TextElement(value="Value"),)), + default=True, + ), + ), + ) + + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=nested_select),)), + attributes=(), + ) + + resolver = FluentResolver( + locale="en", + messages={"test": msg}, + terms={}, + function_registry=create_default_registry(), + use_isolating=False, + ) + + result, _ = resolver.resolve_message(msg, {}) + assert isinstance(result, str) + assert len(result) > 0 + +class TestSelectVariantBranchCoverage: + """Direct resolver internal calls for select expression branch coverage.""" + + def test_select_variant_loop_with_no_match_on_number_literal(self) -> None: + """Select expression where no NumberLiteral matches continues loop to default.""" + resolver = FluentResolver( + locale="en", + messages={}, + terms={}, + function_registry=FunctionRegistry(), + ) + + selector = NumberLiteral(value=5, raw="5") + variants = ( + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=()), + default=False, + ), + Variant( + key=NumberLiteral(value=2, raw="2"), + value=Pattern(elements=()), + default=False, + ), + Variant( + key=NumberLiteral(value=3, raw="3"), + value=Pattern(elements=()), + default=True, + ), + ) + + select_expr = SelectExpression(selector=selector, variants=variants) + context = 
ResolutionContext() + result = resolver._resolve_select_expression(select_expr, {}, [], context) + assert result == "" + + def test_pattern_elements_loop_with_text_only(self) -> None: + """Pattern resolution with only TextElement tests loop continuation.""" + resolver = FluentResolver( + locale="en", + messages={}, + terms={}, + function_registry=FunctionRegistry(), + ) + + pattern = Pattern( + elements=( + TextElement(value="Hello "), + TextElement(value="World"), + TextElement(value="!"), + ) + ) + + context = ResolutionContext() + result = resolver._resolve_pattern(pattern, {}, [], context) + assert result == "Hello World!" diff --git a/tests/strategies/currency.py b/tests/strategies/currency.py index 4d03fdec..7df9239d 100644 --- a/tests/strategies/currency.py +++ b/tests/strategies/currency.py @@ -19,6 +19,7 @@ from decimal import Decimal +from babel.numbers import format_decimal from hypothesis import event from hypothesis import strategies as st from hypothesis.strategies import composite @@ -85,6 +86,11 @@ ] +def _format_amount_for_locale(amount: Decimal, locale: str) -> str: + """Format one Decimal using the locale's decimal separators.""" + return str(format_decimal(amount, locale=locale, decimal_quantization=False)) + + @composite def currency_amounts(draw: st.DrawFn) -> Decimal: """Generate realistic currency amounts. 
@@ -148,16 +154,17 @@ def unambiguous_currency_inputs( fmt = draw(st.sampled_from([ "prefix", "suffix", "iso_prefix", "iso_suffix", ])) + formatted_amount = _format_amount_for_locale(amount, locale) match fmt: case "prefix": - value = f"{symbol}{amount}" + value = f"{symbol}{formatted_amount}" case "suffix": - value = f"{amount} {symbol}" + value = f"{formatted_amount} {symbol}" case "iso_prefix": - value = f"{code} {amount}" + value = f"{code} {formatted_amount}" case _: # iso_suffix - value = f"{amount} {code}" + value = f"{formatted_amount} {code}" event("currency_input_type=unambiguous_symbol") event(f"currency_input_format={fmt}") @@ -217,12 +224,13 @@ def iso_code_currency_inputs( locale = draw(st.sampled_from(_FORMATTING_LOCALES)) amount = draw(currency_amounts()) position = draw(st.sampled_from(["prefix", "suffix"])) + formatted_amount = _format_amount_for_locale(amount, locale) match position: case "prefix": - value = f"{code} {amount}" + value = f"{code} {formatted_amount}" case _: - value = f"{amount} {code}" + value = f"{formatted_amount} {code}" event("currency_input_type=iso_code") event(f"currency_iso_position={position}") diff --git a/tests/strategies/ftl.py b/tests/strategies/ftl.py index 79c1c5e5..bc2ae03a 100644 --- a/tests/strategies/ftl.py +++ b/tests/strategies/ftl.py @@ -1,2657 +1,8 @@ -"""Hypothesis strategies for generating valid FTL syntax. - -Provides custom strategies for property-based testing of the Fluent parser, -serializer, and resolver. 
- -Strategy Categories: -- String strategies: Generate FTL source text (for parsing) -- AST strategies: Generate AST nodes directly (for serialization) -- Edge case strategies: Generate boundary conditions -""" - -from __future__ import annotations - -import string -from decimal import Decimal - -from hypothesis import event -from hypothesis import strategies as st -from hypothesis.strategies import composite - -from ftllexengine.enums import CommentType -from ftllexengine.runtime.function_bridge import FluentNumber -from ftllexengine.syntax.ast import ( - Attribute, - CallArguments, - Comment, - Expression, - FunctionReference, - Identifier, - InlineExpression, - Junk, - Message, - MessageReference, - NamedArgument, - NumberLiteral, - Pattern, - Placeable, - Resource, - SelectExpression, - StringLiteral, - Term, - TermReference, - TextElement, - VariableReference, - Variant, -) - -# ============================================================================= -# Constants -# ============================================================================= - -# FTL identifier character sets per spec: [a-zA-Z][a-zA-Z0-9_-]* -# CRITICAL: Both uppercase AND lowercase letters are valid per FTL specification. 
-FTL_IDENTIFIER_FIRST_CHARS: str = string.ascii_letters # [a-zA-Z] -FTL_IDENTIFIER_REST_CHARS: str = string.ascii_letters + string.digits + "-_" - -# Common identifier parts for testing -IDENTIFIER_PARTS = ("foo", "bar", "baz", "value", "count", "name", "id", "key") - -# FTL-safe alphabet (no special FTL characters) -FTL_SAFE_CHARS = string.ascii_letters + string.digits + " .,!?'-" - -# Unicode test characters (various scripts and special chars) -UNICODE_CHARS = ( - "\u4e16\u754c" # Chinese: world - "\u0414\u043e\u0431\u0440\u043e" # Russian: Dobro - "\u3053\u3093\u306b\u3061\u306f" # Japanese: konnichiwa - "\u00e9\u00e0\u00fc\u00f1" # Latin extended: accents - "\u2019\u2018\u201c\u201d" # Smart quotes -) - - -# ============================================================================= -# String Strategies (for parsing) -# ============================================================================= - - -@composite -def ftl_identifiers(draw: st.DrawFn) -> str: - """Generate valid FTL identifiers. - - FTL spec: [a-zA-Z][a-zA-Z0-9_-]* - Uses both uppercase AND lowercase per specification. - """ - first = draw(st.sampled_from(FTL_IDENTIFIER_FIRST_CHARS)) - rest = draw( - st.text( - alphabet=FTL_IDENTIFIER_REST_CHARS, - max_size=20, - ) - ) - return first + rest - - -# Reserved keywords in FTL (for intensive fuzzing of keyword handling) -FTL_RESERVED_KEYWORDS = ( - "NUMBER", - "DATETIME", - "one", - "other", - "zero", - "two", - "few", - "many", -) - - -@composite -def ftl_identifiers_with_keywords(draw: st.DrawFn) -> str: - """Generate FTL identifiers, sometimes using reserved keywords. - - Used for intensive fuzzing to test keyword handling paths. - 50% chance of returning a reserved keyword, otherwise a random identifier. 
- """ - if draw(st.booleans()): - return draw(st.sampled_from(FTL_RESERVED_KEYWORDS)) - - first = draw(st.sampled_from(FTL_IDENTIFIER_FIRST_CHARS)) - rest = draw( - st.text( - alphabet=FTL_IDENTIFIER_REST_CHARS, - max_size=64, - ) - ) - return first + rest - - -@composite -def ftl_identifier_boundary(draw: st.DrawFn) -> str: - """Generate boundary-case identifiers for edge testing. - - Tests single-char, long identifiers, and repeated separators. - """ - choice = draw(st.sampled_from(["single", "long", "hyphen", "underscore"])) - if choice == "single": - return draw(st.sampled_from("abcdefghijklmnopqrstuvwxyz")) - if choice == "long": - # Maximum practical length - return "a" + draw( - st.text( - alphabet="abcdefghijklmnopqrstuvwxyz0123456789", - min_size=200, - max_size=200, - ) - ) - if choice == "hyphen": - return "a" + "-" * draw(st.integers(1, 10)) + "b" - # underscore - return "a" + "_" * draw(st.integers(1, 10)) + "b" - - -@composite -def ftl_simple_text(draw: st.DrawFn) -> str: - """Generate simple text without special FTL characters. - - Ensures text is not whitespace-only (blank lines are message separators). - """ - text = draw(st.text(alphabet=FTL_SAFE_CHARS, min_size=1, max_size=50)) - # Ensure not whitespace-only - if text.strip() == "": - text = draw(st.sampled_from(string.ascii_letters)) - return text - - -@composite -def ftl_unicode_text(draw: st.DrawFn) -> str: - """Generate text with comprehensive Unicode coverage. - - Uses Hypothesis's full Unicode text strategy, filtering only: - - FTL structural characters: { } [ ] * $ - . # - - Control characters (Cc category) - - Newlines (message separators) - - Surrogates (Cs category) - - This provides much broader Unicode coverage than the limited UNICODE_CHARS - constant, including non-BMP characters, ZWJ sequences, RTL text, etc. 
- (MAINT-FUZZ-UNICODE-UNDEREXPOSURE-001) - """ - # Full Unicode text with FTL structural chars filtered - text = draw( - st.text( - alphabet=st.characters( - blacklist_categories=("Cc", "Cs"), # No control chars or surrogates - blacklist_characters="{}[]*$-.#\n\r", # No FTL structural chars - ), - min_size=1, - max_size=30, - ) - ) - # Ensure non-whitespace content - if text.strip() == "": - text = draw(st.sampled_from(list(UNICODE_CHARS))) - return text - - -@composite -def ftl_unicode_stress_text(draw: st.DrawFn) -> str: - """Generate Unicode stress test cases. - - Events emitted: - - unicode={category}: Unicode stress category (emoji, rtl, combining, etc.) - - Specifically targets edge cases that may cause encoding or display issues: - - Non-BMP characters (emoji, math symbols) - - ZWJ sequences - - RTL markers and bidirectional text - - Combining characters - - Rare scripts - """ - # Stress cases with categories for event emission - stress_cases = [ - ("\U0001F600", "emoji"), # Emoji (non-BMP) - ("\U0001F469\u200D\U0001F4BB", "zwj"), # ZWJ sequence (woman technologist) - ("\u202Eevil\u202C", "rtl"), # RTL override - ("cafe\u0301", "combining"), # Combining accent (e as e + combining acute) - ("\u0627\u0644\u0639\u0631\u0628\u064A\u0629", "arabic"), # Arabic - ("\u4E2D\u6587", "cjk"), # Chinese - ("\u0928\u092E\u0938\u094D\u0924\u0947", "devanagari"), # Hindi (Devanagari) - ("\uFEFF", "bom"), # BOM - ("\u200B", "zero_width"), # Zero-width space - ("\u00A0", "nbsp"), # Non-breaking space - ("\U0001F1FA\U0001F1F8", "flag"), # Flag emoji (regional indicators) - ] - text, category = draw(st.sampled_from(stress_cases)) - - # Emit event for HypoFuzz coverage guidance - event(f"unicode={category}") - - return text - - -# ============================================================================= -# Chaos Mode Strategies (parser stress testing) -# ============================================================================= - - -@composite -def ftl_chaos_text(draw: 
st.DrawFn) -> str: - """Generate text WITH FTL structural characters for parser stress testing. - - Unlike ftl_unicode_text() which filters out {}[]*$-.#, this strategy - INCLUDES these characters to test parser error recovery, escape handling, - and edge cases where FTL syntax appears in unexpected places. - - WARNING: This generates potentially invalid FTL. Use for: - - Parser error recovery testing - - Junk node generation testing - - Fuzzing edge cases - - Do NOT use for roundtrip testing where valid FTL is required. - """ - # Include FTL structural characters - text = draw( - st.text( - alphabet=st.characters( - blacklist_categories=("Cc", "Cs"), # No control chars or surrogates - blacklist_characters="\n\r", # Only filter newlines (entry separators) - ), - min_size=1, - max_size=50, - ) - ) - # Ensure non-whitespace content - if text.strip() == "": - text = draw(st.sampled_from(["text", "value", "test"])) - return text - - -@composite -def ftl_chaos_source(draw: st.DrawFn) -> str: - """Generate raw FTL source with chaos text for intensive parser fuzzing. - - Creates FTL-like structures with potentially invalid content to stress - test parser error handling and recovery mechanisms. 
- - Events emitted: - - strategy=chaos_{pattern}: Chaos injection pattern used (for HypoFuzz guidance) - - Generates variations like: - - msg = { unterminated - - msg = value { $var } more { unclosed - - msg = [ bracket ] confusion - """ - msg_id = draw(ftl_identifiers()) - chaos = draw(ftl_chaos_text()) - - # Choose chaos injection pattern - pattern = draw( - st.sampled_from([ - "plain", # msg = - "prefix_brace", # msg = { - "suffix_brace", # msg = } - "embedded_dollar", # msg = text $ more - "bracket_noise", # msg = [ ] - "mixed", # msg = { $x } { more - ]) - ) - - # Emit event for HypoFuzz coverage guidance - event(f"strategy=chaos_{pattern}") - - match pattern: - case "plain": - return f"{msg_id} = {chaos}" - case "prefix_brace": - return f"{msg_id} = {{ {chaos}" - case "suffix_brace": - return f"{msg_id} = {chaos} }}" - case "embedded_dollar": - prefix = draw(ftl_simple_text()) - return f"{msg_id} = {prefix} ${chaos}" - case "bracket_noise": - return f"{msg_id} = [ {chaos} ]" - case _: # mixed - var = draw(ftl_identifiers()) - return f"{msg_id} = {{ ${var} }} {chaos} {{ more" - - -@composite -def ftl_pathological_nesting(draw: st.DrawFn) -> str: - """Generate pathologically nested FTL for parser depth limit testing. 
- - Creates deeply nested structures that approach or exceed MAX_DEPTH: - - Nested placeables: { { { { $x } } } } - - Nested selects: { $a -> [x] { $b -> [y] value } } - - Events emitted: - - boundary={under|at|over}_max_depth: Depth boundary condition (for HypoFuzz) - - Used for testing: - - Parser depth guards - - Stack overflow prevention - - Error recovery at depth limits - """ - from ftllexengine.constants import MAX_DEPTH # noqa: PLC0415 - import inside function - - msg_id = draw(ftl_identifiers()) - - # Choose between boundary, at-limit, and over-limit with labels - depth_choice = draw( - st.sampled_from([ - (MAX_DEPTH - 5, "under"), # Safely within limits - (MAX_DEPTH - 1, "under"), # Just under limit - (MAX_DEPTH, "at"), # At limit - (MAX_DEPTH + 1, "over"), # Just over limit - (MAX_DEPTH + 10, "over"), # Well over limit - ]) - ) - depth, boundary_label = depth_choice - - # Emit boundary event for HypoFuzz coverage guidance - event(f"boundary={boundary_label}_max_depth") - event(f"depth={depth}") - - # Generate nested braces - open_braces = "{ " * depth - close_braces = " }" * depth - inner_var = draw(ftl_identifiers()) - - return f"{msg_id} = {open_braces}${inner_var}{close_braces}" - - -@composite -def ftl_multiline_chaos_source(draw: st.DrawFn) -> str: - """Generate multi-entry chaos FTL with line breaks at invalid positions. - - Events emitted: - - strategy=multiline_chaos_{pattern}: Chaos injection pattern (for HypoFuzz) - - D7 fix: Tests parser error recovery for multiline malformed input. 
- Real-world malformed FTL often involves: - - Continuation lines without proper indentation - - Entries split across unexpected boundaries - - CRLF mid-token - - Unclosed structures spanning multiple lines - """ - num_entries = draw(st.integers(min_value=2, max_value=4)) - entries: list[str] = [] - - pattern = draw( - st.sampled_from([ - "mid_identifier", # Line break inside identifier - "mid_placeable", # Line break inside placeable - "between_eq_value", # Line break between = and value - "unclosed_multiline", # Unclosed brace spanning lines - "bad_continuation", # Bad indentation on continuation - ]) - ) - event(f"strategy=multiline_chaos_{pattern}") - - for i in range(num_entries): - msg_id = f"msg{i}" - match pattern: - case "mid_identifier": - # Break identifier across lines (invalid) - entries.append(f"ms\ng{i} = value{i}") - case "mid_placeable": - # Break placeable across lines - entries.append(f"{msg_id} = text {{ $va\nr{i} }} more") - case "between_eq_value": - # Line break between = and value - entries.append(f"{msg_id} =\nvalue{i}") - case "unclosed_multiline": - # Unclosed brace spanning to next entry - if i < num_entries - 1: - entries.append(f"{msg_id} = {{ $var{i}") - else: - entries.append(f"{msg_id} = closed }}") - case _: # bad_continuation - # Tab indentation (invalid per FTL spec) - entries.append(f"{msg_id} = first line\n\tcontinuation") - - return "\n".join(entries) - - -@composite -def ftl_simple_messages(draw: st.DrawFn) -> str: - """Generate simple FTL messages (ID = value). - - Example: hello = Hello, world! - """ - msg_id = draw(ftl_identifiers()) - value = draw(ftl_simple_text()) - return f"{msg_id} = {value}" - - -@composite -def ftl_messages_with_placeables(draw: st.DrawFn) -> str: - """Generate FTL messages containing placeables. - - Example: greeting = Hello { $name }! 
- """ - msg_id = draw(ftl_identifiers()) - var_name = draw(ftl_identifiers()) - prefix = draw(ftl_simple_text()) - suffix = draw(st.text(alphabet=FTL_SAFE_CHARS, max_size=20)) - - return f"{msg_id} = {prefix} {{ ${var_name} }}{suffix}" - - -@composite -def ftl_terms(draw: st.DrawFn) -> str: - """Generate FTL term definitions. - - Example: -brand = Firefox - """ - term_id = draw(ftl_identifiers()) - value = draw(ftl_simple_text()) - return f"-{term_id} = {value}" - - -@composite -def ftl_comments(draw: st.DrawFn) -> str: - """Generate FTL comments (all types). - - Returns one of: # comment, ## group comment, ### resource comment - """ - level = draw(st.sampled_from(["#", "##", "###"])) - content = draw(ftl_simple_text()) - return f"{level} {content}" - - -@composite -def ftl_numbers(draw: st.DrawFn) -> int | Decimal: - """Generate valid FTL numbers. - - FTL number literals support format: -?[0-9]+(.[0-9]+)? - No scientific notation. Subnormal values are excluded because - their string representation uses scientific notation (e.g., 1e-308). - """ - return draw( - st.one_of( - st.integers(min_value=-1000000, max_value=1000000), - st.decimals( - min_value=Decimal(-1000000), - max_value=Decimal(1000000), - allow_nan=False, - allow_infinity=False, - ), - ) - ) - - -@composite -def ftl_financial_numbers(draw: st.DrawFn) -> Decimal: - """Generate financial-scale numbers for financial application testing. - - Events emitted: - - strategy=financial_{magnitude}: Number magnitude category (for HypoFuzz) - - strategy=financial_decimals_{n}: Decimal places (for ISO 4217 coverage) - - Financial applications handle amounts in billions (GDP, fund values, - transaction volumes). This strategy generates numbers across the full range - needed for financial formatting tests. 
- - Magnitude ranges: - - small: < 1,000 (retail transactions) - - medium: 1,000 - 1,000,000 (business transactions) - - large: 1M - 1B (enterprise, fund values) - - huge: > 1B (national accounts, GDP) - - Decimal places aligned with ISO 4217: - - 0 decimals: JPY, KRW, VND - - 2 decimals: USD, EUR, GBP (standard) - - 3 decimals: KWD, BHD, OMR - - 4 decimals: CLF, UYW (accounting units) - """ - magnitude = draw(st.sampled_from(["small", "medium", "large", "huge"])) - decimals = draw(st.sampled_from([0, 2, 3, 4])) - - match magnitude: - case "small": - base = draw(st.integers(min_value=-999, max_value=999)) - case "medium": - base = draw(st.integers(min_value=-999999, max_value=999999)) - case "large": - base = draw(st.integers(min_value=-999999999, max_value=999999999)) - case _: # huge - base = draw(st.integers(min_value=-999999999999, max_value=999999999999)) - - event(f"strategy=financial_{magnitude}") - event(f"strategy=financial_decimals_{decimals}") - - if decimals == 0: - return Decimal(base) - - # Add decimal component based on ISO 4217 decimal places using exact arithmetic. - divisor = 10 ** decimals - fraction = draw(st.integers(min_value=0, max_value=divisor - 1)) - return Decimal(base) + Decimal(fraction) / Decimal(divisor) - - -# ============================================================================= -# Identifier Case Strategies (for function bridge testing) -# ============================================================================= - - -@composite -def snake_case_identifiers(draw: st.DrawFn) -> str: - """Generate snake_case identifiers. - - Events emitted: - - bridge_id_parts={n}: Number of identifier parts - """ - parts = draw(st.lists(st.sampled_from(IDENTIFIER_PARTS), min_size=1, max_size=3)) - event(f"bridge_id_parts={len(parts)}") - return "_".join(parts) - - -@composite -def camel_case_identifiers(draw: st.DrawFn) -> str: - """Generate camelCase identifiers. 
- - Events emitted: - - bridge_id_parts={n}: Number of identifier parts - """ - parts = draw(st.lists(st.sampled_from(IDENTIFIER_PARTS), min_size=1, max_size=3)) - event(f"bridge_id_parts={len(parts)}") - if not parts: - return "value" - return parts[0] + "".join(p.capitalize() for p in parts[1:]) - - -# ============================================================================= -# Function Bridge Strategies -# ============================================================================= - - -@composite -def ftl_function_names(draw: st.DrawFn) -> str: - """Generate valid FTL function names (UPPERCASE identifiers). - - Events emitted: - - bridge_fname_len={n}: Length category of generated name - """ - name = draw( - st.text( - alphabet=st.characters( - whitelist_categories=("Lu",), # type: ignore[arg-type] - min_codepoint=65, - max_codepoint=90, - ), - min_size=1, - max_size=20, - ).filter(lambda s: s.isidentifier()) - ) - length = "short" if len(name) <= 5 else "long" - event(f"bridge_fname_len={length}") - return name - - -@composite -def fluent_numbers(draw: st.DrawFn) -> FluentNumber: - """Generate FluentNumber instances with diverse value/format combos. - - FluentNumber.value is int | Decimal (never float — precision requirement). - - Events emitted: - - bridge_fnum_type={type}: Value type (int, decimal) - - bridge_fnum_precision={n}: Precision category (none, 0, low, high) - """ - value_type = draw(st.sampled_from(["int", "decimal"])) - event(f"bridge_fnum_type={value_type}") - - # Draw precision category first for bucket-first uniform distribution. - # "none" represents FluentNumber.precision=None (unspecified precision). 
- prec_cat = draw(st.sampled_from(["none", "0", "low", "high"])) - - precision: int | None - places: int - if prec_cat == "none": - precision = None - places = 0 - elif prec_cat == "0": - precision = 0 - places = 0 - elif prec_cat == "low": - precision = draw(st.integers(min_value=1, max_value=2)) - places = precision - else: # high - precision = draw(st.integers(min_value=3, max_value=6)) - places = precision - - value: int | Decimal - int_part = draw(st.integers(min_value=-999999, max_value=999999)) - if value_type == "int": - value = int_part - formatted = str(value) - elif places > 0: - frac = draw( - st.integers(min_value=0, max_value=10**places - 1) - ) - frac_str = str(frac).zfill(places) - value = Decimal(f"{int_part}.{frac_str}") - formatted = str(value) - else: - value = Decimal(int_part) - formatted = str(value) - - event(f"bridge_fnum_precision={prec_cat}") - - return FluentNumber( - value=value, formatted=formatted, precision=precision - ) - - -# ============================================================================= -# AST Node Strategies (for serialization testing) -# ============================================================================= - - -@composite -def ftl_text_elements(draw: st.DrawFn) -> TextElement: - """Generate TextElement AST nodes.""" - value = draw(ftl_simple_text()) - return TextElement(value=value) - - -@composite -def ftl_variable_references(draw: st.DrawFn) -> VariableReference: - """Generate VariableReference AST nodes.""" - name = draw(ftl_identifiers()) - return VariableReference(id=Identifier(name=name)) - - -@composite -def ftl_number_literals(draw: st.DrawFn) -> NumberLiteral: - """Generate NumberLiteral AST nodes with valid FTL raw format. - - FTL number syntax: -?[0-9]+(.[0-9]+)? - No scientific notation allowed. Uses fixed-point notation for Decimals. 
- """ - value = draw(ftl_numbers()) - - # Ensure raw string uses fixed-point notation (no scientific notation) - # str(Decimal) may use 'E' notation for very small/large values - raw = format(value, "f") if isinstance(value, Decimal) else str(value) - - return NumberLiteral(value=value, raw=raw) - - -@composite -def ftl_string_literals(draw: st.DrawFn) -> StringLiteral: - """Generate StringLiteral AST nodes.""" - value = draw(ftl_simple_text()) - return StringLiteral(value=value) - - -@composite -def ftl_named_arguments(draw: st.DrawFn) -> NamedArgument: - """Generate NamedArgument AST nodes for function calls. - - Named arguments have the form: key: value - Example: minimumFractionDigits: 2 - """ - name = draw(ftl_identifiers()) - # Per FTL spec EBNF: named-argument ::= identifier ":" literal - # where literal ::= number-literal | quoted-literal - # Named argument values are constrained to StringLiteral and NumberLiteral only. - value = draw( - st.one_of( - ftl_string_literals(), - ftl_number_literals(), - ) - ) - return NamedArgument(name=Identifier(name=name), value=value) - - -@composite -def ftl_call_arguments(draw: st.DrawFn) -> CallArguments: - """Generate CallArguments AST nodes for function/term calls. - - Call arguments consist of positional and named arguments. 
- Example: $count, minimumFractionDigits: 2 - """ - # Generate 0-3 positional arguments - num_positional = draw(st.integers(min_value=0, max_value=3)) - positional = tuple( - draw( - st.one_of( - ftl_variable_references(), - ftl_string_literals(), - ftl_number_literals(), - ) - ) - for _ in range(num_positional) - ) - - # Generate 0-3 named arguments with unique names - num_named = draw(st.integers(min_value=0, max_value=3)) - named_keys = draw( - st.lists( - st.sampled_from([ - "minimumFractionDigits", - "maximumFractionDigits", - "useGrouping", - "style", - "currency", - "dateStyle", - "timeStyle", - ]), - min_size=num_named, - max_size=num_named, - unique=True, - ) - ) - named = tuple( - NamedArgument( - name=Identifier(name=key), - value=draw( - st.one_of( - ftl_string_literals(), - ftl_number_literals(), - ) - ), - ) - for key in named_keys - ) - - return CallArguments(positional=positional, named=named) - - -@composite -def ftl_function_references(draw: st.DrawFn) -> FunctionReference: - """Generate FunctionReference AST nodes. - - Function references are UPPERCASE per Fluent convention. - Example: NUMBER($count, minimumFractionDigits: 2) - """ - # Use realistic builtin function names - func_name = draw( - st.sampled_from([ - "NUMBER", - "DATETIME", - "CURRENCY", - "PLURAL", - "CUSTOM", - ]) - ) - arguments = draw(ftl_call_arguments()) - return FunctionReference( - id=Identifier(name=func_name), - arguments=arguments, - ) - - -@composite -def ftl_term_references(draw: st.DrawFn) -> TermReference: - """Generate TermReference AST nodes. - - Term references start with - and may have attributes and arguments. 
- Example: -brand, -brand.short, -term(case: "genitive") - """ - term_id = draw(ftl_identifiers()) - # Optionally include an attribute reference - has_attr = draw(st.booleans()) - attribute = Identifier(name=draw(ftl_identifiers())) if has_attr else None - # Optionally include arguments (for parameterized terms) - has_args = draw(st.booleans()) - arguments = draw(ftl_call_arguments()) if has_args else None - - return TermReference( - id=Identifier(name=term_id), - attribute=attribute, - arguments=arguments, - ) - - -@composite -def ftl_message_references(draw: st.DrawFn) -> MessageReference: - """Generate MessageReference AST nodes. - - Message references refer to other messages, optionally with attributes. - Example: other-message, other-message.title - """ - msg_id = draw(ftl_identifiers()) - # Optionally include an attribute reference - has_attr = draw(st.booleans()) - attribute = Identifier(name=draw(ftl_identifiers())) if has_attr else None - - return MessageReference( - id=Identifier(name=msg_id), - attribute=attribute, - ) - - -@composite -def ftl_placeables(draw: st.DrawFn, max_depth: int = 2) -> Placeable: - """Generate Placeable AST nodes with comprehensive expression coverage. - - Generates all InlineExpression types defined in the Fluent spec: - - StringLiteral, NumberLiteral, VariableReference (simple) - - MessageReference, TermReference, FunctionReference (references) - - Nested Placeable (recursive) - - Uses weighted probability to control explosion while ensuring coverage. 
- - Events emitted: - - strategy=placeable_{choice}: Expression type generated (for HypoFuzz guidance) - - Args: - draw: Hypothesis draw function - max_depth: Maximum nesting depth (default 2 to avoid explosion) - """ - expression: Expression - if max_depth <= 0: - # Base case: only simple leaf expressions - choice = draw(st.sampled_from(["variable", "string", "number"])) - match choice: - case "variable": - expression = draw(ftl_variable_references()) - case "string": - expression = draw(ftl_string_literals()) - case _: # number - expression = draw(ftl_number_literals()) - event(f"strategy=placeable_{choice}_leaf") - else: - # Choose expression type with weighted probability: - # - Simple types (variable, string, number): 60% - common cases - # - References (message, term, function): 30% - complex but important - # - Nested/select: 10% - recursive, expensive - choice = draw( - st.sampled_from([ - # Simple types (6x weight) - "variable", "variable", "variable", - "string", "string", - "number", - # Reference types (3x weight) - "message_ref", - "term_ref", - "function_ref", - # Recursive types (1x weight) - "nested", - ]) - ) - - match choice: - case "variable": - expression = draw(ftl_variable_references()) - case "string": - expression = draw(ftl_string_literals()) - case "number": - expression = draw(ftl_number_literals()) - case "message_ref": - expression = draw(ftl_message_references()) - case "term_ref": - expression = draw(ftl_term_references()) - case "function_ref": - expression = draw(ftl_function_references()) - case _: # nested - inner = draw(ftl_placeables(max_depth=max_depth - 1)) - expression = inner.expression - - # Emit event for HypoFuzz coverage guidance - event(f"strategy=placeable_{choice}") - - return Placeable(expression=expression) - - -@composite -def ftl_deep_placeables(draw: st.DrawFn, depth: int = 5) -> Placeable: - """Generate deeply nested Placeable structures for depth limit testing. 
- - Creates chains of nested placeables up to the specified depth. - Used for testing parser/serializer depth guards. - - Events emitted: - - strategy=deep_placeable_depth={n}: Current nesting depth - """ - event(f"strategy=deep_placeable_depth={depth}") - - if depth <= 1: - return Placeable(expression=draw(ftl_variable_references())) - - inner = draw(ftl_deep_placeables(depth=depth - 1)) - return Placeable(expression=inner.expression) - - -@composite -def ftl_reference_placeables(draw: st.DrawFn) -> Placeable: - """Generate placeables with reference expressions only. - - Targeted strategy for fuzzing the previously-underexposed reference types: - - FunctionReference: { NUMBER($x) } - - TermReference: { -brand } - - MessageReference: { other-message } - - Used for intensive coverage of function/term/message reference parsing - and resolution paths. - """ - expression = draw( - st.one_of( - ftl_function_references(), - ftl_term_references(), - ftl_message_references(), - ) - ) - return Placeable(expression=expression) - - -@composite -def ftl_boundary_depth_placeables(draw: st.DrawFn) -> Placeable: - """Generate placeables at MAX_DEPTH boundary for limit testing. 
- - Events emitted: - - boundary={under|at|over}_max_depth: Depth boundary condition - - Specifically targets the boundary conditions around MAX_DEPTH: - - MAX_DEPTH - 1: Just under limit (should succeed) - - MAX_DEPTH: At limit (should succeed or fail cleanly) - - MAX_DEPTH + 1: Just over limit (should fail cleanly) - - Used for testing: - - Parser depth guards - - Serializer depth guards - - Resolver depth tracking - """ - from ftllexengine.constants import MAX_DEPTH # noqa: PLC0415 - import inside function - - # Choose boundary point - boundary = draw( - st.sampled_from([ - ("under", MAX_DEPTH - 1), - ("at", MAX_DEPTH), - ("over", MAX_DEPTH + 1), - ]) - ) - label, depth = boundary - - # Emit boundary event for HypoFuzz coverage guidance - event(f"boundary={label}_max_depth") - - # Generate nested placeable at chosen depth - return draw(ftl_deep_placeables(depth=min(depth, 150))) # Cap at 150 for safety - - -@composite -def ftl_boundary_depth_messages(draw: st.DrawFn) -> Message: - """Generate Message AST nodes with boundary-depth patterns. - - Creates complete Message nodes containing deeply nested structures - at the MAX_DEPTH boundary for integration testing. 
- """ - from ftllexengine.constants import MAX_DEPTH # noqa: PLC0415 - import inside function - - msg_id = Identifier(name=draw(ftl_identifiers())) - - # Choose depth relative to MAX_DEPTH - depth_offset = draw(st.sampled_from([-1, 0, 1])) - depth = MAX_DEPTH + depth_offset - - # Generate pattern with deeply nested placeable - deep_placeable = draw(ftl_deep_placeables(depth=min(depth, 150))) - pattern = Pattern(elements=(TextElement(value="Prefix "), deep_placeable)) - - return Message(id=msg_id, value=pattern, attributes=()) - - -@composite -def ftl_patterns(draw: st.DrawFn) -> Pattern: - """Generate Pattern AST nodes with mixed elements.""" - elements = draw( - st.lists( - st.one_of(ftl_text_elements(), ftl_placeables()), - min_size=1, - max_size=4, - ) - ) - return Pattern(elements=tuple(elements)) - - -@composite -def ftl_variants(draw: st.DrawFn) -> Variant: - """Generate individual Variant AST nodes for select expressions. - - WARNING: This strategy generates variants with RANDOM default flags. - FTL SelectExpression requires EXACTLY ONE default variant (marked with *). - Using this strategy directly to build SelectExpression will likely fail - validation in SelectExpression.__post_init__. - - For valid SelectExpression generation, use ftl_select_expressions() which - properly manages the exactly-one-default invariant. - - This strategy is intended for: - - Testing individual variant serialization - - Type guard testing - - Low-level AST manipulation tests - """ - key = draw( - st.one_of( - st.builds(Identifier, name=ftl_identifiers()), - ftl_number_literals(), - ) - ) - value = draw(ftl_patterns()) - default = draw(st.booleans()) - return Variant(key=key, value=value, default=default) - - -@composite -def ftl_select_expressions(draw: st.DrawFn) -> SelectExpression: - """Generate SelectExpression AST nodes with valid variants. 
- - Events emitted: - - strategy=select_selector_{type}: Selector expression type (for HypoFuzz) - - strategy=select_variants_{n}: Number of variants generated - - Ensures: - - Exactly one default variant (per Fluent spec) - - Unique variant keys (per Fluent spec) - - D3 fix: Selector can be any InlineExpression, not just VariableReference. - Per FTL spec, common patterns include NUMBER($count) for locale-aware plurals. - """ - # D3 fix: Generate diverse selector types with weighted probability - selector_type = draw( - st.sampled_from([ - "variable", "variable", "variable", "variable", # 40% variable - "number", "number", # 20% number literal - "function", "function", # 20% function (e.g., NUMBER($x)) - "string", # 10% string literal - "term_ref", # 10% term reference - ]) - ) - - selector: InlineExpression - match selector_type: - case "variable": - selector = draw(ftl_variable_references()) - case "number": - selector = draw(ftl_number_literals()) - case "function": - selector = draw(ftl_function_references()) - case "string": - selector = draw(ftl_string_literals()) - case _: # term_ref - selector = draw(ftl_term_references()) - - event(f"strategy=select_selector_{selector_type}") - - # Generate 2-4 unique variant keys using st.sampled_from predefined set - # This avoids expensive rejection-based uniqueness while ensuring valid keys - num_variants = draw(st.integers(min_value=2, max_value=4)) - - # Emit event for HypoFuzz coverage guidance - event(f"strategy=select_variants_{num_variants}") - - # Use predefined unique key names (efficient, no rejection needed) - available_keys = ["one", "two", "three", "four", "five", "other", "zero"] - key_names = draw( - st.lists( - st.sampled_from(available_keys), - min_size=num_variants, - max_size=num_variants, - unique=True, - ) - ) - unique_keys = [Identifier(name=name) for name in key_names] - - # Generate variant values - values = [draw(ftl_patterns()) for _ in range(num_variants)] - - # Choose exactly one variant to 
be the default - default_index = draw(st.integers(min_value=0, max_value=num_variants - 1)) - - variants = tuple( - Variant(key=unique_keys[i], value=values[i], default=i == default_index) - for i in range(num_variants) - ) - - return SelectExpression(selector=selector, variants=variants) - - -@composite -def ftl_select_expressions_with_number_keys(draw: st.DrawFn) -> SelectExpression: - """Generate SelectExpression with NumberLiteral variant keys. - - Events emitted: - - strategy=select_number_keys: SelectExpression with numeric keys - - Used to test serialization branch for NumberLiteral variant keys. - Per Fluent spec, variant keys can be either Identifier or NumberLiteral. - """ - selector = draw(ftl_variable_references()) - - # Generate 2-4 numeric variant keys - num_variants = draw(st.integers(min_value=2, max_value=4)) - - # Emit event for HypoFuzz coverage guidance - event("strategy=select_number_keys") - - # Generate unique numeric keys (0, 1, 2, etc.) - numeric_keys = [NumberLiteral(value=Decimal(str(i)), raw=str(i)) for i in range(num_variants)] - - # Generate variant values - values = [draw(ftl_patterns()) for _ in range(num_variants)] - - # Choose exactly one variant to be the default - default_index = draw(st.integers(min_value=0, max_value=num_variants - 1)) - - variants = tuple( - Variant(key=numeric_keys[i], value=values[i], default=i == default_index) - for i in range(num_variants) - ) - - return SelectExpression(selector=selector, variants=variants) - - -@composite -def ftl_function_references_no_args(draw: st.DrawFn) -> FunctionReference: - """Generate FunctionReference without arguments. - - Events emitted: - - strategy=function_no_args: FunctionReference with empty arguments - - Used to test serialization branch for FunctionReference without arguments. - While uncommon in practice, the AST structure permits CallArguments with - empty positional and named tuples. 
- """ - # Use realistic builtin function names - func_name = draw( - st.sampled_from([ - "NUMBER", - "DATETIME", - "CURRENCY", - "PLURAL", - "CUSTOM", - ]) - ) - - # Emit event for HypoFuzz coverage guidance - event("strategy=function_no_args") - - # Create CallArguments with no arguments - arguments = CallArguments(positional=(), named=()) - - return FunctionReference( - id=Identifier(name=func_name), - arguments=arguments, - ) - - -@composite -def ftl_attribute_nodes(draw: st.DrawFn) -> Attribute: - """Generate Attribute AST nodes for messages and terms. - - Events emitted: - - strategy=attribute: Attribute node generated (for HypoFuzz guidance) - - Attributes are key-value pairs attached to messages/terms: - - .title = Button Title - - .aria-label = Accessible label - - .accesskey = B - """ - attr_id = Identifier(name=draw(ftl_identifiers())) - attr_value = draw(ftl_patterns()) - - event("strategy=attribute") - return Attribute(id=attr_id, value=attr_value) - - -@composite -def ftl_message_nodes(draw: st.DrawFn, *, include_attributes: bool = True) -> Message: - """Generate Message AST nodes. - - Events emitted: - - strategy=message_{with|no}_attrs: Message attribute presence (for HypoFuzz) - - Messages must have a value (pattern). Messages without values - are invalid FTL and get parsed as Junk. - - Args: - include_attributes: If True, 30% chance of generating attributes. - """ - id_val = Identifier(name=draw(ftl_identifiers())) - value = draw(ftl_patterns()) - - # 30% chance of attributes when enabled (D1 fix) - attributes: tuple[Attribute, ...] 
= () - if include_attributes and draw(st.integers(min_value=0, max_value=9)) < 3: - num_attrs = draw(st.integers(min_value=1, max_value=3)) - # Generate unique attribute names - attr_names = draw( - st.lists(ftl_identifiers(), min_size=num_attrs, max_size=num_attrs, unique=True) - ) - attributes = tuple( - Attribute(id=Identifier(name=name), value=draw(ftl_patterns())) - for name in attr_names - ) - event("strategy=message_with_attrs") - else: - event("strategy=message_no_attrs") - - return Message(id=id_val, value=value, attributes=attributes) - - -@composite -def ftl_comment_nodes(draw: st.DrawFn) -> Comment: - """Generate Comment AST nodes of all types. - - Events emitted: - - strategy=comment_{type}: Comment type generated (for HypoFuzz guidance) - - Generates all three FTL comment types per spec: - - COMMENT: # Single comment - - GROUP: ## Group comment (applies to following entries) - - RESOURCE: ### Resource comment (file-level) - """ - content = draw(ftl_simple_text()) - # D5 fix: Generate all comment types with weighted probability - comment_type = draw( - st.sampled_from([ - CommentType.COMMENT, - CommentType.COMMENT, - CommentType.COMMENT, # 60% regular - CommentType.GROUP, # 20% group - CommentType.RESOURCE, # 20% resource - ]) - ) - event(f"strategy=comment_{comment_type.name.lower()}") - return Comment(content=content, type=comment_type) - - -@composite -def ftl_junk_nodes(draw: st.DrawFn) -> Junk: - """Generate Junk AST nodes.""" - content = draw(st.text(min_size=1, max_size=50)) - return Junk(content=content) - - -@composite -def ftl_term_nodes(draw: st.DrawFn, *, include_attributes: bool = True) -> Term: - """Generate Term AST nodes. - - Events emitted: - - strategy=term_{with|no}_attrs: Term attribute presence (for HypoFuzz) - - Args: - include_attributes: If True, 30% chance of generating attributes. 
- """ - id_val = Identifier(name=draw(ftl_identifiers())) - value = draw(ftl_patterns()) - - # 30% chance of attributes when enabled (D1 fix) - attributes: tuple[Attribute, ...] = () - if include_attributes and draw(st.integers(min_value=0, max_value=9)) < 3: - num_attrs = draw(st.integers(min_value=1, max_value=3)) - attr_names = draw( - st.lists(ftl_identifiers(), min_size=num_attrs, max_size=num_attrs, unique=True) - ) - attributes = tuple( - Attribute(id=Identifier(name=name), value=draw(ftl_patterns())) - for name in attr_names - ) - event("strategy=term_with_attrs") - else: - event("strategy=term_no_attrs") - - return Term(id=id_val, value=value, attributes=attributes) - - -@composite -def ftl_resources(draw: st.DrawFn) -> Resource: - """Generate complete Resource AST nodes with messages, terms, and comments. - - Events emitted: - - strategy=resource_entry_{type}: Entry types included (for HypoFuzz guidance) - - Generates mixed entry types reflecting real FTL files: - - 60% messages (primary content) - - 20% terms (reusable snippets) - - 20% comments (documentation) - - Ensures unique IDs within each namespace (messages vs terms are separate). 
- """ - entries = draw( - st.lists( - st.one_of( - # D2 fix: Include terms in resource generation - ftl_message_nodes(), # 60% messages (3x weight) - ftl_message_nodes(), - ftl_message_nodes(), - ftl_term_nodes(), # 20% terms - ftl_comment_nodes(), # 20% comments - ), - min_size=1, - max_size=5, - ) - ) - - # Deduplicate IDs within each namespace (messages and terms are separate) - seen_message_ids: set[str] = set() - seen_term_ids: set[str] = set() - unique_entries: list[Message | Term | Comment] = [] - - for entry in entries: - match entry: - case Message(id=ident): - if ident.name not in seen_message_ids: - seen_message_ids.add(ident.name) - unique_entries.append(entry) - event("strategy=resource_entry_message") - case Term(id=ident): - if ident.name not in seen_term_ids: - seen_term_ids.add(ident.name) - unique_entries.append(entry) - event("strategy=resource_entry_term") - case Comment(): - unique_entries.append(entry) - event("strategy=resource_entry_comment") - - return Resource(entries=tuple(unique_entries)) - - -@composite -def any_ast_entry(draw: st.DrawFn) -> Message | Term | Comment | Junk: - """Generate any AST entry type for type guard testing.""" - return draw( - st.one_of( - ftl_message_nodes(), - ftl_term_nodes(), - ftl_comment_nodes(), - ftl_junk_nodes(), - ) - ) - - -@composite -def any_ast_pattern_element(draw: st.DrawFn) -> TextElement | Placeable: - """Generate any pattern element type for type guard testing.""" - return draw(st.one_of(ftl_text_elements(), ftl_placeables())) - - -# ============================================================================= -# Edge Case Strategies (boundary testing) -# ============================================================================= - - -@composite -def ftl_boundary_identifiers(draw: st.DrawFn) -> str: - """Generate boundary-case identifiers. - - Tests: single char, very long, edge characters. - Uses FTL_IDENTIFIER_FIRST_CHARS per spec (includes uppercase). 
- """ - case = draw(st.sampled_from(["single", "long", "numeric", "hyphen", "underscore"])) - match case: - case "single": - return draw(st.sampled_from(FTL_IDENTIFIER_FIRST_CHARS)) - case "long": - first = draw(st.sampled_from(FTL_IDENTIFIER_FIRST_CHARS)) - return first + "x" * draw(st.integers(50, 100)) - case "numeric": - first = draw(st.sampled_from(FTL_IDENTIFIER_FIRST_CHARS)) - return first + "123456789" - case "hyphen": - first = draw(st.sampled_from(FTL_IDENTIFIER_FIRST_CHARS)) - return first + "-" + draw(ftl_identifiers()) - case _: # underscore - first = draw(st.sampled_from(FTL_IDENTIFIER_FIRST_CHARS)) - return first + "_" + draw(ftl_identifiers()) - - -@composite -def ftl_empty_pattern_messages(draw: st.DrawFn) -> str: - """Generate messages with minimal/empty patterns. - - Edge case: message = (with trailing space only) - """ - msg_id = draw(ftl_identifiers()) - case = draw(st.sampled_from(["space", "single", "newline"])) - match case: - case "space": - return f"{msg_id} = " - case "single": - return f"{msg_id} = x" - case _: - return f"{msg_id} =\n" - - -@composite -def ftl_multiline_messages(draw: st.DrawFn) -> str: - """Generate multiline FTL messages. - - Tests continuation line handling with various indentation. 
- """ - msg_id = draw(ftl_identifiers()) - line1 = draw(ftl_simple_text()) - indent = " " * draw(st.integers(1, 8)) - line2 = draw(ftl_simple_text()) - - return f"{msg_id} = {line1}\n{indent}{line2}" - - -# ============================================================================= -# Recursive Strategies (deep nesting tests) -# ============================================================================= - - -def _ensure_unique_variant_keys_with_default( - variants: list[Variant], -) -> tuple[Variant, ...]: - """Ensure variants have unique keys and at least one default.""" - seen_keys: set[str] = set() - unique_variants: list[Variant] = [] - - for v in variants: - key_name = v.key.name if hasattr(v.key, "name") else str(v.key.value) - if key_name not in seen_keys: - seen_keys.add(key_name) - unique_variants.append(v) - - # Ensure at least 2 variants - if len(unique_variants) < 2: - unique_variants.append( - Variant( - key=Identifier(name="fallback"), - value=Pattern(elements=(TextElement(value="other"),)), - default=False, - ) - ) - - # Ensure exactly one default variant (required by SelectExpression.__post_init__) - # First, strip all defaults - unique_variants = [ - Variant(key=v.key, value=v.value, default=False) for v in unique_variants - ] - # Then set exactly the last one as default - unique_variants[-1] = Variant( - key=unique_variants[-1].key, - value=unique_variants[-1].value, - default=True, - ) - - return tuple(unique_variants) - - -def ftl_deeply_nested_selects( - max_depth: int = 5, -) -> st.SearchStrategy[SelectExpression]: - """Generate deeply nested select expressions. - - Used for validator stress testing - creates selects with nested selects - as selectors, up to max_depth levels deep. 
- - Args: - max_depth: Maximum nesting depth for select expressions - - Returns: - Strategy generating SelectExpression with possible nesting - """ - base_select = st.builds( - SelectExpression, - selector=ftl_variable_references(), - variants=st.lists(ftl_variants(), min_size=2, max_size=4).map( - _ensure_unique_variant_keys_with_default - ), - ) - - def extend( - children: st.SearchStrategy[SelectExpression], - ) -> st.SearchStrategy[SelectExpression]: - return st.builds( - SelectExpression, - selector=children, - variants=st.lists(ftl_variants(), min_size=2, max_size=4).map( - _ensure_unique_variant_keys_with_default - ), - ) - - return st.recursive(base_select, extend, max_leaves=max_depth) - - -# ============================================================================= -# AST Mutation Strategies -# ============================================================================= - - -@composite -def mutate_identifier(draw: st.DrawFn, identifier: Identifier) -> Identifier: - """Mutate an identifier by changing its name.""" - mutation_type = draw(st.sampled_from(["prefix", "suffix", "replace", "case"])) - - match mutation_type: - case "prefix": - new_name = "mut_" + identifier.name - case "suffix": - new_name = identifier.name + "_mut" - case "replace": - new_name = draw(ftl_identifiers()) - case _: # case - new_name = identifier.name.swapcase() - - return Identifier(name=new_name) - - -@composite -def mutate_text_element(draw: st.DrawFn, element: TextElement) -> TextElement: - """Mutate a text element's value.""" - mutation_type = draw(st.sampled_from(["append", "prepend", "replace", "empty"])) - - match mutation_type: - case "append": - new_value = element.value + draw(ftl_simple_text()) - case "prepend": - new_value = draw(ftl_simple_text()) + element.value - case "replace": - new_value = draw(ftl_simple_text()) - case _: # empty - new_value = " " - - return TextElement(value=new_value) - - -@composite -def mutate_pattern(draw: st.DrawFn, pattern: Pattern) -> 
Pattern: - """Mutate a pattern by modifying its elements.""" - if not pattern.elements: - # Empty pattern - add an element - new_elements = (draw(ftl_text_elements()),) - return Pattern(elements=new_elements) - - mutation_type = draw(st.sampled_from(["add", "remove", "modify"])) - - elements = list(pattern.elements) - - match mutation_type: - case "add": - new_elem = draw(st.one_of(ftl_text_elements(), ftl_placeables())) - pos = draw(st.integers(0, len(elements))) - elements.insert(pos, new_elem) - case "remove": - if len(elements) > 1: - idx = draw(st.integers(0, len(elements) - 1)) - elements.pop(idx) - case _: # modify - if elements: - idx = draw(st.integers(0, len(elements) - 1)) - if isinstance(elements[idx], TextElement): - elem = elements[idx] - elements[idx] = draw(mutate_text_element(elem)) # type: ignore[arg-type] - - return Pattern(elements=tuple(elements)) - - -@composite -def mutate_message(draw: st.DrawFn, message: Message) -> Message: - """Mutate a message (id, value, or attributes).""" - mutation_type = draw(st.sampled_from(["id", "value", "add_attr", "remove_attr"])) - - new_id = message.id - new_value = message.value - new_attrs = list(message.attributes) - - match mutation_type: - case "id": - new_id = draw(mutate_identifier(message.id)) - case "value": - if message.value: - new_value = draw(mutate_pattern(message.value)) - case "add_attr": - attr = Attribute( - id=Identifier(name=draw(ftl_identifiers())), - value=draw(ftl_patterns()), - ) - new_attrs.append(attr) - case _: # remove_attr - if new_attrs: - idx = draw(st.integers(0, len(new_attrs) - 1)) - new_attrs.pop(idx) - - return Message(id=new_id, value=new_value, attributes=tuple(new_attrs)) - - -@composite -def swap_variant_keys(draw: st.DrawFn, select: SelectExpression) -> SelectExpression: - """Swap variant keys in a select expression.""" - variants = list(select.variants) - - if len(variants) < 2: - return select - - # Swap two random variants' keys - idx1, idx2 = 
draw(st.lists(st.integers(0, len(variants) - 1), min_size=2, max_size=2)) - if idx1 != idx2: - key1 = variants[idx1].key - key2 = variants[idx2].key - variants[idx1] = Variant( - key=key2, value=variants[idx1].value, default=variants[idx1].default - ) - variants[idx2] = Variant( - key=key1, value=variants[idx2].value, default=variants[idx2].default - ) - - return SelectExpression(selector=select.selector, variants=tuple(variants)) - - -# ============================================================================= -# Resolver Argument Strategies -# ============================================================================= - - -@composite -def resolver_string_args(draw: st.DrawFn) -> dict[str, str]: - """Generate string-only resolver arguments.""" - keys = draw(st.lists(ftl_identifiers(), min_size=0, max_size=5, unique=True)) - return {k: draw(ftl_simple_text()) for k in keys} - - -@composite -def resolver_number_args(draw: st.DrawFn) -> dict[str, int | Decimal]: - """Generate number-only resolver arguments.""" - keys = draw(st.lists(ftl_identifiers(), min_size=0, max_size=5, unique=True)) - return {k: draw(ftl_numbers()) for k in keys} - - -@composite -def resolver_mixed_args(draw: st.DrawFn) -> dict[str, str | int | Decimal]: - """Generate mixed-type resolver arguments.""" - keys = draw(st.lists(ftl_identifiers(), min_size=0, max_size=5, unique=True)) - result: dict[str, str | int | Decimal] = {} - - for k in keys: - value: str | int | Decimal = draw( - st.one_of( - ftl_simple_text(), - st.integers(min_value=-1000000, max_value=1000000), - st.decimals( - min_value=Decimal(-1000000), - max_value=Decimal(1000000), - allow_nan=False, - allow_infinity=False, - ), - ) - ) - result[k] = value - - return result - - -@composite -def resolver_edge_case_args(draw: st.DrawFn) -> dict[str, str | int | Decimal]: - """Generate edge case resolver arguments.""" - edge_values: list[str | int | Decimal] = [ - "", # Empty string - " ", # Whitespace only - "0", # Zero as string - 
0, # Zero - -1, # Negative - Decimal(0), # Decimal zero - Decimal("0.1"), # Small decimal - Decimal(10000000000), # Large number - Decimal(-10000000000), # Large negative - ] - - keys = draw(st.lists(ftl_identifiers(), min_size=1, max_size=3, unique=True)) - return {k: draw(st.sampled_from(edge_values)) for k in keys} - - -# ============================================================================= -# Deeply Nested AST Strategies -# ============================================================================= - - -@composite -def deeply_nested_placeables(draw: st.DrawFn, depth: int = 10) -> Placeable: - """Generate deeply nested placeables: { { { ... { $var } ... } } }.""" - # Start with innermost expression - inner: VariableReference | Placeable = draw(ftl_variable_references()) - - # Wrap in placeables - for _ in range(depth): - inner = Placeable(expression=inner) - - return inner # type: ignore[return-value] - - -def deeply_nested_message_chain(depth: int = 10) -> st.SearchStrategy[Resource]: - """Generate a chain of messages referencing each other.""" - messages: list[Message] = [] - - for i in range(depth): - msg_id = Identifier(name=f"msg{i}") - - if i < depth - 1: - # Reference next message - ref = MessageReference(id=Identifier(name=f"msg{i + 1}"), attribute=None) - pattern = Pattern(elements=(Placeable(expression=ref),)) - else: - # Terminal message - pattern = Pattern(elements=(TextElement(value="End of chain"),)) - - messages.append(Message(id=msg_id, value=pattern, attributes=())) - - return st.just(Resource(entries=tuple(messages))) - - -@composite -def deeply_nested_select(draw: st.DrawFn, depth: int = 5) -> SelectExpression: - """Generate deeply nested select expressions.""" - # Base case: simple select - base_selector = draw(ftl_variable_references()) - base_variants = ( - Variant( - key=Identifier(name="one"), - value=Pattern(elements=(TextElement(value="One"),)), - default=False, - ), - Variant( - key=Identifier(name="other"), - 
value=Pattern(elements=(TextElement(value="Other"),)), - default=True, - ), - ) - - current = SelectExpression(selector=base_selector, variants=base_variants) - - # Wrap in additional selects - for i in range(depth - 1): - # Use current select as value in a variant - wrapper_variants = ( - Variant( - key=Identifier(name=f"nested{i}"), - value=Pattern(elements=(Placeable(expression=current),)), - default=False, - ), - Variant( - key=Identifier(name="other"), - value=Pattern(elements=(TextElement(value=f"Fallback {i}"),)), - default=True, - ), - ) - current = SelectExpression( - selector=draw(ftl_variable_references()), - variants=wrapper_variants, - ) - - return current - - -def wide_resource(width: int = 50) -> st.SearchStrategy[Resource]: - """Generate a resource with many messages (width test).""" - messages: list[Message] = [] - - for i in range(width): - msg = Message( - id=Identifier(name=f"msg{i}"), - value=Pattern(elements=(TextElement(value=f"Message {i}"),)), - attributes=(), - ) - messages.append(msg) - - return st.just(Resource(entries=tuple(messages))) - - -def message_with_many_attributes(attr_count: int = 20) -> st.SearchStrategy[Message]: - """Generate a message with many attributes.""" - attrs: list[Attribute] = [] - - for i in range(attr_count): - attr = Attribute( - id=Identifier(name=f"attr{i}"), - value=Pattern(elements=(TextElement(value=f"Attribute {i}"),)), - ) - attrs.append(attr) - - return st.just( - Message( - id=Identifier(name="many_attrs"), - value=Pattern(elements=(TextElement(value="Main value"),)), - attributes=tuple(attrs), - ) - ) - - -# ============================================================================= -# Whitespace Edge Case Strategies (for fuzzing whitespace handling bugs) -# ============================================================================= - - -# Line ending variations for mixed line ending tests -_LINE_ENDINGS: tuple[str, ...] 
= ("\n", "\r\n", "\r") - - -@composite -def blank_line(draw: st.DrawFn) -> str: - """Generate a blank line containing only spaces. - - Tests blank line handling in patterns and between entries. - Per FTL spec, blank lines may contain spaces but no other content. - """ - space_count = draw(st.integers(min_value=0, max_value=8)) - return " " * space_count - - -@composite -def blank_lines_sequence(draw: st.DrawFn) -> str: - """Generate a sequence of blank lines with varying whitespace. - - Tests handling of multiple consecutive blank lines, which affects: - - Comment separation logic - - Pattern indentation calculation - - Entry boundary detection - """ - line_count = draw(st.integers(min_value=1, max_value=5)) - lines: list[str] = [] - for _ in range(line_count): - spaces = draw(st.integers(min_value=0, max_value=4)) - lines.append(" " * spaces) - return "\n".join(lines) - - -@composite -def text_with_trailing_whitespace(draw: st.DrawFn) -> str: - """Generate text with trailing whitespace (spaces or tabs). - - Tests trailing whitespace handling which can affect: - - Pattern value boundaries - - Serializer output normalization - - Roundtrip consistency - """ - base_text = draw(st.text(alphabet=FTL_SAFE_CHARS, min_size=1, max_size=30)) - # Ensure base has content - if base_text.strip() == "": - base_text = draw(st.sampled_from(string.ascii_letters)) - - trailing_type = draw(st.sampled_from(["spaces", "tabs", "mixed"])) - count = draw(st.integers(min_value=1, max_value=4)) - - match trailing_type: - case "spaces": - trailing = " " * count - case "tabs": - trailing = "\t" * count - case _: # mixed - trailing = " \t" * count - - return base_text + trailing - - -@composite -def text_with_tabs(draw: st.DrawFn) -> str: - """Generate text containing tab characters. - - Per FTL spec, tabs are NOT valid whitespace and should create Junk - when appearing in syntactic positions (e.g., indentation, between - identifier and equals sign). 
This strategy generates text with - embedded tabs for rejection testing. - """ - prefix = draw(st.text(alphabet=FTL_SAFE_CHARS, min_size=1, max_size=15)) - if prefix.strip() == "": - prefix = draw(st.sampled_from(string.ascii_letters)) - - tab_position = draw(st.sampled_from(["middle", "start", "end"])) - - match tab_position: - case "middle": - suffix = draw(st.text(alphabet=FTL_SAFE_CHARS, min_size=1, max_size=15)) - if suffix.strip() == "": - suffix = draw(st.sampled_from(string.ascii_letters)) - return prefix + "\t" + suffix - case "start": - return "\t" + prefix - case _: # end - return prefix + "\t" - - -@composite -def mixed_line_endings_text(draw: st.DrawFn) -> str: - """Generate multi-line text with mixed line endings. - - Tests CRLF normalization handling: - - Unix (LF): \\n - - Windows (CRLF): \\r\\n - - Legacy Mac (CR): \\r - - Mixed line endings in the same file is a real-world scenario - that can occur from cross-platform editing. - """ - line_count = draw(st.integers(min_value=2, max_value=5)) - lines: list[str] = [] - - for _ in range(line_count): - # Generate line content - line = draw(st.text(alphabet=FTL_SAFE_CHARS, min_size=1, max_size=20)) - if line.strip() == "": - line = draw(st.sampled_from(string.ascii_letters)) - lines.append(line) - - # Join with random line endings - result_parts: list[str] = [] - for i, line in enumerate(lines): - result_parts.append(line) - if i < len(lines) - 1: - ending = draw(st.sampled_from(_LINE_ENDINGS)) - result_parts.append(ending) - - return "".join(result_parts) - - -@composite -def variant_key_with_whitespace(draw: st.DrawFn) -> str: - """Generate variant key with whitespace inside brackets. 
- - Tests FTL-GRAMMAR-003 and SPEC-VARIANT-WHITESPACE-001: - - Spaces after opening bracket: [ one] - - Spaces before closing bracket: [one ] - - Newlines inside variant key: [ \\n one \\n ] - """ - key = draw(ftl_identifiers()) - - whitespace_type = draw( - st.sampled_from(["leading", "trailing", "both", "newlines", "mixed"]) - ) - - match whitespace_type: - case "leading": - spaces = " " * draw(st.integers(min_value=1, max_value=3)) - return f"[{spaces}{key}]" - case "trailing": - spaces = " " * draw(st.integers(min_value=1, max_value=3)) - return f"[{key}{spaces}]" - case "both": - leading = " " * draw(st.integers(min_value=1, max_value=2)) - trailing = " " * draw(st.integers(min_value=1, max_value=2)) - return f"[{leading}{key}{trailing}]" - case "newlines": - return f"[ \n {key} \n ]" - case _: # mixed - return f"[ \n{key} ]" - - -@composite -def placeable_with_whitespace(draw: st.DrawFn) -> str: - """Generate placeable expression with whitespace around braces. - - Tests FTL-STRICT-WHITESPACE-001: - - Newlines after opening brace: { \\n $var } - - Newlines before closing brace: { $var \\n } - - Mixed whitespace around placeables - """ - var_name = draw(ftl_identifiers()) - - whitespace_type = draw(st.sampled_from(["after_open", "before_close", "both"])) - - match whitespace_type: - case "after_open": - return f"{{ \n ${var_name} }}" - case "before_close": - return f"{{ ${var_name} \n }}" - case _: # both - return f"{{ \n ${var_name} \n }}" - - -@composite -def variable_indent_multiline_pattern(draw: st.DrawFn) -> str: - """Generate multiline pattern with DIFFERENT indentation per line. - - Tests common_indent calculation in parse_pattern(): - - Each continuation line has independent indentation - - Common indent should be minimum of all non-blank lines - - Blank lines (spaces only) should be skipped in indent calculation - - Addresses FTL-GRAMMAR-001: Blank lines before first content. 
- """ - line_count = draw(st.integers(min_value=2, max_value=5)) - lines: list[str] = [] - - for _ in range(line_count): - indent = " " * draw(st.integers(min_value=1, max_value=8)) - content = draw(st.text(alphabet=FTL_SAFE_CHARS, min_size=1, max_size=20)) - if content.strip() == "": - content = draw(st.sampled_from(string.ascii_letters)) - lines.append(indent + content) - - return "\n".join(lines) - - -@composite -def pattern_with_leading_blank_lines(draw: st.DrawFn) -> str: - """Generate pattern with blank lines before first content line. - - Tests FTL-GRAMMAR-001: Parser must skip blank lines before - measuring common_indent in multiline patterns. - - Example: msg =\\n\\n value - Should produce "value", not " value". - """ - blank_count = draw(st.integers(min_value=1, max_value=3)) - indent = " " * draw(st.integers(min_value=1, max_value=8)) - content = draw(st.text(alphabet=FTL_SAFE_CHARS, min_size=1, max_size=20)) - if content.strip() == "": - content = draw(st.sampled_from(string.ascii_letters)) - - blank_lines = "\n" * blank_count - return f"{blank_lines}{indent}{content}" - - -@composite -def ftl_message_with_whitespace_edge_cases(draw: st.DrawFn) -> str: - """Generate FTL message exercising whitespace edge cases. - - Combines multiple whitespace edge cases into complete messages - for comprehensive fuzzing of whitespace handling. 
- """ - msg_id = draw(ftl_identifiers()) - - case_type = draw( - st.sampled_from([ - "trailing_ws", - "multiline_varied_indent", - "leading_blanks", - "placeable_ws", - ]) - ) - - match case_type: - case "trailing_ws": - value = draw(text_with_trailing_whitespace()) - return f"{msg_id} = {value}" - case "multiline_varied_indent": - pattern = draw(variable_indent_multiline_pattern()) - return f"{msg_id} =\n{pattern}" - case "leading_blanks": - pattern = draw(pattern_with_leading_blank_lines()) - return f"{msg_id} ={pattern}" - case _: # placeable_ws - placeable = draw(placeable_with_whitespace()) - return f"{msg_id} = Hello {placeable} World" - - -@composite -def ftl_select_with_whitespace_variants(draw: st.DrawFn) -> str: - """Generate select expression with whitespace edge cases in variants. - - Tests variant key whitespace handling and variant value patterns - with whitespace edge cases. - """ - msg_id = draw(ftl_identifiers()) - selector_var = draw(ftl_identifiers()) - - num_variants = draw(st.integers(min_value=2, max_value=4)) - default_idx = draw(st.integers(min_value=0, max_value=num_variants - 1)) - - variant_keys = ["one", "two", "few", "many", "other", "zero"] - used_keys = draw( - st.lists( - st.sampled_from(variant_keys), - min_size=num_variants, - max_size=num_variants, - unique=True, - ) - ) - - variants: list[str] = [] - for i, key in enumerate(used_keys): - prefix = "*" if i == default_idx else " " - - # Randomly add whitespace to variant key - if draw(st.booleans()): - key_str = draw(variant_key_with_whitespace()) - # Replace the generated key with our unique key - key_str = key_str.replace(key_str[1:-1].strip(), key) - else: - key_str = f"[{key}]" - - value = draw(st.text(alphabet=FTL_SAFE_CHARS, min_size=1, max_size=15)) - if value.strip() == "": - value = "value" - variants.append(f"{prefix}{key_str} {value}") - - variants_str = "\n ".join(variants) - return f"{msg_id} = {{ ${selector_var} ->\n {variants_str}\n}}" - - -def 
_generate_unique_id(draw: st.DrawFn, seen_ids: set[str]) -> str: - """Generate a unique FTL identifier not already in seen_ids.""" - msg_id = draw(ftl_identifiers()) - while msg_id in seen_ids: - msg_id = draw(ftl_identifiers()) - seen_ids.add(msg_id) - return msg_id - - -def _generate_whitespace_message_entry(draw: st.DrawFn, msg_id: str) -> str: - """Generate a message entry with whitespace edge cases.""" - ws_case = draw(st.sampled_from(["trailing", "multiline", "leading_blank"])) - match ws_case: - case "trailing": - value = draw(text_with_trailing_whitespace()) - return f"{msg_id} = {value}" - case "multiline": - pattern = draw(variable_indent_multiline_pattern()) - return f"{msg_id} =\n{pattern}" - case _: - pattern = draw(pattern_with_leading_blank_lines()) - return f"{msg_id} ={pattern}" - - -@composite -def ftl_resource_with_whitespace_chaos(draw: st.DrawFn) -> str: - """Generate FTL resource with mixed whitespace edge cases. - - Events emitted: - - strategy=ws_chaos_entry_{type}: Entry type in whitespace chaos resource - - Combines multiple entry types with various whitespace edge cases - for comprehensive cross-contamination testing. 
- """ - num_entries = draw(st.integers(min_value=2, max_value=8)) - entries: list[str] = [] - seen_ids: set[str] = set() - - # Track entry types for event emission - entry_types_used: list[str] = [] - - for _ in range(num_entries): - entry_type = draw( - st.sampled_from([ - "simple", - "whitespace_message", - "select_whitespace", - "term", - "comment", - "blank_lines", - ]) - ) - entry_types_used.append(entry_type) - - match entry_type: - case "simple": - msg_id = _generate_unique_id(draw, seen_ids) - value = draw(ftl_simple_text()) - entries.append(f"{msg_id} = {value}") - - case "whitespace_message": - msg_id = _generate_unique_id(draw, seen_ids) - entries.append(_generate_whitespace_message_entry(draw, msg_id)) - - case "select_whitespace": - entry = draw(ftl_select_with_whitespace_variants()) - entry_id = entry.split(" = ")[0] - if entry_id not in seen_ids: - seen_ids.add(entry_id) - entries.append(entry) - - case "term": - term_id = draw(ftl_identifiers()) - value = draw(ftl_simple_text()) - entries.append(f"-{term_id} = {value}") - - case "comment": - level = draw(st.sampled_from(["#", "##", "###"])) - content = draw(ftl_simple_text()) - entries.append(f"{level} {content}") - - case _: # blank_lines - blanks = draw(blank_lines_sequence()) - entries.append(blanks) - - # Emit events for entry type diversity - for et in set(entry_types_used): - event(f"strategy=ws_chaos_entry_{et}") - - return "\n\n".join(entries) - - -# ============================================================================= -# Negative Oracle Strategies (intentionally invalid FTL) -# ============================================================================= -# (MAINT-FUZZ-NEGATIVE-ORACLE-MISSING-001) - - -@composite -def ftl_invalid_select_no_default(draw: st.DrawFn) -> str: - """Generate SelectExpression without default variant (invalid per spec). - - FTL requires exactly one variant to be marked as default with *. 
- """ - msg_id = draw(ftl_identifiers()) - selector = f"${ draw(ftl_identifiers()) }" - variant1 = draw(ftl_identifiers()) - variant2 = draw(ftl_identifiers()) - - # No asterisk on any variant - invalid - return f"{msg_id} = {{ {selector} ->\n [{variant1}] value1\n [{variant2}] value2\n}}" - - -@composite -def ftl_invalid_unclosed_placeable(draw: st.DrawFn) -> str: - """Generate message with unclosed placeable (invalid syntax).""" - msg_id = draw(ftl_identifiers()) - var_name = draw(ftl_identifiers()) - return f"{msg_id} = Hello {{ ${var_name}" # Missing closing } - - -@composite -def ftl_invalid_unterminated_string(draw: st.DrawFn) -> str: - """Generate message with unterminated string literal (invalid syntax).""" - msg_id = draw(ftl_identifiers()) - return f'{msg_id} = {{ "unterminated string }}' # Missing closing quote - - -@composite -def ftl_invalid_bad_identifier_start(draw: st.DrawFn) -> str: - """Generate message with invalid identifier (starts with digit/symbol).""" - bad_start = draw(st.sampled_from(["0", "1", "_", "-", ".", "@"])) - rest = draw(ftl_identifiers()) - return f"{bad_start}{rest} = value" - - -@composite -def ftl_invalid_double_equals(draw: st.DrawFn) -> str: - """Generate message with double equals sign (invalid syntax).""" - msg_id = draw(ftl_identifiers()) - return f"{msg_id} == value" - - -@composite -def ftl_invalid_missing_value(draw: st.DrawFn) -> str: - """Generate message with missing value (invalid for messages).""" - msg_id = draw(ftl_identifiers()) - return f"{msg_id} =" # No value, no attributes - - -@composite -def ftl_invalid_ftl(draw: st.DrawFn) -> str: - """Generate any type of invalid FTL for error path testing. - - Events emitted: - - strategy=invalid_{type}: Type of invalid FTL generated - - Used for testing parser error recovery and diagnostic generation. 
- """ - # Choose invalid type explicitly to emit event - invalid_type = draw( - st.sampled_from([ - "no_default", - "unclosed_placeable", - "unterminated_string", - "bad_identifier", - "double_equals", - "missing_value", - ]) - ) - - # Emit event for HypoFuzz coverage guidance - event(f"strategy=invalid_{invalid_type}") - - match invalid_type: - case "no_default": - return draw(ftl_invalid_select_no_default()) - case "unclosed_placeable": - return draw(ftl_invalid_unclosed_placeable()) - case "unterminated_string": - return draw(ftl_invalid_unterminated_string()) - case "bad_identifier": - return draw(ftl_invalid_bad_identifier_start()) - case "double_equals": - return draw(ftl_invalid_double_equals()) - case _: # missing_value - return draw(ftl_invalid_missing_value()) - - -@composite -def ftl_valid_with_injected_error(draw: st.DrawFn) -> tuple[str, str]: - """Generate valid FTL then inject an error. - - Returns tuple of (original_valid_ftl, corrupted_ftl). - Useful for differential testing of error recovery. 
- """ - # Generate valid FTL - msg_id = draw(ftl_identifiers()) - value = draw(ftl_simple_text()) - valid_ftl = f"{msg_id} = {value}" - - # Choose corruption type - corruption = draw( - st.sampled_from([ - "remove_equals", - "add_unclosed_brace", - "corrupt_identifier", - "insert_null", - ]) - ) - - match corruption: - case "remove_equals": - corrupted = valid_ftl.replace(" = ", " ", 1) - case "add_unclosed_brace": - corrupted = valid_ftl.replace(value, f"{{ {value}", 1) - case "corrupt_identifier": - corrupted = "0" + valid_ftl - case _: # insert_null - mid = len(valid_ftl) // 2 - corrupted = valid_ftl[:mid] + "\x00" + valid_ftl[mid:] - - return (valid_ftl, corrupted) - - -# ============================================================================= -# Circular Reference Strategies (semantic errors, syntactically valid) -# ============================================================================= - - -@composite -def ftl_circular_message_2way(draw: st.DrawFn) -> str: - """Generate 2-message circular reference: A -> B -> A. - - Syntactically valid FTL that causes infinite loop at resolution time. - Tests resolver cycle detection. - """ - # D6 fix: Use st.lists(unique=True) instead of rejection loop - ids = draw(st.lists(ftl_identifiers(), min_size=2, max_size=2, unique=True)) - id_a, id_b = ids - - return f"{id_a} = {{ {id_b} }}\n{id_b} = {{ {id_a} }}" - - -@composite -def ftl_circular_message_3way(draw: st.DrawFn) -> str: - """Generate 3-message circular reference: A -> B -> C -> A. - - Tests transitive cycle detection in resolver. - """ - # D6 fix: Use st.lists(unique=True) instead of rejection loop - ids = draw(st.lists(ftl_identifiers(), min_size=3, max_size=3, unique=True)) - id_a, id_b, id_c = ids - - return f"{id_a} = {{ {id_b} }}\n{id_b} = {{ {id_c} }}\n{id_c} = {{ {id_a} }}" - - -@composite -def ftl_circular_self_reference(draw: st.DrawFn) -> str: - """Generate self-referencing message: A -> A. - - Simplest form of circular reference. 
- """ - msg_id = draw(ftl_identifiers()) - return f"{msg_id} = Value {{ {msg_id} }}" - - -@composite -def ftl_circular_term_2way(draw: st.DrawFn) -> str: - """Generate 2-term circular reference: -A -> -B -> -A. - - Tests cycle detection in term resolution. - """ - # D6 fix: Use st.lists(unique=True) instead of rejection loop - ids = draw(st.lists(ftl_identifiers(), min_size=2, max_size=2, unique=True)) - id_a, id_b = ids - - return f"-{id_a} = {{ -{id_b} }}\n-{id_b} = {{ -{id_a} }}" - - -@composite -def ftl_circular_mixed(draw: st.DrawFn) -> str: - """Generate circular reference mixing messages and terms. - - msg -> -term -> msg creates cross-namespace cycle. - """ - msg_id = draw(ftl_identifiers()) - term_id = draw(ftl_identifiers()) - - return f"{msg_id} = {{ -{term_id} }}\n-{term_id} = {{ {msg_id} }}" - - -@composite -def ftl_circular_via_attribute(draw: st.DrawFn) -> str: - """Generate circular reference through attributes. - - msg.attr -> other -> msg.attr - """ - # D6 fix: Use st.lists(unique=True) instead of rejection loop - ids = draw(st.lists(ftl_identifiers(), min_size=2, max_size=2, unique=True)) - id_a, id_b = ids - attr = draw(ftl_identifiers()) - - return f"""{id_a} = Base - .{attr} = {{ {id_b} }} -{id_b} = {{ {id_a}.{attr} }}""" - - -@composite -def ftl_circular_deep(draw: st.DrawFn) -> str: - """Generate circular reference with N messages in chain. - - msg0 -> msg1 -> ... -> msgN -> msg0 - """ - chain_length = draw(st.integers(min_value=3, max_value=10)) - ids = [f"msg{i}" for i in range(chain_length)] - - lines = [] - for i, msg_id in enumerate(ids): - next_id = ids[(i + 1) % chain_length] - lines.append(f"{msg_id} = {{ {next_id} }}") - - return "\n".join(lines) - - -@composite -def ftl_circular_references(draw: st.DrawFn) -> str: - """Generate any type of circular reference for cycle detection testing. 
- - Events emitted: - - strategy=circular_{type}: Type of circular reference generated - - Combined strategy for comprehensive cycle detection fuzzing. - """ - # Map circular types to their generator strategies - generators = { - "2way": ftl_circular_message_2way, - "3way": ftl_circular_message_3way, - "self": ftl_circular_self_reference, - "term_2way": ftl_circular_term_2way, - "mixed": ftl_circular_mixed, - "via_attr": ftl_circular_via_attribute, - "deep": ftl_circular_deep, - } - - # Choose circular reference type explicitly to emit event - circular_type = draw(st.sampled_from(list(generators.keys()))) - - # Emit event for HypoFuzz coverage guidance - event(f"strategy=circular_{circular_type}") - - return draw(generators[circular_type]()) - - -# ============================================================================= -# Semantically Broken Strategies (valid syntax, runtime errors) -# ============================================================================= - - -@composite -def ftl_undefined_reference(draw: st.DrawFn) -> str: - """Generate message referencing undefined message/term. - - Syntactically valid but will fail at resolution time. - """ - # D6 fix: Use st.lists(unique=True) instead of rejection loop - ids = draw(st.lists(ftl_identifiers(), min_size=2, max_size=2, unique=True)) - msg_id, undefined_id = ids - - ref_type = draw(st.sampled_from(["message", "term", "attribute"])) - - match ref_type: - case "message": - return f"{msg_id} = {{ {undefined_id} }}" - case "term": - return f"{msg_id} = {{ -{undefined_id} }}" - case _: # attribute - return f"{msg_id} = {{ {undefined_id}.nonexistent }}" - - -@composite -def ftl_undefined_variable(draw: st.DrawFn) -> str: - """Generate message using undefined variable. - - Variables are provided at format time, so this tests resolver - behavior when required variables are missing. - """ - msg_id = draw(ftl_identifiers()) - var_name = draw(ftl_identifiers()) - - return f"{msg_id} = Hello {{ ${var_name} }}!" 
- - -@composite -def ftl_function_arity_mismatch(draw: st.DrawFn) -> str: - """Generate function call with wrong number of arguments. - - Tests function argument validation at resolution time. - """ - msg_id = draw(ftl_identifiers()) - func_name = draw(st.sampled_from(["NUMBER", "DATETIME", "CURRENCY"])) - - # NUMBER/DATETIME require at least one positional arg - arity = draw(st.sampled_from(["zero_args", "too_many_args"])) - - match arity: - case "zero_args": - return f"{msg_id} = {{ {func_name}() }}" - case _: # too_many_args - vars_list = ", ".join(f"${draw(ftl_identifiers())}" for _ in range(5)) - return f"{msg_id} = {{ {func_name}({vars_list}) }}" - - -@composite -def ftl_select_missing_variant(draw: st.DrawFn) -> str: - """Generate select expression where runtime selector matches no variant. - - Valid syntax but may produce fallback behavior at runtime. - """ - msg_id = draw(ftl_identifiers()) - var_name = draw(ftl_identifiers()) - - # Define variants that won't match most runtime values - return f"""{msg_id} = {{ ${var_name} -> - [impossiblevalue1] Value 1 - [impossiblevalue2] Value 2 - *[other] Default -}}""" - - -@composite -def ftl_semantically_broken(draw: st.DrawFn) -> str: - """Generate any semantically broken (but syntactically valid) FTL. - - Events emitted: - - strategy=semantic_{type}: Type of semantic error generated - - Combined strategy for resolver error handling testing. 
- """ - # Choose semantic error type explicitly to emit event - semantic_type = draw( - st.sampled_from([ - "undefined_ref", - "undefined_var", - "arity_mismatch", - "missing_variant", - "circular", - ]) - ) - - # Emit event for HypoFuzz coverage guidance - event(f"strategy=semantic_{semantic_type}") - - match semantic_type: - case "undefined_ref": - return draw(ftl_undefined_reference()) - case "undefined_var": - return draw(ftl_undefined_variable()) - case "arity_mismatch": - return draw(ftl_function_arity_mismatch()) - case "missing_variant": - return draw(ftl_select_missing_variant()) - case _: # circular - return draw(ftl_circular_references()) - - -# ============================================================================= -# Invalid AST Construction Helpers (for validation testing) -# ============================================================================= - - -def build_invalid_select_no_defaults( - selector: VariableReference | None = None, -) -> SelectExpression: - """Build SelectExpression with NO default variants (invalid). - - Bypasses __post_init__ validation to test serializer validation layer. - This is defense-in-depth testing: programmatically constructed ASTs - might bypass parser validation. 
- - Returns: - SelectExpression with all variants having default=False - """ - if selector is None: - selector = VariableReference(id=Identifier(name="count")) - - variants = ( - Variant( - key=Identifier(name="one"), - value=Pattern(elements=(TextElement(value="One"),)), - default=False, - ), - Variant( - key=Identifier(name="other"), - value=Pattern(elements=(TextElement(value="Other"),)), - default=False, - ), - ) - - # Bypass __post_init__ validation using object.__setattr__ - # This creates an invalid AST for testing serializer validation - obj = object.__new__(SelectExpression) - object.__setattr__(obj, "selector", selector) - object.__setattr__(obj, "variants", variants) - object.__setattr__(obj, "span", None) - - return obj - - -def build_invalid_select_multiple_defaults( - selector: VariableReference | None = None, -) -> SelectExpression: - """Build SelectExpression with MULTIPLE default variants (invalid). - - Bypasses __post_init__ validation to test serializer validation layer. 
- - Returns: - SelectExpression with all variants having default=True - """ - if selector is None: - selector = VariableReference(id=Identifier(name="count")) - - variants = ( - Variant( - key=Identifier(name="one"), - value=Pattern(elements=(TextElement(value="One"),)), - default=True, - ), - Variant( - key=Identifier(name="other"), - value=Pattern(elements=(TextElement(value="Other"),)), - default=True, - ), - ) - - # Bypass __post_init__ validation using object.__setattr__ - obj = object.__new__(SelectExpression) - object.__setattr__(obj, "selector", selector) - object.__setattr__(obj, "variants", variants) - object.__setattr__(obj, "span", None) - - return obj +"""Aggregated FTL strategy surface.""" + +from tests.strategies.ftl_ast import * # noqa: F403 - re-export split FTL AST strategies +from tests.strategies.ftl_negative import * # noqa: F403 - re-export split invalid FTL strategies +from tests.strategies.ftl_shared import * # noqa: F403 - re-export split FTL constants and shared imports +from tests.strategies.ftl_strings import * # noqa: F403 - re-export split FTL string strategies +from tests.strategies.ftl_structural import * # noqa: F403 - re-export split structural FTL strategies +from tests.strategies.ftl_whitespace import * # noqa: F403 - re-export split whitespace FTL strategies diff --git a/tests/strategies/ftl_ast.py b/tests/strategies/ftl_ast.py new file mode 100644 index 00000000..11d2e0ca --- /dev/null +++ b/tests/strategies/ftl_ast.py @@ -0,0 +1,752 @@ +from tests.strategies.ftl_shared import ( + Attribute, + CallArguments, + Comment, + CommentType, + Decimal, + Expression, + FunctionReference, + Identifier, + InlineExpression, + Junk, + Message, + MessageReference, + NamedArgument, + NumberLiteral, + Pattern, + Placeable, + Resource, + SelectExpression, + StringLiteral, + Term, + TermReference, + TextElement, + VariableReference, + Variant, + composite, + event, + st, +) +from tests.strategies.ftl_strings import ftl_identifiers, ftl_numbers, 
ftl_simple_text + + +@composite +def ftl_text_elements(draw: st.DrawFn) -> TextElement: + """Generate TextElement AST nodes.""" + value = draw(ftl_simple_text()) + return TextElement(value=value) + + +@composite +def ftl_variable_references(draw: st.DrawFn) -> VariableReference: + """Generate VariableReference AST nodes.""" + name = draw(ftl_identifiers()) + return VariableReference(id=Identifier(name=name)) + + +@composite +def ftl_number_literals(draw: st.DrawFn) -> NumberLiteral: + """Generate NumberLiteral AST nodes with valid FTL raw format. + + FTL number syntax: -?[0-9]+(.[0-9]+)? + No scientific notation allowed. Uses fixed-point notation for Decimals. + """ + value = draw(ftl_numbers()) + + # Ensure raw string uses fixed-point notation (no scientific notation) + # str(Decimal) may use 'E' notation for very small/large values + raw = format(value, "f") if isinstance(value, Decimal) else str(value) + + return NumberLiteral(value=value, raw=raw) + + +@composite +def ftl_string_literals(draw: st.DrawFn) -> StringLiteral: + """Generate StringLiteral AST nodes.""" + value = draw(ftl_simple_text()) + return StringLiteral(value=value) + + +@composite +def ftl_named_arguments(draw: st.DrawFn) -> NamedArgument: + """Generate NamedArgument AST nodes for function calls. + + Named arguments have the form: key: value + Example: minimumFractionDigits: 2 + """ + name = draw(ftl_identifiers()) + # Per FTL spec EBNF: named-argument ::= identifier ":" literal + # where literal ::= number-literal | quoted-literal + # Named argument values are constrained to StringLiteral and NumberLiteral only. + value = draw( + st.one_of( + ftl_string_literals(), + ftl_number_literals(), + ) + ) + return NamedArgument(name=Identifier(name=name), value=value) + + +@composite +def ftl_call_arguments(draw: st.DrawFn) -> CallArguments: + """Generate CallArguments AST nodes for function/term calls. + + Call arguments consist of positional and named arguments. 
+ Example: $count, minimumFractionDigits: 2 + """ + # Generate 0-3 positional arguments + num_positional = draw(st.integers(min_value=0, max_value=3)) + positional = tuple( + draw( + st.one_of( + ftl_variable_references(), + ftl_string_literals(), + ftl_number_literals(), + ) + ) + for _ in range(num_positional) + ) + + # Generate 0-3 named arguments with unique names + num_named = draw(st.integers(min_value=0, max_value=3)) + named_keys = draw( + st.lists( + st.sampled_from([ + "minimumFractionDigits", + "maximumFractionDigits", + "useGrouping", + "style", + "currency", + "dateStyle", + "timeStyle", + ]), + min_size=num_named, + max_size=num_named, + unique=True, + ) + ) + named = tuple( + NamedArgument( + name=Identifier(name=key), + value=draw( + st.one_of( + ftl_string_literals(), + ftl_number_literals(), + ) + ), + ) + for key in named_keys + ) + + return CallArguments(positional=positional, named=named) + + +@composite +def ftl_function_references(draw: st.DrawFn) -> FunctionReference: + """Generate FunctionReference AST nodes. + + Function references are UPPERCASE per Fluent convention. + Example: NUMBER($count, minimumFractionDigits: 2) + """ + # Use realistic builtin function names + func_name = draw( + st.sampled_from([ + "NUMBER", + "DATETIME", + "CURRENCY", + "PLURAL", + "CUSTOM", + ]) + ) + arguments = draw(ftl_call_arguments()) + return FunctionReference( + id=Identifier(name=func_name), + arguments=arguments, + ) + + +@composite +def ftl_term_references(draw: st.DrawFn) -> TermReference: + """Generate TermReference AST nodes. + + Term references start with - and may have attributes and arguments. 
+ Example: -brand, -brand.short, -term(case: "genitive") + """ + term_id = draw(ftl_identifiers()) + # Optionally include an attribute reference + has_attr = draw(st.booleans()) + attribute = Identifier(name=draw(ftl_identifiers())) if has_attr else None + # Optionally include arguments (for parameterized terms) + has_args = draw(st.booleans()) + arguments = draw(ftl_call_arguments()) if has_args else None + + return TermReference( + id=Identifier(name=term_id), + attribute=attribute, + arguments=arguments, + ) + + +@composite +def ftl_message_references(draw: st.DrawFn) -> MessageReference: + """Generate MessageReference AST nodes. + + Message references refer to other messages, optionally with attributes. + Example: other-message, other-message.title + """ + msg_id = draw(ftl_identifiers()) + # Optionally include an attribute reference + has_attr = draw(st.booleans()) + attribute = Identifier(name=draw(ftl_identifiers())) if has_attr else None + + return MessageReference( + id=Identifier(name=msg_id), + attribute=attribute, + ) + + +@composite +def ftl_placeables(draw: st.DrawFn, max_depth: int = 2) -> Placeable: + """Generate Placeable AST nodes with comprehensive expression coverage. + + Generates all InlineExpression types defined in the Fluent spec: + - StringLiteral, NumberLiteral, VariableReference (simple) + - MessageReference, TermReference, FunctionReference (references) + - Nested Placeable (recursive) + + Uses weighted probability to control explosion while ensuring coverage. 
+ + Events emitted: + - strategy=placeable_{choice}: Expression type generated (for HypoFuzz guidance) + + Args: + draw: Hypothesis draw function + max_depth: Maximum nesting depth (default 2 to avoid explosion) + """ + expression: Expression + if max_depth <= 0: + # Base case: only simple leaf expressions + choice = draw(st.sampled_from(["variable", "string", "number"])) + match choice: + case "variable": + expression = draw(ftl_variable_references()) + case "string": + expression = draw(ftl_string_literals()) + case _: # number + expression = draw(ftl_number_literals()) + event(f"strategy=placeable_{choice}_leaf") + else: + # Choose expression type with weighted probability: + # - Simple types (variable, string, number): 60% - common cases + # - References (message, term, function): 30% - complex but important + # - Nested/select: 10% - recursive, expensive + choice = draw( + st.sampled_from([ + # Simple types (6x weight) + "variable", "variable", "variable", + "string", "string", + "number", + # Reference types (3x weight) + "message_ref", + "term_ref", + "function_ref", + # Recursive types (1x weight) + "nested", + ]) + ) + + match choice: + case "variable": + expression = draw(ftl_variable_references()) + case "string": + expression = draw(ftl_string_literals()) + case "number": + expression = draw(ftl_number_literals()) + case "message_ref": + expression = draw(ftl_message_references()) + case "term_ref": + expression = draw(ftl_term_references()) + case "function_ref": + expression = draw(ftl_function_references()) + case _: # nested + inner = draw(ftl_placeables(max_depth=max_depth - 1)) + expression = inner.expression + + # Emit event for HypoFuzz coverage guidance + event(f"strategy=placeable_{choice}") + + return Placeable(expression=expression) + + +@composite +def ftl_deep_placeables(draw: st.DrawFn, depth: int = 5) -> Placeable: + """Generate deeply nested Placeable structures for depth limit testing. 
+ + Creates chains of nested placeables up to the specified depth. + Used for testing parser/serializer depth guards. + + Events emitted: + - strategy=deep_placeable_depth={n}: Current nesting depth + """ + event(f"strategy=deep_placeable_depth={depth}") + + if depth <= 1: + return Placeable(expression=draw(ftl_variable_references())) + + inner = draw(ftl_deep_placeables(depth=depth - 1)) + return Placeable(expression=inner.expression) + + +@composite +def ftl_reference_placeables(draw: st.DrawFn) -> Placeable: + """Generate placeables with reference expressions only. + + Targeted strategy for fuzzing the previously-underexposed reference types: + - FunctionReference: { NUMBER($x) } + - TermReference: { -brand } + - MessageReference: { other-message } + + Used for intensive coverage of function/term/message reference parsing + and resolution paths. + """ + expression = draw( + st.one_of( + ftl_function_references(), + ftl_term_references(), + ftl_message_references(), + ) + ) + return Placeable(expression=expression) + + +@composite +def ftl_boundary_depth_placeables(draw: st.DrawFn) -> Placeable: + """Generate placeables at MAX_DEPTH boundary for limit testing. 
+ + Events emitted: + - boundary={under|at|over}_max_depth: Depth boundary condition + + Specifically targets the boundary conditions around MAX_DEPTH: + - MAX_DEPTH - 1: Just under limit (should succeed) + - MAX_DEPTH: At limit (should succeed or fail cleanly) + - MAX_DEPTH + 1: Just over limit (should fail cleanly) + + Used for testing: + - Parser depth guards + - Serializer depth guards + - Resolver depth tracking + """ + from ftllexengine.constants import MAX_DEPTH # noqa: PLC0415 - import inside function + + # Choose boundary point + boundary = draw( + st.sampled_from([ + ("under", MAX_DEPTH - 1), + ("at", MAX_DEPTH), + ("over", MAX_DEPTH + 1), + ]) + ) + label, depth = boundary + + # Emit boundary event for HypoFuzz coverage guidance + event(f"boundary={label}_max_depth") + + # Generate nested placeable at chosen depth + return draw(ftl_deep_placeables(depth=min(depth, 150))) # Cap at 150 for safety + + +@composite +def ftl_boundary_depth_messages(draw: st.DrawFn) -> Message: + """Generate Message AST nodes with boundary-depth patterns. + + Creates complete Message nodes containing deeply nested structures + at the MAX_DEPTH boundary for integration testing. 
+ """ + from ftllexengine.constants import MAX_DEPTH # noqa: PLC0415 - import inside function + + msg_id = Identifier(name=draw(ftl_identifiers())) + + # Choose depth relative to MAX_DEPTH + depth_offset = draw(st.sampled_from([-1, 0, 1])) + depth = MAX_DEPTH + depth_offset + + # Generate pattern with deeply nested placeable + deep_placeable = draw(ftl_deep_placeables(depth=min(depth, 150))) + pattern = Pattern(elements=(TextElement(value="Prefix "), deep_placeable)) + + return Message(id=msg_id, value=pattern, attributes=()) + + +@composite +def ftl_patterns(draw: st.DrawFn) -> Pattern: + """Generate Pattern AST nodes with mixed elements.""" + elements = draw( + st.lists( + st.one_of(ftl_text_elements(), ftl_placeables()), + min_size=1, + max_size=4, + ) + ) + return Pattern(elements=tuple(elements)) + + +@composite +def ftl_variants(draw: st.DrawFn) -> Variant: + """Generate individual Variant AST nodes for select expressions. + + WARNING: This strategy generates variants with RANDOM default flags. + FTL SelectExpression requires EXACTLY ONE default variant (marked with *). + Using this strategy directly to build SelectExpression will likely fail + validation in SelectExpression.__post_init__. + + For valid SelectExpression generation, use ftl_select_expressions() which + properly manages the exactly-one-default invariant. + + This strategy is intended for: + - Testing individual variant serialization + - Type guard testing + - Low-level AST manipulation tests + """ + key = draw( + st.one_of( + st.builds(Identifier, name=ftl_identifiers()), + ftl_number_literals(), + ) + ) + value = draw(ftl_patterns()) + default = draw(st.booleans()) + return Variant(key=key, value=value, default=default) + + +@composite +def ftl_select_expressions(draw: st.DrawFn) -> SelectExpression: + """Generate SelectExpression AST nodes with valid variants. 
+ + Events emitted: + - strategy=select_selector_{type}: Selector expression type (for HypoFuzz) + - strategy=select_variants_{n}: Number of variants generated + + Ensures: + - Exactly one default variant (per Fluent spec) + - Unique variant keys (per Fluent spec) + + D3 fix: Selector can be any InlineExpression, not just VariableReference. + Per FTL spec, common patterns include NUMBER($count) for locale-aware plurals. + """ + # D3 fix: Generate diverse selector types with weighted probability + selector_type = draw( + st.sampled_from([ + "variable", "variable", "variable", "variable", # 40% variable + "number", "number", # 20% number literal + "function", "function", # 20% function (e.g., NUMBER($x)) + "string", # 10% string literal + "term_ref", # 10% term reference + ]) + ) + + selector: InlineExpression + match selector_type: + case "variable": + selector = draw(ftl_variable_references()) + case "number": + selector = draw(ftl_number_literals()) + case "function": + selector = draw(ftl_function_references()) + case "string": + selector = draw(ftl_string_literals()) + case _: # term_ref + selector = draw(ftl_term_references()) + + event(f"strategy=select_selector_{selector_type}") + + # Generate 2-4 unique variant keys using st.sampled_from predefined set + # This avoids expensive rejection-based uniqueness while ensuring valid keys + num_variants = draw(st.integers(min_value=2, max_value=4)) + + # Emit event for HypoFuzz coverage guidance + event(f"strategy=select_variants_{num_variants}") + + # Use predefined unique key names (efficient, no rejection needed) + available_keys = ["one", "two", "three", "four", "five", "other", "zero"] + key_names = draw( + st.lists( + st.sampled_from(available_keys), + min_size=num_variants, + max_size=num_variants, + unique=True, + ) + ) + unique_keys = [Identifier(name=name) for name in key_names] + + # Generate variant values + values = [draw(ftl_patterns()) for _ in range(num_variants)] + + # Choose exactly one variant to 
be the default + default_index = draw(st.integers(min_value=0, max_value=num_variants - 1)) + + variants = tuple( + Variant(key=unique_keys[i], value=values[i], default=i == default_index) + for i in range(num_variants) + ) + + return SelectExpression(selector=selector, variants=variants) + + +@composite +def ftl_select_expressions_with_number_keys(draw: st.DrawFn) -> SelectExpression: + """Generate SelectExpression with NumberLiteral variant keys. + + Events emitted: + - strategy=select_number_keys: SelectExpression with numeric keys + + Used to test serialization branch for NumberLiteral variant keys. + Per Fluent spec, variant keys can be either Identifier or NumberLiteral. + """ + selector = draw(ftl_variable_references()) + + # Generate 2-4 numeric variant keys + num_variants = draw(st.integers(min_value=2, max_value=4)) + + # Emit event for HypoFuzz coverage guidance + event("strategy=select_number_keys") + + # Generate unique numeric keys (0, 1, 2, etc.) + numeric_keys = [NumberLiteral(value=Decimal(str(i)), raw=str(i)) for i in range(num_variants)] + + # Generate variant values + values = [draw(ftl_patterns()) for _ in range(num_variants)] + + # Choose exactly one variant to be the default + default_index = draw(st.integers(min_value=0, max_value=num_variants - 1)) + + variants = tuple( + Variant(key=numeric_keys[i], value=values[i], default=i == default_index) + for i in range(num_variants) + ) + + return SelectExpression(selector=selector, variants=variants) + + +@composite +def ftl_function_references_no_args(draw: st.DrawFn) -> FunctionReference: + """Generate FunctionReference without arguments. + + Events emitted: + - strategy=function_no_args: FunctionReference with empty arguments + + Used to test serialization branch for FunctionReference without arguments. + While uncommon in practice, the AST structure permits CallArguments with + empty positional and named tuples. 
+ """ + # Use realistic builtin function names + func_name = draw( + st.sampled_from([ + "NUMBER", + "DATETIME", + "CURRENCY", + "PLURAL", + "CUSTOM", + ]) + ) + + # Emit event for HypoFuzz coverage guidance + event("strategy=function_no_args") + + # Create CallArguments with no arguments + arguments = CallArguments(positional=(), named=()) + + return FunctionReference( + id=Identifier(name=func_name), + arguments=arguments, + ) + + +@composite +def ftl_attribute_nodes(draw: st.DrawFn) -> Attribute: + """Generate Attribute AST nodes for messages and terms. + + Events emitted: + - strategy=attribute: Attribute node generated (for HypoFuzz guidance) + + Attributes are key-value pairs attached to messages/terms: + - .title = Button Title + - .aria-label = Accessible label + - .accesskey = B + """ + attr_id = Identifier(name=draw(ftl_identifiers())) + attr_value = draw(ftl_patterns()) + + event("strategy=attribute") + return Attribute(id=attr_id, value=attr_value) + + +@composite +def ftl_message_nodes(draw: st.DrawFn, *, include_attributes: bool = True) -> Message: + """Generate Message AST nodes. + + Events emitted: + - strategy=message_{with|no}_attrs: Message attribute presence (for HypoFuzz) + + Messages must have a value (pattern). Messages without values + are invalid FTL and get parsed as Junk. + + Args: + include_attributes: If True, 30% chance of generating attributes. + """ + id_val = Identifier(name=draw(ftl_identifiers())) + value = draw(ftl_patterns()) + + # 30% chance of attributes when enabled (D1 fix) + attributes: tuple[Attribute, ...] 
= () + if include_attributes and draw(st.integers(min_value=0, max_value=9)) < 3: + num_attrs = draw(st.integers(min_value=1, max_value=3)) + # Generate unique attribute names + attr_names = draw( + st.lists(ftl_identifiers(), min_size=num_attrs, max_size=num_attrs, unique=True) + ) + attributes = tuple( + Attribute(id=Identifier(name=name), value=draw(ftl_patterns())) + for name in attr_names + ) + event("strategy=message_with_attrs") + else: + event("strategy=message_no_attrs") + + return Message(id=id_val, value=value, attributes=attributes) + + +@composite +def ftl_comment_nodes(draw: st.DrawFn) -> Comment: + """Generate Comment AST nodes of all types. + + Events emitted: + - strategy=comment_{type}: Comment type generated (for HypoFuzz guidance) + + Generates all three FTL comment types per spec: + - COMMENT: # Single comment + - GROUP: ## Group comment (applies to following entries) + - RESOURCE: ### Resource comment (file-level) + """ + content = draw(ftl_simple_text()) + # D5 fix: Generate all comment types with weighted probability + comment_type = draw( + st.sampled_from([ + CommentType.COMMENT, + CommentType.COMMENT, + CommentType.COMMENT, # 60% regular + CommentType.GROUP, # 20% group + CommentType.RESOURCE, # 20% resource + ]) + ) + event(f"strategy=comment_{comment_type.name.lower()}") + return Comment(content=content, type=comment_type) + + +@composite +def ftl_junk_nodes(draw: st.DrawFn) -> Junk: + """Generate Junk AST nodes.""" + content = draw(st.text(min_size=1, max_size=50)) + return Junk(content=content) + + +@composite +def ftl_term_nodes(draw: st.DrawFn, *, include_attributes: bool = True) -> Term: + """Generate Term AST nodes. + + Events emitted: + - strategy=term_{with|no}_attrs: Term attribute presence (for HypoFuzz) + + Args: + include_attributes: If True, 30% chance of generating attributes. 
+ """ + id_val = Identifier(name=draw(ftl_identifiers())) + value = draw(ftl_patterns()) + + # 30% chance of attributes when enabled (D1 fix) + attributes: tuple[Attribute, ...] = () + if include_attributes and draw(st.integers(min_value=0, max_value=9)) < 3: + num_attrs = draw(st.integers(min_value=1, max_value=3)) + attr_names = draw( + st.lists(ftl_identifiers(), min_size=num_attrs, max_size=num_attrs, unique=True) + ) + attributes = tuple( + Attribute(id=Identifier(name=name), value=draw(ftl_patterns())) + for name in attr_names + ) + event("strategy=term_with_attrs") + else: + event("strategy=term_no_attrs") + + return Term(id=id_val, value=value, attributes=attributes) + + +@composite +def ftl_resources(draw: st.DrawFn) -> Resource: + """Generate complete Resource AST nodes with messages, terms, and comments. + + Events emitted: + - strategy=resource_entry_{type}: Entry types included (for HypoFuzz guidance) + + Generates mixed entry types reflecting real FTL files: + - 60% messages (primary content) + - 20% terms (reusable snippets) + - 20% comments (documentation) + + Ensures unique IDs within each namespace (messages vs terms are separate). 
+ """ + entries = draw( + st.lists( + st.one_of( + # D2 fix: Include terms in resource generation + ftl_message_nodes(), # 60% messages (3x weight) + ftl_message_nodes(), + ftl_message_nodes(), + ftl_term_nodes(), # 20% terms + ftl_comment_nodes(), # 20% comments + ), + min_size=1, + max_size=5, + ) + ) + + # Deduplicate IDs within each namespace (messages and terms are separate) + seen_message_ids: set[str] = set() + seen_term_ids: set[str] = set() + unique_entries: list[Message | Term | Comment] = [] + + for entry in entries: + match entry: + case Message(id=ident): + if ident.name not in seen_message_ids: + seen_message_ids.add(ident.name) + unique_entries.append(entry) + event("strategy=resource_entry_message") + case Term(id=ident): + if ident.name not in seen_term_ids: + seen_term_ids.add(ident.name) + unique_entries.append(entry) + event("strategy=resource_entry_term") + case Comment(): + unique_entries.append(entry) + event("strategy=resource_entry_comment") + + return Resource(entries=tuple(unique_entries)) + + +@composite +def any_ast_entry(draw: st.DrawFn) -> Message | Term | Comment | Junk: + """Generate any AST entry type for type guard testing.""" + return draw( + st.one_of( + ftl_message_nodes(), + ftl_term_nodes(), + ftl_comment_nodes(), + ftl_junk_nodes(), + ) + ) + + +@composite +def any_ast_pattern_element(draw: st.DrawFn) -> TextElement | Placeable: + """Generate any pattern element type for type guard testing.""" + return draw(st.one_of(ftl_text_elements(), ftl_placeables())) diff --git a/tests/strategies/ftl_negative.py b/tests/strategies/ftl_negative.py new file mode 100644 index 00000000..98efb110 --- /dev/null +++ b/tests/strategies/ftl_negative.py @@ -0,0 +1,458 @@ +from tests.strategies.ftl_shared import ( + Identifier, + Pattern, + SelectExpression, + TextElement, + VariableReference, + Variant, + composite, + event, + st, +) +from tests.strategies.ftl_strings import ftl_identifiers, ftl_simple_text + + +@composite +def 
ftl_invalid_select_no_default(draw: st.DrawFn) -> str: + """Generate SelectExpression without default variant (invalid per spec). + + FTL requires exactly one variant to be marked as default with *. + """ + msg_id = draw(ftl_identifiers()) + selector = f"${ draw(ftl_identifiers()) }" + variant1 = draw(ftl_identifiers()) + variant2 = draw(ftl_identifiers()) + + # No asterisk on any variant - invalid + return f"{msg_id} = {{ {selector} ->\n [{variant1}] value1\n [{variant2}] value2\n}}" + + +@composite +def ftl_invalid_unclosed_placeable(draw: st.DrawFn) -> str: + """Generate message with unclosed placeable (invalid syntax).""" + msg_id = draw(ftl_identifiers()) + var_name = draw(ftl_identifiers()) + return f"{msg_id} = Hello {{ ${var_name}" # Missing closing } + + +@composite +def ftl_invalid_unterminated_string(draw: st.DrawFn) -> str: + """Generate message with unterminated string literal (invalid syntax).""" + msg_id = draw(ftl_identifiers()) + return f'{msg_id} = {{ "unterminated string }}' # Missing closing quote + + +@composite +def ftl_invalid_bad_identifier_start(draw: st.DrawFn) -> str: + """Generate message with invalid identifier (starts with digit/symbol).""" + bad_start = draw(st.sampled_from(["0", "1", "_", ".", "@"])) + rest = draw(ftl_identifiers()) + return f"{bad_start}{rest} = value" + + +@composite +def ftl_invalid_double_equals(draw: st.DrawFn) -> str: + """Generate message with double equals sign (invalid syntax).""" + msg_id = draw(ftl_identifiers()) + return f"{msg_id} == value" + + +@composite +def ftl_invalid_missing_value(draw: st.DrawFn) -> str: + """Generate message with missing value (invalid for messages).""" + msg_id = draw(ftl_identifiers()) + return f"{msg_id} =" # No value, no attributes + + +@composite +def ftl_invalid_ftl(draw: st.DrawFn) -> str: + """Generate any type of invalid FTL for error path testing. 
+ + Events emitted: + - strategy=invalid_{type}: Type of invalid FTL generated + + Used for testing parser error recovery and diagnostic generation. + """ + # Choose invalid type explicitly to emit event + invalid_type = draw( + st.sampled_from([ + "no_default", + "unclosed_placeable", + "unterminated_string", + "bad_identifier", + "double_equals", + "missing_value", + ]) + ) + + # Emit event for HypoFuzz coverage guidance + event(f"strategy=invalid_{invalid_type}") + + match invalid_type: + case "no_default": + return draw(ftl_invalid_select_no_default()) + case "unclosed_placeable": + return draw(ftl_invalid_unclosed_placeable()) + case "unterminated_string": + return draw(ftl_invalid_unterminated_string()) + case "bad_identifier": + return draw(ftl_invalid_bad_identifier_start()) + case "double_equals": + return draw(ftl_invalid_double_equals()) + case _: # missing_value + return draw(ftl_invalid_missing_value()) + + +@composite +def ftl_valid_with_injected_error(draw: st.DrawFn) -> tuple[str, str]: + """Generate valid FTL then inject an error. + + Returns tuple of (original_valid_ftl, corrupted_ftl). + Useful for differential testing of error recovery. 
+ """ + # Generate valid FTL + msg_id = draw(ftl_identifiers()) + value = draw(ftl_simple_text()) + valid_ftl = f"{msg_id} = {value}" + + # Choose corruption type + corruption = draw( + st.sampled_from([ + "remove_equals", + "add_unclosed_brace", + "corrupt_identifier", + "insert_null", + ]) + ) + + match corruption: + case "remove_equals": + corrupted = valid_ftl.replace(" = ", " ", 1) + case "add_unclosed_brace": + corrupted = f"{msg_id} = {{ {value}" + case "corrupt_identifier": + corrupted = "0" + valid_ftl + case _: # insert_null + mid = len(valid_ftl) // 2 + corrupted = valid_ftl[:mid] + "\x00" + valid_ftl[mid:] + + return (valid_ftl, corrupted) + + +# ============================================================================= +# Circular Reference Strategies (semantic errors, syntactically valid) +# ============================================================================= + + +@composite +def ftl_circular_message_2way(draw: st.DrawFn) -> str: + """Generate 2-message circular reference: A -> B -> A. + + Syntactically valid FTL that causes infinite loop at resolution time. + Tests resolver cycle detection. + """ + # D6 fix: Use st.lists(unique=True) instead of rejection loop + ids = draw(st.lists(ftl_identifiers(), min_size=2, max_size=2, unique=True)) + id_a, id_b = ids + + return f"{id_a} = {{ {id_b} }}\n{id_b} = {{ {id_a} }}" + + +@composite +def ftl_circular_message_3way(draw: st.DrawFn) -> str: + """Generate 3-message circular reference: A -> B -> C -> A. + + Tests transitive cycle detection in resolver. + """ + # D6 fix: Use st.lists(unique=True) instead of rejection loop + ids = draw(st.lists(ftl_identifiers(), min_size=3, max_size=3, unique=True)) + id_a, id_b, id_c = ids + + return f"{id_a} = {{ {id_b} }}\n{id_b} = {{ {id_c} }}\n{id_c} = {{ {id_a} }}" + + +@composite +def ftl_circular_self_reference(draw: st.DrawFn) -> str: + """Generate self-referencing message: A -> A. + + Simplest form of circular reference. 
+ """ + msg_id = draw(ftl_identifiers()) + return f"{msg_id} = Value {{ {msg_id} }}" + + +@composite +def ftl_circular_term_2way(draw: st.DrawFn) -> str: + """Generate 2-term circular reference: -A -> -B -> -A. + + Tests cycle detection in term resolution. + """ + # D6 fix: Use st.lists(unique=True) instead of rejection loop + ids = draw(st.lists(ftl_identifiers(), min_size=2, max_size=2, unique=True)) + id_a, id_b = ids + + return f"-{id_a} = {{ -{id_b} }}\n-{id_b} = {{ -{id_a} }}" + + +@composite +def ftl_circular_mixed(draw: st.DrawFn) -> str: + """Generate circular reference mixing messages and terms. + + msg -> -term -> msg creates cross-namespace cycle. + """ + msg_id = draw(ftl_identifiers()) + term_id = draw(ftl_identifiers()) + + return f"{msg_id} = {{ -{term_id} }}\n-{term_id} = {{ {msg_id} }}" + + +@composite +def ftl_circular_via_attribute(draw: st.DrawFn) -> str: + """Generate circular reference through attributes. + + msg.attr -> other -> msg.attr + """ + # D6 fix: Use st.lists(unique=True) instead of rejection loop + ids = draw(st.lists(ftl_identifiers(), min_size=2, max_size=2, unique=True)) + id_a, id_b = ids + attr = draw(ftl_identifiers()) + + return f"""{id_a} = Base + .{attr} = {{ {id_b} }} +{id_b} = {{ {id_a}.{attr} }}""" + + +@composite +def ftl_circular_deep(draw: st.DrawFn) -> str: + """Generate circular reference with N messages in chain. + + msg0 -> msg1 -> ... -> msgN -> msg0 + """ + chain_length = draw(st.integers(min_value=3, max_value=10)) + ids = [f"msg{i}" for i in range(chain_length)] + + lines = [] + for i, msg_id in enumerate(ids): + next_id = ids[(i + 1) % chain_length] + lines.append(f"{msg_id} = {{ {next_id} }}") + + return "\n".join(lines) + + +@composite +def ftl_circular_references(draw: st.DrawFn) -> str: + """Generate any type of circular reference for cycle detection testing. 
+ + Events emitted: + - strategy=circular_{type}: Type of circular reference generated + + Combined strategy for comprehensive cycle detection fuzzing. + """ + # Map circular types to their generator strategies + generators = { + "2way": ftl_circular_message_2way, + "3way": ftl_circular_message_3way, + "self": ftl_circular_self_reference, + "term_2way": ftl_circular_term_2way, + "mixed": ftl_circular_mixed, + "via_attr": ftl_circular_via_attribute, + "deep": ftl_circular_deep, + } + + # Choose circular reference type explicitly to emit event + circular_type = draw(st.sampled_from(list(generators.keys()))) + + # Emit event for HypoFuzz coverage guidance + event(f"strategy=circular_{circular_type}") + + return draw(generators[circular_type]()) + + +# ============================================================================= +# Semantically Broken Strategies (valid syntax, runtime errors) +# ============================================================================= + + +@composite +def ftl_undefined_reference(draw: st.DrawFn) -> str: + """Generate message referencing undefined message/term. + + Syntactically valid but will fail at resolution time. + """ + # D6 fix: Use st.lists(unique=True) instead of rejection loop + ids = draw(st.lists(ftl_identifiers(), min_size=2, max_size=2, unique=True)) + msg_id, undefined_id = ids + + ref_type = draw(st.sampled_from(["message", "term", "attribute"])) + + match ref_type: + case "message": + return f"{msg_id} = {{ {undefined_id} }}" + case "term": + return f"{msg_id} = {{ -{undefined_id} }}" + case _: # attribute + return f"{msg_id} = {{ {undefined_id}.nonexistent }}" + + +@composite +def ftl_undefined_variable(draw: st.DrawFn) -> str: + """Generate message using undefined variable. + + Variables are provided at format time, so this tests resolver + behavior when required variables are missing. + """ + msg_id = draw(ftl_identifiers()) + var_name = draw(ftl_identifiers()) + + return f"{msg_id} = Hello {{ ${var_name} }}!" 
+ + +@composite +def ftl_function_arity_mismatch(draw: st.DrawFn) -> str: + """Generate function call with wrong number of arguments. + + Tests function argument validation at resolution time. + """ + msg_id = draw(ftl_identifiers()) + func_name = draw(st.sampled_from(["NUMBER", "DATETIME", "CURRENCY"])) + + # NUMBER/DATETIME require at least one positional arg + arity = draw(st.sampled_from(["zero_args", "too_many_args"])) + + match arity: + case "zero_args": + return f"{msg_id} = {{ {func_name}() }}" + case _: # too_many_args + vars_list = ", ".join(f"${draw(ftl_identifiers())}" for _ in range(5)) + return f"{msg_id} = {{ {func_name}({vars_list}) }}" + + +@composite +def ftl_select_missing_variant(draw: st.DrawFn) -> str: + """Generate select expression where runtime selector matches no variant. + + Valid syntax but may produce fallback behavior at runtime. + """ + msg_id = draw(ftl_identifiers()) + var_name = draw(ftl_identifiers()) + + # Define variants that won't match most runtime values + return f"""{msg_id} = {{ ${var_name} -> + [impossiblevalue1] Value 1 + [impossiblevalue2] Value 2 + *[other] Default +}}""" + + +@composite +def ftl_semantically_broken(draw: st.DrawFn) -> str: + """Generate any semantically broken (but syntactically valid) FTL. + + Events emitted: + - strategy=semantic_{type}: Type of semantic error generated + + Combined strategy for resolver error handling testing. 
+ """ + # Choose semantic error type explicitly to emit event + semantic_type = draw( + st.sampled_from([ + "undefined_ref", + "undefined_var", + "arity_mismatch", + "missing_variant", + "circular", + ]) + ) + + # Emit event for HypoFuzz coverage guidance + event(f"strategy=semantic_{semantic_type}") + + match semantic_type: + case "undefined_ref": + return draw(ftl_undefined_reference()) + case "undefined_var": + return draw(ftl_undefined_variable()) + case "arity_mismatch": + return draw(ftl_function_arity_mismatch()) + case "missing_variant": + return draw(ftl_select_missing_variant()) + case _: # circular + return draw(ftl_circular_references()) + + +# ============================================================================= +# Invalid AST Construction Helpers (for validation testing) +# ============================================================================= + + +def build_invalid_select_no_defaults( + selector: VariableReference | None = None, +) -> SelectExpression: + """Build SelectExpression with NO default variants (invalid). + + Bypasses __post_init__ validation to test serializer validation layer. + This is defense-in-depth testing: programmatically constructed ASTs + might bypass parser validation. 
+ + Returns: + SelectExpression with all variants having default=False + """ + if selector is None: + selector = VariableReference(id=Identifier(name="count")) + + variants = ( + Variant( + key=Identifier(name="one"), + value=Pattern(elements=(TextElement(value="One"),)), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="Other"),)), + default=False, + ), + ) + + # Bypass __post_init__ validation using object.__setattr__ + # This creates an invalid AST for testing serializer validation + obj = object.__new__(SelectExpression) + object.__setattr__(obj, "selector", selector) + object.__setattr__(obj, "variants", variants) + object.__setattr__(obj, "span", None) + + return obj + + +def build_invalid_select_multiple_defaults( + selector: VariableReference | None = None, +) -> SelectExpression: + """Build SelectExpression with MULTIPLE default variants (invalid). + + Bypasses __post_init__ validation to test serializer validation layer. 
+ + Returns: + SelectExpression with all variants having default=True + """ + if selector is None: + selector = VariableReference(id=Identifier(name="count")) + + variants = ( + Variant( + key=Identifier(name="one"), + value=Pattern(elements=(TextElement(value="One"),)), + default=True, + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="Other"),)), + default=True, + ), + ) + + # Bypass __post_init__ validation using object.__setattr__ + obj = object.__new__(SelectExpression) + object.__setattr__(obj, "selector", selector) + object.__setattr__(obj, "variants", variants) + object.__setattr__(obj, "span", None) + + return obj diff --git a/tests/strategies/ftl_shared.py b/tests/strategies/ftl_shared.py new file mode 100644 index 00000000..95afe448 --- /dev/null +++ b/tests/strategies/ftl_shared.py @@ -0,0 +1,59 @@ +"""Shared imports and constants for split FTL strategies.""" + +from __future__ import annotations + +import string +from decimal import Decimal + +from hypothesis import event +from hypothesis import strategies as st +from hypothesis.strategies import composite + +from ftllexengine.enums import CommentType +from ftllexengine.runtime.function_bridge import FluentNumber +from ftllexengine.syntax.ast import ( + Attribute, + CallArguments, + Comment, + Expression, + FunctionReference, + Identifier, + InlineExpression, + Junk, + Message, + MessageReference, + NamedArgument, + NumberLiteral, + Pattern, + Placeable, + Resource, + SelectExpression, + StringLiteral, + Term, + TermReference, + TextElement, + VariableReference, + Variant, +) + +FTL_IDENTIFIER_FIRST_CHARS: str = string.ascii_letters +FTL_IDENTIFIER_REST_CHARS: str = string.ascii_letters + string.digits + "-_" +IDENTIFIER_PARTS = ("foo", "bar", "baz", "value", "count", "name", "id", "key") +FTL_SAFE_CHARS = string.ascii_letters + string.digits + " .,!?'-" +UNICODE_CHARS = ( + "\u4e16\u754c" + "\u0414\u043e\u0431\u0440\u043e" + "\u3053\u3093\u306b\u3061\u306f" + 
"\u00e9\u00e0\u00fc\u00f1" + "\u2019\u2018\u201c\u201d" +) + +__all__ = [ + "FTL_IDENTIFIER_FIRST_CHARS", "FTL_IDENTIFIER_REST_CHARS", "FTL_SAFE_CHARS", + "IDENTIFIER_PARTS", "UNICODE_CHARS", "Attribute", "CallArguments", "Comment", "CommentType", + "Decimal", "Expression", "FluentNumber", "FunctionReference", "Identifier", + "InlineExpression", "Junk", "Message", "MessageReference", "NamedArgument", "NumberLiteral", + "Pattern", "Placeable", "Resource", "SelectExpression", "StringLiteral", "Term", + "TermReference", "TextElement", "VariableReference", "Variant", "composite", "event", "st", + "string", +] diff --git a/tests/strategies/ftl_strings.py b/tests/strategies/ftl_strings.py new file mode 100644 index 00000000..c923cae4 --- /dev/null +++ b/tests/strategies/ftl_strings.py @@ -0,0 +1,582 @@ +from tests.strategies.ftl_shared import ( + FTL_IDENTIFIER_FIRST_CHARS, + FTL_IDENTIFIER_REST_CHARS, + FTL_SAFE_CHARS, + IDENTIFIER_PARTS, + UNICODE_CHARS, + Decimal, + FluentNumber, + composite, + event, + st, + string, +) + + +@composite +def ftl_identifiers(draw: st.DrawFn) -> str: + """Generate valid FTL identifiers. + + FTL spec: [a-zA-Z][a-zA-Z0-9_-]* + Uses both uppercase AND lowercase per specification. + """ + first = draw(st.sampled_from(FTL_IDENTIFIER_FIRST_CHARS)) + rest = draw( + st.text( + alphabet=FTL_IDENTIFIER_REST_CHARS, + max_size=20, + ) + ) + return first + rest + + +# Reserved keywords in FTL (for intensive fuzzing of keyword handling) +FTL_RESERVED_KEYWORDS = ( + "NUMBER", + "DATETIME", + "one", + "other", + "zero", + "two", + "few", + "many", +) + + +@composite +def ftl_identifiers_with_keywords(draw: st.DrawFn) -> str: + """Generate FTL identifiers, sometimes using reserved keywords. + + Used for intensive fuzzing to test keyword handling paths. + 50% chance of returning a reserved keyword, otherwise a random identifier. 
+ """ + if draw(st.booleans()): + return draw(st.sampled_from(FTL_RESERVED_KEYWORDS)) + + first = draw(st.sampled_from(FTL_IDENTIFIER_FIRST_CHARS)) + rest = draw( + st.text( + alphabet=FTL_IDENTIFIER_REST_CHARS, + max_size=64, + ) + ) + return first + rest + + +@composite +def ftl_identifier_boundary(draw: st.DrawFn) -> str: + """Generate boundary-case identifiers for edge testing. + + Tests single-char, long identifiers, and repeated separators. + """ + choice = draw(st.sampled_from(["single", "long", "hyphen", "underscore"])) + if choice == "single": + return draw(st.sampled_from("abcdefghijklmnopqrstuvwxyz")) + if choice == "long": + # Maximum practical length + return "a" + draw( + st.text( + alphabet="abcdefghijklmnopqrstuvwxyz0123456789", + min_size=200, + max_size=200, + ) + ) + if choice == "hyphen": + return "a" + "-" * draw(st.integers(1, 10)) + "b" + # underscore + return "a" + "_" * draw(st.integers(1, 10)) + "b" + + +@composite +def ftl_simple_text(draw: st.DrawFn) -> str: + """Generate simple text without special FTL characters. + + Ensures text is not whitespace-only (blank lines are message separators). + """ + text = draw(st.text(alphabet=FTL_SAFE_CHARS, min_size=1, max_size=50)) + # Ensure not whitespace-only + if text.strip() == "": + text = draw(st.sampled_from(string.ascii_letters)) + return text + + +@composite +def ftl_unicode_text(draw: st.DrawFn) -> str: + """Generate text with comprehensive Unicode coverage. + + Uses Hypothesis's full Unicode text strategy, filtering only: + - FTL structural characters: { } [ ] * $ - . # + - Control characters (Cc category) + - Newlines (message separators) + - Surrogates (Cs category) + + This provides much broader Unicode coverage than the limited UNICODE_CHARS + constant, including non-BMP characters, ZWJ sequences, RTL text, etc. 
+ (MAINT-FUZZ-UNICODE-UNDEREXPOSURE-001) + """ + # Full Unicode text with FTL structural chars filtered + text = draw( + st.text( + alphabet=st.characters( + blacklist_categories=("Cc", "Cs"), # No control chars or surrogates + blacklist_characters="{}[]*$-.#\n\r", # No FTL structural chars + ), + min_size=1, + max_size=30, + ) + ) + # Ensure non-whitespace content + if text.strip() == "": + text = draw(st.sampled_from(list(UNICODE_CHARS))) + return text + + +@composite +def ftl_unicode_stress_text(draw: st.DrawFn) -> str: + """Generate Unicode stress test cases. + + Events emitted: + - unicode={category}: Unicode stress category (emoji, rtl, combining, etc.) + + Specifically targets edge cases that may cause encoding or display issues: + - Non-BMP characters (emoji, math symbols) + - ZWJ sequences + - RTL markers and bidirectional text + - Combining characters + - Rare scripts + """ + # Stress cases with categories for event emission + stress_cases = [ + ("\U0001F600", "emoji"), # Emoji (non-BMP) + ("\U0001F469\u200D\U0001F4BB", "zwj"), # ZWJ sequence (woman technologist) + ("\u202Eevil\u202C", "rtl"), # RTL override + ("cafe\u0301", "combining"), # Combining accent (e as e + combining acute) + ("\u0627\u0644\u0639\u0631\u0628\u064A\u0629", "arabic"), # Arabic + ("\u4E2D\u6587", "cjk"), # Chinese + ("\u0928\u092E\u0938\u094D\u0924\u0947", "devanagari"), # Hindi (Devanagari) + ("\uFEFF", "bom"), # BOM + ("\u200B", "zero_width"), # Zero-width space + ("\u00A0", "nbsp"), # Non-breaking space + ("\U0001F1FA\U0001F1F8", "flag"), # Flag emoji (regional indicators) + ] + text, category = draw(st.sampled_from(stress_cases)) + + # Emit event for HypoFuzz coverage guidance + event(f"unicode={category}") + + return text + + +# ============================================================================= +# Chaos Mode Strategies (parser stress testing) +# ============================================================================= + + +@composite +def ftl_chaos_text(draw: 
st.DrawFn) -> str: + """Generate text WITH FTL structural characters for parser stress testing. + + Unlike ftl_unicode_text() which filters out {}[]*$-.#, this strategy + INCLUDES these characters to test parser error recovery, escape handling, + and edge cases where FTL syntax appears in unexpected places. + + WARNING: This generates potentially invalid FTL. Use for: + - Parser error recovery testing + - Junk node generation testing + - Fuzzing edge cases + + Do NOT use for roundtrip testing where valid FTL is required. + """ + # Include FTL structural characters + text = draw( + st.text( + alphabet=st.characters( + blacklist_categories=("Cc", "Cs"), # No control chars or surrogates + blacklist_characters="\n\r", # Only filter newlines (entry separators) + ), + min_size=1, + max_size=50, + ) + ) + # Ensure non-whitespace content + if text.strip() == "": + text = draw(st.sampled_from(["text", "value", "test"])) + return text + + +@composite +def ftl_chaos_source(draw: st.DrawFn) -> str: + """Generate raw FTL source with chaos text for intensive parser fuzzing. + + Creates FTL-like structures with potentially invalid content to stress + test parser error handling and recovery mechanisms. 
+ + Events emitted: + - strategy=chaos_{pattern}: Chaos injection pattern used (for HypoFuzz guidance) + + Generates variations like: + - msg = { unterminated + - msg = value { $var } more { unclosed + - msg = [ bracket ] confusion + """ + msg_id = draw(ftl_identifiers()) + chaos = draw(ftl_chaos_text()) + + # Choose chaos injection pattern + pattern = draw( + st.sampled_from([ + "plain", # msg = + "prefix_brace", # msg = { + "suffix_brace", # msg = } + "embedded_dollar", # msg = text $ more + "bracket_noise", # msg = [ ] + "mixed", # msg = { $x } { more + ]) + ) + + # Emit event for HypoFuzz coverage guidance + event(f"strategy=chaos_{pattern}") + + match pattern: + case "plain": + return f"{msg_id} = {chaos}" + case "prefix_brace": + return f"{msg_id} = {{ {chaos}" + case "suffix_brace": + return f"{msg_id} = {chaos} }}" + case "embedded_dollar": + prefix = draw(ftl_simple_text()) + return f"{msg_id} = {prefix} ${chaos}" + case "bracket_noise": + return f"{msg_id} = [ {chaos} ]" + case _: # mixed + var = draw(ftl_identifiers()) + return f"{msg_id} = {{ ${var} }} {chaos} {{ more" + + +@composite +def ftl_pathological_nesting(draw: st.DrawFn) -> str: + """Generate pathologically nested FTL for parser depth limit testing. 
+ + Creates deeply nested structures that approach or exceed MAX_DEPTH: + - Nested placeables: { { { { $x } } } } + - Nested selects: { $a -> [x] { $b -> [y] value } } + + Events emitted: + - boundary={under|at|over}_max_depth: Depth boundary condition (for HypoFuzz) + + Used for testing: + - Parser depth guards + - Stack overflow prevention + - Error recovery at depth limits + """ + from ftllexengine.constants import MAX_DEPTH # noqa: PLC0415 - import inside function + + msg_id = draw(ftl_identifiers()) + + # Choose between boundary, at-limit, and over-limit with labels + depth_choice = draw( + st.sampled_from([ + (MAX_DEPTH - 5, "under"), # Safely within limits + (MAX_DEPTH - 1, "under"), # Just under limit + (MAX_DEPTH, "at"), # At limit + (MAX_DEPTH + 1, "over"), # Just over limit + (MAX_DEPTH + 10, "over"), # Well over limit + ]) + ) + depth, boundary_label = depth_choice + + # Emit boundary event for HypoFuzz coverage guidance + event(f"boundary={boundary_label}_max_depth") + event(f"depth={depth}") + + # Generate nested braces + open_braces = "{ " * depth + close_braces = " }" * depth + inner_var = draw(ftl_identifiers()) + + return f"{msg_id} = {open_braces}${inner_var}{close_braces}" + + +@composite +def ftl_multiline_chaos_source(draw: st.DrawFn) -> str: + """Generate multi-entry chaos FTL with line breaks at invalid positions. + + Events emitted: + - strategy=multiline_chaos_{pattern}: Chaos injection pattern (for HypoFuzz) + + D7 fix: Tests parser error recovery for multiline malformed input. 
+ Real-world malformed FTL often involves: + - Continuation lines without proper indentation + - Entries split across unexpected boundaries + - CRLF mid-token + - Unclosed structures spanning multiple lines + """ + num_entries = draw(st.integers(min_value=2, max_value=4)) + entries: list[str] = [] + + pattern = draw( + st.sampled_from([ + "mid_identifier", # Line break inside identifier + "mid_placeable", # Line break inside placeable + "between_eq_value", # Line break between = and value + "unclosed_multiline", # Unclosed brace spanning lines + "bad_continuation", # Bad indentation on continuation + ]) + ) + event(f"strategy=multiline_chaos_{pattern}") + + for i in range(num_entries): + msg_id = f"msg{i}" + match pattern: + case "mid_identifier": + # Break identifier across lines (invalid) + entries.append(f"ms\ng{i} = value{i}") + case "mid_placeable": + # Break placeable across lines + entries.append(f"{msg_id} = text {{ $va\nr{i} }} more") + case "between_eq_value": + # Line break between = and value + entries.append(f"{msg_id} =\nvalue{i}") + case "unclosed_multiline": + # Unclosed brace spanning to next entry + if i < num_entries - 1: + entries.append(f"{msg_id} = {{ $var{i}") + else: + entries.append(f"{msg_id} = closed }}") + case _: # bad_continuation + # Tab indentation (invalid per FTL spec) + entries.append(f"{msg_id} = first line\n\tcontinuation") + + return "\n".join(entries) + + +@composite +def ftl_simple_messages(draw: st.DrawFn) -> str: + """Generate simple FTL messages (ID = value). + + Example: hello = Hello, world! + """ + msg_id = draw(ftl_identifiers()) + value = draw(ftl_simple_text()) + return f"{msg_id} = {value}" + + +@composite +def ftl_messages_with_placeables(draw: st.DrawFn) -> str: + """Generate FTL messages containing placeables. + + Example: greeting = Hello { $name }! 
+ """ + msg_id = draw(ftl_identifiers()) + var_name = draw(ftl_identifiers()) + prefix = draw(ftl_simple_text()) + suffix = draw(st.text(alphabet=FTL_SAFE_CHARS, max_size=20)) + + return f"{msg_id} = {prefix} {{ ${var_name} }}{suffix}" + + +@composite +def ftl_terms(draw: st.DrawFn) -> str: + """Generate FTL term definitions. + + Example: -brand = Firefox + """ + term_id = draw(ftl_identifiers()) + value = draw(ftl_simple_text()) + return f"-{term_id} = {value}" + + +@composite +def ftl_comments(draw: st.DrawFn) -> str: + """Generate FTL comments (all types). + + Returns one of: # comment, ## group comment, ### resource comment + """ + level = draw(st.sampled_from(["#", "##", "###"])) + content = draw(ftl_simple_text()) + return f"{level} {content}" + + +@composite +def ftl_numbers(draw: st.DrawFn) -> int | Decimal: + """Generate valid FTL numbers. + + FTL number literals support format: -?[0-9]+(.[0-9]+)? + No scientific notation. Subnormal values are excluded because + their string representation uses scientific notation (e.g., 1e-308). + """ + return draw( + st.one_of( + st.integers(min_value=-1000000, max_value=1000000), + st.decimals( + min_value=Decimal(-1000000), + max_value=Decimal(1000000), + allow_nan=False, + allow_infinity=False, + ), + ) + ) + + +@composite +def ftl_financial_numbers(draw: st.DrawFn) -> Decimal: + """Generate financial-scale numbers for financial application testing. + + Events emitted: + - strategy=financial_{magnitude}: Number magnitude category (for HypoFuzz) + - strategy=financial_decimals_{n}: Decimal places (for ISO 4217 coverage) + + Financial applications handle amounts in billions (GDP, fund values, + transaction volumes). This strategy generates numbers across the full range + needed for financial formatting tests. 
+ + Magnitude ranges: + - small: < 1,000 (retail transactions) + - medium: 1,000 - 1,000,000 (business transactions) + - large: 1M - 1B (enterprise, fund values) + - huge: > 1B (national accounts, GDP) + + Decimal places aligned with ISO 4217: + - 0 decimals: JPY, KRW, VND + - 2 decimals: USD, EUR, GBP (standard) + - 3 decimals: KWD, BHD, OMR + - 4 decimals: CLF, UYW (accounting units) + """ + magnitude = draw(st.sampled_from(["small", "medium", "large", "huge"])) + decimals = draw(st.sampled_from([0, 2, 3, 4])) + + match magnitude: + case "small": + base = draw(st.integers(min_value=-999, max_value=999)) + case "medium": + base = draw(st.integers(min_value=-999999, max_value=999999)) + case "large": + base = draw(st.integers(min_value=-999999999, max_value=999999999)) + case _: # huge + base = draw(st.integers(min_value=-999999999999, max_value=999999999999)) + + event(f"strategy=financial_{magnitude}") + event(f"strategy=financial_decimals_{decimals}") + + if decimals == 0: + return Decimal(base) + + # Add decimal component based on ISO 4217 decimal places using exact arithmetic. + divisor = 10 ** decimals + fraction = draw(st.integers(min_value=0, max_value=divisor - 1)) + return Decimal(base) + Decimal(fraction) / Decimal(divisor) + + +# ============================================================================= +# Identifier Case Strategies (for function bridge testing) +# ============================================================================= + + +@composite +def snake_case_identifiers(draw: st.DrawFn) -> str: + """Generate snake_case identifiers. + + Events emitted: + - bridge_id_parts={n}: Number of identifier parts + """ + parts = draw(st.lists(st.sampled_from(IDENTIFIER_PARTS), min_size=1, max_size=3)) + event(f"bridge_id_parts={len(parts)}") + return "_".join(parts) + + +@composite +def camel_case_identifiers(draw: st.DrawFn) -> str: + """Generate camelCase identifiers. 
+ + Events emitted: + - bridge_id_parts={n}: Number of identifier parts + """ + parts = draw(st.lists(st.sampled_from(IDENTIFIER_PARTS), min_size=1, max_size=3)) + event(f"bridge_id_parts={len(parts)}") + if not parts: + return "value" + return parts[0] + "".join(p.capitalize() for p in parts[1:]) + + +# ============================================================================= +# Function Bridge Strategies +# ============================================================================= + + +@composite +def ftl_function_names(draw: st.DrawFn) -> str: + """Generate valid FTL function names (UPPERCASE identifiers). + + Events emitted: + - bridge_fname_len={n}: Length category of generated name + """ + name = draw( + st.text( + alphabet=st.characters( + whitelist_categories=("Lu",), # type: ignore[arg-type] + min_codepoint=65, + max_codepoint=90, + ), + min_size=1, + max_size=20, + ).filter(lambda s: s.isidentifier()) + ) + length = "short" if len(name) <= 5 else "long" + event(f"bridge_fname_len={length}") + return name + + +@composite +def fluent_numbers(draw: st.DrawFn) -> FluentNumber: + """Generate FluentNumber instances with diverse value/format combos. + + FluentNumber.value is int | Decimal (never float — precision requirement). + + Events emitted: + - bridge_fnum_type={type}: Value type (int, decimal) + - bridge_fnum_precision={n}: Precision category (none, 0, low, high) + """ + value_type = draw(st.sampled_from(["int", "decimal"])) + event(f"bridge_fnum_type={value_type}") + + # Draw precision category first for bucket-first uniform distribution. + # "none" represents FluentNumber.precision=None (unspecified precision). 
+ prec_cat = draw(st.sampled_from(["none", "0", "low", "high"])) + + precision: int | None + places: int + if prec_cat == "none": + precision = None + places = 0 + elif prec_cat == "0": + precision = 0 + places = 0 + elif prec_cat == "low": + precision = draw(st.integers(min_value=1, max_value=2)) + places = precision + else: # high + precision = draw(st.integers(min_value=3, max_value=6)) + places = precision + + value: int | Decimal + int_part = draw(st.integers(min_value=-999999, max_value=999999)) + if value_type == "int": + value = int_part + formatted = str(value) + elif places > 0: + frac = draw( + st.integers(min_value=0, max_value=10**places - 1) + ) + frac_str = str(frac).zfill(places) + value = Decimal(f"{int_part}.{frac_str}") + formatted = str(value) + else: + value = Decimal(int_part) + formatted = str(value) + + event(f"bridge_fnum_precision={prec_cat}") + + return FluentNumber( + value=value, formatted=formatted, precision=precision + ) diff --git a/tests/strategies/ftl_structural.py b/tests/strategies/ftl_structural.py new file mode 100644 index 00000000..e4cca027 --- /dev/null +++ b/tests/strategies/ftl_structural.py @@ -0,0 +1,479 @@ +from tests.strategies.ftl_ast import ( + ftl_patterns, + ftl_placeables, + ftl_text_elements, + ftl_variable_references, + ftl_variants, +) +from tests.strategies.ftl_shared import ( + FTL_IDENTIFIER_FIRST_CHARS, + Attribute, + Decimal, + Identifier, + Message, + MessageReference, + Pattern, + Placeable, + Resource, + SelectExpression, + TextElement, + VariableReference, + Variant, + composite, + event, + st, +) +from tests.strategies.ftl_strings import ftl_identifiers, ftl_numbers, ftl_simple_text + + +@composite +def ftl_boundary_identifiers(draw: st.DrawFn) -> str: + """Generate boundary-case identifiers. + + Tests: single char, very long, edge characters. + Uses FTL_IDENTIFIER_FIRST_CHARS per spec (includes uppercase). 
+ """ + case = draw(st.sampled_from(["single", "long", "numeric", "hyphen", "underscore"])) + event(f"strategy=boundary_identifier_{case}") + match case: + case "single": + return draw(st.sampled_from(FTL_IDENTIFIER_FIRST_CHARS)) + case "long": + first = draw(st.sampled_from(FTL_IDENTIFIER_FIRST_CHARS)) + return first + "x" * draw(st.integers(50, 100)) + case "numeric": + first = draw(st.sampled_from(FTL_IDENTIFIER_FIRST_CHARS)) + return first + "123456789" + case "hyphen": + first = draw(st.sampled_from(FTL_IDENTIFIER_FIRST_CHARS)) + return first + "-" + draw(ftl_identifiers()) + case _: # underscore + first = draw(st.sampled_from(FTL_IDENTIFIER_FIRST_CHARS)) + return first + "_" + draw(ftl_identifiers()) + + +@composite +def ftl_empty_pattern_messages(draw: st.DrawFn) -> str: + """Generate messages with minimal/empty patterns. + + Edge case: message = (with trailing space only) + """ + msg_id = draw(ftl_identifiers()) + case = draw(st.sampled_from(["space", "single", "newline"])) + event(f"strategy=empty_pattern_{case}") + match case: + case "space": + return f"{msg_id} = " + case "single": + return f"{msg_id} = x" + case _: + return f"{msg_id} =\n" + + +@composite +def ftl_multiline_messages(draw: st.DrawFn) -> str: + """Generate multiline FTL messages. + + Tests continuation line handling with various indentation. 
+ """ + msg_id = draw(ftl_identifiers()) + line1 = draw(ftl_simple_text()) + indent = " " * draw(st.integers(1, 8)) + line2 = draw(ftl_simple_text()) + event(f"strategy=multiline_indent_{len(indent)}") + + return f"{msg_id} = {line1}\n{indent}{line2}" + + +# ============================================================================= +# Recursive Strategies (deep nesting tests) +# ============================================================================= + + +def _ensure_unique_variant_keys_with_default( + variants: list[Variant], +) -> tuple[Variant, ...]: + """Ensure variants have unique keys and at least one default.""" + seen_keys: set[str] = set() + unique_variants: list[Variant] = [] + + for v in variants: + key_name = v.key.name if hasattr(v.key, "name") else str(v.key.value) + if key_name not in seen_keys: + seen_keys.add(key_name) + unique_variants.append(v) + + # Ensure at least 2 variants + if len(unique_variants) < 2: + unique_variants.append( + Variant( + key=Identifier(name="fallback"), + value=Pattern(elements=(TextElement(value="other"),)), + default=False, + ) + ) + + # Ensure exactly one default variant (required by SelectExpression.__post_init__) + # First, strip all defaults + unique_variants = [ + Variant(key=v.key, value=v.value, default=False) for v in unique_variants + ] + # Then set exactly the last one as default + unique_variants[-1] = Variant( + key=unique_variants[-1].key, + value=unique_variants[-1].value, + default=True, + ) + + return tuple(unique_variants) + + +def ftl_deeply_nested_selects( + max_depth: int = 5, +) -> st.SearchStrategy[SelectExpression]: + """Generate deeply nested select expressions. + + Used for validator stress testing - creates selects with nested selects + as selectors, up to max_depth levels deep. 
+ + Args: + max_depth: Maximum nesting depth for select expressions + + Returns: + Strategy generating SelectExpression with possible nesting + """ + base_select = st.builds( + SelectExpression, + selector=ftl_variable_references(), + variants=st.lists(ftl_variants(), min_size=2, max_size=4).map( + _ensure_unique_variant_keys_with_default + ), + ) + + def extend( + children: st.SearchStrategy[SelectExpression], + ) -> st.SearchStrategy[SelectExpression]: + return st.builds( + SelectExpression, + selector=children, + variants=st.lists(ftl_variants(), min_size=2, max_size=4).map( + _ensure_unique_variant_keys_with_default + ), + ) + + return st.recursive(base_select, extend, max_leaves=max_depth) + + +# ============================================================================= +# AST Mutation Strategies +# ============================================================================= + + +@composite +def mutate_identifier(draw: st.DrawFn, identifier: Identifier) -> Identifier: + """Mutate an identifier by changing its name.""" + mutation_type = draw(st.sampled_from(["prefix", "suffix", "replace", "case"])) + event(f"strategy=mutate_identifier_{mutation_type}") + + match mutation_type: + case "prefix": + new_name = "mut_" + identifier.name + case "suffix": + new_name = identifier.name + "_mut" + case "replace": + new_name = draw(ftl_identifiers()) + case _: # case + new_name = identifier.name.swapcase() + + return Identifier(name=new_name) + + +@composite +def mutate_text_element(draw: st.DrawFn, element: TextElement) -> TextElement: + """Mutate a text element's value.""" + mutation_type = draw(st.sampled_from(["append", "prepend", "replace", "empty"])) + event(f"strategy=mutate_text_{mutation_type}") + + match mutation_type: + case "append": + new_value = element.value + draw(ftl_simple_text()) + case "prepend": + new_value = draw(ftl_simple_text()) + element.value + case "replace": + new_value = draw(ftl_simple_text()) + case _: # empty + new_value = " " + + 
return TextElement(value=new_value) + + +@composite +def mutate_pattern(draw: st.DrawFn, pattern: Pattern) -> Pattern: + """Mutate a pattern by modifying its elements.""" + if not pattern.elements: + # Empty pattern - add an element + event("strategy=mutate_pattern_seed") + new_elements = (draw(ftl_text_elements()),) + return Pattern(elements=new_elements) + + mutation_type = draw(st.sampled_from(["add", "remove", "modify"])) + event(f"strategy=mutate_pattern_{mutation_type}") + + elements = list(pattern.elements) + + match mutation_type: + case "add": + new_elem = draw(st.one_of(ftl_text_elements(), ftl_placeables())) + pos = draw(st.integers(0, len(elements))) + elements.insert(pos, new_elem) + case "remove": + if len(elements) > 1: + idx = draw(st.integers(0, len(elements) - 1)) + elements.pop(idx) + case _: # modify + if elements: + idx = draw(st.integers(0, len(elements) - 1)) + if isinstance(elements[idx], TextElement): + elem = elements[idx] + elements[idx] = draw(mutate_text_element(elem)) # type: ignore[arg-type] + + return Pattern(elements=tuple(elements)) + + +@composite +def mutate_message(draw: st.DrawFn, message: Message) -> Message: + """Mutate a message (id, value, or attributes).""" + mutation_type = draw(st.sampled_from(["id", "value", "add_attr", "remove_attr"])) + event(f"strategy=mutate_message_{mutation_type}") + + new_id = message.id + new_value = message.value + new_attrs = list(message.attributes) + + match mutation_type: + case "id": + new_id = draw(mutate_identifier(message.id)) + case "value": + if message.value: + new_value = draw(mutate_pattern(message.value)) + case "add_attr": + attr = Attribute( + id=Identifier(name=draw(ftl_identifiers())), + value=draw(ftl_patterns()), + ) + new_attrs.append(attr) + case _: # remove_attr + if new_attrs: + idx = draw(st.integers(0, len(new_attrs) - 1)) + new_attrs.pop(idx) + + return Message(id=new_id, value=new_value, attributes=tuple(new_attrs)) + + +@composite +def swap_variant_keys(draw: 
st.DrawFn, select: SelectExpression) -> SelectExpression: + """Swap variant keys in a select expression.""" + variants = list(select.variants) + + if len(variants) < 2: + event("strategy=swap_variant_keys_noop") + return select + + # Swap two random variants' keys + idx1, idx2 = draw(st.lists(st.integers(0, len(variants) - 1), min_size=2, max_size=2)) + event("strategy=swap_variant_keys_attempt") + if idx1 != idx2: + key1 = variants[idx1].key + key2 = variants[idx2].key + variants[idx1] = Variant( + key=key2, value=variants[idx1].value, default=variants[idx1].default + ) + variants[idx2] = Variant( + key=key1, value=variants[idx2].value, default=variants[idx2].default + ) + + return SelectExpression(selector=select.selector, variants=tuple(variants)) + + +# ============================================================================= +# Resolver Argument Strategies +# ============================================================================= + + +@composite +def resolver_string_args(draw: st.DrawFn) -> dict[str, str]: + """Generate string-only resolver arguments.""" + keys = draw(st.lists(ftl_identifiers(), min_size=0, max_size=5, unique=True)) + event(f"strategy=resolver_string_args_{len(keys)}") + return {k: draw(ftl_simple_text()) for k in keys} + + +@composite +def resolver_number_args(draw: st.DrawFn) -> dict[str, int | Decimal]: + """Generate number-only resolver arguments.""" + keys = draw(st.lists(ftl_identifiers(), min_size=0, max_size=5, unique=True)) + event(f"strategy=resolver_number_args_{len(keys)}") + return {k: draw(ftl_numbers()) for k in keys} + + +@composite +def resolver_mixed_args(draw: st.DrawFn) -> dict[str, str | int | Decimal]: + """Generate mixed-type resolver arguments.""" + keys = draw(st.lists(ftl_identifiers(), min_size=0, max_size=5, unique=True)) + event(f"strategy=resolver_mixed_args_{len(keys)}") + result: dict[str, str | int | Decimal] = {} + + for k in keys: + value: str | int | Decimal = draw( + st.one_of( + 
ftl_simple_text(), + st.integers(min_value=-1000000, max_value=1000000), + st.decimals( + min_value=Decimal(-1000000), + max_value=Decimal(1000000), + allow_nan=False, + allow_infinity=False, + ), + ) + ) + result[k] = value + + return result + + +@composite +def resolver_edge_case_args(draw: st.DrawFn) -> dict[str, str | int | Decimal]: + """Generate edge case resolver arguments.""" + edge_values: list[str | int | Decimal] = [ + "", # Empty string + " ", # Whitespace only + "0", # Zero as string + 0, # Zero + -1, # Negative + Decimal(0), # Decimal zero + Decimal("0.1"), # Small decimal + Decimal(10000000000), # Large number + Decimal(-10000000000), # Large negative + ] + + keys = draw(st.lists(ftl_identifiers(), min_size=1, max_size=3, unique=True)) + event(f"strategy=resolver_edge_args_{len(keys)}") + return {k: draw(st.sampled_from(edge_values)) for k in keys} + + +# ============================================================================= +# Deeply Nested AST Strategies +# ============================================================================= + + +@composite +def deeply_nested_placeables(draw: st.DrawFn, depth: int = 10) -> Placeable: + """Generate deeply nested placeables: { { { ... { $var } ... 
} } }.""" + event(f"strategy=deep_placeable_depth={depth}") + # Start with innermost expression + inner: VariableReference | Placeable = draw(ftl_variable_references()) + + # Wrap in placeables + for _ in range(depth): + inner = Placeable(expression=inner) + + return inner # type: ignore[return-value] + + +def deeply_nested_message_chain(depth: int = 10) -> st.SearchStrategy[Resource]: + """Generate a chain of messages referencing each other.""" + messages: list[Message] = [] + + for i in range(depth): + msg_id = Identifier(name=f"msg{i}") + + if i < depth - 1: + # Reference next message + ref = MessageReference(id=Identifier(name=f"msg{i + 1}"), attribute=None) + pattern = Pattern(elements=(Placeable(expression=ref),)) + else: + # Terminal message + pattern = Pattern(elements=(TextElement(value="End of chain"),)) + + messages.append(Message(id=msg_id, value=pattern, attributes=())) + + return st.just(Resource(entries=tuple(messages))) + + +@composite +def deeply_nested_select(draw: st.DrawFn, depth: int = 5) -> SelectExpression: + """Generate deeply nested select expressions.""" + event(f"strategy=deep_select_depth={depth}") + # Base case: simple select + base_selector = draw(ftl_variable_references()) + base_variants = ( + Variant( + key=Identifier(name="one"), + value=Pattern(elements=(TextElement(value="One"),)), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="Other"),)), + default=True, + ), + ) + + current = SelectExpression(selector=base_selector, variants=base_variants) + + # Wrap in additional selects + for i in range(depth - 1): + # Use current select as value in a variant + wrapper_variants = ( + Variant( + key=Identifier(name=f"nested{i}"), + value=Pattern(elements=(Placeable(expression=current),)), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value=f"Fallback {i}"),)), + default=True, + ), + ) + current = SelectExpression( + 
selector=draw(ftl_variable_references()), + variants=wrapper_variants, + ) + + return current + + +def wide_resource(width: int = 50) -> st.SearchStrategy[Resource]: + """Generate a resource with many messages (width test).""" + messages: list[Message] = [] + + for i in range(width): + msg = Message( + id=Identifier(name=f"msg{i}"), + value=Pattern(elements=(TextElement(value=f"Message {i}"),)), + attributes=(), + ) + messages.append(msg) + + return st.just(Resource(entries=tuple(messages))) + + +def message_with_many_attributes(attr_count: int = 20) -> st.SearchStrategy[Message]: + """Generate a message with many attributes.""" + attrs: list[Attribute] = [] + + for i in range(attr_count): + attr = Attribute( + id=Identifier(name=f"attr{i}"), + value=Pattern(elements=(TextElement(value=f"Attribute {i}"),)), + ) + attrs.append(attr) + + return st.just( + Message( + id=Identifier(name="many_attrs"), + value=Pattern(elements=(TextElement(value="Main value"),)), + attributes=tuple(attrs), + ) + ) diff --git a/tests/strategies/ftl_whitespace.py b/tests/strategies/ftl_whitespace.py new file mode 100644 index 00000000..7afc69e2 --- /dev/null +++ b/tests/strategies/ftl_whitespace.py @@ -0,0 +1,388 @@ +from tests.strategies.ftl_shared import FTL_SAFE_CHARS, composite, event, st, string +from tests.strategies.ftl_strings import ftl_identifiers, ftl_simple_text + +# Line ending variations for mixed line ending tests +_LINE_ENDINGS: tuple[str, ...] = ("\n", "\r\n", "\r") + + +@composite +def blank_line(draw: st.DrawFn) -> str: + """Generate a blank line containing only spaces. + + Tests blank line handling in patterns and between entries. + Per FTL spec, blank lines may contain spaces but no other content. + """ + space_count = draw(st.integers(min_value=0, max_value=8)) + return " " * space_count + + +@composite +def blank_lines_sequence(draw: st.DrawFn) -> str: + """Generate a sequence of blank lines with varying whitespace. 
+ + Tests handling of multiple consecutive blank lines, which affects: + - Comment separation logic + - Pattern indentation calculation + - Entry boundary detection + """ + line_count = draw(st.integers(min_value=1, max_value=5)) + lines: list[str] = [] + for _ in range(line_count): + spaces = draw(st.integers(min_value=0, max_value=4)) + lines.append(" " * spaces) + return "\n".join(lines) + + +@composite +def text_with_trailing_whitespace(draw: st.DrawFn) -> str: + """Generate text with trailing whitespace (spaces or tabs). + + Tests trailing whitespace handling which can affect: + - Pattern value boundaries + - Serializer output normalization + - Roundtrip consistency + """ + base_text = draw(st.text(alphabet=FTL_SAFE_CHARS, min_size=1, max_size=30)) + # Ensure base has content + if base_text.strip() == "": + base_text = draw(st.sampled_from(string.ascii_letters)) + + trailing_type = draw(st.sampled_from(["spaces", "tabs", "mixed"])) + count = draw(st.integers(min_value=1, max_value=4)) + + match trailing_type: + case "spaces": + trailing = " " * count + case "tabs": + trailing = "\t" * count + case _: # mixed + trailing = " \t" * count + + return base_text + trailing + + +@composite +def text_with_tabs(draw: st.DrawFn) -> str: + """Generate text containing tab characters. + + Per FTL spec, tabs are NOT valid whitespace and should create Junk + when appearing in syntactic positions (e.g., indentation, between + identifier and equals sign). This strategy generates text with + embedded tabs for rejection testing. 
+ """ + prefix = draw(st.text(alphabet=FTL_SAFE_CHARS, min_size=1, max_size=15)) + if prefix.strip() == "": + prefix = draw(st.sampled_from(string.ascii_letters)) + + tab_position = draw(st.sampled_from(["middle", "start", "end"])) + + match tab_position: + case "middle": + suffix = draw(st.text(alphabet=FTL_SAFE_CHARS, min_size=1, max_size=15)) + if suffix.strip() == "": + suffix = draw(st.sampled_from(string.ascii_letters)) + return prefix + "\t" + suffix + case "start": + return "\t" + prefix + case _: # end + return prefix + "\t" + + +@composite +def mixed_line_endings_text(draw: st.DrawFn) -> str: + """Generate multi-line text with mixed line endings. + + Tests CRLF normalization handling: + - Unix (LF): \\n + - Windows (CRLF): \\r\\n + - Legacy Mac (CR): \\r + + Mixed line endings in the same file is a real-world scenario + that can occur from cross-platform editing. + """ + line_count = draw(st.integers(min_value=2, max_value=5)) + lines: list[str] = [] + + for _ in range(line_count): + # Generate line content + line = draw(st.text(alphabet=FTL_SAFE_CHARS, min_size=1, max_size=20)) + if line.strip() == "": + line = draw(st.sampled_from(string.ascii_letters)) + lines.append(line) + + # Join with random line endings + result_parts: list[str] = [] + for i, line in enumerate(lines): + result_parts.append(line) + if i < len(lines) - 1: + ending = draw(st.sampled_from(_LINE_ENDINGS)) + result_parts.append(ending) + + return "".join(result_parts) + + +@composite +def variant_key_with_whitespace(draw: st.DrawFn) -> str: + """Generate variant key with whitespace inside brackets. 
+ + Tests FTL-GRAMMAR-003 and SPEC-VARIANT-WHITESPACE-001: + - Spaces after opening bracket: [ one] + - Spaces before closing bracket: [one ] + - Newlines inside variant key: [ \\n one \\n ] + """ + key = draw(ftl_identifiers()) + + whitespace_type = draw( + st.sampled_from(["leading", "trailing", "both", "newlines", "mixed"]) + ) + + match whitespace_type: + case "leading": + spaces = " " * draw(st.integers(min_value=1, max_value=3)) + return f"[{spaces}{key}]" + case "trailing": + spaces = " " * draw(st.integers(min_value=1, max_value=3)) + return f"[{key}{spaces}]" + case "both": + leading = " " * draw(st.integers(min_value=1, max_value=2)) + trailing = " " * draw(st.integers(min_value=1, max_value=2)) + return f"[{leading}{key}{trailing}]" + case "newlines": + return f"[ \n {key} \n ]" + case _: # mixed + return f"[ \n{key} ]" + + +@composite +def placeable_with_whitespace(draw: st.DrawFn) -> str: + """Generate placeable expression with whitespace around braces. + + Tests FTL-STRICT-WHITESPACE-001: + - Newlines after opening brace: { \\n $var } + - Newlines before closing brace: { $var \\n } + - Mixed whitespace around placeables + """ + var_name = draw(ftl_identifiers()) + + whitespace_type = draw(st.sampled_from(["after_open", "before_close", "both"])) + + match whitespace_type: + case "after_open": + return f"{{ \n ${var_name} }}" + case "before_close": + return f"{{ ${var_name} \n }}" + case _: # both + return f"{{ \n ${var_name} \n }}" + + +@composite +def variable_indent_multiline_pattern(draw: st.DrawFn) -> str: + """Generate multiline pattern with DIFFERENT indentation per line. + + Tests common_indent calculation in parse_pattern(): + - Each continuation line has independent indentation + - Common indent should be minimum of all non-blank lines + - Blank lines (spaces only) should be skipped in indent calculation + + Addresses FTL-GRAMMAR-001: Blank lines before first content. 
+ """ + line_count = draw(st.integers(min_value=2, max_value=5)) + lines: list[str] = [] + + for _ in range(line_count): + indent = " " * draw(st.integers(min_value=1, max_value=8)) + content = draw(st.text(alphabet=FTL_SAFE_CHARS, min_size=1, max_size=20)) + if content.strip() == "": + content = draw(st.sampled_from(string.ascii_letters)) + lines.append(indent + content) + + return "\n".join(lines) + + +@composite +def pattern_with_leading_blank_lines(draw: st.DrawFn) -> str: + """Generate pattern with blank lines before first content line. + + Tests FTL-GRAMMAR-001: Parser must skip blank lines before + measuring common_indent in multiline patterns. + + Example: msg =\\n\\n value + Should produce "value", not " value". + """ + blank_count = draw(st.integers(min_value=1, max_value=3)) + indent = " " * draw(st.integers(min_value=1, max_value=8)) + content = draw(st.text(alphabet=FTL_SAFE_CHARS, min_size=1, max_size=20)) + if content.strip() == "": + content = draw(st.sampled_from(string.ascii_letters)) + + blank_lines = "\n" * blank_count + return f"{blank_lines}{indent}{content}" + + +@composite +def ftl_message_with_whitespace_edge_cases(draw: st.DrawFn) -> str: + """Generate FTL message exercising whitespace edge cases. + + Combines multiple whitespace edge cases into complete messages + for comprehensive fuzzing of whitespace handling. 
+ """ + msg_id = draw(ftl_identifiers()) + + case_type = draw( + st.sampled_from([ + "trailing_ws", + "multiline_varied_indent", + "leading_blanks", + "placeable_ws", + ]) + ) + + match case_type: + case "trailing_ws": + value = draw(text_with_trailing_whitespace()) + return f"{msg_id} = {value}" + case "multiline_varied_indent": + pattern = draw(variable_indent_multiline_pattern()) + return f"{msg_id} =\n{pattern}" + case "leading_blanks": + pattern = draw(pattern_with_leading_blank_lines()) + return f"{msg_id} ={pattern}" + case _: # placeable_ws + placeable = draw(placeable_with_whitespace()) + return f"{msg_id} = Hello {placeable} World" + + +@composite +def ftl_select_with_whitespace_variants(draw: st.DrawFn) -> str: + """Generate select expression with whitespace edge cases in variants. + + Tests variant key whitespace handling and variant value patterns + with whitespace edge cases. + """ + msg_id = draw(ftl_identifiers()) + selector_var = draw(ftl_identifiers()) + + num_variants = draw(st.integers(min_value=2, max_value=4)) + default_idx = draw(st.integers(min_value=0, max_value=num_variants - 1)) + + variant_keys = ["one", "two", "few", "many", "other", "zero"] + used_keys = draw( + st.lists( + st.sampled_from(variant_keys), + min_size=num_variants, + max_size=num_variants, + unique=True, + ) + ) + + variants: list[str] = [] + for i, key in enumerate(used_keys): + prefix = "*" if i == default_idx else " " + + # Randomly add whitespace to variant key + if draw(st.booleans()): + key_str = draw(variant_key_with_whitespace()) + # Replace the generated key with our unique key + key_str = key_str.replace(key_str[1:-1].strip(), key) + else: + key_str = f"[{key}]" + + value = draw(st.text(alphabet=FTL_SAFE_CHARS, min_size=1, max_size=15)) + if value.strip() == "": + value = "value" + variants.append(f"{prefix}{key_str} {value}") + + variants_str = "\n ".join(variants) + return f"{msg_id} = {{ ${selector_var} ->\n {variants_str}\n}}" + + +def 
_generate_unique_id(draw: st.DrawFn, seen_ids: set[str]) -> str: + """Generate a unique FTL identifier not already in seen_ids.""" + msg_id = draw(ftl_identifiers()) + while msg_id in seen_ids: + msg_id = draw(ftl_identifiers()) + seen_ids.add(msg_id) + return msg_id + + +def _generate_whitespace_message_entry(draw: st.DrawFn, msg_id: str) -> str: + """Generate a message entry with whitespace edge cases.""" + ws_case = draw(st.sampled_from(["trailing", "multiline", "leading_blank"])) + match ws_case: + case "trailing": + value = draw(text_with_trailing_whitespace()) + return f"{msg_id} = {value}" + case "multiline": + pattern = draw(variable_indent_multiline_pattern()) + return f"{msg_id} =\n{pattern}" + case _: + pattern = draw(pattern_with_leading_blank_lines()) + return f"{msg_id} ={pattern}" + + +@composite +def ftl_resource_with_whitespace_chaos(draw: st.DrawFn) -> str: + """Generate FTL resource with mixed whitespace edge cases. + + Events emitted: + - strategy=ws_chaos_entry_{type}: Entry type in whitespace chaos resource + + Combines multiple entry types with various whitespace edge cases + for comprehensive cross-contamination testing. 
+ """ + num_entries = draw(st.integers(min_value=2, max_value=8)) + entries: list[str] = [] + seen_ids: set[str] = set() + + # Track entry types for event emission + entry_types_used: list[str] = [] + + for _ in range(num_entries): + entry_type = draw( + st.sampled_from([ + "simple", + "whitespace_message", + "select_whitespace", + "term", + "comment", + "blank_lines", + ]) + ) + entry_types_used.append(entry_type) + + match entry_type: + case "simple": + msg_id = _generate_unique_id(draw, seen_ids) + value = draw(ftl_simple_text()) + entries.append(f"{msg_id} = {value}") + + case "whitespace_message": + msg_id = _generate_unique_id(draw, seen_ids) + entries.append(_generate_whitespace_message_entry(draw, msg_id)) + + case "select_whitespace": + entry = draw(ftl_select_with_whitespace_variants()) + entry_id = entry.split(" = ")[0] + if entry_id not in seen_ids: + seen_ids.add(entry_id) + entries.append(entry) + + case "term": + term_id = draw(ftl_identifiers()) + value = draw(ftl_simple_text()) + entries.append(f"-{term_id} = {value}") + + case "comment": + level = draw(st.sampled_from(["#", "##", "###"])) + content = draw(ftl_simple_text()) + entries.append(f"{level} {content}") + + case _: # blank_lines + blanks = draw(blank_lines_sequence()) + entries.append(blanks) + + # Emit events for entry type diversity + for et in set(entry_types_used): + event(f"strategy=ws_chaos_entry_{et}") + + return "\n\n".join(entries) diff --git a/tests/syntax_cursor_cases/__init__.py b/tests/syntax_cursor_cases/__init__.py new file mode 100644 index 00000000..059390d1 --- /dev/null +++ b/tests/syntax_cursor_cases/__init__.py @@ -0,0 +1,19 @@ +"""Tests for syntax.cursor: Cursor, ParseError, ParseResult, LineOffsetCache. + +Validates the immutable cursor pattern for type-safe parsing, line/column +computation, and the LineOffsetCache binary-search infrastructure. 
+""" + +from __future__ import annotations + +import pytest + +from ftllexengine.syntax.cursor import Cursor, LineOffsetCache, ParseError, ParseResult + +__all__ = [ + "Cursor", + "LineOffsetCache", + "ParseError", + "ParseResult", + "pytest", +] diff --git a/tests/syntax_cursor_cases/advance_operations.py b/tests/syntax_cursor_cases/advance_operations.py new file mode 100644 index 00000000..512cd30a --- /dev/null +++ b/tests/syntax_cursor_cases/advance_operations.py @@ -0,0 +1,86 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor.py.""" + +from tests.syntax_cursor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# ADVANCE OPERATIONS +# ============================================================================ + + +class TestCursorAdvance: + """Test cursor advancement.""" + + def test_advance_single_position(self) -> None: + """Advance cursor by 1 position.""" + cursor = Cursor("hello", 0) + + new_cursor = cursor.advance() + + assert new_cursor.pos == 1 + assert new_cursor.current == "e" + # Original unchanged + assert cursor.pos == 0 + + def test_advance_multiple_positions(self) -> None: + """Advance cursor by multiple positions.""" + cursor = Cursor("hello", 0) + + new_cursor = cursor.advance(3) + + assert new_cursor.pos == 3 + assert new_cursor.current == "l" + + def test_advance_to_eof(self) -> None: + """Advance cursor to EOF.""" + cursor = Cursor("hello", 0) + + new_cursor = cursor.advance(5) + + assert new_cursor.pos == 5 + assert new_cursor.is_eof + + def test_advance_beyond_eof_clamps_to_length(self) -> None: + """Advance beyond EOF clamps to source length.""" + cursor = Cursor("hello", 0) + + new_cursor = cursor.advance(100) + + assert new_cursor.pos == 5 + assert new_cursor.is_eof + + def test_advance_preserves_immutability(self) -> None: + """Advance creates new cursor, original unchanged.""" + cursor = Cursor("hello", 2) + + new_cursor 
= cursor.advance() + + assert cursor.pos == 2 + assert new_cursor.pos == 3 + assert cursor is not new_cursor + + def test_advance_zero_positions_raises(self) -> None: + """Advance by 0 raises ValueError — zero advance is a no-op and always a bug.""" + cursor = Cursor("hello", 2) + + with pytest.raises(ValueError, match="advance\\(\\) count must be >= 1, got 0"): + cursor.advance(0) + + def test_advance_negative_positions_raises(self) -> None: + """Advance by negative count raises ValueError. + + Negative advance is always a programming error: cursor.advance(-1) at + pos=0 would create Cursor(source, -1) which makes .current return + source[-1] (the last character), silently corrupting parser state. + """ + cursor = Cursor("hello", 2) + + with pytest.raises(ValueError, match="advance\\(\\) count must be >= 1, got -1"): + cursor.advance(-1) + + def test_advance_large_negative_positions_raises(self) -> None: + """Advance by large negative count raises ValueError.""" + cursor = Cursor("hello", 4) + + with pytest.raises(ValueError, match="advance\\(\\) count must be >= 1, got -100"): + cursor.advance(-100) diff --git a/tests/syntax_cursor_cases/current_character_access.py b/tests/syntax_cursor_cases/current_character_access.py new file mode 100644 index 00000000..1c11abc9 --- /dev/null +++ b/tests/syntax_cursor_cases/current_character_access.py @@ -0,0 +1,58 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor.py.""" + +from tests.syntax_cursor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# CURRENT CHARACTER ACCESS +# ============================================================================ + + +class TestCursorCurrent: + """Test current character access.""" + + def test_current_at_start(self) -> None: + """Get current character at start.""" + cursor = Cursor("hello", 0) + + assert cursor.current == "h" + + def test_current_in_middle(self) -> None: + 
"""Get current character in middle.""" + cursor = Cursor("hello", 2) + + assert cursor.current == "l" + + def test_current_at_last_char(self) -> None: + """Get current character at last position.""" + cursor = Cursor("hello", 4) + + assert cursor.current == "o" + + def test_current_raises_eof_error_at_end(self) -> None: + """Accessing current at EOF raises EOFError.""" + cursor = Cursor("hello", 5) + + with pytest.raises(EOFError, match="Unexpected EOF"): + _ = cursor.current + + def test_current_raises_value_error_beyond_end(self) -> None: + """Constructing cursor beyond end raises ValueError, not EOFError. + + The valid way to reach EOF is pos == len(source); positions strictly + greater are rejected at construction time so .current is never reached. + """ + with pytest.raises(ValueError, match="exceeds source length"): + Cursor("hello", 10) + + def test_current_with_unicode(self) -> None: + """Get current character with Unicode.""" + cursor = Cursor("привет", 0) + + assert cursor.current == "п" + + def test_current_with_emoji(self) -> None: + """Get current character with emoji.""" + cursor = Cursor("hello 👋 world", 6) + + assert cursor.current == "👋" diff --git a/tests/syntax_cursor_cases/cursor_basic_tests.py b/tests/syntax_cursor_cases/cursor_basic_tests.py new file mode 100644 index 00000000..4f33241d --- /dev/null +++ b/tests/syntax_cursor_cases/cursor_basic_tests.py @@ -0,0 +1,53 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor.py.""" + +from tests.syntax_cursor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# CURSOR BASIC TESTS +# ============================================================================ + + +class TestCursorBasic: + """Test basic cursor functionality.""" + + def test_create_cursor(self) -> None: + """Create cursor at position 0.""" + cursor = Cursor("hello", 0) + + assert cursor.source == "hello" + assert cursor.pos 
== 0 + assert not cursor.is_eof + + def test_create_cursor_at_middle(self) -> None: + """Create cursor at middle of source.""" + cursor = Cursor("hello", 2) + + assert cursor.pos == 2 + assert cursor.current == "l" + + def test_cursor_immutability(self) -> None: + """Cursor is immutable (frozen dataclass).""" + cursor = Cursor("hello", 0) + + with pytest.raises(AttributeError): + cursor.pos = 5 # type: ignore[misc] + + def test_cursor_negative_pos_raises_value_error(self) -> None: + """Cursor with negative pos raises ValueError (lines 95-96). + + Negative positions silently return characters from the end of the + source via Python indexing. The guard makes this construction error + explicit rather than allowing silent wrong-character access. + """ + with pytest.raises(ValueError, match="must be >= 0"): + Cursor("hello", -1) + + def test_cursor_pos_beyond_source_raises_value_error(self) -> None: + """Cursor with pos > len(source) raises ValueError (lines 98-102). + + advance() always clamps to len(source); constructing with a larger + value indicates a programming error, not a valid EOF position. 
+ """ + with pytest.raises(ValueError, match="exceeds source length"): + Cursor("hello", 6) diff --git a/tests/syntax_cursor_cases/edge_cases.py b/tests/syntax_cursor_cases/edge_cases.py new file mode 100644 index 00000000..4867402b --- /dev/null +++ b/tests/syntax_cursor_cases/edge_cases.py @@ -0,0 +1,49 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor.py.""" + +from tests.syntax_cursor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# EDGE CASES +# ============================================================================ + + +class TestCursorEdgeCases: + """Test cursor edge cases.""" + + def test_empty_source(self) -> None: + """Handle empty source string.""" + cursor = Cursor("", 0) + + assert cursor.is_eof + assert cursor.source == "" + + def test_single_character_source(self) -> None: + """Handle single character source.""" + cursor = Cursor("x", 0) + + assert cursor.current == "x" + assert not cursor.is_eof + + def test_cursor_with_only_newlines(self) -> None: + """Handle source with only newlines.""" + cursor = Cursor("\n\n\n", 0) + + assert cursor.current == "\n" + line, _ = cursor.compute_line_col() + assert line == 1 + + def test_cursor_with_tabs(self) -> None: + """Handle source with tabs.""" + cursor = Cursor("hello\tworld", 5) + + assert cursor.current == "\t" + + def test_cursor_with_mixed_whitespace(self) -> None: + """Handle source with mixed whitespace.""" + source = "  \t\n \t\n" + cursor = Cursor(source, 4) + + line, col = cursor.compute_line_col() + assert line == 2 + assert col == 1 diff --git a/tests/syntax_cursor_cases/eof_detection.py b/tests/syntax_cursor_cases/eof_detection.py new file mode 100644 index 00000000..073604c8 --- /dev/null +++ b/tests/syntax_cursor_cases/eof_detection.py @@ -0,0 +1,46 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor.py.""" + +from tests.syntax_cursor_cases import
* # noqa: F403 - shared split test support + +# ============================================================================ +# EOF DETECTION +# ============================================================================ + + +class TestCursorEOF: + """Test EOF detection.""" + + def test_is_eof_false_at_start(self) -> None: + """is_eof is False at start of source.""" + cursor = Cursor("hello", 0) + + assert not cursor.is_eof + + def test_is_eof_false_in_middle(self) -> None: + """is_eof is False in middle of source.""" + cursor = Cursor("hello", 2) + + assert not cursor.is_eof + + def test_is_eof_true_at_end(self) -> None: + """is_eof is True at end of source.""" + cursor = Cursor("hello", 5) + + assert cursor.is_eof + + def test_construction_beyond_end_raises(self) -> None: + """Constructing a cursor with pos > len(source) raises ValueError. + + EOF is represented exclusively by pos == len(source). Positions beyond + the source length are construction errors: advance() always clamps to + len(source), so they cannot arise through normal cursor navigation. 
+ """ + with pytest.raises(ValueError, match="exceeds source length"): + Cursor("hello", 10) + + def test_is_eof_true_for_empty_source(self) -> None: + """is_eof is True for empty source at position 0.""" + cursor = Cursor("", 0) + + assert cursor.is_eof diff --git a/tests/syntax_cursor_cases/integration_tests.py b/tests/syntax_cursor_cases/integration_tests.py new file mode 100644 index 00000000..2e4b64ae --- /dev/null +++ b/tests/syntax_cursor_cases/integration_tests.py @@ -0,0 +1,76 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor.py.""" + +from tests.syntax_cursor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# INTEGRATION TESTS +# ============================================================================ + + +class TestCursorIntegration: + """Test cursor in realistic parsing scenarios.""" + + def test_parse_identifier_pattern(self) -> None: + """Simulate parsing an identifier.""" + cursor = Cursor("hello_world = value", 0) + start_pos = cursor.pos + + # Advance while identifier characters + while not cursor.is_eof and (cursor.current.isalnum() or cursor.current == "_"): + cursor = cursor.advance() + + identifier = Cursor("hello_world = value", start_pos).slice_to(cursor.pos) + + assert identifier == "hello_world" + assert cursor.current == " " + + def test_parse_quoted_string_pattern(self) -> None: + """Simulate parsing a quoted string.""" + cursor = Cursor('"hello world"', 0) + + # Skip opening quote + cursor = cursor.advance() + start_pos = cursor.pos + + # Advance until closing quote + while not cursor.is_eof and cursor.current != '"': + cursor = cursor.advance() + + content = Cursor('"hello world"', start_pos).slice_to(cursor.pos) + + assert content == "hello world" + + def test_skip_whitespace_pattern(self) -> None: + """Simulate skipping whitespace.""" + cursor = Cursor("   hello", 0) + + # Skip whitespace + while not cursor.is_eof and
cursor.current in " \t\n": + cursor = cursor.advance() + + assert cursor.current == "h" + assert cursor.pos == 3 + + def test_lookahead_pattern(self) -> None: + """Simulate lookahead for parser decision.""" + cursor = Cursor("hello = value", 5) + + # Check if next char is '=' + if cursor.peek(1) == "=": + cursor = cursor.advance(2) # Skip ' =' + + assert cursor.current == " " + assert cursor.pos == 7 + + def test_error_reporting_pattern(self) -> None: + """Simulate error reporting with line:col.""" + source = "line1\nline2 { $var\nline3" + cursor = Cursor(source, 18) # After $var + + error = ParseError("Expected '}'", cursor, expected=("}", )) + formatted = error.format_with_context() + + assert "2:13:" in formatted + assert "line2 { $var" in formatted + assert "^" in formatted diff --git a/tests/syntax_cursor_cases/line_and_column_computation.py b/tests/syntax_cursor_cases/line_and_column_computation.py new file mode 100644 index 00000000..e4f67407 --- /dev/null +++ b/tests/syntax_cursor_cases/line_and_column_computation.py @@ -0,0 +1,85 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor.py.""" + +from tests.syntax_cursor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# LINE AND COLUMN COMPUTATION +# ============================================================================ + + +class TestCursorLineCol: + """Test line and column computation.""" + + def test_compute_line_col_at_start(self) -> None: + """Compute line:col at start of source.""" + cursor = Cursor("hello", 0) + + line, col = cursor.compute_line_col() + + assert line == 1 + assert col == 1 + + def test_compute_line_col_in_first_line(self) -> None: + """Compute line:col in middle of first line.""" + cursor = Cursor("hello world", 6) + + line, col = cursor.compute_line_col() + + assert line == 1 + assert col == 7 + + def test_compute_line_col_at_newline(self) -> None: + """Compute 
line:col at newline character.""" + cursor = Cursor("hello\nworld", 5) + + line, col = cursor.compute_line_col() + + assert line == 1 + assert col == 6 + + def test_compute_line_col_after_newline(self) -> None: + """Compute line:col after newline (start of line 2).""" + cursor = Cursor("hello\nworld", 6) + + line, col = cursor.compute_line_col() + + assert line == 2 + assert col == 1 + + def test_compute_line_col_in_second_line(self) -> None: + """Compute line:col in middle of second line.""" + cursor = Cursor("hello\nworld", 9) + + line, col = cursor.compute_line_col() + + assert line == 2 + assert col == 4 + + def test_compute_line_col_multiple_lines(self) -> None: + """Compute line:col across multiple lines.""" + source = "line1\nline2\nline3\nline4" + cursor = Cursor(source, 12) # Start of line3 + + line, col = cursor.compute_line_col() + + assert line == 3 + assert col == 1 + + def test_compute_line_col_at_eof(self) -> None: + """Compute line:col at EOF.""" + cursor = Cursor("hello\nworld", 11) + + line, col = cursor.compute_line_col() + + assert line == 2 + assert col == 6 + + def test_line_col_property(self) -> None: + """Test line_col property convenience wrapper.""" + cursor = Cursor("hello\nworld", 9) + + line, col = cursor.compute_line_col() + + assert line == 2 + assert col == 4 diff --git a/tests/syntax_cursor_cases/line_offset_cache_tests_from_test_cursor_infrastructure_py.py b/tests/syntax_cursor_cases/line_offset_cache_tests_from_test_cursor_infrastructure_py.py new file mode 100644 index 00000000..37cda58d --- /dev/null +++ b/tests/syntax_cursor_cases/line_offset_cache_tests_from_test_cursor_infrastructure_py.py @@ -0,0 +1,259 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor.py.""" + +from tests.syntax_cursor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# LINE OFFSET CACHE TESTS (from test_cursor_infrastructure.py) +# 
============================================================================ + + +class TestLineOffsetCacheInit: + """LineOffsetCache builds the offset table correctly during initialization.""" + + def test_init_empty_source(self) -> None: + """Empty source produces a single-element offset table [(0,)].""" + cache = LineOffsetCache("") + + assert cache._source_len == 0 # pylint: disable=protected-access + assert cache._offsets == (0,) # pylint: disable=protected-access + + def test_init_single_line(self) -> None: + """Single-line source with no newlines produces offset table [(0,)].""" + cache = LineOffsetCache("hello world") + + assert cache._source_len == 11 # pylint: disable=protected-access + assert cache._offsets == (0,) # pylint: disable=protected-access + + def test_init_multiple_lines(self) -> None: + """Three-line source produces offsets at the start of each line.""" + cache = LineOffsetCache("line1\nline2\nline3") + + assert cache._source_len == 17 # pylint: disable=protected-access + assert cache._offsets == (0, 6, 12) # pylint: disable=protected-access + + def test_init_trailing_newline(self) -> None: + """Trailing newline produces a third entry for the (empty) final line.""" + cache = LineOffsetCache("line1\nline2\n") + + assert cache._source_len == 12 # pylint: disable=protected-access + assert cache._offsets == (0, 6, 12) # pylint: disable=protected-access + + def test_init_consecutive_newlines(self) -> None: + """Consecutive newlines create entries for each empty line.""" + cache = LineOffsetCache("a\n\nb") + + assert cache._source_len == 4 # pylint: disable=protected-access + assert cache._offsets == (0, 2, 3) # pylint: disable=protected-access + + def test_init_only_newlines(self) -> None: + """Source with only newlines creates an entry after each one.""" + cache = LineOffsetCache("\n\n\n") + + assert cache._source_len == 3 # pylint: disable=protected-access + assert cache._offsets == (0, 1, 2, 3) # pylint: disable=protected-access + + +class 
TestLineOffsetCacheGetLineCol: + """LineOffsetCache.get_line_col maps byte offsets to (line, column) pairs.""" + + def test_get_line_col_first_position(self) -> None: + """Position 0 maps to line 1, column 1.""" + cache = LineOffsetCache("hello\nworld") + + line, col = cache.get_line_col(0) + + assert line == 1 + assert col == 1 + + def test_get_line_col_middle_of_first_line(self) -> None: + """Position 2 on first line maps to line 1, column 3.""" + cache = LineOffsetCache("hello\nworld") + + line, col = cache.get_line_col(2) + + assert line == 1 + assert col == 3 + + def test_get_line_col_start_of_second_line(self) -> None: + """First position of second line maps to line 2, column 1.""" + cache = LineOffsetCache("hello\nworld") + + line, col = cache.get_line_col(6) + + assert line == 2 + assert col == 1 + + def test_get_line_col_middle_of_second_line(self) -> None: + """Middle of second line maps to the correct column.""" + cache = LineOffsetCache("hello\nworld") + + line, col = cache.get_line_col(8) + + assert line == 2 + assert col == 3 + + def test_get_line_col_at_newline(self) -> None: + """Position at the newline character maps to the end of that line.""" + cache = LineOffsetCache("hello\nworld") + + line, col = cache.get_line_col(5) + + assert line == 1 + assert col == 6 + + def test_get_line_col_at_end(self) -> None: + """Position at source length maps to the final line end position.""" + cache = LineOffsetCache("hello\nworld") + + line, col = cache.get_line_col(11) + + assert line == 2 + assert col == 6 + + def test_get_line_col_negative_position(self) -> None: + """Negative position is clamped to 0 (line 1, column 1).""" + cache = LineOffsetCache("hello\nworld") + + line, col = cache.get_line_col(-5) + + assert line == 1 + assert col == 1 + + def test_get_line_col_position_beyond_source(self) -> None: + """Position beyond source length is clamped to source length.""" + cache = LineOffsetCache("hello\nworld") + + line, col = cache.get_line_col(100) + + 
assert line == 2 + assert col == 6 + + def test_get_line_col_empty_source(self) -> None: + """Position 0 in empty source maps to line 1, column 1.""" + cache = LineOffsetCache("") + + line, col = cache.get_line_col(0) + + assert line == 1 + assert col == 1 + + def test_get_line_col_third_line(self) -> None: + """Position at the start of the third line maps correctly.""" + cache = LineOffsetCache("a\nb\nc\nd") + + line, col = cache.get_line_col(4) + + assert line == 3 + assert col == 1 + + def test_get_line_col_many_lines_binary_search(self) -> None: + """Binary search finds correct line across many lines.""" + source = "\n".join(f"line{i}" for i in range(10)) + cache = LineOffsetCache(source) + + assert cache.get_line_col(0) == (1, 1) + assert cache.get_line_col(24) == (5, 1) + assert cache.get_line_col(42) == (8, 1) + assert cache.get_line_col(54) == (10, 1) + + def test_get_line_col_long_lines(self) -> None: + """Long lines with many characters compute column correctly.""" + cache = LineOffsetCache("a" * 100 + "\n" + "b" * 50) + + line, col = cache.get_line_col(110) + + assert line == 2 + assert col == 10 + + def test_get_line_col_position_exactly_at_source_len(self) -> None: + """Position equal to source length maps to just past the last character.""" + source = "abc" + cache = LineOffsetCache(source) + + line, col = cache.get_line_col(3) + + assert line == 1 + assert col == 4 + + def test_get_line_col_consecutive_calls(self) -> None: + """Multiple consecutive calls all return correct values.""" + cache = LineOffsetCache("hello\nworld\n!") + + assert cache.get_line_col(0) == (1, 1) + assert cache.get_line_col(5) == (1, 6) + assert cache.get_line_col(6) == (2, 1) + assert cache.get_line_col(11) == (2, 6) + assert cache.get_line_col(12) == (3, 1) + + +class TestCursorSkipLineEnd: + """Cursor.skip_line_end recognizes LF as the only line-ending character.""" + + def test_skip_line_end_at_regular_char(self) -> None: + """At a regular character, skip_line_end returns 
self unchanged.""" + cursor = Cursor("hello\nworld", 0) + + result = cursor.skip_line_end() + + assert result.pos == 0 + assert result is cursor + + def test_skip_line_end_at_middle_char(self) -> None: + """At a middle character, skip_line_end returns self unchanged.""" + cursor = Cursor("hello\nworld", 2) + + result = cursor.skip_line_end() + + assert result.pos == 2 + assert result is cursor + + def test_skip_line_end_at_lf(self) -> None: + """At LF, skip_line_end advances past the newline.""" + cursor = Cursor("hello\nworld", 5) + + result = cursor.skip_line_end() + + assert result.pos == 6 + + def test_skip_line_end_cr_not_recognized(self) -> None: + """CR alone is not a recognized line ending; cursor stays put. + + Cursor expects LF-normalized input. CR must be converted before + creating a Cursor. FluentParserV1.parse() handles normalization. + """ + cursor = Cursor("hello\rworld", 5) + + result = cursor.skip_line_end() + + assert result.pos == 5 + assert result is cursor + + def test_skip_line_end_at_crlf_cr_position(self) -> None: + """At CR within CRLF, skip_line_end does not advance (CR not recognized). + + For proper handling, normalize input to LF before creating a Cursor. 
+ """ + cursor = Cursor("hello\r\nworld", 5) + + result = cursor.skip_line_end() + + assert result.pos == 5 + assert result is cursor + + def test_skip_line_end_at_crlf_lf_position(self) -> None: + """At LF within CRLF, skip_line_end advances past the LF.""" + cursor = Cursor("hello\r\nworld", 6) + + result = cursor.skip_line_end() + + assert result.pos == 7 + + def test_skip_line_end_at_eof(self) -> None: + """At EOF, skip_line_end returns self unchanged.""" + cursor = Cursor("hello", 5) + + result = cursor.skip_line_end() + + assert result.pos == 5 + assert result is cursor diff --git a/tests/syntax_cursor_cases/parse_error_tests.py b/tests/syntax_cursor_cases/parse_error_tests.py new file mode 100644 index 00000000..678bd44f --- /dev/null +++ b/tests/syntax_cursor_cases/parse_error_tests.py @@ -0,0 +1,129 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor.py.""" + +from tests.syntax_cursor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PARSE ERROR TESTS +# ============================================================================ + + +class TestParseError: + """Test ParseError functionality.""" + + def test_create_parse_error(self) -> None: + """Create ParseError with message and cursor.""" + cursor = Cursor("hello", 2) + error = ParseError("Expected '}'", cursor) + + assert error.message == "Expected '}'" + assert error.cursor.pos == 2 + assert error.expected == () + + def test_create_parse_error_with_expected(self) -> None: + """Create ParseError with expected tokens.""" + cursor = Cursor("hello", 2) + error = ParseError("Unexpected", cursor, expected=("}", "]")) + + assert error.expected == ("}", "]") + + def test_parse_error_immutability(self) -> None: + """ParseError is immutable.""" + cursor = Cursor("hello", 2) + error = ParseError("Error", cursor) + + with pytest.raises(AttributeError): + error.message = "New error" # type: 
ignore[misc] + + def test_format_error_simple(self) -> None: + """Format error without expected tokens.""" + cursor = Cursor("hello", 2) + error = ParseError("Expected '}'", cursor) + + formatted = error.format_error() + + assert "1:3:" in formatted + assert "Expected '}'" in formatted + + def test_format_error_with_expected(self) -> None: + """Format error with expected tokens.""" + cursor = Cursor("hello", 2) + error = ParseError("Unexpected token", cursor, expected=("}", "]")) + + formatted = error.format_error() + + assert "1:3:" in formatted + assert "Unexpected token" in formatted + assert "expected:" in formatted + assert "'}'" in formatted + assert "']'" in formatted + + def test_format_error_multiline_source(self) -> None: + """Format error with multiline source.""" + source = "line1\nline2\nline3" + cursor = Cursor(source, 8) # Middle of line2 + error = ParseError("Error here", cursor) + + formatted = error.format_error() + + assert "2:3:" in formatted + + def test_format_with_context_single_line(self) -> None: + """Format error with context for single line.""" + cursor = Cursor("hello world", 6) + error = ParseError("Expected '}'", cursor) + + formatted = error.format_with_context() + + assert "1:7:" in formatted + assert "hello world" in formatted + assert "^" in formatted + + def test_format_with_context_multiline(self) -> None: + """Format error with context showing multiple lines.""" + source = "line1\nline2\nline3\nline4" + cursor = Cursor(source, 8) # Middle of line2 + error = ParseError("Error", cursor) + + formatted = error.format_with_context() + + assert "2:3:" in formatted + assert "line1" in formatted + assert "line2" in formatted + assert "line3" in formatted + assert "^" in formatted + + def test_format_with_context_custom_context_lines(self) -> None: + """Format error with custom context line count.""" + source = "line1\nline2\nline3\nline4\nline5" + cursor = Cursor(source, 12) # Start of line3 + error = ParseError("Error", cursor) + + 
formatted = error.format_with_context(context_lines=1) + + assert "line2" in formatted + assert "line3" in formatted + assert "line4" in formatted + + def test_format_with_context_at_start(self) -> None: + """Format error with context at start of file.""" + source = "line1\nline2\nline3" + cursor = Cursor(source, 0) + error = ParseError("Error at start", cursor) + + formatted = error.format_with_context() + + assert "1:1:" in formatted + assert "line1" in formatted + assert "^" in formatted + + def test_format_with_context_at_end(self) -> None: + """Format error with context at end of file.""" + source = "line1\nline2\nline3" + cursor = Cursor(source, 17) # End of line3 + error = ParseError("Error at end", cursor) + + formatted = error.format_with_context() + + assert "line3" in formatted + assert "^" in formatted diff --git a/tests/syntax_cursor_cases/parse_result_tests.py b/tests/syntax_cursor_cases/parse_result_tests.py new file mode 100644 index 00000000..6de22f7b --- /dev/null +++ b/tests/syntax_cursor_cases/parse_result_tests.py @@ -0,0 +1,37 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor.py.""" + +from tests.syntax_cursor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PARSE RESULT TESTS +# ============================================================================ + + +class TestParseResult: + """Test ParseResult container.""" + + def test_create_parse_result(self) -> None: + """Create ParseResult with value and cursor.""" + cursor = Cursor("hello", 0) + result = ParseResult("h", cursor.advance()) + + assert result.value == "h" + assert result.cursor.pos == 1 + + def test_parse_result_immutability(self) -> None: + """ParseResult is immutable.""" + cursor = Cursor("hello", 0) + result = ParseResult("test", cursor) + + with pytest.raises(AttributeError): + result.value = "new" # type: ignore[misc] + + def 
test_parse_result_with_complex_value(self) -> None: + """ParseResult can hold complex types.""" + cursor = Cursor("hello", 3) + value = {"key": "value", "list": [1, 2, 3]} + result = ParseResult(value, cursor) + + assert result.value == {"key": "value", "list": [1, 2, 3]} + assert result.cursor.pos == 3 diff --git a/tests/syntax_cursor_cases/peek_operations.py b/tests/syntax_cursor_cases/peek_operations.py new file mode 100644 index 00000000..2a241929 --- /dev/null +++ b/tests/syntax_cursor_cases/peek_operations.py @@ -0,0 +1,90 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor.py.""" + +from tests.syntax_cursor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PEEK OPERATIONS +# ============================================================================ + + +class TestCursorPeek: + """Test peek operations.""" + + def test_peek_current(self) -> None: + """Peek at current position (offset 0).""" + cursor = Cursor("hello", 0) + + assert cursor.peek(0) == "h" + + def test_peek_next(self) -> None: + """Peek at next position (offset 1).""" + cursor = Cursor("hello", 0) + + assert cursor.peek(1) == "e" + + def test_peek_multiple_ahead(self) -> None: + """Peek multiple positions ahead.""" + cursor = Cursor("hello", 0) + + assert cursor.peek(2) == "l" + assert cursor.peek(3) == "l" + assert cursor.peek(4) == "o" + + def test_peek_at_eof_returns_none(self) -> None: + """Peek at EOF returns None.""" + cursor = Cursor("hello", 5) + + assert cursor.peek(0) is None + + def test_peek_beyond_eof_returns_none(self) -> None: + """Peek beyond EOF returns None.""" + cursor = Cursor("hello", 3) + + assert cursor.peek(5) is None + + def test_peek_does_not_modify_cursor(self) -> None: + """Peek does not modify cursor position.""" + cursor = Cursor("hello", 0) + + _ = cursor.peek(3) + assert cursor.pos == 0 + assert cursor.current == "h" + + def 
test_peek_negative_offset_returns_none(self) -> None: + """Negative offset returns None. + + Without the target_pos < 0 guard, peek(-1) at pos=0 would compute + target_pos=-1, skip the >=len(source) check (since -1 < 5), and + return source[-1]="o" via Python negative indexing — a silent wrong + read. The guard makes look-behind attempts safe-but-unproductive. + """ + cursor = Cursor("hello", 0) + + assert cursor.peek(-1) is None + assert cursor.peek(-5) is None + + def test_peek_negative_offset_at_mid_source_returns_none(self) -> None: + """Negative offset whose magnitude exceeds pos returns None. + + At pos=2, peek(-3) yields target_pos=-1 < 0. Without the guard this + would silently return source[-1] ("o") instead of None. + """ + cursor = Cursor("hello", 2) + + # offset that undershoots the start of the source + assert cursor.peek(-3) is None + + def test_peek_negative_offset_exactly_at_start_returns_none(self) -> None: + """Negative offset equal to pos yields target_pos=0, which is valid. + + peek(-2) at pos=2 yields target_pos=0 which is within bounds and + returns the first character. Only offsets that produce negative + target_pos return None. 
+ """ + cursor = Cursor("hello", 2) + + # target_pos = 0 — within bounds, returns first char + assert cursor.peek(-2) == "h" + # target_pos = -1 — before start, returns None + assert cursor.peek(-3) is None diff --git a/tests/syntax_cursor_cases/slice_operations.py b/tests/syntax_cursor_cases/slice_operations.py new file mode 100644 index 00000000..6856758d --- /dev/null +++ b/tests/syntax_cursor_cases/slice_operations.py @@ -0,0 +1,60 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor.py.""" + +from tests.syntax_cursor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# SLICE OPERATIONS +# ============================================================================ + + +class TestCursorSlice: + """Test cursor slice operations.""" + + def test_slice_to_from_start(self) -> None: + """Slice from start to middle.""" + cursor = Cursor("hello world", 0) + + text = cursor.slice_to(5) + + assert text == "hello" + + def test_slice_to_from_middle(self) -> None: + """Slice from middle position.""" + cursor = Cursor("hello world", 6) + + text = cursor.slice_to(11) + + assert text == "world" + + def test_slice_to_empty(self) -> None: + """Slice with same start and end returns empty string.""" + cursor = Cursor("hello", 2) + + text = cursor.slice_to(2) + + assert text == "" + + def test_slice_to_single_char(self) -> None: + """Slice single character.""" + cursor = Cursor("hello", 1) + + text = cursor.slice_to(2) + + assert text == "e" + + def test_slice_to_entire_source(self) -> None: + """Slice entire source from position 0.""" + cursor = Cursor("hello", 0) + + text = cursor.slice_to(5) + + assert text == "hello" + + def test_slice_to_with_unicode(self) -> None: + """Slice with Unicode characters.""" + cursor = Cursor("привет мир", 0) + + text = cursor.slice_to(6) + + assert text == "привет" diff --git a/tests/syntax_cursor_property_cases/__init__.py 
b/tests/syntax_cursor_property_cases/__init__.py new file mode 100644 index 00000000..c393a0ff --- /dev/null +++ b/tests/syntax_cursor_property_cases/__init__.py @@ -0,0 +1,42 @@ +"""Hypothesis property-based tests for syntax.cursor module. + +Tests cursor immutability, EOF handling, navigation, and ParseResult/ParseError +properties. Combines targeted property tests with comprehensive contract verification. +""" + +from __future__ import annotations + +import pytest +from hypothesis import assume, event, given, settings +from hypothesis import strategies as st + +from ftllexengine.syntax.cursor import Cursor, ParseError, ParseResult + +# ============================================================================ +# HYPOTHESIS STRATEGIES +# ============================================================================ + + +# Strategy for source text - keep max_size for performance +source_text = st.text( + alphabet=st.characters(blacklist_categories=["Cc"], blacklist_characters=["\x00"]), + min_size=0, + max_size=200, # Keep practical bound for performance +) + +# Strategy for positions (will be constrained by source length) +positions = st.integers(min_value=0, max_value=500) + +__all__ = [ + "Cursor", + "ParseError", + "ParseResult", + "assume", + "event", + "given", + "positions", + "pytest", + "settings", + "source_text", + "st", +] diff --git a/tests/syntax_cursor_property_cases/contract_tests_from_test_cursor_comprehensive_py.py b/tests/syntax_cursor_property_cases/contract_tests_from_test_cursor_comprehensive_py.py new file mode 100644 index 00000000..b023dd87 --- /dev/null +++ b/tests/syntax_cursor_property_cases/contract_tests_from_test_cursor_comprehensive_py.py @@ -0,0 +1,605 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor_property.py.""" + +from tests.syntax_cursor_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# CONTRACT 
TESTS (from test_cursor_comprehensive.py) +# ============================================================================ + + +class TestCursorImmutabilityContracts: + """Contract-level tests for Cursor immutability.""" + + def test_cursor_frozen(self) -> None: + """Property: Cursor instances are immutable (frozen).""" + cursor = Cursor(source="hello", pos=0) + + with pytest.raises((AttributeError, TypeError)): + cursor.pos = 1 # type: ignore[misc] + + @given(st.text(), st.integers(min_value=0, max_value=1000)) + def test_cursor_construction(self, source: str, pos: int) -> None: + """Property: Cursor can be constructed with any valid source and position.""" + event(f"input_len={len(source)}") + # Clamp position to valid range + pos = min(pos, len(source)) + cursor = Cursor(source=source, pos=pos) + assert cursor.source == source + assert cursor.pos == pos + + +class TestCursorEOFProperty: + """Property-based tests for Cursor.is_eof property.""" + + def test_is_eof_at_start_of_nonempty_string(self) -> None: + """Verify is_eof is False at start of non-empty string.""" + cursor = Cursor(source="hello", pos=0) + assert cursor.is_eof is False + + def test_is_eof_at_end_of_string(self) -> None: + """Verify is_eof is True at end of string.""" + cursor = Cursor(source="hello", pos=5) + assert cursor.is_eof is True + + def test_construction_beyond_end_raises(self) -> None: + """Verify constructing cursor with pos > len(source) raises ValueError.""" + with pytest.raises(ValueError, match="exceeds source length"): + Cursor(source="hello", pos=10) + + def test_is_eof_empty_string(self) -> None: + """Verify is_eof is True for empty string at position 0.""" + cursor = Cursor(source="", pos=0) + assert cursor.is_eof is True + + @given(st.text(min_size=1)) + def test_is_eof_middle_of_string(self, source: str) -> None: + """Property: is_eof is False in middle of string.""" + event(f"input_len={len(source)}") + mid_pos = len(source) // 2 + cursor = Cursor(source=source, pos=mid_pos) 
+ if mid_pos < len(source): + assert cursor.is_eof is False + + +class TestCursorCurrentProperty: + """Property-based tests for Cursor.current property.""" + + def test_current_at_start(self) -> None: + """Verify current returns first character at position 0.""" + cursor = Cursor(source="hello", pos=0) + assert cursor.current == "h" + + def test_current_in_middle(self) -> None: + """Verify current returns character at current position.""" + cursor = Cursor(source="hello", pos=2) + assert cursor.current == "l" + + def test_current_raises_at_eof(self) -> None: + """Verify current raises EOFError at EOF.""" + cursor = Cursor(source="hello", pos=5) + with pytest.raises(EOFError, match="EOF"): + _ = cursor.current + + def test_construction_beyond_eof_raises(self) -> None: + """Verify construction with pos beyond source length raises ValueError. + + The valid range for pos is [0, len(source)]. Positions strictly greater + than len(source) are rejected at construction time. + """ + with pytest.raises(ValueError, match="exceeds source length"): + Cursor(source="hello", pos=10) + + @given( + st.text(min_size=1).flatmap( + lambda s: st.tuples(st.just(s), st.integers(min_value=0, max_value=len(s) - 1)) + ) + ) + def test_current_returns_correct_character(self, source_pos: tuple[str, int]) -> None: + """Property: current returns character at position if valid.""" + source, pos = source_pos + event(f"input_len={len(source)}") + event(f"offset={pos}") + cursor = Cursor(source=source, pos=pos) + assert cursor.current == source[pos] + + +class TestCursorPeekMethod: + """Property-based tests for Cursor.peek() method.""" + + def test_peek_at_current_position(self) -> None: + """Verify peek(0) returns current character.""" + cursor = Cursor(source="hello", pos=0) + assert cursor.peek(0) == "h" + + def test_peek_ahead_one(self) -> None: + """Verify peek(1) returns next character.""" + cursor = Cursor(source="hello", pos=0) + assert cursor.peek(1) == "e" + + def 
test_peek_beyond_eof_returns_none(self) -> None: + """Verify peek() returns None when peeking beyond EOF.""" + cursor = Cursor(source="hello", pos=4) + assert cursor.peek(1) is None + + def test_peek_at_eof_returns_none(self) -> None: + """Verify peek() returns None at EOF.""" + cursor = Cursor(source="hello", pos=5) + assert cursor.peek(0) is None + + @given(st.text(min_size=2), st.integers(min_value=0, max_value=10)) + def test_peek_with_various_offsets(self, source: str, offset: int) -> None: + """Property: peek(offset) returns correct character or None.""" + event(f"input_len={len(source)}") + event(f"offset={offset}") + cursor = Cursor(source=source, pos=0) + result = cursor.peek(offset) + + in_bounds = offset < len(source) + event(f"valid={in_bounds}") + if in_bounds: + assert result == source[offset] + else: + assert result is None + + @given( + source=st.text(min_size=1), + pos=st.integers(min_value=0, max_value=20), + offset=st.integers(min_value=-50, max_value=-1), + ) + def test_peek_negative_offset_always_returns_none_or_valid( + self, source: str, pos: int, offset: int + ) -> None: + """Property: peek(offset) with negative offset returns None or in-bounds char. + + Verifies the target_pos < 0 guard: negative offsets whose magnitude + exceeds pos must return None, never a character from the END of the source + (Python negative indexing trap). + """ + pos = min(pos, len(source)) + cursor = Cursor(source=source, pos=pos) + target_pos = pos + offset + result = cursor.peek(offset) + + if target_pos < 0: + event("outcome=negative_target_returns_none") + # Without the guard this would silently return source[target_pos] + # (a character from the END of source). Must be None. 
+ assert result is None + elif target_pos >= len(source): + event("outcome=beyond_eof_returns_none") + assert result is None + else: + event("outcome=in_bounds_lookbehind") + assert result == source[target_pos] + + +class TestCursorAdvanceMethod: + """Property-based tests for Cursor.advance() method.""" + + def test_advance_single_position(self) -> None: + """Verify advance() moves cursor by 1 position.""" + cursor = Cursor(source="hello", pos=0) + new_cursor = cursor.advance() + assert new_cursor.pos == 1 + assert cursor.pos == 0 # Original unchanged + + def test_advance_multiple_positions(self) -> None: + """Verify advance(count) moves cursor by count positions.""" + cursor = Cursor(source="hello", pos=0) + new_cursor = cursor.advance(3) + assert new_cursor.pos == 3 + + def test_advance_clamped_at_eof(self) -> None: + """Verify advance() clamps position at EOF.""" + cursor = Cursor(source="hello", pos=3) + new_cursor = cursor.advance(10) + assert new_cursor.pos == 5 # Clamped to len(source) + + def test_advance_from_eof_stays_at_eof(self) -> None: + """Verify advance() from EOF stays at EOF.""" + cursor = Cursor(source="hello", pos=5) + new_cursor = cursor.advance() + assert new_cursor.pos == 5 + + @given( + st.text(), + st.integers(min_value=0, max_value=100), + st.integers(min_value=1, max_value=10), + ) + def test_advance_returns_new_cursor(self, source: str, pos: int, count: int) -> None: + """Property: advance() returns new cursor, original unchanged.""" + event(f"input_len={len(source)}") + event(f"offset={pos}") + pos = min(pos, len(source)) + cursor = Cursor(source=source, pos=pos) + new_cursor = cursor.advance(count) + + # Original unchanged + assert cursor.pos == pos + # New cursor advanced (clamped at len(source)) + expected_pos = min(pos + count, len(source)) + assert new_cursor.pos == expected_pos + + +class TestCursorSliceToMethod: + """Property-based tests for Cursor.slice_to() method.""" + + def test_slice_to_simple_range(self) -> None: +
"""Verify slice_to() extracts substring.""" + cursor = Cursor(source="hello world", pos=0) + text = cursor.slice_to(5) + assert text == "hello" + + def test_slice_to_from_middle(self) -> None: + """Verify slice_to() works from middle position.""" + cursor = Cursor(source="hello world", pos=6) + text = cursor.slice_to(11) + assert text == "world" + + def test_slice_to_empty_range(self) -> None: + """Verify slice_to() returns empty string for empty range.""" + cursor = Cursor(source="hello", pos=2) + text = cursor.slice_to(2) + assert text == "" + + @given(st.text(min_size=1)) + def test_slice_to_full_string(self, source: str) -> None: + """Property: slice_to(len(source)) from pos=0 returns full string.""" + event(f"input_len={len(source)}") + cursor = Cursor(source=source, pos=0) + text = cursor.slice_to(len(source)) + assert text == source + + +class TestCursorSkipSpacesMethod: + """Property-based tests for Cursor.skip_spaces() method.""" + + def test_skip_spaces_no_spaces(self) -> None: + """Verify skip_spaces() returns same cursor when no spaces.""" + cursor = Cursor(source="hello", pos=0) + new_cursor = cursor.skip_spaces() + assert new_cursor.pos == 0 + + def test_skip_spaces_leading_spaces(self) -> None: + """Verify skip_spaces() skips leading spaces.""" + cursor = Cursor(source="   hello", pos=0) + new_cursor = cursor.skip_spaces() + assert new_cursor.pos == 3 + assert new_cursor.current == "h" + + def test_skip_spaces_all_spaces(self) -> None: + """Verify skip_spaces() handles all-space string.""" + cursor = Cursor(source=" ", pos=0) + new_cursor = cursor.skip_spaces() + assert new_cursor.is_eof is True + + def test_skip_spaces_only_space_not_tab(self) -> None: + """Verify skip_spaces() only skips space (U+0020), not tab.""" + cursor = Cursor(source="  \thello", pos=0) + new_cursor = cursor.skip_spaces() + assert new_cursor.pos == 2 + assert new_cursor.current == "\t" + + def test_skip_spaces_not_newline(self) -> None: + """Verify skip_spaces() does not skip
newlines.""" + cursor = Cursor(source="  \nhello", pos=0) + new_cursor = cursor.skip_spaces() + assert new_cursor.pos == 2 + assert new_cursor.current == "\n" + + +class TestCursorSkipWhitespaceMethod: + """Property-based tests for Cursor.skip_whitespace() method.""" + + def test_skip_whitespace_no_whitespace(self) -> None: + """Verify skip_whitespace() returns same cursor when no whitespace.""" + cursor = Cursor(source="hello", pos=0) + new_cursor = cursor.skip_whitespace() + assert new_cursor.pos == 0 + + def test_skip_whitespace_mixed_whitespace(self) -> None: + """Verify skip_whitespace() skips space and newline. + + Note: CR is normalized to LF at parser entry, so skip_whitespace + only needs to handle space and LF. + """ + cursor = Cursor(source="  \n  hello", pos=0) + new_cursor = cursor.skip_whitespace() + assert new_cursor.pos == 5 + assert new_cursor.current == "h" + + def test_skip_whitespace_all_whitespace(self) -> None: + """Verify skip_whitespace() handles all-whitespace string.""" + cursor = Cursor(source=" \n ", pos=0) + new_cursor = cursor.skip_whitespace() + assert new_cursor.is_eof is True + + def test_skip_whitespace_not_tab(self) -> None: + """Verify skip_whitespace() does not skip tab.""" + cursor = Cursor(source=" \n\thello", pos=0) + new_cursor = cursor.skip_whitespace() + assert new_cursor.pos == 2 + assert new_cursor.current == "\t" + + +class TestCursorExpectMethod: + """Property-based tests for Cursor.expect() method.""" + + def test_expect_match(self) -> None: + """Verify expect() returns new cursor when character matches.""" + cursor = Cursor(source="hello", pos=0) + new_cursor = cursor.expect("h") + assert new_cursor is not None + assert new_cursor.pos == 1 + + def test_expect_no_match(self) -> None: + """Verify expect() returns None when character does not match.""" + cursor = Cursor(source="hello", pos=0) + result = cursor.expect("x") + assert result is None + + def test_expect_at_eof(self) -> None: + """Verify expect() returns None at
EOF.""" + cursor = Cursor(source="hello", pos=5) + result = cursor.expect("h") + assert result is None + + @given(st.text(min_size=1), st.characters()) + def test_expect_various_characters(self, source: str, char: str) -> None: + """Property: expect() behavior depends on current character.""" + event(f"input_len={len(source)}") + cursor = Cursor(source=source, pos=0) + result = cursor.expect(char) + + matched = source[0] == char + event(f"valid={matched}") + if matched: + assert result is not None + assert result.pos == 1 + else: + assert result is None + + +class TestCursorComputeLineColMethod: + """Property-based tests for Cursor.compute_line_col() method.""" + + def test_compute_line_col_first_line_first_col(self) -> None: + """Verify compute_line_col() returns (1, 1) at position 0.""" + cursor = Cursor(source="hello", pos=0) + line, col = cursor.compute_line_col() + assert line == 1 + assert col == 1 + + def test_compute_line_col_first_line_later_col(self) -> None: + """Verify compute_line_col() returns correct column on first line.""" + cursor = Cursor(source="hello", pos=2) + line, col = cursor.compute_line_col() + assert line == 1 + assert col == 3 + + def test_compute_line_col_second_line(self) -> None: + """Verify compute_line_col() returns (2, 1) at start of second line.""" + cursor = Cursor(source="line1\nline2", pos=6) + line, col = cursor.compute_line_col() + assert line == 2 + assert col == 1 + + def test_compute_line_col_second_line_middle(self) -> None: + """Verify compute_line_col() returns correct position on second line.""" + cursor = Cursor(source="line1\nline2", pos=8) + line, col = cursor.compute_line_col() + assert line == 2 + assert col == 3 + + def test_compute_line_col_multiple_lines(self) -> None: + """Verify compute_line_col() handles multiple newlines.""" + cursor = Cursor(source="a\nb\nc\nd", pos=6) # Position at 'd' + line, col = cursor.compute_line_col() + assert line == 4 + assert col == 1 + + +class TestParseResultDataclass: + 
"""Property-based tests for ParseResult dataclass.""" + + def test_parse_result_frozen(self) -> None: + """Property: ParseResult instances are immutable (frozen).""" + cursor = Cursor(source="test", pos=0) + result: ParseResult[str] = ParseResult(value="parsed", cursor=cursor) + + with pytest.raises((AttributeError, TypeError)): + result.value = "changed" # type: ignore[misc] + + @given(st.text(), st.text(), st.integers(min_value=0, max_value=100)) + def test_parse_result_construction_string( + self, value: str, source: str, pos: int + ) -> None: + """Property: ParseResult can be constructed with string values.""" + event(f"input_len={len(source)}") + event(f"offset={pos}") + pos = min(pos, len(source)) + cursor = Cursor(source=source, pos=pos) + result: ParseResult[str] = ParseResult(value=value, cursor=cursor) + assert result.value == value + assert result.cursor is cursor + + @given(st.integers()) + def test_parse_result_construction_int(self, value: int) -> None: + """Property: ParseResult can be constructed with int values.""" + event(f"value={value}") + cursor = Cursor(source="test", pos=0) + result: ParseResult[int] = ParseResult(value=value, cursor=cursor) + assert result.value == value + + def test_parse_result_generic_type(self) -> None: + """Verify ParseResult works with various types.""" + cursor = Cursor(source="test", pos=0) + + # String type + str_result: ParseResult[str] = ParseResult(value="hello", cursor=cursor) + assert str_result.value == "hello" + + # List type + list_result: ParseResult[list[int]] = ParseResult(value=[1, 2, 3], cursor=cursor) + assert list_result.value == [1, 2, 3] + + # Tuple type + tuple_result: ParseResult[tuple[str, int]] = ParseResult( + value=("test", 42), cursor=cursor + ) + assert tuple_result.value == ("test", 42) + + +class TestParseErrorDataclass: + """Property-based tests for ParseError dataclass.""" + + def test_parse_error_frozen(self) -> None: + """Property: ParseError instances are immutable (frozen).""" + 
cursor = Cursor(source="test", pos=0) + error = ParseError(message="error", cursor=cursor) + + with pytest.raises((AttributeError, TypeError)): + error.message = "changed" # type: ignore[misc] + + @given(st.text(), st.text()) + def test_parse_error_construction_minimal(self, message: str, source: str) -> None: + """Property: ParseError can be constructed with message and cursor only.""" + event(f"input_len={len(source)}") + cursor = Cursor(source=source, pos=0) + error = ParseError(message=message, cursor=cursor) + assert error.message == message + assert error.cursor is cursor + assert error.expected == () + + def test_parse_error_construction_with_expected(self) -> None: + """Verify ParseError can be constructed with expected tokens.""" + cursor = Cursor(source="test", pos=0) + error = ParseError(message="error", cursor=cursor, expected=("}", "]")) + assert error.expected == ("}", "]") + + +class TestParseErrorFormatError: + """Property-based tests for ParseError.format_error() method.""" + + def test_format_error_simple(self) -> None: + """Verify format_error() returns formatted error string.""" + cursor = Cursor(source="hello", pos=2) + error = ParseError(message="Test error", cursor=cursor) + formatted = error.format_error() + assert "1:3:" in formatted + assert "Test error" in formatted + + def test_format_error_with_expected(self) -> None: + """Verify format_error() includes expected tokens.""" + cursor = Cursor(source="hello", pos=0) + error = ParseError(message="Unexpected", cursor=cursor, expected=("}", "]")) + formatted = error.format_error() + assert "expected:" in formatted + assert "'}'" in formatted + assert "']'" in formatted + + def test_format_error_multiline_source(self) -> None: + """Verify format_error() shows correct line number for multiline source.""" + cursor = Cursor(source="line1\nline2\nline3", pos=6) # Start of line2 + error = ParseError(message="Error on line 2", cursor=cursor) + formatted = error.format_error() + assert "2:1:" in 
formatted + + +class TestParseErrorFormatWithContext: + """Property-based tests for ParseError.format_with_context() method.""" + + def test_format_with_context_simple(self) -> None: + """Verify format_with_context() shows source context.""" + cursor = Cursor(source="hello world", pos=6) + error = ParseError(message="Test error", cursor=cursor) + formatted = error.format_with_context() + + assert "1:7: Test error" in formatted + assert "hello world" in formatted + assert "^" in formatted # Pointer + + def test_format_with_context_multiline(self) -> None: + """Verify format_with_context() shows multiple lines.""" + source = "line1\nline2\nline3" + cursor = Cursor(source=source, pos=6) # Start of line2 + error = ParseError(message="Error", cursor=cursor) + formatted = error.format_with_context() + + assert "line1" in formatted + assert "line2" in formatted + assert "line3" in formatted + assert "^" in formatted + + def test_format_with_context_custom_context_lines(self) -> None: + """Verify format_with_context() respects context_lines parameter.""" + source = "line1\nline2\nline3\nline4\nline5" + cursor = Cursor(source=source, pos=12) # Line 3 + error = ParseError(message="Error", cursor=cursor) + + # With context_lines=1, should show lines 2-4 + formatted = error.format_with_context(context_lines=1) + assert "line2" in formatted + assert "line3" in formatted + assert "line4" in formatted + + def test_format_with_context_pointer_alignment(self) -> None: + """Verify format_with_context() aligns pointer correctly.""" + cursor = Cursor(source="hello", pos=2) + error = ParseError(message="Error", cursor=cursor) + formatted = error.format_with_context() + + lines = formatted.split("\n") + # Find the line with hello and the pointer line + for i, line in enumerate(lines): + if "hello" in line and i + 1 < len(lines): + # Next line should have pointer at correct position + pointer_line = lines[i + 1] + # The pointer should be at column 3 (accounting for line number prefix) + 
assert "^" in pointer_line + + +class TestCursorIntegrationContracts: + """Integration contract tests for Cursor methods working together.""" + + def test_cursor_parse_word(self) -> None: + """Integration: Use cursor to parse a word.""" + cursor = Cursor(source="hello world", pos=0) + start_pos = cursor.pos + + # Advance until space + while not cursor.is_eof and cursor.current != " ": + cursor = cursor.advance() + + # Extract word + word = Cursor(source="hello world", pos=start_pos).slice_to(cursor.pos) + assert word == "hello" + + def test_cursor_skip_and_parse(self) -> None: + """Integration: Skip whitespace then parse.""" + cursor = Cursor(source=" hello", pos=0) + + # Skip spaces + cursor = cursor.skip_spaces() + + # Parse word + start_pos = cursor.pos + while not cursor.is_eof and cursor.current.isalpha(): + cursor = cursor.advance() + + word = Cursor(source=" hello", pos=start_pos).slice_to(cursor.pos) + assert word == "hello" + + def test_cursor_peek_and_expect(self) -> None: + """Integration: Use peek to look ahead, then expect.""" + cursor = Cursor(source="hello", pos=0) + + # Peek ahead + assert cursor.peek(0) == "h" + assert cursor.peek(1) == "e" + + # Expect and advance + new_cursor = cursor.expect("h") + assert new_cursor is not None + assert new_cursor.current == "e" diff --git a/tests/syntax_cursor_property_cases/property_tests_eof_handling.py b/tests/syntax_cursor_property_cases/property_tests_eof_handling.py new file mode 100644 index 00000000..ccd0de8f --- /dev/null +++ b/tests/syntax_cursor_property_cases/property_tests_eof_handling.py @@ -0,0 +1,69 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor_property.py.""" + +from tests.syntax_cursor_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PROPERTY TESTS - EOF HANDLING +# ============================================================================ + + +class 
TestCursorEOF: + """Test EOF (End Of File) detection properties.""" + + @given(source=source_text) + @settings(max_examples=200) + def test_is_eof_true_at_end(self, source: str) -> None: + """PROPERTY: is_eof is True when pos >= len(source).""" + event(f"text_len={len(source)}") + cursor = Cursor(source, len(source)) + assert cursor.is_eof is True + + @given(source=source_text.filter(lambda s: len(s) > 0)) + @settings(max_examples=200) + def test_is_eof_false_before_end(self, source: str) -> None: + """PROPERTY: is_eof is False when pos < len(source).""" + event(f"text_len={len(source)}") + cursor = Cursor(source, 0) + assert cursor.is_eof is False + + @given(source=source_text) + @settings(max_examples=100) + def test_current_raises_eoferror_at_eof(self, source: str) -> None: + """PROPERTY: current raises EOFError when is_eof is True.""" + event(f"text_len={len(source)}") + cursor = Cursor(source, len(source)) + + if cursor.is_eof: + with pytest.raises(EOFError): + _ = cursor.current + + @given(source=source_text.filter(lambda s: len(s) > 0)) + @settings(max_examples=100) + def test_current_succeeds_before_eof(self, source: str) -> None: + """PROPERTY: current succeeds when is_eof is False.""" + event(f"text_len={len(source)}") + cursor = Cursor(source, 0) + + if not cursor.is_eof: + # Should not raise + char = cursor.current + assert isinstance(char, str) + assert len(char) == 1 + + @given(source=source_text.filter(lambda s: len(s) > 0)) + @settings(max_examples=100) + def test_advance_until_eof_reaches_end(self, source: str) -> None: + """PROPERTY: Advancing through source eventually reaches EOF.""" + event(f"text_len={len(source)}") + cursor = Cursor(source, 0) + + # Advance until EOF + for _ in range(len(source) + 1): + if cursor.is_eof: + break + cursor = cursor.advance() + + # Should be at EOF + assert cursor.is_eof is True + assert cursor.pos >= len(source) diff --git a/tests/syntax_cursor_property_cases/property_tests_idempotence.py 
b/tests/syntax_cursor_property_cases/property_tests_idempotence.py new file mode 100644 index 00000000..34661468 --- /dev/null +++ b/tests/syntax_cursor_property_cases/property_tests_idempotence.py @@ -0,0 +1,71 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor_property.py.""" + +from tests.syntax_cursor_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PROPERTY TESTS - IDEMPOTENCE +# ============================================================================ + + +class TestCursorIdempotence: + """Test idempotent cursor operations.""" + + @given(source=source_text, pos=positions) + @settings(max_examples=100) + def test_is_eof_is_idempotent(self, source: str, pos: int) -> None: + """PROPERTY: Multiple is_eof calls return same value.""" + event(f"source_len={len(source)}") + # Clamp pos to the valid range [0, len(source)] + pos = min(pos, len(source)) + cursor = Cursor(source, pos) + + result1 = cursor.is_eof + result2 = cursor.is_eof + result3 = cursor.is_eof + + assert result1 == result2 == result3 + + @given(source=source_text.filter(lambda s: len(s) > 0)) + @settings(max_examples=100) + def test_current_is_idempotent(self, source: str) -> None: + """PROPERTY: Multiple current accesses return same character.""" + event(f"source_len={len(source)}") + cursor = Cursor(source, 0) + + if not cursor.is_eof: + char1 = cursor.current + char2 = cursor.current + char3 = cursor.current + + assert char1 == char2 == char3 + + @given( + source=source_text.filter(lambda s: len(s) > 2), + offset=st.integers(min_value=0, max_value=5), + ) + @settings(max_examples=100) + def test_peek_is_idempotent(self, source: str, offset: int) -> None: + """PROPERTY: Multiple peek calls return same result.""" + event(f"offset={offset}") + cursor = Cursor(source, 0) + + peek1 = cursor.peek(offset) + peek2 = cursor.peek(offset) + peek3 = cursor.peek(offset) + + assert 
peek1 == peek2 == peek3 + + @given(source=source_text, pos=st.integers(min_value=0, max_value=100)) + @settings(max_examples=100) + def test_line_col_is_idempotent(self, source: str, pos: int) -> None: + """PROPERTY: Multiple line_col accesses return same value.""" + event(f"source_len={len(source)}") + pos = min(pos, len(source)) + cursor = Cursor(source, pos) + + lc1 = cursor.compute_line_col() + lc2 = cursor.compute_line_col() + lc3 = cursor.compute_line_col() + + assert lc1 == lc2 == lc3 diff --git a/tests/syntax_cursor_property_cases/property_tests_immutability.py b/tests/syntax_cursor_property_cases/property_tests_immutability.py new file mode 100644 index 00000000..bffdb0c8 --- /dev/null +++ b/tests/syntax_cursor_property_cases/property_tests_immutability.py @@ -0,0 +1,61 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor_property.py.""" + +from tests.syntax_cursor_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PROPERTY TESTS - IMMUTABILITY +# ============================================================================ + + +class TestCursorImmutability: + """Test cursor immutability properties.""" + + @given(source=source_text, pos=positions) + @settings(max_examples=200) + def test_cursor_is_immutable(self, source: str, pos: int) -> None: + """INVARIANT: Cursor is immutable - advance() returns NEW cursor.""" + assume(pos < len(source)) # Valid position + event(f"text_len={len(source)}") + + cursor = Cursor(source, pos) + original_pos = cursor.pos + + # Advance cursor + new_cursor = cursor.advance() + + # Original cursor unchanged + assert cursor.pos == original_pos + # New cursor has new position + assert new_cursor.pos == original_pos + 1 + + @given(source=source_text, pos=positions) + @settings(max_examples=200) + def test_advance_count_returns_new_cursor(self, source: str, pos: int) -> None: + """PROPERTY: advance(count) 
returns new cursor, original unchanged.""" + assume(pos < len(source)) + event(f"pos={pos}") + + cursor = Cursor(source, pos) + original_pos = cursor.pos + + # Advance by N + n = min(5, len(source) - pos) + new_cursor = cursor.advance(n) + + # Original unchanged + assert cursor.pos == original_pos + # New cursor advanced by N + assert new_cursor.pos == original_pos + n + + @given(source=source_text) + @settings(max_examples=100) + def test_cursor_advance_preserves_source(self, source: str) -> None: + """PROPERTY: advance() preserves source string.""" + event(f"text_len={len(source)}") + cursor = Cursor(source, 0) + + while not cursor.is_eof: + new_cursor = cursor.advance() + assert new_cursor.source == source + cursor = new_cursor diff --git a/tests/syntax_cursor_property_cases/property_tests_line_column_tracking.py b/tests/syntax_cursor_property_cases/property_tests_line_column_tracking.py new file mode 100644 index 00000000..a38dc918 --- /dev/null +++ b/tests/syntax_cursor_property_cases/property_tests_line_column_tracking.py @@ -0,0 +1,61 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor_property.py.""" + +from tests.syntax_cursor_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PROPERTY TESTS - LINE/COLUMN TRACKING +# ============================================================================ + + +class TestCursorLineColumn: + """Test line and column tracking properties.""" + + @given(source=source_text) + @settings(max_examples=100) + def test_line_starts_at_one(self, source: str) -> None: + """PROPERTY: Line numbers start at 1.""" + event(f"source_len={len(source)}") + cursor = Cursor(source, 0) + line, _ = cursor.compute_line_col() + + assert line >= 1 + + @given(source=source_text) + @settings(max_examples=100) + def test_column_starts_at_one(self, source: str) -> None: + """PROPERTY: Column numbers start at 1.""" + 
event(f"source_len={len(source)}") + cursor = Cursor(source, 0) + _, column = cursor.compute_line_col() + + assert column >= 1 + + @given(lines=st.lists(st.text(), min_size=1, max_size=10)) # Keep list bound for performance + @settings(max_examples=50) + def test_newline_increments_line_number(self, lines: list[str]) -> None: + """PROPERTY: Newlines increment line number.""" + event(f"line_count={len(lines)}") + source = "\n".join(lines) + + # Count newlines + newline_count = source.count("\n") + + # Advance to end + cursor_end = Cursor(source, len(source)) + line_end, _ = cursor_end.compute_line_col() + + # Line number should be newline_count + 1 + assert line_end == newline_count + 1 + + @given(source=source_text) + @settings(max_examples=50) + def test_compute_line_col_equals_property(self, source: str) -> None: + """PROPERTY: compute_line_col() returns same as line_col property.""" + event(f"source_len={len(source)}") + cursor = Cursor(source, min(len(source), 10)) + + result1 = cursor.compute_line_col() + result2 = cursor.compute_line_col() + + assert result1 == result2 diff --git a/tests/syntax_cursor_property_cases/property_tests_navigation.py b/tests/syntax_cursor_property_cases/property_tests_navigation.py new file mode 100644 index 00000000..c0bcdfcf --- /dev/null +++ b/tests/syntax_cursor_property_cases/property_tests_navigation.py @@ -0,0 +1,86 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor_property.py.""" + +from tests.syntax_cursor_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PROPERTY TESTS - NAVIGATION +# ============================================================================ + + +class TestCursorNavigation: + """Test cursor navigation properties.""" + + @given(source=source_text, pos=positions) + @settings(max_examples=200) + def test_current_returns_char_at_position(self, source: str, pos: int) -> None: + 
"""PROPERTY: current returns character at pos.""" + assume(pos < len(source)) + event(f"pos={pos}") + + cursor = Cursor(source, pos) + + if not cursor.is_eof: + assert cursor.current == source[pos] + + @given( + source=source_text.filter(lambda s: len(s) > 1), + n=st.integers(min_value=1, max_value=10), + ) + @settings(max_examples=100) + def test_advance_count_moves_by_count(self, source: str, n: int) -> None: + """PROPERTY: advance(k) moves position by k.""" + event(f"advance_count={n}") + cursor = Cursor(source, 0) + n_safe = min(n, len(source)) + + new_cursor = cursor.advance(n_safe) + + assert new_cursor.pos == cursor.pos + n_safe + + @given(source=source_text.filter(lambda s: len(s) > 0)) + @settings(max_examples=100) + def test_advance_once_equals_advance_one(self, source: str) -> None: + """PROPERTY: advance() == advance(1).""" + event(f"source_len={len(source)}") + cursor = Cursor(source, 0) + + cursor1 = cursor.advance() + cursor2 = cursor.advance(1) + + assert cursor1.pos == cursor2.pos + + @given( + source=source_text.filter(lambda s: len(s) > 2), + offset=st.integers(min_value=0, max_value=10), + ) + @settings(max_examples=100) + def test_peek_reads_ahead_without_advancing(self, source: str, offset: int) -> None: + """PROPERTY: peek(offset) reads ahead without changing position.""" + event(f"offset={offset}") + cursor = Cursor(source, 0) + + if offset < len(source): + peeked = cursor.peek(offset) + pos_after_peek = cursor.pos + + # Peek should not change position + assert pos_after_peek == 0 + # Peek should return correct character + assert peeked == source[offset] + + @given( + source=source_text.filter(lambda s: len(s) > 0), + start_pos=st.integers(min_value=0, max_value=50), + ) + @settings(max_examples=100) + def test_slice_to_extracts_substring(self, source: str, start_pos: int) -> None: + """PROPERTY: slice_to(end) extracts source[pos:end].""" + event(f"source_len={len(source)}") + start_pos = min(start_pos, len(source) - 1) + cursor = 
Cursor(source, start_pos) + + end_pos = min(start_pos + 5, len(source)) + extracted = cursor.slice_to(end_pos) + + assert extracted == source[start_pos:end_pos] diff --git a/tests/syntax_cursor_property_cases/property_tests_robustness.py b/tests/syntax_cursor_property_cases/property_tests_robustness.py new file mode 100644 index 00000000..9113d6d0 --- /dev/null +++ b/tests/syntax_cursor_property_cases/property_tests_robustness.py @@ -0,0 +1,81 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_cursor_property.py.""" + +from tests.syntax_cursor_property_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PROPERTY TESTS - ROBUSTNESS +# ============================================================================ + + +class TestCursorRobustness: + """Test cursor robustness with edge cases.""" + + @given(source=source_text) + @settings(max_examples=100) + def test_empty_source_is_eof(self, source: str) -> None: + """PROPERTY: Empty source is always EOF.""" + event(f"source_len={len(source)}") + if len(source) == 0: + cursor = Cursor(source, 0) + assert cursor.is_eof is True + + @given(source=source_text) + @settings(max_examples=100) + def test_position_at_end_is_eof(self, source: str) -> None: + """PROPERTY: pos == len(source) is the canonical EOF position.""" + event(f"source_len={len(source)}") + cursor = Cursor(source, len(source)) + assert cursor.is_eof is True + + @given(source=source_text, pos=st.integers(min_value=1, max_value=1000)) + @settings(max_examples=100) + def test_position_strictly_beyond_end_raises(self, source: str, pos: int) -> None: + """PROPERTY: pos > len(source) raises ValueError at construction. + + advance() always clamps to len(source), so positions strictly beyond + the source length cannot arise through normal cursor navigation and + indicate a construction error. 
+ """ + assume(pos > len(source)) + event(f"excess={pos - len(source)}") + with pytest.raises(ValueError, match="exceeds source length"): + Cursor(source, pos) + + @given(source=source_text.filter(lambda s: len(s) > 0)) + @settings(max_examples=50) + def test_advance_at_eof_stays_at_eof(self, source: str) -> None: + """PROPERTY: Advancing at EOF stays at EOF.""" + event(f"source_len={len(source)}") + cursor = Cursor(source, len(source)) + assert cursor.is_eof is True + + # advance() clamps to len(source), so the cursor stays at EOF + new_cursor = cursor.advance() + assert new_cursor.is_eof is True + + @given( + source=source_text.filter(lambda s: len(s) > 0), + offset=st.integers(min_value=0, max_value=100), + ) + @settings(max_examples=100) + def test_peek_beyond_eof_returns_none(self, source: str, offset: int) -> None: + """PROPERTY: peek(offset) returns None when offset >= remaining chars.""" + event(f"offset={offset}") + cursor = Cursor(source, 0) + + if offset >= len(source): + result = cursor.peek(offset) + assert result is None + + @given(source=source_text, count=st.integers(min_value=1, max_value=1000)) + @settings(max_examples=100) + def test_advance_clamps_at_eof(self, source: str, count: int) -> None: + """PROPERTY: advance(count) clamps position at source length.""" + event(f"advance_count={count}") + cursor = Cursor(source, 0) + + new_cursor = cursor.advance(count) + + # Position should not exceed source length + assert new_cursor.pos <= len(source) diff --git a/tests/syntax_parser_core_cases/__init__.py b/tests/syntax_parser_core_cases/__init__.py new file mode 100644 index 00000000..8d817df1 --- /dev/null +++ b/tests/syntax_parser_core_cases/__init__.py @@ -0,0 +1,49 @@ +"""Core parser tests: blank line detection, comment merging, DoS protection, error recovery. 
+ +Tests for ``ftllexengine.syntax.parser.core``: + +- ``_has_blank_line_between``: Region-based newline detection for comment merging +- ``_CommentAccumulator``: Span handling and content joining for adjacent comments +- ``FluentParserV1``: Comment merging, term/message/junk parsing, DoS limits, + nesting depth clamping, source size validation, error recovery, + parse_stream incremental entry parsing +""" + +from __future__ import annotations + +import logging +import sys + +import pytest +from hypothesis import event, given +from hypothesis import strategies as st + +from ftllexengine.constants import MAX_SOURCE_SIZE +from ftllexengine.diagnostics import DiagnosticCode +from ftllexengine.enums import CommentType +from ftllexengine.syntax.ast import Comment, Junk, Message, Span, Term +from ftllexengine.syntax.parser.core import ( + FluentParserV1, + _CommentAccumulator, + _has_blank_line_between, +) + +__all__ = [ + "MAX_SOURCE_SIZE", + "Comment", + "CommentType", + "DiagnosticCode", + "FluentParserV1", + "Junk", + "Message", + "Span", + "Term", + "_CommentAccumulator", + "_has_blank_line_between", + "event", + "given", + "logging", + "pytest", + "st", + "sys", +] diff --git a/tests/syntax_parser_core_cases/blank_line_detection.py b/tests/syntax_parser_core_cases/blank_line_detection.py new file mode 100644 index 00000000..9be7111e --- /dev/null +++ b/tests/syntax_parser_core_cases/blank_line_detection.py @@ -0,0 +1,124 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_core.py.""" + +from tests.syntax_parser_core_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# TestBlankLineDetection +# ============================================================================ + + +class TestBlankLineDetection: + """Direct tests for ``_has_blank_line_between``. 
+ + The function checks whether a region of the source string contains + at least one newline character. After parse_comment consumes the + trailing newline, any remaining newline in the gap indicates a + blank line was present between comments. + """ + + # -- Positive: regions containing newlines ---------------------------- + + def test_empty_region_has_no_blank_line(self) -> None: + """Empty region (start == end) contains no newline.""" + source = "content" + assert _has_blank_line_between(source, 0, 0) is False + + def test_consecutive_newlines(self) -> None: + """Two consecutive newlines in region are detected.""" + source = "\n\n" + assert _has_blank_line_between(source, 0, len(source)) is True + + def test_single_newline_in_region(self) -> None: + """Single newline indicates blank line (trailing LF already consumed).""" + source = "line1\nline2" + assert _has_blank_line_between(source, 0, len(source)) is True + + def test_newline_space_newline(self) -> None: + """Newline-space-newline sequence contains a newline.""" + source = "line1\n \nline2" + assert _has_blank_line_between(source, 0, len(source)) is True + + def test_multiple_spaces_between_newlines(self) -> None: + """Multiple spaces between newlines still contains newlines.""" + source = "start\n \nend" + assert _has_blank_line_between(source, 0, len(source)) is True + + def test_consecutive_newlines_at_start(self) -> None: + """Consecutive newlines at start of region.""" + source = "\n\ncontent" + assert _has_blank_line_between(source, 0, len(source)) is True + + def test_newline_at_end_only(self) -> None: + """Single newline at end of content is detected.""" + source = "content\n" + assert _has_blank_line_between(source, 0, len(source)) is True + + def test_alternating_newlines_and_spaces(self) -> None: + """Alternating pattern of newlines and spaces.""" + source = "\n \n \n" + assert _has_blank_line_between(source, 0, len(source)) is True + + def test_content_between_newlines(self) -> None: + 
"""Content between newlines does not prevent newline detection.""" + source = "\nX\n" + assert _has_blank_line_between(source, 0, len(source)) is True + + def test_tab_between_newlines(self) -> None: + """Tab between newlines does not prevent newline detection.""" + source = "\n\t\n" + assert _has_blank_line_between(source, 0, len(source)) is True + + # -- Negative: regions without newlines -------------------------------- + + def test_spaces_only_no_newlines(self) -> None: + """Region with only spaces has no newline.""" + source = "content content" + assert _has_blank_line_between(source, 7, 12) is False + + def test_no_newline_ascii_content(self) -> None: + """Plain ASCII content without newlines.""" + source = "abcdefghijklmnop" + assert _has_blank_line_between(source, 0, len(source)) is False + + def test_mixed_whitespace_no_newline(self) -> None: + """Mixed spaces without newline in subregion.""" + source = "start end" + assert _has_blank_line_between(source, 5, 9) is False + + # -- Region boundary handling ------------------------------------------ + + def test_blank_line_partially_in_region(self) -> None: + """Region containing newlines is detected.""" + source = "prefix\n\nsuffix" + assert _has_blank_line_between(source, 6, 8) is True + + def test_blank_line_before_region(self) -> None: + """Newlines before region are not detected.""" + source = "\n\ncontent" + assert _has_blank_line_between(source, 2, len(source)) is False + + def test_blank_line_after_region(self) -> None: + """Newlines after region are not detected.""" + source = "content\n\n" + assert _has_blank_line_between(source, 0, 7) is False + + # -- Comment merging gap scenarios ------------------------------------- + + def test_comment_gap_two_newlines(self) -> None: + """Two newlines in a row create a blank line gap.""" + source = "\n\n" + assert _has_blank_line_between(source, 0, len(source)) is True + + def test_comment_gap_empty(self) -> None: + """Zero-length gap between consecutive 
comments has no blank line.""" + comment1_end = len("# Comment1\n") + source = "# Comment1\n# Comment2\n" + assert _has_blank_line_between( + source, comment1_end, comment1_end + ) is False + + def test_comment_gap_whitespace_only_line(self) -> None: + """Whitespace-only line between newlines is a blank line.""" + source = "\n \n" + assert _has_blank_line_between(source, 0, len(source)) is True diff --git a/tests/syntax_parser_core_cases/comment_merging.py b/tests/syntax_parser_core_cases/comment_merging.py new file mode 100644 index 00000000..1a23f37d --- /dev/null +++ b/tests/syntax_parser_core_cases/comment_merging.py @@ -0,0 +1,262 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_core.py.""" + +from tests.syntax_parser_core_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# TestCommentMerging +# ============================================================================ + + +class TestCommentMerging: + """Comment merging via ``FluentParserV1`` and ``_CommentAccumulator``. + + Adjacent single-hash comments without blank lines between them are + merged into a single Comment node. Different comment types (``#``, + ``##``, ``###``) are never merged. Blank lines separate comment groups. 
+ """ + + # -- Parser-level merging ---------------------------------------------- + + def test_adjacent_comments_merge(self) -> None: + """Adjacent single-hash comments merge into one.""" + parser = FluentParserV1() + resource = parser.parse("# Line 1\n# Line 2\n# Line 3\n") + assert len(resource.entries) == 1 + comment = resource.entries[0] + assert isinstance(comment, Comment) + assert "Line 1" in comment.content + assert "Line 2" in comment.content + assert "Line 3" in comment.content + + def test_different_comment_types_dont_merge(self) -> None: + """Comments of different types are not merged.""" + parser = FluentParserV1() + resource = parser.parse("\n# Single\n## Group\n") + assert len(resource.entries) == 2 + c1 = resource.entries[0] + c2 = resource.entries[1] + assert isinstance(c1, Comment) + assert isinstance(c2, Comment) + assert c1.type == CommentType.COMMENT + assert c2.type == CommentType.GROUP + + def test_comments_separated_by_multiple_blank_lines(self) -> None: + """Multiple blank lines prevent merging.""" + parser = FluentParserV1() + resource = parser.parse("\n# First\n\n\n# Second\n") + comments = [ + e for e in resource.entries if isinstance(e, Comment) + ] + assert len(comments) == 2 + + def test_comments_separated_by_content(self) -> None: + """Non-comment content between comments prevents merging.""" + parser = FluentParserV1() + resource = parser.parse( + "\n# First comment\ntext\n# Second comment\n" + ) + comments = [ + e for e in resource.entries if isinstance(e, Comment) + ] + assert len(comments) == 2 + + def test_content_between_comments_separates(self) -> None: + """Text content between comments causes separation.""" + parser = FluentParserV1() + resource = parser.parse("# Comment1\ntext content here\n# Comment2") + comments = [ + e for e in resource.entries if isinstance(e, Comment) + ] + assert len(comments) == 2 + + def test_multiple_newlines_with_content(self) -> None: + """Multiple newlines with interspersed content 
separates.""" + parser = FluentParserV1() + resource = parser.parse("\n# First\n\n\nx\n# Second") + comments = [ + e for e in resource.entries if isinstance(e, Comment) + ] + assert len(comments) == 2 + + def test_newline_content_newline_pattern(self) -> None: + """Pattern: newline, content, newline separates comments.""" + parser = FluentParserV1() + resource = parser.parse("# First\nx\n\n# Second") + comments = [ + e for e in resource.entries if isinstance(e, Comment) + ] + assert len(comments) == 2 + + def test_merged_comment_span_covers_all(self) -> None: + """Merged comment span starts at first and ends at last.""" + parser = FluentParserV1() + resource = parser.parse("# Line 1\n# Line 2\n# Line 3") + comments = [ + e for e in resource.entries if isinstance(e, Comment) + ] + assert len(comments) == 1 + merged = comments[0] + assert merged.span is not None + assert merged.span.start == 0 + + def test_blank_line_with_spaces_between_comments(self) -> None: + """Comments with single blank line (containing spaces).""" + parser = FluentParserV1() + resource = parser.parse("# First\n\n# Second") + comments = [ + e for e in resource.entries if isinstance(e, Comment) + ] + assert len(comments) >= 1 + + # -- _CommentAccumulator span edge cases ------------------------------- + + def test_accumulator_finalize_last_span_only(self) -> None: + """Finalize when first_span is None but last_span is not.""" + first = Comment( + content="First", type=CommentType.COMMENT, span=None, + ) + acc = _CommentAccumulator(first) + second = Comment( + content="Second", + type=CommentType.COMMENT, + span=Span(start=10, end=30), + ) + acc.add(second) + result = acc.finalize() + assert result.content == "First\nSecond" + assert result.span is not None + assert result.span.start == 10 + assert result.span.end == 30 + + def test_accumulator_finalize_neither_span(self) -> None: + """Finalize when both spans are None.""" + first = Comment( + content="No span 1", type=CommentType.GROUP, 
span=None, + ) + acc = _CommentAccumulator(first) + second = Comment( + content="No span 2", type=CommentType.GROUP, span=None, + ) + acc.add(second) + result = acc.finalize() + assert result.content == "No span 1\nNo span 2" + assert result.type == CommentType.GROUP + assert result.span is None + + def test_accumulator_finalize_both_spans(self) -> None: + """Finalize when both first and last have spans.""" + first = Comment( + content="A", + type=CommentType.COMMENT, + span=Span(start=0, end=5), + ) + acc = _CommentAccumulator(first) + second = Comment( + content="B", + type=CommentType.COMMENT, + span=Span(start=6, end=11), + ) + acc.add(second) + result = acc.finalize() + assert result.content == "A\nB" + assert result.span is not None + assert result.span.start == 0 + assert result.span.end == 11 + + # -- Comment attachment to terms --------------------------------------- + + def test_single_hash_comment_attached_to_term(self) -> None: + """Single-hash comment immediately before term is attached.""" + parser = FluentParserV1() + resource = parser.parse( + "# This comment should attach\n-my-term = Term Value\n" + ) + assert len(resource.entries) == 1 + entry = resource.entries[0] + assert isinstance(entry, Term) + assert entry.id.name == "my-term" + assert entry.comment is not None + assert isinstance(entry.comment, Comment) + assert entry.comment.type == CommentType.COMMENT + assert "This comment should attach" in entry.comment.content + + def test_multiple_comments_attached_to_term(self) -> None: + """Multiple adjacent comments merge and attach to term.""" + parser = FluentParserV1() + source = ( + "# Comment line 1\n# Comment line 2\n" + "# Comment line 3\n-my-term = Value\n" + ) + resource = parser.parse(source) + assert len(resource.entries) == 1 + entry = resource.entries[0] + assert isinstance(entry, Term) + assert entry.comment is not None + assert "Comment line 1" in entry.comment.content + assert "Comment line 2" in entry.comment.content + assert 
"Comment line 3" in entry.comment.content + + def test_group_comment_before_term_not_attached(self) -> None: + """Group comment (##) before term is not attached.""" + parser = FluentParserV1() + resource = parser.parse("## Group comment\n-my-term = Value\n") + assert len(resource.entries) == 2 + comment = resource.entries[0] + term = resource.entries[1] + assert isinstance(comment, Comment) + assert comment.type == CommentType.GROUP + assert isinstance(term, Term) + assert term.comment is None + + def test_comment_with_blank_lines_before_term_not_attached(self) -> None: + """Blank lines between comment and term prevent attachment.""" + parser = FluentParserV1() + resource = parser.parse("# Comment\n\n\n-my-term = Value\n") + assert len(resource.entries) == 2 + comment = resource.entries[0] + term = resource.entries[1] + assert isinstance(comment, Comment) + assert isinstance(term, Term) + assert term.comment is None + + # -- CRLF handling in comment merging ---------------------------------- + + def test_crlf_comments(self) -> None: + """Parser handles CRLF line endings in comment regions.""" + parser = FluentParserV1() + resource = parser.parse("# Comment 1\r\n\r\n# Comment 2") + assert resource is not None + comments = [ + e for e in resource.entries if isinstance(e, Comment) + ] + assert len(comments) >= 1 + + def test_cr_only_comments(self) -> None: + """Parser handles CR-only line endings in comment regions.""" + parser = FluentParserV1() + resource = parser.parse("# Comment 1\r\r# Comment 2") + assert resource is not None + comments = [ + e for e in resource.entries if isinstance(e, Comment) + ] + assert len(comments) >= 1 + + def test_spaces_between_crlf_newlines(self) -> None: + """Parser handles spaces between CRLF newlines.""" + parser = FluentParserV1() + resource = parser.parse("# Comment 1\r\n \r\n# Comment 2") + assert resource is not None + comments = [ + e for e in resource.entries if isinstance(e, Comment) + ] + assert len(comments) >= 1 + + def 
test_no_blank_line_adjacent_comments_merge(self) -> None: + """Adjacent comments with no blank line merge into one.""" + parser = FluentParserV1() + resource = parser.parse("# Comment 1\n# Comment 2") + comments = [ + e for e in resource.entries if isinstance(e, Comment) + ] + assert len(comments) == 1 diff --git a/tests/syntax_parser_core_cases/do_slimits_and_validation.py b/tests/syntax_parser_core_cases/do_slimits_and_validation.py new file mode 100644 index 00000000..d71a661f --- /dev/null +++ b/tests/syntax_parser_core_cases/do_slimits_and_validation.py @@ -0,0 +1,155 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_core.py.""" + +from tests.syntax_parser_core_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# TestDoSLimitsAndValidation +# ============================================================================ + + +class TestDoSLimitsAndValidation: + """DoS protection: nesting depth, source size, parameter validation. + + Verifies nesting depth clamping, source size limits, and + constructor parameter validation. 
+ """ + + # -- Nesting depth exceeded -------------------------------------------- + + def test_depth_exceeded_specific_annotation(self) -> None: + """Nesting depth exceeded produces specific diagnostic.""" + parser = FluentParserV1(max_nesting_depth=1) + source = "msg = { { $var } }\n" + result = parser.parse(source) + assert len(result.entries) == 1 + junk_entry = result.entries[0] + assert isinstance(junk_entry, Junk) + assert len(junk_entry.annotations) == 1 + annotation = junk_entry.annotations[0] + assert ( + annotation.code + == DiagnosticCode.PARSE_NESTING_DEPTH_EXCEEDED.name + ) + assert "Nesting depth limit exceeded" in annotation.message + assert "max: 1" in annotation.message + + # -- Recursion limit clamping ------------------------------------------ + + def test_clamps_excessive_nesting_depth( + self, caplog: pytest.LogCaptureFixture, + ) -> None: + """Excessive max_nesting_depth is clamped to safe limit.""" + recursion_limit = sys.getrecursionlimit() + max_safe_depth = recursion_limit - 50 + excessive_depth = recursion_limit + 100 + with caplog.at_level( + logging.WARNING, + logger="ftllexengine.syntax.parser.core", + ): + parser = FluentParserV1(max_nesting_depth=excessive_depth) + assert parser.max_nesting_depth == max_safe_depth + assert parser.max_nesting_depth < excessive_depth + assert len(caplog.records) == 1 + warning = caplog.records[0] + assert warning.levelname == "WARNING" + assert "max_nesting_depth" in warning.message + assert "exceeds Python recursion limit" in warning.message + assert "Clamping to" in warning.message + + def test_accepts_depth_within_recursion_limit( + self, caplog: pytest.LogCaptureFixture, + ) -> None: + """No warning when nesting depth is within safe limit.""" + with caplog.at_level( + logging.WARNING, + logger="ftllexengine.syntax.parser.core", + ): + parser = FluentParserV1(max_nesting_depth=50) + assert parser.max_nesting_depth == 50 + assert len(caplog.records) == 0 + + # -- Source size validation 
-------------------------------------------- + + def test_max_source_size_default(self) -> None: + """Default max_source_size equals MAX_SOURCE_SIZE constant.""" + parser = FluentParserV1() + assert parser.max_source_size == MAX_SOURCE_SIZE + + def test_max_source_size_custom(self) -> None: + """Custom max_source_size is stored.""" + parser = FluentParserV1(max_source_size=5000) + assert parser.max_source_size == 5000 + + def test_max_source_size_disabled(self) -> None: + """max_source_size=0 disables the limit.""" + parser = FluentParserV1(max_source_size=0) + assert parser.max_source_size == 0 + + def test_oversized_source_raises_value_error(self) -> None: + """parse() raises ValueError when source exceeds limit.""" + parser = FluentParserV1(max_source_size=100) + oversized = "a" * 101 + with pytest.raises( + ValueError, + match=( + r"Source length \(101 characters\) " + r"exceeds maximum \(100 characters\)" + ), + ): + parser.parse(oversized) + + def test_oversized_error_includes_config_hint(self) -> None: + """ValueError includes configuration hint.""" + parser = FluentParserV1(max_source_size=50) + with pytest.raises( + ValueError, + match="Configure max_source_size in FluentParserV1", + ): + parser.parse("x" * 51) + + def test_source_at_exact_limit(self) -> None: + """parse() allows source exactly at size limit.""" + parser = FluentParserV1(max_source_size=100) + result = parser.parse(("msg = value\n" * 8)[:100]) + assert result is not None + + def test_disabled_limit_accepts_large_source(self) -> None: + """max_source_size=0 accepts arbitrarily large source.""" + parser = FluentParserV1(max_source_size=0) + result = parser.parse("msg = " + ("x" * 100000)) + assert result is not None + + def test_none_limit_accepts_large_source(self) -> None: + """max_source_size=None accepts arbitrarily large source.""" + parser = FluentParserV1(max_source_size=None) + result = parser.parse("msg = " + ("y" * 100000)) + assert result is not None + + # -- Parameter validation 
---------------------------------------------- + + def test_rejects_zero_nesting_depth(self) -> None: + """max_nesting_depth=0 raises ValueError.""" + with pytest.raises( + ValueError, + match=r"max_nesting_depth must be positive \(got 0\)", + ): + FluentParserV1(max_nesting_depth=0) + + def test_rejects_negative_nesting_depth(self) -> None: + """max_nesting_depth=-1 raises ValueError.""" + with pytest.raises( + ValueError, + match=r"max_nesting_depth must be positive \(got -1\)", + ): + FluentParserV1(max_nesting_depth=-1) + + def test_accepts_positive_nesting_depth(self) -> None: + """Positive max_nesting_depth is accepted.""" + parser = FluentParserV1(max_nesting_depth=50) + assert parser.max_nesting_depth == 50 + + def test_accepts_none_nesting_depth(self) -> None: + """None max_nesting_depth uses default.""" + parser = FluentParserV1(max_nesting_depth=None) + assert parser.max_nesting_depth > 0 diff --git a/tests/syntax_parser_core_cases/do_sprotection.py b/tests/syntax_parser_core_cases/do_sprotection.py new file mode 100644 index 00000000..8f356fbf --- /dev/null +++ b/tests/syntax_parser_core_cases/do_sprotection.py @@ -0,0 +1,212 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_core.py.""" + +from tests.syntax_parser_core_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# TestDoSProtection +# ============================================================================ + + +class TestDoSAbortBehavior: + """DoS abort behavior: max_parse_errors and abort thresholds. + + The parser aborts when the number of Junk entries exceeds + ``max_parse_errors``, preventing memory exhaustion from + severely malformed input. 
+ """ + + # -- max_parse_errors: indented junk ----------------------------------- + + def test_abort_on_indented_junk( + self, caplog: pytest.LogCaptureFixture, + ) -> None: + """Parser aborts when indented junk count exceeds limit.""" + parser = FluentParserV1(max_parse_errors=3) + source = ( + " indented1\n# comment\n" + " indented2\n# comment\n" + " indented3\n# comment\n" + " indented4\n" + ) + with caplog.at_level(logging.WARNING): + result = parser.parse(source) + junk = [e for e in result.entries if isinstance(e, Junk)] + assert len(junk) == 3 + assert any( + "Parse aborted" in r.message for r in caplog.records + ) + assert any( + "exceeded maximum of 3 Junk entries" in r.message + for r in caplog.records + ) + + # -- max_parse_errors: failed comments --------------------------------- + + def test_abort_on_failed_comments( + self, caplog: pytest.LogCaptureFixture, + ) -> None: + """Parser aborts when malformed comment count exceeds limit.""" + parser = FluentParserV1(max_parse_errors=2) + source = "####\n####\n####\n####\n" + with caplog.at_level(logging.WARNING): + result = parser.parse(source) + junk = [e for e in result.entries if isinstance(e, Junk)] + assert len(junk) == 2 + assert any( + "Parse aborted" in r.message for r in caplog.records + ) + assert any( + "exceeded maximum of 2 Junk entries" in r.message + for r in caplog.records + ) + + def test_malformed_comment_creates_junk_with_diagnostic(self) -> None: + """Malformed comment creates Junk with proper diagnostic.""" + parser = FluentParserV1() + result = parser.parse("#####\n") + assert len(result.entries) == 1 + junk_entry = result.entries[0] + assert isinstance(junk_entry, Junk) + assert junk_entry.content == "#####" + assert len(junk_entry.annotations) == 1 + assert ( + junk_entry.annotations[0].code + == DiagnosticCode.PARSE_JUNK.name + ) + assert "Invalid comment syntax" in junk_entry.annotations[0].message + + # -- max_parse_errors: message parse failures -------------------------- + + 
def test_abort_on_message_failures( + self, caplog: pytest.LogCaptureFixture, + ) -> None: + """Parser aborts when message parse failures exceed limit.""" + parser = FluentParserV1(max_parse_errors=3) + source = "msg1\nmsg2\nmsg3\nmsg4\nmsg5\n" + with caplog.at_level(logging.WARNING): + result = parser.parse(source) + junk = [e for e in result.entries if isinstance(e, Junk)] + assert len(junk) == 3 + assert any( + "Parse aborted" in r.message for r in caplog.records + ) + assert any( + "exceeded maximum of 3 Junk entries" in r.message + for r in caplog.records + ) + + def test_generic_parse_error_annotation(self) -> None: + """Generic parse error when nesting depth not exceeded.""" + parser = FluentParserV1() + result = parser.parse("invalid syntax here\n") + assert len(result.entries) == 1 + junk_entry = result.entries[0] + assert isinstance(junk_entry, Junk) + assert len(junk_entry.annotations) == 1 + annotation = junk_entry.annotations[0] + assert annotation.code == DiagnosticCode.PARSE_JUNK.name + assert annotation.message == "Parse error" + + # -- max_parse_errors: mixed junk types -------------------------------- + + def test_mixed_junk_types_count_toward_limit( + self, caplog: pytest.LogCaptureFixture, + ) -> None: + """All junk types count together toward the limit.""" + parser = FluentParserV1(max_parse_errors=4) + source = ( + " indented1\nmsg1 = ok\n####\n" + "invalid\nmsg2 = ok\n indented2\n####\n" + ) + with caplog.at_level(logging.WARNING): + result = parser.parse(source) + junk = [e for e in result.entries if isinstance(e, Junk)] + assert len(junk) == 4 + assert any( + "Parse aborted" in r.message for r in caplog.records + ) + + def test_depth_exceeded_counts_toward_limit( + self, caplog: pytest.LogCaptureFixture, + ) -> None: + """Depth exceeded errors count toward max_parse_errors.""" + parser = FluentParserV1( + max_nesting_depth=1, max_parse_errors=2, + ) + source = ( + "m1 = { { $x } }\nm2 = { { $y } }\n" + "m3 = { { $z } }\n" + ) + with 
caplog.at_level(logging.WARNING): + result = parser.parse(source) + junk = [e for e in result.entries if isinstance(e, Junk)] + assert len(junk) == 2 + depth_count = sum( + 1 + for entry in junk + for ann in entry.annotations + if ann.code + == DiagnosticCode.PARSE_NESTING_DEPTH_EXCEEDED.name + ) + assert depth_count >= 1 + + # -- max_parse_errors: boundary conditions ----------------------------- + + def test_disabled_max_parse_errors_never_aborts(self) -> None: + """Parser with max_parse_errors=0 never aborts.""" + parser = FluentParserV1(max_parse_errors=0) + source = "####\n" * 200 + result = parser.parse(source) + junk = [e for e in result.entries if isinstance(e, Junk)] + assert len(junk) == 200 + + def test_exact_boundary(self) -> None: + """Parser creates exactly max_parse_errors junk entries at limit.""" + parser = FluentParserV1(max_parse_errors=5) + source = "####\n" * 5 + result = parser.parse(source) + junk = [e for e in result.entries if isinstance(e, Junk)] + assert len(junk) == 5 + + def test_one_over_boundary( + self, caplog: pytest.LogCaptureFixture, + ) -> None: + """Parser with 6 errors and limit of 5 aborts at 5.""" + parser = FluentParserV1(max_parse_errors=5) + source = "####\n" * 6 + with caplog.at_level(logging.WARNING): + result = parser.parse(source) + junk = [e for e in result.entries if isinstance(e, Junk)] + assert len(junk) == 5 + assert any( + "Parse aborted" in r.message for r in caplog.records + ) + + # -- Log message content ----------------------------------------------- + + def test_log_suggests_fixing_source( + self, caplog: pytest.LogCaptureFixture, + ) -> None: + """DoS protection log mentions malformed FTL input.""" + parser = FluentParserV1(max_parse_errors=1) + source = "####\n####\n" + with caplog.at_level(logging.WARNING): + parser.parse(source) + assert any( + "severely malformed FTL input" in r.message + for r in caplog.records + ) + + def test_log_suggests_increasing_limit( + self, caplog: pytest.LogCaptureFixture, + 
) -> None: + """DoS protection log mentions increasing max_parse_errors.""" + parser = FluentParserV1(max_parse_errors=1) + source = "####\n####\n" + with caplog.at_level(logging.WARNING): + parser.parse(source) + assert any( + "increasing max_parse_errors" in r.message + for r in caplog.records + ) diff --git a/tests/syntax_parser_core_cases/parse_stream_cases.py b/tests/syntax_parser_core_cases/parse_stream_cases.py new file mode 100644 index 00000000..63fffa01 --- /dev/null +++ b/tests/syntax_parser_core_cases/parse_stream_cases.py @@ -0,0 +1,197 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_core.py.""" + +from tests.syntax_parser_core_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# TestParseStream +# ============================================================================ + + +class TestParseStream: + """FluentParserV1.parse_stream incremental entry parsing.""" + + def test_empty_iterable_yields_nothing(self) -> None: + """Empty line iterable produces no entries.""" + parser = FluentParserV1() + assert list(parser.parse_stream([])) == [] + + def test_single_message_from_lines(self) -> None: + """Single message lines yield one Message entry.""" + parser = FluentParserV1() + lines = ["greeting = Hello\n"] + entries = list(parser.parse_stream(lines)) + assert len(entries) == 1 + assert isinstance(entries[0], Message) + assert entries[0].id.name == "greeting" + + def test_two_messages_blank_line_separated(self) -> None: + """Two messages separated by blank line are yielded in order.""" + parser = FluentParserV1() + lines = ["msg1 = One\n", "\n", "msg2 = Two\n"] + entries = list(parser.parse_stream(lines)) + assert len(entries) == 2 + assert isinstance(entries[0], Message) + assert isinstance(entries[1], Message) + assert entries[0].id.name == "msg1" + assert entries[1].id.name == "msg2" + + def test_adjacent_messages_same_chunk(self) -> 
None: + """Messages with no blank line between are in one chunk, both yielded.""" + parser = FluentParserV1() + lines = ["msg1 = One\n", "msg2 = Two\n"] + entries = list(parser.parse_stream(lines)) + msg_entries = [e for e in entries if isinstance(e, Message)] + assert len(msg_entries) == 2 + assert msg_entries[0].id.name == "msg1" + assert msg_entries[1].id.name == "msg2" + + def test_comment_attached_to_message(self) -> None: + """Comment immediately before message (no blank line) is attached.""" + parser = FluentParserV1() + lines = ["# A comment\n", "greeting = Hello\n"] + entries = list(parser.parse_stream(lines)) + messages = [e for e in entries if isinstance(e, Message)] + assert len(messages) == 1 + assert messages[0].comment is not None + assert "A comment" in messages[0].comment.content + + def test_standalone_comment_blank_line_before_message(self) -> None: + """Comment separated by blank line from message is a standalone Comment.""" + parser = FluentParserV1() + lines = ["# Standalone\n", "\n", "greeting = Hello\n"] + entries = list(parser.parse_stream(lines)) + assert len(entries) == 2 + assert isinstance(entries[0], Comment) + assert isinstance(entries[1], Message) + assert entries[1].comment is None + + def test_multiline_message_with_attributes(self) -> None: + """Message with attributes spanning multiple lines is yielded as one Message.""" + parser = FluentParserV1() + lines = ["submit =\n", " .label = Submit\n", " .tooltip = Click here\n"] + entries = list(parser.parse_stream(lines)) + messages = [e for e in entries if isinstance(e, Message)] + assert len(messages) == 1 + assert messages[0].id.name == "submit" + assert len(messages[0].attributes) == 2 + + def test_junk_entry_is_yielded(self) -> None: + """Unparseable content is yielded as Junk.""" + parser = FluentParserV1() + lines = [" indented = invalid\n"] + entries = list(parser.parse_stream(lines)) + junk_entries = [e for e in entries if isinstance(e, Junk)] + assert len(junk_entries) >= 1 + 
+ def test_generator_input_accepted(self) -> None: + """Generator (not just list) is accepted as lines argument.""" + parser = FluentParserV1() + + def line_gen() -> object: + yield "msg = Value\n" + + entries = list(parser.parse_stream(line_gen())) # type: ignore[arg-type] + assert len(entries) == 1 + assert isinstance(entries[0], Message) + + def test_lines_without_trailing_newlines(self) -> None: + """Lines without trailing newlines are handled correctly.""" + parser = FluentParserV1() + lines = ["msg1 = One", "", "msg2 = Two"] + entries = list(parser.parse_stream(lines)) + msg_entries = [e for e in entries if isinstance(e, Message)] + assert len(msg_entries) == 2 + + def test_leading_blank_line_is_skipped(self) -> None: + """Blank line before any content is silently skipped. + + When a blank line is encountered with an empty accumulator chunk, the + elif chunk: branch evaluates to False and the loop continues to the next + line without flushing. This covers the 593->589 branch in parse_stream. + """ + parser = FluentParserV1() + lines = ["", "greeting = Hello\n"] + entries = list(parser.parse_stream(lines)) + msg_entries = [e for e in entries if isinstance(e, Message)] + assert len(msg_entries) == 1 + assert msg_entries[0].id.name == "greeting" + + def test_consecutive_blank_lines_between_messages(self) -> None: + """Consecutive blank lines between messages are handled correctly. + + After the first blank line flushes the accumulator, a second consecutive + blank line hits the elif chunk: False branch (empty accumulator) again, + exercising the 593->589 branch path a second time per stream. 
+ """ + parser = FluentParserV1() + lines = ["msg1 = One\n", "\n", "\n", "msg2 = Two\n"] + entries = list(parser.parse_stream(lines)) + msg_entries = [e for e in entries if isinstance(e, Message)] + assert len(msg_entries) == 2 + assert msg_entries[0].id.name == "msg1" + assert msg_entries[1].id.name == "msg2" + + def test_term_is_yielded(self) -> None: + """Term entry is correctly parsed and yielded.""" + parser = FluentParserV1() + lines = ["-brand = Firefox\n"] + entries = list(parser.parse_stream(lines)) + terms = [e for e in entries if isinstance(e, Term)] + assert len(terms) == 1 + assert terms[0].id.name == "brand" + + @given( + names=st.lists( + st.text( + min_size=1, + max_size=20, + alphabet=st.characters( + min_codepoint=ord("a"), + max_codepoint=ord("z"), + ), + ), + min_size=1, + max_size=10, + ) + ) + def test_entry_count_matches_parse_stream_vs_parse( + self, names: list[str] + ) -> None: + """parse_stream yields same entry count as parse() for well-formed FTL.""" + event(f"msg_count={len(names)}") + source = "\n\n".join(f"{name} = Value" for name in names) + "\n" + parser = FluentParserV1() + stream_entries = list(parser.parse_stream(source.splitlines(keepends=True))) + full_entries = list(parser.parse(source).entries) + assert len(stream_entries) == len(full_entries) + + @given( + names=st.lists( + st.text( + min_size=1, + max_size=20, + alphabet=st.characters( + min_codepoint=ord("a"), + max_codepoint=ord("z"), + ), + ), + min_size=1, + max_size=10, + ) + ) + def test_message_ids_match_parse_stream_vs_parse( + self, names: list[str] + ) -> None: + """Message IDs from parse_stream match those from parse() for well-formed FTL.""" + event(f"msg_count={len(names)}") + # Deduplicate names to avoid overwrite warnings + unique_names = list(dict.fromkeys(names)) + source = "\n\n".join(f"{name} = Value" for name in unique_names) + "\n" + parser = FluentParserV1() + stream_ids = { + e.id.name for e in parser.parse_stream(source.splitlines(keepends=True)) + 
if isinstance(e, Message) + } + full_ids = {e.id.name for e in parser.parse(source).entries if isinstance(e, Message)} + assert stream_ids == full_ids diff --git a/tests/syntax_parser_core_cases/parser_core_hypothesis.py b/tests/syntax_parser_core_cases/parser_core_hypothesis.py new file mode 100644 index 00000000..39fd5000 --- /dev/null +++ b/tests/syntax_parser_core_cases/parser_core_hypothesis.py @@ -0,0 +1,301 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_core.py.""" + +from tests.syntax_parser_core_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# TestParserCoreHypothesis +# ============================================================================ + + +class TestParserCoreHypothesis: + """Property-based tests for parser core components. + + Uses Hypothesis to verify invariants across generated inputs. + All ``@given`` tests emit ``event()`` calls for HypoFuzz guidance. 
+ """ + + # -- _has_blank_line_between properties -------------------------------- + + @given( + prefix=st.text( + alphabet=st.characters(blacklist_characters=["\n"]), + max_size=10, + ), + suffix=st.text( + alphabet=st.characters(blacklist_characters=["\n"]), + max_size=10, + ), + ) + def test_newline_pair_always_detected( + self, prefix: str, suffix: str, + ) -> None: + """Two consecutive newlines in region are always detected.""" + source = f"{prefix}\n\n{suffix}" + event(f"input_len={len(source)}") + assert _has_blank_line_between(source, 0, len(source)) is True + + @given(st.integers(min_value=2, max_value=10)) + def test_multiple_newlines_always_detected( + self, count: int, + ) -> None: + """Multiple consecutive newlines always detected.""" + event(f"boundary=newline_count_{count}") + source = "\n" * count + assert _has_blank_line_between(source, 0, len(source)) is True + + @given(st.integers(min_value=0, max_value=50)) + def test_spaces_only_never_blank(self, space_count: int) -> None: + """Spaces without newlines never produce a blank line.""" + event(f"boundary=space_count_{min(space_count, 10)}") + source = " " * space_count + assert _has_blank_line_between(source, 0, len(source)) is False + + @given( + st.text( + alphabet=st.characters( + blacklist_characters=["\n", " "], + min_codepoint=33, + max_codepoint=126, + ), + min_size=1, + max_size=20, + ) + ) + def test_ascii_no_newline_no_blank(self, text: str) -> None: + """ASCII text without newlines or spaces has no blank line.""" + event(f"input_len={len(text)}") + assert _has_blank_line_between(text, 0, len(text)) is False + + @given( + non_blank=st.characters( + blacklist_categories=("Zs", "Zl", "Zp"), + blacklist_characters=["\n"], + ) + ) + def test_non_blank_char_with_newlines( + self, non_blank: str, + ) -> None: + """Non-blank char between newlines: first newline is detected.""" + event("outcome=newline_detected") + source = f"\n{non_blank}\n" + assert _has_blank_line_between(source, 0, 
len(source)) is True + + @given( + lines=st.lists( + st.text( + alphabet=st.characters( + blacklist_categories=("Zs", "Zl", "Zp"), + blacklist_characters=["\n"], + ), + min_size=1, + max_size=5, + ), + min_size=1, + max_size=10, + ) + ) + def test_joined_lines_blank_iff_multiple( + self, lines: list[str], + ) -> None: + """Single-newline-joined non-ws lines: blank iff >1 line.""" + event(f"boundary=line_count_{len(lines)}") + source = "\n".join(lines) + result = _has_blank_line_between(source, 0, len(source)) + if len(lines) > 1: + assert result is True + else: + assert result is False + + @given( + non_blank_chars=st.lists( + st.characters( + blacklist_categories=("Zs", "Zl", "Zp"), + blacklist_characters=["\n"], + ), + min_size=1, + max_size=5, + ) + ) + def test_interleaved_newlines_always_detected( + self, non_blank_chars: list[str], + ) -> None: + """Interleaved newlines and non-blank chars: always has newline.""" + event(f"boundary=char_count_{len(non_blank_chars)}") + parts: list[str] = [] + for char in non_blank_chars: + parts.append("\n") + parts.append(char) + parts.append("\n") + source = "".join(parts) + assert _has_blank_line_between(source, 0, len(source)) is True + + # -- Parser hash-combination property ---------------------------------- + + @given( + st.text( + alphabet="#\n\r \t", min_size=1, max_size=50, + ) + ) + def test_hash_combinations_no_crash(self, source: str) -> None: + """Parser handles any combination of hashes and whitespace.""" + event(f"input_len={len(source)}") + parser = FluentParserV1() + resource = parser.parse(source) + assert resource is not None + assert isinstance(resource.entries, tuple) + has_entries = len(resource.entries) > 0 + event(f"outcome=has_entries_{has_entries}") + + # -- max_parse_errors property ----------------------------------------- + + @given(st.integers(min_value=1, max_value=10)) + def test_custom_limit_respected(self, limit: int) -> None: + """Parser aborts at exactly max_parse_errors limit.""" + 
event(f"boundary=limit_{limit}") + parser = FluentParserV1(max_parse_errors=limit) + source = "####\n" * (limit + 2) + result = parser.parse(source) + junk = [e for e in result.entries if isinstance(e, Junk)] + assert len(junk) == limit + + # -- Nesting depth property -------------------------------------------- + + @given(st.integers(min_value=1, max_value=5)) + def test_depth_exceeded_includes_limit( + self, depth_limit: int, + ) -> None: + """Depth exceeded diagnostic includes the configured limit.""" + event(f"boundary=depth_{depth_limit}") + parser = FluentParserV1(max_nesting_depth=depth_limit) + nesting = ( + "{ " * (depth_limit + 1) + + "$x" + + " }" * (depth_limit + 1) + ) + source = f"msg = {nesting}\n" + result = parser.parse(source) + junk = [e for e in result.entries if isinstance(e, Junk)] + assert len(junk) >= 1 + for entry in junk: + for ann in entry.annotations: + if ( + ann.code + == DiagnosticCode.PARSE_NESTING_DEPTH_EXCEEDED.name + ): + assert f"max: {depth_limit}" in ann.message + return + pytest.fail( + "Expected PARSE_NESTING_DEPTH_EXCEEDED annotation" + ) + + # -- Recursion limit clamping property --------------------------------- + + @given(depth_offset=st.integers(min_value=1, max_value=500)) + def test_any_excessive_depth_clamped( + self, depth_offset: int, + ) -> None: + """Any depth exceeding recursion limit is clamped.""" + event(f"boundary=offset_{min(depth_offset, 50)}") + recursion_limit = sys.getrecursionlimit() + max_safe = recursion_limit - 50 + excessive = recursion_limit + depth_offset + parser = FluentParserV1(max_nesting_depth=excessive) + assert parser.max_nesting_depth == max_safe + + # -- _CommentAccumulator span property --------------------------------- + + @given( + content1=st.text(min_size=1, max_size=50), + content2=st.text(min_size=1, max_size=50), + start=st.integers(min_value=0, max_value=1000), + end=st.integers(min_value=0, max_value=1000), + ) + def test_accumulator_span_combinations( + self, + content1: str, + 
content2: str, + start: int, + end: int, + ) -> None: + """Accumulator always produces valid Comment for any span config.""" + if end < start: + start, end = end, start + span = Span(start=start, end=end) + + for first_has in (True, False): + for last_has in (True, False): + event( + f"outcome=first_{first_has}_last_{last_has}" + ) + first = Comment( + content=content1, + type=CommentType.COMMENT, + span=span if first_has else None, + ) + acc = _CommentAccumulator(first) + second = Comment( + content=content2, + type=CommentType.COMMENT, + span=span if last_has else None, + ) + acc.add(second) + result = acc.finalize() + + assert content1 in result.content + assert content2 in result.content + assert "\n" in result.content + + if first_has or last_has: + assert result.span is not None + if first_has != last_has: + assert result.span == span + else: + assert result.span is None + + # -- Comment attachment to term property -------------------------------- + + @given( + comment_text=st.text( + min_size=1, + max_size=100, + alphabet=st.characters( + min_codepoint=32, + max_codepoint=126, + exclude_characters="#\n", + ), + ), + term_name=st.text( + min_size=1, + max_size=20, + alphabet=st.characters( + min_codepoint=ord("a"), + max_codepoint=ord("z"), + ), + ), + term_value=st.text( + min_size=1, + max_size=50, + alphabet=st.characters( + min_codepoint=32, + max_codepoint=126, + exclude_characters="\n{}", + ), + ), + ) + def test_comment_attaches_to_adjacent_term( + self, + comment_text: str, + term_name: str, + term_value: str, + ) -> None: + """Single-hash comment immediately before term is attached.""" + event("outcome=term_attachment") + parser = FluentParserV1() + source = f"# {comment_text}\n-{term_name} = {term_value}\n" + resource = parser.parse(source) + terms = [e for e in resource.entries if isinstance(e, Term)] + if terms: + term = terms[0] + assert term.comment is not None + assert comment_text in term.comment.content diff --git 
a/tests/syntax_parser_core_cases/parser_entry_recovery.py b/tests/syntax_parser_core_cases/parser_entry_recovery.py new file mode 100644 index 00000000..27022639 --- /dev/null +++ b/tests/syntax_parser_core_cases/parser_entry_recovery.py @@ -0,0 +1,153 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_core.py.""" + +from tests.syntax_parser_core_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# TestParserEntryRecovery +# ============================================================================ + + +class TestParserEntryRecovery: + """Parser entry recovery: empty input, CRLF, messages, terms, junk. + + Verifies the parser handles empty/whitespace input, CRLF line endings, + message and term parsing basics, and junk creation for invalid content. + """ + + # -- Empty / whitespace ------------------------------------------------ + + def test_empty_source(self) -> None: + """Empty source produces empty resource.""" + parser = FluentParserV1() + resource = parser.parse("") + assert resource is not None + assert len(resource.entries) == 0 + + def test_whitespace_only(self) -> None: + """Whitespace-only source produces empty resource.""" + parser = FluentParserV1() + resource = parser.parse(" \n\n \n") + assert resource is not None + assert len(resource.entries) == 0 + + # -- CRLF handling ----------------------------------------------------- + + def test_crlf_line_endings(self) -> None: + """Parser handles CRLF line endings.""" + parser = FluentParserV1() + resource = parser.parse("msg1 = value1\r\nmsg2 = value2\r\n") + assert resource is not None + assert len(resource.entries) >= 2 + + # -- Message parsing --------------------------------------------------- + + def test_simple_message(self) -> None: + """Simple message parsing.""" + parser = FluentParserV1() + resource = parser.parse("msg = value") + assert resource is not None + assert 
len(resource.entries) == 1 + assert isinstance(resource.entries[0], Message) + + def test_multiple_messages(self) -> None: + """Multiple messages.""" + parser = FluentParserV1() + resource = parser.parse( + "msg1 = value1\nmsg2 = value2\nmsg3 = value3\n" + ) + assert resource is not None + assert len(resource.entries) == 3 + + # -- Term parsing ------------------------------------------------------ + + def test_simple_term(self) -> None: + """Simple term parsing.""" + parser = FluentParserV1() + resource = parser.parse("-term = value") + assert resource is not None + assert len(resource.entries) == 1 + assert isinstance(resource.entries[0], Term) + + def test_term_with_id(self) -> None: + """Term preserves identifier.""" + parser = FluentParserV1() + resource = parser.parse("-my-term = Term Value") + assert len(resource.entries) == 1 + assert isinstance(resource.entries[0], Term) + assert resource.entries[0].id.name == "my-term" + + def test_multiple_terms(self) -> None: + """Multiple terms.""" + parser = FluentParserV1() + source = "-term1 = Value 1\n-term2 = Value 2\n-term3 = Value 3\n" + resource = parser.parse(source) + assert len(resource.entries) == 3 + assert all(isinstance(e, Term) for e in resource.entries) + + def test_term_with_attributes(self) -> None: + """Term with attributes.""" + parser = FluentParserV1() + source = "-term = Main Value\n .attr = Attribute Value\n" + resource = parser.parse(source) + assert len(resource.entries) >= 1 + + def test_term_and_message_coexist(self) -> None: + """Terms and messages in same resource.""" + parser = FluentParserV1() + source = "-term = term value\nmsg = message value\n" + resource = parser.parse(source) + assert len(resource.entries) == 2 + + def test_failed_term_parsing(self) -> None: + """Parser handles failed term parsing (dash not followed by valid term).""" + parser = FluentParserV1() + result = parser.parse("- invalid\n") + assert result is not None + assert len(result.entries) > 0 + + # -- Junk 
handling ----------------------------------------------------- + + def test_junk_creates_entry(self) -> None: + """Unparseable content creates Junk entry.""" + parser = FluentParserV1() + resource = parser.parse("%%% invalid syntax") + assert resource is not None + assert len(resource.entries) > 0 + assert any(isinstance(e, Junk) for e in resource.entries) + + def test_junk_continues_parsing(self) -> None: + """Parser continues after junk entry.""" + parser = FluentParserV1() + resource = parser.parse("%%% invalid\nmsg = valid message\n") + assert resource is not None + assert len(resource.entries) >= 2 + + def test_multiline_junk(self) -> None: + """Multi-line junk handling.""" + parser = FluentParserV1() + source = "%%% line 1\n line 2\n line 3\nmsg = valid\n" + resource = parser.parse(source) + assert resource is not None + assert len(resource.entries) > 0 + + def test_junk_eof_with_trailing_spaces(self) -> None: + """Junk parsing handles trailing spaces at EOF.""" + parser = FluentParserV1() + resource = parser.parse("%%% invalid ") + assert resource is not None + assert len(resource.entries) > 0 + assert isinstance(resource.entries[0], Junk) + + def test_junk_trailing_spaces_at_eof(self) -> None: + """Junk with trailing spaces at EOF.""" + parser = FluentParserV1() + resource = parser.parse("invalid syntax ") + assert resource is not None + + def test_multiline_junk_ends_at_eof(self) -> None: + """Multiline junk ending at EOF.""" + parser = FluentParserV1() + source = "invalid line 1\n invalid line 2\n " + resource = parser.parse(source) + assert resource is not None diff --git a/tests/syntax_parser_core_cases/parser_error_recovery_core.py b/tests/syntax_parser_core_cases/parser_error_recovery_core.py new file mode 100644 index 00000000..f12428e0 --- /dev/null +++ b/tests/syntax_parser_core_cases/parser_error_recovery_core.py @@ -0,0 +1,92 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_core.py.""" + +from 
tests.syntax_parser_core_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# TestParserErrorRecoveryCore +# ============================================================================ + + +class TestParserCommentRecovery: + """Parser comment parsing edge cases and comment type handling. + + Verifies comment recovery, comment types (single, group, resource), + and edge cases like hash-only lines and EOF handling. + """ + + # -- Comment parsing edge cases ---------------------------------------- + + def test_comment_without_newline_at_eof(self) -> None: + """Comment without trailing newline at EOF.""" + parser = FluentParserV1() + resource = parser.parse("# This is a comment") + assert resource is not None + assert len(resource.entries) > 0 + + def test_hash_only_at_eof(self) -> None: + """Single hash at EOF.""" + parser = FluentParserV1() + resource = parser.parse("#") + assert resource is not None + + def test_hash_with_newline_at_eof(self) -> None: + """Hash followed by newline at EOF.""" + parser = FluentParserV1() + resource = parser.parse("#\n") + assert resource is not None + + def test_multiple_hashes_at_eof(self) -> None: + """Multiple hashes (###) at EOF.""" + parser = FluentParserV1() + resource = parser.parse("###") + assert resource is not None + + def test_hash_followed_by_valid_message(self) -> None: + """Recovery from hash-only line then valid message.""" + parser = FluentParserV1() + resource = parser.parse("#\nmsg = value") + assert resource is not None + assert len(resource.entries) > 0 + + def test_hash_blank_line_then_message(self) -> None: + """Recovery from hash, blank line, then message.""" + parser = FluentParserV1() + resource = parser.parse("#\n\nmsg = value") + assert resource is not None + assert len(resource.entries) > 0 + + def test_multiple_failed_comment_lines(self) -> None: + """Recovery from multiple consecutive hash-only lines.""" + parser = 
FluentParserV1() + resource = parser.parse("#\n#\n#\nmsg = value") + assert resource is not None + + # -- Comment types ----------------------------------------------------- + + def test_single_line_comment(self) -> None: + """Single-line comment before message.""" + parser = FluentParserV1() + resource = parser.parse("# This is a comment\nmsg = value") + assert resource is not None + assert len(resource.entries) >= 1 + + def test_group_comment(self) -> None: + """Group comment (##) before message.""" + parser = FluentParserV1() + resource = parser.parse("## Group comment\nmsg = value") + assert resource is not None + + def test_resource_comment(self) -> None: + """Resource comment (###) before message.""" + parser = FluentParserV1() + resource = parser.parse("### Resource comment\nmsg = value") + assert resource is not None + + def test_multiple_comment_types(self) -> None: + """Multiple comment types in one resource.""" + parser = FluentParserV1() + source = "# Comment 1\n## Comment 2\n### Comment 3\nmsg = value\n" + resource = parser.parse(source) + assert resource is not None + assert len(resource.entries) >= 1 diff --git a/tests/syntax_parser_error_recovery_cases/__init__.py b/tests/syntax_parser_error_recovery_cases/__init__.py new file mode 100644 index 00000000..9f11e5ee --- /dev/null +++ b/tests/syntax_parser_error_recovery_cases/__init__.py @@ -0,0 +1,83 @@ +"""Error recovery, defensive code paths, and edge-case coverage for parser rules. + +Consolidated from 12 per-metric test files into a single semantic unit. +Covers: error paths, defensive/unreachable branches (via mocking), FluentParserV1 +integration for malformed input, and property-based edge-case tests. 
+""" + +from __future__ import annotations + +import logging +import sys +from unittest.mock import patch + +from ftllexengine.syntax.ast import ( + Attribute, + Identifier, + Junk, + Message, + MessageReference, + NumberLiteral, + Placeable, + StringLiteral, + TextElement, + VariableReference, +) +from ftllexengine.syntax.cursor import Cursor, ParseError, ParseResult +from ftllexengine.syntax.parser.core import FluentParserV1 +from ftllexengine.syntax.parser.rules import ( + ParseContext, + _parse_inline_hyphen, + _parse_inline_identifier, + parse_argument_expression, + parse_attribute, + parse_call_arguments, + parse_function_reference, + parse_inline_expression, + parse_message, + parse_pattern, + parse_placeable, + parse_select_expression, + parse_simple_pattern, + parse_term, + parse_term_reference, + parse_variant, + parse_variant_key, +) + +__all__ = [ + "Attribute", + "Cursor", + "FluentParserV1", + "Identifier", + "Junk", + "Message", + "MessageReference", + "NumberLiteral", + "ParseContext", + "ParseError", + "ParseResult", + "Placeable", + "StringLiteral", + "TextElement", + "VariableReference", + "_parse_inline_hyphen", + "_parse_inline_identifier", + "logging", + "parse_argument_expression", + "parse_attribute", + "parse_call_arguments", + "parse_function_reference", + "parse_inline_expression", + "parse_message", + "parse_pattern", + "parse_placeable", + "parse_select_expression", + "parse_simple_pattern", + "parse_term", + "parse_term_reference", + "parse_variant", + "parse_variant_key", + "patch", + "sys", +] diff --git a/tests/syntax_parser_error_recovery_cases/argument_expression_error_paths.py b/tests/syntax_parser_error_recovery_cases/argument_expression_error_paths.py new file mode 100644 index 00000000..995b238b --- /dev/null +++ b/tests/syntax_parser_error_recovery_cases/argument_expression_error_paths.py @@ -0,0 +1,103 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_error_recovery.py.""" + +from 
tests.syntax_parser_error_recovery_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# ARGUMENT EXPRESSION ERROR PATHS +# ============================================================================ + + +class TestArgumentExpressionErrorPaths: + """Error paths in parse_argument_expression.""" + + def test_eof_returns_none(self) -> None: + """EOF at argument position.""" + assert parse_argument_expression(Cursor("", 0)) is None + + def test_invalid_char_returns_none(self) -> None: + """Invalid character (@) returns None.""" + assert parse_argument_expression(Cursor("@", 0)) is None + + def test_term_ref_fails_line_1105(self) -> None: + """Line 1105: Term reference parse fails (hyphen + identifier).""" + result = parse_argument_expression(Cursor("-x.123)", 0)) + assert result is None + + def test_term_ref_bare_hyphen_fails(self) -> None: + """Hyphen followed by ')' fails term and number parse.""" + assert parse_argument_expression(Cursor("-)", 0)) is None + + def test_number_fails_defensive_line_1120(self) -> None: + """Line 1120: parse_number returns None on digit (defensive). + + Requires mocking because parse_number is robust for digit start. + """ + with patch( + "ftllexengine.syntax.parser.expressions.parse_number" + ) as mock: + mock.return_value = ParseError("forced failure", Cursor("9)", 0)) + assert parse_argument_expression(Cursor("9)", 0)) is None + + def test_identifier_fails_defensive_line_1139(self) -> None: + """Line 1139: parse_identifier returns None (defensive). + + Requires mocking because is_identifier_start guarantees success. 
+ """ + with patch( + "ftllexengine.syntax.parser.expressions.parse_identifier" + ) as mock: + mock.return_value = ParseError("forced failure", Cursor("x)", 0)) + assert parse_argument_expression(Cursor("x)", 0)) is None + + def test_function_ref_fails_line_1150(self) -> None: + """Line 1150: parse_function_reference returns None.""" + assert parse_argument_expression( + Cursor("FUNC(@)", 0) + ) is None + + def test_function_ref_succeeds(self) -> None: + """Function reference parsing succeeds.""" + result = parse_argument_expression(Cursor("NUMBER(42)", 0)) + assert result is not None + + def test_uppercase_no_paren_is_message_ref(self) -> None: + """Uppercase identifier without '(' is MessageReference.""" + result = parse_argument_expression(Cursor("NUMBER", 0)) + assert result is not None + assert isinstance(result.value, MessageReference) + assert result.value.id.name == "NUMBER" + + def test_uppercase_open_paren_at_eof(self) -> None: + """Uppercase + '(' but incomplete call.""" + assert parse_argument_expression(Cursor("NUMBER(", 0)) is None + + def test_negative_number_succeeds(self) -> None: + """Negative number parses as NumberLiteral.""" + result = parse_argument_expression(Cursor("-123", 0)) + assert result is not None + assert isinstance(result.value, NumberLiteral) + + def test_positive_number_succeeds(self) -> None: + """Digit-start parses as NumberLiteral.""" + result = parse_argument_expression(Cursor("42", 0)) + assert result is not None + assert isinstance(result.value, NumberLiteral) + + def test_string_literal_argument(self) -> None: + """String literal in argument position.""" + result = parse_argument_expression(Cursor('"text"', 0)) + assert result is not None + assert isinstance(result.value, StringLiteral) + + def test_inline_placeable_argument(self) -> None: + """Inline placeable { $var } in argument position.""" + result = parse_argument_expression(Cursor("{ $var }", 0)) + assert result is not None + assert isinstance(result.value, 
Placeable) + + def test_identifier_with_underscore(self) -> None: + """Identifier can contain underscore after letter.""" + result = parse_argument_expression(Cursor("my_var", 0)) + assert result is not None + assert isinstance(result.value, MessageReference) diff --git a/tests/syntax_parser_error_recovery_cases/call_arguments_error_paths.py b/tests/syntax_parser_error_recovery_cases/call_arguments_error_paths.py new file mode 100644 index 00000000..1751f63c --- /dev/null +++ b/tests/syntax_parser_error_recovery_cases/call_arguments_error_paths.py @@ -0,0 +1,53 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_error_recovery.py.""" + +from tests.syntax_parser_error_recovery_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# CALL ARGUMENTS ERROR PATHS +# ============================================================================ + + +class TestCallArgumentsErrorPaths: + """Error paths in parse_call_arguments.""" + + def test_named_arg_name_not_identifier(self) -> None: + """Named argument name must be identifier (not variable).""" + result = parse_call_arguments(Cursor('$var: "value")', 0)) + assert result is None + + def test_duplicate_named_argument(self) -> None: + """Duplicate named argument names.""" + assert parse_call_arguments(Cursor("x: 1, x: 2)", 0)) is None + + def test_named_arg_missing_value(self) -> None: + """Expected value after ':' but got ')'.""" + assert parse_call_arguments(Cursor("x: )", 0)) is None + + def test_named_arg_value_parse_fails(self) -> None: + """Value expression parse fails after ':'.""" + assert parse_call_arguments(Cursor("x: @)", 0)) is None + + def test_named_arg_eof_after_colon(self) -> None: + """EOF after ':' in named argument.""" + assert parse_call_arguments(Cursor("x:", 0)) is None + + def test_positional_after_named(self) -> None: + """Positional args must come before named.""" + assert 
parse_call_arguments(Cursor("x: 1, $var)", 0)) is None + + def test_named_arg_non_literal_value(self) -> None: + """Named argument value must be literal.""" + assert parse_call_arguments( + Cursor("x: $var)", 0) + ) is None + + def test_trailing_comma(self) -> None: + """Trailing comma in argument list.""" + result = parse_call_arguments(Cursor("1, 2, )", 0)) + assert result is not None + assert len(result.value.positional) == 2 + + def test_argument_expression_fails_in_loop(self) -> None: + """Argument expression fails at '@'.""" + assert parse_call_arguments(Cursor("@)", 0)) is None diff --git a/tests/syntax_parser_error_recovery_cases/debug_logging.py b/tests/syntax_parser_error_recovery_cases/debug_logging.py new file mode 100644 index 00000000..cbfbd1d2 --- /dev/null +++ b/tests/syntax_parser_error_recovery_cases/debug_logging.py @@ -0,0 +1,28 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_error_recovery.py.""" + +from tests.syntax_parser_error_recovery_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# DEBUG LOGGING +# ============================================================================ + + +class TestDebugLogging: + """Tests for debug logging coverage (junk creation).""" + + def test_junk_creation_triggers_debug_log(self) -> None: + """Debug logging when creating Junk entries.""" + logging.basicConfig( + level=logging.DEBUG, stream=sys.stderr, force=True + ) + try: + parser = FluentParserV1() + res = parser.parse("invalid { syntax") + assert len(res.entries) >= 1 + except KeyError: + pass + finally: + logging.basicConfig( + level=logging.WARNING, force=True + ) diff --git a/tests/syntax_parser_error_recovery_cases/defensive_mocking_tests_unreachable_code_paths.py b/tests/syntax_parser_error_recovery_cases/defensive_mocking_tests_unreachable_code_paths.py new file mode 100644 index 00000000..69304213 --- /dev/null +++ 
b/tests/syntax_parser_error_recovery_cases/defensive_mocking_tests_unreachable_code_paths.py @@ -0,0 +1,68 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_error_recovery.py.""" + +from tests.syntax_parser_error_recovery_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# DEFENSIVE MOCKING TESTS (UNREACHABLE CODE PATHS) +# ============================================================================ + + +class TestDefensiveMocking: + """Defensive None checks for unreachable code paths. + + These lines are structurally unreachable in normal execution but + exist as guardrails against future refactoring. + """ + + def test_parse_message_attrs_returns_none(self) -> None: + """parse_message_attributes returns None (defensive).""" + with patch( + "ftllexengine.syntax.parser.entries" + ".parse_message_attributes" + ) as mock: + mock.return_value = None + assert parse_message( + Cursor("hello = value", 0) + ) is None + + def test_parse_attribute_pattern_returns_none(self) -> None: + """parse_pattern returns None in parse_attribute (defensive).""" + with patch( + "ftllexengine.syntax.parser.entries.parse_pattern" + ) as mock: + mock.return_value = None + assert parse_attribute( + Cursor(".attr = value", 0) + ) is None + + def test_parse_term_pattern_returns_none(self) -> None: + """parse_pattern returns None in parse_term (defensive).""" + with patch( + "ftllexengine.syntax.parser.entries.parse_pattern" + ) as mock: + mock.return_value = None + assert parse_term( + Cursor("-brand = value", 0) + ) is None + + def test_parse_term_attrs_returns_none_line_2038(self) -> None: + """Line 2038: parse_message_attributes returns None in term.""" + with patch( + "ftllexengine.syntax.parser.entries" + ".parse_message_attributes" + ) as mock: + mock.return_value = None + assert parse_term( + Cursor("-brand = value", 0) + ) is None + + def 
test_parse_message_pattern_returns_none(self) -> None: + """parse_pattern returns None in parse_message (defensive).""" + with patch( + "ftllexengine.syntax.parser.entries.parse_pattern" + ) as mock: + mock.return_value = None + assert parse_message( + Cursor("hello = value", 0) + ) is None diff --git a/tests/syntax_parser_error_recovery_cases/function_reference_error_paths.py b/tests/syntax_parser_error_recovery_cases/function_reference_error_paths.py new file mode 100644 index 00000000..273a1bd1 --- /dev/null +++ b/tests/syntax_parser_error_recovery_cases/function_reference_error_paths.py @@ -0,0 +1,37 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_error_recovery.py.""" + +from tests.syntax_parser_error_recovery_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# FUNCTION REFERENCE ERROR PATHS +# ============================================================================ + + +class TestFunctionReferenceErrorPaths: + """Error paths in parse_function_reference.""" + + def test_identifier_parse_fails(self) -> None: + """Non-identifier character at start.""" + assert parse_function_reference(Cursor("123", 0)) is None + + def test_missing_opening_paren(self) -> None: + """Valid name but no '('.""" + assert parse_function_reference(Cursor("FUNC", 0)) is None + + def test_missing_closing_paren(self) -> None: + """Arguments but no closing ')'.""" + assert parse_function_reference(Cursor("FUNC($x", 0)) is None + + def test_arguments_parse_fails(self) -> None: + """Call arguments fail at '@'.""" + assert parse_function_reference( + Cursor("FUNC(@)", 0) + ) is None + + def test_depth_exceeded(self) -> None: + """Nesting depth exceeded.""" + ctx = ParseContext(max_nesting_depth=1, current_depth=2) + assert parse_function_reference( + Cursor("FUNC($x)", 0), ctx + ) is None diff --git 
a/tests/syntax_parser_error_recovery_cases/inline_expression_and_helper_error_paths.py b/tests/syntax_parser_error_recovery_cases/inline_expression_and_helper_error_paths.py new file mode 100644 index 00000000..1f752105 --- /dev/null +++ b/tests/syntax_parser_error_recovery_cases/inline_expression_and_helper_error_paths.py @@ -0,0 +1,55 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_error_recovery.py.""" + +from tests.syntax_parser_error_recovery_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# INLINE EXPRESSION AND HELPER ERROR PATHS +# ============================================================================ + + +class TestInlineExpressionErrorPaths: + """Error paths in inline expression helpers.""" + + def test_inline_hyphen_all_fail(self) -> None: + """_parse_inline_hyphen: both term and number fail.""" + assert _parse_inline_hyphen(Cursor("-", 0)) is None + + def test_inline_hyphen_term_attr_fails_line_1365(self) -> None: + """Line 1365: Term reference fails (invalid attribute).""" + result = _parse_inline_hyphen(Cursor("-x.123", 0)) + assert result is None + + def test_inline_identifier_function_fails(self) -> None: + """_parse_inline_identifier: function parse fails.""" + assert _parse_inline_identifier( + Cursor("func(@)", 0) + ) is None + + def test_inline_identifier_parse_fails(self) -> None: + """_parse_inline_identifier: parse_identifier fails.""" + assert _parse_inline_identifier(Cursor("123", 0)) is None + + def test_inline_expression_eof(self) -> None: + """parse_inline_expression: EOF returns None.""" + assert parse_inline_expression(Cursor("", 0)) is None + + def test_inline_expression_invalid_char(self) -> None: + """parse_inline_expression: invalid character returns None.""" + assert parse_inline_expression(Cursor("@", 0)) is None + + def test_inline_expression_variable_fails(self) -> None: + """parse_inline_expression: '$' 
but identifier fails.""" + assert parse_inline_expression(Cursor("$", 0)) is None + + def test_inline_expression_nested_placeable_fails(self) -> None: + """parse_inline_expression: nested placeable fails.""" + assert parse_inline_expression(Cursor("{ @ }", 0)) is None + + def test_inline_expression_message_attr_fails(self) -> None: + """Message reference attribute parsing fails (invalid attr).""" + cursor = Cursor("msg.-test", 0) + result = parse_inline_expression(cursor) + assert result is None or ( + result is not None and hasattr(result, "value") + ) diff --git a/tests/syntax_parser_error_recovery_cases/message_reference_with_attributes.py b/tests/syntax_parser_error_recovery_cases/message_reference_with_attributes.py new file mode 100644 index 00000000..4e8cff57 --- /dev/null +++ b/tests/syntax_parser_error_recovery_cases/message_reference_with_attributes.py @@ -0,0 +1,83 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_error_recovery.py.""" + +from tests.syntax_parser_error_recovery_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# MESSAGE REFERENCE WITH ATTRIBUTES +# ============================================================================ + + +class TestMessageReferenceWithAttribute: + """Coverage for lowercase message references with .attribute syntax.""" + + def test_msg_dot_attr_inline(self) -> None: + """Parse { msg.attr } in inline expression.""" + parser = FluentParserV1() + res = parser.parse("key = { msg.attr }") + msg = res.entries[0] + assert isinstance(msg, Message) + assert msg.value is not None + p = msg.value.elements[0] + assert isinstance(p, Placeable) + ref = p.expression + assert isinstance(ref, MessageReference) + assert ref.id.name == "msg" + assert ref.attribute is not None + assert ref.attribute.name == "attr" + + def test_msg_dot_attr_in_attribute_value(self) -> None: + """Parse { msg.help } in message attribute 
value.""" + parser = FluentParserV1() + res = parser.parse( + "key = Value\n .tooltip = { msg.help }\n" + ) + msg = res.entries[0] + assert isinstance(msg, Message) + attr = msg.attributes[0] + assert isinstance(attr, Attribute) + p = attr.value.elements[0] + assert isinstance(p, Placeable) + ref = p.expression + assert isinstance(ref, MessageReference) + assert ref.attribute is not None + assert ref.attribute.name == "help" + + def test_msg_dot_missing_attr_name(self) -> None: + """{ msg. } with missing attribute name.""" + parser = FluentParserV1() + res = parser.parse("key = { msg. }") + assert len(res.entries) >= 1 + + def test_msg_dot_invalid_attr(self) -> None: + """{ msg.@ } with invalid attribute.""" + parser = FluentParserV1() + res = parser.parse("key = { msg.@ }") + assert res is not None + + def test_msg_dot_hash_attr(self) -> None: + """{ msg.# } with invalid attribute.""" + parser = FluentParserV1() + res = parser.parse("key = { msg.# }") + assert len(res.entries) >= 1 + + def test_mixed_identifiers_with_attributes(self) -> None: + """Various identifier cases with attributes.""" + parser = FluentParserV1() + cases = [ + ("key = { foo.bar }", "foo", "bar"), + ("key = { a.b }", "a", "b"), + ("key = { msg123.attr456 }", "msg123", "attr456"), + ] + for source, exp_msg, exp_attr in cases: + res = parser.parse(source) + msg = res.entries[0] + assert isinstance(msg, Message), f"Failed: {source}" + assert msg.value is not None + p = msg.value.elements[0] + assert isinstance(p, Placeable) + ref = p.expression + assert isinstance(ref, MessageReference) + assert ref.id.name == exp_msg + assert ref.attribute is not None + assert ref.attribute.name == exp_attr diff --git a/tests/syntax_parser_error_recovery_cases/parser_integration_malformed_input.py b/tests/syntax_parser_error_recovery_cases/parser_integration_malformed_input.py new file mode 100644 index 00000000..ab782777 --- /dev/null +++ 
b/tests/syntax_parser_error_recovery_cases/parser_integration_malformed_input.py @@ -0,0 +1,248 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_error_recovery.py.""" + +from tests.syntax_parser_error_recovery_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PARSER INTEGRATION - MALFORMED INPUT +# ============================================================================ + + +class TestParserMalformedInput: + """FluentParserV1 integration for error recovery on malformed FTL.""" + + def test_four_hash_comment_recovery(self) -> None: + """Invalid >3 hash comment is recovered as junk.""" + parser = FluentParserV1() + res = parser.parse( + "#### Invalid\nkey = value" + ) + assert any( + hasattr(e, "id") and e.id.name == "key" + for e in res.entries + ) + + def test_multiple_junk_entries(self) -> None: + """Multiple malformed entries create multiple junk entries.""" + parser = FluentParserV1() + res = parser.parse( + "!!!invalid1\n!!!invalid2\nkey = value\n" + ) + assert any( + hasattr(e, "id") and e.id.name == "key" + for e in res.entries + ) + + def test_junk_with_unicode(self) -> None: + """Junk entries with non-ASCII characters.""" + parser = FluentParserV1() + res = parser.parse("¡¡¡ invalid\nkey = value\n") + assert len(res.entries) >= 1 + + def test_empty_variant_key(self) -> None: + """Empty variant key [].""" + parser = FluentParserV1() + res = parser.parse( + "msg = { $c -> [] x *[o] O }\n" + ) + assert len(res.entries) >= 1 + + def test_unclosed_variant_bracket(self) -> None: + """Unclosed variant bracket.""" + parser = FluentParserV1() + res = parser.parse( + "msg = { $c -> [unclosed X *[o] O }\n" + ) + assert len(res.entries) >= 1 + + def test_select_missing_arrow(self) -> None: + """Select expression without '->'.""" + parser = FluentParserV1() + res = parser.parse( + "msg = { $val\n [one] One\n *[other] Other\n}\n" + ) + junk = [e for 
e in res.entries if isinstance(e, Junk)] + assert len(junk) >= 1 + + def test_unclosed_placeable(self) -> None: + """Unclosed placeable creates junk.""" + parser = FluentParserV1() + res = parser.parse("msg = { $value") + assert isinstance(res.entries[0], Junk) + + def test_invalid_variant_syntax(self) -> None: + """Invalid variant syntax (missing '[').""" + parser = FluentParserV1() + res = parser.parse( + "msg = { $c ->\n one] One\n *[other] O\n}\n" + ) + junk = [e for e in res.entries if isinstance(e, Junk)] + assert len(junk) >= 1 + + def test_empty_placeable(self) -> None: + """Empty placeable { }.""" + parser = FluentParserV1() + res = parser.parse("key = { }") + assert res is not None + + def test_standalone_attribute(self) -> None: + """Attribute without Message/Term creates junk.""" + parser = FluentParserV1() + res = parser.parse(" .attr = Value") + assert isinstance(res.entries[0], Junk) + + def test_invalid_term_name(self) -> None: + """Term '-' without valid identifier.""" + parser = FluentParserV1() + res = parser.parse("- = Invalid") + assert len(res.entries) >= 1 + + def test_message_without_equals(self) -> None: + """Message identifier without '=' creates junk.""" + parser = FluentParserV1() + res = parser.parse("test Hello") + assert isinstance(res.entries[0], Junk) + + def test_identifier_starting_with_number(self) -> None: + """Identifier starting with number creates junk.""" + parser = FluentParserV1() + res = parser.parse("123invalid = Value") + assert isinstance(res.entries[0], Junk) + + def test_eof_after_equals(self) -> None: + """EOF after '=' sign.""" + parser = FluentParserV1() + res = parser.parse("msg =") + assert len(res.entries) > 0 + + def test_eof_after_identifier(self) -> None: + """File ends right after message ID.""" + parser = FluentParserV1() + res = parser.parse("msg") + assert len(res.entries) > 0 + + def test_multiple_errors_creates_multiple_junk(self) -> None: + """Multiple errors create junk interleaved with valid 
entries.""" + parser = FluentParserV1() + res = parser.parse( + "invalid1 Missing\nvalid = Good\n" + "invalid2 Also\nanother = OK\n" + ) + assert len(res.entries) == 4 + junk_count = sum( + 1 for e in res.entries if isinstance(e, Junk) + ) + assert junk_count == 2 + + +class TestParserMalformedExpressions: + """FluentParserV1 integration for malformed expressions.""" + + def test_invalid_selector_variable(self) -> None: + """$ followed by invalid character in selector.""" + parser = FluentParserV1() + res = parser.parse( + "msg = { $-invalid -> *[key] Value }" + ) + assert any(isinstance(e, Junk) for e in res.entries) + + def test_unclosed_string_literal_in_selector(self) -> None: + """Unclosed string literal in selector.""" + parser = FluentParserV1() + res = parser.parse( + 'msg = { "unclosed -> *[key] Value }' + ) + assert any(isinstance(e, Junk) for e in res.entries) + + def test_function_no_parens(self) -> None: + """UPPERCASE without parens is MessageReference.""" + parser = FluentParserV1() + res = parser.parse("key = { FUNC }") + msg = res.entries[0] + assert isinstance(msg, Message) + assert msg.value is not None + p = msg.value.elements[0] + assert isinstance(p, Placeable) + assert isinstance(p.expression, MessageReference) + + def test_function_missing_argument(self) -> None: + """Function with incomplete arguments.""" + parser = FluentParserV1() + res = parser.parse("key = { UPPERCASE( }") + assert res is not None + + def test_function_invalid_argument(self) -> None: + """Function with @invalid argument.""" + parser = FluentParserV1() + res = parser.parse("key = { FUNC(@invalid) }") + assert res is not None + + def test_term_ref_invalid_identifier(self) -> None: + """Term reference '-#' with invalid identifier.""" + parser = FluentParserV1() + res = parser.parse("key = { -# }") + assert len(res.entries) >= 1 + + def test_lowercase_function_call(self) -> None: + """Lowercase identifier with () is now valid per spec.""" + parser = FluentParserV1() + res = 
parser.parse("key = { lowercase() }") + assert len(res.entries) >= 1 + + def test_nested_malformed(self) -> None: + """Deeply malformed nested structures.""" + parser = FluentParserV1() + res = parser.parse( + "key1 = { $v -> [a] { FUNC( *[b] X }\nkey2 = ok\n" + ) + assert len(res.entries) >= 1 + + def test_term_reference_arguments_unclosed(self) -> None: + """Term arguments without closing ')'.""" + parser = FluentParserV1() + res = parser.parse("key = { -term(arg ") + assert res is not None + + def test_named_argument_number_as_name(self) -> None: + """Number as named argument name.""" + parser = FluentParserV1() + res = parser.parse('key = { FUNC(123: "value") }') + assert res is not None + + def test_duplicate_named_argument_via_parser(self) -> None: + """Duplicate named argument names via parser.""" + parser = FluentParserV1() + res = parser.parse("key = { FUNC(foo: 1, foo: 2) }") + assert res is not None + + def test_positional_after_named_via_parser(self) -> None: + """Positional after named argument.""" + parser = FluentParserV1() + res = parser.parse("key = { FUNC(name: 1, 2) }") + assert res is not None + + def test_named_arg_missing_value_via_parser(self) -> None: + """Named argument missing value.""" + parser = FluentParserV1() + res = parser.parse("key = { FUNC(name:) }") + assert res is not None + + def test_incomplete_number_at_eof(self) -> None: + """Number literal at EOF without closing brace.""" + parser = FluentParserV1() + res = parser.parse("msg = { 42") + assert len(res.entries) > 0 + + def test_number_multiple_decimal_points(self) -> None: + """Number with multiple decimal points.""" + parser = FluentParserV1() + res = parser.parse("msg = { 1.2.3 }") + assert len(res.entries) >= 1 + + def test_select_with_empty_variant_value(self) -> None: + """Select expression with empty variant value.""" + parser = FluentParserV1() + res = parser.parse( + "test = { $c ->\n [one]\n *[other] O\n}\n" + ) + assert len(res.entries) >= 1 diff --git 
a/tests/syntax_parser_error_recovery_cases/parser_integration_suite.py b/tests/syntax_parser_error_recovery_cases/parser_integration_suite.py new file mode 100644 index 00000000..6e9c150b --- /dev/null +++ b/tests/syntax_parser_error_recovery_cases/parser_integration_suite.py @@ -0,0 +1,105 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_error_recovery.py.""" + +from tests.syntax_parser_error_recovery_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PARSER INTEGRATION SUITE +# ============================================================================ + + +class TestParserIntegration: + """Integration tests combining multiple edge cases.""" + + def test_complex_resource(self) -> None: + """FTL resource exercising multiple edge cases.""" + parser = FluentParserV1() + res = parser.parse( + "# Comment\n" + "msg = Value\n" + " .a = Short attr\n" + "\n" + "-t = Term\n" + "\n" + "select = { $n ->\n" + " [0] Zero\n" + " [1] One\n" + " *[other] Other\n" + "}\n" + "\n" + "func = { FUNC() }\n" + "\n" + "complex = { $a }{ $b } text { UPPER($c) }\n" + ) + assert len(res.entries) >= 5 + + def test_select_with_number_and_identifier_keys(self) -> None: + """Select with both number and identifier variant keys.""" + parser = FluentParserV1() + res = parser.parse( + "msg = { $c ->\n" + " [0] Zero\n" + " [1] One\n" + " [42] Forty-two\n" + " *[other] Other\n" + "}\n" + ) + assert len(res.entries) >= 1 + + def test_select_identifier_keys(self) -> None: + """Select with identifier variant keys.""" + parser = FluentParserV1() + res = parser.parse( + "msg = { $v ->\n" + " [yes] Affirmative\n" + " *[no] Negative\n" + "}\n" + ) + assert len(res.entries) >= 1 + + def test_variant_key_negative_hyphen_not_number(self) -> None: + """Variant key starts with - but isn't a number.""" + parser = FluentParserV1() + res = parser.parse( + "msg = { $s ->\n" + " [-not-a-number] 
Value\n" + " *[default] Default\n" + "}\n" + ) + assert len(res.entries) >= 1 + + def test_term_attribute_selection(self) -> None: + """Select on term attribute.""" + parser = FluentParserV1() + res = parser.parse( + "-term = Term\n" + " .attr = a\n" + "msg = { -term.attr -> *[a] Value }\n" + ) + assert len(res.entries) >= 1 + + def test_term_reference_arguments_via_parser(self) -> None: + """Term reference with arguments.""" + parser = FluentParserV1() + res = parser.parse( + "msg = { -term(case: 'accusative') }" + ) + assert len(res.entries) >= 1 + + def test_pattern_with_only_placeables(self) -> None: + """Pattern with adjacent placeables.""" + parser = FluentParserV1() + res = parser.parse("msg = { $a }{ $b }{ $c }") + assert len(res.entries) > 0 + + def test_function_variations(self) -> None: + """Function with various argument combinations.""" + parser = FluentParserV1() + for src in [ + "m = { FUNC() }", + "m = { FUNC($a, $b, $c) }", + 'm = { FUNC(key: "value", ot: "data") }', + 'm = { FUNC($p1, $p2, named: "value") }', + ]: + res = parser.parse(src) + assert len(res.entries) > 0, f"Failed: {src}" diff --git a/tests/syntax_parser_error_recovery_cases/pattern_continuation_edge_cases.py b/tests/syntax_parser_error_recovery_cases/pattern_continuation_edge_cases.py new file mode 100644 index 00000000..4ec8065f --- /dev/null +++ b/tests/syntax_parser_error_recovery_cases/pattern_continuation_edge_cases.py @@ -0,0 +1,68 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_error_recovery.py.""" + +from tests.syntax_parser_error_recovery_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PATTERN CONTINUATION EDGE CASES +# ============================================================================ + + +class TestPatternContinuationEdgeCases: + """Pattern continuation and text accumulation edge cases.""" + + def 
test_pattern_line_691_placeable_continuation(self) -> None: + """Placeable then continuation creates new text element.""" + result = parse_pattern(Cursor("{$x}\n {$y}", 0)) + assert result is not None + + def test_pattern_continuation_after_placeable(self) -> None: + """Continuation text as new element after placeable.""" + result = parse_pattern( + Cursor("{$var}\n continuation", 0) + ) + assert result is not None + assert len(result.value.elements) >= 2 + + def test_continuation_at_start(self) -> None: + """Continuation at start of pattern.""" + result = parse_pattern(Cursor("\n {$x}", 0)) + assert result is not None + + def test_simple_pattern_continuation_before_placeable(self) -> None: + """text accumulation before placeable in simple pattern.""" + result = parse_simple_pattern( + Cursor("hello\n world{$x}", 0) + ) + assert result is not None + + def test_simple_pattern_continuation_at_end(self) -> None: + """text accumulation finalized at end of simple pattern.""" + result = parse_simple_pattern( + Cursor("hello\n world", 0) + ) + assert result is not None + + def test_pattern_at_eof_no_newline(self) -> None: + """Pattern ends at EOF without newline.""" + parser = FluentParserV1() + res = parser.parse("key = value") + assert len(res.entries) == 1 + + def test_pattern_ending_at_variant_marker(self) -> None: + """Pattern ends at start of variant marker.""" + parser = FluentParserV1() + res = parser.parse("key = text\n [") + assert len(res.entries) >= 1 + + def test_select_with_malformed_arrow_eof(self) -> None: + """Incomplete arrow at EOF.""" + parser = FluentParserV1() + res = parser.parse("key = { $var -") + assert len(res.entries) >= 1 + + def test_function_with_trailing_comma(self) -> None: + """Function call with trailing comma.""" + parser = FluentParserV1() + res = parser.parse("key = { FUNC(a, b,) }") + assert len(res.entries) >= 1 diff --git a/tests/syntax_parser_error_recovery_cases/placeable_error_paths.py 
b/tests/syntax_parser_error_recovery_cases/placeable_error_paths.py new file mode 100644 index 00000000..4ebef61f --- /dev/null +++ b/tests/syntax_parser_error_recovery_cases/placeable_error_paths.py @@ -0,0 +1,49 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_error_recovery.py.""" + +from tests.syntax_parser_error_recovery_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PLACEABLE ERROR PATHS +# ============================================================================ + + +class TestPlaceableErrorPaths: + """Error paths in parse_placeable.""" + + def test_depth_exceeded(self) -> None: + """Nesting depth exceeded returns None.""" + ctx = ParseContext(max_nesting_depth=1, current_depth=2) + assert parse_placeable(Cursor("$var}", 0), ctx) is None + + def test_expression_parse_fails(self) -> None: + """Expression fails at '@'.""" + assert parse_placeable(Cursor("@}", 0)) is None + + def test_select_parse_fails(self) -> None: + """Select expression fails (no variants).""" + assert parse_placeable(Cursor("$var -> }", 0)) is None + + def test_select_missing_closing_brace(self) -> None: + """Select expression without closing }.""" + result = parse_placeable( + Cursor("$var -> [one] 1 *[other] N", 0) + ) + assert result is None + + def test_simple_expression_missing_closing_brace(self) -> None: + """Simple expression without closing }.""" + assert parse_placeable(Cursor("$var", 0)) is None + + def test_valid_selector_with_select_line_1585(self) -> None: + """Line 1585: Valid selector with select expression.""" + result = parse_placeable( + Cursor("$n -> [one] One *[other] Many}", 0) + ) + assert result is not None + + def test_hyphen_not_arrow(self) -> None: + """'-' but not '->' skips to simple close.""" + result = parse_placeable(Cursor("$var - }", 0)) + # Malformed, may return None or partial + assert result is None or result is not None diff 
--git a/tests/syntax_parser_error_recovery_cases/term_reference_error_paths.py b/tests/syntax_parser_error_recovery_cases/term_reference_error_paths.py new file mode 100644 index 00000000..c0bed762 --- /dev/null +++ b/tests/syntax_parser_error_recovery_cases/term_reference_error_paths.py @@ -0,0 +1,62 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_error_recovery.py.""" + +from tests.syntax_parser_error_recovery_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# TERM REFERENCE ERROR PATHS +# ============================================================================ + + +class TestTermReferenceErrorPaths: + """Error paths in parse_term_reference.""" + + def test_missing_hyphen(self) -> None: + """No '-' at start.""" + assert parse_term_reference(Cursor("brand", 0)) is None + + def test_identifier_fails_after_hyphen(self) -> None: + """Identifier parse fails after '-'.""" + assert parse_term_reference(Cursor("-", 0)) is None + + def test_attribute_identifier_fails(self) -> None: + """Attribute identifier parse fails after '.'.""" + assert parse_term_reference(Cursor("-brand.", 0)) is None + + def test_arguments_parse_fails(self) -> None: + """Call arguments fail for term args.""" + assert parse_term_reference( + Cursor("-brand(@)", 0) + ) is None + + def test_arguments_missing_closing_paren_1449(self) -> None: + """Lines 1449-1450: Expected ')' after term arguments.""" + result = parse_term_reference( + Cursor("-brand(case: 'nom'", 0) + ) + assert result is None + + def test_depth_exceeded_with_arguments(self) -> None: + """Depth exceeded when parsing term arguments.""" + ctx = ParseContext(max_nesting_depth=2) + nested = ctx.enter_nesting().enter_nesting() + result = parse_term_reference( + Cursor('-brand(case: "nom")', 0), nested + ) + assert result is None + + def test_without_arguments_at_depth_limit(self) -> None: + """Term ref without args 
succeeds at depth limit.""" + ctx = ParseContext(max_nesting_depth=2) + nested = ctx.enter_nesting().enter_nesting() + result = parse_term_reference(Cursor("-brand", 0), nested) + assert result is not None + assert result.value.id.name == "brand" + + def test_with_arguments_succeeds(self) -> None: + """Term ref with arguments below depth limit.""" + result = parse_term_reference( + Cursor('-term(case: "gen")', 0) + ) + assert result is not None + assert result.value.arguments is not None diff --git a/tests/syntax_parser_error_recovery_cases/variant_key_error_paths.py b/tests/syntax_parser_error_recovery_cases/variant_key_error_paths.py new file mode 100644 index 00000000..19332d8d --- /dev/null +++ b/tests/syntax_parser_error_recovery_cases/variant_key_error_paths.py @@ -0,0 +1,65 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_error_recovery.py.""" + +from tests.syntax_parser_error_recovery_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# VARIANT KEY ERROR PATHS +# ============================================================================ + + +class TestVariantKeyErrorPaths: + """Error paths in parse_variant_key and parse_variant.""" + + def test_negative_sign_both_fail(self) -> None: + """Hyphen: parse_number fails, parse_identifier fails too.""" + cursor = Cursor("-", 0) + result = parse_variant_key(cursor) + assert result is None + + def test_negative_sign_identifier_fallback_via_mock(self) -> None: + """Lines 878-879: Number fails, identifier succeeds (defensive). + + Structurally unreachable without mocking because if cursor starts + with '-', parse_identifier also fails (can't start with '-'). 
+ """ + with ( + patch( + "ftllexengine.syntax.parser.expressions.parse_number" + ) as mock_num, + patch( + "ftllexengine.syntax.parser.expressions.parse_identifier" + ) as mock_id, + ): + mock_num.return_value = ParseError("forced failure", Cursor("-test", 0)) + mock_id.return_value = ParseResult( + "test", Cursor("test", 4) + ) + cursor = Cursor("-test", 0) + result = parse_variant_key(cursor) + assert result is not None + + def test_variant_missing_opening_bracket(self) -> None: + """parse_variant: no '[' at start.""" + assert parse_variant(Cursor("one", 0)) is None + + def test_variant_missing_closing_bracket(self) -> None: + """parse_variant: no ']' after key.""" + assert parse_variant(Cursor("[one", 0)) is None + + def test_variant_invalid_key(self) -> None: + """parse_variant: invalid key character.""" + assert parse_variant(Cursor("[@]", 0)) is None + + def test_select_no_variants(self) -> None: + """parse_select_expression: immediate close, no variants.""" + sel = VariableReference(id=Identifier("count")) + assert parse_select_expression(Cursor("}", 0), sel, 0) is None + + def test_select_no_default_variant(self) -> None: + """parse_select_expression: variants without default.""" + sel = VariableReference(id=Identifier("count")) + result = parse_select_expression( + Cursor("[one] item\n}", 0), sel, 0 + ) + assert result is None diff --git a/tests/syntax_parser_error_recovery_cases/whitespace_and_line_ending_edge_cases.py b/tests/syntax_parser_error_recovery_cases/whitespace_and_line_ending_edge_cases.py new file mode 100644 index 00000000..995fecc2 --- /dev/null +++ b/tests/syntax_parser_error_recovery_cases/whitespace_and_line_ending_edge_cases.py @@ -0,0 +1,68 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_error_recovery.py.""" + +from tests.syntax_parser_error_recovery_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# WHITESPACE AND 
LINE ENDING EDGE CASES +# ============================================================================ + + +class TestWhitespaceAndLineEndings: + """Whitespace, CRLF, and formatting edge cases.""" + + def test_crlf_multiline(self) -> None: + """CRLF (\\r\\n) line endings in multiline pattern.""" + parser = FluentParserV1() + res = parser.parse( + "key =\r\n Line one\r\n Line two\r\n" + ) + msg = res.entries[0] + assert isinstance(msg, Message) + assert msg.value is not None + assert len(msg.value.elements) >= 2 + + def test_mixed_line_endings(self) -> None: + """Mixed \\r\\n and \\n line endings.""" + parser = FluentParserV1() + res = parser.parse( + "k1 = v1\r\nk2 = v2\nk3 = v3" + ) + assert len(res.entries) == 3 + + def test_tabs_in_pattern(self) -> None: + """Tabs in pattern are literal text.""" + parser = FluentParserV1() + res = parser.parse("key = value\twith\ttabs") + assert len(res.entries) == 1 + + def test_multiple_blank_lines(self) -> None: + """Multiple consecutive blank lines between entries.""" + parser = FluentParserV1() + res = parser.parse("k1 = v1\n\n\n\nk2 = v2") + assert len(res.entries) == 2 + + def test_empty_source(self) -> None: + """Empty source produces empty resource.""" + parser = FluentParserV1() + res = parser.parse("") + assert len(res.entries) == 0 + + def test_windows_crlf_entries(self) -> None: + """Windows CRLF between entries.""" + parser = FluentParserV1() + res = parser.parse("test = Hello\r\nworld = World\r\n") + assert len(res.entries) == 2 + + def test_text_with_stop_char_bracket(self) -> None: + """Text stops at '[' bracket.""" + parser = FluentParserV1() + res = parser.parse("key = text[bracket") + msg = res.entries[0] + assert isinstance(msg, Message) + assert msg.value is not None + text_vals = [ + e.value for e in msg.value.elements + if isinstance(e, TextElement) + ] + assert any("text" in v for v in text_vals) diff --git a/tests/syntax_parser_expressions_cases/__init__.py 
b/tests/syntax_parser_expressions_cases/__init__.py new file mode 100644 index 00000000..3131acdb --- /dev/null +++ b/tests/syntax_parser_expressions_cases/__init__.py @@ -0,0 +1,111 @@ +"""Tests for parser expression and placeable handling. + +Tests expression parsing functions: parse_variable_reference, +parse_variant_key, parse_variant, parse_select_expression, +parse_argument_expression, parse_call_arguments, parse_function_reference, +parse_term_reference, parse_inline_expression, parse_placeable, and +associated helpers (_parse_inline_hyphen, _parse_inline_identifier, +_parse_inline_number_literal, _parse_inline_string_literal, +_parse_message_attribute, _is_variant_marker, _is_valid_variant_key_char, +_trim_pattern_blank_lines, validate_message_content). +""" + +from __future__ import annotations + +from typing import cast + +from hypothesis import event, example, given +from hypothesis import strategies as st + +from ftllexengine.runtime.bundle import FluentBundle +from ftllexengine.syntax.ast import ( + Attribute, + Identifier, + Message, + MessageReference, + NumberLiteral, + Pattern, + Placeable, + SelectExpression, + StringLiteral, + TermReference, + TextElement, + VariableReference, + Variant, +) +from ftllexengine.syntax.cursor import Cursor +from ftllexengine.syntax.parser import FluentParserV1 +from ftllexengine.syntax.parser.rules import _MAX_LOOKAHEAD_CHARS as MAX_LOOKAHEAD_CHARS +from ftllexengine.syntax.parser.rules import ( + ParseContext, + _is_valid_variant_key_char, + _is_variant_marker, + _parse_inline_hyphen, + _parse_inline_identifier, + _parse_inline_number_literal, + _parse_inline_string_literal, + _parse_message_attribute, + _trim_pattern_blank_lines, + parse_argument_expression, + parse_call_arguments, + parse_function_reference, + parse_inline_expression, + parse_message, + parse_pattern, + parse_placeable, + parse_select_expression, + parse_simple_pattern, + parse_term_reference, + parse_variable_reference, + parse_variant, + 
parse_variant_key, + validate_message_content, +) + +__all__ = [ + "MAX_LOOKAHEAD_CHARS", + "Attribute", + "Cursor", + "FluentBundle", + "FluentParserV1", + "Identifier", + "Message", + "MessageReference", + "NumberLiteral", + "ParseContext", + "Pattern", + "Placeable", + "SelectExpression", + "StringLiteral", + "TermReference", + "TextElement", + "VariableReference", + "Variant", + "_is_valid_variant_key_char", + "_is_variant_marker", + "_parse_inline_hyphen", + "_parse_inline_identifier", + "_parse_inline_number_literal", + "_parse_inline_string_literal", + "_parse_message_attribute", + "_trim_pattern_blank_lines", + "cast", + "event", + "example", + "given", + "parse_argument_expression", + "parse_call_arguments", + "parse_function_reference", + "parse_inline_expression", + "parse_message", + "parse_pattern", + "parse_placeable", + "parse_select_expression", + "parse_simple_pattern", + "parse_term_reference", + "parse_variable_reference", + "parse_variant", + "parse_variant_key", + "st", + "validate_message_content", +] diff --git a/tests/syntax_parser_expressions_cases/argument_expression_call_arguments.py b/tests/syntax_parser_expressions_cases/argument_expression_call_arguments.py new file mode 100644 index 00000000..85cc1ba9 --- /dev/null +++ b/tests/syntax_parser_expressions_cases/argument_expression_call_arguments.py @@ -0,0 +1,156 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_expressions.py.""" + +from tests.syntax_parser_expressions_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# ARGUMENT EXPRESSION & CALL ARGUMENTS +# ============================================================================ + + +class TestParseArgumentExpression: + """Tests for parse_argument_expression dispatch paths.""" + + def test_eof_returns_none(self) -> None: + """EOF returns None.""" + assert parse_argument_expression(Cursor("", 0)) is None + + def 
test_string_literal(self) -> None: + """Parses string literal argument.""" + result = parse_argument_expression(Cursor('"text"', 0)) + assert result is not None + assert isinstance(result.value, StringLiteral) + + def test_negative_number(self) -> None: + """Parses negative number argument.""" + result = parse_argument_expression(Cursor("-123", 0)) + assert result is not None + assert isinstance(result.value, NumberLiteral) + + def test_term_reference(self) -> None: + """Parses term reference (-brand) argument.""" + result = parse_argument_expression(Cursor("-brand", 0)) + assert result is not None + assert isinstance(result.value, TermReference) + + def test_positive_number(self) -> None: + """Parses positive number argument.""" + result = parse_argument_expression(Cursor("42", 0)) + assert result is not None + assert isinstance(result.value, NumberLiteral) + + def test_inline_placeable(self) -> None: + """Parses inline placeable { expr } argument.""" + result = parse_argument_expression(Cursor("{ $var }", 0)) + assert result is not None + assert isinstance(result.value, Placeable) + + def test_message_reference_no_paren(self) -> None: + """Identifier without '(' parsed as MessageReference.""" + result = parse_argument_expression(Cursor("msg:", 0)) + assert result is not None + assert isinstance(result.value, MessageReference) + + def test_invalid_char_returns_none(self) -> None: + """Invalid character returns None.""" + assert parse_argument_expression(Cursor("@", 0)) is None + + def test_variable_reference_fails(self) -> None: + """'$' alone fails variable reference.""" + assert parse_argument_expression(Cursor("$", 0)) is None + + def test_string_literal_fails(self) -> None: + """Unclosed quote fails string literal.""" + assert parse_argument_expression(Cursor('"', 0)) is None + + def test_term_reference_fails(self) -> None: + """'-' alone fails term reference.""" + assert parse_argument_expression(Cursor("-", 0)) is None + + def 
test_negative_number_invalid(self) -> None: + """'-x' fails both term reference and number parse.""" + result = parse_argument_expression(Cursor("-x", 0)) + assert result is None or result is not None + + def test_placeable_fails(self) -> None: + """Invalid placeable content fails.""" + assert parse_argument_expression( + Cursor("{ @ }", 0) + ) is None + + def test_identifier_fails(self) -> None: + """Non-identifier start character fails.""" + assert parse_argument_expression(Cursor("@)", 0)) is None + + def test_function_reference_fails(self) -> None: + """Function reference with invalid args fails.""" + assert parse_argument_expression( + Cursor("FUNC(@)", 0) + ) is None + + def test_term_ref_fails_hyphen_only(self) -> None: + """Hyphen alone in argument position.""" + assert parse_argument_expression(Cursor("-)", 0)) is None + + def test_number_after_digit(self) -> None: + """Digit start parses as number.""" + result = parse_argument_expression(Cursor("0)", 0)) + assert result is not None + + def test_function_ref_fails_lower(self) -> None: + """Lowercase identifier with paren fails function ref.""" + result = parse_argument_expression(Cursor("func (", 0)) + assert result is None + + +class TestParseCallArguments: + """Tests for parse_call_arguments error paths.""" + + def test_named_arg_not_identifier(self) -> None: + """Named argument name must be identifier.""" + result = parse_call_arguments(Cursor('$var: "value")', 0)) + assert result is None + + def test_duplicate_named_argument(self) -> None: + """Duplicate named argument names fail.""" + assert parse_call_arguments( + Cursor("x: 1, x: 2)", 0) + ) is None + + def test_named_arg_missing_value(self) -> None: + """Expected value after ':'.""" + assert parse_call_arguments( + Cursor("x: )", 0) + ) is None + + def test_named_arg_value_parse_fails(self) -> None: + """Value expression parse fails.""" + assert parse_call_arguments( + Cursor("x: @)", 0) + ) is None + + def test_named_arg_non_literal_value(self) -> 
None: + """Named argument value must be literal.""" + assert parse_call_arguments( + Cursor("x: $var)", 0) + ) is None + + def test_positional_after_named_error(self) -> None: + """Positional args must come before named.""" + assert parse_call_arguments( + Cursor("x: 1, $var)", 0) + ) is None + + def test_trailing_comma(self) -> None: + """Trailing comma handled gracefully.""" + result = parse_call_arguments(Cursor("1, 2, )", 0)) + assert result is not None + assert len(result.value.positional) == 2 + + def test_argument_expression_fails(self) -> None: + """Argument expression parse fails.""" + assert parse_call_arguments(Cursor("@)", 0)) is None + + def test_named_arg_eof_after_colon(self) -> None: + """EOF after ':' in named argument.""" + assert parse_call_arguments(Cursor("x:", 0)) is None diff --git a/tests/syntax_parser_expressions_cases/function_reference.py b/tests/syntax_parser_expressions_cases/function_reference.py new file mode 100644 index 00000000..43f5ad72 --- /dev/null +++ b/tests/syntax_parser_expressions_cases/function_reference.py @@ -0,0 +1,68 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_expressions.py.""" + +from tests.syntax_parser_expressions_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# FUNCTION REFERENCE +# ============================================================================ + + +class TestParseFunctionReference: + """Tests for parse_function_reference paths.""" + + def test_valid_function(self) -> None: + """Valid function reference parses successfully.""" + result = parse_function_reference(Cursor("NUMBER(42)", 0)) + assert result is not None + + def test_function_with_named_args(self) -> None: + """Function with named arguments parses.""" + result = parse_function_reference( + Cursor('NUMBER(42, style: "percent")', 0) + ) + assert result is not None + + def test_missing_opening_paren(self) -> None: + 
"""Returns None when '(' is missing.""" + assert parse_function_reference(Cursor("FUNC", 0)) is None + + def test_missing_closing_paren(self) -> None: + """Returns None when ')' is missing.""" + assert parse_function_reference( + Cursor("FUNC($x", 0) + ) is None + + def test_no_identifier(self) -> None: + """Returns None when identifier is missing.""" + assert parse_function_reference(Cursor(" ", 0)) is None + + def test_non_identifier_start(self) -> None: + """Returns None for non-identifier start.""" + assert parse_function_reference(Cursor("123", 0)) is None + + def test_depth_exceeded(self) -> None: + """Returns None when nesting depth exceeded.""" + context = ParseContext(max_nesting_depth=1, current_depth=2) + result = parse_function_reference( + Cursor("FUNC($x)", 0), context + ) + assert result is None + + def test_arguments_parse_fails(self) -> None: + """Returns None when call arguments fail.""" + assert parse_function_reference( + Cursor("FUNC(@)", 0) + ) is None + + def test_no_closing_paren_after_args(self) -> None: + """Function with incomplete arguments.""" + assert parse_function_reference( + Cursor("NUMBER(", 0) + ) is None + + def test_invalid_arg_syntax(self) -> None: + """Function with invalid argument syntax.""" + assert parse_function_reference( + Cursor("FUNC(,,,)", 0) + ) is None diff --git a/tests/syntax_parser_expressions_cases/inline_expression.py b/tests/syntax_parser_expressions_cases/inline_expression.py new file mode 100644 index 00000000..ecd15ffd --- /dev/null +++ b/tests/syntax_parser_expressions_cases/inline_expression.py @@ -0,0 +1,70 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_expressions.py.""" + +from tests.syntax_parser_expressions_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# INLINE EXPRESSION +# ============================================================================ + + +class 
TestParseInlineExpression: + """Tests for parse_inline_expression dispatch.""" + + def test_eof_returns_none(self) -> None: + """EOF returns None.""" + assert parse_inline_expression(Cursor("", 0)) is None + + def test_variable_reference(self) -> None: + """'$' dispatches to variable reference.""" + result = parse_inline_expression(Cursor("$var", 0)) + assert result is not None + assert isinstance(result.value, VariableReference) + + def test_variable_reference_fails(self) -> None: + """'$' alone fails.""" + assert parse_inline_expression(Cursor("$", 0)) is None + + def test_string_literal(self) -> None: + """Quote dispatches to string literal.""" + result = parse_inline_expression(Cursor('"text"', 0)) + assert result is not None + assert isinstance(result.value, StringLiteral) + + def test_hyphen_dispatch(self) -> None: + """'-' dispatches to hyphen handler.""" + result = parse_inline_expression(Cursor("-brand", 0)) + assert result is not None + + def test_nested_placeable(self) -> None: + """'{' dispatches to nested placeable.""" + result = parse_inline_expression(Cursor("{ $var }", 0)) + assert result is not None + assert isinstance(result.value, Placeable) + + def test_nested_placeable_fails(self) -> None: + """Invalid nested placeable fails.""" + assert parse_inline_expression( + Cursor("{ @ }", 0) + ) is None + + def test_digit_dispatch(self) -> None: + """Digit dispatches to number literal.""" + result = parse_inline_expression(Cursor("42", 0)) + assert result is not None + assert isinstance(result.value, NumberLiteral) + + def test_identifier_dispatch(self) -> None: + """Identifier dispatches to message reference.""" + result = parse_inline_expression(Cursor("msg", 0)) + assert result is not None + assert isinstance(result.value, MessageReference) + + def test_invalid_char_returns_none(self) -> None: + """Invalid character returns None.""" + assert parse_inline_expression(Cursor("@", 0)) is None + + def test_inline_expression_past_eof(self) -> None: + 
"""Cursor past content returns None.""" + result = parse_inline_expression(Cursor("$", 1)) + assert result is None diff --git a/tests/syntax_parser_expressions_cases/inline_expression_helpers.py b/tests/syntax_parser_expressions_cases/inline_expression_helpers.py new file mode 100644 index 00000000..4ed567d8 --- /dev/null +++ b/tests/syntax_parser_expressions_cases/inline_expression_helpers.py @@ -0,0 +1,93 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_expressions.py.""" + +from tests.syntax_parser_expressions_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# INLINE EXPRESSION HELPERS +# ============================================================================ + + +class TestInlineExpressionHelpers: + """Tests for inline expression helper functions.""" + + def test_inline_string_literal(self) -> None: + """String literal inline expression.""" + result = _parse_inline_string_literal(Cursor('"text"', 0)) + assert result is not None + assert isinstance(result.value, StringLiteral) + + def test_inline_string_literal_fails(self) -> None: + """Unclosed string literal returns None.""" + assert _parse_inline_string_literal(Cursor('"', 0)) is None + + def test_inline_number_literal(self) -> None: + """Number literal inline expression.""" + result = _parse_inline_number_literal(Cursor("42", 0)) + assert result is not None + assert isinstance(result.value, NumberLiteral) + + def test_inline_number_single_digit(self) -> None: + """Single digit number parses.""" + result = _parse_inline_number_literal(Cursor("1", 0)) + assert result is not None + + def test_inline_hyphen_term(self) -> None: + """Hyphen-prefixed term reference.""" + result = _parse_inline_hyphen(Cursor("-brand", 0)) + assert result is not None + assert isinstance(result.value, TermReference) + + def test_inline_hyphen_number(self) -> None: + """Hyphen-prefixed negative number.""" + result 
= _parse_inline_hyphen(Cursor("-123", 0)) + assert result is not None + assert isinstance(result.value, NumberLiteral) + + def test_inline_hyphen_fails(self) -> None: + """Hyphen alone returns None.""" + assert _parse_inline_hyphen(Cursor("-", 0)) is None + + def test_message_attribute_with_dot(self) -> None: + """Parse .attribute suffix.""" + attr, _ = _parse_message_attribute(Cursor(".attr", 0)) + assert attr is not None + assert isinstance(attr, Identifier) + + def test_message_attribute_no_dot(self) -> None: + """No dot returns None.""" + attr, _ = _parse_message_attribute(Cursor("x", 0)) + assert attr is None + + def test_message_attribute_identifier_fails(self) -> None: + """Dot followed by non-identifier returns None.""" + attr, _ = _parse_message_attribute(Cursor(".123", 0)) + assert attr is None + + def test_inline_identifier_function_call(self) -> None: + """Identifier followed by '(' is function call.""" + result = _parse_inline_identifier(Cursor("FUNC($x)", 0)) + assert result is not None + + def test_inline_identifier_message_ref(self) -> None: + """Identifier as message reference.""" + result = _parse_inline_identifier(Cursor("msg", 0)) + assert result is not None + assert isinstance(result.value, MessageReference) + + def test_inline_identifier_with_attribute(self) -> None: + """Message reference with attribute.""" + result = _parse_inline_identifier(Cursor("msg.attr", 0)) + assert result is not None + assert isinstance(result.value, MessageReference) + assert result.value.attribute is not None + + def test_inline_identifier_non_ident_start(self) -> None: + """Non-identifier start returns None.""" + assert _parse_inline_identifier(Cursor("123", 0)) is None + + def test_inline_identifier_function_fails(self) -> None: + """Lowercase function with invalid args fails.""" + assert _parse_inline_identifier( + Cursor("func(@)", 0) + ) is None diff --git a/tests/syntax_parser_expressions_cases/integration_via_fluentbundle.py 
b/tests/syntax_parser_expressions_cases/integration_via_fluentbundle.py new file mode 100644 index 00000000..4193d03e --- /dev/null +++ b/tests/syntax_parser_expressions_cases/integration_via_fluentbundle.py @@ -0,0 +1,152 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_expressions.py.""" + +from tests.syntax_parser_expressions_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# INTEGRATION VIA FLUENTBUNDLE +# ============================================================================ + + +class TestExpressionsIntegration: + """Integration tests via FluentBundle for expression paths.""" + + def test_function_name_not_uppercase(self) -> None: + """Lowercase function name fails, soft recovery.""" + bundle = FluentBundle("en_US", strict=False) + bundle.add_resource("msg = { lowercase() }") + result, errors = bundle.format_pattern("msg") + assert len(errors) > 0 or "{" in result + + def test_function_missing_paren(self) -> None: + """UPPERCASE without paren treated as message reference, soft recovery.""" + bundle = FluentBundle("en_US", strict=False) + bundle.add_resource("msg = { NUMBER }") + result, errors = bundle.format_pattern("msg") + assert "{NUMBER}" in result or len(errors) > 0 + + def test_string_literal_selector(self) -> None: + """String literal as selector in select expression.""" + bundle = FluentBundle("en_US") + bundle.add_resource( + 'msg = {"test" ->\n' + " [test] Matched\n" + " *[other] Other\n" + "}" + ) + result, _ = bundle.format_pattern("msg") + assert "Matched" in result or "test" in result + + def test_number_literal_selector(self) -> None: + """Number literal as selector.""" + bundle = FluentBundle("en_US") + bundle.add_resource( + "msg = {42 ->\n" + " [42] Exact match\n" + " *[other] Other\n" + "}" + ) + result, _ = bundle.format_pattern("msg") + assert result is not None + + def test_nested_selects(self) -> None: + """Nested 
select expressions.""" + bundle = FluentBundle("en_US") + bundle.add_resource( + "msg = {NUMBER(1) ->\n" + " [one] {NUMBER(2) ->\n" + " [one] One-One\n" + " *[other] One-Other\n" + " }\n" + " *[other] Other\n" + "}" + ) + result, _ = bundle.format_pattern("msg") + assert result is not None + + def test_function_with_multiple_args(self) -> None: + """Function call with multiple named arguments, soft recovery.""" + bundle = FluentBundle("en_US", strict=False) + bundle.add_resource( + 'msg = {NUMBER(42, style: "percent")}' + ) + result, _ = bundle.format_pattern("msg") + assert result is not None + + def test_attribute_access(self) -> None: + """Message attribute reference in placeable.""" + bundle = FluentBundle("en_US") + bundle.add_resource( + "base = Base\n" + " .attr = Attribute\n\n" + "msg = {base.attr}" + ) + result, _ = bundle.format_pattern("msg") + assert "Attribute" in result + + def test_term_attribute_selector(self) -> None: + """Term attribute as selector.""" + bundle = FluentBundle("en_US") + bundle.add_resource( + "-brand = Firefox\n" + " .version = 1\n\n" + "msg = {-brand.version ->\n" + " [1] Version One\n" + " *[other] Other Version\n" + "}" + ) + result, _ = bundle.format_pattern("msg") + assert result is not None + + def test_deeply_nested_expressions(self) -> None: + """Deep nesting of expressions.""" + bundle = FluentBundle("en_US") + bundle.add_resource( + "msg = {NUMBER(1) ->\n" + " [one] {NUMBER(2) ->\n" + " [one] {NUMBER(3) ->\n" + " [one] Deep\n" + " *[other] Level3\n" + " }\n" + " *[other] Level2\n" + " }\n" + " *[other] Level1\n" + "}" + ) + result, _ = bundle.format_pattern("msg") + assert result is not None + + def test_select_missing_arrow(self) -> None: + """Select expression without -> operator, soft recovery.""" + bundle = FluentBundle("en_US", strict=False) + bundle.add_resource( + "msg = {NUMBER(1)\n" + " [one] One\n" + " *[other] Other\n" + "}" + ) + result, _errors = bundle.format_pattern("msg") + assert result is not None + + 
def test_select_missing_default_via_bundle(self) -> None: + """Select without default variant via bundle, soft recovery.""" + bundle = FluentBundle("en_US", strict=False) + bundle.add_resource( + "msg = {NUMBER(1) ->\n" + " [one] One\n" + " [two] Two\n" + "}" + ) + result, _errors = bundle.format_pattern("msg") + assert result is not None + + def test_unicode_expression(self) -> None: + """Unicode characters in expressions.""" + bundle = FluentBundle("en_US") + bundle.add_resource( + 'msg = {"Hello \\u4E16\\u754C" ->\n' + " *[other] Unicode test\n" + "}" + ) + result, _ = bundle.format_pattern("msg") + assert result is not None diff --git a/tests/syntax_parser_expressions_cases/line_targeted_coverage_parse_simple_pattern_parse_pattern.py b/tests/syntax_parser_expressions_cases/line_targeted_coverage_parse_simple_pattern_parse_pattern.py new file mode 100644 index 00000000..9795be78 --- /dev/null +++ b/tests/syntax_parser_expressions_cases/line_targeted_coverage_parse_simple_pattern_parse_pattern.py @@ -0,0 +1,90 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_expressions.py.""" + +from tests.syntax_parser_expressions_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# LINE-TARGETED COVERAGE (parse_simple_pattern / parse_pattern) +# ============================================================================ + + +class TestSimplePatternLineCoverage: + """Targeted line coverage for parse_simple_pattern.""" + + def test_accumulated_text_before_placeable_prepend(self) -> None: + """Accumulated text merged with last element before placeable.""" + result = parse_simple_pattern( + Cursor("First\n continued{$var}", 0) + ) + assert result is not None + + def test_accumulated_text_before_placeable_new(self) -> None: + """Accumulated text as new element before placeable.""" + result = parse_simple_pattern( + Cursor("\n start{$var}", 0) + ) + assert result is 
not None + + def test_finalize_accumulated_merged(self) -> None: + """Finalize accumulated text merged with existing element.""" + result = parse_simple_pattern( + Cursor("Text\n more continuation", 0) + ) + assert result is not None + + def test_finalize_accumulated_new_element(self) -> None: + """Finalize accumulated text as new element.""" + result = parse_simple_pattern( + Cursor("{$var}\n ending text", 0) + ) + assert result is not None + + def test_variant_continuation_extra_spaces(self) -> None: + """Variant value with extra indent before placeable.""" + source = ( + "msg = {$count ->\n" + " [one] Items:\n" + " {$count}\n" + " *[other] Items\n" + "}" + ) + result = parse_message(Cursor(source, 0), ParseContext()) + assert result is not None + assert isinstance(result.value, Message) + + def test_variant_trailing_accumulated_spaces(self) -> None: + """Variant ending with accumulated extra spaces.""" + source = ( + "msg = {$count ->\n" + " [one] Items\n\n" + " *[other] More\n" + "}" + ) + result = parse_message(Cursor(source, 0), ParseContext()) + assert result is not None + assert isinstance(result.value, Message) + + +class TestPatternLineCoverage: + """Targeted line coverage for parse_pattern.""" + + def test_accumulated_as_new_element(self) -> None: + """Accumulated continuation becomes new element.""" + result = parse_pattern( + Cursor("{$x}\n text after placeable", 0) + ) + assert result is not None + + def test_finalize_merged(self) -> None: + """Finalize merged with existing element.""" + result = parse_pattern( + Cursor("Text\n final continuation", 0) + ) + assert result is not None + + def test_finalize_new_element(self) -> None: + """Finalize as new element.""" + result = parse_pattern( + Cursor("{$x}\n final", 0) + ) + assert result is not None diff --git a/tests/syntax_parser_expressions_cases/message_content_validation.py b/tests/syntax_parser_expressions_cases/message_content_validation.py new file mode 100644 index 00000000..4b7c4769 --- 
/dev/null +++ b/tests/syntax_parser_expressions_cases/message_content_validation.py @@ -0,0 +1,36 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_expressions.py.""" + +from tests.syntax_parser_expressions_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# VALIDATE MESSAGE CONTENT +# ============================================================================ + + +class TestValidateMessageContent: + """Tests for validate_message_content.""" + + def test_empty_pattern_with_attributes_valid(self) -> None: + """No pattern but with attributes is valid.""" + pattern = Pattern(elements=()) + attributes = [ + Attribute( + id=Identifier("attr"), + value=Pattern( + elements=(TextElement("val"),) + ), + ) + ] + assert validate_message_content(pattern, attributes) + + def test_pattern_no_attributes_valid(self) -> None: + """Pattern with no attributes is valid.""" + pattern = Pattern(elements=(TextElement("value"),)) + assert validate_message_content(pattern, []) + + def test_no_pattern_no_attributes_invalid(self) -> None: + """Neither pattern nor attributes is invalid.""" + assert not validate_message_content( + Pattern(elements=()), [] + ) diff --git a/tests/syntax_parser_expressions_cases/parse_context.py b/tests/syntax_parser_expressions_cases/parse_context.py new file mode 100644 index 00000000..f968d5ad --- /dev/null +++ b/tests/syntax_parser_expressions_cases/parse_context.py @@ -0,0 +1,21 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_expressions.py.""" + +from tests.syntax_parser_expressions_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PARSE CONTEXT +# ============================================================================ + + +class TestParseContextDepthExceeded: + """Tests for ParseContext._depth_exceeded_flag edge 
case.""" + + def test_mark_depth_exceeded_with_none_flag(self) -> None: + """Handle _depth_exceeded_flag being None gracefully.""" + context = object.__new__(ParseContext) + object.__setattr__(context, "max_nesting_depth", 5) + object.__setattr__(context, "current_depth", 0) + object.__setattr__(context, "_depth_exceeded_flag", None) + context.mark_depth_exceeded() + assert context._depth_exceeded_flag is None diff --git a/tests/syntax_parser_expressions_cases/parser_rules_branch_coverage.py b/tests/syntax_parser_expressions_cases/parser_rules_branch_coverage.py new file mode 100644 index 00000000..3a60835b --- /dev/null +++ b/tests/syntax_parser_expressions_cases/parser_rules_branch_coverage.py @@ -0,0 +1,69 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_expressions.py.""" + +from tests.syntax_parser_expressions_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PARSER/RULES BRANCH COVERAGE +# ============================================================================ + + +class TestParserRulesCoverage: + """Test parser/rules.py coverage gaps for function arguments.""" + + def test_placeable_as_function_argument(self) -> None: + """Placeable inside function call arguments parses successfully.""" + parser = FluentParserV1() + ftl = 'msg = { NUMBER({ "5" }) }' + + resource = parser.parse(ftl) + + assert len(resource.entries) == 1 + msg = resource.entries[0] + assert isinstance(msg, Message) + assert msg.value is not None + + def test_function_reference_as_argument(self) -> None: + """Function reference inside function arguments parses without crash.""" + parser = FluentParserV1() + ftl = "msg = { NUMBER(UPPER($val)) }" + + resource = parser.parse(ftl) + + assert len(resource.entries) >= 1 + + def test_uppercase_identifier_not_function(self) -> None: + """Uppercase identifier without parentheses is treated as message reference.""" + parser = 
FluentParserV1() + ftl = "msg = { THIS }" + + resource = parser.parse(ftl) + + assert len(resource.entries) == 1 + msg = resource.entries[0] + assert isinstance(msg, Message) + + +class TestParserRulesBranchCoverage: + """Additional tests for parser/rules branch coverage.""" + + def test_parse_complex_select_with_functions(self) -> None: + """Complex select expression with function calls in variants parses correctly.""" + parser = FluentParserV1() + ftl = """ +complex = { $gender -> + [male] Mr. { $lastName } + [female] Ms. { $lastName } + *[other] { $firstName } { $lastName } +} +""" + resource = parser.parse(ftl) + assert len(resource.entries) == 1 + + def test_parse_nested_function_calls(self) -> None: + """NUMBER with string literal argument parses correctly.""" + parser = FluentParserV1() + ftl = 'msg = { NUMBER("123.45") }' + + resource = parser.parse(ftl) + assert len(resource.entries) == 1 diff --git a/tests/syntax_parser_expressions_cases/placeable.py b/tests/syntax_parser_expressions_cases/placeable.py new file mode 100644 index 00000000..2d248965 --- /dev/null +++ b/tests/syntax_parser_expressions_cases/placeable.py @@ -0,0 +1,83 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_expressions.py.""" + +from tests.syntax_parser_expressions_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PLACEABLE +# ============================================================================ + + +class TestParsePlaceable: + """Tests for parse_placeable paths.""" + + def test_simple_variable(self) -> None: + """Parses simple variable placeable.""" + result = parse_placeable(Cursor("$var}", 0)) + assert result is not None + assert isinstance(result.value.expression, VariableReference) + + def test_depth_exceeded(self) -> None: + """Returns None when nesting depth exceeded.""" + context = ParseContext(max_nesting_depth=1, current_depth=2) + assert 
parse_placeable( + Cursor("$var}", 0), context + ) is None + + def test_expression_fails(self) -> None: + """Invalid expression content returns None.""" + assert parse_placeable(Cursor("@}", 0)) is None + + def test_whitespace_only(self) -> None: + """Only whitespace inside braces returns None.""" + assert parse_placeable(Cursor(" }", 1)) is None + + def test_empty_content(self) -> None: + """Empty content returns None.""" + assert parse_placeable(Cursor("}", 0)) is None + + def test_select_valid_selector(self) -> None: + """Select expression with valid selector.""" + result = parse_placeable( + Cursor("$x -> [one] 1 *[other] N}", 0) + ) + assert result is not None + + def test_select_expression_fails(self) -> None: + """Select expression parse fails (no variants).""" + assert parse_placeable(Cursor("$var -> }", 0)) is None + + def test_select_missing_closing_brace(self) -> None: + """Missing '}' after select expression.""" + assert parse_placeable( + Cursor("$var -> [one] 1 *[other] N", 0) + ) is None + + def test_simple_expression_missing_brace(self) -> None: + """Missing '}' after simple expression.""" + assert parse_placeable(Cursor("$var", 0)) is None + + def test_function_followed_by_hyphen(self) -> None: + """Function selector with hyphen (not ->) returns None.""" + assert parse_placeable( + Cursor("NUMBER(42)-}", 0) + ) is None + + def test_function_followed_by_hyphen_eof(self) -> None: + """Function selector with hyphen at EOF returns None.""" + assert parse_placeable( + Cursor("NUMBER(42)-", 0) + ) is None + + def test_message_ref_with_hyphen_in_name(self) -> None: + """Message ref with hyphen in identifier name.""" + result = parse_placeable(Cursor("msg-}", 0)) + assert result is not None + + def test_nested_opening_braces(self) -> None: + """Multiple nested opening braces fail.""" + assert parse_placeable(Cursor("{{{", 1)) is None + + def test_incomplete_expression(self) -> None: + """Incomplete expression returns None.""" + assert 
parse_placeable(Cursor("NUMBER", 0)) is None diff --git a/tests/syntax_parser_expressions_cases/term_reference.py b/tests/syntax_parser_expressions_cases/term_reference.py new file mode 100644 index 00000000..1adf84f3 --- /dev/null +++ b/tests/syntax_parser_expressions_cases/term_reference.py @@ -0,0 +1,74 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_expressions.py.""" + +from tests.syntax_parser_expressions_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# TERM REFERENCE +# ============================================================================ + + +class TestParseTermReference: + """Tests for parse_term_reference paths.""" + + def test_valid_term(self) -> None: + """Valid term reference parses.""" + result = parse_term_reference(Cursor("-brand", 0)) + assert result is not None + assert result.value.id.name == "brand" + + def test_term_with_attribute(self) -> None: + """Term with .attribute access.""" + result = parse_term_reference(Cursor("-brand.short", 0)) + assert result is not None + assert result.value.attribute is not None + + def test_missing_hyphen(self) -> None: + """Returns None without '-' prefix.""" + assert parse_term_reference(Cursor("brand", 0)) is None + + def test_no_identifier_after_hyphen(self) -> None: + """Returns None when identifier missing after '-'.""" + assert parse_term_reference(Cursor("-", 0)) is None + + def test_no_identifier_with_spaces(self) -> None: + """Returns None with spaces after '-'.""" + assert parse_term_reference(Cursor("- ", 0)) is None + + def test_attribute_parse_fails(self) -> None: + """Dot without attribute name returns None.""" + assert parse_term_reference(Cursor("-term.", 0)) is None + + def test_attribute_with_spaces_fails(self) -> None: + """Dot followed by whitespace returns None.""" + assert parse_term_reference( + Cursor("-brand. 
", 0) + ) is None + + def test_arguments_parse_fails(self) -> None: + """Invalid arguments return None.""" + assert parse_term_reference( + Cursor("-brand(@)", 0) + ) is None + + def test_arguments_missing_closing_paren(self) -> None: + """Missing ')' after term arguments.""" + assert parse_term_reference( + Cursor("-brand(case: 'nom'", 0) + ) is None + + def test_missing_closing_paren_no_args(self) -> None: + """Missing ')' after open paren.""" + assert parse_term_reference(Cursor("-brand(", 0)) is None + + def test_depth_exceeded(self) -> None: + """Returns None when nesting depth exceeded.""" + context = ParseContext(max_nesting_depth=1, current_depth=2) + result = parse_term_reference( + Cursor("-brand(case: 'nom')", 0), context + ) + assert result is None + + def test_attribute_identifier_parse_fails(self) -> None: + """Attribute identifier parse fails after dot.""" + assert parse_term_reference(Cursor("-brand.", 0)) is None diff --git a/tests/syntax_parser_expressions_cases/variable_reference.py b/tests/syntax_parser_expressions_cases/variable_reference.py new file mode 100644 index 00000000..13252bf5 --- /dev/null +++ b/tests/syntax_parser_expressions_cases/variable_reference.py @@ -0,0 +1,60 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_expressions.py.""" + +from tests.syntax_parser_expressions_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# VARIABLE REFERENCE +# ============================================================================ + + +class TestParseVariableReference: + """Tests for parse_variable_reference error and success paths.""" + + def test_no_dollar_sign(self) -> None: + """Returns None without '$' prefix.""" + assert parse_variable_reference(Cursor("name", 0)) is None + + def test_at_eof(self) -> None: + """Returns None at EOF.""" + assert parse_variable_reference(Cursor("", 0)) is None + + def test_dollar_only(self) 
-> None: + """Returns None with just '$' (no identifier).""" + assert parse_variable_reference(Cursor("$ ", 0)) is None + + def test_dollar_followed_by_digit(self) -> None: + """Returns None with '$' followed by digit.""" + assert parse_variable_reference(Cursor("$123", 0)) is None + + def test_valid_variable_reference(self) -> None: + """Parses valid '$name' as VariableReference.""" + result = parse_variable_reference(Cursor("$var", 0)) + assert result is not None + assert isinstance(result.value, VariableReference) + assert result.value.id.name == "var" + + @given(st.text(min_size=1).filter(lambda t: not t.startswith("$"))) + @example("") + @example("x") + def test_no_dollar_prefix_property(self, text: str) -> None: + """Non-$ prefixed text always returns None.""" + event(f"first_char={repr(text[:1]) if text else 'eof'}") + cursor = Cursor(text, 0) + result = parse_variable_reference(cursor) + assert result is None + + @given(st.text(max_size=0)) + @example("$") + @example("$123") + @example("$ ") + def test_dollar_without_valid_identifier_property( + self, suffix: str + ) -> None: + """'$' plus invalid identifier always returns None.""" + event(f"suffix_len={len(suffix)}") + text = "$" + suffix + cursor = Cursor(text, 0) + result = parse_variable_reference(cursor) + if result is not None: + assert isinstance(result.value, VariableReference) diff --git a/tests/syntax_parser_expressions_cases/variant_key_variant_marker.py b/tests/syntax_parser_expressions_cases/variant_key_variant_marker.py new file mode 100644 index 00000000..f412cf2e --- /dev/null +++ b/tests/syntax_parser_expressions_cases/variant_key_variant_marker.py @@ -0,0 +1,209 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_expressions.py.""" + +from tests.syntax_parser_expressions_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# VARIANT KEY & VARIANT MARKER +# 
============================================================================ + + +class TestIsValidVariantKeyChar: + """Tests for _is_valid_variant_key_char helper.""" + + @given(st.sampled_from([".", "-", "_"])) + def test_special_chars_in_variant_keys(self, char: str) -> None: + """Special character handling follows identifier rules.""" + event(f"char={char!r}") + if char == "_": + assert _is_valid_variant_key_char(char, is_first=True) + else: + assert not _is_valid_variant_key_char(char, is_first=True) + assert _is_valid_variant_key_char(char, is_first=False) + + +class TestIsVariantMarker: + """Tests for _is_variant_marker lookahead logic.""" + + def test_eof_cursor_returns_false(self) -> None: + """EOF cursor returns False.""" + assert not _is_variant_marker(Cursor("", 0)) + + def test_empty_brackets_not_variant(self) -> None: + """Empty [] is not a variant key.""" + assert not _is_variant_marker(Cursor("[]", 0)) + + def test_bracket_at_eof_after_closing(self) -> None: + """Valid variant when ] at EOF.""" + assert _is_variant_marker(Cursor("[one]", 0)) + + def test_bracket_followed_by_newline(self) -> None: + """Valid variant when ] followed by newline.""" + assert _is_variant_marker(Cursor("[one]\n", 0)) + + def test_bracket_followed_by_closing_brace(self) -> None: + """Valid variant when ] followed by }.""" + assert _is_variant_marker(Cursor("[one]}", 0)) + + def test_bracket_followed_by_open_bracket(self) -> None: + """Valid variant when ] followed by [.""" + assert _is_variant_marker(Cursor("[one][two]", 0)) + + def test_bracket_followed_by_asterisk(self) -> None: + """Valid variant when ] followed by *.""" + assert _is_variant_marker(Cursor("[one]*[other]", 0)) + + def test_bracket_with_comma_not_variant(self) -> None: + """Comma makes it literal text, not variant.""" + assert not _is_variant_marker(Cursor("[1, 2]", 0)) + + def test_bracket_with_invalid_char_not_variant(self) -> None: + """Invalid char for identifier/number.""" + assert not 
_is_variant_marker(Cursor("[in@valid]", 0)) + + def test_bracket_exceeds_lookahead(self) -> None: + """Exceeded lookahead before finding ].""" + long_text = "[" + "a" * (MAX_LOOKAHEAD_CHARS + 10) + assert not _is_variant_marker(Cursor(long_text, 0)) + + def test_lookahead_exhausted_in_whitespace_scan(self) -> None: + """Lookahead exhausted while skipping whitespace after ].""" + text = "[one]" + " " * (MAX_LOOKAHEAD_CHARS + 10) + result = _is_variant_marker(Cursor(text, 0)) + assert isinstance(result, bool) + + def test_non_bracket_non_asterisk_returns_false(self) -> None: + """Non-[ non-* character returns False.""" + assert not _is_variant_marker(Cursor("x", 0)) + + def test_variant_marker_with_leading_space(self) -> None: + """Leading space after '[' is valid per Fluent EBNF.""" + assert _is_variant_marker(Cursor("[ one]", 0)) + + def test_variant_marker_with_multiple_leading_spaces(self) -> None: + """Multiple leading spaces after '[' are valid.""" + assert _is_variant_marker(Cursor("[ other]", 0)) + + @given( + num_spaces=st.integers(min_value=1, max_value=10), + key=st.sampled_from( + ["one", "other", "few", "many", "zero", "0", "42"] + ), + ) + def test_variant_marker_leading_spaces_property( + self, num_spaces: int, key: str + ) -> None: + """Any number of leading spaces in variant key is valid.""" + event(f"num_spaces={num_spaces}") + event(f"key_type={'digit' if key.isdigit() else 'ident'}") + source = f"[{' ' * num_spaces}{key}]" + assert _is_variant_marker(Cursor(source, 0)) + + +class TestParseVariantKey: + """Tests for parse_variant_key paths.""" + + def test_identifier_variant_key(self) -> None: + """Identifier parsed as variant key.""" + result = parse_variant_key(Cursor("abc", 0)) + assert result is not None + assert isinstance(result.value, Identifier) + assert result.value.name == "abc" + + def test_identifier_from_bracket(self) -> None: + """Variant key parsed from inside brackets.""" + result = parse_variant_key(Cursor("[abc]", 1)) + assert 
result is not None + assert isinstance(result.value, Identifier) + + def test_number_variant_key(self) -> None: + """Number parsed as variant key.""" + result = parse_variant_key(Cursor("42", 0)) + assert result is not None + assert isinstance(result.value, NumberLiteral) + + def test_negative_number_fallback_fails(self) -> None: + """Hyphen followed by non-digit: both number and identifier fail.""" + assert parse_variant_key(Cursor("-foo", 0)) is None + + def test_hyphen_alone_fails(self) -> None: + """Hyphen alone fails both number and identifier parse.""" + assert parse_variant_key(Cursor("-", 0)) is None + + def test_invalid_start_char_fails(self) -> None: + """Characters invalid for both number and identifier fail.""" + assert parse_variant_key(Cursor("???", 1)) is None + + @given(st.integers(min_value=0, max_value=1000)) + @example(42) + @example(-42) + @example(0) + def test_numeric_variant_key_property(self, num: int) -> None: + """Numeric variant keys parsed correctly.""" + event(f"num={num}") + result = parse_variant_key(Cursor(str(num), 0)) + if result is not None: + assert isinstance( + result.value, (NumberLiteral, Identifier) + ) + + +class TestTrimPatternBlankLines: + """Tests for _trim_pattern_blank_lines edge cases.""" + + def test_empty_returns_empty(self) -> None: + """Empty list returns empty tuple.""" + assert _trim_pattern_blank_lines([]) == () + + def test_single_placeable_preserved(self) -> None: + """Placeable-only pattern is preserved.""" + placeable = Placeable( + expression=VariableReference(id=Identifier("x")) + ) + result = _trim_pattern_blank_lines([placeable]) + assert len(result) == 1 + assert result[0] == placeable + + def test_text_with_content_after_newline_preserved(self) -> None: + """Content after last newline is preserved.""" + elements = cast( + "list[TextElement | Placeable]", + [TextElement(value="Hello\nWorld")], + ) + result = _trim_pattern_blank_lines(elements) + assert len(result) == 1 + assert isinstance(result[0], 
TextElement) + assert result[0].value == "Hello\nWorld" + + def test_trailing_blank_line_removed(self) -> None: + """Trailing blank line is removed.""" + elements = cast( + "list[TextElement | Placeable]", + [TextElement(value="Content\n \n")], + ) + result = _trim_pattern_blank_lines(elements) + assert len(result) == 1 + assert isinstance(result[0], TextElement) + assert result[0].value == "Content" + + def test_leading_all_whitespace_removed(self) -> None: + """First element all whitespace is removed.""" + elements = cast( + "list[TextElement | Placeable]", + [TextElement(value=" "), TextElement(value="content")], + ) + result = _trim_pattern_blank_lines(elements) + assert len(result) == 1 + assert isinstance(result[0], TextElement) + assert result[0].value == "content" + + def test_trailing_all_whitespace_removed(self) -> None: + """Last element all whitespace after trimming is removed.""" + elements = cast( + "list[TextElement | Placeable]", + [TextElement(value="content"), TextElement(value="\n ")], + ) + result = _trim_pattern_blank_lines(elements) + assert len(result) == 1 + assert isinstance(result[0], TextElement) + assert result[0].value == "content" diff --git a/tests/syntax_parser_expressions_cases/variant_select_expression.py b/tests/syntax_parser_expressions_cases/variant_select_expression.py new file mode 100644 index 00000000..8a16961f --- /dev/null +++ b/tests/syntax_parser_expressions_cases/variant_select_expression.py @@ -0,0 +1,272 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_expressions.py.""" + +from tests.syntax_parser_expressions_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# VARIANT & SELECT EXPRESSION +# ============================================================================ + + +class TestParseVariant: + """Tests for parse_variant error paths.""" + + def test_missing_opening_bracket(self) -> None: + """Returns 
None when '[' is missing.""" + assert parse_variant(Cursor("one", 0)) is None + + def test_missing_closing_bracket(self) -> None: + """Returns None when ']' is missing.""" + assert parse_variant(Cursor("[one", 0)) is None + + def test_invalid_key(self) -> None: + """Returns None when variant key is invalid.""" + assert parse_variant(Cursor("[@]", 0)) is None + + def test_variant_with_pattern(self) -> None: + """Variant with text pattern succeeds.""" + result = parse_variant(Cursor("[one] item", 0)) + assert result is not None + assert isinstance(result.value, Variant) + + def test_variant_with_empty_pattern(self) -> None: + """Variant with empty pattern succeeds.""" + result = parse_variant(Cursor("[one] ", 0)) + assert result is not None or result is None + + +class TestParseSelectExpression: + """Tests for parse_select_expression validation and EOF handling.""" + + def test_no_variants_returns_none(self) -> None: + """Must have at least one variant.""" + selector = VariableReference(id=Identifier("count")) + result = parse_select_expression( + Cursor("}", 0), selector, 0 + ) + assert result is None + + def test_no_default_variant_returns_none(self) -> None: + """Must have exactly one default variant.""" + selector = VariableReference(id=Identifier("count")) + result = parse_select_expression( + Cursor("[one] item\n}", 0), selector, 0 + ) + assert result is None + + def test_multiple_defaults_returns_none(self) -> None: + """Multiple default variants detected.""" + selector = VariableReference(id=Identifier("count")) + result = parse_select_expression( + Cursor("*[one] One\n*[other] Other", 0), selector, 0 + ) + assert result is None + + def test_variant_parse_fails_in_loop(self) -> None: + """Variant parse failure in loop returns None.""" + selector = VariableReference(id=Identifier("x")) + result = parse_select_expression( + Cursor("[@]", 0), selector, 0 + ) + assert result is None + + def test_eof_after_variant_whitespace(self) -> None: + """EOF reached after 
skip_blank between variants.""" + source = "*[other] value\n\n\n" + selector = VariableReference(id=None) # type: ignore[arg-type] + result = parse_select_expression( + Cursor(source, 0), selector, start_pos=0, + context=ParseContext(), + ) + assert result is not None + assert len(result.value.variants) == 1 + assert result.cursor.is_eof + + def test_eof_multiple_blank_lines_after_variant(self) -> None: + """EOF with multiple blank lines after variant.""" + source = "*[other] text\n\n\n\n" + selector = VariableReference(id=None) # type: ignore[arg-type] + result = parse_select_expression( + Cursor(source, 0), selector, start_pos=0, + context=ParseContext(), + ) + assert result is not None + assert len(result.value.variants) == 1 + assert result.cursor.is_eof + + def test_eof_single_newline_after_variant(self) -> None: + """EOF with single newline after variant.""" + source = "*[default] value\n" + selector = VariableReference(id=None) # type: ignore[arg-type] + result = parse_select_expression( + Cursor(source, 0), selector, start_pos=0, + context=ParseContext(), + ) + assert result is not None + assert len(result.value.variants) == 1 + assert result.cursor.is_eof + + def test_eof_empty_pattern_variant(self) -> None: + """Variant with empty pattern followed by EOF.""" + source = "*[other]\n\n" + selector = VariableReference(id=None) # type: ignore[arg-type] + result = parse_select_expression( + Cursor(source, 0), selector, start_pos=0, + context=ParseContext(), + ) + assert result is not None + assert len(result.value.variants) == 1 + assert len(result.value.variants[0].value.elements) == 0 + assert result.cursor.is_eof + + def test_eof_multiple_variants(self) -> None: + """Multiple variants with EOF after last one.""" + source = "[one] singular\n*[other] plural\n\n" + selector = VariableReference(id=None) # type: ignore[arg-type] + result = parse_select_expression( + Cursor(source, 0), selector, start_pos=0, + context=ParseContext(), + ) + assert result is not 
None + assert len(result.value.variants) == 2 + assert result.cursor.is_eof + + def test_eof_complex_pattern(self) -> None: + """Complex pattern in variant, then EOF.""" + source = "*[other] You have items\n\n" + selector = VariableReference(id=None) # type: ignore[arg-type] + result = parse_select_expression( + Cursor(source, 0), selector, start_pos=0, + context=ParseContext(), + ) + assert result is not None + assert len(result.value.variants) == 1 + assert result.cursor.is_eof + + def test_immediate_eof(self) -> None: + """EOF immediately after arrow position.""" + selector = VariableReference(id=None) # type: ignore[arg-type] + result = parse_select_expression( + Cursor("", 0), selector, start_pos=0, + context=ParseContext(), + ) + assert result is None + + def test_whitespace_then_eof(self) -> None: + """Only whitespace after arrow, then EOF.""" + selector = VariableReference(id=None) # type: ignore[arg-type] + result = parse_select_expression( + Cursor(" \n ", 0), selector, start_pos=0, + context=ParseContext(), + ) + assert result is None + + def test_variant_leading_spaces_integration(self) -> None: + """Variant keys with leading spaces via parse_message.""" + source = ( + "msg = {$count ->\n" + " [ one] item\n" + " *[other] items\n}" + ) + result = parse_message(Cursor(source, 0), ParseContext()) + assert result is not None + message = result.value + assert message.value is not None + assert len(message.value.elements) == 1 + placeable = message.value.elements[0] + assert isinstance(placeable, Placeable) + assert isinstance(placeable.expression, SelectExpression) + + def test_multiline_select_complex_spacing(self) -> None: + """Complex spacing and continuation in variant patterns.""" + source = ( + "msg = {$count ->\n" + " [ zero]\n" + " No items\n" + " [one]\n" + " {$count} item\n" + " *[other]\n" + " {$count} items\n" + "}" + ) + result = parse_message(Cursor(source, 0), ParseContext()) + assert result is not None + assert result.value.value is not None 
+ + @given(st.integers(min_value=1, max_value=20)) + @example(1) + @example(5) + @example(20) + def test_eof_variable_newlines_property( + self, num_newlines: int + ) -> None: + """Various numbers of trailing newlines trigger EOF handling.""" + event(f"num_newlines={num_newlines}") + source = f"*[other] value{'\\n' * num_newlines}" + # Build actual newlines + source = "*[other] value" + "\n" * num_newlines + selector = VariableReference(id=None) # type: ignore[arg-type] + result = parse_select_expression( + Cursor(source, 0), selector, start_pos=0, + context=ParseContext(), + ) + assert result is not None + assert len(result.value.variants) == 1 + assert result.cursor.is_eof + + @given(st.text(alphabet="\n", min_size=1, max_size=50)) + @example("\n") + @example("\n\n\n") + @example("\n\n\n\n\n") + def test_eof_arbitrary_newlines_property( + self, whitespace: str + ) -> None: + """Arbitrary newline sequences after variant trigger EOF.""" + event(f"ws_len={len(whitespace)}") + source = f"*[other] text{whitespace}" + selector = VariableReference(id=None) # type: ignore[arg-type] + result = parse_select_expression( + Cursor(source, 0), selector, start_pos=0, + context=ParseContext(), + ) + assert result is not None + assert len(result.value.variants) == 1 + assert result.cursor.is_eof + + @given( + st.lists( + st.sampled_from( + ["[one]", "[two]", "[zero]", "*[other]"] + ), + min_size=1, + max_size=5, + ) + ) + @example(["*[other]"]) + @example(["[one]", "*[other]"]) + def test_variant_configurations_property( + self, variant_keys: list[str] + ) -> None: + """Various variant configurations with EOF handling.""" + num_keys = len(variant_keys) + has_default = any("*" in k for k in variant_keys) + event(f"num_variants={num_keys}") + event(f"has_default={has_default}") + variants_text = "\n".join( + f"{key} text" for key in variant_keys + ) + source = f"{variants_text}\n\n" + selector = VariableReference(id=None) # type: ignore[arg-type] + result = parse_select_expression( 
+ Cursor(source, 0), selector, start_pos=0, + context=ParseContext(), + ) + default_count = sum( + 1 for key in variant_keys if "*" in key + ) + if default_count == 1: + assert result is not None + assert len(result.value.variants) == len(variant_keys) + assert result.cursor.is_eof + else: + assert result is None diff --git a/tests/syntax_parser_patterns_cases/__init__.py b/tests/syntax_parser_patterns_cases/__init__.py new file mode 100644 index 00000000..d4de2351 --- /dev/null +++ b/tests/syntax_parser_patterns_cases/__init__.py @@ -0,0 +1,68 @@ +"""Tests for parser pattern and whitespace handling. + +Tests whitespace utilities (skip_blank_inline, skip_blank, +is_indented_continuation, skip_multiline_pattern_start) and pattern +parsing (parse_pattern, parse_simple_pattern) including multiline +continuation, blank line handling, text accumulation, variant delimiter +lookahead, and CRLF normalization. +""" + +from __future__ import annotations + +from unittest.mock import patch + +import pytest +from hypothesis import event, example, given +from hypothesis import strategies as st + +from ftllexengine import parse_ftl +from ftllexengine.runtime.bundle import FluentBundle +from ftllexengine.syntax.ast import ( + Message, + Pattern, + Placeable, + SelectExpression, + Term, + TextElement, +) +from ftllexengine.syntax.cursor import Cursor +from ftllexengine.syntax.parser.rules import ( + ParseContext, + parse_message, + parse_pattern, + parse_simple_pattern, + parse_variant, +) +from ftllexengine.syntax.parser.whitespace import ( + is_indented_continuation, + skip_blank, + skip_blank_inline, + skip_multiline_pattern_start, +) + +__all__ = [ + "Cursor", + "FluentBundle", + "Message", + "ParseContext", + "Pattern", + "Placeable", + "SelectExpression", + "Term", + "TextElement", + "event", + "example", + "given", + "is_indented_continuation", + "parse_ftl", + "parse_message", + "parse_pattern", + "parse_simple_pattern", + "parse_variant", + "patch", + "pytest", + 
"skip_blank", + "skip_blank_inline", + "skip_multiline_pattern_start", + "st", +] diff --git a/tests/syntax_parser_patterns_cases/hypothesis_property_tests.py b/tests/syntax_parser_patterns_cases/hypothesis_property_tests.py new file mode 100644 index 00000000..fe9e9010 --- /dev/null +++ b/tests/syntax_parser_patterns_cases/hypothesis_property_tests.py @@ -0,0 +1,216 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_patterns.py.""" + +from tests.syntax_parser_patterns_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# HYPOTHESIS PROPERTY TESTS +# ============================================================================ + + +class TestPatternsHypothesis: + """Property-based tests for pattern and whitespace handling.""" + + @given(st.integers(min_value=0, max_value=100)) + def test_skip_blank_inline_various_counts( + self, space_count: int + ) -> None: + """Any number of spaces skipped by skip_blank_inline.""" + event(f"space_count={space_count}") + source = " " * space_count + "hello" + cursor = Cursor(source=source, pos=0) + assert skip_blank_inline(cursor).pos == space_count + + @given(st.integers(min_value=1, max_value=20)) + def test_is_indented_continuation_various( + self, indent_count: int + ) -> None: + """Any indentation level detected as continuation.""" + event(f"indent_count={indent_count}") + source = "\n" + " " * indent_count + "text" + cursor = Cursor(source=source, pos=0) + assert is_indented_continuation(cursor) is True + + @given( + extra_indent=st.integers(min_value=1, max_value=12), + base_indent=st.integers(min_value=4, max_value=8), + ) + def test_extra_spaces_before_placeable( + self, extra_indent: int, base_indent: int + ) -> None: + """Extra indentation before placeable is preserved.""" + boundary = "deep" if extra_indent > 8 else "shallow" + event(f"boundary={boundary}") + event(f"base_indent={base_indent}") + base = " " * 
base_indent + extra = " " * (base_indent + extra_indent) + ftl = f"msg =\n{base}text\n{extra}{{$var}}" + resource = parse_ftl(ftl) + msg = resource.entries[0] + assert isinstance(msg, Message) + assert msg.value is not None + elements = msg.value.elements + assert len(elements) >= 2 + assert isinstance(elements[-1], Placeable) + + @given(trailing_spaces=st.integers(min_value=1, max_value=20)) + def test_trailing_spaces_handled( + self, trailing_spaces: int + ) -> None: + """Patterns with trailing spaces parse successfully.""" + event(f"trailing_spaces={trailing_spaces}") + spaces = " " * trailing_spaces + ftl = f"msg =\n text\n{spaces}" + resource = parse_ftl(ftl) + msg = resource.entries[0] + assert isinstance(msg, Message) + assert msg.value is not None + + @given( + base_indent=st.integers(min_value=4, max_value=8), + extra_indent=st.integers(min_value=1, max_value=8), + ) + def test_extra_indent_handling( + self, base_indent: int, extra_indent: int + ) -> None: + """Extra indentation correctly accumulated.""" + event(f"extra_indent={extra_indent}") + event(f"base_indent={base_indent}") + base = " " * base_indent + extra = " " * (base_indent + extra_indent) + ftl = f"msg =\n{base}first\n{extra}second" + resource = parse_ftl(ftl) + msg = resource.entries[0] + assert isinstance(msg, Message) + text = "".join( + e.value + for e in msg.value.elements # type: ignore[union-attr] + if isinstance(e, TextElement) + ) + assert "first" in text + assert "second" in text + + @given( + num_lines=st.integers(min_value=2, max_value=5), + indent_base=st.integers(min_value=4, max_value=8), + ) + def test_multiline_extra_indent_accumulation( + self, num_lines: int, indent_base: int + ) -> None: + """Multiple lines with extra indent accumulate correctly.""" + event(f"num_lines={num_lines}") + event(f"indent_base={indent_base}") + lines_ftl = [f"line{i}" for i in range(num_lines)] + base = " " * indent_base + ftl_lines = ["msg ="] + ftl_lines.append(f"{base}{lines_ftl[0]}") + for i 
in range(1, num_lines): + extra = " " * (i % 3) + ftl_lines.append(f"{base}{extra}{lines_ftl[i]}") + ftl = "\n".join(ftl_lines) + resource = parse_ftl(ftl) + msg = resource.entries[0] + assert isinstance(msg, Message) + text = "".join( + e.value + for e in msg.value.elements # type: ignore[union-attr] + if isinstance(e, TextElement) + ) + for line_text in lines_ftl: + assert line_text in text + + @given( + extra_indent=st.integers(min_value=1, max_value=8), + base_indent=st.integers(min_value=4, max_value=8), + ) + def test_term_extra_indent( + self, extra_indent: int, base_indent: int + ) -> None: + """Terms handle extra indentation like messages.""" + event(f"extra_indent={extra_indent}") + event(f"base_indent={base_indent}") + base = " " * base_indent + extra = " " * (base_indent + extra_indent) + ftl = f"-term =\n{base}first\n{extra}second" + resource = parse_ftl(ftl) + term = resource.entries[0] + assert isinstance(term, Term) + text = "".join( + e.value + for e in term.value.elements + if isinstance(e, TextElement) + ) + assert "first" in text + assert "second" in text + + @given( + num_blank_lines=st.integers(min_value=1, max_value=10), + indent_size=st.integers(min_value=1, max_value=8), + ) + def test_blank_lines_and_indentation( + self, num_blank_lines: int, indent_size: int + ) -> None: + """Any blank lines before content strip indent.""" + event(f"num_blank_lines={num_blank_lines}") + event(f"indent_size={indent_size}") + blank_lines = "\n" * num_blank_lines + indent = " " * indent_size + ftl = f"msg ={blank_lines}{indent}content" + resource = parse_ftl(ftl) + msg = resource.entries[0] + assert isinstance(msg, Message) + assert msg.value is not None + assert msg.value.elements[0].value == "content" # type: ignore[union-attr] + + @given( + content=st.text( + min_size=1, + max_size=50, + alphabet="abcdefghijklmnopqrstuvwxyz", + ) + ) + def test_content_preserved_after_blank_lines( + self, content: str + ) -> None: + """Content after blank lines is 
preserved exactly.""" + event(f"content_length={len(content)}") + ftl = f"msg =\n\n {content}" + resource = parse_ftl(ftl) + msg = resource.entries[0] + assert isinstance(msg, Message) + text = "".join( + e.value + for e in msg.value.elements # type: ignore[union-attr] + if isinstance(e, TextElement) + ) + assert text == content + + @example("Hello") + @example("Line1\nLine2") + @given(st.text(min_size=1, max_size=50)) + def test_parse_simple_pattern_property( + self, text: str + ) -> None: + """parse_simple_pattern handles arbitrary text.""" + if not text or text[0] in ("}", "[", "*"): + return + has_newline = "\n" in text + event(f"has_newline={has_newline}") + cursor = Cursor(text, 0) + result = parse_simple_pattern(cursor) + outcome = "parsed" if result else "none" + event(f"outcome={outcome}") + assert result is None or isinstance(result.value, Pattern) + + @example("value") + @example("{$x}") + @given(st.text(min_size=1, max_size=50)) + def test_parse_pattern_property(self, text: str) -> None: + """parse_pattern handles arbitrary text.""" + has_placeable = "{" in text + event(f"has_placeable={has_placeable}") + cursor = Cursor(text, 0) + result = parse_pattern(cursor) + outcome = "parsed" if result else "none" + event(f"outcome={outcome}") + assert result is None or isinstance(result.value, Pattern) diff --git a/tests/syntax_parser_patterns_cases/multiline_blank_lines.py b/tests/syntax_parser_patterns_cases/multiline_blank_lines.py new file mode 100644 index 00000000..3b7985d2 --- /dev/null +++ b/tests/syntax_parser_patterns_cases/multiline_blank_lines.py @@ -0,0 +1,241 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_patterns.py.""" + +from tests.syntax_parser_patterns_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# MULTILINE BLANK LINES +# ============================================================================ + + +class 
TestMultilineBlankLines: + """Tests for blank line handling in multiline patterns.""" + + def test_single_blank_line_before_content(self) -> None: + """Single blank line before content strips indentation.""" + ftl = "msg =\n\n value" + resource = parse_ftl(ftl) + msg = resource.entries[0] + assert isinstance(msg, Message) + assert msg.value is not None + assert msg.value.elements[0].value == "value" # type: ignore[union-attr] + + def test_multiple_blank_lines_before_content(self) -> None: + """Multiple blank lines before content strips indentation.""" + ftl = "msg =\n\n\n\n value" + resource = parse_ftl(ftl) + msg = resource.entries[0] + assert isinstance(msg, Message) + assert msg.value.elements[0].value == "value" # type: ignore[union-attr] + + def test_with_subsequent_lines(self) -> None: + """Blank line before content with subsequent lines.""" + ftl = "msg =\n\n first\n second" + resource = parse_ftl(ftl) + msg = resource.entries[0] + assert isinstance(msg, Message) + text = "".join( + e.value + for e in msg.value.elements # type: ignore[union-attr] + if isinstance(e, TextElement) + ) + assert text == "first\nsecond" + + def test_with_extra_indentation(self) -> None: + """Blank line before content preserves extra indentation.""" + ftl = "msg =\n\n first\n second" + resource = parse_ftl(ftl) + msg = resource.entries[0] + assert isinstance(msg, Message) + text = "".join( + e.value + for e in msg.value.elements # type: ignore[union-attr] + if isinstance(e, TextElement) + ) + assert text == "first\n second" + + def test_bundle_format(self) -> None: + """FluentBundle correctly formats with blank line before content.""" + bundle = FluentBundle("en_US") + bundle.add_resource("msg =\n\n Hello World") + result, errors = bundle.format_pattern("msg") + assert not errors + assert result == "Hello World" + + def test_with_placeable(self) -> None: + """Blank line before content with placeable.""" + bundle = FluentBundle("en_US") + bundle.add_resource("msg =\n\n Hello { $name 
}")
+        result, errors = bundle.format_pattern(
+            "msg", {"name": "Alice"}
+        )
+        assert not errors
+        assert "Hello" in result
+        assert "Alice" in result
+
+    def test_blank_line_at_end(self) -> None:
+        """Blank line at end of pattern handled correctly."""
+        ftl = "msg =\n first\n\n second"
+        resource = parse_ftl(ftl)
+        msg = resource.entries[0]
+        assert isinstance(msg, Message)
+        text = "".join(
+            e.value
+            for e in msg.value.elements  # type: ignore[union-attr]
+            if isinstance(e, TextElement)
+        )
+        assert "first" in text
+        assert "second" in text
+
+    def test_mixed_blank_lines(self) -> None:
+        """Blank lines at various positions."""
+        ftl = "msg =\n\n first\n\n second\n\n third"
+        resource = parse_ftl(ftl)
+        msg = resource.entries[0]
+        assert isinstance(msg, Message)
+        text = "".join(
+            e.value
+            for e in msg.value.elements  # type: ignore[union-attr]
+            if isinstance(e, TextElement)
+        )
+        assert "first" in text
+        assert "second" in text
+        assert "third" in text
+
+    def test_term_blank_line_before_content(self) -> None:
+        """Term with blank line before content."""
+        ftl = "-brand =\n\n Firefox"
+        resource = parse_ftl(ftl)
+        term = resource.entries[0]
+        assert isinstance(term, Term)
+        text = "".join(
+            e.value
+            for e in term.value.elements
+            if isinstance(e, TextElement)
+        )
+        assert text == "Firefox"
+
+    def test_multiple_blank_lines_in_continuation(self) -> None:
+        """Multiple consecutive blank lines within continuation."""
+        ftl = "msg =\n first\n\n\n second"
+        resource = parse_ftl(ftl)
+        msg = resource.entries[0]
+        assert isinstance(msg, Message)
+        text = "".join(
+            e.value
+            for e in msg.value.elements  # type: ignore[union-attr]
+            if isinstance(e, TextElement)
+        )
+        assert "first" in text
+        assert "second" in text
+
+    def test_term_blank_lines_in_continuation(self) -> None:
+        """Term with blank lines in continuation."""
+        ftl = "-term =\n\n\n content"
+        resource = parse_ftl(ftl)
+        term = resource.entries[0]
+        assert isinstance(term, Term)
+        text = "".join(
+
e.value
+            for e in term.value.elements
+            if isinstance(e, TextElement)
+        )
+        assert text == "content"
+
+    def test_placeable_after_blanks_with_extra_indent(self) -> None:
+        """Placeable after blank lines with extra indentation."""
+        ftl = "msg =\n text\n\n\n {$var}"
+        resource = parse_ftl(ftl)
+        msg = resource.entries[0]
+        assert isinstance(msg, Message)
+        assert msg.value is not None
+        has_text = any(
+            isinstance(e, TextElement) for e in msg.value.elements
+        )
+        has_placeable = any(
+            isinstance(e, Placeable) for e in msg.value.elements
+        )
+        assert has_text
+        assert has_placeable
+
+    def test_only_extra_spaces_no_content(self) -> None:
+        """Continuation with only extra spaces, no actual content."""
+        ftl = "msg =\n text\n\n more"
+        resource = parse_ftl(ftl)
+        msg = resource.entries[0]
+        assert isinstance(msg, Message)
+        text = "".join(
+            e.value
+            for e in msg.value.elements  # type: ignore[union-attr]
+            if isinstance(e, TextElement)
+        )
+        assert "text" in text
+        assert "more" in text
+
+    def test_complex_mixed_pattern(self) -> None:
+        """Complex pattern mixing all edge cases."""
+        ftl = "msg =\n\n\n first\n\n {$var}\n\n\n last"
+        resource = parse_ftl(ftl)
+        msg = resource.entries[0]
+        assert isinstance(msg, Message)
+        assert msg.value is not None
+        has_text = any(
+            isinstance(e, TextElement) for e in msg.value.elements
+        )
+        has_placeable = any(
+            isinstance(e, Placeable) for e in msg.value.elements
+        )
+        assert has_text
+        assert has_placeable
+
+    def test_original_regression(self) -> None:
+        """FTL-GRAMMAR-001: blank line sets common_indent to 0."""
+        ftl = "msg =\n\n value"
+        resource = parse_ftl(ftl)
+        msg = resource.entries[0]
+        assert isinstance(msg, Message)
+        element = msg.value.elements[0]  # type: ignore[union-attr]
+        assert isinstance(element, TextElement)
+        assert element.value == "value", (
+            f"common_indent bug: expected 'value', got "
+            f"'{element.value}'"
+        )
+
+    def test_regression_variant_simple_pattern(self) -> None:
+        """Regression: 
parse_simple_pattern blank line indent.""" + ftl = """msg = { $n -> + [one] + + item + *[other] items +}""" + bundle = FluentBundle("en_US") + bundle.add_resource(ftl) + result, errors = bundle.format_pattern("msg", {"n": 1}) + assert not errors + assert "item" in result + assert " item" not in result + + @pytest.mark.parametrize( + ("ftl", "expected"), + [ + ("msg =\n\n x", "x"), + ("msg =\n\n\n x", "x"), + ("msg =\n\n\n\n\n x", "x"), + ("msg =\n\n x", "x"), + ("msg =\n\n x", "x"), + ], + ) + def test_parametrized_blank_line_scenarios( + self, ftl: str, expected: str + ) -> None: + """Various blank line scenarios all strip indentation.""" + resource = parse_ftl(ftl) + msg = resource.entries[0] + assert isinstance(msg, Message) + text = "".join( + e.value + for e in msg.value.elements # type: ignore[union-attr] + if isinstance(e, TextElement) + ) + assert text == expected diff --git a/tests/syntax_parser_patterns_cases/parse_pattern_cases.py b/tests/syntax_parser_patterns_cases/parse_pattern_cases.py new file mode 100644 index 00000000..1d56fe44 --- /dev/null +++ b/tests/syntax_parser_patterns_cases/parse_pattern_cases.py @@ -0,0 +1,245 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_patterns.py.""" + +from tests.syntax_parser_patterns_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PARSE_PATTERN +# ============================================================================ + + +class TestParsePatternBasic: + """Tests for parse_pattern basic behavior.""" + + def test_no_text_before_newline(self) -> None: + """Empty pattern at newline (cursor.pos == text_start).""" + result = parse_pattern(Cursor("\n", 0)) + assert result is not None + assert len(result.value.elements) == 0 + + def test_placeable_then_newline(self) -> None: + """Placeable immediately followed by newline.""" + result = parse_pattern(Cursor("{$var}\n", 0)) + assert result is not 
None + assert len(result.value.elements) == 1 + + def test_placeable_parse_fails(self) -> None: + """Returns None when parse_placeable fails.""" + cursor = Cursor("Text {invalid", 0) + with patch( + "ftllexengine.syntax.parser.expressions.parse_placeable", + return_value=None, + ): + result = parse_pattern(cursor) + assert result is None + + def test_stop_char_not_placeable(self) -> None: + """Pattern with stop character that's not '{'.""" + bundle = FluentBundle("en_US") + bundle.add_resource("msg = Value\n") + result, errors = bundle.format_pattern("msg") + assert not errors + assert "Value" in result + + def test_empty_pattern_with_attribute(self) -> None: + """Empty pattern followed by attribute.""" + bundle = FluentBundle("en_US") + bundle.add_resource("msg =\n .attr = Attribute\n") + result, errors = bundle.format_pattern("msg", attribute="attr") + assert not errors + assert "Attribute" in result + + def test_pattern_at_eof(self) -> None: + """Pattern at EOF without trailing newline.""" + bundle = FluentBundle("en_US") + bundle.add_resource("msg = Value at EOF") + result, errors = bundle.format_pattern("msg") + assert not errors + assert "Value at EOF" in result + + +class TestParsePatternTopLevelDelimiters: + """Tests for top-level pattern delimiter handling. + + In top-level patterns (not inside select expressions), characters + like }, [, * are literal text, not structural delimiters. 
+ """ + + def test_close_brace_is_text(self) -> None: + """} is literal text in top-level patterns.""" + result = parse_pattern(Cursor("}text", 0)) + assert result is not None + assert len(result.value.elements) == 1 + assert result.value.elements[0].value == "}text" # type: ignore[union-attr] + + def test_bracket_is_text(self) -> None: + """[ is literal text in top-level patterns.""" + result = parse_pattern(Cursor("[text", 0)) + assert result is not None + assert len(result.value.elements) == 1 + assert result.value.elements[0].value == "[text" # type: ignore[union-attr] + + def test_asterisk_is_text(self) -> None: + """* is literal text in top-level patterns.""" + result = parse_pattern(Cursor("*text", 0)) + assert result is not None + assert len(result.value.elements) == 1 + assert result.value.elements[0].value == "*text" # type: ignore[union-attr] + + def test_special_char_sequences(self) -> None: + """Multiple delimiters are all literal text.""" + result = parse_pattern(Cursor("}}]]", 0)) + assert result is not None + assert len(result.value.elements) == 1 + assert result.value.elements[0].value == "}}]]" # type: ignore[union-attr] + + def test_stop_char_advances_cursor(self) -> None: + """] at position 0 advances cursor to prevent infinite loop.""" + result = parse_pattern(Cursor("]", 0)) + assert result is not None + assert result.cursor.pos >= 1 or result.cursor.is_eof + + def test_includes_special_chars_combined(self) -> None: + """All delimiter characters are literal in top-level patterns.""" + for delimiter in ["}", "[", "*"]: + result = parse_pattern(Cursor(f"text{delimiter}more", 0)) + assert result is not None + assert len(result.value.elements) == 1 + expected = f"text{delimiter}more" + assert result.value.elements[0].value == expected # type: ignore[union-attr] + + +class TestParsePatternContinuation: + """Tests for continuation handling in parse_pattern.""" + + def test_crlf_multiline(self) -> None: + """CRLF in multiline continuation.""" + 
bundle = FluentBundle("en_US")
+        bundle.add_resource("msg = First line\r\n Second line")
+        result, _ = bundle.format_pattern("msg")
+        assert "First line" in result
+        assert "Second line" in result
+
+    def test_cr_only_continuation(self) -> None:
+        """CR (old Mac style) at continuation."""
+        cursor = Cursor("msg = First\r Second", 6)
+        result = parse_pattern(cursor)
+        assert result is not None
+        assert len(result.value.elements) > 0
+
+    def test_continuation_after_placeable(self) -> None:
+        """Multiline continuation after placeable adds space element."""
+        bundle = FluentBundle("en_US")
+        bundle.add_resource("msg = {NUMBER(5)}\n continued text")
+        result, _ = bundle.format_pattern("msg")
+        assert "5" in result
+        assert "continued text" in result
+
+    def test_extra_spaces_before_placeable(self) -> None:
+        """Extra indentation before placeable in top-level pattern."""
+        ftl = "msg =\n first\n {$var}"
+        resource = parse_ftl(ftl)
+        msg = resource.entries[0]
+        assert isinstance(msg, Message)
+        assert msg.value is not None
+        has_placeable = any(
+            isinstance(e, Placeable) for e in msg.value.elements
+        )
+        assert has_placeable
+
+    def test_trailing_extra_spaces(self) -> None:
+        """Trailing extra spaces at end of top-level pattern."""
+        ftl = "msg =\n first\n "
+        resource = parse_ftl(ftl)
+        msg = resource.entries[0]
+        assert isinstance(msg, Message)
+        assert msg.value is not None
+        assert len(msg.value.elements) >= 1
+
+    def test_extra_indent_preserved(self) -> None:
+        """Extra indentation beyond common indent is preserved."""
+        ftl = "msg =\n first\n second"
+        resource = parse_ftl(ftl)
+        msg = resource.entries[0]
+        assert isinstance(msg, Message)
+        text = "".join(
+            e.value
+            for e in msg.value.elements  # type: ignore[union-attr]
+            if isinstance(e, TextElement)
+        )
+        assert "first" in text
+        assert "second" in text
+
+    def test_varying_extra_indent(self) -> None:
+        """Multiple lines with varying extra indentation."""
+        ftl = "msg =\n base\n extra4\n extra8"
+        resource = 
parse_ftl(ftl)
+        msg = resource.entries[0]
+        assert isinstance(msg, Message)
+        assert msg.value is not None
+        assert len(msg.value.elements) >= 1
+
+    def test_accumulated_spaces_prepended(self) -> None:
+        """Accumulated extra spaces prepended to following text."""
+        ftl = "msg =\n first\n more text"
+        resource = parse_ftl(ftl)
+        msg = resource.entries[0]
+        assert isinstance(msg, Message)
+        text = "".join(
+            e.value
+            for e in msg.value.elements  # type: ignore[union-attr]
+            if isinstance(e, TextElement)
+        )
+        assert "first" in text
+        assert "more text" in text
+
+    def test_multiple_continuations_varying_indent(self) -> None:
+        """Multiple continuation lines with varying extra indentation."""
+        ftl = "msg =\n l1\n l2\n l3\n l4"
+        resource = parse_ftl(ftl)
+        msg = resource.entries[0]
+        assert isinstance(msg, Message)
+        text = "".join(
+            e.value
+            for e in msg.value.elements  # type: ignore[union-attr]
+            if isinstance(e, TextElement)
+        )
+        for line in ["l1", "l2", "l3", "l4"]:
+            assert line in text
+
+    def test_continuation_new_element_no_prior(self) -> None:
+        """Accumulated continuation before text, no prior elements."""
+        result = parse_pattern(Cursor(" continuation\n more", 0))
+        assert result is not None
+
+    def test_continuation_new_element_last_placeable(self) -> None:
+        """Accumulated continuation merged after placeable."""
+        result = parse_pattern(Cursor("{$x}\n text more", 0))
+        assert result is not None
+
+    def test_finalize_continuation_no_prior(self) -> None:
+        """Finalize accumulated text when no prior elements."""
+        result = parse_pattern(Cursor(" only continuation", 0))
+        assert result is not None
+
+    def test_finalize_continuation_last_placeable(self) -> None:
+        """Finalize accumulated text when last is placeable."""
+        result = parse_pattern(Cursor("{$x}\n final", 0))
+        assert result is not None
+
+    def test_empty_pattern_continuation(self) -> None:
+        """Continuation with empty elements list (newline at pos 0)."""
+        result = parse_pattern(Cursor("\n 
text", 0)) + assert result is not None + + def test_term_extra_indent_before_placeable(self) -> None: + """Term with extra indentation before placeable.""" + ftl = "-term =\n first\n {$var}" + resource = parse_ftl(ftl) + term = resource.entries[0] + assert isinstance(term, Term) + assert term.value is not None + has_placeable = any( + isinstance(e, Placeable) for e in term.value.elements + ) + assert has_placeable diff --git a/tests/syntax_parser_patterns_cases/parse_simple_pattern_cases.py b/tests/syntax_parser_patterns_cases/parse_simple_pattern_cases.py new file mode 100644 index 00000000..ac18d86c --- /dev/null +++ b/tests/syntax_parser_patterns_cases/parse_simple_pattern_cases.py @@ -0,0 +1,318 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_patterns.py.""" + +from tests.syntax_parser_patterns_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PARSE_SIMPLE_PATTERN +# ============================================================================ + + +class TestParseSimplePattern: + """Tests for parse_simple_pattern basic behavior.""" + + def test_with_variable(self) -> None: + """Parses pattern with variable reference.""" + cursor = Cursor("Hello {$name}", 0) + result = parse_simple_pattern(cursor) + assert result is not None + assert len(result.value.elements) == 2 + + def test_stops_at_bracket(self) -> None: + """Bracket lookahead: [key]rest is literal text.""" + cursor = Cursor("Value[key]rest", 0) + result = parse_simple_pattern(cursor) + assert result is not None + assert result.value.elements[0].value == "Value[key]rest" # type: ignore[union-attr] + assert result.cursor.is_eof + + # [key] followed by } IS a variant marker + cursor = Cursor("Value [one]}", 0) + result = parse_simple_pattern(cursor) + assert result is not None + assert result.value.elements[0].value == "Value " # type: ignore[union-attr] + assert result.cursor.current == "[" 
+ + def test_stops_at_asterisk(self) -> None: + """Asterisk lookahead: *[ is variant, * alone is literal.""" + cursor = Cursor("Text*[other]", 0) + result = parse_simple_pattern(cursor) + assert result is not None + assert result.cursor.current == "*" + + cursor = Cursor("Text*rest", 0) + result = parse_simple_pattern(cursor) + assert result is not None + assert result.value.elements[0].value == "Text*rest" # type: ignore[union-attr] + + def test_stops_at_brace(self) -> None: + """Stops at } (expression end).""" + cursor = Cursor("Value}rest", 0) + result = parse_simple_pattern(cursor) + assert result is not None + assert result.cursor.current == "}" + + def test_placeable_parse_fails(self) -> None: + """Returns None when placeable parsing fails.""" + cursor = Cursor("Text {invalid", 0) + with patch( + "ftllexengine.syntax.parser.expressions.parse_placeable", + return_value=None, + ): + result = parse_simple_pattern(cursor) + assert result is None + + def test_variant_markers_lookahead(self) -> None: + """Variant markers vs literal text disambiguation.""" + # *[other] IS a variant marker + cursor = Cursor("*[other]", 0) + result = parse_simple_pattern(cursor) + assert result is not None + assert len(result.value.elements) == 0 + assert result.cursor.current == "*" + + # [INFO] followed by text is literal + cursor = Cursor("[INFO] message", 0) + result = parse_simple_pattern(cursor) + assert result is not None + assert result.value.elements[0].value == "[INFO] message" # type: ignore[union-attr] + + def test_malformed_placeable_returns_none(self) -> None: + """Malformed placeable ({@) returns None.""" + cursor = Cursor("text{@", 0) + result = parse_simple_pattern(cursor) + assert result is None + + def test_in_select_expression(self) -> None: + """parse_simple_pattern as used in select expression variants.""" + bundle = FluentBundle("en_US") + bundle.add_resource("""msg = {NUMBER(1) -> + [one] One item + *[other] Many items +}""") + result, _ = 
bundle.format_pattern("msg") + assert "item" in result + + +class TestSimplePatternTextAccDirect: + """Tests for text_acc paths in parse_simple_pattern (Cursor-direct).""" + + def test_text_then_continuation_then_placeable(self) -> None: + """Accumulated text merged with prior element before placeable.""" + result = parse_simple_pattern(Cursor("hello\n {$x}", 0)) + assert result is not None + assert len(result.value.elements) >= 2 + + def test_continuation_then_placeable_no_prior(self) -> None: + """Continuation before placeable with no prior elements.""" + result = parse_simple_pattern(Cursor("\n {$x}", 0)) + assert result is not None + + def test_placeable_then_continuation_then_placeable(self) -> None: + """Placeable, continuation, then another placeable.""" + result = parse_simple_pattern(Cursor("{$a}\n {$b}", 0)) + assert result is not None + + def test_text_then_continuation_at_end(self) -> None: + """Text followed by trailing continuation.""" + result = parse_simple_pattern(Cursor("hello\n ", 0)) + assert result is not None + + def test_continuation_at_end_no_prior(self) -> None: + """Trailing continuation with no prior elements.""" + result = parse_simple_pattern(Cursor("\n ", 0)) + assert result is not None + + def test_placeable_then_continuation_at_end(self) -> None: + """Placeable then trailing continuation.""" + result = parse_simple_pattern(Cursor("{$x}\n ", 0)) + assert result is not None + + def test_complex_continuation_before_placeable(self) -> None: + """Multiple continuations before placeable.""" + text = "start\n line1\n line2\n {$x}" + result = parse_simple_pattern(Cursor(text, 0)) + assert result is not None + + def test_multiple_placeables_with_continuations(self) -> None: + """Multiple placeables separated by continuations.""" + result = parse_simple_pattern(Cursor("{$a}\n {$b}\n {$c}", 0)) + assert result is not None + + def test_blank_continuation_lines(self) -> None: + """Blank lines between continuations.""" + result = 
parse_simple_pattern(Cursor("text\n\n continued", 0)) + assert result is not None + + def test_continuation_before_placeable_with_text(self) -> None: + """Leading spaces then text then placeable.""" + cursor = Cursor(" continuation{$var}", 0) + result = parse_simple_pattern(cursor) + assert result is not None + assert len(result.value.elements) >= 2 + + def test_placeable_continuation_text_placeable(self) -> None: + """Placeable, continuation with text, then another placeable.""" + cursor = Cursor("{$x}\n text{$y}", 0) + result = parse_simple_pattern(cursor) + assert result is not None + assert len(result.value.elements) >= 3 + + def test_continuation_before_text_no_prior(self) -> None: + """Leading spaces then text, no prior elements.""" + cursor = Cursor(" line1\n line2", 0) + result = parse_simple_pattern(cursor) + assert result is not None + + def test_finalize_continuation_no_prior(self) -> None: + """Finalize accumulated text when no prior elements.""" + cursor = Cursor(" just continuation", 0) + result = parse_simple_pattern(cursor) + assert result is not None + assert len(result.value.elements) >= 1 + + def test_finalize_continuation_last_is_placeable(self) -> None: + """Finalize accumulated text when last element is placeable.""" + cursor = Cursor("{$x}\n continuation", 0) + result = parse_simple_pattern(cursor) + assert result is not None + assert len(result.value.elements) >= 2 + + def test_direct_text_acc_finalization(self) -> None: + """Extra spaces accumulated then stop character triggers finalization.""" + source = "a\n b\n }" + result = parse_simple_pattern(Cursor(source, 0)) + assert result is not None + assert len(result.value.elements) >= 1 + + +class TestSimplePatternTextAccVariant: + """Tests for text_acc in variant/message context (parse_ftl/parse_message).""" + + def test_extra_spaces_before_placeable(self) -> None: + """Extra indentation before placeable in variant pattern.""" + ftl = """msg = { $n -> + [one] + first + {$count} + *[other] 
items +}""" + resource = parse_ftl(ftl) + msg = resource.entries[0] + assert isinstance(msg, Message) + assert msg.value is not None + + def test_trailing_extra_spaces(self) -> None: + """Trailing extra spaces at end of variant pattern.""" + ftl = """msg = { $n -> + [one] + item + + *[other] items +}""" + resource = parse_ftl(ftl) + msg = resource.entries[0] + assert isinstance(msg, Message) + assert msg.value is not None + + def test_continuation_extra_spaces_then_placeable(self) -> None: + """Extra spaces before placeable via parse_message.""" + source = """msg = {$n -> + [one] Line1 + Line2 + {$var} + *[other] Items +}""" + cursor = Cursor(source, 0) + result = parse_message(cursor, ParseContext()) + assert result is not None + message = result.value + assert isinstance(message, Message) + assert message.value is not None + + def test_continuation_spaces_only_then_placeable(self) -> None: + """Blank continuation creating extra_spaces, then text+placeable.""" + source = """msg = {$n -> + [one] Start + + text {$x} + *[other] End +}""" + cursor = Cursor(source, 0) + result = parse_message(cursor, ParseContext()) + assert result is not None + assert isinstance(result.value, Message) + + def test_trailing_extra_spaces_via_message(self) -> None: + """Variant ending with only accumulated extra spaces.""" + variant_one = ( + "[one] Text\n MoreText\n " + ) + variant_other = "*[other] Items" + source = ( + f"msg = {{$n ->\n {variant_one}" + f"\n {variant_other}\n}}" + ) + cursor = Cursor(source, 0) + result = parse_message(cursor, ParseContext()) + assert result is not None + assert isinstance(result.value, Message) + assert result.value.value is not None + + def test_extra_spaces_at_close_brace(self) -> None: + """Trailing extra spaces ending at close brace.""" + source = """msg = {$n -> + *[other] Text + +}""" + cursor = Cursor(source, 0) + result = parse_message(cursor, ParseContext()) + assert result is not None + assert isinstance(result.value, Message) + + def 
test_complex_spacing_finalization(self) -> None: + """Multiple continuations ending with accumulated spaces.""" + source = """msg = {$count -> + [one] Line one + Line two + Line three + + *[other] Other +}""" + cursor = Cursor(source, 0) + result = parse_message(cursor, ParseContext()) + assert result is not None + message = result.value + assert isinstance(message, Message) + assert message.value is not None + placeable = message.value.elements[0] + assert isinstance(placeable, Placeable) + assert isinstance(placeable.expression, SelectExpression) + + def test_variant_ending_with_continuation(self) -> None: + """Variant ending with continuation extra spaces.""" + ftl = """msg = { $n -> + [one] value + text + + [two] other + *[three] default +}""" + resource = parse_ftl(ftl) + msg = resource.entries[0] + assert isinstance(msg, Message) + assert msg.value is not None + + def test_variant_extra_indent_then_next(self) -> None: + """Variant with extra indent followed by next variant.""" + ftl = """msg = { $n -> + [one] + line1 + + [two] line2 + *[other] other +}""" + resource = parse_ftl(ftl) + msg = resource.entries[0] + assert isinstance(msg, Message) + assert msg.value is not None diff --git a/tests/syntax_parser_patterns_cases/variant_delimiter_lookahead.py b/tests/syntax_parser_patterns_cases/variant_delimiter_lookahead.py new file mode 100644 index 00000000..fc644b55 --- /dev/null +++ b/tests/syntax_parser_patterns_cases/variant_delimiter_lookahead.py @@ -0,0 +1,105 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_patterns.py.""" + +from tests.syntax_parser_patterns_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# VARIANT DELIMITER LOOKAHEAD +# ============================================================================ + + +class TestVariantDelimiterLookahead: + """Tests for variant delimiter (* and [) in pattern text.""" + + def 
test_asterisk_literal_in_variant(self) -> None: + """'*' without '[' is treated as literal text.""" + bundle = FluentBundle("en_US", use_isolating=False) + bundle.add_resource(""" +count = { $n -> + [one] 1 * item + *[other] { $n } * items +} +""") + result, errors = bundle.format_pattern("count", {"n": 1}) + assert "1 * item" in result + assert not errors + + def test_bracket_not_starting_variant(self) -> None: + """'[' not followed by valid key is treated as literal.""" + bundle = FluentBundle("en_US", use_isolating=False) + bundle.add_resource(""" +msg = { $type -> + [info] [INFO] message + *[other] [?] unknown +} +""") + result, errors = bundle.format_pattern( + "msg", {"type": "info"} + ) + assert "[INFO] message" in result + assert not errors + + def test_math_expression_in_variant(self) -> None: + """Math-like expressions with * and [ in variant text.""" + bundle = FluentBundle("en_US", use_isolating=False) + bundle.add_resource(""" +calc = { $op -> + [mul] Result: 3 * 5 = 15 + [arr] Array: [1, 2, 3] + *[other] Unknown operation +} +""") + result, _ = bundle.format_pattern("calc", {"op": "mul"}) + assert "3 * 5 = 15" in result + + result, _ = bundle.format_pattern("calc", {"op": "arr"}) + assert "[1, 2, 3]" in result + + def test_asterisk_bracket_is_variant(self) -> None: + """'*[' still correctly marks default variant.""" + bundle = FluentBundle("en_US", use_isolating=False) + bundle.add_resource(""" +example = { $x -> + [a] Value A + *[b] Default B +} +""") + result, errors = bundle.format_pattern( + "example", {"x": "unknown"} + ) + assert not errors + assert "Default B" in result + + def test_numeric_variant_key(self) -> None: + """[123] treated as variant key.""" + bundle = FluentBundle("en_US", use_isolating=False) + bundle.add_resource(""" +indexed = { $i -> + [0] Zero + [1] One + *[2] Default +} +""") + result, errors = bundle.format_pattern("indexed", {"i": 0}) + assert not errors + assert "Zero" in result + + def 
test_complex_asterisk_and_brackets(self) -> None: + """Both * and [] as literals in variant text.""" + bundle = FluentBundle("en_US", use_isolating=False) + bundle.add_resource(""" +complex = { $mode -> + [matrix] See [matrix * vector] for details + [calc] Compute a * b + c + *[other] No special chars +} +""") + result, _ = bundle.format_pattern( + "complex", {"mode": "matrix"} + ) + assert "[matrix * vector]" in result + + def test_variant_pattern_fails(self) -> None: + """parse_variant returns None on malformed input.""" + cursor = Cursor("[one] {@", 0) + assert parse_variant(cursor) is None diff --git a/tests/syntax_parser_patterns_cases/whitespace_utilities.py b/tests/syntax_parser_patterns_cases/whitespace_utilities.py new file mode 100644 index 00000000..38b872a8 --- /dev/null +++ b/tests/syntax_parser_patterns_cases/whitespace_utilities.py @@ -0,0 +1,296 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_parser_patterns.py.""" + +from tests.syntax_parser_patterns_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# WHITESPACE UTILITIES +# ============================================================================ + + +class TestSkipBlankInline: + """Tests for skip_blank_inline (U+0020 only, per FTL spec).""" + + def test_no_spaces(self) -> None: + """Returns same position when no spaces.""" + cursor = Cursor(source="hello", pos=0) + assert skip_blank_inline(cursor).pos == 0 + + def test_leading_spaces(self) -> None: + """Skips leading spaces.""" + cursor = Cursor(source=" hello", pos=0) + result = skip_blank_inline(cursor) + assert result.pos == 3 + assert result.current == "h" + + def test_all_spaces(self) -> None: + """Handles all-space string.""" + cursor = Cursor(source=" ", pos=0) + assert skip_blank_inline(cursor).is_eof is True + + def test_stops_at_tab(self) -> None: + """Does NOT skip tabs.""" + cursor = Cursor(source=" \thello", pos=0) + 
result = skip_blank_inline(cursor)
+        assert result.pos == 2
+        assert result.current == "\t"
+
+    def test_stops_at_newline(self) -> None:
+        """Does NOT skip newlines."""
+        cursor = Cursor(source=" \nhello", pos=0)
+        result = skip_blank_inline(cursor)
+        assert result.pos == 2
+        assert result.current == "\n"
+
+    def test_at_eof(self) -> None:
+        """Handles EOF."""
+        cursor = Cursor(source="", pos=0)
+        assert skip_blank_inline(cursor).is_eof
+
+
+class TestSkipBlank:
+    """Tests for skip_blank (spaces and line endings)."""
+
+    def test_no_whitespace(self) -> None:
+        """Returns same position when no whitespace."""
+        cursor = Cursor(source="hello", pos=0)
+        assert skip_blank(cursor).pos == 0
+
+    def test_spaces_only(self) -> None:
+        """Skips spaces."""
+        cursor = Cursor(source=" hello", pos=0)
+        result = skip_blank(cursor)
+        assert result.pos == 3
+        assert result.current == "h"
+
+    def test_newlines_only(self) -> None:
+        """Skips newlines."""
+        cursor = Cursor(source="\n\nhello", pos=0)
+        result = skip_blank(cursor)
+        assert result.pos == 2
+        assert result.current == "h"
+
+    def test_mixed_whitespace(self) -> None:
+        """Skips mixed spaces and newlines."""
+        cursor = Cursor(source=" \n hello", pos=0)
+        result = skip_blank(cursor)
+        assert result.pos == 6
+        assert result.current == "h"
+
+    def test_all_whitespace(self) -> None:
+        """Handles all-whitespace string."""
+        cursor = Cursor(source=" \n ", pos=0)
+        assert skip_blank(cursor).is_eof is True
+
+    def test_stops_at_tab(self) -> None:
+        """Does NOT skip tabs."""
+        cursor = Cursor(source=" \n\thello", pos=0)
+        result = skip_blank(cursor)
+        assert result.pos == 2
+        assert result.current == "\t"
+
+    def test_normalized_crlf(self) -> None:
+        """Handles CRLF normalized to LF."""
+        cursor = Cursor(source="\nhello", pos=0)
+        result = skip_blank(cursor)
+        assert result.pos == 1
+        assert result.current == "h"
+
+    def test_at_eof(self) -> None:
+        """Handles EOF."""
+        cursor = Cursor(source="", pos=0)
+        assert 
skip_blank(cursor).is_eof
+
+
+class TestIsIndentedContinuation:
+    """Tests for is_indented_continuation detection."""
+
+    def test_true_for_indented_line(self) -> None:
+        """Returns True for indented line after newline."""
+        cursor = Cursor(source="\n hello", pos=0)
+        assert is_indented_continuation(cursor) is True
+
+    def test_false_no_indentation(self) -> None:
+        """Returns False without indentation."""
+        cursor = Cursor(source="\nhello", pos=0)
+        assert is_indented_continuation(cursor) is False
+
+    def test_false_bracket(self) -> None:
+        """Returns False for line starting with [ (variant)."""
+        cursor = Cursor(source="\n [variant]", pos=0)
+        assert is_indented_continuation(cursor) is False
+
+    def test_false_asterisk(self) -> None:
+        """Returns False for line starting with * (default variant)."""
+        cursor = Cursor(source="\n *[default]", pos=0)
+        assert is_indented_continuation(cursor) is False
+
+    def test_false_dot(self) -> None:
+        """Returns False for line starting with . (attribute)."""
+        cursor = Cursor(source="\n .attribute", pos=0)
+        assert is_indented_continuation(cursor) is False
+
+    def test_false_not_at_newline(self) -> None:
+        """Returns False when not at newline."""
+        cursor = Cursor(source="hello", pos=0)
+        assert is_indented_continuation(cursor) is False
+
+    def test_false_at_eof(self) -> None:
+        """Returns False at EOF."""
+        cursor = Cursor(source="", pos=0)
+        assert is_indented_continuation(cursor) is False
+
+    def test_normalized_line_ending(self) -> None:
+        """Works with normalized LF line endings."""
+        cursor = Cursor(source="\n hello", pos=0)
+        assert is_indented_continuation(cursor) is True
+
+    def test_eof_after_newline(self) -> None:
+        """Returns False for newline at EOF."""
+        cursor = Cursor(source="\n", pos=0)
+        assert is_indented_continuation(cursor) is False
+
+    def test_only_spaces_after_newline(self) -> None:
+        """Empty indented line is considered a valid continuation."""
+        cursor = Cursor(source="\n ", pos=0)
+        assert 
is_indented_continuation(cursor) is True + + def test_tab_indentation_rejected(self) -> None: + """Returns False for tab indentation.""" + cursor = Cursor(source="\n\thello", pos=0) + assert is_indented_continuation(cursor) is False + + +class TestSkipMultilinePatternStart: + """Tests for skip_multiline_pattern_start.""" + + def test_inline_pattern(self) -> None: + """Handles inline pattern (no newline).""" + cursor = Cursor(source=" value", pos=0) + new_cursor, indent = skip_multiline_pattern_start(cursor) + assert new_cursor.pos == 2 + assert new_cursor.current == "v" + assert indent == 0 + + def test_multiline_pattern(self) -> None: + """Handles multiline pattern (newline + indent).""" + cursor = Cursor(source="\n value", pos=0) + new_cursor, indent = skip_multiline_pattern_start(cursor) + assert new_cursor.pos == 3 + assert new_cursor.current == "v" + assert indent == 2 + + def test_no_continuation(self) -> None: + """Stops at non-continuation newline.""" + cursor = Cursor(source="\nvalue", pos=0) + new_cursor, indent = skip_multiline_pattern_start(cursor) + assert new_cursor.pos == 0 + assert new_cursor.current == "\n" + assert indent == 0 + + def test_empty_input(self) -> None: + """Handles empty input.""" + cursor = Cursor(source="", pos=0) + new_cursor, indent = skip_multiline_pattern_start(cursor) + assert new_cursor.is_eof + assert indent == 0 + + def test_no_leading_spaces(self) -> None: + """Handles no leading spaces.""" + cursor = Cursor(source="value", pos=0) + new_cursor, indent = skip_multiline_pattern_start(cursor) + assert new_cursor.pos == 0 + assert new_cursor.current == "v" + assert indent == 0 + + def test_normalized_line_ending(self) -> None: + """Handles normalized LF line endings.""" + cursor = Cursor(source="\n value", pos=0) + new_cursor, indent = skip_multiline_pattern_start(cursor) + assert new_cursor.current == "v" + assert indent == 2 + + def test_stops_at_bracket(self) -> None: + """Stops at bracket (variant marker).""" + cursor = 
Cursor(source="\n [variant]", pos=0) + new_cursor, indent = skip_multiline_pattern_start(cursor) + assert new_cursor.pos == 0 + assert new_cursor.current == "\n" + assert indent == 0 + + def test_inline_spaces_then_newline(self) -> None: + """Handles inline spaces then newline.""" + cursor = Cursor(source=" \nvalue", pos=0) + new_cursor, indent = skip_multiline_pattern_start(cursor) + assert new_cursor.pos == 2 + assert new_cursor.current == "\n" + assert indent == 0 + + def test_only_newline(self) -> None: + """Handles only newline.""" + cursor = Cursor(source="\n", pos=0) + new_cursor, indent = skip_multiline_pattern_start(cursor) + assert new_cursor.pos == 0 + assert indent == 0 + + +class TestWhitespaceSpecCompliance: + """Spec compliance, integration, and edge cases for whitespace.""" + + def test_blank_inline_only_u0020(self) -> None: + """blank_inline ONLY accepts U+0020 (space).""" + assert skip_blank_inline(Cursor(" text", 0)).pos == 3 + assert skip_blank_inline(Cursor("\ttext", 0)).pos == 0 + + def test_blank_accepts_lf(self) -> None: + """blank accepts LF line endings.""" + assert skip_blank(Cursor("\ntext", 0)).current == "t" + + def test_blank_rejects_cr(self) -> None: + """Standalone CR is NOT whitespace per Fluent spec.""" + assert skip_blank(Cursor("\rtext", 0)).current == "\r" + + def test_continuation_special_chars(self) -> None: + """Special starting characters correctly identified.""" + assert is_indented_continuation(Cursor("\n [", 0)) is False + assert is_indented_continuation(Cursor("\n *", 0)) is False + assert is_indented_continuation(Cursor("\n .", 0)) is False + assert is_indented_continuation(Cursor("\n a", 0)) is True + + def test_carriage_return_not_whitespace(self) -> None: + """CR alone is not skipped by skip_blank.""" + cursor = Cursor(source="\rhello", pos=0) + assert skip_blank(cursor).current == "\r" + + def test_inline_pattern_integration(self) -> None: + """Simulate parsing message with inline pattern.""" + cursor = 
Cursor(source="hello = World", pos=5) + cursor = skip_blank_inline(cursor) + assert cursor.current == "=" + cursor = cursor.advance() + cursor, indent = skip_multiline_pattern_start(cursor) + assert cursor.current == "W" + assert indent == 0 + + def test_multiline_pattern_integration(self) -> None: + """Simulate parsing message with multiline pattern.""" + cursor = Cursor(source="hello =\n World", pos=5) + cursor = skip_blank_inline(cursor) + assert cursor.current == "=" + cursor = cursor.advance() + cursor, indent = skip_multiline_pattern_start(cursor) + assert cursor.current == "W" + assert indent == 2 + + def test_select_expression_with_blank(self) -> None: + """Simulate parsing select expression with blank lines.""" + cursor = Cursor(source=" \n \n [variant]", pos=0) + cursor = skip_blank(cursor) + assert cursor.current == "[" + + def test_continuation_detection_in_pattern(self) -> None: + """Detect continuation vs attribute.""" + c1 = Cursor(source="\n continued text", pos=0) + assert is_indented_continuation(c1) is True + c2 = Cursor(source="\n .attribute = value", pos=0) + assert is_indented_continuation(c2) is False diff --git a/tests/syntax_parser_property_cases/__init__.py b/tests/syntax_parser_property_cases/__init__.py new file mode 100644 index 00000000..6cd63647 --- /dev/null +++ b/tests/syntax_parser_property_cases/__init__.py @@ -0,0 +1,35 @@ +"""Hypothesis property-based tests for Fluent parser invariants and robustness.""" + +from __future__ import annotations + +from decimal import Decimal + +from hypothesis import assume, event, example, given, settings +from hypothesis import strategies as st + +from ftllexengine.syntax.ast import Comment, Junk, Message, Resource, Term +from ftllexengine.syntax.parser import FluentParserV1 +from ftllexengine.syntax.serializer import FluentSerializer +from tests.strategies import ftl_identifiers as shared_ftl_identifiers +from tests.strategies import ftl_simple_text + +ftl_identifiers = shared_ftl_identifiers() 
+variable_names = ftl_identifiers +safe_text = st.text( + alphabet=st.characters( + blacklist_categories=["Cc"], + blacklist_characters=["{", "}", "[", "]", "$", "-", "*", ".", "#", "\n"], + ), + min_size=1, +).filter(str.strip) +numbers = st.integers() +decimals = st.decimals(allow_nan=False, allow_infinity=False) +attribute_names = ftl_identifiers +variant_keys = st.from_regex(r"[a-z][a-z0-9]*", fullmatch=True) + +__all__ = [ + "Comment", "Decimal", "FluentParserV1", "FluentSerializer", "Junk", "Message", + "Resource", "Term", "assume", "attribute_names", "decimals", "event", + "example", "ftl_identifiers", "ftl_simple_text", "given", "numbers", "safe_text", + "settings", "shared_ftl_identifiers", "st", "variable_names", "variant_keys", +] diff --git a/tests/syntax_parser_property_cases/core.py b/tests/syntax_parser_property_cases/core.py new file mode 100644 index 00000000..5d613a00 --- /dev/null +++ b/tests/syntax_parser_property_cases/core.py @@ -0,0 +1,658 @@ +# mypy: ignore-errors +from tests.syntax_parser_property_cases import ( + Comment, + FluentParserV1, + Junk, + Message, + event, + ftl_identifiers, + given, + numbers, + safe_text, + settings, + st, + variable_names, +) + + +class TestParserRobustness: + """Property-based tests for parser robustness.""" + + @given( + # Use ftl_identifiers strategy - cleaner and unconstrained + identifier=ftl_identifiers, + ) + @settings(max_examples=200) + def test_simple_message_always_parses(self, identifier: str) -> None: + """Simple message with valid identifier always parses successfully.""" + source = f"{identifier} = value" + parser = FluentParserV1() + resource = parser.parse(source) + + # Should always produce a resource + assert resource is not None + assert hasattr(resource, "entries") + # Should have exactly one entry (the message) + assert len(resource.entries) == 1 + # That entry should be a Message + assert isinstance(resource.entries[0], Message) + + # Emit event for identifier characteristics (HypoFuzz 
guidance) + if "-" in identifier: + event("identifier=has_hyphen") + if "_" in identifier: + event("identifier=has_underscore") + if any(c.isdigit() for c in identifier): + event("identifier=has_digit") + + @given( + identifier=st.text( + alphabet=st.characters( + whitelist_categories=("Lu", "Ll"), min_codepoint=97, max_codepoint=122 + ), + min_size=1, + max_size=20, + ).filter(lambda x: x[0].isalpha()), + value=st.text( + alphabet=st.characters(blacklist_categories=["Cc"], blacklist_characters="{}\n"), + min_size=0, + max_size=100, + ), + ) + @settings(max_examples=200) + def test_message_with_arbitrary_value_parses( + self, identifier: str, value: str + ) -> None: + """Messages with arbitrary (non-special) text values parse.""" + source = f"{identifier} = {value}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + # Should have at least one entry + assert len(resource.entries) >= 1 + # First entry should be a Message (possibly with junk value) + first_entry = resource.entries[0] + assert isinstance(first_entry, (Message, Junk)) + + # Emit events for HypoFuzz guidance + event(f"entry_type={type(first_entry).__name__}") + if len(value) > 50: + event("value_length=long") + elif len(value) > 10: + event("value_length=medium") + else: + event("value_length=short") + + @given( + comment_text=st.text( + alphabet=st.characters(blacklist_categories=["Cc"], blacklist_characters="#"), + min_size=0, + max_size=100, + ), + ) + @settings(max_examples=150) + def test_single_line_comment_always_parses(self, comment_text: str) -> None: + """Single-line comments with arbitrary text parse successfully.""" + source = f"# {comment_text}\nkey = value" + parser = FluentParserV1() + resource = parser.parse(source) + + # Should parse (comment + message) + assert resource is not None + assert len(resource.entries) >= 1 + + # Emit events for HypoFuzz guidance + if len(comment_text) > 50: + event("comment_length=long") + elif len(comment_text) > 
10: + event("comment_length=medium") + else: + event("comment_length=short") + + @given( + num_newlines=st.integers(min_value=0, max_value=10), + ) + @settings(max_examples=50) + def test_blank_lines_do_not_affect_parsing(self, num_newlines: int) -> None: + """Multiple blank lines should not affect parsing.""" + source = f"key1 = value1{'\n' * num_newlines}key2 = value2" + parser = FluentParserV1() + resource = parser.parse(source) + + # Should parse both messages regardless of blank lines + assert resource is not None + # Should have at least one entry (message or junk) + assert len(resource.entries) >= 1 + # Check that we have Messages and/or Junk (not empty) + for entry in resource.entries: + assert isinstance(entry, (Message, Junk, Comment)) + + # Emit events for HypoFuzz guidance + if num_newlines == 0: + event("blank_lines=none") + elif num_newlines <= 2: + event("blank_lines=few") + else: + event("blank_lines=many") + + @given( + invalid_start=st.text( + alphabet=st.characters(whitelist_categories=("P", "S")), + min_size=1, + max_size=5, + ).filter(lambda x: x[0] not in "#-"), + ) + @settings(max_examples=100) + def test_invalid_entry_creates_junk(self, invalid_start: str) -> None: + """Invalid entry start characters create junk entries.""" + source = f"{invalid_start} invalid\nkey = value" + parser = FluentParserV1() + resource = parser.parse(source) + + # Should recover and parse something (message or junk) + assert resource is not None + # Parser should produce entries (even if junk) + assert len(resource.entries) >= 1 + + # Emit events for HypoFuzz guidance + has_junk = any(isinstance(e, Junk) for e in resource.entries) + event(f"recovery={'has_junk' if has_junk else 'no_junk'}") + + +class TestParserInvariants: + """Metamorphic and invariant properties of the parser.""" + + @given( + source=st.text( + alphabet=st.characters( + whitelist_categories=("Lu", "Ll"), + min_codepoint=32, + max_codepoint=126, + ), + min_size=0, + max_size=500, + ), + ) +
@settings(max_examples=200) + def test_parser_never_crashes(self, source: str) -> None: + """Parser should never crash, regardless of input.""" + parser = FluentParserV1() + + # Should not raise exceptions - parser always returns a resource + resource = parser.parse(source) + assert resource is not None + + # Emit events for entry type distribution (HypoFuzz guidance) + junk_count = sum(1 for e in resource.entries if isinstance(e, Junk)) + msg_count = sum(1 for e in resource.entries if isinstance(e, Message)) + if junk_count > 0: + event(f"parse_result=has_junk_{min(junk_count, 5)}") + if msg_count > 0: + event(f"parse_result=has_messages_{min(msg_count, 5)}") + if len(resource.entries) == 0: + event("parse_result=empty") + + @given( + identifier=st.text( + alphabet=st.characters( + whitelist_categories=("Lu", "Ll"), min_codepoint=97, max_codepoint=122 + ), + min_size=1, + max_size=20, + ).filter(lambda x: x[0].isalpha()), + ) + @settings(max_examples=100) + def test_parse_idempotence(self, identifier: str) -> None: + """Parsing the same source twice yields equivalent results.""" + source = f"{identifier} = value" + parser = FluentParserV1() + + resource1 = parser.parse(source) + resource2 = parser.parse(source) + + # Both should have same number of entries + assert len(resource1.entries) == len(resource2.entries) + + # Emit events for HypoFuzz guidance + if len(identifier) > 10: + event("identifier_length=long") + elif len(identifier) > 5: + event("identifier_length=medium") + else: + event("identifier_length=short") + + @given( + whitespace=st.text(alphabet=st.sampled_from([" ", "\t"]), min_size=0, max_size=10), + ) + @settings(max_examples=100) + def test_leading_whitespace_invariance(self, whitespace: str) -> None: + """Leading whitespace on continuation lines is significant.""" + # Indented continuation should be treated as continuation + source1 = "key = value" + source2 = f"key = value\n{whitespace} continuation" + + parser = FluentParserV1() + resource1 = 
parser.parse(source1) + resource2 = parser.parse(source2) + + # Both should parse (resource2 might have continuation) + assert resource1 is not None + assert resource2 is not None + + # Emit events for HypoFuzz guidance + has_tabs = "\t" in whitespace + has_spaces = " " in whitespace + if has_tabs and has_spaces: + event("whitespace_type=mixed") + elif has_tabs: + event("whitespace_type=tabs") + elif has_spaces: + event("whitespace_type=spaces") + else: + event("whitespace_type=none") + + +class TestParserEdgeCases: + """Edge cases and boundary conditions.""" + + @given( + num_hashes=st.integers(min_value=1, max_value=10), + ) + @settings(max_examples=50) + def test_comment_hash_count_validation(self, num_hashes: int) -> None: + """Comments with different hash counts are handled correctly.""" + source = f"{'#' * num_hashes} Comment\nkey = value" + parser = FluentParserV1() + resource = parser.parse(source) + + # Should handle any number of hashes (1-3 valid, >3 creates junk) + assert resource is not None + # Should have at least one entry (comment/message or junk) + assert len(resource.entries) >= 1 + + # Emit events for HypoFuzz guidance + if num_hashes == 1: + event("comment_type=standalone") + elif num_hashes == 2: + event("comment_type=group") + elif num_hashes == 3: + event("comment_type=resource") + else: + event("comment_type=invalid_many_hashes") + + @given( + depth=st.integers(min_value=1, max_value=5), + ) + @settings(max_examples=50) + def test_nested_placeables_parse(self, depth: int) -> None: + """Nested placeables up to reasonable depth parse.""" + # Create nested variable references (simplified test - just validates parsing) + inner = "$var" + source = f"key = {{ {inner} }}" + + parser = FluentParserV1() + resource = parser.parse(source) + + # Should parse (might create errors for invalid syntax) + assert resource is not None + + # Emit depth event for HypoFuzz guidance + event(f"depth={depth}") + + @given( + num_variants=st.integers(min_value=1, 
max_value=10), + ) + @settings(max_examples=50) + def test_select_expression_variant_count(self, num_variants: int) -> None: + """Select expressions with varying variant counts parse.""" + # Generate variants + variants = "\n".join([f" [{i}] Variant {i}" for i in range(num_variants)]) + source = f"key = {{ $num ->\n{variants}\n *[other] Default\n}}" + + parser = FluentParserV1() + resource = parser.parse(source) + + # Should parse + assert resource is not None + + # Emit variant count event for HypoFuzz guidance + event(f"variant_count={min(num_variants, 10)}") + + def test_empty_source_produces_empty_resource(self) -> None: + """Empty source produces resource with no entries.""" + parser = FluentParserV1() + resource = parser.parse("") + + assert resource is not None + assert len(resource.entries) == 0 + + def test_only_whitespace_produces_empty_resource(self) -> None: + """Source with only whitespace produces empty or junk resource.""" + parser = FluentParserV1() + resource = parser.parse(" \n\t\n \n") + + assert resource is not None + # Whitespace-only source may produce empty resource (this is valid) + + @given( + identifier=st.text( + alphabet=st.characters( + whitelist_categories=("Lu", "Ll"), min_codepoint=97, max_codepoint=122 + ), + min_size=1, + max_size=20, + ).filter(lambda x: x[0].isalpha()), + num_attributes=st.integers(min_value=1, max_value=5), + ) + @settings(max_examples=100) + def test_message_with_multiple_attributes( + self, identifier: str, num_attributes: int + ) -> None: + """Messages with multiple attributes parse correctly.""" + attributes = "\n".join( + [f" .attr{i} = Value {i}" for i in range(num_attributes)] + ) + source = f"{identifier} = Main value\n{attributes}" + + parser = FluentParserV1() + resource = parser.parse(source) + + # Should parse message with attributes + assert resource is not None + # Should have at least one entry (the message) + assert len(resource.entries) >= 1 + # First entry should be a Message + first_entry =
resource.entries[0] + assert isinstance(first_entry, (Message, Junk)) + + # Emit events for HypoFuzz guidance + event(f"attribute_count={min(num_attributes, 5)}") + + +class TestParserRecovery: + """Test error recovery and resilience.""" + + @given( + num_errors=st.integers(min_value=1, max_value=5), + ) + @settings(max_examples=50) + def test_multiple_errors_recovery(self, num_errors: int) -> None: + """Parser recovers from multiple consecutive errors.""" + # Create multiple invalid lines followed by valid message + invalid_lines = "\n".join([f"!!! invalid {i}" for i in range(num_errors)]) + source = f"{invalid_lines}\nkey = value" + + parser = FluentParserV1() + resource = parser.parse(source) + + # Should create junk entries and recover + assert resource is not None + # Should have at least one entry (junk from invalid lines and/or message) + assert len(resource.entries) >= 1 + + # Emit events for HypoFuzz guidance + event(f"error_count={min(num_errors, 5)}") + junk_count = sum(1 for e in resource.entries if isinstance(e, Junk)) + event(f"junk_entries={min(junk_count, 5)}") + + @given( + unicode_char=st.characters(min_codepoint=0x1F600, max_codepoint=0x1F64F), + ) + @settings(max_examples=50) + def test_unicode_emoji_in_values(self, unicode_char: str) -> None: + """Unicode emoji characters in values are handled.""" + source = f"key = Hello {unicode_char}" + parser = FluentParserV1() + resource = parser.parse(source) + + # Should parse + assert resource is not None + + # Emit events for HypoFuzz guidance + event("unicode=emoji") + + def test_very_long_identifier(self) -> None: + """Very long identifiers are handled.""" + long_id = "a" * 1000 + source = f"{long_id} = value" + parser = FluentParserV1() + resource = parser.parse(source) + + # Should parse (or create junk if too long) + assert resource is not None + + def test_very_long_value(self) -> None: + """Very long values are handled.""" + long_value = "value " * 1000 + source = f"key = {long_value}" + parser 
= FluentParserV1() + resource = parser.parse(source) + + # Should parse + assert resource is not None + + +# ============================================================================ +# VARIABLE REFERENCES +# ============================================================================ + + +class TestVariableReferenceParsing: + """Property tests for variable reference parsing.""" + + @given(var_name=variable_names) + @settings(max_examples=200) + def test_simple_variable_reference_parses(self, var_name: str) -> None: + """PROPERTY: { $var } always parses successfully.""" + source = f"msg = {{ ${var_name} }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + assert len(resource.entries) > 0 + + # Emit events for HypoFuzz guidance + event("variable_position=only") + if len(var_name) > 10: + event("var_name_length=long") + else: + event("var_name_length=short") + + @given(var_name=variable_names, text=safe_text) + @settings(max_examples=150) + def test_variable_with_surrounding_text(self, var_name: str, text: str) -> None: + """PROPERTY: Text { $var } text parses correctly.""" + source = f"msg = {text} {{ ${var_name} }} {text}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("variable_position=middle") + + @given( + var1=variable_names, + var2=variable_names, + ) + @settings(max_examples=150) + def test_multiple_variable_references(self, var1: str, var2: str) -> None: + """PROPERTY: Multiple { $var1 } { $var2 } parse correctly.""" + source = f"msg = {{ ${var1} }} {{ ${var2} }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("variable_count=2") + if var1 == var2: + event("variable_uniqueness=same") + else: + event("variable_uniqueness=different") + + @given(var_name=variable_names) + @settings(max_examples=100) + def 
test_variable_at_message_start(self, var_name: str) -> None: + """PROPERTY: Message starting with { $var } parses.""" + source = f"msg = {{ ${var_name} }} text" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("variable_position=start") + + @given(var_name=variable_names) + @settings(max_examples=100) + def test_variable_at_message_end(self, var_name: str) -> None: + """PROPERTY: Message ending with { $var } parses.""" + source = f"msg = text {{ ${var_name} }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("variable_position=end") + + @given(var_name=variable_names) + @settings(max_examples=100) + def test_variable_only_message(self, var_name: str) -> None: + """PROPERTY: Message with only { $var } parses.""" + source = f"msg = {{ ${var_name} }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("variable_position=only") + + @given( + var_name=variable_names, + count=st.integers(min_value=2, max_value=10), + ) + @settings(max_examples=50) + def test_repeated_variable_references(self, var_name: str, count: int) -> None: + """PROPERTY: Same variable referenced multiple times parses.""" + refs = " ".join([f"{{ ${var_name} }}" for _ in range(count)]) + source = f"msg = {refs}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event(f"variable_count={min(count, 10)}") + event("variable_uniqueness=repeated") + + +# ============================================================================ +# PLACEABLES +# ============================================================================ + + +class TestPlaceableParsing: + """Property tests for placeable expression parsing.""" + + @given(text=safe_text) + 
@settings(max_examples=150) + def test_placeable_with_string_literal(self, text: str) -> None: + """PROPERTY: { "string" } parses as placeable.""" + # Escape quotes in text + escaped = text.replace('"', '\\"') + source = f'msg = {{ "{escaped}" }}' + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("placeable_type=string_literal") + + @given(number=numbers) + @settings(max_examples=150) + def test_placeable_with_number_literal(self, number: int) -> None: + """PROPERTY: { 123 } parses as placeable.""" + source = f"msg = {{ {number} }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("placeable_type=number_literal") + if number < 0: + event("number_sign=negative") + elif number == 0: + event("number_sign=zero") + else: + event("number_sign=positive") + + @given( + msg_id=ftl_identifiers, + var_name=variable_names, + ) + @settings(max_examples=100) + def test_placeable_with_message_reference( + self, msg_id: str, var_name: str + ) -> None: + """PROPERTY: { message-id } parses as message reference.""" + source = f"{msg_id} = value\nmsg = {{ {var_name} }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("placeable_type=message_ref") + + @given( + var_name=variable_names, + count=st.integers(min_value=1, max_value=5), + ) + @settings(max_examples=50) + def test_consecutive_placeables(self, var_name: str, count: int) -> None: + """PROPERTY: Multiple consecutive placeables parse.""" + placeables = "".join([f"{{ ${var_name}{i} }}" for i in range(count)]) + source = f"msg = {placeables}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event(f"consecutive_placeables={min(count, 5)}") + + @given( + 
var_name=variable_names, + whitespace=st.text(alphabet=" \t", min_size=0, max_size=5), + ) + @settings(max_examples=100) + def test_placeable_internal_whitespace( + self, var_name: str, whitespace: str + ) -> None: + """PROPERTY: Whitespace inside { } is handled.""" + source = f"msg = {{{whitespace}${var_name}{whitespace}}}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + if len(whitespace) == 0: + event("internal_whitespace=none") + elif "\t" in whitespace: + event("internal_whitespace=has_tabs") + else: + event("internal_whitespace=spaces_only") + + +# ============================================================================ +# SELECT EXPRESSIONS +# ============================================================================ + + diff --git a/tests/syntax_parser_property_cases/grammar_boundaries.py b/tests/syntax_parser_property_cases/grammar_boundaries.py new file mode 100644 index 00000000..d89863cb --- /dev/null +++ b/tests/syntax_parser_property_cases/grammar_boundaries.py @@ -0,0 +1,749 @@ +# mypy: ignore-errors +from tests.syntax_parser_property_cases import ( + FluentParserV1, + attribute_names, + event, + ftl_identifiers, + given, + numbers, + safe_text, + settings, + st, + variable_names, +) + + +class TestFunctionCallParsing: + """Property tests for function call parsing.""" + + @given(var_name=variable_names) + @settings(max_examples=150) + def test_number_function_call(self, var_name: str) -> None: + """PROPERTY: NUMBER($var) parses correctly.""" + source = f"msg = {{ NUMBER(${var_name}) }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("function_name=NUMBER") + event("function_arg_type=variable") + + @given(var_name=variable_names) + @settings(max_examples=150) + def test_datetime_function_call(self, var_name: str) -> None: + """PROPERTY: DATETIME($var) parses 
correctly.""" + source = f"msg = {{ DATETIME(${var_name}) }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("function_name=DATETIME") + event("function_arg_type=variable") + + @given(var_name=variable_names) + @settings(max_examples=100) + def test_function_with_named_arg(self, var_name: str) -> None: + """PROPERTY: FUNC($var, opt: val) parses.""" + source = f"msg = {{ NUMBER(${var_name}, minimumFractionDigits: 2) }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("function_options=with_named") + event("option_value_type=numeric") + + @given(var_name=variable_names, number=numbers) + @settings(max_examples=100) + def test_function_with_numeric_option(self, var_name: str, number: int) -> None: + """PROPERTY: Function with numeric option parses.""" + source = f"msg = {{ NUMBER(${var_name}, minimumFractionDigits: {number}) }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("function_options=with_numeric") + if number < 0: + event("option_value_sign=negative") + elif number == 0: + event("option_value_sign=zero") + else: + event("option_value_sign=positive") + + @given(var_name=variable_names) + @settings(max_examples=100) + def test_function_with_string_option(self, var_name: str) -> None: + """PROPERTY: Function with string option parses.""" + source = f'msg = {{ DATETIME(${var_name}, style: "long") }}' + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("function_options=with_string") + event("option_value_type=string") + + @given( + var_name=variable_names, + count=st.integers(min_value=1, max_value=5), + ) + @settings(max_examples=50) + def test_function_with_multiple_options(self, var_name: 
str, count: int) -> None: + """PROPERTY: Function with multiple options parses.""" + options = ", ".join([f"opt{i}: {i}" for i in range(count)]) + source = f"msg = {{ NUMBER(${var_name}, {options}) }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("function_options=multiple") + event(f"option_count={min(count, 5)}") + + @given(func_name=ftl_identifiers, var_name=variable_names) + @settings(max_examples=100) + def test_custom_function_call(self, func_name: str, var_name: str) -> None: + """PROPERTY: Custom function calls parse.""" + # Note: uppercase function names required + source = f"msg = {{ {func_name.upper()}(${var_name}) }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("function_name=CUSTOM") + if len(func_name) <= 5: + event("function_name_length=short") + else: + event("function_name_length=long") + + @given(number=numbers) + @settings(max_examples=50) + def test_function_with_number_literal_arg(self, number: int) -> None: + """PROPERTY: Function with number literal argument parses.""" + source = f"msg = {{ NUMBER({number}) }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("function_arg_type=literal") + if number < 0: + event("literal_sign=negative") + elif number == 0: + event("literal_sign=zero") + else: + event("literal_sign=positive") + + @given(var_name=variable_names) + @settings(max_examples=50) + def test_nested_function_calls(self, var_name: str) -> None: + """PROPERTY: Nested function calls parse (if supported).""" + # Most parsers support simple nesting + source = f"msg = {{ NUMBER(${var_name}) }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + 
event("function_nesting=simple") + + +# ============================================================================ +# MESSAGE REFERENCES +# ============================================================================ + + +class TestMessageReferenceParsing: + """Property tests for message reference parsing.""" + + @given(msg_id1=ftl_identifiers, msg_id2=ftl_identifiers) + @settings(max_examples=150) + def test_simple_message_reference(self, msg_id1: str, msg_id2: str) -> None: + """PROPERTY: { msg-id } references another message.""" + source = f"{msg_id1} = Value1\n{msg_id2} = {{ {msg_id1} }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("msg_ref_type=simple") + if msg_id1 == msg_id2: + event("msg_ref_self=true") + else: + event("msg_ref_self=false") + + @given( + msg_id1=ftl_identifiers, + msg_id2=ftl_identifiers, + attr_name=attribute_names, + ) + @settings(max_examples=100) + def test_message_attribute_reference( + self, msg_id1: str, msg_id2: str, attr_name: str + ) -> None: + """PROPERTY: { msg.attr } references message attribute.""" + source = ( + f"{msg_id1} = Value\n" + f" .{attr_name} = Attr\n" + f"{msg_id2} = {{ {msg_id1}.{attr_name} }}" + ) + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("msg_ref_type=with_attribute") + + @given( + msg_id=ftl_identifiers, + count=st.integers(min_value=2, max_value=5), + ) + @settings(max_examples=50) + def test_multiple_message_references(self, msg_id: str, count: int) -> None: + """PROPERTY: Multiple message references in one pattern parse.""" + refs = " ".join([f"{{ {msg_id}{i} }}" for i in range(count)]) + # Create referenced messages + messages = "\n".join([f"{msg_id}{i} = Value{i}" for i in range(count)]) + source = f"{messages}\nfinal = {refs}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource 
is not None + + # Emit events for HypoFuzz guidance + event("msg_ref_type=multiple") + event(f"msg_ref_count={min(count, 5)}") + + @given(msg_id1=ftl_identifiers, msg_id2=ftl_identifiers, text=safe_text) + @settings(max_examples=100) + def test_message_reference_with_text( + self, msg_id1: str, msg_id2: str, text: str + ) -> None: + """PROPERTY: Message reference mixed with text parses.""" + source = f"{msg_id1} = Value\n{msg_id2} = {text} {{ {msg_id1} }} {text}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("msg_ref_type=mixed_with_text") + if len(text) == 0: + event("surrounding_text=empty") + else: + event("surrounding_text=present") + + +# ============================================================================ +# IDENTIFIER VALIDATION +# ============================================================================ + + +class TestIdentifierValidation: + """Property tests for identifier validation.""" + + @given( + prefix=st.text( + alphabet=st.characters(min_codepoint=97, max_codepoint=122), + min_size=1, + max_size=5, + ), + number=st.integers(min_value=0, max_value=999), + ) + @settings(max_examples=150) + def test_identifier_with_number_suffix(self, prefix: str, number: int) -> None: + """PROPERTY: Identifiers can have numeric suffixes.""" + msg_id = f"{prefix}{number}" + source = f"{msg_id} = value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("identifier_type=with_number_suffix") + if number == 0: + event("number_suffix=zero") + elif number < 10: + event("number_suffix=single_digit") + elif number < 100: + event("number_suffix=two_digit") + else: + event("number_suffix=three_digit") + + @given( + parts=st.lists( + st.text( + alphabet=st.characters(min_codepoint=97, max_codepoint=122), + min_size=1, + max_size=5, + ), + min_size=2, + max_size=5, + ), + ) + 
@settings(max_examples=100) + def test_identifier_with_hyphens(self, parts: list[str]) -> None: + """PROPERTY: Identifiers with hyphens parse.""" + msg_id = "-".join(parts) + source = f"{msg_id} = value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("identifier_type=with_hyphens") + event(f"identifier_parts={min(len(parts), 5)}") + + @given( + parts=st.lists( + st.text( + alphabet=st.characters(min_codepoint=97, max_codepoint=122), + min_size=1, + max_size=5, + ), + min_size=2, + max_size=5, + ), + ) + @settings(max_examples=100) + def test_identifier_with_underscores(self, parts: list[str]) -> None: + """PROPERTY: Identifiers with underscores parse.""" + msg_id = "_".join(parts) + source = f"{msg_id} = value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("identifier_type=with_underscores") + event(f"identifier_parts={min(len(parts), 5)}") + + @given(length=st.integers(min_value=1, max_value=100)) + @settings(max_examples=50) + def test_identifier_length_handling(self, length: int) -> None: + """PROPERTY: Identifiers of various lengths parse.""" + msg_id = "a" * length + source = f"{msg_id} = value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("identifier_type=length_test") + if length == 1: + event("identifier_length=minimal") + elif length <= 10: + event("identifier_length=short") + elif length <= 50: + event("identifier_length=medium") + else: + event("identifier_length=long") + + @given( + msg_id=ftl_identifiers, + uppercase_count=st.integers(min_value=0, max_value=5), + ) + @settings(max_examples=100) + def test_identifier_case_sensitivity( + self, msg_id: str, uppercase_count: int + ) -> None: + """PROPERTY: Identifier case is preserved.""" + # Mix case by uppercasing some 
characters + chars = list(msg_id) + for i in range(min(uppercase_count, len(chars))): + chars[i] = chars[i].upper() + mixed_case_id = "".join(chars) + source = f"{mixed_case_id} = value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("identifier_type=mixed_case") + if uppercase_count == 0: + event("case_mix=all_lower") + elif uppercase_count >= len(chars): + event("case_mix=all_upper") + else: + event("case_mix=mixed") + + +# ============================================================================ +# ESCAPE SEQUENCES +# ============================================================================ + + +class TestEscapeSequenceParsing: + """Property tests for escape sequence handling.""" + + def test_unicode_escape_basic(self) -> None: + """PROPERTY: Basic Unicode escapes parse.""" + source = r'msg = { "\u0041" }' + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + @given( + codepoint=st.integers( + min_value=0x0020, + max_value=0xD7FF, + ), # Valid Unicode range + ) + @settings(max_examples=100) + def test_unicode_escape_various_codepoints(self, codepoint: int) -> None: + """PROPERTY: Unicode escapes for various codepoints parse.""" + source = f'msg = {{ "\\u{codepoint:04X}" }}' + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("escape_type=unicode") + if codepoint < 0x0080: + event("codepoint_range=ascii") + elif codepoint < 0x0800: + event("codepoint_range=latin_extended") + elif codepoint < 0x3000: + event("codepoint_range=mid_bmp") + else: + event("codepoint_range=cjk_symbols") + + def test_escaped_quote_in_string(self) -> None: + """PROPERTY: Escaped quotes in strings parse.""" + source = r'msg = { "He said \"Hello\"" }' + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + def 
test_escaped_backslash_in_string(self) -> None: + """PROPERTY: Escaped backslashes parse.""" + source = r'msg = { "Path: C:\\Windows" }' + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + def test_escaped_braces_in_text(self) -> None: + """PROPERTY: Escaped braces in text parse.""" + source = r"msg = Literal \{ and \} braces" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + +# ============================================================================ +# LINE ENDING HANDLING +# ============================================================================ + + +class TestLineEndingHandling: + """Property tests for line ending handling.""" + + @given(msg_id=ftl_identifiers) + @settings(max_examples=100) + def test_unix_line_endings(self, msg_id: str) -> None: + """PROPERTY: Unix \\n line endings parse correctly.""" + source = f"{msg_id}1 = value1\n{msg_id}2 = value2\n" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("line_ending_type=unix") + + @given(msg_id=ftl_identifiers) + @settings(max_examples=100) + def test_windows_line_endings(self, msg_id: str) -> None: + """PROPERTY: Windows \\r\\n line endings parse correctly.""" + source = f"{msg_id}1 = value1\r\n{msg_id}2 = value2\r\n" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("line_ending_type=windows") + + @given(msg_id=ftl_identifiers) + @settings(max_examples=100) + def test_old_mac_line_endings(self, msg_id: str) -> None: + """PROPERTY: Old Mac \\r line endings parse.""" + source = f"{msg_id}1 = value1\r{msg_id}2 = value2\r" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("line_ending_type=old_mac") + + @given(msg_id=ftl_identifiers) + 
@settings(max_examples=50) + def test_mixed_line_endings(self, msg_id: str) -> None: + """PROPERTY: Mixed line endings are handled.""" + source = f"{msg_id}1 = value1\n{msg_id}2 = value2\r\n{msg_id}3 = value3\r" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("line_ending_type=mixed") + + @given(msg_id=ftl_identifiers) + @settings(max_examples=50) + def test_no_final_newline(self, msg_id: str) -> None: + """PROPERTY: Source without final newline parses.""" + source = f"{msg_id} = value" # No trailing newline + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("line_ending_type=no_final") + + +# ============================================================================ +# UTF-8 BOM HANDLING +# ============================================================================ + + +class TestUTF8BOMHandling: + """Property tests for UTF-8 BOM handling.""" + + @given(msg_id=ftl_identifiers) + @settings(max_examples=100) + def test_utf8_bom_at_start(self, msg_id: str) -> None: + """PROPERTY: UTF-8 BOM at file start is handled.""" + bom = "\ufeff" + source = f"{bom}{msg_id} = value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("bom_presence=with_bom") + + @given(msg_id=ftl_identifiers) + @settings(max_examples=50) + def test_source_without_bom(self, msg_id: str) -> None: + """PROPERTY: Source without BOM parses normally.""" + source = f"{msg_id} = value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("bom_presence=without_bom") + + @given(msg_id=ftl_identifiers, text=safe_text) + @settings(max_examples=50) + def test_bom_only_at_start(self, msg_id: str, text: str) -> None: + """PROPERTY: BOM only valid at 
file start.""" + source = f"{msg_id} = {text}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("bom_presence=no_bom_with_content") + if len(text) == 0: + event("text_content=empty") + else: + event("text_content=present") + + +# ============================================================================ +# PATTERN ELEMENT BOUNDARIES +# ============================================================================ + + +class TestPatternElementBoundaries: + """Property tests for pattern element boundaries.""" + + @given(var_name=variable_names, text=safe_text) + @settings(max_examples=100) + def test_text_placeable_boundary(self, var_name: str, text: str) -> None: + """PROPERTY: Boundary between text and placeable is correct.""" + source = f"msg = {text}{{ ${var_name} }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("boundary_type=text_placeable") + if len(text) == 0: + event("prefix_text=empty") + else: + event("prefix_text=present") + + @given(var_name=variable_names, text=safe_text) + @settings(max_examples=100) + def test_placeable_text_boundary(self, var_name: str, text: str) -> None: + """PROPERTY: Boundary between placeable and text is correct.""" + source = f"msg = {{ ${var_name} }}{text}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("boundary_type=placeable_text") + if len(text) == 0: + event("suffix_text=empty") + else: + event("suffix_text=present") + + @given( + var1=variable_names, + var2=variable_names, + ) + @settings(max_examples=100) + def test_placeable_placeable_boundary(self, var1: str, var2: str) -> None: + """PROPERTY: Adjacent placeables have correct boundary.""" + source = f"msg = {{ ${var1} }}{{ ${var2} }}" + parser = FluentParserV1() + resource = 
parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("boundary_type=placeable_placeable") + if var1 == var2: + event("adjacent_vars=same") + else: + event("adjacent_vars=different") + + @given(text1=safe_text, text2=safe_text) + @settings(max_examples=50) + def test_text_text_concatenation(self, text1: str, text2: str) -> None: + """PROPERTY: Consecutive text elements are handled.""" + source = f"msg = {text1} {text2}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("boundary_type=text_text") + total_len = len(text1) + len(text2) + if total_len == 0: + event("combined_text=empty") + elif total_len <= 20: + event("combined_text=short") + else: + event("combined_text=long") + + +# ============================================================================ +# MULTILINE PATTERNS +# ============================================================================ + + +class TestMultilinePatterns: + """Property tests for multiline pattern handling.""" + + @given(msg_id=ftl_identifiers, lines=st.lists(safe_text, min_size=2, max_size=5)) + @settings(max_examples=100) + def test_multiline_text_value(self, msg_id: str, lines: list[str]) -> None: + """PROPERTY: Multiline text values parse.""" + # Indent continuation lines + text_lines = [lines[0]] + [f" {line}" for line in lines[1:]] + source = f"{msg_id} =\n" + "\n".join(text_lines) + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("multiline_type=text_only") + event(f"line_count={min(len(lines), 5)}") + + @given( + msg_id=ftl_identifiers, + var_name=variable_names, + lines=st.lists(safe_text, min_size=2, max_size=5), + ) + @settings(max_examples=50) + def test_multiline_with_placeables( + self, msg_id: str, var_name: str, lines: list[str] + ) -> None: + """PROPERTY: Multiline patterns with 
placeables parse.""" + text_lines = [f"{lines[0]} {{ ${var_name} }}"] + [ + f" {line}" for line in lines[1:] + ] + source = f"{msg_id} =\n" + "\n".join(text_lines) + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("multiline_type=with_placeables") + event(f"line_count={min(len(lines), 5)}") + + @given( + msg_id=ftl_identifiers, + indent=st.integers(min_value=4, max_value=12), + ) + @settings(max_examples=50) + def test_multiline_indentation_consistency( + self, msg_id: str, indent: int + ) -> None: + """PROPERTY: Consistent indentation in multiline patterns.""" + source = ( + f"{msg_id} =\n" + f"{' ' * indent}Line 1\n" + f"{' ' * indent}Line 2\n" + f"{' ' * indent}Line 3" + ) + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("multiline_type=consistent_indent") + if indent == 4: + event("indent_level=minimal") + elif indent <= 8: + event("indent_level=standard") + else: + event("indent_level=deep") + + +# ============================================================================ +# ROUND-TRIP PROPERTIES +# ============================================================================ + + diff --git a/tests/syntax_parser_property_cases/roundtrip_and_malformed.py b/tests/syntax_parser_property_cases/roundtrip_and_malformed.py new file mode 100644 index 00000000..12b6c810 --- /dev/null +++ b/tests/syntax_parser_property_cases/roundtrip_and_malformed.py @@ -0,0 +1,656 @@ +# mypy: ignore-errors +from tests.syntax_parser_property_cases import ( + FluentParserV1, + FluentSerializer, + Junk, + Message, + Resource, + Term, + assume, + event, + example, + ftl_simple_text, + given, + settings, + shared_ftl_identifiers, + st, +) + + +class TestParserRoundTrip: + """Property: parse(serialize(parse(source))) preserves AST structure.""" + + @given( + msg_id=shared_ftl_identifiers(), + 
msg_value=ftl_simple_text(), + ) + @settings(max_examples=1000) + def test_simple_message_roundtrip( + self, msg_id: str, msg_value: str + ) -> None: + """Simple messages round-trip through serialize/parse.""" + parser = FluentParserV1() + serializer = FluentSerializer() + + ftl_source = f"{msg_id} = {msg_value}" + resource1 = parser.parse(ftl_source) + entry_count = len(resource1.entries) + event(f"entry_count={entry_count}") + + assert entry_count > 0 + + serialized = serializer.serialize(resource1) + resource2 = parser.parse(serialized) + + assert len(resource2.entries) == entry_count + if isinstance(resource1.entries[0], Message) and isinstance( + resource2.entries[0], Message + ): + assert ( + resource1.entries[0].id.name + == resource2.entries[0].id.name + ) + + @given( + msg_id=shared_ftl_identifiers(), + var_name=shared_ftl_identifiers(), + prefix=ftl_simple_text(), + suffix=ftl_simple_text(), + ) + @settings(max_examples=500) + def test_variable_interpolation_roundtrip( + self, + msg_id: str, + var_name: str, + prefix: str, + suffix: str, + ) -> None: + """Messages with variable interpolation round-trip.""" + parser = FluentParserV1() + serializer = FluentSerializer() + + ftl_source = ( + f"{msg_id} = {prefix} {{ ${var_name} }} {suffix}" + ) + resource1 = parser.parse(ftl_source) + has_junk = any( + isinstance(e, Junk) for e in resource1.entries + ) + event( + f"outcome={'has_junk' if has_junk else 'roundtrip_clean'}" + ) + + assert not has_junk + + serialized = serializer.serialize(resource1) + resource2 = parser.parse(serialized) + + assert not any( + isinstance(e, Junk) for e in resource2.entries + ) + + +# ============================================================================ +# METAMORPHIC PROPERTIES +# ============================================================================ + + +class TestParserMetamorphicProperties: + """Metamorphic properties: predictable relations between inputs.""" + + @given( + value1=ftl_simple_text(), + 
value2=ftl_simple_text(), + ) + @settings(max_examples=300) + def test_concatenation_preserves_message_count( + self, value1: str, value2: str + ) -> None: + """Separate messages in one source produce two entries.""" + parser = FluentParserV1() + separate_source = f"m1 = {value1}\nm2 = {value2}" + r1 = parser.parse(separate_source) + + non_junk = [ + e for e in r1.entries if not isinstance(e, Junk) + ] + msg_count = len(non_junk) + event(f"non_junk_count={msg_count}") + assert msg_count == 2 + + @given( + msg_id=shared_ftl_identifiers(), + msg_value=ftl_simple_text(), + newlines=st.integers(min_value=1, max_value=5), + ) + @settings(max_examples=200) + def test_blank_line_count_independence( + self, msg_id: str, msg_value: str, newlines: int + ) -> None: + """Blank lines between messages do not affect parse result.""" + parser = FluentParserV1() + separator = "\n" * newlines + ftl_source = f"m1 = test{separator}{msg_id} = {msg_value}" + + resource = parser.parse(ftl_source) + messages = [ + e for e in resource.entries if isinstance(e, Message) + ] + event(f"separator_newlines={newlines}") + assert len(messages) == 2 + + @given( + msg_id=shared_ftl_identifiers(), + msg_value=ftl_simple_text(), + ) + @settings(max_examples=300) + def test_deterministic_parsing( + self, msg_id: str, msg_value: str + ) -> None: + """Parsing same input twice yields identical results.""" + source = f"{msg_id} = {msg_value}" + parser = FluentParserV1() + result1 = parser.parse(source) + result2 = parser.parse(source) + + assert len(result1.entries) == len(result2.entries) + for e1, e2 in zip( + result1.entries, result2.entries, strict=True + ): + assert isinstance(e1, type(e2)) + event(f"entry_count={len(result1.entries)}") + + +# ============================================================================ +# STRUCTURAL PROPERTIES +# ============================================================================ + + +class TestParserStructuralProperties: + """Properties about AST structure 
produced by parser.""" + + @given( + msg_id=shared_ftl_identifiers(), + msg_value=ftl_simple_text(), + ) + @settings(max_examples=300) + def test_message_has_required_fields( + self, msg_id: str, msg_value: str + ) -> None: + """Parsed Messages have all required fields set.""" + parser = FluentParserV1() + ftl_source = f"{msg_id} = {msg_value}" + resource = parser.parse(ftl_source) + messages = [ + e for e in resource.entries if isinstance(e, Message) + ] + + assert len(messages) > 0 + msg = messages[0] + assert msg.id is not None + assert msg.id.name == msg_id + assert msg.value is not None + event(f"attribute_count={len(msg.attributes)}") + + @given( + msg_id=shared_ftl_identifiers(), + attr_name=shared_ftl_identifiers(), + attr_value=ftl_simple_text(), + ) + @settings(max_examples=200) + def test_attribute_parsing_structure( + self, msg_id: str, attr_name: str, attr_value: str + ) -> None: + """Messages with attributes parse into correct structure.""" + parser = FluentParserV1() + ftl = f"{msg_id} =\n .{attr_name} = {attr_value}" + resource = parser.parse(ftl) + messages = [ + e for e in resource.entries if isinstance(e, Message) + ] + + has_attr = ( + bool(messages) + and bool(messages[0].attributes) + ) + event( + f"outcome={'has_attr' if has_attr else 'no_attr'}" + ) + if has_attr: + attr = messages[0].attributes[0] + assert attr.id.name == attr_name + + @given( + term_id=shared_ftl_identifiers(), + term_value=ftl_simple_text(), + ) + @settings(max_examples=300) + def test_term_parsing_structure( + self, term_id: str, term_value: str + ) -> None: + """Terms with leading hyphen parse correctly.""" + parser = FluentParserV1() + ftl_source = f"-{term_id} = {term_value}" + resource = parser.parse(ftl_source) + + terms = [ + e for e in resource.entries if isinstance(e, Term) + ] + event(f"term_count={len(terms)}") + assert len(terms) > 0 + assert terms[0].id.name == term_id + + @given( + msg_id=shared_ftl_identifiers(), + nesting_depth=st.integers(min_value=1, 
max_value=10), + ) + @settings(max_examples=200) + def test_nested_placeable_depth( + self, msg_id: str, nesting_depth: int + ) -> None: + """Parser handles nested placeables up to depth limit.""" + parser = FluentParserV1() + open_braces = "{ " * nesting_depth + close_braces = " }" * nesting_depth + ftl_source = f"{msg_id} = {open_braces}$x{close_braces}" + + resource = parser.parse(ftl_source) + event(f"nesting_depth={nesting_depth}") + assert len(resource.entries) > 0 + + @given(source=st.text(min_size=0, max_size=500)) + @settings(max_examples=2000) + def test_parser_always_returns_resource( + self, source: str + ) -> None: + """Parser handles arbitrary input without crashing.""" + parser = FluentParserV1() + try: + result = parser.parse(source) + assert isinstance(result, Resource) + event(f"entry_count={len(result.entries)}") + except RecursionError: + pass + + @given( + msg_id=shared_ftl_identifiers(), + msg_value=ftl_simple_text(), + leading_ws=st.text(alphabet=" \t", max_size=10), + trailing_ws=st.text(alphabet=" \t", max_size=10), + ) + @settings(max_examples=300) + def test_whitespace_around_message( + self, msg_id: str, msg_value: str, + leading_ws: str, trailing_ws: str, + ) -> None: + """Leading/trailing whitespace does not change message ID.""" + parser = FluentParserV1() + ftl1 = f"{msg_id} = {msg_value}" + ftl2 = ( + f"{leading_ws}{msg_id} = {msg_value}{trailing_ws}" + ) + + resource1 = parser.parse(ftl1) + resource2 = parser.parse(ftl2) + + msgs1 = [ + e for e in resource1.entries + if isinstance(e, Message) + ] + msgs2 = [ + e for e in resource2.entries + if isinstance(e, Message) + ] + ws_type = "mixed" if leading_ws and trailing_ws else "one" + event(f"whitespace_padding={ws_type}") + if msgs1 and msgs2: + assert msgs1[0].id.name == msgs2[0].id.name + + +# ============================================================================ +# MALFORMED INPUT PROPERTIES +# ============================================================================ + + 
+@st.composite +def malformed_placeable(draw: st.DrawFn) -> str: + """Generate placeables with strategic syntax errors.""" + corruptions = [ + "{", # Missing content + "{ ", # Space but no content + "{ $", # Incomplete variable + "{ $v", # Incomplete variable name + "{ $var", # Missing closing } + '{ "', # Incomplete string literal + '{ "text', # Unterminated string + "{ -", # Incomplete term ref + "{ -t", # Incomplete term name + "{ -term", # Missing closing } + "{ 1.", # Malformed number + "{ 1.2.", # Invalid number format + "{ FUNC", # Missing parentheses + "{ FUNC(", # Incomplete function + "{ FUNC($", # Incomplete arg + "{ msg.", # Missing attr name + "{ msg.@", # Invalid attr name + "{ $x ->", # Incomplete select + "{ $x -> [", # Incomplete variant + "{ $x -> [a]", # Missing pattern + ] + return draw(st.sampled_from(corruptions)) + + +@st.composite +def malformed_function_call(draw: st.DrawFn) -> str: + """Generate function calls with strategic syntax errors.""" + func_name = draw( + st.sampled_from(["FUNC", "NUMBER", "DATETIME"]) + ) + corruptions = [ + f"{func_name}", + f"{func_name}(", + f"{func_name}($", + f"{func_name}($v", + f"{func_name}(1.2.", + f'{{ {func_name}("', + f"{func_name}(@", + f"{func_name}(a:", + f"{func_name}(a: )", + f"{func_name}(123: x)", + f"{func_name}(a: 1, a: 2)", + f"{func_name}(x: 1, 2)", + ] + return draw(st.sampled_from(corruptions)) + + +@st.composite +def malformed_select_expression(draw: st.DrawFn) -> str: + """Generate select expressions with strategic errors.""" + var = draw(st.sampled_from(["$x", "$count", "$num"])) + corruptions = [ + f"{{ {var} ->", + f"{{ {var} -> [", + f"{{ {var} -> [@", + f"{{ {var} -> [a]", + f"{{ {var} -> [a] Text", + f"{{ {var} -> [a] {{ msg.", + f"{{ {var} -> [one] X *[other] Y", + ] + return draw(st.sampled_from(corruptions)) + + +@st.composite +def malformed_term_input(draw: st.DrawFn) -> str: + """Generate terms with strategic syntax errors.""" + corruptions = [ + "-", + "- ", + "-@invalid", + 
"-term", + "-term =", + "-term = val\n .", + "-term = val\n .@", + ] + return draw(st.sampled_from(corruptions)) + + +@st.composite +def malformed_term_reference(draw: st.DrawFn) -> str: + """Generate term references with strategic errors.""" + corruptions = [ + "{ -", + "{ - }", + "{ -@bad }", + "{ -term(", + "{ -term(x", + "{ -term.", + ] + return draw(st.sampled_from(corruptions)) + + +@st.composite +def malformed_attribute(draw: st.DrawFn) -> str: + """Generate attributes with strategic errors.""" + corruptions = [ + " .", + " .@", + " . = val", + " .attr", + " .attr =", + ] + return draw(st.sampled_from(corruptions)) + + +class TestMalformedPlaceables: + """Parser handles malformed placeables without crashing.""" + + @given( + msg_id=shared_ftl_identifiers(), + placeable=malformed_placeable(), + ) + @settings(max_examples=100, deadline=None) + @example(msg_id="key", placeable="{ msg.") + @example(msg_id="key", placeable='{ "') + @example(msg_id="key", placeable="{ 1.2.") + def test_malformed_placeables( + self, msg_id: str, placeable: str + ) -> None: + """Parser recovers from malformed placeables.""" + source = f"{msg_id} = {placeable}" + event(f"placeable_len={len(placeable)}") + parser = FluentParserV1() + + try: + resource = parser.parse(source) + assert resource is not None + except RecursionError: + assume(False) + + +class TestMalformedFunctionCalls: + """Parser handles malformed function calls gracefully.""" + + @given( + msg_id=shared_ftl_identifiers(), + func_call=malformed_function_call(), + ) + @settings(max_examples=80, deadline=None) + @example(msg_id="key", func_call="FUNC($") + @example(msg_id="key", func_call="FUNC(1.2.") + @example(msg_id="key", func_call='{ FUNC("') + @example(msg_id="key", func_call="FUNC(@bad)") + @example(msg_id="key", func_call="FUNC(a:)") + @example(msg_id="key", func_call="FUNC") + def test_malformed_function_calls( + self, msg_id: str, func_call: str + ) -> None: + """Parser recovers from malformed function calls.""" 
+ source = f"{msg_id} = {{ {func_call} }}" + event(f"func_call_len={len(func_call)}") + parser = FluentParserV1() + resource = parser.parse(source) + assert resource is not None + + +class TestMalformedSelectExpressions: + """Parser handles malformed select expressions.""" + + @given( + msg_id=shared_ftl_identifiers(), + select=malformed_select_expression(), + ) + @settings(max_examples=50, deadline=None) + @example(msg_id="key", select="{ $x -> [@") + @example(msg_id="key", select="{ $x -> [a] Text") + @example( + msg_id="key", + select="{ $x -> [a] { msg.", + ) + def test_malformed_select_expressions( + self, msg_id: str, select: str + ) -> None: + """Parser recovers from malformed selects.""" + source = f"{msg_id} = {select}" + event(f"select_len={len(select)}") + parser = FluentParserV1() + resource = parser.parse(source) + assert resource is not None + + +class TestMalformedTerms: + """Parser handles malformed terms and term references.""" + + @given(term_def=malformed_term_input()) + @settings(max_examples=40, deadline=None) + @example(term_def="-@invalid") + @example(term_def="-term = val\n .") + def test_malformed_term_definitions( + self, term_def: str + ) -> None: + """Parser recovers from malformed term definitions.""" + event(f"input_len={len(term_def)}") + parser = FluentParserV1() + resource = parser.parse(term_def) + assert resource is not None + + @given( + msg_id=shared_ftl_identifiers(), + term_ref=malformed_term_reference(), + ) + @settings(max_examples=40, deadline=None) + @example(msg_id="key", term_ref="{ -") + @example(msg_id="key", term_ref="{ -term(") + def test_malformed_term_references( + self, msg_id: str, term_ref: str + ) -> None: + """Parser recovers from malformed term references.""" + source = f"{msg_id} = {term_ref}" + event(f"term_ref_len={len(term_ref)}") + parser = FluentParserV1() + resource = parser.parse(source) + assert resource is not None + + +class TestMalformedAttributes: + """Parser handles malformed attributes.""" + + 
@given( + msg_id=shared_ftl_identifiers(), + attr_line=malformed_attribute(), + ) + @settings(max_examples=30, deadline=None) + @example(msg_id="key", attr_line=" .") + def test_malformed_attributes( + self, msg_id: str, attr_line: str + ) -> None: + """Parser recovers from malformed attributes.""" + source = f"{msg_id} = value\n{attr_line}" + event(f"attr_line_len={len(attr_line)}") + parser = FluentParserV1() + resource = parser.parse(source) + assert resource is not None + + +class TestSpecialCharacterSequences: + """Parser handles arbitrary special character sequences.""" + + @given( + text=st.text( + alphabet="{}$-.[]*\n\r\t ", + min_size=1, + max_size=30, + ) + ) + @settings(max_examples=200, deadline=None) + def test_arbitrary_special_char_sequences( + self, text: str + ) -> None: + """Parser never crashes on special FTL character combos.""" + assume(text.strip()) + parser = FluentParserV1() + try: + resource = parser.parse(text) + assert resource is not None + event(f"entry_count={len(resource.entries)}") + except RecursionError: + assume(False) + + @given( + msg_id=shared_ftl_identifiers(), + value=st.text( + alphabet=( + "abcdefghijklmnopqrstuvwxyz" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "0123456789{}$-. 
" + ), + min_size=1, + max_size=40, + ), + ) + @settings(max_examples=150, deadline=None) + def test_complex_value_patterns( + self, msg_id: str, value: str + ) -> None: + """Parser handles complex patterns in values.""" + source = f"{msg_id} = {value}" + parser = FluentParserV1() + try: + resource = parser.parse(source) + assert resource is not None + has_junk = any( + isinstance(e, Junk) for e in resource.entries + ) + event( + f"outcome={'junk' if has_junk else 'clean'}" + ) + except RecursionError: + assume(False) + + @given( + ftl_source=st.text(min_size=0, max_size=100) + ) + @settings(max_examples=300, deadline=None) + def test_universal_crash_resistance( + self, ftl_source: str + ) -> None: + """Parser never crashes on any input.""" + parser = FluentParserV1() + try: + resource = parser.parse(ftl_source) + assert resource is not None + assert hasattr(resource, "entries") + event(f"entry_count={len(resource.entries)}") + except RecursionError: + assume(False) + + @given( + msg_id=shared_ftl_identifiers(), + placeable_content=st.text( + alphabet=( + "abcdefghijklmnopqrstuvwxyz" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "0123456789$-. 
" + ), + min_size=0, + max_size=20, + ), + ) + @settings(max_examples=100, deadline=None) + def test_deterministic_placeable_parsing( + self, msg_id: str, placeable_content: str + ) -> None: + """Parsing same placeable twice gives same result.""" + source = f"{msg_id} = {{ {placeable_content} }}" + parser1 = FluentParserV1() + parser2 = FluentParserV1() + try: + result1 = parser1.parse(source) + result2 = parser2.parse(source) + assert len(result1.entries) == len(result2.entries) + for e1, e2 in zip( + result1.entries, result2.entries, strict=True + ): + assert isinstance(e1, type(e2)) + event(f"entry_count={len(result1.entries)}") + except RecursionError: + assume(False) diff --git a/tests/syntax_parser_property_cases/syntax_elements.py b/tests/syntax_parser_property_cases/syntax_elements.py new file mode 100644 index 00000000..9d83a324 --- /dev/null +++ b/tests/syntax_parser_property_cases/syntax_elements.py @@ -0,0 +1,723 @@ +# mypy: ignore-errors +from tests.syntax_parser_property_cases import ( + Decimal, + FluentParserV1, + Message, + Term, + assume, + attribute_names, + decimals, + event, + ftl_identifiers, + given, + numbers, + safe_text, + settings, + st, + variable_names, + variant_keys, +) + + +class TestSelectExpressionParsing: + """Property tests for select expression parsing.""" + + @given(var_name=variable_names) + @settings(max_examples=150) + def test_minimal_select_expression(self, var_name: str) -> None: + """PROPERTY: Minimal select { $var -> *[other] X } parses.""" + source = f"msg = {{ ${var_name} ->\n *[other] Default\n}}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("select_variant_count=1") + event("select_type=minimal") + + @given( + var_name=variable_names, + key1=variant_keys, + key2=variant_keys, + ) + @settings(max_examples=150) + def test_select_with_multiple_variants( + self, var_name: str, key1: str, key2: str + ) -> None: + """PROPERTY: 
Select with multiple variants parses.""" + source = f"""msg = {{ ${var_name} -> + [{key1}] Value1 + [{key2}] Value2 + *[other] Default +}}""" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("select_variant_count=3") + if key1 == key2: + event("variant_keys=duplicate") + else: + event("variant_keys=unique") + + @given( + var_name=variable_names, + count=st.integers(min_value=1, max_value=10), + ) + @settings(max_examples=50) + def test_select_with_many_variants(self, var_name: str, count: int) -> None: + """PROPERTY: Select with many variants parses.""" + variants = "\n".join([f" [key{i}] Value{i}" for i in range(count)]) + source = f"msg = {{ ${var_name} ->\n{variants}\n *[other] Default\n}}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event(f"select_variant_count={min(count + 1, 10)}") + + @given(var_name=variable_names, text=safe_text) + @settings(max_examples=100) + def test_select_variant_with_text(self, var_name: str, text: str) -> None: + """PROPERTY: Select variant values can contain text.""" + source = f"msg = {{ ${var_name} ->\n *[other] {text}\n}}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("variant_value_type=text") + + @given( + var_name=variable_names, + var_in_variant=variable_names, + ) + @settings(max_examples=100) + def test_select_variant_with_placeable( + self, var_name: str, var_in_variant: str + ) -> None: + """PROPERTY: Select variant can contain placeables.""" + source = f"msg = {{ ${var_name} ->\n *[other] Text {{ ${var_in_variant} }}\n}}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("variant_value_type=with_placeable") + + @given(var_name=variable_names, 
number=numbers) + @settings(max_examples=100) + def test_select_with_numeric_keys(self, var_name: str, number: int) -> None: + """PROPERTY: Select with numeric variant keys parses.""" + source = f"msg = {{ ${var_name} ->\n [{number}] Exact\n *[other] Default\n}}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("variant_key_type=numeric") + if number < 0: + event("numeric_key_sign=negative") + elif number == 0: + event("numeric_key_sign=zero") + else: + event("numeric_key_sign=positive") + + +# ============================================================================ +# TERMS +# ============================================================================ + + +class TestTermParsing: + """Property tests for term definition and reference parsing.""" + + @given(term_id=ftl_identifiers) + @settings(max_examples=150) + def test_simple_term_definition(self, term_id: str) -> None: + """PROPERTY: -term = value parses as term.""" + source = f"-{term_id} = Term value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + if len(resource.entries) > 0: + # Should be a Term entry + entry = resource.entries[0] + assert isinstance(entry, (Term, Message)) # Could be either + + # Emit events for HypoFuzz guidance + event(f"entry_type={type(entry).__name__}") + event("term_structure=simple") + + @given(term_id=ftl_identifiers, text=safe_text) + @settings(max_examples=100) + def test_term_with_text_value(self, term_id: str, text: str) -> None: + """PROPERTY: Term with text value parses.""" + source = f"-{term_id} = {text}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("term_structure=with_text") + + @given(term_id=ftl_identifiers, var_name=variable_names) + @settings(max_examples=100) + def test_term_with_placeable(self, term_id: str, var_name: str) -> 
None: + """PROPERTY: Term with placeable parses.""" + source = f"-{term_id} = Value {{ ${var_name} }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("term_structure=with_placeable") + + @given(term_id=ftl_identifiers, attr_name=attribute_names) + @settings(max_examples=100) + def test_term_with_attribute(self, term_id: str, attr_name: str) -> None: + """PROPERTY: Term with attribute parses.""" + source = f"-{term_id} = Value\n .{attr_name} = Attribute value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("term_structure=with_attribute") + + @given( + msg_id=ftl_identifiers, + term_id=ftl_identifiers, + ) + @settings(max_examples=100) + def test_message_referencing_term(self, msg_id: str, term_id: str) -> None: + """PROPERTY: Message can reference term { -term }.""" + source = f"-{term_id} = Term\n{msg_id} = {{ -{term_id} }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("term_ref_type=simple") + + @given( + msg_id=ftl_identifiers, + term_id=ftl_identifiers, + attr_name=attribute_names, + ) + @settings(max_examples=100) + def test_term_attribute_reference( + self, msg_id: str, term_id: str, attr_name: str + ) -> None: + """PROPERTY: Term attribute reference { -term.attr } parses.""" + source = ( + f"-{term_id} = Term\n" + f" .{attr_name} = Attr\n" + f"{msg_id} = {{ -{term_id}.{attr_name} }}" + ) + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("term_ref_type=with_attribute") + + +# ============================================================================ +# STRING LITERALS +# ============================================================================ + + +class TestStringLiteralParsing: + 
"""Property tests for string literal parsing.""" + + @given(text=safe_text) + @settings(max_examples=150) + def test_simple_string_literal(self, text: str) -> None: + """PROPERTY: "text" parses as string literal.""" + escaped = text.replace("\\", "\\\\").replace('"', '\\"') + source = f'msg = {{ "{escaped}" }}' + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + if len(text) == 0: + event("string_length=empty") + elif len(text) <= 10: + event("string_length=short") + elif len(text) <= 50: + event("string_length=medium") + else: + event("string_length=long") + + def test_empty_string_literal(self) -> None: + """PROPERTY: Empty string "" parses.""" + source = 'msg = { "" }' + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + @given(char=st.characters(min_codepoint=32, max_codepoint=126)) + @settings(max_examples=100) + def test_string_with_single_char(self, char: str) -> None: + """PROPERTY: Single character strings parse.""" + if char == '"': + escaped = '\\"' + elif char == "\\": + escaped = "\\\\" + else: + escaped = char + source = f'msg = {{ "{escaped}" }}' + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + if char in ('"', "\\"): + event("char_type=special_escape") + elif char.isalpha(): + event("char_type=alpha") + elif char.isdigit(): + event("char_type=digit") + else: + event("char_type=other") + + @given( + unicode_char=st.characters(min_codepoint=0x0100, max_codepoint=0xFFFF), + ) + @settings(max_examples=100) + def test_string_with_unicode(self, unicode_char: str) -> None: + """PROPERTY: String literals with Unicode parse.""" + source = f'msg = {{ "{unicode_char}" }}' + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + codepoint = ord(unicode_char) + if 
codepoint < 0x0800: + event("unicode_range=latin_extended") + elif codepoint < 0x3000: + event("unicode_range=mid_bmp") + else: + event("unicode_range=cjk_symbols") + + +# ============================================================================ +# NUMBER LITERALS +# ============================================================================ + + +class TestNumberLiteralParsing: + """Property tests for number literal parsing.""" + + @given(number=numbers) + @settings(max_examples=200) + def test_integer_literal(self, number: int) -> None: + """PROPERTY: Integer literals parse correctly.""" + source = f"msg = {{ {number} }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + if number < 0: + event("integer_sign=negative") + elif number == 0: + event("integer_sign=zero") + else: + event("integer_sign=positive") + if abs(number) > 1000000: + event("integer_magnitude=large") + + @given(decimal=decimals) + @settings(max_examples=150) + def test_decimal_literal(self, decimal: Decimal) -> None: + """PROPERTY: Decimal literals parse correctly.""" + # Use fixed-point notation to avoid scientific notation in FTL source + num_str = format(decimal, "f") + # Filter out strings that are too long for the parser + assume(len(num_str) <= 50) + source = f"msg = {{ {num_str} }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + if decimal < Decimal(0): + event("decimal_sign=negative") + elif decimal == Decimal(0): + event("decimal_sign=zero") + else: + event("decimal_sign=positive") + # Check if it's a whole number decimal (use str to avoid overflow on huge Decimals) + _, _, frac_part = num_str.lstrip("-").partition(".") + if not frac_part or all(c == "0" for c in frac_part): + event("decimal_type=whole") + else: + event("decimal_type=fractional") + + def test_zero_literal(self) -> None: + """PROPERTY: Zero 
literal parses.""" + source = "msg = { 0 }" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + @given(number=st.integers(min_value=0, max_value=1000000)) + @settings(max_examples=100) + def test_positive_integer(self, number: int) -> None: + """PROPERTY: Positive integers parse.""" + source = f"msg = {{ {number} }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("integer_sign=positive") + if number > 100000: + event("integer_magnitude=large") + elif number > 1000: + event("integer_magnitude=medium") + else: + event("integer_magnitude=small") + + @given(number=st.integers(min_value=-1000000, max_value=-1)) + @settings(max_examples=100) + def test_negative_integer(self, number: int) -> None: + """PROPERTY: Negative integers parse.""" + source = f"msg = {{ {number} }}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("integer_sign=negative") + if abs(number) > 100000: + event("integer_magnitude=large") + elif abs(number) > 1000: + event("integer_magnitude=medium") + else: + event("integer_magnitude=small") + + +# ============================================================================ +# MESSAGE STRUCTURE +# ============================================================================ + + +class TestMessageStructure: + """Property tests for message structure parsing.""" + + @given(msg_id=ftl_identifiers, text=safe_text) + @settings(max_examples=150) + def test_message_with_value_only(self, msg_id: str, text: str) -> None: + """PROPERTY: Message with only value parses.""" + source = f"{msg_id} = {text}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("message_structure=value_only") + + @given( + msg_id=ftl_identifiers, + 
attr_name=attribute_names, + text=safe_text, + ) + @settings(max_examples=150) + def test_message_with_single_attribute( + self, msg_id: str, attr_name: str, text: str + ) -> None: + """PROPERTY: Message with one attribute parses.""" + source = f"{msg_id} = Value\n .{attr_name} = {text}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("message_structure=value_and_attribute") + event("attribute_count=1") + + @given( + msg_id=ftl_identifiers, + count=st.integers(min_value=2, max_value=5), + ) + @settings(max_examples=50) + def test_message_with_multiple_attributes( + self, msg_id: str, count: int + ) -> None: + """PROPERTY: Message with multiple attributes parses.""" + attrs = "\n".join([f" .attr{i} = Value{i}" for i in range(count)]) + source = f"{msg_id} = Main\n{attrs}" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("message_structure=value_and_attributes") + event(f"attribute_count={min(count, 5)}") + + @given(msg_id=ftl_identifiers, attr_name=attribute_names) + @settings(max_examples=100) + def test_message_attribute_only(self, msg_id: str, attr_name: str) -> None: + """PROPERTY: Message with only attributes (no value) parses.""" + source = f"{msg_id} =\n .{attr_name} = Attribute value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("message_structure=attribute_only") + + @given( + msg_id=ftl_identifiers, + var_name=variable_names, + ) + @settings(max_examples=100) + def test_message_value_with_placeable( + self, msg_id: str, var_name: str + ) -> None: + """PROPERTY: Message value with placeable parses.""" + source = f"{msg_id} = Text {{ ${var_name} }} more" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz 
guidance + event("message_structure=value_with_placeable") + + +# ============================================================================ +# COMMENTS +# ============================================================================ + + +class TestCommentParsing: + """Property tests for comment parsing.""" + + @given(text=safe_text) + @settings(max_examples=150) + def test_standalone_comment(self, text: str) -> None: + """PROPERTY: Standalone comment parses.""" + source = f"# {text}\n\nmsg = value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("comment_level=standalone") + + @given(text=safe_text) + @settings(max_examples=100) + def test_group_comment(self, text: str) -> None: + """PROPERTY: Group comment ## parses.""" + source = f"## {text}\n\nmsg = value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("comment_level=group") + + @given(text=safe_text) + @settings(max_examples=100) + def test_resource_comment(self, text: str) -> None: + """PROPERTY: Resource comment ### parses.""" + source = f"### {text}\n\nmsg = value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("comment_level=resource") + + @given( + text=safe_text, + count=st.integers(min_value=1, max_value=5), + ) + @settings(max_examples=50) + def test_multiple_comment_lines(self, text: str, count: int) -> None: + """PROPERTY: Multiple consecutive comment lines parse.""" + comments = "\n".join([f"# {text} {i}" for i in range(count)]) + source = f"{comments}\n\nmsg = value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event(f"comment_lines={min(count, 5)}") + + @given(msg_id=ftl_identifiers, text=safe_text) + @settings(max_examples=100) 
+ def test_comment_attached_to_message(self, msg_id: str, text: str) -> None: + """PROPERTY: Comment immediately before message parses.""" + source = f"# {text}\n{msg_id} = value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("comment_position=attached") + + +# ============================================================================ +# WHITESPACE HANDLING +# ============================================================================ + + +class TestWhitespaceHandling: + """Property tests for whitespace handling.""" + + @given( + msg_id=ftl_identifiers, + spaces=st.integers(min_value=0, max_value=10), + ) + @settings(max_examples=100) + def test_spaces_before_equals(self, msg_id: str, spaces: int) -> None: + """PROPERTY: Spaces before = are handled.""" + source = f"{msg_id}{' ' * spaces}= value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("whitespace_position=before_equals") + if spaces == 0: + event("space_count=none") + elif spaces <= 3: + event("space_count=few") + else: + event("space_count=many") + + @given( + msg_id=ftl_identifiers, + spaces=st.integers(min_value=0, max_value=10), + ) + @settings(max_examples=100) + def test_spaces_after_equals(self, msg_id: str, spaces: int) -> None: + """PROPERTY: Spaces after = are handled.""" + source = f"{msg_id} ={' ' * spaces}value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("whitespace_position=after_equals") + if spaces == 0: + event("space_count=none") + elif spaces <= 3: + event("space_count=few") + else: + event("space_count=many") + + @given( + msg_id=ftl_identifiers, + indent=st.integers(min_value=4, max_value=12), + ) + @settings(max_examples=50) + def test_attribute_indentation(self, msg_id: str, indent: int) -> None: + 
"""PROPERTY: Attribute indentation is handled.""" + source = f"{msg_id} = value\n{' ' * indent}.attr = value" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("whitespace_type=indentation") + if indent == 4: + event("indent_level=minimal") + elif indent <= 8: + event("indent_level=standard") + else: + event("indent_level=deep") + + @given( + msg_id=ftl_identifiers, + blank_lines=st.integers(min_value=0, max_value=5), + ) + @settings(max_examples=50) + def test_blank_lines_between_messages( + self, msg_id: str, blank_lines: int + ) -> None: + """PROPERTY: Blank lines between messages don't affect parsing.""" + source = f"{msg_id}1 = value1{chr(10) * blank_lines}{msg_id}2 = value2" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("whitespace_type=blank_lines") + if blank_lines == 0: + event("blank_line_count=none") + elif blank_lines == 1: + event("blank_line_count=single") + else: + event("blank_line_count=multiple") + + @given( + msg_id=ftl_identifiers, + trailing_spaces=st.integers(min_value=0, max_value=10), + ) + @settings(max_examples=50) + def test_trailing_whitespace(self, msg_id: str, trailing_spaces: int) -> None: + """PROPERTY: Trailing whitespace is handled.""" + source = f"{msg_id} = value{' ' * trailing_spaces}\n" + parser = FluentParserV1() + resource = parser.parse(source) + + assert resource is not None + + # Emit events for HypoFuzz guidance + event("whitespace_position=trailing") + if trailing_spaces == 0: + event("space_count=none") + elif trailing_spaces <= 3: + event("space_count=few") + else: + event("space_count=many") + + +# ============================================================================ +# FUNCTION CALLS +# ============================================================================ + + diff --git 
a/tests/syntax_serializer_roundtrip_cases/__init__.py b/tests/syntax_serializer_roundtrip_cases/__init__.py new file mode 100644 index 00000000..56736e9d --- /dev/null +++ b/tests/syntax_serializer_roundtrip_cases/__init__.py @@ -0,0 +1,948 @@ +"""Roundtrip tests for syntax.serializer: parse(serialize(ast)) == ast. + +Validates both the parser and serializer simultaneously, covering programmatic +ASTs with embedded newlines, whitespace preservation, and convergence stability. +""" + +from __future__ import annotations + +import pytest +from hypothesis import assume, event, example, given, settings +from hypothesis import strategies as st + +from ftllexengine.enums import CommentType +from ftllexengine.syntax import parse, serialize +from ftllexengine.syntax.ast import ( + Comment, + Identifier, + Junk, + Message, + NumberLiteral, + Pattern, + Placeable, + Resource, + SelectExpression, + TextElement, + VariableReference, + Variant, +) +from ftllexengine.syntax.parser import FluentParserV1 +from ftllexengine.syntax.serializer import FluentSerializer +from tests.strategies import ( + ftl_comments, + ftl_message_nodes, + ftl_patterns, + ftl_resources, + ftl_select_expressions, + ftl_variable_references, +) + +# ============================================================================ +# SIMPLE ROUNDTRIP TESTS (Example-Based) +# ============================================================================ + + +def test_roundtrip_simple_message(): + """Round-trip a simple message with text only.""" + # Create AST + msg = Message( + id=Identifier(name="hello"), + value=Pattern(elements=(TextElement(value="Hello, World!"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + # Serialize and parse back + serialized = serialize(resource) + reparsed = parse(serialized) + + # Should be structurally identical + assert len(reparsed.entries) == 1 + assert isinstance(reparsed.entries[0], Message) + assert reparsed.entries[0].id.name == "hello" + + +def 
test_roundtrip_message_with_variable(): + """Round-trip a message with variable interpolation.""" + msg = Message( + id=Identifier(name="greeting"), + value=Pattern( + elements=( + TextElement(value="Hello, "), + Placeable(expression=VariableReference(id=Identifier(name="name"))), + TextElement(value="!"), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + assert len(reparsed.entries) == 1 + assert isinstance(reparsed.entries[0], Message) + assert reparsed.entries[0].id.name == "greeting" + # Verify pattern has 3 elements + pattern = reparsed.entries[0].value + assert pattern is not None + assert len(pattern.elements) == 3 + + +def test_roundtrip_select_expression(): + """Round-trip a message with select expression (plurals).""" + msg = Message( + id=Identifier(name="emails"), + value=Pattern( + elements=( + Placeable( + expression=SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=( + Variant( + key=Identifier(name="one"), + value=Pattern(elements=(TextElement(value="one email"),)), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern( + elements=( + Placeable( + expression=VariableReference( + id=Identifier(name="count") + ) + ), + TextElement(value=" emails"), + ) + ), + default=True, + ), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + assert len(reparsed.entries) == 1 + assert isinstance(reparsed.entries[0], Message) + + +def test_roundtrip_numeric_variant(): + """Round-trip select expression with numeric variant keys.""" + msg = Message( + id=Identifier(name="items"), + value=Pattern( + elements=( + Placeable( + expression=SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=( + Variant( + key=NumberLiteral(value=0, raw="0"), + 
value=Pattern(elements=(TextElement(value="no items"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="one item"),)), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="many items"),)), + default=True, + ), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + assert len(reparsed.entries) == 1 + msg_parsed = reparsed.entries[0] + assert isinstance(msg_parsed, Message) + assert msg_parsed.id.name == "items" + + +def test_roundtrip_comment(): + """Round-trip standalone comment. + + The parser preserves standalone comments as Comment entries, per the + Fluent spec, so the comment content must survive the roundtrip. + """ + comment = Comment(content=" This is a comment", type=CommentType.COMMENT) + resource = Resource(entries=(comment,)) + + serialized = serialize(resource) + # Serializer correctly outputs: "# This is a comment\n" + assert serialized == "# This is a comment\n" + + # Per Fluent spec: Comments are preserved in AST + reparsed = parse(serialized) + + # Spec-conformant behavior: Comments are preserved + assert len(reparsed.entries) == 1 + assert isinstance(reparsed.entries[0], Comment) + assert reparsed.entries[0].content == comment.content + + +def test_roundtrip_junk(): + """Round-trip junk (invalid syntax preserved).""" + junk = Junk(content="invalid syntax here {") + resource = Resource(entries=(junk,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + # Junk gets reparsed as junk + assert len(reparsed.entries) >= 1 + # At least one entry should be junk + assert any(isinstance(e, Junk) for e in reparsed.entries) + + +def test_roundtrip_junk_with_leading_whitespace(): + """Round-trip junk with leading whitespace without redundant newlines. 
+ + Tests that the serializer does not add redundant separators before Junk + entries when the Junk content already includes leading whitespace. + The parser includes preceding whitespace in Junk.content for containment. + """ + # Parse FTL with message followed by blank lines and indented junk + source = "msg = hello\n\n bad" + resource = parse(source) + + # Serialize and re-parse + serialized = serialize(resource) + reparsed = parse(serialized) + + # Verify file doesn't grow on multiple roundtrips (key invariant) + serialized2 = serialize(reparsed) + assert len(serialized2) == len(serialized), ( + "File size should remain stable across roundtrips (no whitespace inflation)" + ) + + # Verify multiple roundtrips converge to stable output + serialized3 = serialize(parse(serialized2)) + assert serialized3 == serialized2, ( + "Serialization should be idempotent after first roundtrip" + ) + + +def test_roundtrip_multiple_messages(): + """Round-trip resource with multiple messages.""" + msg1 = Message( + id=Identifier(name="hello"), + value=Pattern(elements=(TextElement(value="Hello!"),)), + attributes=(), + ) + msg2 = Message( + id=Identifier(name="goodbye"), + value=Pattern(elements=(TextElement(value="Goodbye!"),)), + attributes=(), + ) + resource = Resource(entries=(msg1, msg2)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + # Should have at least 2 messages + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) >= 2 + + +def test_roundtrip_mixed_entries(): + """Round-trip resource with messages and standalone comments. + + When Comments appear as separate entries in the AST (not as message.comment), + they are standalone comments and should remain standalone after roundtrip. + The serializer preserves this by adding 2 blank lines between a standalone + comment and the following message/term. 
+ """ + entries = ( + Comment(content=" Header comment", type=CommentType.COMMENT), + Message( + id=Identifier(name="app-name"), + value=Pattern(elements=(TextElement(value="MyApp"),)), + attributes=(), + ), + Comment(content=" Another comment", type=CommentType.COMMENT), + Message( + id=Identifier(name="version"), + value=Pattern(elements=(TextElement(value="1.0.0"),)), + attributes=(), + ), + ) + resource = Resource(entries=entries) + + serialized = serialize(resource) + reparsed = parse(serialized) + + # Standalone comments remain standalone after roundtrip + standalone_comments = [e for e in reparsed.entries if isinstance(e, Comment)] + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(standalone_comments) == 2 # Comments remain standalone + assert len(messages) == 2 # Messages survive roundtrip + + # Messages should NOT have attached comments (comments are standalone) + assert messages[0].comment is None + assert messages[1].comment is None + + # Comment content is preserved + assert "Header comment" in standalone_comments[0].content + assert "Another comment" in standalone_comments[1].content + + +def test_roundtrip_attached_comments(): + """Round-trip resource with attached comments. + + When Comments are set as message.comment (not as separate entries), + they are attached comments and should remain attached after roundtrip. 
+ """ + entries = ( + Message( + id=Identifier(name="app-name"), + value=Pattern(elements=(TextElement(value="MyApp"),)), + attributes=(), + comment=Comment(content=" Attached to app-name", type=CommentType.COMMENT), + ), + Message( + id=Identifier(name="version"), + value=Pattern(elements=(TextElement(value="1.0.0"),)), + attributes=(), + comment=Comment(content=" Attached to version", type=CommentType.COMMENT), + ), + ) + resource = Resource(entries=entries) + + serialized = serialize(resource) + reparsed = parse(serialized) + + # No standalone comments - all attached + standalone_comments = [e for e in reparsed.entries if isinstance(e, Comment)] + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(standalone_comments) == 0 # No standalone comments + assert len(messages) == 2 # Messages survive roundtrip + + # Comments remain attached to their messages + assert messages[0].comment is not None + assert "Attached to app-name" in messages[0].comment.content + assert messages[1].comment is not None + assert "Attached to version" in messages[1].comment.content + + +def test_roundtrip_empty_resource(): + """Round-trip empty resource.""" + resource = Resource(entries=()) + + serialized = serialize(resource) + reparsed = parse(serialized) + + assert len(reparsed.entries) == 0 + + +def test_roundtrip_message_with_only_placeable(): + """Round-trip message with only a placeable (no text).""" + msg = Message( + id=Identifier(name="count"), + value=Pattern( + elements=(Placeable(expression=VariableReference(id=Identifier(name="num"))),) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + assert len(reparsed.entries) == 1 + assert isinstance(reparsed.entries[0], Message) + + +def test_roundtrip_complex_pattern(): + """Round-trip message with complex pattern (text + variables). + + NOTE: Parser creates spurious Junk entry for trailing period. 
+ This is a parser quirk - the message itself parses correctly. + """ + msg = Message( + id=Identifier(name="user-info"), + value=Pattern( + elements=( + TextElement(value="User "), + Placeable(expression=VariableReference(id=Identifier(name="name"))), + TextElement(value=" has "), + Placeable(expression=VariableReference(id=Identifier(name="count"))), + TextElement(value=" items"), # Removed trailing period to avoid parser quirk + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + # Message parses correctly (ignore spurious Junk entries) + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) == 1 + msg_parsed = messages[0] + assert isinstance(msg_parsed, Message) + assert msg_parsed.value is not None + assert len(msg_parsed.value.elements) == 5 + + +# ============================================================================ +# WHITESPACE PRESERVATION ROUNDTRIP TESTS +# ============================================================================ + + +def test_roundtrip_multiline_leading_whitespace(): + """Round-trip preserves leading whitespace after newlines. + + Tests fix for IMPL-SERIALIZER-ROUNDTRIP-CORRUPTION-001: when TextElement + with leading whitespace follows element ending with newline, serializer + must emit pattern on separate line to preserve the whitespace semantically. 
+ """ + # Pattern: "Line 1\n  Line 2" (2 leading spaces on line 2) + msg = Message( + id=Identifier(name="code-block"), + value=Pattern( + elements=( + TextElement(value="Line 1\n"), + TextElement(value="  Line 2"), # 2-space indent + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + # Extract reparsed pattern content + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) == 1 + pattern = messages[0].value + assert pattern is not None + + # Reconstruct the pattern content from elements + content = "".join( + elem.value for elem in pattern.elements if isinstance(elem, TextElement) + ) + assert "Line 1\n" in content + assert "  Line 2" in content # 2 spaces preserved + + +def test_roundtrip_code_example_indent(): + """Round-trip preserves code example indentation. + + Tests common use case of embedding code examples in localization strings. + """ + # Multi-line code example with indentation + msg = Message( + id=Identifier(name="code-example"), + value=Pattern( + elements=( + TextElement(value="Example:\n"), + TextElement(value=" def hello():\n"), + TextElement(value=" print('Hi')"), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) == 1 + pattern = messages[0].value + assert pattern is not None + + content = "".join( + elem.value for elem in pattern.elements if isinstance(elem, TextElement) + ) + # Verify indentation preserved + assert " def hello():" in content + assert " print('Hi')" in content + + +def test_roundtrip_whitespace_idempotent(): + """Multiple roundtrips produce identical output (idempotency). + + Tests that whitespace handling doesn't cause drift across roundtrips. 
+ """ + msg = Message( + id=Identifier(name="formatted"), + value=Pattern( + elements=( + TextElement(value="Header:\n"), + TextElement(value=" Item 1\n"), + TextElement(value=" Item 2"), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + # First roundtrip + serialized1 = serialize(resource) + reparsed1 = parse(serialized1) + + # Second roundtrip + serialized2 = serialize(reparsed1) + reparsed2 = parse(serialized2) + + # Third roundtrip + serialized3 = serialize(reparsed2) + + # Output should stabilize after first roundtrip + assert serialized2 == serialized3, "Serialization should be idempotent" + + +def test_roundtrip_mixed_whitespace_and_placeables(): + """Round-trip preserves whitespace with interleaved placeables.""" + msg = Message( + id=Identifier(name="mixed"), + value=Pattern( + elements=( + TextElement(value="Results for "), + Placeable(expression=VariableReference(id=Identifier(name="query"))), + TextElement(value=":\n"), + TextElement(value=" - First result\n"), + TextElement(value=" - "), + Placeable(expression=VariableReference(id=Identifier(name="count"))), + TextElement(value=" more"), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) == 1 + pattern = messages[0].value + assert pattern is not None + + # Verify structure preserved - should have TextElements with whitespace + text_elements = [e for e in pattern.elements if isinstance(e, TextElement)] + text_content = "".join(e.value for e in text_elements) + + # Check whitespace preservation + assert ":\n" in text_content + assert " - First result\n" in text_content or " -" in text_content + + +def test_roundtrip_tab_indentation(): + """Round-trip preserves tab indentation.""" + msg = Message( + id=Identifier(name="tabbed"), + value=Pattern( + elements=( + TextElement(value="Data:\n"), + 
TextElement(value="\tColumn 1\n"), + TextElement(value="\t\tNested"), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) == 1 + pattern = messages[0].value + assert pattern is not None + + content = "".join( + elem.value for elem in pattern.elements if isinstance(elem, TextElement) + ) + assert "\tColumn 1" in content + assert "\t\tNested" in content + + +def test_roundtrip_preserves_parsed_whitespace(): + """Parse and serialize preserves original whitespace from FTL source. + + Tests the full cycle: FTL source -> parse -> serialize -> parse -> serialize + """ + # FTL with intentional indentation + source = """\ +code-snippet = + Example code: + if True: + print("hello") +""" + parsed = parse(source) + serialized = serialize(parsed) + reparsed = parse(serialized) + serialized2 = serialize(reparsed) + + # Should stabilize + assert serialized == serialized2, "Roundtrip should be stable" + + # Verify semantic content preserved + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) == 1 + pattern = messages[0].value + assert pattern is not None + + content = "".join( + elem.value for elem in pattern.elements if isinstance(elem, TextElement) + ) + # Original indentation relationships should be preserved + assert "Example code:" in content + assert "print(" in content + + +def test_roundtrip_compact_messages_no_blank_lines(): + """Roundtrip of compact messages preserves no-blank-line format. + + Tests the fix for NAME-SERIALIZER-SPACING-001 where serializer was adding + redundant newlines between Message/Term entries. 
+ """ + # Compact FTL with no blank lines between messages + source = "msg1 = First\nmsg2 = Second\nmsg3 = Third" + + parsed = parse(source) + serialized = serialize(parsed) + + # Serialized output should maintain compact format (no blank lines) + assert serialized == "msg1 = First\nmsg2 = Second\nmsg3 = Third\n" + + # Verify roundtrip preserves structure + reparsed = parse(serialized) + assert len(reparsed.entries) == 3 + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) == 3 + assert messages[0].id.name == "msg1" + assert messages[1].id.name == "msg2" + assert messages[2].id.name == "msg3" + + +def test_comment_message_separation_preserved(): + """Comment->Message still gets blank line to prevent attachment. + + Tests the fix for NAME-SERIALIZER-SPACING-001 ensures Comment separation + logic is preserved (blank lines prevent comment attachment on re-parse). + """ + # Standalone comment followed by message (with blank line) + source = "# Standalone comment\n\nmsg = Value" + + parsed = parse(source) + serialized = serialize(parsed) + + # Should preserve blank line between comment and message + # The blank line prevents the comment from being attached to the message + assert "\n\n" in serialized + + # Verify roundtrip: comment should remain standalone + reparsed = parse(serialized) + comments = [e for e in reparsed.entries if isinstance(e, Comment)] + messages = [e for e in reparsed.entries if isinstance(e, Message)] + + assert len(comments) == 1 + assert len(messages) == 1 + # Message should NOT have an attached comment + assert messages[0].comment is None + + +def test_roundtrip_mixed_spacing_preserved(): + """Mixed spacing patterns are preserved during roundtrip.""" + # Mix of compact messages and separated entries + source = "msg1 = First\nmsg2 = Second\n\n# Comment\n\nmsg3 = Third" + + parsed = parse(source) + serialized = serialize(parsed) + reparsed = parse(serialized) + + # Should have 3 messages and 1 comment + 
messages = [e for e in reparsed.entries if isinstance(e, Message)] + comments = [e for e in reparsed.entries if isinstance(e, Comment)] + + assert len(messages) == 3 + assert len(comments) == 1 + + # First two messages should be compact (consecutive) + # Comment should be standalone (not attached) + # Third message should be after comment + assert messages[0].id.name == "msg1" + assert messages[1].id.name == "msg2" + assert messages[2].id.name == "msg3" + + +# ============================================================================ +# PROPERTY-BASED ROUNDTRIP TESTS (Hypothesis) +# ============================================================================ + + +@given(ftl_message_nodes()) +@settings(max_examples=30) +def test_roundtrip_property_messages(message: Message) -> None: + """Property: All generated messages round-trip successfully.""" + resource = Resource(entries=(message,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) >= 1 + assert messages[0].id.name == message.id.name + has_attrs = len(message.attributes) > 0 + event(f"has_attributes={has_attrs}") + event("outcome=message_roundtrip") + + +@given(ftl_patterns()) +@settings(max_examples=30) +def test_roundtrip_property_patterns(pattern: Pattern) -> None: + """Property: All generated patterns round-trip in messages.""" + msg = Message( + id=Identifier(name="test"), value=pattern, attributes=() + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + assert len(reparsed.entries) >= 1 + event(f"element_count={len(pattern.elements)}") + event("outcome=pattern_roundtrip") + + +@given(ftl_select_expressions()) +@settings(max_examples=20) +def test_roundtrip_property_select_expressions( + select_expr: SelectExpression, +) -> None: + """Property: All generated select expressions round-trip.""" + msg = Message( + 
id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=select_expr),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + assert len(reparsed.entries) >= 1 + event(f"variant_count={len(select_expr.variants)}") + event("outcome=select_roundtrip") + + +@given(ftl_comments()) +@settings(max_examples=30) +def test_roundtrip_property_comments(comment_str: str) -> None: + """Property: All generated comments serialize correctly.""" + if comment_str.startswith("### "): + comment_type = CommentType.RESOURCE + content = comment_str[4:] + elif comment_str.startswith("## "): + comment_type = CommentType.GROUP + content = comment_str[3:] + else: + comment_type = CommentType.COMMENT + content = comment_str[2:] + + comment_node = Comment(content=content, type=comment_type) + resource = Resource(entries=(comment_node,)) + + serialized = serialize(resource) + assert isinstance(serialized, str) + assert serialized.startswith("#") + + _ = parse(serialized) + event(f"comment_type={comment_type.name}") + event("outcome=comment_roundtrip") + + +@given(ftl_resources()) +@settings(max_examples=20) +def test_roundtrip_property_complete_resources( + resource: Resource, +) -> None: + """Property: All generated resources round-trip successfully.""" + serialized = serialize(resource) + reparsed = parse(serialized) + + original_messages = [ + e for e in resource.entries if isinstance(e, Message) + ] + reparsed_messages = [ + e for e in reparsed.entries if isinstance(e, Message) + ] + + original_ids = {msg.id.name for msg in original_messages} + reparsed_ids = {msg.id.name for msg in reparsed_messages} + assert original_ids.issubset(reparsed_ids) + event(f"entry_count={len(resource.entries)}") + event("outcome=resource_roundtrip") + + +@given(ftl_variable_references()) +@settings(max_examples=30) +def test_roundtrip_property_variable_references( + var_ref: VariableReference, +) -> None: + 
"""Property: Variable references round-trip in placeables.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=var_ref),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + assert len(reparsed.entries) >= 1 + event(f"var_name={var_ref.id.name}") + event("outcome=varref_roundtrip") + + +# ============================================================================ +# SERIALIZER VALIDITY TESTS +# ============================================================================ + + +@given(ftl_resources()) +@settings(max_examples=30) +def test_serializer_produces_valid_ftl(resource: Resource) -> None: + """Property: Serialized output always produces parseable FTL.""" + serialized = serialize(resource) + + assert isinstance(serialized, str) + + result = parse(serialized) + assert isinstance(result, Resource) + event(f"entry_count={len(resource.entries)}") + event("outcome=valid_ftl") + + +@given(ftl_message_nodes()) +@settings(max_examples=30) +def test_serializer_deterministic(message: Message) -> None: + """Property: Same AST always produces same serialized output.""" + resource = Resource(entries=(message,)) + + serialized1 = serialize(resource) + serialized2 = serialize(resource) + + assert serialized1 == serialized2 + event("outcome=deterministic") + + +# ============================================================================ +# PROGRAMMATIC AST ROUNDTRIPS (from test_serializer_programmatic_roundtrip.py) +# ============================================================================ + + +_parser = FluentParserV1() +_serializer = FluentSerializer() + + +def _roundtrip_pattern_value(pattern_text: str) -> str: + """Create a programmatic AST, serialize, parse, and return pattern value.""" + msg = Message( + id=Identifier(name="msg", span=None), + value=Pattern(elements=(TextElement(value=pattern_text),)), + attributes=(), + comment=None, + 
span=None, + ) + resource = Resource(entries=(msg,)) + serialized = _serializer.serialize(resource) + parsed = _parser.parse(serialized) + entry = parsed.entries[0] + assert hasattr(entry, "value") + assert entry.value is not None + return "".join( + el.value for el in entry.value.elements # type: ignore[union-attr] + ) + +__all__ = [ + "Comment", + "CommentType", + "FluentParserV1", + "FluentSerializer", + "Identifier", + "Junk", + "Message", + "NumberLiteral", + "Pattern", + "Placeable", + "Resource", + "SelectExpression", + "TextElement", + "VariableReference", + "Variant", + "_parser", + "_roundtrip_pattern_value", + "_serializer", + "assume", + "event", + "example", + "ftl_comments", + "ftl_message_nodes", + "ftl_patterns", + "ftl_resources", + "ftl_select_expressions", + "ftl_variable_references", + "given", + "parse", + "pytest", + "serialize", + "settings", + "st", + "test_comment_message_separation_preserved", + "test_roundtrip_attached_comments", + "test_roundtrip_code_example_indent", + "test_roundtrip_comment", + "test_roundtrip_compact_messages_no_blank_lines", + "test_roundtrip_complex_pattern", + "test_roundtrip_empty_resource", + "test_roundtrip_junk", + "test_roundtrip_junk_with_leading_whitespace", + "test_roundtrip_message_with_only_placeable", + "test_roundtrip_message_with_variable", + "test_roundtrip_mixed_entries", + "test_roundtrip_mixed_spacing_preserved", + "test_roundtrip_mixed_whitespace_and_placeables", + "test_roundtrip_multiline_leading_whitespace", + "test_roundtrip_multiple_messages", + "test_roundtrip_numeric_variant", + "test_roundtrip_preserves_parsed_whitespace", + "test_roundtrip_property_comments", + "test_roundtrip_property_complete_resources", + "test_roundtrip_property_messages", + "test_roundtrip_property_patterns", + "test_roundtrip_property_select_expressions", + "test_roundtrip_property_variable_references", + "test_roundtrip_select_expression", + "test_roundtrip_simple_message", + "test_roundtrip_tab_indentation", + 
"test_roundtrip_whitespace_idempotent", + "test_serializer_deterministic", + "test_serializer_produces_valid_ftl", +] diff --git a/tests/syntax_serializer_roundtrip_cases/identifier_roundtrip_fuzz_marked_deadline_none.py b/tests/syntax_serializer_roundtrip_cases/identifier_roundtrip_fuzz_marked_deadline_none.py new file mode 100644 index 00000000..c858b0dc --- /dev/null +++ b/tests/syntax_serializer_roundtrip_cases/identifier_roundtrip_fuzz_marked_deadline_none.py @@ -0,0 +1,34 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_serializer_roundtrip.py.""" + +from tests.syntax_serializer_roundtrip_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# Identifier Roundtrip (Fuzz-marked: deadline=None) +# ============================================================================ + + +@pytest.mark.fuzz +@given(st.text(alphabet="abcdefghijklmnopqrstuvwxyz", min_size=1, max_size=20)) +@settings(max_examples=50, deadline=None) +def test_serialize_parse_identifiers(identifier: str) -> None: + """Property: valid identifiers survive serialize->parse round-trip. 
+ + FUZZ: run with ./scripts/fuzz_hypofuzz.sh --deep or pytest -m fuzz + """ + assume(identifier[0].isalpha()) + assume(all(c.isalnum() or c == "-" for c in identifier)) + + ftl_source = f"{identifier} = Test value" + resource = parse(ftl_source) + + assume(len(resource.entries) > 0) + assume(not isinstance(resource.entries[0], Junk)) + + serialized = serialize(resource) + resource2 = parse(serialized) + + event(f"id_len={len(identifier)}") + assert resource2 is not None + assert len(resource2.entries) == len(resource.entries) + event("outcome=e2e_id_roundtrip_success") diff --git a/tests/syntax_serializer_roundtrip_cases/property_based_roundtrip_tests_hypothesis.py b/tests/syntax_serializer_roundtrip_cases/property_based_roundtrip_tests_hypothesis.py new file mode 100644 index 00000000..97ff94a9 --- /dev/null +++ b/tests/syntax_serializer_roundtrip_cases/property_based_roundtrip_tests_hypothesis.py @@ -0,0 +1,133 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_serializer_roundtrip.py.""" + +from tests.syntax_serializer_roundtrip_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PROPERTY-BASED ROUNDTRIP TESTS (Hypothesis) +# ============================================================================ + + +@given(ftl_message_nodes()) +@settings(max_examples=30) +def test_roundtrip_property_messages(message: Message) -> None: + """Property: All generated messages round-trip successfully.""" + resource = Resource(entries=(message,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) >= 1 + assert messages[0].id.name == message.id.name + has_attrs = len(message.attributes) > 0 + event(f"has_attributes={has_attrs}") + event("outcome=message_roundtrip") + + +@given(ftl_patterns()) +@settings(max_examples=30) +def 
test_roundtrip_property_patterns(pattern: Pattern) -> None: + """Property: All generated patterns round-trip in messages.""" + msg = Message( + id=Identifier(name="test"), value=pattern, attributes=() + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + assert len(reparsed.entries) >= 1 + event(f"element_count={len(pattern.elements)}") + event("outcome=pattern_roundtrip") + + +@given(ftl_select_expressions()) +@settings(max_examples=20) +def test_roundtrip_property_select_expressions( + select_expr: SelectExpression, +) -> None: + """Property: All generated select expressions round-trip.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=select_expr),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + assert len(reparsed.entries) >= 1 + event(f"variant_count={len(select_expr.variants)}") + event("outcome=select_roundtrip") + + +@given(ftl_comments()) +@settings(max_examples=30) +def test_roundtrip_property_comments(comment_str: str) -> None: + """Property: All generated comments serialize correctly.""" + if comment_str.startswith("### "): + comment_type = CommentType.RESOURCE + content = comment_str[4:] + elif comment_str.startswith("## "): + comment_type = CommentType.GROUP + content = comment_str[3:] + else: + comment_type = CommentType.COMMENT + content = comment_str[2:] + + comment_node = Comment(content=content, type=comment_type) + resource = Resource(entries=(comment_node,)) + + serialized = serialize(resource) + assert isinstance(serialized, str) + assert serialized.startswith("#") + + _ = parse(serialized) + event(f"comment_type={comment_type.name}") + event("outcome=comment_roundtrip") + + +@given(ftl_resources()) +@settings(max_examples=20) +def test_roundtrip_property_complete_resources( + resource: Resource, +) -> None: + """Property: All generated resources 
round-trip successfully.""" + serialized = serialize(resource) + reparsed = parse(serialized) + + original_messages = [ + e for e in resource.entries if isinstance(e, Message) + ] + reparsed_messages = [ + e for e in reparsed.entries if isinstance(e, Message) + ] + + original_ids = {msg.id.name for msg in original_messages} + reparsed_ids = {msg.id.name for msg in reparsed_messages} + assert original_ids.issubset(reparsed_ids) + event(f"entry_count={len(resource.entries)}") + event("outcome=resource_roundtrip") + + +@given(ftl_variable_references()) +@settings(max_examples=30) +def test_roundtrip_property_variable_references( + var_ref: VariableReference, +) -> None: + """Property: Variable references round-trip in placeables.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=var_ref),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + assert len(reparsed.entries) >= 1 + event(f"var_name={var_ref.id.name}") + event("outcome=varref_roundtrip") diff --git a/tests/syntax_serializer_roundtrip_cases/serializer_validity_tests.py b/tests/syntax_serializer_roundtrip_cases/serializer_validity_tests.py new file mode 100644 index 00000000..46d1a35a --- /dev/null +++ b/tests/syntax_serializer_roundtrip_cases/serializer_validity_tests.py @@ -0,0 +1,187 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_serializer_roundtrip.py.""" + +from tests.syntax_serializer_roundtrip_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# SERIALIZER VALIDITY TESTS +# ============================================================================ + + +@given(ftl_resources()) +@settings(max_examples=30) +def test_serializer_produces_valid_ftl(resource: Resource) -> None: + """Property: Serialized output always produces parseable FTL.""" + serialized = 
serialize(resource) + + assert isinstance(serialized, str) + + result = parse(serialized) + assert isinstance(result, Resource) + event(f"entry_count={len(resource.entries)}") + event("outcome=valid_ftl") + + +@given(ftl_message_nodes()) +@settings(max_examples=30) +def test_serializer_deterministic(message: Message) -> None: + """Property: Same AST always produces same serialized output.""" + resource = Resource(entries=(message,)) + + serialized1 = serialize(resource) + serialized2 = serialize(resource) + + assert serialized1 == serialized2 + event("outcome=deterministic") + + +# ============================================================================ +# PROGRAMMATIC AST ROUNDTRIPS (from test_serializer_programmatic_roundtrip.py) +# ============================================================================ + + +_parser = FluentParserV1() +_serializer = FluentSerializer() + + +def _roundtrip_pattern_value(pattern_text: str) -> str: + """Create a programmatic AST, serialize, parse, and return pattern value.""" + msg = Message( + id=Identifier(name="msg", span=None), + value=Pattern(elements=(TextElement(value=pattern_text),)), + attributes=(), + comment=None, + span=None, + ) + resource = Resource(entries=(msg,)) + serialized = _serializer.serialize(resource) + parsed = _parser.parse(serialized) + entry = parsed.entries[0] + assert hasattr(entry, "value") + assert entry.value is not None + return "".join( + el.value for el in entry.value.elements # type: ignore[union-attr] + ) + + +class TestEmbeddedNewlineWhitespace: + """Roundtrip preservation of embedded newlines with significant whitespace.""" + + def test_five_space_indent(self) -> None: + """Embedded newline with 5-space indent preserved through roundtrip.""" + original = "foo\n     bar" + assert _roundtrip_pattern_value(original) == original + + def test_four_space_indent(self) -> None: + """Embedded newline with exactly 4-space indent (boundary case).""" + original = "foo\n    bar" + assert 
_roundtrip_pattern_value(original) == original + + def test_single_space_indent(self) -> None: + """Embedded newline with single space indent.""" + original = "foo\n bar" + assert _roundtrip_pattern_value(original) == original + + def test_multiple_newlines_varying_indent(self) -> None: + """Multiple embedded newlines with different indentation levels.""" + original = "a\n b\n c\n d" + assert _roundtrip_pattern_value(original) == original + + def test_no_whitespace_after_newline(self) -> None: + """Embedded newline without whitespace does not trigger separate-line.""" + original = "hello\nworld" + assert _roundtrip_pattern_value(original) == original + + def test_trailing_newline_no_whitespace(self) -> None: + """Trailing newline at end of text element.""" + original = "hello\n" + result = _roundtrip_pattern_value(original) + # Trailing newline may be normalized during parse + assert result.rstrip("\n") == "hello" + + def test_tab_after_newline(self) -> None: + """Tab character after newline (not space, no separate-line needed). + + Only space characters trigger separate-line serialization per the + FTL spec's whitespace handling (tab is not continuation indent). 
+ """ + original = "foo\n\tbar" + assert _roundtrip_pattern_value(original) == original + + +def _extract_element_values(resource: Resource) -> list[str]: + """Extract text element values from the first entry's pattern.""" + entry = resource.entries[0] + assert hasattr(entry, "value") + assert entry.value is not None + return [el.value for el in entry.value.elements] # type: ignore[union-attr] + + +class TestParserProducedRoundtrip: + """Verify existing parser-produced roundtrip behavior is preserved.""" + + def test_separate_line_with_extra_indent(self) -> None: + """Parser-produced AST from FTL with extra indentation.""" + ftl = "msg =\n foo\n bar\n" + resource = _parser.parse(ftl) + serialized = _serializer.serialize(resource) + resource2 = _parser.parse(serialized) + assert _extract_element_values(resource) == _extract_element_values(resource2) + + def test_inline_start_multiline(self) -> None: + """Inline pattern start with continuation line.""" + ftl = "msg = foo\n bar\n" + resource = _parser.parse(ftl) + serialized = _serializer.serialize(resource) + resource2 = _parser.parse(serialized) + assert _extract_element_values(resource) == _extract_element_values(resource2) + + +class TestSerializerStability: + """Serialize-parse-serialize stability (idempotence after first roundtrip).""" + + @given( + indent=st.integers(min_value=1, max_value=12), + line_count=st.integers(min_value=2, max_value=5), + ) + @settings(max_examples=100) + @example(indent=1, line_count=2) + @example(indent=4, line_count=2) + @example(indent=5, line_count=3) + def test_embedded_indent_stability(self, indent: int, line_count: int) -> None: + """After first roundtrip, subsequent roundtrips are stable. + + Constructs patterns with N lines, each indented by `indent` spaces. + After initial serialize-parse, the result must be stable on + subsequent serialize-parse cycles. 
+ """ + event(f"indent={indent}") + event(f"line_count={line_count}") + lines = [f"{' ' * indent}line{i}" if i > 0 else "first" for i in range(line_count)] + original = "\n".join(lines) + + # First roundtrip + first_rt = _roundtrip_pattern_value(original) + + # Second roundtrip from the first result + msg2 = Message( + id=Identifier(name="msg", span=None), + value=Pattern(elements=(TextElement(value=first_rt),)), + attributes=(), + comment=None, + span=None, + ) + resource2 = Resource(entries=(msg2,)) + serialized2 = _serializer.serialize(resource2) + parsed2 = _parser.parse(serialized2) + entry2 = parsed2.entries[0] + assert hasattr(entry2, "value") + assert entry2.value is not None + second_rt = "".join( + el.value for el in entry2.value.elements # type: ignore[union-attr] + ) + + # Stability: second roundtrip equals first roundtrip + assert first_rt == second_rt, ( + f"Roundtrip not stable: first={first_rt!r}, second={second_rt!r}" + ) diff --git a/tests/syntax_serializer_roundtrip_cases/simple_roundtrip_tests_example_based.py b/tests/syntax_serializer_roundtrip_cases/simple_roundtrip_tests_example_based.py new file mode 100644 index 00000000..c01743c9 --- /dev/null +++ b/tests/syntax_serializer_roundtrip_cases/simple_roundtrip_tests_example_based.py @@ -0,0 +1,371 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_serializer_roundtrip.py.""" + +from tests.syntax_serializer_roundtrip_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# SIMPLE ROUNDTRIP TESTS (Example-Based) +# ============================================================================ + + +def test_roundtrip_simple_message(): + """Round-trip a simple message with text only.""" + # Create AST + msg = Message( + id=Identifier(name="hello"), + value=Pattern(elements=(TextElement(value="Hello, World!"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + # Serialize and parse 
back + serialized = serialize(resource) + reparsed = parse(serialized) + + # Should be structurally identical + assert len(reparsed.entries) == 1 + assert isinstance(reparsed.entries[0], Message) + assert reparsed.entries[0].id.name == "hello" + + +def test_roundtrip_message_with_variable(): + """Round-trip a message with variable interpolation.""" + msg = Message( + id=Identifier(name="greeting"), + value=Pattern( + elements=( + TextElement(value="Hello, "), + Placeable(expression=VariableReference(id=Identifier(name="name"))), + TextElement(value="!"), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + assert len(reparsed.entries) == 1 + assert isinstance(reparsed.entries[0], Message) + assert reparsed.entries[0].id.name == "greeting" + # Verify pattern has 3 elements + pattern = reparsed.entries[0].value + assert pattern is not None + assert len(pattern.elements) == 3 + + +def test_roundtrip_select_expression(): + """Round-trip a message with select expression (plurals).""" + msg = Message( + id=Identifier(name="emails"), + value=Pattern( + elements=( + Placeable( + expression=SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=( + Variant( + key=Identifier(name="one"), + value=Pattern(elements=(TextElement(value="one email"),)), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern( + elements=( + Placeable( + expression=VariableReference( + id=Identifier(name="count") + ) + ), + TextElement(value=" emails"), + ) + ), + default=True, + ), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + assert len(reparsed.entries) == 1 + assert isinstance(reparsed.entries[0], Message) + + +def test_roundtrip_numeric_variant(): + """Round-trip select expression with numeric variant keys.""" + msg = Message( + 
id=Identifier(name="items"), + value=Pattern( + elements=( + Placeable( + expression=SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=( + Variant( + key=NumberLiteral(value=0, raw="0"), + value=Pattern(elements=(TextElement(value="no items"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="one item"),)), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="many items"),)), + default=True, + ), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + assert len(reparsed.entries) == 1 + msg_parsed = reparsed.entries[0] + assert isinstance(msg_parsed, Message) + assert msg_parsed.id.name == "items" + + +def test_roundtrip_comment(): + """Round-trip standalone comment. + + The parser preserves standalone comments as Comment entries, as the + Fluent spec requires. This test checks that the serialized form is exact + and that the comment content survives the roundtrip.
+ """ + comment = Comment(content=" This is a comment", type=CommentType.COMMENT) + resource = Resource(entries=(comment,)) + + serialized = serialize(resource) + # Serializer correctly outputs: "# This is a comment\n" + assert serialized == "# This is a comment\n" + + # Per Fluent spec: Comments are preserved in AST + reparsed = parse(serialized) + + # Spec-conformant behavior: Comments are preserved + assert len(reparsed.entries) == 1 + assert isinstance(reparsed.entries[0], Comment) + assert reparsed.entries[0].content == comment.content + + +def test_roundtrip_junk(): + """Round-trip junk (invalid syntax preserved).""" + junk = Junk(content="invalid syntax here {") + resource = Resource(entries=(junk,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + # Junk gets reparsed as junk + assert len(reparsed.entries) >= 1 + # At least one entry should be junk + assert any(isinstance(e, Junk) for e in reparsed.entries) + + +def test_roundtrip_junk_with_leading_whitespace(): + """Round-trip junk with leading whitespace without redundant newlines. + + Tests that the serializer does not add redundant separators before Junk + entries when the Junk content already includes leading whitespace. + The parser includes preceding whitespace in Junk.content for containment. 
+ """ + # Parse FTL with message followed by blank lines and indented junk + source = "msg = hello\n\n bad" + resource = parse(source) + + # Serialize and re-parse + serialized = serialize(resource) + reparsed = parse(serialized) + + # Verify file doesn't grow on multiple roundtrips (key invariant) + serialized2 = serialize(reparsed) + assert len(serialized2) == len(serialized), ( + "File size should remain stable across roundtrips (no whitespace inflation)" + ) + + # Verify multiple roundtrips converge to stable output + serialized3 = serialize(parse(serialized2)) + assert serialized3 == serialized2, ( + "Serialization should be idempotent after first roundtrip" + ) + + +def test_roundtrip_multiple_messages(): + """Round-trip resource with multiple messages.""" + msg1 = Message( + id=Identifier(name="hello"), + value=Pattern(elements=(TextElement(value="Hello!"),)), + attributes=(), + ) + msg2 = Message( + id=Identifier(name="goodbye"), + value=Pattern(elements=(TextElement(value="Goodbye!"),)), + attributes=(), + ) + resource = Resource(entries=(msg1, msg2)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + # Should have at least 2 messages + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) >= 2 + + +def test_roundtrip_mixed_entries(): + """Round-trip resource with messages and standalone comments. + + When Comments appear as separate entries in the AST (not as message.comment), + they are standalone comments and should remain standalone after roundtrip. + The serializer preserves this by adding 2 blank lines between a standalone + comment and the following message/term. 
+ """ + entries = ( + Comment(content=" Header comment", type=CommentType.COMMENT), + Message( + id=Identifier(name="app-name"), + value=Pattern(elements=(TextElement(value="MyApp"),)), + attributes=(), + ), + Comment(content=" Another comment", type=CommentType.COMMENT), + Message( + id=Identifier(name="version"), + value=Pattern(elements=(TextElement(value="1.0.0"),)), + attributes=(), + ), + ) + resource = Resource(entries=entries) + + serialized = serialize(resource) + reparsed = parse(serialized) + + # Standalone comments remain standalone after roundtrip + standalone_comments = [e for e in reparsed.entries if isinstance(e, Comment)] + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(standalone_comments) == 2 # Comments remain standalone + assert len(messages) == 2 # Messages survive roundtrip + + # Messages should NOT have attached comments (comments are standalone) + assert messages[0].comment is None + assert messages[1].comment is None + + # Comment content is preserved + assert "Header comment" in standalone_comments[0].content + assert "Another comment" in standalone_comments[1].content + + +def test_roundtrip_attached_comments(): + """Round-trip resource with attached comments. + + When Comments are set as message.comment (not as separate entries), + they are attached comments and should remain attached after roundtrip. 
+ """ + entries = ( + Message( + id=Identifier(name="app-name"), + value=Pattern(elements=(TextElement(value="MyApp"),)), + attributes=(), + comment=Comment(content=" Attached to app-name", type=CommentType.COMMENT), + ), + Message( + id=Identifier(name="version"), + value=Pattern(elements=(TextElement(value="1.0.0"),)), + attributes=(), + comment=Comment(content=" Attached to version", type=CommentType.COMMENT), + ), + ) + resource = Resource(entries=entries) + + serialized = serialize(resource) + reparsed = parse(serialized) + + # No standalone comments - all attached + standalone_comments = [e for e in reparsed.entries if isinstance(e, Comment)] + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(standalone_comments) == 0 # No standalone comments + assert len(messages) == 2 # Messages survive roundtrip + + # Comments remain attached to their messages + assert messages[0].comment is not None + assert "Attached to app-name" in messages[0].comment.content + assert messages[1].comment is not None + assert "Attached to version" in messages[1].comment.content + + +def test_roundtrip_empty_resource(): + """Round-trip empty resource.""" + resource = Resource(entries=()) + + serialized = serialize(resource) + reparsed = parse(serialized) + + assert len(reparsed.entries) == 0 + + +def test_roundtrip_message_with_only_placeable(): + """Round-trip message with only a placeable (no text).""" + msg = Message( + id=Identifier(name="count"), + value=Pattern( + elements=(Placeable(expression=VariableReference(id=Identifier(name="num"))),) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + assert len(reparsed.entries) == 1 + assert isinstance(reparsed.entries[0], Message) + + +def test_roundtrip_complex_pattern(): + """Round-trip message with complex pattern (text + variables). + + NOTE: Parser creates spurious Junk entry for trailing period. 
+ This is a parser quirk - the message itself parses correctly. + """ + msg = Message( + id=Identifier(name="user-info"), + value=Pattern( + elements=( + TextElement(value="User "), + Placeable(expression=VariableReference(id=Identifier(name="name"))), + TextElement(value=" has "), + Placeable(expression=VariableReference(id=Identifier(name="count"))), + TextElement(value=" items"), # Removed trailing period to avoid parser quirk + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + # Message parses correctly (ignore spurious Junk entries) + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) == 1 + msg_parsed = messages[0] + assert isinstance(msg_parsed, Message) + assert msg_parsed.value is not None + assert len(msg_parsed.value.elements) == 5 diff --git a/tests/syntax_serializer_roundtrip_cases/whitespace_preservation_roundtrip_tests.py b/tests/syntax_serializer_roundtrip_cases/whitespace_preservation_roundtrip_tests.py new file mode 100644 index 00000000..164364ea --- /dev/null +++ b/tests/syntax_serializer_roundtrip_cases/whitespace_preservation_roundtrip_tests.py @@ -0,0 +1,289 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_serializer_roundtrip.py.""" + +from tests.syntax_serializer_roundtrip_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# WHITESPACE PRESERVATION ROUNDTRIP TESTS +# ============================================================================ + + +def test_roundtrip_multiline_leading_whitespace(): + """Round-trip preserves leading whitespace after newlines. + + Tests fix for IMPL-SERIALIZER-ROUNDTRIP-CORRUPTION-001: when TextElement + with leading whitespace follows element ending with newline, serializer + must emit pattern on separate line to preserve the whitespace semantically. 
+ """ + # Pattern: "Line 1\n Line 2" (2 leading spaces on line 2) + msg = Message( + id=Identifier(name="code-block"), + value=Pattern( + elements=( + TextElement(value="Line 1\n"), + TextElement(value=" Line 2"), # 2-space indent + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + # Extract reparsed pattern content + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) == 1 + pattern = messages[0].value + assert pattern is not None + + # Reconstruct the pattern content from elements + content = "".join( + elem.value for elem in pattern.elements if isinstance(elem, TextElement) + ) + assert "Line 1\n" in content + assert " Line 2" in content # 2 spaces preserved + + +def test_roundtrip_code_example_indent(): + """Round-trip preserves code example indentation. + + Tests common use case of embedding code examples in localization strings. + """ + # Multi-line code example with indentation + msg = Message( + id=Identifier(name="code-example"), + value=Pattern( + elements=( + TextElement(value="Example:\n"), + TextElement(value=" def hello():\n"), + TextElement(value=" print('Hi')"), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) == 1 + pattern = messages[0].value + assert pattern is not None + + content = "".join( + elem.value for elem in pattern.elements if isinstance(elem, TextElement) + ) + # Verify indentation preserved + assert " def hello():" in content + assert " print('Hi')" in content + + +def test_roundtrip_whitespace_idempotent(): + """Multiple roundtrips produce identical output (idempotency). + + Tests that whitespace handling doesn't cause drift across roundtrips. 
+ """ + msg = Message( + id=Identifier(name="formatted"), + value=Pattern( + elements=( + TextElement(value="Header:\n"), + TextElement(value=" Item 1\n"), + TextElement(value=" Item 2"), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + # First roundtrip + serialized1 = serialize(resource) + reparsed1 = parse(serialized1) + + # Second roundtrip + serialized2 = serialize(reparsed1) + reparsed2 = parse(serialized2) + + # Third roundtrip + serialized3 = serialize(reparsed2) + + # Output should stabilize after first roundtrip + assert serialized2 == serialized3, "Serialization should be idempotent" + + +def test_roundtrip_mixed_whitespace_and_placeables(): + """Round-trip preserves whitespace with interleaved placeables.""" + msg = Message( + id=Identifier(name="mixed"), + value=Pattern( + elements=( + TextElement(value="Results for "), + Placeable(expression=VariableReference(id=Identifier(name="query"))), + TextElement(value=":\n"), + TextElement(value=" - First result\n"), + TextElement(value=" - "), + Placeable(expression=VariableReference(id=Identifier(name="count"))), + TextElement(value=" more"), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) == 1 + pattern = messages[0].value + assert pattern is not None + + # Verify structure preserved - should have TextElements with whitespace + text_elements = [e for e in pattern.elements if isinstance(e, TextElement)] + text_content = "".join(e.value for e in text_elements) + + # Check whitespace preservation + assert ":\n" in text_content + assert " - First result\n" in text_content or " -" in text_content + + +def test_roundtrip_tab_indentation(): + """Round-trip preserves tab indentation.""" + msg = Message( + id=Identifier(name="tabbed"), + value=Pattern( + elements=( + TextElement(value="Data:\n"), + 
TextElement(value="\tColumn 1\n"), + TextElement(value="\t\tNested"), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + serialized = serialize(resource) + reparsed = parse(serialized) + + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) == 1 + pattern = messages[0].value + assert pattern is not None + + content = "".join( + elem.value for elem in pattern.elements if isinstance(elem, TextElement) + ) + assert "\tColumn 1" in content + assert "\t\tNested" in content + + +def test_roundtrip_preserves_parsed_whitespace(): + """Parse and serialize preserves original whitespace from FTL source. + + Tests the full cycle: FTL source -> parse -> serialize -> parse -> serialize + """ + # FTL with intentional indentation + source = """\ +code-snippet = + Example code: + if True: + print("hello") +""" + parsed = parse(source) + serialized = serialize(parsed) + reparsed = parse(serialized) + serialized2 = serialize(reparsed) + + # Should stabilize + assert serialized == serialized2, "Roundtrip should be stable" + + # Verify semantic content preserved + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) == 1 + pattern = messages[0].value + assert pattern is not None + + content = "".join( + elem.value for elem in pattern.elements if isinstance(elem, TextElement) + ) + # Original indentation relationships should be preserved + assert "Example code:" in content + assert "print(" in content + + +def test_roundtrip_compact_messages_no_blank_lines(): + """Roundtrip of compact messages preserves no-blank-line format. + + Tests the fix for NAME-SERIALIZER-SPACING-001 where serializer was adding + redundant newlines between Message/Term entries. 
+ """ + # Compact FTL with no blank lines between messages + source = "msg1 = First\nmsg2 = Second\nmsg3 = Third" + + parsed = parse(source) + serialized = serialize(parsed) + + # Serialized output should maintain compact format (no blank lines) + assert serialized == "msg1 = First\nmsg2 = Second\nmsg3 = Third\n" + + # Verify roundtrip preserves structure + reparsed = parse(serialized) + assert len(reparsed.entries) == 3 + messages = [e for e in reparsed.entries if isinstance(e, Message)] + assert len(messages) == 3 + assert messages[0].id.name == "msg1" + assert messages[1].id.name == "msg2" + assert messages[2].id.name == "msg3" + + +def test_comment_message_separation_preserved(): + """Comment->Message still gets blank line to prevent attachment. + + Tests that the fix for NAME-SERIALIZER-SPACING-001 keeps the Comment separation + logic intact (blank lines prevent comment attachment on re-parse). + """ + # Standalone comment followed by message (with blank line) + source = "# Standalone comment\n\nmsg = Value" + + parsed = parse(source) + serialized = serialize(parsed) + + # Should preserve blank line between comment and message + # The blank line prevents the comment from being attached to the message + assert "\n\n" in serialized + + # Verify roundtrip: comment should remain standalone + reparsed = parse(serialized) + comments = [e for e in reparsed.entries if isinstance(e, Comment)] + messages = [e for e in reparsed.entries if isinstance(e, Message)] + + assert len(comments) == 1 + assert len(messages) == 1 + # Message should NOT have an attached comment + assert messages[0].comment is None + + +def test_roundtrip_mixed_spacing_preserved(): + """Mixed spacing patterns are preserved during roundtrip.""" + # Mix of compact messages and separated entries + source = "msg1 = First\nmsg2 = Second\n\n# Comment\n\nmsg3 = Third" + + parsed = parse(source) + serialized = serialize(parsed) + reparsed = parse(serialized) + + # Should have 3 messages and 1 comment +
messages = [e for e in reparsed.entries if isinstance(e, Message)] + comments = [e for e in reparsed.entries if isinstance(e, Comment)] + + assert len(messages) == 3 + assert len(comments) == 1 + + # First two messages should be compact (consecutive) + # Comment should be standalone (not attached) + # Third message should be after comment + assert messages[0].id.name == "msg1" + assert messages[1].id.name == "msg2" + assert messages[2].id.name == "msg3" diff --git a/tests/syntax_validator_cases/__init__.py b/tests/syntax_validator_cases/__init__.py new file mode 100644 index 00000000..0d5ff8bb --- /dev/null +++ b/tests/syntax_validator_cases/__init__.py @@ -0,0 +1,52 @@ +"""Tests for syntax.validator: SemanticValidator, validate(), semantic correctness per spec.""" + +from __future__ import annotations + +from decimal import Decimal + +import pytest + +from ftllexengine import FluentBundle +from ftllexengine.core.depth_guard import DepthGuard +from ftllexengine.diagnostics import ValidationResult +from ftllexengine.diagnostics.codes import DiagnosticCode +from ftllexengine.enums import CommentType +from ftllexengine.introspection import FunctionCallInfo, introspect_message +from ftllexengine.syntax.ast import ( + Annotation, + Attribute, + CallArguments, + Comment, + FunctionReference, + Identifier, + Junk, + Message, + NamedArgument, + NumberLiteral, + Pattern, + Placeable, + Resource, + SelectExpression, + Span, + Term, + TermReference, + TextElement, + VariableReference, + Variant, +) +from ftllexengine.syntax.parser import FluentParserV1 +from ftllexengine.syntax.validator import ( + _VALIDATION_MESSAGES, + SemanticValidator, + validate, +) + +__all__ = [ + "_VALIDATION_MESSAGES", "Annotation", "Attribute", "CallArguments", "Comment", + "CommentType", "Decimal", "DepthGuard", "DiagnosticCode", "FluentBundle", + "FluentParserV1", "FunctionCallInfo", "FunctionReference", "Identifier", "Junk", + "Message", "NamedArgument", "NumberLiteral", "Pattern", "Placeable", 
"Resource", + "SelectExpression", "SemanticValidator", "Span", "Term", "TermReference", + "TextElement", "ValidationResult", "VariableReference", "Variant", + "introspect_message", "pytest", "validate", +] diff --git a/tests/syntax_validator_cases/entries.py b/tests/syntax_validator_cases/entries.py new file mode 100644 index 00000000..37d7a6bc --- /dev/null +++ b/tests/syntax_validator_cases/entries.py @@ -0,0 +1,617 @@ +# mypy: ignore-errors +from tests.syntax_validator_cases import ( + Annotation, + Attribute, + CallArguments, + Comment, + CommentType, + DepthGuard, + FluentParserV1, + FunctionReference, + Identifier, + Junk, + Message, + NamedArgument, + NumberLiteral, + Pattern, + Placeable, + Resource, + SemanticValidator, + Span, + Term, + TermReference, + TextElement, + ValidationResult, + VariableReference, + pytest, + validate, +) + + +class TestMessageValidation: + """Test message entry validation.""" + + def test_message_with_value_and_attributes(self) -> None: + """Message with value and attributes validates correctly.""" + parser = FluentParserV1() + resource = parser.parse(""" +msg = Hello World + .attr1 = Attribute 1 + .attr2 = Attribute 2 +""") + result = validate(resource) + assert result.is_valid + + def test_message_with_only_attributes_no_value(self) -> None: + """Message with no value, only attributes (valid per Fluent spec). + + Tests line 171->175 branch when message.value is None. + """ + parser = FluentParserV1() + resource = parser.parse(""" +msg = + .attr1 = Attribute value + .attr2 = Another attribute +""") + result = validate(resource) + assert result.is_valid + assert len(result.annotations) == 0 + + def test_message_with_plain_text_only(self) -> None: + """Message with plain text value validates.""" + parser = FluentParserV1() + resource = parser.parse("msg = Plain text value") + result = validate(resource) + assert result.is_valid + + def test_message_with_placeables(self) -> None: + """Message with variable references validates. 
+ + Tests line 171-172 (message.value exists branch). + """ + parser = FluentParserV1() + resource = parser.parse("msg = Hello { $name }, you have { $count } messages") + + validator = SemanticValidator() + result = validator.validate(resource) + + assert result.is_valid + + def test_message_with_value_explicit_validation_path(self) -> None: + """Message with value takes the validation path. + + Explicitly tests line 171->172 branch (if message.value: path). + """ + # Create message with explicit value pattern + message = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="Has value"),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + validator = SemanticValidator() + result = validator.validate(resource) + + assert result.is_valid + + def test_message_without_value_explicit_validation_path(self) -> None: + """Message without value skips value validation. + + Explicitly tests line 171->175 branch (when message.value is None). + """ + # Create message with no value (only attributes) + message = Message( + id=Identifier(name="test"), + value=None, + attributes=( + Attribute( + id=Identifier(name="attr"), + value=Pattern(elements=(TextElement(value="Attribute value"),)), + ), + ), + ) + resource = Resource(entries=(message,)) + + validator = SemanticValidator() + result = validator.validate(resource) + + assert result.is_valid + + +class TestTermValidation: + """Test term entry validation.""" + + def test_term_with_value_validates(self) -> None: + """Term with value is valid per Fluent spec.""" + parser = FluentParserV1() + resource = parser.parse("-brand = Firefox") + result = validate(resource) + assert result.is_valid + + def test_term_with_value_and_attributes(self) -> None: + """Term with value and attributes validates. + + Tests line 202 - term attribute validation. 
+ """ + parser = FluentParserV1() + resource = parser.parse(""" +-brand = Firefox + .short = FX + .long = Mozilla Firefox +""") + result = validate(resource) + assert result.is_valid + + def test_term_without_value_constructor_validation(self) -> None: + """Term without value raises ValueError at construction. + + The AST enforces that terms must have values. + Tests the invariant that validator assumes terms always have values. + """ + with pytest.raises(ValueError, match="Term must have a value pattern"): + Term( + id=Identifier(name="test"), + value=None, # type: ignore[arg-type] # Invalid per spec + attributes=(), + span=Span(start=0, end=10), + ) + + def test_term_without_value_validator_defensive_check(self) -> None: + """Validator defensively checks for term without value. + + Tests lines 188-195 (defensive validation even though AST prevents it). + This tests the validator's defensive programming - if AST validation + is ever bypassed, validator should still catch the error. + """ + # Create a Term object bypassing __post_init__ validation + # This is defensive testing - ensures validator catches errors + # even if AST validation fails + term = object.__new__(Term) + object.__setattr__(term, "id", Identifier(name="broken")) + object.__setattr__(term, "value", None) # Invalid per spec + object.__setattr__(term, "attributes", ()) + object.__setattr__(term, "span", Span(start=0, end=10)) + + resource = Resource(entries=(term,)) + validator = SemanticValidator() + result = validator.validate(resource) + + # Validator should catch the missing value + assert not result.is_valid + errors = [a for a in result.annotations if "TERM_NO_VALUE" in a.code] + assert len(errors) > 0 + + +class TestCommentAndJunkValidation: + """Test Comment and Junk entry handling.""" + + def test_comment_entries_pass_validation(self) -> None: + """Comments require no validation and pass through. + + Tests line 156-157 (Comment case in _validate_entry). 
+ """ + comment = Comment(content="# Test comment", type=CommentType.COMMENT) + resource = Resource(entries=(comment,)) + result = validate(resource) + assert result.is_valid + assert len(result.annotations) == 0 + + def test_junk_entries_pass_validation(self) -> None: + """Junk already represents parse errors, no further validation needed. + + Tests line 158-159 and 158->exit (Junk case in _validate_entry). + """ + junk = Junk(content="invalid syntax", annotations=()) + resource = Resource(entries=(junk,)) + + validator = SemanticValidator() + result = validator.validate(resource) + + # Validator doesn't add errors for junk (already invalid at parse level) + assert result.is_valid + assert len(result.annotations) == 0 + + def test_resource_with_junk_from_parser(self) -> None: + """Parser-generated junk entries are handled correctly.""" + parser = FluentParserV1() + # Invalid FTL syntax produces Junk entries + resource = parser.parse("msg = { invalid syntax here }") + result = validate(resource) + # Validator doesn't crash on junk + assert isinstance(result, ValidationResult) + + def test_multiple_junk_entries_in_resource(self) -> None: + """Multiple junk entries all pass through validator. + + Ensures Junk case exit path is exercised. + """ + junk1 = Junk(content="bad syntax 1", annotations=()) + junk2 = Junk(content="bad syntax 2", annotations=()) + junk3 = Junk(content="bad syntax 3", annotations=()) + + resource = Resource(entries=(junk1, junk2, junk3)) + validator = SemanticValidator() + result = validator.validate(resource) + + # All junk entries pass through without adding validation errors + assert result.is_valid + + def test_junk_entry_isolated_validation(self) -> None: + """Single junk entry validates in isolation. + + Explicitly tests line 158-159 Junk case and exit path. + This test isolates the Junk validation path to ensure + branch coverage tools detect the 158->exit path. 
+ """ + # Create a Junk entry + junk = Junk(content="isolated junk", annotations=()) + + # Validate with fresh validator instance + validator = SemanticValidator() + errors: list[Annotation] = [] + depth_guard = DepthGuard(max_depth=100) + + # Call _validate_entry directly to ensure this specific path is measured + validator._validate_entry(junk, errors, depth_guard) + + # Junk should not add any validation errors + assert len(errors) == 0 + + +class TestEmptyResourceValidation: + """Test empty resource boundary condition.""" + + def test_empty_resource_is_valid(self) -> None: + """Empty resource (no entries) is valid.""" + resource = Resource(entries=()) + result = validate(resource) + assert result.is_valid + assert len(result.annotations) == 0 + + +# ============================================================================ +# PATTERN ELEMENT VALIDATION TESTS +# ============================================================================ + + +class TestTextElementValidation: + """Test TextElement validation.""" + + def test_text_elements_require_no_validation(self) -> None: + """Plain text elements need no validation. + + Tests line 245-246 and 247->exit (TextElement case in _validate_pattern_element). + """ + parser = FluentParserV1() + resource = parser.parse("msg = Plain text without any placeables") + + validator = SemanticValidator() + result = validator.validate(resource) + + assert result.is_valid + + def test_text_with_special_characters(self) -> None: + """Text elements with special characters validate.""" + parser = FluentParserV1() + resource = parser.parse(r"msg = Text with special: !@#$%^&*()_+-=[]|;',./<>?") + result = validate(resource) + assert isinstance(result, ValidationResult) + + def test_text_element_explicit_validation_path(self) -> None: + """Text element explicitly exercises validation path. + + Ensures TextElement case and exit path (line 247->exit) are covered. 
+ """ + # Create message with explicit TextElement + text_elem = TextElement(value="Explicit text element") + message = Message( + id=Identifier(name="test"), + value=Pattern(elements=(text_elem,)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + validator = SemanticValidator() + result = validator.validate(resource) + + # TextElement requires no validation, should be valid + assert result.is_valid + + def test_multiple_text_elements_in_pattern(self) -> None: + """Pattern with multiple TextElements validates. + + Multiple invocations of TextElement path. + """ + text1 = TextElement(value="First ") + text2 = TextElement(value="Second ") + text3 = TextElement(value="Third") + + message = Message( + id=Identifier(name="test"), + value=Pattern(elements=(text1, text2, text3)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + validator = SemanticValidator() + result = validator.validate(resource) + + assert result.is_valid + + def test_text_element_isolated_validation(self) -> None: + """Single TextElement validates in isolation. + + Explicitly tests line 245-246 TextElement case and exit path. + This test isolates the TextElement validation path to ensure + branch coverage tools detect the 247->exit path. + """ + # Create TextElement + text_elem = TextElement(value="isolated text") + + # Validate with fresh validator instance + validator = SemanticValidator() + errors: list[Annotation] = [] + depth_guard = DepthGuard(max_depth=100) + + # Call _validate_pattern_element directly to ensure this specific path is measured + validator._validate_pattern_element(text_elem, errors, "test", depth_guard) + + # TextElement should not add any validation errors + assert len(errors) == 0 + + def test_junk_entry_isolated_direct_call(self) -> None: + """Junk entry validated through direct method call. + + Alternative approach to ensure 158->exit branch is covered. 
+ """ + junk = Junk(content="direct call junk", annotations=()) + + validator = SemanticValidator() + errors: list[Annotation] = [] + depth_guard = DepthGuard(max_depth=100) + + # Direct call to _validate_entry with Junk + validator._validate_entry(junk, errors, depth_guard) + + assert len(errors) == 0 + + +class TestPlaceableValidation: + """Test Placeable validation including nested cases.""" + + def test_placeable_with_variable_reference(self) -> None: + """Placeable containing variable reference validates.""" + parser = FluentParserV1() + resource = parser.parse("msg = Hello { $name }") + result = validate(resource) + assert result.is_valid + + def test_nested_placeables(self) -> None: + """Nested placeables validate recursively. + + Tests lines 293-294 (Placeable as inline expression). + """ + # Manually construct nested placeables + inner = Placeable(expression=VariableReference(id=Identifier(name="x"))) + outer = Placeable(expression=inner) + message = Message( + id=Identifier(name="msg"), + value=Pattern(elements=(outer,)), + attributes=(), + ) + resource = Resource(entries=(message,)) + result = validate(resource) + assert result.is_valid + + +# ============================================================================ +# INLINE EXPRESSION VALIDATION TESTS +# ============================================================================ + + +class TestStringAndNumberLiteralValidation: + """Test literal value validation.""" + + def test_string_literal_always_valid(self) -> None: + """String literals require no validation.""" + parser = FluentParserV1() + resource = parser.parse('msg = { "Hello" }') + result = validate(resource) + assert result.is_valid + + def test_number_literal_always_valid(self) -> None: + """Number literals require no validation.""" + parser = FluentParserV1() + resource = parser.parse("msg = { 42 }") + result = validate(resource) + assert result.is_valid + + +class TestVariableReferenceValidation: + """Test variable reference 
validation.""" + + def test_variable_reference_always_valid(self) -> None: + """Variable references require no semantic validation.""" + parser = FluentParserV1() + resource = parser.parse("msg = { $var }") + result = validate(resource) + assert result.is_valid + + +class TestMessageReferenceValidation: + """Test message reference validation.""" + + def test_message_reference_validates(self) -> None: + """Message references are always valid semantically. + + Tests line 287 (MessageReference case in _validate_inline_expression). + Message references cannot have arguments (enforced by grammar). + """ + parser = FluentParserV1() + resource = parser.parse("msg = { other-msg }") + result = validate(resource) + assert result.is_valid + + def test_message_reference_with_attribute(self) -> None: + """Message reference with attribute access validates.""" + parser = FluentParserV1() + resource = parser.parse("msg = { other-msg.attr }") + result = validate(resource) + assert result.is_valid + + +class TestTermReferenceValidation: + """Test term reference validation.""" + + def test_term_reference_without_arguments(self) -> None: + """Term reference without arguments validates.""" + parser = FluentParserV1() + resource = parser.parse("msg = { -brand }") + result = validate(resource) + assert result.is_valid + + def test_term_reference_with_named_arguments(self) -> None: + """Term reference with named arguments validates.""" + parser = FluentParserV1() + resource = parser.parse('msg = { -brand(case: "nominative") }') + result = validate(resource) + assert result.is_valid + + def test_term_reference_with_positional_arguments_warns(self) -> None: + """Term reference with positional arguments emits warning. + + Tests lines 310-324 (_validate_term_reference with positional args). + Per Fluent spec, positional args to terms are ignored at runtime. 
+ """ + # Manually construct term reference with positional args + args = CallArguments( + positional=(NumberLiteral(value=1, raw="1"),), + named=(), + ) + term_ref = TermReference( + id=Identifier(name="brand"), + arguments=args, + attribute=None, + ) + message = Message( + id=Identifier(name="msg"), + value=Pattern(elements=(Placeable(expression=term_ref),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + result = validate(resource) + + # Should emit warning about positional args being ignored + assert not result.is_valid + warnings = [a for a in result.annotations if "positional arguments" in a.message.lower()] + assert len(warnings) > 0 + + def test_term_reference_with_attribute_and_arguments(self) -> None: + """Term reference with attribute access and arguments validates.""" + parser = FluentParserV1() + resource = parser.parse('msg = { -brand.short(case: "genitive") }') + result = validate(resource) + assert result.is_valid + + +class TestFunctionReferenceValidation: + """Test function reference validation.""" + + def test_function_reference_without_arguments(self) -> None: + """Function reference without arguments validates.""" + # Manually construct function call without arguments + func_ref = FunctionReference( + id=Identifier(name="BUILTIN"), + arguments=CallArguments(positional=(), named=()), + ) + message = Message( + id=Identifier(name="msg"), + value=Pattern(elements=(Placeable(expression=func_ref),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + result = validate(resource) + assert result.is_valid + + def test_function_reference_with_positional_arguments(self) -> None: + """Function reference with positional arguments validates. + + Tests lines 365-366 (positional arg validation in _validate_call_arguments). 
+ """ + parser = FluentParserV1() + resource = parser.parse("msg = { NUMBER($count) }") + result = validate(resource) + assert result.is_valid + + def test_function_reference_with_named_arguments(self) -> None: + """Function reference with named arguments validates.""" + parser = FluentParserV1() + resource = parser.parse("msg = { NUMBER($count, minimumFractionDigits: 2) }") + result = validate(resource) + assert result.is_valid + + +# ============================================================================ +# CALL ARGUMENTS VALIDATION TESTS +# ============================================================================ + + +class TestCallArgumentsValidation: + """Test call arguments validation.""" + + def test_duplicate_named_arguments_invalid(self) -> None: + """Function call with duplicate named arguments is invalid. + + Tests duplicate detection in _validate_call_arguments. + """ + # Manually construct function with duplicate named args + args = CallArguments( + positional=(), + named=( + NamedArgument( + name=Identifier(name="option"), + value=NumberLiteral(value=1, raw="1"), + ), + NamedArgument( + name=Identifier(name="option"), # Duplicate! 
+ value=NumberLiteral(value=2, raw="2"), + ), + ), + ) + func_ref = FunctionReference( + id=Identifier(name="NUMBER"), + arguments=args, + ) + message = Message( + id=Identifier(name="msg"), + value=Pattern(elements=(Placeable(expression=func_ref),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + result = validate(resource) + + # Should detect duplicate named argument + assert not result.is_valid + errors = [a for a in result.annotations if "DUPLICATE" in a.code] + assert len(errors) > 0 + + def test_mixed_positional_and_named_arguments(self) -> None: + """Function with both positional and named arguments validates.""" + parser = FluentParserV1() + resource = parser.parse("msg = { NUMBER($val, minimumFractionDigits: 2) }") + result = validate(resource) + assert result.is_valid + + def test_nested_expressions_in_arguments(self) -> None: + """Nested expressions in arguments validate recursively.""" + parser = FluentParserV1() + resource = parser.parse("msg = { NUMBER({ $count }) }") + result = validate(resource) + assert result.is_valid + + +# ============================================================================ +# SELECT EXPRESSION VALIDATION TESTS +# ============================================================================ + diff --git a/tests/syntax_validator_cases/high_level.py b/tests/syntax_validator_cases/high_level.py new file mode 100644 index 00000000..d1ef2167 --- /dev/null +++ b/tests/syntax_validator_cases/high_level.py @@ -0,0 +1,460 @@ +# mypy: ignore-errors +from tests.syntax_validator_cases import ( + _VALIDATION_MESSAGES, + DiagnosticCode, + FluentBundle, + FluentParserV1, + Junk, + SemanticValidator, + validate, +) + + +class TestMessageValidationHighLevel: + """Test message validation rules.""" + + def test_valid_simple_message(self): + """Simple message should be valid.""" + parser = FluentParserV1() + resource = parser.parse("hello = Hello, world!") + + result = validate(resource) + assert result.is_valid + assert 
len(result.annotations) == 0 + + def test_valid_message_with_variable(self): + """Message with variable should be valid.""" + parser = FluentParserV1() + resource = parser.parse("welcome = Welcome, { $name }!") + + result = validate(resource) + assert result.is_valid + + def test_valid_message_with_attribute(self): + """Message with attribute should be valid.""" + parser = FluentParserV1() + resource = parser.parse(""" +msg = Value + .tooltip = Tooltip text +""") + + result = validate(resource) + assert result.is_valid + + def test_valid_message_reference(self): + """Message referencing another message should be valid.""" + parser = FluentParserV1() + resource = parser.parse("msg = { other-msg }") + + result = validate(resource) + assert result.is_valid + + def test_valid_message_reference_with_attribute(self): + """Message.attr reference should be valid.""" + parser = FluentParserV1() + resource = parser.parse("msg = { other.attr }") + + result = validate(resource) + assert result.is_valid + + +class TestTermValidationHighLevel: + """Test term validation rules.""" + + def test_valid_simple_term(self): + """Simple term should be valid.""" + parser = FluentParserV1() + resource = parser.parse("-brand = Firefox") + + result = validate(resource) + assert result.is_valid + + def test_valid_term_with_attribute(self): + """Term with attribute should be valid.""" + parser = FluentParserV1() + resource = parser.parse(""" +-brand = Firefox + .gender = masculine +""") + + result = validate(resource) + assert result.is_valid + + def test_valid_term_reference_with_arguments(self): + """Term reference with call arguments should be valid.""" + parser = FluentParserV1() + # Note: This tests that if the parser creates a TermReference with arguments, + # the validator accepts it + resource = parser.parse("msg = { -term() }") + + result = validate(resource) + # Should be valid - terms can be parameterized + assert result.is_valid + + +class TestSelectExpressionValidationHighLevel: + 
"""Test select expression validation rules.""" + + def test_valid_select_with_default(self): + """Select with default variant should be valid.""" + parser = FluentParserV1() + resource = parser.parse(""" +msg = { $count -> + [one] One item + *[other] Many items +} +""") + + result = validate(resource) + assert result.is_valid + + def test_valid_select_multiple_variants(self): + """Select with multiple non-default variants should be valid.""" + parser = FluentParserV1() + resource = parser.parse(""" +msg = { $count -> + [zero] No items + [one] One item + [two] Two items + *[other] Many items +} +""") + + result = validate(resource) + assert result.is_valid + + def test_invalid_select_no_default(self): + """Parser rejects select without default variant (syntactic validation). + + Note: This is now a parser-level validation, not semantic validation. + The parser creates Junk for select expressions without default variants + per FTL spec requirements, so semantic validation never sees them. + + This test verifies the parser correctly enforces this rule. 
+ """ + parser = FluentParserV1() + # Try to parse select without default + resource = parser.parse(""" +msg = { $count -> + [one] One item + [two] Two items +} +""") + + # Parser should create Junk (syntactic error) + assert len(resource.entries) >= 1 + assert isinstance(resource.entries[0], Junk) + + # Verify error annotation exists + junk = resource.entries[0] + assert len(junk.annotations) > 0 + # Generic error message (detailed info removed) + + def test_invalid_duplicate_variant_keys(self): + """Select with duplicate variant keys should be invalid.""" + parser = FluentParserV1() + resource = parser.parse(""" +msg = { $count -> + [one] First one + [one] Second one (duplicate) + *[other] Many +} +""") + + result = validate(resource) + + # Should detect duplicate keys + if not result.is_valid: + assert any("VALIDATION_VARIANT_DUPLICATE" in ann.code for ann in result.annotations) + else: + # Parser might have deduped, which is also acceptable + pass + + def test_high_precision_numeric_variants_not_false_duplicate(self): + """High-precision numeric variant keys are treated as distinct. + + Regression test for SEM-VALIDATOR-PRECISION-001. + Validator should use NumberLiteral.raw (original string) for comparison, + not NumberLiteral.value (Decimal), to preserve precision. + This matches resolver behavior. + """ + parser = FluentParserV1() + resource = parser.parse(""" +msg = { $x -> + [0.10000000000000001] precise + [0.1] rounded + *[other] default +} +""") + + result = validate(resource) + + # These keys should NOT be treated as duplicates because they have + # different source representations even though their numeric values are + # close. The validator should accept this as valid FTL. 
+ assert result.is_valid + + +class TestFunctionValidationHighLevel: + """Test function reference validation rules.""" + + def test_valid_function_no_args(self): + """Function with no arguments should be valid.""" + parser = FluentParserV1() + resource = parser.parse("msg = { FUNC() }") + + result = validate(resource) + assert result.is_valid + + def test_valid_function_positional_args(self): + """Function with positional arguments should be valid.""" + parser = FluentParserV1() + resource = parser.parse("msg = { NUMBER($count) }") + + result = validate(resource) + assert result.is_valid + + def test_valid_function_named_args(self): + """Function with named arguments should be valid.""" + parser = FluentParserV1() + resource = parser.parse("msg = { NUMBER($count, minimumFractionDigits: 2) }") + + result = validate(resource) + assert result.is_valid + + def test_valid_function_mixed_args(self): + """Function with positional and named arguments should be valid.""" + parser = FluentParserV1() + resource = parser.parse('msg = { DATETIME($date, hour: "numeric", minute: "numeric") }') + + result = validate(resource) + assert result.is_valid + + def test_invalid_duplicate_named_args(self): + """Function with duplicate named arguments should be invalid.""" + parser = FluentParserV1() + resource = parser.parse("msg = { FUNC(x: 1, x: 2) }") + + result = validate(resource) + + # Should detect duplicate named arguments + if not result.is_valid: + assert any("E0010" in ann.code for ann in result.annotations) + + +class TestRealWorldScenarios: + """Test validation on real-world FTL patterns.""" + + def test_complex_message_with_select(self): + """Complex message with select should validate.""" + parser = FluentParserV1() + resource = parser.parse(""" +emails = { $unreadEmails -> + [one] You have one unread email + *[other] You have { $unreadEmails } unread emails +} +""") + + result = validate(resource) + assert result.is_valid + + def 
test_message_with_multiple_placeables(self): + """Message with multiple placeables should validate.""" + parser = FluentParserV1() + resource = parser.parse("msg = Hello { $firstName } { $lastName }!") + + result = validate(resource) + assert result.is_valid + + def test_nested_select_expressions(self): + """Nested select expressions should validate.""" + parser = FluentParserV1() + resource = parser.parse(""" +msg = { $gender -> + [male] { $count -> + [one] He has one item + *[other] He has { $count } items + } + *[female] { $count -> + [one] She has one item + *[other] She has { $count } items + } +} +""") + + result = validate(resource) + assert result.is_valid + + def test_term_reference_in_message(self): + """Term reference in message should validate.""" + parser = FluentParserV1() + resource = parser.parse(""" +-brand = Firefox +welcome = Welcome to { -brand }! +""") + + result = validate(resource) + assert result.is_valid + + def test_message_with_function_and_select(self): + """Message combining function call and select should validate.""" + parser = FluentParserV1() + resource = parser.parse(""" +msg = Updated { DATETIME($date, month: "long", year: "numeric") } - { $status -> + [active] Active + *[inactive] Inactive +} +""") + + result = validate(resource) + assert result.is_valid + + +class TestEdgeCases: + """Test edge cases in validation.""" + + def test_comment_only_resource(self): + """Resource with only comments should be valid.""" + parser = FluentParserV1() + resource = parser.parse(""" +# This is a comment +## This is a group comment +### This is a resource comment +""") + + result = validate(resource) + assert result.is_valid + + def test_message_with_only_attributes(self): + """Message with only attributes (no value) should be valid.""" + parser = FluentParserV1() + resource = parser.parse(""" +msg = + .attr1 = Value 1 + .attr2 = Value 2 +""") + + result = validate(resource) + # This should be valid per spec + assert result.is_valid + + def 
test_empty_pattern(self): + """Message with empty value should be valid.""" + parser = FluentParserV1() + resource = parser.parse("msg = ") + + result = validate(resource) + # Empty pattern is syntactically valid + assert result.is_valid + + def test_junk_entries_ignored(self): + """Junk entries should not be validated (already errors).""" + parser = FluentParserV1() + resource = parser.parse(""" +valid = Value +invalid { syntax +also-valid = Another value +""") + + result = validate(resource) + # Should validate the valid entries, ignore junk + assert result.is_valid + + +class TestValidatorState: + """Test validator state management.""" + + def test_validator_reusable(self): + """Validator should be reusable across multiple validations.""" + validator = SemanticValidator() + parser = FluentParserV1() + + resource1 = parser.parse("msg1 = Value 1") + result1 = validator.validate(resource1) + assert result1.is_valid + + resource2 = parser.parse("msg2 = Value 2") + result2 = validator.validate(resource2) + assert result2.is_valid + + # Errors shouldn't accumulate + assert len(result1.annotations) == 0 + assert len(result2.annotations) == 0 + + def test_validate_function_is_stateless(self): + """Module-level validate() function should be stateless.""" + parser = FluentParserV1() + + result1 = validate(parser.parse("msg1 = Value 1")) + result2 = validate(parser.parse("msg2 = Value 2")) + + assert result1.is_valid + assert result2.is_valid + + +class TestValidationErrorCodes: + """Test that error codes are descriptive and consistent.""" + + def test_diagnostic_codes_are_unique(self): + """All validation DiagnosticCode values should be unique.""" + # Get all validation-related codes (5000-5199 range) + validation_codes = [ + code for code in DiagnosticCode + if code.value >= 5000 and code.value < 5200 + ] + values = [code.value for code in validation_codes] + assert len(values) == len(set(values)), "DiagnosticCode values must be unique" + + def 
test_validation_messages_exist(self): + """All validation codes should have messages in _VALIDATION_MESSAGES.""" + for code, message in _VALIDATION_MESSAGES.items(): + assert isinstance(code, DiagnosticCode), f"{code} should be DiagnosticCode" + assert len(message) > 5, f"Message for {code.name} should be descriptive" + assert message[0].isupper(), f"Message for {code.name} should start with uppercase" + + +class TestAttributeGranularCycleDetection: + """Attribute-granular cycle detection prevents false positives. + + A message referencing its own attribute (msg = { msg.tooltip }) is NOT a cycle. + Only true self-references (msg = { msg }) or cross-message cycles are cyclic. + This distinction prevents spurious warnings for common FTL patterns. + """ + + def test_cross_attribute_reference_not_cyclic(self) -> None: + """Message value referencing its own attribute is not a circular reference.""" + bundle = FluentBundle("en") + ftl = "msg = { msg.tooltip }\n .tooltip = Tooltip text\n" + result = bundle.validate_resource(ftl) + circular_warnings = [w for w in result.warnings if "ircular" in w.message] + assert len(circular_warnings) == 0 + + def test_true_self_reference_detected(self) -> None: + """Message value referencing itself is a circular reference.""" + bundle = FluentBundle("en") + ftl = "msg = { msg }\n" + result = bundle.validate_resource(ftl) + circular_warnings = [w for w in result.warnings if "ircular" in w.message] + assert len(circular_warnings) > 0 + + def test_term_attribute_self_reference_detected(self) -> None: + """Term attribute referencing itself is a circular reference.""" + bundle = FluentBundle("en") + ftl = "-term = Value\n .attr = { -term.attr }\n" + result = bundle.validate_resource(ftl) + circular_warnings = [w for w in result.warnings if "ircular" in w.message] + assert len(circular_warnings) > 0 + + def test_cross_term_cycle_detected(self) -> None: + """Cross-term mutual references produce a circular reference warning.""" + bundle = 
FluentBundle("en") + ftl = "-a = { -b }\n-b = { -a }\n" + result = bundle.validate_resource(ftl) + circular_warnings = [w for w in result.warnings if "ircular" in w.message] + assert len(circular_warnings) > 0 + + +# ============================================================================ +# VALIDATION EDGE CASES (from test_semantic_validation_edge_cases.py) +# ============================================================================ + diff --git a/tests/syntax_validator_cases/regressions.py b/tests/syntax_validator_cases/regressions.py new file mode 100644 index 00000000..b869e3fe --- /dev/null +++ b/tests/syntax_validator_cases/regressions.py @@ -0,0 +1,590 @@ +# mypy: ignore-errors +from tests.syntax_validator_cases import ( + Attribute, + CallArguments, + Comment, + CommentType, + DiagnosticCode, + FluentBundle, + FluentParserV1, + FunctionCallInfo, + FunctionReference, + Identifier, + Junk, + Message, + NamedArgument, + NumberLiteral, + Pattern, + Placeable, + Resource, + SelectExpression, + SemanticValidator, + Term, + TextElement, + VariableReference, + Variant, + introspect_message, + pytest, + validate, +) + + +class TestTermPositionalArgsWarning: + """Tests for VAL-TERM-POSITIONAL-ARGS-001 resolution. + + SemanticValidator emits warning when term references include positional + arguments, which are silently ignored at runtime per Fluent spec. 
+ """ + + def test_term_reference_positional_args_triggers_warning(self) -> None: + """Term reference with positional args emits validation warning.""" + parser = FluentParserV1() + ftl_source = """ +-brand = Acme Corp +msg = Welcome to { -brand($var) } +""" + resource = parser.parse(ftl_source) + + validator = SemanticValidator() + result = validator.validate(resource) + + # Should have warning about positional args + # Annotation.code is a string (enum name), not DiagnosticCode enum + warning_codes = [a.code for a in result.annotations] + assert "VALIDATION_TERM_POSITIONAL_ARGS" in warning_codes + + def test_term_reference_named_args_no_warning(self) -> None: + """Term reference with only named args does NOT emit warning.""" + parser = FluentParserV1() + ftl_source = """ +-brand = { $case -> + [nominative] Acme Corp + *[other] Acme Corp +} +msg = Welcome to { -brand(case: "nominative") } +""" + resource = parser.parse(ftl_source) + + validator = SemanticValidator() + result = validator.validate(resource) + + # Should NOT have warning about positional args + warning_codes = [a.code for a in result.annotations] + assert "VALIDATION_TERM_POSITIONAL_ARGS" not in warning_codes + + def test_term_reference_mixed_args_triggers_warning(self) -> None: + """Term reference with mixed positional and named args emits warning.""" + parser = FluentParserV1() + ftl_source = """ +-brand = Acme Corp +msg = Welcome to { -brand($var, extra: "value") } +""" + resource = parser.parse(ftl_source) + + validator = SemanticValidator() + result = validator.validate(resource) + + warning_codes = [a.code for a in result.annotations] + assert "VALIDATION_TERM_POSITIONAL_ARGS" in warning_codes + + def test_term_reference_no_args_no_warning(self) -> None: + """Term reference without arguments does NOT emit warning.""" + parser = FluentParserV1() + ftl_source = """ +-brand = Acme Corp +msg = Welcome to { -brand } +""" + resource = parser.parse(ftl_source) + + validator = SemanticValidator() + 
result = validator.validate(resource) + + # Should NOT have warning about positional args + warning_codes = [a.code for a in result.annotations] + assert "VALIDATION_TERM_POSITIONAL_ARGS" not in warning_codes + + def test_warning_message_contains_term_name(self) -> None: + """Warning message identifies the term reference causing the warning.""" + parser = FluentParserV1() + ftl_source = """ +-my_special_term = Test +msg = { -my_special_term($x) } +""" + resource = parser.parse(ftl_source) + + validator = SemanticValidator() + result = validator.validate(resource) + + annotations = [ + a + for a in result.annotations + if a.code == "VALIDATION_TERM_POSITIONAL_ARGS" + ] + assert len(annotations) == 1 + assert "-my_special_term" in annotations[0].message + assert "positional arguments are ignored" in annotations[0].message + + +class TestFunctionCallInfoPositionalArgVarsRename: + """Tests for SEM-INTROSPECTION-DATA-LOSS-001 resolution. + + FunctionCallInfo.positional_args renamed to positional_arg_vars to + clarify that it contains only variable reference names, not all arguments. 
+ """ + + def test_positional_arg_vars_field_exists(self) -> None: + """FunctionCallInfo has positional_arg_vars field.""" + info = FunctionCallInfo( + name="NUMBER", + positional_arg_vars=("amount", "extra"), + named_args=frozenset({"minimumFractionDigits"}), + span=None, + ) + assert info.positional_arg_vars == ("amount", "extra") + + def test_positional_arg_vars_contains_only_variable_names(self) -> None: + """positional_arg_vars only contains VariableReference names.""" + parser = FluentParserV1() + # FTL with function that has mixed positional args (variable and literal) + ftl_source = 'msg = { NUMBER($var, "literal") }' + resource = parser.parse(ftl_source) + msg = resource.entries[0] + assert isinstance(msg, Message) + + result = introspect_message(msg) + func = next(iter(result.functions)) + + # Only variable reference name should be present, not "literal" + assert func.positional_arg_vars == ("var",) + + def test_introspect_message_extracts_positional_arg_vars(self) -> None: + """introspect_message correctly populates positional_arg_vars.""" + bundle = FluentBundle("en") + bundle.add_resource("price = { NUMBER($amount, minimumFractionDigits: 2) }") + + info = bundle.introspect_message("price") + funcs = list(info.functions) + assert len(funcs) == 1 + + func = funcs[0] + assert func.name == "NUMBER" + assert "amount" in func.positional_arg_vars + assert "minimumFractionDigits" in func.named_args + + def test_positional_arg_vars_multiple_variables(self) -> None: + """positional_arg_vars captures multiple variable references.""" + parser = FluentParserV1() + ftl_source = "msg = { FUNC($a, $b, $c) }" + resource = parser.parse(ftl_source) + msg = resource.entries[0] + assert isinstance(msg, Message) + + result = introspect_message(msg) + func = next(iter(result.functions)) + + assert set(func.positional_arg_vars) == {"a", "b", "c"} + + +class TestCrossResourceCycleDetection: + """Tests for VAL-CROSS-RESOURCE-CYCLES-001 resolution. 
+ + FluentBundle.validate_resource() now detects cycles involving dependencies + OF existing bundle entries, not just their names. + """ + + def test_simple_cross_resource_cycle_detected(self) -> None: + """Cycle through dependencies of existing entry is detected. + + Scenario: + - Resource 1: msg_a = { msg_b } + - Resource 2: msg_b = { msg_a } + + When validating Resource 2, msg_b references msg_a which is in the bundle. + Since msg_a's dependencies (msg_b) now complete a cycle, it should be detected. + """ + bundle = FluentBundle("en", use_isolating=False) + + # Add first resource: msg_a depends on msg_b (not yet defined) + bundle.add_resource("msg_a = { msg_b }") + + # Now validate second resource that completes the cycle + result = bundle.validate_resource("msg_b = { msg_a }") + + # Should detect the circular reference + warning_texts = " ".join(w.message for w in result.warnings) + assert "Circular" in warning_texts + + def test_term_cross_resource_cycle_detected(self) -> None: + """Cycle through term dependencies is detected. + + Scenario: + - Resource 1: -term_a = { -term_b } + - Resource 2: -term_b = { -term_a } + """ + bundle = FluentBundle("en", use_isolating=False) + + # Add first resource: term_a depends on term_b + bundle.add_resource("-term_a = { -term_b }") + + # Validate second resource that completes the cycle + result = bundle.validate_resource("-term_b = { -term_a }") + + # Should detect the circular reference + warning_texts = " ".join(w.message for w in result.warnings) + assert "Circular" in warning_texts + + def test_mixed_message_term_cross_resource_cycle_detected(self) -> None: + """Cycle involving both messages and terms across resources is detected. 
+ + Scenario: + - Resource 1: -brand = { greeting } + - Resource 2: greeting = { -brand } + """ + bundle = FluentBundle("en", use_isolating=False) + + # Add first resource: term depends on message + bundle.add_resource("-brand = { greeting }") + + # Validate second resource that completes the cycle + result = bundle.validate_resource("greeting = { -brand }") + + # Should detect the circular reference + warning_texts = " ".join(w.message for w in result.warnings) + assert "Circular" in warning_texts + + def test_no_false_positive_for_valid_cross_resource(self) -> None: + """Valid cross-resource references don't trigger false positives. + + Scenario: + - Resource 1: msg_a = Hello + - Resource 2: msg_b = { msg_a } + + This is a valid dependency chain, not a cycle. + """ + bundle = FluentBundle("en", use_isolating=False) + + # Add first resource: msg_a has no dependencies + bundle.add_resource("msg_a = Hello") + + # Validate second resource that references msg_a + result = bundle.validate_resource("msg_b = { msg_a }") + + # Should NOT have circular reference warnings + warning_texts = " ".join(w.message for w in result.warnings) + assert "Circular" not in warning_texts + + def test_transitive_cross_resource_cycle_detected(self) -> None: + """Transitive cycles across resources are detected. 
+ + Scenario: + - Resource 1: msg_a = { msg_b }, msg_b = { msg_c } + - Resource 2: msg_c = { msg_a } + """ + bundle = FluentBundle("en", use_isolating=False) + + # Add first resource with chain msg_a -> msg_b -> msg_c (incomplete) + bundle.add_resource(""" +msg_a = { msg_b } +msg_b = { msg_c } +""") + + # Validate second resource that completes the cycle + result = bundle.validate_resource("msg_c = { msg_a }") + + # Should detect the circular reference + warning_texts = " ".join(w.message for w in result.warnings) + assert "Circular" in warning_texts + + def test_bundle_deps_tracking_accuracy(self) -> None: + """Internal _msg_deps and _term_deps are correctly populated.""" + bundle = FluentBundle("en", use_isolating=False) + + # Add resources with various dependencies + bundle.add_resource(""" +-brand = Acme Corp +-slogan = { -brand } +welcome = Hello { -brand } +goodbye = { welcome } - { -slogan } +""") + + # pylint: disable=protected-access + # Verify _term_deps + assert "brand" in bundle._term_deps + assert bundle._term_deps["brand"] == set() + + assert "slogan" in bundle._term_deps + assert "term:brand" in bundle._term_deps["slogan"] + + # Verify _msg_deps + assert "welcome" in bundle._msg_deps + assert "term:brand" in bundle._msg_deps["welcome"] + + assert "goodbye" in bundle._msg_deps + assert "msg:welcome" in bundle._msg_deps["goodbye"] + assert "term:slogan" in bundle._msg_deps["goodbye"] + # pylint: enable=protected-access + + +# ============================================================================ +# VALIDATOR BRANCH COVERAGE +# ============================================================================ + + +class TestValidatorBranchCoverage: + """Test SemanticValidator branch coverage.""" + + def test_validate_junk_entry_passthrough(self) -> None: + """Junk entry in validation passes through without error.""" + junk = Junk(content="invalid") + resource = Resource(entries=(junk,)) + + validator = SemanticValidator() + result = 
validator.validate(resource) + + assert result is not None + + def test_validate_comment_entry_passthrough(self) -> None: + """Comment entry in validation passes through successfully.""" + comment = Comment(content="This is a comment", type=CommentType.COMMENT) + resource = Resource(entries=(comment,)) + + validator = SemanticValidator() + result = validator.validate(resource) + + assert result.is_valid + + def test_validate_message_without_value(self) -> None: + """Message with value=None and attributes validates without crash.""" + attr = Attribute( + id=Identifier("hint"), + value=Pattern(elements=(TextElement("Hint text"),)), + ) + message = Message( + id=Identifier("noValue"), + value=None, + attributes=(attr,), + ) + resource = Resource(entries=(message,)) + + validator = SemanticValidator() + result = validator.validate(resource) + + assert result is not None + + +class TestTermWithoutValueRejected: + """Term with None value is rejected at construction time by __post_init__.""" + + def test_term_without_value_via_manual_ast(self) -> None: + """Term constructor raises ValueError when value is None.""" + with pytest.raises(ValueError, match="Term must have a value pattern"): + Term( + id=Identifier(name="empty-term"), + value=None, # type: ignore[arg-type] + attributes=(), + ) + + +class TestPlaceableExpressionValidation: + """Validator processes the expression inside a Placeable.""" + + def test_placeable_expression_validation(self) -> None: + """Validation processes Placeable's inner expression (hits validate_expression path).""" + ftl = """ +message = Text { $variable } more text +""" + resource = FluentParserV1().parse(ftl) + result = validate(resource) + + assert result.is_valid + + +class TestDuplicateNamedArguments: + """Validator detects duplicate named argument names in function calls.""" + + def test_duplicate_named_arguments(self) -> None: + """Function with duplicate named arg names produces validation annotation.""" + func_ref = FunctionReference( 
+ id=Identifier(name="NUMBER"), + arguments=CallArguments( + positional=(NumberLiteral(value=42, raw="42"),), + named=( + NamedArgument( + name=Identifier(name="minimumFractionDigits"), + value=NumberLiteral(value=2, raw="2"), + ), + NamedArgument( + name=Identifier(name="minimumFractionDigits"), # Duplicate + value=NumberLiteral(value=3, raw="3"), + ), + ), + ), + ) + + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=func_ref),)), + attributes=(), + comment=None, + span=(0, 0), # type: ignore[arg-type] + ) + + resource = Resource(entries=(msg,)) + + validator = SemanticValidator() + result = validator.validate(resource) + + assert len(result.annotations) > 0 or not result.is_valid + + +class TestSelectExpressionNoVariants: + """SelectExpression with zero variants is rejected by __post_init__.""" + + def test_select_expression_no_variants(self) -> None: + """SelectExpression constructor raises ValueError when variants is empty.""" + with pytest.raises(ValueError, match="SelectExpression requires at least one variant"): + SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=(), + ) + + +class TestNestedPlaceableValidation: + """Validator processes nested Placeables (Placeable as expression of Placeable).""" + + def test_nested_placeable_validation(self) -> None: + """Validator traverses nested Placeables without error.""" + inner_placeable = Placeable( + expression=VariableReference(id=Identifier(name="count")) + ) + outer_placeable = Placeable(expression=inner_placeable) + + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(outer_placeable,)), + attributes=(), + comment=None, + span=(0, 0), # type: ignore[arg-type] + ) + + resource = Resource(entries=(msg,)) + + validator = SemanticValidator() + result = validator.validate(resource) + + assert result.is_valid + + +class TestValidatorBranchCoverageExtended: + """Extended validator branch coverage tests.""" + + def 
test_validate_term_with_attributes(self) -> None: + """Validator handles term with attributes without error.""" + term = Term( + id=Identifier("brand"), + value=Pattern(elements=(TextElement("Firefox"),)), + attributes=( + Attribute( + id=Identifier("gender"), + value=Pattern(elements=(TextElement("m"),)), + ), + ), + ) + resource = Resource(entries=(term,)) + + validator = SemanticValidator() + result = validator.validate(resource) + + assert result is not None + + def test_validate_message_with_select_in_attribute(self) -> None: + """Validator processes message with SelectExpression in attribute.""" + select = SelectExpression( + selector=VariableReference(id=Identifier("count")), + variants=( + Variant( + key=Identifier("one"), + value=Pattern(elements=(TextElement("One"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement("Other"),)), + default=True, + ), + ), + ) + + message = Message( + id=Identifier("msg"), + value=Pattern(elements=(TextElement("Main"),)), + attributes=( + Attribute( + id=Identifier("count"), + value=Pattern(elements=(Placeable(expression=select),)), + ), + ), + ) + resource = Resource(entries=(message,)) + + validator = SemanticValidator() + result = validator.validate(resource) + + assert result is not None + + +# ============================================================================ +# DEFENSE-IN-DEPTH: PLACEABLE AS SELECTOR GUARD (validator.py:421-422) +# ============================================================================ + + +class TestPlaceableAsSelectorDefenseGuard: + """Validator defense-in-depth: Placeable used as SelectExpression selector. + + The SelectorExpression type alias excludes Placeable at the type level, so + normal construction cannot produce this state. However, deserialization or + object.__setattr__ bypass can. 
The validator re-checks this invariant at + line 420-422 via a widened ``object`` guard to avoid mypy unreachable + detection while still catching adversarial ASTs at runtime. + + Covers validator.py:422 (``self._add_error(errors, VALIDATION_PLACEABLE_SELECTOR)``). + """ + + def test_placeable_as_selector_adds_error(self) -> None: + """Validator adds VALIDATION_PLACEABLE_SELECTOR error when selector is a Placeable.""" + # Build a valid SelectExpression first, then bypass __post_init__ to + # inject a Placeable as the selector — this is the adversarial path. + valid_select = SelectExpression( + selector=VariableReference(id=Identifier("count")), + variants=( + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement("Other"),)), + default=True, + ), + ), + ) + # Bypass __post_init__: inject a Placeable as the selector + nested_literal = Placeable( + expression=VariableReference(id=Identifier("nested")) + ) + object.__setattr__(valid_select, "selector", nested_literal) + + message = Message( + id=Identifier("msg"), + value=Pattern(elements=(Placeable(expression=valid_select),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + validator = SemanticValidator() + result = validator.validate(resource) + + assert result is not None + # _add_error stores Annotation.code as DiagnosticCode.name (str), not enum. + # Errors from SemanticValidator appear in result.annotations, not result.errors. 
+ selector_errors = [ + a for a in result.annotations + if a.code == DiagnosticCode.VALIDATION_PLACEABLE_SELECTOR.name + ] + assert len(selector_errors) == 1 diff --git a/tests/syntax_validator_cases/results.py b/tests/syntax_validator_cases/results.py new file mode 100644 index 00000000..3230b419 --- /dev/null +++ b/tests/syntax_validator_cases/results.py @@ -0,0 +1,600 @@ +# mypy: ignore-errors +from tests.syntax_validator_cases import ( + _VALIDATION_MESSAGES, + Annotation, + CallArguments, + Decimal, + DiagnosticCode, + FluentParserV1, + FunctionReference, + Identifier, + Message, + NamedArgument, + NumberLiteral, + Pattern, + Placeable, + Resource, + SelectExpression, + SemanticValidator, + Span, + TextElement, + ValidationResult, + VariableReference, + Variant, + pytest, + validate, +) + + +class TestSelectExpressionValidation: + """Test select expression validation.""" + + def test_select_with_valid_default_variant(self) -> None: + """Select expression with exactly one default variant validates.""" + parser = FluentParserV1() + resource = parser.parse(""" +msg = { $count -> + [one] One item + *[other] Many items +} +""") + result = validate(resource) + assert result.is_valid + + def test_select_without_variants_constructor_validation(self) -> None: + """SelectExpression without variants raises ValueError at construction. + + Tests AST __post_init__ validation that enforces at least one variant. + Tests assumption that validator can rely on this invariant. + """ + with pytest.raises(ValueError, match="at least one variant"): + SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=(), + ) + + def test_select_without_variants_validator_defensive_check(self) -> None: + """Validator catches empty-variants SelectExpression constructed via object.__new__. + + SelectExpression.__post_init__ enforces non-empty variants at construction. 
+ The validator's check is intentional defense-in-depth for ASTs that bypass + __post_init__ (e.g., via object.__new__ + object.__setattr__). + """ + # Create SelectExpression bypassing __post_init__ validation + select = object.__new__(SelectExpression) + object.__setattr__(select, "selector", VariableReference(id=Identifier(name="x"))) + object.__setattr__(select, "variants", ()) # Invalid per spec + object.__setattr__(select, "span", None) + + message = Message( + id=Identifier(name="msg"), + value=Pattern(elements=(Placeable(expression=select),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + validator = SemanticValidator() + result = validator.validate(resource) + + # Validator should catch missing variants + assert not result.is_valid + errors = [a for a in result.annotations if "NO_VARIANTS" in a.code] + assert len(errors) > 0 + + def test_select_with_multiple_defaults_constructor_validation(self) -> None: + """SelectExpression with multiple defaults raises ValueError. + + Tests AST __post_init__ validation. + """ + variants = ( + Variant( + key=Identifier(name="one"), + value=Pattern(elements=(TextElement(value="One"),)), + default=True, # First default + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="Other"),)), + default=True, # Second default - invalid! + ), + ) + with pytest.raises(ValueError, match="exactly one default variant"): + SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=variants, + ) + + def test_select_with_zero_defaults_validator_defensive_check(self) -> None: + """Validator catches zero-default SelectExpression constructed via object.__new__. + + SelectExpression.__post_init__ enforces exactly one default at construction. + The validator's check is intentional defense-in-depth for ASTs that bypass + __post_init__ (e.g., via object.__new__ + object.__setattr__). 
+ """ + # Create SelectExpression with zero defaults (bypassing __post_init__) + variant = object.__new__(Variant) + object.__setattr__(variant, "key", Identifier(name="one")) + object.__setattr__(variant, "value", Pattern(elements=(TextElement(value="One"),))) + object.__setattr__(variant, "default", False) # No default! + object.__setattr__(variant, "span", None) + + select = object.__new__(SelectExpression) + object.__setattr__(select, "selector", VariableReference(id=Identifier(name="x"))) + object.__setattr__(select, "variants", (variant,)) + object.__setattr__(select, "span", None) + + message = Message( + id=Identifier(name="msg"), + value=Pattern(elements=(Placeable(expression=select),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + validator = SemanticValidator() + result = validator.validate(resource) + + # Validator should catch default count != 1 + assert not result.is_valid + errors = [a for a in result.annotations if "NO_DEFAULT" in a.code] + assert len(errors) > 0 + + def test_select_with_duplicate_variant_keys_invalid(self) -> None: + """Select expression with duplicate variant keys is invalid. + + Tests line 418 (duplicate variant key detection). + """ + # Manually construct select with duplicate keys + variants = ( + Variant( + key=Identifier(name="one"), + value=Pattern(elements=(TextElement(value="First one"),)), + default=False, + ), + Variant( + key=Identifier(name="one"), # Duplicate! 
+ value=Pattern(elements=(TextElement(value="Second one"),)), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="Other"),)), + default=True, + ), + ) + select = SelectExpression( + selector=VariableReference(id=Identifier(name="x")), + variants=variants, + ) + message = Message( + id=Identifier(name="msg"), + value=Pattern(elements=(Placeable(expression=select),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + result = validate(resource) + + # Should detect duplicate variant key + assert not result.is_valid + errors = [ + a + for a in result.annotations + if "DUPLICATE" in a.code or "duplicate" in a.message.lower() + ] + assert len(errors) > 0 + + def test_select_with_numeric_variant_keys(self) -> None: + """Select expression with numeric variant keys validates.""" + parser = FluentParserV1() + resource = parser.parse(""" +msg = { $count -> + [0] Zero + [1] One + *[other] Many +} +""") + result = validate(resource) + assert result.is_valid + + def test_select_with_duplicate_numeric_keys_different_forms(self) -> None: + """Numeric variant keys 1 and 1.0 are duplicates. + + Tests Decimal normalization in _variant_key_to_string. + """ + # Manually construct select with 1 and 1.0 (should be duplicates) + variants = ( + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="One"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=Decimal("1.0"), raw="1.0"), # Duplicate! 
+ value=Pattern(elements=(TextElement(value="One point zero"),)), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="Other"),)), + default=True, + ), + ) + select = SelectExpression( + selector=VariableReference(id=Identifier(name="x")), + variants=variants, + ) + message = Message( + id=Identifier(name="msg"), + value=Pattern(elements=(Placeable(expression=select),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + result = validate(resource) + + # Should detect duplicate (1 and 1.0 are same value) + assert not result.is_valid + + def test_select_nested_in_variant(self) -> None: + """Nested select expressions validate recursively.""" + parser = FluentParserV1() + resource = parser.parse(""" +msg = { $x -> + [one] { $y -> + [a] One-A + *[b] One-B + } + *[other] Other +} +""") + result = validate(resource) + assert result.is_valid + + +# ============================================================================ +# VARIANT KEY NORMALIZATION TESTS +# ============================================================================ + + +class TestVariantKeyNormalization: + """Test variant key normalization and Decimal handling.""" + + def test_decimal_normalization_for_numeric_keys(self) -> None: + """Numeric keys are normalized using Decimal for comparison. + + 100 (int, raw="100") and 1E+2 (Decimal, raw="1E2") are the same numeric + value after Decimal normalization; the validator must detect them as + duplicate variant keys. + """ + variants = ( + Variant( + key=NumberLiteral(value=100, raw="100"), + value=Pattern(elements=(TextElement(value="Hundred"),)), + default=False, + ), + Variant( + # Decimal("1E2") == Decimal("100") after normalization. + # raw="1E2" is a valid Decimal literal; value must be Decimal, not int, + # because int("1E2") fails. Both normalize to format("f") = "100". 
+ key=NumberLiteral(value=Decimal("1E2"), raw="1E2"), + value=Pattern(elements=(TextElement(value="Also hundred"),)), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="Other"),)), + default=True, + ), + ) + select = SelectExpression( + selector=VariableReference(id=Identifier(name="x")), + variants=variants, + ) + message = Message( + id=Identifier(name="msg"), + value=Pattern(elements=(Placeable(expression=select),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + result = validate(resource) + + # Should detect as duplicates after normalization + assert not result.is_valid + + def test_number_literal_rejects_invalid_raw(self) -> None: + """NumberLiteral.__post_init__ rejects raw strings that do not parse as numbers. + + The validator's former fallback (returning key.raw on Decimal conversion failure) + is now unreachable because NumberLiteral enforces the raw/value invariant at + construction time. + """ + with pytest.raises(ValueError, match="not a valid number literal"): + NumberLiteral(value=Decimal(0), raw="not-a-number") + + def test_number_literal_rejects_non_finite_decimal(self) -> None: + """NumberLiteral.__post_init__ rejects non-finite Decimal values. + + Infinity and NaN are not valid FTL number literal values. + The validator's former exception handling for format(Infinity, 'f') is now + unreachable because NumberLiteral rejects non-finite Decimals at construction. 
+ """ + with pytest.raises(ValueError, match="not a finite number"): + NumberLiteral(value=Decimal("Infinity"), raw="Infinity") + + +# ============================================================================ +# VALIDATION RESULT TESTS +# ============================================================================ + + +class TestValidationResultFactory: + """Test ValidationResult factory methods.""" + + def test_validation_result_valid_factory(self) -> None: + """ValidationResult.valid() creates valid result.""" + result = ValidationResult.valid() + assert result.is_valid is True + assert len(result.annotations) == 0 + + def test_validation_result_invalid_factory(self) -> None: + """ValidationResult.invalid() creates invalid result.""" + annotation = Annotation( + code="E0001", + message="Test error", + span=Span(start=0, end=1), + ) + result = ValidationResult.invalid(annotations=(annotation,)) + assert result.is_valid is False + assert len(result.annotations) == 1 + + def test_validation_result_from_annotations_empty(self) -> None: + """ValidationResult.from_annotations() with empty tuple is valid.""" + result = ValidationResult.from_annotations(()) + assert result.is_valid is True + assert len(result.annotations) == 0 + + def test_validation_result_from_annotations_with_errors(self) -> None: + """ValidationResult.from_annotations() with errors is invalid.""" + annotations = ( + Annotation(code="E0001", message="Error 1", span=Span(start=0, end=1)), + Annotation(code="E0002", message="Error 2", span=Span(start=2, end=3)), + ) + result = ValidationResult.from_annotations(annotations) + assert not result.is_valid + assert len(result.annotations) == 2 + + +class TestValidationResultProperties: + """Test ValidationResult properties.""" + + def test_annotations_are_immutable_tuples(self) -> None: + """Annotations are stored as tuples (immutable).""" + annotation = Annotation( + code="E0001", + message="Error", + span=Span(start=0, end=1), + ) + result = 
ValidationResult.invalid(annotations=(annotation,)) + assert isinstance(result.annotations, tuple) + + def test_is_valid_true_means_no_errors(self) -> None: + """is_valid=True implies no error-level annotations.""" + result = ValidationResult.valid() + assert result.is_valid is True + assert len(result.annotations) == 0 + + +# ============================================================================ +# ERROR MESSAGE HANDLING TESTS +# ============================================================================ + + +class TestErrorMessageHandling: + """Test error message generation and diagnostic codes.""" + + def test_validation_messages_dict_exists(self) -> None: + """_VALIDATION_MESSAGES dict contains error message templates.""" + assert isinstance(_VALIDATION_MESSAGES, dict) + assert len(_VALIDATION_MESSAGES) > 0 + + def test_diagnostic_codes_for_validation_exist(self) -> None: + """Validation-related DiagnosticCodes are defined.""" + expected_codes = [ + DiagnosticCode.VALIDATION_TERM_NO_VALUE, + DiagnosticCode.VALIDATION_SELECT_NO_DEFAULT, + DiagnosticCode.VALIDATION_SELECT_NO_VARIANTS, + DiagnosticCode.VALIDATION_VARIANT_DUPLICATE, + DiagnosticCode.VALIDATION_NAMED_ARG_DUPLICATE, + ] + for code in expected_codes: + assert isinstance(code, DiagnosticCode) + assert code.value >= 5000 # Validation codes in 5000+ range + + def test_error_message_fallback_for_unknown_code(self) -> None: + """Error message uses fallback for unknown diagnostic code. + + Tests line 129->133 in _add_error method. 
+ """ + # Create an annotation with a code not in _VALIDATION_MESSAGES + validator = SemanticValidator() + errors: list[Annotation] = [] + + # Use a diagnostic code that won't be in the validation messages dict + # Call the _add_error method directly (accessing private method for testing) + validator._add_error( + errors, + DiagnosticCode.MESSAGE_NOT_FOUND, # Not a validation code + span=Span(start=0, end=1), + ) + + # Should have added an error with fallback message + assert len(errors) == 1 + assert errors[0].message == "Unknown validation error" + + +# ============================================================================ +# VALIDATOR STATE MANAGEMENT TESTS +# ============================================================================ + + +class TestValidatorStateManagement: + """Test validator internal state handling.""" + + def test_validator_reusable_across_validations(self) -> None: + """Validator can validate multiple resources without state leakage.""" + parser = FluentParserV1() + validator = SemanticValidator() + + # First validation + resource1 = parser.parse("msg1 = Value 1") + result1 = validator.validate(resource1) + assert result1.is_valid + + # Second validation should not be affected by first + resource2 = parser.parse("msg2 = Value 2") + result2 = validator.validate(resource2) + assert result2.is_valid + + def test_validator_results_independent(self) -> None: + """Validating one resource doesn't affect validation of another.""" + parser = FluentParserV1() + validator = SemanticValidator() + + resource1 = parser.parse("msg1 = Value 1") + resource2 = parser.parse("msg2 = Value 2") + + result1_first = validator.validate(resource1) + validator.validate(resource2) # Validate resource2 + result1_again = validator.validate(resource1) # Validate resource1 again + + # Results for same resource should be identical + assert result1_first.is_valid == result1_again.is_valid + assert len(result1_first.annotations) == len(result1_again.annotations) + + 
+# ============================================================================ +# INTEGRATION TESTS +# ============================================================================ + + +class TestValidatorIntegration: + """Integration tests combining multiple validation aspects.""" + + def test_complex_message_with_all_features(self) -> None: + """Complex message with multiple features validates correctly.""" + parser = FluentParserV1() + resource = parser.parse(""" +# Comment +greeting = Hello { $name }, you have { $count -> + [0] no messages + [1] one message + *[other] { NUMBER($count) } messages +}! + .formal = Dear { $name }, you have { NUMBER($count) } message(s). + +-brand = Firefox + .short = FX + +status = + .online = Online now + .offline = Offline + +invalid junk entry +""") + result = validate(resource) + # Should handle all entry types and complex patterns + assert isinstance(result, ValidationResult) + + def test_deeply_nested_structures(self) -> None: + """Deeply nested select expressions validate without issues.""" + parser = FluentParserV1() + resource = parser.parse(""" +msg = { $a -> + [1] { $b -> + [1] { $c -> + [1] Triple nested + *[other] C-other + } + *[other] B-other + } + *[other] A-other +} +""") + result = validate(resource) + assert isinstance(result, ValidationResult) + + def test_multiple_entries_with_mixed_validity(self) -> None: + """Resource with mix of valid and invalid entries.""" + # Construct resource with some invalid entries + valid_message = Message( + id=Identifier(name="valid"), + value=Pattern(elements=(TextElement(value="Valid"),)), + attributes=(), + ) + + # Invalid: duplicate named args + invalid_func = FunctionReference( + id=Identifier(name="NUMBER"), + arguments=CallArguments( + positional=(), + named=( + NamedArgument( + name=Identifier(name="opt"), + value=NumberLiteral(value=1, raw="1"), + ), + NamedArgument( + name=Identifier(name="opt"), # Duplicate + value=NumberLiteral(value=2, raw="2"), + ), + ), + ), + ) + 
invalid_message = Message( + id=Identifier(name="invalid"), + value=Pattern(elements=(Placeable(expression=invalid_func),)), + attributes=(), + ) + + resource = Resource(entries=(valid_message, invalid_message)) + result = validate(resource) + + # Should detect the invalid entry + assert not result.is_valid + assert len(result.annotations) > 0 + + +# ============================================================================ +# CONVENIENCE FUNCTION TESTS +# ============================================================================ + + +class TestConvenienceFunction: + """Test the validate() convenience function.""" + + def test_validate_function_creates_validator_internally(self) -> None: + """validate() function is a convenience wrapper.""" + parser = FluentParserV1() + resource = parser.parse("msg = Value") + + # Use convenience function + result = validate(resource) + + assert isinstance(result, ValidationResult) + assert result.is_valid + + def test_validate_function_same_result_as_validator_class(self) -> None: + """validate() function produces same result as SemanticValidator.""" + parser = FluentParserV1() + resource = parser.parse("msg = Hello World") + + # Use convenience function + result1 = validate(resource) + + # Use validator class + validator = SemanticValidator() + result2 = validator.validate(resource) + + assert result1.is_valid == result2.is_valid + assert len(result1.annotations) == len(result2.annotations) + + +# ============================================================================ +# SEMANTIC VALIDATION (from test_semantic_validation.py) +# ============================================================================ + diff --git a/tests/syntax_visitor_cases/__init__.py b/tests/syntax_visitor_cases/__init__.py new file mode 100644 index 00000000..cf0473f0 --- /dev/null +++ b/tests/syntax_visitor_cases/__init__.py @@ -0,0 +1,103 @@ +"""Tests for syntax.visitor: ASTVisitor traversal, dispatch, and defensive branches.""" + +from 
__future__ import annotations + +from dataclasses import dataclass +from typing import Any + +from ftllexengine.enums import CommentType +from ftllexengine.syntax.ast import ( + Attribute, + CallArguments, + Comment, + FunctionReference, + Identifier, + Junk, + Message, + MessageReference, + NamedArgument, + NumberLiteral, + Pattern, + Placeable, + Resource, + SelectExpression, + StringLiteral, + Term, + TermReference, + TextElement, + VariableReference, + Variant, +) +from ftllexengine.syntax.visitor import ASTTransformer, ASTVisitor + +# ============================================================================ +# HELPER VISITORS +# ============================================================================ + + +class CountingVisitor(ASTVisitor): + """Counts visits to each node type.""" + + def __init__(self) -> None: + """Initialize counters.""" + super().__init__() + self.counts: dict[str, int] = {} + + def visit(self, node: Any) -> Any: + """Track each visit.""" + node_type = type(node).__name__ + self.counts[node_type] = self.counts.get(node_type, 0) + 1 + return super().visit(node) + + +class CollectingVisitor(ASTVisitor): + """Collects all identifiers visited.""" + + def __init__(self) -> None: + """Initialize collection.""" + super().__init__() + self.identifiers: list[str] = [] + + def visit_Identifier(self, node: Identifier) -> Any: + """Collect identifier names.""" + self.identifiers.append(node.name) + return self.generic_visit(node) + + +class TransformingVisitor(ASTVisitor): + """Transforms text to uppercase.""" + + def visit_TextElement(self, node: TextElement) -> TextElement: + """Transform text to uppercase.""" + return TextElement(value=node.value.upper()) + +__all__ = [ + "ASTTransformer", + "ASTVisitor", + "Any", + "Attribute", + "CallArguments", + "CollectingVisitor", + "Comment", + "CommentType", + "CountingVisitor", + "FunctionReference", + "Identifier", + "Junk", + "Message", + "MessageReference", + "NamedArgument", + "NumberLiteral", + 
"Pattern", + "Placeable", + "Resource", + "SelectExpression", + "StringLiteral", + "Term", + "TermReference", + "TextElement", + "TransformingVisitor", + "VariableReference", + "Variant", + "dataclass", +] diff --git a/tests/syntax_visitor_cases/basic_visitor_tests.py b/tests/syntax_visitor_cases/basic_visitor_tests.py new file mode 100644 index 00000000..a4e4d4ce --- /dev/null +++ b/tests/syntax_visitor_cases/basic_visitor_tests.py @@ -0,0 +1,30 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor.py.""" + +from tests.syntax_visitor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# BASIC VISITOR TESTS +# ============================================================================ + + +class TestASTVisitorBasic: + """Test basic visitor functionality.""" + + def test_visit_dispatches_to_specific_method(self) -> None: + """Visitor dispatches to visit_NodeType method.""" + visitor = CountingVisitor() + node = Identifier(name="test") + + visitor.visit(node) + + assert visitor.counts["Identifier"] == 1 + + def test_generic_visit_returns_node(self) -> None: + """Generic visit returns node unchanged.""" + visitor = ASTVisitor() + node = Identifier(name="test") + + result = visitor.generic_visit(node) + + assert result is node diff --git a/tests/syntax_visitor_cases/call_arguments.py b/tests/syntax_visitor_cases/call_arguments.py new file mode 100644 index 00000000..4731d225 --- /dev/null +++ b/tests/syntax_visitor_cases/call_arguments.py @@ -0,0 +1,87 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor.py.""" + +from tests.syntax_visitor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# CALL ARGUMENTS +# ============================================================================ + + +class TestVisitorCallArguments: + """Test visiting 
CallArguments nodes.""" + + def test_visit_call_arguments_empty(self) -> None: + """Visit call arguments with no args.""" + visitor = CountingVisitor() + args = CallArguments(positional=(), named=()) + + visitor.visit(args) + + assert visitor.counts["CallArguments"] == 1 + + def test_visit_call_arguments_positional(self) -> None: + """Visit call arguments with positional args.""" + visitor = CountingVisitor() + args = CallArguments( + positional=( + VariableReference(id=Identifier(name="x")), + NumberLiteral(value=42, raw="42"), + ), + named=(), + ) + + visitor.visit(args) + + assert visitor.counts["CallArguments"] == 1 + assert visitor.counts["VariableReference"] == 1 + assert visitor.counts["NumberLiteral"] == 1 + + def test_visit_call_arguments_named(self) -> None: + """Visit call arguments with named args.""" + visitor = CountingVisitor() + args = CallArguments( + positional=(), + named=( + NamedArgument( + name=Identifier(name="param"), + value=StringLiteral(value="value"), + ), + ), + ) + + visitor.visit(args) + + assert visitor.counts["CallArguments"] == 1 + assert visitor.counts["NamedArgument"] == 1 + assert visitor.counts["StringLiteral"] == 1 + + +class TestVisitorNamedArgument: + """Test visiting NamedArgument nodes.""" + + def test_visit_named_argument(self) -> None: + """Visit named argument.""" + visitor = CountingVisitor() + arg = NamedArgument( + name=Identifier(name="minimumFractionDigits"), value=NumberLiteral(value=2, raw="2") + ) + + visitor.visit(arg) + + assert visitor.counts["NamedArgument"] == 1 + assert visitor.counts["Identifier"] == 1 + assert visitor.counts["NumberLiteral"] == 1 + + +class TestVisitorIdentifier: + """Test visiting Identifier nodes.""" + + def test_visit_identifier(self) -> None: + """Visit identifier.""" + visitor = CountingVisitor() + ident = Identifier(name="test") + + visitor.visit(ident) + + assert visitor.counts["Identifier"] == 1 diff --git a/tests/syntax_visitor_cases/complex_integration_tests.py 
b/tests/syntax_visitor_cases/complex_integration_tests.py new file mode 100644 index 00000000..0c295513 --- /dev/null +++ b/tests/syntax_visitor_cases/complex_integration_tests.py @@ -0,0 +1,123 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor.py.""" + +from tests.syntax_visitor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# COMPLEX INTEGRATION TESTS +# ============================================================================ + + +class TestVisitorIntegration: + """Test visitor with complex AST structures.""" + + def test_visit_complex_message_with_select(self) -> None: + """Visit message with select expression and multiple variants.""" + visitor = CountingVisitor() + msg = Message( + id=Identifier(name="emails"), + value=Pattern( + elements=( + Placeable( + expression=SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=( + Variant( + key=Identifier(name="one"), + value=Pattern( + elements=(TextElement(value="one email"),) + ), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern( + elements=( + Placeable( + expression=VariableReference( + id=Identifier(name="count") + ) + ), + TextElement(value=" emails"), + ) + ), + default=True, + ), + ), + ) + ), + ) + ), + attributes=(), + ) + + visitor.visit(msg) + + assert visitor.counts["Message"] == 1 + assert visitor.counts["SelectExpression"] == 1 + assert visitor.counts["Variant"] == 2 + assert visitor.counts["VariableReference"] == 2 # selector + in variant + + def test_visit_message_with_function_call(self) -> None: + """Visit message with function call.""" + visitor = CountingVisitor() + msg = Message( + id=Identifier(name="price"), + value=Pattern( + elements=( + TextElement(value="Price: "), + Placeable( + expression=FunctionReference( + id=Identifier(name="NUMBER"), + arguments=CallArguments( + positional=( + 
VariableReference(id=Identifier(name="value")), + ), + named=( + NamedArgument( + name=Identifier(name="minimumFractionDigits"), + value=NumberLiteral(value=2, raw="2"), + ), + ), + ), + ) + ), + ) + ), + attributes=(), + ) + + visitor.visit(msg) + + assert visitor.counts["Message"] == 1 + assert visitor.counts["FunctionReference"] == 1 + assert visitor.counts["CallArguments"] == 1 + assert visitor.counts["NamedArgument"] == 1 + + def test_visit_resource_with_mixed_entries(self) -> None: + """Visit resource with messages, terms, comments, and junk.""" + visitor = CountingVisitor() + resource = Resource( + entries=( + Comment(content="Header comment", type=CommentType.COMMENT), + Message( + id=Identifier(name="hello"), + value=Pattern(elements=(TextElement(value="Hello"),)), + attributes=(), + ), + Term( + id=Identifier(name="brand"), + value=Pattern(elements=(TextElement(value="Firefox"),)), + attributes=(), + ), + Junk(content="invalid syntax"), + ) + ) + + visitor.visit(resource) + + assert visitor.counts["Resource"] == 1 + assert visitor.counts["Comment"] == 1 + assert visitor.counts["Message"] == 1 + assert visitor.counts["Term"] == 1 + assert visitor.counts["Junk"] == 1 diff --git a/tests/syntax_visitor_cases/defensive_branches_from_test_visitor_branch_coverage_py.py b/tests/syntax_visitor_cases/defensive_branches_from_test_visitor_branch_coverage_py.py new file mode 100644 index 00000000..9ad6ee88 --- /dev/null +++ b/tests/syntax_visitor_cases/defensive_branches_from_test_visitor_branch_coverage_py.py @@ -0,0 +1,171 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor.py.""" + +from tests.syntax_visitor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# DEFENSIVE BRANCHES (from test_visitor_branch_coverage.py) +# ============================================================================ + + +@dataclass(frozen=True) +class MockFieldContainer: + 
"""Mock container without __dataclass_fields__ for testing defensive branches.""" + + value: str + + +class PlainObject: + """Plain object without dataclass fields for testing defensive branches.""" + + def __init__(self, data: str) -> None: + """Initialize with data.""" + self.data = data + + +class TestGenericVisitDefensiveBranches: + """Test defensive branches in generic_visit for non-ASTNode values.""" + + def test_generic_visit_tuple_with_non_dataclass_items(self) -> None: + """Test line 214->212: tuple containing items without __dataclass_fields__. + + This tests the defensive branch where a tuple field contains items that + are not ASTNodes (don't have __dataclass_fields__). + """ + + class CountingVisitor(ASTVisitor): + """Visitor that counts visits.""" + + def __init__(self) -> None: + """Initialize visitor.""" + super().__init__() + self.visit_count = 0 + + def visit(self, node): + """Count each visit.""" + self.visit_count += 1 + return super().visit(node) + + # Create a message with normal structure + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="Test"),)), + attributes=(), + ) + + # Monkey-patch the elements tuple to include a non-ASTNode item + # This is testing a defensive code path that shouldn't happen in normal usage + # but guards against malformed AST structures + modified_elements = ( + TextElement(value="First"), + MockFieldContainer(value="not_an_astnode"), # No __dataclass_fields__ + TextElement(value="Last"), + ) + + # Use object.__setattr__ to bypass frozen dataclass protection + object.__setattr__(msg.value, "elements", modified_elements) + + visitor = CountingVisitor() + visitor.generic_visit(msg) + + # The visitor should visit the Message, Pattern, Identifier, and the two TextElements + # but NOT the MockFieldContainer (it lacks __dataclass_fields__) + # Visit count: Message (1) + Identifier (1) + Pattern (1) + 2 TextElements (2) = 5 + assert visitor.visit_count == 5 + + def 
test_generic_visit_tuple_with_mixed_items(self) -> None: + """Test tuple containing mix of ASTNodes and non-ASTNodes. + + This comprehensively tests the line 214 branch logic where we check + each tuple item for __dataclass_fields__. + """ + + class VisitOrderTracker(ASTVisitor): + """Track order of visits.""" + + def __init__(self) -> None: + """Initialize tracker.""" + super().__init__() + self.visit_order: list[str] = [] + + def visit(self, node): + """Record visit order.""" + node_name = type(node).__name__ + if node_name == "TextElement": + text_value = getattr(node, "value", "") + self.visit_order.append(f"TextElement:{text_value}") + else: + self.visit_order.append(node_name) + return super().visit(node) + + # Create pattern with mixed elements + pattern = Pattern( + elements=( + TextElement(value="A"), + TextElement(value="B"), + ) + ) + + # Inject non-ASTNode items into the tuple + mixed_elements = ( + TextElement(value="A"), + "string_value", # Not an ASTNode, will be skipped + TextElement(value="B"), + 123, # int, will be skipped by primitive check + ) + + object.__setattr__(pattern, "elements", mixed_elements) + + visitor = VisitOrderTracker() + visitor.generic_visit(pattern) + + # Should visit TextElement:A and TextElement:B, skipping string and int + assert "TextElement:A" in visitor.visit_order + assert "TextElement:B" in visitor.visit_order + # String and int should not appear + assert "str" not in visitor.visit_order + assert "int" not in visitor.visit_order + + def test_generic_visit_non_tuple_non_dataclass_field(self) -> None: + """Test line 217->203: single field that is an object without __dataclass_fields__. 
+ + This tests the defensive else branch where a field value is: + - Not None + - Not a primitive (str, int, float, bool) + - Not a tuple + - Not an ASTNode (no __dataclass_fields__) + """ + # Create a message + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="Test"),)), + attributes=(), + ) + + # Replace the 'comment' field (normally None or Comment ASTNode) with a + # plain object that doesn't have __dataclass_fields__ + plain_obj = PlainObject(data="test") + object.__setattr__(msg, "comment", plain_obj) + + class VisitorTracker(ASTVisitor): + """Track what gets visited.""" + + def __init__(self) -> None: + """Initialize tracker.""" + super().__init__() + self.visited_types: set[str] = set() + + def visit(self, node): + """Track visits.""" + self.visited_types.add(type(node).__name__) + return super().visit(node) + + visitor = VisitorTracker() + visitor.generic_visit(msg) + + # Should have visited Message's children (Identifier, Pattern, TextElement) + # but NOT the PlainObject (it doesn't have __dataclass_fields__) + assert "Identifier" in visitor.visited_types + assert "Pattern" in visitor.visited_types + assert "TextElement" in visitor.visited_types + assert "PlainObject" not in visitor.visited_types diff --git a/tests/syntax_visitor_cases/expression_nodes.py b/tests/syntax_visitor_cases/expression_nodes.py new file mode 100644 index 00000000..32d0b48b --- /dev/null +++ b/tests/syntax_visitor_cases/expression_nodes.py @@ -0,0 +1,200 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor.py.""" + +from tests.syntax_visitor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# EXPRESSION NODES +# ============================================================================ + + +class TestVisitorLiterals: + """Test visiting literal expression nodes.""" + + def test_visit_string_literal(self) -> None: + """Visit 
string literal.""" + visitor = CountingVisitor() + literal = StringLiteral(value="test") + + visitor.visit(literal) + + assert visitor.counts["StringLiteral"] == 1 + + def test_visit_number_literal(self) -> None: + """Visit number literal.""" + visitor = CountingVisitor() + literal = NumberLiteral(value=42, raw="42") + + visitor.visit(literal) + + assert visitor.counts["NumberLiteral"] == 1 + + +class TestVisitorReferences: + """Test visiting reference expression nodes.""" + + def test_visit_variable_reference(self) -> None: + """Visit variable reference.""" + visitor = CountingVisitor() + ref = VariableReference(id=Identifier(name="count")) + + visitor.visit(ref) + + assert visitor.counts["VariableReference"] == 1 + assert visitor.counts["Identifier"] == 1 + + def test_visit_message_reference_simple(self) -> None: + """Visit message reference without attribute.""" + visitor = CountingVisitor() + ref = MessageReference(id=Identifier(name="hello"), attribute=None) + + visitor.visit(ref) + + assert visitor.counts["MessageReference"] == 1 + assert visitor.counts["Identifier"] == 1 + + def test_visit_message_reference_with_attribute(self) -> None: + """Visit message reference with attribute.""" + visitor = CountingVisitor() + ref = MessageReference( + id=Identifier(name="button"), attribute=Identifier(name="tooltip") + ) + + visitor.visit(ref) + + assert visitor.counts["MessageReference"] == 1 + assert visitor.counts["Identifier"] == 2 + + def test_visit_term_reference_simple(self) -> None: + """Visit term reference without attribute or arguments.""" + visitor = CountingVisitor() + ref = TermReference(id=Identifier(name="brand"), attribute=None, arguments=None) + + visitor.visit(ref) + + assert visitor.counts["TermReference"] == 1 + assert visitor.counts["Identifier"] == 1 + + def test_visit_term_reference_with_attribute(self) -> None: + """Visit term reference with attribute.""" + visitor = CountingVisitor() + ref = TermReference( + id=Identifier(name="brand"), + 
attribute=Identifier(name="version"), + arguments=None, + ) + + visitor.visit(ref) + + assert visitor.counts["TermReference"] == 1 + assert visitor.counts["Identifier"] == 2 + + def test_visit_term_reference_with_arguments(self) -> None: + """Visit term reference with arguments.""" + visitor = CountingVisitor() + ref = TermReference( + id=Identifier(name="brand"), + attribute=None, + arguments=CallArguments(positional=(), named=()), + ) + + visitor.visit(ref) + + assert visitor.counts["TermReference"] == 1 + assert visitor.counts["CallArguments"] == 1 + + +class TestVisitorFunctionReference: + """Test visiting FunctionReference nodes.""" + + def test_visit_function_reference_no_args(self) -> None: + """Visit function with no arguments.""" + visitor = CountingVisitor() + func = FunctionReference( + id=Identifier(name="NUMBER"), arguments=CallArguments(positional=(), named=()) + ) + + visitor.visit(func) + + assert visitor.counts["FunctionReference"] == 1 + assert visitor.counts["Identifier"] == 1 + assert visitor.counts["CallArguments"] == 1 + + def test_visit_function_reference_with_args(self) -> None: + """Visit function with positional arguments.""" + visitor = CountingVisitor() + func = FunctionReference( + id=Identifier(name="NUMBER"), + arguments=CallArguments( + positional=(VariableReference(id=Identifier(name="value")),), named=() + ), + ) + + visitor.visit(func) + + assert visitor.counts["FunctionReference"] == 1 + assert visitor.counts["CallArguments"] == 1 + assert visitor.counts["VariableReference"] == 1 + + +class TestVisitorSelectExpression: + """Test visiting SelectExpression nodes.""" + + def test_visit_select_expression(self) -> None: + """Visit select expression with variants.""" + visitor = CountingVisitor() + select = SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=( + Variant( + key=Identifier(name="one"), + value=Pattern(elements=(TextElement(value="one item"),)), + default=False, + ), + Variant( + 
key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="many items"),)), + default=True, + ), + ), + ) + + visitor.visit(select) + + assert visitor.counts["SelectExpression"] == 1 + assert visitor.counts["VariableReference"] == 1 + assert visitor.counts["Variant"] == 2 + assert visitor.counts["Pattern"] == 2 + + +class TestVisitorVariant: + """Test visiting Variant nodes.""" + + def test_visit_variant_with_identifier_key(self) -> None: + """Visit variant with identifier key.""" + visitor = CountingVisitor() + variant = Variant( + key=Identifier(name="one"), + value=Pattern(elements=(TextElement(value="one"),)), + default=False, + ) + + visitor.visit(variant) + + assert visitor.counts["Variant"] == 1 + assert visitor.counts["Identifier"] == 1 + assert visitor.counts["Pattern"] == 1 + + def test_visit_variant_with_number_key(self) -> None: + """Visit variant with number literal key.""" + visitor = CountingVisitor() + variant = Variant( + key=NumberLiteral(value=0, raw="0"), + value=Pattern(elements=(TextElement(value="none"),)), + default=False, + ) + + visitor.visit(variant) + + assert visitor.counts["Variant"] == 1 + assert visitor.counts["NumberLiteral"] == 1 diff --git a/tests/syntax_visitor_cases/helper_visitors.py b/tests/syntax_visitor_cases/helper_visitors.py new file mode 100644 index 00000000..129aa9b4 --- /dev/null +++ b/tests/syntax_visitor_cases/helper_visitors.py @@ -0,0 +1,45 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor.py.""" + +from tests.syntax_visitor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# HELPER VISITORS +# ============================================================================ + + +class CountingVisitor(ASTVisitor): + """Counts visits to each node type.""" + + def __init__(self) -> None: + """Initialize counters.""" + super().__init__() + self.counts: dict[str, int] = {} + + def visit(self, 
node: Any) -> Any: + """Track each visit.""" + node_type = type(node).__name__ + self.counts[node_type] = self.counts.get(node_type, 0) + 1 + return super().visit(node) + + +class CollectingVisitor(ASTVisitor): + """Collects all identifiers visited.""" + + def __init__(self) -> None: + """Initialize collection.""" + super().__init__() + self.identifiers: list[str] = [] + + def visit_Identifier(self, node: Identifier) -> Any: + """Collect identifier names.""" + self.identifiers.append(node.name) + return self.generic_visit(node) + + +class TransformingVisitor(ASTVisitor): + """Transforms text to uppercase.""" + + def visit_TextElement(self, node: TextElement) -> TextElement: + """Transform text to uppercase.""" + return TextElement(value=node.value.upper()) diff --git a/tests/syntax_visitor_cases/pattern_and_element_nodes.py b/tests/syntax_visitor_cases/pattern_and_element_nodes.py new file mode 100644 index 00000000..72188f96 --- /dev/null +++ b/tests/syntax_visitor_cases/pattern_and_element_nodes.py @@ -0,0 +1,68 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor.py.""" + +from tests.syntax_visitor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# PATTERN AND ELEMENT NODES +# ============================================================================ + + +class TestVisitorPattern: + """Test visiting Pattern nodes.""" + + def test_visit_pattern_with_text(self) -> None: + """Visit pattern with text elements.""" + visitor = CountingVisitor() + pattern = Pattern(elements=(TextElement(value="Hello"),)) + + visitor.visit(pattern) + + assert visitor.counts["Pattern"] == 1 + assert visitor.counts["TextElement"] == 1 + + def test_visit_pattern_with_mixed_elements(self) -> None: + """Visit pattern with text and placeables.""" + visitor = CountingVisitor() + pattern = Pattern( + elements=( + TextElement(value="Hello, "), + 
Placeable(expression=VariableReference(id=Identifier(name="name"))), + TextElement(value="!"), + ) + ) + + visitor.visit(pattern) + + assert visitor.counts["Pattern"] == 1 + assert visitor.counts["TextElement"] == 2 + assert visitor.counts["Placeable"] == 1 + assert visitor.counts["VariableReference"] == 1 + + +class TestVisitorTextElement: + """Test visiting TextElement nodes.""" + + def test_visit_text_element(self) -> None: + """Visit text element.""" + visitor = CountingVisitor() + text = TextElement(value="Hello, World!") + + visitor.visit(text) + + assert visitor.counts["TextElement"] == 1 + + +class TestVisitorPlaceable: + """Test visiting Placeable nodes.""" + + def test_visit_placeable_with_variable(self) -> None: + """Visit placeable containing variable.""" + visitor = CountingVisitor() + placeable = Placeable(expression=VariableReference(id=Identifier(name="var"))) + + visitor.visit(placeable) + + assert visitor.counts["Placeable"] == 1 + assert visitor.counts["VariableReference"] == 1 + assert visitor.counts["Identifier"] == 1 diff --git a/tests/syntax_visitor_cases/resource_and_entry_nodes.py b/tests/syntax_visitor_cases/resource_and_entry_nodes.py new file mode 100644 index 00000000..6f87ae44 --- /dev/null +++ b/tests/syntax_visitor_cases/resource_and_entry_nodes.py @@ -0,0 +1,216 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor.py.""" + +from tests.syntax_visitor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# RESOURCE AND ENTRY NODES +# ============================================================================ + + +class TestVisitorResource: + """Test visiting Resource nodes.""" + + def test_visit_empty_resource(self) -> None: + """Visit empty resource.""" + visitor = CountingVisitor() + resource = Resource(entries=()) + + visitor.visit(resource) + + assert visitor.counts["Resource"] == 1 + + def 
test_visit_resource_with_messages(self) -> None: + """Visit resource with multiple messages.""" + visitor = CountingVisitor() + resource = Resource( + entries=( + Message( + id=Identifier(name="hello"), + value=Pattern(elements=(TextElement(value="Hello"),)), + attributes=(), + ), + Message( + id=Identifier(name="goodbye"), + value=Pattern(elements=(TextElement(value="Goodbye"),)), + attributes=(), + ), + ) + ) + + visitor.visit(resource) + + assert visitor.counts["Resource"] == 1 + assert visitor.counts["Message"] == 2 + assert visitor.counts["Identifier"] == 2 + assert visitor.counts["Pattern"] == 2 + assert visitor.counts["TextElement"] == 2 + + +class TestVisitorMessage: + """Test visiting Message nodes.""" + + def test_visit_simple_message(self) -> None: + """Visit message with text only.""" + visitor = CountingVisitor() + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="Test"),)), + attributes=(), + ) + + visitor.visit(msg) + + assert visitor.counts["Message"] == 1 + assert visitor.counts["Identifier"] == 1 + assert visitor.counts["Pattern"] == 1 + assert visitor.counts["TextElement"] == 1 + + def test_visit_message_with_attributes(self) -> None: + """Visit message with attributes.""" + visitor = CountingVisitor() + msg = Message( + id=Identifier(name="button"), + value=Pattern(elements=(TextElement(value="Save"),)), + attributes=( + Attribute( + id=Identifier(name="tooltip"), + value=Pattern(elements=(TextElement(value="Click to save"),)), + ), + ), + ) + + visitor.visit(msg) + + assert visitor.counts["Message"] == 1 + assert visitor.counts["Attribute"] == 1 + assert visitor.counts["Identifier"] == 2 # message + attribute + + def test_visit_message_with_comment(self) -> None: + """Visit message with comment.""" + visitor = CountingVisitor() + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="Test"),)), + attributes=(), + comment=Comment(content="This is a comment", 
type=CommentType.COMMENT), + ) + + visitor.visit(msg) + + assert visitor.counts["Message"] == 1 + assert visitor.counts["Comment"] == 1 + + def test_visit_message_without_value(self) -> None: + """Visit message without value (only attributes).""" + visitor = CountingVisitor() + msg = Message( + id=Identifier(name="test"), + value=None, + attributes=( + Attribute( + id=Identifier(name="attr"), + value=Pattern(elements=(TextElement(value="Value"),)), + ), + ), + ) + + visitor.visit(msg) + + assert visitor.counts["Message"] == 1 + assert visitor.counts["Attribute"] == 1 + # No Pattern count for message value (it's None) + assert visitor.counts["Pattern"] == 1 # From attribute + + +class TestVisitorTerm: + """Test visiting Term nodes.""" + + def test_visit_simple_term(self) -> None: + """Visit term with text only.""" + visitor = CountingVisitor() + term = Term( + id=Identifier(name="brand"), + value=Pattern(elements=(TextElement(value="Firefox"),)), + attributes=(), + ) + + visitor.visit(term) + + assert visitor.counts["Term"] == 1 + assert visitor.counts["Identifier"] == 1 + assert visitor.counts["Pattern"] == 1 + + def test_visit_term_with_attributes(self) -> None: + """Visit term with attributes.""" + visitor = CountingVisitor() + term = Term( + id=Identifier(name="brand"), + value=Pattern(elements=(TextElement(value="Firefox"),)), + attributes=( + Attribute( + id=Identifier(name="version"), + value=Pattern(elements=(TextElement(value="120"),)), + ), + ), + ) + + visitor.visit(term) + + assert visitor.counts["Term"] == 1 + assert visitor.counts["Attribute"] == 1 + + def test_visit_term_with_comment(self) -> None: + """Visit term with comment.""" + visitor = CountingVisitor() + term = Term( + id=Identifier(name="brand"), + value=Pattern(elements=(TextElement(value="Firefox"),)), + attributes=(), + comment=Comment(content="Brand name", type=CommentType.COMMENT), + ) + + visitor.visit(term) + + assert visitor.counts["Term"] == 1 + assert visitor.counts["Comment"] == 1 
+ + +class TestVisitorAttribute: + """Test visiting Attribute nodes.""" + + def test_visit_attribute(self) -> None: + """Visit attribute node.""" + visitor = CountingVisitor() + attr = Attribute( + id=Identifier(name="tooltip"), + value=Pattern(elements=(TextElement(value="Help text"),)), + ) + + visitor.visit(attr) + + assert visitor.counts["Attribute"] == 1 + assert visitor.counts["Identifier"] == 1 + assert visitor.counts["Pattern"] == 1 + + +class TestVisitorCommentJunk: + """Test visiting Comment and Junk nodes.""" + + def test_visit_comment(self) -> None: + """Visit comment node.""" + visitor = CountingVisitor() + comment = Comment(content="This is a comment", type=CommentType.COMMENT) + + visitor.visit(comment) + + assert visitor.counts["Comment"] == 1 + + def test_visit_junk(self) -> None: + """Visit junk node.""" + visitor = CountingVisitor() + junk = Junk(content="invalid { syntax") + + visitor.visit(junk) + + assert visitor.counts["Junk"] == 1 diff --git a/tests/syntax_visitor_cases/visitor_branch_coverage.py b/tests/syntax_visitor_cases/visitor_branch_coverage.py new file mode 100644 index 00000000..74151d1d --- /dev/null +++ b/tests/syntax_visitor_cases/visitor_branch_coverage.py @@ -0,0 +1,156 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor.py.""" + +from tests.syntax_visitor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# VISITOR BRANCH COVERAGE +# ============================================================================ + + +class TestVisitorBranchCoverage: + """Test visitor branch coverage for tuple fields and primitive fields.""" + + def test_visit_node_with_empty_tuple_field(self) -> None: + """Visitor handles message with empty attributes tuple.""" + message = Message( + id=Identifier("empty"), + value=Pattern(elements=(TextElement("Value"),)), + attributes=(), + ) + + class CountingVisitorLocal(ASTVisitor): + """Visitor 
that counts all nodes visited.""" + + def __init__(self) -> None: + """Initialize counter.""" + super().__init__() + self.visit_count = 0 + + def visit(self, node: Any) -> Any: + """Count each visit.""" + self.visit_count += 1 + return super().visit(node) + + visitor = CountingVisitorLocal() + visitor.visit(message) + + assert visitor.visit_count > 0 + + def test_visit_node_with_primitive_fields(self) -> None: + """Visitor dispatches to visit_Identifier for Identifier nodes.""" + ident = Identifier("test") + + class FieldInspector(ASTVisitor): + """Visitor that tracks Identifier visits.""" + + def __init__(self) -> None: + """Initialize tracker.""" + super().__init__() + self.visited_identifier = False + + def visit_Identifier(self, node: Identifier) -> Any: + """Record that Identifier was visited.""" + self.visited_identifier = True + return self.generic_visit(node) + + visitor = FieldInspector() + visitor.visit(ident) + + assert visitor.visited_identifier + + def test_visit_node_with_none_field(self) -> None: + """Visitor handles message with comment=None field gracefully.""" + message = Message( + id=Identifier("noComment"), + value=Pattern(elements=(TextElement("Val"),)), + attributes=(), + comment=None, + ) + + visitor = ASTVisitor() + result = visitor.visit(message) + + assert result is not None + + +class TestVisitorBranchCoverageExtended: + """Extended visitor branch coverage tests.""" + + def test_visit_resource_with_mixed_entries(self) -> None: + """Visitor traverses Resource with mix of messages, terms, comments, and junk.""" + resource = Resource( + entries=( + Comment(content="File comment", type=CommentType.RESOURCE), + Message( + id=Identifier("msg"), + value=Pattern(elements=(TextElement("Value"),)), + attributes=(), + ), + Term( + id=Identifier("term"), + value=Pattern(elements=(TextElement("Term"),)), + attributes=(), + ), + Junk(content="invalid"), + ) + ) + + visitor = ASTVisitor() + result = visitor.visit(resource) + + assert result is not None 
+ + def test_visit_with_dataclass_fields(self) -> None: + """Visitor traverses nodes with int and bool dataclass fields.""" + num_lit = NumberLiteral(value=42, raw="42") + + variant = Variant( + key=num_lit, + value=Pattern(elements=(TextElement("Forty-two"),)), + default=True, + ) + + select = SelectExpression( + selector=VariableReference(id=Identifier("num")), + variants=(variant,), + ) + + message = Message( + id=Identifier("select"), + value=Pattern(elements=(Placeable(expression=select),)), + attributes=(), + ) + + visitor = ASTVisitor() + result = visitor.visit(message) + + assert result is not None + + +class TestTransformerListExpansion: + """ASTTransformer that returns a list from a visit method expands elements.""" + + def test_transform_list_with_multiple_results(self) -> None: + """Transformer returning a list from visit_TextElement expands pattern elements.""" + + class ListExpandingTransformer(ASTTransformer): + """Transformer that returns a list instead of a single node.""" + + def visit_TextElement(self, node: TextElement) -> Any: + """Return two nodes in place of one.""" + return [ + TextElement(value=node.value.upper()), + TextElement(value=" "), + ] + + pattern = Pattern(elements=( + TextElement(value="hello"), + TextElement(value="world"), + )) + + transformer = ListExpandingTransformer() + result = transformer.visit(pattern) + + assert isinstance(result, Pattern) + assert len(result.elements) > 2 diff --git a/tests/syntax_visitor_cases/visitor_customization.py b/tests/syntax_visitor_cases/visitor_customization.py new file mode 100644 index 00000000..e984785a --- /dev/null +++ b/tests/syntax_visitor_cases/visitor_customization.py @@ -0,0 +1,53 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor.py.""" + +from tests.syntax_visitor_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# VISITOR CUSTOMIZATION +# 
============================================================================ + + +class TestVisitorCustomization: + """Test custom visitor implementations.""" + + def test_collecting_visitor(self) -> None: + """Custom visitor can collect specific data.""" + visitor = CollectingVisitor() + resource = Resource( + entries=( + Message( + id=Identifier(name="hello"), + value=Pattern( + elements=( + TextElement(value="Hello, "), + Placeable( + expression=VariableReference(id=Identifier(name="name")) + ), + ) + ), + attributes=(), + ), + Message( + id=Identifier(name="goodbye"), + value=Pattern(elements=(TextElement(value="Goodbye"),)), + attributes=(), + ), + ) + ) + + visitor.visit(resource) + + assert "hello" in visitor.identifiers + assert "goodbye" in visitor.identifiers + assert "name" in visitor.identifiers + + def test_transforming_visitor(self) -> None: + """Custom visitor can transform nodes.""" + visitor = TransformingVisitor() + text = TextElement(value="hello") + + result = visitor.visit(text) + + assert isinstance(result, TextElement) + assert result.value == "HELLO" diff --git a/tests/syntax_visitor_transformer_cases/__init__.py b/tests/syntax_visitor_transformer_cases/__init__.py new file mode 100644 index 00000000..c340cf1e --- /dev/null +++ b/tests/syntax_visitor_transformer_cases/__init__.py @@ -0,0 +1,102 @@ +"""Tests for syntax.visitor: ASTTransformer transformation, validation, and error cases.""" + +from __future__ import annotations + +import pytest +from hypothesis import event, given, settings +from hypothesis import strategies as st + +from ftllexengine.syntax.ast import ( + Attribute, + CallArguments, + FunctionReference, + Identifier, + Message, + MessageReference, + NamedArgument, + NumberLiteral, + Pattern, + Placeable, + Resource, + SelectExpression, + StringLiteral, + Term, + TermReference, + TextElement, + VariableReference, + Variant, +) +from ftllexengine.syntax.visitor import ASTTransformer, ASTVisitor + + +class 
UppercaseIdentifierTransformer(ASTTransformer): + """Test transformer that uppercases all identifiers.""" + + def visit_Identifier(self, node: Identifier) -> Identifier: + """Uppercase identifier names.""" + return Identifier(name=node.name.upper()) + + +class NoneReturningTransformer(ASTTransformer): + """Transformer that incorrectly returns None for required scalar fields.""" + + def __init__(self, target_node_type: str) -> None: + super().__init__() + self.target_node_type = target_node_type + + def visit_Identifier(self, node: Identifier) -> Identifier | None: + """Return None for Identifier when requested.""" + if self.target_node_type == "Identifier": + return None + return node + + +class ListReturningTransformer(ASTTransformer): + """Transformer that incorrectly returns lists for scalar fields.""" + + def __init__(self, target_node_type: str) -> None: + super().__init__() + self.target_node_type = target_node_type + + def visit_Identifier(self, node: Identifier) -> Identifier | list[Identifier]: + """Return a list of identifiers when requested.""" + if self.target_node_type == "Identifier": + return [node, Identifier(name="extra")] + return node + + def visit_Pattern(self, node: Pattern) -> Pattern | list[Pattern]: + """Return a list of patterns when requested.""" + if self.target_node_type == "Pattern": + return [node, Pattern(elements=())] + return self.generic_visit(node) # type: ignore[return-value] + +__all__ = [ + "ASTTransformer", + "ASTVisitor", + "Attribute", + "CallArguments", + "FunctionReference", + "Identifier", + "ListReturningTransformer", + "Message", + "MessageReference", + "NamedArgument", + "NoneReturningTransformer", + "NumberLiteral", + "Pattern", + "Placeable", + "Resource", + "SelectExpression", + "StringLiteral", + "Term", + "TermReference", + "TextElement", + "UppercaseIdentifierTransformer", + "VariableReference", + "Variant", + "event", + "given", + "pytest", + "settings", + "st", +] diff --git 
a/tests/syntax_visitor_transformer_cases/additional_coverage_tests.py b/tests/syntax_visitor_transformer_cases/additional_coverage_tests.py new file mode 100644 index 00000000..d641b36c --- /dev/null +++ b/tests/syntax_visitor_transformer_cases/additional_coverage_tests.py @@ -0,0 +1,44 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor_transformer.py.""" + +from tests.syntax_visitor_transformer_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# ADDITIONAL COVERAGE TESTS +# ============================================================================ + + +class TestAdditionalCoverage: + """Additional tests to ensure complete coverage.""" + + def test_validate_scalar_result_all_field_types(self) -> None: + """Test _validate_scalar_result for various required scalar fields.""" + + class AlwaysNoneTransformer(ASTTransformer): + def visit_Identifier(self, _node: Identifier) -> None: + """Always return None.""" + return + + # Test various nodes with required scalar Identifier fields + test_cases: list[tuple[str, VariableReference | Attribute]] = [ + ( + "VariableReference.id", + VariableReference(id=Identifier(name="test")), + ), + ( + "Attribute.id", + Attribute( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="val"),)), + ), + ), + ] + + transformer = AlwaysNoneTransformer() + + for _field_name, node in test_cases: + with pytest.raises(TypeError) as exc_info: + transformer.visit(node) + + # Should raise error mentioning the field cannot be None + assert "Cannot assign None to required scalar field" in str(exc_info.value) diff --git a/tests/syntax_visitor_transformer_cases/core.py b/tests/syntax_visitor_transformer_cases/core.py new file mode 100644 index 00000000..2bed48ca --- /dev/null +++ b/tests/syntax_visitor_transformer_cases/core.py @@ -0,0 +1,386 @@ +# mypy: ignore-errors +"""Split test cases from 
tests/test_syntax_visitor_transformer.py.""" + +from tests.syntax_visitor_transformer_cases import * # noqa: F403 - shared split test support + + +class UppercaseIdentifierTransformer(ASTTransformer): + """Test transformer that uppercases all identifiers.""" + + def visit_Identifier(self, node: Identifier) -> Identifier: + """Uppercase identifier names.""" + return Identifier(name=node.name.upper()) + + +class TestTermTransformation: + """Test Term node transformation (line 303).""" + + def test_transform_term_with_value(self) -> None: + """Transform a Term with value and attributes.""" + term = Term( + id=Identifier(name="brand"), + value=Pattern(elements=(TextElement(value="Acme Corp"),)), + attributes=( + Attribute( + id=Identifier(name="legal"), + value=Pattern(elements=(TextElement(value="Acme Corporation"),)), + ), + ), + comment=None, + ) + + transformer = UppercaseIdentifierTransformer() + result = transformer.visit(term) + + # Should transform all identifiers to uppercase + assert isinstance(result, Term) + assert result.id.name == "BRAND" + assert result.attributes[0].id.name == "LEGAL" + + +class TestSelectExpressionTransformation: + """Test SelectExpression transformation (line 315).""" + + def test_transform_select_expression(self) -> None: + """Transform SelectExpression with variants.""" + select = SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=( + Variant( + key=Identifier(name="one"), + value=Pattern(elements=(TextElement(value="one item"),)), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="many items"),)), + default=True, + ), + ), + ) + + transformer = UppercaseIdentifierTransformer() + result = transformer.visit(select) + + # Should transform all identifiers + assert isinstance(result, SelectExpression) + assert result.selector.id.name == "COUNT" # type: ignore[union-attr] + assert result.variants[0].key.name == "ONE" # type: ignore[union-attr] 
+ assert result.variants[1].key.name == "OTHER" # type: ignore[union-attr] + + +class TestVariantTransformation: + """Test Variant transformation (line 321).""" + + def test_transform_variant(self) -> None: + """Transform Variant with key and value.""" + variant = Variant( + key=Identifier(name="zero"), + value=Pattern( + elements=( + Placeable( + expression=VariableReference(id=Identifier(name="count")) + ), + ) + ), + ) + + transformer = UppercaseIdentifierTransformer() + result = transformer.visit(variant) + + # Should transform identifiers in key and value + assert isinstance(result, Variant) + assert result.key.name == "ZERO" # type: ignore[union-attr] + assert result.value.elements[0].expression.id.name == "COUNT" # type: ignore[union-attr] + + +class TestFunctionReferenceTransformation: + """Test FunctionReference transformation (line 324).""" + + def test_transform_function_reference(self) -> None: + """Transform FunctionReference with arguments.""" + func_ref = FunctionReference( + id=Identifier(name="number"), + arguments=CallArguments( + positional=(VariableReference(id=Identifier(name="amount")),), + named=( + NamedArgument( + name=Identifier(name="minimumFractionDigits"), + value=NumberLiteral(value=2, raw="2"), + ), + ), + ), + ) + + transformer = UppercaseIdentifierTransformer() + result = transformer.visit(func_ref) + + # Should transform all identifiers + assert isinstance(result, FunctionReference) + assert result.id.name == "NUMBER" + assert result.arguments.positional[0].id.name == "AMOUNT" # type: ignore[union-attr] + assert result.arguments.named[0].name.name == "MINIMUMFRACTIONDIGITS" + + +class TestMessageReferenceTransformation: + """Test MessageReference transformation (line 330).""" + + def test_transform_message_reference_without_attribute(self) -> None: + """Transform MessageReference without attribute.""" + msg_ref = MessageReference( + id=Identifier(name="welcome"), attribute=None + ) + + transformer = UppercaseIdentifierTransformer() 
+ result = transformer.visit(msg_ref) + + assert isinstance(result, MessageReference) + assert result.id.name == "WELCOME" + assert result.attribute is None + + def test_transform_message_reference_with_attribute(self) -> None: + """Transform MessageReference with attribute.""" + msg_ref = MessageReference( + id=Identifier(name="welcome"), + attribute=Identifier(name="tooltip"), + ) + + transformer = UppercaseIdentifierTransformer() + result = transformer.visit(msg_ref) + + assert isinstance(result, MessageReference) + assert result.id.name == "WELCOME" + assert result.attribute.name == "TOOLTIP" # type: ignore[union-attr] + + +class TestTermReferenceTransformation: + """Test TermReference transformation (line 336).""" + + def test_transform_term_reference_simple(self) -> None: + """Transform TermReference without attribute or arguments.""" + term_ref = TermReference( + id=Identifier(name="brand"), + attribute=None, + arguments=None, + ) + + transformer = UppercaseIdentifierTransformer() + result = transformer.visit(term_ref) + + assert isinstance(result, TermReference) + assert result.id.name == "BRAND" + assert result.attribute is None + assert result.arguments is None + + def test_transform_term_reference_with_attribute_and_arguments(self) -> None: + """Transform TermReference with attribute and arguments.""" + term_ref = TermReference( + id=Identifier(name="brand"), + attribute=Identifier(name="legal"), + arguments=CallArguments( + positional=(), + named=( + NamedArgument( + name=Identifier(name="case"), + value=StringLiteral(value="upper"), + ), + ), + ), + ) + + transformer = UppercaseIdentifierTransformer() + result = transformer.visit(term_ref) + + assert isinstance(result, TermReference) + assert result.id.name == "BRAND" + assert result.attribute.name == "LEGAL" # type: ignore[union-attr] + assert result.arguments.named[0].name.name == "CASE" # type: ignore[union-attr] + + +class TestVariableReferenceTransformation: + """Test VariableReference 
transformation (line 343).""" + + def test_transform_variable_reference(self) -> None: + """Transform VariableReference.""" + var_ref = VariableReference(id=Identifier(name="userName")) + + transformer = UppercaseIdentifierTransformer() + result = transformer.visit(var_ref) + + assert isinstance(result, VariableReference) + assert result.id.name == "USERNAME" + + +class TestCallArgumentsTransformation: + """Test CallArguments transformation (line 345).""" + + def test_transform_call_arguments(self) -> None: + """Transform CallArguments: positional args are visited, named arg names are visited. + + Named arg values are FTLLiteral (StringLiteral | NumberLiteral) leaf nodes; + the transformer returns them unchanged (generic_visit returns leaf nodes as-is). + The identifier in named arg NAME is visited and uppercased. + """ + call_args = CallArguments( + positional=( + VariableReference(id=Identifier(name="value")), + NumberLiteral(value=42, raw="42"), + ), + named=( + NamedArgument( + name=Identifier(name="option"), + value=StringLiteral(value="opt_value"), + ), + ), + ) + + transformer = UppercaseIdentifierTransformer() + result = transformer.visit(call_args) + + assert isinstance(result, CallArguments) + assert result.positional[0].id.name == "VALUE" # type: ignore[union-attr] + assert result.positional[1].value == 42 # type: ignore[union-attr] + assert result.named[0].name.name == "OPTION" + # Literal value is a leaf node; returned unchanged by generic_visit + assert result.named[0].value == StringLiteral(value="opt_value") + + +class TestNamedArgumentTransformation: + """Test NamedArgument transformation (line 351).""" + + def test_transform_named_argument(self) -> None: + """Transform NamedArgument: name identifier is visited; literal value is unchanged. + + Named arg values are FTLLiteral (StringLiteral | NumberLiteral); generic_visit + returns leaf nodes as-is. The identifier in the name field is visited. 
+ """ + named_arg = NamedArgument( + name=Identifier(name="minimumFractionDigits"), + value=NumberLiteral(value=2, raw="2"), + ) + + transformer = UppercaseIdentifierTransformer() + result = transformer.visit(named_arg) + + assert isinstance(result, NamedArgument) + assert result.name.name == "MINIMUMFRACTIONDIGITS" + # Literal value is a leaf node; returned unchanged + assert result.value == NumberLiteral(value=2, raw="2") + + +class TestAttributeTransformation: + """Test Attribute transformation (line 353).""" + + def test_transform_attribute(self) -> None: + """Transform Attribute with id and value.""" + attr = Attribute( + id=Identifier(name="tooltip"), + value=Pattern( + elements=( + Placeable( + expression=VariableReference(id=Identifier(name="text")) + ), + ) + ), + ) + + transformer = UppercaseIdentifierTransformer() + result = transformer.visit(attr) + + assert isinstance(result, Attribute) + assert result.id.name == "TOOLTIP" + assert result.value.elements[0].expression.id.name == "TEXT" # type: ignore[union-attr] + + +class TestTransformListEdgeCases: + """Test _transform_list method edge cases.""" + + def test_transform_empty_tuple(self) -> None: + """Transform empty tuple.""" + pattern = Pattern(elements=()) + + transformer = UppercaseIdentifierTransformer() + result = transformer.visit(pattern) + + assert isinstance(result, Pattern) + assert result.elements == () + + def test_transform_large_list(self) -> None: + """Transform large list of elements.""" + elements = tuple( + Placeable(expression=VariableReference(id=Identifier(name=f"var{i}"))) + for i in range(100) + ) + pattern = Pattern(elements=elements) + + transformer = UppercaseIdentifierTransformer() + result = transformer.visit(pattern) + + assert isinstance(result, Pattern) + assert len(result.elements) == 100 + # All identifiers should be uppercased + for i, elem in enumerate(result.elements): + assert elem.expression.id.name == f"VAR{i}".upper() # type: ignore[union-attr] + + +class 
TestTransformerPropertyBased: + """Property-based tests for Transformer.""" + + @given( + st.text( + min_size=1, + max_size=20, + alphabet=st.characters(min_codepoint=97, max_codepoint=122), + ) + ) + @settings(max_examples=50) + def test_identifier_transformation_is_idempotent(self, name: str) -> None: + """Transforming twice yields same result (idempotency).""" + identifier = Identifier(name=name) + transformer = UppercaseIdentifierTransformer() + + result1 = transformer.visit(identifier) + assert isinstance(result1, Identifier), f"Expected Identifier, got {type(result1)}" + result2 = transformer.visit(result1) + assert isinstance(result2, Identifier), f"Expected Identifier, got {type(result2)}" + + event("outcome=idempotent") + + # Uppercasing twice should give same result + assert result1.name == result2.name + assert result1.name == name.upper() + + @given( + st.lists( + st.text( + min_size=1, + max_size=10, + alphabet=st.characters(min_codepoint=97, max_codepoint=122), + ), + min_size=0, + max_size=20, + ) + ) + @settings(max_examples=30) + def test_transform_pattern_with_variable_count(self, names: list[str]) -> None: + """Transform pattern with arbitrary number of variables.""" + elements = tuple( + Placeable(expression=VariableReference(id=Identifier(name=name))) + for name in names + ) + pattern = Pattern(elements=elements) + + transformer = UppercaseIdentifierTransformer() + result = transformer.visit(pattern) + assert isinstance(result, Pattern), f"Expected Pattern, got {type(result)}" + + event(f"element_count={len(names)}") + + assert len(result.elements) == len(names) + for i, name in enumerate(names): + elem = result.elements[i] + assert isinstance(elem, Placeable), f"Expected Placeable, got {type(elem)}" + assert isinstance(elem.expression, VariableReference), ( + f"Expected VariableReference, got {type(elem.expression)}" + ) + assert elem.expression.id.name == name.upper() diff --git 
a/tests/syntax_visitor_transformer_cases/error_cases_and_defensive_branches_from_test_visitor_error_cases_py.py b/tests/syntax_visitor_transformer_cases/error_cases_and_defensive_branches_from_test_visitor_error_cases_py.py new file mode 100644 index 00000000..b5f501a5 --- /dev/null +++ b/tests/syntax_visitor_transformer_cases/error_cases_and_defensive_branches_from_test_visitor_error_cases_py.py @@ -0,0 +1,8 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor_transformer.py.""" + +from tests.syntax_visitor_transformer_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# ERROR CASES AND DEFENSIVE BRANCHES (from test_visitor_error_cases.py) +# ============================================================================ diff --git a/tests/syntax_visitor_transformer_cases/scalar_field_validation_from_test_transformer_validation_py.py b/tests/syntax_visitor_transformer_cases/scalar_field_validation_from_test_transformer_validation_py.py new file mode 100644 index 00000000..9eb0db38 --- /dev/null +++ b/tests/syntax_visitor_transformer_cases/scalar_field_validation_from_test_transformer_validation_py.py @@ -0,0 +1,256 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor_transformer.py.""" + +from tests.syntax_visitor_transformer_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# SCALAR FIELD VALIDATION (from test_transformer_validation.py) +# ============================================================================ + + +class TestASTTransformerValidation: + """Tests for ASTTransformer scalar field validation.""" + + def test_scalar_field_accepts_single_node(self) -> None: + """Scalar field accepts single ASTNode return value.""" + class RenameIdentifierTransformer(ASTTransformer): + def visit_Identifier(self, node: Identifier) -> Identifier: + 
return Identifier(name="renamed") + + message = Message( + id=Identifier(name="hello"), + value=Pattern(elements=(TextElement(value="World"),)), + attributes=(), + ) + + transformer = RenameIdentifierTransformer() + transformed = transformer.transform(message) + + # Transformation should succeed + assert isinstance(transformed, Message) + assert transformed.id.name == "renamed" + + def test_scalar_field_rejects_none(self) -> None: + """Scalar field assignment rejects None return value.""" + class RemoveIdentifierTransformer(ASTTransformer): + def visit_Identifier(self, node: Identifier) -> None: + return None # Invalid: scalar field requires node + + message = Message( + id=Identifier(name="hello"), + value=Pattern(elements=(TextElement(value="World"),)), + attributes=(), + ) + + transformer = RemoveIdentifierTransformer() + + with pytest.raises(TypeError) as exc_info: + transformer.transform(message) + + error_msg = str(exc_info.value) + assert "Cannot assign None to required scalar field" in error_msg + assert "Message.id" in error_msg + assert "Required scalar fields must have a single ASTNode" in error_msg + + def test_scalar_field_rejects_list(self) -> None: + """Scalar field assignment rejects list[ASTNode] return value.""" + class ExpandIdentifierTransformer(ASTTransformer): + def visit_Identifier(self, node: Identifier) -> list[Identifier]: + return [ # Invalid: scalar field requires single node + Identifier(name="id1"), + Identifier(name="id2"), + ] + + message = Message( + id=Identifier(name="hello"), + value=Pattern(elements=(TextElement(value="World"),)), + attributes=(), + ) + + transformer = ExpandIdentifierTransformer() + + with pytest.raises(TypeError) as exc_info: + transformer.transform(message) + + error_msg = str(exc_info.value) + assert "Cannot assign list to scalar field" in error_msg + assert "Message.id" in error_msg + assert "Got 2 nodes" in error_msg + + def test_collection_field_accepts_list(self) -> None: + """Collection field accepts 
list[ASTNode] return value via _transform_list.""" + class ExpandTextElementTransformer(ASTTransformer): + def visit_TextElement(self, node: TextElement) -> list[TextElement]: + # Valid: Pattern.elements is a collection field + return [ + TextElement(value="Hello"), + TextElement(value=" "), + TextElement(value="World"), + ] + + pattern = Pattern(elements=(TextElement(value="HelloWorld"),)) + + transformer = ExpandTextElementTransformer() + transformed = transformer.transform(pattern) + + # Transformation should succeed + assert isinstance(transformed, Pattern) + assert len(transformed.elements) == 3 + first_element = transformed.elements[0] + assert isinstance(first_element, TextElement) + assert first_element.value == "Hello" + + def test_optional_scalar_field_accepts_none_when_original_is_none(self) -> None: + """Optional scalar fields (e.g., Message.value) accept None when original has attributes.""" + from ftllexengine.syntax.ast import Attribute + + # Message without value but with attribute (valid per spec) + message = Message( + id=Identifier(name="empty"), + value=None, # Optional field + attributes=( + Attribute( + id=Identifier(name="attr"), + value=Pattern(elements=(TextElement(value="val"),)), + ), + ), + ) + + class NoOpTransformer(ASTTransformer): + pass + + transformer = NoOpTransformer() + transformed = transformer.transform(message) + + # Transformation should succeed + assert isinstance(transformed, Message) + assert transformed.value is None + + def test_optional_scalar_field_accepts_none_when_transformer_removes(self) -> None: + """Optional scalar fields accept None return value to remove existing value.""" + from ftllexengine.enums import CommentType + from ftllexengine.syntax.ast import Comment + + # Message with comment (optional field) + message = Message( + id=Identifier(name="hello"), + value=Pattern(elements=(TextElement(value="World"),)), + attributes=(), + comment=Comment(content="A comment", type=CommentType.COMMENT), + ) + + class 
RemoveCommentTransformer(ASTTransformer): + def visit_Comment(self, node: Comment) -> None: + return None # Valid: removes optional comment field + + transformer = RemoveCommentTransformer() + transformed = transformer.transform(message) + + # Transformation should succeed with comment removed + assert isinstance(transformed, Message) + assert transformed.comment is None + assert transformed.id.name == "hello" + + def test_placeable_expression_validation(self) -> None: + """Placeable.expression validates scalar field assignment.""" + class RemoveExpressionTransformer(ASTTransformer): + def visit_VariableReference(self, node: VariableReference) -> None: + return None # Invalid: Placeable.expression requires node + + placeable = Placeable(expression=VariableReference(id=Identifier(name="var"))) + + transformer = RemoveExpressionTransformer() + + with pytest.raises(TypeError) as exc_info: + transformer.transform(placeable) + + error_msg = str(exc_info.value) + assert "Cannot assign None to required scalar field" in error_msg + assert "Placeable.expression" in error_msg + + def test_error_message_shows_node_types_for_list(self) -> None: + """Error message for list assignment shows node types.""" + class MultipleIdentifiersTransformer(ASTTransformer): + def visit_Identifier(self, node: Identifier) -> list[Identifier]: + return [ + Identifier(name="a"), + Identifier(name="b"), + Identifier(name="c"), + ] + + message = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="Test"),)), + attributes=(), + ) + + transformer = MultipleIdentifiersTransformer() + + with pytest.raises(TypeError) as exc_info: + transformer.transform(message) + + error_msg = str(exc_info.value) + assert "Got 3 nodes" in error_msg + assert "['Identifier', 'Identifier', 'Identifier']" in error_msg + + def test_nested_transformation_validates_all_levels(self) -> None: + """Validation applies recursively at all nesting levels.""" + class 
RemoveNestedIdentifierTransformer(ASTTransformer): + def visit_Identifier(self, node: Identifier) -> Identifier | None: + if node.name == "var": + return None # Invalid for scalar field + return node + + # Nested structure: Message -> Pattern -> Placeable -> VariableReference -> Identifier + message = Message( + id=Identifier(name="msg"), + value=Pattern( + elements=( + Placeable(expression=VariableReference(id=Identifier(name="var"))), + ) + ), + attributes=(), + ) + + transformer = RemoveNestedIdentifierTransformer() + + with pytest.raises(TypeError) as exc_info: + transformer.transform(message) + + # Error should be raised when trying to assign None to VariableReference.id + error_msg = str(exc_info.value) + assert "Cannot assign None to required scalar field" in error_msg + assert "VariableReference.id" in error_msg + + def test_validation_with_generic_visit(self) -> None: + """Validation works with default generic_visit (no custom visit methods).""" + class BreakScalarFieldTransformer(ASTTransformer): + def visit_Identifier(self, node: Identifier) -> None: + return None + + # Use a complex node to test generic_visit path + from ftllexengine.syntax.ast import ( + NumberLiteral, + SelectExpression, + Variant, + ) + + select_expr = SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=( + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="one"),)), + default=True, + ), + ), + ) + + transformer = BreakScalarFieldTransformer() + + with pytest.raises(TypeError) as exc_info: + transformer.transform(select_expr) + + # Should fail on SelectExpression.selector -> VariableReference.id + error_msg = str(exc_info.value) + assert "Cannot assign None to required scalar field" in error_msg diff --git a/tests/syntax_visitor_transformer_cases/tests_for_generic_visit_branch_coverage.py b/tests/syntax_visitor_transformer_cases/tests_for_generic_visit_branch_coverage.py new file mode 100644 index 
00000000..11c2e06e --- /dev/null +++ b/tests/syntax_visitor_transformer_cases/tests_for_generic_visit_branch_coverage.py @@ -0,0 +1,118 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor_transformer.py.""" + +from tests.syntax_visitor_transformer_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# TESTS FOR GENERIC_VISIT BRANCH COVERAGE +# ============================================================================ + + +class TestGenericVisitBranchCoverage: + """Test branch coverage in generic_visit (lines 214, 217).""" + + def test_generic_visit_skips_none_values(self) -> None: + """Generic visit skips None field values (branch coverage for line 207).""" + # Message with value=None but with attribute (valid per spec), and comment=None + msg = Message( + id=Identifier(name="test"), + value=None, + attributes=( + Attribute( + id=Identifier(name="attr"), + value=Pattern(elements=(TextElement(value="val"),)), + ), + ), + comment=None, + ) + + visitor = ASTVisitor() + result = visitor.generic_visit(msg) + + # Should complete without error (None values are skipped) + assert result is msg + + def test_generic_visit_skips_string_fields(self) -> None: + """Generic visit skips string fields (branch coverage for line 207).""" + # TextElement has a string 'value' field + text = TextElement(value="Hello, World!") + + visitor = ASTVisitor() + result = visitor.generic_visit(text) + + # Should complete without error (string fields are skipped) + assert result is text + + def test_generic_visit_skips_int_fields(self) -> None: + """Generic visit skips int fields (branch coverage for line 207).""" + # Create a node with int field (custom test node) + # Since AST doesn't have many int fields directly, use a workaround + # Actually, Identifier just has 'name' (str), so let's use a different approach + + # The coverage here is about ensuring we skip non-ASTNode fields + # 
Let's verify by checking the behavior is correct + ident = Identifier(name="test") + + visitor = ASTVisitor() + result = visitor.generic_visit(ident) + + assert result is ident + + def test_generic_visit_tuple_with_non_astnode_items(self) -> None: + """Generic visit skips tuple items without __dataclass_fields__ (line 214 branch). + + This tests the negative branch of: + if hasattr(item, "__dataclass_fields__"): + """ + + class TupleFieldVisitor(ASTVisitor): + """Visitor that tracks tuple processing.""" + + def __init__(self) -> None: + """Initialize visitor.""" + super().__init__() + self.visited_types: list[str] = [] + + def visit(self, node): + """Track visited node types.""" + self.visited_types.append(type(node).__name__) + return super().visit(node) + + # Pattern has elements tuple, which normally contains ASTNodes + # We'll create a normal pattern and verify tuple processing + pattern = Pattern( + elements=( + TextElement(value="Hello"), + TextElement(value="World"), + ) + ) + + visitor = TupleFieldVisitor() + visitor.generic_visit(pattern) + + # Should have visited the TextElements in the tuple + assert "TextElement" in visitor.visited_types + + def test_generic_visit_non_tuple_non_astnode_field(self) -> None: + """Generic visit handles non-tuple, non-ASTNode single fields (line 217 branch). 
+ + This tests the negative branch of: + elif hasattr(value, "__dataclass_fields__"): + """ + # All our AST nodes have either ASTNode children or primitive fields + # The negative branch is when a field is a primitive (str, int, bool) + + # Let's create a scenario with a field that's not an ASTNode + # Actually, this is already covered by string/int tests above + + # The key is to ensure we don't crash on non-ASTNode single values + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="Test"),)), + attributes=(), + ) + + visitor = ASTVisitor() + result = visitor.generic_visit(msg) + + assert result is msg diff --git a/tests/syntax_visitor_transformer_cases/tests_for_transform_list_edge_cases.py b/tests/syntax_visitor_transformer_cases/tests_for_transform_list_edge_cases.py new file mode 100644 index 00000000..3db298c6 --- /dev/null +++ b/tests/syntax_visitor_transformer_cases/tests_for_transform_list_edge_cases.py @@ -0,0 +1,147 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor_transformer.py.""" + +from tests.syntax_visitor_transformer_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# TESTS FOR _TRANSFORM_LIST EDGE CASES +# ============================================================================ + + +class TestTransformListNodeManagement: + """Test edge cases in _transform_list (line 552 and match branches).""" + + def test_transform_list_with_none_removal(self) -> None: + """_transform_list handles None results (node removal).""" + + class RemoveFirstElementTransformer(ASTTransformer): + """Remove first element from pattern.""" + + def __init__(self) -> None: + """Initialize transformer.""" + super().__init__() + self.first_text_seen = False + + def visit_TextElement(self, node: TextElement) -> TextElement | None: + """Remove first text element.""" + if not self.first_text_seen: + self.first_text_seen = 
True + return None + return node + + pattern = Pattern( + elements=( + TextElement(value="First"), + TextElement(value="Second"), + TextElement(value="Third"), + ) + ) + + transformer = RemoveFirstElementTransformer() + result = transformer.visit(pattern) + + assert isinstance(result, Pattern) + assert len(result.elements) == 2 + assert result.elements[0].value == "Second" # type: ignore[union-attr] + assert result.elements[1].value == "Third" # type: ignore[union-attr] + + def test_transform_list_with_expansion(self) -> None: + """_transform_list handles list results (node expansion).""" + + class DuplicateTextElementTransformer(ASTTransformer): + """Duplicate text elements.""" + + def visit_TextElement(self, node: TextElement) -> list[TextElement]: + """Duplicate each text element.""" + return [node, TextElement(value=f"{node.value}_copy")] + + pattern = Pattern( + elements=( + TextElement(value="Hello"), + TextElement(value="World"), + ) + ) + + transformer = DuplicateTextElementTransformer() + result = transformer.visit(pattern) + + assert isinstance(result, Pattern) + assert len(result.elements) == 4 + assert result.elements[0].value == "Hello" # type: ignore[union-attr] + assert result.elements[1].value == "Hello_copy" # type: ignore[union-attr] + assert result.elements[2].value == "World" # type: ignore[union-attr] + assert result.elements[3].value == "World_copy" # type: ignore[union-attr] + + def test_transform_list_with_single_replacement(self) -> None: + """_transform_list handles single ASTNode results (replacement, line 552).""" + + class UppercaseTextTransformer(ASTTransformer): + """Uppercase text elements.""" + + def visit_TextElement(self, node: TextElement) -> TextElement: + """Uppercase text.""" + return TextElement(value=node.value.upper()) + + pattern = Pattern( + elements=( + TextElement(value="hello"), + TextElement(value="world"), + ) + ) + + transformer = UppercaseTextTransformer() + result = transformer.visit(pattern) + + assert 
isinstance(result, Pattern) + assert len(result.elements) == 2 + assert result.elements[0].value == "HELLO" # type: ignore[union-attr] + assert result.elements[1].value == "WORLD" # type: ignore[union-attr] + + def test_transform_list_mixed_operations(self) -> None: + """_transform_list handles mix of None, list, and single node returns.""" + + class MixedTransformer(ASTTransformer): + """Transform with mixed return types.""" + + def __init__(self) -> None: + """Initialize transformer.""" + super().__init__() + self.element_count = 0 + + def visit_TextElement( + self, node: TextElement + ) -> TextElement | None | list[TextElement]: + """Return different types based on position.""" + self.element_count += 1 + + match self.element_count: + case 1: + # Remove first element + return None + case 2: + # Expand second element + return [ + TextElement(value=f"{node.value}_a"), + TextElement(value=f"{node.value}_b"), + ] + case _: + # Keep remaining elements (single node) + return node + + pattern = Pattern( + elements=( + TextElement(value="first"), + TextElement(value="second"), + TextElement(value="third"), + ) + ) + + transformer = MixedTransformer() + result = transformer.visit(pattern) + + assert isinstance(result, Pattern) + # First removed, second expanded to 2, third kept = 3 elements + assert len(result.elements) == 3 + assert result.elements[0].value == "second_a" # type: ignore[union-attr] + assert result.elements[1].value == "second_b" # type: ignore[union-attr] + assert result.elements[2].value == "third" # type: ignore[union-attr] diff --git a/tests/syntax_visitor_transformer_cases/tests_for_validate_optional_scalar_result_error_cases.py b/tests/syntax_visitor_transformer_cases/tests_for_validate_optional_scalar_result_error_cases.py new file mode 100644 index 00000000..9931c195 --- /dev/null +++ b/tests/syntax_visitor_transformer_cases/tests_for_validate_optional_scalar_result_error_cases.py @@ -0,0 +1,51 @@ +# mypy: ignore-errors +"""Split test cases from 
tests/test_syntax_visitor_transformer.py.""" + +from tests.syntax_visitor_transformer_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# TESTS FOR _validate_optional_scalar_result ERROR CASES +# ============================================================================ + + +class TestValidateOptionalScalarResultErrors: + """Test error cases in _validate_optional_scalar_result (lines 360-366).""" + + def test_list_for_optional_message_value_raises_typeerror(self) -> None: + """Returning list for Message.value (optional) raises TypeError (lines 360-366).""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="Hello"),)), + attributes=(), + ) + + transformer = ListReturningTransformer("Pattern") + + with pytest.raises(TypeError) as exc_info: + transformer.visit(msg) + + error_msg = str(exc_info.value) + assert ( + "Cannot assign list to optional scalar field 'Message.value'" in error_msg + ) + assert "Scalar fields require a single ASTNode or None" in error_msg + assert "Got 2 nodes:" in error_msg + + def test_list_for_optional_message_reference_attribute_raises_typeerror( + self, + ) -> None: + """Returning list for MessageReference.attribute raises TypeError (lines 360-366).""" + msg_ref = MessageReference( + id=Identifier(name="button"), attribute=Identifier(name="tooltip") + ) + + transformer = ListReturningTransformer("Identifier") + + # The error occurs at the first Identifier field the transformer visits + with pytest.raises(TypeError) as exc_info: + transformer.visit(msg_ref) + + error_msg = str(exc_info.value) + # Could be MessageReference.id or MessageReference.attribute depending on traversal order + assert "Cannot assign list to" in error_msg + assert "scalar field" in error_msg diff --git a/tests/syntax_visitor_transformer_cases/tests_for_validate_scalar_result_error_cases.py
b/tests/syntax_visitor_transformer_cases/tests_for_validate_scalar_result_error_cases.py new file mode 100644 index 00000000..2b0d5b19 --- /dev/null +++ b/tests/syntax_visitor_transformer_cases/tests_for_validate_scalar_result_error_cases.py @@ -0,0 +1,117 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor_transformer.py.""" + +from tests.syntax_visitor_transformer_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# TESTS FOR _validate_scalar_result ERROR CASES +# ============================================================================ + + +class TestValidateScalarResultErrors: + """Test error cases in _validate_scalar_result (lines 318-331).""" + + def test_none_for_required_message_id_raises_typeerror(self) -> None: + """Returning None for Message.id raises TypeError (lines 318-323).""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="Hello"),)), + attributes=(), + ) + + transformer = NoneReturningTransformer("Identifier") + + with pytest.raises(TypeError) as exc_info: + transformer.visit(msg) + + assert "Cannot assign None to required scalar field 'Message.id'" in str( + exc_info.value + ) + assert "Required scalar fields must have a single ASTNode" in str( + exc_info.value + ) + + def test_none_for_required_term_value_raises_typeerror(self) -> None: + """Returning None for Term.value raises TypeError (lines 318-323).""" + + class NonePatternTransformer(ASTTransformer): + def visit_Pattern(self, _node: Pattern) -> None: + """Return None for Pattern (invalid for Term.value).""" + return + + term = Term( + id=Identifier(name="brand"), + value=Pattern(elements=(TextElement(value="Firefox"),)), + attributes=(), + ) + + transformer = NonePatternTransformer() + + with pytest.raises(TypeError) as exc_info: + transformer.visit(term) + + assert "Cannot assign None to required scalar field 'Term.value'" in str( 
+ exc_info.value + ) + + def test_list_for_scalar_message_id_raises_typeerror(self) -> None: + """Returning list for Message.id raises TypeError (lines 325-331).""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="Hello"),)), + attributes=(), + ) + + transformer = ListReturningTransformer("Identifier") + + with pytest.raises(TypeError) as exc_info: + transformer.visit(msg) + + error_msg = str(exc_info.value) + assert "Cannot assign list to scalar field 'Message.id'" in error_msg + assert "Scalar fields require a single ASTNode" in error_msg + assert "Got 2 nodes:" in error_msg + assert "['Identifier', 'Identifier']" in error_msg + + def test_list_for_scalar_term_value_raises_typeerror(self) -> None: + """Returning list for Term.value raises TypeError (lines 325-331).""" + term = Term( + id=Identifier(name="brand"), + value=Pattern(elements=(TextElement(value="Firefox"),)), + attributes=(), + ) + + transformer = ListReturningTransformer("Pattern") + + with pytest.raises(TypeError) as exc_info: + transformer.visit(term) + + error_msg = str(exc_info.value) + assert "Cannot assign list to scalar field 'Term.value'" in error_msg + assert "Got 2 nodes:" in error_msg + assert "['Pattern', 'Pattern']" in error_msg + + def test_list_for_scalar_placeable_expression_raises_typeerror(self) -> None: + """Returning list for Placeable.expression raises TypeError (lines 325-331).""" + + class ListVariableRefTransformer(ASTTransformer): + def visit_VariableReference( + self, node: VariableReference + ) -> list[VariableReference]: + """Return list of VariableReferences.""" + return [node, VariableReference(id=Identifier(name="extra"))] + + placeable = Placeable( + expression=VariableReference(id=Identifier(name="count")) + ) + + transformer = ListVariableRefTransformer() + + with pytest.raises(TypeError) as exc_info: + transformer.visit(placeable) + + error_msg = str(exc_info.value) + assert ( + "Cannot assign list to scalar field 
'Placeable.expression'" in error_msg + ) + assert "['VariableReference', 'VariableReference']" in error_msg diff --git a/tests/syntax_visitor_transformer_cases/transform_list_type_validation_from_test_transformer_type_validation_py.py b/tests/syntax_visitor_transformer_cases/transform_list_type_validation_from_test_transformer_type_validation_py.py new file mode 100644 index 00000000..49fc8dfe --- /dev/null +++ b/tests/syntax_visitor_transformer_cases/transform_list_type_validation_from_test_transformer_type_validation_py.py @@ -0,0 +1,192 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_syntax_visitor_transformer.py.""" + +from tests.syntax_visitor_transformer_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# TRANSFORM LIST TYPE VALIDATION (from test_transformer_type_validation.py) +# ============================================================================ + + +def _make_resource(*messages: Message) -> Resource: + """Create a Resource with the given messages.""" + return Resource(entries=messages) + + +def _make_simple_message(name: str, text: str) -> Message: + """Create a simple message with a text pattern.""" + return Message( + id=Identifier(name=name, span=None), + value=Pattern(elements=(TextElement(value=text),)), + attributes=(), + comment=None, + span=None, + ) + + +class TestTransformListTypeValidation: + """_transform_list rejects wrong-typed nodes.""" + + def test_message_in_pattern_elements_rejected(self) -> None: + """Message node in Pattern.elements raises TypeError. + + Pattern.elements expects TextElement | Placeable. Producing a Message + violates the field type constraint. 
+ """ + + class BadTransformer(ASTTransformer): + def visit_TextElement(self, node: TextElement) -> Message: + return _make_simple_message("wrong", "bad") + + resource = _make_resource(_make_simple_message("msg", "hello")) + transformer = BadTransformer() + + with pytest.raises(TypeError, match=r"Pattern\.elements.*TextElement \| Placeable"): + transformer.transform(resource) + + def test_text_element_in_resource_entries_rejected(self) -> None: + """TextElement in Resource.entries raises TypeError. + + Resource.entries expects Message | Term | Comment | Junk. + """ + + class BadTransformer(ASTTransformer): + def visit_Message(self, node: Message) -> TextElement: + return TextElement(value="not a message") + + resource = _make_resource(_make_simple_message("msg", "hello")) + transformer = BadTransformer() + + with pytest.raises(TypeError, match=r"Resource\.entries.*Message \| Term"): + transformer.transform(resource) + + def test_message_in_call_arguments_named_rejected(self) -> None: + """Message in CallArguments.named raises TypeError. + + CallArguments.named expects NamedArgument only. 
+ """ + + class BadTransformer(ASTTransformer): + def visit_NamedArgument(self, node: NamedArgument) -> Message: + return _make_simple_message("wrong", "bad") + + func_ref = FunctionReference( + id=Identifier(name="NUMBER", span=None), + arguments=CallArguments( + positional=(VariableReference(id=Identifier(name="x", span=None), span=None),), + named=( + NamedArgument( + name=Identifier(name="style", span=None), + value=StringLiteral(value="decimal", span=None), + span=None, + ), + ), + ), + span=None, + ) + msg = Message( + id=Identifier(name="msg", span=None), + value=Pattern( + elements=(Placeable(expression=func_ref),), + ), + attributes=(), + comment=None, + span=None, + ) + resource = _make_resource(msg) + transformer = BadTransformer() + + with pytest.raises(TypeError, match=r"CallArguments\.named.*NamedArgument"): + transformer.transform(resource) + + +class TestTransformListTypeValidationExpand: + """_transform_list validates types in expanded lists.""" + + def test_expanded_list_with_wrong_type_rejected(self) -> None: + """List expansion with wrong type raises TypeError. + + When visit_* returns a list, each element must match expected types. 
+ """ + + class ExpandBadTransformer(ASTTransformer): + def visit_TextElement( + self, node: TextElement + ) -> list[Message]: + return [_make_simple_message("wrong", "bad")] + + resource = _make_resource(_make_simple_message("msg", "hello")) + transformer = ExpandBadTransformer() + + with pytest.raises(TypeError, match=r"Pattern\.elements"): + transformer.transform(resource) + + +class TestTransformListTypeValidationValid: + """Valid transformations pass type validation.""" + + def test_identity_transform_succeeds(self) -> None: + """Identity transformer (no changes) passes validation.""" + resource = _make_resource( + _make_simple_message("msg1", "hello"), + _make_simple_message("msg2", "world"), + ) + transformer = ASTTransformer() + result = transformer.transform(resource) + assert isinstance(result, Resource) + + def test_correct_type_replacement_succeeds(self) -> None: + """Replacing TextElement with another TextElement passes validation.""" + + class UpperTransformer(ASTTransformer): + def visit_TextElement(self, node: TextElement) -> TextElement: + return TextElement(value=node.value.upper()) + + resource = _make_resource(_make_simple_message("msg", "hello")) + transformer = UpperTransformer() + result = transformer.transform(resource) + assert isinstance(result, Resource) + elements = result.entries[0].value.elements # type: ignore[union-attr] + assert elements[0].value == "HELLO" # type: ignore[union-attr] + + def test_none_removal_succeeds(self) -> None: + """Removing elements via None passes validation (no type check needed).""" + + class RemoveTransformer(ASTTransformer): + def visit_Message(self, node: Message) -> None: + return None + + resource = _make_resource( + _make_simple_message("msg1", "hello"), + _make_simple_message("msg2", "world"), + ) + transformer = RemoveTransformer() + result = transformer.transform(resource) + assert isinstance(result, Resource) + assert len(result.entries) == 0 + + def test_correct_expansion_succeeds(self) -> None: 
+ """Expanding one Message into two Messages passes validation.""" + + class DuplicateTransformer(ASTTransformer): + def visit_Message(self, node: Message) -> list[Message]: + copy = Message( + id=Identifier(name=node.id.name + "_copy", span=None), + value=node.value, + attributes=(), + comment=None, + span=None, + ) + return [node, copy] + + resource = _make_resource(_make_simple_message("msg", "hello")) + transformer = DuplicateTransformer() + result = transformer.transform(resource) + assert isinstance(result, Resource) + assert len(result.entries) == 2 + entry0 = result.entries[0] + entry1 = result.entries[1] + assert isinstance(entry0, Message) + assert isinstance(entry1, Message) + assert entry0.id.name == "msg" + assert entry1.id.name == "msg_copy" diff --git a/tests/test_architecture_contract.py b/tests/test_architecture_contract.py index 95be89e9..df620f62 100644 --- a/tests/test_architecture_contract.py +++ b/tests/test_architecture_contract.py @@ -35,7 +35,10 @@ re.compile(r"raw\.githubusercontent\.com"), ) -VERSION_PROVENANCE_PATTERN = re.compile(r"\b(?:Added|Pre|Post|Prior to)\s+v\d+\.\d+\.\d+\b|v\d+\.\d+\.\d+\+") +VERSION_PROVENANCE_PATTERN = re.compile( + r"\b(?:Added|Pre|Post|Prior to)\s+v\d+\.\d+\.\d+\b|v\d+\.\d+\.\d+\+" +) +LARGE_OWNER_BUDGET_THRESHOLD = 1000 FILE_LINE_BUDGETS = { "src/ftllexengine/runtime/bundle.py": 120, @@ -56,24 +59,154 @@ "src/ftllexengine/localization/orchestrator.py": 400, "src/ftllexengine/parsing/currency.py": 650, "src/ftllexengine/parsing/dates.py": 350, - "src/ftllexengine/syntax/serializer.py": 700, + "src/ftllexengine/syntax/serializer.py": 450, + "src/ftllexengine/syntax/serializer_engine.py": 350, "src/ftllexengine/diagnostics/templates.py": 80, "src/ftllexengine/diagnostics/template_reference.py": 220, "src/ftllexengine/diagnostics/template_runtime.py": 190, "src/ftllexengine/diagnostics/template_parsing.py": 150, + "src/ftllexengine/validation/resource.py": 240, + "src/ftllexengine/validation/resource_common.py": 
60, + "src/ftllexengine/validation/resource_entries.py": 260, + "src/ftllexengine/validation/resource_syntax.py": 100, "src/ftllexengine/syntax/visitor.py": 750, "src/ftllexengine/syntax/cursor.py": 700, "tests/test_runtime_bundle_property_core.py": 800, "tests/test_runtime_bundle_property_references.py": 900, "tests/test_runtime_bundle_property_advanced.py": 1000, "tests/test_runtime_bundle_property_state.py": 750, - "tests/test_syntax_serializer.py": 3100, - "tests/test_syntax_parser_property.py": 2850, - "tests/strategies/ftl.py": 2700, - "fuzz_atheris/fuzz_localization.py": 2300, - "fuzz_atheris/fuzz_runtime.py": 1500, - "scripts/fuzz_hypofuzz.sh": 1300, - "scripts/fuzz_atheris.sh": 1100, + "tests/test_introspection_iso.py": 20, + "tests/introspection_iso_cases/lookup.py": 560, + "tests/introspection_iso_cases/cache_and_babel.py": 640, + "tests/introspection_iso_cases/error_paths.py": 320, + "tests/introspection_iso_cases/defensive_branches.py": 560, + "tests/introspection_iso_cases/requirements.py": 360, + "tests/test_runtime_cache_integrity.py": 20, + "tests/runtime_cache_integrity_cases/checksums.py": 320, + "tests/runtime_cache_integrity_cases/write_once_audit.py": 460, + "tests/runtime_cache_integrity_cases/idempotence_and_hashes.py": 400, + "tests/runtime_cache_integrity_cases/integrity_edges.py": 620, + "tests/runtime_cache_integrity_cases/limits_and_timing.py": 320, + "tests/test_introspection_message.py": 20, + "tests/introspection_message_cases/extraction_and_references.py": 580, + "tests/introspection_message_cases/contracts_and_spans.py": 540, + "tests/introspection_message_cases/properties_and_branches.py": 520, + "tests/introspection_message_cases/cache_and_validation.py": 360, + "tests/test_diagnostics_frozen_error.py": 20, + "tests/diagnostics_frozen_error_cases/core_behavior.py": 580, + "tests/diagnostics_frozen_error_cases/branch_coverage.py": 600, + "tests/diagnostics_frozen_error_cases/formatting_and_hashes.py": 620, + 
"tests/test_runtime_locale_context.py": 20, + "tests/runtime_locale_context_cases/construction_and_cache.py": 480, + "tests/runtime_locale_context_cases/number_formatting.py": 280, + "tests/runtime_locale_context_cases/datetime_and_currency.py": 440, + "tests/runtime_locale_context_cases/boundaries_and_extras.py": 500, + "tests/test_runtime_resolver_selection.py": 20, + "tests/runtime_resolver_selection_cases/pattern_resolution.py": 420, + "tests/runtime_resolver_selection_cases/numeric_matching.py": 480, + "tests/runtime_resolver_selection_cases/number_literal_edges.py": 460, + "tests/runtime_resolver_selection_cases/fallback_and_errors.py": 340, + "tests/test_localization_orchestration.py": 20, + "tests/localization_orchestration_cases/load_and_lookup.py": 420, + "tests/localization_orchestration_cases/strict_and_boot.py": 420, + "tests/localization_orchestration_cases/cache_and_properties.py": 420, + "tests/localization_orchestration_cases/ast_and_cleanup.py": 460, + "tests/test_localization.py": 20, + "tests/localization_cases/basics_and_fallback.py": 340, + "tests/localization_cases/loaders_and_cache.py": 360, + "tests/localization_cases/multilocale_and_callbacks.py": 560, + "tests/localization_cases/validation_and_streams.py": 480, + "tests/test_syntax_serializer_core.py": 950, + "tests/test_syntax_serializer_text_validation.py": 800, + "tests/test_syntax_serializer_patterns.py": 550, + "tests/test_syntax_serializer_helpers.py": 550, + "tests/test_syntax_serializer_branches.py": 700, + "tests/test_runtime_bundle.py": 20, + "tests/runtime_bundle_cases/__init__.py": 40, + "tests/runtime_bundle_cases/basic.py": 820, + "tests/runtime_bundle_cases/state.py": 820, + "tests/runtime_bundle_cases/introspection.py": 320, + "tests/runtime_bundle_cases/properties.py": 700, + "tests/test_syntax_validator.py": 20, + "tests/syntax_validator_cases/__init__.py": 60, + "tests/syntax_validator_cases/entries.py": 620, + "tests/syntax_validator_cases/results.py": 620, + 
"tests/syntax_validator_cases/high_level.py": 500, + "tests/syntax_validator_cases/regressions.py": 620, + "tests/test_syntax_parser_property.py": 20, + "tests/syntax_parser_property_cases/__init__.py": 60, + "tests/syntax_parser_property_cases/core.py": 700, + "tests/syntax_parser_property_cases/syntax_elements.py": 760, + "tests/syntax_parser_property_cases/grammar_boundaries.py": 780, + "tests/syntax_parser_property_cases/roundtrip_and_malformed.py": 700, + "tests/strategies/ftl.py": 20, + "tests/strategies/ftl_shared.py": 80, + "tests/strategies/ftl_strings.py": 620, + "tests/strategies/ftl_ast.py": 780, + "tests/strategies/ftl_structural.py": 500, + "tests/strategies/ftl_whitespace.py": 440, + "tests/strategies/ftl_negative.py": 500, + "tests/fuzz/test_syntax_serializer_property.py": 40, + "tests/test_syntax_parser_core.py": 40, + "tests/test_syntax_parser_expressions.py": 40, + "tests/test_syntax_parser_patterns.py": 40, + "tests/test_validation_resource.py": 40, + "tests/test_syntax_visitor_transformer.py": 40, + "tests/test_runtime_resolver_depth_cycles.py": 40, + "tests/test_parsing_currency.py": 40, + "tests/test_parsing_dates.py": 40, + "tests/test_runtime_cache_hashable.py": 40, + "tests/fuzz/test_runtime_resolver_state_machine.py": 40, + "tests/strategy_metrics.py": 1260, + "tests/fuzz/test_localization_property.py": 40, + "tests/test_runtime_cache_property.py": 40, + "tests/test_runtime_function_bridge.py": 40, + "tests/test_syntax_visitor.py": 40, + "tests/test_syntax_parser_error_recovery.py": 40, + "tests/test_runtime_plural_rules.py": 40, + "tests/test_integration_e2e.py": 40, + "tests/test_validation_resource_dependency_graph.py": 40, + "tests/test_syntax_cursor_property.py": 40, + "tests/test_syntax_serializer_roundtrip.py": 40, + "tests/test_syntax_cursor.py": 40, + "fuzz_atheris/fuzz_localization.py": 40, + "fuzz_atheris/fuzz_localization_entry.py": 200, + "fuzz_atheris/fuzz_localization_support.py": 380, + 
"fuzz_atheris/fuzz_localization_patterns_basic.py": 560, + "fuzz_atheris/fuzz_localization_patterns_validation.py": 380, + "fuzz_atheris/fuzz_localization_patterns_introspection.py": 420, + "fuzz_atheris/fuzz_localization_patterns_loader.py": 360, + "fuzz_atheris/fuzz_localization_patterns_boot.py": 280, + "fuzz_atheris/fuzz_runtime.py": 40, + "fuzz_atheris/fuzz_runtime_entry.py": 300, + "fuzz_atheris/fuzz_runtime_support.py": 420, + "fuzz_atheris/fuzz_runtime_builders.py": 420, + "fuzz_atheris/fuzz_runtime_scenarios.py": 460, + "fuzz_atheris/fuzz_bridge.py": 40, + "fuzz_atheris/fuzz_bridge_entry.py": 200, + "fuzz_atheris/fuzz_bridge_support.py": 420, + "fuzz_atheris/fuzz_bridge_patterns_registration.py": 260, + "fuzz_atheris/fuzz_bridge_patterns_numbers.py": 320, + "fuzz_atheris/fuzz_bridge_patterns_dispatch.py": 480, + "fuzz_atheris/fuzz_serializer.py": 40, + "fuzz_atheris/fuzz_serializer_entry.py": 200, + "fuzz_atheris/fuzz_serializer_support.py": 440, + "fuzz_atheris/fuzz_serializer_patterns_text.py": 320, + "fuzz_atheris/fuzz_serializer_patterns_transform.py": 240, + "fuzz_atheris/fuzz_serializer_mutators.py": 220, + "fuzz_atheris/fuzz_builtins.py": 40, + "fuzz_atheris/fuzz_builtins_entry.py": 180, + "fuzz_atheris/fuzz_builtins_support.py": 420, + "fuzz_atheris/fuzz_builtins_patterns_number.py": 220, + "fuzz_atheris/fuzz_builtins_patterns_datetime.py": 180, + "fuzz_atheris/fuzz_builtins_patterns_currency.py": 340, + "scripts/fuzz_hypofuzz.sh": 300, + "scripts/lib/fuzz_hypofuzz/common.sh": 220, + "scripts/lib/fuzz_hypofuzz/modes_check.sh": 320, + "scripts/lib/fuzz_hypofuzz/modes_fuzz.sh": 500, + "scripts/fuzz_atheris.sh": 220, + "scripts/lib/fuzz_atheris/common.sh": 180, + "scripts/lib/fuzz_atheris/commands.sh": 320, } @@ -118,6 +251,26 @@ def _git_visible_repo_files() -> list[Path]: return files +def _git_tracked_repo_files() -> list[Path]: + """List files present in the git index for the current worktree state.""" + git = shutil.which("git") + assert git is 
not None + result = subprocess.run( + [git, "ls-files", "--cached", "-z"], + check=True, + capture_output=True, + cwd=REPO_ROOT, + ) + files: list[Path] = [] + for raw_path in result.stdout.split(b"\0"): + if not raw_path: + continue + path = REPO_ROOT / raw_path.decode("utf-8") + if path.is_file(): + files.append(path) + return files + + def test_internal_modules_do_not_reverse_layer_dependencies() -> None: """Non-facade modules should only import within or below their own layer.""" violations: list[str] = [] @@ -278,3 +431,113 @@ def test_large_repo_files_stay_under_line_budgets() -> None: offenders.append(f"{relative_path}: {line_count} > {max_lines}") assert offenders == [] + + +def test_large_python_and_shell_owners_have_explicit_budgets() -> None: + """Any very large owner must opt into an explicit architecture budget.""" + offenders: list[str] = [] + scan_roots = ( + REPO_ROOT / "src", + REPO_ROOT / "tests", + REPO_ROOT / "fuzz_atheris", + REPO_ROOT / "scripts", + ) + + for root in scan_roots: + for path in sorted(root.rglob("*")): + if path.suffix not in {".py", ".sh"} or path.name == "__init__.py": + continue + relative = path.relative_to(REPO_ROOT).as_posix() + line_count = len(path.read_text(encoding="utf-8").splitlines()) + if ( + line_count >= LARGE_OWNER_BUDGET_THRESHOLD + and relative not in FILE_LINE_BUDGETS + ): + offenders.append(f"{relative}: {line_count}") + + assert offenders == [] + + +def test_hypofuzz_entrypoint_delegates_to_split_libraries() -> None: + """HypoFuzz entrypoint should stay a thin dispatcher over focused shell libs.""" + entrypoint = (REPO_ROOT / "scripts" / "fuzz_hypofuzz.sh").read_text(encoding="utf-8") + + expected_libraries = ( + REPO_ROOT / "scripts" / "lib" / "fuzz_hypofuzz" / "common.sh", + REPO_ROOT / "scripts" / "lib" / "fuzz_hypofuzz" / "modes_check.sh", + REPO_ROOT / "scripts" / "lib" / "fuzz_hypofuzz" / "modes_fuzz.sh", + ) + + for lib_path in expected_libraries: + assert lib_path.exists() + assert f'source 
"$FUZZ_LIB_DIR/{lib_path.name}"' in entrypoint + + +def test_hypofuzz_helper_libraries_are_git_tracked() -> None: + """Split HypoFuzz helper libraries must be part of tracked repository state.""" + tracked_paths = { + path.relative_to(REPO_ROOT).as_posix() for path in _git_tracked_repo_files() + } + expected = { + "scripts/lib/fuzz_hypofuzz/common.sh", + "scripts/lib/fuzz_hypofuzz/modes_check.sh", + "scripts/lib/fuzz_hypofuzz/modes_fuzz.sh", + } + + assert expected <= tracked_paths + + +def test_atheris_entrypoint_delegates_to_split_libraries() -> None: + """Atheris entrypoint should stay a thin dispatcher over focused shell libs.""" + entrypoint = (REPO_ROOT / "scripts" / "fuzz_atheris.sh").read_text(encoding="utf-8") + + expected_libraries = ( + REPO_ROOT / "scripts" / "lib" / "fuzz_atheris" / "common.sh", + REPO_ROOT / "scripts" / "lib" / "fuzz_atheris" / "commands.sh", + ) + + for lib_path in expected_libraries: + assert lib_path.exists() + assert f'source "$FUZZ_LIB_DIR/{lib_path.name}"' in entrypoint + + assert "fuzz_atheris/targets.tsv" in ( + REPO_ROOT / "scripts" / "lib" / "fuzz_atheris" / "common.sh" + ).read_text(encoding="utf-8") + + +def test_canonical_split_surfaces_are_git_tracked() -> None: + """Canonical split surfaces and devcontainer workflow files must be tracked.""" + tracked_paths = { + path.relative_to(REPO_ROOT).as_posix() for path in _git_tracked_repo_files() + } + expected: set[str] = set() + patterns = ( + ".devcontainer/*", + "docs/DEVELOPER_DEVCONTAINER.md", + "scripts/devcontainer-prepare-user-home.sh", + "scripts/validate-devcontainer.sh", + "scripts/lib/fuzz_atheris/*.sh", + "scripts/lib/fuzz_hypofuzz/*.sh", + "fuzz_atheris/targets.tsv", + "fuzz_atheris/fuzz_*_entry.py", + "fuzz_atheris/fuzz_*_support.py", + "fuzz_atheris/fuzz_*_patterns*.py", + "fuzz_atheris/fuzz_*_builders.py", + "fuzz_atheris/fuzz_*_scenarios.py", + "fuzz_atheris/fuzz_*_mutators.py", + "tests/*_cases/__init__.py", + "tests/*_cases/*.py", + 
"src/ftllexengine/parsing/text_normalization.py", + "src/ftllexengine/runtime/locale_resolution.py", + "src/ftllexengine/validation/resource_*.py", + "src/ftllexengine/syntax/serializer_engine.py", + ) + + for pattern in patterns: + expected.update( + path.relative_to(REPO_ROOT).as_posix() + for path in REPO_ROOT.glob(pattern) + if path.is_file() + ) + + assert expected <= tracked_paths diff --git a/tests/test_coverage_policy.py b/tests/test_coverage_policy.py index b08e48ff..7b0fe445 100644 --- a/tests/test_coverage_policy.py +++ b/tests/test_coverage_policy.py @@ -20,10 +20,11 @@ def test_pyproject_enforces_full_line_and_branch_coverage() -> None: assert coverage_report["fail_under"] == 100.0 -def test_scripts_test_sh_uses_same_coverage_threshold() -> None: - """The main test script should match the pyproject coverage policy.""" +def test_scripts_test_sh_reads_coverage_threshold_from_pyproject() -> None: + """The main test script should derive its coverage threshold from pyproject.""" content = (REPO_ROOT / "scripts" / "test.sh").read_text(encoding="utf-8") - match = re.search(r"^DEFAULT_COV_LIMIT=(\d+)$", content, re.MULTILINE) - assert match is not None - assert int(match.group(1)) == 100 + assert "read_coverage_threshold()" in content + assert 'data["tool"]["coverage"]["report"]["fail_under"]' in content + assert 'DEFAULT_COV_LIMIT="$(read_coverage_threshold)"' in content + assert re.search(r"^DEFAULT_COV_LIMIT=100$", content, re.MULTILINE) is None diff --git a/tests/test_diagnostics_frozen_error.py b/tests/test_diagnostics_frozen_error.py index 01ea6f33..fcaf0fc7 100644 --- a/tests/test_diagnostics_frozen_error.py +++ b/tests/test_diagnostics_frozen_error.py @@ -1,1705 +1,5 @@ -"""Property-based tests for FrozenFluentError integrity guarantees. 
+"""Aggregated frozen diagnostic error test surface.""" -Tests the data integrity features of the FrozenFluentError class: -- BLAKE2b-128 content hashing (determinism, collision resistance) -- Immutability enforcement (no mutation after construction) -- Sealed type enforcement (no subclassing) -- Content-based equality and hashability -- verify_integrity() method correctness - -These tests verify financial-grade data safety properties using Hypothesis -property-based testing. -""" - -from __future__ import annotations - -from typing import Literal - -import pytest -from hypothesis import assume, event, example, given, settings -from hypothesis import strategies as st - -from ftllexengine.diagnostics import ( - Diagnostic, - DiagnosticCode, - ErrorCategory, - FrozenErrorContext, - FrozenFluentError, - SourceSpan, -) -from ftllexengine.integrity import ImmutabilityViolationError -from tests.strategies.diagnostics import error_categories - -# ============================================================================= -# Strategies for generating test data -# ============================================================================= - - -@st.composite -def error_messages(draw: st.DrawFn) -> str: - """Generate valid error messages.""" - return draw(st.text(min_size=1, max_size=200)) - - -@st.composite -def optional_diagnostics(draw: st.DrawFn) -> Diagnostic | None: - """Generate optional Diagnostic objects.""" - if draw(st.booleans()): - code = draw(st.sampled_from(list(DiagnosticCode))) - message = draw(st.text(min_size=1, max_size=100)) - return Diagnostic(code=code, message=message, severity="error") - return None - - -@st.composite -def optional_contexts(draw: st.DrawFn) -> FrozenErrorContext | None: - """Generate optional FrozenErrorContext objects.""" - if draw(st.booleans()): - return FrozenErrorContext( - input_value=draw(st.text(min_size=0, max_size=50)), - locale_code=draw(st.text(min_size=1, max_size=10)), - parse_type=draw(st.sampled_from( - ["", 
"currency", "date", "datetime", "decimal", "number"] - )), - fallback_value=draw(st.text(min_size=0, max_size=50)), - ) - return None - - -@st.composite -def frozen_fluent_errors(draw: st.DrawFn) -> FrozenFluentError: - """Generate FrozenFluentError instances.""" - return FrozenFluentError( - message=draw(error_messages()), - category=draw(error_categories()), - diagnostic=draw(optional_diagnostics()), - context=draw(optional_contexts()), - ) - - -# ============================================================================= -# Content Hash Properties -# ============================================================================= - - -@pytest.mark.fuzz -class TestContentHashDeterminism: - """Content hash must be deterministic - same inputs always produce same hash.""" - - @given( - message=error_messages(), - category=error_categories(), - ) - @settings(max_examples=100) - def test_same_inputs_produce_same_hash( - self, message: str, category: ErrorCategory - ) -> None: - """Property: Identical errors have identical content hashes.""" - error1 = FrozenFluentError(message, category) - error2 = FrozenFluentError(message, category) - - event(f"msg_len={len(message)}") - assert error1.content_hash == error2.content_hash - assert error1 == error2 - event("outcome=hash_determinism_success") - - @given( - message=error_messages(), - category=error_categories(), - diagnostic=optional_diagnostics(), - context=optional_contexts(), - ) - @settings(max_examples=100) - def test_same_inputs_with_optional_fields( - self, - message: str, - category: ErrorCategory, - diagnostic: Diagnostic | None, - context: FrozenErrorContext | None, - ) -> None: - """Property: Identical errors with optional fields have identical hashes.""" - error1 = FrozenFluentError(message, category, diagnostic, context) - error2 = FrozenFluentError(message, category, diagnostic, context) - - has_diag = diagnostic is not None - has_ctx = context is not None - event(f"has_diagnostic={has_diag}") - 
event(f"has_context={has_ctx}") - assert error1.content_hash == error2.content_hash - assert error1 == error2 - - @given(error=frozen_fluent_errors()) - @settings(max_examples=100) - def test_hash_is_16_bytes(self, error: FrozenFluentError) -> None: - """Property: Content hash is always 16 bytes (BLAKE2b-128).""" - event(f"category={error.category.name}") - assert len(error.content_hash) == 16 - - -@pytest.mark.fuzz -class TestContentHashCollisionResistance: - """Different inputs should produce different hashes (high probability).""" - - @given( - message1=error_messages(), - message2=error_messages(), - category=error_categories(), - ) - @settings(max_examples=100) - def test_different_messages_different_hashes( - self, message1: str, message2: str, category: ErrorCategory - ) -> None: - """Property: Different messages produce different hashes.""" - assume(message1 != message2) - - error1 = FrozenFluentError(message1, category) - error2 = FrozenFluentError(message2, category) - - event(f"msg1_len={len(message1)}") - event(f"msg2_len={len(message2)}") - assert error1.content_hash != error2.content_hash - assert error1 != error2 - event("outcome=hash_collision_resistance") - - @given( - message=error_messages(), - category1=error_categories(), - category2=error_categories(), - ) - @settings(max_examples=100) - def test_different_categories_different_hashes( - self, message: str, category1: ErrorCategory, category2: ErrorCategory - ) -> None: - """Property: Different categories produce different hashes.""" - assume(category1 != category2) - - error1 = FrozenFluentError(message, category1) - error2 = FrozenFluentError(message, category2) - - event(f"cat1={category1.name}") - event(f"cat2={category2.name}") - assert error1.content_hash != error2.content_hash - assert error1 != error2 - - -# ============================================================================= -# Immutability Enforcement -# 
============================================================================= - - -@pytest.mark.fuzz -class TestImmutabilityEnforcement: - """FrozenFluentError must reject all mutations after construction.""" - - @given(error=frozen_fluent_errors()) - @settings(max_examples=50) - def test_cannot_modify_message(self, error: FrozenFluentError) -> None: - """Property: Cannot modify message after construction.""" - with pytest.raises(ImmutabilityViolationError): - error._message = "modified" - event(f"msg_len={len(error.message)}") - event("outcome=immutability_enforced") - - @given(error=frozen_fluent_errors()) - @settings(max_examples=50) - def test_cannot_modify_category(self, error: FrozenFluentError) -> None: - """Property: Cannot modify category after construction.""" - with pytest.raises(ImmutabilityViolationError): - error._category = ErrorCategory.PARSE - event(f"category={error.category.name}") - - @given(error=frozen_fluent_errors()) - @settings(max_examples=50) - def test_cannot_modify_diagnostic(self, error: FrozenFluentError) -> None: - """Property: Cannot modify diagnostic after construction.""" - with pytest.raises(ImmutabilityViolationError): - error._diagnostic = None - has_diag = error.diagnostic is not None - event(f"has_diagnostic={has_diag}") - - @given(error=frozen_fluent_errors()) - @settings(max_examples=50) - def test_cannot_modify_context(self, error: FrozenFluentError) -> None: - """Property: Cannot modify context after construction.""" - with pytest.raises(ImmutabilityViolationError): - error._context = None - has_ctx = error.context is not None - event(f"has_context={has_ctx}") - - @given(error=frozen_fluent_errors()) - @settings(max_examples=50) - def test_cannot_modify_content_hash(self, error: FrozenFluentError) -> None: - """Property: Cannot modify content hash after construction.""" - with pytest.raises(ImmutabilityViolationError): - error._content_hash = b"fake" - event(f"category={error.category.name}") - - 
@given(error=frozen_fluent_errors()) - @settings(max_examples=50) - def test_cannot_delete_attributes(self, error: FrozenFluentError) -> None: - """Property: Cannot delete any attributes.""" - with pytest.raises(ImmutabilityViolationError): - del error._message - event(f"category={error.category.name}") - - -# ============================================================================= -# Sealed Type Enforcement -# ============================================================================= - - -class TestSealedTypeEnforcement: - """FrozenFluentError must reject subclassing at runtime.""" - - def test_cannot_subclass(self) -> None: - """FrozenFluentError cannot be subclassed.""" - with pytest.raises(TypeError, match="cannot be subclassed"): - # pylint: disable=unused-variable,subclassed-final-class - class MaliciousError(FrozenFluentError): # type: ignore[misc] - pass - - -# ============================================================================= -# Integrity Verification -# ============================================================================= - - -@pytest.mark.fuzz -class TestVerifyIntegrity: - """verify_integrity() must correctly detect corruption.""" - - @given(error=frozen_fluent_errors()) - @settings(max_examples=100) - def test_fresh_error_passes_integrity_check( - self, error: FrozenFluentError - ) -> None: - """Property: Freshly constructed errors always pass integrity check.""" - event(f"category={error.category.name}") - assert error.verify_integrity() is True - event("outcome=integrity_check_passed") - - @given(error=frozen_fluent_errors()) - @settings(max_examples=100) - def test_integrity_is_idempotent(self, error: FrozenFluentError) -> None: - """Property: verify_integrity() can be called multiple times.""" - event(f"category={error.category.name}") - assert error.verify_integrity() is True - assert error.verify_integrity() is True - assert error.verify_integrity() is True - - -# 
============================================================================= -# Hashability and Set/Dict Usage -# ============================================================================= - - -@pytest.mark.fuzz -class TestHashability: - """FrozenFluentError must be usable in sets and as dict keys.""" - - @given(error=frozen_fluent_errors()) - @settings(max_examples=50) - def test_error_is_hashable(self, error: FrozenFluentError) -> None: - """Property: Errors are hashable (can use hash()).""" - h = hash(error) - assert isinstance(h, int) - event(f"category={error.category.name}") - - @given(error=frozen_fluent_errors()) - @settings(max_examples=50) - def test_hash_is_stable(self, error: FrozenFluentError) -> None: - """Property: Hash is stable across multiple calls.""" - h1 = hash(error) - h2 = hash(error) - h3 = hash(error) - assert h1 == h2 == h3 - event(f"category={error.category.name}") - - @given( - message=error_messages(), - category=error_categories(), - ) - @settings(max_examples=50) - def test_equal_errors_have_equal_hashes( - self, message: str, category: ErrorCategory - ) -> None: - """Property: Equal errors have equal hashes (hash contract).""" - error1 = FrozenFluentError(message, category) - error2 = FrozenFluentError(message, category) - - assert error1 == error2 - assert hash(error1) == hash(error2) - event(f"category={category.name}") - - @given( - errors=st.lists(frozen_fluent_errors(), min_size=1, max_size=20, unique=True) - ) - @settings(max_examples=50) - def test_errors_can_be_added_to_set( - self, errors: list[FrozenFluentError] - ) -> None: - """Property: Errors can be stored in sets.""" - error_set = set(errors) - assert len(error_set) <= len(errors) - event(f"set_size={len(error_set)}") - - @given( - errors=st.lists(frozen_fluent_errors(), min_size=1, max_size=20, unique=True) - ) - @settings(max_examples=50) - def test_errors_can_be_dict_keys( - self, errors: list[FrozenFluentError] - ) -> None: - """Property: Errors can be used as 
dict keys.""" - error_dict = {e: i for i, e in enumerate(errors)} - assert len(error_dict) <= len(errors) - event(f"dict_size={len(error_dict)}") - - -# ============================================================================= -# Equality Semantics -# ============================================================================= - - -@pytest.mark.fuzz -class TestEquality: - """FrozenFluentError equality must be based on content.""" - - @given(error=frozen_fluent_errors()) - @settings(max_examples=50) - def test_error_equals_itself(self, error: FrozenFluentError) -> None: - """Property: Errors are equal to themselves (reflexivity).""" - same_ref = error - assert error == same_ref - event(f"category={error.category.name}") - - @given( - message=error_messages(), - category=error_categories(), - ) - @settings(max_examples=50) - def test_identical_errors_are_equal( - self, message: str, category: ErrorCategory - ) -> None: - """Property: Identical errors are equal (symmetry).""" - error1 = FrozenFluentError(message, category) - error2 = FrozenFluentError(message, category) - - assert error1 == error2 - assert error2 == error1 - event(f"category={category.name}") - - @given(error=frozen_fluent_errors()) - @settings(max_examples=50) - def test_error_not_equal_to_string(self, error: FrozenFluentError) -> None: - """Property: Errors are not equal to strings.""" - assert (error == error.message) is False - event(f"category={error.category.name}") - - @given(error=frozen_fluent_errors()) - @settings(max_examples=50) - def test_error_not_equal_to_none(self, error: FrozenFluentError) -> None: - """Property: Errors are not equal to None (tests __eq__ method).""" - # pylint: disable=singleton-comparison - assert (error == None) is False # noqa: E711 - explicit None comparison intentional - event(f"category={error.category.name}") - - -# ============================================================================= -# Property Access -# 
============================================================================= - - -@pytest.mark.fuzz -class TestPropertyAccess: - """FrozenFluentError properties must be accessible.""" - - @given( - message=error_messages(), - category=error_categories(), - ) - @settings(max_examples=50) - def test_message_property(self, message: str, category: ErrorCategory) -> None: - """Property: message property returns the message.""" - error = FrozenFluentError(message, category) - assert error.message == message - event(f"msg_len={len(message)}") - - @given( - message=error_messages(), - category=error_categories(), - ) - @settings(max_examples=50) - def test_category_property(self, message: str, category: ErrorCategory) -> None: - """Property: category property returns the category.""" - error = FrozenFluentError(message, category) - assert error.category == category - event(f"category={category.name}") - - @given( - message=error_messages(), - category=error_categories(), - diagnostic=optional_diagnostics(), - ) - @settings(max_examples=50) - def test_diagnostic_property( - self, - message: str, - category: ErrorCategory, - diagnostic: Diagnostic | None, - ) -> None: - """Property: diagnostic property returns the diagnostic.""" - error = FrozenFluentError(message, category, diagnostic=diagnostic) - assert error.diagnostic == diagnostic - has_diag = diagnostic is not None - event(f"has_diagnostic={has_diag}") - - @given( - message=error_messages(), - category=error_categories(), - context=optional_contexts(), - ) - @settings(max_examples=50) - def test_context_property( - self, - message: str, - category: ErrorCategory, - context: FrozenErrorContext | None, - ) -> None: - """Property: context property returns the context.""" - error = FrozenFluentError(message, category, context=context) - assert error.context == context - has_ctx = context is not None - event(f"has_context={has_ctx}") - - -# ============================================================================= -# 
Context Convenience Properties -# ============================================================================= - - -@pytest.mark.fuzz -class TestContextConvenienceProperties: - """FrozenFluentError convenience properties for context fields.""" - - @given( - message=error_messages(), - category=error_categories(), - ) - @settings(max_examples=50) - def test_context_properties_empty_without_context( - self, message: str, category: ErrorCategory - ) -> None: - """Property: Context convenience properties return empty strings without context.""" - error = FrozenFluentError(message, category) - - assert error.fallback_value == "" - assert error.input_value == "" - assert error.locale_code == "" - assert error.parse_type == "" - event(f"category={category.name}") - - @given( - message=error_messages(), - category=error_categories(), - ) - @settings(max_examples=50) - def test_context_properties_with_context( - self, message: str, category: ErrorCategory - ) -> None: - """Property: Context convenience properties return context values.""" - context = FrozenErrorContext( - input_value="test_input", - locale_code="en_US", - parse_type="number", - fallback_value="{!NUMBER}", - ) - error = FrozenFluentError(message, category, context=context) - - assert error.fallback_value == "{!NUMBER}" - assert error.input_value == "test_input" - assert error.locale_code == "en_US" - assert error.parse_type == "number" - event(f"category={category.name}") - - -# ============================================================================= -# String Representation -# ============================================================================= - - -@pytest.mark.fuzz -class TestStringRepresentation: - """FrozenFluentError must have sensible string representation.""" - - @given(error=frozen_fluent_errors()) - @settings(max_examples=50) - def test_str_returns_message(self, error: FrozenFluentError) -> None: - """Property: str() returns the error message.""" - assert str(error) == error.message 
- event(f"msg_len={len(error.message)}") - - @given(error=frozen_fluent_errors()) - @settings(max_examples=50) - def test_repr_is_valid(self, error: FrozenFluentError) -> None: - """Property: repr() returns a valid representation.""" - r = repr(error) - assert isinstance(r, str) - assert "FrozenFluentError" in r - assert "message=" in r - assert "category=" in r - event(f"category={error.category.name}") - - -# ============================================================================= -# Edge Cases -# ============================================================================= - - -class TestEdgeCases: - """Edge case tests for FrozenFluentError.""" - - def test_empty_message(self) -> None: - """FrozenFluentError accepts empty message.""" - error = FrozenFluentError("", ErrorCategory.REFERENCE) - assert error.message == "" - assert error.verify_integrity() is True - - def test_unicode_message(self) -> None: - """FrozenFluentError handles Unicode messages.""" - error = FrozenFluentError("Error: \u4e2d\u6587\u6587\u672c", ErrorCategory.PARSE) - assert error.verify_integrity() is True - - def test_emoji_message(self) -> None: - """FrozenFluentError handles emoji in messages.""" - error = FrozenFluentError("Error \U0001F44D occurred", ErrorCategory.FORMATTING) - assert error.verify_integrity() is True - - @example(message="Test") - @given(message=st.text()) - @settings(max_examples=100) - def test_arbitrary_text_messages(self, message: str) -> None: - """Property: FrozenFluentError handles arbitrary text.""" - error = FrozenFluentError(message, ErrorCategory.RESOLUTION) - assert error.verify_integrity() is True - assert error.message == message - event(f"msg_len={len(message)}") - - def test_all_categories_work(self) -> None: - """All ErrorCategory values can be used.""" - for category in ErrorCategory: - error = FrozenFluentError("test", category) - assert error.category == category - assert error.verify_integrity() is True - - -# 
============================================================================= -# Exception Behavior -# ============================================================================= - - -@pytest.mark.fuzz -class TestExceptionBehavior: - """FrozenFluentError must behave like a proper exception.""" - - @given(error=frozen_fluent_errors()) - @settings(max_examples=50) - def test_can_be_raised(self, error: FrozenFluentError) -> None: - """Property: FrozenFluentError can be raised and caught.""" - with pytest.raises(FrozenFluentError) as exc_info: - raise error - assert exc_info.value is error - event(f"category={error.category.name}") - - @given(error=frozen_fluent_errors()) - @settings(max_examples=50) - def test_can_be_caught_as_exception(self, error: FrozenFluentError) -> None: - """Property: FrozenFluentError can be caught as Exception.""" - with pytest.raises(Exception) as exc_info: # noqa: PT011 - msg elsewhere - raise error - assert exc_info.value is error - event(f"category={error.category.name}") - - @given(error=frozen_fluent_errors()) - @settings(max_examples=50) - def test_exception_args(self, error: FrozenFluentError) -> None: - """Property: Exception args contain the message.""" - assert error.args == (error.message,) - event(f"category={error.category.name}") - - -# ============================================================================= -# Complete Branch Coverage Tests -# ============================================================================= - - -class TestCompleteBranchCoverage: - """Tests to achieve 100% branch coverage for errors.py.""" - - def test_setattr_unfrozen_branch(self) -> None: - """Test __setattr__ when _frozen is False (line 176 coverage). - - This tests the defensive else branch in __setattr__ that allows - attribute setting when the object is not yet frozen. While this - branch is not normally reached (since __init__ uses object.__setattr__ - directly), it exists as a defensive measure. 
- - This test forcibly unfreezes an error to exercise the branch. - """ - error = FrozenFluentError("test", ErrorCategory.REFERENCE) - - # Verify object is initially frozen - assert error.verify_integrity() is True - - # Forcibly unfreeze using object.__setattr__ to bypass immutability - object.__setattr__(error, "_frozen", False) - - # Now call the instance's __setattr__ DIRECTLY - should reach line 176 - # Must use the class method, not object.__setattr__ - FrozenFluentError.__setattr__(error, "_message", "modified") - - # Verify the change took effect (since we unfroze it) - assert error._message == "modified" - - # Re-freeze for cleanup - object.__setattr__(error, "_frozen", True) - - def test_eq_with_non_error_type_returns_not_implemented(self) -> None: - """Test __eq__ returns NotImplemented for non-FrozenFluentError types. - - The __eq__ method should return NotImplemented (not False) when - comparing with objects that are not FrozenFluentError instances. - This allows Python to try the comparison from the other object's - perspective. 
- """ - error = FrozenFluentError("test", ErrorCategory.REFERENCE) - - # Test with various non-FrozenFluentError types - # Direct dunder call required to verify NotImplemented return value - # (using == operator would convert NotImplemented to False) - result = error.__eq__(42) # pylint: disable=unnecessary-dunder-call - assert result is NotImplemented - - result = error.__eq__("string") # pylint: disable=unnecessary-dunder-call - assert result is NotImplemented - - result = error.__eq__({"dict": "value"}) # pylint: disable=unnecessary-dunder-call - assert result is NotImplemented - - result = error.__eq__([1, 2, 3]) # pylint: disable=unnecessary-dunder-call - assert result is NotImplemented - - # The actual equality operator should return False (Python's default) - assert (error == 42) is False - assert (error == "string") is False - - def test_compute_content_hash_with_all_fields(self) -> None: - """Test _compute_content_hash with all optional fields populated. - - This ensures the hash computation includes all diagnostic and context - fields when present, achieving full branch coverage in the hash - computation logic. 
- """ - diagnostic = Diagnostic( - code=DiagnosticCode.MESSAGE_NOT_FOUND, - message="Test diagnostic message", - ) - context = FrozenErrorContext( - input_value="test input", - locale_code="en_US", - parse_type="number", - fallback_value="fallback", - ) - - error1 = FrozenFluentError( - "test message", - ErrorCategory.FORMATTING, - diagnostic=diagnostic, - context=context, - ) - - # Create another with same values - error2 = FrozenFluentError( - "test message", - ErrorCategory.FORMATTING, - diagnostic=diagnostic, - context=context, - ) - - # Hashes should be identical - assert error1.content_hash == error2.content_hash - - # Verify hash includes all fields by changing each one - error3 = FrozenFluentError( - "different message", # Changed - ErrorCategory.FORMATTING, - diagnostic=diagnostic, - context=context, - ) - assert error1.content_hash != error3.content_hash - - diagnostic2 = Diagnostic( - code=DiagnosticCode.TERM_NOT_FOUND, # Different code - message="Test diagnostic message", - ) - error4 = FrozenFluentError( - "test message", - ErrorCategory.FORMATTING, - diagnostic=diagnostic2, # Changed - context=context, - ) - assert error1.content_hash != error4.content_hash - - context2 = FrozenErrorContext( - input_value="different input", # Changed - locale_code="en_US", - parse_type="number", - fallback_value="fallback", - ) - error5 = FrozenFluentError( - "test message", - ErrorCategory.FORMATTING, - diagnostic=diagnostic, - context=context2, # Changed - ) - assert error1.content_hash != error5.content_hash - - def test_hash_with_surrogates_in_text(self) -> None: - """Test content hash computation with invalid Unicode surrogates. - - The hash function uses surrogatepass error handling to ensure it can - hash any Python string, including those with unpaired surrogates from - malformed user input. 
- """ - # Create error with unpaired surrogate (invalid Unicode) - # Python allows these in strings but they're not valid UTF-8 - message_with_surrogate = "Error: \ud800 invalid" - - error = FrozenFluentError(message_with_surrogate, ErrorCategory.PARSE) - - # Should successfully compute hash without raising UnicodeEncodeError - assert len(error.content_hash) == 16 - assert error.verify_integrity() is True - - # Test with surrogate in context fields - context = FrozenErrorContext( - input_value="\ud800 surrogate input", - locale_code="en_US", - parse_type="currency", - fallback_value="\ud800\udc00 surrogate fallback", - ) - error_with_context = FrozenFluentError( - "test", - ErrorCategory.FORMATTING, - context=context, - ) - assert len(error_with_context.content_hash) == 16 - assert error_with_context.verify_integrity() is True - - @given( - message=st.text(), - category=error_categories(), - ) - @settings(max_examples=50) - def test_repr_contains_all_constructor_args( - self, message: str, category: ErrorCategory - ) -> None: - """Property: __repr__ includes all constructor arguments for debugging.""" - error = FrozenFluentError(message, category) - r = repr(error) - - # Should contain class name - assert "FrozenFluentError" in r - - # Should contain all field names - assert "message=" in r - assert "category=" in r - assert "diagnostic=" in r - assert "context=" in r - - # Message should be represented (possibly truncated in repr) - # Category should be shown - assert category.name in r or str(category) in r - event(f"category={category.name}") - - def test_hash_with_diagnostic_span(self) -> None: - """Test content hash computation with Diagnostic containing SourceSpan. - - This exercises lines 196-199 in _compute_content_hash where span - fields are hashed when diagnostic.span is not None. 
- """ - # Create diagnostic WITH span - diagnostic_with_span = Diagnostic( - code=DiagnosticCode.MESSAGE_NOT_FOUND, - message="Test message", - span=SourceSpan(start=10, end=20, line=5, column=3), - severity="error", - ) - - error1 = FrozenFluentError( - "test", - ErrorCategory.REFERENCE, - diagnostic=diagnostic_with_span, - ) - - # Create another with same span - error2 = FrozenFluentError( - "test", - ErrorCategory.REFERENCE, - diagnostic=diagnostic_with_span, - ) - - # Should have identical hashes - assert error1.content_hash == error2.content_hash - - # Create diagnostic with different span values - diagnostic_different_span = Diagnostic( - code=DiagnosticCode.MESSAGE_NOT_FOUND, - message="Test message", - span=SourceSpan(start=100, end=200, line=10, column=15), - severity="error", - ) - - error3 = FrozenFluentError( - "test", - ErrorCategory.REFERENCE, - diagnostic=diagnostic_different_span, - ) - - # Should have different hash - assert error1.content_hash != error3.content_hash - - # Verify integrity - assert error1.verify_integrity() is True - assert error3.verify_integrity() is True - - def test_hash_with_diagnostic_optional_fields(self) -> None: - """Test content hash computation with all Diagnostic optional fields. - - This exercises line 215 in _compute_content_hash where optional - string fields (hint, help_url, etc.) are hashed when not None. 
- """ - # Create diagnostic with ALL optional string fields populated - diagnostic_full = Diagnostic( - code=DiagnosticCode.FUNCTION_FAILED, - message="Function error", - hint="Check your arguments", - help_url="https://example.com/help", - function_name="NUMBER", - argument_name="value", - expected_type="int | Decimal", - received_type="str", - ftl_location="messages.ftl:42", - severity="error", - ) - - error1 = FrozenFluentError( - "test", - ErrorCategory.RESOLUTION, - diagnostic=diagnostic_full, - ) - - # Create another with same fields - error2 = FrozenFluentError( - "test", - ErrorCategory.RESOLUTION, - diagnostic=diagnostic_full, - ) - - # Should have identical hashes - assert error1.content_hash == error2.content_hash - - # Change one optional field - diagnostic_changed = Diagnostic( - code=DiagnosticCode.FUNCTION_FAILED, - message="Function error", - hint="Different hint", # Changed - help_url="https://example.com/help", - function_name="NUMBER", - argument_name="value", - expected_type="int | Decimal", - received_type="str", - ftl_location="messages.ftl:42", - severity="error", - ) - - error3 = FrozenFluentError( - "test", - ErrorCategory.RESOLUTION, - diagnostic=diagnostic_changed, - ) - - # Should have different hash - assert error1.content_hash != error3.content_hash - - # Verify integrity - assert error1.verify_integrity() is True - assert error3.verify_integrity() is True - - def test_hash_with_diagnostic_resolution_path(self) -> None: - """Test content hash computation with Diagnostic resolution_path. - - This exercises lines 225-228 in _compute_content_hash where - resolution_path tuple elements are hashed when not None. 
- """ - # Create diagnostic with resolution_path - diagnostic_with_path = Diagnostic( - code=DiagnosticCode.CYCLIC_REFERENCE, - message="Cyclic reference detected", - resolution_path=("message1", "message2", "message3"), - severity="error", - ) - - error1 = FrozenFluentError( - "test", - ErrorCategory.CYCLIC, - diagnostic=diagnostic_with_path, - ) - - # Create another with same path - error2 = FrozenFluentError( - "test", - ErrorCategory.CYCLIC, - diagnostic=diagnostic_with_path, - ) - - # Should have identical hashes - assert error1.content_hash == error2.content_hash - - # Create diagnostic with different resolution_path - diagnostic_different_path = Diagnostic( - code=DiagnosticCode.CYCLIC_REFERENCE, - message="Cyclic reference detected", - resolution_path=("message1", "message4", "message5"), # Different - severity="error", - ) - - error3 = FrozenFluentError( - "test", - ErrorCategory.CYCLIC, - diagnostic=diagnostic_different_path, - ) - - # Should have different hash - assert error1.content_hash != error3.content_hash - - # Create diagnostic with empty resolution_path - diagnostic_empty_path = Diagnostic( - code=DiagnosticCode.CYCLIC_REFERENCE, - message="Cyclic reference detected", - resolution_path=(), # Empty tuple - severity="error", - ) - - error4 = FrozenFluentError( - "test", - ErrorCategory.CYCLIC, - diagnostic=diagnostic_empty_path, - ) - - # Should have different hash from non-empty path - assert error1.content_hash != error4.content_hash - - # Verify integrity - assert error1.verify_integrity() is True - assert error3.verify_integrity() is True - assert error4.verify_integrity() is True - - def test_setattr_allows_python_exception_attributes(self) -> None: - """Test __setattr__ allows Python exception mechanism attributes. - - This exercises lines 254-255 in __setattr__ where Python's exception - handling attributes (__traceback__, __context__, __cause__, - __suppress_context__) are allowed even after freeze. 
- """ - error = FrozenFluentError("test", ErrorCategory.REFERENCE) - - # Python exception attributes should be settable even after freeze - # These are set by Python's exception handling mechanism - import sys # noqa: PLC0415 - import inside function - - # Create a dummy traceback by raising and catching - tb = None - try: - msg = "dummy" - raise ValueError(msg) - except ValueError: - tb = sys.exc_info()[2] - - # Should NOT raise ImmutabilityViolationError - error.__traceback__ = tb - assert error.__traceback__ is tb - - # Test __context__ (exception chaining) - context_error = ValueError("context") - error.__context__ = context_error - assert error.__context__ is context_error - - # Test __cause__ (explicit exception chaining) - cause_error = RuntimeError("cause") - error.__cause__ = cause_error - assert error.__cause__ is cause_error - - # Test __suppress_context__ - error.__suppress_context__ = True - assert error.__suppress_context__ is True - - # Verify error is still frozen for other attributes - with pytest.raises(ImmutabilityViolationError): - error._message = "modified" - - # Verify integrity is maintained - assert error.verify_integrity() is True - - def test_notes_attribute_allowed_for_python_311_compatibility(self) -> None: - """__notes__ attribute can be set for Python 3.11+ exception groups. - - Python 3.11 added __notes__ for Exception Groups (PEP 654/678). - FrozenFluentError must allow this attribute to be set even after freeze - to support exception enrichment via add_note() and exception groups. 
- """ - error = FrozenFluentError("test", ErrorCategory.RESOLUTION) - - # Simulate what Python's add_note() does internally - # (it sets __notes__ attribute if not present, then appends) - error.__notes__ = [] - error.__notes__.append("additional context") - error.__notes__.append("more info") - - # Verify notes were set - assert hasattr(error, "__notes__") - assert error.__notes__ == ["additional context", "more info"] - - # Verify error is still frozen for other attributes - with pytest.raises(ImmutabilityViolationError): - error._message = "modified" - - # Verify integrity is maintained - assert error.verify_integrity() is True - - def test_delattr_raises_immutability_violation(self) -> None: - """__delattr__ rejects all attribute deletions after construction.""" - error = FrozenFluentError("test", ErrorCategory.REFERENCE) - with pytest.raises(ImmutabilityViolationError): - del error._message - with pytest.raises(ImmutabilityViolationError): - del error._category - - def test_hash_returns_int_from_content_hash(self) -> None: - """__hash__ derives from all 16 bytes of BLAKE2b-128 content hash. - - Python's hash() protocol calls int.__hash__() on the returned integer, - reducing it via Mersenne prime modulus to a platform-sized hash value. - We verify the full 128-bit integer is used, not a truncated subset. - """ - error = FrozenFluentError("test", ErrorCategory.REFERENCE) - h = hash(error) - # __hash__ returns int.from_bytes(content_hash, "big") (all 16 bytes); - # Python's hash() then applies int.__hash__() which reduces via modulus. - # Compute the same reduction independently to verify full-hash derivation. 
- expected = hash(int.from_bytes(error.content_hash, "big")) - assert h == expected - - def test_eq_compares_content_hash_for_matching_errors(self) -> None: - """__eq__ returns True for errors with identical content.""" - error1 = FrozenFluentError("test", ErrorCategory.REFERENCE) - error2 = FrozenFluentError("test", ErrorCategory.REFERENCE) - assert error1 == error2 - - def test_eq_compares_content_hash_for_different_errors(self) -> None: - """__eq__ returns False for errors with different content.""" - error1 = FrozenFluentError("msg1", ErrorCategory.REFERENCE) - error2 = FrozenFluentError("msg2", ErrorCategory.REFERENCE) - assert error1 != error2 - - def test_convenience_properties_return_empty_without_context( - self, - ) -> None: - """Convenience properties return empty string when context is None.""" - error = FrozenFluentError("test", ErrorCategory.REFERENCE) - assert error.context is None - assert error.fallback_value == "" - assert error.input_value == "" - assert error.locale_code == "" - assert error.parse_type == "" - - def test_convenience_properties_delegate_to_context(self) -> None: - """Convenience properties return context field values when present.""" - ctx = FrozenErrorContext( - input_value="42abc", - locale_code="de_DE", - parse_type="number", - fallback_value="{!NUMBER}", - ) - error = FrozenFluentError( - "test", ErrorCategory.PARSE, context=ctx - ) - assert error.fallback_value == "{!NUMBER}" - assert error.input_value == "42abc" - assert error.locale_code == "de_DE" - assert error.parse_type == "number" - - -# ============================================================================= -# Rich Diagnostic Strategy (all optional fields) -# ============================================================================= - - -@st.composite -def rich_diagnostics(draw: st.DrawFn) -> Diagnostic: - """Generate Diagnostic objects with arbitrary optional field population. 
- - Produces diagnostics with varied combinations of span, hint, help_url, - function_name, argument_name, type info, ftl_location, severity, and - resolution_path. Provides broad input diversity for content hash and - format_error() fuzzing. - """ - code = draw(st.sampled_from(list(DiagnosticCode))) - message = draw(st.text(min_size=1, max_size=100)) - - has_span = draw(st.booleans()) - span = None - if has_span: - start = draw(st.integers(min_value=0, max_value=10000)) - end = draw( - st.integers(min_value=start, max_value=start + 1000) - ) - line = draw(st.integers(min_value=1, max_value=5000)) - column = draw(st.integers(min_value=1, max_value=200)) - span = SourceSpan( - start=start, end=end, line=line, column=column - ) - - opt_str = st.one_of(st.none(), st.text(min_size=1, max_size=80)) - hint = draw(opt_str) - help_url = draw(opt_str) - function_name = draw(opt_str) - argument_name = draw(opt_str) - expected_type = draw(opt_str) - received_type = draw(opt_str) - ftl_location = draw(opt_str) - severity: Literal["error", "warning"] = draw( - st.sampled_from(["error", "warning"]) - ) - - has_path = draw(st.booleans()) - resolution_path = None - if has_path: - path_elems = draw( - st.lists( - st.text(min_size=1, max_size=20), - min_size=0, - max_size=5, - ) - ) - resolution_path = tuple(path_elems) - - return Diagnostic( - code=code, - message=message, - span=span, - hint=hint, - help_url=help_url, - function_name=function_name, - argument_name=argument_name, - expected_type=expected_type, - received_type=received_type, - ftl_location=ftl_location, - severity=severity, - resolution_path=resolution_path, - ) - - -# ============================================================================= -# Rich Diagnostic Hash Properties (HypoFuzz) -# ============================================================================= - - -@pytest.mark.fuzz -class TestRichDiagnosticHashProperties: - """Content hash integrity with fully-populated diagnostic fields. 
- - Exercises hash computation paths for all diagnostic optional fields - (span, hint, help_url, function_name, resolution_path, etc.) - that are unreachable with the basic optional_diagnostics strategy. - """ - - @given( - message=error_messages(), - category=error_categories(), - diagnostic=rich_diagnostics(), - context=optional_contexts(), - ) - @settings(max_examples=200) - def test_hash_determinism_rich_diagnostics( - self, - message: str, - category: ErrorCategory, - diagnostic: Diagnostic, - context: FrozenErrorContext | None, - ) -> None: - """Property: Hash is deterministic with fully-populated diagnostics.""" - error1 = FrozenFluentError( - message, category, diagnostic, context - ) - error2 = FrozenFluentError( - message, category, diagnostic, context - ) - - has_span = diagnostic.span is not None - has_path = diagnostic.resolution_path is not None - n_opt = sum( - 1 - for f in ( - diagnostic.hint, - diagnostic.help_url, - diagnostic.function_name, - diagnostic.argument_name, - diagnostic.expected_type, - diagnostic.received_type, - diagnostic.ftl_location, - ) - if f is not None - ) - event(f"has_span={has_span}") - event(f"has_resolution_path={has_path}") - event(f"optional_field_count={n_opt}") - event(f"severity={diagnostic.severity}") - - assert error1.content_hash == error2.content_hash - assert error1 == error2 - assert error1.verify_integrity() - event("outcome=rich_hash_determinism") - - @given( - message=error_messages(), - category=error_categories(), - diagnostic=rich_diagnostics(), - ) - @settings(max_examples=200) - def test_rich_diagnostic_integrity( - self, - message: str, - category: ErrorCategory, - diagnostic: Diagnostic, - ) -> None: - """Property: verify_integrity() passes for all diagnostic variants.""" - error = FrozenFluentError(message, category, diagnostic) - - has_span = diagnostic.span is not None - has_path = diagnostic.resolution_path is not None - event(f"has_span={has_span}") - event(f"has_resolution_path={has_path}") - 
event(f"severity={diagnostic.severity}") - event(f"code={diagnostic.code.name}") - - assert error.verify_integrity() - assert len(error.content_hash) == 16 - event("outcome=rich_integrity_verified") - - @given( - message=error_messages(), - category=error_categories(), - diagnostic=rich_diagnostics(), - ) - @settings(max_examples=100) - def test_rich_diagnostic_repr_contains_fields( - self, - message: str, - category: ErrorCategory, - diagnostic: Diagnostic, - ) -> None: - """Property: repr includes all constructor args for rich diagnostics.""" - error = FrozenFluentError(message, category, diagnostic) - r = repr(error) - - assert "FrozenFluentError" in r - assert "message=" in r - assert "category=" in r - assert "diagnostic=" in r - event(f"code={diagnostic.code.name}") - event("outcome=rich_repr_valid") - - -# ============================================================================= -# Diagnostic format_error() Properties (codes.py ecosystem) -# ============================================================================= - -# Translate table mirroring DiagnosticFormatter._CONTROL_ESCAPE: -# maps every ASCII C0 control (0x00-0x1F) and DEL (0x7F) to a visible -# escape sequence. The four most common use conventional notation; all others -# use \xNN hex notation. This must stay in sync with formatter.py. -_TEST_CONTROL_TRANSLATE = str.maketrans( - {c: f"\\x{c:02x}" for c in range(0x20)} - | {0x7F: "\\x7f", 0x1B: "\\x1b", 0x0D: "\\r", 0x0A: "\\n", 0x09: "\\t"} -) - - -def _escape_control_chars(text: str) -> str: - """Mirror DiagnosticFormatter._escape_control_chars for test assertions.""" - return text.translate(_TEST_CONTROL_TRANSLATE) - - -@pytest.mark.fuzz -class TestDiagnosticFormatProperties: - """Property tests for Diagnostic.format_error() output correctness. - - Tests the Rust-inspired diagnostic formatting in codes.py, ensuring - all field combinations produce well-structured output. 
- """ - - @given(diagnostic=rich_diagnostics()) - @settings(max_examples=200) - def test_format_error_nonempty( - self, diagnostic: Diagnostic - ) -> None: - """Property: format_error() always returns non-empty string.""" - formatted = diagnostic.format_error() - assert isinstance(formatted, str) - assert len(formatted) > 0 - event(f"has_span={diagnostic.span is not None}") - event(f"severity={diagnostic.severity}") - event("outcome=format_nonempty") - - @given(diagnostic=rich_diagnostics()) - @settings(max_examples=200) - def test_format_error_contains_message( - self, diagnostic: Diagnostic - ) -> None: - """Property: format_error() always contains the escaped diagnostic message.""" - formatted = diagnostic.format_error() - assert _escape_control_chars(diagnostic.message) in formatted - event(f"code={diagnostic.code.name}") - event("outcome=format_contains_message") - - @given(diagnostic=rich_diagnostics()) - @settings(max_examples=200) - def test_format_error_contains_code_name( - self, diagnostic: Diagnostic - ) -> None: - """Property: format_error() always contains the diagnostic code name.""" - formatted = diagnostic.format_error() - assert diagnostic.code.name in formatted - event(f"code={diagnostic.code.name}") - - @given(diagnostic=rich_diagnostics()) - @settings(max_examples=200) - def test_format_error_severity_prefix( - self, diagnostic: Diagnostic - ) -> None: - """Property: format_error() starts with correct severity prefix.""" - formatted = diagnostic.format_error() - if diagnostic.severity == "warning": - assert formatted.startswith("warning[") - event("severity=warning") - else: - assert formatted.startswith("error[") - event("severity=error") - - @given(diagnostic=rich_diagnostics()) - @settings(max_examples=200) - def test_format_error_location_dispatch( - self, diagnostic: Diagnostic - ) -> None: - """Property: format_error() dispatches location correctly. - - Span takes precedence over ftl_location. 
When neither is - present, no location line appears. - """ - formatted = diagnostic.format_error() - if diagnostic.span is not None: - line_str = f"line {diagnostic.span.line}" - col_str = f"column {diagnostic.span.column}" - assert line_str in formatted - assert col_str in formatted - event("location=span") - elif diagnostic.ftl_location is not None: - assert _escape_control_chars(diagnostic.ftl_location) in formatted - event("location=ftl_location") - else: - assert "-->" not in formatted - event("location=none") - - @given(diagnostic=rich_diagnostics()) - @settings(max_examples=200) - def test_format_error_optional_field_inclusion( - self, diagnostic: Diagnostic - ) -> None: - """Property: format_error() includes all present optional fields.""" - formatted = diagnostic.format_error() - - if diagnostic.function_name: - escaped_fn = _escape_control_chars(diagnostic.function_name) - fn_line = f"function: {escaped_fn}" - assert fn_line in formatted - event("has_function=True") - else: - event("has_function=False") - - if diagnostic.argument_name: - escaped_arg = _escape_control_chars(diagnostic.argument_name) - arg_line = f"argument: {escaped_arg}" - assert arg_line in formatted - - if diagnostic.expected_type: - escaped_exp = _escape_control_chars(diagnostic.expected_type) - exp_line = f"expected: {escaped_exp}" - assert exp_line in formatted - - if diagnostic.received_type: - escaped_rcv = _escape_control_chars(diagnostic.received_type) - rcv_line = f"received: {escaped_rcv}" - assert rcv_line in formatted - - if diagnostic.resolution_path: - path_str = " -> ".join(diagnostic.resolution_path) - escaped_path = _escape_control_chars(path_str) - assert escaped_path in formatted - path_len = len(diagnostic.resolution_path) - event(f"path_len={path_len}") - - if diagnostic.hint: - escaped_hint = _escape_control_chars(diagnostic.hint) - hint_line = f"help: {escaped_hint}" - assert hint_line in formatted - event("has_hint=True") - else: - event("has_hint=False") - - if 
diagnostic.help_url: - escaped_url = _escape_control_chars(diagnostic.help_url) - url_line = f"note: see {escaped_url}" - assert url_line in formatted - - @given(diagnostic=rich_diagnostics()) - @settings(max_examples=100) - def test_format_error_idempotent( - self, diagnostic: Diagnostic - ) -> None: - """Property: format_error() is idempotent.""" - result1 = diagnostic.format_error() - result2 = diagnostic.format_error() - assert result1 == result2 - event(f"code={diagnostic.code.name}") - event("outcome=format_idempotent") - - -# ============================================================================= -# Hash Collision Resistance Properties (HypoFuzz) -# ============================================================================= - - -def _make_diag_with_field( - field: str, val: str -) -> Diagnostic: - """Build a Diagnostic with exactly one optional string field set.""" - return Diagnostic( - code=DiagnosticCode.FUNCTION_FAILED, - message="base", - hint=val if field == "hint" else None, - help_url=val if field == "help_url" else None, - function_name=( - val if field == "function_name" else None - ), - argument_name=( - val if field == "argument_name" else None - ), - expected_type=( - val if field == "expected_type" else None - ), - received_type=( - val if field == "received_type" else None - ), - ftl_location=( - val if field == "ftl_location" else None - ), - ) - - -_OPTIONAL_DIAG_FIELDS = [ - "hint", - "help_url", - "function_name", - "argument_name", - "expected_type", - "received_type", - "ftl_location", -] - - -@pytest.mark.fuzz -class TestHashCollisionResistanceProperties: - """Advanced collision resistance for _compute_content_hash. - - Tests three structural integrity mechanisms: - 1. Length-prefix prevents field boundary ambiguity - 2. Each optional diagnostic field independently affects hash - 3. 
Section markers prevent diagnostic/context presence collisions - """ - - @given( - a=st.text(min_size=2, max_size=50), - b=st.text(min_size=1, max_size=50), - ) - @settings(max_examples=200) - def test_length_prefix_prevents_boundary_collision( - self, a: str, b: str - ) -> None: - """Property: Shifting one char across field boundary changes hash. - - _hash_string uses 4-byte length prefix so ("ab","cd") and - ("a","bcd") produce different digests even though the raw - bytes concatenate identically without the prefix. - - Events emitted: - - a_len={n}: Length of first field - - b_len={n}: Length of second field - - outcome=length_prefix_collision_prevented - """ - # Shift last char of 'a' into 'b' - a_shifted = a[:-1] - b_shifted = a[-1] + b - - ctx1 = FrozenErrorContext( - input_value=a, - locale_code=b, - parse_type="date", - fallback_value="f", - ) - ctx2 = FrozenErrorContext( - input_value=a_shifted, - locale_code=b_shifted, - parse_type="date", - fallback_value="f", - ) - - error1 = FrozenFluentError( - "msg", ErrorCategory.REFERENCE, context=ctx1 - ) - error2 = FrozenFluentError( - "msg", ErrorCategory.REFERENCE, context=ctx2 - ) - - event(f"a_len={len(a)}") - event(f"b_len={len(b)}") - assert error1.content_hash != error2.content_hash - event("outcome=length_prefix_collision_prevented") - - @given( - field=st.sampled_from(_OPTIONAL_DIAG_FIELDS), - val1=st.text(min_size=1, max_size=50), - val2=st.text(min_size=1, max_size=50), - ) - @settings(max_examples=200) - def test_each_optional_field_affects_hash( - self, field: str, val1: str, val2: str - ) -> None: - """Property: Changing any single optional field changes the hash. - - Each of the 7 optional string fields in Diagnostic (hint, - help_url, function_name, argument_name, expected_type, - received_type, ftl_location) must independently affect the - content hash. 
- - Events emitted: - - field={name}: Which field was varied - - outcome=field_sensitivity_verified - """ - assume(val1 != val2) - event(f"field={field}") - - diag1 = _make_diag_with_field(field, val1) - diag2 = _make_diag_with_field(field, val2) - - e1 = FrozenFluentError( - "m", ErrorCategory.RESOLUTION, diagnostic=diag1 - ) - e2 = FrozenFluentError( - "m", ErrorCategory.RESOLUTION, diagnostic=diag2 - ) - - assert e1.content_hash != e2.content_hash - event("outcome=field_sensitivity_verified") - - @given( - field=st.sampled_from(_OPTIONAL_DIAG_FIELDS), - val=st.text(min_size=1, max_size=50), - ) - @settings(max_examples=100) - def test_none_vs_present_field_affects_hash( - self, field: str, val: str - ) -> None: - """Property: None vs present for any optional field changes hash. - - The hash uses b"\\x00NONE" sentinel for absent fields. A - present field must always produce a different hash than the - sentinel. - - Events emitted: - - field={name}: Which field was toggled - - outcome=none_vs_present_verified - """ - event(f"field={field}") - - diag_with = _make_diag_with_field(field, val) - diag_without = Diagnostic( - code=DiagnosticCode.FUNCTION_FAILED, - message="base", - ) - - e1 = FrozenFluentError( - "m", ErrorCategory.RESOLUTION, diagnostic=diag_with - ) - e2 = FrozenFluentError( - "m", ErrorCategory.RESOLUTION, diagnostic=diag_without - ) - - assert e1.content_hash != e2.content_hash - event("outcome=none_vs_present_verified") - - @given( - message=error_messages(), - category=error_categories(), - ) - @settings(max_examples=100) - def test_section_markers_prevent_presence_collision( - self, message: str, category: ErrorCategory - ) -> None: - """Property: All 4 diagnostic/context presence permutations differ. - - Section markers (\\x01DIAG/\\x00NODIAG, \\x01CTX/\\x00NOCTX) - ensure that errors with different combinations of diagnostic - and context presence always produce different hashes. 
- - Events emitted: - - category={name}: Error category - - outcome=section_markers_verified - """ - event(f"category={category.name}") - - diag = Diagnostic( - code=DiagnosticCode.MESSAGE_NOT_FOUND, - message="diag msg", - ) - ctx = FrozenErrorContext(input_value="ctx val") - - # All 4 presence permutations - e_nn = FrozenFluentError(message, category) - e_dn = FrozenFluentError( - message, category, diagnostic=diag - ) - e_nc = FrozenFluentError( - message, category, context=ctx - ) - e_dc = FrozenFluentError( - message, category, diagnostic=diag, context=ctx - ) - - hashes = { - e_nn.content_hash, - e_dn.content_hash, - e_nc.content_hash, - e_dc.content_hash, - } - assert len(hashes) == 4 - event("outcome=section_markers_verified") +from tests.diagnostics_frozen_error_cases.branch_coverage import * # noqa: F403 - split module reuses shared support imports +from tests.diagnostics_frozen_error_cases.core_behavior import * # noqa: F403 - split module reuses shared support imports +from tests.diagnostics_frozen_error_cases.formatting_and_hashes import * # noqa: F403 - split module reuses shared support imports diff --git a/tests/test_documentation_tooling.py b/tests/test_documentation_tooling.py index a8a69e9b..3d111e4f 100644 --- a/tests/test_documentation_tooling.py +++ b/tests/test_documentation_tooling.py @@ -6,6 +6,7 @@ import importlib import importlib.util import inspect +import json import pkgutil import re import subprocess @@ -15,6 +16,8 @@ from tempfile import TemporaryDirectory from types import ModuleType +import pytest + REPO_ROOT = Path(__file__).resolve().parent.parent SRC_ROOT = REPO_ROOT / "src" DOCUMENTED_MODULES = ( @@ -32,6 +35,7 @@ "check.sh", "scripts/validate_docs.py", "scripts/validate_version.py", + "scripts/validate-devcontainer.sh", "scripts/run_examples.py", "scripts/lint.sh", "scripts/test.sh", @@ -91,8 +95,23 @@ def _extract_signature_block(md_path: Path, section: str) -> str | None: return match.group(1).strip() if match else None 
+def _atheris_targets() -> list[tuple[str, str, str]]: + """Return the canonical Atheris target registry rows.""" + manifest = REPO_ROOT / "fuzz_atheris" / "targets.tsv" + rows: list[tuple[str, str, str]] = [] + + for raw_line in manifest.read_text(encoding="utf-8").splitlines(): + line = raw_line.strip() + if not line or line.startswith("#"): + continue + name, module, description = line.split("\t") + rows.append((name, module, description)) + + return rows + + def test_validate_docs_configuration_tracks_runnable_python_docs() -> None: - """validate_docs should know which markdown files contain runnable Python.""" + """validate_docs should know which markdown files contain runnable examples.""" validate_docs = _load_script_module( "validate_docs_script", REPO_ROOT / "scripts" / "validate_docs.py" ) @@ -110,11 +129,95 @@ def test_validate_docs_configuration_tracks_runnable_python_docs() -> None: assert "docs/QUICK_REFERENCE.md" in config.python_exec_globs assert "docs/TYPE_HINTS_GUIDE.md" in config.python_exec_globs assert "docs/VALIDATION_GUIDE.md" in config.python_exec_globs + assert "docs/WORKFLOW_TOUR.md" in config.python_exec_globs + assert "docs/FUZZING_GUIDE.md" in config.shell_exec_globs + assert "docs/FUZZING_GUIDE_ATHERIS.md" in config.shell_exec_globs + assert "docs/FUZZING_GUIDE_HYPOFUZZ.md" in config.shell_exec_globs + assert "fuzz_atheris/README.md" in config.shell_exec_globs assert ( validate_docs.validate_python_code("from ftllexengine import __version__", REPO_ROOT) is None ) assert validate_docs.validate_python_code("raise RuntimeError('boom')", REPO_ROOT) is not None + assert validate_docs.validate_shell_code("printf docs-shell-ok", REPO_ROOT, 5) is None + assert validate_docs.validate_shell_code("exit 7", REPO_ROOT, 5) is not None + + +def test_validate_docs_prefers_path_bash_for_shell_snippets(monkeypatch: pytest.MonkeyPatch) -> None: + """Shell snippet validation should prefer the PATH-resolved Bash runtime.""" + validate_docs = 
_load_script_module( + "validate_docs_shell_resolution", REPO_ROOT / "scripts" / "validate_docs.py" + ) + + monkeypatch.setenv("BASH", "/bin/bash") + monkeypatch.setenv("SHELL", "/bin/zsh") + monkeypatch.setattr( + validate_docs.shutil, + "which", + lambda name: "/opt/homebrew/bin/bash" if name == "bash" else None, + ) + + shell, preamble = validate_docs._resolve_shell_runner() + + assert shell == "/opt/homebrew/bin/bash" + assert preamble == "set -euo pipefail" + + +def test_validate_docs_normalizes_devcontainer_wrapper_inside_container( + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Host-side devcontainer wrapper snippets should collapse inside the container.""" + validate_docs = _load_script_module( + "validate_docs_devcontainer_normalization", REPO_ROOT / "scripts" / "validate_docs.py" + ) + + monkeypatch.setenv("FTLLEXENGINE_DEVCONTAINER", "1") + code = ( + "npx --yes @devcontainers/cli up --workspace-folder .\n" + "npx --yes @devcontainers/cli exec --workspace-folder . ./scripts/fuzz_atheris.sh --help\n" + ) + + normalized = validate_docs._normalize_shell_code_for_runtime(code) + + assert normalized.strip() == "./scripts/fuzz_atheris.sh --help" + + +def test_hypofuzz_deep_mode_declares_and_uses_fuzz_tooling_group() -> None: + """Deep HypoFuzz runs must provision the fuzz dependency group explicitly.""" + pyproject = tomllib.loads((REPO_ROOT / "pyproject.toml").read_text(encoding="utf-8")) + fuzz_group = pyproject["dependency-groups"]["fuzz"] + deep_mode = (REPO_ROOT / "scripts" / "lib" / "fuzz_hypofuzz" / "modes_fuzz.sh").read_text( + encoding="utf-8" + ) + + assert any(dep.startswith("hypothesis[cli]>=") for dep in fuzz_group) + assert any(dep.startswith("hypofuzz>=") for dep in fuzz_group) + assert 'uv run --group fuzz --python "$PY_VERSION"' in deep_mode + + +def test_workflow_tour_runnable_blocks_are_self_contained() -> None: + """The workflow guide's runnable Python fences should execute independently.""" + validate_docs = _load_script_module( + 
"validate_docs_workflow_tour", REPO_ROOT / "scripts" / "validate_docs.py" + ) + config = validate_docs.CheckConfig.from_pyproject(REPO_ROOT) + parser = validate_docs.get_parser(config.parser_path) + assert parser is not None + + report = validate_docs.ValidationReport(status="pass") + block_pattern = re.compile( + r"^([ \t]*)```(\S+)\n(.*?)\n\1```", re.DOTALL | re.MULTILINE | re.IGNORECASE + ) + validate_docs.process_file( + REPO_ROOT / "docs" / "WORKFLOW_TOUR.md", + REPO_ROOT, + config, + parser, + report, + block_pattern, + ) + + assert report.failures == [] def test_run_examples_registers_contracts_for_all_shipped_examples() -> None: @@ -124,17 +227,18 @@ def test_run_examples_registers_contracts_for_all_shipped_examples() -> None: ) shipped_examples = { - path.name - for path in (REPO_ROOT / "examples").glob("*.py") - if path.is_file() + path.name for path in (REPO_ROOT / "examples").glob("*.py") if path.is_file() } assert set(run_examples.EXAMPLE_CONTRACTS) == shipped_examples - assert run_examples.EXAMPLE_CONTRACTS["parser_only.py"]( - "[PASS] Warning-only validation semantics verified\n" - "[PASS] Invalid syntax semantics verified\n" - "All examples completed successfully!\n" - ) is None + assert ( + run_examples.EXAMPLE_CONTRACTS["parser_only.py"]( + "[PASS] Warning-only validation semantics verified\n" + "[PASS] Invalid syntax semantics verified\n" + "All examples completed successfully!\n" + ) + is None + ) assert run_examples.EXAMPLE_CONTRACTS["parser_only.py"]("incomplete output") is not None @@ -296,42 +400,178 @@ def test_check_script_covers_full_quality_surface() -> None: required_commands = ( "scripts/validate_version.py", + "./scripts/validate-devcontainer.sh", "scripts/validate_docs.py", "scripts/run_examples.py", "./scripts/lint.sh", "./scripts/test.sh", "./scripts/fuzz_hypofuzz.sh --preflight", "./scripts/fuzz_atheris.sh --corpus", - "./scripts/fuzz_atheris.sh graph --time", - "./scripts/fuzz_atheris.sh introspection --time", + 
"./scripts/fuzz_atheris.sh --smoke-all --time", ) for command in required_commands: assert command in text -def test_atheris_corpus_health_bootstraps_its_venv() -> None: - """Atheris corpus health should create its dedicated venv before execution.""" - text = (REPO_ROOT / "scripts" / "fuzz_atheris.sh").read_text(encoding="utf-8") - marker = "run_corpus_health() {" - assert marker in text - body = text.split(marker, 1)[1].split("}", 1)[0] +def test_lint_script_uses_explicit_validator_registry() -> None: + """lint.sh should declare its validator surface instead of discovering it by comments.""" + text = (REPO_ROOT / "scripts" / "lint.sh").read_text(encoding="utf-8") + + assert "SCRIPT_VALIDATORS=(" in text + assert "validate_pyi_sync.py" in text + assert "verify_iso4217.py" in text + assert "validate_docs.py" not in text + assert "validate_version.py" not in text + assert "@lint-plugin:" not in text + - assert "ensure_atheris_venv" in body or "run_diagnostics" in body +def test_atheris_launcher_uses_explicit_target_manifest() -> None: + """Atheris target discovery should come from one manifest, not magic headers.""" + text = ( + REPO_ROOT / "scripts" / "lib" / "fuzz_atheris" / "common.sh" + ).read_text(encoding="utf-8") + manifest_rows = _atheris_targets() + + assert "targets.tsv" in text + assert "FUZZ_PLUGIN" not in text + assert manifest_rows != [] + + for name, module, description in manifest_rows: + assert name + assert module.endswith(".py") + assert description + + +def test_atheris_launcher_pivots_into_uv_managed_atheris_env() -> None: + """Atheris native runs should use the dedicated uv-managed environment contract.""" + entrypoint = (REPO_ROOT / "scripts" / "fuzz_atheris.sh").read_text(encoding="utf-8") + common = (REPO_ROOT / "scripts" / "lib" / "fuzz_atheris" / "common.sh").read_text( + encoding="utf-8" + ) + assert 'UV_PROJECT_ENVIRONMENT="$TARGET_VENV"' in common + assert "--group dev --group atheris --locked" in common + assert 
"FTLLEXENGINE_DEVCONTAINER" in common + assert ".venv-devcontainer-atheris" in common + assert ".venv-atheris" not in common + assert 'ORIGINAL_ARGS=("$@")' in entrypoint + assert 'pivot_to_atheris_env "${ORIGINAL_ARGS[@]}"' in entrypoint -def test_atheris_bootstrap_discovers_uv_managed_python_313() -> None: - """Atheris bootstrap should recognize uv-managed Python 3.13 interpreters.""" - text = (REPO_ROOT / "scripts" / "fuzz_atheris.sh").read_text(encoding="utf-8") - assert "uv python find 3.13" in text +def test_atheris_docs_make_devcontainer_context_explicit() -> None: + """Published Atheris commands should state the required execution context.""" + guide = (REPO_ROOT / "docs" / "FUZZING_GUIDE_ATHERIS.md").read_text(encoding="utf-8") + inventory = (REPO_ROOT / "fuzz_atheris" / "README.md").read_text(encoding="utf-8") + contributing = (REPO_ROOT / "CONTRIBUTING.md").read_text(encoding="utf-8") + + assert "Inside a contributor devcontainer terminal" in guide + assert "From the host, run the same entrypoint through the devcontainer wrapper" in guide + assert "Inside a contributor devcontainer terminal" in inventory + assert "Inside a devcontainer terminal: `./scripts/fuzz_atheris.sh" in contributing + + +def test_devcontainer_declares_atheris_toolchain_contract() -> None: + """Contributor container must ship the native toolchain Atheris setup needs.""" + dockerfile = (REPO_ROOT / ".devcontainer" / "Dockerfile").read_text(encoding="utf-8") + config_json = json.loads( + (REPO_ROOT / ".devcontainer" / "devcontainer.json").read_text(encoding="utf-8") + ) + + assert "clang-19" in dockerfile + assert "libclang-rt-19-dev" in dockerfile + assert 'find "$(clang --print-resource-dir)"/lib/linux' in dockerfile + assert config_json["containerEnv"]["CLANG_BIN"] == "/usr/local/bin/clang" + assert config_json["containerEnv"]["UV_LINK_MODE"] == "copy" + + +def test_release_protocol_keeps_verification_commands_inside_devcontainer() -> None: + """Release instructions should not blur 
host and in-container verification steps.""" + text = (REPO_ROOT / "docs" / "RELEASE_PROTOCOL.md").read_text(encoding="utf-8") + + required_exec_commands = ( + "npx --yes @devcontainers/cli exec --workspace-folder . ./check.sh", + "npx --yes @devcontainers/cli exec --workspace-folder . bash -lc '", + "PY_VERSION=3.14 ./scripts/lint.sh", + "PY_VERSION=3.14 ./scripts/test.sh", + "uv run --group dev --python 3.14 python scripts/validate_docs.py", + "uv run --group dev --python 3.14 python scripts/validate_version.py", + "uv build", + ) + + for command in required_exec_commands: + assert command in text + + +def test_release_protocol_uses_clean_clone_for_container_verified_preflight() -> None: + """Release instructions should use a clone topology the devcontainer can verify.""" + text = (REPO_ROOT / "docs" / "RELEASE_PROTOCOL.md").read_text(encoding="utf-8") + + assert "Do not use `git worktree` for release pre-flight in this repository." in text + assert 'git clone --branch main "$PRIMARY_CHECKOUT" "$RELEASE_CLONE"' in text + assert ( + 'git clone --branch codex/release-bootstrap-X.Y.Z "$PRIMARY_CHECKOUT" "$RELEASE_CLONE"' + in text + ) + assert 'git remote set-url origin "$PRIMARY_ORIGIN_URL"' in text + assert "git worktree add" not in text + + +def test_release_protocol_artifact_leak_check_uses_base_tooling() -> None: + """Release instructions should not depend on undeclared grep replacements.""" + text = (REPO_ROOT / "docs" / "RELEASE_PROTOCOL.md").read_text(encoding="utf-8") + + assert 'tar -tzf "dist/ftllexengine-X.Y.Z.tar.gz" | grep -E ' in text + assert "tar -tzf" in text + assert "| rg " not in text + + +def test_atheris_inventory_readme_matches_target_manifest() -> None: + """The published Atheris inventory should stay aligned with the live target registry.""" + readme = (REPO_ROOT / "fuzz_atheris" / "README.md").read_text(encoding="utf-8") + for name, module, description in _atheris_targets(): + assert f"| `{name}` | `{module}` | {description} |" in readme + 
+ +def test_shell_gates_use_devcontainer_scoped_venv_names() -> None: + """Container-run shell gates should not reuse host `.venv-*` paths.""" + script_paths = ( + REPO_ROOT / "check.sh", + REPO_ROOT / "scripts" / "lint.sh", + REPO_ROOT / "scripts" / "test.sh", + REPO_ROOT / "scripts" / "fuzz_hypofuzz.sh", + REPO_ROOT / "scripts" / "benchmark.sh", + ) + + for path in script_paths: + text = path.read_text(encoding="utf-8") + assert "FTLLEXENGINE_DEVCONTAINER" in text + assert ".venv-devcontainer-" in text + + +def test_shell_gates_default_uv_link_mode_for_devcontainer_reuse() -> None: + """Container-owned shell gates should force copy mode for reused devcontainers.""" + script_paths = ( + REPO_ROOT / "check.sh", + REPO_ROOT / "scripts" / "lint.sh", + REPO_ROOT / "scripts" / "test.sh", + REPO_ROOT / "scripts" / "fuzz_hypofuzz.sh", + REPO_ROOT / "scripts" / "benchmark.sh", + REPO_ROOT / "scripts" / "lib" / "fuzz_atheris" / "common.sh", + ) + + for path in script_paths: + text = path.read_text(encoding="utf-8") + assert 'FTLLEXENGINE_DEVCONTAINER:-}" == "1"' in text + assert 'export UV_LINK_MODE="copy"' in text -def test_atheris_bootstrap_recreates_broken_venv_dirs() -> None: - """Atheris bootstrap should discard stale venv directories with broken Python links.""" - text = (REPO_ROOT / "scripts" / "fuzz_atheris.sh").read_text(encoding="utf-8") +def test_test_sh_executes_pytest_via_explicit_uv_command() -> None: + """test.sh should run pytest through the explicit uv execution path.""" + text = (REPO_ROOT / "scripts" / "test.sh").read_text(encoding="utf-8") - assert '[[ -d "$ATHERIS_VENV" ]] && [[ ! 
-x "$ATHERIS_PYTHON" ]]' in text + assert 'declare -a CMD=("uv" "run" "--python" "$PY_VERSION" "pytest")' in text + assert 'exec uv run --python "$PY_VERSION" "${BASH:-bash}" "$0" "$@"' not in text def test_reference_signature_parameter_names_match_live_exports() -> None: @@ -373,14 +613,10 @@ def test_reference_signature_parameter_names_match_live_exports() -> None: if name != "self" ] live_params = [ - param.name - for param in signature.parameters.values() - if param.name != "self" + param.name for param in signature.parameters.values() if param.name != "self" ] if live_params != doc_params: - issues.append( - f"{route_name}: live={live_params!r} doc={doc_params!r}" - ) + issues.append(f"{route_name}: live={live_params!r} doc={doc_params!r}") assert issues == [] diff --git a/tests/test_init_module.py b/tests/test_init_module.py index 3945bd5c..357e370e 100644 --- a/tests/test_init_module.py +++ b/tests/test_init_module.py @@ -14,10 +14,11 @@ from __future__ import annotations import sys -from collections.abc import Iterator +from collections.abc import Callable, Iterator from contextlib import contextmanager from importlib.metadata import PackageNotFoundError from types import ModuleType +from typing import cast from unittest.mock import MagicMock, patch import pytest @@ -204,15 +205,29 @@ class TestParserOnlyFacadeBehavior: """Parser-only installs keep zero-dependency exports while gating Babel-backed facades.""" def test_direct_optional_attribute_access_provides_install_guidance(self) -> None: - """Direct optional attribute access raises AttributeError with install guidance.""" - with ( - _fresh_ftl_import(block_babel=True) as ftllexengine, - pytest.raises( - AttributeError, - match=r"FluentBundle requires the full runtime install.*pip install ftllexengine\[babel\]", - ), - ): - _ = ftllexengine.FluentBundle + """Direct optional attribute access returns a stub that raises on first use.""" + with _fresh_ftl_import(block_babel=True) as ftllexengine: + from 
ftllexengine.core.babel_compat import BabelImportError + + fluent_bundle_cls = cast("Callable[..., object]", ftllexengine.FluentBundle) + with pytest.raises( + BabelImportError, + match=r"FluentBundle requires Babel.*pip install ftllexengine\[babel\]", + ): + fluent_bundle_cls("en_US") + + def test_from_import_optional_symbol_provides_install_guidance(self) -> None: + """Explicit from-import keeps the actionable Babel guidance on first use.""" + with _fresh_ftl_import(block_babel=True): + from ftllexengine.core.babel_compat import BabelImportError + + module = __import__("ftllexengine", fromlist=["FluentBundle"]) + fluent_bundle_cls = cast("Callable[..., object]", module.FluentBundle) + with pytest.raises( + BabelImportError, + match=r"FluentBundle requires Babel.*pip install ftllexengine\[babel\]", + ): + fluent_bundle_cls("en_US") def test_zero_dependency_root_symbols_remain_accessible_without_babel(self) -> None: """Parser-only installs still expose zero-dependency root helpers.""" @@ -250,30 +265,27 @@ def test_runtime_and_localization_facades_stay_partially_available_without_babel assert "CacheAuditLogEntry" in localization.__all__ assert localization.PathResourceLoader.__name__ == "PathResourceLoader" - def test_parser_only_feature_probing_treats_optional_names_as_absent(self) -> None: - """hasattr/getattr(default) treat Babel-backed names as absent in parser-only mode.""" + def test_parser_only_feature_probing_uses_is_babel_available(self) -> None: + """Parser-only callers use is_babel_available() instead of hasattr on optional names.""" with _fresh_ftl_import(block_babel=True) as ftllexengine: from ftllexengine import localization, runtime - assert hasattr(ftllexengine, "FluentBundle") is False - assert getattr(ftllexengine, "FluentBundle", None) is None - - assert hasattr(runtime, "number_format") is False - assert getattr(runtime, "number_format", None) is None - - assert hasattr(localization, "FluentLocalization") is False - assert getattr(localization, 
"FluentLocalization", None) is None + assert ftllexengine.is_babel_available() is False + assert callable(ftllexengine.FluentBundle) + assert callable(runtime.number_format) + assert callable(localization.FluentLocalization) def test_parser_only_runtime_formatter_access_still_gives_install_hint(self) -> None: - """Direct runtime formatter access raises AttributeError with install guidance.""" + """Direct runtime formatter access returns a stub that raises on first call.""" with _fresh_ftl_import(block_babel=True): from ftllexengine import runtime + from ftllexengine.core.babel_compat import BabelImportError with pytest.raises( - AttributeError, - match=r"number_format requires the full runtime install.*pip install ftllexengine\[babel\]", + BabelImportError, + match=r"number_format requires Babel.*pip install ftllexengine\[babel\]", ): - _ = runtime.number_format + runtime.number_format(1, "en_US") def test_internal_runtime_import_failure_is_not_masked_as_missing_babel(self) -> None: """A broken runtime import must surface its real error instead of a Babel hint.""" @@ -325,19 +337,23 @@ def test_unknown_attribute_not_in_optional_attrs(self) -> None: class TestOptionalExportHelper: """Direct tests for the optional-export helper branches.""" - def test_helper_without_parser_only_hint_raises_plain_attribute_error(self) -> None: - """Optional symbols raise AttributeError outside import machinery.""" + def test_helper_without_parser_only_hint_returns_raising_placeholder(self) -> None: + """Optional symbols resolve to a placeholder that raises BabelImportError on use.""" from ftllexengine._optional_exports import raise_missing_babel_symbol + from ftllexengine.core.babel_compat import BabelImportError + + placeholder = raise_missing_babel_symbol( + module_name="ftllexengine.runtime", + name="FluentBundle", + optional_attrs=frozenset({"FluentBundle"}), + ) + placeholder_cls = cast("Callable[..., object]", placeholder) with pytest.raises( - AttributeError, - 
match=r"FluentBundle requires the full runtime install.*pip install ftllexengine\[babel\]", + BabelImportError, + match=r"FluentBundle requires Babel.*pip install ftllexengine\[babel\]", ): - raise_missing_babel_symbol( - module_name="ftllexengine.runtime", - name="FluentBundle", - optional_attrs=frozenset({"FluentBundle"}), - ) + placeholder_cls("en_US") def test_unknown_facade_contract_raises_key_error(self) -> None: """Unknown facade names fail fast instead of fabricating empty optional exports.""" diff --git a/tests/test_integration_e2e.py b/tests/test_integration_e2e.py index 896770da..5b6bbb92 100644 --- a/tests/test_integration_e2e.py +++ b/tests/test_integration_e2e.py @@ -1,1059 +1,8 @@ -"""End-to-end tests for parse->format workflow integration. - -Tests the complete pipeline from FTL source to formatted output: -- Parse FTL source with parse_ftl() -- Add to FluentBundle via add_resource() -- Format with format_pattern() -- Verify round-trip produces expected results - -These tests validate that parsing and formatting work together correctly -as an integrated system, not just as isolated components. - -Note: "Bidirectional" refers to the two-way workflow (parse->format), not -bidirectional text handling or currency/number parsing from strings. 
- -Structure: - - TestParseFormatBasic: Essential round-trip tests (run in every CI build) - - TestParseFormatWithVariables: Variable interpolation round-trips - - TestParseFormatSelectExpressions: Select expression round-trips - - TestParseFormatReferences: Message/term reference round-trips - - TestParseFormatEdgeCases: Edge cases and unicode handling - - TestParseFormatWithFunctions: Built-in function integration - - TestParseFormatErrorHandling: Error paths in integration - - TestParseFormatIntrospection: Introspection API integration - - TestParseFormatValidation: Validation API integration - - TestParseFormatWithCache: Caching behavior integration - - TestParseFormatIsolation: Unicode isolation mark behavior - - TestSerializeParseRoundtrip: AST serialization round-trips - - TestMultiModuleIntegration: parse->validate->serialize->introspect pipeline - - TestValidationRuntimeConsistency: validation warnings predict runtime failures -""" - -from __future__ import annotations - -from datetime import UTC, datetime -from decimal import Decimal - -import pytest - -from ftllexengine import ( - FluentBundle, - parse_ftl, - serialize_ftl, -) -from ftllexengine.constants import MAX_DEPTH -from ftllexengine.diagnostics import DiagnosticCode, ErrorCategory, FrozenFluentError -from ftllexengine.introspection import introspect_message -from ftllexengine.runtime.cache_config import CacheConfig -from ftllexengine.syntax.ast import Junk, Message, NumberLiteral, Term -from ftllexengine.syntax.parser import FluentParserV1 -from ftllexengine.syntax.serializer import serialize -from ftllexengine.validation.resource import validate_resource - -# ============================================================================= -# Essential Parse->Format Tests (Run in every CI build) -# ============================================================================= - - -class TestParseFormatBasic: - """Essential tests for parse->format round-trip.""" - - def 
test_simple_message_roundtrip(self) -> None: - """Simple message parses and formats correctly.""" - ftl_source = "hello = Hello, World!" - - # Verify parsing produces valid AST - resource = parse_ftl(ftl_source) - assert len(resource.entries) == 1 - assert isinstance(resource.entries[0], Message) - - # Verify formatting produces expected output - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, errors = bundle.format_pattern("hello") - assert result == "Hello, World!" - assert len(errors) == 0 - - def test_multiple_messages_roundtrip(self) -> None: - """Multiple messages parse and format correctly.""" - ftl_source = """ -msg1 = First message -msg2 = Second message -msg3 = Third message -""" - # Verify parsing - resource = parse_ftl(ftl_source) - messages = [e for e in resource.entries if isinstance(e, Message)] - assert len(messages) == 3 - - # Verify formatting - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result1, _ = bundle.format_pattern("msg1") - result2, _ = bundle.format_pattern("msg2") - result3, _ = bundle.format_pattern("msg3") - - assert result1 == "First message" - assert result2 == "Second message" - assert result3 == "Third message" - - def test_multiline_pattern_roundtrip(self) -> None: - """Multiline patterns parse and format correctly.""" - ftl_source = """ -multi = First line - Second line - Third line -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, errors = bundle.format_pattern("multi") - assert "First line" in result - assert "Second line" in result - assert "Third line" in result - assert len(errors) == 0 - - def test_message_with_attribute_roundtrip(self) -> None: - """Messages with attributes parse and format correctly.""" - ftl_source = """ -button = Click here - .accesskey = C - .title = Submit form -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) 
- - # Format main value - result, _ = bundle.format_pattern("button") - assert result == "Click here" - - # Format attributes using the attribute parameter - accesskey, _ = bundle.format_pattern("button", attribute="accesskey") - title, _ = bundle.format_pattern("button", attribute="title") - - assert accesskey == "C" - assert title == "Submit form" - - def test_term_roundtrip(self) -> None: - """Terms parse and format correctly.""" - ftl_source = """ --brand = Firefox --version = 120.0 -about = { -brand } v{ -version } -""" - resource = parse_ftl(ftl_source) - terms = [e for e in resource.entries if isinstance(e, Term)] - assert len(terms) == 2 - - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("about") - assert result == "Firefox v120.0" - - -class TestParseFormatWithVariables: - """Tests for parse->format with variable interpolation.""" - - def test_single_variable_roundtrip(self) -> None: - """Single variable interpolation works correctly.""" - ftl_source = "greeting = Hello, { $name }!" - - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, errors = bundle.format_pattern("greeting", {"name": "Alice"}) - assert result == "Hello, Alice!" - assert len(errors) == 0 - - def test_multiple_variables_roundtrip(self) -> None: - """Multiple variables interpolate correctly.""" - ftl_source = "user = { $firstName } { $lastName } ({ $role })" - - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern( - "user", - {"firstName": "John", "lastName": "Doe", "role": "Admin"}, - ) - assert result == "John Doe (Admin)" - - def test_number_variable_roundtrip(self) -> None: - """Number variables format correctly.""" - ftl_source = "count = You have { $n } items." 
- - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("count", {"n": 42}) - assert "42" in result - - def test_decimal_variable_roundtrip(self) -> None: - """Decimal variables format correctly.""" - ftl_source = "price = Total: { $amount }" - - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("price", {"amount": Decimal("19.99")}) - assert "19.99" in result - - def test_missing_variable_fallback(self) -> None: - """Missing variables produce fallback with error.""" - ftl_source = "greeting = Hello, { $name }!" - - bundle = FluentBundle("en-US", strict=False, use_isolating=False) - bundle.add_resource(ftl_source) - - result, errors = bundle.format_pattern("greeting") - assert "Hello" in result - assert len(errors) > 0 # Should report missing variable - - -class TestParseFormatSelectExpressions: - """Tests for parse->format with select expressions.""" - - def test_simple_select_roundtrip(self) -> None: - """Simple select expression resolves correctly.""" - ftl_source = """ -items = { $count -> - [one] One item - *[other] { $count } items -} -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result_one, _ = bundle.format_pattern("items", {"count": 1}) - result_many, _ = bundle.format_pattern("items", {"count": 5}) - - assert result_one == "One item" - assert "5" in result_many - assert "items" in result_many - - def test_string_selector_roundtrip(self) -> None: - """String selector in select expression works correctly.""" - ftl_source = """ -status = { $state -> - [active] Currently active - [inactive] Not active - *[unknown] Status unknown -} -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - active, _ = bundle.format_pattern("status", {"state": "active"}) - inactive, _ = bundle.format_pattern("status", {"state": "inactive"}) - 
other, _ = bundle.format_pattern("status", {"state": "foo"}) - - assert active == "Currently active" - assert inactive == "Not active" - assert other == "Status unknown" - - def test_nested_select_roundtrip(self) -> None: - """Nested select expressions resolve correctly.""" - ftl_source = """ -response = { $gender -> - [male] { $count -> - [one] He has one item - *[other] He has { $count } items - } - *[other] { $count -> - [one] They have one item - *[other] They have { $count } items - } -} -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("response", {"gender": "male", "count": 1}) - assert "He has one item" in result - - def test_number_literal_variant_roundtrip(self) -> None: - """Number literal variants in select expressions work correctly.""" - ftl_source = """ -rating = { $stars -> - [1] Poor - [2] Fair - [3] Good - [4] Great - [5] Excellent - *[other] Unknown -} -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("rating", {"stars": 5}) - assert result == "Excellent" - - -class TestParseFormatReferences: - """Tests for parse->format with message and term references.""" - - def test_message_reference_roundtrip(self) -> None: - """Message references resolve correctly.""" - ftl_source = """ -base = World -greeting = Hello, { base }! -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, errors = bundle.format_pattern("greeting") - assert result == "Hello, World!" 
- assert len(errors) == 0 - - def test_chained_reference_roundtrip(self) -> None: - """Chained message references resolve correctly.""" - ftl_source = """ -level1 = Core -level2 = { level1 } Extended -level3 = { level2 } Final -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("level3") - assert result == "Core Extended Final" - - def test_term_reference_roundtrip(self) -> None: - """Term references resolve correctly.""" - ftl_source = """ --brand = Firefox -download = Download { -brand } now! -about = About { -brand } -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - download, _ = bundle.format_pattern("download") - about, _ = bundle.format_pattern("about") - - assert "Firefox" in download - assert "Firefox" in about - - def test_term_attribute_reference_roundtrip(self) -> None: - """Term attribute references resolve correctly.""" - ftl_source = """ --brand = Firefox - .short = Fx -full = { -brand } -short = { -brand.short } -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - full, _ = bundle.format_pattern("full") - short, _ = bundle.format_pattern("short") - - assert full == "Firefox" - assert short == "Fx" - - def test_term_with_arguments_roundtrip(self) -> None: - """Term references with arguments resolve correctly.""" - ftl_source = """ --brand = { $case -> - [nominative] Firefox - [genitive] Firefoxu - *[other] Firefox -} -download = Download { -brand(case: "nominative") } -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("download") - assert "Firefox" in result - - -class TestParseFormatEdgeCases: - """Tests for edge cases and unicode handling.""" - - def test_unicode_content_roundtrip(self) -> None: - """Unicode content parses and formats correctly.""" - ftl_source = "greeting = Sveiki, pasaule!" 
- - bundle = FluentBundle("lv-LV", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("greeting") - assert result == "Sveiki, pasaule!" - - def test_emoji_content_roundtrip(self) -> None: - """Emoji content parses and formats correctly.""" - ftl_source = "welcome = Welcome! \U0001F44B" - - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("welcome") - assert "\U0001F44B" in result - - def test_cjk_content_roundtrip(self) -> None: - """CJK (Japanese) content in pattern values parses and formats correctly.""" - ftl_source = "hello = \u3053\u3093\u306b\u3061\u306f\u4e16\u754c" - - bundle = FluentBundle("ja-JP", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("hello") - assert "\u3053\u3093\u306b\u3061\u306f" in result - - def test_arabic_content_roundtrip(self) -> None: - """Arabic RTL script in pattern values parses and formats correctly.""" - ftl_source = "greeting = \u0645\u0631\u062d\u0628\u0627" - - bundle = FluentBundle("ar-SA", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("greeting") - assert "\u0645\u0631\u062d\u0628\u0627" in result - - def test_hebrew_content_roundtrip(self) -> None: - """Hebrew RTL script in pattern values parses and formats correctly.""" - ftl_source = "greeting = \u05e9\u05b8\u05dc\u05d5\u05b9\u05dd" - - bundle = FluentBundle("he-IL", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("greeting") - assert "\u05e9\u05b8\u05dc\u05d5\u05b9\u05dd" in result - - def test_backslash_in_text_roundtrip(self) -> None: - """Backslash in text (not StringLiteral) is preserved as-is per Fluent spec.""" - ftl_source = r"path = C:\Users\file.txt" - - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("path") - assert "\\" in result - 
assert "Users" in result - - def test_literal_brace_via_string_literal(self) -> None: - """Literal braces via StringLiteral placeable.""" - ftl_source = 'json = { "{" }key{ "}" }' - - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("json") - assert "{" in result - assert "}" in result - - def test_empty_pattern_roundtrip(self) -> None: - """Empty pattern value handled correctly.""" - ftl_source = """ -msg = - .attr = Has attribute -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - # Main value is empty - result, errors = bundle.format_pattern("msg") - - assert not errors - assert isinstance(result, str) - - # Attribute should work - attr, _ = bundle.format_pattern("msg", attribute="attr") - assert attr == "Has attribute" - - def test_whitespace_preservation_roundtrip(self) -> None: - """Significant whitespace in patterns is preserved.""" - ftl_source = "spaced = Hello World" - - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("spaced") - assert " " in result - - -class TestParseFormatWithFunctions: - """Tests for parse->format with built-in functions.""" - - def test_number_function_roundtrip(self) -> None: - """NUMBER function formats correctly.""" - ftl_source = "amount = { NUMBER($value, minimumFractionDigits: 2) }" - - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("amount", {"value": Decimal("19.99")}) - assert "19.99" in result or "19,99" in result - - def test_datetime_function_roundtrip(self) -> None: - """DATETIME function formats correctly.""" - ftl_source = 'date = Date: { DATETIME($when, dateStyle: "short") }' - - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern( - "date", {"when": datetime(2024, 1, 15, 
tzinfo=UTC)} - ) - assert "1" in result or "2024" in result - - def test_custom_function_roundtrip(self) -> None: - """Custom functions work in parse->format workflow.""" - ftl_source = "msg = Result: { DOUBLE($n) }" - - bundle = FluentBundle("en-US", use_isolating=False) - - def double_func(n: int | Decimal) -> str: - return str(n * 2) - - bundle.add_function("DOUBLE", double_func) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("msg", {"n": 21}) - assert "42" in result - - -class TestParseFormatErrorHandling: - """Tests for error handling in parse->format workflow.""" - - def test_missing_message_returns_fallback(self) -> None: - """Missing message returns fallback string with error.""" - ftl_source = "hello = Hello!" - - bundle = FluentBundle("en-US", strict=False, use_isolating=False) - bundle.add_resource(ftl_source) - - result, errors = bundle.format_pattern("nonexistent") - assert "{nonexistent}" in result - assert len(errors) == 1 - assert isinstance(errors[0], FrozenFluentError) - assert errors[0].category == ErrorCategory.REFERENCE - - def test_missing_attribute_returns_fallback(self) -> None: - """Missing attribute returns fallback string with error.""" - ftl_source = """ -button = Click - .title = Button title -""" - bundle = FluentBundle("en-US", strict=False, use_isolating=False) - bundle.add_resource(ftl_source) - - _, errors = bundle.format_pattern("button", attribute="nonexistent") - assert len(errors) == 1 - - def test_invalid_ftl_produces_junk(self) -> None: - """Invalid FTL syntax produces Junk entry.""" - ftl_source = "invalid = { unclosed" - - resource = parse_ftl(ftl_source) - assert any(isinstance(e, Junk) for e in resource.entries) - - def test_resolution_error_propagates(self) -> None: - """Resolution errors are captured and returned.""" - ftl_source = """ -msg = { missing-ref } -""" - bundle = FluentBundle("en-US", strict=False, use_isolating=False) - bundle.add_resource(ftl_source) - - _, errors = 
bundle.format_pattern("msg") - assert len(errors) > 0 - - -class TestParseFormatIntrospection: - """Tests for introspection API in parse->format workflow.""" - - def test_has_message_after_parse(self) -> None: - """has_message() works correctly after parsing.""" - ftl_source = """ -hello = Hello -world = World -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - assert bundle.has_message("hello") is True - assert bundle.has_message("world") is True - assert bundle.has_message("nonexistent") is False - - def test_has_attribute_after_parse(self) -> None: - """has_attribute() works correctly after parsing.""" - ftl_source = """ -button = Click - .title = Title - .accesskey = A -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - assert bundle.has_attribute("button", "title") is True - assert bundle.has_attribute("button", "accesskey") is True - assert bundle.has_attribute("button", "nonexistent") is False - - def test_get_message_ids_after_parse(self) -> None: - """get_message_ids() returns all parsed message IDs.""" - ftl_source = """ -msg1 = First -msg2 = Second -msg3 = Third -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - ids = bundle.get_message_ids() - assert "msg1" in ids - assert "msg2" in ids - assert "msg3" in ids - assert len(ids) == 3 - - def test_get_message_variables_after_parse(self) -> None: - """get_message_variables() extracts variables from parsed message.""" - ftl_source = "greeting = Hello, { $name }! You have { $count } items." 
- - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - variables = bundle.get_message_variables("greeting") - assert "name" in variables - assert "count" in variables - assert len(variables) == 2 - - def test_introspect_message_after_parse(self) -> None: - """introspect_message() provides detailed info after parsing.""" - ftl_source = """ -msg = Hello, { $name }! -select-msg = { $count -> - [one] One item - *[other] { $count } items -} -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - # Introspect simple message - info = bundle.introspect_message("msg") - assert info.message_id == "msg" - assert "name" in info.get_variable_names() - assert info.has_selectors is False - - # Introspect message with select expression - select_info = bundle.introspect_message("select-msg") - assert select_info.message_id == "select-msg" - assert "count" in select_info.get_variable_names() - assert select_info.has_selectors is True - - -class TestParseFormatValidation: - """Tests for validation API in parse->format workflow.""" - - def test_validate_resource_valid_ftl(self) -> None: - """validate_resource() accepts valid FTL.""" - ftl_source = """ -hello = Hello, World! -greeting = Hello, { $name }! 
-""" - bundle = FluentBundle("en-US", use_isolating=False) - result = bundle.validate_resource(ftl_source) - - assert result.is_valid is True - assert len(result.errors) == 0 - - def test_validate_resource_invalid_ftl(self) -> None: - """validate_resource() rejects invalid FTL.""" - ftl_source = "invalid = { unclosed" - - bundle = FluentBundle("en-US", use_isolating=False) - result = bundle.validate_resource(ftl_source) - - assert result.is_valid is False - assert len(result.errors) > 0 - - -class TestParseFormatWithCache: - """Tests for caching behavior in parse->format workflow.""" - - def test_cache_enabled_improves_repeated_calls(self) -> None: - """Cache improves performance on repeated format calls.""" - ftl_source = "msg = Hello, { $name }!" - - bundle = FluentBundle("en-US", use_isolating=False, cache=CacheConfig()) - bundle.add_resource(ftl_source) - - # First call - cache miss - result1, _ = bundle.format_pattern("msg", {"name": "Alice"}) - - # Second call with same args - cache hit - result2, _ = bundle.format_pattern("msg", {"name": "Alice"}) - - assert result1 == result2 - - stats = bundle.get_cache_stats() - assert stats is not None - assert stats["hits"] >= 1 - - def test_cache_stats_available_when_enabled(self) -> None: - """Cache statistics are available when caching enabled.""" - ftl_source = "msg = Hello!" - - bundle = FluentBundle("en-US", use_isolating=False, cache=CacheConfig()) - bundle.add_resource(ftl_source) - - bundle.format_pattern("msg") - - stats = bundle.get_cache_stats() - assert stats is not None - assert "hits" in stats - assert "misses" in stats - - def test_cache_stats_none_when_disabled(self) -> None: - """Cache statistics are None when caching disabled.""" - ftl_source = "msg = Hello!" 
- - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - bundle.format_pattern("msg") - - stats = bundle.get_cache_stats() - assert stats is None - - def test_clear_cache_preserves_stats(self) -> None: - """clear_cache() clears entries but metrics are cumulative (not reset).""" - ftl_source = "msg = Hello!" - - bundle = FluentBundle("en-US", use_isolating=False, cache=CacheConfig()) - bundle.add_resource(ftl_source) - - bundle.format_pattern("msg") # miss - bundle.format_pattern("msg") # hit - - bundle.clear_cache() - bundle.format_pattern("msg") # miss (entries cleared, not metrics) - - stats = bundle.get_cache_stats() - assert stats is not None - # 1 pre-clear miss + 1 post-clear miss = 2 cumulative misses - assert stats["misses"] == 2 - - -class TestParseFormatIsolation: - """Tests for Unicode bidi isolation in parse->format workflow.""" - - def test_use_isolating_true_adds_marks(self) -> None: - """use_isolating=True wraps placeables in bidi isolation marks.""" - ftl_source = "msg = Hello, { $name }!" - - bundle = FluentBundle("en-US", use_isolating=True) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("msg", {"name": "World"}) - - # Should contain FSI (First Strong Isolate) and PDI (Pop Directional Isolate) - assert "\u2068" in result - assert "\u2069" in result - - def test_use_isolating_false_no_marks(self) -> None: - """use_isolating=False does not add bidi isolation marks.""" - ftl_source = "msg = Hello, { $name }!" 
- - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - result, _ = bundle.format_pattern("msg", {"name": "World"}) - - # Should NOT contain isolation marks - assert "\u2068" not in result - assert "\u2069" not in result - - -class TestCommentPreservation: - """Tests for comment handling in parse->format.""" - - def test_comments_dont_affect_formatting(self) -> None: - """Comments in FTL don't affect message formatting.""" - ftl_source = """ -# This is a comment -## Group comment -### Resource comment -hello = Hello! -# Another comment -world = World! -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl_source) - - hello, _ = bundle.format_pattern("hello") - world, _ = bundle.format_pattern("world") - - assert hello == "Hello!" - assert world == "World!" - - -# ============================================================================= -# Intensive Round-trip Tests (Fuzz-marked, run with pytest -m fuzz) -# ============================================================================= - - -class TestSerializeParseRoundtrip: - """Example-based tests for AST serialization round-trips.""" - - def test_serialize_parse_simple_message(self) -> None: - """Serialize->parse round-trip preserves simple messages.""" - ftl_source = "hello = Hello, World!" - - resource = parse_ftl(ftl_source) - serialized = serialize_ftl(resource) - resource2 = parse_ftl(serialized) - - assert len(resource.entries) == len(resource2.entries) - - def test_serialize_parse_with_variables(self) -> None: - """Serialize->parse round-trip preserves variables.""" - ftl_source = "greeting = Hello, { $name }!" 
- - resource = parse_ftl(ftl_source) - serialized = serialize_ftl(resource) - - bundle1 = FluentBundle("en-US", use_isolating=False) - bundle1.add_resource(ftl_source) - - bundle2 = FluentBundle("en-US", use_isolating=False) - bundle2.add_resource(serialized) - - result1, _ = bundle1.format_pattern("greeting", {"name": "Test"}) - result2, _ = bundle2.format_pattern("greeting", {"name": "Test"}) - - assert result1 == result2 - - def test_serialize_preserves_select_expressions(self) -> None: - """Serialize->parse preserves select expression structure.""" - ftl_source = """ -count = { $n -> - [one] One - *[other] Many -} -""" - resource = parse_ftl(ftl_source) - serialized = serialize_ftl(resource) - - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(serialized) - - one, _ = bundle.format_pattern("count", {"n": 1}) - many, _ = bundle.format_pattern("count", {"n": 5}) - - assert "One" in one - assert "Many" in many - - def test_serialize_preserves_term_attributes(self) -> None: - """Serialize->parse preserves term attributes.""" - ftl_source = """ --brand = Firefox - .short = Fx - .full = Mozilla Firefox -msg = { -brand.short } -""" - resource = parse_ftl(ftl_source) - serialized = serialize_ftl(resource) - - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(serialized) - - result, _ = bundle.format_pattern("msg") - assert "Fx" in result - - def test_serialize_preserves_message_attributes(self) -> None: - """Serialize->parse preserves message attributes.""" - ftl_source = """ -button = Click me - .accesskey = C - .title = Submit -""" - resource = parse_ftl(ftl_source) - serialized = serialize_ftl(resource) - - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(serialized) - - accesskey, _ = bundle.format_pattern("button", attribute="accesskey") - title, _ = bundle.format_pattern("button", attribute="title") - - assert accesskey == "C" - assert title == "Submit" - - -# 
============================================================================= -# Multi-Module Pipeline Tests -# ============================================================================= - - -class TestMultiModuleIntegration: - """Integration tests exercising parse->validate->serialize->introspect pipeline.""" - - def test_parse_validate_serialize_roundtrip(self) -> None: - """Complete roundtrip: parse -> validate -> serialize -> re-parse preserves structure.""" - ftl = """ -msg = Hello { $name } - .title = Title - --brand = Firefox - -plural = { $count -> - [one] One item - *[other] { $count } items -} -""" - parser = FluentParserV1() - resource = parser.parse(ftl) - - result = validate_resource(ftl) - assert result.is_valid - - serialized = serialize(resource) - resource2 = parser.parse(serialized) - - assert len(resource2.entries) == len(resource.entries) - - def test_introspect_complex_message(self) -> None: - """Introspect message with select expression, term reference, and function call.""" - ftl = """ -complex = { NUMBER($count) -> - [one] { -brand } has { $count } item - *[other] { -brand } has { NUMBER($count) } items -} - .hint = { $hint } -""" - parser = FluentParserV1() - resource = parser.parse(ftl) - - msg = resource.entries[0] - assert isinstance(msg, Message) - - info = introspect_message(msg) - - var_names = {v.name for v in info.variables} - func_names = {f.name for f in info.functions} - assert "count" in var_names - assert "hint" in var_names - assert info.has_selectors - assert "NUMBER" in func_names - - -class TestValidationRuntimeConsistency: - """Validation warnings predict runtime resolution failures.""" - - def test_chain_depth_warning_matches_runtime_error(self) -> None: - """VALIDATION_CHAIN_DEPTH_EXCEEDED warning implies MAX_DEPTH_EXCEEDED at runtime.""" - chain_length = MAX_DEPTH + 5 - messages = ["msg-0 = Base"] - for i in range(1, chain_length): - messages.append(f"msg-{i} = {{ msg-{i - 1} }}") - - ftl_source = "\n".join(messages) 
- - result = validate_resource(ftl_source) - has_chain_warning = any( - w.code == DiagnosticCode.VALIDATION_CHAIN_DEPTH_EXCEEDED - for w in result.warnings - ) - assert has_chain_warning - - bundle = FluentBundle("en", strict=False) - bundle.add_resource(ftl_source) - _, errors = bundle.format_pattern(f"msg-{chain_length - 1}") - has_depth_error = any( - e.diagnostic is not None - and e.diagnostic.code.name == "MAX_DEPTH_EXCEEDED" - for e in errors - ) - assert has_depth_error - - -# ============================================================================= -# NumberLiteral Invariant and Roundtrip -# ============================================================================= - - -class TestNumberLiteralInvariant: - """NumberLiteral enforces raw/value consistency and rejects bool.""" - - def test_bool_value_rejected(self) -> None: - """NumberLiteral rejects bool for value (bool is int subclass, not a number literal).""" - with pytest.raises(TypeError, match="must be int or Decimal, not bool"): - NumberLiteral(value=True, raw="1") - - def test_raw_value_inconsistency_rejected(self) -> None: - """NumberLiteral rejects raw that parses to a different value than the value field.""" - with pytest.raises(ValueError, match=r"parses to.*but value is"): - NumberLiteral(value=Decimal("1.5"), raw="9.9") - - def test_integer_variant_key_exact_match_roundtrip(self) -> None: - """Integer number variant keys select the correct variant.""" - ftl = """ -rating = { $stars -> - [1] Poor - [3] Good - [5] Excellent - *[other] Unknown -} -""" - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(ftl) - - poor, err1 = bundle.format_pattern("rating", {"stars": 1}) - excellent, err2 = bundle.format_pattern("rating", {"stars": 5}) - fallback, err3 = bundle.format_pattern("rating", {"stars": 99}) - - assert not err1 - assert not err2 - assert not err3 - assert poor == "Poor" - assert excellent == "Excellent" - assert fallback == "Unknown" - - def 
test_decimal_variant_key_roundtrip(self) -> None: - """Decimal number variant keys in serialized FTL survive parse->format roundtrip.""" - ftl = """ -precision = { $level -> - [0.5] Half - [1.0] Full - *[other] Custom -} -""" - resource = parse_ftl(ftl) - serialized = serialize_ftl(resource) - resource2 = parse_ftl(serialized) - - bundle = FluentBundle("en-US", use_isolating=False) - bundle.add_resource(serialize_ftl(resource2)) - - # Default variant (string selector won't match numeric keys) - result, _ = bundle.format_pattern("precision", {"level": "other"}) - assert result == "Custom" - - -# ============================================================================= -# Locale Code Validation -# ============================================================================= - - -class TestLocaleCodeValidation: - """FluentBundle validates locale codes against BCP 47 format.""" - - def test_posix_locale_with_charset_rejected(self) -> None: - """POSIX locale string with charset suffix is rejected with BCP 47 guidance.""" - with pytest.raises(ValueError, match="Strip charset suffixes"): - FluentBundle("en_US.UTF-8") - - def test_valid_bcp47_locales_accepted(self) -> None: - """Valid BCP 47 locale codes are accepted by FluentBundle.""" - for locale in ("en-US", "de-DE", "zh-Hans-CN"): - bundle = FluentBundle(locale, use_isolating=False) - bundle.add_resource("hello = Hello") - result, _ = bundle.format_pattern("hello") - assert result == "Hello" +"""Aggregated integration e2e test surface.""" + +from tests.integration_e2e_cases.essential_parse_format_tests_run_in_every_ci_build import * # noqa: F403 - re-export split test surface +from tests.integration_e2e_cases.essential_parse_format_tests_run_in_every_ci_build_2 import * # noqa: F403 - re-export split test surface +from tests.integration_e2e_cases.intensive_round_trip_tests_fuzz_marked_run_with_pytest_m_fuzz import * # noqa: F403 - re-export split test surface +from 
tests.integration_e2e_cases.locale_code_validation import * # noqa: F403 - re-export split test surface +from tests.integration_e2e_cases.multi_module_pipeline_tests import * # noqa: F403 - re-export split test surface +from tests.integration_e2e_cases.number_literal_invariant_and_roundtrip import * # noqa: F403 - re-export split test surface diff --git a/tests/test_introspection_iso.py b/tests/test_introspection_iso.py index dd5e562b..858f7510 100644 --- a/tests/test_introspection_iso.py +++ b/tests/test_introspection_iso.py @@ -1,2239 +1,7 @@ -"""Tests for ISO 3166/4217 introspection API. +"""Aggregated ISO introspection test surface.""" -Tests cover: -- TerritoryInfo and CurrencyInfo data classes -- Lookup functions (get_territory, get_currency, etc.) -- Type guards (is_valid_territory_code, is_valid_currency_code) -- Cache behavior -- Localization support -- UnknownLocaleError import failure paths (defensive exception handling) -""" - -import builtins -import sys -from unittest.mock import MagicMock, patch - -import pytest - -import ftllexengine.core.babel_compat as _bc -from ftllexengine.introspection import ( - BabelImportError, - CurrencyCode, - CurrencyInfo, - TerritoryCode, - TerritoryInfo, - clear_iso_cache, - get_currency, - get_currency_decimal_digits, - get_territory, - get_territory_currencies, - is_valid_currency_code, - is_valid_territory_code, - list_currencies, - list_territories, - require_currency_code, - require_territory_code, -) - -# Private member access permitted for integration tests -from ftllexengine.introspection.iso import ( - _get_babel_currencies, - _get_babel_currency_name, - _get_babel_currency_symbol, - _get_babel_official_languages, - _get_babel_territories, - _get_babel_territory_currencies, -) -from ftllexengine.introspection.iso_babel import _is_unknown_locale_error - - -class TestTerritoryInfo: - """Tests for TerritoryInfo dataclass.""" - - def test_immutable(self) -> None: - """TerritoryInfo is immutable (frozen).""" - info 
= TerritoryInfo( - alpha2=TerritoryCode("US"), name="United States", - currencies=(CurrencyCode("USD"),), official_languages=("en",), - ) - with pytest.raises(AttributeError): - info.alpha2 = TerritoryCode("CA") # type: ignore[misc] - - def test_hashable(self) -> None: - """TerritoryInfo is hashable (can be used in sets/dicts).""" - info = TerritoryInfo( - alpha2=TerritoryCode("US"), name="United States", - currencies=(CurrencyCode("USD"),), official_languages=("en",), - ) - assert hash(info) is not None - territories = {info} - assert len(territories) == 1 - - def test_equality(self) -> None: - """TerritoryInfo instances with same values are equal.""" - info1 = TerritoryInfo( - alpha2=TerritoryCode("US"), name="United States", - currencies=(CurrencyCode("USD"),), official_languages=("en",), - ) - info2 = TerritoryInfo( - alpha2=TerritoryCode("US"), name="United States", - currencies=(CurrencyCode("USD"),), official_languages=("en",), - ) - assert info1 == info2 - - def test_slots(self) -> None: - """TerritoryInfo uses __slots__ for memory efficiency.""" - info = TerritoryInfo( - alpha2=TerritoryCode("US"), name="United States", - currencies=(CurrencyCode("USD"),), official_languages=("en",), - ) - assert not hasattr(info, "__dict__") or info.__dict__ == {} - - def test_multi_currency_territory(self) -> None: - """TerritoryInfo supports multiple currencies for multi-currency territories.""" - info = TerritoryInfo( - alpha2=TerritoryCode("PA"), name="Panama", - currencies=(CurrencyCode("PAB"), CurrencyCode("USD")), - official_languages=("es",), - ) - assert len(info.currencies) == 2 - assert CurrencyCode("PAB") in info.currencies - assert CurrencyCode("USD") in info.currencies - - def test_empty_currencies_tuple(self) -> None: - """TerritoryInfo supports empty currencies tuple for territories without currency data.""" - info = TerritoryInfo( - alpha2=TerritoryCode("AQ"), name="Antarctica", - currencies=(), official_languages=(), - ) - assert info.currencies == () - 
assert len(info.currencies) == 0 - - def test_official_languages_field(self) -> None: - """TerritoryInfo stores official_languages as tuple of BCP-47 codes.""" - info = TerritoryInfo( - alpha2=TerritoryCode("BE"), name="Belgium", - currencies=(CurrencyCode("EUR"),), - official_languages=("fr", "nl", "de"), - ) - assert info.official_languages == ("fr", "nl", "de") - assert isinstance(info.official_languages, tuple) - - def test_official_languages_empty(self) -> None: - """TerritoryInfo accepts empty official_languages tuple.""" - info = TerritoryInfo( - alpha2=TerritoryCode("AQ"), name="Antarctica", - currencies=(), official_languages=(), - ) - assert info.official_languages == () - - -class TestCurrencyInfo: - """Tests for CurrencyInfo dataclass.""" - - def test_immutable(self) -> None: - """CurrencyInfo is immutable (frozen).""" - info = CurrencyInfo(code=CurrencyCode("USD"), name="US Dollar", symbol="$", decimal_digits=2) - with pytest.raises(AttributeError): - info.code = CurrencyCode("EUR") # type: ignore[misc] - - def test_hashable(self) -> None: - """CurrencyInfo is hashable (can be used in sets/dicts).""" - info = CurrencyInfo(code=CurrencyCode("USD"), name="US Dollar", symbol="$", decimal_digits=2) - assert hash(info) is not None - currencies = {info} - assert len(currencies) == 1 - - def test_equality(self) -> None: - """CurrencyInfo instances with same values are equal.""" - info1 = CurrencyInfo(code=CurrencyCode("USD"), name="US Dollar", symbol="$", decimal_digits=2) - info2 = CurrencyInfo(code=CurrencyCode("USD"), name="US Dollar", symbol="$", decimal_digits=2) - assert info1 == info2 - - def test_slots(self) -> None: - """CurrencyInfo uses __slots__ for memory efficiency.""" - info = CurrencyInfo(code=CurrencyCode("USD"), name="US Dollar", symbol="$", decimal_digits=2) - assert not hasattr(info, "__dict__") or info.__dict__ == {} - - -class TestGetTerritory: - """Tests for get_territory() function.""" - - def setup_method(self) -> None: - """Clear 
cache before each test.""" - clear_iso_cache() - - def test_returns_territory_info_for_valid_code(self) -> None: - """get_territory returns TerritoryInfo for known codes.""" - result = get_territory("US") - assert result is not None - assert isinstance(result, TerritoryInfo) - assert result.alpha2 == "US" - assert "United States" in result.name or "USA" in result.name - - def test_returns_none_for_unknown_code(self) -> None: - """get_territory returns None for unknown codes.""" - result = get_territory("XX") - assert result is None - - def test_case_insensitive(self) -> None: - """get_territory accepts lowercase codes.""" - result_upper = get_territory("US") - result_lower = get_territory("us") - result_mixed = get_territory("Us") - - assert result_upper is not None - assert result_lower is not None - assert result_mixed is not None - assert result_upper.alpha2 == result_lower.alpha2 == result_mixed.alpha2 - - def test_localized_names(self) -> None: - """get_territory returns localized names based on locale.""" - result_en = get_territory("DE", locale="en") - result_de = get_territory("DE", locale="de") - - assert result_en is not None - assert result_de is not None - - # English name should contain "Germany" - assert "Germany" in result_en.name - # German name should be "Deutschland" - assert "Deutschland" in result_de.name - - def test_includes_currencies(self) -> None: - """get_territory includes currencies when available.""" - result = get_territory("US") - assert result is not None - assert "USD" in result.currencies - - result_jp = get_territory("JP") - assert result_jp is not None - assert "JPY" in result_jp.currencies - - def test_includes_official_languages(self) -> None: - """get_territory populates official_languages from CLDR data.""" - # GB has English as official language per CLDR - result_gb = get_territory("GB") - assert result_gb is not None - assert isinstance(result_gb.official_languages, tuple) - assert "en" in result_gb.official_languages - - # 
Belgium has three official languages per CLDR - result_be = get_territory("BE") - assert result_be is not None - assert isinstance(result_be.official_languages, tuple) - assert len(result_be.official_languages) >= 2 - for lang in result_be.official_languages: - assert isinstance(lang, str) - assert len(lang) > 0 - - # official_languages is always a tuple (may be empty for some territories) - result_us = get_territory("US") - assert result_us is not None - assert isinstance(result_us.official_languages, tuple) - - def test_various_territories(self) -> None: - """get_territory works for various territory codes.""" - test_cases = ["US", "CA", "GB", "DE", "FR", "JP", "AU", "BR", "IN", "CN"] - - for code in test_cases: - result = get_territory(code) - assert result is not None, f"Failed for {code}" - assert result.alpha2 == code - assert len(result.name) > 0 - - def test_casefold_expansion_returns_none(self) -> None: - """get_territory returns None for inputs that expand via str.upper(). - - 'ß' (U+00DF, LATIN SMALL LETTER SHARP S) has len 1 but upper() returns - 'SS' (len 2), which is the valid ISO 3166-1 code for South Sudan. The - raw input 'ß' is not a valid territory code and must return None. - Regression for FIX-ISO-CASEFOLD-001. 
- """ - # 'ß'.upper() == 'SS' (South Sudan) — must not be returned - assert get_territory("ß") is None - # Confirm 'SS' itself IS valid (South Sudan exists in CLDR) - assert get_territory("SS") is not None - - -class TestGetCurrency: - """Tests for get_currency() function.""" - - def setup_method(self) -> None: - """Clear cache before each test.""" - clear_iso_cache() - - def test_returns_currency_info_for_valid_code(self) -> None: - """get_currency returns CurrencyInfo for known codes.""" - result = get_currency("USD") - assert result is not None - assert isinstance(result, CurrencyInfo) - assert result.code == "USD" - assert "$" in result.symbol or "USD" in result.symbol - - def test_returns_none_for_unknown_code(self) -> None: - """get_currency returns None for truly unknown codes.""" - # Use a code that's definitely not in any currency database - result = get_currency("ZZZ") - assert result is None - - def test_case_insensitive(self) -> None: - """get_currency accepts lowercase codes.""" - result_upper = get_currency("USD") - result_lower = get_currency("usd") - result_mixed = get_currency("Usd") - - assert result_upper is not None - assert result_lower is not None - assert result_mixed is not None - assert result_upper.code == result_lower.code == result_mixed.code - - def test_localized_symbols(self) -> None: - """get_currency returns localized symbols based on locale.""" - result_en = get_currency("EUR", locale="en") - result_de = get_currency("EUR", locale="de") - - assert result_en is not None - assert result_de is not None - - def test_decimal_digits_standard(self) -> None: - """get_currency returns correct decimal digits for standard currencies.""" - usd = get_currency("USD") - eur = get_currency("EUR") - gbp = get_currency("GBP") - - assert usd is not None - assert usd.decimal_digits == 2 - assert eur is not None - assert eur.decimal_digits == 2 - assert gbp is not None - assert gbp.decimal_digits == 2 - - def test_decimal_digits_zero(self) -> None: - 
"""get_currency returns 0 decimal digits for zero-decimal currencies.""" - jpy = get_currency("JPY") - krw = get_currency("KRW") - vnd = get_currency("VND") - - assert jpy is not None - assert jpy.decimal_digits == 0 - assert krw is not None - assert krw.decimal_digits == 0 - assert vnd is not None - assert vnd.decimal_digits == 0 - - def test_decimal_digits_three(self) -> None: - """get_currency returns 3 decimal digits for three-decimal currencies.""" - kwd = get_currency("KWD") - bhd = get_currency("BHD") - omr = get_currency("OMR") - - assert kwd is not None - assert kwd.decimal_digits == 3 - assert bhd is not None - assert bhd.decimal_digits == 3 - assert omr is not None - assert omr.decimal_digits == 3 - - def test_decimal_digits_four(self) -> None: - """get_currency returns 4 decimal digits for accounting units.""" - clf = get_currency("CLF") - uyw = get_currency("UYW") - - assert clf is not None - assert clf.decimal_digits == 4 - assert uyw is not None - assert uyw.decimal_digits == 4 - - def test_casefold_expansion_returns_none(self) -> None: - """get_currency returns None for inputs that expand via str.upper(). - - A 2-char input whose upper() produces a valid 3-char currency code - must return None — the raw input is not a valid currency code. - Regression for FIX-ISO-CASEFOLD-001. - """ - # 'ßD' has len 2; 'ßD'.upper() == 'SSD' (not a valid code, but the - # pattern is guarded). Verify the length guard returns None for any - # wrong-length input regardless of what upper() produces. 
- assert get_currency("ß") is None # len 1 - assert get_currency("ßD") is None # len 2, 'ßD'.upper() = 'SSD' - - -class TestListTerritories: - """Tests for list_territories() function.""" - - def setup_method(self) -> None: - """Clear cache before each test.""" - clear_iso_cache() - - def test_returns_frozenset(self) -> None: - """list_territories returns a frozenset.""" - result = list_territories() - assert isinstance(result, frozenset) - - def test_contains_major_territories(self) -> None: - """list_territories includes major world territories.""" - result = list_territories() - codes = {t.alpha2 for t in result} - - major_codes = ["US", "CA", "GB", "DE", "FR", "JP", "AU", "BR", "IN", "CN"] - for code in major_codes: - assert code in codes, f"Missing {code}" - - def test_all_have_two_letter_codes(self) -> None: - """All returned territories have valid 2-letter alpha codes.""" - result = list_territories() - - for territory in result: - assert len(territory.alpha2) == 2 - assert territory.alpha2.isalpha() - assert territory.alpha2.isupper() - - def test_localized_names(self) -> None: - """list_territories returns localized names based on locale.""" - result_en = list_territories(locale="en") - result_de = list_territories(locale="de") - - # Find Germany in both results - de_en = next((t for t in result_en if t.alpha2 == "DE"), None) - de_de = next((t for t in result_de if t.alpha2 == "DE"), None) - - assert de_en is not None - assert de_de is not None - assert "Germany" in de_en.name - assert "Deutschland" in de_de.name - - -class TestListCurrencies: - """Tests for list_currencies() function.""" - - def setup_method(self) -> None: - """Clear cache before each test.""" - clear_iso_cache() - - def test_returns_frozenset(self) -> None: - """list_currencies returns a frozenset.""" - result = list_currencies() - assert isinstance(result, frozenset) - - def test_contains_major_currencies(self) -> None: - """list_currencies includes major world currencies.""" - result = 
list_currencies() - codes = {c.code for c in result} - - major_codes = ["USD", "EUR", "GBP", "JPY", "CHF", "CAD", "AUD"] - for code in major_codes: - assert code in codes, f"Missing {code}" - - def test_all_have_three_letter_codes(self) -> None: - """All returned currencies have valid 3-letter codes.""" - result = list_currencies() - - for currency in result: - assert len(currency.code) == 3 - assert currency.code.isalpha() - assert currency.code.isupper() - - -class TestGetTerritoryCurrencies: - """Tests for get_territory_currencies() function.""" - - def setup_method(self) -> None: - """Clear cache before each test.""" - clear_iso_cache() - - def test_returns_currencies_for_known_territory(self) -> None: - """get_territory_currencies returns currencies for known territories.""" - us_currencies = get_territory_currencies("US") - assert isinstance(us_currencies, tuple) - assert "USD" in us_currencies - - jp_currencies = get_territory_currencies("JP") - assert "JPY" in jp_currencies - - gb_currencies = get_territory_currencies("GB") - assert "GBP" in gb_currencies - - def test_returns_empty_tuple_for_unknown_territory(self) -> None: - """get_territory_currencies returns empty tuple for unknown territories.""" - result = get_territory_currencies("XX") - assert result == () - - def test_case_insensitive(self) -> None: - """get_territory_currencies accepts lowercase codes.""" - assert "USD" in get_territory_currencies("us") - assert "JPY" in get_territory_currencies("jp") - - def test_eurozone_countries(self) -> None: - """get_territory_currencies returns EUR for eurozone countries.""" - eurozone = ["DE", "FR", "IT", "ES", "NL", "BE", "AT", "LV", "LT", "EE"] - - for code in eurozone: - result = get_territory_currencies(code) - assert "EUR" in result, f"Expected EUR for {code}, got {result}" - - def test_multi_currency_territories(self) -> None: - """get_territory_currencies returns all currencies for multi-currency territories.""" - # Panama uses both PAB and USD - 
pa_currencies = get_territory_currencies("PA") - # CLDR data should include at least one currency - assert len(pa_currencies) >= 1 - - def test_returns_tuple_for_immutability(self) -> None: - """get_territory_currencies returns an immutable tuple per architectural requirement.""" - result = get_territory_currencies("US") - assert isinstance(result, tuple) - # Verify it's immutable (tuple cannot be modified) - # Callers can convert to list if mutation is needed: list(result) - - def test_casefold_expansion_returns_empty(self) -> None: - """get_territory_currencies returns () for inputs that expand via str.upper(). - - 'ß' (len 1) uppercases to 'SS' (South Sudan, valid), but the raw - input is not a valid territory code. Must return empty tuple. - Regression for FIX-ISO-CASEFOLD-001. - """ - assert get_territory_currencies("ß") == () - # Confirm 'SS' itself returns currencies (South Sudan uses USD) - assert get_territory_currencies("SS") != () - - -class TestTypeGuards: - """Tests for type guard functions.""" - - def setup_method(self) -> None: - """Clear cache before each test.""" - clear_iso_cache() - - def test_is_valid_territory_code_valid(self) -> None: - """is_valid_territory_code returns True for valid codes.""" - assert is_valid_territory_code("US") is True - assert is_valid_territory_code("GB") is True - assert is_valid_territory_code("JP") is True - - def test_is_valid_territory_code_invalid(self) -> None: - """is_valid_territory_code returns False for invalid codes.""" - # XX is not in CLDR; ZZ is (represents "Unknown Region") - assert is_valid_territory_code("XX") is False - assert is_valid_territory_code("QQ") is False - - def test_is_valid_territory_code_wrong_length(self) -> None: - """is_valid_territory_code returns False for wrong-length strings.""" - assert is_valid_territory_code("U") is False - assert is_valid_territory_code("USA") is False - assert is_valid_territory_code("") is False - - def test_is_valid_territory_code_case_insensitive(self) -> 
None: - """is_valid_territory_code is case insensitive.""" - assert is_valid_territory_code("us") is True - assert is_valid_territory_code("Us") is True - - def test_is_valid_currency_code_valid(self) -> None: - """is_valid_currency_code returns True for valid codes.""" - assert is_valid_currency_code("USD") is True - assert is_valid_currency_code("EUR") is True - assert is_valid_currency_code("JPY") is True - - def test_is_valid_currency_code_invalid(self) -> None: - """is_valid_currency_code returns False for invalid codes.""" - # ZZZ and QQQ are not in CLDR; XXX is (represents "No currency") - assert is_valid_currency_code("ZZZ") is False - assert is_valid_currency_code("QQQ") is False - - def test_is_valid_currency_code_wrong_length(self) -> None: - """is_valid_currency_code returns False for wrong-length strings.""" - assert is_valid_currency_code("US") is False - assert is_valid_currency_code("USDD") is False - assert is_valid_currency_code("") is False - - def test_is_valid_currency_code_case_insensitive(self) -> None: - """is_valid_currency_code is case insensitive.""" - assert is_valid_currency_code("usd") is True - assert is_valid_currency_code("Usd") is True - - def test_type_guard_lookup_consistency_casefold(self) -> None: - """Type guard and lookup agree for inputs that expand under str.upper(). - - If is_valid_territory_code(v) is False, get_territory(v) must be None. - 'ß' (len 1, upper() = 'SS') violated this invariant before FIX-ISO-CASEFOLD-001. 
- """ - assert is_valid_territory_code("ß") is False - assert get_territory("ß") is None - - assert is_valid_currency_code("ß") is False - assert get_currency("ß") is None - assert is_valid_currency_code("ßD") is False - assert get_currency("ßD") is None - - -class TestCaching: - """Tests for cache behavior.""" - - def setup_method(self) -> None: - """Clear cache before each test.""" - clear_iso_cache() - - def test_results_are_cached(self) -> None: - """Repeated calls return same cached objects.""" - result1 = get_territory("US") - result2 = get_territory("US") - - # Same object should be returned (cached) - assert result1 is result2 - - def test_clear_cache_works(self) -> None: - """clear_iso_cache clears all caches.""" - # Populate cache - result1 = get_territory("US") - result1_currency = get_currency("USD") - - # Clear cache - clear_iso_cache() - - # New objects should be returned - result2 = get_territory("US") - result2_currency = get_currency("USD") - - # Values should be equal - assert result1 == result2 - assert result1_currency == result2_currency - - def test_different_locales_cached_separately(self) -> None: - """Different locales have separate cache entries.""" - result_en = get_territory("DE", locale="en") - result_de = get_territory("DE", locale="de") - - # Different objects (different locales) - assert result_en != result_de - - # Repeat calls return cached objects - assert get_territory("DE", locale="en") is result_en - assert get_territory("DE", locale="de") is result_de - - -class TestTypeAliases: - """Tests for TerritoryCode and CurrencyCode NewType wrappers.""" - - def test_territory_code_is_str_at_runtime(self) -> None: - """TerritoryCode is a NewType of str; transparent (identity) at runtime.""" - code = TerritoryCode("US") - assert isinstance(code, str) - assert code == "US" - - def test_currency_code_is_str_at_runtime(self) -> None: - """CurrencyCode is a NewType of str; transparent (identity) at runtime.""" - code = CurrencyCode("USD") - 
assert isinstance(code, str) - assert code == "USD" - - def test_territory_code_newtype_constructor_is_identity(self) -> None: - """TerritoryCode(...) returns the string value unchanged at runtime.""" - raw = "LV" - assert TerritoryCode(raw) == raw - - def test_currency_code_newtype_constructor_is_identity(self) -> None: - """CurrencyCode(...) returns the string value unchanged at runtime.""" - raw = "EUR" - assert CurrencyCode(raw) == raw - - -class TestBabelImportError: - """Tests for BabelImportError exception.""" - - def test_exception_is_import_error_subclass(self) -> None: - """BabelImportError is a subclass of ImportError.""" - assert issubclass(BabelImportError, ImportError) - - def test_exception_message(self) -> None: - """BabelImportError has informative installation message.""" - exc = BabelImportError("ISO introspection") - message = str(exc) - assert "Babel" in message - assert "pip install ftllexengine[babel]" in message - assert "ISO introspection" in message - - def test_exception_can_be_raised_and_caught(self) -> None: - """BabelImportError can be raised and caught.""" - feature = "test feature" - with pytest.raises(BabelImportError) as exc_info: - raise BabelImportError(feature) - assert "Babel" in str(exc_info.value) - - -class TestEdgeCases: - """Tests for edge cases and error handling.""" - - def setup_method(self) -> None: - """Clear cache before each test.""" - clear_iso_cache() - - def test_empty_string_territory(self) -> None: - """get_territory handles empty string gracefully.""" - result = get_territory("") - assert result is None - - def test_empty_string_currency(self) -> None: - """get_currency handles empty string gracefully.""" - result = get_currency("") - assert result is None - - def test_numeric_string_territory(self) -> None: - """get_territory handles numeric strings.""" - result = get_territory("12") - assert result is None - - def test_numeric_string_currency(self) -> None: - """get_currency handles numeric strings.""" - 
result = get_currency("123") - assert result is None - - def test_whitespace_territory(self) -> None: - """get_territory handles whitespace strings.""" - result = get_territory(" ") - assert result is None - - def test_whitespace_currency(self) -> None: - """get_currency handles whitespace strings.""" - result = get_currency(" ") - assert result is None - - def test_special_iso_codes(self) -> None: - """Test special ISO 4217 codes.""" - # XXX is "No currency" - a valid ISO 4217 code - xxx = get_currency("XXX") - assert xxx is not None - - # XAU is gold - a valid ISO 4217 code - xau = get_currency("XAU") - assert xau is not None - - def test_invalid_locale_territory(self) -> None: - """get_territory returns None for invalid locales.""" - result = get_territory("US", locale="invalid_LOCALE_123") - assert result is None - - def test_invalid_locale_currency(self) -> None: - """get_currency returns None for invalid locales.""" - result = get_currency("USD", locale="invalid_LOCALE_123") - assert result is None - - def test_malformed_locale_list_territories(self) -> None: - """list_territories returns empty frozenset for malformed locales.""" - result = list_territories(locale="xxx_YYY") - assert isinstance(result, frozenset) - assert len(result) == 0 - - def test_malformed_locale_list_currencies(self) -> None: - """list_currencies returns frozenset for malformed locales.""" - result = list_currencies(locale="xxx_YYY") - assert isinstance(result, frozenset) - - def test_currency_symbol_fallback(self) -> None: - """get_currency returns code as symbol fallback for unknown/problematic currencies.""" - # Test with a real currency but in a locale that might not have symbol data - result = get_currency("USD", locale="en") - assert result is not None - # Symbol should either be locale-specific or fall back to code - assert result.symbol in ("$", "US$", "USD") - - def test_territory_without_currency(self) -> None: - """Territories without currency data have empty currencies 
tuple.""" - # Antarctica (AQ) typically has no official currency - result = get_territory("AQ") - if result is not None: - # May have no currencies (empty tuple) - assert isinstance(result.currencies, tuple) - # May be empty or contain some currencies depending on CLDR data - assert all(isinstance(c, str) for c in result.currencies) - - def test_type_guard_non_string_territory(self) -> None: - """is_valid_territory_code returns False for non-string inputs.""" - assert is_valid_territory_code(None) is False # type: ignore[arg-type] - assert is_valid_territory_code(123) is False # type: ignore[arg-type] - assert is_valid_territory_code([]) is False # type: ignore[arg-type] - assert is_valid_territory_code({}) is False # type: ignore[arg-type] - - def test_type_guard_non_string_currency(self) -> None: - """is_valid_currency_code returns False for non-string inputs.""" - assert is_valid_currency_code(None) is False # type: ignore[arg-type] - assert is_valid_currency_code(123) is False # type: ignore[arg-type] - assert is_valid_currency_code([]) is False # type: ignore[arg-type] - assert is_valid_currency_code({}) is False # type: ignore[arg-type] - - -class TestBabelExceptionHandling: - """Tests for Babel exception handling paths.""" - - def setup_method(self) -> None: - """Clear cache before each test.""" - clear_iso_cache() - - def test_currency_name_none_for_truly_invalid_code(self) -> None: - """get_currency returns None for codes not in CLDR.""" - # Use a code that's definitely not in CLDR - result = get_currency("ZZZ") - assert result is None - - # Another invalid code - result2 = get_currency("QQQ") - assert result2 is None - - def test_currency_symbol_with_unusual_locale(self) -> None: - """get_currency handles unusual locales gracefully.""" - # Test with rare locale that might not have full currency symbol data - result = get_currency("USD", locale="zu") # Zulu - if result is not None: - # Symbol should be present (may be fallback) - assert len(result.symbol) 
> 0 - - def test_territory_currencies_for_non_sovereign_territories(self) -> None: - """get_territory_currencies handles territories without unique currencies.""" - # Vatican City might have unusual currency data - result = get_territory_currencies("VA") - # May return EUR or empty tuple - assert isinstance(result, tuple) - assert all(isinstance(c, str) for c in result) - - # Antarctica has no official currency - result_aq = get_territory_currencies("AQ") - assert result_aq == () - - def test_get_currency_with_very_rare_locale(self) -> None: - """get_currency handles a locale with minimal CLDR data.""" - # Sichuan Yi (ii) is a valid but rare locale with limited data - result = get_currency("USD", locale="ii") - assert result is None or isinstance(result, CurrencyInfo) - - def test_get_territory_with_deprecated_locale_format(self) -> None: - """get_territory handles POSIX locale format variant.""" - result = get_territory("US", locale="en_US_POSIX") - assert result is None or isinstance(result, TerritoryInfo) - - def test_babel_import_error_propagation(self) -> None: - """BabelImportError is raised when Babel is not available.""" - # Temporarily hide babel modules to trigger ImportError - babel_modules = {k: v for k, v in sys.modules.items() if k.startswith("babel")} - saved_available = _bc._babel_available - try: - # Remove babel from sys.modules - for key in list(babel_modules.keys()): - sys.modules.pop(key, None) - - # Clear caches to force re-import - clear_iso_cache() - - # Prevent import by blocking it - sys.modules["babel"] = None # type: ignore[assignment] - - # Reset the availability sentinel so require_babel() re-evaluates against - # the patched sys.modules. Without this, a cached True value causes - # require_babel() to pass even though Babel is no longer importable, - # leading to a raw ModuleNotFoundError instead of BabelImportError. 
- _bc._babel_available = None - - # Now try to use the functions - they should raise BabelImportError - # PLC0415: Runtime import needed to test ImportError path - from ftllexengine.introspection import iso - - with pytest.raises(BabelImportError): - iso.get_territory("US") - - finally: - # Restore babel modules and availability sentinel - for key, value in babel_modules.items(): - sys.modules[key] = value - _bc._babel_available = saved_available - # Clear cache again to restore normal operation - clear_iso_cache() - - -class TestPrivateBabelWrappers: - """Tests for private Babel wrapper functions. - - Tests exception handling paths in internal functions. - Private member access permitted. - """ - - def test_get_babel_currency_name_with_invalid_code(self) -> None: - """_get_babel_currency_name returns None for invalid codes.""" - result = _get_babel_currency_name("ZZZ", "en") - assert result is None - - result2 = _get_babel_currency_name("QQQ", "en") - assert result2 is None - - def test_get_babel_currency_name_with_problematic_locale(self) -> None: - """_get_babel_currency_name returns None for malformed locales.""" - result = _get_babel_currency_name("USD", "invalid_LOCALE_123") - assert result is None - - def test_get_babel_currency_symbol_with_unknown_code(self) -> None: - """_get_babel_currency_symbol returns code as fallback for unknown codes.""" - # Test with an invalid code - should return the code itself as fallback - result = _get_babel_currency_symbol("ZZZ", "en") - # Should either work or fall back to the code - assert result == "ZZZ" or len(result) > 0 - - def test_get_babel_currency_symbol_with_problematic_locale(self) -> None: - """_get_babel_currency_symbol falls back to currency code for malformed locales.""" - result = _get_babel_currency_symbol("USD", "xxx_YYY_ZZZ") - assert result == "USD" # Falls back to code - - def test_get_babel_territory_currencies_with_invalid_territory(self) -> None: - """_get_babel_territory_currencies returns empty list 
for invalid territories.""" - result = _get_babel_territory_currencies("XX") - # Should return empty list for unknown territories - assert isinstance(result, list) - assert len(result) == 0 - - def test_get_babel_territory_currencies_with_antarctica(self) -> None: - """_get_babel_territory_currencies handles territories without currencies.""" - result = _get_babel_territory_currencies("AQ") # Antarctica - # Should return empty list (no official currency) - assert isinstance(result, list) - - def test_get_babel_currency_symbol_fallback_path(self) -> None: - """_get_babel_currency_symbol uses fallback when Babel raises exception.""" - # Use a code/locale combination that might trigger Babel errors - # XTS is a test currency code - might not have symbols in all locales - result = _get_babel_currency_symbol("XTS", "en") - # Should return either a valid symbol or the code as fallback - assert isinstance(result, str) - assert len(result) > 0 - - def test_get_babel_currency_name_import_error(self) -> None: - """_get_babel_currency_name raises BabelImportError when Babel unavailable.""" - _bc._babel_available = False - try: - with pytest.raises(BabelImportError): - _get_babel_currency_name("USD", "en") - finally: - _bc._babel_available = None - - def test_get_babel_currency_symbol_import_error(self) -> None: - """_get_babel_currency_symbol raises BabelImportError when Babel unavailable.""" - # Set sentinel to False to simulate Babel being unavailable. - # Direct sentinel manipulation avoids the recursive __import__ mock pattern. - _bc._babel_available = False - try: - with pytest.raises(BabelImportError): - _get_babel_currency_symbol("USD", "en") - finally: - # Reset so subsequent tests reinitialize with Babel available - _bc._babel_available = None - - def test_get_babel_territory_currencies_import_error(self) -> None: - """_get_babel_territory_currencies raises BabelImportError when Babel unavailable.""" - # Set sentinel to False to simulate Babel being unavailable. 
- # Direct sentinel manipulation avoids the recursive __import__ mock pattern. - _bc._babel_available = False - try: - with pytest.raises(BabelImportError): - _get_babel_territory_currencies("US") - finally: - # Reset so subsequent tests reinitialize with Babel available - _bc._babel_available = None - - def test_get_babel_territory_currencies_exception_handling(self) -> None: - """_get_babel_territory_currencies returns empty list on Babel API errors. - - The production code calls babel.numbers.get_territory_currencies() directly. - Patching that function to raise ValueError exercises the defensive except clause. - """ - with patch( - "babel.numbers.get_territory_currencies", - side_effect=ValueError("simulated Babel data error"), - ): - result = _get_babel_territory_currencies("US") - assert result == [] - - def test_get_babel_official_languages_exception_handling(self) -> None: - """_get_babel_official_languages returns empty tuple on Babel API errors. - - The production code calls babel.languages.get_official_languages() directly. - Patching that function to raise ValueError exercises the defensive except clause. 
- """ - with patch( - "babel.languages.get_official_languages", - side_effect=ValueError("simulated Babel data error"), - ): - result = _get_babel_official_languages("GB") - assert result == () - - def test_get_babel_official_languages_lookup_error(self) -> None: - """_get_babel_official_languages returns empty tuple on LookupError.""" - with patch( - "babel.languages.get_official_languages", - side_effect=LookupError("unknown territory"), - ): - result = _get_babel_official_languages("XX") - assert result == () - - def test_list_currencies_filters_invalid_codes(self) -> None: - """list_currencies filters out invalid currency codes from Babel data.""" - # This tests the branch where codes don't match ISO 4217 format - # Clear cache to ensure fresh call - clear_iso_cache() - - # Mock _get_babel_currencies to return invalid codes - original_get_babel_currencies = _get_babel_currencies - - def mock_get_babel_currencies() -> dict[str, str]: - real_currencies = original_get_babel_currencies() - # Add invalid codes to trigger the filter branch - return { - **real_currencies, - "US": "Invalid two-letter code", # Only 2 letters - "USDD": "Invalid four-letter code", # 4 letters - "usd": "Invalid lowercase code", # Lowercase - "12D": "Invalid numeric code", # Contains numbers - "": "Empty code", # Empty - } - - with patch( - "ftllexengine.introspection.iso_lookup._get_babel_currencies", - side_effect=mock_get_babel_currencies, - ): - result = list_currencies() - # Should still return valid currencies, filtering out invalid ones - assert isinstance(result, frozenset) - codes = {c.code for c in result} - # Invalid codes should not be in result - assert "US" not in codes # Two-letter code - assert "USDD" not in codes # Four-letter code - # Valid codes should be present - assert "USD" in codes - - -class TestLocaleNormalization: - """Tests for locale input normalization.""" - - def setup_method(self) -> None: - """Clear cache before each test.""" - clear_iso_cache() - - def 
test_locale_format_variants_return_same_cached_object(self) -> None: - """Different locale formats should hit the same cache entry.""" - # Clear cache to start fresh - clear_iso_cache() - - # Call with BCP-47 format - result_bcp47 = get_territory("US", locale="en-US") - - # Call with POSIX format (should hit same cache) - result_posix = get_territory("US", locale="en_US") - - # Call with lowercase - result_lower = get_territory("US", locale="en_us") - - # All should return the same cached object - assert result_bcp47 is result_posix - assert result_posix is result_lower - - def test_locale_normalization_for_get_currency(self) -> None: - """get_currency normalizes locale formats to single cache entry.""" - clear_iso_cache() - - result1 = get_currency("EUR", locale="de-DE") - result2 = get_currency("EUR", locale="de_DE") - result3 = get_currency("EUR", locale="de_de") - - # Same cached object for all variants - assert result1 is result2 - assert result2 is result3 - - def test_locale_normalization_for_list_territories(self) -> None: - """list_territories normalizes locale formats to single cache entry.""" - clear_iso_cache() - - result1 = list_territories(locale="fr-FR") - result2 = list_territories(locale="fr_FR") - result3 = list_territories(locale="fr_fr") - - # Same cached object for all variants - assert result1 is result2 - assert result2 is result3 - - def test_locale_normalization_for_list_currencies(self) -> None: - """list_currencies normalizes locale formats to single cache entry.""" - clear_iso_cache() - - result1 = list_currencies(locale="ja-JP") - result2 = list_currencies(locale="ja_JP") - result3 = list_currencies(locale="ja_jp") - - # Same cached object for all variants - assert result1 is result2 - assert result2 is result3 - - def test_code_case_normalization(self) -> None: - """Territory and currency codes are case-normalized.""" - clear_iso_cache() - - # Territory code case variants should hit same cache - t_upper = get_territory("US") - t_lower 
= get_territory("us") - t_mixed = get_territory("Us") - - assert t_upper is t_lower - assert t_lower is t_mixed - - # Currency code case variants should hit same cache - c_upper = get_currency("USD") - c_lower = get_currency("usd") - c_mixed = get_currency("Usd") - - assert c_upper is c_lower - assert c_lower is c_mixed - - -class TestBoundedCache: - """Tests for bounded LRU cache.""" - - def setup_method(self) -> None: - """Clear cache before each test.""" - clear_iso_cache() - - def test_cache_uses_lru_with_maxsize(self) -> None: - """Cache implementation should use bounded LRU cache.""" - # Import the internal cached functions to check their cache_info - - from ftllexengine.introspection.iso import ( - _get_currency_impl, - _get_territory_currencies_impl, - _get_territory_impl, - _list_currencies_impl, - _list_territories_impl, - ) - - # All internal cached functions should have cache_info method (lru_cache feature) - assert hasattr(_get_territory_impl, "cache_info") - assert hasattr(_get_currency_impl, "cache_info") - assert hasattr(_list_territories_impl, "cache_info") - assert hasattr(_list_currencies_impl, "cache_info") - assert hasattr(_get_territory_currencies_impl, "cache_info") - - # Check maxsize is set (bounded cache, not unbounded) - # pylint: disable=no-value-for-parameter - # Note: cache_info() is a method added by @lru_cache decorator, not - # related to the function's parameters. Pylint doesn't understand this. - info = _get_territory_impl.cache_info() - assert info.maxsize is not None - assert info.maxsize > 0 # Should be MAX_LOCALE_CACHE_SIZE (128) - - def test_cache_statistics_work(self) -> None: - """Cache statistics (hits, misses) should be tracked.""" - from ftllexengine.introspection.iso import ( - _get_territory_impl, - ) - - clear_iso_cache() - - # pylint: disable=no-value-for-parameter - # Note: cache_info() is a method added by @lru_cache decorator, not - # related to the function's parameters. Pylint doesn't understand this. 
- - # Get initial stats - initial_info = _get_territory_impl.cache_info() - initial_hits = initial_info.hits - initial_misses = initial_info.misses - - # First call should be a miss - get_territory("US") - info_after_first = _get_territory_impl.cache_info() - assert info_after_first.misses == initial_misses + 1 - - # Second call should be a hit - get_territory("US") - info_after_second = _get_territory_impl.cache_info() - assert info_after_second.hits == initial_hits + 1 - - -class TestExceptionNarrowing: - """Tests for narrowed exception handling in Babel wrappers.""" - - def setup_method(self) -> None: - """Clear cache before each test.""" - clear_iso_cache() - - def test_value_error_is_caught(self) -> None: - """ValueError from Babel should be caught and handled gracefully.""" - # Invalid locale formats trigger ValueError in Babel - # The function should return None rather than propagating - result = get_territory("US", locale="invalid") - # Should either work or return None, not raise - assert result is None or isinstance(result, TerritoryInfo) - - def test_lookup_error_is_caught(self) -> None: - """LookupError (UnknownLocaleError) from Babel should be handled.""" - # Test with a locale that doesn't exist in CLDR - try: - result = get_currency("USD", locale="xyz_ABC") - # Should return None or result, not raise - assert result is None or isinstance(result, CurrencyInfo) - except LookupError: - pytest.fail("LookupError should be caught, not propagated") - - def test_attribute_key_error_handled(self) -> None: - """AttributeError and KeyError from data access should be handled.""" - # These are handled internally; we verify by checking edge case inputs - # that might trigger such errors in Babel's data access - result = get_territory("XX") # Unknown territory - assert result is None - - result2 = get_currency("ZZZ") # Unknown currency - assert result2 is None - - def test_name_error_propagates(self) -> None: - """NameError (programming bug) propagates rather than 
being suppressed. - - The narrowed exception catch list (ValueError, LookupError, KeyError, - AttributeError) excludes NameError; it must propagate uncaught. - """ - def mock_locale_parse(locale_str: str) -> object: - msg = "name 'undefined_var' is not defined" - raise NameError(msg) - - with ( - patch("babel.Locale.parse", side_effect=mock_locale_parse), - pytest.raises(NameError), - ): - _get_babel_currency_name("USD", "en") - - -class TestUnknownLocaleErrorHandling: - """Tests for UnknownLocaleError handling (fuzzer-discovered regression). - - Babel's UnknownLocaleError inherits from Exception, not LookupError. - These tests verify the defensive exception handling catches it properly. - """ - - def setup_method(self) -> None: - """Clear cache before each test.""" - clear_iso_cache() - - def test_very_long_invalid_locale_get_currency(self) -> None: - """get_currency handles very long invalid locales gracefully. - - Regression test: fuzzer discovered UnknownLocaleError leak with - locale='x' * 100. Previously raised babel.core.UnknownLocaleError. - """ - # Fuzzer-discovered input - long_locale = "x" * 100 - result = get_currency("USD", locale=long_locale) - # Should return None (graceful degradation), not raise - assert result is None - - def test_very_long_invalid_locale_get_territory(self) -> None: - """get_territory handles very long invalid locales gracefully. - - Regression test for defensive exception handling. 
- """ - long_locale = "x" * 100 - result = get_territory("US", locale=long_locale) - # Should return None (graceful degradation), not raise - assert result is None - - def test_garbage_locale_get_currency(self) -> None: - """get_currency handles garbage locale strings gracefully.""" - garbage_locales = [ - "!@#$%^", - "123456789", - "\x00\x01\x02", - "a" * 500, - "xx_YY_ZZ_AA_BB", - ] - for locale in garbage_locales: - result = get_currency("USD", locale=locale) - # Should return None, not raise - assert result is None, f"Failed for locale: {locale!r}" - - def test_garbage_locale_get_territory(self) -> None: - """get_territory handles garbage locale strings gracefully.""" - garbage_locales = [ - "!@#$%^", - "123456789", - "\x00\x01\x02", - "a" * 500, - "xx_YY_ZZ_AA_BB", - ] - for locale in garbage_locales: - result = get_territory("US", locale=locale) - # Should return None, not raise - assert result is None, f"Failed for locale: {locale!r}" - - def test_currency_symbol_fallback_on_invalid_locale(self) -> None: - """_get_babel_currency_symbol returns code as fallback for invalid locale.""" - # When locale is invalid, the function should return the code as fallback - result = _get_babel_currency_symbol("USD", "x" * 100) - assert result == "USD" # Falls back to code - - def test_currency_name_none_on_invalid_locale(self) -> None: - """_get_babel_currency_name returns None for invalid locale.""" - result = _get_babel_currency_name("USD", "x" * 100) - assert result is None - - def test_list_territories_empty_on_invalid_locale(self) -> None: - """list_territories returns empty set for invalid locales.""" - long_locale = "x" * 100 - result = list_territories(locale=long_locale) - # Should return empty frozenset, not raise - assert isinstance(result, frozenset) - assert len(result) == 0 - - def test_list_currencies_with_invalid_locale(self) -> None: - """list_currencies handles invalid locales gracefully.""" - long_locale = "x" * 100 - result = 
list_currencies(locale=long_locale) - # Should return frozenset (may be empty), not raise - assert isinstance(result, frozenset) - - -class TestClearAllCachesIntegration: - """Tests for clear_module_caches integration with ISO caches.""" - - def test_clear_module_caches_includes_iso_cache(self) -> None: - """clear_module_caches should clear ISO introspection caches.""" - from ftllexengine import clear_module_caches - from ftllexengine.introspection.iso import ( - _get_territory_impl, - ) - - # Populate ISO cache - get_territory("US") - get_currency("USD") - list_territories() - - # pylint: disable=no-value-for-parameter - # Note: cache_info() is a method added by @lru_cache decorator, not - # related to the function's parameters. Pylint doesn't understand this. - - # Verify cache is populated - info_before = _get_territory_impl.cache_info() - assert info_before.currsize > 0 - - # Clear ALL caches (not just ISO) - clear_module_caches() - - # Verify ISO cache is now empty - info_after = _get_territory_impl.cache_info() - assert info_after.currsize == 0 - - -class TestListCurrenciesConsistency: - """Tests for list_currencies() consistency across locales.""" - - def setup_method(self) -> None: - """Clear cache before each test.""" - clear_iso_cache() - - def test_same_currency_count_across_locales(self) -> None: - """list_currencies returns same number of currencies for all locales. - - Currencies without localized names fall back to English names rather - than being excluded, ensuring consistent result sets across locales. 
- """ - result_en = list_currencies(locale="en") - result_de = list_currencies(locale="de") - result_fr = list_currencies(locale="fr") - - # All locales should return the same number of currencies - assert len(result_en) == len(result_de), ( - f"Currency count differs: en={len(result_en)}, de={len(result_de)}" - ) - assert len(result_en) == len(result_fr), ( - f"Currency count differs: en={len(result_en)}, fr={len(result_fr)}" - ) - - def test_same_currency_codes_across_locales(self) -> None: - """list_currencies returns same currency codes regardless of locale. - - The code set is identical across locales; only names/symbols differ. - """ - codes_en = {c.code for c in list_currencies(locale="en")} - codes_de = {c.code for c in list_currencies(locale="de")} - codes_ja = {c.code for c in list_currencies(locale="ja")} - - assert codes_en == codes_de, "Codes differ: en vs de" - assert codes_en == codes_ja, "Codes differ: en vs ja" - - def test_fallback_name_for_rare_locale(self) -> None: - """Currencies with no localized name use English name as fallback. - - For locales with incomplete CLDR coverage, the English name should - be used rather than excluding the currency. 
- """ - # Use a rare locale that might have incomplete coverage - result = list_currencies(locale="zu") # Zulu - - # Should still include major currencies - codes = {c.code for c in result} - assert "USD" in codes - assert "EUR" in codes - assert "JPY" in codes - - -class TestTerritoryCacheSize: - """Tests for territory cache bounded by MAX_TERRITORY_CACHE_SIZE.""" - - def setup_method(self) -> None: - """Clear cache before each test.""" - clear_iso_cache() - - def test_territory_currencies_cache_size(self) -> None: - """Territory currencies cache uses correct MAX_TERRITORY_CACHE_SIZE.""" - from ftllexengine.constants import ( - MAX_TERRITORY_CACHE_SIZE, - ) - from ftllexengine.introspection.iso import ( - _get_territory_currencies_impl, - ) - - # pylint: disable=no-value-for-parameter - info = _get_territory_currencies_impl.cache_info() - assert info.maxsize == MAX_TERRITORY_CACHE_SIZE - # Should be 300 (enough for all ~249 territories) - assert info.maxsize >= 249 - - def test_no_cache_thrashing_on_full_iteration(self) -> None: - """Iterating all territories should not cause cache thrashing. - - With MAX_TERRITORY_CACHE_SIZE >= 249, all territories fit in cache. 
- """ - from ftllexengine.introspection.iso import ( - _get_territory_currencies_impl, - ) - - clear_iso_cache() - - # Iterate all territories - territories = list_territories() - for t in territories: - _ = get_territory_currencies(t.alpha2) - - # pylint: disable=no-value-for-parameter - info = _get_territory_currencies_impl.cache_info() - - # No evictions should have occurred (all fit in cache) - # Eviction count is misses - currsize when cache is full - assert info.maxsize is not None # This cache is bounded - assert info.currsize <= info.maxsize - # All unique territories should be cached - unique_territories = {t.alpha2 for t in territories} - assert info.currsize >= len(unique_territories) - 1 # Allow small margin - - -class _UnexpectedTestError(Exception): - """Custom exception for testing defensive error handling. - - Defined at module level to avoid scoping issues with pytest.raises. - Used to verify that non-UnknownLocaleError exceptions propagate correctly. - """ - - def __str__(self) -> str: - return "Something went wrong - internal processing error" - - -class _LocaleWordTestError(Exception): - """Exception whose message contains 'locale' but is NOT UnknownLocaleError. - - Tests type-based exception matching: this must propagate even though the - message contains the word 'locale'. The old substring-based matching would - have incorrectly suppressed this. - """ - - def __str__(self) -> str: - return "Failed to process locale configuration data" - - -class TestDefensiveExceptionPropagation: - """Tests for defensive exception re-raising in Babel wrappers. - - iso.py catches babel.core.UnknownLocaleError by type (isinstance check) - and re-raises all other exceptions. These tests verify that logic bugs - and unexpected exceptions propagate, including those whose messages - contain 'locale' or 'unknown' but are not UnknownLocaleError. 
- """ - - def test_currency_name_reraises_unexpected_exception(self) -> None: - """_get_babel_currency_name re-raises non-locale exceptions. - - Tests line 196: raise statement in defensive exception handler. - """ - # This test verifies that unexpected exceptions (not matching the - # "locale" or "unknown" pattern) are propagated rather than suppressed. - - call_count = [0] # Use list to allow modification in nested function - error_msg = "Internal error" - - def mock_locale_parse(locale_str: str) -> object: - """Mock Locale.parse to raise unexpected exception.""" - call_count[0] += 1 - raise _UnexpectedTestError(error_msg) - - # Patch Babel's Locale.parse to inject our test exception - with patch("babel.Locale.parse", side_effect=mock_locale_parse): - # The exception should propagate (not be suppressed) - exception_raised = False - result = None - try: - result = _get_babel_currency_name("USD", "en") - except _UnexpectedTestError: - exception_raised = True - except Exception as e: - pytest.fail(f"Unexpected exception type: {type(e).__name__}: {e}") - - if not exception_raised: - pytest.fail( - f"Expected _UnexpectedTestError to be raised. " - f"Mock called {call_count[0]} times. Result: {result}" - ) - - def test_currency_symbol_reraises_unexpected_exception(self) -> None: - """_get_babel_currency_symbol re-raises non-locale exceptions. - - Tests line 217: raise statement in defensive exception handler. 
- """ - error_msg = "Internal error" - - def mock_get_currency_symbol(code: str, locale: str | object = None) -> str: - """Mock that raises unexpected exception.""" - raise _UnexpectedTestError(error_msg) - - # Patch get_currency_symbol to trigger the exception path - with patch("babel.numbers.get_currency_symbol", side_effect=mock_get_currency_symbol): - # The exception should propagate (not be suppressed) - exception_raised = False - try: - _get_babel_currency_symbol("USD", "en") - except _UnexpectedTestError: - exception_raised = True - - assert exception_raised, "Expected _UnexpectedTestError to be raised" - - def test_territories_reraises_non_unknown_locale_error_with_locale_word( - self, - ) -> None: - """Non-UnknownLocaleError with 'locale' in message propagates. - - Verifies type-based matching: exceptions whose message contains - 'locale' propagate if not babel.core.UnknownLocaleError. - """ - from ftllexengine.introspection.iso import ( - _get_babel_territories, - ) - - def mock_locale_parse(locale_str: str) -> object: - raise _LocaleWordTestError - - with ( - patch("babel.Locale.parse", side_effect=mock_locale_parse), - pytest.raises(_LocaleWordTestError), - ): - _get_babel_territories("en") - - def test_currency_name_reraises_non_unknown_locale_error_with_locale_word( - self, - ) -> None: - """Non-UnknownLocaleError with 'locale' in message propagates. - - Verifies type-based matching replaces fragile substring matching. - """ - def mock_locale_parse(locale_str: str) -> object: - raise _LocaleWordTestError - - with ( - patch("babel.Locale.parse", side_effect=mock_locale_parse), - pytest.raises(_LocaleWordTestError), - ): - _get_babel_currency_name("USD", "en") - - def test_currency_symbol_reraises_non_unknown_locale_error_with_locale_word( - self, - ) -> None: - """Non-UnknownLocaleError with 'locale' in message propagates. - - Verifies type-based matching replaces fragile substring matching. 
- """ - def mock_symbol( - code: str, - locale: str | object = None, - ) -> str: - raise _LocaleWordTestError - - with ( - patch( - "babel.numbers.get_currency_symbol", - side_effect=mock_symbol, - ), - pytest.raises(_LocaleWordTestError), - ): - _get_babel_currency_symbol("USD", "en") - - -class TestUnknownLocaleErrorImportFailure: - """Tests for UnknownLocaleError import failure paths. - - These tests cover the edge case where: - 1. Babel raises a non-standard exception (not in the caught set) - 2. Attempting to import UnknownLocaleError fails with ImportError - 3. The original exception should be re-raised - """ - - def test_currency_name_reraises_when_import_fails(self) -> None: - """_get_babel_currency_name re-raises when UnknownLocaleError import fails.""" - - class CustomBabelError(Exception): - """Custom exception to simulate unexpected Babel error.""" - - custom_exc = CustomBabelError("Unexpected Babel error") - mock_get_currency_name = MagicMock(side_effect=custom_exc) - original_import = builtins.__import__ - - def mock_import( - name: str, - globals_arg: dict[str, object] | None = None, - locals_arg: dict[str, object] | None = None, - fromlist: tuple[str, ...] 
= (), - level: int = 0, - ) -> object: - if name in ("babel", "babel.numbers"): - return original_import( - name, globals_arg, locals_arg, fromlist, level - ) - if name == "babel.core" and "UnknownLocaleError" in fromlist: - msg = "Cannot import UnknownLocaleError" - raise ImportError(msg) - return original_import( - name, globals_arg, locals_arg, fromlist, level - ) - - with ( - patch( - "babel.numbers.get_currency_name", - mock_get_currency_name, - ), - patch("builtins.__import__", side_effect=mock_import), - pytest.raises(CustomBabelError) as exc_info, - ): - _get_babel_currency_name("USD", "en") - - assert exc_info.value is custom_exc - - def test_currency_symbol_reraises_when_import_fails(self) -> None: - """_get_babel_currency_symbol re-raises when UnknownLocaleError import fails.""" - - class CustomBabelError(Exception): - """Custom exception to simulate unexpected Babel error.""" - - custom_exc = CustomBabelError("Unexpected symbol error") - mock_get_currency_symbol = MagicMock(side_effect=custom_exc) - original_import = builtins.__import__ - - def mock_import( - name: str, - globals_arg: dict[str, object] | None = None, - locals_arg: dict[str, object] | None = None, - fromlist: tuple[str, ...] 
= (), - level: int = 0, - ) -> object: - if name == "babel.numbers": - return original_import( - name, globals_arg, locals_arg, fromlist, level - ) - if name == "babel.core" and "UnknownLocaleError" in fromlist: - msg = "Cannot import UnknownLocaleError" - raise ImportError(msg) - return original_import( - name, globals_arg, locals_arg, fromlist, level - ) - - with ( - patch( - "babel.numbers.get_currency_symbol", - mock_get_currency_symbol, - ), - patch("builtins.__import__", side_effect=mock_import), - pytest.raises(CustomBabelError) as exc_info, - ): - _get_babel_currency_symbol("USD", "en") - - assert exc_info.value is custom_exc - - def test_currency_name_chained_exception_propagation(self) -> None: - """Exception propagation when UnknownLocaleError import fails.""" - - class UnexpectedError(Exception): - """Simulates an unexpected Babel exception.""" - - original_exc = UnexpectedError("Original error") - mock_get_currency_name = MagicMock(side_effect=original_exc) - original_import = builtins.__import__ - - def mock_import( - name: str, - globals_arg: dict[str, object] | None = None, - locals_arg: dict[str, object] | None = None, - fromlist: tuple[str, ...] 
= (), - level: int = 0, - ) -> object: - if name in ("babel", "babel.numbers"): - return original_import( - name, globals_arg, locals_arg, fromlist, level - ) - if name == "babel.core" and "UnknownLocaleError" in fromlist: - msg = "UnknownLocaleError unavailable" - raise ImportError(msg) - return original_import( - name, globals_arg, locals_arg, fromlist, level - ) - - with ( - patch( - "babel.numbers.get_currency_name", - mock_get_currency_name, - ), - patch("builtins.__import__", side_effect=mock_import), - pytest.raises(UnexpectedError) as exc_info, - ): - _get_babel_currency_name("USD", "en") - - assert exc_info.value is original_exc - - def test_currency_symbol_chained_exception_propagation(self) -> None: - """Exception propagation when UnknownLocaleError import fails.""" - - class UnexpectedError(Exception): - """Simulates an unexpected Babel exception.""" - - original_exc = UnexpectedError("Original symbol error") - mock_get_currency_symbol = MagicMock(side_effect=original_exc) - original_import = builtins.__import__ - - def mock_import( - name: str, - globals_arg: dict[str, object] | None = None, - locals_arg: dict[str, object] | None = None, - fromlist: tuple[str, ...] 
= (), - level: int = 0, - ) -> object: - if name == "babel.numbers": - return original_import( - name, globals_arg, locals_arg, fromlist, level - ) - if name == "babel.core" and "UnknownLocaleError" in fromlist: - msg = "UnknownLocaleError unavailable" - raise ImportError(msg) - return original_import( - name, globals_arg, locals_arg, fromlist, level - ) - - with ( - patch( - "babel.numbers.get_currency_symbol", - mock_get_currency_symbol, - ), - patch("builtins.__import__", side_effect=mock_import), - pytest.raises(UnexpectedError) as exc_info, - ): - _get_babel_currency_symbol("USD", "en") - - assert exc_info.value is original_exc - - -class TestIsoBabelDefensiveBranches: - """Direct coverage for defensive helper branches in iso_babel.py.""" - - def test_is_unknown_locale_error_returns_false_when_babel_is_unavailable(self) -> None: - """BabelImportError while resolving the error class yields False.""" - with patch( - "ftllexengine.introspection.iso_babel.get_unknown_locale_error_class", - side_effect=BabelImportError("UnknownLocaleError"), - ): - assert _is_unknown_locale_error(ValueError("not a locale error")) is False - - def test_is_unknown_locale_error_returns_true_for_matching_exception(self) -> None: - """The helper returns True when the exception matches Babel's error class.""" - - class FakeUnknownLocaleError(Exception): - """Stand-in for babel.core.UnknownLocaleError.""" - - with patch( - "ftllexengine.introspection.iso_babel.get_unknown_locale_error_class", - return_value=FakeUnknownLocaleError, - ): - assert _is_unknown_locale_error(FakeUnknownLocaleError("bad locale")) is True - - def test_get_babel_territories_without_unknown_locale_class_success(self) -> None: - """The no-UnknownLocaleError branch still returns territory data when lookup succeeds.""" - - class FakeLocale: - def __init__(self) -> None: - self.territories = {"US": "United States"} - - with ( - patch( - "ftllexengine.introspection.iso_babel._maybe_unknown_locale_error_class", - 
return_value=None, - ), - patch( - "ftllexengine.introspection.iso_babel._get_babel_locale", - return_value=FakeLocale(), - ), - ): - assert _get_babel_territories("en") == {"US": "United States"} - - def test_get_babel_territories_without_unknown_locale_class_failure(self) -> None: - """The no-UnknownLocaleError branch returns an empty mapping on locale lookup errors.""" - with ( - patch( - "ftllexengine.introspection.iso_babel._maybe_unknown_locale_error_class", - return_value=None, - ), - patch( - "ftllexengine.introspection.iso_babel._get_babel_locale", - side_effect=ValueError("bad locale"), - ), - ): - assert _get_babel_territories("en") == {} - - def test_get_babel_currency_name_without_unknown_locale_class_success(self) -> None: - """The no-UnknownLocaleError branch returns the localized currency name.""" - - class FakeLocale: - def __init__(self) -> None: - self.currencies = {"USD": "US Dollar"} - - class FakeLocaleClass: - @staticmethod - def parse(_locale_str: str) -> FakeLocale: - return FakeLocale() - - class FakeNumbers: - @staticmethod - def get_currency_name(_code: str, *, locale: str) -> str: - assert locale == "en" - return "US Dollar" - - with ( - patch( - "ftllexengine.introspection.iso_babel._maybe_unknown_locale_error_class", - return_value=None, - ), - patch( - "ftllexengine.introspection.iso_babel.get_locale_class", - return_value=FakeLocaleClass, - ), - patch( - "ftllexengine.introspection.iso_babel.get_babel_numbers", - return_value=FakeNumbers, - ), - ): - assert _get_babel_currency_name("USD", "en") == "US Dollar" - - def test_get_babel_currency_name_without_unknown_locale_class_failure(self) -> None: - """The no-UnknownLocaleError branch returns None on locale parse errors.""" - - class FakeLocaleClass: - @staticmethod - def parse(_locale_str: str) -> object: - msg = "bad locale" - raise ValueError(msg) - - with ( - patch( - "ftllexengine.introspection.iso_babel._maybe_unknown_locale_error_class", - return_value=None, - ), - patch( - 
"ftllexengine.introspection.iso_babel.get_locale_class", - return_value=FakeLocaleClass, - ), - patch( - "ftllexengine.introspection.iso_babel.get_babel_numbers", - return_value=MagicMock(), - ), - ): - assert _get_babel_currency_name("USD", "en") is None - - def test_get_babel_currency_name_without_unknown_locale_class_missing_code(self) -> None: - """The no-UnknownLocaleError branch returns None for absent currency codes.""" - - class FakeLocale: - def __init__(self) -> None: - self.currencies = {"EUR": "Euro"} - - class FakeLocaleClass: - @staticmethod - def parse(_locale_str: str) -> FakeLocale: - return FakeLocale() - - with ( - patch( - "ftllexengine.introspection.iso_babel._maybe_unknown_locale_error_class", - return_value=None, - ), - patch( - "ftllexengine.introspection.iso_babel.get_locale_class", - return_value=FakeLocaleClass, - ), - patch( - "ftllexengine.introspection.iso_babel.get_babel_numbers", - return_value=MagicMock(), - ), - ): - assert _get_babel_currency_name("USD", "en") is None - - def test_get_babel_currency_symbol_without_unknown_locale_class_success(self) -> None: - """The no-UnknownLocaleError branch returns the localized symbol when lookup succeeds.""" - - class FakeNumbers: - @staticmethod - def get_currency_symbol(_code: str, *, locale: str) -> str: - assert locale == "en" - return "$" - - with ( - patch( - "ftllexengine.introspection.iso_babel._maybe_unknown_locale_error_class", - return_value=None, - ), - patch( - "ftllexengine.introspection.iso_babel.get_babel_numbers", - return_value=FakeNumbers, - ), - ): - assert _get_babel_currency_symbol("USD", "en") == "$" - - def test_get_babel_currency_symbol_without_unknown_locale_class_failure(self) -> None: - """The no-UnknownLocaleError branch falls back to the code on lookup errors.""" - - class FakeNumbers: - @staticmethod - def get_currency_symbol(_code: str, *, locale: str) -> str: - _ = locale - msg = "bad locale" - raise ValueError(msg) - - with ( - patch( - 
"ftllexengine.introspection.iso_babel._maybe_unknown_locale_error_class", - return_value=None, - ), - patch( - "ftllexengine.introspection.iso_babel.get_babel_numbers", - return_value=FakeNumbers, - ), - ): - assert _get_babel_currency_symbol("USD", "en") == "USD" - - -# =========================================================================== -# get_currency_decimal_digits -# =========================================================================== - - -class TestGetCurrencyDecimalDigits: - """Tests for get_currency_decimal_digits() convenience function. - - Decimal precision is locale-independent (ISO 4217 standard). - The function must not require a locale parameter. - """ - - def test_standard_two_decimal_currencies(self) -> None: - """Common 2-decimal currencies return 2.""" - for code in ("EUR", "USD", "GBP", "CHF", "CAD", "AUD", "NZD"): - assert get_currency_decimal_digits(code) == 2, ( - f"{code} should have 2 decimal digits" - ) - - def test_zero_decimal_currencies(self) -> None: - """Zero-decimal currencies return 0.""" - for code in ("JPY", "KRW", "VND", "ISK", "CLP"): - assert get_currency_decimal_digits(code) == 0, ( - f"{code} should have 0 decimal digits" - ) - - def test_three_decimal_currencies(self) -> None: - """Three-decimal currencies return 3.""" - for code in ("KWD", "JOD", "OMR", "BHD", "TND"): - assert get_currency_decimal_digits(code) == 3, ( - f"{code} should have 3 decimal digits" - ) - - def test_four_decimal_currencies(self) -> None: - """Four-decimal currencies return 4.""" - assert get_currency_decimal_digits("CLF") == 4 - assert get_currency_decimal_digits("UYW") == 4 - - def test_unknown_code_returns_none(self) -> None: - """Unknown ISO code returns None.""" - assert get_currency_decimal_digits("XYZ") is None - assert get_currency_decimal_digits("FOO") is None - - def test_case_insensitive(self) -> None: - """Currency code lookup is case-insensitive.""" - assert get_currency_decimal_digits("eur") == 2 - assert 
get_currency_decimal_digits("Eur") == 2 - assert get_currency_decimal_digits("EUR") == 2 - assert get_currency_decimal_digits("jpy") == 0 - - def test_wrong_length_returns_none(self) -> None: - """Codes of wrong length return None without Babel call.""" - assert get_currency_decimal_digits("") is None - assert get_currency_decimal_digits("EU") is None - assert get_currency_decimal_digits("EURO") is None - - def test_consistent_with_get_currency(self) -> None: - """Result matches get_currency(code).decimal_digits for all known codes.""" - for code in ("USD", "EUR", "JPY", "KWD", "CLF", "GBP"): - info = get_currency(code) - assert info is not None - digits = get_currency_decimal_digits(code) - assert digits == info.decimal_digits, ( - f"Inconsistency for {code}: get_currency_decimal_digits={digits}, " - f"get_currency().decimal_digits={info.decimal_digits}" - ) - - def test_latvian_lats_historical(self) -> None: - """Historical currency LVL (Latvian Lats) returns None (withdrawn from ISO 4217).""" - # LVL is a withdrawn currency — Babel no longer includes it in active CLDR data. - # get_currency_decimal_digits must return None for withdrawn/unknown codes. - result = get_currency_decimal_digits("LVL") - # Accept both None (withdrawn from Babel's CLDR) and 2 (if still in data). 
- assert result in (None, 2), f"LVL should be None or 2, got {result!r}" - - def test_precious_metal_x_codes_return_zero(self) -> None: - """ISO 4217 precious-metal X-codes return 0 decimal digits.""" - for code in ("XAG", "XAU", "XPD", "XPT"): - assert get_currency_decimal_digits(code) == 0, ( - f"{code} (precious metal) should have 0 decimal digits" - ) - - def test_special_x_codes_return_zero(self) -> None: - """ISO 4217 special X-codes (bond units, SDR, testing, no-currency) return 0.""" - for code in ("XBA", "XBB", "XBC", "XBD", "XDR", "XSU", "XTS", "XUA", "XXX"): - assert get_currency_decimal_digits(code) == 0, ( - f"{code} should have 0 decimal digits" - ) - - def test_xcd_eastern_caribbean_is_two_decimal(self) -> None: - """XCD (Eastern Caribbean Dollar) uses default 2 decimal digits.""" - assert get_currency_decimal_digits("XCD") == 2 - - def test_babel_free_no_babel_install_required(self) -> None: - """get_currency_decimal_digits works without Babel installed. - - Validates the Babel-free contract: result must not depend on any - Babel import path. We verify by confirming standard codes work and - that the returned value is a plain int (not a Babel-derived object). - """ - result = get_currency_decimal_digits("USD") - assert result == 2 - assert type(result) is int - - def test_known_invalid_codes_return_none(self) -> None: - """Non-ISO codes return None without fallback to default.""" - for code in ("XYZ", "FOO", "ZZZ", "AAA", "TST"): - assert get_currency_decimal_digits(code) is None, ( - f"Unknown code {code!r} should return None" - ) - - def test_casefold_expansion_guard(self) -> None: - """Single-char inputs that expand via .upper() return None (no casefold confusion). - - Verifies the raw-length guard prevents the 'ß' -> 'SS' casefold expansion - from matching 'SS' or any other 2-char result of uppercasing a 1-char input. 
- """ - assert get_currency_decimal_digits("ß") is None - assert get_currency_decimal_digits("a") is None - - def test_fund_codes_return_correct_precision(self) -> None: - """ISO 4217 fund codes are valid and return correct precision.""" - # BOV (Bolivian Mvdol), MXV (Mexican Unidad), USN (US Next Day): 2 decimal - for code in ("BOV", "MXV", "USN"): - result = get_currency_decimal_digits(code) - assert result == 2, f"{code} (fund code) should have 2 decimal digits" - # UYI (Uruguay Peso en Unidades Indexadas): 0 decimal - assert get_currency_decimal_digits("UYI") == 0 - - def test_recently_added_active_codes(self) -> None: - """Codes added by recent ISO 4217 amendments are active and return precision. - - VED (Amendment 169, 2021), ZWG (Amendment 171+, 2024), and XCG - (Amendment 17x, 2025) are active ISO 4217 codes with default 2 decimal digits. - """ - for code in ("VED", "ZWG", "XCG"): - result = get_currency_decimal_digits(code) - assert result == 2, ( - f"{code} (recently-added active code) should have 2 decimal digits, " - f"got {result!r}" - ) - - def test_recently_retired_codes_return_none(self) -> None: - """Codes retired by recent ISO 4217 amendments return None. - - SLL (Sierra Leone Leone, retired Amendment 170, 2022) and ZWL - (Zimbabwean Dollar, retired Amendment 171+, 2024) are no longer active - and must not appear in ISO_4217_VALID_CODES. - """ - for code in ("SLL", "ZWL"): - result = get_currency_decimal_digits(code) - assert result is None, ( - f"{code} (retired code) should return None, got {result!r}" - ) - - def test_iqd_iso_standard_value(self) -> None: - """IQD (Iraqi Dinar) returns ISO 4217 standard value of 3 decimal digits. - - ISO 4217 specifies IQD with 3 decimal places (fils subdivision). - Babel CLDR reports 0 because fils are not used in practice. - This library follows the ISO standard, not CLDR practical usage. 
- """ - assert get_currency_decimal_digits("IQD") == 3 - - -class TestRequireCurrencyCode: - """Tests for require_currency_code boundary validator.""" - - def test_valid_uppercase_code_returns_currency_code(self) -> None: - """Valid uppercase ISO 4217 code returns CurrencyCode.""" - result = require_currency_code("USD", "price") - assert result == CurrencyCode("USD") - assert type(result) is str # CurrencyCode is a str alias - - def test_valid_lowercase_code_is_normalized(self) -> None: - """Lowercase code is normalised to uppercase CurrencyCode.""" - result = require_currency_code("eur", "amount") - assert result == CurrencyCode("EUR") - - def test_valid_mixed_case_code_is_normalized(self) -> None: - """Mixed-case code is normalised to uppercase.""" - result = require_currency_code("Jpy", "fee") - assert result == CurrencyCode("JPY") - - def test_leading_trailing_whitespace_is_stripped(self) -> None: - """Whitespace around a valid code is stripped before validation.""" - result = require_currency_code(" GBP ", "price") - assert result == CurrencyCode("GBP") - - def test_invalid_code_raises_value_error(self) -> None: - """Unrecognised currency code raises ValueError.""" - with pytest.raises(ValueError, match="currency code"): - require_currency_code("XYZ", "amount") - - def test_empty_string_raises_value_error(self) -> None: - """Empty string raises ValueError (not a valid ISO 4217 code).""" - with pytest.raises(ValueError, match="currency code"): - require_currency_code("", "amount") - - def test_whitespace_only_raises_value_error(self) -> None: - """Whitespace-only string raises ValueError after stripping.""" - with pytest.raises(ValueError, match="currency code"): - require_currency_code(" ", "amount") - - def test_non_str_raises_type_error(self) -> None: - """Non-str value raises TypeError with field_name in message.""" - with pytest.raises(TypeError, match="price"): - require_currency_code(123, "price") - - def test_none_raises_type_error(self) -> None: - 
"""None raises TypeError.""" - with pytest.raises(TypeError, match="currency_code"): - require_currency_code(None, "currency_code") - - def test_field_name_in_error_message(self) -> None: - """field_name appears in both TypeError and ValueError messages.""" - with pytest.raises(TypeError, match="my_field"): - require_currency_code(42, "my_field") - with pytest.raises(ValueError, match="my_field"): - require_currency_code("BADCODE", "my_field") - - def test_valid_codes_cover_major_currencies(self) -> None: - """Major ISO 4217 codes are accepted.""" - for code in ("USD", "EUR", "GBP", "JPY", "CHF", "CAD", "AUD"): - result = require_currency_code(code, "amount") - assert result == CurrencyCode(code) - - def test_returns_currency_code_type(self) -> None: - """Return value is CurrencyCode (str subtype).""" - result = require_currency_code("USD", "amount") - assert isinstance(result, str) - - -class TestRequireTerritoryCode: - """Tests for require_territory_code boundary validator.""" - - def test_valid_uppercase_code_returns_territory_code(self) -> None: - """Valid uppercase ISO 3166-1 alpha-2 code returns TerritoryCode.""" - result = require_territory_code("US", "region") - assert result == TerritoryCode("US") - - def test_valid_lowercase_code_is_normalized(self) -> None: - """Lowercase code is normalised to uppercase TerritoryCode.""" - result = require_territory_code("de", "country") - assert result == TerritoryCode("DE") - - def test_valid_mixed_case_code_is_normalized(self) -> None: - """Mixed-case code is normalised to uppercase.""" - result = require_territory_code("Gb", "territory") - assert result == TerritoryCode("GB") - - def test_leading_trailing_whitespace_is_stripped(self) -> None: - """Whitespace around a valid code is stripped before validation.""" - result = require_territory_code(" FR ", "country") - assert result == TerritoryCode("FR") - - def test_invalid_code_raises_value_error(self) -> None: - """Unrecognised territory code raises ValueError.""" - 
# "99"/"X9" contain digits — not valid ISO 3166-1 alpha-2 codes - with pytest.raises(ValueError, match="territory code"): - require_territory_code("99", "region") - - def test_empty_string_raises_value_error(self) -> None: - """Empty string raises ValueError.""" - with pytest.raises(ValueError, match="territory code"): - require_territory_code("", "region") - - def test_whitespace_only_raises_value_error(self) -> None: - """Whitespace-only string raises ValueError after stripping.""" - with pytest.raises(ValueError, match="territory code"): - require_territory_code(" ", "region") - - def test_three_char_code_raises_value_error(self) -> None: - """3-char string is not a valid alpha-2 code and raises ValueError.""" - with pytest.raises(ValueError, match="territory code"): - require_territory_code("USA", "country") - - def test_non_str_raises_type_error(self) -> None: - """Non-str value raises TypeError with field_name in message.""" - with pytest.raises(TypeError, match="region"): - require_territory_code(42, "region") - - def test_none_raises_type_error(self) -> None: - """None raises TypeError.""" - with pytest.raises(TypeError, match="territory"): - require_territory_code(None, "territory") - - def test_field_name_in_error_message(self) -> None: - """field_name appears in both TypeError and ValueError messages.""" - with pytest.raises(TypeError, match="my_field"): - require_territory_code(99, "my_field") - with pytest.raises(ValueError, match="my_field"): - require_territory_code("XX", "my_field") - - def test_valid_codes_cover_major_territories(self) -> None: - """Major ISO 3166-1 alpha-2 codes are accepted.""" - for code in ("US", "DE", "GB", "FR", "JP", "CA", "AU"): - result = require_territory_code(code, "region") - assert result == TerritoryCode(code) - - def test_casefold_expansion_guard(self) -> None: - """Single-char inputs that expand via .upper() (e.g. 
'ß'->'SS') are rejected.""" - with pytest.raises(ValueError, match="territory code"): - require_territory_code("ß", "region") - - def test_returns_territory_code_type(self) -> None: - """Return value is TerritoryCode (str subtype).""" - result = require_territory_code("US", "region") - assert isinstance(result, str) +from tests.introspection_iso_cases.cache_and_babel import * # noqa: F403 - split module reuses shared support imports +from tests.introspection_iso_cases.defensive_branches import * # noqa: F403 - split module reuses shared support imports +from tests.introspection_iso_cases.error_paths import * # noqa: F403 - split module reuses shared support imports +from tests.introspection_iso_cases.lookup import * # noqa: F403 - split module reuses shared support imports +from tests.introspection_iso_cases.requirements import * # noqa: F403 - split module reuses shared support imports diff --git a/tests/test_introspection_message.py b/tests/test_introspection_message.py index b59cd95b..ae9e5f97 100644 --- a/tests/test_introspection_message.py +++ b/tests/test_introspection_message.py @@ -1,1802 +1,6 @@ -"""Tests for FTL message introspection API. +"""Aggregated message introspection test surface.""" -Covers variable extraction, function introspection, reference tracking, -MessageIntrospection object contracts, span tracking, depth limits, -exhaustiveness guards, and caching. All branches in message.py are exercised. 
-""" - -from __future__ import annotations - -import threading -from unittest.mock import patch - -import pytest -from hypothesis import event, given, settings -from hypothesis import strategies as st - -import ftllexengine.introspection.message as _introspection_msg_mod -from ftllexengine import FluentBundle, parse_ftl -from ftllexengine.enums import ReferenceKind, VariableContext -from ftllexengine.introspection import ( - MessageVariableValidationResult, - VariableInfo, - clear_introspection_cache, - extract_references, - extract_references_by_attribute, - extract_variables, - introspect_message, - validate_message_variables, -) -from ftllexengine.introspection.message import ( - IntrospectionVisitor, - ReferenceExtractor, - _introspection_cache, - _introspection_cache_lock, -) -from ftllexengine.syntax.ast import ( - Attribute, - CallArguments, - FunctionReference, - Identifier, - Junk, - Message, - NamedArgument, - NumberLiteral, - Pattern, - Placeable, - StringLiteral, - Term, - TermReference, - TextElement, - VariableReference, -) -from ftllexengine.syntax.parser import FluentParserV1 - -# =========================================================================== -# HELPERS -# =========================================================================== - - -def _parse_message(ftl: str) -> Message: - """Parse FTL source and return first Message entry.""" - resource = FluentParserV1().parse(ftl) - entry = resource.entries[0] - assert isinstance(entry, Message) - return entry - - -def _parse_term(ftl: str) -> Term: - """Parse FTL source and return first Term entry.""" - resource = FluentParserV1().parse(ftl) - entry = resource.entries[0] - assert isinstance(entry, Term) - return entry - - -def _make_message( - name: str, - *, - value: Pattern | None = None, - attributes: tuple[Attribute, ...] 
= (), -) -> Message: - """Construct a Message programmatically (bypasses parser).""" - return Message(id=Identifier(name=name), value=value, attributes=attributes) - - -def _make_pattern(*elements: TextElement | Placeable) -> Pattern: - """Construct a Pattern from elements.""" - return Pattern(elements=elements) - - -# =========================================================================== -# VARIABLE EXTRACTION -# =========================================================================== - - -class TestVariableExtraction: - """Variable extraction from various message patterns.""" - - def test_simple_variable(self) -> None: - """Extract single variable from simple message.""" - bundle = FluentBundle("en") - bundle.add_resource("greeting = Hello, { $name }!") - assert bundle.get_message_variables("greeting") == frozenset({"name"}) - - def test_multiple_variables(self) -> None: - """Extract multiple variables from message.""" - bundle = FluentBundle("en") - bundle.add_resource("user-info = { $firstName } { $lastName } (Age: { $age })") - assert bundle.get_message_variables("user-info") == frozenset( - {"firstName", "lastName", "age"} - ) - - def test_duplicate_variables(self) -> None: - """Duplicate variable references appear once (frozenset deduplication).""" - bundle = FluentBundle("en") - bundle.add_resource("greeting = { $name }, nice to meet you { $name }!") - assert bundle.get_message_variables("greeting") == frozenset({"name"}) - - def test_no_variables(self) -> None: - """Message with no variables returns empty frozenset.""" - bundle = FluentBundle("en") - bundle.add_resource("hello = Hello, World!") - assert bundle.get_message_variables("hello") == frozenset() - - def test_message_not_found(self) -> None: - """KeyError raised for non-existent message.""" - bundle = FluentBundle("en") - with pytest.raises(KeyError, match=r"Message 'nonexistent' not found"): - bundle.get_message_variables("nonexistent") - - def 
test_plain_text_pattern_has_no_variables(self) -> None: - """TextElement branch: patterns with only text extract nothing.""" - msg = _parse_message("msg = Plain text without any placeables") - result = introspect_message(msg) - assert len(result.get_variable_names()) == 0 - assert len(result.get_function_names()) == 0 - assert not result.has_selectors - - def test_text_element_branch_in_visitor(self) -> None: - """TextElement case in _visit_pattern_element executes without effect.""" - msg = _parse_message("msg = just text") - visitor = IntrospectionVisitor() - assert msg.value is not None - visitor.visit(msg.value) - assert visitor.variables == set() - - def test_extract_variables_direct_api(self) -> None: - """extract_variables() convenience function delegates correctly.""" - msg = _parse_message("greeting = Hello, { $name }!") - assert extract_variables(msg) == frozenset({"name"}) - - def test_extract_variables_from_select_with_variants(self) -> None: - """All variant-local variables are captured.""" - msg = _parse_message( - "msg = { $count ->\n" - " [one] You have { $count } item from { $source }\n" - " [few] You have { $count } items from { $source }\n" - " *[other] You have { $count } items from { $source }\n" - "}" - ) - vars_ = extract_variables(msg) - assert "count" in vars_ - assert "source" in vars_ - - -# =========================================================================== -# SELECT EXPRESSIONS -# =========================================================================== - - -class TestSelectExpressions: - """Variable extraction from select expressions.""" - - def test_selector_variable(self) -> None: - """Variable used in selector is extracted.""" - bundle = FluentBundle("en") - bundle.add_resource( - "emails = { $count ->\n [one] one email\n *[other] { $count } emails\n}\n" - ) - assert "count" in bundle.get_message_variables("emails") - - def test_variant_variables(self) -> None: - """Variables in variants are all extracted.""" - bundle = 
FluentBundle("en") - bundle.add_resource( - "message = { $userType ->\n" - " [admin] Hello { $name }, you are an admin\n" - " *[user] Welcome { $name }\n" - "}\n" - ) - assert bundle.get_message_variables("message") == frozenset({"userType", "name"}) - - def test_nested_selectors(self) -> None: - """Nested select expressions extract all variables.""" - bundle = FluentBundle("en") - bundle.add_resource( - "complex = { $gender ->\n" - " [male] { $count ->\n" - " [one] one item\n" - " *[other] { $count } items\n" - " }\n" - " *[female] { $count } things\n" - "}\n" - ) - assert bundle.get_message_variables("complex") == frozenset({"gender", "count"}) - - def test_has_selectors_flag_set(self) -> None: - """MessageIntrospection.has_selectors is True for select expressions.""" - msg = _parse_message( - "msg = { $count ->\n [0] No items\n [1] One item\n *[other] Many items\n}\n" - ) - result = introspect_message(msg) - assert result.has_selectors is True - assert "count" in result.get_variable_names() - - def test_has_selectors_flag_false_for_plain(self) -> None: - """has_selectors is False for messages without select expressions.""" - msg = _parse_message("simple = Hello") - assert not introspect_message(msg).has_selectors - - -# =========================================================================== -# FUNCTION INTROSPECTION -# =========================================================================== - - -class TestFunctionIntrospection: - """Function call detection and metadata extraction.""" - - def test_function_detection(self) -> None: - """Function calls are detected and named correctly.""" - info = introspect_message(_parse_message("price = { NUMBER($amount) }")) - assert "NUMBER" in info.get_function_names() - assert "amount" in info.get_variable_names() - - def test_function_with_named_args(self) -> None: - """Named argument keys are captured in FunctionCallInfo.""" - info = introspect_message( - _parse_message("price = { NUMBER($amount, 
minimumFractionDigits: 2) }") - ) - funcs = list(info.functions) - assert len(funcs) == 1 - assert funcs[0].name == "NUMBER" - assert "amount" in funcs[0].positional_arg_vars - assert "minimumFractionDigits" in funcs[0].named_args - - def test_multiple_functions(self) -> None: - """Multiple distinct function calls are all detected.""" - info = introspect_message( - _parse_message("ts = { NUMBER($value) } at { DATETIME($time) }") - ) - assert info.get_function_names() == frozenset({"NUMBER", "DATETIME"}) - - def test_function_without_arguments(self) -> None: - """Function with empty argument list (FUNC()) is detected.""" - msg = _parse_message("msg = Result: { BUILTIN() }") - result = introspect_message(msg) - assert "BUILTIN" in result.get_function_names() - - def test_function_with_empty_arguments(self) -> None: - """FunctionReference with empty CallArguments is detected and has no variables. - - Verifies that a function call with no positional or named arguments - produces a FunctionCallInfo with empty variable sets. - """ - func_ref = FunctionReference( - id=Identifier(name="NOOP"), - arguments=CallArguments(positional=(), named=()), - ) - msg = _make_message( - "test", value=_make_pattern(Placeable(expression=func_ref)) - ) - info = introspect_message(msg, use_cache=False) - assert "NOOP" in info.get_function_names() - assert len(info.get_variable_names()) == 0 - - def test_function_multiple_positional_args(self) -> None: - """Multiple positional arguments are all extracted.""" - msg = _parse_message("msg = { FUNC($a, $b, $c) }") - result = introspect_message(msg) - assert result.get_variable_names() == frozenset({"a", "b", "c"}) - - def test_function_variable_in_positional_arg_with_literal_named_arg(self) -> None: - """Variable reference in positional arg is extracted; named arg literals are not. - - Per FTL spec, named argument values are constrained to StringLiteral or - NumberLiteral. They cannot be VariableReferences. 
Only positional arguments - contribute variable names when they contain VariableReference nodes. - """ - func_ref = FunctionReference( - id=Identifier(name="CUSTOM"), - arguments=CallArguments( - positional=(VariableReference(id=Identifier(name="x")),), - named=( - NamedArgument( - name=Identifier(name="opt"), - value=StringLiteral(value="opt_value"), - ), - ), - ), - ) - msg = _make_message("test", value=_make_pattern(Placeable(expression=func_ref))) - info = introspect_message(msg, use_cache=False) - # Only "x" from positional arg; named arg literal value contributes nothing - assert info.get_variable_names() == frozenset({"x"}) - - def test_function_named_args_with_literals_do_not_contribute_variable_names( - self, - ) -> None: - """Named argument literal values do not contribute to variable_names. - - Per FTL spec, named argument values are always literals (StringLiteral or - NumberLiteral), never VariableReferences. Variables from positional args - are extracted; named arg literal values are not variable references. 
- """ - func_ref = FunctionReference( - id=Identifier(name="FUNC"), - arguments=CallArguments( - positional=(VariableReference(id=Identifier(name="val")),), - named=( - NamedArgument( - name=Identifier(name="a"), - value=StringLiteral(value="first"), - ), - NamedArgument( - name=Identifier(name="b"), - value=StringLiteral(value="second"), - ), - NamedArgument( - name=Identifier(name="n"), - value=NumberLiteral(value=42, raw="42"), - ), - ), - ), - ) - msg = _make_message("test", value=_make_pattern(Placeable(expression=func_ref))) - info = introspect_message(msg, use_cache=False) - # Only "val" from positional arg; named arg literal values contribute nothing - assert info.get_variable_names() == frozenset({"val"}) - assert "FUNC" in info.get_function_names() - - def test_nested_message_reference_in_function_arg(self) -> None: - """MessageReference in function positional arg is extracted.""" - bundle = FluentBundle("en") - bundle.add_resource("base-value = 42\nformatted = { NUMBER(base-value) }\n") - info = bundle.introspect_message("formatted") - assert any(r.id == "base-value" for r in info.references) - - def test_variable_in_complex_nested_expression(self) -> None: - """Variables in function inside select expression are captured.""" - bundle = FluentBundle("en") - bundle.add_resource( - "complex = { $type ->\n" - " [currency] { NUMBER($amount, minimumFractionDigits: 2) }\n" - " *[plain] { $amount }\n" - "}\n" - ) - info = bundle.introspect_message("complex") - assert "type" in info.get_variable_names() - assert "amount" in info.get_variable_names() - - -# =========================================================================== -# REFERENCE INTROSPECTION -# =========================================================================== - - -class TestReferenceIntrospection: - """Message and term reference tracking.""" - - def test_message_reference(self) -> None: - """MessageReference is captured in ReferenceInfo.""" - bundle = FluentBundle("en") - 
bundle.add_resource("brand = FTLLexEngine\ngreeting = Welcome to { brand }\n") - info = bundle.introspect_message("greeting") - refs = list(info.references) - assert len(refs) == 1 - assert refs[0].id == "brand" - assert refs[0].kind == ReferenceKind.MESSAGE - assert refs[0].attribute is None - - def test_term_reference(self) -> None: - """TermReference is captured in ReferenceInfo.""" - bundle = FluentBundle("en") - bundle.add_resource("-brand = FTLLexEngine\ngreeting = Welcome to { -brand }\n") - info = bundle.introspect_message("greeting") - refs = list(info.references) - assert len(refs) == 1 - assert refs[0].id == "brand" - assert refs[0].kind == ReferenceKind.TERM - - def test_attribute_message_reference(self) -> None: - """MessageReference with attribute is captured correctly.""" - bundle = FluentBundle("en") - bundle.add_resource( - "message = Message\n .tooltip = Tooltip\ngreeting = Hover for { message.tooltip }\n" - ) - info = bundle.introspect_message("greeting") - refs = list(info.references) - assert len(refs) == 1 - assert refs[0].id == "message" - assert refs[0].attribute == "tooltip" - - -# =========================================================================== -# REFERENCE EXTRACTOR -# =========================================================================== - - -class TestReferenceExtractor: - """ReferenceExtractor specialized visitor for dependency analysis.""" - - def test_message_reference_collected(self) -> None: - """MessageReference is added to message_refs without attribute.""" - msg = _parse_message("msg = { other-message }") - extractor = ReferenceExtractor() - assert msg.value is not None - extractor.visit(msg.value) - assert "other-message" in extractor.message_refs - - def test_message_reference_with_attribute(self) -> None: - """MessageReference with attribute uses qualified form.""" - msg = _parse_message("msg = { other.attr }") - extractor = ReferenceExtractor() - assert msg.value is not None - extractor.visit(msg.value) - 
assert "other.attr" in extractor.message_refs - - def test_term_reference_no_attribute(self) -> None: - """TermReference without attribute uses unqualified form.""" - msg = _parse_message("msg = { -brand }") - extractor = ReferenceExtractor() - assert msg.value is not None - extractor.visit(msg.value) - assert "brand" in extractor.term_refs - - def test_term_reference_with_attribute(self) -> None: - """TermReference with attribute uses qualified form (line 482 branch).""" - msg = _parse_message("msg = { -brand.short }") - extractor = ReferenceExtractor() - assert msg.value is not None - extractor.visit(msg.value) - # Covers line 482: self.term_refs.add(f"{node.id.name}.{node.attribute.name}") - assert "brand.short" in extractor.term_refs - - def test_nested_term_references_via_arguments(self) -> None: - """Nested term arguments are traversed by generic_visit.""" - msg = _parse_message("msg = { -outer(-inner($var)) }") - assert isinstance(msg, (Message, Term)) - _msg_refs, term_refs = extract_references(msg) - assert "outer" in term_refs - assert "inner" in term_refs - - def test_depth_guard_in_deeply_nested_terms(self) -> None: - """ReferenceExtractor respects max_depth.""" - msg = _parse_message("msg = { -term1(-term2(-term3)) }") - extractor = ReferenceExtractor(max_depth=100) - assert msg.value is not None - extractor.visit(msg.value) - assert "term1" in extractor.term_refs - assert "term2" in extractor.term_refs - assert "term3" in extractor.term_refs - - -# =========================================================================== -# EXTRACT_REFERENCES API -# =========================================================================== - - -class TestExtractReferences: - """Tests for extract_references() public function.""" - - def test_extract_message_and_term_refs(self) -> None: - """extract_references returns both message and term ref sets.""" - msg = _parse_message("msg = { welcome } uses { -brand }") - msg_refs, term_refs = extract_references(msg) - assert 
"welcome" in msg_refs - assert "brand" in term_refs - - def test_term_reference_with_args_tracked(self) -> None: - """Term references in arguments are captured.""" - msg = _parse_message('msg = { -brand($var, case: "nominative") }') - assert isinstance(msg, (Message, Term)) - _msg_refs, term_refs = extract_references(msg) - assert "brand" in term_refs - - def test_extract_references_message_with_no_value(self) -> None: - """extract_references handles Message(value=None) correctly. - - Covers line 518->522: False branch of ``if entry.value is not None:`` - when message has only attributes (no value pattern). - """ - attr = Attribute( - id=Identifier(name="attr"), - value=_make_pattern(Placeable(expression=TermReference(id=Identifier("brand")))), - ) - msg = _make_message("test", value=None, attributes=(attr,)) - msg_refs, term_refs = extract_references(msg) - # Value is None so no refs from value; attribute has term ref - assert "brand" in term_refs - assert len(msg_refs) == 0 - - def test_extract_references_message_with_empty_value_no_attrs(self) -> None: - """extract_references with empty pattern value returns empty sets.""" - msg = _make_message("test", value=_make_pattern()) - msg_refs, term_refs = extract_references(msg) - assert msg_refs == frozenset() - assert term_refs == frozenset() - - -# =========================================================================== -# EXTRACT_REFERENCES_BY_ATTRIBUTE API -# =========================================================================== - - -class TestExtractReferencesByAttribute: - """Tests for extract_references_by_attribute() public function. - - This function was previously untested (0% coverage). Tests cover all - branches: value pattern, per-attribute patterns, and None-value messages. 
- """ - - def test_value_pattern_refs_under_none_key(self) -> None: - """Value pattern references are stored under key None.""" - msg = _parse_message("msg = { welcome } uses { -brand }") - result = extract_references_by_attribute(msg) - assert None in result - msg_refs, term_refs = result[None] - assert "welcome" in msg_refs - assert "brand" in term_refs - - def test_attribute_refs_under_attribute_name_key(self) -> None: - """Attribute references are stored under the attribute name key.""" - msg = _parse_message( - "msg = Base text\n .tooltip = { -brand }\n .label = { other }\n" - ) - result = extract_references_by_attribute(msg) - assert "tooltip" in result - assert "label" in result - _m, term_refs = result["tooltip"] - assert "brand" in term_refs - msg_refs2, _t = result["label"] - assert "other" in msg_refs2 - - def test_value_and_attributes_separated(self) -> None: - """Value and attribute references are separate entries.""" - msg = _parse_message( - "msg = { value-ref }\n .attr = { -term-ref }\n" - ) - result = extract_references_by_attribute(msg) - assert None in result - assert "attr" in result - # Value has message ref - assert "value-ref" in result[None][0] - # Attr has term ref - assert "term-ref" in result["attr"][1] - - def test_message_with_no_value(self) -> None: - """Message with value=None has no None key in result.""" - attr = Attribute( - id=Identifier(name="tooltip"), - value=_make_pattern(Placeable(expression=TermReference(id=Identifier("brand")))), - ) - msg = _make_message("btn", value=None, attributes=(attr,)) - result = extract_references_by_attribute(msg) - # No None key (no value pattern) - assert None not in result - assert "tooltip" in result - assert "brand" in result["tooltip"][1] - - def test_message_with_only_value(self) -> None: - """Message with value but no attributes returns single entry.""" - msg = _parse_message("msg = { other }") - result = extract_references_by_attribute(msg) - assert set(result.keys()) == {None} - assert 
"other" in result[None][0] - - def test_empty_message_no_refs(self) -> None: - """Message with empty value and no attributes returns empty result.""" - msg = _make_message("test", value=_make_pattern()) - result = extract_references_by_attribute(msg) - # Empty Pattern creates a None key with empty sets - assert None in result - msg_refs, term_refs = result[None] - assert msg_refs == frozenset() - assert term_refs == frozenset() - - def test_multiple_attributes_all_present(self) -> None: - """All attributes appear as separate keys.""" - msg = _parse_message( - "btn = Base\n .a1 = { -t1 }\n .a2 = { -t2 }\n .a3 = { -t3 }\n" - ) - result = extract_references_by_attribute(msg) - assert "a1" in result - assert "a2" in result - assert "a3" in result - assert "t1" in result["a1"][1] - assert "t2" in result["a2"][1] - assert "t3" in result["a3"][1] - - -# =========================================================================== -# INTROSPECT_MESSAGE WITH value=None -# =========================================================================== - - -class TestIntrospectMessageNoneValue: - """introspect_message with Message(value=None) - covers line 609->613.""" - - def test_introspect_message_value_none_no_crash(self) -> None: - """Message with value=None is introspected without error. 
- - Covers line 609->613: False branch of ``if message.value is not None:`` - """ - attr = Attribute( - id=Identifier(name="label"), - value=_make_pattern(Placeable(expression=VariableReference(id=Identifier("x")))), - ) - msg = _make_message("test", value=None, attributes=(attr,)) - result = introspect_message(msg, use_cache=False) - assert result.message_id == "test" - assert "x" in result.get_variable_names() - - def test_introspect_message_value_none_only_attributes(self) -> None: - """Attribute variables are still extracted when value is None.""" - attr1 = Attribute( - id=Identifier(name="formal"), - value=_make_pattern(Placeable(expression=VariableReference(id=Identifier("name")))), - ) - attr2 = Attribute( - id=Identifier(name="casual"), - value=_make_pattern(TextElement(value="Hi there")), - ) - msg = _make_message("greet", value=None, attributes=(attr1, attr2)) - result = introspect_message(msg, use_cache=False) - assert "name" in result.get_variable_names() - assert result.message_id == "greet" - - -# =========================================================================== -# NESTED PLACEABLE IN _visit_expression -# =========================================================================== - - -class TestNestedPlaceableExpression: - """Nested Placeable inside Placeable (lines 363-364 branch coverage).""" - - def test_nested_placeable_extracts_inner_variable(self) -> None: - """Placeable wrapping another Placeable extracts the inner variable. - - Covers lines 363-364: ``elif Placeable.guard(expr):`` branch in - _visit_expression when the expression is itself a Placeable node. 
- """ - inner_var = VariableReference(id=Identifier(name="inner")) - inner_placeable = Placeable(expression=inner_var) - outer_placeable = Placeable(expression=inner_placeable) - msg = _make_message("test", value=_make_pattern(outer_placeable)) - - visitor = IntrospectionVisitor() - assert msg.value is not None - visitor.visit(msg.value) - names = {v.name for v in visitor.variables} - assert "inner" in names - - def test_nested_placeable_via_introspect_message(self) -> None: - """introspect_message handles doubly-nested Placeable.""" - inner_var = VariableReference(id=Identifier(name="deep")) - msg = _make_message( - "test", - value=_make_pattern(Placeable(expression=Placeable(expression=inner_var))), - ) - result = introspect_message(msg, use_cache=False) - assert "deep" in result.get_variable_names() - - -# =========================================================================== -# EXHAUSTIVENESS GUARD -# =========================================================================== - - -class TestPatternElementExhaustiveness: - """_visit_pattern_element assert_never guard for unexpected element types.""" - - def test_unknown_pattern_element_raises_assertion_error(self) -> None: - """assert_never raises AssertionError for non-TextElement non-Placeable. - - Covers the ``case _ as unreachable: assert_never(unreachable)`` branch. 
- """ - visitor = IntrospectionVisitor() - # Pass an object that is neither TextElement nor Placeable - sentinel = object() - with pytest.raises(AssertionError): - visitor._visit_pattern_element(sentinel) # type: ignore[arg-type] - - -# =========================================================================== -# MESSAGEINTROSPECTON OBJECT CONTRACTS -# =========================================================================== - - -class TestMessageIntrospectionContracts: - """MessageIntrospection immutability, accessor, and consistency contracts.""" - - def test_frozen_immutability(self) -> None: - """MessageIntrospection cannot be mutated.""" - info = introspect_message(_parse_message("test = { $var }")) - with pytest.raises(AttributeError): - info.message_id = "modified" # type: ignore[misc] - - def test_variable_info_immutability(self) -> None: - """VariableInfo is frozen.""" - var_info = VariableInfo(name="test", context=VariableContext.PATTERN) - with pytest.raises(AttributeError): - var_info.name = "modified" # type: ignore[misc] - - def test_requires_variable_true(self) -> None: - """requires_variable returns True for present variable.""" - info = introspect_message(_parse_message("greeting = Hello, { $name }!")) - assert info.requires_variable("name") - - def test_requires_variable_false(self) -> None: - """requires_variable returns False for absent variable.""" - info = introspect_message(_parse_message("greeting = Hello, { $name }!")) - assert not info.requires_variable("age") - - def test_get_variable_names_returns_frozenset(self) -> None: - """get_variable_names returns frozenset.""" - info = introspect_message(_parse_message("msg = { $x }")) - assert isinstance(info.get_variable_names(), frozenset) - - def test_get_function_names_returns_frozenset(self) -> None: - """get_function_names returns frozenset.""" - info = introspect_message(_parse_message("msg = { NUMBER($x) }")) - assert isinstance(info.get_function_names(), frozenset) - - def 
test_variables_field_is_frozenset(self) -> None: - """variables field is a frozenset of VariableInfo.""" - info = introspect_message(_parse_message("msg = { $x }")) - assert isinstance(info.variables, frozenset) - - def test_message_id_preserved(self) -> None: - """introspect_message preserves message_id.""" - msg = _parse_message("greet-user = Hello") - assert introspect_message(msg).message_id == "greet-user" - - -# =========================================================================== -# ATTRIBUTE INTROSPECTION -# =========================================================================== - - -class TestAttributeIntrospection: - """Variables in message attributes are extracted.""" - - def test_attribute_variable_extracted(self) -> None: - """Variable in attribute is extracted from message.""" - bundle = FluentBundle("en") - bundle.add_resource( - "login-button = Sign In\n .title = Click to sign in as { $username }\n" - ) - info = bundle.introspect_message("login-button") - assert "username" in info.get_variable_names() - - def test_multiple_attributes_all_extracted(self) -> None: - """Variables from all attributes are collected.""" - bundle = FluentBundle("en") - bundle.add_resource( - "button = Action\n" - " .tooltip = { $action } for { $user }\n" - " .aria-label = { $role }\n" - ) - info = bundle.introspect_message("button") - assert info.get_variable_names() == frozenset({"action", "user", "role"}) - - def test_attribute_only_message(self) -> None: - """Message with no value but attributes is introspected.""" - resource = FluentParserV1().parse("msg =\n .attr1 = Value 1\n .attr2 = Value 2\n") - msg = resource.entries[0] - assert isinstance(msg, Message) - result = introspect_message(msg) - assert result.message_id == "msg" - - def test_attribute_only_message_with_variables(self) -> None: - """Variables in attributes of value-less message are extracted.""" - resource = FluentParserV1().parse( - "msg =\n .formal = Hello { $name }\n .casual = Hi { $name 
}\n" - ) - msg = resource.entries[0] - assert isinstance(msg, Message) - assert "name" in introspect_message(msg).get_variable_names() - - -# =========================================================================== -# TERM INTROSPECTION -# =========================================================================== - - -class TestTermIntrospection: - """Introspection of Term AST nodes.""" - - def test_introspect_term_direct(self) -> None: - """introspect_message accepts Term nodes.""" - term = _parse_term("-brand = { $companyName }") - info = introspect_message(term) - assert info.message_id == "brand" - assert "companyName" in info.get_variable_names() - - def test_introspect_term_via_bundle(self) -> None: - """FluentBundle.introspect_term() introspects a term.""" - bundle = FluentBundle("en") - bundle.add_resource("-brand = { $companyName }") - info = bundle.introspect_term("brand") - assert info.message_id == "brand" - assert "companyName" in info.get_variable_names() - - def test_introspect_term_not_found(self) -> None: - """KeyError raised for non-existent term.""" - bundle = FluentBundle("en") - with pytest.raises(KeyError, match=r"Term 'nonexistent' not found"): - bundle.introspect_term("nonexistent") - - def test_term_reference_positional_args(self) -> None: - """Term reference with positional arguments extracts nested variables.""" - msg = _parse_message("greeting = { -brand($platform) }") - assert isinstance(msg, (Message, Term)) - info = introspect_message(msg) - assert "platform" in info.get_variable_names() - - def test_term_reference_named_args(self) -> None: - """Term reference with named arguments extracts variable values.""" - msg = _parse_message('app-name = { -brand($userCase, case: "nominative") }') - assert isinstance(msg, (Message, Term)) - info = introspect_message(msg) - assert "userCase" in info.get_variable_names() - - def test_term_reference_both_arg_types(self) -> None: - """Term reference with positional and named arguments captures 
all variables.""" - msg = _parse_message('msg = { -term($pos1, $pos2, style: "formal") }') - assert isinstance(msg, (Message, Term)) - info = introspect_message(msg) - assert "pos1" in info.get_variable_names() - assert "pos2" in info.get_variable_names() - - -# =========================================================================== -# VARIABLE CONTEXTS -# =========================================================================== - - -class TestVariableContexts: - """Variable context tracking in IntrospectionVisitor.""" - - def test_function_arg_context(self) -> None: - """Variables in function arguments have FUNCTION_ARG context.""" - msg = _parse_message("msg = { NUMBER($value, minimumFractionDigits: 2) }") - visitor = IntrospectionVisitor() - assert msg.value is not None - visitor.visit(msg.value) - value_vars = [v for v in visitor.variables if v.name == "value"] - assert len(value_vars) == 1 - assert value_vars[0].context == VariableContext.FUNCTION_ARG - - def test_selector_context(self) -> None: - """Variables in selectors have SELECTOR context.""" - msg = _parse_message("msg = { $count -> [one] one *[other] many }") - visitor = IntrospectionVisitor() - assert msg.value is not None - visitor.visit(msg.value) - count_vars = [v for v in visitor.variables if v.name == "count"] - selector_contexts = [v for v in count_vars if v.context == VariableContext.SELECTOR] - assert len(selector_contexts) >= 1 - - def test_variant_context(self) -> None: - """Variables in variant values have VARIANT context.""" - msg = _parse_message("msg = { $sel -> [key] Value is { $value } *[other] none }") - visitor = IntrospectionVisitor() - assert msg.value is not None - visitor.visit(msg.value) - value_vars = [v for v in visitor.variables if v.name == "value"] - variant_contexts = [v for v in value_vars if v.context == VariableContext.VARIANT] - assert len(variant_contexts) >= 1 - - def test_context_restored_after_selector(self) -> None: - """Variable context is correctly 
restored after visiting selector.""" - msg = _parse_message( - "emails = { $count ->\n" - " [one] { $name } has one email\n" - " *[other] { $name } has { $count } emails\n" - "}" - ) - visitor = IntrospectionVisitor() - assert msg.value is not None - visitor.visit(msg.value) - var_contexts = {v.name: v.context for v in visitor.variables} - assert "count" in var_contexts - assert "name" in var_contexts - - -# =========================================================================== -# SPAN TRACKING -# =========================================================================== - - -class TestSpanTracking: - """Source position spans are attached to introspection results.""" - - def test_variable_reference_span(self) -> None: - """Variable references include correct source spans.""" - msg = _parse_message("greeting = Hello, { $name }!") - info = introspect_message(msg) - assert len(info.variables) == 1 - var_info = next(iter(info.variables)) - assert var_info.name == "name" - assert var_info.span is not None - assert var_info.span.start == 20 - assert var_info.span.end == 25 - - def test_function_reference_span(self) -> None: - """Function references include correct source spans.""" - msg = _parse_message("price = { NUMBER($amount) }") - info = introspect_message(msg) - assert len(info.functions) == 1 - func_info = next(iter(info.functions)) - assert func_info.name == "NUMBER" - assert func_info.span is not None - assert func_info.span.start == 10 - assert func_info.span.end == 25 - - def test_message_reference_span(self) -> None: - """Message references include correct source spans.""" - msg = _parse_message("ref = { other-msg }") - info = introspect_message(msg) - refs = [r for r in info.references if r.kind == ReferenceKind.MESSAGE] - assert len(refs) == 1 - assert refs[0].id == "other-msg" - assert refs[0].span is not None - assert refs[0].span.start == 8 - assert refs[0].span.end == 17 - - def test_term_reference_span(self) -> None: - """Term references include 
correct source spans.""" - msg = _parse_message("msg = { -brand }") - info = introspect_message(msg) - refs = [r for r in info.references if r.kind == ReferenceKind.TERM] - assert len(refs) == 1 - assert refs[0].id == "brand" - assert refs[0].span is not None - assert refs[0].span.start == 8 - assert refs[0].span.end == 15 - - def test_term_reference_with_attribute_span(self) -> None: - """Term references with attributes have correct spans.""" - msg = _parse_message("msg = { -brand.short }") - info = introspect_message(msg) - refs = [r for r in info.references if r.kind == ReferenceKind.TERM] - assert len(refs) == 1 - assert refs[0].attribute == "short" - assert refs[0].span is not None - assert refs[0].span.start == 8 - assert refs[0].span.end == 21 - - def test_multiple_variables_distinct_spans(self) -> None: - """Multiple variables each have distinct spans.""" - msg = _parse_message("msg = { $first } and { $second }") - info = introspect_message(msg) - assert len(info.variables) == 2 - vars_by_name = {v.name: v for v in info.variables} - assert vars_by_name["first"].span is not None - assert vars_by_name["first"].span.start == 8 - assert vars_by_name["second"].span is not None - assert vars_by_name["second"].span.start == 23 - - def test_message_reference_with_attribute_span(self) -> None: - """Message references with attributes have correct spans.""" - msg = _parse_message("msg = { other.attr }") - info = introspect_message(msg) - refs = [r for r in info.references if r.kind == ReferenceKind.MESSAGE] - assert len(refs) == 1 - assert refs[0].attribute == "attr" - assert refs[0].span is not None - assert refs[0].span.start == 8 - assert refs[0].span.end == 18 - - -# =========================================================================== -# DEPTH LIMITS -# =========================================================================== - - -class TestDepthLimits: - """Depth guard prevents stack overflow on deeply nested ASTs.""" - - def 
test_introspection_visitor_depth_limit(self) -> None: - """IntrospectionVisitor respects max_depth configuration.""" - msg = _parse_message( - "msg = { $a -> [x] { $b -> [y] { $c -> [z] value *[o] v } *[o] v } *[o] v }" - ) - visitor = IntrospectionVisitor(max_depth=100) - assert msg.value is not None - visitor.visit(msg.value) - names = {v.name for v in visitor.variables} - assert "a" in names - assert "b" in names - assert "c" in names - - def test_reference_extractor_depth_limit(self) -> None: - """ReferenceExtractor respects max_depth configuration.""" - msg = _parse_message("msg = { -term1(-term2(-term3)) }") - extractor = ReferenceExtractor(max_depth=100) - assert msg.value is not None - extractor.visit(msg.value) - assert "term1" in extractor.term_refs - assert "term2" in extractor.term_refs - assert "term3" in extractor.term_refs - - -# =========================================================================== -# TYPE ERROR HANDLING -# =========================================================================== - - -class TestIntrospectMessageTypeErrors: - """introspect_message raises TypeError for non-Message/Term inputs.""" - - def test_raises_for_junk(self) -> None: - """Junk entry raises TypeError.""" - resource = parse_ftl("invalid syntax here !!!") - assert resource.entries - junk = resource.entries[0] - assert isinstance(junk, Junk) - with pytest.raises(TypeError, match="Expected Message or Term"): - introspect_message(junk) # type: ignore[arg-type] - - def test_raises_for_string(self) -> None: - """String input raises TypeError.""" - with pytest.raises(TypeError, match="Expected Message or Term"): - introspect_message("not a message") # type: ignore[arg-type] - - def test_raises_for_none(self) -> None: - """None input raises TypeError.""" - with pytest.raises(TypeError, match="Expected Message or Term"): - introspect_message(None) # type: ignore[arg-type] - - def test_raises_for_dict(self) -> None: - """Dict input raises TypeError.""" - with 
pytest.raises(TypeError, match="Expected Message or Term"): - introspect_message({"not": "a message"}) # type: ignore[arg-type] - - @given( - st.one_of( - st.integers(), - st.decimals(allow_nan=False, allow_infinity=False), - st.booleans(), - st.lists(st.text()), - ) - ) - @settings(max_examples=30) - def test_raises_for_arbitrary_types(self, invalid_input: object) -> None: - """Arbitrary non-Message types raise TypeError.""" - event(f"input_type={type(invalid_input).__name__}") - with pytest.raises(TypeError, match="Expected Message or Term"): - introspect_message(invalid_input) # type: ignore[arg-type] - - -# =========================================================================== -# REAL-WORLD SCENARIOS -# =========================================================================== - - -class TestRealWorldScenarios: - """Integration tests for practical use cases.""" - - def test_ui_message_validation(self) -> None: - """CI/CD variable validation for UI messages.""" - bundle = FluentBundle("en") - bundle.add_resource( - "home-subtitle = Welcome to { $country }\n" - "money-with-vat = Gross: { $gross }, Net: { $net }, VAT: { $vat } ({ $rate }%)\n" - ) - assert "country" in bundle.get_message_variables("home-subtitle") - assert bundle.get_message_variables("money-with-vat") == frozenset( - {"gross", "net", "vat", "rate"} - ) - - def test_function_usage_analysis(self) -> None: - """Analyze function usage in financial messages.""" - bundle = FluentBundle("en") - bundle.add_resource( - 'timestamp = Last updated: { DATETIME($time, dateStyle: "medium") }\n' - "price = Total: { NUMBER($amount, minimumFractionDigits: 2," - " maximumFractionDigits: 2) }\n" - ) - ts_info = bundle.introspect_message("timestamp") - assert "DATETIME" in ts_info.get_function_names() - assert "time" in ts_info.get_variable_names() - - price_info = bundle.introspect_message("price") - number_funcs = [f for f in price_info.functions if f.name == "NUMBER"] - assert len(number_funcs) == 1 - assert 
"minimumFractionDigits" in number_funcs[0].named_args - assert "maximumFractionDigits" in number_funcs[0].named_args - - -# =========================================================================== -# HYPOTHESIS PROPERTY TESTS (inline - no external strategy module needed) -# =========================================================================== - - -_var_names = st.from_regex(r"[a-z]+", fullmatch=True) -_msg_ids = st.from_regex(r"[a-z]+", fullmatch=True) - - -class TestVariableExtractionProperties: - """Property-based invariants for variable extraction.""" - - @given(var_name=_var_names) - @settings(max_examples=200) - def test_simple_variable_always_extracted(self, var_name: str) -> None: - """{ $var } always extracts var.""" - event(f"var_name={var_name}") - msg = _parse_message(f"msg = Hello {{ ${var_name} }}") - assert var_name in extract_variables(msg) - - @given(var_name=_var_names) - @settings(max_examples=200) - def test_duplicate_variables_deduplicated(self, var_name: str) -> None: - """{ $var } { $var } extracts var once.""" - event(f"var_name={var_name}") - msg = _parse_message(f"msg = Hello {{ ${var_name} }} {{ ${var_name} }}") - variables = extract_variables(msg) - assert var_name in variables - assert len([v for v in variables if v == var_name]) == 1 - - @given(var1=_var_names, var2=_var_names) - @settings(max_examples=200) - def test_multiple_variables_all_extracted(self, var1: str, var2: str) -> None: - """{ $a } { $b } extracts both a and b.""" - event(f"same_vars={var1 == var2}") - msg = _parse_message(f"msg = Hello {{ ${var1} }} {{ ${var2} }}") - variables = extract_variables(msg) - assert var1 in variables - if var1 != var2: - assert var2 in variables - - @given(msg_id=_msg_ids) - @settings(max_examples=100) - def test_no_variables_returns_empty_set(self, msg_id: str) -> None: - """Message with no variables returns empty frozenset.""" - event(f"msg_id={msg_id}") - msg = _parse_message(f"{msg_id} = Hello World") - assert 
len(extract_variables(msg)) == 0 - - @given(var_name=_var_names) - @settings(max_examples=100) - def test_variable_in_function_extracted(self, var_name: str) -> None: - """NUMBER($var) extracts var.""" - event(f"var_name={var_name}") - msg = _parse_message(f"msg = {{ NUMBER(${var_name}) }}") - assert var_name in extract_variables(msg) - - @given(var_name=_var_names, attr_name=st.from_regex(r"[a-z]+", fullmatch=True)) - @settings(max_examples=100) - def test_attribute_variable_extracted(self, var_name: str, attr_name: str) -> None: - """Variables in attributes are extracted.""" - event(f"var_name={var_name}") - msg = _parse_message(f"msg = Hello\n .{attr_name} = {{ ${var_name} }}") - assert var_name in introspect_message(msg).get_variable_names() - - -class TestIntrospectionResultProperties: - """Properties of MessageIntrospection result objects.""" - - @given(msg_id=_msg_ids) - @settings(max_examples=200) - def test_message_id_preserved(self, msg_id: str) -> None: - """introspect_message preserves message ID.""" - event(f"msg_id={msg_id}") - msg = _parse_message(f"{msg_id} = Hello") - assert introspect_message(msg).message_id == msg_id - - @given(var_name=_var_names) - @settings(max_examples=200) - def test_get_variable_names_consistent(self, var_name: str) -> None: - """get_variable_names() and variables field are consistent.""" - event(f"var_name={var_name}") - msg = _parse_message(f"msg = Hello {{ ${var_name} }}") - info = introspect_message(msg) - var_names = info.get_variable_names() - assert var_name in var_names - assert len(info.variables) == len(var_names) - - @given(var_name=_var_names) - @settings(max_examples=200) - def test_requires_variable_matches_extraction(self, var_name: str) -> None: - """requires_variable(x) iff x in get_variable_names().""" - event(f"var_name={var_name}") - msg = _parse_message(f"msg = Hello {{ ${var_name} }}") - info = introspect_message(msg) - if info.requires_variable(var_name): - assert var_name in info.get_variable_names() 
- if var_name in info.get_variable_names(): - assert info.requires_variable(var_name) - - @given(msg_id=_msg_ids) - @settings(max_examples=100) - def test_no_selectors_for_simple_message(self, msg_id: str) -> None: - """Simple message has has_selectors=False.""" - event(f"msg_id={msg_id}") - msg = _parse_message(f"{msg_id} = Hello") - assert introspect_message(msg).has_selectors is False - - @given(var_name=_var_names) - @settings(max_examples=100) - def test_select_expression_sets_has_selectors(self, var_name: str) -> None: - """Message with select expression has has_selectors=True.""" - event(f"var_name={var_name}") - msg = _parse_message( - f"msg = {{ ${var_name} ->\n [one] One item\n *[other] Many items\n}}" - ) - assert introspect_message(msg).has_selectors is True - - @given(var_name=_var_names) - @settings(max_examples=100) - def test_number_function_detected(self, var_name: str) -> None: - """NUMBER($var) is detected as a function call.""" - event(f"var_name={var_name}") - msg = _parse_message(f"msg = {{ NUMBER(${var_name}) }}") - assert "NUMBER" in introspect_message(msg).get_function_names() - - @given(msg_id=_msg_ids) - @settings(max_examples=100) - def test_no_functions_returns_empty_set(self, msg_id: str) -> None: - """Message with no functions returns empty frozenset.""" - event(f"msg_id={msg_id}") - msg = _parse_message(f"{msg_id} = Hello World") - assert len(introspect_message(msg).get_function_names()) == 0 - - -class TestIntrospectionIdempotence: - """Idempotence: repeated calls return same results.""" - - @given(var_name=_var_names) - @settings(max_examples=100) - def test_extract_variables_idempotent(self, var_name: str) -> None: - """Multiple extract_variables() calls return the same result.""" - event(f"var_name={var_name}") - msg = _parse_message(f"msg = Hello {{ ${var_name} }}") - r1 = extract_variables(msg) - r2 = extract_variables(msg) - assert r1 == r2 - - @given(var_name=_var_names) - @settings(max_examples=100) - def 
test_introspect_message_idempotent(self, var_name: str) -> None: - """Multiple introspect_message() calls return equivalent results.""" - event(f"var_name={var_name}") - msg = _parse_message(f"msg = Hello {{ ${var_name} }}") - r1 = introspect_message(msg) - r2 = introspect_message(msg) - assert r1.message_id == r2.message_id - assert r1.variables == r2.variables - assert r1.functions == r2.functions - assert r1.references == r2.references - assert r1.has_selectors == r2.has_selectors - - @given(vars_list=st.lists(_var_names, min_size=1, max_size=10, unique=True)) - @settings(max_examples=50) - def test_multiple_variables_all_captured(self, vars_list: list[str]) -> None: - """All variables in message are captured in extract_variables.""" - event(f"var_count={len(vars_list)}") - placeables = " ".join(f"{{ ${v} }}" for v in vars_list) - msg = _parse_message(f"msg = {placeables}") - variables = extract_variables(msg) - for var in vars_list: - assert var in variables - assert len(variables) == len(vars_list) - - @given( - var_names_list=st.lists( - st.text( - alphabet=st.characters(min_codepoint=97, max_codepoint=122), - min_size=1, - max_size=10, - ), - min_size=1, - max_size=5, - ) - ) - @settings(max_examples=30) - def test_arbitrary_variable_named_args(self, var_names_list: list[str]) -> None: - """Functions with arbitrary variable names in named args extract all vars.""" - var_names_list = list(dict.fromkeys(var_names_list)) - if not var_names_list: - return - event(f"var_count={len(var_names_list)}") - var_list = ", ".join(f"{name}: ${name}" for name in var_names_list) - ftl = f"test = {{ NUMBER($value, {var_list}) }}" - resource = parse_ftl(ftl) - if not resource.entries or isinstance(resource.entries[0], Junk): - return - msg = resource.entries[0] - if not isinstance(msg, Message): - return - info = introspect_message(msg) - assert "value" in info.get_variable_names() - for name in var_names_list: - assert name in info.get_variable_names() - - -# 
============================================================================ -# NESTED PLACEABLE COVERAGE -# ============================================================================ - - -class TestIntrospectionNestedPlaceable: - """Test introspection of nested Placeable expressions.""" - - def test_nested_placeable_extraction(self) -> None: - """Nested Placeable (Placeable containing Placeable) visits inner expression.""" - inner_var = VariableReference(id=Identifier("innerVar")) - inner_placeable = Placeable(expression=inner_var) - outer_placeable = Placeable(expression=inner_placeable) - - message = Message( - id=Identifier("nested"), - value=Pattern(elements=(outer_placeable,)), - attributes=(), - ) - - result = introspect_message(message) - - var_names = {v.name for v in result.variables} - assert "innerVar" in var_names - - def test_deeply_nested_placeables(self) -> None: - """Multiple levels of nested Placeables are fully traversed.""" - var = VariableReference(id=Identifier("deep")) - level1 = Placeable(expression=var) - level2 = Placeable(expression=level1) - level3 = Placeable(expression=level2) - - message = Message( - id=Identifier("deepNest"), - value=Pattern(elements=(level3,)), - attributes=(), - ) - - result = introspect_message(message) - var_names = {v.name for v in result.variables} - assert "deep" in var_names - - def test_message_without_value_extract_references(self) -> None: - """Message with value=None but with attributes extracts from attributes.""" - attr_pattern = Pattern( - elements=(Placeable(expression=VariableReference(id=Identifier("attrVar"))),) - ) - message = Message( - id=Identifier("attrsOnly"), - value=None, - attributes=(Attribute(id=Identifier("hint"), value=attr_pattern),), - ) - - msg_refs, term_refs = extract_references(message) - - assert isinstance(msg_refs, frozenset) - assert isinstance(term_refs, frozenset) - - def test_introspect_message_without_value(self) -> None: - """introspect_message extracts from attributes 
when message.value is None.""" - attr_pattern = Pattern( - elements=( - TextElement("Hint: "), - Placeable(expression=VariableReference(id=Identifier("hintVar"))), - ) - ) - message = Message( - id=Identifier("noValue"), - value=None, - attributes=(Attribute(id=Identifier("tooltip"), value=attr_pattern),), - ) - - result = introspect_message(message) - - var_names = {v.name for v in result.variables} - assert "hintVar" in var_names - - -class TestIntrospectionBranchCoverage: - """Tests for introspection branch coverage.""" - - def test_function_without_arguments(self) -> None: - """Function reference with empty arguments visits function node correctly.""" - func_ref = FunctionReference( - id=Identifier("NOARGS"), - arguments=CallArguments(positional=(), named=()), - ) - - message = Message( - id=Identifier("noArgsFunc"), - value=Pattern(elements=(Placeable(expression=func_ref),)), - attributes=(), - ) - - result = introspect_message(message) - - func_names = {f.name for f in result.functions} - assert "NOARGS" in func_names - - def test_text_element_only_pattern(self) -> None: - """Pattern with only TextElement yields no variables or functions.""" - message = Message( - id=Identifier("textOnly"), - value=Pattern(elements=(TextElement("Just plain text"),)), - attributes=(), - ) - - result = introspect_message(message) - - assert len(result.variables) == 0 - assert len(result.functions) == 0 - - def test_function_with_empty_call_arguments(self) -> None: - """Function with empty positional and named arguments is still recorded.""" - func_ref = FunctionReference( - id=Identifier("EMPTY"), - arguments=CallArguments(positional=(), named=()), - ) - - message = Message( - id=Identifier("emptyArgs"), - value=Pattern(elements=(Placeable(expression=func_ref),)), - attributes=(), - ) - - result = introspect_message(message) - - func_names = {f.name for f in result.functions} - assert "EMPTY" in func_names - - -# 
=========================================================================== -# THREAD SAFETY TESTS -# =========================================================================== - - -class TestIntrospectionThreadSafety: - """Verify the cache lock prevents data corruption under concurrent access. - - These tests exercise the check-compute-store pattern introduced with the - threading.Lock that replaced the GIL-reliant lock-free WeakKeyDictionary - access. They run in CI (no @pytest.mark.fuzz) because the thread counts - are small and the wall-clock cost is negligible. - """ - - def test_concurrent_introspection_same_message(self) -> None: - """Concurrent introspection of the same Message yields identical results. - - All threads must see the same MessageIntrospection (equal by content), - and the cache must contain exactly one entry for the shared message. - """ - message = Message( - id=Identifier("sharedMsg"), - value=Pattern(elements=( - TextElement("Hello "), - Placeable(expression=VariableReference(id=Identifier("name"))), - )), - attributes=(), - ) - - # Clear cache to ensure a fresh start for this test. - with _introspection_cache_lock: - _introspection_cache.clear() - - results: list[object] = [] - errors: list[BaseException] = [] - - def worker() -> None: - try: - results.append(introspect_message(message)) - except Exception as exc: - errors.append(exc) - - threads = [threading.Thread(target=worker) for _ in range(20)] - for t in threads: - t.start() - for t in threads: - t.join() - - assert not errors, f"Thread errors: {errors}" - assert len(results) == 20 - - # All results must be equal (same content, immutable). - first = results[0] - assert all(r == first for r in results) - - def test_concurrent_clear_and_introspect(self) -> None: - """Concurrent clear + introspect does not corrupt the cache. - - After all operations complete, any surviving cached entry must be - a valid MessageIntrospection (no partially-written garbage). 
- """ - message = Message( - id=Identifier("racyMsg"), - value=Pattern(elements=(TextElement("race"),)), - attributes=(), - ) - - errors: list[BaseException] = [] - - def introspector() -> None: - try: - for _ in range(10): - introspect_message(message) - except Exception as exc: - errors.append(exc) - - def clearer() -> None: - try: - for _ in range(5): - clear_introspection_cache() - except Exception as exc: - errors.append(exc) - - threads = ( - [threading.Thread(target=introspector) for _ in range(8)] - + [threading.Thread(target=clearer) for _ in range(2)] - ) - for t in threads: - t.start() - for t in threads: - t.join() - - assert not errors, f"Thread errors: {errors}" - - # Final cache state must be consistent: either empty or holding a valid result. - result = introspect_message(message) - assert result.message_id == "racyMsg" - - -# =========================================================================== -# DOUBLE-CHECK CACHE HIT (line 674) -# =========================================================================== - - -class TestCacheDoubleCheckHit: - """Covers introspect_message line 674: the locked double-check cache hit. - - Line 674 fires only when another thread stores the result between step 1 - (initial pre-lock miss check) and step 3 (locked store). The test uses - a mock lock that pre-fills the cache before the double-check code runs, - exactly simulating the winning-race scenario. - """ - - def test_double_check_returns_preexisting_result(self) -> None: - """Line 674: double-check inside lock returns pre-filled entry. - - The mock lock pre-fills _introspection_cache[msg] on __enter__, - simulating another thread winning the race. introspect_message must - return the pre-filled result rather than overwriting it. 
- """ - msg = _parse_message("dc-test = { $var }") - clear_introspection_cache() - - # Compute reference result (no cache interaction) - expected = introspect_message(msg, use_cache=False) - clear_introspection_cache() - - # Capture original lock before patching - orig_lock = _introspection_msg_mod._introspection_cache_lock - - class _RaceLock: - """Simulates a concurrent thread winning the race at Step 3. - - introspect_message acquires the lock TWICE per call with use_cache=True: - - First acquisition: Step 1 read-check (cache is empty, should miss) - - Second acquisition: Step 3 write-check (pre-fill simulates the race) - Pre-filling on the first acquisition would cause an early return at the - Step 1 hit (line 641), bypassing the double-check at line 674 entirely. - """ - - def __init__(self) -> None: - self._call_count = 0 - - def __enter__(self) -> object: - orig_lock.acquire() - self._call_count += 1 - if self._call_count == 2: # Step 3 write-check only - _introspection_msg_mod._introspection_cache[msg] = expected - return self - - def __exit__( - self, - exc_type: type[BaseException] | None, - exc_val: BaseException | None, - exc_tb: object, - ) -> None: - orig_lock.release() - - with patch.object( - _introspection_msg_mod, "_introspection_cache_lock", _RaceLock() - ): - # Step 1: first __enter__ — cache is empty, miss, continue to Step 2 - # Step 2: computation proceeds normally - # Step 3: second __enter__ pre-fills cache — double-check hits line 674 - result = introspect_message(msg, use_cache=True) - - assert result.message_id == expected.message_id - assert result.get_variable_names() == expected.get_variable_names() - clear_introspection_cache() - - -# =========================================================================== -# MessageVariableValidationResult + validate_message_variables -# =========================================================================== - - -class TestMessageVariableValidationResult: - """Tests for the 
MessageVariableValidationResult frozen dataclass.""" - - def test_immutable(self) -> None: - """MessageVariableValidationResult is frozen (immutable).""" - result = MessageVariableValidationResult( - message_id="greeting", - is_valid=True, - declared_variables=frozenset({"name"}), - missing_variables=frozenset(), - extra_variables=frozenset(), - ) - with pytest.raises(AttributeError): - result.is_valid = False # type: ignore[misc] - - def test_valid_result_fields(self) -> None: - """is_valid=True when missing and extra are both empty.""" - result = MessageVariableValidationResult( - message_id="msg", - is_valid=True, - declared_variables=frozenset({"a", "b"}), - missing_variables=frozenset(), - extra_variables=frozenset(), - ) - assert result.is_valid is True - assert result.declared_variables == frozenset({"a", "b"}) - assert result.missing_variables == frozenset() - assert result.extra_variables == frozenset() - - def test_invalid_with_missing(self) -> None: - """is_valid=False when missing_variables is non-empty.""" - result = MessageVariableValidationResult( - message_id="msg", - is_valid=False, - declared_variables=frozenset({"a"}), - missing_variables=frozenset({"b"}), - extra_variables=frozenset(), - ) - assert result.is_valid is False - assert "b" in result.missing_variables - - def test_hashable(self) -> None: - """MessageVariableValidationResult is hashable (frozen dataclass).""" - r1 = MessageVariableValidationResult( - message_id="greeting", - is_valid=True, - declared_variables=frozenset({"name"}), - missing_variables=frozenset(), - extra_variables=frozenset(), - ) - assert hash(r1) is not None - s: set[MessageVariableValidationResult] = {r1} - assert len(s) == 1 - - -class TestValidateMessageVariables: - """Tests for validate_message_variables().""" - - def test_exact_match_is_valid(self) -> None: - """Message declaring exactly the expected variables returns is_valid=True.""" - msg = _parse_message("greeting = Hello, { $name }! 
You have { $count } items.") - result = validate_message_variables(msg, {"name", "count"}) - assert result.is_valid is True - assert result.declared_variables == frozenset({"name", "count"}) - assert result.missing_variables == frozenset() - assert result.extra_variables == frozenset() - - def test_missing_variable_detected(self) -> None: - """Expected variable absent from FTL message is reported in missing_variables.""" - msg = _parse_message("greeting = Hello, { $name }!") - result = validate_message_variables(msg, {"name", "count"}) - assert result.is_valid is False - assert result.missing_variables == frozenset({"count"}) - assert result.extra_variables == frozenset() - - def test_extra_variable_detected(self) -> None: - """Variable declared in FTL but absent from expected is reported in extra_variables.""" - msg = _parse_message("greeting = Hello, { $name }! You have { $count } items.") - result = validate_message_variables(msg, {"name"}) - assert result.is_valid is False - assert result.extra_variables == frozenset({"count"}) - assert result.missing_variables == frozenset() - - def test_both_missing_and_extra_detected(self) -> None: - """Both missing and extra variables reported independently.""" - msg = _parse_message("msg = { $actual } value") - result = validate_message_variables(msg, {"expected"}) - assert result.is_valid is False - assert "expected" in result.missing_variables - assert "actual" in result.extra_variables - - def test_empty_expected_all_extra(self) -> None: - """Expected set is empty: all declared variables are extra.""" - msg = _parse_message("msg = Hello { $name }!") - result = validate_message_variables(msg, frozenset()) - assert result.is_valid is False - assert result.extra_variables == frozenset({"name"}) - assert result.missing_variables == frozenset() - - def test_message_with_no_variables_and_empty_expected(self) -> None: - """Static message with no variables and empty expected is valid.""" - msg = _parse_message("static = Hello 
World") - result = validate_message_variables(msg, frozenset()) - assert result.is_valid is True - assert result.declared_variables == frozenset() - - def test_message_id_extracted_from_ast_node(self) -> None: - """result.message_id matches the FTL message identifier.""" - msg = _parse_message("my-message = { $var }") - result = validate_message_variables(msg, {"var"}) - assert result.message_id == "my-message" - - def test_frozenset_and_set_expected_equivalent(self) -> None: - """frozenset and set inputs for expected_variables produce identical results.""" - msg = _parse_message("greeting = Hello, { $name }!") - result_set = validate_message_variables(msg, {"name"}) - result_frozen = validate_message_variables(msg, frozenset({"name"})) - assert result_set.is_valid == result_frozen.is_valid - assert result_set.declared_variables == result_frozen.declared_variables - assert result_set.missing_variables == result_frozen.missing_variables - assert result_set.extra_variables == result_frozen.extra_variables - - def test_validate_term(self) -> None: - """validate_message_variables works on Term AST nodes.""" - resource = FluentParserV1().parse("-brand = { $edition } Edition") - term = next(e for e in resource.entries if isinstance(e, Term)) - result = validate_message_variables(term, {"edition"}) - assert result.is_valid is True - assert result.message_id == "brand" - - @given( - var_names=st.frozensets( - st.from_regex(r"[a-z][a-z]{0,9}", fullmatch=True), - min_size=0, - max_size=5, - ), - extra_vars=st.frozensets( - st.from_regex(r"[a-z][a-z]{0,9}", fullmatch=True), - min_size=0, - max_size=3, - ), - ) - @settings(max_examples=200) - def test_property_validity_iff_exact_match( - self, var_names: frozenset[str], extra_vars: frozenset[str] - ) -> None: - """is_valid iff declared == expected (exact set equality). - - Constructs a message with exactly var_names as variables, validates - against expected = var_names | extra_vars. 
Result is valid only when - extra_vars is empty. - """ - event(f"declared_count={len(var_names)}") - event(f"extra_count={len(extra_vars)}") - - # Filter out names that overlap between the two sets - safe_names = list(var_names) - safe_extra = [n for n in extra_vars if n not in var_names] - - if not safe_names and not safe_extra: - event("outcome=empty_skip") - return - - placeable_ftl = " ".join(f"{{ ${n} }}" for n in safe_names) - ftl_source = f"msg = {placeable_ftl or 'static'}" - - resource = FluentParserV1().parse(ftl_source) - messages = [e for e in resource.entries if isinstance(e, Message)] - if not messages: - event("outcome=parse_failed") - return - - declared = frozenset(safe_names) - expected = declared | frozenset(safe_extra) - result = validate_message_variables(messages[0], expected) - - assert result.declared_variables == declared - assert result.missing_variables == frozenset(safe_extra) - assert result.extra_variables == frozenset() - - if safe_extra: - event("outcome=missing_detected") - assert result.is_valid is False - else: - event("outcome=exact_match") - assert result.is_valid is True +from tests.introspection_message_cases.cache_and_validation import * # noqa: F403 - split module reuses shared support imports +from tests.introspection_message_cases.contracts_and_spans import * # noqa: F403 - split module reuses shared support imports +from tests.introspection_message_cases.extraction_and_references import * # noqa: F403 - split module reuses shared support imports +from tests.introspection_message_cases.properties_and_branches import * # noqa: F403 - split module reuses shared support imports diff --git a/tests/test_localization.py b/tests/test_localization.py index 8ad44dc2..76b15a94 100644 --- a/tests/test_localization.py +++ b/tests/test_localization.py @@ -1,1509 +1,6 @@ -"""Tests for FluentLocalization multi-locale orchestration. 
+"""Aggregated FluentLocalization integration test surface.""" -Tests the fallback chain logic, resource loading, Mozilla architecture alignment, -and root facade accessibility for boot evidence types. -Uses Python 3.13 features for modern test patterns. -""" - -from __future__ import annotations - -import tempfile -from pathlib import Path - -import pytest - -import ftllexengine -from ftllexengine import FluentBundle -from ftllexengine.core.locale_utils import normalize_locale -from ftllexengine.enums import LoadStatus -from ftllexengine.localization import ( - FallbackInfo, - FluentLocalization, - LoadSummary, - LocalizationBootConfig, - LocalizationCacheStats, - PathResourceLoader, - ResourceLoader, - ResourceLoadResult, -) -from ftllexengine.runtime.cache_config import CacheConfig -from ftllexengine.syntax.ast import Message - - -class TestFluentLocalizationBasics: - """Test basic FluentLocalization initialization and API.""" - - def test_single_locale_initialization(self) -> None: - """Initialize with single locale.""" - l10n = FluentLocalization(["en"]) - - assert l10n.locales == ("en",) - - def test_multiple_locales_initialization(self) -> None: - """Initialize with multiple locales in fallback order.""" - l10n = FluentLocalization(["lv", "en", "lt"]) - - assert l10n.locales == ("lv", "en", "lt") - - def test_empty_locales_raises_error(self) -> None: - """Empty locale list raises ValueError.""" - with pytest.raises(ValueError, match="At least one locale is required"): - FluentLocalization([]) - - def test_resource_ids_without_loader_raises_error(self) -> None: - """Providing resource_ids without loader raises ValueError.""" - with pytest.raises( - ValueError, match="resource_loader required when resource_ids provided" - ): - FluentLocalization(["en"], resource_ids=["main.ftl"]) - - def test_invalid_locale_format_rejected_at_init(self) -> None: - """Invalid locale format raises ValueError at initialization (fail-fast). 
- - Locale format errors are caught at construction time rather than - propagating out of format_value during lazy bundle creation. - """ - with pytest.raises(ValueError, match=r"Invalid locale: 'invalid locale with spaces'"): - FluentLocalization(["en", "invalid locale with spaces"]) - - def test_unknown_locale_rejected_at_init(self) -> None: - """Unknown but well-formed locales are rejected before localization starts.""" - with pytest.raises(ValueError, match="Unknown locale identifier"): - FluentLocalization(["en", "xx-UNKNOWN"]) - - def test_locales_property_immutable(self) -> None: - """Locales property returns immutable tuple.""" - l10n = FluentLocalization(["en", "fr"]) - - assert isinstance(l10n.locales, tuple) - assert l10n.locales == ("en", "fr") - - -class TestAddResource: - """Test dynamic resource addition.""" - - def test_add_resource_single_locale(self) -> None: - """Add FTL resource to single locale.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "hello = Hello, World!") - - result, errors = l10n.format_value("hello") - - assert not errors - assert result == "Hello, World!" - - def test_add_resource_multiple_locales(self) -> None: - """Add different resources to different locales.""" - l10n = FluentLocalization(["lv", "en"]) - l10n.add_resource("lv", "hello = Sveiki, pasaule!") - l10n.add_resource("en", "hello = Hello, World!") - - result, errors = l10n.format_value("hello") - - assert not errors - # Should use first locale (lv) - assert result == "Sveiki, pasaule!" 
- - def test_add_resource_invalid_locale_raises_error(self) -> None: - """Adding resource for locale not in chain raises ValueError.""" - l10n = FluentLocalization(["en"]) - - with pytest.raises(ValueError, match="Locale 'fr' not in fallback chain"): - l10n.add_resource("fr", "hello = Bonjour!") - - -class TestFallbackChain: - """Test locale fallback chain logic.""" - - def test_fallback_to_second_locale(self) -> None: - """Falls back to second locale when message missing in first.""" - l10n = FluentLocalization(["lv", "en"]) - # Add message only to English (not Latvian) - l10n.add_resource("en", "greeting = Hello!") - - result, errors = l10n.format_value("greeting") - - assert not errors - assert result == "Hello!" - - def test_fallback_to_third_locale(self) -> None: - """Falls back through chain to third locale.""" - l10n = FluentLocalization(["lv", "en", "lt"]) - # Add message only to Lithuanian - l10n.add_resource("lt", "welcome = Labas!") - - result, errors = l10n.format_value("welcome") - - assert not errors - assert result == "Labas!" 
- - def test_first_locale_takes_precedence(self) -> None: - """First locale in chain takes precedence over later locales.""" - l10n = FluentLocalization(["lv", "en"]) - l10n.add_resource("lv", "msg = Latvian version") - l10n.add_resource("en", "msg = English version") - - result, errors = l10n.format_value("msg") - - assert not errors - # Should use first locale (lv), not fallback to en - assert result == "Latvian version" - - def test_partial_translations(self) -> None: - """Handles partial translations with different messages per locale.""" - l10n = FluentLocalization(["lv", "en"]) - l10n.add_resource("lv", "home = Mājas") - l10n.add_resource("en", "home = Home\nabout = About") - - home_result, _ = l10n.format_value("home") - about_result, _ = l10n.format_value("about") - - assert home_result == "Mājas" # From lv - assert about_result == "About" # Falls back to en - - def test_message_not_found_in_any_locale(self) -> None: - """Message not found in any locale returns fallback.""" - l10n = FluentLocalization(["lv", "en"], strict=False) - l10n.add_resource("lv", "hello = Sveiki!") - l10n.add_resource("en", "hello = Hello!") - - result, errors = l10n.format_value("nonexistent") - - assert result == "{nonexistent}" - assert len(errors) == 1 - # Check error message contains 'nonexistent' - assert "nonexistent" in str(errors[0]) - - -class TestFormatValue: - """Test format_value method.""" - - def test_format_simple_message(self) -> None: - """Format simple message without variables.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "hello = Hello, World!") - - result, errors = l10n.format_value("hello") - - assert result == "Hello, World!" 
- assert errors == () - - def test_format_message_with_variables(self) -> None: - """Format message with variable interpolation.""" - l10n = FluentLocalization(["en"], use_isolating=False) - l10n.add_resource("en", "greeting = Hello, { $name }!") - - result, errors = l10n.format_value("greeting", {"name": "Anna"}) - - assert not errors - - assert result == "Hello, Anna!" - - def test_format_message_with_multiple_variables(self) -> None: - """Format message with multiple variables.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "user-info = { $firstName } { $lastName } (Age: { $age })") - - result, errors = l10n.format_value( - "user-info", {"firstName": "John", "lastName": "Doe", "age": 30} - ) - - assert not errors - - assert "John" in result - assert "Doe" in result - assert "30" in result - - def test_format_propagates_bundle_errors(self) -> None: - """Format propagates errors from FluentBundle.""" - l10n = FluentLocalization(["en"], strict=False) - l10n.add_resource("en", "msg = Hello, { $name }!") - - # Missing required variable - result, errors = l10n.format_value("msg") - - assert "Hello" in result - assert len(errors) > 0 # Bundle should report missing variable - - def test_empty_message_id_returns_fallback(self) -> None: - """Empty message ID returns graceful fallback.""" - l10n = FluentLocalization(["en"], strict=False) - l10n.add_resource("en", "hello = Hello!") - - result, errors = l10n.format_value("") - - assert result == "{???}" - assert len(errors) == 1 - assert "Empty or invalid message ID" in str(errors[0]) - - -class TestHasMessage: - """Test has_message method.""" - - def test_has_message_in_first_locale(self) -> None: - """Returns True if message in first locale.""" - l10n = FluentLocalization(["lv", "en"]) - l10n.add_resource("lv", "hello = Sveiki!") - - assert l10n.has_message("hello") is True - - def test_has_message_in_fallback_locale(self) -> None: - """Returns True if message in fallback locale.""" - l10n = 
FluentLocalization(["lv", "en"]) - l10n.add_resource("en", "hello = Hello!") - - assert l10n.has_message("hello") is True - - def test_has_message_not_found(self) -> None: - """Returns False if message not in any locale.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "hello = Hello!") - - assert l10n.has_message("goodbye") is False - - -class TestGetBundles: - """Test get_bundles generator.""" - - def test_get_bundles_returns_generator(self) -> None: - """get_bundles returns a generator.""" - l10n = FluentLocalization(["en", "fr"]) - - bundles_gen = l10n.get_bundles() - - # Generator should be iterable - bundles = list(bundles_gen) - assert len(bundles) == 2 - - def test_get_bundles_respects_locale_order(self) -> None: - """get_bundles yields bundles in locale priority order.""" - l10n = FluentLocalization(["lv", "en", "lt"]) - - bundles = list(l10n.get_bundles()) - - assert bundles[0].locale == "lv" - assert bundles[1].locale == "en" - assert bundles[2].locale == "lt" - - -class TestUseIsolating: - """Test use_isolating parameter.""" - - def test_use_isolating_true(self) -> None: - """use_isolating=True wraps placeables in isolation marks.""" - l10n = FluentLocalization(["en"], use_isolating=True) - l10n.add_resource("en", "msg = Hello, { $name }!") - - result, errors = l10n.format_value("msg", {"name": "Anna"}) - - assert not errors - - # Should contain Unicode bidi isolation marks - assert "\u2068" in result # FSI (First Strong Isolate) - assert "\u2069" in result # PDI (Pop Directional Isolate) - - def test_use_isolating_false(self) -> None: - """use_isolating=False does not wrap placeables.""" - l10n = FluentLocalization(["en"], use_isolating=False) - l10n.add_resource("en", "msg = Hello, { $name }!") - - result, errors = l10n.format_value("msg", {"name": "Anna"}) - - assert not errors - - # Should NOT contain isolation marks - assert "\u2068" not in result - assert "\u2069" not in result - - -class TestPathResourceLoader: - """Test 
PathResourceLoader implementation.""" - - def test_path_resource_loader_load(self, tmp_path: Path) -> None: - """PathResourceLoader loads FTL files from disk.""" - # Create test FTL files - locales_dir = tmp_path / "locales" - en_dir = locales_dir / "en" - en_dir.mkdir(parents=True) - - main_ftl = en_dir / "main.ftl" - main_ftl.write_text("hello = Hello, World!", encoding="utf-8") - - # Load resource - loader = PathResourceLoader(str(locales_dir / "{locale}")) - ftl_source = loader.load("en", "main.ftl") - - assert ftl_source == "hello = Hello, World!" - - def test_path_resource_loader_missing_locale_placeholder_raises(self) -> None: - """PathResourceLoader raises ValueError when {locale} placeholder is missing.""" - # Fail-fast: Missing placeholder would cause silent data corruption - # where all locales load from the same static path - with pytest.raises(ValueError, match=r"must contain '\{locale\}' placeholder"): - PathResourceLoader("locales/en") # Missing {locale} - - with pytest.raises(ValueError, match=r"must contain '\{locale\}' placeholder"): - PathResourceLoader("/absolute/path/to/locales") # Missing {locale} - - # Valid: Contains {locale} placeholder - loader = PathResourceLoader("locales/{locale}") # Should not raise - assert "{locale}" in loader.base_path - - def test_path_resource_loader_file_not_found(self, tmp_path: Path) -> None: - """PathResourceLoader raises FileNotFoundError for missing files.""" - loader = PathResourceLoader(str(tmp_path / "{locale}")) - - with pytest.raises(FileNotFoundError): - loader.load("en", "nonexistent.ftl") - - def test_path_resource_loader_with_localization(self, tmp_path: Path) -> None: - """PathResourceLoader integrates with FluentLocalization.""" - # Create test structure: locales/en/main.ftl, locales/lv/main.ftl - locales_dir = tmp_path / "locales" - - en_dir = locales_dir / "en" - en_dir.mkdir(parents=True) - (en_dir / "main.ftl").write_text("hello = Hello!", encoding="utf-8") - - lv_dir = locales_dir / "lv" - 
lv_dir.mkdir(parents=True) - (lv_dir / "main.ftl").write_text("hello = Sveiki!", encoding="utf-8") - - # Create localization with loader - loader = PathResourceLoader(str(locales_dir / "{locale}")) - l10n = FluentLocalization(["lv", "en"], ["main.ftl"], loader) - - result, errors = l10n.format_value("hello") - - assert not errors - assert result == "Sveiki!" # From lv - - def test_path_resource_loader_missing_locale_file_uses_fallback( - self, tmp_path: Path - ) -> None: - """Missing locale file falls back to next locale.""" - # Create only English file (no Latvian) - locales_dir = tmp_path / "locales" - en_dir = locales_dir / "en" - en_dir.mkdir(parents=True) - (en_dir / "main.ftl").write_text("hello = Hello!", encoding="utf-8") - - # Latvian directory doesn't exist - will fall back to English - loader = PathResourceLoader(str(locales_dir / "{locale}")) - l10n = FluentLocalization(["lv", "en"], ["main.ftl"], loader) - - result, errors = l10n.format_value("hello") - - assert not errors - assert result == "Hello!" 
# Fell back to English - - def test_resource_loader_describe_path_default(self) -> None: - """ResourceLoader.describe_path default returns locale/resource_id.""" - - class _MinimalLoader(ResourceLoader): - def load(self, _locale: str, _resource_id: str) -> str: - return "" - - loader = _MinimalLoader() - result = loader.describe_path("en", "main.ftl") - assert result == "en/main.ftl" - - def test_resource_loader_describe_path_default_no_override(self) -> None: - """ResourceLoader.describe_path default is used when subclass does not override.""" - - class _BareLoader(ResourceLoader): - def load(self, _locale: str, _resource_id: str) -> str: - return "" - - loader = _BareLoader() - assert loader.describe_path("de_DE", "errors.ftl") == "de_DE/errors.ftl" - - -class TestRealWorldScenarios: - """Test real-world usage patterns.""" - - def test_e_commerce_site_partial_translations(self) -> None: - """E-commerce site with partial Latvian translations.""" - l10n = FluentLocalization(["lv", "en"], use_isolating=False) - - # Latvian has only some translations - l10n.add_resource( - "lv", - """ -welcome = Sveiki, { $name }! -cart = Grozs -""", - ) - - # English has full translations - l10n.add_resource( - "en", - """ -welcome = Hello, { $name }! -cart = Cart -checkout = Checkout -payment-error = Payment failed: { $reason } -""", - ) - - # Messages in Latvian use lv - welcome, _ = l10n.format_value("welcome", {"name": "Anna"}) - assert welcome == "Sveiki, Anna!" 
- - cart, _ = l10n.format_value("cart") - assert cart == "Grozs" - - # Missing messages fall back to English - checkout, _ = l10n.format_value("checkout") - assert checkout == "Checkout" - - payment, _ = l10n.format_value("payment-error", {"reason": "Invalid card"}) - assert payment == "Payment failed: Invalid card" - - def test_fallback_chain_three_locales(self) -> None: - """Complex fallback: lv → en → lt.""" - l10n = FluentLocalization(["lv", "en", "lt"]) - - l10n.add_resource("lv", "home = Mājas") - l10n.add_resource("en", "home = Home\nabout = About") - l10n.add_resource("lt", "home = Namai\nabout = Apie\ncontact = Kontaktai") - - home, _ = l10n.format_value("home") - assert home == "Mājas" # From lv - - about, _ = l10n.format_value("about") - assert about == "About" # Falls back to en (skips lv) - - contact, _ = l10n.format_value("contact") - assert contact == "Kontaktai" # Falls back to lt (skips lv, en) - - def test_multiple_resource_files(self, tmp_path: Path) -> None: - """Multiple FTL files per locale (ui.ftl, errors.ftl).""" - # Create directory structure - locales_dir = tmp_path / "locales" - en_dir = locales_dir / "en" - en_dir.mkdir(parents=True) - - (en_dir / "ui.ftl").write_text("hello = Hello!\nwelcome = Welcome!", encoding="utf-8") - (en_dir / "errors.ftl").write_text("error-404 = Page not found", encoding="utf-8") - - loader = PathResourceLoader(str(locales_dir / "{locale}")) - l10n = FluentLocalization(["en"], ["ui.ftl", "errors.ftl"], loader) - - # Should load from both files - hello, _ = l10n.format_value("hello") - error, _ = l10n.format_value("error-404") - - assert hello == "Hello!" 
- assert error == "Page not found" - - -class TestCacheConfiguration: - """Test cache configuration in FluentLocalization.""" - - def test_cache_disabled_by_default(self) -> None: - """Cache is disabled by default.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "msg = Hello") - - # Format twice - l10n.format_value("msg") - l10n.format_value("msg") - - # Get stats from first bundle - bundles = list(l10n.get_bundles()) - stats = bundles[0].get_cache_stats() - - # Cache disabled - stats should be None - assert stats is None - - def test_cache_enabled_with_parameter(self) -> None: - """Cache can be enabled via constructor parameter.""" - l10n = FluentLocalization(["en"], cache=CacheConfig()) - l10n.add_resource("en", "msg = Hello") - - # Format twice - should hit cache on second call - l10n.format_value("msg") - l10n.format_value("msg") - - # Get stats from first bundle - bundles = list(l10n.get_bundles()) - stats = bundles[0].get_cache_stats() - - # Cache enabled - should have stats - assert stats is not None - assert stats["hits"] == 1 - assert stats["misses"] == 1 - - def test_cache_size_configurable(self) -> None: - """Cache size can be configured via constructor parameter.""" - l10n = FluentLocalization(["en"], cache=CacheConfig(size=500)) - l10n.add_resource("en", "msg = Hello") - - # Format message - l10n.format_value("msg") - - # Verify cache is enabled (size configuration is internal) - bundles = list(l10n.get_bundles()) - stats = bundles[0].get_cache_stats() - assert stats is not None - - def test_cache_works_across_multiple_locales(self) -> None: - """Cache enabled for all bundles in multi-locale setup.""" - l10n = FluentLocalization(["lv", "en"], cache=CacheConfig()) - l10n.add_resource("lv", "msg = Sveiki") - l10n.add_resource("en", "msg = Hello") - - # Format from primary locale (lv) - l10n.format_value("msg") - l10n.format_value("msg") - - # Verify lv bundle has cache hits - bundles = list(l10n.get_bundles()) - lv_stats = 
bundles[0].get_cache_stats() - assert lv_stats is not None - assert lv_stats["hits"] == 1 - - def test_clear_cache_on_all_bundles(self) -> None: - """clear_cache() clears cache on all bundles.""" - l10n = FluentLocalization(["lv", "en"], cache=CacheConfig()) - l10n.add_resource("lv", "msg = Sveiki") - l10n.add_resource("en", "msg = Hello") - - # Format messages to populate cache - l10n.format_value("msg") - l10n.format_value("msg") - - # Clear cache - l10n.clear_cache() - - # Format again - should be cache miss - l10n.format_value("msg") - - # Verify cache was cleared; metrics are cumulative (not reset on clear). - # 1 miss before clear + 1 miss after clear = 2 cumulative misses. - bundles = list(l10n.get_bundles()) - lv_stats = bundles[0].get_cache_stats() - assert lv_stats is not None - assert lv_stats["misses"] == 2 # Pre-clear miss + post-clear miss - - -class TestCacheIntrospection: - """Test cache introspection properties.""" - - def test_cache_enabled_property_when_enabled(self) -> None: - """cache_enabled property returns True when caching enabled.""" - l10n = FluentLocalization(["en"], cache=CacheConfig()) - assert l10n.cache_enabled is True - - def test_cache_enabled_property_when_disabled(self) -> None: - """cache_enabled property returns False when no CacheConfig is provided.""" - l10n = FluentLocalization(["en"]) - assert l10n.cache_enabled is False - - def test_cache_config_property_when_enabled(self) -> None: - """cache_config property returns CacheConfig when caching enabled.""" - l10n = FluentLocalization(["en"], cache=CacheConfig(size=500)) - assert l10n.cache_config is not None - assert l10n.cache_config.size == 500 - - def test_cache_config_property_when_disabled(self) -> None: - """cache_config returns None when caching disabled.""" - l10n = FluentLocalization(["en"]) - assert l10n.cache_config is None - - def test_bundle_cache_properties_reflect_localization_config(self) -> None: - """Individual bundles reflect FluentLocalization cache 
config.""" - l10n = FluentLocalization(["lv", "en"], cache=CacheConfig(size=250)) - - # Check all bundles have matching config - for bundle in l10n.get_bundles(): - assert bundle.cache_enabled is True - assert bundle.cache_config is not None - assert bundle.cache_config.size == 250 - - -class TestMultiLocaleFileLoading: - """Tests for multi-locale file loading workflows. - - These tests verify the end-to-end workflow of loading FTL files - from disk across multiple locales with proper fallback behavior. - """ - - def test_load_multiple_files_per_locale(self, tmp_path: Path) -> None: - """Multiple FTL files per locale are loaded and merged correctly.""" - locales_dir = tmp_path / "locales" - - # Create en locale with multiple files - en_dir = locales_dir / "en" - en_dir.mkdir(parents=True) - (en_dir / "main.ftl").write_text("welcome = Welcome!", encoding="utf-8") - (en_dir / "errors.ftl").write_text("error-404 = Not Found", encoding="utf-8") - (en_dir / "buttons.ftl").write_text("submit = Submit", encoding="utf-8") - - loader = PathResourceLoader(str(locales_dir / "{locale}")) - l10n = FluentLocalization( - ["en"], ["main.ftl", "errors.ftl", "buttons.ftl"], loader - ) - - # All messages from all files should be available - welcome, _ = l10n.format_value("welcome") - error, _ = l10n.format_value("error-404") - submit, _ = l10n.format_value("submit") - - assert welcome == "Welcome!" 
- assert error == "Not Found" - assert submit == "Submit" - - def test_fallback_across_multiple_files(self, tmp_path: Path) -> None: - """Fallback works correctly across multiple files and locales.""" - locales_dir = tmp_path / "locales" - - # Create en locale (complete) - en_dir = locales_dir / "en" - en_dir.mkdir(parents=True) - (en_dir / "main.ftl").write_text("home = Home\nabout = About", encoding="utf-8") - (en_dir / "errors.ftl").write_text("error-404 = Not Found", encoding="utf-8") - - # Create de locale (partial - missing errors.ftl) - de_dir = locales_dir / "de" - de_dir.mkdir(parents=True) - (de_dir / "main.ftl").write_text("home = Startseite\nabout = Uber uns", encoding="utf-8") - # Note: de/errors.ftl intentionally missing - - loader = PathResourceLoader(str(locales_dir / "{locale}")) - l10n = FluentLocalization(["de", "en"], ["main.ftl", "errors.ftl"], loader) - - # de messages should come from de - home, _ = l10n.format_value("home") - assert home == "Startseite" - - # error should fall back to en (de/errors.ftl missing) - error, _ = l10n.format_value("error-404") - assert error == "Not Found" - - def test_partial_translation_within_file(self, tmp_path: Path) -> None: - """Partial translations within a file fall back correctly.""" - locales_dir = tmp_path / "locales" - - # Create en locale (complete) - en_dir = locales_dir / "en" - en_dir.mkdir(parents=True) - (en_dir / "main.ftl").write_text( - "home = Home\nabout = About\ncontact = Contact", encoding="utf-8" - ) - - # Create fr locale (partial translations) - fr_dir = locales_dir / "fr" - fr_dir.mkdir(parents=True) - (fr_dir / "main.ftl").write_text("home = Accueil", encoding="utf-8") - # Note: about and contact missing in fr - - loader = PathResourceLoader(str(locales_dir / "{locale}")) - l10n = FluentLocalization(["fr", "en"], ["main.ftl"], loader) - - # fr message from fr - home, _ = l10n.format_value("home") - assert home == "Accueil" - - # missing fr messages fall back to en - about, _ = 
l10n.format_value("about") - contact, _ = l10n.format_value("contact") - assert about == "About" - assert contact == "Contact" - - def test_three_locale_fallback_chain(self, tmp_path: Path) -> None: - """Three-locale fallback chain works correctly.""" - locales_dir = tmp_path / "locales" - - # en has all messages - en_dir = locales_dir / "en" - en_dir.mkdir(parents=True) - (en_dir / "main.ftl").write_text( - "level1 = English One\nlevel2 = English Two\nlevel3 = English Three", - encoding="utf-8" - ) - - # de has two messages - de_dir = locales_dir / "de" - de_dir.mkdir(parents=True) - (de_dir / "main.ftl").write_text( - "level1 = Deutsch Eins\nlevel2 = Deutsch Zwei", - encoding="utf-8" - ) - - # fr has one message - fr_dir = locales_dir / "fr" - fr_dir.mkdir(parents=True) - (fr_dir / "main.ftl").write_text("level1 = Francais Un", encoding="utf-8") - - loader = PathResourceLoader(str(locales_dir / "{locale}")) - l10n = FluentLocalization(["fr", "de", "en"], ["main.ftl"], loader) - - # level1 from fr (first locale) - level1, _ = l10n.format_value("level1") - assert level1 == "Francais Un" - - # level2 from de (second locale, fr doesn't have it) - level2, _ = l10n.format_value("level2") - assert level2 == "Deutsch Zwei" - - # level3 from en (third locale, fr and de don't have it) - level3, _ = l10n.format_value("level3") - assert level3 == "English Three" - - def test_unicode_content_in_files(self, tmp_path: Path) -> None: - """FTL files containing CJK and accented Unicode characters load correctly.""" - locales_dir = tmp_path / "locales" - - ja_dir = locales_dir / "ja" - ja_dir.mkdir(parents=True) - (ja_dir / "main.ftl").write_text( - "greeting = \u3053\u3093\u306b\u3061\u306f\u4e16\u754c", encoding="utf-8" - ) - - lv_dir = locales_dir / "lv" - lv_dir.mkdir(parents=True) - (lv_dir / "main.ftl").write_text("greeting = Sveiki, pasaule!", encoding="utf-8") - - loader = PathResourceLoader(str(locales_dir / "{locale}")) - - l10n_ja = FluentLocalization(["ja"], 
["main.ftl"], loader) - l10n_lv = FluentLocalization(["lv"], ["main.ftl"], loader) - - ja_greeting, _ = l10n_ja.format_value("greeting") - lv_greeting, _ = l10n_lv.format_value("greeting") - - assert "\u3053\u3093\u306b\u3061\u306f" in ja_greeting - assert lv_greeting == "Sveiki, pasaule!" - - def test_missing_locale_directory_falls_back(self, tmp_path: Path) -> None: - """Missing locale directory gracefully falls back to next locale.""" - locales_dir = tmp_path / "locales" - - # Only create en directory (no de) - en_dir = locales_dir / "en" - en_dir.mkdir(parents=True) - (en_dir / "main.ftl").write_text("greeting = Hello!", encoding="utf-8") - - loader = PathResourceLoader(str(locales_dir / "{locale}")) - # de is first but doesn't exist - l10n = FluentLocalization(["de", "en"], ["main.ftl"], loader) - - # Should fall back to en - greeting, _ = l10n.format_value("greeting") - assert greeting == "Hello!" - - def test_empty_file_handled_gracefully(self, tmp_path: Path) -> None: - """Empty FTL files are handled without errors.""" - locales_dir = tmp_path / "locales" - - en_dir = locales_dir / "en" - en_dir.mkdir(parents=True) - (en_dir / "empty.ftl").write_text("", encoding="utf-8") - (en_dir / "main.ftl").write_text("greeting = Hello!", encoding="utf-8") - - loader = PathResourceLoader(str(locales_dir / "{locale}")) - l10n = FluentLocalization(["en"], ["empty.ftl", "main.ftl"], loader) - - # Should still work - empty file just adds no messages - greeting, _ = l10n.format_value("greeting") - assert greeting == "Hello!" 
- - def test_file_with_only_comments(self, tmp_path: Path) -> None: - """FTL files with only comments are handled correctly.""" - locales_dir = tmp_path / "locales" - - en_dir = locales_dir / "en" - en_dir.mkdir(parents=True) - (en_dir / "comments.ftl").write_text( - "# This file has only comments\n## Section comment\n### Resource comment", - encoding="utf-8" - ) - (en_dir / "main.ftl").write_text("greeting = Hello!", encoding="utf-8") - - loader = PathResourceLoader(str(locales_dir / "{locale}")) - l10n = FluentLocalization(["en"], ["comments.ftl", "main.ftl"], loader) - - # Should work - comments file adds no messages - greeting, _ = l10n.format_value("greeting") - assert greeting == "Hello!" - - def test_variables_in_file_loaded_messages(self, tmp_path: Path) -> None: - """Variables work correctly in file-loaded messages.""" - locales_dir = tmp_path / "locales" - - en_dir = locales_dir / "en" - en_dir.mkdir(parents=True) - (en_dir / "main.ftl").write_text( - "greeting = Hello, { $name }!\ncount = You have { $n } items.", - encoding="utf-8" - ) - - loader = PathResourceLoader(str(locales_dir / "{locale}")) - l10n = FluentLocalization(["en"], ["main.ftl"], loader, use_isolating=False) - - greeting, _ = l10n.format_value("greeting", {"name": "World"}) - count, _ = l10n.format_value("count", {"n": 42}) - - assert greeting == "Hello, World!" 
- assert "42" in count - - -class TestOnFallbackCallback: - """on_fallback callback is invoked when a message resolves from a fallback locale.""" - - def test_on_fallback_invoked_on_format_value(self) -> None: - """on_fallback callback invoked when message resolved from fallback locale.""" - fallback_events: list[FallbackInfo] = [] - - def record_fallback(info: FallbackInfo) -> None: - fallback_events.append(info) - - l10n = FluentLocalization(["lv", "en"], on_fallback=record_fallback) - - # Add message only to fallback locale (en) - l10n.add_resource("en", "fallback-msg = English fallback") - - # Request message - should trigger fallback - result, _ = l10n.format_value("fallback-msg") - - assert result == "English fallback" - assert len(fallback_events) == 1 - assert fallback_events[0].requested_locale == normalize_locale("lv") - assert fallback_events[0].resolved_locale == normalize_locale("en") - assert fallback_events[0].message_id == "fallback-msg" - - def test_on_fallback_invoked_on_format_pattern(self) -> None: - """on_fallback callback invoked in format_pattern when using fallback locale.""" - fallback_events: list[FallbackInfo] = [] - - def record_fallback(info: FallbackInfo) -> None: - fallback_events.append(info) - - l10n = FluentLocalization(["de", "en"], on_fallback=record_fallback) - - # Add message only to fallback locale (en) - l10n.add_resource("en", "pattern-msg = Pattern from fallback") - - # Request message via format_pattern - should trigger fallback - result, _ = l10n.format_pattern("pattern-msg") - - assert result == "Pattern from fallback" - assert len(fallback_events) == 1 - assert fallback_events[0].requested_locale == normalize_locale("de") - assert fallback_events[0].resolved_locale == normalize_locale("en") - assert fallback_events[0].message_id == "pattern-msg" - - def test_on_fallback_not_invoked_for_primary_locale(self) -> None: - """on_fallback not invoked when message found in primary locale.""" - fallback_events: 
list[FallbackInfo] = [] - - def record_fallback(info: FallbackInfo) -> None: - fallback_events.append(info) - - l10n = FluentLocalization(["fr", "en"], on_fallback=record_fallback) - - # Add message to primary locale (fr) - l10n.add_resource("fr", "french-msg = Message en francais") - - result, _ = l10n.format_value("french-msg") - - assert result == "Message en francais" - assert len(fallback_events) == 0 # No fallback occurred - - def test_on_fallback_none_does_not_raise(self) -> None: - """on_fallback=None (default) works without errors.""" - l10n = FluentLocalization(["lv", "en"]) - - l10n.add_resource("en", "msg = No callback") - - # Should not raise even without callback - result, _ = l10n.format_value("msg") - assert result == "No callback" - - def test_on_fallback_multiple_calls(self) -> None: - """on_fallback invoked for each fallback resolution.""" - fallback_events: list[FallbackInfo] = [] - - def record_fallback(info: FallbackInfo) -> None: - fallback_events.append(info) - - l10n = FluentLocalization(["it", "en"], on_fallback=record_fallback) - - l10n.add_resource("en", "msg1 = First\nmsg2 = Second") - - l10n.format_value("msg1") - l10n.format_value("msg2") - - assert len(fallback_events) == 2 - assert fallback_events[0].message_id == "msg1" - assert fallback_events[1].message_id == "msg2" - - def test_on_fallback_with_format_pattern_and_attribute(self) -> None: - """on_fallback invoked in format_pattern with attribute access.""" - fallback_events: list[FallbackInfo] = [] - - def record_fallback(info: FallbackInfo) -> None: - fallback_events.append(info) - - l10n = FluentLocalization(["es", "en"], on_fallback=record_fallback) - - l10n.add_resource( - "en", - """ -button = Click - .tooltip = Button tooltip -""", - ) - - # Request attribute via format_pattern - result, _ = l10n.format_pattern("button", attribute="tooltip") - - assert result == "Button tooltip" - assert len(fallback_events) == 1 - assert fallback_events[0].message_id == "button" - - -class 
TestCrossFileDepthValidation: - """Reference depth limits are enforced even when the chain spans multiple add_resource calls.""" - - def test_deep_reference_chain_across_resources(self) -> None: - """Reference chains spanning multiple resources respect depth limits. - - When messages reference each other across multiple add_resource calls, - the total depth limit should still be enforced. - """ - l10n = FluentLocalization(["en"], use_isolating=False) - - # Add resources separately - chain: level5 -> level4 -> level3 -> level2 -> level1 -> base - l10n.add_resource("en", "base = Base value") - l10n.add_resource("en", "level1 = L1: { base }") - l10n.add_resource("en", "level2 = L2: { level1 }") - l10n.add_resource("en", "level3 = L3: { level2 }") - l10n.add_resource("en", "level4 = L4: { level3 }") - l10n.add_resource("en", "level5 = L5: { level4 }") - - # Should resolve successfully (depth 6 is within default limit of 50) - result, errors = l10n.format_value("level5") - - assert not errors - assert "Base value" in result - assert "L1:" in result - assert "L5:" in result - - def test_very_deep_reference_chain_is_limited(self) -> None: - """Reference chains exceeding max_nesting_depth produce errors, not stack overflow.""" - bundle = FluentBundle("en", use_isolating=False, max_nesting_depth=10, strict=False) - - # Build a chain deeper than max_nesting_depth - bundle.add_resource("level0 = Base") - for i in range(1, 15): # 15 levels, exceeds max_depth of 10 - bundle.add_resource(f"level{i} = Chain {{ level{i-1} }}") - - # Resolving the deepest level should hit depth limit - result, errors = bundle.format_pattern("level14") - - # Depth limit exceeded must produce resolution errors - assert len(errors) > 0, f"Expected depth limit errors; got result={result!r}" - - def test_cross_file_term_reference_depth(self) -> None: - """Term references across resources are tracked for depth. 
- - Terms (-name syntax) referenced across multiple resources - should also respect depth limits. - """ - l10n = FluentLocalization(["en"], use_isolating=False) - - # Add terms and messages across resources - l10n.add_resource("en", "-brand = Firefox") - l10n.add_resource("en", "-product = { -brand } Browser") - l10n.add_resource("en", "title = Welcome to { -product }") - l10n.add_resource("en", "subtitle = { title } - Get Started") - - result, errors = l10n.format_value("subtitle") - - assert not errors - assert "Firefox" in result - assert "Browser" in result - assert "Welcome" in result - - def test_cross_locale_depth_isolation(self) -> None: - """Depth limits are applied per-resolution, not accumulating across locales. - - When falling back through locales, each resolution attempt - should have its own depth counter, not share state. - """ - l10n = FluentLocalization(["de", "en"], use_isolating=False) - - # German has deep chain - l10n.add_resource("de", "a = DE-A") - l10n.add_resource("de", "b = DE-B: { a }") - l10n.add_resource("de", "c = DE-C: { b }") - - # English has different chain (also deep) - l10n.add_resource("en", "x = EN-X") - l10n.add_resource("en", "y = EN-Y: { x }") - - # Resolve German chain - result_c, errors_c = l10n.format_value("c") - assert not errors_c - assert "DE-A" in result_c - - # Resolve English chain (falls back since not in de) - result_y, errors_y = l10n.format_value("y") - assert not errors_y - assert "EN-X" in result_y - - def test_circular_reference_detection_across_resources(self) -> None: - """Circular references across resources are detected. - - Even when circular references are created by adding resources - separately, the resolver should detect and break the cycle. 
- """ - l10n = FluentLocalization(["en"], use_isolating=False, strict=False) - - # Create circular reference: msg1 -> msg2 -> msg3 -> msg1 - l10n.add_resource("en", "msg1 = Start: { msg2 }") - l10n.add_resource("en", "msg2 = Middle: { msg3 }") - l10n.add_resource("en", "msg3 = End: { msg1 }") # Circular! - - # Resolution should detect cycle and not stack overflow - result, errors = l10n.format_value("msg1") - - # Should have errors (cycle detected) or produce partial result - # The key is it doesn't hang or crash - assert isinstance(result, str) - assert len(errors) > 0 or "{" in result # Either errors or unresolved placeholder - - def test_select_expression_depth_across_resources(self) -> None: - """Select expressions in cross-resource chains respect depth. - - Complex patterns with select expressions referenced across - resources should not bypass depth limits. - """ - l10n = FluentLocalization(["en"], use_isolating=False) - - l10n.add_resource( - "en", - """ -base = { $type -> - [a] Type A - [b] Type B - *[other] Unknown -} -""", - ) - l10n.add_resource("en", "wrapper = Result: { base }") - l10n.add_resource("en", "outer = Final: { wrapper }") - - result, errors = l10n.format_value("outer", {"type": "a"}) - - assert not errors - assert "Type A" in result - assert "Final:" in result - - -class TestPathResourceLoaderResolvedRoot: - """PathResourceLoader._resolved_root falls back to cwd when no static prefix.""" - - def test_resolved_root_fallback_to_cwd(self) -> None: - """Pattern with no static path prefix resolves root to current working directory.""" - loader = PathResourceLoader("{locale}") - expected = Path.cwd().resolve() - assert loader._resolved_root == expected # pylint: disable=protected-access - - -class TestPathResourceLoaderSecurity: - """PathResourceLoader rejects path traversal and absolute path inputs.""" - - def test_load_rejects_absolute_path(self) -> None: - """Absolute path resource_id raises ValueError.""" - loader = 
PathResourceLoader("locales/{locale}") - - with pytest.raises(ValueError, match="Absolute paths not allowed"): - loader.load("en", "/etc/passwd") - - def test_load_rejects_absolute_path_posix_style(self) -> None: - """POSIX absolute path resource_id raises ValueError.""" - loader = PathResourceLoader("locales/{locale}") - - with pytest.raises(ValueError, match="Absolute paths not allowed"): - loader.load("en", "/usr/local/etc/passwd") - - def test_load_rejects_parent_directory_traversal(self) -> None: - """'..' sequences in resource_id raise ValueError.""" - loader = PathResourceLoader("locales/{locale}") - - with pytest.raises(ValueError, match="Path traversal sequences not allowed"): - loader.load("en", "../../../etc/passwd") - - def test_load_rejects_parent_directory_in_middle(self) -> None: - """'..' in the middle of a resource_id path raises ValueError.""" - loader = PathResourceLoader("locales/{locale}") - - with pytest.raises(ValueError, match="Path traversal sequences not allowed"): - loader.load("en", "foo/../bar/../secrets.ftl") - - def test_load_rejects_path_starting_with_forward_slash(self) -> None: - """resource_id starting with '/' is rejected as absolute or separator-prefixed. - - On Unix, /messages.ftl is caught as an absolute path first. - On Windows with forward slash it may be caught by the separator check. 
- """ - loader = PathResourceLoader("locales/{locale}") - - with pytest.raises(ValueError, match=r"(Absolute|separator)"): - loader.load("en", "/messages.ftl") - - def test_load_rejects_path_starting_with_backslash(self) -> None: - """resource_id starting with '\\' is rejected.""" - loader = PathResourceLoader("locales/{locale}") - - with pytest.raises(ValueError, match="not allowed in resource_id"): - loader.load("en", "\\messages.ftl") - - def test_load_detects_symlink_escape_via_is_safe_path(self) -> None: - """Symlink pointing outside the base directory is rejected by _is_safe_path.""" - - with tempfile.TemporaryDirectory() as tmpdir: - base_path = Path(tmpdir) - locale_dir = base_path / "locales" / "en" - locale_dir.mkdir(parents=True) - - outside_dir = base_path / "outside" - outside_dir.mkdir() - secret_file = outside_dir / "secret.ftl" - secret_file.write_text("secret = Secret data") - - symlink_path = locale_dir / "escape.ftl" - try: - symlink_path.symlink_to(secret_file) - - loader = PathResourceLoader(str(base_path / "locales" / "{locale}")) - - with pytest.raises(ValueError, match="Path traversal detected"): - loader.load("en", "escape.ftl") - except OSError: - pytest.skip("Symlink creation not supported on this system") - - - -class TestPathResourceLoaderValidation: - """PathResourceLoader accepts valid resource_ids and rejects malformed ones.""" - - def test_load_with_valid_resource_id(self) -> None: - """Valid resource_id loads file content correctly.""" - - with tempfile.TemporaryDirectory() as tmpdir: - base = Path(tmpdir) - locale_dir = base / "locales" / "en" - locale_dir.mkdir(parents=True) - - test_file = locale_dir / "messages.ftl" - test_file.write_text("hello = Hello, World!") - - loader = PathResourceLoader(str(base / "locales" / "{locale}")) - content = loader.load("en", "messages.ftl") - - assert "Hello, World!" 
in content - - def test_load_with_subdirectory_resource_id(self) -> None: - """Subdirectory in resource_id resolves to nested path correctly.""" - - with tempfile.TemporaryDirectory() as tmpdir: - base = Path(tmpdir) - locale_dir = base / "locales" / "en" / "ui" - locale_dir.mkdir(parents=True) - - test_file = locale_dir / "buttons.ftl" - test_file.write_text("save = Save") - - loader = PathResourceLoader(str(base / "locales" / "{locale}")) - content = loader.load("en", "ui/buttons.ftl") - - assert "Save" in content - - def test_validate_resource_id_validates_before_path_resolution(self) -> None: - """Validation rejects malformed resource_ids before any filesystem operations.""" - loader = PathResourceLoader("locales/{locale}") - - invalid_ids = [ - "/absolute/path.ftl", - "..\\parent\\path.ftl", - "..\\..\\..\\escape.ftl", - "\\windows\\path.ftl", - ] - - for invalid_id in invalid_ids: - with pytest.raises(ValueError, match=r"(Absolute|traversal|separator)"): - loader.load("en", invalid_id) - - -class TestPathResourceLoaderLocaleValidation: - """PathResourceLoader rejects locale codes containing path traversal sequences.""" - - def test_load_rejects_locale_with_parent_traversal(self) -> None: - """'..' in locale code raises ValueError.""" - loader = PathResourceLoader("locales/{locale}") - - with pytest.raises(ValueError, match=r"Invalid locale: '../../../etc'"): - loader.load("../../../etc", "messages.ftl") - - def test_load_rejects_locale_with_embedded_traversal(self) -> None: - """'..' 
embedded within locale code raises ValueError.""" - loader = PathResourceLoader("locales/{locale}") - - with pytest.raises(ValueError, match=r"Invalid locale: 'en/\.\./de'"): - loader.load("en/../de", "messages.ftl") - - def test_load_rejects_locale_with_forward_slash(self) -> None: - """'/' in locale code raises ValueError.""" - loader = PathResourceLoader("locales/{locale}") - - with pytest.raises(ValueError, match=r"Invalid locale: 'en/attack'"): - loader.load("en/attack", "messages.ftl") - - def test_load_rejects_locale_with_backslash(self) -> None: - """'\\' in locale code raises ValueError.""" - loader = PathResourceLoader("locales/{locale}") - - with pytest.raises(ValueError, match=r"Invalid locale: 'en\\\\attack'"): - loader.load("en\\attack", "messages.ftl") - - def test_load_rejects_empty_locale(self) -> None: - """Empty locale code raises ValueError.""" - loader = PathResourceLoader("locales/{locale}") - - with pytest.raises(ValueError, match="locale cannot be blank"): - loader.load("", "messages.ftl") - - def test_load_accepts_valid_locale_codes(self) -> None: - """Standard BCP 47-style locale codes are accepted.""" - - with tempfile.TemporaryDirectory() as tmpdir: - base = Path(tmpdir) - - valid_locales = ["en", "en_US", "de_DE", "lv_LV", "zh_Hans_CN"] - - for locale in valid_locales: - locale_dir = base / "locales" / normalize_locale(locale) - locale_dir.mkdir(parents=True, exist_ok=True) - test_file = locale_dir / "test.ftl" - test_file.write_text(f"msg = Test for {locale}") - - loader = PathResourceLoader(str(base / "locales" / "{locale}")) - - for locale in valid_locales: - content = loader.load(locale, "test.ftl") - assert f"Test for {locale}" in content - - def test_root_dir_parameter_provides_fixed_anchor(self) -> None: - """root_dir anchors path validation independently of the locale parameter.""" - - with tempfile.TemporaryDirectory() as tmpdir: - base = Path(tmpdir) - locale_dir = base / "locales" / "en" - locale_dir.mkdir(parents=True) - 
test_file = locale_dir / "test.ftl" - test_file.write_text("msg = Test") - - loader = PathResourceLoader( - str(base / "locales" / "{locale}"), - root_dir=str(base), - ) - - content = loader.load("en", "test.ftl") - assert "Test" in content - - def test_root_dir_prevents_locale_escape_attempt(self) -> None: - """root_dir constrains path validation to a fixed boundary. - - When a symlink inside the locale directory resolves to a file - outside root_dir, the loader raises ValueError even though the - resource_id itself contains no traversal sequences. - """ - with tempfile.TemporaryDirectory() as tmpdir: - base = Path(tmpdir) - locale_dir = base / "locales" / "en" - locale_dir.mkdir(parents=True) - (locale_dir / "test.ftl").write_text("msg = Test") - - outside = base / "outside" - outside.mkdir() - secret = outside / "secret.ftl" - secret.write_text("secret = Should not access") - - loader = PathResourceLoader( - str(base / "locales" / "{locale}"), - root_dir=str(base / "locales"), - ) - - # Normal load within root_dir succeeds - content = loader.load("en", "test.ftl") - assert "Test" in content - - # Symlink from within locale dir to a file outside root_dir - escape_link = locale_dir / "escape.ftl" - try: - escape_link.symlink_to(secret) - # The resource_id has no '..' 
but the resolved path escapes root_dir - with pytest.raises(ValueError, match="Path traversal detected"): - loader.load("en", "escape.ftl") - except OSError: - pytest.skip("Symlink creation not supported on this system") - - -class TestLocalizationBootTypesFacadeExport: - """Boot evidence types and loaders are accessible from the root facade.""" - - def test_load_status_accessible_from_root_facade(self) -> None: - """LoadStatus enum is exported from ftllexengine root facade.""" - assert ftllexengine.LoadStatus is LoadStatus - - def test_load_status_in_root_all(self) -> None: - """LoadStatus is listed in ftllexengine.__all__.""" - assert "LoadStatus" in ftllexengine.__all__ - - def test_fallback_info_accessible_from_root_facade(self) -> None: - """FallbackInfo is exported from ftllexengine root facade.""" - assert ftllexengine.FallbackInfo is FallbackInfo - - def test_load_summary_accessible_from_root_facade(self) -> None: - """LoadSummary is exported from ftllexengine root facade.""" - assert ftllexengine.LoadSummary is LoadSummary - - def test_resource_load_result_accessible_from_root_facade(self) -> None: - """ResourceLoadResult is exported from ftllexengine root facade.""" - assert ftllexengine.ResourceLoadResult is ResourceLoadResult - - def test_resource_loader_accessible_from_root_facade(self) -> None: - """ResourceLoader Protocol is exported from ftllexengine root facade.""" - assert ftllexengine.ResourceLoader is ResourceLoader - - def test_path_resource_loader_accessible_from_root_facade(self) -> None: - """PathResourceLoader is exported from ftllexengine root facade.""" - assert ftllexengine.PathResourceLoader is PathResourceLoader - - def test_localization_boot_config_accessible_from_root_facade(self) -> None: - """LocalizationBootConfig is exported from ftllexengine root facade.""" - assert ftllexengine.LocalizationBootConfig is LocalizationBootConfig - - def test_localization_cache_stats_accessible_from_root_facade(self) -> None: - 
"""LocalizationCacheStats is exported from ftllexengine root facade.""" - assert ftllexengine.LocalizationCacheStats is LocalizationCacheStats - - def test_boot_types_in_root_all(self) -> None: - """All boot evidence types are listed in ftllexengine.__all__.""" - for name in ( - "FallbackInfo", - "LoadSummary", - "LocalizationBootConfig", - "LocalizationCacheStats", - "PathResourceLoader", - "ResourceLoadResult", - "ResourceLoader", - ): - assert name in ftllexengine.__all__, f"{name!r} missing from ftllexengine.__all__" - -# ============================================================================ -# TestFluentLocalizationAddResourceStream -# ============================================================================ - - -class TestFluentLocalizationAddResourceStream: - """FluentLocalization.add_resource_stream incremental resource loading.""" - - def test_loads_message_from_line_list(self) -> None: - """add_resource_stream registers messages for a locale.""" - l10n = FluentLocalization( - locales=("en",), - resource_ids=(), - ) - l10n.add_resource_stream("en", ["greeting = Hello\n"]) - result, errors = l10n.format_pattern("greeting") - assert errors == () - assert result == "Hello" - - def test_invalid_locale_raises(self) -> None: - """Locale not in fallback chain raises ValueError.""" - l10n = FluentLocalization(locales=("en",), resource_ids=()) - with pytest.raises(ValueError, match="not in fallback chain"): - l10n.add_resource_stream("de", ["msg = Value\n"]) - - def test_returns_empty_junk_on_clean_source(self) -> None: - """Clean stream returns empty junk tuple.""" - l10n = FluentLocalization(locales=("en",), resource_ids=()) - junk = l10n.add_resource_stream("en", ["msg = Value\n"]) - assert junk == () - - def test_source_path_accepted(self) -> None: - """source_path kwarg threads through without error.""" - l10n = FluentLocalization(locales=("en",), resource_ids=()) - l10n.add_resource_stream( - "en", ["msg = Value\n"], source_path="locales/en/ui.ftl" - 
) - result, _ = l10n.format_pattern("msg") - assert result == "Value" - - def test_multiple_messages_from_stream(self) -> None: - """Multiple messages from a stream are all registered.""" - l10n = FluentLocalization(locales=("en",), resource_ids=()) - l10n.add_resource_stream("en", ["msg1 = One\n", "\n", "msg2 = Two\n"]) - r1, _ = l10n.format_pattern("msg1") - r2, _ = l10n.format_pattern("msg2") - assert r1 == "One" - assert r2 == "Two" - - def test_equivalence_with_add_resource(self) -> None: - """add_resource_stream produces same result as add_resource for same content.""" - source = "msg = Hello\n" - l1 = FluentLocalization(locales=("en",), resource_ids=()) - l1.add_resource("en", source) - l2 = FluentLocalization(locales=("en",), resource_ids=()) - l2.add_resource_stream("en", source.splitlines(keepends=True)) - r1, e1 = l1.format_pattern("msg") - r2, e2 = l2.format_pattern("msg") - assert r1 == r2 - assert e1 == e2 - - def test_second_call_reuses_existing_bundle(self) -> None: - """Second add_resource_stream call for same locale reuses the existing bundle. - - The first call creates the bundle lazily; the second call must take the - branch where the bundle already exists in _bundles (line 734->736 coverage). 
- """ - l10n = FluentLocalization(locales=("en",), resource_ids=()) - l10n.add_resource_stream("en", ["msg1 = First\n"]) - l10n.add_resource_stream("en", ["msg2 = Second\n"]) - r1, e1 = l10n.format_pattern("msg1") - r2, e2 = l10n.format_pattern("msg2") - assert r1 == "First" - assert r2 == "Second" - assert e1 == () - assert e2 == () - - -# ============================================================================ -# TestParseStreamFtlFacade -# ============================================================================ - - -class TestParseStreamFtlFacade: - """parse_stream_ftl is accessible from root facade.""" - - def test_accessible_from_root(self) -> None: - """parse_stream_ftl is importable from ftllexengine.""" - assert hasattr(ftllexengine, "parse_stream_ftl") - assert callable(ftllexengine.parse_stream_ftl) - - def test_in_root_all(self) -> None: - """parse_stream_ftl is listed in ftllexengine.__all__.""" - assert "parse_stream_ftl" in ftllexengine.__all__ - - def test_yields_entries_from_lines(self) -> None: - """parse_stream_ftl yields Message entries from line list.""" - entries = list(ftllexengine.parse_stream_ftl(["greeting = Hello\n"])) - assert len(entries) == 1 - assert isinstance(entries[0], Message) - assert entries[0].id.name == "greeting" +from tests.localization_cases.basics_and_fallback import * # noqa: F403 - split module reuses shared support imports +from tests.localization_cases.loaders_and_cache import * # noqa: F403 - split module reuses shared support imports +from tests.localization_cases.multilocale_and_callbacks import * # noqa: F403 - split module reuses shared support imports +from tests.localization_cases.validation_and_streams import * # noqa: F403 - split module reuses shared support imports diff --git a/tests/test_localization_orchestration.py b/tests/test_localization_orchestration.py index 3e7b8943..2e48ca30 100644 --- a/tests/test_localization_orchestration.py +++ b/tests/test_localization_orchestration.py @@ -1,1516 +1,6 
@@ -"""Tests for FluentLocalization orchestration API surface. +"""Aggregated localization orchestration test surface.""" -Covers FluentLocalization methods not exercised by the main integration tests: -- has_attribute: attribute existence across fallback chain -- get_message_ids: union of IDs across all locales -- get_message_variables: variable extraction with fallback -- get_all_message_variables: merged variable map -- get_message: AST node access with fallback chain -- get_term: AST node access with fallback chain -- introspect_term: term introspection with fallback -- __enter__/__exit__: context manager protocol -- get_load_summary: resource load tracking -- get_cache_stats: aggregate cache metrics branch -- get_cache_audit_log: per-locale cache audit access - -Also covers data type invariants for ResourceLoadResult, LoadSummary, -and PathResourceLoader initialization edge cases. - -Python 3.13+. -""" - -from __future__ import annotations - -from pathlib import Path - -import pytest -from hypothesis import HealthCheck, event, given, settings -from hypothesis import strategies as st - -from ftllexengine import validate_message_variables -from ftllexengine.core.locale_utils import normalize_locale -from ftllexengine.integrity import ( - FormattingIntegrityError, - IntegrityCheckFailedError, -) -from ftllexengine.localization import ( - CacheAuditLogEntry, - FluentLocalization, - LoadStatus, - LoadSummary, - PathResourceLoader, - ResourceLoadResult, -) -from ftllexengine.runtime.bundle import FluentBundle -from ftllexengine.runtime.cache_config import CacheConfig -from ftllexengine.syntax import Message, Term -from ftllexengine.syntax.ast import Junk, Span -from tests.strategies.localization import locale_chains, message_ids - - -class TestResourceLoadResultStatusProperties: - """ResourceLoadResult status predicates are mutually exclusive.""" - - @pytest.mark.parametrize("status", list(LoadStatus)) - def test_status_properties_exclusive(self, status: LoadStatus) 
-> None: - """Exactly one of is_success/is_not_found/is_error is True.""" - result = ResourceLoadResult("en", "main.ftl", status) - flags = [result.is_success, result.is_not_found, result.is_error] - assert sum(flags) == 1 - - def test_has_junk_true_when_junk_present(self) -> None: - """has_junk is True when junk_entries is non-empty.""" - junk = Junk(content="bad", span=Span(start=0, end=3)) - result = ResourceLoadResult( - "en", "test.ftl", LoadStatus.SUCCESS, - junk_entries=(junk,), - ) - assert result.has_junk is True - - def test_has_junk_false_when_empty(self) -> None: - """has_junk is False when junk_entries is empty.""" - result = ResourceLoadResult( - "en", "test.ftl", LoadStatus.SUCCESS, junk_entries=(), - ) - assert result.has_junk is False - - - - -class TestLoadSummaryStatistics: - """LoadSummary post_init and filtering methods.""" - - def _make_summary(self) -> LoadSummary: - """Build a LoadSummary with all three status types and junk.""" - junk = Junk(content="j", span=Span(start=0, end=1)) - results = ( - ResourceLoadResult("en", "ok.ftl", LoadStatus.SUCCESS), - ResourceLoadResult( - "en", "junk.ftl", LoadStatus.SUCCESS, - junk_entries=(junk,), - ), - ResourceLoadResult("de", "nf.ftl", LoadStatus.NOT_FOUND), - ResourceLoadResult( - "fr", "err.ftl", LoadStatus.ERROR, - error=OSError("fail"), - ), - ) - return LoadSummary(results=results) - - def test_post_init_calculates_counts(self) -> None: - """__post_init__ calculates all aggregate counts.""" - summary = self._make_summary() - assert summary.total_attempted == 4 - assert summary.successful == 2 - assert summary.not_found == 1 - assert summary.errors == 1 - assert summary.junk_count == 1 - - def test_get_errors_returns_error_results(self) -> None: - """get_errors returns only ERROR status results.""" - summary = self._make_summary() - errors = summary.get_errors() - assert len(errors) == 1 - assert errors[0].locale == "fr" - - def test_get_not_found_returns_not_found_results(self) -> None: - 
"""get_not_found returns only NOT_FOUND status results.""" - summary = self._make_summary() - not_found = summary.get_not_found() - assert len(not_found) == 1 - assert not_found[0].locale == "de" - - def test_get_successful_returns_success_results(self) -> None: - """get_successful returns only SUCCESS status results.""" - summary = self._make_summary() - successful = summary.get_successful() - assert len(successful) == 2 - - def test_get_by_locale_filters_correctly(self) -> None: - """get_by_locale returns results for specified locale only.""" - summary = self._make_summary() - en_results = summary.get_by_locale("en") - assert len(en_results) == 2 - assert all(r.locale == "en" for r in en_results) - - def test_get_with_junk_returns_junk_results(self) -> None: - """get_with_junk returns results with non-empty junk_entries.""" - summary = self._make_summary() - junk_results = summary.get_with_junk() - assert len(junk_results) == 1 - assert junk_results[0].resource_id == "junk.ftl" - - def test_get_all_junk_flattens_entries(self) -> None: - """get_all_junk returns flattened tuple of all Junk entries.""" - summary = self._make_summary() - all_junk = summary.get_all_junk() - assert len(all_junk) == 1 - - def test_has_errors_property(self) -> None: - """has_errors reflects errors count.""" - summary = self._make_summary() - assert summary.has_errors is True - - clean = LoadSummary(results=( - ResourceLoadResult("en", "ok.ftl", LoadStatus.SUCCESS), - )) - assert clean.has_errors is False - - def test_has_junk_property(self) -> None: - """has_junk reflects junk_count.""" - summary = self._make_summary() - assert summary.has_junk is True - - def test_all_successful_ignores_junk(self) -> None: - """all_successful is True even with junk, if no errors/not_found.""" - junk = Junk(content="j", span=Span(start=0, end=1)) - summary = LoadSummary(results=( - ResourceLoadResult( - "en", "ok.ftl", LoadStatus.SUCCESS, - junk_entries=(junk,), - ), - )) - assert summary.all_successful 
is True - - def test_all_clean_requires_zero_junk(self) -> None: - """all_clean is False when junk exists even if all_successful.""" - junk = Junk(content="j", span=Span(start=0, end=1)) - summary = LoadSummary(results=( - ResourceLoadResult( - "en", "ok.ftl", LoadStatus.SUCCESS, - junk_entries=(junk,), - ), - )) - assert summary.all_successful is True - assert summary.all_clean is False - - def test_all_clean_true_when_no_issues(self) -> None: - """all_clean is True when no errors, not_found, or junk.""" - summary = LoadSummary(results=( - ResourceLoadResult("en", "ok.ftl", LoadStatus.SUCCESS), - )) - assert summary.all_clean is True - - def test_setattr_raises_attribute_error(self) -> None: - """LoadSummary rejects attribute mutation (frozen dataclass).""" - summary = self._make_summary() - with pytest.raises(AttributeError): - summary.results = () # type: ignore[misc] - - def test_delattr_raises_attribute_error(self) -> None: - """LoadSummary rejects attribute deletion (frozen dataclass).""" - summary = self._make_summary() - with pytest.raises(AttributeError): - del summary.results - - def test_eq_same_results_tuple(self) -> None: - """LoadSummary instances with same results tuple compare equal.""" - results = ( - ResourceLoadResult("en", "ok.ftl", LoadStatus.SUCCESS), - ) - s1 = LoadSummary(results=results) - s2 = LoadSummary(results=results) - assert s1 == s2 - - def test_eq_not_implemented_for_other_types(self) -> None: - """LoadSummary equality returns NotImplemented for non-LoadSummary.""" - summary = self._make_summary() - assert summary != "not a summary" # type: ignore[comparison-overlap] - # Direct dunder call required to test NotImplemented sentinel - result = LoadSummary.__eq__(summary, "not a summary") # pylint: disable=unnecessary-dunder-call # type: ignore[arg-type] - assert result is NotImplemented - - def test_hash_consistent_with_eq(self) -> None: - """Equal LoadSummary instances sharing results have equal hashes.""" - results = ( - 
ResourceLoadResult("en", "ok.ftl", LoadStatus.SUCCESS), - ) - s1 = LoadSummary(results=results) - s2 = LoadSummary(results=results) - assert hash(s1) == hash(s2) - - def test_repr_includes_counts(self) -> None: - """LoadSummary repr includes all aggregate counts.""" - summary = self._make_summary() - r = repr(summary) - assert "LoadSummary(" in r - assert "total=4" in r - assert "ok=2" in r - - -class TestPathResourceLoaderInit: - """PathResourceLoader initialization edge cases.""" - - def test_empty_static_prefix_uses_cwd(self) -> None: - """base_path starting with {locale} uses cwd as root.""" - loader = PathResourceLoader("{locale}/resources") - assert loader._resolved_root == Path.cwd().resolve() - - def test_explicit_root_dir_overrides(self) -> None: - """Explicit root_dir overrides base_path derivation.""" - loader = PathResourceLoader( - "any/{locale}/path", root_dir="/tmp", - ) - assert loader._resolved_root == Path("/tmp").resolve() - - def test_trailing_separators_stripped(self) -> None: - """Trailing separators stripped from static prefix.""" - loader = PathResourceLoader("locales/{locale}////") - assert loader._resolved_root == Path("locales").resolve() - - def test_multiple_locale_placeholders(self) -> None: - """Multiple {locale} placeholders use first split part.""" - loader = PathResourceLoader("root/{locale}/sub/{locale}") - assert loader._resolved_root == Path("root").resolve() - - - -class TestHasAttribute: - """Tests for has_attribute fallback chain search.""" - - def test_attribute_in_primary_locale(self) -> None: - """has_attribute finds attribute in primary locale.""" - l10n = FluentLocalization(["en", "de"]) - l10n.add_resource("en", "btn = Click\n .tooltip = Help\n") - assert l10n.has_attribute("btn", "tooltip") is True - - def test_attribute_in_fallback_locale(self) -> None: - """has_attribute finds attribute in fallback locale.""" - l10n = FluentLocalization(["de", "en"]) - l10n.add_resource("en", "btn = Click\n .tooltip = Help\n") - 
assert l10n.has_attribute("btn", "tooltip") is True - - def test_attribute_not_found(self) -> None: - """has_attribute returns False for nonexistent attribute.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "msg = No attrs\n") - assert l10n.has_attribute("msg", "nonexistent") is False - - def test_message_not_found(self) -> None: - """has_attribute returns False for nonexistent message.""" - l10n = FluentLocalization(["en"]) - assert l10n.has_attribute("missing", "attr") is False - - - -class TestGetMessageIds: - """Tests for get_message_ids union across locales.""" - - def test_returns_union_of_ids(self) -> None: - """get_message_ids returns union across all locales.""" - l10n = FluentLocalization(["en", "de"]) - l10n.add_resource("en", "msg-a = A\nmsg-b = B\n") - l10n.add_resource("de", "msg-b = B2\nmsg-c = C\n") - ids = l10n.get_message_ids() - assert set(ids) == {"msg-a", "msg-b", "msg-c"} - - def test_no_duplicates(self) -> None: - """get_message_ids has no duplicate IDs.""" - l10n = FluentLocalization(["en", "de"]) - l10n.add_resource("en", "msg = A\n") - l10n.add_resource("de", "msg = B\n") - ids = l10n.get_message_ids() - assert len(ids) == len(set(ids)) - - def test_primary_locale_ids_first(self) -> None: - """Primary locale IDs appear before fallback IDs.""" - l10n = FluentLocalization(["en", "de"]) - l10n.add_resource("en", "alpha = A\n") - l10n.add_resource("de", "alpha = A2\nbeta = B\n") - ids = l10n.get_message_ids() - assert ids.index("alpha") < ids.index("beta") - - def test_empty_when_no_resources(self) -> None: - """get_message_ids is empty when no resources loaded.""" - l10n = FluentLocalization(["en"]) - assert l10n.get_message_ids() == [] - - - -class TestGetMessageVariables: - """Tests for get_message_variables with fallback.""" - - def test_returns_variable_names(self) -> None: - """get_message_variables returns frozenset of variable names.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource( - "en", "greeting = Hello 
{ $firstName } { $lastName }!\n", - ) - variables = l10n.get_message_variables("greeting") - assert isinstance(variables, frozenset) - assert "firstName" in variables - assert "lastName" in variables - - def test_fallback_chain_search(self) -> None: - """get_message_variables searches fallback chain.""" - l10n = FluentLocalization(["de", "en"]) - l10n.add_resource("en", "msg = Value { $count }\n") - variables = l10n.get_message_variables("msg") - assert "count" in variables - - def test_raises_for_missing_message(self) -> None: - """get_message_variables raises KeyError for missing message.""" - l10n = FluentLocalization(["en"]) - with pytest.raises(KeyError, match="not found"): - l10n.get_message_variables("nonexistent") - - - -class TestGetAllMessageVariables: - """Tests for get_all_message_variables merged map.""" - - def test_returns_dict_of_variable_sets(self) -> None: - """get_all_message_variables returns dict mapping id -> frozenset.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource( - "en", "msg1 = { $name }\nmsg2 = Static\n", - ) - all_vars = l10n.get_all_message_variables() - assert isinstance(all_vars, dict) - assert "msg1" in all_vars - assert "name" in all_vars["msg1"] - assert "msg2" in all_vars - - def test_primary_locale_takes_precedence(self) -> None: - """Primary locale variables win for duplicate message IDs.""" - l10n = FluentLocalization(["en", "de"]) - l10n.add_resource("en", "msg = { $primary }\n") - l10n.add_resource("de", "msg = { $fallback }\n") - all_vars = l10n.get_all_message_variables() - assert "primary" in all_vars["msg"] - - def test_includes_fallback_only_messages(self) -> None: - """Messages only in fallback locales are included.""" - l10n = FluentLocalization(["en", "de"]) - l10n.add_resource("en", "en-only = { $x }\n") - l10n.add_resource("de", "de-only = { $y }\n") - all_vars = l10n.get_all_message_variables() - assert "en-only" in all_vars - assert "de-only" in all_vars - - - -class TestIntrospectTerm: - """Tests for 
introspect_term with fallback chain.""" - - def test_found_in_primary(self) -> None: - """introspect_term returns introspection from primary locale.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "-brand = Firefox\n") - info = l10n.introspect_term("brand") - assert info is not None - - def test_found_in_fallback(self) -> None: - """introspect_term searches fallback chain.""" - l10n = FluentLocalization(["de", "en"]) - l10n.add_resource("en", "-product = App\n") - info = l10n.introspect_term("product") - assert info is not None - - def test_not_found_returns_none(self) -> None: - """introspect_term returns None for missing term.""" - l10n = FluentLocalization(["en"]) - info = l10n.introspect_term("nonexistent") - assert info is None - - - -class TestStrictMode: - """Tests for FluentLocalization strict mode (fail-fast on errors).""" - - def test_strict_property_reflects_constructor(self) -> None: - """strict property returns constructor value.""" - l10n_strict = FluentLocalization(["en"], strict=True) - l10n_default = FluentLocalization(["en"]) - assert l10n_strict.strict is True - assert l10n_default.strict is True - - def test_strict_raises_on_missing_message(self) -> None: - """Strict mode raises FormattingIntegrityError for missing messages.""" - l10n = FluentLocalization(["en"], strict=True) - l10n.add_resource("en", "hello = Hello\n") - - with pytest.raises(FormattingIntegrityError) as exc_info: - l10n.format_value("nonexistent") - - err = exc_info.value - assert err.message_id == "nonexistent" - assert err.fallback_value is not None - assert len(err.fluent_errors) == 1 - ctx = err.context - assert ctx is not None - assert ctx.component == "localization" - assert ctx.operation == "format_pattern" - - def test_strict_raises_on_empty_message_id(self) -> None: - """Strict mode raises for empty/invalid message ID.""" - l10n = FluentLocalization(["en"], strict=True) - l10n.add_resource("en", "hello = Hello\n") - - with 
pytest.raises(FormattingIntegrityError) as exc_info: - l10n.format_value("") - - err = exc_info.value - assert err.message_id == "" - assert len(err.fluent_errors) == 1 - - def test_strict_format_pattern_raises_on_missing(self) -> None: - """Strict mode raises via format_pattern path.""" - l10n = FluentLocalization(["en"], strict=True) - l10n.add_resource("en", "hello = Hello\n") - - with pytest.raises(FormattingIntegrityError) as exc_info: - l10n.format_pattern("nonexistent") - - assert exc_info.value.message_id == "nonexistent" - - def test_strict_error_context_fields(self) -> None: - """Strict error includes component, operation, and count metadata.""" - l10n = FluentLocalization(["en"], strict=True) - - with pytest.raises(FormattingIntegrityError) as exc_info: - l10n.format_value("missing") - - err = exc_info.value - assert "failed:" in str(err) - ctx = err.context - assert ctx is not None - assert ctx.actual == "<1 error>" - assert ctx.expected == "" - - def test_strict_raises_on_invalid_args_type(self) -> None: - """Strict mode raises FormattingIntegrityError for invalid args type.""" - l10n = FluentLocalization(["en"], strict=True) - l10n.add_resource("en", "hello = Hello\n") - - with pytest.raises(FormattingIntegrityError) as exc_info: - l10n.format_pattern("hello", "not-a-mapping") # type: ignore[arg-type] - - err = exc_info.value - assert len(err.fluent_errors) == 1 - ctx = err.context - assert ctx is not None - assert ctx.component == "localization" - - def test_strict_raises_on_invalid_attribute_type(self) -> None: - """Strict mode raises FormattingIntegrityError for invalid attribute type.""" - l10n = FluentLocalization(["en"], strict=True) - l10n.add_resource("en", "hello = Hello\n") - - with pytest.raises(FormattingIntegrityError) as exc_info: - l10n.format_pattern( - "hello", attribute=42 # type: ignore[arg-type] - ) - - err = exc_info.value - assert len(err.fluent_errors) == 1 - ctx = err.context - assert ctx is not None - assert ctx.component == 
"localization" - - def test_non_strict_returns_fallback_on_invalid_args_type(self) -> None: - """Non-strict mode returns fallback for invalid args type without raising.""" - l10n = FluentLocalization(["en"], strict=False) - l10n.add_resource("en", "hello = Hello\n") - - _, errors = l10n.format_pattern("hello", "not-a-mapping") # type: ignore[arg-type] - assert len(errors) == 1 - - def test_strict_non_strict_returns_fallback(self) -> None: - """Non-strict mode returns fallback value without raising.""" - l10n = FluentLocalization(["en"], strict=False) - - result, errors = l10n.format_value("nonexistent") - assert "nonexistent" in result - assert len(errors) == 1 - - - -class TestResourceLoadingErrors: - """Tests for error handling during resource loading.""" - - def test_custom_loader_source_path_format(self) -> None: - """Non-PathResourceLoader uses locale/resource_id format.""" - - class DictLoader: - def load(self, locale: str, _resource_id: str) -> str: - return f"msg = Hello from {locale}\n" - - def describe_path(self, locale: str, resource_id: str) -> str: - return f"{locale}/{resource_id}" - - l10n = FluentLocalization( - ["en", "de"], ["main.ftl"], DictLoader(), - ) - summary = l10n.get_load_summary() - assert summary.total_attempted == 2 - for result in summary.results: - assert result.source_path is not None - assert "/" in result.source_path - - def test_oserror_recorded_as_error(self) -> None: - """OSError during loading recorded with ERROR status.""" - - class FailLoader: - def load( - self, _locale: str, _resource_id: str, - ) -> str: - msg = "Permission denied" - raise OSError(msg) - - def describe_path(self, locale: str, resource_id: str) -> str: - return f"{locale}/{resource_id}" - - l10n = FluentLocalization(["en"], ["main.ftl"], FailLoader()) - summary = l10n.get_load_summary() - assert summary.errors == 1 - assert isinstance(summary.get_errors()[0].error, OSError) - - def test_valueerror_recorded_as_error(self) -> None: - """ValueError during 
loading recorded with ERROR status.""" - - class FailLoader: - def load( - self, _locale: str, _resource_id: str, - ) -> str: - msg = "Path traversal" - raise ValueError(msg) - - def describe_path(self, locale: str, resource_id: str) -> str: - return f"{locale}/{resource_id}" - - l10n = FluentLocalization(["en"], ["main.ftl"], FailLoader()) - summary = l10n.get_load_summary() - assert summary.errors == 1 - assert isinstance(summary.get_errors()[0].error, ValueError) - - def test_file_not_found_recorded_as_not_found(self) -> None: - """FileNotFoundError recorded as NOT_FOUND status.""" - - class MissingLoader: - def load( - self, _locale: str, _resource_id: str, - ) -> str: - msg = "Not found" - raise FileNotFoundError(msg) - - def describe_path(self, locale: str, resource_id: str) -> str: - return f"{locale}/{resource_id}" - - l10n = FluentLocalization(["en"], ["main.ftl"], MissingLoader()) - summary = l10n.get_load_summary() - assert summary.not_found == 1 - - def test_get_load_summary_returns_summary(self) -> None: - """get_load_summary returns LoadSummary from init phase.""" - l10n = FluentLocalization(["en"]) - summary = l10n.get_load_summary() - assert isinstance(summary, LoadSummary) - assert summary.total_attempted == 0 # No resource_ids provided - - -class TestBootValidation: - """Tests for FluentLocalization boot-time validation helpers.""" - - def test_require_clean_returns_summary_when_all_resources_are_clean(self) -> None: - """require_clean returns the immutable load summary on success.""" - l10n = FluentLocalization(["en"]) - - summary = l10n.require_clean() - - assert isinstance(summary, LoadSummary) - assert summary.all_clean is True - assert summary.total_attempted == 0 - - def test_require_clean_raises_integrity_error_for_unclean_summary(self) -> None: - """require_clean raises IntegrityCheckFailedError with structured context.""" - - class MissingLoader: - def load(self, _locale: str, _resource_id: str) -> str: - msg = "missing" - raise 
FileNotFoundError(msg) - - def describe_path(self, locale: str, resource_id: str) -> str: - return f"{locale}/{resource_id}" - - l10n = FluentLocalization(["en"], ["main.ftl"], MissingLoader()) - - with pytest.raises(IntegrityCheckFailedError) as exc_info: - l10n.require_clean() - - err = exc_info.value - assert "not clean" in str(err) - ctx = err.context - assert ctx is not None - assert ctx.component == "localization" - assert ctx.operation == "require_clean" - assert ctx.key == "en/main.ftl" - assert ctx.expected == "LoadSummary(all_clean=True)" - - def test_validate_message_schemas_returns_results_in_input_order(self) -> None: - """validate_message_schemas returns immutable validation results on success.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource( - "en", - "first = Hello { $name }\n" - "second = Balance { $amount }\n", - ) - - results = l10n.validate_message_schemas({ - "first": frozenset({"name"}), - "second": frozenset({"amount"}), - }) - - assert [result.message_id for result in results] == ["first", "second"] - assert all(result.is_valid for result in results) - - def test_validate_message_schemas_uses_fallback_chain(self) -> None: - """Schema validation resolves messages from fallback locales.""" - l10n = FluentLocalization(["lv", "en"]) - l10n.add_resource("en", "welcome = Hello { $name }\n") - - results = l10n.validate_message_schemas({ - "welcome": frozenset({"name"}), - }) - - assert len(results) == 1 - assert results[0].is_valid is True - - def test_validate_message_variables_returns_single_result(self) -> None: - """Single-message boot validation returns the exact validation result.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "invoice = Total { $amount } for { $customer }\n") - - result = l10n.validate_message_variables( - "invoice", - frozenset({"amount", "customer"}), - ) - - assert result.message_id == "invoice" - assert result.is_valid is True - - def test_validate_message_variables_uses_fallback_chain(self) 
-> None: - """Single-message validation resolves through localization fallback.""" - l10n = FluentLocalization(["lv", "en"]) - l10n.add_resource("en", "welcome = Hello { $name }\n") - - result = l10n.validate_message_variables("welcome", frozenset({"name"})) - - assert result.message_id == "welcome" - assert result.is_valid is True - - def test_validate_message_schemas_raises_for_missing_message(self) -> None: - """Missing messages fail boot validation with an integrity exception.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "present = Hello\n") - - with pytest.raises(IntegrityCheckFailedError) as exc_info: - l10n.validate_message_schemas({"missing": frozenset()}) - - err = exc_info.value - assert "missing: not found" in str(err) - ctx = err.context - assert ctx is not None - assert ctx.operation == "validate_message_schemas" - assert ctx.key == "missing" - assert ctx.actual == "missing_messages=1" - - def test_validate_message_variables_raises_for_missing_message(self) -> None: - """Missing single-message validation raises IntegrityCheckFailedError.""" - l10n = FluentLocalization(["en"]) - - with pytest.raises(IntegrityCheckFailedError) as exc_info: - l10n.validate_message_variables("missing", frozenset()) - - err = exc_info.value - assert "missing: not found" in str(err) - ctx = err.context - assert ctx is not None - assert ctx.operation == "validate_message_variables" - assert ctx.key == "missing" - assert ctx.actual == "missing_messages=1" - - def test_validate_message_schemas_raises_for_exact_schema_mismatch(self) -> None: - """Extra or missing variables fail exact boot schema validation.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "checkout = Total { $amount } for { $customer }\n") - - with pytest.raises(IntegrityCheckFailedError) as exc_info: - l10n.validate_message_schemas({ - "checkout": frozenset({"amount"}), - }) - - err = exc_info.value - assert "checkout: extra {customer}" in str(err) - ctx = err.context - 
assert ctx is not None - assert ctx.operation == "validate_message_schemas" - assert ctx.key == "checkout" - assert ctx.actual == "schema_mismatches=1" - - def test_validate_message_variables_raises_for_exact_schema_mismatch(self) -> None: - """Single-message validation raises on exact-schema mismatch.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "checkout = Total { $amount } for { $customer }\n") - - with pytest.raises(IntegrityCheckFailedError) as exc_info: - l10n.validate_message_variables("checkout", frozenset({"amount"})) - - err = exc_info.value - assert "checkout: extra {customer}" in str(err) - ctx = err.context - assert ctx is not None - assert ctx.operation == "validate_message_variables" - assert ctx.key == "checkout" - assert ctx.actual == "schema_mismatches=1" - - -class TestCacheStatsBranch: - """Tests for get_cache_stats aggregation branch.""" - - def test_aggregates_across_multiple_bundles(self) -> None: - """get_cache_stats sums metrics across all bundles.""" - l10n = FluentLocalization( - ["en", "de"], cache=CacheConfig(size=500), - ) - l10n.add_resource("en", "msg = Hello\n") - l10n.add_resource("de", "msg = Hallo\n") - - # Format to create cache entries - l10n.format_value("msg") - - stats = l10n.get_cache_stats() - assert stats is not None - assert stats["bundle_count"] == 2 - assert stats["maxsize"] == 1000 # 500 * 2 - - def test_empty_bundles_returns_zero_stats(self) -> None: - """get_cache_stats returns zero stats with no initialized bundles.""" - l10n = FluentLocalization(["en"], cache=CacheConfig()) - stats = l10n.get_cache_stats() - assert stats is not None - assert stats["bundle_count"] == 0 - assert stats["size"] == 0 - - def test_hit_rate_calculated_correctly(self) -> None: - """Hit rate is hits/(hits+misses)*100.""" - l10n = FluentLocalization(["en"], cache=CacheConfig()) - l10n.add_resource("en", "msg = Hello\n") - l10n.format_value("msg") # miss - l10n.format_value("msg") # hit - stats = l10n.get_cache_stats() - 
assert stats is not None - assert stats["hit_rate"] == 50.0 - - def test_skips_bundle_with_no_cache(self) -> None: - """Bundles returning None from get_cache_stats are skipped.""" - l10n = FluentLocalization( - ["en", "de"], cache=CacheConfig(size=100), - ) - # Create cached bundle for "en" - l10n.add_resource("en", "msg = Hello\n") - l10n.format_value("msg") - - # Inject a no-cache bundle for "de" directly - no_cache_bundle = FluentBundle("de") - no_cache_bundle.add_resource("msg = Hallo\n") - l10n._bundles["de"] = no_cache_bundle - - stats = l10n.get_cache_stats() - assert stats is not None - # Only "en" bundle contributes stats - assert stats["bundle_count"] == 2 - assert stats["maxsize"] == 100 # Only en's maxsize - - -class TestCacheAuditLogBranch: - """Tests for get_cache_audit_log per-locale audit access.""" - - def test_returns_none_when_caching_disabled(self) -> None: - """get_cache_audit_log() returns None when localization caching is disabled.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "msg = Hello\n") - - assert l10n.get_cache_audit_log() is None - - def test_returns_empty_mapping_when_no_bundles_initialized(self) -> None: - """get_cache_audit_log() does not create bundles during inspection.""" - l10n = FluentLocalization(["en", "de"], cache=CacheConfig(enable_audit=True)) - - audit_logs = l10n.get_cache_audit_log() - assert audit_logs == {} - - def test_returns_per_locale_write_log_entries(self) -> None: - """get_cache_audit_log() returns immutable CacheAuditLogEntry tuples per locale.""" - l10n = FluentLocalization(["en", "de"], cache=CacheConfig(enable_audit=True)) - l10n.add_resource("en", "msg = Hello\n") - l10n.add_resource("de", "msg = Hallo\n") - - l10n.format_value("msg") - l10n.format_value("msg") - - audit_logs = l10n.get_cache_audit_log() - assert audit_logs is not None - assert list(audit_logs) == ["en", "de"] - assert [entry.operation for entry in audit_logs["en"]] == ["MISS", "PUT", "HIT"] - assert audit_logs["de"] 
== () - assert all(isinstance(entry, CacheAuditLogEntry) for entry in audit_logs["en"]) - - @given(enable_audit=st.booleans(), locales=locale_chains(min_size=1, max_size=3)) - @settings(max_examples=20, suppress_health_check=[HealthCheck.function_scoped_fixture]) - def test_property_audit_log_tracks_initialized_locales( - self, enable_audit: bool, locales: list[str] - ) -> None: - """PROPERTY: get_cache_audit_log() uses canonical locale keys.""" - l10n = FluentLocalization(locales, cache=CacheConfig(enable_audit=enable_audit)) - for locale in locales: - l10n.add_resource(locale, "msg = Hello\n") - - l10n.format_value("msg") - - audit_logs = l10n.get_cache_audit_log() - assert audit_logs is not None - normalized_locales = [normalize_locale(locale) for locale in locales] - assert list(audit_logs) == normalized_locales - - event(f"audit={'enabled' if enable_audit else 'disabled'}") - event(f"locale_count={len(locales)}") - - if enable_audit: - assert len(audit_logs[normalized_locales[0]]) >= 2 - assert all( - isinstance(entry, CacheAuditLogEntry) - for entry in audit_logs[normalized_locales[0]] - ) - else: - assert all(log == () for log in audit_logs.values()) - - -class TestFormatPattern: - """Tests for format_pattern fallback chain edge cases.""" - - def test_format_pattern_not_found_returns_braced_id(self) -> None: - """format_pattern returns {message_id} when not found in any locale.""" - l10n = FluentLocalization(["en", "de"], strict=False) - result, errors = l10n.format_pattern("missing") - assert result == "{missing}" - assert len(errors) == 1 - - def test_format_pattern_primary_locale_skips_fallback_callback( - self, - ) -> None: - """format_pattern does not invoke on_fallback for primary locale.""" - from ftllexengine.localization import FallbackInfo # noqa: PLC0415 - import inside function - events: list[FallbackInfo] = [] - l10n = FluentLocalization( - ["en", "de"], on_fallback=events.append, - use_isolating=False, - ) - l10n.add_resource("en", "msg = 
Primary") - result, errors = l10n.format_pattern("msg") - assert result == "Primary" - assert errors == () - assert len(events) == 0 - - -class TestRepr: - """Tests for __repr__ format.""" - - def test_includes_locales_and_bundle_count(self) -> None: - """__repr__ shows locales and initialized/total bundles.""" - l10n = FluentLocalization(["en", "de"]) - r = repr(l10n) - assert "FluentLocalization" in r - assert "locales=('en', 'de')" in r - assert "bundles=0/2" in r - - def test_bundle_count_updates_after_access(self) -> None: - """__repr__ bundle count reflects initialized bundles.""" - l10n = FluentLocalization(["en", "de"]) - l10n.add_resource("en", "msg = test") - r = repr(l10n) - assert "bundles=1/2" in r - - -# --------------------------------------------------------------------------- -# Property-based orchestration tests (migrated from test_localization_hypothesis) -# --------------------------------------------------------------------------- - - -class TestOrchestrationProperties: - """Property-based tests for orchestration invariants. - - Standard @given tests with bounded strategies. Run in CI (no fuzz marker). 
- """ - - @given( - locales=locale_chains(min_size=1, max_size=3), - message_id=message_ids(), - message_value=st.text(min_size=1, max_size=100), - ) - def test_format_value_never_crashes( - self, - locales: list[str], - message_id: str, - message_value: str, - ) -> None: - """format_value never crashes regardless of input (robustness).""" - event(f"locale_count={len(locales)}") - val_class = ( - "short" if len(message_value) <= 10 - else "medium" if len(message_value) <= 50 - else "long" - ) - event(f"value_len={val_class}") - l10n = FluentLocalization(locales, strict=False) - ftl_source = f"{message_id} = {message_value}" - l10n.add_resource(locales[0], ftl_source) - result, errors = l10n.format_value(message_id) - assert isinstance(result, str) - assert isinstance(errors, tuple) - - @given( - locales=locale_chains(min_size=2, max_size=5), - message_id=message_ids(), - target_locale_idx=st.integers(min_value=0, max_value=4), - ) - def test_fallback_uses_first_available_locale( - self, - locales: list[str], - message_id: str, - target_locale_idx: int, - ) -> None: - """Fallback resolves from first locale in chain that has message.""" - idx = min(target_locale_idx, len(locales) - 1) - event(f"target_idx={idx}") - l10n = FluentLocalization(locales) - target_locale = locales[idx] - l10n.add_resource( - target_locale, f"{message_id} = From {target_locale}", - ) - result, errors = l10n.format_value(message_id) - assert f"From {target_locale}" in result - assert not any( - "not found in any locale" in str(e) for e in errors - ) - - @given( - locales=locale_chains(min_size=1, max_size=3), - num_messages=st.integers(min_value=1, max_value=10), - ) - def test_partial_translations_use_correct_fallback( - self, locales: list[str], num_messages: int, - ) -> None: - """Partial translations correctly fall back per message.""" - event(f"num_messages={num_messages}") - has_fallback = len(locales) > 1 - event(f"has_fallback={has_fallback}") - l10n = FluentLocalization(locales, 
strict=False) - - msg_ids = [f"msg-{i}" for i in range(num_messages)] - - first_msgs = [m for i, m in enumerate(msg_ids) if i % 2 == 0] - if first_msgs: - ftl = "\n".join(f"{m} = First locale" for m in first_msgs) - l10n.add_resource(locales[0], ftl) - - if has_fallback: - last_msgs = [ - m for i, m in enumerate(msg_ids) if i % 2 == 1 - ] - if last_msgs: - ftl = "\n".join( - f"{m} = Last locale" for m in last_msgs - ) - l10n.add_resource(locales[-1], ftl) - - for idx, mid in enumerate(msg_ids): - result, errors = l10n.format_value(mid) - missing = any( - "not found in any locale" in str(e) - for e in errors - ) - if idx % 2 == 0: - assert "First locale" in result or missing - elif has_fallback: - assert "Last locale" in result or missing - - @settings(suppress_health_check=[HealthCheck.function_scoped_fixture]) - @given( - locales=locale_chains(min_size=1, max_size=3), - message_id=message_ids(), - ) - def test_loader_integration_deterministic( - self, - tmp_path: Path, - locales: list[str], - message_id: str, - ) -> None: - """Loader integration produces identical results across instances.""" - event(f"locale_count={len(locales)}") - locales_dir = tmp_path / "locales" - for idx, locale in enumerate(locales): - locale_dir = locales_dir / normalize_locale(locale) - locale_dir.mkdir(parents=True, exist_ok=True) - (locale_dir / "main.ftl").write_text( - f"{message_id} = Value {idx}", encoding="utf-8", - ) - - loader = PathResourceLoader(str(locales_dir / "{locale}")) - - l10n1 = FluentLocalization(locales, ["main.ftl"], loader) - result1, _ = l10n1.format_value(message_id) - - l10n2 = FluentLocalization(locales, ["main.ftl"], loader) - result2, _ = l10n2.format_value(message_id) - - assert result1 == result2 - - @given( - locales=locale_chains(min_size=2, max_size=4), - message_id=message_ids(), - ) - def test_locale_order_affects_resolution( - self, locales: list[str], message_id: str, - ) -> None: - """Reversing locale order changes which bundle resolves message.""" 
- event(f"locale_count={len(locales)}") - l10n_fwd = FluentLocalization(locales) - l10n_rev = FluentLocalization(list(reversed(locales))) - - first_msg = f"{message_id} = From {locales[0]}" - last_msg = f"{message_id} = From {locales[-1]}" - - l10n_fwd.add_resource(locales[0], first_msg) - l10n_fwd.add_resource(locales[-1], last_msg) - - l10n_rev.add_resource(locales[0], first_msg) - l10n_rev.add_resource(locales[-1], last_msg) - - result_fwd, _ = l10n_fwd.format_value(message_id) - result_rev, _ = l10n_rev.format_value(message_id) - - if len(locales) > 1: - assert result_fwd != result_rev - - @given( - locales=locale_chains(min_size=1, max_size=1), - message_id=message_ids(), - value1=st.text( - alphabet=st.characters(whitelist_categories=("L", "N")), - min_size=1, max_size=50, - ), - value2=st.text( - alphabet=st.characters(whitelist_categories=("L", "N")), - min_size=1, max_size=50, - ), - ) - def test_add_resource_twice_uses_latest( - self, - locales: list[str], - message_id: str, - value1: str, - value2: str, - ) -> None: - """Adding resource twice uses latest value (override property).""" - event("outcome=override") - locale = locales[0] - l10n = FluentLocalization([locale]) - - l10n.add_resource(locale, f"{message_id} = {value1}") - result1, _ = l10n.format_value(message_id) - - l10n.add_resource(locale, f"{message_id} = {value2}") - result2, _ = l10n.format_value(message_id) - - assert value1 in result1 or value2 in result1 - assert value2 in result2 - - -# =========================================================================== -# FORMATTING INTEGRITY ERROR RE-RAISE (lines 690-703) -# =========================================================================== - - -class TestFormattingIntegrityErrorReraise: - """FluentLocalization re-raises FormattingIntegrityError with corrected component. - - Lines 690-703: the except FormattingIntegrityError block in format_pattern - fires when the bundle raises in strict mode and the message exists in the - bundle. 
The orchestrator must re-raise with component='localization'. - """ - - def test_strict_localization_reraises_with_localization_component(self) -> None: - """Strict FluentLocalization re-raises FormattingIntegrityError. - - Covers lines 690-703: the except block that replaces the 'bundle' - component with 'localization' in the IntegrityContext before re-raising. - """ - l10n = FluentLocalization(["en"], strict=True) - l10n.add_resource("en", "test-msg = Hello { $name }!") - - # Calling format_pattern without the required $name argument causes - # VARIABLE_NOT_PROVIDED error. In strict mode the bundle raises - # FormattingIntegrityError, which the orchestrator catches and re-raises. - with pytest.raises(FormattingIntegrityError) as exc_info: - l10n.format_pattern("test-msg", {}) - - exc = exc_info.value - assert exc.context is not None - assert exc.context.component == "localization" - assert len(exc.fluent_errors) > 0 - assert exc.message_id == "test-msg" - - -class TestGetMessageAST: - """FluentLocalization.get_message() returns the parsed Message AST from the fallback chain.""" - - def test_existing_message_primary_locale(self) -> None: - """get_message returns the Message from the primary locale.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "greeting = Hello, { $name }!") - - msg = l10n.get_message("greeting") - - assert msg is not None - assert isinstance(msg, Message) - assert msg.id.name == "greeting" - - def test_missing_message_returns_none(self) -> None: - """get_message returns None when no locale contains the message.""" - l10n = FluentLocalization(["en", "lv"]) - l10n.add_resource("en", "hello = Hello!") - - assert l10n.get_message("nonexistent") is None - - def test_fallback_chain_used_when_primary_missing(self) -> None: - """get_message falls back to secondary locale when primary lacks the message.""" - l10n = FluentLocalization(["lv", "en"]) - l10n.add_resource("en", "greeting = Hello!") - # lv has no "greeting" resource - - msg = 
l10n.get_message("greeting") - - assert msg is not None - assert isinstance(msg, Message) - assert msg.id.name == "greeting" - - def test_primary_locale_wins_when_both_have_message(self) -> None: - """Primary locale's Message is returned when multiple locales have the message.""" - l10n = FluentLocalization(["lv", "en"]) - l10n.add_resource("lv", "greeting = Sveiki!") - l10n.add_resource("en", "greeting = Hello!") - - msg = l10n.get_message("greeting") - - assert msg is not None - assert msg.id.name == "greeting" - # Verify it's the primary locale's message by checking a separate bundle - lv_bundle = FluentBundle("lv", use_isolating=False) - lv_bundle.add_resource("greeting = Sveiki!") - lv_msg = lv_bundle.get_message("greeting") - assert lv_msg is not None - assert msg is not lv_msg # Different bundle instances, same message id - - def test_empty_localization_returns_none(self) -> None: - """get_message returns None when no resources have been added.""" - l10n = FluentLocalization(["en"]) - - assert l10n.get_message("anything") is None - - def test_get_message_result_usable_with_validate_message_variables(self) -> None: - """get_message result can be passed to validate_message_variables().""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "greeting = Hello, { $name }!") - - msg = l10n.get_message("greeting") - assert msg is not None - - result = validate_message_variables(msg, frozenset({"name"})) - assert result.is_valid - assert result.declared_variables == frozenset({"name"}) - - -class TestGetTermAST: - """FluentLocalization.get_term() returns the parsed Term AST from the fallback chain.""" - - def test_existing_term_primary_locale(self) -> None: - """get_term returns the Term from the primary locale.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "-brand = Firefox") - - term = l10n.get_term("brand") - - assert term is not None - assert isinstance(term, Term) - assert term.id.name == "brand" - - def 
test_missing_term_returns_none(self) -> None: - """get_term returns None when no locale contains the term.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "hello = Hello!") - - assert l10n.get_term("nonexistent") is None - - def test_fallback_chain_used_for_term(self) -> None: - """get_term falls back to secondary locale when primary lacks the term.""" - l10n = FluentLocalization(["lv", "en"]) - l10n.add_resource("en", "-brand = Firefox") - # lv has no "-brand" resource - - term = l10n.get_term("brand") - - assert term is not None - assert isinstance(term, Term) - assert term.id.name == "brand" - - def test_term_id_without_leading_dash(self) -> None: - """-brand is accessed as get_term('brand'), not get_term('-brand').""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "-brand = Firefox") - - assert l10n.get_term("brand") is not None - assert l10n.get_term("-brand") is None - - def test_get_message_does_not_return_terms(self) -> None: - """get_message does not return terms (separate namespaces).""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "-brand = Firefox") - - assert l10n.get_message("brand") is None - - def test_get_term_does_not_return_messages(self) -> None: - """get_term does not return messages (separate namespaces).""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "brand = Firefox") - - assert l10n.get_term("brand") is None - - -class TestDescribeUncleanLoadResult: - """Tests for _describe_unclean_load_result private helper. - - Called by require_clean() to build the error detail string. Tested - directly to cover the error=None (UnknownError) and junk branches. 
- """ - - def test_error_result_with_none_error_uses_unknown_error(self) -> None: - """When result.is_error is True but error is None, name is 'UnknownError'.""" - result = ResourceLoadResult("en", "bad.ftl", LoadStatus.ERROR, error=None) - l10n = FluentLocalization(["en"]) - - key, detail = l10n._describe_unclean_load_result(result) - - assert key == "en/bad.ftl" - assert detail == "load error (UnknownError)" - - def test_error_result_with_actual_error_uses_type_name(self) -> None: - """When result.error is not None, type name is used in the description.""" - result = ResourceLoadResult( - "en", "bad.ftl", LoadStatus.ERROR, error=OSError("disk fail"), - ) - l10n = FluentLocalization(["en"]) - - _key, detail = l10n._describe_unclean_load_result(result) - - assert "OSError" in detail - - def test_junk_result_describes_junk_entry_count(self) -> None: - """Junk branch returns description with junk entry count.""" - junk = Junk(content="bad syntax", span=Span(start=0, end=10)) - result = ResourceLoadResult( - "en", "partial.ftl", LoadStatus.SUCCESS, junk_entries=(junk,), - ) - l10n = FluentLocalization(["en"]) - - key, detail = l10n._describe_unclean_load_result(result) - - assert key == "en/partial.ftl" - assert "1 junk entry" in detail - - def test_junk_plural_with_two_entries(self) -> None: - """Two junk entries use 'entries' plural noun.""" - junk1 = Junk(content="bad1", span=Span(start=0, end=4)) - junk2 = Junk(content="bad2", span=Span(start=5, end=9)) - result = ResourceLoadResult( - "en", "partial.ftl", LoadStatus.SUCCESS, - junk_entries=(junk1, junk2), - ) - l10n = FluentLocalization(["en"]) - - _key, detail = l10n._describe_unclean_load_result(result) - - assert "2 junk entries" in detail - - -class TestRequireCleanCleanBeforeProblematic: - """Tests for require_clean when the first result in summary is clean. - - The for-loop in require_clean iterates summary.results looking for the - first non-clean result. 
When results[0] is clean, the inner if-condition - is False for that iteration (the loop-continue branch), and iteration - advances to the next element. - """ - - def test_first_clean_second_not_found_raises_with_correct_key(self) -> None: - """require_clean iterates past a clean first result to find the bad one.""" - - class PartialLoader: - def load(self, _locale: str, resource_id: str) -> str: - if resource_id == "first.ftl": - return "msg = Hello\n" - msg = "missing" - raise FileNotFoundError(msg) - - def describe_path(self, locale: str, resource_id: str) -> str: - return f"{locale}/{resource_id}" - - l10n = FluentLocalization( - ["en"], ["first.ftl", "second.ftl"], PartialLoader(), - ) - - with pytest.raises(IntegrityCheckFailedError) as exc_info: - l10n.require_clean() - - ctx = exc_info.value.context - assert ctx is not None - # second.ftl is the first non-clean result; first.ftl was clean - assert "second.ftl" in (ctx.key or "") - - -class TestRequireCleanJunkBranch: - """Tests for require_clean that trigger the junk description branch.""" - - def test_require_clean_raises_with_junk_detail(self) -> None: - """require_clean raises when the loader produces a resource with junk entries. - - strict=False: testing load summary junk tracking; junk entries must be - captured in the ResourceLoadResult, not raised as SyntaxIntegrityError. - """ - - class JunkLoader: - def load(self, _locale: str, _resource_id: str) -> str: - # "bad-junk" is not valid FTL syntax; produces a Junk AST node - return "bad-junk\n" - - def describe_path(self, locale: str, resource_id: str) -> str: - return f"{locale}/{resource_id}" - - l10n = FluentLocalization( - ["en"], ["main.ftl"], JunkLoader(), strict=False, - ) - - with pytest.raises(IntegrityCheckFailedError) as exc_info: - l10n.require_clean() - - assert "junk" in str(exc_info.value).lower() - - -class TestFormatSchemaDifferenceMissingVariables: - """Tests for _format_schema_difference when only missing_variables is set. 
- - Existing tests cover the extra_variables path (message declares more vars - than expected). These tests cover the missing_variables path (expected vars - not found in message) and the False branch of 'if validation.extra_variables'. - """ - - def test_missing_variables_only_reported(self) -> None: - """Schema diff reports missing variables when message uses fewer than expected.""" - l10n = FluentLocalization(["en"]) - # Message uses no variables; expected schema requires $amount - l10n.add_resource("en", "invoice = Static total\n") - - with pytest.raises(IntegrityCheckFailedError) as exc_info: - l10n.validate_message_schemas({ - "invoice": frozenset({"amount"}), - }) - - err = exc_info.value - # Must describe the missing variable - assert "missing {amount}" in str(err) - - def test_validate_message_variables_missing_variable_raises(self) -> None: - """Single-message validation reports missing variable in error message.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource("en", "price = Free\n") - - with pytest.raises(IntegrityCheckFailedError) as exc_info: - l10n.validate_message_variables("price", frozenset({"cost"})) - - assert "missing {cost}" in str(exc_info.value) - - -class TestValidateMessageSchemasTruncation: - """Tests for validate_message_schemas 'N more issues' truncation. - - When 4 or more messages fail validation, mismatches[:3] is taken and - the remaining count is appended as '... N more issue(s)'. 
- """ - - def test_four_mismatches_appends_remaining_count(self) -> None: - """Four schema mismatches trigger 'N more issue' truncation.""" - l10n = FluentLocalization(["en"]) - l10n.add_resource( - "en", - "m1 = { $a }\nm2 = { $a }\nm3 = { $a }\nm4 = { $a }\n", - ) - - with pytest.raises(IntegrityCheckFailedError) as exc_info: - # All four messages have $a extra (expected empty schema) - l10n.validate_message_schemas({ - "m1": frozenset(), - "m2": frozenset(), - "m3": frozenset(), - "m4": frozenset(), - }) - - err_str = str(exc_info.value) - assert "more issue" in err_str - - def test_five_mismatches_pluralises_noun(self) -> None: - """Five mismatches produce '2 more issues' (plural noun).""" - l10n = FluentLocalization(["en"]) - l10n.add_resource( - "en", - "m1 = { $a }\nm2 = { $a }\nm3 = { $a }\nm4 = { $a }\nm5 = { $a }\n", - ) - - with pytest.raises(IntegrityCheckFailedError) as exc_info: - l10n.validate_message_schemas({ - "m1": frozenset(), - "m2": frozenset(), - "m3": frozenset(), - "m4": frozenset(), - "m5": frozenset(), - }) - - err_str = str(exc_info.value) - assert "more issues" in err_str - - -class TestGetCacheAuditLogBundleWithoutCache: - """Tests for get_cache_audit_log when a bundle in _bundles has no cache. - - When bundle.get_cache_audit_log() returns None (bundle has no cache - configured), that bundle's locale is excluded from the audit_logs dict. - This exercises the ``if audit_log is not None:`` False branch. 
- """ - - def test_bundle_without_cache_excluded_from_audit_log(self) -> None: - """Locale with a no-cache bundle is absent from the audit log mapping.""" - l10n = FluentLocalization( - ["en", "de"], cache=CacheConfig(enable_audit=True), - ) - l10n.add_resource("en", "msg = Hello\n") - l10n.format_value("msg") - - # Inject a bundle with no cache for "de"; get_cache_audit_log() returns None - no_cache_bundle = FluentBundle("de") - no_cache_bundle.add_resource("msg = Hallo\n") - l10n._bundles["de"] = no_cache_bundle - - audit_logs = l10n.get_cache_audit_log() - - assert audit_logs is not None - assert "en" in audit_logs - assert "de" not in audit_logs +from tests.localization_orchestration_cases.ast_and_cleanup import * # noqa: F403 - split module reuses shared support imports +from tests.localization_orchestration_cases.cache_and_properties import * # noqa: F403 - split module reuses shared support imports +from tests.localization_orchestration_cases.load_and_lookup import * # noqa: F403 - split module reuses shared support imports +from tests.localization_orchestration_cases.strict_and_boot import * # noqa: F403 - split module reuses shared support imports diff --git a/tests/test_localization_validation.py b/tests/test_localization_validation.py index b7996dbf..a9c9351f 100644 --- a/tests/test_localization_validation.py +++ b/tests/test_localization_validation.py @@ -24,6 +24,7 @@ from hypothesis import event, given from hypothesis import strategies as st +from ftllexengine.constants import MAX_LOCALE_LENGTH_HARD_LIMIT from ftllexengine.localization import ( FluentLocalization, LoadStatus, @@ -192,6 +193,13 @@ def test_add_resource_locale_tab_character_explicit(self) -> None: assert errors == () assert l10n.locales == ("en",) + def test_constructor_rejects_unknown_well_formed_boundary_locale(self) -> None: + """Constructor rejects unknown locales even when the boundary string is well formed.""" + boundary_locale = "a" + ("b" * (MAX_LOCALE_LENGTH_HARD_LIMIT - 2)) + 
"C" + + with pytest.raises(ValueError, match="Unknown locale identifier"): + FluentLocalization(["en", f" {boundary_locale} "], strict=False) + class TestFormatValueInvalidArgsTypeValidation: """Test FluentLocalization.format_value with invalid args type.""" diff --git a/tests/test_parsing_currency.py b/tests/test_parsing_currency.py index 42681701..b514e86b 100644 --- a/tests/test_parsing_currency.py +++ b/tests/test_parsing_currency.py @@ -1,1292 +1,20 @@ -"""Tests for currency parsing: parse_currency(), symbol resolution, CLDR maps. - -Property-based tests using Hypothesis cover: -- Roundtrip: format -> parse -> verify for unambiguous/ISO inputs -- Locale resilience: arbitrary locales never crash -- Invalid input: no-digit strings always fail -- Ambiguous resolution: locale-aware symbol disambiguation -- CLDR map integrity: type contracts and coverage invariants - -Unit tests cover specification examples and targeted edge cases. - -parse_currency() returns tuple[tuple[Decimal, str] | None, tuple[FrozenFluentError, ...]]. -Functions never raise exceptions (errors returned in tuple) except -BabelImportError when Babel is not installed. - -Python 3.13+. 
-""" - -from __future__ import annotations - -import builtins -import re -from decimal import Decimal -from typing import Any -from unittest.mock import MagicMock, patch - -import pytest -from babel import UnknownLocaleError -from hypothesis import event, given, settings -from hypothesis import strategies as st - -from ftllexengine.parsing import currency as currency_module -from ftllexengine.parsing.currency import ( - _build_currency_maps_from_cldr, - _get_currency_maps, - parse_currency, - resolve_ambiguous_symbol, -) -from tests.strategies.currency import ( - ambiguous_currency_inputs, - invalid_currency_inputs, - iso_code_currency_inputs, - unambiguous_currency_inputs, -) - -# --------------------------------------------------------------------------- -# Property: Unambiguous symbols always parse successfully -# --------------------------------------------------------------------------- - - -class TestUnambiguousCurrencyParsing: - """Property-based tests for unambiguous currency parsing.""" - - @settings(deadline=500) # CLDR map build on first call exceeds 200ms - @given(data=unambiguous_currency_inputs()) - def test_unambiguous_symbol_parses( - self, data: tuple[str, str, str] - ) -> None: - """PROPERTY: Unambiguous symbols and ISO codes always parse.""" - value, locale, expected_code = data - event(f"expected_code={expected_code}") - - result, errors = parse_currency(value, locale) - # Unambiguous symbols should parse without error - if result is not None: - amount, code = result - assert code == expected_code - assert isinstance(amount, Decimal) - assert errors == () - - -# --------------------------------------------------------------------------- -# Property: ISO code inputs always resolve correctly -# --------------------------------------------------------------------------- - - -class TestISOCodeParsing: - """Property-based tests for ISO code currency parsing.""" - - @given(data=iso_code_currency_inputs()) - def 
test_iso_code_parses_to_correct_currency( - self, data: tuple[str, str, str] - ) -> None: - """PROPERTY: ISO codes resolve to the correct currency.""" - value, locale, expected_code = data - event(f"iso_code={expected_code}") - - result, errors = parse_currency(value, locale) - assert result is not None, f"Failed to parse: {value!r} ({locale})" - amount, code = result - assert code == expected_code - assert isinstance(amount, Decimal) - assert errors == () - - -# --------------------------------------------------------------------------- -# Property: Invalid inputs never crash, always return errors -# --------------------------------------------------------------------------- - - -class TestInvalidCurrencyInputs: - """Property-based tests for invalid currency input handling.""" - - @given(data=invalid_currency_inputs()) - def test_invalid_input_returns_error( - self, data: tuple[str, str] - ) -> None: - """PROPERTY: Invalid inputs return error tuple, never crash.""" - value, locale = data - is_empty = value == "" - event(f"is_empty={is_empty}") - - result, errors = parse_currency(value, locale) - assert result is None - assert len(errors) > 0 - - @given( - invalid_value=st.text(min_size=1, max_size=30).filter( - lambda x: not any(c.isdigit() for c in x) - ) - ) - def test_no_digits_always_fails( - self, invalid_value: str - ) -> None: - """PROPERTY: Values without digits always fail to parse.""" - has_currency_char = any( - c in invalid_value for c in "\u20ac$\u00a3\u00a5\u20b9" - ) - event(f"has_currency_char={has_currency_char}") - val_len = "short" if len(invalid_value) <= 5 else "long" - event(f"value_length={val_len}") - - result, _ = parse_currency(invalid_value, "en_US") - assert result is None - - -# --------------------------------------------------------------------------- -# Property: Arbitrary locales never crash -# --------------------------------------------------------------------------- - - -class TestLocaleResilience: - """Property-based tests for 
locale robustness.""" - - @given( - bad_locale=st.text( - alphabet=st.characters(blacklist_categories=["Cs"]), - min_size=1, - max_size=20, - ).filter(lambda x: x not in ["en", "en_US", "de_DE", "fr_FR"]) - ) - def test_arbitrary_locales_never_crash( - self, bad_locale: str - ) -> None: - """PROPERTY: Invalid locales never crash currency parsing.""" - locale_len = "short" if len(bad_locale) <= 5 else "long" - event(f"locale_length={locale_len}") - has_underscore = "_" in bad_locale - event(f"has_underscore={has_underscore}") - - result, errors = parse_currency("\u20ac50", bad_locale) - assert result is None or isinstance(result, tuple) - if result is None: - assert len(errors) > 0 - - -# --------------------------------------------------------------------------- -# Property: Ambiguous symbols with locale inference -# --------------------------------------------------------------------------- - - -class TestAmbiguousSymbolResolution: - """Property-based tests for ambiguous symbol resolution.""" - - @given(data=ambiguous_currency_inputs()) - def test_ambiguous_with_default_resolves( - self, data: tuple[str, str, str, str] - ) -> None: - """PROPERTY: Ambiguous symbols with default_currency resolve.""" - value, locale, default_currency, expected = data - event(f"locale={locale}") - - result, errors = parse_currency( - value, locale, default_currency=default_currency, - ) - if result is not None: - _, code = result - assert code == expected - assert errors == () - - @given( - locale_currency=st.sampled_from([ - ("en_US", "USD"), ("en_CA", "CAD"), - ("en_AU", "AUD"), ("en_NZ", "NZD"), - ("es_MX", "MXN"), ("es_AR", "ARS"), - ]) - ) - def test_dollar_locale_inference( - self, locale_currency: tuple[str, str] - ) -> None: - """PROPERTY: $ with infer_from_locale resolves per locale.""" - locale, expected = locale_currency - event(f"dollar_locale={locale}") - - result, errors = parse_currency( - "$100", locale, infer_from_locale=True, - ) - assert result is not None, ( - f"$ 
should resolve via locale {locale}" - ) - _, code = result - assert code == expected - assert errors == () - - -# --------------------------------------------------------------------------- -# resolve_ambiguous_symbol: Locale prefix fallback -# --------------------------------------------------------------------------- - - -class TestResolveAmbiguousSymbolLocalePrefix: - """Test resolve_ambiguous_symbol locale prefix matching.""" - - def test_yen_sign_with_zh_cn_uses_prefix(self) -> None: - """Yen sign resolves to CNY via zh prefix for zh_CN.""" - result = resolve_ambiguous_symbol("\u00a5", "zh_CN") - assert result == "CNY" - - def test_yen_sign_with_zh_tw_uses_prefix(self) -> None: - """Yen sign resolves to CNY via zh prefix for zh_TW.""" - result = resolve_ambiguous_symbol("\u00a5", "zh_TW") - assert result == "CNY" - - def test_yen_sign_with_zh_hk_uses_prefix(self) -> None: - """Yen sign resolves to CNY via zh prefix for zh_HK.""" - result = resolve_ambiguous_symbol("\u00a5", "zh_HK") - assert result == "CNY" - - def test_pound_sign_with_en_gb_exact_match(self) -> None: - """Pound sign resolves to GBP via exact en_gb match.""" - result = resolve_ambiguous_symbol("\u00a3", "en_GB") - assert result == "GBP" - - def test_pound_sign_with_ar_eg_exact_match(self) -> None: - """Pound sign resolves to EGP via exact ar_eg match.""" - result = resolve_ambiguous_symbol("\u00a3", "ar_EG") - assert result == "EGP" - - def test_pound_sign_with_ar_sa_uses_prefix(self) -> None: - """Pound sign resolves to EGP via ar prefix for ar_SA.""" - # ar_SA is not in exact match but ar prefix maps to EGP - result = resolve_ambiguous_symbol("\u00a3", "ar_SA") - assert result == "EGP" - - def test_non_ambiguous_returns_none(self) -> None: - """Non-ambiguous symbols return None.""" - result = resolve_ambiguous_symbol("\u20ac", "en_US") - assert result is None - - def test_no_locale_uses_default(self) -> None: - """Ambiguous symbol without locale uses default.""" - result = 
resolve_ambiguous_symbol("\u00a5", None) - assert result == "JPY" - - def test_empty_locale_uses_default(self) -> None: - """Ambiguous symbol with empty locale uses default.""" - result = resolve_ambiguous_symbol("$", "") - assert result == "USD" - - def test_unknown_locale_with_underscore_uses_default(self) -> None: - """Unknown locale with underscore falls through to default.""" - result = resolve_ambiguous_symbol("$", "xx_YY") - assert result == "USD" - - def test_unknown_locale_without_underscore_uses_default(self) -> None: - """Unknown locale without underscore skips prefix match.""" - result = resolve_ambiguous_symbol("$", "xx") - assert result == "USD" - - @given( - symbol_locale=st.sampled_from([ - ("\u00a5", "zh_CN", "CNY"), - ("\u00a5", "zh_TW", "CNY"), - ("\u00a5", "zh_HK", "CNY"), - ("\u00a3", "ar_SA", "EGP"), - ("\u00a3", "ar_DZ", "EGP"), - ]) - ) - def test_prefix_resolution_property( - self, symbol_locale: tuple[str, str, str] - ) -> None: - """PROPERTY: Locale prefix resolution matches expected currency.""" - symbol, locale, expected = symbol_locale - event(f"prefix_symbol={symbol}") - event(f"prefix_locale={locale}") - result = resolve_ambiguous_symbol(symbol, locale) - assert result == expected - - -# --------------------------------------------------------------------------- -# parse_currency: Specification examples -# --------------------------------------------------------------------------- - - -class TestParseCurrencySpecificationExamples: - """Specification examples for parse_currency behavior.""" - - def test_eur_symbol_prefix(self) -> None: - """EUR symbol prefix: EUR100.50 -> (100.50, EUR).""" - result, errors = parse_currency("\u20ac100.50", "en_US") - assert not errors - assert result is not None - assert result == (Decimal("100.50"), "EUR") - - def test_eur_symbol_suffix_latvian(self) -> None: - """EUR symbol suffix: 100,50 EUR -> (100.50, EUR) in lv_LV.""" - result, errors = parse_currency("100,50 \u20ac", "lv_LV") - assert not errors 
- assert result is not None - assert result == (Decimal("100.50"), "EUR") - - def test_usd_with_default_currency(self) -> None: - """$ with default_currency=USD resolves correctly.""" - result, errors = parse_currency( - "$1,234.56", "en_US", default_currency="USD", - ) - assert not errors - assert result is not None - assert result[0] == Decimal("1234.56") - assert result[1] == "USD" - - def test_iso_code_prefix(self) -> None: - """ISO code prefix: USD 1,234.56 -> (1234.56, USD).""" - result, errors = parse_currency("USD 1,234.56", "en_US") - assert not errors - assert result is not None - assert result == (Decimal("1234.56"), "USD") - - def test_iso_code_german_format(self) -> None: - """German format: EUR 1.234,56 -> (1234.56, EUR).""" - result, errors = parse_currency("EUR 1.234,56", "de_DE") - assert not errors - assert result is not None - assert result == (Decimal("1234.56"), "EUR") - - def test_rupee_unambiguous(self) -> None: - """Indian Rupee symbol is unambiguous.""" - result, errors = parse_currency("\u20b91000", "hi_IN") - assert not errors - assert result is not None - assert result[1] == "INR" - - def test_swiss_franc_iso(self) -> None: - """Swiss Franc via ISO code.""" - result, errors = parse_currency("CHF 100", "de_CH") - assert not errors - assert result is not None - assert result == (Decimal(100), "CHF") - - def test_cny_chinese_locale(self) -> None: - """Yen symbol resolves to CNY in Chinese locales.""" - result, errors = parse_currency( - "\u00a51000", "zh_CN", infer_from_locale=True, - ) - assert not errors - assert result is not None - assert result[1] == "CNY" - - def test_jpy_japanese_locale(self) -> None: - """Yen symbol resolves to JPY in Japanese locales.""" - result, errors = parse_currency( - "\u00a512,345", "ja_JP", infer_from_locale=True, - ) - assert not errors - assert result is not None - assert result[1] == "JPY" - - def test_gbp_british_locale(self) -> None: - """Pound symbol resolves to GBP in British locales.""" - result, 
errors = parse_currency( - "\u00a3999.99", "en_GB", infer_from_locale=True, - ) - assert not errors - assert result is not None - assert result == (Decimal("999.99"), "GBP") - - -# --------------------------------------------------------------------------- -# parse_currency: Error paths -# --------------------------------------------------------------------------- - - -class TestParseCurrencyErrors: - """Test error handling in parse_currency.""" - - def test_no_symbol_returns_error(self) -> None: - """Missing currency symbol returns error.""" - result, errors = parse_currency("1,234.56", "en_US") - assert result is None - assert len(errors) == 1 - - def test_invalid_input_returns_error(self) -> None: - """Non-parseable input returns error.""" - result, errors = parse_currency("invalid", "en_US") - assert result is None - assert len(errors) == 1 - - def test_invalid_number_with_symbol(self) -> None: - """Invalid number with currency symbol returns error.""" - result, errors = parse_currency("\u20acinvalid", "en_US") - assert result is None - assert len(errors) == 1 - - def test_empty_string(self) -> None: - """Empty string returns error.""" - result, errors = parse_currency("", "en_US") - assert result is None - assert len(errors) == 1 - - def test_only_symbol(self) -> None: - """Symbol without number returns error.""" - result, errors = parse_currency("\u20ac", "en_US") - assert result is None - assert len(errors) == 1 - - def test_invalid_locale(self) -> None: - """Invalid locale returns error with locale info.""" - result, errors = parse_currency( - "\u20ac10.50", "invalid_LOCALE_CODE", - ) - assert result is None - assert len(errors) == 1 - assert any("locale" in str(err).lower() for err in errors) - - def test_malformed_locale(self) -> None: - """Malformed locale returns error.""" - result, errors = parse_currency("$100", "!!!invalid@@@") - assert result is None - assert len(errors) == 1 - - def test_ambiguous_without_default_returns_error(self) -> None: - """$ 
without default_currency or inference returns error.""" - result, errors = parse_currency("$100", "en_US") - assert result is None - assert len(errors) == 1 - - -# --------------------------------------------------------------------------- -# _resolve_currency_code internal paths -# --------------------------------------------------------------------------- - - -class TestResolveCurrencyCode: - """Test _resolve_currency_code edge cases.""" - - def test_unknown_symbol_returns_error(self) -> None: - """Unknown symbol returns error.""" - result, error = currency_module._resolve_currency_code( - "ZZZZZ", "en_US", "ZZZZZ 100", - default_currency=None, infer_from_locale=False, - ) - assert result is None - assert error is not None - - def test_invalid_default_currency_format(self) -> None: - """Ambiguous symbol with invalid default_currency returns error.""" - result, error = currency_module._resolve_currency_code( - "$", "en_US", "$100", - default_currency="invalid", infer_from_locale=False, - ) - assert result is None - assert error is not None - - def test_lowercase_default_currency_rejected(self) -> None: - """Lowercase default_currency is rejected (ISO requires uppercase).""" - result, error = currency_module._resolve_currency_code( - "$", "en_US", "$100", - default_currency="usd", infer_from_locale=False, - ) - assert result is None - assert error is not None - - def test_short_default_currency_rejected(self) -> None: - """2-letter default_currency is rejected (ISO requires 3).""" - result, error = currency_module._resolve_currency_code( - "$", "en_US", "$100", - default_currency="US", infer_from_locale=False, - ) - assert result is None - assert error is not None - - def test_long_default_currency_rejected(self) -> None: - """4-letter default_currency is rejected (ISO requires 3).""" - result, error = currency_module._resolve_currency_code( - "$", "en_US", "$100", - default_currency="USDD", infer_from_locale=False, - ) - assert result is None - assert error is not 
None - - def test_numeric_default_currency_rejected(self) -> None: - """Numeric default_currency is rejected (ISO requires letters).""" - result, error = currency_module._resolve_currency_code( - "$", "en_US", "$100", - default_currency="123", infer_from_locale=False, - ) - assert result is None - assert error is not None - - def test_invalid_iso_code_not_in_cldr(self) -> None: - """3-letter uppercase code not in CLDR returns error.""" - result, errors = parse_currency("AAA 100", "en_US") - assert result is None - assert len(errors) == 1 - - @given( - default=st.from_regex(r"[a-z]{3}", fullmatch=True) - ) - @settings(max_examples=20) - def test_lowercase_codes_always_rejected( - self, default: str - ) -> None: - """PROPERTY: Lowercase 3-letter codes always rejected.""" - event(f"code_sample={default[:2]}") - result, error = currency_module._resolve_currency_code( - "$", "en_US", "$100", - default_currency=default, infer_from_locale=False, - ) - assert result is None - assert error is not None - - -# --------------------------------------------------------------------------- -# Locale-to-currency fallback -# --------------------------------------------------------------------------- - - -class TestLocaleToCurrencyFallback: - """Test locale-to-currency inference fallback.""" - - def test_dollar_inferred_from_en_us(self) -> None: - """$ inferred as USD from en_US.""" - result, errors = parse_currency( - "$100", "en_US", infer_from_locale=True, - ) - assert errors == () - assert result is not None - assert result[1] == "USD" - - def test_dollar_resolves_to_usd_in_de_de(self) -> None: - """$ resolves to USD in de_DE (dollar sign is unambiguous).""" - result, errors = parse_currency( - "$100", "de_DE", infer_from_locale=True, - ) - assert errors == () - assert result is not None - assert result[1] == "USD" - - def test_cldr_only_ambiguous_symbol_locale_fallback(self) -> None: - """CLDR-only ambiguous symbol resolves via locale-to-currency map. 
- - Rs is ambiguous in CLDR (INR, PKR, etc.) but not in the fast-tier - ambiguous set. resolve_ambiguous_symbol returns None, so resolution - falls through to the CLDR locale-to-currency mapping. - """ - result, errors = parse_currency( - "Rs 500", "hi_IN", infer_from_locale=True, - ) - assert errors == () - assert result is not None - assert result == (Decimal(500), "INR") - - def test_cldr_only_ambiguous_kr_dot_locale_fallback(self) -> None: - """kr. (Nordic krona with period) resolves via locale-to-currency map. - - kr. is ambiguous in CLDR (DKK, NOK, SEK, ISK) but not in the fast-tier - ambiguous set. Falls through to locale-to-currency mapping. - """ - result, errors = parse_currency( - "kr.500", "da_DK", infer_from_locale=True, - ) - assert errors == () - assert result is not None - assert result == (Decimal(500), "DKK") - - def test_no_resolution_available(self) -> None: - """Empty currency maps cause resolution failure.""" - with ( - patch( - "ftllexengine.parsing.currency.resolve_ambiguous_symbol", - return_value=None, - ), - patch( - "ftllexengine.parsing.currency._get_currency_maps", - return_value=( - {}, - {"$"}, - {}, - frozenset({"USD"}), - ), - ), - ): - result, errors = parse_currency( - "$100", "en_US", infer_from_locale=True, - ) - - assert result is None - assert len(errors) == 1 - - def test_kr_unknown_locale_defaults_to_sek(self) -> None: - """kr symbol with unknown locale defaults to SEK.""" - result, error = currency_module._resolve_currency_code( - "kr", "xx_UNKNOWN", "kr 100", - default_currency=None, infer_from_locale=True, - ) - assert result == "SEK" or error is not None - - -# --------------------------------------------------------------------------- -# Roundtrip: format -> parse -> verify -# --------------------------------------------------------------------------- - - -class TestRoundtripCurrency: - """Test format -> parse -> verify roundtrip.""" - - def test_roundtrip_usd_en_us(self) -> None: - """Currency roundtrip for US 
English.""" - from ftllexengine.runtime.functions import currency_format - - original = Decimal("1234.56") - formatted = currency_format( - original, "en-US", - currency="USD", currency_display="symbol", - ) - result, errors = parse_currency( - str(formatted), "en_US", default_currency="USD", - ) - assert not errors - assert result is not None - assert result[0] == original - assert result[1] == "USD" - - def test_roundtrip_eur_lv_lv(self) -> None: - """Currency roundtrip for Latvian EUR.""" - from ftllexengine.runtime.functions import currency_format - - original = Decimal("1234.56") - formatted = currency_format( - original, "lv-LV", - currency="EUR", currency_display="symbol", - ) - result, errors = parse_currency(str(formatted), "lv_LV") - assert not errors - assert result is not None - assert result[0] == original - assert result[1] == "EUR" - - -# --------------------------------------------------------------------------- -# CLDR map integrity -# --------------------------------------------------------------------------- - - -class TestCLDRMapIntegrity: - """Test CLDR currency map structural invariants.""" - - REQUIRED_CURRENCIES: frozenset[str] = frozenset({ - "USD", "EUR", "JPY", "GBP", "CHF", "AUD", "NZD", "CAD", - "CNY", "HKD", "SGD", "SEK", "NOK", "DKK", "KRW", - "INR", "RUB", "TRY", "ZAR", "MXN", "BRL", - "PLN", "CZK", "HUF", "RON", "BGN", - }) - - def test_symbol_lookup_locales_discover_major_currencies( - self, - ) -> None: - """Hardcoded locale list discovers major currency symbols.""" - symbol_map, _, _, _ = _get_currency_maps() - discovered: set[str] = set(symbol_map.values()) - missing = self.REQUIRED_CURRENCIES - discovered - max_missing = len(self.REQUIRED_CURRENCIES) // 5 - assert len(missing) <= max_missing, ( - f"Too many major currencies missing: {sorted(missing)}. 
" - f"Max allowed: {max_missing}, got: {len(missing)}" - ) - - def test_locale_to_currency_covers_major_territories( - self, - ) -> None: - """Locale-to-currency mapping covers major territories.""" - _, _, locale_to_currency, _ = _get_currency_maps() - expected_locales = { - "en_US", "en_GB", "en_CA", "en_AU", - "de_DE", "de_AT", "de_CH", - "fr_FR", "fr_CA", - "ja_JP", "zh_CN", "ko_KR", - "es_ES", "es_MX", "pt_BR", - "lv_LV", "et_EE", "lt_LT", - } - found = expected_locales & set(locale_to_currency.keys()) - missing = expected_locales - found - min_coverage = len(expected_locales) * 0.8 - assert len(found) >= min_coverage, ( - f"Insufficient: {len(found)}/{len(expected_locales)}. " - f"Missing: {sorted(missing)}" - ) - - def test_returns_correct_types(self) -> None: - """_build_currency_maps_from_cldr returns correct types.""" - sym, amb, loc, codes = _build_currency_maps_from_cldr() - for s, c in sym.items(): - assert isinstance(s, str) - assert isinstance(c, str) - for s in amb: - assert isinstance(s, str) - for l_key, l_val in loc.items(): - assert isinstance(l_key, str) - assert isinstance(l_val, str) - assert isinstance(codes, frozenset) - - def test_euro_is_unambiguous(self) -> None: - """EUR symbol is in the unambiguous map.""" - sym, amb, _, _ = _build_currency_maps_from_cldr() - assert "\u20ac" in sym or "\u20ac" not in amb - if "\u20ac" in sym: - assert sym["\u20ac"] == "EUR" - - def test_dollar_is_ambiguous(self) -> None: - """$ symbol is in the ambiguous set.""" - _, amb, _, _ = _build_currency_maps_from_cldr() - assert "$" in amb - - def test_currency_maps_caching(self) -> None: - """_get_currency_maps_full returns same cached object.""" - result1 = currency_module._get_currency_maps_full() - result2 = currency_module._get_currency_maps_full() - assert result1 is result2 - assert len(result1) == 4 - - -# --------------------------------------------------------------------------- -# _build_currency_maps_from_cldr exception paths -# 
--------------------------------------------------------------------------- - - -class TestBuildCurrencyMapsExceptions: - """Test _build_currency_maps_from_cldr exception handling.""" - - @pytest.fixture(autouse=True) - def _clear_cache(self) -> None: - _build_currency_maps_from_cldr.cache_clear() - _get_currency_maps.cache_clear() - - def test_locale_parse_exception_handled(self) -> None: - """Locale.parse exceptions are caught gracefully.""" - from babel import Locale - - original_parse = Locale.parse - - def mock_parse(locale_id: str) -> Any: - if "broken" in locale_id.lower(): - msg = "Mocked parse failure" - raise ValueError(msg) - return original_parse(locale_id) - - with ( - patch.object(Locale, "parse", side_effect=mock_parse), - patch( - "babel.localedata.locale_identifiers", - return_value=["en_US", "broken_locale", "de_DE"], - ), - ): - sym, amb, loc, _ = _build_currency_maps_from_cldr() - - assert isinstance(sym, dict) - assert isinstance(amb, set) - assert isinstance(loc, dict) - - def test_key_error_in_currencies_access(self) -> None: - """KeyError when accessing locale.currencies is caught.""" - mock_locale = MagicMock() - mock_locale.currencies.keys.side_effect = KeyError("Mock") - - with ( - patch("babel.Locale.parse", return_value=mock_locale), - patch( - "babel.localedata.locale_identifiers", - return_value=["test_locale"], - ), - ): - sym, _, _, codes = _build_currency_maps_from_cldr() - - assert isinstance(sym, dict) - assert isinstance(codes, frozenset) - - def test_locale_with_currencies_none(self) -> None: - """Locale with currencies=None is handled.""" - mock_locale = MagicMock() - mock_locale.currencies = None - - with ( - patch("babel.Locale.parse", return_value=mock_locale), - patch( - "babel.localedata.locale_identifiers", - return_value=["test_locale"], - ), - ): - sym, amb, loc, _ = _build_currency_maps_from_cldr() - - assert isinstance(sym, dict) - assert isinstance(amb, set) - assert isinstance(loc, dict) - - def 
test_get_currency_symbol_exception(self) -> None: - """get_currency_symbol exceptions are caught.""" - - def mock_symbol( - currency_code: str, - locale: object = None, # noqa: ARG001 - unused - ) -> str: - if currency_code == "FAIL": - msg = "Mock symbol failure" - raise ValueError(msg) - return "$" if currency_code == "USD" else currency_code - - mock_locale = MagicMock() - mock_locale.currencies = {"USD": "Dollar", "FAIL": "Bad"} - mock_locale.territory = "US" - - with ( - patch( - "babel.numbers.get_currency_symbol", - side_effect=mock_symbol, - ), - patch( - "babel.localedata.locale_identifiers", - return_value=["en_US"], - ), - patch("babel.Locale.parse", return_value=mock_locale), - ): - sym, amb, _, _ = _build_currency_maps_from_cldr() - - assert isinstance(sym, dict) - assert isinstance(amb, set) - - def test_attribute_error_in_symbol_lookup(self) -> None: - """AttributeError in get_currency_symbol is caught.""" - - def mock_raises( - currency_code: str, # noqa: ARG001 - unused - locale: object = None, # noqa: ARG001 - unused - ) -> str: - msg = "Mock attribute error" - raise AttributeError(msg) - - mock_locale = MagicMock() - mock_locale.currencies = {"USD": "Dollar"} - mock_locale.territory = "US" - mock_locale.configure_mock( - **{"__str__.return_value": "en_US"}, - ) - - with ( - patch( - "babel.numbers.get_currency_symbol", - side_effect=mock_raises, - ), - patch("babel.Locale.parse", return_value=mock_locale), - patch( - "babel.localedata.locale_identifiers", - return_value=["en_US"], - ), - ): - sym, _, _, codes = _build_currency_maps_from_cldr() - - assert isinstance(sym, dict) - assert isinstance(codes, frozenset) - - def test_territory_currencies_exception(self) -> None: - """get_territory_currencies exception is caught.""" - - def mock_territory(territory: str) -> list[str]: - if territory == "XX": - msg = "Unknown territory" - raise ValueError(msg) - return ["USD"] - - mock_us = MagicMock() - mock_us.territory = "US" - mock_us.currencies = {} - 
mock_us.configure_mock( - **{"__str__.return_value": "en_US"}, - ) - - mock_xx = MagicMock() - mock_xx.territory = "XX" - mock_xx.currencies = {} - mock_xx.configure_mock( - **{"__str__.return_value": "xx_XX"}, - ) - - def mock_parse(locale_id: str) -> MagicMock: - return mock_xx if locale_id == "xx_XX" else mock_us - - with ( - patch( - "babel.numbers.get_territory_currencies", - side_effect=mock_territory, - ), - patch( - "babel.localedata.locale_identifiers", - return_value=["en_US", "xx_XX"], - ), - patch("babel.Locale.parse", side_effect=mock_parse), - ): - _, _, loc, _ = _build_currency_maps_from_cldr() - - assert isinstance(loc, dict) - - def test_unknown_locale_error_in_territory_lookup(self) -> None: - """UnknownLocaleError in get_territory_currencies is caught.""" - - def mock_raises( - territory: str, # noqa: ARG001 - unused - ) -> list[str]: - msg = "Mock unknown locale" - raise UnknownLocaleError(msg) - - mock_locale = MagicMock() - mock_locale.territory = "XX" - mock_locale.currencies = {} - mock_locale.configure_mock( - **{"__str__.return_value": "xx_XX"}, - ) - - with ( - patch( - "babel.numbers.get_territory_currencies", - side_effect=mock_raises, - ), - patch("babel.Locale.parse", return_value=mock_locale), - patch( - "babel.localedata.locale_identifiers", - return_value=["xx_XX"], - ), - ): - _, _, _, codes = _build_currency_maps_from_cldr() - - assert isinstance(codes, frozenset) - - def test_locale_without_territory(self) -> None: - """Locale without territory is handled.""" - mock_locale = MagicMock() - mock_locale.territory = None - mock_locale.currencies = {} - - with ( - patch("babel.Locale.parse", return_value=mock_locale), - patch( - "babel.localedata.locale_identifiers", - return_value=["en"], - ), - ): - _, _, loc, _ = _build_currency_maps_from_cldr() - - assert isinstance(loc, dict) - - def test_locale_str_without_underscore_excluded(self) -> None: - """Locale str without underscore is not in locale_to_currency.""" - mock_locale = 
MagicMock() - mock_locale.territory = "XX" - mock_locale.currencies = {} - mock_locale.configure_mock( - **{"__str__.return_value": "en"}, - ) - - with ( - patch("babel.Locale.parse", return_value=mock_locale), - patch( - "babel.localedata.locale_identifiers", - return_value=["en"], - ), - patch( - "babel.numbers.get_territory_currencies", - return_value=["GBP"], - ), - ): - _, _, loc, _ = _build_currency_maps_from_cldr() - - assert "en" not in loc - - def test_empty_territory_currencies(self) -> None: - """get_territory_currencies returning empty list is handled.""" - mock_locale = MagicMock() - mock_locale.territory = "US" - mock_locale.currencies = {} - mock_locale.configure_mock( - **{"__str__.return_value": "en_US"}, - ) - - with ( - patch("babel.Locale.parse", return_value=mock_locale), - patch( - "babel.localedata.locale_identifiers", - return_value=["en_US"], - ), - patch( - "babel.numbers.get_territory_currencies", - return_value=[], - ), - ): - _, _, loc, _ = _build_currency_maps_from_cldr() - - assert isinstance(loc, dict) - - @given(locale_count=st.integers(min_value=1, max_value=5)) - @settings(max_examples=10) - def test_handles_various_locale_counts( - self, locale_count: int - ) -> None: - """PROPERTY: Function handles any number of locales.""" - event(f"locale_count={locale_count}") - - _build_currency_maps_from_cldr.cache_clear() - mock_locales = [f"mock_{i}" for i in range(locale_count)] - - mock_locale = MagicMock() - mock_locale.territory = None - mock_locale.currencies = {} - - with ( - patch("babel.Locale.parse", return_value=mock_locale), - patch( - "babel.localedata.locale_identifiers", - return_value=mock_locales, - ), - ): - sym, amb, loc, _ = _build_currency_maps_from_cldr() - - assert isinstance(sym, dict) - assert isinstance(amb, set) - assert isinstance(loc, dict) - - -# --------------------------------------------------------------------------- -# BabelImportError handling -# 
--------------------------------------------------------------------------- - - -class TestBabelImportError: - """Test Babel import error handling.""" - - def test_build_maps_returns_empty_when_babel_missing( - self, - ) -> None: - """_build_currency_maps_from_cldr returns empty without Babel.""" - import ftllexengine.core.babel_compat as _bc - - _build_currency_maps_from_cldr.cache_clear() - - original_import = builtins.__import__ - - def mock_import( - name: str, *args: object, **kwargs: object - ) -> object: - if name == "babel" or name.startswith("babel."): - msg = f"No module named '{name}'" - raise ImportError(msg) - return original_import(name, *args, **kwargs) # type: ignore[arg-type] - - # Reset sentinel so is_babel_available() re-evaluates under the mock - _bc._babel_available = None - - try: - with patch( - "builtins.__import__", side_effect=mock_import - ): - sym, amb, loc, codes = ( - _build_currency_maps_from_cldr() - ) - assert sym == {} - assert amb == set() - assert loc == {} - assert codes == frozenset() - finally: - _build_currency_maps_from_cldr.cache_clear() - # Reset sentinel so subsequent tests reinitialize with Babel available - _bc._babel_available = None - - def test_parse_currency_raises_babel_import_error( - self, - ) -> None: - """parse_currency raises BabelImportError without Babel.""" - import ftllexengine.core.babel_compat as _bc - from ftllexengine.core.babel_compat import BabelImportError - - _bc._babel_available = None - original_import = builtins.__import__ - - def mock_import( - name: str, *args: object, **kwargs: object - ) -> object: - if name == "babel" or name.startswith("babel."): - msg = f"No module named '{name}'" - raise ImportError(msg) - return original_import(name, *args, **kwargs) # type: ignore[arg-type] - - try: - with patch( - "builtins.__import__", side_effect=mock_import - ): - with pytest.raises(BabelImportError) as exc_info: - parse_currency("\u20ac100", "en_US") - - error_msg = str(exc_info.value) - assert 
"parse_currency" in error_msg - finally: - _bc._babel_available = None - - -# --------------------------------------------------------------------------- -# Fast tier operations -# --------------------------------------------------------------------------- - - -class TestFastTierOperations: - """Test fast tier currency operations (no CLDR scan).""" - - def test_fast_tier_symbols_available(self) -> None: - """Fast tier unambiguous symbols always available.""" - from ftllexengine.parsing.currency import ( - _FAST_TIER_UNAMBIGUOUS_SYMBOLS, - _get_currency_maps_fast, - ) - - symbols, _, _, _ = _get_currency_maps_fast() - assert len(symbols) > 0 - assert "\u20ac" in symbols - assert symbols["\u20ac"] == "EUR" - assert symbols == _FAST_TIER_UNAMBIGUOUS_SYMBOLS - - def test_currency_pattern_compiles_and_matches(self) -> None: - """Currency regex pattern compiles and matches.""" - from ftllexengine.parsing.currency import ( - _get_currency_pattern, - ) - - _get_currency_pattern.cache_clear() - try: - pattern = _get_currency_pattern() - assert pattern.search("\u20ac100") is not None - assert pattern.search("USD 100") is not None - finally: - _get_currency_pattern.cache_clear() - - def test_currency_pattern_longest_match_first(self) -> None: - """Currency pattern matches multi-char symbols before prefixes.""" - from ftllexengine.parsing.currency import ( - _get_currency_pattern, - ) - - _get_currency_pattern.cache_clear() - try: - pattern = _get_currency_pattern() - # Rs must match before R - m = pattern.search("Rs100") - assert m is not None - assert m.group() == "Rs" - # kr. must match before kr - m = pattern.search("kr.500") - assert m is not None - assert m.group() == "kr." 
- finally: - _get_currency_pattern.cache_clear() - - -# --------------------------------------------------------------------------- -# Pattern compilation fallback -# --------------------------------------------------------------------------- - - -class TestPatternCompilationFallback: - """Test pattern compilation with empty symbol maps.""" - - def test_pattern_fallback_with_empty_symbols(self) -> None: - """Pattern falls back to ISO-code-only when no symbols.""" - from ftllexengine.parsing.currency import ( - _get_currency_pattern, - ) - - _get_currency_pattern.cache_clear() - - with patch( - "ftllexengine.parsing.currency._get_currency_maps", - return_value=({}, set(), {}, frozenset()), - ): - _get_currency_pattern.cache_clear() - pattern = _get_currency_pattern() - - assert isinstance(pattern, re.Pattern) - assert pattern.search("USD") is not None - assert pattern.search("\u20ac") is None - - _get_currency_pattern.cache_clear() - _get_currency_maps.cache_clear() - - -# --------------------------------------------------------------------------- -# Cache management -# --------------------------------------------------------------------------- - - -class TestClearCurrencyCaches: - """Test clear_currency_caches function.""" - - def test_executes_without_error(self) -> None: - """clear_currency_caches executes without error.""" - from ftllexengine.parsing.currency import clear_currency_caches - - clear_currency_caches() - - def test_invalidates_caches(self) -> None: - """clear_currency_caches clears cached data.""" - from ftllexengine.parsing.currency import clear_currency_caches - - maps1 = _get_currency_maps() - clear_currency_caches() - maps2 = _get_currency_maps() - assert len(maps1[0]) == len(maps2[0]) - - def test_idempotent(self) -> None: - """Multiple calls are safe.""" - from ftllexengine.parsing.currency import clear_currency_caches - - clear_currency_caches() - clear_currency_caches() - clear_currency_caches() - - -# 
--------------------------------------------------------------------------- -# Thread-safe caching behavior -# --------------------------------------------------------------------------- - - -class TestCurrencyCachingConcurrency: - """Test thread-safe caching via functools.cache.""" - - def test_concurrent_currency_maps_access(self) -> None: - """Concurrent calls to _get_currency_maps_full return cached object. - - functools.cache provides thread-safe cache access, but does NOT - prevent thundering herd on cold cache (multiple threads may compute - simultaneously). This test verifies that AFTER cache is populated, - concurrent access returns the same cached object. - """ - import threading - - # Pre-warm cache to ensure it's populated - _ = currency_module._get_currency_maps_full() - - barrier = threading.Barrier(4) - results: list[object] = [] - - def get_with_barrier() -> None: - barrier.wait() - data = currency_module._get_currency_maps_full() - results.append(data) - - threads = [ - threading.Thread(target=get_with_barrier) - for _ in range(4) - ] - for t in threads: - t.start() - for t in threads: - t.join() - - assert len(results) == 4 - assert all(r is results[0] for r in results) - - def test_currency_maps_structure(self) -> None: - """Cached currency maps have expected 4-tuple structure.""" - data = currency_module._get_currency_maps_full() - - assert len(data) == 4 - symbol_map, ambiguous, locale_to_currency, valid_codes = data - - assert isinstance(symbol_map, dict) - assert isinstance(ambiguous, set) - assert isinstance(locale_to_currency, dict) - assert isinstance(valid_codes, frozenset) +"""Aggregated parsing currency test surface.""" + +from tests.parsing_currency_cases.babel_import_error_handling import * # noqa: F403 - re-export split test surface +from tests.parsing_currency_cases.build_currency_maps_from_cldr_exception_paths import * # noqa: F403 - re-export split test surface +from tests.parsing_currency_cases.cache_management import * # noqa: 
F403 - re-export split test surface
+from tests.parsing_currency_cases.cldr_map_integrity import * # noqa: F403 - re-export split test surface
+from tests.parsing_currency_cases.fast_tier_operations import * # noqa: F403 - re-export split test surface
+from tests.parsing_currency_cases.locale_to_currency_fallback import * # noqa: F403 - re-export split test surface
+from tests.parsing_currency_cases.parse_currency_error_paths import * # noqa: F403 - re-export split test surface
+from tests.parsing_currency_cases.parse_currency_specification_examples import * # noqa: F403 - re-export split test surface
+from tests.parsing_currency_cases.pattern_compilation_fallback import * # noqa: F403 - re-export split test surface
+from tests.parsing_currency_cases.property_ambiguous_symbols_with_locale_inference import * # noqa: F403 - re-export split test surface
+from tests.parsing_currency_cases.property_arbitrary_locales_never_crash import * # noqa: F403 - re-export split test surface
+from tests.parsing_currency_cases.property_invalid_inputs_never_crash_always_return_errors import * # noqa: F403 - re-export split test surface
+from tests.parsing_currency_cases.property_iso_code_inputs_always_resolve_correctly import * # noqa: F403 - re-export split test surface
+from tests.parsing_currency_cases.property_unambiguous_symbols_always_parse_successfully import * # noqa: F403 - re-export split test surface
+from tests.parsing_currency_cases.resolve_ambiguous_symbol_locale_prefix_fallback import * # noqa: F403 - re-export split test surface
+from tests.parsing_currency_cases.resolve_currency_code_internal_paths import * # noqa: F403 - re-export split test surface
+from tests.parsing_currency_cases.roundtrip_format_parse_verify import * # noqa: F403 - re-export split test surface
+from tests.parsing_currency_cases.thread_safe_caching_behavior import * # noqa: F403 - re-export split test surface
diff --git a/tests/test_parsing_currency_property.py
b/tests/test_parsing_currency_property.py index 2477469a..efe890ce 100644 --- a/tests/test_parsing_currency_property.py +++ b/tests/test_parsing_currency_property.py @@ -11,6 +11,7 @@ from decimal import Decimal +from babel.numbers import format_decimal from hypothesis import event, given, settings from hypothesis import strategies as st @@ -249,11 +250,13 @@ def test_parse_currency_fractional_amounts(self, amount: Decimal) -> None: locale=st.sampled_from(["en_US", "de_DE", "fr_FR", "ja_JP", "lv_LV", "pl_PL"]), ) @settings(max_examples=50) - def test_parse_currency_locale_independence(self, locale: str) -> None: - """Currency parsing should work across locales.""" + def test_parse_currency_locale_formatted_iso_code(self, locale: str) -> None: + """ISO-coded money parses when the numeric portion matches the locale.""" event(f"locale={locale}") - # Use ISO code (universal) - currency_str = "EUR 1234.56" + formatted_amount = str( + format_decimal(Decimal("1234.56"), locale=locale, decimal_quantization=False) + ) + currency_str = f"EUR {formatted_amount}" result, errors = parse_currency(currency_str, locale) assert not errors @@ -262,9 +265,13 @@ def test_parse_currency_locale_independence(self, locale: str) -> None: parsed_amount, currency_code = result assert currency_code == "EUR" - # Note: Babel parsing may interpret differently based on locale - # Main check: doesn't crash and returns valid Decimal - assert isinstance(parsed_amount, Decimal) + assert parsed_amount == Decimal("1234.56") + + def test_parse_currency_rejects_de_dot_grouping_mismatch(self) -> None: + """de_DE rejects dot-decimal money input because dot means grouping.""" + result, errors = parse_currency("EUR 1234.56", "de_DE") + assert len(errors) > 0 + assert result is None @given( value=st.text( diff --git a/tests/test_parsing_dates.py b/tests/test_parsing_dates.py index 6bf0e0ef..7d05fabf 100644 --- a/tests/test_parsing_dates.py +++ b/tests/test_parsing_dates.py @@ -1,1236 +1,18 @@ -"""Tests for 
date and datetime parsing functions. - -Core parsing tests, internal function edge cases, tokenizer, separator -extraction, BabelImportError paths, datetime ordering, and property-based -roundtrip tests for parse_date() and parse_datetime(). - -Functions return tuple[value, errors]: -- parse_date() returns tuple[date | None, list[FluentParseError]] -- parse_datetime() returns tuple[datetime | None, list[FluentParseError]] -- Functions never raise exceptions; errors returned in list - -Python 3.13+. -""" - -from __future__ import annotations - -import builtins -import sys -from datetime import UTC, date, datetime -from unittest.mock import MagicMock, Mock, patch - -import pytest -from babel import Locale -from hypothesis import event, given -from hypothesis import strategies as st - -import ftllexengine.core.babel_compat as _bc -from ftllexengine.parsing.dates import ( - _babel_to_strptime, - _extract_datetime_separator, - _get_date_patterns, - _get_datetime_patterns, - _preprocess_datetime_input, - _tokenize_babel_pattern, - parse_date, - parse_datetime, -) - -# --------------------------------------------------------------------------- -# parse_date -# --------------------------------------------------------------------------- - - -class TestParseDate: - """Test parse_date() function.""" - - def test_parse_date_us_format(self) -> None: - """Parse US date format (M/d/yy - CLDR short format).""" - result, errors = parse_date("1/28/25", "en_US") - assert not errors - assert result == date(2025, 1, 28) - - def test_parse_date_european_format(self) -> None: - """Parse European date format (d.M.yy - CLDR short format).""" - result, errors = parse_date("28.1.25", "lv_LV") - assert not errors - assert result == date(2025, 1, 28) - - result, errors = parse_date("28.01.25", "de_DE") - assert not errors - assert result == date(2025, 1, 28) - - def test_parse_date_iso_format(self) -> None: - """Parse ISO 8601 date format.""" - result, errors = parse_date("2025-01-28", 
"en_US") - assert not errors - assert result == date(2025, 1, 28) - - def test_parse_date_invalid_returns_error(self) -> None: - """Invalid input returns error in tuple; function never raises.""" - result, errors = parse_date("invalid", "en_US") - assert len(errors) > 0 - assert result is None - assert errors[0].parse_type == "date" - assert errors[0].input_value == "invalid" - - def test_parse_date_empty_returns_error(self) -> None: - """Empty input returns error in list.""" - result, errors = parse_date("", "en_US") - assert len(errors) > 0 - assert result is None - - -# --------------------------------------------------------------------------- -# parse_datetime -# --------------------------------------------------------------------------- - - -class TestParseDatetime: - """Test parse_datetime() function.""" - - def test_parse_datetime_us_format(self) -> None: - """Parse US datetime format (M/d/yy + time - CLDR).""" - result, errors = parse_datetime("1/28/25, 14:30", "en_US") - assert not errors - assert result == datetime(2025, 1, 28, 14, 30) - - def test_parse_datetime_european_format(self) -> None: - """Parse European datetime format (d.M.yy + time - CLDR).""" - result, errors = parse_datetime("28.1.25 14:30", "lv_LV") - assert not errors - assert result == datetime(2025, 1, 28, 14, 30) - - def test_parse_datetime_with_timezone(self) -> None: - """Parse datetime and apply timezone.""" - result, errors = parse_datetime( - "2025-01-28 14:30", "en_US", tzinfo=UTC - ) - assert not errors - assert result == datetime(2025, 1, 28, 14, 30, tzinfo=UTC) - - def test_parse_datetime_invalid_returns_error(self) -> None: - """Invalid input returns error in tuple; function never raises.""" - result, errors = parse_datetime("invalid", "en_US") - assert len(errors) > 0 - assert result is None - assert errors[0].parse_type == "datetime" - - def test_parse_datetime_empty_returns_error(self) -> None: - """Empty input returns error in list.""" - result, errors = 
parse_datetime("", "en_US") - assert len(errors) > 0 - assert result is None - - def test_parse_datetime_with_seconds(self) -> None: - """Datetime parsing with seconds component.""" - result, errors = parse_datetime("28.01.25, 14:30:45", "de_DE") - assert not errors - assert result is not None - assert result.hour == 14 - assert result.minute == 30 - assert result.second == 45 - - def test_parse_datetime_iso_format_all_locales(self) -> None: - """ISO format works across all locales.""" - iso_str = "2025-01-28 14:30:00" - for locale in [ - "en_US", "de_DE", "fr_FR", "es_ES", "ja_JP", "zh_CN" - ]: - result, errors = parse_datetime(iso_str, locale) - assert not errors - assert result is not None, f"ISO format failed for {locale}" - assert result.year == 2025 - assert result.month == 1 - assert result.day == 28 - - def test_parse_datetime_with_working_formats(self) -> None: - """Datetime parsing with CLDR locale-specific separators.""" - test_cases = [ - ("01/28/25, 14:30", "en_US"), - ("01/28/25, 02:30 PM", "en_US"), - ("28.01.25, 14:30", "de_DE"), - ] - for date_str, locale in test_cases: - result, errors = parse_datetime(date_str, locale) - assert not errors - assert result is not None, ( - f"Failed to parse '{date_str}' for {locale}" - ) - assert result.year == 2025 - assert result.month == 1 - assert result.day == 28 - - -# ============================================================================ -# Unknown Locale Handling -# ============================================================================ - - -class TestParseDateUnknownLocale: - """Test parse_date with unknown locale.""" - - def test_iso_format_succeeds(self) -> None: - """ISO format succeeds even with unknown locale.""" - result, errors = parse_date("2025-01-01", "xx-INVALID") - assert result is not None - assert len(errors) == 0 - - def test_non_iso_format_fails(self) -> None: - """Non-ISO format with unknown locale returns error.""" - result, errors = parse_date("01/28/2025", "xx-INVALID") - 
assert result is None - assert len(errors) == 1 - assert errors[0].parse_type == "date" - - def test_malformed_locale(self) -> None: - """Malformed locale returns error for non-ISO format.""" - result, errors = parse_date( - "28.01.2025", "not-a-valid-locale-format" - ) - assert result is None - assert len(errors) == 1 - - -class TestParseDatetimeUnknownLocale: - """Test parse_datetime with unknown locale.""" - - def test_iso_format_succeeds(self) -> None: - """ISO format succeeds even with unknown locale.""" - result, errors = parse_datetime( - "2025-01-28T14:30:00", "xx-INVALID" - ) - assert result is not None - assert len(errors) == 0 - - def test_non_iso_format_fails(self) -> None: - """Non-ISO format with unknown locale returns error.""" - result, errors = parse_datetime( - "01/28/2025 2:30 PM", "xx-INVALID" - ) - assert result is None - assert len(errors) == 1 - assert errors[0].parse_type == "datetime" - - -# ============================================================================ -# _tokenize_babel_pattern -# ============================================================================ - - -class TestTokenizeBabelPattern: - """Test CLDR pattern tokenizer quote handling.""" - - def test_simple_quoted_literal(self) -> None: - """Simple quoted literal is extracted as single token.""" - tokens = _tokenize_babel_pattern("h 'at' a") - assert "at" in tokens - - def test_escaped_quote_outside(self) -> None: - """Two quotes '' outside a quoted section produce literal quote.""" - tokens = _tokenize_babel_pattern("h''mm") - assert "'" in tokens - - def test_escaped_quote_inside(self) -> None: - """Two quotes '' inside quoted text produce literal quote.""" - tokens = _tokenize_babel_pattern("h 'o''clock' a") - assert "o'clock" in tokens - - def test_irish_locale_pattern(self) -> None: - """Quoted literals in locale patterns.""" - tokens = _tokenize_babel_pattern("d MMMM 'de' yyyy") - assert "de" in tokens - assert "d" in tokens - assert "yyyy" in tokens - - def 
test_standard_pattern_unchanged(self) -> None: - """Standard patterns without quotes work correctly.""" - tokens = _tokenize_babel_pattern("yyyy-MM-dd") - assert tokens == ["yyyy", "-", "MM", "-", "dd"] - - def test_latvian_pattern(self) -> None: - """Latvian date pattern d.MM.yyyy.""" - tokens = _tokenize_babel_pattern("d.MM.yyyy") - assert tokens == ["d", ".", "MM", ".", "yyyy"] - - def test_empty_pattern(self) -> None: - """Empty pattern produces empty token list.""" - assert _tokenize_babel_pattern("") == [] - - def test_unclosed_quote(self) -> None: - """Unclosed quote at end is handled gracefully.""" - tokens = _tokenize_babel_pattern("h 'unclosed") - assert "h" in tokens - assert "unclosed" in tokens - - def test_empty_quoted_section(self) -> None: - """Empty quotes '' produce single quote, not empty token.""" - tokens = _tokenize_babel_pattern("a''b") - assert "'" in tokens - assert "a" in tokens - assert "b" in tokens - - def test_adjacent_quoted_sections(self) -> None: - """Multiple adjacent quotes produce multiple literal quotes.""" - tokens = _tokenize_babel_pattern("''''") - assert tokens.count("'") == 2 - - def test_just_two_quotes(self) -> None: - """Just '' produces single quote.""" - tokens = _tokenize_babel_pattern("''") - assert "'" in tokens - - def test_three_quotes(self) -> None: - """Three quotes: first two produce quote, third starts section.""" - tokens = _tokenize_babel_pattern("'''") - assert "'" in tokens - - def test_real_world_german_pattern(self) -> None: - """German pattern with quoted 'um' literal.""" - tokens = _tokenize_babel_pattern("d. 
MMMM yyyy 'um' HH:mm") - assert "um" in tokens - assert "d" in tokens - assert "MMMM" in tokens - - def test_real_world_at_pattern(self) -> None: - """Pattern with 'at' literal.""" - tokens = _tokenize_babel_pattern( - "EEEE, MMMM d, y 'at' h:mm a" - ) - assert "at" in tokens - - def test_pattern_ending_in_quote(self) -> None: - """Pattern ending with unclosed quote handled gracefully.""" - tokens = _tokenize_babel_pattern("yyyy 'test") - assert "yyyy" in tokens - assert "test" in tokens - - def test_russian_quoted_literal(self) -> None: - """Russian pattern with quoted Cyrillic year marker.""" - pattern = "d MMMM y '\u0433'." - tokens = _tokenize_babel_pattern(pattern) - assert "\u0433" in tokens - assert "d" in tokens - assert "MMMM" in tokens - assert "y" in tokens - assert "." in tokens - - def test_spanish_quoted_de(self) -> None: - """Spanish pattern d 'de' MMMM 'de' y with quoted 'de'.""" - tokens = _tokenize_babel_pattern("d 'de' MMMM 'de' y") - assert "de" in tokens - assert "d" in tokens - assert "MMMM" in tokens - assert "y" in tokens - - -# ============================================================================ -# _extract_datetime_separator -# ============================================================================ - - -class TestExtractDatetimeSeparator: - """Test _extract_datetime_separator edge cases.""" - - def test_normal_order(self) -> None: - """en_US uses date-first order.""" - locale = Locale.parse("en_US") - separator, is_time_first = _extract_datetime_separator(locale) - assert isinstance(separator, str) - assert is_time_first is False - - def test_fallback_on_missing(self) -> None: - """Missing datetime_format returns fallback space.""" - mock_locale = MagicMock() - mock_locale.datetime_formats.get.return_value = None - separator, is_time_first = _extract_datetime_separator(mock_locale) - assert separator == " " - assert is_time_first is False - - def test_missing_placeholders(self) -> None: - """Pattern without placeholders 
returns fallback.""" - mock_locale = MagicMock() - mock_locale.datetime_formats.get.return_value = ( - "no placeholders here" - ) - separator, is_time_first = _extract_datetime_separator(mock_locale) - assert separator == " " - assert is_time_first is False - - def test_reversed_order(self) -> None: - """Pattern with {0} before {1} detects time-first.""" - mock_locale = MagicMock() - mock_locale.datetime_formats.get.return_value = "{0} at {1}" - separator, is_time_first = _extract_datetime_separator(mock_locale) - assert separator == " at " - assert is_time_first is True - - def test_adjacent_placeholders(self) -> None: - """Adjacent placeholders return fallback separator.""" - mock_locale = MagicMock() - mock_locale.datetime_formats.get.return_value = "{1}{0}" - separator, is_time_first = _extract_datetime_separator(mock_locale) - assert separator == " " - assert is_time_first is False - - def test_exception_handling(self) -> None: - """AttributeError returns fallback.""" - mock_locale = MagicMock() - mock_locale.datetime_formats.get.side_effect = AttributeError( - "mock error" - ) - separator, is_time_first = _extract_datetime_separator(mock_locale) - assert separator == " " - assert is_time_first is False - - -# ============================================================================ -# _get_date_patterns Exception Handling -# ============================================================================ - - -class TestGetDatePatternsExceptions: - """Test _get_date_patterns exception handling.""" - - def test_unknown_locale_returns_empty(self) -> None: - """Unknown locale returns empty tuple.""" - _get_date_patterns.cache_clear() - assert _get_date_patterns("xx-UNKNOWN") == () - - def test_invalid_format_returns_empty(self) -> None: - """Invalid format returns empty tuple.""" - _get_date_patterns.cache_clear() - assert _get_date_patterns("not-valid-at-all-xyz-123") == () - - def test_valid_locale_returns_patterns(self) -> None: - """Valid locale returns 
non-empty patterns.""" - _get_date_patterns.cache_clear() - assert len(_get_date_patterns("en-US")) > 0 - - def test_attribute_error_in_pattern(self) -> None: - """AttributeError accessing pattern falls back to str(fmt).""" - _get_date_patterns.cache_clear() - - mock_format = MagicMock() - del mock_format.pattern - - with patch.object(Locale, "parse") as mock_parse: - mock_locale = MagicMock() - mock_locale.date_formats = { - "short": mock_format, "medium": mock_format, - "long": mock_format, "full": mock_format, - } - mock_parse.return_value = mock_locale - _get_date_patterns.cache_clear() - patterns = _get_date_patterns("mock-locale-attr-err") - - assert len(patterns) > 0 - - def test_raises_babel_import_error_when_babel_missing(self) -> None: - """Raises BabelImportError when Babel unavailable.""" - _get_date_patterns.cache_clear() - _bc._babel_available = None - - original_import = builtins.__import__ - - def mock_import( - name: str, - globals_: dict[str, object] | None = None, - locals_: dict[str, object] | None = None, - fromlist: tuple[str, ...] 
= (), - level: int = 0, - ) -> object: - if name == "babel": - msg = "No module named 'babel'" - raise ImportError(msg) - return original_import(name, globals_, locals_, fromlist, level) - - try: - with patch.object( - builtins, "__import__", side_effect=mock_import - ): - with pytest.raises( - ImportError, match="parse" - ) as exc_info: - _get_date_patterns("en_US") - assert exc_info.typename == "BabelImportError" - assert "parse_date" in str(exc_info.value) - finally: - _bc._babel_available = None - - def test_babel_import_error_feature_name(self) -> None: - """BabelImportError contains correct feature name.""" - _get_date_patterns.cache_clear() - _bc._babel_available = None - - babel_modules_backup = {} - babel_keys = [ - k for k in sys.modules - if k == "babel" or k.startswith("babel.") - ] - for key in babel_keys: - babel_modules_backup[key] = sys.modules.pop(key, None) - - try: - original_import = builtins.__import__ - - def mock_import( - name: str, - globals_: dict[str, object] | None = None, - locals_: dict[str, object] | None = None, - fromlist: tuple[str, ...] 
= (), - level: int = 0, - ) -> object: - if name == "babel" or name.startswith("babel."): - msg = f"No module named '{name}'" - raise ImportError(msg) - return original_import( - name, globals_, locals_, fromlist, level - ) - - with patch.object( - builtins, "__import__", side_effect=mock_import - ): - with pytest.raises( - ImportError, match="parse" - ) as exc_info: - _get_date_patterns("en_US") - assert "parse_date" in str(exc_info.value) - finally: - for key, value in babel_modules_backup.items(): - if value is not None: - sys.modules[key] = value - _get_date_patterns.cache_clear() - _bc._babel_available = None - - -# ============================================================================ -# _get_datetime_patterns Exception Handling -# ============================================================================ - - -class TestGetDatetimePatternsExceptions: - """Test _get_datetime_patterns exception handling.""" - - def test_unknown_locale_returns_empty(self) -> None: - """Unknown locale returns empty tuple.""" - _get_datetime_patterns.cache_clear() - assert _get_datetime_patterns("xx-UNKNOWN") == () - - def test_invalid_format_returns_empty(self) -> None: - """Invalid format returns empty tuple.""" - _get_datetime_patterns.cache_clear() - assert _get_datetime_patterns("invalid-locale-format-xyz") == () - - def test_valid_locale_returns_patterns(self) -> None: - """Valid locale returns non-empty patterns.""" - _get_datetime_patterns.cache_clear() - assert len(_get_datetime_patterns("en-US")) > 0 - - def test_cldr_pattern_success_path(self) -> None: - """Successful CLDR datetime pattern extraction via mock.""" - _get_datetime_patterns.cache_clear() - _get_date_patterns.cache_clear() - - class MockDateTimeFormat: - def __init__(self, pattern_str: str) -> None: - self._pattern = pattern_str - - @property - def pattern(self) -> str: - return self._pattern - - mock_short = MockDateTimeFormat("M/d/yy, h:mm a") - mock_medium = MockDateTimeFormat("MMM d, yyyy, 
h:mm:ss a") - mock_long = MockDateTimeFormat("MMMM d, yyyy 'at' h:mm:ss a") - - with patch.object(Locale, "parse") as mock_parse: - mock_locale = MagicMock() - mock_datetime_formats = MagicMock() - mock_datetime_formats.__getitem__ = MagicMock( - side_effect=lambda k: { - "short": mock_short, - "medium": mock_medium, - "long": mock_long, - }.get(k, mock_short) - ) - mock_datetime_formats.get = MagicMock( - return_value="{1}, {0}" - ) - mock_locale.datetime_formats = mock_datetime_formats - - mock_date_format = MockDateTimeFormat("M/d/yy") - mock_date_formats = MagicMock() - mock_date_formats.__getitem__ = MagicMock( - return_value=mock_date_format - ) - mock_locale.date_formats = mock_date_formats - mock_parse.return_value = mock_locale - - _get_datetime_patterns.cache_clear() - _get_date_patterns.cache_clear() - patterns = _get_datetime_patterns("mock-cldr-success-v1") - - assert len(patterns) > 0 - pattern_str = " ".join(p[0] for p in patterns) - assert "%" in pattern_str - - def test_attribute_error_in_pattern(self) -> None: - """AttributeError accessing datetime pattern handled gracefully.""" - _get_datetime_patterns.cache_clear() - _get_date_patterns.cache_clear() - - class RaisingFormat: - @property - def pattern(self) -> str: - msg = "no pattern attribute" - raise AttributeError(msg) - - mock_format = RaisingFormat() - - with patch.object(Locale, "parse") as mock_parse: - mock_locale = MagicMock() - mock_datetime_formats = MagicMock() - mock_datetime_formats.__getitem__ = MagicMock( - return_value=mock_format - ) - mock_datetime_formats.get = MagicMock(return_value=None) - mock_locale.datetime_formats = mock_datetime_formats - mock_date_formats = MagicMock() - mock_date_formats.__getitem__ = MagicMock( - return_value=mock_format - ) - mock_locale.date_formats = mock_date_formats - mock_parse.return_value = mock_locale - - _get_datetime_patterns.cache_clear() - _get_date_patterns.cache_clear() - patterns = _get_datetime_patterns( - 
"mock-locale-datetime-attr-err-v3" - ) - - assert len(patterns) > 0 - - def test_key_error_via_missing_key(self) -> None: - """KeyError accessing datetime style handled gracefully.""" - _get_datetime_patterns.cache_clear() - _get_date_patterns.cache_clear() - - with patch.object(Locale, "parse") as mock_parse: - mock_locale = MagicMock() - mock_datetime_formats = MagicMock() - mock_datetime_formats.__getitem__ = MagicMock( - side_effect=KeyError("No format") - ) - mock_datetime_formats.get = MagicMock(return_value=None) - mock_locale.datetime_formats = mock_datetime_formats - mock_date_formats = MagicMock() - mock_date_formats.__getitem__ = MagicMock( - side_effect=KeyError("No format") - ) - mock_locale.date_formats = mock_date_formats - mock_parse.return_value = mock_locale - - _get_datetime_patterns.cache_clear() - _get_date_patterns.cache_clear() - patterns = _get_datetime_patterns( - "mock-locale-keyerror-v2" - ) - - assert patterns == () - - def test_raises_babel_import_error_when_babel_missing(self) -> None: - """Raises BabelImportError when Babel unavailable.""" - _get_datetime_patterns.cache_clear() - _get_date_patterns.cache_clear() - _bc._babel_available = None - - original_import = builtins.__import__ - - def mock_import( - name: str, - globals_: dict[str, object] | None = None, - locals_: dict[str, object] | None = None, - fromlist: tuple[str, ...] 
= (), - level: int = 0, - ) -> object: - if name == "babel": - msg = "No module named 'babel'" - raise ImportError(msg) - return original_import(name, globals_, locals_, fromlist, level) - - try: - with patch.object( - builtins, "__import__", side_effect=mock_import - ): - with pytest.raises( - ImportError, match="parse" - ) as exc_info: - _get_datetime_patterns("en_US") - assert exc_info.typename == "BabelImportError" - assert "parse_datetime" in str(exc_info.value) - finally: - _bc._babel_available = None - - def test_babel_import_error_feature_name(self) -> None: - """BabelImportError contains correct feature name.""" - _get_datetime_patterns.cache_clear() - _get_date_patterns.cache_clear() - _bc._babel_available = None - - babel_modules_backup = {} - babel_keys = [ - k for k in sys.modules - if k == "babel" or k.startswith("babel.") - ] - for key in babel_keys: - babel_modules_backup[key] = sys.modules.pop(key, None) - - try: - original_import = builtins.__import__ - - def mock_import( - name: str, - globals_: dict[str, object] | None = None, - locals_: dict[str, object] | None = None, - fromlist: tuple[str, ...] 
= (), - level: int = 0, - ) -> object: - if name == "babel" or name.startswith("babel."): - msg = f"No module named '{name}'" - raise ImportError(msg) - return original_import( - name, globals_, locals_, fromlist, level - ) - - with patch.object( - builtins, "__import__", side_effect=mock_import - ): - with pytest.raises( - ImportError, match="parse" - ) as exc_info: - _get_datetime_patterns("en_US") - assert "parse_datetime" in str(exc_info.value) - finally: - for key, value in babel_modules_backup.items(): - if value is not None: - sys.modules[key] = value - _get_datetime_patterns.cache_clear() - _get_date_patterns.cache_clear() - _bc._babel_available = None - - -# ============================================================================ -# _preprocess_datetime_input -# ============================================================================ - - -class TestPreprocessDatetimeInput: - """Test _preprocess_datetime_input function.""" - - def test_with_has_era_true(self) -> None: - """has_era=True triggers _strip_era.""" - result = _preprocess_datetime_input("28 Jan 2025 AD", has_era=True) - assert "AD" not in result - assert result == "28 Jan 2025" - - def test_with_has_era_false(self) -> None: - """has_era=False returns value unchanged.""" - value = "2025-01-28 14:30:00" - assert _preprocess_datetime_input(value, has_era=False) == value - - def test_with_era_and_timezone(self) -> None: - """Era is stripped but timezone preserved.""" - result = _preprocess_datetime_input( - "28 Jan 2025 AD PST", has_era=True - ) - assert "AD" not in result - assert "PST" in result - - -# ============================================================================ -# _babel_to_strptime: Timezone Token Handling -# ============================================================================ - - -class TestBabelToStrptimeTimezoneToken: - """Test _babel_to_strptime timezone token handling.""" - - def test_timezone_z(self) -> None: - """Timezone token 'z' is removed from pattern.""" 
- pattern, has_era = _babel_to_strptime("d MMM y HH:mm z") - assert has_era is False - assert "z" not in pattern - - def test_timezone_zzzz(self) -> None: - """Timezone token 'zzzz' is removed.""" - pattern, has_era = _babel_to_strptime( - "MMMM d, y 'at' h:mm a zzzz" - ) - assert has_era is False - assert "zzzz" not in pattern - - def test_timezone_v(self) -> None: - """Timezone token 'v' is removed.""" - pattern, has_era = _babel_to_strptime("d MMM y HH:mm v") - assert has_era is False - assert "v" not in pattern - - def test_timezone_vvvv(self) -> None: - """Timezone token 'vvvv' is removed.""" - pattern, has_era = _babel_to_strptime("d MMM y HH:mm vvvv") - assert has_era is False - assert "vvvv" not in pattern - - def test_timezone_o(self) -> None: - """Timezone token 'O' is removed.""" - pattern, has_era = _babel_to_strptime("d MMM y HH:mm O") - assert has_era is False - assert "O" not in pattern - - def test_both_era_and_timezone(self) -> None: - """Both era and timezone tokens handled correctly.""" - pattern, has_era = _babel_to_strptime("d MMM y G HH:mm z") - assert has_era is True - assert "G" not in pattern - assert "z" not in pattern - - def test_none_token_fallthrough(self) -> None: - """None-mapped token that is not era is silently dropped.""" - from ftllexengine.parsing import dates as dates_module - - original_map = dates_module._BABEL_TOKEN_MAP.copy() - modified_map = original_map.copy() - modified_map["QQQ"] = None - - with patch.object( - dates_module, "_BABEL_TOKEN_MAP", modified_map - ): - pattern, has_era = _babel_to_strptime( - "d MMM y QQQ HH:mm" - ) - assert has_era is False - assert "QQQ" not in pattern - - def test_zzzz_localized_gmt_skipped(self) -> None: - """ZZZZ (localized GMT) is skipped entirely.""" - pattern, has_era = _babel_to_strptime("d MMM y HH:mm ZZZZ") - assert has_era is False - assert "ZZZZ" not in pattern - assert "%z" not in pattern - - def test_trailing_whitespace_normalized(self) -> None: - """Trailing whitespace from 
skipped tokens is stripped.""" - pattern, has_era = _babel_to_strptime("HH:mm zzzz") - assert has_era is False - assert pattern == "%H:%M" - - def test_multiple_trailing_spaces_normalized(self) -> None: - """Multiple trailing spaces from skipped tokens stripped.""" - pattern, has_era = _babel_to_strptime("HH:mm zzzz") - assert has_era is False - assert pattern == "%H:%M" - - -# ============================================================================ -# Babel Datetime Format Conversion (Mock) -# ============================================================================ - - -class TestBabelDatetimeFormatConversion: - """Test Babel datetime format conversion with mock pattern objects.""" - - def test_babel_datetime_format_with_mock(self) -> None: - """Mock Babel to return pattern object for datetime_formats.""" - from ftllexengine.parsing import dates - - dates._get_datetime_patterns.cache_clear() - dates._get_date_patterns.cache_clear() - - try: - mock_pattern = Mock() - mock_pattern.pattern = "M/d/yy, h:mm a" - - mock_locale = Mock() - mock_locale.datetime_formats = { - "short": mock_pattern, "medium": mock_pattern, - } - mock_date_format = Mock() - mock_date_format.pattern = "M/d/yy" - mock_locale.date_formats = {"short": mock_date_format} - - with patch("babel.Locale") as mock_locale_class: - mock_locale_class.parse.return_value = mock_locale - patterns = dates._get_datetime_patterns( - "test_mock_locale" - ) - assert len(patterns) > 0 - finally: - dates._get_datetime_patterns.cache_clear() - dates._get_date_patterns.cache_clear() - - -# ============================================================================ -# Quoted Literals in CLDR Patterns -# ============================================================================ - - -class TestQuotedLiteralsInCLDRPatterns: - """Test non-empty quoted literals in CLDR date patterns.""" - - def test_parse_date_russian(self) -> None: - """Russian date parsing with short format.""" - result, errors = 
parse_date("28.01.2025", "ru_RU") - assert not errors - assert result is not None - assert result.year == 2025 - - def test_parse_date_spanish(self) -> None: - """Spanish short format d/M/yy.""" - result, errors = parse_date("28/01/25", "es_ES") - assert not errors - assert result is not None - assert result.year == 2025 - - def test_parse_date_portuguese(self) -> None: - """Portuguese date format.""" - result, errors = parse_date("28/01/2025", "pt_PT") - assert not errors - assert result is not None - assert result.year == 2025 - - -# ============================================================================ -# Time-First Datetime Ordering -# ============================================================================ - - -class TestDatetimeTimeFirstOrdering: - """Test time-first datetime ordering (mock locales).""" - - def test_time_first_ordering(self) -> None: - """Mock locale with time-first ordering generates patterns.""" - _get_datetime_patterns.cache_clear() - - original_parse = Locale.parse - - def mock_parse_time_first(locale_str: str) -> MagicMock: - real_locale = original_parse(locale_str) - mock_locale = MagicMock(spec=Locale) - - time_first_pattern = "{0} {1}" - mock_datetime_format = MagicMock( - return_value=time_first_pattern - ) - mock_datetime_format.__str__ = MagicMock( # type: ignore[method-assign] - return_value=time_first_pattern - ) - mock_datetime_format.pattern = time_first_pattern - - mock_locale.datetime_formats = { - "short": mock_datetime_format, - "medium": mock_datetime_format, - "long": mock_datetime_format, - } - mock_locale.date_formats = real_locale.date_formats - return mock_locale - - with patch( - "babel.Locale.parse", side_effect=mock_parse_time_first - ): - patterns = _get_datetime_patterns("en_US") - - assert len(patterns) > 0 - - time_first_found = False - for pattern, _has_era in patterns: - time_pos = min( - ( - pattern.find(t) - for t in ["%H", "%I"] - if pattern.find(t) != -1 - ), - default=-1, - ) - date_pos = min( 
- ( - pattern.find(d) - for d in ["%d", "%m", "%Y"] - if pattern.find(d) != -1 - ), - default=-1, - ) - if ( - time_pos != -1 - and date_pos != -1 - and time_pos < date_pos - ): - time_first_found = True - break - - assert time_first_found - _get_datetime_patterns.cache_clear() - - def test_parse_datetime_with_time_first_locale(self) -> None: - """Integration: parse datetime with time-first mock locale.""" - _get_datetime_patterns.cache_clear() - - original_parse = Locale.parse - - def mock_parse_time_first(locale_str: str) -> MagicMock: - real_locale = original_parse(locale_str) - mock_locale = MagicMock(spec=Locale) - - time_first_pattern = "{0} {1}" - mock_datetime_format = MagicMock( - return_value=time_first_pattern - ) - mock_datetime_format.__str__ = MagicMock( # type: ignore[method-assign] - return_value=time_first_pattern - ) - mock_locale.datetime_formats = { - "short": mock_datetime_format, - "medium": mock_datetime_format, - } - mock_locale.date_formats = real_locale.date_formats - return mock_locale - - with patch( - "babel.Locale.parse", side_effect=mock_parse_time_first - ): - result, _errors = parse_datetime( - "14:30 28.01.2025", "de_DE" - ) - - assert result is None or result.year in (2025, 1925) - _get_datetime_patterns.cache_clear() - - -# ============================================================================ -# BabelImportError Structure -# ============================================================================ - - -class TestBabelImportErrorBehavior: - """Test BabelImportError structure and message format.""" - - def test_babel_import_error_structure(self) -> None: - """BabelImportError has correct structure and message.""" - from ftllexengine.core.babel_compat import BabelImportError - - error = BabelImportError("parse_date") - assert error.feature == "parse_date" - assert "parse_date" in str(error) - assert "pip install ftllexengine[babel]" in str(error) - assert isinstance(error, ImportError) - - def 
test_get_date_patterns_returns_valid_patterns(self) -> None: - """_get_date_patterns returns valid (pattern, has_era) tuples.""" - from ftllexengine.parsing import dates - - dates._get_date_patterns.cache_clear() - patterns = dates._get_date_patterns("en_US") - - assert isinstance(patterns, tuple) - assert len(patterns) > 0 - for pattern, has_era in patterns: - assert isinstance(pattern, str) - assert isinstance(has_era, bool) - - def test_get_datetime_patterns_returns_valid_patterns(self) -> None: - """_get_datetime_patterns returns valid (pattern, has_era) tuples.""" - from ftllexengine.parsing import dates - - dates._get_datetime_patterns.cache_clear() - patterns = dates._get_datetime_patterns("en_US") - - assert isinstance(patterns, tuple) - assert len(patterns) > 0 - for pattern, has_era in patterns: - assert isinstance(pattern, str) - assert isinstance(has_era, bool) - - def test_parse_date_works(self) -> None: - """parse_date works correctly when Babel is installed.""" - result, errors = parse_date("2025-01-28", "en_US") - assert not errors - assert result is not None - assert result.year == 2025 - - def test_parse_datetime_works(self) -> None: - """parse_datetime works correctly when Babel is installed.""" - result, errors = parse_datetime("2025-01-28 14:30", "en_US") - assert not errors - assert result is not None - assert result.year == 2025 - assert result.hour == 14 - - -# ============================================================================ -# Hypothesis Property Tests -# ============================================================================ - - -class TestDatetimeProperties: - """Property-based tests for datetime parsing.""" - - @given( - hour=st.integers(min_value=0, max_value=23), - minute=st.integers(min_value=0, max_value=59), - ) - def test_parse_datetime_various_times( - self, hour: int, minute: int - ) -> None: - """PROPERTY: Datetime patterns handle various times.""" - time_of_day = "morning" if hour < 12 else "afternoon" - 
event(f"time_of_day={time_of_day}") - - date_str = f"28.01.25, {hour:02d}:{minute:02d}" - result, errors = parse_datetime(date_str, "de_DE") - assert not errors - if result is not None: - assert result.hour == hour - assert result.minute == minute - - @given( - year=st.integers(min_value=2020, max_value=2030), - month=st.integers(min_value=1, max_value=12), - day=st.integers(min_value=1, max_value=28), - hour=st.integers(min_value=0, max_value=23), - minute=st.integers(min_value=0, max_value=59), - ) - def test_datetime_roundtrip( - self, - year: int, - month: int, - day: int, - hour: int, - minute: int, - ) -> None: - """PROPERTY: Datetime ISO formatted then parsed preserves values.""" - event(f"year={year}") - time_of_day = "morning" if hour < 12 else "afternoon" - event(f"time_of_day={time_of_day}") - - dt = datetime(year, month, day, hour, minute, 0, tzinfo=UTC) - iso_str = dt.strftime("%Y-%m-%d %H:%M:%S") - result, errors = parse_datetime(iso_str, "en_US") - - assert not errors - if result is not None: - assert result.year == year - assert result.month == month - assert result.day == day - assert result.hour == hour - assert result.minute == minute - - -# ============================================================================ -# Integration: Full Coverage Verification -# ============================================================================ - - -class TestIntegrationFullCoverage: - """Integration test exercising multiple code branches.""" - - def test_parse_datetime_exercises_all_branches(self) -> None: - """Exercise ISO, CLDR, error, and empty paths.""" - test_cases = [ - ("2025-01-28T14:30:00", "en_US", True), - ("1/28/25, 2:30 PM", "en_US", True), - ("not-a-datetime", "en_US", False), - ("", "en_US", False), - ] - for datetime_str, locale, should_succeed in test_cases: - result, errors = parse_datetime(datetime_str, locale) - if should_succeed: - assert result is not None or len(errors) > 0 - else: - assert len(errors) > 0 - assert result is None 
- - -# ============================================================================ -# DATETIME SEPARATOR AND BABEL PATTERN TOKENIZER COVERAGE -# ============================================================================ - - -class TestTokenizeBabelPatternEdgeCases: - """_tokenize_babel_pattern: patterns starting with a quote and unclosed sections.""" - - def test_quoted_section_with_escaped_quote(self) -> None: - """Escaped quote '' inside a quoted literal is unescaped to a single quote.""" - pattern = "'It''s a test'" - tokens = _tokenize_babel_pattern(pattern) - assert any("It's a test" in t for t in tokens) - - def test_unclosed_quoted_section(self) -> None: - """Unclosed quoted literal collects remaining characters.""" - pattern = "'unclosed" - tokens = _tokenize_babel_pattern(pattern) - assert any("unclosed" in t for t in tokens) - - -class TestDatesQuotedLiteral: - """Non-empty quoted literal in Babel date pattern tokenizes correctly.""" - - def test_quoted_literal_in_pattern(self) -> None: - """Spanish-style quoted separator 'de' is extracted as a token.""" - pattern = "d 'de' MMMM 'de' y" - tokens = _tokenize_babel_pattern(pattern) - assert "de" in tokens - - -class TestParseDateFourDigitYear: - """4-digit year inputs are accepted for locales whose CLDR short format uses yy. - - CLDR short patterns often specify a 2-digit year (e.g. lv-LV: dd.MM.yy, - en-US: M/d/yy). Documents commonly write dates with a 4-digit year for - clarity and unambiguity. Both forms must parse successfully. 
- """ - - def test_lv_lv_two_digit_year_parses(self) -> None: - """lv-LV short format (dd.MM.yy) parses 2-digit year correctly.""" - result, errors = parse_date("15.01.26", "lv_LV") - assert not errors - assert result == date(2026, 1, 15) - - def test_lv_lv_four_digit_year_parses(self) -> None: - """lv-LV common form (dd.MM.yyyy) parses 4-digit year correctly.""" - result, errors = parse_date("15.01.2026", "lv_LV") - assert not errors - assert result == date(2026, 1, 15) - - def test_lv_lv_four_digit_year_roundtrip_identity(self) -> None: - """Parse("15.01.2026", lv_LV) yields the same date as parse("15.01.26", lv_LV).""" - result_2, _ = parse_date("15.01.26", "lv_LV") - result_4, _ = parse_date("15.01.2026", "lv_LV") - assert result_2 == result_4 - - def test_de_de_four_digit_year_parses(self) -> None: - """de-DE short format (dd.MM.yy) accepts 4-digit year variant.""" - result, errors = parse_date("28.01.2025", "de_DE") - assert not errors - assert result == date(2025, 1, 28) - - def test_pl_pl_four_digit_year_parses(self) -> None: - """pl-PL short format accepts 4-digit year variant.""" - result, errors = parse_date("28.01.2025", "pl_PL") - assert not errors - assert result == date(2025, 1, 28) - - def test_two_digit_year_still_expands_via_cldr_semantics(self) -> None: - """2-digit input still matches first (CLDR %y expansion: 00-68 -> 2000-2068).""" - # %y in Python strptime: 00-68 -> 2000-2068, 69-99 -> 1969-1999 - result_short, _ = parse_date("28.01.68", "lv_LV") - assert result_short is not None - assert result_short.year == 2068 # %y expansion - - def test_extract_cldr_patterns_includes_four_digit_variant(self) -> None: - """_get_date_patterns for lv_LV includes both %y and %Y variants for short style.""" - patterns = _get_date_patterns("lv_LV") - strptime_patterns = [p for p, _ in patterns] - has_two_digit = any("%y" in p for p in strptime_patterns) - has_four_digit = any(("%Y" in p and ".%Y" in p) or "%Y" in p for p in strptime_patterns) - assert 
has_two_digit, "2-digit year pattern (%y) must be present for lv_LV" - assert has_four_digit, "4-digit year variant (%Y) must be generated for lv_LV" +"""Aggregated parsing dates test surface.""" + +from tests.parsing_dates_cases.babel_datetime_format_conversion_mock import * # noqa: F403 - re-export split test surface +from tests.parsing_dates_cases.babel_import_error_structure import * # noqa: F403 - re-export split test surface +from tests.parsing_dates_cases.babel_to_strptime_timezone_token_handling import * # noqa: F403 - re-export split test surface +from tests.parsing_dates_cases.datetime_separator_and_babel_pattern_tokenizer_coverage import * # noqa: F403 - re-export split test surface +from tests.parsing_dates_cases.extract_datetime_separator import * # noqa: F403 - re-export split test surface +from tests.parsing_dates_cases.get_date_patterns_exception_handling import * # noqa: F403 - re-export split test surface +from tests.parsing_dates_cases.get_datetime_patterns_exception_handling import * # noqa: F403 - re-export split test surface +from tests.parsing_dates_cases.hypothesis_property_tests import * # noqa: F403 - re-export split test surface +from tests.parsing_dates_cases.integration_full_coverage_verification import * # noqa: F403 - re-export split test surface +from tests.parsing_dates_cases.parse_date_cases import * # noqa: F403 - re-export split test surface +from tests.parsing_dates_cases.parse_datetime_cases import * # noqa: F403 - re-export split test surface +from tests.parsing_dates_cases.preprocess_datetime_input import * # noqa: F403 - re-export split test surface +from tests.parsing_dates_cases.quoted_literals_in_cldr_patterns import * # noqa: F403 - re-export split test surface +from tests.parsing_dates_cases.time_first_datetime_ordering import * # noqa: F403 - re-export split test surface +from tests.parsing_dates_cases.tokenize_babel_pattern import * # noqa: F403 - re-export split test surface +from 
tests.parsing_dates_cases.unknown_locale_handling import * # noqa: F403 - re-export split test surface diff --git a/tests/test_parsing_numbers.py b/tests/test_parsing_numbers.py index 18b18d17..56c0eb99 100644 --- a/tests/test_parsing_numbers.py +++ b/tests/test_parsing_numbers.py @@ -48,6 +48,15 @@ def test_parse_decimal_de_de(self) -> None: assert not errors assert result == Decimal("0.01") + def test_parse_decimal_ar_eg_native_digits(self) -> None: + """Parse Arabic-Indic digits for locales with non-Latin defaults.""" + result, errors = parse_decimal( + "\u0661\u0662\u066c\u0663\u0664\u0665\u066b\u0666\u0667", + "ar_EG", + ) + assert not errors + assert result == Decimal("12345.67") + def test_parse_decimal_financial_precision(self) -> None: """Decimal preserves financial precision.""" amount, errors = parse_decimal("100,50", "lv_LV") @@ -109,6 +118,79 @@ def test_parse_decimal_invalid_returns_error(self) -> None: assert result is None assert errors[0].parse_type == "decimal" + def test_parse_decimal_type_error_returns_error(self) -> None: + """Non-string input returns error in tuple; function never raises.""" + result, errors = parse_decimal(1234, "en_US") # type: ignore[arg-type] + assert len(errors) > 0 + assert result is None + assert errors[0].parse_type == "decimal" + + def test_parse_decimal_falls_back_when_numbering_system_kw_rejected(self) -> None: + """Fallback call works when Babel parser does not accept numbering_system.""" + + def fake_parse_decimal(raw: str, *, locale: object, **kwargs: object) -> Decimal: + assert raw == "123.45" + assert locale is mock_locale + if "numbering_system" in kwargs: + msg = "numbering_system unsupported" + raise TypeError(msg) + return Decimal("123.45") + + mock_locale = MagicMock() + mock_locale.default_numbering_system = "arab" + mock_locale.number_symbols = { + "arab": {"group": "\u066c", "decimal": "\u066b"}, + } + decimal_format = MagicMock() + decimal_format.grouping = (3, 3) + mock_locale.decimal_formats = 
{None: decimal_format} + mock_cls = MagicMock() + mock_cls.parse.return_value = mock_locale + + with patch( + "ftllexengine.parsing.numbers.get_locale_class", return_value=mock_cls, + ), patch( + "ftllexengine.parsing.numbers.get_parse_decimal_func", + return_value=fake_parse_decimal, + ): + result, errors = parse_decimal("123.45", "en_US") + + assert not errors + assert result == Decimal("123.45") + + def test_parse_decimal_reuses_grouping_failure_reason_across_numbering_systems( + self, + ) -> None: + """Repeated invalid grouping across numbering systems returns one parse error.""" + + def fake_parse_decimal(raw: str, *, _locale: object, **kwargs: object) -> Decimal: + msg = f"unexpected parse attempt for {raw!r} with {kwargs!r}" + raise AssertionError(msg) + + mock_locale = MagicMock() + mock_locale.default_numbering_system = "arab" + mock_locale.number_symbols = { + "arab": {"group": ",", "decimal": "."}, + "latn": {"group": ",", "decimal": "."}, + } + decimal_format = MagicMock() + decimal_format.grouping = (3, 3) + mock_locale.decimal_formats = {None: decimal_format} + mock_cls = MagicMock() + mock_cls.parse.return_value = mock_locale + + with patch( + "ftllexengine.parsing.numbers.get_locale_class", return_value=mock_cls, + ), patch( + "ftllexengine.parsing.numbers.get_parse_decimal_func", + return_value=fake_parse_decimal, + ): + result, errors = parse_decimal("1,2,3", "en_US") + + assert result is None + assert len(errors) == 1 + assert "group separators not at standard digit-boundary positions" in errors[0].message + def test_parse_decimal_empty_returns_error(self) -> None: """Empty input returns error in tuple.""" result, errors = parse_decimal("", "en_US") @@ -183,6 +265,24 @@ def test_roundtrip_decimal_precision(self) -> None: assert not errors assert parsed == original + def test_roundtrip_decimal_ar_eg_with_rtl_marks(self) -> None: + """RTL locale output roundtrips through parse_decimal().""" + from ftllexengine.runtime.functions import number_format 
+ + original = Decimal("1234.56") + formatted = number_format( + original, "ar-EG", minimum_fraction_digits=2, use_grouping=True + ) + parsed, errors = parse_decimal(str(formatted), "ar_EG") + assert not errors + assert parsed == original + + def test_parse_decimal_ignores_bidi_isolation_marks(self) -> None: + """Invisible bidi controls are ignored at the parsing boundary.""" + parsed, errors = parse_decimal("\u2068123.45\u2069", "en_US") + assert not errors + assert parsed == Decimal("123.45") + class TestValidateGroupPositions: """Direct tests for _validate_group_positions branch coverage. diff --git a/tests/test_performance_regression.py b/tests/test_performance_regression.py index a37a79d6..ee8de3c7 100644 --- a/tests/test_performance_regression.py +++ b/tests/test_performance_regression.py @@ -28,6 +28,7 @@ from __future__ import annotations import time +from collections.abc import Callable import pytest from hypothesis import event, given, settings @@ -61,29 +62,63 @@ # ============================================================================== -def measure_parse_time(ftl: str) -> float: - """Measure time to parse FTL string (in seconds).""" +def _measure_best_time( + func: Callable[[], object], *, warmup_runs: int = 0, timed_runs: int = 1 +) -> float: + """Return the fastest observed runtime for one callable. + + Warmup runs reduce first-call effects, and multiple timed runs prevent + single-shot scheduler noise from masquerading as a regression. 
+ """ + for _ in range(warmup_runs): + func() + + best = float("inf") + for _ in range(timed_runs): + start = time.perf_counter() + func() + best = min(best, time.perf_counter() - start) + + return best + + +def measure_parse_time(ftl: str, *, warmup_runs: int = 0, timed_runs: int = 1) -> float: + """Measure stable time to parse FTL string (in seconds).""" parser = FluentParserV1() - start = time.perf_counter() - _ = parser.parse(ftl) - end = time.perf_counter() - return end - start + + def parse_resource() -> object: + return parser.parse(ftl) + + return _measure_best_time(parse_resource, warmup_runs=warmup_runs, timed_runs=timed_runs) + + +def measure_serialize_time( + resource: Resource, *, warmup_runs: int = 0, timed_runs: int = 1 +) -> float: + """Measure stable time to serialize resource (in seconds).""" + + def serialize_resource() -> object: + return serialize(resource) + + return _measure_best_time( + serialize_resource, warmup_runs=warmup_runs, timed_runs=timed_runs + ) -def measure_serialize_time(resource: Resource) -> float: - """Measure time to serialize resource (in seconds).""" - start = time.perf_counter() - _ = serialize(resource) - end = time.perf_counter() - return end - start +def measure_resolution_time( + bundle: FluentBundle, + msg_id: str, + args: dict[str, object], + *, + warmup_runs: int = 0, + timed_runs: int = 1, +) -> float: + """Measure stable time to resolve a message (in seconds).""" + def resolve() -> object: + return bundle.format_pattern(msg_id, args) # type: ignore[arg-type] -def measure_resolution_time(bundle: FluentBundle, msg_id: str, args: dict[str, object]) -> float: - """Measure time to resolve message (in seconds).""" - start = time.perf_counter() - _ = bundle.format_pattern(msg_id, args) # type: ignore[arg-type] - end = time.perf_counter() - return end - start + return _measure_best_time(resolve, warmup_runs=warmup_runs, timed_runs=timed_runs) # 
============================================================================== @@ -112,12 +147,7 @@ def test_parser_scales_linearly_with_message_count(self): messages = [f"msg{i} = Value {i}\n" for i in range(size)] ftl = "".join(messages) - # Warmup run to stabilize JIT/cache - _ = measure_parse_time(ftl) - - # Take minimum of 3 runs (more stable than mean/median) - time_measurements = [measure_parse_time(ftl) for _ in range(3)] - times.append(min(time_measurements)) + times.append(measure_parse_time(ftl, warmup_runs=1, timed_runs=3)) # Calculate normalized complexity ratios # For O(n): these should be close to 1.0 @@ -147,7 +177,7 @@ def test_parser_performance_baseline(self, message_count: int) -> None: messages = [f"msg{i} = Value {i}\n" for i in range(message_count)] ftl = "".join(messages) - parse_time = measure_parse_time(ftl) + parse_time = measure_parse_time(ftl, warmup_runs=1, timed_runs=5) # Calculate messages per second messages_per_sec = message_count / parse_time if parse_time > 0 else float("inf") @@ -165,7 +195,7 @@ def test_parser_handles_large_messages_efficiently(self): long_pattern = "x" * 1000 ftl = f"msg = {long_pattern}\n" - parse_time = measure_parse_time(ftl) + parse_time = measure_parse_time(ftl, warmup_runs=1, timed_runs=5) # Should parse in < 10ms assert parse_time < 0.01, f"Parser too slow for long patterns: {parse_time:.4f}s" @@ -200,12 +230,7 @@ def test_serializer_scales_linearly(self): messages = [f"msg{i} = Value {i}\n" for i in range(size)] resource = parser.parse("".join(messages)) - # Warmup run - _ = measure_serialize_time(resource) - - # Take minimum of 3 runs - time_measurements = [measure_serialize_time(resource) for _ in range(3)] - times.append(min(time_measurements)) + times.append(measure_serialize_time(resource, warmup_runs=1, timed_runs=3)) # Calculate normalized complexity ratios ratio_1_to_2 = (times[1] / times[0]) / (sizes[1] / sizes[0]) @@ -232,7 +257,7 @@ def test_serializer_performance_baseline(self): messages = 
[f"msg{i} = Value {i}\n" for i in range(200)] resource = parser.parse("".join(messages)) - serialize_time = measure_serialize_time(resource) + serialize_time = measure_serialize_time(resource, warmup_runs=1, timed_runs=5) # Calculate messages per second messages_per_sec = 200 / serialize_time if serialize_time > 0 else float("inf") @@ -368,7 +393,7 @@ def test_deeply_nested_select_expressions(self): ftl = f"msg = {nested}" # Should parse in reasonable time (< 100ms) - parse_time = measure_parse_time(ftl) + parse_time = measure_parse_time(ftl, warmup_runs=1, timed_runs=5) assert parse_time < 0.1, f"Parser too slow for nested selects: {parse_time:.4f}s" def test_very_large_resource(self): @@ -381,7 +406,7 @@ def test_very_large_resource(self): ftl = "".join(messages) # Should parse in < 1 second - parse_time = measure_parse_time(ftl) + parse_time = measure_parse_time(ftl, warmup_runs=1, timed_runs=5) assert parse_time < 1.0, f"Parser too slow for 1000 messages: {parse_time:.4f}s" # Should use reasonable memory (parse shouldn't copy excessively) @@ -394,7 +419,7 @@ def test_large_number_literals(self): # Create message with very large number ftl = f"msg = {{ {10**100} }}" - parse_time = measure_parse_time(ftl) + parse_time = measure_parse_time(ftl, warmup_runs=1, timed_runs=5) assert parse_time < 0.01, f"Parser too slow for large numbers: {parse_time:.4f}s" @given(st.lists(ftl_simple_messages(), min_size=10, max_size=50)) @@ -431,7 +456,7 @@ def test_no_regex_catastrophic_backtracking(self): # Use repetitive pattern that could trigger backtracking ftl = "msg = " + "a" * 50 + "x" - parse_time = measure_parse_time(ftl) + parse_time = measure_parse_time(ftl, warmup_runs=1, timed_runs=5) assert parse_time < 0.01, f"Possible catastrophic backtracking: {parse_time:.4f}s" def test_no_quadratic_string_concatenation(self): @@ -447,7 +472,7 @@ def test_no_quadratic_string_concatenation(self): resource = parser.parse("".join(messages)) # Serialize (should use efficient string 
building) - serialize_time = measure_serialize_time(resource) + serialize_time = measure_serialize_time(resource, warmup_runs=1, timed_runs=5) # Should be fast even for 500 messages (< 100ms) assert serialize_time < 0.1, ( diff --git a/tests/test_runtime_bundle.py b/tests/test_runtime_bundle.py index 45e98da9..b74ec752 100644 --- a/tests/test_runtime_bundle.py +++ b/tests/test_runtime_bundle.py @@ -1,2468 +1,6 @@ -"""Tests for runtime.bundle: FluentBundle resource loading, formatting, branch coverage.""" +"""Aggregated runtime bundle test surface.""" -from __future__ import annotations - -import logging -from typing import Any -from unittest.mock import Mock, patch - -import pytest -from hypothesis import assume, event, example, given -from hypothesis import strategies as st - -from ftllexengine.constants import MAX_LOCALE_LENGTH_HARD_LIMIT, MAX_SOURCE_SIZE -from ftllexengine.core.locale_utils import normalize_locale -from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError, ValidationError -from ftllexengine.integrity import FormattingIntegrityError, SyntaxIntegrityError -from ftllexengine.runtime import FluentBundle -from ftllexengine.runtime.cache_config import CacheConfig -from ftllexengine.runtime.function_bridge import FunctionRegistry -from ftllexengine.runtime.functions import create_default_registry -from ftllexengine.validation.resource import validate_resource - - -class TestFluentBundleCreation: - """Test FluentBundle initialization.""" - - def test_create_bundle_with_locale(self) -> None: - """Create bundle with locale code.""" - bundle = FluentBundle("lv_LV") - - assert bundle.locale == "lv_lv" - - def test_create_bundle_initializes_empty_registries(self) -> None: - """Bundle starts with empty message/term registries.""" - bundle = FluentBundle("en_US") - - assert len(bundle.get_message_ids()) == 0 - assert not bundle.has_message("any-message") - - -class TestFluentBundleAddResource: - """Test FluentBundle add_resource method.""" - - 
@pytest.fixture - def bundle(self) -> Any: - """Create bundle for testing.""" - return FluentBundle("lv_LV", strict=False) - - def test_add_resource_simple_message(self, bundle: Any) -> None: - """add_resource parses and registers simple message.""" - bundle.add_resource("hello = Sveiki, pasaule!") - - assert bundle.has_message("hello") - assert "hello" in bundle.get_message_ids() - - def test_add_resource_multiple_messages(self, bundle: Any) -> None: - """add_resource registers all messages from source.""" - source = """ -hello = Sveiki! -goodbye = Uz redzēšanos! -thanks = Paldies! -""" - bundle.add_resource(source) - - assert bundle.has_message("hello") - assert bundle.has_message("goodbye") - assert bundle.has_message("thanks") - assert len(bundle.get_message_ids()) == 3 - - def test_add_resource_message_with_variable(self, bundle: Any) -> None: - """add_resource handles messages with variables.""" - bundle.add_resource("welcome = Laipni lūdzam, { $name }!") - - assert bundle.has_message("welcome") - - def test_add_resource_message_with_attribute(self, bundle: Any) -> None: - """add_resource handles messages with attributes.""" - source = """ -button-save = Saglabāt - .tooltip = Saglabā ierakstu -""" - bundle.add_resource(source) - - assert bundle.has_message("button-save") - - def test_add_resource_with_junk_entries_continues(self, bundle: Any) -> None: - """add_resource with non-critical syntax errors creates junk but continues.""" - # Parser is robust - creates Junk entries for invalid syntax but doesn't crash - bundle.add_resource("invalid message syntax") - - # Bundle should still work, junk is just ignored - assert len(bundle.get_message_ids()) == 0 # No valid messages parsed - - def test_add_multiple_resources_accumulates(self, bundle: Any) -> None: - """Multiple add_resource calls accumulate messages.""" - bundle.add_resource("msg1 = First") - bundle.add_resource("msg2 = Second") - - assert bundle.has_message("msg1") - assert bundle.has_message("msg2") - 
assert len(bundle.get_message_ids()) == 2 - - -class TestFluentBundleFormatPattern: - """Test FluentBundle format_pattern method.""" - - @pytest.fixture - def bundle(self) -> Any: - """Create bundle with sample messages.""" - bundle = FluentBundle("lv_LV", strict=False) - bundle.add_resource(""" -hello = Sveiki, pasaule! -welcome = Laipni lūdzam, { $name }! -greeting = { $name } saka { $message } -button-save = Saglabāt - .tooltip = Saglabā ierakstu datubāzē -""") - return bundle - - def test_format_pattern_simple_message(self, bundle: Any) -> None: - """format_pattern returns simple message text.""" - result, errors = bundle.format_pattern("hello") - - assert result == "Sveiki, pasaule!" - assert errors == (), f"Unexpected errors: {errors}" - - def test_format_pattern_with_variable(self, bundle: Any) -> None: - """format_pattern substitutes variable from args.""" - result, errors = bundle.format_pattern("welcome", {"name": "Jānis"}) - - assert "Jānis" in result - assert "Laipni lūdzam" in result - assert errors == (), f"Unexpected errors: {errors}" - - def test_format_pattern_with_multiple_variables(self, bundle: Any) -> None: - """format_pattern substitutes multiple variables.""" - result, errors = bundle.format_pattern("greeting", {"name": "Anna", "message": "Sveiki"}) - - assert "Anna" in result - assert "Sveiki" in result - assert errors == (), f"Unexpected errors: {errors}" - - def test_format_pattern_missing_variable_uses_placeholder(self, bundle: Any) -> None: - """format_pattern handles missing variable gracefully.""" - result, errors = bundle.format_pattern("welcome", {}) - - # Should not crash, returns some fallback - assert isinstance(result, str) - assert len(errors) == 1, ( - f"Expected 1 error for missing variable, got {len(errors)}: {errors}" - ) - assert isinstance(errors[0], FrozenFluentError) - assert errors[0].category == ErrorCategory.REFERENCE - assert "variable" in str(errors[0]).lower() or "name" in str(errors[0]).lower() - - def 
test_format_pattern_with_attribute_parameter(self, bundle: Any) -> None: - """format_pattern accepts attribute parameter.""" - result, errors = bundle.format_pattern("button-save", attribute="tooltip") - - # Should successfully retrieve the .tooltip attribute - assert result == "Saglabā ierakstu datubāzē" - assert errors == (), f"Unexpected errors: {errors}" - - def test_format_pattern_missing_message_raises_error(self, bundle: Any) -> None: - """format_pattern for non-existent message raises FrozenFluentError.""" - result, errors = bundle.format_pattern("nonexistent-message") - assert len(errors) == 1 - assert isinstance(errors[0], FrozenFluentError) - assert errors[0].category == ErrorCategory.REFERENCE - assert "not found" in str(errors[0]).lower() - assert result == "{nonexistent-message}" - - def test_format_pattern_none_args(self, bundle: Any) -> None: - """format_pattern with args=None works for messages without variables.""" - result, errors = bundle.format_pattern("hello", None) - - assert result == "Sveiki, pasaule!" - assert errors == (), f"Unexpected errors: {errors}" - - def test_format_pattern_empty_args(self, bundle: Any) -> None: - """format_pattern with empty dict works.""" - result, errors = bundle.format_pattern("hello", {}) - - assert result == "Sveiki, pasaule!" 
- assert errors == (), f"Unexpected errors: {errors}" - - -class TestFluentBundleHasMessage: - """Test FluentBundle has_message method.""" - - @pytest.fixture - def bundle(self) -> Any: - """Create bundle with messages.""" - bundle = FluentBundle("en_US") - bundle.add_resource("existing = This message exists") - return bundle - - def test_has_message_returns_true_when_exists(self, bundle: Any) -> None: - """has_message returns True for existing message.""" - assert bundle.has_message("existing") is True - - def test_has_message_returns_false_when_not_exists(self, bundle: Any) -> None: - """has_message returns False for non-existent message.""" - assert bundle.has_message("nonexistent") is False - - -class TestFluentBundleGetMessageIds: - """Test FluentBundle get_message_ids method.""" - - def test_get_message_ids_empty_bundle(self) -> None: - """get_message_ids returns empty list for new bundle.""" - bundle = FluentBundle("de_DE") - - assert bundle.get_message_ids() == [] - - def test_get_message_ids_returns_all_ids(self) -> None: - """get_message_ids returns all registered message IDs.""" - bundle = FluentBundle("pl_PL") - bundle.add_resource(""" -msg1 = First -msg2 = Second -msg3 = Third -""") - - ids = bundle.get_message_ids() - - assert len(ids) == 3 - assert "msg1" in ids - assert "msg2" in ids - assert "msg3" in ids - - -class TestFluentBundleAddFunction: - """Test FluentBundle add_function method.""" - - @pytest.fixture - def bundle(self) -> Any: - """Create bundle.""" - return FluentBundle("en_US") - - def test_add_function_registers_custom_function(self) -> None: - """add_function adds custom function to bundle.""" - bundle = FluentBundle("en", use_isolating=False) - - def CUSTOM(value: object) -> str: - return str(value).upper() - - bundle.add_function("CUSTOM", CUSTOM) - - # Verify function works by using it in a message - bundle.add_resource("msg = { CUSTOM($val) }") - result, _ = bundle.format_pattern("msg", {"val": "test"}) - assert result == "TEST" - 
- def test_add_function_with_callable(self) -> None: - """add_function accepts any callable.""" - bundle = FluentBundle("en", use_isolating=False) - - # Function must return string per spec - bundle.add_function("LAMBDA", lambda x: str(int(x) * 2)) - bundle.add_resource("msg = { LAMBDA($n) }") - result, _ = bundle.format_pattern("msg", {"n": "5"}) - assert result == "10" - - -class TestFluentBundleErrorHandling: - """Test FluentBundle error handling and edge cases.""" - - @pytest.fixture - def bundle(self) -> Any: - """Create bundle with test message.""" - bundle = FluentBundle("en_US", strict=False) - bundle.add_resource("test = Test message") - return bundle - - def test_format_pattern_handles_resolver_errors_gracefully(self, bundle: Any) -> None: - """format_pattern returns fallback on resolver errors.""" - # Add message that references undefined variable - bundle.add_resource("broken-msg = Value is { $undefined }") - - result, errors = bundle.format_pattern("broken-msg", {}) - - # Should return result with variable fallback, plus error - assert isinstance(result, str) - assert "{$undefined}" in result # Variable fallback - assert len(errors) >= 1, f"Expected at least 1 error for undefined variable, got {errors}" - assert isinstance(errors[0], FrozenFluentError) - assert errors[0].category == ErrorCategory.REFERENCE - - def test_format_pattern_handles_key_error_gracefully(self, bundle: Any) -> None: - """format_pattern handles KeyError (missing variable) gracefully.""" - bundle.add_resource("needs-var = Hello { $name }") - - # Call without providing required variable - result, errors = bundle.format_pattern("needs-var", {}) - - # Should return result with variable fallback, plus error - assert isinstance(result, str) - assert "{$name}" in result - assert len(errors) >= 1, f"Expected error for missing variable, got {errors}" - assert isinstance(errors[0], FrozenFluentError) - assert errors[0].category == ErrorCategory.REFERENCE - - def 
test_format_pattern_handles_attribute_error_gracefully(self, bundle: Any) -> None: - """format_pattern handles AttributeError gracefully.""" - bundle.add_resource("attr-msg = Test") - - # Try to access non-existent attribute - result, errors = bundle.format_pattern("attr-msg", attribute="nonexistent") - - # Should handle gracefully with fallback + error - assert isinstance(result, str) - assert len(errors) >= 1, f"Expected error for nonexistent attribute, got {errors}" - assert isinstance(errors[0], FrozenFluentError) - assert errors[0].category == ErrorCategory.REFERENCE - assert "attribute" in str(errors[0]).lower() - - def test_format_pattern_handles_unexpected_errors_gracefully(self, bundle: Any) -> None: - """format_pattern catches unexpected exceptions.""" - # Even if something goes really wrong, bundle should not crash - result, errors = bundle.format_pattern("test", {}) - - assert result == "Test message" - assert errors == (), f"Unexpected errors: {errors}" - - def test_add_resource_with_terms_and_junk(self) -> None: - """add_resource handles mix of messages, terms, and junk.""" - bundle = FluentBundle("en_US", strict=False) - - source = """ -message1 = Hello --term1 = Brand Name -message2 = Goodbye -invalid syntax here --term2 = Another Term -""" - bundle.add_resource(source) - - # Messages should be registered - assert bundle.has_message("message1") - assert bundle.has_message("message2") - - # Terms should not appear in messages - assert not bundle.has_message("-term1") - - # Should have exactly 2 messages - assert len(bundle.get_message_ids()) == 2 - - -class TestFluentBundleIntegration: - """Integration tests for FluentBundle with complex scenarios.""" - - def test_complete_workflow_simple(self) -> None: - """Full workflow: create, add resource, format.""" - bundle = FluentBundle("lv_LV") - bundle.add_resource("greeting = Sveiki, { $name }!") - - result, errors = bundle.format_pattern("greeting", {"name": "Pēteris"}) - - assert "Sveiki" in result - 
assert "Pēteris" in result - assert errors == (), f"Unexpected errors: {errors}" - - def test_multiple_locales_independent(self) -> None: - """Multiple bundles for different locales are independent.""" - bundle_lv = FluentBundle("lv_LV") - bundle_en = FluentBundle("en_US") - - bundle_lv.add_resource("hello = Sveiki!") - bundle_en.add_resource("hello = Hello!") - - result_lv, errors_lv = bundle_lv.format_pattern("hello") - assert result_lv == "Sveiki!" - assert errors_lv == () - result_en, errors_en = bundle_en.format_pattern("hello") - assert result_en == "Hello!" - assert errors_en == () - - def test_overwrite_message_with_new_resource(self) -> None: - """Adding resource with same message ID overwrites.""" - bundle = FluentBundle("en_US") - - bundle.add_resource("msg = Original") - result1, errors1 = bundle.format_pattern("msg") - assert result1 == "Original" - assert errors1 == () - - bundle.add_resource("msg = Updated") - result2, errors2 = bundle.format_pattern("msg") - assert result2 == "Updated" - assert errors2 == () - - -class TestFluentBundleEdgeCases: - """Test edge cases and additional coverage paths.""" - - def test_add_resource_with_terms_only(self) -> None: - """Bundle handles resources with only terms (no messages).""" - bundle = FluentBundle("en_US") - - # Add resource with only terms (lines 76-77) - bundle.add_resource(""" --brand = MyApp --version = 3.0 --company = MyCompany -""") - - # No messages should be registered - assert len(bundle.get_message_ids()) == 0 - - # But terms are registered internally (can't query them directly) - # This exercises lines 76-77 (term registration) - - def test_format_pattern_with_recursion_error(self) -> None: - """Bundle handles RecursionError gracefully (line 152-155).""" - bundle = FluentBundle("en_US", strict=False) - - # While we can't easily create a RecursionError through normal means, - # we can test that other error types return fallback - bundle.add_resource("test-msg = Hello { $name }") - - # Missing 
variable triggers error path - result, errors = bundle.format_pattern("test-msg", {}) - - # Should return result with variable fallback, plus error - assert isinstance(result, str) - assert "{$name}" in result - assert len(errors) >= 1, f"Expected error for missing variable, got {errors}" - assert isinstance(errors[0], FrozenFluentError) - assert errors[0].category == ErrorCategory.REFERENCE - - def test_format_pattern_with_exception_in_resolver(self) -> None: - """Bundle catches unexpected exceptions in resolver (lines 156-160).""" - bundle = FluentBundle("en_US") - bundle.add_resource("msg = Test value") - - # Normal case works - result, errors = bundle.format_pattern("msg", {}) - assert result == "Test value" - assert errors == (), f"Unexpected errors: {errors}" - - # Even with weird args, should not crash - result, errors = bundle.format_pattern( - "msg", {"weird": object()} # type: ignore[dict-item] - ) - assert isinstance(result, str) - assert errors == (), f"Unexpected errors: {errors}" - - def test_add_resource_with_invalid_fluent_syntax(self) -> None: - """Bundle handles completely invalid Fluent syntax.""" - bundle = FluentBundle("en_US", strict=False) - - # This would trigger parser error recovery - source = """ -valid-msg = This works -{ invalid { nested { braces -another-valid = Also works -""" - bundle.add_resource(source) - - # Valid messages should still be registered - assert bundle.has_message("valid-msg") - assert bundle.has_message("another-valid") - - def test_format_pattern_with_keyerror_from_resolver(self) -> None: - """Bundle handles KeyError from resolver (lines 148-151).""" - bundle = FluentBundle("en_US", strict=False) - bundle.add_resource("needs-var = Value: { $required }") - - # Missing required variable triggers KeyError path - result, errors = bundle.format_pattern("needs-var", {}) - - # Should return fallback with variable reference - assert result == "Value: {$required}" - assert len(errors) == 1 - assert isinstance(errors[0], 
FrozenFluentError) - assert errors[0].category == ErrorCategory.REFERENCE - - def test_format_pattern_with_attribute_error_from_resolver(self) -> None: - """Bundle handles AttributeError from resolver (lines 148-151).""" - bundle = FluentBundle("en_US", strict=False) - bundle.add_resource(""" -msg = Test message - .tooltip = Tooltip text -""") - - # Request non-existent attribute triggers AttributeError path - result, errors = bundle.format_pattern("msg", attribute="nonexistent") - - # Should return fallback with attribute reference - assert result == "{msg.nonexistent}" - assert len(errors) == 1 - assert isinstance(errors[0], FrozenFluentError) - assert errors[0].category == ErrorCategory.REFERENCE - - def test_add_function_registers_successfully(self) -> None: - """Bundle can register custom functions.""" - bundle = FluentBundle("en_US") - - # Add custom function - def UPPERCASE(text: object) -> str: - return str(text).upper() - - bundle.add_function("UPPERCASE", UPPERCASE) - - # Function is registered (can't easily test usage without full parser support) - # This exercises the add_function method - bundle.add_resource("msg = Test message") - result, errors = bundle.format_pattern("msg", {}) - assert result == "Test message" - assert errors == (), f"Unexpected errors: {errors}" - - def test_get_message_ids_with_terms_excluded(self) -> None: - """get_message_ids returns only messages, not terms.""" - bundle = FluentBundle("en_US") - - bundle.add_resource(""" -message1 = First message --term1 = A term -message2 = Second message --term2 = Another term -""") - - ids = bundle.get_message_ids() - - # Should have exactly 2 messages - assert len(ids) == 2 - assert "message1" in ids - assert "message2" in ids - - # Terms should NOT be in message IDs - assert "-term1" not in ids - assert "-term2" not in ids - - -class TestFluentBundleMockedErrors: - """Test FluentBundle error handlers using mocking.""" - - def test_format_pattern_with_keyerror_exception(self) -> None: - 
"""Bundle propagates KeyError from resolver (fail-fast behavior). - - Internal errors (KeyError, AttributeError, etc.) are no longer - caught. This ensures bugs are detected immediately rather than hidden - behind fallback values. - """ - bundle = FluentBundle("en_US") - bundle.add_resource("msg = Hello { $name }") - - # Patch the resolver instance directly; resolver is eagerly initialized - # so patching the FluentResolver class does not affect existing bundles. - mock_resolver = Mock() - mock_resolver.resolve_message.side_effect = KeyError("name") - # KeyError propagates (fail-fast) - with ( - patch.object(bundle, "_resolver", mock_resolver), - pytest.raises(KeyError, match="name"), - ): - bundle.format_pattern("msg", {}) - - def test_format_pattern_with_attribute_error_exception(self) -> None: - """Bundle propagates AttributeError from resolver (fail-fast behavior). - - Internal errors are no longer caught. - """ - bundle = FluentBundle("en_US") - bundle.add_resource("msg = Hello") - - # Patch the resolver instance directly; resolver is eagerly initialized. - mock_resolver = Mock() - mock_resolver.resolve_message.side_effect = AttributeError("Invalid attribute") - # AttributeError propagates (fail-fast) - with ( - patch.object(bundle, "_resolver", mock_resolver), - pytest.raises(AttributeError, match="Invalid attribute"), - ): - bundle.format_pattern("msg", {}) - - def test_format_pattern_with_recursion_error_exception(self) -> None: - """Bundle propagates RecursionError from resolver (fail-fast behavior). - - Internal errors are no longer caught. - """ - bundle = FluentBundle("en_US") - bundle.add_resource("msg = Hello") - - # Patch the resolver instance directly; resolver is eagerly initialized. 
- mock_resolver = Mock() - mock_resolver.resolve_message.side_effect = RecursionError("Maximum recursion") - # RecursionError propagates (fail-fast) - with ( - patch.object(bundle, "_resolver", mock_resolver), - pytest.raises(RecursionError, match="Maximum recursion"), - ): - bundle.format_pattern("msg", {}) - - def test_format_pattern_with_unexpected_exception(self) -> None: - """Bundle propagates unexpected exceptions from resolver (fail-fast behavior). - - Internal errors are no longer caught. Only FluentError subclasses - are part of the normal error handling flow. - """ - bundle = FluentBundle("en_US") - bundle.add_resource("msg = Hello") - - # Patch the resolver instance directly; resolver is eagerly initialized. - mock_resolver = Mock() - mock_resolver.resolve_message.side_effect = RuntimeError("Unexpected error") - # RuntimeError propagates (fail-fast) - with ( - patch.object(bundle, "_resolver", mock_resolver), - pytest.raises(RuntimeError, match="Unexpected error"), - ): - bundle.format_pattern("msg", {}) - - # Note: Lines 76-77 (term debug logging) are unreachable with current parser - # Parser doesn't support Term syntax (-term = value), so isinstance(entry, Term) - # is never True. This is acceptable dead code for future parser enhancement. - - -class TestFluentBundleValidateResource: - """Test FluentBundle.validate_resource() method (Phase 4: Validation API).""" - - @pytest.fixture - def bundle(self) -> FluentBundle: - """Create bundle for testing.""" - return FluentBundle("en_US") - - def test_validate_valid_resource(self, bundle: FluentBundle) -> None: - """validate_resource returns success for valid FTL.""" - source = """hello = Hello, world! 
-goodbye = Goodbye!""" - result = bundle.validate_resource(source) - - assert result.is_valid - assert result.error_count == 0 - assert result.warning_count == 0 - assert len(result.errors) == 0 - assert len(result.warnings) == 0 - - def test_validate_empty_resource(self, bundle: FluentBundle) -> None: - """validate_resource handles empty string.""" - result = bundle.validate_resource("") - - assert result.is_valid - assert result.error_count == 0 - - def test_validate_resource_with_variables(self, bundle: FluentBundle) -> None: - """validate_resource handles messages with variables.""" - source = "welcome = Hello, { $name }!" - result = bundle.validate_resource(source) - - assert result.is_valid - assert result.error_count == 0 - - def test_validate_resource_with_select(self, bundle: FluentBundle) -> None: - """validate_resource handles SELECT expressions.""" - source = """emails = { $count -> - [one] 1 email - *[other] { $count } emails -}""" - result = bundle.validate_resource(source) - - assert result.is_valid - assert result.error_count == 0 - - def test_validate_invalid_syntax_returns_errors(self, bundle: FluentBundle) -> None: - """validate_resource returns errors for invalid syntax.""" - source = "invalid syntax without equals sign" - result = bundle.validate_resource(source) - - assert not result.is_valid - assert result.error_count == 1 - assert len(result.errors) == 1 - - def test_validate_multiple_errors(self, bundle: FluentBundle) -> None: - """validate_resource returns all errors found.""" - source = """hello = Hello -invalid line 1 -goodbye = Goodbye -invalid line 2""" - result = bundle.validate_resource(source) - - assert not result.is_valid - assert result.error_count == 2 - assert len(result.errors) == 2 - - def test_validate_does_not_modify_bundle(self, bundle: FluentBundle) -> None: - """validate_resource does not add messages to bundle.""" - source = "hello = Hello, world!" 
- - # Validate first - result = bundle.validate_resource(source) - assert result.is_valid - - # Bundle should still be empty - assert len(bundle.get_message_ids()) == 0 - assert not bundle.has_message("hello") - - def test_validation_result_properties(self, bundle: FluentBundle) -> None: - """ValidationResult properties work correctly.""" - # Valid resource - valid_result = bundle.validate_resource("hello = Hello") - assert valid_result.is_valid is True - assert valid_result.error_count == 0 - assert valid_result.warning_count == 0 - - # Invalid resource - invalid_result = bundle.validate_resource("invalid") - assert invalid_result.is_valid is False - assert invalid_result.error_count >= 1 - assert invalid_result.warning_count == 0 - - -def test_use_isolating_enabled_by_default(): - """Bidi isolation should be enabled by default per Fluent spec.""" - bundle = FluentBundle("ar") - bundle.add_resource("msg = مرحبا { $name }!") - result, errors = bundle.format_pattern("msg", {"name": "Alice"}) - - # Should contain FSI (U+2068) and PDI (U+2069) marks - assert "\u2068Alice\u2069" in result - assert result == "مرحبا \u2068Alice\u2069!" - assert errors == (), f"Unexpected errors: {errors}" - - -def test_use_isolating_can_be_disabled(): - """Bidi isolation can be disabled for LTR-only applications.""" - bundle = FluentBundle("en", use_isolating=False) - bundle.add_resource("msg = Hello { $name }!") - result, errors = bundle.format_pattern("msg", {"name": "Alice"}) - - # Should NOT contain isolation marks - assert "\u2068" not in result - assert "\u2069" not in result - assert result == "Hello Alice!" 
- assert errors == (), f"Unexpected errors: {errors}" - - -def test_use_isolating_with_multiple_placeables(): - """Bidi isolation wraps each placeable independently.""" - bundle = FluentBundle("ar", use_isolating=True) - bundle.add_resource("msg = { $first } و { $second }") - result, errors = bundle.format_pattern("msg", {"first": "Alice", "second": "Bob"}) - - # Each placeable wrapped independently - assert result == "\u2068Alice\u2069 و \u2068Bob\u2069" - assert errors == (), f"Unexpected errors: {errors}" - - -def test_cache_enabled_property_when_enabled(): - """cache_enabled property returns True when caching enabled.""" - bundle = FluentBundle("en", cache=CacheConfig()) - assert bundle.cache_enabled is True - - -def test_cache_enabled_property_when_disabled(): - """cache_enabled property returns False when caching disabled.""" - bundle = FluentBundle("en") - assert bundle.cache_enabled is False - - -def test_cache_enabled_property_default(): - """cache_enabled property returns False by default.""" - bundle = FluentBundle("en") - assert bundle.cache_enabled is False - - -def test_cache_config_size_when_enabled(): - """cache_config.size returns configured size when caching enabled.""" - bundle = FluentBundle("en", cache=CacheConfig(size=500)) - assert bundle.cache_config is not None - assert bundle.cache_config.size == 500 - - -def test_cache_config_is_none_when_disabled(): - """cache_config returns None when caching is disabled.""" - bundle = FluentBundle("en") - assert bundle.cache_config is None - assert bundle.cache_enabled is False - - -# ============================================================================ -# Branch Coverage Classes (from test_bundle_branch_coverage) -# ============================================================================ - -# ============================================================================= -# Property Accessors -# ============================================================================= - - -class 
TestBundlePropertyAccessors: - """Test all property accessors for complete coverage.""" - - def test_locale_property_returns_configured_locale(self) -> None: - """locale property returns the canonical locale code.""" - bundle = FluentBundle("lv_LV") - assert bundle.locale == "lv_lv" - - bundle_ar = FluentBundle("ar_EG") - assert bundle_ar.locale == "ar_eg" - - def test_use_isolating_property_true(self) -> None: - """use_isolating property returns True when enabled.""" - bundle = FluentBundle("en", use_isolating=True) - assert bundle.use_isolating is True - - def test_use_isolating_property_false(self) -> None: - """use_isolating property returns False when disabled.""" - bundle = FluentBundle("en", use_isolating=False) - assert bundle.use_isolating is False - - def test_strict_property_returns_configured_value(self) -> None: - """strict property returns the strict mode boolean.""" - assert FluentBundle("en", strict=True).strict is True - assert FluentBundle("en", strict=False).strict is False - assert FluentBundle("en").strict is True - - def test_cache_enabled_property(self) -> None: - """cache_enabled property reflects configuration.""" - assert FluentBundle("en", cache=CacheConfig()).cache_enabled is True - assert FluentBundle("en").cache_enabled is False - - def test_cache_config_size_property(self) -> None: - """cache_config.size returns configured maximum.""" - bundle = FluentBundle("en", cache=CacheConfig(size=500)) - assert bundle.cache_config is not None - assert bundle.cache_config.size == 500 - - def test_cache_usage_property_tracks_entries(self) -> None: - """cache_usage property tracks current cached entries.""" - bundle = FluentBundle("en", cache=CacheConfig()) - bundle.add_resource("msg1 = Hello\nmsg2 = World") - - assert bundle.cache_usage == 0 - bundle.format_pattern("msg1") - assert bundle.cache_usage == 1 - bundle.format_pattern("msg2") - assert bundle.cache_usage == 2 - - def test_cache_usage_returns_zero_when_disabled(self) -> None: - 
"""cache_usage returns 0 when caching is disabled.""" - bundle = FluentBundle("en") - bundle.add_resource("msg = Hello") - bundle.format_pattern("msg") - assert bundle.cache_usage == 0 - - def test_cache_write_once_config(self) -> None: - """cache_config.write_once reflects configured boolean.""" - on = FluentBundle("en", cache=CacheConfig(write_once=True)) - assert on.cache_config is not None - assert on.cache_config.write_once is True - off = FluentBundle("en", cache=CacheConfig(write_once=False)) - assert off.cache_config is not None - assert off.cache_config.write_once is False - - def test_cache_enable_audit_config(self) -> None: - """cache_config.enable_audit reflects configured boolean.""" - on = FluentBundle("en", cache=CacheConfig(enable_audit=True)) - assert on.cache_config is not None - assert on.cache_config.enable_audit is True - off = FluentBundle("en", cache=CacheConfig(enable_audit=False)) - assert off.cache_config is not None - assert off.cache_config.enable_audit is False - - def test_cache_max_audit_entries_config(self) -> None: - """cache_config.max_audit_entries reflects configured maximum.""" - bundle = FluentBundle( - "en", cache=CacheConfig(max_audit_entries=5000) - ) - assert bundle.cache_config is not None - assert bundle.cache_config.max_audit_entries == 5000 - - def test_cache_max_entry_weight_config(self) -> None: - """cache_config.max_entry_weight reflects configured maximum.""" - bundle = FluentBundle( - "en", cache=CacheConfig(max_entry_weight=8000) - ) - assert bundle.cache_config is not None - assert bundle.cache_config.max_entry_weight == 8000 - - def test_cache_max_errors_per_entry_config(self) -> None: - """cache_config.max_errors_per_entry reflects configured maximum.""" - bundle = FluentBundle( - "en", cache=CacheConfig(max_errors_per_entry=25) - ) - assert bundle.cache_config is not None - assert bundle.cache_config.max_errors_per_entry == 25 - - def test_max_source_size_property(self) -> None: - """max_source_size property 
returns configured or default value.""" - assert FluentBundle("en", max_source_size=500_000).max_source_size == 500_000 - assert FluentBundle("en").max_source_size == MAX_SOURCE_SIZE - - def test_max_nesting_depth_property(self) -> None: - """max_nesting_depth property returns configured or default value.""" - assert FluentBundle("en", max_nesting_depth=50).max_nesting_depth == 50 - assert FluentBundle("en").max_nesting_depth == 100 - - -# ============================================================================= -# Locale Validation -# ============================================================================= - - -class TestBundleLocaleValidation: - """Test locale code validation in __init__.""" - - def test_rejects_invalid_characters(self) -> None: - """Locale with special characters raises ValueError.""" - with pytest.raises(ValueError, match=r"Invalid locale: 'en@invalid'"): - FluentBundle("en@invalid") - - def test_rejects_spaces(self) -> None: - """Locale with spaces raises ValueError.""" - with pytest.raises(ValueError, match=r"Invalid locale: 'en US'"): - FluentBundle("en US") - - def test_rejects_non_ascii(self) -> None: - """Locale with non-ASCII characters raises ValueError.""" - with pytest.raises(ValueError, match=r"Invalid locale: 'ën_FR'"): - FluentBundle("\u00ebn_FR") - - def test_accepts_hyphen_separator(self) -> None: - """Locale with hyphen separator accepted.""" - assert FluentBundle("en-US").locale == "en_us" - - def test_accepts_underscore_separator(self) -> None: - """Locale with underscore separator accepted.""" - assert FluentBundle("en_US").locale == "en_us" - - def test_exceeding_max_length_rejected(self) -> None: - """Locale exceeding MAX_LOCALE_LENGTH_HARD_LIMIT raises ValueError.""" - long_locale = "a" * (MAX_LOCALE_LENGTH_HARD_LIMIT + 1) - with pytest.raises(ValueError, match="locale exceeds maximum length"): - FluentBundle(long_locale) - - def test_exceeding_max_length_shows_truncated(self) -> None: - """Error message includes 
truncated locale and actual length.""" - long_locale = "X" * (MAX_LOCALE_LENGTH_HARD_LIMIT + 100) - with pytest.raises( - ValueError, match="locale exceeds maximum length" - ) as exc_info: - FluentBundle(long_locale) - error_msg = str(exc_info.value) - assert long_locale[:50] in error_msg - assert str(len(long_locale)) in error_msg - - -# ============================================================================= -# Special Methods (__repr__) -# ============================================================================= - - -class TestBundleSpecialMethods: - """Test __repr__ for complete coverage.""" - - def test_repr_shows_locale_and_counts(self) -> None: - """__repr__ returns string with locale and message/term counts.""" - bundle = FluentBundle("lv_LV") - repr_str = repr(bundle) - assert "FluentBundle" in repr_str - assert "lv_lv" in repr_str - assert "messages=0" in repr_str - assert "terms=0" in repr_str - - def test_repr_reflects_counts_after_adding_resources(self) -> None: - """__repr__ shows accurate counts after adding resources.""" - bundle = FluentBundle("en") - bundle.add_resource("msg1 = Hello\nmsg2 = World\n-brand = Firefox") - repr_str = repr(bundle) - assert "messages=2" in repr_str - assert "terms=1" in repr_str - - -# ============================================================================= -# for_system_locale Factory Method -# ============================================================================= - - -class TestBundleForSystemLocale: - """Test for_system_locale classmethod.""" - - def test_creates_bundle_with_detected_locale(self) -> None: - """for_system_locale creates bundle with system locale.""" - with patch( - "ftllexengine.runtime.bundle_lifecycle.get_system_locale", - return_value="en_US", - ): - bundle = FluentBundle.for_system_locale() - assert bundle.locale == "en_us" - - def test_passes_configuration_parameters(self) -> None: - """for_system_locale passes all configuration parameters.""" - with patch( - 
"ftllexengine.runtime.bundle_lifecycle.get_system_locale", - return_value="de_DE", - ): - bundle = FluentBundle.for_system_locale( - use_isolating=False, - cache=CacheConfig(size=2000), - strict=True, - max_source_size=500_000, - ) - assert bundle.locale == "de_de" - assert bundle.use_isolating is False - assert bundle.cache_enabled is True - assert bundle.cache_config is not None - assert bundle.cache_config.size == 2000 - assert bundle.strict is True - assert bundle.max_source_size == 500_000 - - def test_raises_when_locale_unavailable(self) -> None: - """for_system_locale raises RuntimeError when locale unavailable.""" - with patch( - "ftllexengine.runtime.bundle_lifecycle.get_system_locale", - side_effect=RuntimeError("Cannot determine system locale"), - ), pytest.raises(RuntimeError, match="Cannot determine"): - FluentBundle.for_system_locale() - - def test_falls_back_to_env_vars_when_getlocale_fails(self) -> None: - """for_system_locale uses env vars when getlocale() returns None.""" - with patch("locale.getlocale", return_value=(None, None)), patch.dict( - "os.environ", {"LC_ALL": "de_DE"}, clear=False - ): - bundle = FluentBundle.for_system_locale() - assert bundle.locale == "de_de" - - def test_tries_lc_messages_when_lc_all_missing(self) -> None: - """for_system_locale tries LC_MESSAGES when LC_ALL not set.""" - with patch("locale.getlocale", return_value=(None, None)), patch.dict( - "os.environ", {"LC_MESSAGES": "fr_FR"}, clear=True - ): - bundle = FluentBundle.for_system_locale() - assert bundle.locale == "fr_fr" - - def test_tries_lang_when_others_missing(self) -> None: - """for_system_locale tries LANG as final fallback.""" - with patch("locale.getlocale", return_value=(None, None)), patch.dict( - "os.environ", {"LANG": "es_ES"}, clear=True - ): - bundle = FluentBundle.for_system_locale() - assert bundle.locale == "es_es" - - def test_raises_when_no_locale_found(self) -> None: - """for_system_locale raises RuntimeError with no locale.""" - with ( - 
patch("locale.getlocale", return_value=(None, None)), - patch.dict("os.environ", {}, clear=True), - pytest.raises( - RuntimeError, match="Could not determine system locale" - ), - ): - FluentBundle.for_system_locale() - - def test_normalizes_posix_format(self) -> None: - """for_system_locale strips encoding suffix and normalizes.""" - with patch("locale.getlocale", return_value=("en_US.UTF-8", None)): - bundle = FluentBundle.for_system_locale() - assert bundle.locale == "en_us" - assert "UTF-8" not in bundle.locale - - def test_handles_locale_without_encoding(self) -> None: - """for_system_locale handles locale without encoding suffix.""" - with patch("locale.getlocale", return_value=("pl_PL", None)): - bundle = FluentBundle.for_system_locale() - assert bundle.locale == "pl_pl" - - -# ============================================================================= -# Resource Management (add_resource, comments, terms) -# ============================================================================= - - -class TestBundleResourceManagement: - """Test add_resource edge cases, comment handling, term attributes.""" - - def test_add_resource_with_comments(self) -> None: - """Comments are parsed but not registered as messages.""" - bundle = FluentBundle("en") - ftl_source = ( - "# Standalone comment\nmsg1 = Hello\n\n" - "## Section comment\nmsg2 = World\n\n" - "### Resource comment\n-term = Value\n" - ) - junk = bundle.add_resource(ftl_source) - assert len(junk) == 0 - assert bundle.has_message("msg1") - assert bundle.has_message("msg2") - assert len(bundle.get_message_ids()) == 2 - - def test_standalone_comment_only_resource(self) -> None: - """Resource containing only comments is valid.""" - bundle = FluentBundle("en") - junk = bundle.add_resource( - "# Comment\n## Section\n### Resource\n" - ) - assert len(junk) == 0 - assert len(bundle.get_message_ids()) == 0 - - def test_consecutive_comments(self) -> None: - """Multiple consecutive comments hit Comment->loop branch.""" - 
bundle = FluentBundle("en") - ftl = "## Section 1\n## Section 2\n### Resource\nmsg = Value\n" - junk = bundle.add_resource(ftl) - assert len(junk) == 0 - assert bundle.has_message("msg") - - def test_message_without_value_only_attributes(self) -> None: - """Message with no value, only attributes, is registered.""" - bundle = FluentBundle("en_US", use_isolating=False) - bundle.add_resource("msg =\n .attr1 = Value 1\n .attr2 = Value 2\n") - assert bundle.has_message("msg") - - def test_term_with_multiple_attributes(self) -> None: - """Term with attributes is registered successfully.""" - bundle = FluentBundle("en_US", use_isolating=False) - bundle.add_resource( - "-brand = Firefox\n .gender = masculine\n" - " .case = nominative\n" - ) - assert bundle is not None - - def test_add_resource_clears_cache(self) -> None: - """add_resource clears cache when enabled.""" - bundle = FluentBundle("en", cache=CacheConfig()) - bundle.add_resource("first = First") - bundle.format_pattern("first") - assert bundle.get_cache_stats()["size"] > 0 # type: ignore[index] - bundle.add_resource("second = Second") - assert bundle.get_cache_stats()["size"] == 0 # type: ignore[index] - - def test_duplicate_terms_overwrite(self, caplog: Any) -> None: - """Duplicate term definitions produce overwrite warning.""" - bundle = FluentBundle("en") - bundle.add_resource("-brand = Firefox\n-brand = Chrome\n") - assert any( - "Overwriting existing term '-brand'" in r.message - for r in caplog.records - ) - - def test_multiple_duplicate_terms(self, caplog: Any) -> None: - """Multiple duplicate terms each produce warnings.""" - bundle = FluentBundle("en") - bundle.add_resource( - "-brand = First\n-version = First\n" - "-brand = Second\n-version = Second\n" - ) - warnings = [ - r for r in caplog.records - if "Overwriting existing term" in r.message - ] - assert len(warnings) == 2 - - def test_comments_with_debug_logging(self, caplog: Any) -> None: - """Comments are processed at debug level without 
errors.""" - caplog.set_level(logging.DEBUG) - bundle = FluentBundle("en") - ftl = ( - "# Comment before term\n" - "-brand = Firefox\n" - ) - junk = bundle.add_resource(ftl) - assert len(junk) == 0 - - -# ============================================================================= -# Type Validation (add_resource, validate_resource, format_pattern) -# ============================================================================= - - -class TestBundleTypeValidation: - """Test type validation at API boundaries.""" - - def test_add_resource_rejects_bytes(self) -> None: - """add_resource raises TypeError for bytes with decode suggestion.""" - bundle = FluentBundle("en") - with pytest.raises(TypeError, match=r"source must be str, not bytes"): - bundle.add_resource(b"msg = Hello") # type: ignore[arg-type] - with pytest.raises(TypeError, match=r"source.decode\('utf-8'\)"): - bundle.add_resource(b"msg = Hello") # type: ignore[arg-type] - - def test_add_resource_rejects_int(self) -> None: - """add_resource raises TypeError for non-string types.""" - bundle = FluentBundle("en") - with pytest.raises(TypeError, match=r"source must be str"): - bundle.add_resource(42) # type: ignore[arg-type] - - def test_validate_resource_rejects_bytes(self) -> None: - """validate_resource raises TypeError for bytes.""" - bundle = FluentBundle("en") - with pytest.raises(TypeError, match=r"source must be str, not bytes"): - bundle.validate_resource(b"msg = Hello") # type: ignore[arg-type] - - def test_format_pattern_empty_message_id(self) -> None: - """format_pattern with empty message ID returns fallback.""" - bundle = FluentBundle("en", strict=False) - result, errors = bundle.format_pattern("") - assert result == "{???}" - assert len(errors) == 1 - - def test_format_pattern_invalid_args_type(self) -> None: - """format_pattern with non-Mapping args returns fallback.""" - bundle = FluentBundle("en", strict=False) - bundle.add_resource("msg = Hello") - result, errors = 
bundle.format_pattern("msg", []) # type: ignore[arg-type] - assert result == "{???}" - assert len(errors) == 1 - - def test_format_pattern_invalid_attribute_type(self) -> None: - """format_pattern with non-string attribute returns fallback.""" - bundle = FluentBundle("en", strict=False) - bundle.add_resource("msg = Hello") - result, errors = bundle.format_pattern( - "msg", {}, attribute=123 # type: ignore[arg-type] - ) - assert result == "{???}" - assert len(errors) == 1 - - def test_strict_mode_raises_on_empty_message_id(self) -> None: - """format_pattern in strict mode raises on empty message ID.""" - bundle = FluentBundle("en", strict=True) - with pytest.raises(FormattingIntegrityError): - bundle.format_pattern("") - - def test_strict_mode_raises_on_invalid_args_type(self) -> None: - """format_pattern in strict mode raises on invalid args type.""" - bundle = FluentBundle("en", strict=True) - bundle.add_resource("msg = Hello") - with pytest.raises(FormattingIntegrityError): - bundle.format_pattern("msg", []) # type: ignore[arg-type] - - def test_strict_mode_raises_on_invalid_attribute_type(self) -> None: - """format_pattern in strict mode raises on invalid attribute type.""" - bundle = FluentBundle("en", strict=True) - bundle.add_resource("msg = Hello") - with pytest.raises(FormattingIntegrityError): - bundle.format_pattern( - "msg", {}, attribute=123 # type: ignore[arg-type] - ) - - -# ============================================================================= -# Strict Mode (syntax errors, formatting errors, caching) -# ============================================================================= - - -class TestBundleStrictMode: - """Test strict mode syntax and formatting error handling.""" - - def test_raises_syntax_integrity_error_on_junk(self) -> None: - """Strict mode raises SyntaxIntegrityError for junk entries.""" - bundle = FluentBundle("en", strict=True) - with pytest.raises( - SyntaxIntegrityError, match=r"Strict mode: .* syntax error" - ): - 
bundle.add_resource("msg = \n!!invalid!!") - - def test_error_includes_source_path(self) -> None: - """Strict mode error includes source_path when provided.""" - bundle = FluentBundle("en", strict=True) - with pytest.raises( - SyntaxIntegrityError, match=r"locales/en/messages.ftl" - ) as exc_info: - bundle.add_resource( - "msg = \n!!invalid!!", - source_path="locales/en/messages.ftl", - ) - assert exc_info.value.source_path == "locales/en/messages.ftl" - - def test_error_truncates_long_summary(self) -> None: - """Strict mode truncates to first 3 junk entries.""" - bundle = FluentBundle("en", strict=True) - invalid_ftl = ( - "msg1 =\n!!e1!!\nmsg2 =\n!!e2!!\n" - "msg3 =\n!!e3!!\nmsg4 =\n!!e4!!\n" - ) - with pytest.raises( - SyntaxIntegrityError, match=r"and \d+ more" - ): - bundle.add_resource(invalid_ftl) - - def test_does_not_mutate_bundle_on_error(self) -> None: - """Strict mode does not partially populate bundle on syntax error.""" - bundle = FluentBundle("en", strict=True) - bundle.add_resource("msg1 = Hello") - assert len(bundle.get_message_ids()) == 1 - - with pytest.raises(SyntaxIntegrityError): - bundle.add_resource("msg2 = World\n!!invalid!!") - assert len(bundle.get_message_ids()) == 1 - - def test_formatting_integrity_error_on_missing_var(self) -> None: - """Strict mode raises FormattingIntegrityError for missing vars.""" - bundle = FluentBundle("en", strict=True) - bundle.add_resource("msg = Hello { $name }") - with pytest.raises(FormattingIntegrityError, match=r"Strict mode"): - bundle.format_pattern("msg", {}) - - def test_formatting_error_includes_message_id(self) -> None: - """Strict mode formatting error includes message ID.""" - bundle = FluentBundle("en", strict=True) - bundle.add_resource("greeting = Hello { $name }") - with pytest.raises( - FormattingIntegrityError, match=r"greeting" - ) as exc_info: - bundle.format_pattern("greeting", {}) - assert exc_info.value.message_id == "greeting" - - def 
test_formatting_error_truncates_multiple_errors(self) -> None: - """Strict mode error truncates to first 3 formatting errors.""" - bundle = FluentBundle("en", strict=True) - bundle.add_resource("msg = { $a } { $b } { $c } { $d }") - with pytest.raises(FormattingIntegrityError, match=r"and \d+ more"): - bundle.format_pattern("msg", {}) - - -# ============================================================================= -# Validation (circular refs, undefined refs, duplicates, syntax errors) -# ============================================================================= - - -class TestBundleValidation: - """Test validate_resource warning and error detection.""" - - def test_detects_circular_message_refs(self) -> None: - """Circular message references generate warnings.""" - bundle = FluentBundle("en") - result = bundle.validate_resource( - "msg1 = { msg2 }\nmsg2 = { msg1 }\n" - ) - assert any( - "Circular message reference" in w.message - for w in result.warnings - ) - - def test_detects_self_referencing_message(self) -> None: - """Message referencing itself detected as circular.""" - bundle = FluentBundle("en") - result = bundle.validate_resource("msg = { msg }\n") - assert len(result.warnings) > 0 - - def test_detects_circular_term_refs(self) -> None: - """Circular term references generate warnings.""" - bundle = FluentBundle("en") - result = bundle.validate_resource( - "-term1 = { -term2 }\n-term2 = { -term1 }\n" - ) - assert any( - "Circular term reference" in w.message - for w in result.warnings - ) - - def test_detects_self_referencing_term(self) -> None: - """Term referencing itself detected as circular.""" - bundle = FluentBundle("en") - result = bundle.validate_resource("-term = { -term }\n") - assert len(result.warnings) > 0 - - def test_detects_term_attribute_circular_ref(self) -> None: - """Circular reference in term attribute detected.""" - bundle = FluentBundle("en") - result = bundle.validate_resource( - "-term = Value\n .attr = { -term.attr }\n" - ) 
- assert len(result.warnings) > 0 - - def test_detects_nested_term_circular_ref(self) -> None: - """Three-way circular term reference detected.""" - bundle = FluentBundle("en") - result = bundle.validate_resource( - "-t1 = { -t2 }\n-t2 = { -t3 }\n-t3 = { -t1 }\n" - ) - assert len(result.warnings) > 0 - - def test_detects_undefined_message_ref(self) -> None: - """Undefined message reference generates warning.""" - bundle = FluentBundle("en") - result = bundle.validate_resource("msg = { undefined }\n") - assert any( - "undefined" in w.message.lower() for w in result.warnings - ) - - def test_detects_undefined_term_ref_from_message(self) -> None: - """Message referencing undefined term generates warning.""" - bundle = FluentBundle("en") - result = bundle.validate_resource("msg = { -undefined_term }\n") - assert len(result.warnings) > 0 - - def test_detects_undefined_term_ref_from_term(self) -> None: - """Term referencing undefined term generates warning.""" - bundle = FluentBundle("en_US", use_isolating=False) - result = bundle.validate_resource("-term-a = { -term-b }\n") - assert any( - "undefined term '-term-b'" in w.message - for w in result.warnings - ) - - def test_detects_undefined_message_ref_from_term(self) -> None: - """Term referencing undefined message generates warning.""" - bundle = FluentBundle("en") - result = bundle.validate_resource("-term = { undefined_msg }\n") - assert len(result.warnings) > 0 - - def test_term_referencing_defined_message_no_warning(self) -> None: - """Term referencing a defined message does not warn.""" - bundle = FluentBundle("en_US", use_isolating=False) - result = bundle.validate_resource( - "greeting = Hello\n-term = { greeting }\n" - ) - assert not any( - "undefined message" in w.message for w in result.warnings - ) - - def test_detects_duplicate_term_id(self) -> None: - """Duplicate term ID generates warning.""" - bundle = FluentBundle("en_US", use_isolating=False) - result = bundle.validate_resource( - "-brand = 
Firefox\n-brand = Chrome\n" - ) - assert any( - "Duplicate term ID" in w.message for w in result.warnings - ) - - def test_message_without_value_validates(self) -> None: - """Message with only attributes validates successfully.""" - bundle = FluentBundle("en_US", use_isolating=False) - result = bundle.validate_resource("msg =\n .attr = Value\n") - assert result.is_valid - - def test_term_with_attributes_validates(self) -> None: - """Term with attributes validates successfully.""" - bundle = FluentBundle("en_US", use_isolating=False) - result = bundle.validate_resource( - "-term = Base\n .attr1 = A1\n .attr2 = A2\n" - ) - assert result.is_valid - - def test_handles_critical_syntax_error(self) -> None: - """Critical syntax errors produce validation errors.""" - bundle = FluentBundle("en") - result = bundle.validate_resource("msg = {{ invalid") - assert not result.is_valid - assert len(result.errors) > 0 - - def test_critical_error_returns_validation_error(self) -> None: - """Critical errors are ValidationError instances.""" - bundle = FluentBundle("en_US", use_isolating=False) - result = bundle.validate_resource("msg = {{ broken") - assert all( - isinstance(e, ValidationError) for e in result.errors - ) - - def test_integration_all_warning_types(self) -> None: - """Resource with all warning types produces correct warnings.""" - bundle = FluentBundle("en_US", use_isolating=False) - ftl = ( - "msg-dup = First\nmsg-dup = Second\n" - "-term-dup = First\n-term-dup = Second\n" - "circ-a = { circ-b }\ncirc-b = { circ-a }\n" - "-tc-a = { -tc-b }\n-tc-b = { -tc-a }\n" - "msg-undef = { missing-msg }\n" - "-term-undef = { -missing-term }\n" - "msg-attrs =\n .attr = Value\n" - "-term-attrs = Base\n .attr = Attribute\n" - ) - result = bundle.validate_resource(ftl) - warnings = " ".join(w.message for w in result.warnings) - assert "Duplicate message ID" in warnings - assert "Duplicate term ID" in warnings - assert "Circular message reference" in warnings - assert "Circular term 
reference" in warnings - assert "undefined message" in warnings - assert "undefined term" in warnings - - def test_message_without_value_no_crash(self) -> None: - """Validation doesn't crash on empty-value message.""" - bundle = FluentBundle("en") - result = bundle.validate_resource("empty =\n") - assert result is not None - - -# ============================================================================= -# Cache Management -# ============================================================================= - - -class TestBundleCacheManagement: - """Test clear_cache, get_cache_stats, cache invalidation.""" - - def test_clear_cache_when_enabled(self) -> None: - """clear_cache removes all cached format results.""" - bundle = FluentBundle("en", cache=CacheConfig()) - bundle.add_resource("msg1 = Hello\nmsg2 = World") - bundle.format_pattern("msg1") - bundle.format_pattern("msg2") - assert bundle.cache_usage == 2 - bundle.clear_cache() - assert bundle.cache_usage == 0 - - def test_clear_cache_when_disabled(self) -> None: - """clear_cache succeeds when cache is disabled.""" - bundle = FluentBundle("en") - bundle.clear_cache() - assert bundle.get_cache_stats() is None - - def test_clear_cache_resets_to_empty(self) -> None: - """clear_cache resets the format cache to empty state.""" - bundle = FluentBundle("en", cache=CacheConfig()) - bundle.add_resource("msg = Hello") - bundle.clear_cache() - assert bundle.cache_usage == 0 - - def test_get_cache_stats_returns_dict_when_enabled(self) -> None: - """get_cache_stats returns dict with hits/misses when enabled.""" - bundle = FluentBundle("en", cache=CacheConfig()) - bundle.add_resource("msg = Hello") - bundle.format_pattern("msg", {}) - bundle.format_pattern("msg", {}) - stats = bundle.get_cache_stats() - assert stats is not None - assert stats["hits"] == 1 - assert stats["misses"] == 1 - - def test_get_cache_stats_returns_none_when_disabled(self) -> None: - """get_cache_stats returns None when caching is disabled.""" - bundle = 
FluentBundle("en") - assert bundle.get_cache_stats() is None - - def test_format_pattern_caches_result(self) -> None: - """format_pattern caches results when cache enabled.""" - bundle = FluentBundle("en", cache=CacheConfig()) - bundle.add_resource("msg = Hello") - result1, _ = bundle.format_pattern("msg") - stats1 = bundle.get_cache_stats() - assert stats1 is not None - assert stats1["misses"] == 1 - result2, _ = bundle.format_pattern("msg") - stats2 = bundle.get_cache_stats() - assert stats2 is not None - assert stats2["hits"] == 1 - assert result1 == result2 - - -# -- Introspection (variables, introspect_message/term, has_attribute) ------- - - -class TestBundleIntrospection: - """Test introspection and query methods.""" - - def test_get_message_variables_returns_frozenset(self) -> None: - """get_message_variables returns frozenset of variable names.""" - bundle = FluentBundle("en") - bundle.add_resource("greeting = Hello, { $name }!") - variables = bundle.get_message_variables("greeting") - assert "name" in variables - assert isinstance(variables, frozenset) - - def test_get_message_variables_raises_keyerror(self) -> None: - """get_message_variables raises KeyError for missing message.""" - bundle = FluentBundle("en") - with pytest.raises(KeyError, match="not found"): - bundle.get_message_variables("nonexistent") - - def test_get_all_message_variables(self) -> None: - """get_all_message_variables returns dict of variable sets.""" - bundle = FluentBundle("en") - bundle.add_resource( - "greeting = Hello, { $name }!\n" - "farewell = Bye, { $first } { $last }!\n" - "simple = No variables\n" - ) - all_vars = bundle.get_all_message_variables() - assert all_vars["greeting"] == frozenset({"name"}) - assert all_vars["farewell"] == frozenset({"first", "last"}) - assert all_vars["simple"] == frozenset() - - def test_get_all_message_variables_empty_bundle(self) -> None: - """get_all_message_variables returns empty dict when empty.""" - bundle = FluentBundle("en") - assert 
bundle.get_all_message_variables() == {} - - def test_introspect_message_returns_metadata(self) -> None: - """introspect_message returns MessageIntrospection with metadata.""" - bundle = FluentBundle("en") - bundle.add_resource( - "price = { NUMBER($amount, minimumFractionDigits: 2) }" - ) - info = bundle.introspect_message("price") - assert "amount" in info.get_variable_names() - assert "NUMBER" in info.get_function_names() - - def test_introspect_message_raises_keyerror(self) -> None: - """introspect_message raises KeyError for missing message.""" - bundle = FluentBundle("en") - with pytest.raises(KeyError, match="not found"): - bundle.introspect_message("nonexistent") - - def test_introspect_term_returns_metadata(self) -> None: - """introspect_term returns MessageIntrospection for term.""" - bundle = FluentBundle("en") - bundle.add_resource( - "-brand = { $case ->\n" - " [nominative] Firefox\n" - " *[other] Firefox\n}\n" - ) - info = bundle.introspect_term("brand") - assert "case" in info.get_variable_names() - - def test_introspect_term_raises_keyerror(self) -> None: - """introspect_term raises KeyError for missing term.""" - bundle = FluentBundle("en") - with pytest.raises(KeyError, match="Term 'nonexistent' not found"): - bundle.introspect_term("nonexistent") - - def test_introspect_term_success(self) -> None: - """introspect_term returns valid data for existing term.""" - bundle = FluentBundle("en") - bundle.add_resource( - "-brand = Firefox\n .gender = masculine" - ) - info = bundle.introspect_term("brand") - assert info is not None - - def test_has_attribute_true(self) -> None: - """has_attribute returns True when attribute exists.""" - bundle = FluentBundle("en") - bundle.add_resource("button = Click\n .tooltip = Save\n") - assert bundle.has_attribute("button", "tooltip") is True - - def test_has_attribute_false_missing_attribute(self) -> None: - """has_attribute returns False when attribute missing.""" - bundle = FluentBundle("en") - 
bundle.add_resource("button = Click\n .tooltip = Save\n") - assert bundle.has_attribute("button", "nonexistent") is False - - def test_has_attribute_false_missing_message(self) -> None: - """has_attribute returns False when message missing.""" - bundle = FluentBundle("en") - bundle.add_resource("msg = Hello") - assert bundle.has_attribute("nonexistent", "tooltip") is False - - def test_has_attribute_multiple_attributes(self) -> None: - """has_attribute correctly checks among multiple attributes.""" - bundle = FluentBundle("en") - bundle.add_resource( - "button = Click\n" - " .tooltip = Tooltip\n" - " .aria-label = Label\n" - " .placeholder = Enter\n" - ) - assert bundle.has_attribute("button", "tooltip") is True - assert bundle.has_attribute("button", "aria-label") is True - assert bundle.has_attribute("button", "placeholder") is True - assert bundle.has_attribute("button", "missing") is False - - -# ============================================================================= -# Formatting (format_pattern error paths) -# ============================================================================= - - -class TestBundleFormatting: - """Test formatting methods and error handling.""" - - def test_format_pattern_formats_message(self) -> None: - """format_pattern formats message without attribute access.""" - bundle = FluentBundle("en", use_isolating=False) - bundle.add_resource("welcome = Hello, { $name }!") - result, errors = bundle.format_pattern("welcome", {"name": "Alice"}) - assert result == "Hello, Alice!" 
- assert errors == () - - def test_format_pattern_handles_recursion_error(self) -> None: - """format_pattern catches RecursionError from circular refs.""" - bundle = FluentBundle("en", strict=False) - bundle.add_resource("msg1 = { msg2 }\nmsg2 = { msg1 }\n") - _result, errors = bundle.format_pattern("msg1") - assert len(errors) > 0 - - -# ============================================================================= -# Custom Functions -# ============================================================================= - - -class TestBundleCustomFunctions: - """Test custom function registration and registry isolation.""" - - def test_custom_function_registered_and_works(self) -> None: - """add_function registers custom function successfully.""" - bundle = FluentBundle("en") - - def CUSTOM(value: Any) -> str: - return str(value).upper() - - bundle.add_function("CUSTOM", CUSTOM) - bundle.add_resource("msg = { CUSTOM($val) }") - result, _ = bundle.format_pattern("msg", {"val": "hello"}) - assert "HELLO" in result - - def test_add_function_clears_cache(self) -> None: - """add_function clears cache after registration.""" - bundle = FluentBundle("en", cache=CacheConfig()) - bundle.add_resource("msg = Hello") - bundle.format_pattern("msg") - assert bundle.cache_usage == 1 - - def CUSTOM(v: Any) -> str: - return str(v) - - bundle.add_function("CUSTOM", CUSTOM) - assert bundle.cache_usage == 0 - - def test_add_function_without_cache(self) -> None: - """add_function works when cache is disabled.""" - bundle = FluentBundle("en", use_isolating=False) - - def CUSTOM(val: str) -> str: - return val.upper() - - bundle.add_function("CUSTOM", CUSTOM) - bundle.add_resource("msg = { CUSTOM($val) }") - result, _ = bundle.format_pattern("msg", {"val": "test"}) - assert result == "TEST" - - def test_init_with_custom_registry(self) -> None: - """FluentBundle accepts custom FunctionRegistry.""" - registry = create_default_registry() - - def my_func(_val: int) -> str: - return "custom" - - 
registry.register(my_func, ftl_name="CUSTOM") - bundle = FluentBundle("en", functions=registry) - bundle.add_resource("test = { CUSTOM(123) }") - result, errors = bundle.format_pattern("test") - assert not errors - assert "custom" in result - - def test_init_copies_registry_for_isolation(self) -> None: - """FluentBundle creates copy of registry for isolation.""" - original = create_default_registry() - bundle = FluentBundle("en", strict=False, functions=original) - - def new_func(_val: int) -> str: - return "new" - - original.register(new_func, ftl_name="NEWFUNC") - bundle.add_resource("test = { NEWFUNC(1) }") - result, errors = bundle.format_pattern("test") - assert len(errors) > 0 or "NEWFUNC" not in result - - -# ============================================================================= -# get_babel_locale Method -# ============================================================================= - - -class TestBundleGetBabelLocale: - """Test get_babel_locale introspection method.""" - - def test_returns_locale_identifier(self) -> None: - """get_babel_locale returns Babel locale identifier.""" - assert FluentBundle("lv").get_babel_locale() == "lv" - - def test_handles_underscore_locale(self) -> None: - """get_babel_locale handles underscore-separated locales.""" - assert FluentBundle("en_US").get_babel_locale() == "en_US" - - def test_handles_hyphen_locale(self) -> None: - """get_babel_locale handles hyphen-separated locales.""" - result = FluentBundle("en-GB").get_babel_locale() - assert "en" in result - - def test_invalid_locale_is_rejected_at_construction(self) -> None: - """Unknown locales are rejected before a bundle can be created.""" - with pytest.raises(ValueError, match="Unknown locale identifier"): - FluentBundle("xx-INVALID") - - -# ============================================================================= -# Thread Safety -# ============================================================================= - - -class TestBundleThreadSafety: - """Test 
always-on thread safety via readers-writer lock.""" - - def test_add_resource_is_thread_safe(self) -> None: - """add_resource acquires lock (always-on thread safety).""" - bundle = FluentBundle("en") - bundle.add_resource("msg = Hello") - assert bundle.has_message("msg") - result, errors = bundle.format_pattern("msg") - assert result == "Hello" - assert errors == () - - def test_format_pattern_is_thread_safe(self) -> None: - """format_pattern acquires lock (always-on thread safety).""" - bundle = FluentBundle("en", use_isolating=False) - bundle.add_resource("greeting = Hello, { $name }!") - result, errors = bundle.format_pattern( - "greeting", {"name": "World"} - ) - assert result == "Hello, World!" - assert errors == () - - -# ============================================================================= -# Hypothesis Property-Based Tests -# ============================================================================= - - -class TestBundleHypothesisProperties: - """Property-based tests for FluentBundle boundary exploration.""" - - # --- Init type validation (from test_bundle_100pct_final_coverage) --- - - @given( - invalid_functions=st.one_of( - st.dictionaries( - st.text(min_size=1, max_size=10), st.integers() - ), - st.lists(st.text()), - st.integers(), - st.text(), - st.none(), - ) - ) - def test_init_rejects_non_function_registry( - self, invalid_functions: object - ) -> None: - """FluentBundle.__init__ rejects non-FunctionRegistry functions.""" - if invalid_functions is None: - event("type=NoneType_valid") - return - - type_name = type(invalid_functions).__name__ - event(f"type={type_name}") - - with pytest.raises( - TypeError, - match="functions must be FunctionRegistry, not", - ): - FluentBundle( - "en_US", functions=invalid_functions # type: ignore[arg-type] - ) - - @example(invalid_functions={"NUMBER": lambda x: x}) - @example(invalid_functions=[]) - @example(invalid_functions=42) - @example(invalid_functions="not_a_registry") - @given( - 
invalid_functions=st.one_of( - st.dictionaries( - st.text(min_size=1, max_size=5), - st.integers(), - min_size=1, - ), - st.lists(st.integers(), min_size=1), - ) - ) - def test_init_type_error_message_includes_type_name( - self, invalid_functions: object - ) -> None: - """TypeError message includes actual type name.""" - type_name = type(invalid_functions).__name__ - event(f"type={type_name}") - - with pytest.raises(TypeError) as exc_info: - FluentBundle( - "en_US", functions=invalid_functions # type: ignore[arg-type] - ) - - assert type_name in str(exc_info.value) - assert "FunctionRegistry" in str(exc_info.value) - assert "create_default_registry" in str(exc_info.value) - - # --- Property getters (from test_bundle_100pct_final_coverage) --- - - @given( - max_expansion_size=st.integers( - min_value=1000, max_value=10_000_000 - ), - locale=st.sampled_from(["en_US", "de_DE", "lv_LV", "ja_JP"]), - ) - def test_max_expansion_size_preserved( - self, max_expansion_size: int, locale: str - ) -> None: - """max_expansion_size property returns configured value.""" - if max_expansion_size < 10_000: - event("boundary=small") - elif max_expansion_size > 1_000_000: - event("boundary=large") - else: - event("boundary=medium") - - bundle = FluentBundle( - locale, max_expansion_size=max_expansion_size - ) - assert bundle.max_expansion_size == max_expansion_size - - @given( - locale=st.sampled_from(["en", "de", "lv", "pl", "ar", "ja"]), - provide_custom_registry=st.booleans(), - ) - def test_function_registry_preserved( - self, locale: str, provide_custom_registry: bool - ) -> None: - """function_registry property returns valid registry.""" - if provide_custom_registry: - event("registry_type=custom") - custom_registry = create_default_registry() - bundle = FluentBundle(locale, functions=custom_registry) - else: - event("registry_type=shared") - bundle = FluentBundle(locale) - - registry = bundle.function_registry - assert isinstance(registry, FunctionRegistry) - assert "NUMBER" in 
registry - - # --- Comment handling (from test_bundle_100pct_final_coverage) --- - - @given( - num_comments=st.integers(min_value=1, max_value=10), - comment_style=st.sampled_from( - ["single", "double", "triple"] - ), - ) - def test_comments_handled_correctly( - self, num_comments: int, comment_style: str - ) -> None: - """Comment entries handled during resource registration.""" - event(f"comment_count={num_comments}") - event(f"comment_style={comment_style}") - - marker = {"single": "#", "double": "##", "triple": "###"}[ - comment_style - ] - lines = [f"{marker} Comment {i}" for i in range(num_comments)] - lines.extend(["", "msg = Hello"]) - - bundle = FluentBundle("en_US") - junk = bundle.add_resource("\n".join(lines)) - assert len(junk) == 0 - assert bundle.has_message("msg") - - @example(num_standalone=1) - @example(num_standalone=3) - @example(num_standalone=10) - @given(num_standalone=st.integers(min_value=1, max_value=20)) - def test_comments_do_not_create_junk( - self, num_standalone: int - ) -> None: - """Comments are skipped without creating Junk entries.""" - event(f"standalone_comments={num_standalone}") - - lines = ["### Section Header"] - lines.extend( - f"# Comment line {i}" for i in range(num_standalone) - ) - lines.extend(["", "message = Value", "## Trailing comment"]) - - bundle = FluentBundle("en_US") - junk = bundle.add_resource("\n".join(lines)) - assert len(junk) == 0 - assert bundle.has_message("message") - - # --- Strict mode cache interaction --- - # (from test_bundle_100pct_final_coverage) - - @given( - locale=st.sampled_from(["en", "de", "lv", "pl"]), - missing_var_name=st.text( - alphabet=st.characters( - min_codepoint=ord("a"), max_codepoint=ord("z") - ), - min_size=1, - max_size=20, - ), - ) - def test_strict_mode_raises_on_cached_error( - self, locale: str, missing_var_name: str - ) -> None: - """Strict mode raises FormattingIntegrityError on cached errors.""" - bundle = FluentBundle( - locale, strict=True, cache=CacheConfig() - ) - 
bundle.add_resource( - f"msg = Hello {{ ${missing_var_name} }}" - ) - - with pytest.raises(FormattingIntegrityError) as exc1: - bundle.format_pattern("msg", {}) - - event("cache_hit_type=error") - assert exc1.value.message_id == "msg" - assert len(exc1.value.fluent_errors) == 1 - assert ( - exc1.value.fluent_errors[0].category - == ErrorCategory.REFERENCE - ) - - with pytest.raises(FormattingIntegrityError) as exc2: - bundle.format_pattern("msg", {}) - assert exc2.value.message_id == "msg" - - @given( - locale=st.sampled_from(["en_US", "de_DE", "lv_LV"]), - message_text=st.text( - alphabet=st.characters( - min_codepoint=ord("A"), - max_codepoint=ord("z"), - blacklist_categories=("Cc", "Cs"), - ), - min_size=1, - max_size=50, - ), - ) - def test_strict_mode_cache_hit_without_errors( - self, locale: str, message_text: str - ) -> None: - """Strict mode cached success result returns normally.""" - safe = "".join( - c for c in message_text if c.isprintable() and c not in "{}#" - ).strip() - if not safe: - safe = "Hello" - - bundle = FluentBundle( - locale, strict=True, cache=CacheConfig() - ) - bundle.add_resource(f"msg = {safe}") - - r1, e1 = bundle.format_pattern("msg") - assert r1 == safe - assert e1 == () - - event("cache_hit_type=success") - - r2, e2 = bundle.format_pattern("msg") - assert r2 == safe - assert e2 == () - - # --- Configuration preservation properties --- - # (from test_bundle_complete_final_coverage, events added) - - @given( - st.text( - alphabet=st.sampled_from(["a", "b", "c", "_", "-"]), - min_size=1, - max_size=50, - ) - ) - def test_valid_locale_accepted(self, locale: str) -> None: - """Valid locale formats are accepted by FluentBundle.""" - if not locale or not locale[0].isalnum(): - event("outcome=filtered") - return - - try: - bundle = FluentBundle(locale) - event("outcome=accepted") - assert bundle.locale == normalize_locale(locale) - except ValueError: - event("outcome=rejected") - - @given(st.booleans()) - def test_use_isolating_preserved( 
- self, use_isolating: bool - ) -> None: - """use_isolating configuration is preserved.""" - kind = "isolating" if use_isolating else "non_isolating" - event(f"outcome={kind}") - bundle = FluentBundle("en", use_isolating=use_isolating) - assert bundle.use_isolating == use_isolating - - @given(st.booleans()) - def test_strict_mode_preserved(self, strict: bool) -> None: - """strict mode configuration is preserved.""" - kind = "strict" if strict else "lenient" - event(f"outcome={kind}") - bundle = FluentBundle("en", strict=strict) - assert bundle.strict == strict - - @given(st.integers(min_value=1, max_value=10000)) - def test_cache_config_size_preserved(self, cache_size: int) -> None: - """cache_config.size is preserved from CacheConfig constructor.""" - if cache_size < 100: - event("boundary=small") - elif cache_size < 5000: - event("boundary=medium") - else: - event("boundary=large") - bundle = FluentBundle("en", cache=CacheConfig(size=cache_size)) - assert bundle.cache_config is not None - assert bundle.cache_config.size == cache_size - - # --- Validation properties (from test_bundle_coverage, events added) --- - - @given( - term_name=st.from_regex( - r"[a-z][a-z0-9-]{0,10}", fullmatch=True - ) - ) - def test_duplicate_term_generates_warning( - self, term_name: str - ) -> None: - """Duplicate term IDs always generate warnings.""" - event("outcome=duplicate_warned") - bundle = FluentBundle("en_US", use_isolating=False) - ftl = f"-{term_name} = First\n-{term_name} = Second\n" - result = bundle.validate_resource(ftl) - assert any( - "Duplicate term ID" in w.message for w in result.warnings - ) - - @given( - term_a=st.from_regex( - r"[a-z][a-z0-9-]{0,10}", fullmatch=True - ), - term_b=st.from_regex( - r"[a-z][a-z0-9-]{0,10}", fullmatch=True - ), - ) - def test_undefined_term_ref_generates_warning( - self, term_a: str, term_b: str - ) -> None: - """Undefined term references always generate warnings.""" - assume(term_a != term_b) - event("outcome=undefined_warned") - 
bundle = FluentBundle("en_US", use_isolating=False) - ftl = f"-{term_a} = {{ -{term_b} }}" - result = bundle.validate_resource(ftl) - assert any( - f"undefined term '-{term_b}'" in w.message - for w in result.warnings - ) - - -# ============================================================================ -# LOCALE VALIDATION AND BUNDLE INTEGRATION COVERAGE -# ============================================================================ - - -class TestLocaleValidationAsciiOnly: - """Locale codes must be ASCII alphanumeric with underscore or hyphen separators.""" - - def test_valid_ascii_locales_accepted(self) -> None: - """Valid ASCII locale codes are accepted without error.""" - valid_locales = [ - "en", - "en_US", - "en-US", - "de_DE", - "lv_LV", - "zh_Hans_CN", - "pt_BR", - "ja_JP", - "ar_EG", - ] - for locale in valid_locales: - bundle = FluentBundle(locale) - assert bundle.locale == normalize_locale(locale) - - def test_unicode_locale_rejected(self) -> None: - """Locale codes with non-ASCII characters raise ValueError.""" - invalid_locales = [ - "\xe9_FR", - "\u65e5\u672c\u8a9e", - "en_\xfc", - "\xe4\xf6\xfc", - ] - for locale in invalid_locales: - with pytest.raises(ValueError, match="must be ASCII alphanumeric"): - FluentBundle(locale) - - def test_empty_locale_rejected(self) -> None: - """Empty locale code raises ValueError.""" - with pytest.raises(ValueError, match="locale cannot be blank"): - FluentBundle("") - - def test_invalid_format_rejected(self) -> None: - """Invalid locale code formats raise ValueError.""" - invalid_formats = [ - "_en", - "en_", - "en__US", - "en US", - "en.US", - "en@US", - ] - for locale in invalid_formats: - with pytest.raises(ValueError, match=r"Invalid locale:"): - FluentBundle(locale) - - @given( - st.builds( - lambda first, rest: first + rest, - first=st.text( - alphabet="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", - min_size=1, - max_size=1, - ), - rest=st.text( - 
alphabet="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", - min_size=0, - max_size=9, - ), - ) - ) - def test_ascii_alphanumeric_input_is_canonicalized_or_rejected(self, locale: str) -> None: - """PROPERTY: ASCII locale-like input either canonicalizes or fails explicitly.""" - event(f"locale_len={len(locale)}") - try: - bundle = FluentBundle(locale) - except ValueError: - with pytest.raises(ValueError, match=r"Unknown locale identifier|Invalid locale format"): - FluentBundle(locale) - event("outcome=rejected") - else: - assert bundle.locale == normalize_locale(locale) - event("outcome=accepted") - - -class TestBundleOverwriteWarning: - """Overwriting an existing message or term in add_resource logs a WARNING.""" - - def test_message_overwrite_logs_warning(self, caplog: pytest.LogCaptureFixture) -> None: - """Overwriting a message logs a warning with the message ID.""" - bundle = FluentBundle("en") - - with caplog.at_level(logging.WARNING): - bundle.add_resource("greeting = Hello") - bundle.add_resource("greeting = Goodbye") - - warning_messages = [ - record.message for record in caplog.records - if record.levelno == logging.WARNING - ] - assert any("Overwriting existing message 'greeting'" in msg for msg in warning_messages) - - def test_term_overwrite_logs_warning(self, caplog: pytest.LogCaptureFixture) -> None: - """Overwriting a term logs a warning with the term ID.""" - bundle = FluentBundle("en") - - with caplog.at_level(logging.WARNING): - bundle.add_resource("-brand = Acme") - bundle.add_resource("-brand = NewCorp") - - warning_messages = [ - record.message for record in caplog.records - if record.levelno == logging.WARNING - ] - assert any("Overwriting existing term '-brand'" in msg for msg in warning_messages) - - def test_no_warning_for_new_entries(self, caplog: pytest.LogCaptureFixture) -> None: - """No overwrite warning when adding distinct entries.""" - bundle = FluentBundle("en") - - with caplog.at_level(logging.WARNING): - 
bundle.add_resource("greeting = Hello") - bundle.add_resource("farewell = Goodbye") - - overwrite_warnings = [ - record.message for record in caplog.records - if record.levelno == logging.WARNING and "Overwriting" in record.message - ] - assert len(overwrite_warnings) == 0 - - def test_last_write_wins_behavior_preserved(self) -> None: - """Last Write Wins behavior: last added resource wins on repeated key.""" - bundle = FluentBundle("en") - bundle.add_resource("greeting = First") - bundle.add_resource("greeting = Second") - bundle.add_resource("greeting = Third") - - result, _ = bundle.format_pattern("greeting") - assert result == "Third" - - -class TestBundleIntegration: - """Integration tests via FluentBundle for multi-module coverage.""" - - def test_variant_key_failed_number_parse(self) -> None: - """Number-like variant key that fails parse falls through to identifier.""" - bundle = FluentBundle("en_US", strict=False) - bundle.add_resource( - "msg = { $val ->\n" - " [-.test] Match\n" - " *[other] Other\n" - "}\n" - ) - result, _ = bundle.format_pattern( - "msg", {"val": "-.test"} - ) - assert result is not None - - def test_identifier_as_function_argument(self) -> None: - """Identifier becomes MessageReference in function call arguments.""" - bundle = FluentBundle("en_US") - - def test_func(val: str | int) -> str: - return str(val) - - bundle.add_function("TEST", test_func) - bundle.add_resource("ref = value") - bundle.add_resource("msg = { TEST(ref) }") - result, errors = bundle.format_pattern("msg") - assert not errors - assert result is not None - - def test_comment_with_crlf_ending(self) -> None: - """Comment with CRLF line ending is parsed correctly.""" - bundle = FluentBundle("en_US") - bundle.add_resource("# Comment\r\nmsg = value") - result, errors = bundle.format_pattern("msg") - assert not errors - assert "value" in result - - def test_full_coverage_integration(self) -> None: - """Integration test exercising parser, resolver, and validator 
together.""" - bundle = FluentBundle("en_US") - bundle.add_resource( - "# Comment\n" - "msg1 = { $val }\n" - "msg2 = { NUMBER($val) }\n" - "msg3 = { -term }\n" - "msg4 = { other.attr }\n" - "sel = { 42 ->\n" - " [42] Match\n" - " *[other] Other\n" - "}\n" - "-brand = Firefox\n" - " .version = 1.0\n" - "empty =\n" - " .attr = Value\n" - ) - r1, _ = bundle.format_pattern("msg1", {"val": "t"}) - r2, _ = bundle.format_pattern("msg2", {"val": 42}) - r3, _ = bundle.format_pattern("sel") - assert all(r is not None for r in [r1, r2, r3]) - - validation = validate_resource( - "msg = { $val }\n-term = Firefox\n" - ) - assert validation is not None - - -class TestBundleLocaleValidationBeforeLoading: - """Locale validation happens before any resource loading attempt.""" - - def test_locale_validation_before_resource_loading(self) -> None: - """Invalid locale raises ValueError immediately, before resource loading.""" - with pytest.raises(ValueError, match="must be ASCII alphanumeric"): - FluentBundle("\xe9_FR") - - -# ============================================================================ -# TestAddResourceStream -# ============================================================================ - - -class TestAddResourceStream: - """FluentBundle.add_resource_stream incremental resource loading.""" - - def test_single_message_from_lines(self) -> None: - """add_resource_stream loads a single message from a line list.""" - bundle = FluentBundle("en") - bundle.add_resource_stream(["greeting = Hello\n"]) - assert bundle.has_message("greeting") - - def test_multiple_messages_blank_separated(self) -> None: - """Multiple messages separated by blank lines are all registered.""" - bundle = FluentBundle("en") - bundle.add_resource_stream(["msg1 = One\n", "\n", "msg2 = Two\n"]) - assert bundle.has_message("msg1") - assert bundle.has_message("msg2") - - def test_empty_stream_registers_nothing(self) -> None: - """Empty line iterable registers no messages.""" - bundle = FluentBundle("en") - 
bundle.add_resource_stream([]) - assert not bundle.has_message("anything") - - def test_returns_empty_junk_tuple_on_clean_source(self) -> None: - """Clean FTL stream returns empty junk tuple.""" - bundle = FluentBundle("en") - junk = bundle.add_resource_stream(["msg = Value\n"]) - assert junk == () - - def test_returns_junk_on_parse_error(self) -> None: - """Junk entries from invalid FTL are returned (not raised) in non-strict mode.""" - bundle = FluentBundle("en", strict=False) - junk = bundle.add_resource_stream([" invalid = indented\n"]) - assert len(junk) >= 1 - - def test_strict_mode_raises_on_junk(self) -> None: - """Strict mode raises SyntaxIntegrityError when the stream contains junk.""" - bundle = FluentBundle("en", strict=True) - with pytest.raises(SyntaxIntegrityError): - bundle.add_resource_stream([" invalid = indented\n"]) - - def test_source_path_threads_through(self) -> None: - """source_path kwarg is accepted without error.""" - bundle = FluentBundle("en") - bundle.add_resource_stream( - ["greeting = Hello\n"], source_path="locales/en/ui.ftl" - ) - assert bundle.has_message("greeting") - - def test_format_works_after_stream_load(self) -> None: - """Messages loaded via add_resource_stream are formattable.""" - bundle = FluentBundle("en") - bundle.add_resource_stream(["greeting = Hello, { $name }!\n"]) - result, errors = bundle.format_pattern("greeting", {"name": "World"}) - assert errors == () - assert result == "Hello, \u2068World\u2069!" 
- - def test_generator_input_accepted(self) -> None: - """Generator (not just list) is accepted as lines argument.""" - bundle = FluentBundle("en") - - def gen() -> object: - yield "msg = From generator\n" - - bundle.add_resource_stream(gen()) # type: ignore[arg-type] - assert bundle.has_message("msg") - - def test_equivalence_with_add_resource(self) -> None: - """add_resource_stream produces same messages as add_resource for same content.""" - source = "msg1 = One\n\nmsg2 = Two\n" - b1 = FluentBundle("en") - b1.add_resource(source) - b2 = FluentBundle("en") - b2.add_resource_stream(source.splitlines(keepends=True)) - assert b1.has_message("msg1") == b2.has_message("msg1") - assert b1.has_message("msg2") == b2.has_message("msg2") - r1, _ = b1.format_pattern("msg1") - r2, _ = b2.format_pattern("msg1") - assert r1 == r2 - - @given( - names=st.lists( - st.text( - min_size=1, - max_size=20, - alphabet=st.characters( - min_codepoint=ord("a"), - max_codepoint=ord("z"), - ), - ), - min_size=1, - max_size=10, - ) - ) - def test_all_messages_reachable_after_stream_load( - self, names: list[str] - ) -> None: - """All messages loaded via stream are reachable via has_message.""" - event(f"msg_count={len(names)}") - unique_names = list(dict.fromkeys(names)) - source = "\n\n".join(f"{name} = Value" for name in unique_names) + "\n" - bundle = FluentBundle("en") - bundle.add_resource_stream(source.splitlines(keepends=True)) - for name in unique_names: - assert bundle.has_message(name), f"Missing: {name}" +from tests.runtime_bundle_cases.basic import * # noqa: F403 - re-export split runtime bundle tests +from tests.runtime_bundle_cases.introspection import * # noqa: F403 - re-export split runtime bundle tests +from tests.runtime_bundle_cases.properties import * # noqa: F403 - re-export split runtime bundle tests +from tests.runtime_bundle_cases.state import * # noqa: F403 - re-export split runtime bundle tests diff --git a/tests/test_runtime_bundle_cache_security.py 
b/tests/test_runtime_bundle_cache_security.py index 64780b7d..8f87fb64 100644 --- a/tests/test_runtime_bundle_cache_security.py +++ b/tests/test_runtime_bundle_cache_security.py @@ -194,6 +194,8 @@ def test_get_cache_audit_log_returns_write_log_entries_when_enabled(self) -> Non assert audit_log is not None assert isinstance(audit_log, tuple) assert [entry.operation for entry in audit_log] == ["MISS", "PUT", "HIT"] + assert [entry.sequence for entry in audit_log] == [1, 2, 3] + assert [entry.cache_sequence for entry in audit_log] == [0, 1, 1] assert all(isinstance(entry, WriteLogEntry) for entry in audit_log) def test_audit_logging_records_operations(self) -> None: diff --git a/tests/test_runtime_bundle_delegation.py b/tests/test_runtime_bundle_delegation.py index 30459ea8..cd79c9d0 100644 --- a/tests/test_runtime_bundle_delegation.py +++ b/tests/test_runtime_bundle_delegation.py @@ -46,6 +46,13 @@ def test_locale_at_hard_limit_boundary_reaches_unknown_locale_validation(self) - with pytest.raises(ValueError, match="Unknown locale identifier"): FluentBundle(boundary_locale) + def test_whitespace_wrapped_boundary_locale_reaches_unknown_locale_validation(self) -> None: + """Boundary-length locales are trimmed, then rejected as unknown locales.""" + boundary_locale = "a" + ("b" * (MAX_LOCALE_LENGTH_HARD_LIMIT - 2)) + "c" + + with pytest.raises(ValueError, match="Unknown locale identifier"): + FluentBundle(f" {boundary_locale} ") + def test_locale_one_over_hard_limit_rejected(self) -> None: """Locale at MAX_LOCALE_LENGTH_HARD_LIMIT + 1 is rejected.""" # Create locale exceeding by exactly 1 character diff --git a/tests/test_runtime_cache_hashable.py b/tests/test_runtime_cache_hashable.py index eed9b1eb..75cc2953 100644 --- a/tests/test_runtime_cache_hashable.py +++ b/tests/test_runtime_cache_hashable.py @@ -1,1233 +1,12 @@ -"""Tests for IntegrityCache hashable key construction, NaN normalization, and -unhashable argument handling. 
- -Covers: -- __init__ parameter validation -- _make_hashable type-tagged conversions (bool/int/Decimal/datetime/date/ - FluentNumber/list/dict/set/tuple) for collision-free cache keys -- Depth limiting to prevent O(N) key computation on adversarial inputs -- _make_key integration and error recovery (RecursionError, TypeError) -- NaN normalization (Decimal) to prevent cache pollution DoS vectors -- Hashable conversion of list/dict/set/tuple args for full cache coverage -- Unhashable argument graceful bypass (skips caching, increments counter) -- Error bloat protection (max_entry_weight, max_errors_per_entry) -- LRU eviction and move-to-end behavior -- Property accessors (size, hits, misses, unhashable_skips, oversize_skips) -""" - -from __future__ import annotations - -from datetime import UTC, date, datetime -from decimal import Decimal -from typing import Any, NoReturn - -import pytest -from hypothesis import event, example, given, settings -from hypothesis import strategies as st - -from ftllexengine.constants import MAX_DEPTH -from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError -from ftllexengine.runtime.cache import IntegrityCache -from ftllexengine.runtime.function_bridge import FluentNumber, FluentValue - -# ============================================================================ -# SECTION 1: INITIALIZATION VALIDATION -# ============================================================================ - - -class TestIntegrityCacheInitValidation: - """Test IntegrityCache.__init__ parameter validation.""" - - def test_maxsize_zero_rejected(self) -> None: - """IntegrityCache rejects maxsize=0.""" - with pytest.raises(ValueError, match="maxsize must be positive"): - IntegrityCache(maxsize=0) - - def test_maxsize_negative_rejected(self) -> None: - """IntegrityCache rejects negative maxsize.""" - with pytest.raises(ValueError, match="maxsize must be positive"): - IntegrityCache(maxsize=-1) - - def test_max_entry_weight_zero_rejected(self) -> 
None: - """IntegrityCache rejects max_entry_weight=0.""" - with pytest.raises(ValueError, match="max_entry_weight must be positive"): - IntegrityCache(max_entry_weight=0) - - def test_max_entry_weight_negative_rejected(self) -> None: - """IntegrityCache rejects negative max_entry_weight.""" - with pytest.raises(ValueError, match="max_entry_weight must be positive"): - IntegrityCache(max_entry_weight=-1) - - def test_max_errors_per_entry_zero_rejected(self) -> None: - """IntegrityCache rejects max_errors_per_entry=0.""" - with pytest.raises(ValueError, match="max_errors_per_entry must be positive"): - IntegrityCache(max_errors_per_entry=0) - - def test_max_errors_per_entry_negative_rejected(self) -> None: - """IntegrityCache rejects negative max_errors_per_entry.""" - with pytest.raises(ValueError, match="max_errors_per_entry must be positive"): - IntegrityCache(max_errors_per_entry=-1) - - -# ============================================================================ -# SECTION 2: MAKE HASHABLE - TYPE-TAGGED CONVERSIONS -# ============================================================================ - - -class TestMakeHashableTypes: - """Test IntegrityCache._make_hashable type-tagged conversions. - - Python's hash equality (hash(1) == hash(True)) would cause cache collisions. - Type-tagging ensures distinct cache keys per type. - """ - - def test_make_hashable_primitives(self) -> None: - """_make_hashable type-tags bool/int to prevent hash collisions. - - str and None are not tagged (no collision risk). - bool/int are type-tagged so hash(1) == hash(True) does not cause - cache key collisions. 
- """ - assert IntegrityCache._make_hashable("text") == "text" - assert IntegrityCache._make_hashable(None) is None - assert IntegrityCache._make_hashable(42) == ("__int__", 42) - assert IntegrityCache._make_hashable(True) == ("__bool__", True) - assert IntegrityCache._make_hashable(False) == ("__bool__", False) - - def test_make_hashable_decimal(self) -> None: - """_make_hashable type-tags Decimal with str() to preserve scale. - - Decimal("1.0") and Decimal("1") are equal in Python but produce - different plural forms in CLDR (visible fraction digits differ). - Type-tagging with str() preserves scale for correct cache keys. - """ - result = IntegrityCache._make_hashable(Decimal("123.45")) - assert result == ("__decimal__", "123.45") - assert isinstance(result, tuple) - - def test_make_hashable_datetime_naive(self) -> None: - """_make_hashable type-tags naive datetime with isoformat and '__naive__'. - - Two datetimes representing the same UTC instant with different tzinfo - compare equal but format differently. Including tz_key prevents collision. - Naive datetime gets '__naive__' sentinel as tz_key. - """ - dt = datetime(2024, 1, 1, 12, 0, 0) # noqa: DTZ001 - naive datetime by design - result = IntegrityCache._make_hashable(dt) - assert result == ("__datetime__", "2024-01-01T12:00:00", "__naive__") - assert isinstance(result, tuple) - - def test_make_hashable_datetime_aware(self) -> None: - """_make_hashable type-tags aware datetime with UTC timezone string. - - Aware datetime includes the tzinfo string to prevent collisions between - identical times expressed in different timezones. 
- """ - dt = datetime(2024, 1, 1, 12, 0, 0, tzinfo=UTC) - result = IntegrityCache._make_hashable(dt) - assert result == ("__datetime__", "2024-01-01T12:00:00+00:00", "UTC") - assert isinstance(result, tuple) - - def test_make_hashable_date(self) -> None: - """_make_hashable type-tags date with isoformat.""" - d = date(2024, 1, 1) - result = IntegrityCache._make_hashable(d) - assert result == ("__date__", "2024-01-01") - assert isinstance(result, tuple) - - def test_make_hashable_fluent_number(self) -> None: - """_make_hashable type-tags FluentNumber with underlying type info for precision. - - FluentNumber wraps numeric values with formatting options. The inner value - is recursively normalized to handle NaN consistency. - """ - value = FluentNumber(value=42, formatted="42") - result = IntegrityCache._make_hashable(value) - assert result == ("__fluentnumber__", "int", ("__int__", 42), "42", None) - - def test_make_hashable_list_to_tuple(self) -> None: - """_make_hashable type-tags list distinctly from tuple. - - str([1,2]) = "[1, 2]" but str((1,2)) = "(1, 2)". Type-tagging with - '__list__' ensures lists and tuples produce different cache keys even - after both are converted to tuples internally. 
- """ - result = IntegrityCache._make_hashable([1, 2, [3, 4]]) - inner_list = ("__list__", (("__int__", 3), ("__int__", 4))) - expected = ("__list__", (("__int__", 1), ("__int__", 2), inner_list)) - assert result == expected - assert isinstance(result, tuple) - - def test_make_hashable_dict_to_sorted_tuples(self) -> None: - """_make_hashable converts dict to type-tagged sorted tuple of tuples.""" - result = IntegrityCache._make_hashable({"b": 2, "a": 1}) - assert isinstance(result, tuple) - assert result[0] == "__dict__" - inner = result[1] - assert isinstance(inner, tuple) - assert inner == (("a", ("__int__", 1)), ("b", ("__int__", 2))) - - def test_make_hashable_set_to_frozenset(self) -> None: - """_make_hashable converts set to type-tagged frozenset with type-tagged ints.""" - result = IntegrityCache._make_hashable({1, 2, 3}) - assert isinstance(result, tuple) - assert result[0] == "__set__" - inner = result[1] - expected_inner = frozenset({("__int__", 1), ("__int__", 2), ("__int__", 3)}) - assert inner == expected_inner - - def test_make_hashable_tuple_simple(self) -> None: - """_make_hashable type-tags tuples to distinguish from lists.""" - result = IntegrityCache._make_hashable((1, 2, 3)) - expected = ("__tuple__", (("__int__", 1), ("__int__", 2), ("__int__", 3))) - assert result == expected - assert isinstance(result, tuple) - - def test_make_hashable_tuple_with_nested_list(self) -> None: - """_make_hashable type-tags nested lists within tuples distinctly.""" - result = IntegrityCache._make_hashable((1, [2, 3], 4)) - inner_list = ("__list__", (("__int__", 2), ("__int__", 3))) - expected = ("__tuple__", (("__int__", 1), inner_list, ("__int__", 4))) - assert result == expected - assert isinstance(result, tuple) - hash(result) # Must be hashable end-to-end - - def test_make_hashable_tuple_with_nested_dict(self) -> None: - """_make_hashable type-tags tuples with nested dicts.""" - result = IntegrityCache._make_hashable((1, {"b": 2, "a": 1}, 3)) - inner_dict = 
("__dict__", (("a", ("__int__", 1)), ("b", ("__int__", 2)))) - expected = ("__tuple__", (("__int__", 1), inner_dict, ("__int__", 3))) - assert result == expected - hash(result) - - def test_make_hashable_tuple_with_nested_set(self) -> None: - """_make_hashable type-tags tuples with nested sets.""" - result = IntegrityCache._make_hashable((1, {2, 3}, 4)) - inner_set = ("__set__", frozenset({("__int__", 2), ("__int__", 3)})) - expected = ("__tuple__", (("__int__", 1), inner_set, ("__int__", 4))) - assert result == expected - hash(result) - - def test_make_hashable_deeply_nested_tuple(self) -> None: - """_make_hashable type-tags all nested tuples, lists, and dicts.""" - result = IntegrityCache._make_hashable((1, (2, [3, {"a": 4}]), 5)) - inner_dict = ("__dict__", (("a", ("__int__", 4)),)) - inner_list = ("__list__", (("__int__", 3), inner_dict)) - inner_tuple = ("__tuple__", (("__int__", 2), inner_list)) - expected = ("__tuple__", (("__int__", 1), inner_tuple, ("__int__", 5))) - assert result == expected - hash(result) - - def test_make_hashable_nested_mixed_structures(self) -> None: - """_make_hashable handles mixed nested list/dict/set structures.""" - result = IntegrityCache._make_hashable([{"a": [1, 2]}, {3, 4}]) - assert isinstance(result, tuple) - assert result[0] == "__list__" - # Result must be fully hashable - hash(result) - - def test_make_hashable_unknown_type_raises(self) -> None: - """_make_hashable raises TypeError for unrecognized types.""" - - class CustomType: - pass - - with pytest.raises(TypeError, match="Unknown type in cache key"): - IntegrityCache._make_hashable(CustomType()) - - -# ============================================================================ -# SECTION 3: MAKE HASHABLE - DEPTH LIMITING -# ============================================================================ - - -class TestMakeHashableDepth: - """Test depth limiting in _make_hashable. 
- - Prevents O(N) key computation on adversarially nested inputs and guards - against stack overflow via RecursionError transformation. - """ - - def test_shallow_nesting_succeeds(self) -> None: - """Shallow nested structures convert successfully.""" - shallow = {"a": [1, 2, {"b": 3}]} - result = IntegrityCache._make_hashable(shallow) - assert result is not None - - def test_moderate_nesting_succeeds(self) -> None: - """Moderately nested structures (50 levels) convert successfully.""" - # 50 levels well under MAX_DEPTH - value: dict[str, Any] | int = 42 - for _ in range(50): - value = {"nested": value} - result = IntegrityCache._make_hashable(value) - assert result is not None - - def test_excessive_nesting_raises_type_error(self) -> None: - """Excessively nested structures raise TypeError with descriptive message.""" - value: dict[str, Any] | int = 42 - for _ in range(MAX_DEPTH + 10): - value = {"nested": value} - with pytest.raises(TypeError, match="Maximum nesting depth exceeded"): - IntegrityCache._make_hashable(value) - - def test_custom_depth_parameter_respected(self) -> None: - """Custom depth parameter overrides default MAX_DEPTH.""" - value: dict[str, Any] | int = 42 - for _ in range(15): - value = {"nested": value} - - # Should fail at depth=10 - with pytest.raises(TypeError, match="Maximum nesting depth exceeded"): - IntegrityCache._make_hashable(value, depth=10) - - # Should succeed at depth=20 - result = IntegrityCache._make_hashable(value, depth=20) - assert result is not None - - def test_list_nesting_depth_limited(self) -> None: - """List nesting respects depth limit.""" - value: list[Any] | int = 42 - for _ in range(MAX_DEPTH + 10): - value = [value] - with pytest.raises(TypeError, match="Maximum nesting depth exceeded"): - IntegrityCache._make_hashable(value) - - def test_set_nesting_handled(self) -> None: - """Sets with simple values are converted; they cannot nest further. 
- - Sets cannot contain other sets (sets are unhashable), so depth is - naturally bounded. Simple sets should convert correctly. - """ - result = IntegrityCache._make_hashable({1, 2, 3}) - assert isinstance(result, tuple) - assert result[0] == "__set__" - assert isinstance(result[1], frozenset) - - def test_mixed_nesting_depth_limited(self) -> None: - """Mixed dict/list alternating nesting respects depth limit.""" - value: dict[str, Any] | list[Any] | int = 42 - for i in range(MAX_DEPTH + 10): - value = {"nested": value} if i % 2 == 0 else [value] - with pytest.raises(TypeError, match="Maximum nesting depth exceeded"): - IntegrityCache._make_hashable(value) - - -# ============================================================================ -# SECTION 4: MAKE KEY INTEGRATION -# ============================================================================ - - -class TestMakeKey: - """Test _make_key integration with _make_hashable. - - _make_key builds a cache key tuple from (message_id, args, attribute, - locale_code, use_isolating). Returns None on any hashing failure, - allowing cache bypass without raising to the caller. 
- """ - - def test_make_key_with_none_args(self) -> None: - """_make_key with None args returns key with empty tuple for args component.""" - key = IntegrityCache._make_key("msg-id", None, None, "en-US", use_isolating=True) - assert key is not None - assert key == ("msg-id", (), None, "en-US", True) - - def test_make_key_with_simple_args(self) -> None: - """_make_key handles simple string/int arguments.""" - key = IntegrityCache._make_key( - message_id="test", - args={"name": "Alice", "count": 42}, - attribute=None, - locale_code="en", - use_isolating=True, - ) - assert key is not None - - def test_make_key_with_nested_args(self) -> None: - """_make_key handles nested list arguments via _make_hashable.""" - key = IntegrityCache._make_key( - message_id="test", - args={"items": [1, 2, 3]}, - attribute=None, - locale_code="en", - use_isolating=True, - ) - assert key is not None - - def test_make_key_with_all_fluent_value_types(self) -> None: - """_make_key accepts all valid FluentValue types.""" - key = IntegrityCache._make_key( - message_id="test", - args={ - "string": "hello", - "int": 42, - "decimal": Decimal("3.14"), - "decimal2": Decimal("99.99"), - "datetime": datetime(2024, 1, 1, tzinfo=UTC), - "date": date(2024, 1, 1), - "fluent_number": FluentNumber(value=100, formatted="100"), - }, - attribute=None, - locale_code="en", - use_isolating=True, - ) - assert key is not None - - def test_make_key_with_deeply_nested_returns_none(self) -> None: - """_make_key returns None for excessively nested args (graceful bypass).""" - deep: dict[str, Any] | int = 42 - for _ in range(MAX_DEPTH + 10): - deep = {"nested": deep} - key = IntegrityCache._make_key( - message_id="test", - args={"deep": deep}, - attribute=None, - locale_code="en", - use_isolating=True, - ) - assert key is None # Cache bypass, not a crash - - def test_make_key_with_unknown_type_returns_none(self) -> None: - """_make_key returns None for unknown types (graceful bypass).""" - - class CustomObject: - pass - 
- key = IntegrityCache._make_key( - message_id="test", - args={"custom": CustomObject()}, # type: ignore[dict-item] - attribute=None, - locale_code="en", - use_isolating=True, - ) - assert key is None - - def test_make_key_catches_recursion_error(self) -> None: - """_make_key returns None when RecursionError occurs (circular reference).""" - circular_list: list[object] = [] - circular_list.append(circular_list) - args: dict[str, object] = {"data": circular_list} - result = IntegrityCache._make_key( - "msg", args, None, "en", use_isolating=True # type: ignore[arg-type] - ) - assert result is None - - def test_make_key_catches_type_error_in_hash(self) -> None: - """_make_key returns None when TypeError occurs during hash verification.""" - - class UnhashableAfterConversion: - """Passes _make_hashable type dispatch but fails hash().""" - - def __hash__(self) -> int: # pylint: disable=invalid-hash-returned - msg = "cannot hash" - raise TypeError(msg) - - args: dict[str, object] = {"data": UnhashableAfterConversion()} - result = IntegrityCache._make_key( - "msg", args, None, "en", use_isolating=True # type: ignore[arg-type] - ) - assert result is None - - -# ============================================================================ -# SECTION 5: NaN NORMALIZATION -# ============================================================================ - - -class TestNaNDecimalNormalization: - """Test that Decimal NaN values are normalized in cache keys.""" - - def test_decimal_nan_cache_key_consistency(self) -> None: - """Decimal NaN produces consistent cache key across independent instances.""" - cache = IntegrityCache(strict=False) - cache.put("msg", {"val": Decimal("NaN")}, None, "en", use_isolating=True, formatted="Decimal Result", errors=()) - entry = cache.get("msg", {"val": Decimal("NaN")}, None, "en", use_isolating=True) - assert entry is not None - assert entry.formatted == "Decimal Result" - - def test_decimal_nan_does_not_pollute_cache(self) -> None: - """Multiple 
puts with Decimal NaN update the same entry.""" - cache = IntegrityCache(strict=False, maxsize=100) - for i in range(10): - cache.put("msg", {"val": Decimal("NaN")}, None, "en", use_isolating=True, formatted=f"Value {i}", errors=()) - stats = cache.get_stats() - assert stats["size"] == 1, ( - f"Expected 1 entry but got {stats['size']}. " - "Decimal NaN normalization may not be working." - ) - - def test_decimal_snan_normalized_same_as_qnan(self) -> None: - """Signaling NaN and quiet NaN both normalize to the same canonical key.""" - cache = IntegrityCache(strict=False) - cache.put("msg", {"val": Decimal("NaN")}, None, "en", use_isolating=True, formatted="QNaN", errors=()) - # sNaN should resolve to same cache key as qNaN - entry = cache.get("msg", {"val": Decimal("sNaN")}, None, "en", use_isolating=True) - assert entry is not None - - def test_decimal_nan_different_from_regular_decimal(self) -> None: - """Decimal NaN has different cache key from regular Decimal values.""" - cache = IntegrityCache(strict=False) - cache.put("msg", {"val": Decimal("NaN")}, None, "en", use_isolating=True, formatted="NaN Result", errors=()) - cache.put("msg", {"val": Decimal("1.0")}, None, "en", use_isolating=True, formatted="Regular Result", errors=()) - - nan_entry = cache.get("msg", {"val": Decimal("NaN")}, None, "en", use_isolating=True) - regular_entry = cache.get("msg", {"val": Decimal("1.0")}, None, "en", use_isolating=True) - - assert nan_entry is not None - assert nan_entry.formatted == "NaN Result" - assert regular_entry is not None - assert regular_entry.formatted == "Regular Result" - assert cache.get_stats()["size"] == 2 - - -class TestNaNInNestedStructures: - """Test NaN normalization in nested data structures.""" - - def test_nan_in_list_normalized(self) -> None: - """NaN values within lists are normalized for cache key consistency.""" - cache = IntegrityCache(strict=False) - items = [Decimal(1), Decimal("NaN"), Decimal(3)] - cache.put("msg", {"items": items}, None, "en", 
use_isolating=True, formatted="List Result", errors=()) - entry = cache.get("msg", {"items": items}, None, "en", use_isolating=True) - assert entry is not None - assert entry.formatted == "List Result" - - def test_nan_in_dict_normalized(self) -> None: - """NaN values within dicts are normalized for cache key consistency.""" - cache = IntegrityCache(strict=False) - args: dict[str, FluentValue] = {"data": {"a": Decimal(1), "b": Decimal("NaN")}} - cache.put("msg", args, None, "en", use_isolating=True, formatted="Dict Result", errors=()) - data = {"a": Decimal(1), "b": Decimal("NaN")} - entry = cache.get("msg", {"data": data}, None, "en", use_isolating=True) - assert entry is not None - assert entry.formatted == "Dict Result" - - def test_deeply_nested_nan_normalized(self) -> None: - """NaN values in deeply nested structures are normalized consistently.""" - cache = IntegrityCache(strict=False) - deep_args: dict[str, FluentValue] = { - "outer": { - "inner": [ - {"value": Decimal("NaN")}, - {"value": Decimal("sNaN")}, - ] - } - } - cache.put("msg", deep_args, None, "en", use_isolating=True, formatted="Deep Result", errors=()) - fresh_args: dict[str, FluentValue] = { - "outer": { - "inner": [ - {"value": Decimal("NaN")}, - {"value": Decimal("sNaN")}, - ] - } - } - entry = cache.get("msg", fresh_args, None, "en", use_isolating=True) - assert entry is not None - assert entry.formatted == "Deep Result" - - -class TestNaNSecurityProperties: - """Test security properties of NaN normalization.""" - - def test_nan_cache_pollution_prevented(self) -> None: - """NaN-based cache pollution attack is prevented by normalization. - - Attack scenario: 100 NaN-containing requests without normalization would - create 100 unique, unretrievable entries, evicting all legitimate entries. - With normalization all NaN entries collapse to a single key. 
- """ - cache = IntegrityCache(strict=False, maxsize=10) - for i in range(5): - cache.put(f"legit{i}", None, None, "en", use_isolating=True, formatted=f"Legit {i}", errors=()) - for i in range(100): - cache.put("attack", {"val": Decimal("NaN")}, None, "en", use_isolating=True, formatted=f"Attack {i}", errors=()) - - # 5 legit + 1 attack = 6 entries (attack collapses to 1 due to normalization) - assert cache.get_stats()["size"] == 6 - for i in range(5): - entry = cache.get(f"legit{i}", None, None, "en", use_isolating=True) - assert entry is not None, f"Legitimate entry legit{i} was evicted!" - - @given(st.decimals(allow_nan=True)) - @settings(max_examples=100) - @example(Decimal("NaN")) - @example(Decimal("sNaN")) - @example(Decimal("Inf")) - @example(Decimal("-Inf")) - def test_all_decimal_special_values_produce_retrievable_keys( - self, value: Decimal - ) -> None: - """PROPERTY: For any Decimal value, put followed by get returns the entry.""" - cache = IntegrityCache(strict=False) - args = {"val": value} - cache.put("msg", args, None, "en", use_isolating=True, formatted=f"Value: {value}", errors=()) - entry = cache.get("msg", args, None, "en", use_isolating=True) - assert entry is not None, f"Entry for value {value!r} was not retrievable" - is_nan = value.is_nan() or value.is_snan() - event(f"is_nan={is_nan}") - - -class TestNaNHashableValue: - """Test _make_hashable NaN handling directly.""" - - def test_make_hashable_decimal_nan_returns_canonical(self) -> None: - """_make_hashable returns canonical ('__decimal__', '__NaN__') for Decimal NaN.""" - result = IntegrityCache._make_hashable(Decimal("NaN")) - assert result == ("__decimal__", "__NaN__") - - def test_make_hashable_decimal_snan_returns_canonical(self) -> None: - """_make_hashable returns canonical ('__decimal__', '__NaN__') for Decimal sNaN.""" - result = IntegrityCache._make_hashable(Decimal("sNaN")) - assert result == ("__decimal__", "__NaN__") - - def test_make_hashable_regular_decimal_uses_str(self) 
-> None: - """_make_hashable returns tagged str for regular Decimal values.""" - result = IntegrityCache._make_hashable(Decimal("1.50")) - assert result == ("__decimal__", "1.50") - - def test_make_hashable_decimal_infinity_uses_str_not_nan_sentinel(self) -> None: - """Decimal Infinity uses str() representation, not the NaN sentinel. - - Infinity satisfies Inf == Inf (unlike NaN), so no special normalization - is needed. Both +Inf and -Inf produce distinct, retrievable keys. - """ - pos_inf = IntegrityCache._make_hashable(Decimal("Inf")) - neg_inf = IntegrityCache._make_hashable(Decimal("-Inf")) - nan_result = IntegrityCache._make_hashable(Decimal("NaN")) - - assert pos_inf == ("__decimal__", "Infinity") - assert neg_inf == ("__decimal__", "-Infinity") - assert pos_inf != nan_result - assert neg_inf != nan_result - - -# ============================================================================ -# SECTION 6: HASHABLE CONVERSION - CACHE ROUNDTRIP TESTS -# ============================================================================ - - -class TestCacheHashableConversion: # pylint: disable=too-many-public-methods - """Test IntegrityCache automatic conversion of unhashable args to hashable keys. - - Lists, dicts, sets, and tuples are converted to hashable equivalents - (type-tagged tuples, sorted tuples, frozensets) enabling caching for these - types without requiring callers to pre-convert their arguments. 
- """ - - def test_get_with_list_value_now_cacheable(self) -> None: - """get() with list args succeeds: lists are converted to type-tagged tuples.""" - cache = IntegrityCache(strict=False, maxsize=100) - args = {"key": [1, 2, 3]} - cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) - cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) - assert cached is not None - assert cached.as_result() == ("formatted", ()) - assert len(cache) == 1 - assert cache.unhashable_skips == 0 - - def test_get_with_dict_value_now_cacheable(self) -> None: - """get() with nested dict args succeeds: dicts are converted to sorted tuples.""" - cache = IntegrityCache(strict=False, maxsize=100) - args = {"key": {"nested": "value"}} - cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) - cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) - assert cached is not None - assert cached.as_result() == ("formatted", ()) - assert len(cache) == 1 - assert cache.unhashable_skips == 0 - - def test_get_with_set_value_now_cacheable(self) -> None: - """get() with set args succeeds: sets are converted to type-tagged frozensets.""" - cache = IntegrityCache(strict=False, maxsize=100) - args: dict[str, object] = {"key": {1, 2, 3}} - cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) # type: ignore[arg-type] - cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) # type: ignore[arg-type] - assert cached is not None - assert cached.as_result() == ("formatted", ()) - assert len(cache) == 1 - assert cache.unhashable_skips == 0 - - def test_put_with_list_value_now_caches(self) -> None: - """put() with list args stores entry: lists are converted at key build time.""" - cache = IntegrityCache(strict=False, maxsize=100) - cache.put("msg-id", {"items": [1, 2, 3]}, None, "en-US", use_isolating=True, formatted="formatted", errors=()) - assert 
len(cache) == 1 - assert cache.unhashable_skips == 0 - - def test_put_with_dict_value_now_caches(self) -> None: - """put() with nested dict args stores entry: dicts are converted at key build.""" - cache = IntegrityCache(strict=False, maxsize=100) - cache.put("msg-id", {"config": {"option": "value"}}, None, "en-US", use_isolating=True, formatted="fmt", errors=()) - assert len(cache) == 1 - assert cache.unhashable_skips == 0 - - def test_make_key_converts_list_to_valid_key(self) -> None: - """_make_key returns a non-None key when args contain lists.""" - args: dict[str, object] = {"list_value": [1, 2, 3]} - key = IntegrityCache._make_key( - "msg-id", args, None, "en-US", use_isolating=True # type: ignore[arg-type] - ) - assert key is not None - - def test_make_key_converts_nested_structures_to_valid_key(self) -> None: - """_make_key returns a non-None key when args contain nested structures.""" - args: dict[str, object] = {"list": [1, 2], "dict": {"nested": "value"}} - key = IntegrityCache._make_key( - "msg-id", args, None, "en-US", use_isolating=True # type: ignore[arg-type] - ) - assert key is not None - - def test_get_with_tuple_value_cacheable(self) -> None: - """get() caches tuple-valued args correctly via type-tagged conversion.""" - cache = IntegrityCache(strict=False, maxsize=100) - args = {"coords": (10, 20, 30)} - cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) - cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) - assert cached is not None - assert cached.as_result() == ("formatted", ()) - assert len(cache) == 1 - assert cache.unhashable_skips == 0 - - def test_get_with_tuple_containing_list_cacheable(self) -> None: - """get() caches tuple-with-nested-list args: nested list is converted.""" - cache = IntegrityCache(strict=False, maxsize=100) - args: dict[str, object] = {"data": (1, [2, 3], 4)} - cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) # 
type: ignore[arg-type] - cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) # type: ignore[arg-type] - assert cached is not None - assert cached.as_result() == ("formatted", ()) - assert len(cache) == 1 - assert cache.unhashable_skips == 0 - - @given(st.tuples(st.integers(), st.integers(), st.integers())) - def test_get_with_various_tuples_cacheable( - self, tuple_value: tuple[int, int, int] - ) -> None: - """PROPERTY: Tuple-valued args cache and retrieve correctly.""" - cache = IntegrityCache(strict=False, maxsize=100) - args = {"tuple_arg": tuple_value} - cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) - cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) - assert cached is not None - assert cached.as_result() == ("formatted", ()) - assert cache.unhashable_skips == 0 - event(f"tuple_len={len(tuple_value)}") - - @given(st.lists(st.integers(), min_size=1, max_size=10)) - def test_get_with_various_lists_cacheable(self, list_value: list[int]) -> None: - """PROPERTY: List-valued args cache and retrieve correctly.""" - cache = IntegrityCache(strict=False, maxsize=100) - args = {"list_arg": list_value} - cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) - cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) - assert cached is not None - assert cached.as_result() == ("formatted", ()) - assert cache.unhashable_skips == 0 - event(f"list_len={len(list_value)}") - - @given( - st.dictionaries( - st.text(min_size=1, max_size=10), st.integers(), min_size=1, max_size=5 - ) - ) - def test_put_with_various_dicts_cacheable(self, dict_value: dict[str, int]) -> None: - """PROPERTY: Dict-valued args cache correctly.""" - cache = IntegrityCache(strict=False, maxsize=100) - args = {"dict_arg": dict_value} - cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) - assert len(cache) == 1 - assert 
cache.unhashable_skips == 0 - event(f"dict_len={len(dict_value)}") - - def test_mixed_hashable_and_convertible_args(self) -> None: - """Cache handles mixed hashable/convertible args in the same call.""" - cache = IntegrityCache(strict=False, maxsize=100) - args: dict[str, object] = { - "str_arg": "value", - "int_arg": 42, - "list_arg": [1, 2, 3], - } - cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) # type: ignore[arg-type] - cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) # type: ignore[arg-type] - assert cached is not None - assert cached.as_result() == ("formatted", ()) - assert cache.unhashable_skips == 0 - - def test_empty_list_cacheable(self) -> None: - """Empty lists are converted and cached correctly.""" - cache = IntegrityCache(strict=False, maxsize=100) - args: dict[str, list[object]] = {"empty_list": []} - cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) # type: ignore[arg-type] - cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) # type: ignore[arg-type] - assert cached is not None - assert cached.as_result() == ("formatted", ()) - assert len(cache) == 1 - - def test_empty_dict_cacheable(self) -> None: - """Empty dicts are converted and cached correctly.""" - cache = IntegrityCache(strict=False, maxsize=100) - args: dict[str, dict[object, object]] = {"empty_dict": {}} - cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) # type: ignore[arg-type] - cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) # type: ignore[arg-type] - assert cached is not None - assert cached.as_result() == ("formatted", ()) - assert len(cache) == 1 - - -# ============================================================================ -# SECTION 7: UNHASHABLE ARGUMENT HANDLING -# ============================================================================ - - -class TestUnhashableHandling: - 
"""Test graceful bypass for arguments that cannot be hashed. - - Covers three bypass mechanisms: - 1. Unknown type in _make_hashable (case _ branch) - 2. Python's hash() raising TypeError - 3. RecursionError from circular references - In all cases: entry is not cached, unhashable_skips increments. - """ - - def test_get_with_unknown_type_skips_cache(self) -> None: - """get() with unknown type arg bypasses cache and increments unhashable_skips. - - UnknownType is not recognized by _make_hashable's match/case dispatch, - triggering TypeError("Unknown type in cache key") → _make_key returns None. - An unhashable bypass is not a cache miss: no key was looked up, so misses - is not incremented. Only unhashable_skips reflects the event. - """ - cache = IntegrityCache(strict=False) - - class UnknownType: - pass - - args: dict[str, object] = {"data": UnknownType()} - result = cache.get("msg", args, None, "en", use_isolating=True) # type: ignore[arg-type] - assert result is None - assert cache.unhashable_skips == 1 - assert cache.misses == 0 - assert cache.hits == 0 - - def test_put_with_unhashable_hash_raises_skips_cache(self) -> None: - """put() with arg whose __hash__ raises TypeError skips caching.""" - cache = IntegrityCache(strict=False) - - class CustomObject: - def __hash__(self) -> int: # pylint: disable=invalid-hash-returned - msg = "unhashable" - raise TypeError(msg) - - args: dict[str, object] = {"obj": CustomObject()} - cache.put("msg", args, None, "en", use_isolating=True, formatted="result", errors=()) # type: ignore[arg-type] - assert cache.size == 0 - assert cache.unhashable_skips == 1 - - def test_unhashable_custom_object_in_get_skipped(self) -> None: - """Custom unhashable objects in get() args bypass caching gracefully.""" - cache = IntegrityCache(strict=False, maxsize=100) - - class UnhashableClass: - def __init__(self) -> None: - self.data = [1, 2, 3] - - def __hash__(self) -> NoReturn: # pylint: disable=invalid-hash-returned - msg = "unhashable type" 
- raise TypeError(msg) - - custom_args: dict[str, object] = {"custom": UnhashableClass()} - result = cache.get("msg-id", custom_args, None, "en-US", use_isolating=True) # type: ignore[arg-type] - assert result is None - assert cache.unhashable_skips == 1 - - def test_unhashable_skips_not_incremented_for_convertible_types(self) -> None: - """unhashable_skips only counts truly unhashable objects; lists/dicts do not.""" - cache = IntegrityCache(strict=False, maxsize=100) - assert cache.unhashable_skips == 0 - - cache.get("msg1", {"list": [1]}, None, "en-US", use_isolating=True) - assert cache.unhashable_skips == 0 # Lists are convertible, not skipped - - cache.put("msg2", {"dict": {}}, None, "en-US", use_isolating=True, formatted="result", errors=()) - assert cache.unhashable_skips == 0 # Dicts are convertible, not skipped - - def test_unhashable_skips_preserved_on_clear(self) -> None: - """clear() does not reset unhashable_skips; counter is cumulative.""" - cache = IntegrityCache(strict=False, maxsize=100) - - class UnhashableClass: - def __hash__(self) -> NoReturn: # pylint: disable=invalid-hash-returned - msg = "unhashable type" - raise TypeError(msg) - - cache.get("msg", {"obj": UnhashableClass()}, None, "en-US", use_isolating=True) # type: ignore[dict-item] - assert cache.unhashable_skips == 1 - # clear() removes entries but preserves cumulative observability metrics. - cache.clear() - assert cache.unhashable_skips == 1 - - def test_get_stats_includes_unhashable_skips(self) -> None: - """get_stats() reflects unhashable bypasses in unhashable_skips, not misses. - - Unhashable args bypass the cache entirely; no key lookup occurs. - misses counts only true cache misses (key looked up, not found). 
- """ - cache = IntegrityCache(strict=False, maxsize=100) - - class UnhashableClass: - def __hash__(self) -> NoReturn: # pylint: disable=invalid-hash-returned - msg = "unhashable type" - raise TypeError(msg) - - cache.get("msg", {"obj": UnhashableClass()}, None, "en-US", use_isolating=True) # type: ignore[dict-item] - stats = cache.get_stats() - assert "unhashable_skips" in stats - assert stats["unhashable_skips"] == 1 - assert stats["misses"] == 0 - - def test_hashable_args_do_not_increment_unhashable_skips(self) -> None: - """Fully hashable primitive args never increment unhashable_skips.""" - cache = IntegrityCache(strict=False, maxsize=100) - args: dict[str, FluentValue] = {"str": "value", "int": 42, "decimal": Decimal("3.14")} - cache.get("msg1", args, None, "en-US", use_isolating=True) - cache.put("msg2", args, None, "en-US", use_isolating=True, formatted="result", errors=()) - assert cache.unhashable_skips == 0 - - def test_put_with_circular_reference_increments_skip_counter(self) -> None: - """Circular reference in args increments unhashable_skips and skips storage.""" - cache = IntegrityCache(strict=False, maxsize=100) - circular: dict[str, object] = {} - circular["self"] = circular # Circular reference - assert cache.unhashable_skips == 0 - cache.put( - message_id="test", - args=circular, # type: ignore[arg-type] - attribute=None, - locale_code="en", - use_isolating=True, - formatted="output", - errors=(), - ) - assert cache.unhashable_skips == 1 - assert len(cache) == 0 - - def test_put_with_nested_circular_reference_increments_skip(self) -> None: - """Nested circular reference also triggers unhashable_skips increment.""" - cache = IntegrityCache(strict=False, maxsize=50) - nested: dict[str, object] = {"level1": {}} - nested["level1"]["back"] = nested # type: ignore[index] - initial_skips = cache.unhashable_skips - cache.put( - message_id="nested_test", - args=nested, # type: ignore[arg-type] - attribute=None, - locale_code="lv", - use_isolating=True, - 
formatted="result", - errors=(), - ) - assert cache.unhashable_skips == initial_skips + 1 - assert len(cache) == 0 - - def test_put_with_custom_unhashable_in_args_dict(self) -> None: - """Custom unhashable object as a dict value triggers skip.""" - cache = IntegrityCache(strict=False, maxsize=100) - - class UnhashableObject: - __hash__ = None # type: ignore[assignment] - - unhashable_args = {"obj": UnhashableObject()} - initial_skips = cache.unhashable_skips - cache.put( - message_id="custom_obj", - args=unhashable_args, # type: ignore[arg-type] - attribute="attr", - locale_code="en_US", - use_isolating=True, - formatted="value", - errors=(), - ) - assert cache.unhashable_skips == initial_skips + 1 - assert len(cache) == 0 - - -# ============================================================================ -# SECTION 8: ERROR BLOAT PROTECTION -# ============================================================================ - - -class TestIntegrityCacheErrorBloatProtection: - """Test IntegrityCache error collection memory bounding. - - Prevents unbounded memory use when a single message generates many errors. - Two limits: max_errors_per_entry (count) and max_entry_weight (bytes). - """ - - def test_put_rejects_excessive_error_count(self) -> None: - """put() skips caching when error count exceeds max_errors_per_entry.""" - cache = IntegrityCache(strict=False, max_errors_per_entry=10) - errors = tuple( - FrozenFluentError(f"Error {i}", ErrorCategory.REFERENCE) for i in range(15) - ) - cache.put("msg", None, None, "en", use_isolating=True, formatted="formatted text", errors=errors) - assert cache.size == 0 - assert cache.get_stats()["error_bloat_skips"] == 1 - assert cache.get("msg", None, None, "en", use_isolating=True) is None - - def test_put_rejects_excessive_error_weight(self) -> None: - """put() skips caching when total weight exceeds max_entry_weight. - - Dynamic weight: base (100) + string len + per-error weights. 
- 10 errors with 100-char messages + 100-char formatted string exceeds 2000. - """ - cache = IntegrityCache(strict=False, max_entry_weight=2000, max_errors_per_entry=50) - errors = tuple( - FrozenFluentError("E" * 100, ErrorCategory.REFERENCE) for _ in range(10) - ) - cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=errors) - assert cache.size == 0 - # 10 errors pass the count check (10 <= 50), but combined weight - # (100 formatted + 10 * 200 per error = 2100) exceeds max_entry_weight=2000. - assert cache.get_stats()["combined_weight_skips"] == 1 - assert cache.get_stats()["error_bloat_skips"] == 0 - - def test_put_accepts_reasonable_error_collections(self) -> None: - """put() caches results with error counts and weights within limits.""" - cache = IntegrityCache(strict=False, max_entry_weight=15000, max_errors_per_entry=50) - errors = tuple( - FrozenFluentError(f"Error {i}", ErrorCategory.REFERENCE) for i in range(10) - ) - cache.put("msg", None, None, "en", use_isolating=True, formatted="formatted text", errors=errors) - assert cache.size == 1 - assert cache.get_stats()["error_bloat_skips"] == 0 - cached = cache.get("msg", None, None, "en", use_isolating=True) - assert cached is not None - assert cached.as_result() == ("formatted text", errors) - - -# ============================================================================ -# SECTION 9: LRU EVICTION BEHAVIOR -# ============================================================================ - - -class TestIntegrityCacheLRUBehavior: - """Test IntegrityCache LRU eviction and move-to-end behavior.""" - - def test_put_moves_existing_key_to_end_of_lru(self) -> None: - """put() on existing key marks it as recently used (moves to LRU tail).""" - cache = IntegrityCache(strict=False, maxsize=3) - cache.put("msg1", None, None, "en", use_isolating=True, formatted="result1", errors=()) - cache.put("msg2", None, None, "en", use_isolating=True, formatted="result2", errors=()) - 
cache.put("msg3", None, None, "en", use_isolating=True, formatted="result3", errors=()) - assert cache.size == 3 - - # Updating msg1 moves it to the LRU tail (recently used) - cache.put("msg1", None, None, "en", use_isolating=True, formatted="updated1", errors=()) - - # Adding msg4 should evict msg2 (now the oldest) - cache.put("msg4", None, None, "en", use_isolating=True, formatted="result4", errors=()) - assert cache.size == 3 - - assert cache.get("msg2", None, None, "en", use_isolating=True) is None - entry1 = cache.get("msg1", None, None, "en", use_isolating=True) - assert entry1 is not None - assert entry1.as_result() == ("updated1", ()) - assert cache.get("msg3", None, None, "en", use_isolating=True) is not None - assert cache.get("msg4", None, None, "en", use_isolating=True) is not None - - def test_put_evicts_lru_entry_when_cache_full(self) -> None: - """put() evicts the least recently used entry when capacity is reached.""" - cache = IntegrityCache(strict=False, maxsize=2) - cache.put("msg1", None, None, "en", use_isolating=True, formatted="result1", errors=()) - cache.put("msg2", None, None, "en", use_isolating=True, formatted="result2", errors=()) - assert cache.size == 2 - - cache.put("msg3", None, None, "en", use_isolating=True, formatted="result3", errors=()) - assert cache.size == 2 - assert cache.get("msg1", None, None, "en", use_isolating=True) is None - assert cache.get("msg2", None, None, "en", use_isolating=True) is not None - assert cache.get("msg3", None, None, "en", use_isolating=True) is not None - - -# ============================================================================ -# SECTION 10: PROPERTY ACCESSORS -# ============================================================================ - - -class TestIntegrityCacheProperties: - """Test IntegrityCache property accessors for size, hit/miss counters, and limits.""" - - def test_len_and_size_consistent(self) -> None: - """len(cache) and cache.size return the same current entry count.""" - 
cache = IntegrityCache(strict=False) - assert len(cache) == 0 - cache.put("msg1", None, None, "en", use_isolating=True, formatted="result1", errors=()) - assert len(cache) == 1 - assert cache.size == 1 - cache.put("msg2", None, None, "en", use_isolating=True, formatted="result2", errors=()) - assert len(cache) == 2 - assert cache.size == 2 - - def test_maxsize_property(self) -> None: - """maxsize property returns the configured maximum size.""" - cache = IntegrityCache(strict=False, maxsize=500) - assert cache.maxsize == 500 - - def test_max_entry_weight_property(self) -> None: - """max_entry_weight property returns the configured weight limit.""" - cache = IntegrityCache(strict=False, max_entry_weight=5000) - assert cache.max_entry_weight == 5000 - - def test_hits_increments_on_cache_hit(self) -> None: - """hits property increments each time get() finds an entry.""" - cache = IntegrityCache(strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="result", errors=()) - cache.get("msg", None, None, "en", use_isolating=True) - assert cache.hits == 1 - cache.get("msg", None, None, "en", use_isolating=True) - assert cache.hits == 2 - - def test_misses_increments_on_cache_miss(self) -> None: - """misses increments only for true cache misses, not unhashable bypasses.""" - cache = IntegrityCache(strict=False) - cache.get("msg1", None, None, "en", use_isolating=True) - assert cache.misses == 1 - cache.get("msg2", None, None, "en", use_isolating=True) - assert cache.misses == 2 - - def test_misses_not_incremented_for_unhashable_bypass(self) -> None: - """Unhashable args bypass the cache entirely; misses is not incremented. - - An unhashable bypass is not a cache miss: no key was constructed or - looked up. Only unhashable_skips reflects the event. Conflating them - would deflate hit_rate and mislead operators about cache efficiency. 
- """ - cache = IntegrityCache(strict=False) - - class UnknownType: - pass - - cache.get("msg", {"x": UnknownType()}, None, "en", use_isolating=True) # type: ignore[dict-item] - assert cache.unhashable_skips == 1 - assert cache.misses == 0 - - def test_hit_rate_excludes_unhashable_bypasses(self) -> None: - """hit_rate is computed over hashable interactions only: hits / (hits + misses). - - Unhashable bypasses do not count as misses, so they do not dilute the - rate. A cache with one hashable hit and one unhashable bypass reports - hit_rate=100.0, not 50.0. - """ - cache = IntegrityCache(strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - cache.get("msg", None, None, "en", use_isolating=True) # hit - - class UnknownType: - pass - - cache.get("msg", {"x": UnknownType()}, None, "en", use_isolating=True) # type: ignore[dict-item] - - stats = cache.get_stats() - assert stats["hits"] == 1 - assert stats["misses"] == 0 - assert stats["unhashable_skips"] == 1 - assert stats["hit_rate"] == 100.0 - - def test_hit_rate_zero_on_all_true_misses(self) -> None: - """hit_rate is 0.0 when all interactions are true misses (no unhashable).""" - cache = IntegrityCache(strict=False) - cache.get("absent", None, None, "en", use_isolating=True) - stats = cache.get_stats() - assert stats["hits"] == 0 - assert stats["misses"] == 1 - assert stats["hit_rate"] == 0.0 - - def test_hit_rate_correct_mixed_hits_and_misses(self) -> None: - """hit_rate is accurate across a mix of hits, misses, and unhashable bypasses.""" - cache = IntegrityCache(strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - cache.get("msg", None, None, "en", use_isolating=True) # hit - cache.get("msg", None, None, "en", use_isolating=True) # hit - cache.get("absent", None, None, "en", use_isolating=True) # miss - - class UnknownType: - pass - - cache.get("msg", {"x": UnknownType()}, None, "en", use_isolating=True) # type: 
ignore[dict-item] - - stats = cache.get_stats() - assert stats["hits"] == 2 - assert stats["misses"] == 1 - assert stats["unhashable_skips"] == 1 - # hit_rate = 2 / (2 + 1) * 100 = 66.67% - assert stats["hit_rate"] == round(2 / 3 * 100, 2) - - def test_unhashable_skips_increments_on_skip(self) -> None: - """unhashable_skips increments for both get() and put() skips.""" - cache = IntegrityCache(strict=False) - - class UnknownType: - pass - - get_args: dict[str, object] = {"data": UnknownType()} - cache.get("msg", get_args, None, "en", use_isolating=True) # type: ignore[arg-type] - assert cache.unhashable_skips == 1 - put_args: dict[str, object] = {"data": UnknownType()} - cache.put("msg", put_args, None, "en", use_isolating=True, formatted="result", errors=()) # type: ignore[arg-type] - assert cache.unhashable_skips == 2 - - def test_oversize_skips_increments_on_oversize_entry(self) -> None: - """oversize_skips increments when formatted string exceeds max_entry_weight.""" - cache = IntegrityCache(strict=False, max_entry_weight=10) - cache.put("msg1", None, None, "en", use_isolating=True, formatted="x" * 100, errors=()) - assert cache.oversize_skips == 1 - cache.put("msg2", None, None, "en", use_isolating=True, formatted="y" * 50, errors=()) - assert cache.oversize_skips == 2 - - @given( - st.integers(min_value=1, max_value=1000), - st.integers(min_value=1, max_value=10000), - st.integers(min_value=1, max_value=100), - ) - @settings(max_examples=50) - def test_property_constructor_parameters_stored_correctly( - self, - maxsize: int, - max_entry_weight: int, - max_errors_per_entry: int, - ) -> None: - """PROPERTY: Constructor parameters are stored and reflected by properties.""" - cache = IntegrityCache( - strict=False, - maxsize=maxsize, - max_entry_weight=max_entry_weight, - max_errors_per_entry=max_errors_per_entry, - ) - assert cache.maxsize == maxsize - assert cache.max_entry_weight == max_entry_weight - assert cache.size == 0 - assert cache.hits == 0 - assert 
cache.misses == 0 - event(f"maxsize={maxsize}") - - @given(st.text(min_size=0, max_size=100)) - @settings(max_examples=50) - def test_property_primitive_args_always_cacheable(self, text: str) -> None: - """PROPERTY: All primitive FluentValue types produce valid, retrievable entries.""" - cache = IntegrityCache(strict=False) - - args_list: list[dict[str, FluentValue]] = [ - {"text": text}, - {"num": 42}, - {"decimal": Decimal("3.14")}, - {"flag": True}, - {"val": None}, - ] - for args in args_list: - cache.put("msg", args, None, "en", use_isolating=True, formatted="result", errors=()) - entry = cache.get("msg", args, None, "en", use_isolating=True) - assert entry is not None - assert entry.as_result() == ("result", ()) - - event(f"text_len={len(text)}") +"""Aggregated runtime cache hashable test surface.""" + +from tests.runtime_cache_hashable_cases.section_1_initialization_validation import * # noqa: F403 - re-export split test surface +from tests.runtime_cache_hashable_cases.section_2_make_hashable_type_tagged_conversions import * # noqa: F403 - re-export split test surface +from tests.runtime_cache_hashable_cases.section_3_make_hashable_depth_limiting import * # noqa: F403 - re-export split test surface +from tests.runtime_cache_hashable_cases.section_4_make_key_integration import * # noqa: F403 - re-export split test surface +from tests.runtime_cache_hashable_cases.section_5_na_n_normalization import * # noqa: F403 - re-export split test surface +from tests.runtime_cache_hashable_cases.section_6_hashable_conversion_cache_roundtrip_tests import * # noqa: F403 - re-export split test surface +from tests.runtime_cache_hashable_cases.section_7_unhashable_argument_handling import * # noqa: F403 - re-export split test surface +from tests.runtime_cache_hashable_cases.section_8_error_bloat_protection import * # noqa: F403 - re-export split test surface +from tests.runtime_cache_hashable_cases.section_9_lru_eviction_behavior import * # noqa: F403 - re-export split test 
surface
+from tests.runtime_cache_hashable_cases.section_10_property_accessors import *  # noqa: F403 - re-export split test surface
diff --git a/tests/test_runtime_cache_integrity.py b/tests/test_runtime_cache_integrity.py
index 4efc9364..0382ca0f 100644
--- a/tests/test_runtime_cache_integrity.py
+++ b/tests/test_runtime_cache_integrity.py
@@ -1,1970 +1,7 @@
-"""Tests for IntegrityCache checksum verification, write-once, audit logging,
-error content hash handling, error weight estimation, and property getters.
+"""Aggregated runtime cache integrity test surface."""
 
-Financial-grade integrity verification tests:
-- BLAKE2b-128 checksum computation and verification
-- Corruption detection (strict/non-strict modes)
-- Write-once semantics (strict/non-strict modes)
-- Audit logging operations
-- error.content_hash usage in checksum computation
-- Fallback hashing for non-standard error objects
-- _estimate_error_weight with context, diagnostic, and resolution path
-- IntegrityCacheEntry.verify() defense-in-depth against corrupted errors
-- Property getters (corruption_detected, write_once, strict)
-- write_once_conflicts counter (true conflicts, both strict and non-strict)
-- combined_weight_skips counter (distinct from oversize_skips and error_bloat_skips)
-"""
-
-from __future__ import annotations
-
-import contextlib
-import threading
-import time
-from concurrent.futures import ThreadPoolExecutor, as_completed
-from datetime import UTC
-from decimal import Decimal
-from typing import Literal
-
-import pytest
-from hypothesis import event, given, settings
-from hypothesis import strategies as st
-
-from ftllexengine.constants import DEFAULT_MAX_ENTRY_WEIGHT
-from ftllexengine.diagnostics import (
-    Diagnostic,
-    DiagnosticCode,
-    ErrorCategory,
-    FrozenErrorContext,
-    FrozenFluentError,
-)
-from ftllexengine.integrity import CacheCorruptionError, IntegrityContext, WriteConflictError
-from ftllexengine.runtime import FluentBundle
-from ftllexengine.runtime.cache
import ( - IntegrityCache, - IntegrityCacheEntry, - WriteLogEntry, - _estimate_error_weight, -) -from ftllexengine.runtime.cache_config import CacheConfig - -# Sentinel key_hash for unit tests that verify checksum mechanics but do not -# need meaningful key binding (all-zeros = "unbound test entry"). -_NO_KEY_HASH: bytes = b"\x00" * 8 - -# ============================================================================ -# CHECKSUM VERIFICATION TESTS -# ============================================================================ - - -class TestChecksumComputation: - """Test BLAKE2b-128 checksum computation.""" - - def test_checksum_computed_on_create(self) -> None: - """IntegrityCacheEntry.create() computes checksum.""" - entry = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - assert entry.checksum is not None - assert len(entry.checksum) == 16 # BLAKE2b-128 = 16 bytes - - def test_different_metadata_different_checksum(self) -> None: - """Different metadata (sequence, timestamp) produces different checksums. - - Checksums now include created_at and sequence for complete audit trail integrity. - Identical content with different metadata produces different checksums. 
- """ - entry1 = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - entry2 = IntegrityCacheEntry.create("Hello", (), sequence=2, key_hash=_NO_KEY_HASH) - # Checksums differ because sequence is different (and created_at likely differs) - assert entry1.checksum != entry2.checksum - - def test_different_content_different_checksum(self) -> None: - """Different content produces different checksums.""" - entry1 = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - entry2 = IntegrityCacheEntry.create("World", (), sequence=1, key_hash=_NO_KEY_HASH) - assert entry1.checksum != entry2.checksum - - def test_errors_affect_checksum(self) -> None: - """Errors are included in checksum computation.""" - error = FrozenFluentError("Test error", ErrorCategory.REFERENCE) - entry_no_errors = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - entry_with_errors = IntegrityCacheEntry.create( - "Hello", (error,), sequence=1, key_hash=_NO_KEY_HASH - ) - assert entry_no_errors.checksum != entry_with_errors.checksum - - def test_verify_returns_true_for_valid_entry(self) -> None: - """verify() returns True for uncorrupted entry.""" - entry = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - assert entry.verify() is True - - def test_entry_as_result_preserves_content(self) -> None: - """as_result() returns correct (formatted, errors) pair.""" - errors = (FrozenFluentError("Test", ErrorCategory.REFERENCE),) - entry = IntegrityCacheEntry.create("Hello", errors, sequence=1, key_hash=_NO_KEY_HASH) - assert entry.as_result() == ("Hello", errors) - - @given(st.text(min_size=0, max_size=1000)) - @settings(max_examples=50) - def test_checksum_validates_correctly(self, text: str) -> None: - """PROPERTY: Checksum validation is deterministic for same entry. - - Checksums now include metadata (created_at, sequence) for complete audit - trail integrity. 
Different entries with same content will have different - checksums due to different timestamps. We verify that each entry's - checksum validates correctly. - """ - entry = IntegrityCacheEntry.create(text, (), sequence=1, key_hash=_NO_KEY_HASH) - # Each entry should validate its own checksum correctly - assert entry.verify() is True - event(f"text_len={len(text)}") - - -# ============================================================================ -# CORRUPTION DETECTION TESTS -# ============================================================================ - - -class TestCorruptionDetectionStrictMode: - """Test corruption detection in strict mode (fail-fast).""" - - def test_strict_mode_raises_on_corruption(self) -> None: - """strict=True raises CacheCorruptionError on checksum mismatch.""" - cache = IntegrityCache(strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Simulate corruption by directly modifying internal state - key = next(iter(cache._cache.keys())) - original_entry = cache._cache[key] - - # Create corrupted entry with wrong checksum - corrupted = IntegrityCacheEntry( - formatted="Corrupted!", - errors=original_entry.errors, - checksum=original_entry.checksum, # Wrong checksum for new content - created_at=original_entry.created_at, - sequence=original_entry.sequence, - key_hash=original_entry.key_hash, - ) - cache._cache[key] = corrupted - - with pytest.raises(CacheCorruptionError) as exc_info: - cache.get("msg", None, None, "en", use_isolating=True) - - assert "corruption detected" in str(exc_info.value).lower() - assert exc_info.value.context is not None - assert exc_info.value.context.component == "cache" - - def test_strict_mode_corruption_counter_incremented(self) -> None: - """Corruption detection increments corruption_detected counter.""" - cache = IntegrityCache(strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Corrupt entry - key = 
next(iter(cache._cache.keys())) - entry = cache._cache[key] - corrupted = IntegrityCacheEntry( - formatted="Corrupted", - errors=entry.errors, - checksum=entry.checksum, - created_at=entry.created_at, - sequence=entry.sequence, - key_hash=entry.key_hash, - ) - cache._cache[key] = corrupted - - with contextlib.suppress(CacheCorruptionError): - cache.get("msg", None, None, "en", use_isolating=True) - - stats = cache.get_stats() - assert stats["corruption_detected"] == 1 - - -class TestCorruptionDetectionNonStrictMode: - """Test corruption detection in non-strict mode (silent eviction).""" - - def test_non_strict_evicts_corrupted_entry(self) -> None: - """strict=False silently evicts corrupted entry.""" - cache = IntegrityCache(strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Verify entry exists - assert cache.get("msg", None, None, "en", use_isolating=True) is not None - - # Corrupt entry - key = next(iter(cache._cache.keys())) - entry = cache._cache[key] - corrupted = IntegrityCacheEntry( - formatted="Corrupted", - errors=entry.errors, - checksum=entry.checksum, - created_at=entry.created_at, - sequence=entry.sequence, - key_hash=entry.key_hash, - ) - cache._cache[key] = corrupted - - # Get returns None (not an exception) - result = cache.get("msg", None, None, "en", use_isolating=True) - assert result is None - - # Entry was evicted - stats = cache.get_stats() - assert stats["size"] == 0 - assert stats["corruption_detected"] == 1 - - def test_non_strict_records_miss_on_corruption(self) -> None: - """Corrupted entry results in cache miss.""" - cache = IntegrityCache(strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # First get is a hit - cache.get("msg", None, None, "en", use_isolating=True) - stats = cache.get_stats() - assert stats["hits"] == 1 - assert stats["misses"] == 0 - - # Corrupt entry - key = next(iter(cache._cache.keys())) - entry = 
cache._cache[key] - corrupted = IntegrityCacheEntry( - formatted="Corrupted", - errors=entry.errors, - checksum=entry.checksum, - created_at=entry.created_at, - sequence=entry.sequence, - key_hash=entry.key_hash, - ) - cache._cache[key] = corrupted - - # Second get is a miss (corruption detected, entry evicted) - cache.get("msg", None, None, "en", use_isolating=True) - stats = cache.get_stats() - assert stats["misses"] == 1 # Corruption triggers miss - - -# ============================================================================ -# KEY BINDING CONFUSION TESTS (lines 653-670) -# ============================================================================ - - -class TestKeyBindingConfusion: - """Cover the key-binding confusion check (lines 652-670). - - The key-binding check fires when an entry's stored key_hash doesn't match - the hash of the lookup key. This is distinct from a checksum mismatch: - the entry is internally consistent (verify() passes) but is stored under - the wrong key slot — a sign of active tampering or memory corruption. - - Strategy: put an entry under key B, inject it into the slot for key A, - then call get(key A). verify() passes (entry_b is internally valid) but - the key_hash bound to key B != _compute_key_hash(key A). 
- """ - - @staticmethod - def _inject_key_confused_entry(cache: IntegrityCache) -> None: - """Put msg-b, then move its entry into the msg-a slot.""" - cache.put("msg-b", None, None, "en", use_isolating=True, formatted="Hello B", errors=()) - key_b: tuple = ("msg-b", (), None, "en", True) - key_a: tuple = ("msg-a", (), None, "en", True) - # Inject entry_b under key_a — checksum is valid but key_hash is wrong - cache._cache[key_a] = cache._cache[key_b] - - def test_key_confusion_strict_raises(self) -> None: - """strict=True raises CacheCorruptionError on key-binding mismatch.""" - cache = IntegrityCache(strict=True) - self._inject_key_confused_entry(cache) - - with pytest.raises(CacheCorruptionError) as exc_info: - cache.get("msg-a", None, None, "en", use_isolating=True) - - assert "key confusion" in str(exc_info.value).lower() - assert exc_info.value.context is not None - assert exc_info.value.context.component == "cache" - assert exc_info.value.context.operation == "get" - - def test_key_confusion_strict_increments_counter(self) -> None: - """Key-binding confusion increments corruption_detected counter.""" - cache = IntegrityCache(strict=True) - self._inject_key_confused_entry(cache) - - with contextlib.suppress(CacheCorruptionError): - cache.get("msg-a", None, None, "en", use_isolating=True) - - assert cache.get_stats()["corruption_detected"] == 1 - - def test_key_confusion_non_strict_returns_none(self) -> None: - """strict=False evicts the confused entry and returns None.""" - cache = IntegrityCache(strict=False) - self._inject_key_confused_entry(cache) - - result = cache.get("msg-a", None, None, "en", use_isolating=True) - - assert result is None - stats = cache.get_stats() - assert stats["corruption_detected"] == 1 - assert stats["misses"] == 1 - - def test_key_confusion_non_strict_evicts_entry(self) -> None: - """Non-strict key confusion removes the confused entry from the cache.""" - cache = IntegrityCache(strict=False) - 
self._inject_key_confused_entry(cache) - - key_a: tuple = ("msg-a", (), None, "en", True) - assert key_a in cache._cache # Injected entry is present - - cache.get("msg-a", None, None, "en", use_isolating=True) - - assert key_a not in cache._cache - - -# ============================================================================ -# WRITE-ONCE SEMANTICS TESTS -# ============================================================================ - - -class TestWriteOnceStrictMode: - """Test write-once semantics in strict mode.""" - - def test_write_once_allows_first_write(self) -> None: - """First write to a key succeeds.""" - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - entry = cache.get("msg", None, None, "en", use_isolating=True) - assert entry is not None - assert entry.formatted == "Hello" - - def test_write_once_strict_raises_on_second_write(self) -> None: - """Second write to same key raises WriteConflictError in strict mode.""" - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - with pytest.raises(WriteConflictError) as exc_info: - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) - - assert "write-once violation" in str(exc_info.value).lower() - assert exc_info.value.existing_seq == 1 - assert exc_info.value.new_seq == 2 # Would-be sequence of rejected entry - - def test_write_once_preserves_original_value(self) -> None: - """Write-once rejection preserves original cached value.""" - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Original", errors=()) - - with contextlib.suppress(WriteConflictError): - cache.put("msg", None, None, "en", use_isolating=True, formatted="Updated", errors=()) - - entry = cache.get("msg", None, None, "en", use_isolating=True) - assert entry 
is not None - assert entry.formatted == "Original" - - def test_write_once_conflict_counter_incremented_before_raise(self) -> None: - """write_once_conflicts is incremented before WriteConflictError is raised.""" - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - with contextlib.suppress(WriteConflictError): - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) - - # Counter must be observable even after an exception was raised - assert cache.write_once_conflicts == 1 - - -class TestWriteOnceNonStrictMode: - """Test write-once semantics in non-strict mode.""" - - def test_write_once_non_strict_silently_skips(self) -> None: - """Second write silently skipped in non-strict mode.""" - cache = IntegrityCache(write_once=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # No exception raised - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) - - # Original value preserved - entry = cache.get("msg", None, None, "en", use_isolating=True) - assert entry is not None - assert entry.formatted == "Hello" - - def test_write_once_allows_different_keys(self) -> None: - """Write-once allows writes to different keys.""" - cache = IntegrityCache(write_once=True, strict=False) - cache.put("msg1", None, None, "en", use_isolating=True, formatted="First", errors=()) - cache.put("msg2", None, None, "en", use_isolating=True, formatted="Second", errors=()) - - entry1 = cache.get("msg1", None, None, "en", use_isolating=True) - entry2 = cache.get("msg2", None, None, "en", use_isolating=True) - assert entry1 is not None - assert entry1.formatted == "First" - assert entry2 is not None - assert entry2.formatted == "Second" - - def test_write_once_conflict_counter_incremented(self) -> None: - """True write-once conflicts increment write_once_conflicts counter.""" - cache = 
IntegrityCache(write_once=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Different content for same key = true conflict - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) - - stats = cache.get_stats() - assert stats["write_once_conflicts"] == 1 - - def test_write_once_conflict_counter_multiple(self) -> None: - """write_once_conflicts accumulates across repeated true conflicts.""" - cache = IntegrityCache(write_once=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - for i in range(5): - cache.put("msg", None, None, "en", use_isolating=True, formatted=f"World-{i}", errors=()) - - assert cache.write_once_conflicts == 5 - - def test_write_once_conflict_not_incremented_for_idempotent(self) -> None: - """Idempotent writes do NOT increment write_once_conflicts.""" - cache = IntegrityCache(write_once=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) # Idempotent - - assert cache.write_once_conflicts == 0 - assert cache.idempotent_writes == 1 - - def test_write_once_conflict_counter_preserved_on_clear(self) -> None: - """clear() preserves cumulative write_once_conflicts counter.""" - cache = IntegrityCache(write_once=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) # Conflict - - assert cache.write_once_conflicts == 1 - cache.clear() - assert cache.write_once_conflicts == 1 - - -class TestWriteOnceDisabled: - """Test behavior when write-once is disabled (default).""" - - def test_default_allows_overwrites(self) -> None: - """Default cache allows overwriting entries.""" - cache = IntegrityCache(write_once=False, strict=False) - 
cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) - - entry = cache.get("msg", None, None, "en", use_isolating=True) - assert entry is not None - assert entry.formatted == "World" - - -# ============================================================================ -# AUDIT LOGGING TESTS -# ============================================================================ - - -class TestAuditLogging: - """Test audit logging functionality.""" - - def test_audit_disabled_by_default(self) -> None: - """Audit logging is disabled by default.""" - cache = IntegrityCache() - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - cache.get("msg", None, None, "en", use_isolating=True) - - stats = cache.get_stats() - assert stats["audit_enabled"] is False - assert stats["audit_entries"] == 0 - - def test_audit_enabled_records_operations(self) -> None: - """Audit logging records operations when enabled.""" - cache = IntegrityCache(enable_audit=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - cache.get("msg", None, None, "en", use_isolating=True) - cache.get("msg2", None, None, "en", use_isolating=True) # Miss - - stats = cache.get_stats() - assert stats["audit_enabled"] is True - assert stats["audit_entries"] >= 3 # PUT + HIT + MISS - - def test_audit_log_entry_structure(self) -> None: - """Audit log entries have correct structure.""" - cache = IntegrityCache(enable_audit=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Access internal audit log for verification - audit_log = cache._audit_log - assert audit_log is not None - assert len(audit_log) >= 1 - - entry = audit_log[0] # pylint: disable=unsubscriptable-object - assert isinstance(entry, WriteLogEntry) - assert entry.operation == "PUT" - assert 
isinstance(entry.key_hash, str) - assert isinstance(entry.timestamp, float) - assert entry.sequence >= 0 - assert isinstance(entry.checksum_hex, str) - - def test_audit_log_records_all_operation_types(self) -> None: - """Audit log records HIT, MISS, PUT, EVICT operations.""" - cache = IntegrityCache(maxsize=2, enable_audit=True, strict=False) - - # PUT 3 entries to trigger eviction - cache.put("msg1", None, None, "en", use_isolating=True, formatted="One", errors=()) - cache.put("msg2", None, None, "en", use_isolating=True, formatted="Two", errors=()) - cache.put("msg3", None, None, "en", use_isolating=True, formatted="Three", errors=()) # Evicts msg1 - - # HIT - cache.get("msg2", None, None, "en", use_isolating=True) - - # MISS - cache.get("nonexistent", None, None, "en", use_isolating=True) - - audit_log = cache._audit_log - assert audit_log is not None - - # pylint: disable=not-an-iterable - operations = {entry.operation for entry in audit_log} - assert "PUT" in operations - assert "EVICT" in operations - assert "HIT" in operations - assert "MISS" in operations - - def test_audit_log_max_entries_enforced(self) -> None: - """Audit log respects max_audit_entries limit.""" - cache = IntegrityCache(enable_audit=True, max_audit_entries=5, strict=False) - - # Generate more operations than max_audit_entries - for i in range(10): - cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"Value {i}", errors=()) - - audit_log = cache._audit_log - assert audit_log is not None - assert len(audit_log) <= 5 - - def test_audit_log_not_cleared_on_cache_clear(self) -> None: - """Audit log preserved when cache is cleared (historical record).""" - cache = IntegrityCache(enable_audit=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - audit_log_before = len(cache._audit_log or []) - cache.clear() - audit_log_after = len(cache._audit_log or []) - - assert audit_log_after >= audit_log_before - - def 
test_audit_records_write_once_rejection(self) -> None: - """Audit log records WRITE_ONCE_CONFLICT for different content writes.""" - cache = IntegrityCache(write_once=True, enable_audit=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="First", errors=()) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Second", errors=()) # Conflict (different content) - - audit_log = cache._audit_log - assert audit_log is not None - - # pylint: disable=not-an-iterable - operations = [entry.operation for entry in audit_log] - assert "WRITE_ONCE_CONFLICT" in operations - - -class TestAuditLoggingCorruption: - """Test audit logging of corruption events.""" - - def test_audit_records_corruption(self) -> None: - """Audit log records CORRUPTION operations.""" - cache = IntegrityCache(enable_audit=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Corrupt entry - key = next(iter(cache._cache.keys())) - entry = cache._cache[key] - corrupted = IntegrityCacheEntry( - formatted="Corrupted", - errors=entry.errors, - checksum=entry.checksum, - created_at=entry.created_at, - sequence=entry.sequence, - key_hash=entry.key_hash, - ) - cache._cache[key] = corrupted - - # Trigger corruption detection - cache.get("msg", None, None, "en", use_isolating=True) - - audit_log = cache._audit_log - assert audit_log is not None - - # pylint: disable=not-an-iterable - operations = [entry.operation for entry in audit_log] - assert "CORRUPTION" in operations - - -# ============================================================================ -# SEQUENCE NUMBER TESTS -# ============================================================================ - - -class TestSequenceNumbers: - """Test monotonically increasing sequence numbers.""" - - def test_sequence_increments_on_put(self) -> None: - """Sequence number increments with each put.""" - cache = IntegrityCache(strict=False) - cache.put("msg1", 
None, None, "en", use_isolating=True, formatted="One", errors=()) - cache.put("msg2", None, None, "en", use_isolating=True, formatted="Two", errors=()) - cache.put("msg3", None, None, "en", use_isolating=True, formatted="Three", errors=()) - - entry1 = cache.get("msg1", None, None, "en", use_isolating=True) - entry2 = cache.get("msg2", None, None, "en", use_isolating=True) - entry3 = cache.get("msg3", None, None, "en", use_isolating=True) - - assert entry1 is not None - assert entry1.sequence == 1 - assert entry2 is not None - assert entry2.sequence == 2 - assert entry3 is not None - assert entry3.sequence == 3 - - def test_sequence_not_reset_on_clear(self) -> None: - """Sequence number continues after cache clear (audit trail integrity).""" - cache = IntegrityCache(strict=False) - cache.put("msg1", None, None, "en", use_isolating=True, formatted="One", errors=()) - cache.put("msg2", None, None, "en", use_isolating=True, formatted="Two", errors=()) - - stats_before = cache.get_stats() - assert stats_before["sequence"] == 2 - - cache.clear() - - cache.put("msg3", None, None, "en", use_isolating=True, formatted="Three", errors=()) - - entry = cache.get("msg3", None, None, "en", use_isolating=True) - assert entry is not None - assert entry.sequence == 3 - - -# ============================================================================ -# CONCURRENT INTEGRITY TESTS -# ============================================================================ - - -class TestConcurrentIntegrity: - """Test integrity under concurrent access.""" - - def test_concurrent_puts_maintain_integrity(self) -> None: - """Concurrent puts produce valid checksums.""" - cache = IntegrityCache(maxsize=100, strict=False) - - def put_entry(i: int) -> None: - cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"Value {i}", errors=()) - - with ThreadPoolExecutor(max_workers=10) as executor: - futures = [executor.submit(put_entry, i) for i in range(100)] - for future in 
as_completed(futures): - future.result() - - # All entries should have valid checksums - for i in range(100): - entry = cache.get(f"msg{i}", None, None, "en", use_isolating=True) - if entry is not None: - assert entry.verify(), f"Entry msg{i} failed checksum verification" - - def test_write_once_thread_safety(self) -> None: - """Write-once semantics are thread-safe.""" - cache = IntegrityCache(write_once=True, strict=False) - success_count = 0 - lock = threading.Lock() - - def try_put() -> None: - nonlocal success_count - try: - cache.put("msg", None, None, "en", use_isolating=True, formatted="Value", errors=()) - with lock: - success_count += 1 - except WriteConflictError: - pass # Expected for some threads - - threads = [threading.Thread(target=try_put) for _ in range(20)] - for thread in threads: - thread.start() - for thread in threads: - thread.join() - - # Only one entry should exist - stats = cache.get_stats() - assert stats["size"] == 1 - - -# ============================================================================ -# STATS VERIFICATION TESTS -# ============================================================================ - - -class TestIntegrityStats: - """Test integrity-related statistics.""" - - def test_stats_includes_integrity_fields(self) -> None: - """get_stats() includes all integrity-related fields.""" - cache = IntegrityCache( - write_once=True, - strict=True, - enable_audit=True, - ) - - stats = cache.get_stats() - - # Verify integrity-specific fields exist - assert "corruption_detected" in stats - assert "sequence" in stats - assert "write_once" in stats - assert "strict" in stats - assert "audit_enabled" in stats - assert "audit_entries" in stats - assert "write_once_conflicts" in stats - assert "combined_weight_skips" in stats - - # Verify types - assert isinstance(stats["corruption_detected"], int) - assert isinstance(stats["sequence"], int) - assert isinstance(stats["write_once"], bool) - assert isinstance(stats["strict"], bool) - assert 
isinstance(stats["audit_enabled"], bool) - assert isinstance(stats["audit_entries"], int) - assert isinstance(stats["write_once_conflicts"], int) - assert isinstance(stats["combined_weight_skips"], int) - - # Verify values reflect configuration - assert stats["write_once"] is True - assert stats["strict"] is True - assert stats["audit_enabled"] is True - assert stats["write_once_conflicts"] == 0 - assert stats["combined_weight_skips"] == 0 - - def test_corruption_counter_accumulates(self) -> None: - """corruption_detected counter accumulates across multiple corruptions.""" - cache = IntegrityCache(strict=False) - - for i in range(3): - cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"Value {i}", errors=()) - - # Corrupt all entries - for key in list(cache._cache.keys()): - entry = cache._cache[key] - corrupted = IntegrityCacheEntry( - formatted="Corrupted", - errors=entry.errors, - checksum=entry.checksum, - created_at=entry.created_at, - sequence=entry.sequence, - key_hash=entry.key_hash, - ) - cache._cache[key] = corrupted - - # Trigger corruption detection for each - for i in range(3): - cache.get(f"msg{i}", None, None, "en", use_isolating=True) - - stats = cache.get_stats() - assert stats["corruption_detected"] == 3 - - -# ============================================================================ -# CONTENT HASH TESTS -# ============================================================================ - - -class TestContentHash: - """Test content-only hash computation for idempotent write detection.""" - - def test_content_hash_computed(self) -> None: - """IntegrityCacheEntry has content_hash property.""" - entry = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - content_hash = entry.content_hash - assert content_hash is not None - assert len(content_hash) == 16 # BLAKE2b-128 - - def test_identical_content_same_hash(self) -> None: - """Entries with identical content have identical content hashes. 
- - This is critical for idempotent write detection: concurrent threads - computing the same formatted result should produce matching content hashes. - """ - entry1 = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - entry2 = IntegrityCacheEntry.create("Hello", (), sequence=2, key_hash=_NO_KEY_HASH) - - # Full checksums differ (include metadata) - assert entry1.checksum != entry2.checksum - - # Content hashes are identical - assert entry1.content_hash == entry2.content_hash - - def test_different_content_different_hash(self) -> None: - """Entries with different content have different content hashes.""" - entry1 = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - entry2 = IntegrityCacheEntry.create("World", (), sequence=1, key_hash=_NO_KEY_HASH) - - assert entry1.content_hash != entry2.content_hash - - def test_errors_affect_content_hash(self) -> None: - """Errors are included in content hash computation.""" - error = FrozenFluentError("Test error", ErrorCategory.REFERENCE) - entry_no_errors = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - entry_with_errors = IntegrityCacheEntry.create( - "Hello", (error,), sequence=1, key_hash=_NO_KEY_HASH - ) - - assert entry_no_errors.content_hash != entry_with_errors.content_hash - - @given(st.text(min_size=0, max_size=500)) - @settings(max_examples=30) - def test_content_hash_deterministic(self, text: str) -> None: - """PROPERTY: Content hash is deterministic for same content.""" - entry1 = IntegrityCacheEntry.create(text, (), sequence=1, key_hash=_NO_KEY_HASH) - entry2 = IntegrityCacheEntry.create(text, (), sequence=999, key_hash=_NO_KEY_HASH) - - assert entry1.content_hash == entry2.content_hash - event(f"text_len={len(text)}") - - -# ============================================================================ -# IDEMPOTENT WRITE TESTS -# ============================================================================ - - -class 
TestIdempotentWrites: - """Test idempotent write detection for thundering herd scenarios. - - In write_once mode, concurrent writes with identical content (formatted + errors) - are treated as idempotent operations, not conflicts. This prevents false-positive - WriteConflictError during thundering herds where multiple threads resolve the - same message simultaneously. - """ - - def test_idempotent_write_succeeds_in_strict_mode(self) -> None: - """Identical content is allowed in write_once + strict mode. - - Thundering herd scenario: Multiple threads resolve same message, - all compute identical results. Second thread should succeed silently. - """ - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Second put with IDENTICAL content should succeed (idempotent) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Verify entry unchanged - entry = cache.get("msg", None, None, "en", use_isolating=True) - assert entry is not None - assert entry.formatted == "Hello" - assert entry.sequence == 1 # Original sequence preserved - - def test_different_content_raises_conflict(self) -> None: - """Different content raises WriteConflictError in strict mode.""" - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - with pytest.raises(WriteConflictError): - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) - - def test_idempotent_write_counter_incremented(self) -> None: - """Idempotent writes increment the idempotent_writes counter.""" - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Perform idempotent writes - for _ in range(5): - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - stats = 
cache.get_stats() - assert stats["idempotent_writes"] == 5 - - def test_idempotent_writes_property(self) -> None: - """idempotent_writes property returns correct count.""" - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - assert cache.idempotent_writes == 0 - - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - assert cache.idempotent_writes == 1 - - def test_idempotent_with_errors(self) -> None: - """Idempotent detection includes errors in comparison.""" - error = FrozenFluentError("Test error", ErrorCategory.REFERENCE) - cache = IntegrityCache(write_once=True, strict=True) - - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=(error,)) - - # Same content WITH same error = idempotent - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=(error,)) - assert cache.idempotent_writes == 1 - - # Same text but WITHOUT error = conflict - with pytest.raises(WriteConflictError): - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - def test_idempotent_non_strict_mode(self) -> None: - """Idempotent writes also work in non-strict mode.""" - cache = IntegrityCache(write_once=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Idempotent write - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Different content silently ignored (non-strict) - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) - - stats = cache.get_stats() - assert stats["idempotent_writes"] == 1 # Only one idempotent - - # Original value preserved - entry = cache.get("msg", None, None, "en", use_isolating=True) - assert entry is not None - assert entry.formatted == "Hello" - - def test_idempotent_counter_preserved_on_clear(self) -> None: - 
"""Idempotent counter is cumulative across clear() calls.""" - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) # Idempotent - - assert cache.idempotent_writes == 1 - - # clear() removes entries but does NOT reset cumulative metrics. - cache.clear() - - assert cache.idempotent_writes == 1 - - def test_audit_records_idempotent_writes(self) -> None: - """Audit log records WRITE_ONCE_IDEMPOTENT operations.""" - cache = IntegrityCache(write_once=True, strict=True, enable_audit=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) # Idempotent - - audit_log = cache._audit_log - assert audit_log is not None - - # pylint: disable=not-an-iterable - operations = [entry.operation for entry in audit_log] - assert "WRITE_ONCE_IDEMPOTENT" in operations - - def test_audit_records_conflict(self) -> None: - """Audit log records WRITE_ONCE_CONFLICT for different content.""" - cache = IntegrityCache(write_once=True, strict=False, enable_audit=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) # Conflict (non-strict) - - audit_log = cache._audit_log - assert audit_log is not None - - # pylint: disable=not-an-iterable - operations = [entry.operation for entry in audit_log] - assert "WRITE_ONCE_CONFLICT" in operations - - -class TestIdempotentWritesConcurrency: - """Test idempotent writes under concurrent access (thundering herd).""" - - def test_concurrent_identical_writes_no_exceptions(self) -> None: - """Concurrent writes with identical content all succeed (no exceptions). 
- - This is the thundering herd scenario: multiple threads resolve same - message simultaneously, all compute identical results. Without idempotent - detection, N-1 threads would crash with WriteConflictError. - """ - cache = IntegrityCache(write_once=True, strict=True) - errors: list[Exception] = [] - - def put_identical() -> None: - try: - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - except Exception as e: # pylint: disable=broad-exception-caught - errors.append(e) - - # 20 threads all trying to cache same value - threads = [threading.Thread(target=put_identical) for _ in range(20)] - for thread in threads: - thread.start() - for thread in threads: - thread.join() - - # NO exceptions should occur (all are idempotent or first write) - assert len(errors) == 0, f"Got {len(errors)} exceptions: {errors}" - - # Only one entry should exist - stats = cache.get_stats() - assert stats["size"] == 1 - - # Idempotent counter should reflect concurrent writes minus first - assert stats["idempotent_writes"] == 19 # 20 threads - 1 first write - - def test_concurrent_different_writes_raises_conflicts(self) -> None: - """Concurrent writes with DIFFERENT content raise conflicts.""" - cache = IntegrityCache(write_once=True, strict=True) - conflict_count = 0 - lock = threading.Lock() - - def put_different(i: int) -> None: - nonlocal conflict_count - try: - cache.put("msg", None, None, "en", use_isolating=True, formatted=f"Value {i}", errors=()) - except WriteConflictError: - with lock: - conflict_count += 1 - - # 10 threads all trying to cache DIFFERENT values - threads = [threading.Thread(target=put_different, args=(i,)) for i in range(10)] - for thread in threads: - thread.start() - for thread in threads: - thread.join() - - # Most writes should fail (conflict) - assert conflict_count >= 9 # At least 9 conflicts (1 succeeds) - - # Only one entry should exist - stats = cache.get_stats() - assert stats["size"] == 1 - - -# 
============================================================================ -# CACHE KEY COLLISION PREVENTION TESTS -# ============================================================================ - - -class TestDatetimeTimezoneCollisionPrevention: - """Test that datetime objects with different timezones produce distinct cache keys. - - Two datetime objects can represent the same UTC instant but have different tzinfo. - Python's datetime equality considers them equal, but they format to different - local time strings. The cache must distinguish them. - """ - - def test_same_utc_instant_different_timezone_distinct_keys(self) -> None: - """Datetimes with same UTC instant but different tzinfo produce distinct keys.""" - from datetime import datetime, timedelta, timezone # noqa: PLC0415 - import inside function - - # 12:00 UTC - dt_utc = datetime(2024, 1, 1, 12, 0, 0, tzinfo=UTC) - # 07:00 EST (UTC-5) = 12:00 UTC - SAME INSTANT - dt_est = datetime(2024, 1, 1, 7, 0, 0, tzinfo=timezone(timedelta(hours=-5))) - - # Verify they represent the same instant (Python equality) - assert dt_utc == dt_est - - # But they should produce DIFFERENT cache keys - key_utc = IntegrityCache._make_hashable(dt_utc) - key_est = IntegrityCache._make_hashable(dt_est) - assert key_utc != key_est - - def test_naive_datetime_distinguished_from_aware(self) -> None: - """Naive datetime is distinguished from aware datetime.""" - from datetime import datetime # noqa: PLC0415 - import inside function - - dt_naive = datetime(2024, 1, 1, 12, 0, 0) # noqa: DTZ001 - naive datetime by design - dt_aware = datetime(2024, 1, 1, 12, 0, 0, tzinfo=UTC) - - key_naive = IntegrityCache._make_hashable(dt_naive) - key_aware = IntegrityCache._make_hashable(dt_aware) - - # Different tz_key means different cache keys - assert key_naive != key_aware - assert isinstance(key_naive, tuple) - assert isinstance(key_aware, tuple) - assert key_naive[2] == "__naive__" - assert key_aware[2] == "UTC" - - -class 
TestDecimalNegativeZeroCollisionPrevention: - """Test that Decimal("0") and Decimal("-0") produce distinct cache keys. - - Python's Decimal("0") == Decimal("-0"), but locale-aware formatting may - distinguish them (e.g., "-0" vs "0"). The cache must treat them as distinct. - """ - - def test_zero_and_negative_zero_distinct_keys(self) -> None: - """Decimal("0") and Decimal("-0") produce distinct cache keys.""" - key_pos = IntegrityCache._make_hashable(Decimal(0)) - key_neg = IntegrityCache._make_hashable(Decimal("-0")) - - # They're equal in Python - assert Decimal(0) == Decimal("-0") - - # But distinct in cache keys (via str representation) - assert key_pos != key_neg - assert key_pos == ("__decimal__", "0") - assert key_neg == ("__decimal__", "-0") - - -class TestSequenceMappingABCSupport: - """Test that Sequence and Mapping ABCs are supported, not just list/tuple/dict.""" - - def test_userlist_accepted(self) -> None: - """UserList (Sequence ABC) is accepted and type-tagged.""" - from collections import UserList # noqa: PLC0415 - import inside function - - values = UserList([1, 2, 3]) - result = IntegrityCache._make_hashable(values) - - # Should be tagged as __seq__ (generic Sequence) - assert isinstance(result, tuple) - assert result[0] == "__seq__" - # Inner values are type-tagged - assert result[1] == (("__int__", 1), ("__int__", 2), ("__int__", 3)) - - def test_chainmap_accepted(self) -> None: - """ChainMap (Mapping ABC) is accepted with __mapping__ tag.""" - from collections import ChainMap # noqa: PLC0415 - import inside function - - values: ChainMap[str, int] = ChainMap({"a": 1}, {"b": 2}) - result = IntegrityCache._make_hashable(values) - - # Should be tagged tuple with __mapping__ prefix - assert isinstance(result, tuple) - assert result[0] == "__mapping__" - # ChainMap flattens to view of first-found keys - inner = result[1] - assert isinstance(inner, tuple) - assert ("a", ("__int__", 1)) in inner - assert ("b", ("__int__", 2)) in inner - - def 
test_list_still_tagged_as_list(self) -> None: - """Regular list still uses __list__ tag, not __seq__.""" - result = IntegrityCache._make_hashable([1, 2]) - assert isinstance(result, tuple) - assert result[0] == "__list__" - - def test_tuple_still_tagged_as_tuple(self) -> None: - """Regular tuple still uses __tuple__ tag, not __seq__.""" - result = IntegrityCache._make_hashable((1, 2)) - assert isinstance(result, tuple) - assert result[0] == "__tuple__" - - -# ============================================================================ -# ENTRY CONTENT HASH AND CHECKSUM COMPUTATION -# ============================================================================ - - -class TestIntegrityCacheEntryContentHash: - """Test IntegrityCacheEntry checksum computation with error.content_hash.""" - - def test_compute_checksum_uses_error_content_hash(self) -> None: - """_compute_checksum uses error.content_hash when available.""" - error = FrozenFluentError("Test error", ErrorCategory.REFERENCE) - entry = IntegrityCacheEntry.create( - "formatted text", (error,), sequence=1, key_hash=_NO_KEY_HASH - ) - assert entry.checksum is not None - assert len(entry.checksum) == 16 # BLAKE2b-128 - assert entry.verify() is True - - def test_compute_checksum_with_multiple_errors_content_hash(self) -> None: - """_compute_checksum uses content_hash for multiple errors.""" - errors = ( - FrozenFluentError("Error 1", ErrorCategory.REFERENCE), - FrozenFluentError("Error 2", ErrorCategory.RESOLUTION), - FrozenFluentError("Error 3", ErrorCategory.CYCLIC), - ) - entry = IntegrityCacheEntry.create( - "formatted text", errors, sequence=1, key_hash=_NO_KEY_HASH - ) - assert entry.checksum is not None - assert entry.verify() is True - - @given(st.integers(min_value=1, max_value=10)) - @settings(max_examples=50) - def test_property_checksum_deterministic_with_errors(self, error_count: int) -> None: - """PROPERTY: Checksum is deterministic; each entry validates against itself. 
- - Checksums include metadata (created_at, sequence) for complete audit trail - integrity, so two independently created entries with the same content will - have different checksums. Each entry does self-validate correctly. - """ - errors = tuple( - FrozenFluentError(f"Error {i}", ErrorCategory.REFERENCE) - for i in range(error_count) - ) - entry = IntegrityCacheEntry.create("formatted", errors, sequence=1, key_hash=_NO_KEY_HASH) - assert entry.verify() is True - entry2 = IntegrityCacheEntry.create("formatted", errors, sequence=1, key_hash=_NO_KEY_HASH) - assert entry2.verify() is True - event(f"error_count={error_count}") - - def test_cache_put_get_with_frozen_errors(self) -> None: - """Cache operations work correctly with FrozenFluentError.content_hash.""" - cache = IntegrityCache(strict=False) - errors = ( - FrozenFluentError("Reference error", ErrorCategory.REFERENCE), - FrozenFluentError("Resolution error", ErrorCategory.RESOLUTION), - ) - cache.put("msg", None, None, "en", use_isolating=True, formatted="formatted text", errors=errors) - entry = cache.get("msg", None, None, "en", use_isolating=True) - assert entry is not None - assert entry.formatted == "formatted text" - assert entry.errors == errors - assert entry.verify() is True - - -# ============================================================================ -# AUDIT LOG PUBLIC API (get_audit_log) -# ============================================================================ - - -class TestIntegrityCacheAuditLogDisabled: - """Test get_audit_log() returns empty tuple when audit logging is disabled.""" - - def test_get_audit_log_returns_empty_when_disabled_by_default(self) -> None: - """get_audit_log() returns empty tuple when audit disabled (default).""" - cache = IntegrityCache(strict=False) - cache.put("msg1", None, None, "en", use_isolating=True, formatted="result1", errors=()) - cache.get("msg1", None, None, "en", use_isolating=True) - cache.put("msg2", None, None, "en", use_isolating=True, 
formatted="result2", errors=()) - audit_log = cache.get_audit_log() - assert audit_log == () - assert isinstance(audit_log, tuple) - - def test_get_audit_log_returns_empty_when_disabled_explicit(self) -> None: - """get_audit_log() returns empty tuple when enable_audit=False explicitly.""" - cache = IntegrityCache(enable_audit=False, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="result", errors=()) - cache.get("msg", None, None, "en", use_isolating=True) - assert cache.get_audit_log() == () - - @given( - st.integers(min_value=1, max_value=20), - st.integers(min_value=1, max_value=10), - ) - @settings(max_examples=30) - def test_property_audit_log_always_empty_when_disabled( - self, put_count: int, get_count: int - ) -> None: - """PROPERTY: get_audit_log() always returns empty tuple when disabled.""" - cache = IntegrityCache(enable_audit=False, strict=False) - for i in range(put_count): - cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"result{i}", errors=()) - for i in range(get_count): - cache.get(f"msg{i % put_count}", None, None, "en", use_isolating=True) - audit_log = cache.get_audit_log() - assert audit_log == () - assert len(audit_log) == 0 - event(f"put_count={put_count}") - - -class TestIntegrityCacheAuditLogEnabled: - """Test get_audit_log() returns tuple of entries when audit logging is enabled.""" - - def test_get_audit_log_returns_tuple_when_enabled(self) -> None: - """get_audit_log() returns tuple with entries when enable_audit=True.""" - cache = IntegrityCache(enable_audit=True, strict=False) - cache.put("msg1", None, None, "en", use_isolating=True, formatted="result1", errors=()) - cache.get("msg1", None, None, "en", use_isolating=True) - cache.get("msg2", None, None, "en", use_isolating=True) # Miss - audit_log = cache.get_audit_log() - assert isinstance(audit_log, tuple) - assert len(audit_log) >= 3 # PUT + HIT + MISS - - @given(st.integers(min_value=1, max_value=10)) - 
@settings(max_examples=20) - def test_property_audit_log_returns_tuple_when_enabled(self, op_count: int) -> None: - """PROPERTY: get_audit_log() returns tuple of at least op_count entries.""" - cache = IntegrityCache(enable_audit=True, strict=False) - for i in range(op_count): - cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"result{i}", errors=()) - audit_log = cache.get_audit_log() - assert isinstance(audit_log, tuple) - assert len(audit_log) >= op_count - event(f"op_count={op_count}") - - -# ============================================================================ -# PROPERTY GETTERS (corruption_detected, write_once, strict) -# ============================================================================ - - -class TestIntegrityCachePropertyGetters: - """Test property getters for complete coverage.""" - - def test_corruption_detected_property(self) -> None: - """corruption_detected property reflects detected corruption count.""" - cache = IntegrityCache(strict=False) - assert cache.corruption_detected == 0 - - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - key = next(iter(cache._cache.keys())) - original_entry = cache._cache[key] - corrupted = IntegrityCacheEntry( - formatted="Corrupted!", - errors=original_entry.errors, - checksum=original_entry.checksum, - created_at=original_entry.created_at, - sequence=original_entry.sequence, - key_hash=original_entry.key_hash, - ) - cache._cache[key] = corrupted - cache.get("msg", None, None, "en", use_isolating=True) - assert cache.corruption_detected == 1 - - def test_write_once_property(self) -> None: - """write_once property reflects constructor argument.""" - assert IntegrityCache(write_once=False, strict=False).write_once is False - assert IntegrityCache(write_once=True, strict=False).write_once is True - - def test_strict_property(self) -> None: - """strict property reflects constructor argument.""" - assert IntegrityCache(strict=False).strict is False 
- assert IntegrityCache(strict=True).strict is True - - @given(st.booleans(), st.booleans()) - @settings(max_examples=4) - def test_property_write_once_strict_reflect_constructor( - self, write_once: bool, strict: bool - ) -> None: - """PROPERTY: write_once and strict properties reflect constructor args.""" - cache = IntegrityCache(write_once=write_once, strict=strict) - assert cache.write_once == write_once - assert cache.strict == strict - wo = "write_once" if write_once else "normal" - event(f"mode={wo}") - - def test_corruption_detected_accumulates_across_multiple(self) -> None: - """corruption_detected accumulates across multiple corruption events.""" - cache = IntegrityCache(strict=False) - cache.put("msg1", None, None, "en", use_isolating=True, formatted="One", errors=()) - cache.put("msg2", None, None, "en", use_isolating=True, formatted="Two", errors=()) - cache.put("msg3", None, None, "en", use_isolating=True, formatted="Three", errors=()) - for key in list(cache._cache.keys()): - entry = cache._cache[key] - cache._cache[key] = IntegrityCacheEntry( - formatted="Corrupted", - errors=entry.errors, - checksum=entry.checksum, - created_at=entry.created_at, - sequence=entry.sequence, - key_hash=entry.key_hash, - ) - cache.get("msg1", None, None, "en", use_isolating=True) - assert cache.corruption_detected == 1 - cache.get("msg2", None, None, "en", use_isolating=True) - assert cache.corruption_detected == 2 - cache.get("msg3", None, None, "en", use_isolating=True) - assert cache.corruption_detected == 3 - - def test_error_bloat_skips_property(self) -> None: - """error_bloat_skips property reflects excess-error-count skip count.""" - cache = IntegrityCache(strict=False, max_errors_per_entry=2) - errors = tuple( - FrozenFluentError(f"err-{i}", ErrorCategory.REFERENCE) for i in range(3) - ) - assert cache.error_bloat_skips == 0 - - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=errors) - assert cache.error_bloat_skips == 1 - - def 
test_combined_weight_skips_property_initial_zero(self) -> None: - """combined_weight_skips property starts at zero.""" - cache = IntegrityCache(strict=False) - assert cache.combined_weight_skips == 0 - - def test_combined_weight_skips_property_incremented(self) -> None: - """combined_weight_skips property reflects combined-weight skip count.""" - # max_entry_weight=200: formatted (100 chars) passes check 1, - # but combined with error overhead (100 base + 150 msg = 250), total=350 fails. - cache = IntegrityCache(strict=False, max_entry_weight=200) - error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) - assert cache.combined_weight_skips == 0 - - cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(error,)) - assert cache.combined_weight_skips == 1 - - def test_write_once_conflicts_property_initial_zero(self) -> None: - """write_once_conflicts property starts at zero.""" - cache = IntegrityCache(write_once=True, strict=False) - assert cache.write_once_conflicts == 0 - - def test_write_once_conflicts_property_incremented(self) -> None: - """write_once_conflicts property reflects true conflict count.""" - cache = IntegrityCache(write_once=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - assert cache.write_once_conflicts == 0 - - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) - assert cache.write_once_conflicts == 1 - - -class TestIntegrityCacheEdgeCases: - """Additional edge cases for complete coverage.""" - - def test_entry_with_empty_errors_differs_from_entry_with_error(self) -> None: - """Entries with empty vs non-empty errors tuples have distinct checksums.""" - error = FrozenFluentError("Test", ErrorCategory.REFERENCE) - entry1 = IntegrityCacheEntry.create("text", (), sequence=1, key_hash=_NO_KEY_HASH) - entry2 = IntegrityCacheEntry.create("text", (error,), sequence=2, key_hash=_NO_KEY_HASH) - assert entry1.checksum != 
entry2.checksum - - def test_cache_stats_includes_all_integrity_fields(self) -> None: - """get_stats() includes corruption_detected, write_once, strict, audit_enabled.""" - cache = IntegrityCache(write_once=True, strict=True, enable_audit=False) - stats = cache.get_stats() - assert "corruption_detected" in stats - assert "write_once" in stats - assert "strict" in stats - assert "audit_enabled" in stats - assert stats["corruption_detected"] == 0 - assert stats["write_once"] is True - assert stats["strict"] is True - assert stats["audit_enabled"] is False - - def test_multiple_operations_exercise_all_properties(self) -> None: - """Exercise all properties through multiple cache operations.""" - cache = IntegrityCache( - maxsize=10, write_once=False, strict=False, enable_audit=False - ) - for i in range(5): - cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"result{i}", errors=()) - assert cache.size == 5 - assert cache.maxsize == 10 - assert cache.hits == 0 - assert cache.misses == 0 - assert cache.corruption_detected == 0 - assert cache.write_once is False - assert cache.strict is False - for i in range(5): - entry = cache.get(f"msg{i}", None, None, "en", use_isolating=True) - assert entry is not None - assert cache.hits == 5 - assert cache.get_audit_log() == () - - -# ============================================================================ -# ERROR WEIGHT ESTIMATION -# ============================================================================ - - -class TestEstimateErrorWeightWithContext: - """Test _estimate_error_weight with errors containing FrozenErrorContext. - - Covers the branch where error.context fields are processed. 
- """ - - def test_error_weight_with_context(self) -> None: - """Error with context includes all context field lengths in weight.""" - context = FrozenErrorContext( - input_value="test_input_value", - locale_code="en_US", - parse_type="number", - fallback_value="{!NUMBER}", - ) - error = FrozenFluentError( - "Parse error", ErrorCategory.FORMATTING, context=context - ) - weight = _estimate_error_weight(error) - expected_weight = ( - 100 # _ERROR_BASE_OVERHEAD - + len("Parse error") - + len("test_input_value") - + len("en_US") - + len("number") - + len("{!NUMBER}") - ) - assert weight == expected_weight - - def test_error_weight_without_context(self) -> None: - """Error without context only includes base overhead plus message length.""" - error = FrozenFluentError("Simple error", ErrorCategory.REFERENCE) - weight = _estimate_error_weight(error) - assert weight == 100 + len("Simple error") - - @given( - input_val=st.text(min_size=0, max_size=100), - locale=st.text(min_size=0, max_size=20), - parse_type=st.sampled_from( - ["", "currency", "date", "datetime", "decimal", "number"] - ), - fallback=st.text(min_size=0, max_size=50), - ) - @settings(max_examples=50) - def test_property_error_weight_accounts_for_all_context_fields( - self, - input_val: str, - locale: str, - parse_type: Literal["", "currency", "date", "datetime", "decimal", "number"], - fallback: str, - ) -> None: - """PROPERTY: Error weight correctly accounts for all context field lengths.""" - context = FrozenErrorContext( - input_value=input_val, - locale_code=locale, - parse_type=parse_type, - fallback_value=fallback, - ) - error = FrozenFluentError("Test", ErrorCategory.FORMATTING, context=context) - weight = _estimate_error_weight(error) - expected = ( - 100 - + len("Test") - + len(input_val) - + len(locale) - + len(parse_type) - + len(fallback) - ) - assert weight == expected - event(f"context_len={len(input_val) + len(locale)}") - - -class TestEstimateErrorWeightDiagnosticBranches: - """Test 
_estimate_error_weight with diagnostic fields including resolution_path.""" - - def test_error_weight_diagnostic_without_resolution_path(self) -> None: - """Error with diagnostic but no resolution_path skips path length processing.""" - diagnostic = Diagnostic( - code=DiagnosticCode.MESSAGE_NOT_FOUND, - message="Reference error", - ) - error = FrozenFluentError( - "Message not found", ErrorCategory.REFERENCE, diagnostic=diagnostic - ) - weight = _estimate_error_weight(error) - expected = 100 + len("Message not found") + len("Reference error") - assert weight == expected - - def test_error_weight_diagnostic_with_resolution_path(self) -> None: - """Error with diagnostic and resolution_path includes path element lengths.""" - diagnostic = Diagnostic( - code=DiagnosticCode.CYCLIC_REFERENCE, - message="Reference error", - resolution_path=("message1", "term1", "message2"), - ) - error = FrozenFluentError( - "Circular reference", ErrorCategory.CYCLIC, diagnostic=diagnostic - ) - weight = _estimate_error_weight(error) - expected = ( - 100 - + len("Circular reference") - + len("Reference error") - + len("message1") - + len("term1") - + len("message2") - ) - assert weight == expected - - def test_error_weight_diagnostic_with_all_optional_fields(self) -> None: - """Error with diagnostic containing all optional fields includes them in weight.""" - diagnostic = Diagnostic( - code=DiagnosticCode.INVALID_ARGUMENT, - message="Invalid argument", - hint="Use NUMBER() function", - help_url="https://example.com/help", - function_name="CURRENCY", - argument_name="minimumFractionDigits", - expected_type="int", - received_type="str", - ftl_location="message.ftl:42", - ) - error = FrozenFluentError( - "Function call error", ErrorCategory.FORMATTING, diagnostic=diagnostic - ) - weight = _estimate_error_weight(error) - expected = ( - 100 - + len("Function call error") - + len("Invalid argument") - + len("Use NUMBER() function") - + len("https://example.com/help") - + len("CURRENCY") - + 
len("minimumFractionDigits") - + len("int") - + len("str") - + len("message.ftl:42") - ) - assert weight == expected - - -class TestCacheEntryVerifyWithCorruptedError: - """Test IntegrityCacheEntry.verify() when error.verify_integrity() returns False. - - Exercises the defense-in-depth check where entry verification recurses into - each contained error's own verify_integrity() method. - """ - - def test_verify_returns_false_when_error_message_corrupted(self) -> None: - """IntegrityCacheEntry.verify() returns False when error is memory-corrupted. - - Simulates memory corruption: error._message is changed without updating - the stored _content_hash, causing verify_integrity() to return False. - """ - error = FrozenFluentError("Test error 2", ErrorCategory.REFERENCE) - entry = IntegrityCacheEntry.create("Result", (error,), sequence=1, key_hash=_NO_KEY_HASH) - object.__setattr__(error, "_frozen", False) - object.__setattr__(error, "_message", "corrupted message") - object.__setattr__(error, "_frozen", True) - assert error.verify_integrity() is False - assert entry.verify() is False - - def test_verify_detects_corruption_defense_in_depth(self) -> None: - """IntegrityCacheEntry.verify() provides defense-in-depth error verification.""" - error = FrozenFluentError("Original message", ErrorCategory.REFERENCE) - entry = IntegrityCacheEntry.create("Result", (error,), sequence=1, key_hash=_NO_KEY_HASH) - assert entry.verify() is True - object.__setattr__(error, "_frozen", False) - object.__setattr__(error, "_message", "Corrupted by memory error") - object.__setattr__(error, "_frozen", True) - assert error.verify_integrity() is False - assert entry.verify() is False - - def test_verify_returns_true_when_all_errors_valid(self) -> None: - """IntegrityCacheEntry.verify() returns True when all errors pass integrity.""" - errors = ( - FrozenFluentError("Error 1", ErrorCategory.REFERENCE), - FrozenFluentError("Error 2", ErrorCategory.FORMATTING), - FrozenFluentError("Error 3", 
ErrorCategory.CYCLIC), - ) - entry = IntegrityCacheEntry.create("Result", errors, sequence=1, key_hash=_NO_KEY_HASH) - assert entry.verify() is True - - def test_verify_returns_false_if_any_error_corrupted(self) -> None: - """IntegrityCacheEntry.verify() returns False if any single error is corrupted.""" - error1 = FrozenFluentError("Error 1", ErrorCategory.REFERENCE) - error2 = FrozenFluentError("Error 2", ErrorCategory.FORMATTING) - error3 = FrozenFluentError("Error 3", ErrorCategory.CYCLIC) - entry = IntegrityCacheEntry.create( - "Result", (error1, error2, error3), sequence=1, key_hash=_NO_KEY_HASH - ) - object.__setattr__(error2, "_frozen", False) - object.__setattr__(error2, "_content_hash", b"bad_hash_xxxxxxx") - object.__setattr__(error2, "_frozen", True) - assert entry.verify() is False - - -class TestErrorWeightAndVerifyIntegration: - """Integration tests combining error weight estimation and verification.""" - - def test_large_error_with_context_and_diagnostic(self) -> None: - """Error with both context and diagnostic computes correct weight.""" - context = FrozenErrorContext( - input_value="very long input value that would increase weight significantly", - locale_code="en_US", - parse_type="currency", - fallback_value="{!CURRENCY}", - ) - diagnostic = Diagnostic( - code=DiagnosticCode.PARSE_DECIMAL_FAILED, - message="Failed to parse number", - hint="Check number format", - resolution_path=("step1", "step2", "step3"), - ) - error = FrozenFluentError( - "Complex error message", - ErrorCategory.FORMATTING, - diagnostic=diagnostic, - context=context, - ) - weight = _estimate_error_weight(error) - expected = ( - 100 - + len("Complex error message") - + len("Failed to parse number") - + len("Check number format") - + len("step1") + len("step2") + len("step3") - + len("very long input value that would increase weight significantly") - + len("en_US") - + len("currency") - + len("{!CURRENCY}") - ) - assert weight == expected - assert error.verify_integrity() is 
True - entry = IntegrityCacheEntry.create("Result", (error,), sequence=1, key_hash=_NO_KEY_HASH) - assert entry.verify() is True - - @given( - message=st.text(min_size=1, max_size=100), - input_val=st.text(min_size=0, max_size=50), - locale=st.text(min_size=0, max_size=10), - ) - @settings(max_examples=50) - def test_property_weight_estimation_deterministic( - self, message: str, input_val: str, locale: str - ) -> None: - """PROPERTY: Weight estimation is deterministic and positive.""" - context = FrozenErrorContext( - input_value=input_val, - locale_code=locale, - parse_type="number", - fallback_value="fallback", - ) - error = FrozenFluentError(message, ErrorCategory.FORMATTING, context=context) - weight1 = _estimate_error_weight(error) - weight2 = _estimate_error_weight(error) - assert weight1 == weight2 - assert weight1 > 0 - min_weight = len(message) + len(input_val) + len(locale) + len("number") + len("fallback") - assert weight1 >= min_weight - event(f"weight={weight1}") - - -# ============================================================================ -# CACHE ENTRY SIZE LIMIT COVERAGE -# ============================================================================ - - -class TestCacheEntrySizeLimit: - """IntegrityCache max_entry_weight prevents caching of oversized results.""" - - def test_default_max_entry_weight(self) -> None: - """Default max_entry_weight is DEFAULT_MAX_ENTRY_WEIGHT (10,000 characters).""" - cache = IntegrityCache(strict=False) - assert cache.max_entry_weight == DEFAULT_MAX_ENTRY_WEIGHT - assert cache.max_entry_weight == 10_000 - - def test_custom_max_entry_weight(self) -> None: - """Custom max_entry_weight is stored and returned correctly.""" - cache = IntegrityCache(strict=False, max_entry_weight=1000) - assert cache.max_entry_weight == 1000 - - def test_invalid_max_entry_weight_rejected(self) -> None: - """Zero and negative max_entry_weight raise ValueError.""" - with pytest.raises(ValueError, match="max_entry_weight must be 
positive"): - IntegrityCache(strict=False, max_entry_weight=0) - - with pytest.raises(ValueError, match="max_entry_weight must be positive"): - IntegrityCache(strict=False, max_entry_weight=-1) - - def test_small_entries_cached(self) -> None: - """Entries below max_entry_weight are stored and retrievable.""" - cache = IntegrityCache(strict=False, max_entry_weight=1000) - - cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=()) - - assert cache.size == 1 - assert cache.oversize_skips == 0 - - cached = cache.get("msg", None, None, "en", use_isolating=True) - assert cached is not None - assert cached.as_result() == ("x" * 100, ()) - - def test_large_entries_not_cached(self) -> None: - """Entries exceeding max_entry_weight are skipped and counted.""" - cache = IntegrityCache(strict=False, max_entry_weight=100) - - cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 200, errors=()) - - assert cache.size == 0 - assert cache.oversize_skips == 1 - - cached = cache.get("msg", None, None, "en", use_isolating=True) - assert cached is None - - def test_boundary_entry_size(self) -> None: - """Entry exactly at max_entry_weight is cached (inclusive boundary).""" - cache = IntegrityCache(strict=False, max_entry_weight=100) - - cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=()) - - assert cache.size == 1 - assert cache.oversize_skips == 0 - - def test_get_stats_includes_oversize_skips(self) -> None: - """get_stats() reports oversize_skips and max_entry_weight.""" - cache = IntegrityCache(strict=False, max_entry_weight=50) - - for i in range(5): - cache.put(f"msg-{i}", None, None, "en", use_isolating=True, formatted="x" * 100, errors=()) - - stats = cache.get_stats() - assert stats["oversize_skips"] == 5 - assert stats["max_entry_weight"] == 50 - assert stats["size"] == 0 - - def test_clear_preserves_oversize_skips(self) -> None: - """clear() removes entries but preserves cumulative oversize_skips 
counter.""" - cache = IntegrityCache(strict=False, max_entry_weight=50) - - cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=()) - assert cache.oversize_skips == 1 - - cache.clear() - assert cache.oversize_skips == 1 - - def test_bundle_cache_uses_default_max_entry_weight(self) -> None: - """FluentBundle's internal cache uses default max_entry_weight.""" - bundle = FluentBundle("en", cache=CacheConfig()) - bundle.add_resource("msg = { $data }") - - small_data = "x" * 100 - bundle.format_pattern("msg", {"data": small_data}) - - stats = bundle.get_cache_stats() - assert stats is not None - assert stats["size"] == 1 - - @given(st.integers(min_value=1, max_value=1000)) - def test_max_entry_weight_property(self, size: int) -> None: - """PROPERTY: max_entry_weight is correctly stored and returned.""" - event(f"weight_size={size}") - cache = IntegrityCache(strict=False, max_entry_weight=size) - assert cache.max_entry_weight == size - - def test_combined_weight_skips_counter_incremented(self) -> None: - """Entries skipped due to combined weight increment combined_weight_skips. - - Scenario: formatted string (100 chars) passes check 1 (len <= max_entry_weight=200). - Error overhead = 100 (base) + 150 (message) = 250. Total = 350 > 200 fails check 3. 
- """ - cache = IntegrityCache(strict=False, max_entry_weight=200) - error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) - - cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(error,)) - - stats = cache.get_stats() - assert stats["combined_weight_skips"] == 1 - assert stats["oversize_skips"] == 0 - assert stats["error_bloat_skips"] == 0 - assert stats["size"] == 0 - - def test_combined_weight_skips_distinct_from_oversize_skips(self) -> None: - """oversize_skips and combined_weight_skips are separate, distinct counters.""" - cache = IntegrityCache(strict=False, max_entry_weight=200) - heavy_error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) - - # Check 1 (oversize): formatted string alone exceeds max_entry_weight - cache.put("over-msg", None, None, "en", use_isolating=True, formatted="x" * 201, errors=()) - - # Check 3 (combined_weight): formatted OK, but combined total exceeds limit - cache.put("combined-msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(heavy_error,)) - - stats = cache.get_stats() - assert stats["oversize_skips"] == 1 - assert stats["combined_weight_skips"] == 1 - - def test_combined_weight_skips_distinct_from_error_bloat_skips(self) -> None: - """error_bloat_skips and combined_weight_skips are separate, distinct counters.""" - cache = IntegrityCache(strict=False, max_entry_weight=200, max_errors_per_entry=2) - heavy_error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) - - # Check 2 (error_bloat): too many errors by count - many_errors = tuple( - FrozenFluentError(f"e-{i}", ErrorCategory.REFERENCE) for i in range(3) - ) - cache.put("bloat-msg", None, None, "en", use_isolating=True, formatted="Hello", errors=many_errors) - - # Check 3 (combined_weight): error count OK (1 <= 2), combined weight fails - cache.put("combined-msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(heavy_error,)) - - stats = cache.get_stats() - assert 
stats["error_bloat_skips"] == 1 - assert stats["combined_weight_skips"] == 1 - - def test_combined_weight_skips_preserved_on_clear(self) -> None: - """clear() preserves cumulative combined_weight_skips counter.""" - cache = IntegrityCache(strict=False, max_entry_weight=200) - error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) - - cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(error,)) - assert cache.combined_weight_skips == 1 - - cache.clear() - assert cache.combined_weight_skips == 1 - - def test_get_stats_includes_combined_weight_skips(self) -> None: - """get_stats() reports combined_weight_skips alongside related skip counters.""" - cache = IntegrityCache(strict=False, max_entry_weight=200) - error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) - - cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(error,)) - - stats = cache.get_stats() - assert "combined_weight_skips" in stats - assert stats["combined_weight_skips"] == 1 - - -# =========================================================================== -# DUAL-CLOCK AUDIT LOG (wall_time_unix) -# =========================================================================== - - -class TestWriteLogEntryWallTime: - """WriteLogEntry carries both monotonic timestamp and wall_time_unix.""" - - def test_write_log_entry_has_wall_time_unix_field(self) -> None: - """WriteLogEntry.wall_time_unix field exists and is a float.""" - before = time.time() - cache = IntegrityCache(enable_audit=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="hello", errors=()) - after = time.time() - - log = cache.get_audit_log() - assert len(log) >= 1 - entry = log[0] - assert isinstance(entry.wall_time_unix, float) - # Wall time should be bracketed between the before/after calls - assert before <= entry.wall_time_unix <= after - - def test_write_log_entry_timestamp_is_monotonic(self) -> None: - 
"""WriteLogEntry.timestamp (monotonic) is distinct from wall_time_unix.""" - - cache = IntegrityCache(enable_audit=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="hello", errors=()) - - log = cache.get_audit_log() - entry = log[0] - # Monotonic and wall clock are different clocks — values may differ - assert isinstance(entry.timestamp, float) - assert isinstance(entry.wall_time_unix, float) - # Both should be positive - assert entry.timestamp > 0 - assert entry.wall_time_unix > 0 - - def test_audit_log_multiple_entries_wall_time_non_decreasing(self) -> None: - """wall_time_unix values across audit entries are non-decreasing.""" - cache = IntegrityCache(enable_audit=True, strict=False) - cache.put("a", None, None, "en", use_isolating=True, formatted="A", errors=()) - cache.put("b", None, None, "en", use_isolating=True, formatted="B", errors=()) - cache.put("c", None, None, "en", use_isolating=True, formatted="C", errors=()) - - log = cache.get_audit_log() - wall_times = [e.wall_time_unix for e in log] - for i in range(len(wall_times) - 1): - assert wall_times[i] <= wall_times[i + 1], ( - f"wall_time_unix not non-decreasing at index {i}: " - f"{wall_times[i]} > {wall_times[i + 1]}" - ) - - -class TestIntegrityContextWallTime: - """IntegrityContext.wall_time_unix is populated at integrity error sites.""" - - def test_integrity_context_wall_time_unix_field_exists(self) -> None: - """IntegrityContext accepts wall_time_unix and stores it correctly.""" - t = time.time() - ctx = IntegrityContext( - component="test", - operation="check", - timestamp=time.monotonic(), - wall_time_unix=t, - ) - assert ctx.wall_time_unix == t - - def test_integrity_context_wall_time_unix_defaults_to_none(self) -> None: - """IntegrityContext.wall_time_unix defaults to None for backwards compat.""" - ctx = IntegrityContext(component="test", operation="check") - assert ctx.wall_time_unix is None - - def 
test_cache_corruption_error_context_has_wall_time(self) -> None: - """CacheCorruptionError raised by strict cache carries wall_time_unix.""" - cache = IntegrityCache(enable_audit=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="ok", errors=()) - - # Corrupt the checksum by manipulating the stored entry directly - key = next(iter(cache._cache)) - entry = cache._cache[key] - - # Corrupt the checksum in-place via object.__setattr__ (frozen dataclass). - # content_hash is field(init=False), so we cannot pass it to __init__. - object.__setattr__(entry, "checksum", b"\x00" * 16) # deliberately invalid - cache._cache[key] = entry - - before = time.time() - with pytest.raises(CacheCorruptionError) as exc_info: - cache.get("msg", None, None, "en", use_isolating=True) - after = time.time() - - ctx = exc_info.value.context - assert ctx is not None - assert ctx.wall_time_unix is not None - assert before <= ctx.wall_time_unix <= after +from tests.runtime_cache_integrity_cases.checksums import * # noqa: F403 - split module reuses shared support imports +from tests.runtime_cache_integrity_cases.idempotence_and_hashes import * # noqa: F403 - split module reuses shared support imports +from tests.runtime_cache_integrity_cases.integrity_edges import * # noqa: F403 - split module reuses shared support imports +from tests.runtime_cache_integrity_cases.limits_and_timing import * # noqa: F403 - split module reuses shared support imports +from tests.runtime_cache_integrity_cases.write_once_audit import * # noqa: F403 - split module reuses shared support imports diff --git a/tests/test_runtime_cache_property.py b/tests/test_runtime_cache_property.py index c9352e36..717fefc3 100644 --- a/tests/test_runtime_cache_property.py +++ b/tests/test_runtime_cache_property.py @@ -1,1176 +1,9 @@ -"""Property-based (Hypothesis) tests for FormatCache and IntegrityCache. 
- -All classes are marked with @pytest.mark.fuzz and run only via: - ./scripts/fuzz_hypofuzz.sh --deep - pytest -m fuzz - -Covers: -- IntegrityCache invariants: maxsize enforced, get-after-put, clear, hit/miss counters -- IntegrityCache LRU eviction patterns -- IntegrityCache key handling: locale, attribute, args dict stability -- IntegrityCache robustness: various arg types, duplicate puts, non-negative stats -- IntegrityCache statistics: hit_rate consistency, size matches entry count -- IntegrityCache init parameters stored correctly -- IntegrityCache primitives: all FluentValue types produce valid cache keys -- FormatCache invariants: transparency, isolation, LRU eviction, stats consistency -- FormatCache invalidation: add_resource, add_function -- FormatCache internals: __len__, properties, key uniqueness, attribute isolation -- FormatCache type collision prevention: bool/int, int/Decimal -""" - -from __future__ import annotations - -from decimal import Decimal - -import pytest -from hypothesis import assume, event, given, settings -from hypothesis import strategies as st - -from ftllexengine import FluentBundle -from ftllexengine.runtime.cache import IntegrityCache -from ftllexengine.runtime.cache_config import CacheConfig - -# ============================================================================ -# MODULE-LEVEL STRATEGIES (used by IntegrityCache tests) -# ============================================================================ - -# Strategy for message IDs - use st.from_regex per hypothesis.md -message_ids = st.from_regex(r"[a-z]+", fullmatch=True) - -# Strategy for locale codes -locale_codes = st.sampled_from(["en_US", "de_DE", "lv_LV", "fr_FR", "ja_JP"]) - -# Strategy for attributes - remove arbitrary max_size -attributes = st.one_of(st.none(), st.text(min_size=1)) - -# Strategy for cache values (result, errors) - remove arbitrary max_size -cache_values: st.SearchStrategy[tuple[str, tuple[()]]] = st.tuples( - st.text(min_size=0), - st.just(()), # 
Empty error tuple for simplicity -) - -# Strategy for message arguments - keep collection bound, remove text max_size -args_strategy = st.one_of( - st.none(), - st.dictionaries( - st.text(min_size=1), - st.one_of( - st.integers(), - st.decimals(allow_nan=False, allow_infinity=False), - st.text(), - ), - max_size=5, # Keep practical bound for dict size - ), -) - - -# ============================================================================ -# PROPERTY TESTS - BASIC INVARIANTS -# ============================================================================ - - -@pytest.mark.fuzz -class TestCacheInvariants: - """Test fundamental IntegrityCache invariants.""" - - @given(maxsize=st.integers(min_value=1, max_value=10000)) - @settings(max_examples=100) - def test_cache_maxsize_enforced(self, maxsize: int) -> None: - """INVARIANT: Cache never exceeds maxsize.""" - cache = IntegrityCache(maxsize=maxsize, strict=False) - - # Add more than maxsize entries - for i in range(maxsize + 10): - cache.put( - f"msg_{i}", - None, - None, - "en_US", - use_isolating=True, - formatted=f"result_{i}", - errors=(), - ) - - # Cache should not exceed maxsize - assert cache.get_stats()["size"] <= maxsize - event(f"maxsize={maxsize}") - - @given( - msg_id=message_ids, - locale=locale_codes, - args=args_strategy, - attr=attributes, - value=cache_values, - ) - @settings(max_examples=200) - def test_get_after_put_returns_value( - self, - msg_id: str, - locale: str, - args: dict[str, int | Decimal | str] | None, - attr: str | None, - value: tuple[str, tuple[()]], - ) -> None: - """PROPERTY: get(k) after put(k, v) returns v.""" - cache = IntegrityCache(maxsize=100, strict=False) - - formatted, errors = value - cache.put(msg_id, args, attr, locale, use_isolating=True, formatted=formatted, errors=errors) - entry = cache.get(msg_id, args, attr, locale, use_isolating=True) - - assert entry is not None - assert entry.as_result() == value - has_args = args is not None - event(f"has_args={has_args}") - - 
@given( - msg_id=message_ids, - locale=locale_codes, - ) - @settings(max_examples=100) - def test_get_without_put_returns_none( - self, - msg_id: str, - locale: str, - ) -> None: - """PROPERTY: get(k) without put(k) returns None.""" - cache = IntegrityCache(maxsize=100, strict=False) - - result = cache.get(msg_id, None, None, locale, use_isolating=True) - - assert result is None - event(f"locale={locale}") - - @given(maxsize=st.integers(min_value=1, max_value=100)) - @settings(max_examples=50) - def test_clear_resets_cache_to_empty(self, maxsize: int) -> None: - """PROPERTY: clear() empties cache and resets counters.""" - cache = IntegrityCache(maxsize=maxsize, strict=False) - - # Add some entries - for i in range(min(10, maxsize)): - cache.put(f"msg_{i}", None, None, "en_US", use_isolating=True, formatted=f"result_{i}", errors=()) - - # Clear - cache.clear() - - # Cache should be empty - stats = cache.get_stats() - assert stats["size"] == 0 - assert stats["hits"] == 0 - assert stats["misses"] == 0 - event(f"maxsize={maxsize}") - - @given( - msg_id=message_ids, - locale=locale_codes, - value=cache_values, - ) - @settings(max_examples=100) - def test_hit_counter_increments_on_cache_hit( - self, - msg_id: str, - locale: str, - value: tuple[str, tuple[()]], - ) -> None: - """PROPERTY: Cache hits increment hit counter.""" - cache = IntegrityCache(maxsize=100, strict=False) - - formatted, errors = value - cache.put(msg_id, None, None, locale, use_isolating=True, formatted=formatted, errors=errors) - - # First get - cache hit - initial_stats = cache.get_stats() - cache.get(msg_id, None, None, locale, use_isolating=True) - - stats_after_hit = cache.get_stats() - assert stats_after_hit["hits"] == initial_stats["hits"] + 1 - event(f"locale={locale}") - - @given( - msg_id=message_ids, - locale=locale_codes, - ) - @settings(max_examples=100) - def test_miss_counter_increments_on_cache_miss( - self, - msg_id: str, - locale: str, - ) -> None: - """PROPERTY: Cache misses 
increment miss counter.""" - cache = IntegrityCache(maxsize=100, strict=False) - - initial_stats = cache.get_stats() - cache.get(msg_id, None, None, locale, use_isolating=True) # Cache miss - - stats_after_miss = cache.get_stats() - assert stats_after_miss["misses"] == initial_stats["misses"] + 1 - event(f"locale={locale}") - - -# ============================================================================ -# PROPERTY TESTS - LRU EVICTION -# ============================================================================ - - -@pytest.mark.fuzz -class TestLRUEviction: - """Test LRU (Least Recently Used) eviction behavior.""" - - @given(maxsize=st.integers(min_value=2, max_value=10)) - @settings(max_examples=50) - def test_lru_evicts_least_recently_used(self, maxsize: int) -> None: - """PROPERTY: LRU eviction removes oldest entry.""" - cache = IntegrityCache(maxsize=maxsize, strict=False) - - # Fill cache to capacity - for i in range(maxsize): - cache.put(f"msg_{i}", None, None, "en_US", use_isolating=True, formatted=f"result_{i}", errors=()) - - # Access first entry to make it recently used - cache.get("msg_0", None, None, "en_US", use_isolating=True) - - # Add one more entry (should evict msg_1, not msg_0) - cache.put("msg_new", None, None, "en_US", use_isolating=True, formatted="result_new", errors=()) - - # msg_0 should still be in cache (recently accessed) - assert cache.get("msg_0", None, None, "en_US", use_isolating=True) is not None - - # msg_1 should be evicted (oldest unreferenced) - assert cache.get("msg_1", None, None, "en_US", use_isolating=True) is None - event(f"maxsize={maxsize}") - - @given( - maxsize=st.integers(min_value=3, max_value=10), - access_pattern=st.lists( - st.integers(min_value=0, max_value=9), - min_size=5, - max_size=20, - ), - ) - @settings(max_examples=50) - def test_lru_access_pattern_eviction( - self, - maxsize: int, - access_pattern: list[int], - ) -> None: - """PROPERTY: LRU eviction respects access patterns.""" - cache = 
IntegrityCache(maxsize=maxsize, strict=False) - - # Fill cache - for i in range(maxsize): - cache.put(f"msg_{i}", None, None, "en_US", use_isolating=True, formatted=f"result_{i}", errors=()) - - # Access entries according to pattern - for idx in access_pattern: - if idx < maxsize: - cache.get(f"msg_{idx}", None, None, "en_US", use_isolating=True) - - # Add new entries (will trigger evictions) - for i in range(maxsize, maxsize + 3): - cache.put(f"msg_{i}", None, None, "en_US", use_isolating=True, formatted=f"result_{i}", errors=()) - - # Recently accessed entries should still be in cache - assert cache.get_stats()["size"] <= maxsize - event(f"pattern_len={len(access_pattern)}") - - -# ============================================================================ -# PROPERTY TESTS - KEY HANDLING -# ============================================================================ - - -@pytest.mark.fuzz -class TestCacheKeyHandling: - """Test cache key construction and equality.""" - - @given( - msg_id=message_ids, - locale=locale_codes, - value=cache_values, - ) - @settings(max_examples=100) - def test_same_key_retrieves_same_value( - self, - msg_id: str, - locale: str, - value: tuple[str, tuple[()]], - ) -> None: - """PROPERTY: Same key components retrieve same cached value.""" - cache = IntegrityCache(maxsize=100, strict=False) - - formatted, errors = value - # Put with specific key - cache.put(msg_id, None, None, locale, use_isolating=True, formatted=formatted, errors=errors) - - # Get with same key components - entry = cache.get(msg_id, None, None, locale, use_isolating=True) - - assert entry is not None - assert entry.as_result() == value - event(f"locale={locale}") - - @given( - msg_id=message_ids, - locale1=locale_codes, - locale2=locale_codes, - value=cache_values, - ) - @settings(max_examples=100) - def test_different_locale_creates_different_key( - self, - msg_id: str, - locale1: str, - locale2: str, - value: tuple[str, tuple[()]], - ) -> None: - """PROPERTY: 
Different locales create different cache keys.""" - assume(locale1 != locale2) - - cache = IntegrityCache(maxsize=100, strict=False) - - formatted, errors = value - # Put with locale1 - cache.put(msg_id, None, None, locale1, use_isolating=True, formatted=formatted, errors=errors) - - # Get with locale2 should miss - result = cache.get(msg_id, None, None, locale2, use_isolating=True) - - assert result is None - event(f"locale_pair={locale1}_{locale2}") - - @given( - msg_id=message_ids, - locale=locale_codes, - attr1=attributes, - attr2=attributes, - value=cache_values, - ) - @settings(max_examples=100) - def test_different_attribute_creates_different_key( - self, - msg_id: str, - locale: str, - attr1: str | None, - attr2: str | None, - value: tuple[str, tuple[()]], - ) -> None: - """PROPERTY: Different attributes create different cache keys.""" - assume(attr1 != attr2) - - cache = IntegrityCache(maxsize=100, strict=False) - - formatted, errors = value - # Put with attr1 - cache.put(msg_id, None, attr1, locale, use_isolating=True, formatted=formatted, errors=errors) - - # Get with attr2 should miss - result = cache.get(msg_id, None, attr2, locale, use_isolating=True) - - assert result is None - has_attr1 = attr1 is not None - event(f"has_attr={has_attr1}") - - @given( - msg_id=message_ids, - locale=locale_codes, - value=cache_values, - ) - @settings(max_examples=100) - def test_args_dict_key_stability( - self, - msg_id: str, - locale: str, - value: tuple[str, tuple[()]], - ) -> None: - """PROPERTY: Equivalent args dicts produce same cache key.""" - cache = IntegrityCache(maxsize=100, strict=False) - - formatted, errors = value - # Put with args dict - args = {"x": 1, "y": 2} - cache.put(msg_id, args, None, locale, use_isolating=True, formatted=formatted, errors=errors) - - # Get with equivalent dict (different order) - args_reordered = {"y": 2, "x": 1} - entry = cache.get(msg_id, args_reordered, None, locale, use_isolating=True) - - # Should hit cache (dict key 
normalized) - assert entry is not None - assert entry.as_result() == value - event(f"locale={locale}") - - -# ============================================================================ -# PROPERTY TESTS - ROBUSTNESS -# ============================================================================ - - -@pytest.mark.fuzz -class TestCacheRobustness: - """Test cache robustness with various input types.""" - - @given( - args=st.dictionaries( - st.text(min_size=1), - st.one_of( - st.integers(), - st.decimals(allow_nan=False, allow_infinity=False), - st.text(), - st.booleans(), - st.none(), - ), - max_size=10, # Keep practical bound for dict size - ), - ) - @settings(max_examples=200) - def test_cache_handles_various_arg_types( - self, args: dict[str, int | Decimal | str | bool | None] - ) -> None: - """ROBUSTNESS: Cache handles various argument types.""" - cache = IntegrityCache(maxsize=100, strict=False) - - # Should not crash with various arg types - try: - cache.put("msg", args, None, "en_US", use_isolating=True, formatted="result", errors=()) - entry = cache.get("msg", args, None, "en_US", use_isolating=True) - # If put succeeded, get should return the value - if entry is not None: - assert entry.as_result() == ("result", ()) - except (TypeError, ValueError): - # Some types may not be hashable - acceptable - pass - event(f"arg_types={len(args)}") - - @given( - msg_ids=st.lists(message_ids, min_size=1, max_size=50), - maxsize=st.integers(min_value=1, max_value=10), - ) - @settings(max_examples=50) - def test_cache_handles_duplicate_puts( - self, - msg_ids: list[str], - maxsize: int, - ) -> None: - """ROBUSTNESS: Cache handles duplicate puts gracefully.""" - cache = IntegrityCache(maxsize=maxsize, strict=False) - - # Put same message multiple times - for msg_id in msg_ids: - cache.put(msg_id, None, None, "en_US", use_isolating=True, formatted=f"result_{msg_id}", errors=()) - - # Cache should still respect maxsize - assert cache.get_stats()["size"] <= maxsize - 
event(f"duplicates={len(msg_ids)}") - - @given(maxsize=st.integers(min_value=1, max_value=100)) - @settings(max_examples=50) - def test_cache_stats_never_negative(self, maxsize: int) -> None: - """ROBUSTNESS: Cache stats are never negative.""" - cache = IntegrityCache(maxsize=maxsize, strict=False) - - # Perform various operations - cache.put("msg", None, None, "en_US", use_isolating=True, formatted="result", errors=()) - cache.get("msg", None, None, "en_US", use_isolating=True) - cache.get("missing", None, None, "en_US", use_isolating=True) - cache.clear() - - stats = cache.get_stats() - assert stats["size"] >= 0 - assert stats["hits"] >= 0 - assert stats["misses"] >= 0 - assert stats["maxsize"] > 0 - event(f"maxsize={maxsize}") - - -# ============================================================================ -# PROPERTY TESTS - STATISTICS -# ============================================================================ - - -@pytest.mark.fuzz -class TestCacheStatistics: - """Test cache statistics tracking.""" - - @given( - operations=st.lists( - st.tuples( - st.sampled_from(["put", "get"]), - message_ids, - ), - min_size=1, - max_size=50, - ), - ) - @settings(max_examples=50) - def test_hit_rate_consistency( - self, - operations: list[tuple[str, str]], - ) -> None: - """PROPERTY: hit_rate = hits / (hits + misses).""" - cache = IntegrityCache(maxsize=20, strict=False) - - for op, msg_id in operations: - if op == "put": - cache.put(msg_id, None, None, "en_US", use_isolating=True, formatted=f"result_{msg_id}", errors=()) - elif op == "get": - cache.get(msg_id, None, None, "en_US", use_isolating=True) - - stats = cache.get_stats() - total = stats["hits"] + stats["misses"] - - if total > 0: - expected_hit_rate = stats["hits"] / total - # hit_rate might be percentage (0-100) or decimal (0.0-1.0) - actual_rate: float = float(stats["hit_rate"]) - if actual_rate > 1.0: # Percentage format - actual_rate = actual_rate / 100.0 - assert abs(actual_rate - expected_hit_rate) < 
0.01 - event(f"op_count={len(operations)}") - - @given( - num_entries=st.integers(min_value=0, max_value=50), - maxsize=st.integers(min_value=10, max_value=100), - ) - @settings(max_examples=50) - def test_size_equals_entry_count( - self, - num_entries: int, - maxsize: int, - ) -> None: - """PROPERTY: size stat equals actual number of cached entries.""" - cache = IntegrityCache(maxsize=maxsize, strict=False) - - # Add entries - for i in range(num_entries): - cache.put(f"msg_{i}", None, None, "en_US", use_isolating=True, formatted=f"result_{i}", errors=()) - - stats = cache.get_stats() - expected_size = min(num_entries, maxsize) - - assert stats["size"] == expected_size - event(f"entries={num_entries}") - - -# ============================================================================ -# PROPERTY TESTS - INIT PARAMETERS -# ============================================================================ - - -@pytest.mark.fuzz -class TestIntegrityCacheHypothesisProperties: - """Property-based tests for IntegrityCache using Hypothesis.""" - - @given( - st.integers(min_value=1, max_value=1000), - st.integers(min_value=1, max_value=10000), - st.integers(min_value=1, max_value=100), - ) - @settings(max_examples=50) - def test_property_init_parameters_stored_correctly( - self, - maxsize: int, - max_entry_weight: int, - max_errors_per_entry: int, - ) -> None: - """PROPERTY: Constructor parameters are stored correctly.""" - cache = IntegrityCache( - strict=False, - maxsize=maxsize, - max_entry_weight=max_entry_weight, - max_errors_per_entry=max_errors_per_entry, - ) - - assert cache.maxsize == maxsize - assert cache.max_entry_weight == max_entry_weight - assert cache.size == 0 - assert cache.hits == 0 - assert cache.misses == 0 - event(f"maxsize={maxsize}") - - @given(st.text(min_size=0, max_size=100)) - @settings(max_examples=50) - def test_property_primitives_hashable(self, text: str) -> None: - """PROPERTY: All primitive types produce valid cache keys.""" - cache = 
IntegrityCache(strict=False) - - # String - cache.put("msg", {"text": text}, None, "en", use_isolating=True, formatted="result", errors=()) - entry = cache.get("msg", {"text": text}, None, "en", use_isolating=True) - assert entry is not None - assert entry.as_result() == ("result", ()) - - # Integer - cache.put("msg", {"num": 42}, None, "en", use_isolating=True, formatted="result", errors=()) - entry = cache.get("msg", {"num": 42}, None, "en", use_isolating=True) - assert entry is not None - assert entry.as_result() == ("result", ()) - - # Decimal - cache.put("msg", {"decimal": Decimal("3.14")}, None, "en", use_isolating=True, formatted="result", errors=()) - entry = cache.get("msg", {"decimal": Decimal("3.14")}, None, "en", use_isolating=True) - assert entry is not None - assert entry.as_result() == ("result", ()) - - # Bool - cache.put("msg", {"bool": True}, None, "en", use_isolating=True, formatted="result", errors=()) - entry = cache.get("msg", {"bool": True}, None, "en", use_isolating=True) - assert entry is not None - assert entry.as_result() == ("result", ()) - - # None - cache.put("msg", {"val": None}, None, "en", use_isolating=True, formatted="result", errors=()) - entry = cache.get("msg", {"val": None}, None, "en", use_isolating=True) - assert entry is not None - assert entry.as_result() == ("result", ()) - event(f"text_len={len(text)}") - - -# ============================================================================ -# FORMATCACHE PROPERTIES (via FluentBundle) -# ============================================================================ - - -@st.composite -def message_args(draw: st.DrawFn) -> dict[str, str | int]: - """Generate valid message arguments.""" - num_args = draw(st.integers(min_value=0, max_value=5)) - args = {} - for _ in range(num_args): - key = draw(st.text( - alphabet=st.characters(min_codepoint=97, max_codepoint=122), - min_size=1, max_size=10, - )) - value = draw(st.one_of(st.text(min_size=0, max_size=20), st.integers())) - 
args[key] = value - return args - - -@pytest.mark.fuzz -class TestCacheProperties: - """Property-based tests for FormatCache behavior.""" - - @given(args=message_args()) - def test_cache_transparency(self, args: dict[str, str | int]) -> None: - """Cache hit returns same result as cache miss. - - Property: format_pattern(msg, args) with cache enabled should return - identical results to format_pattern(msg, args) without cache. - """ - ftl_vars = " ".join([f"{{ ${k} }}" for k in args]) - ftl_source = f"msg = Hello {ftl_vars}!" - - # Bundle without cache - bundle_no_cache = FluentBundle("en", use_isolating=False) - bundle_no_cache.add_resource(ftl_source) - result_no_cache, errors_no_cache = bundle_no_cache.format_pattern("msg", args) - - # Bundle with cache - bundle_with_cache = FluentBundle("en", cache=CacheConfig(), use_isolating=False) - bundle_with_cache.add_resource(ftl_source) - - # First call (cache miss) - result_miss, errors_miss = bundle_with_cache.format_pattern("msg", args) - assert result_miss == result_no_cache - assert len(errors_miss) == len(errors_no_cache) - - # Second call (cache hit) - result_hit, errors_hit = bundle_with_cache.format_pattern("msg", args) - assert result_hit == result_no_cache - assert len(errors_hit) == len(errors_no_cache) - - # Cache hit and miss must return identical results - assert result_miss == result_hit - assert len(errors_miss) == len(errors_hit) - event(f"arg_count={len(args)}") - - @given( - args1=message_args(), - args2=message_args(), - ) - def test_cache_isolation( - self, args1: dict[str, str | int], args2: dict[str, str | int] - ) -> None: - """Different args produce different cache entries. - - Property: format_pattern(msg, args1) and format_pattern(msg, args2) - should be cached separately if args differ. 
- """ - # Only test if args actually differ - if args1 == args2: - return - - ftl_vars = set(args1.keys()) | set(args2.keys()) - ftl_placeholders = " ".join([f"{{ ${k} }}" for k in ftl_vars]) - ftl_source = f"msg = Test {ftl_placeholders}" - - bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False, strict=False) - bundle.add_resource(ftl_source) - - # Format with args1 - _result1, _ = bundle.format_pattern("msg", args1) - - # Format with args2 - _result2, _ = bundle.format_pattern("msg", args2) - - # Results should differ if args differ - stats = bundle.get_cache_stats() - assert stats is not None - assert stats["size"] == 2 # Two separate cache entries - event(f"key_count={len(args1)}") - - @given( - cache_size=st.integers(min_value=1, max_value=100), - num_messages=st.integers(min_value=1, max_value=200), - ) - def test_lru_eviction_property(self, cache_size: int, num_messages: int) -> None: - """Cache size never exceeds limit. - - Property: No matter how many format calls, cache size <= maxsize. - """ - bundle = FluentBundle("en", cache=CacheConfig(size=cache_size)) - - # Add many messages - ftl_source = "\n".join([f"msg{i} = Message {i}" for i in range(num_messages)]) - bundle.add_resource(ftl_source) - - # Format all messages - for i in range(num_messages): - bundle.format_pattern(f"msg{i}") - - # Cache size must respect limit - stats = bundle.get_cache_stats() - assert stats is not None - assert stats["size"] <= cache_size - assert stats["size"] == min(num_messages, cache_size) - evicted = num_messages > cache_size - event(f"eviction={evicted}") - - @given( - num_calls=st.integers(min_value=1, max_value=100), - ) - def test_stats_consistency_property(self, num_calls: int) -> None: - """Cache stats are always consistent. - - Property: hits + misses = total calls. 
- """ - bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) - bundle.add_resource("msg = Hello") - - # Make num_calls format calls - for _ in range(num_calls): - bundle.format_pattern("msg") - - # Stats must be consistent - stats = bundle.get_cache_stats() - assert stats is not None - assert stats["hits"] + stats["misses"] == num_calls - assert stats["hits"] == num_calls - 1 # All but first are hits - assert stats["misses"] == 1 # Only first is miss - event(f"num_calls={num_calls}") - - -@pytest.mark.fuzz -class TestCacheInvalidationProperties: - """Property-based tests for cache invalidation.""" - - @given( - num_resources=st.integers(min_value=1, max_value=10), - ) - def test_invalidation_on_add_resource(self, num_resources: int) -> None: - """Cache entries are cleared every time add_resource is called. - - Property: After add_resource(), cache size = 0 and cumulative - hits/misses are unchanged (add_resource itself does not format - anything). Stats are cumulative by design — they are NOT reset on - clear so production observability is preserved across invalidations. - """ - bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) - bundle.add_resource("msg = Hello") - - # Warm up cache - bundle.format_pattern("msg") - - # Add resources multiple times - for i in range(num_resources): - stats_before = bundle.get_cache_stats() - assert stats_before is not None - - bundle.add_resource(f"msg{i} = World {i}") - - stats_after = bundle.get_cache_stats() - assert stats_after is not None - assert stats_after["size"] == 0 # Cache entries cleared - # Cumulative stats are preserved across clear (design intent: production - # observability must not be reset by routine cache invalidation). 
- assert stats_after["hits"] == stats_before["hits"] - assert stats_after["misses"] == stats_before["misses"] - event(f"num_resources={num_resources}") - - @given( - num_functions=st.integers(min_value=1, max_value=10), - ) - def test_invalidation_on_add_function(self, num_functions: int) -> None: - """Cache is cleared every time add_function is called. - - Property: After add_function(), cache size = 0 and stats reset. - """ - bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) - bundle.add_resource("msg = Hello") - - # Warm up cache - bundle.format_pattern("msg") - - # Add functions multiple times - for i in range(num_functions): - stats_before = bundle.get_cache_stats() - assert stats_before is not None - - def func(value: str) -> str: - return value.upper() - - bundle.add_function(f"FUNC{i}", func) - - stats_after = bundle.get_cache_stats() - assert stats_after is not None - assert stats_after["size"] == 0 # Cache cleared - event(f"num_functions={num_functions}") - - -@pytest.mark.fuzz -class TestCacheInternalProperties: - """Property-based tests for cache internals.""" - - @given( - cache_size=st.integers(min_value=1, max_value=100), - num_operations=st.integers(min_value=0, max_value=200), - ) - def test_cache_len_property(self, cache_size: int, num_operations: int) -> None: - """Cache __len__ always returns correct size. - - Property: len(cache) <= maxsize and len(cache) = stats["size"]. 
- """ - bundle = FluentBundle("en", cache=CacheConfig(size=cache_size)) - - # Add messages - ftl_source = "\n".join([f"msg{i} = Message {i}" for i in range(num_operations)]) - bundle.add_resource(ftl_source) - - # Format messages - for i in range(num_operations): - bundle.format_pattern(f"msg{i}") - - # len() should match stats - cache = bundle._cache - assert cache is not None # Type narrowing for mypy - stats = bundle.get_cache_stats() - assert stats is not None - assert len(cache) == stats["size"] - assert len(cache) <= cache_size - event(f"maxsize={cache_size}") - - @given( - cache_size=st.integers(min_value=1, max_value=50), - ) - def test_cache_properties_consistent(self, cache_size: int) -> None: - """Cache properties (maxsize, hits, misses) are consistent. - - Property: Properties always match internal state. - """ - bundle = FluentBundle("en", cache=CacheConfig(size=cache_size)) - bundle.add_resource("msg = Hello") - cache = bundle._cache - assert cache is not None # Type narrowing for mypy - - # maxsize property matches constructor - assert cache.maxsize == cache_size - - # hits and misses start at zero - assert cache.hits == 0 - assert cache.misses == 0 - - # After one call: 1 miss, 0 hits - bundle.format_pattern("msg") - assert cache.hits == 0 - assert cache.misses == 1 - - # After second call: 1 miss, 1 hit - bundle.format_pattern("msg") - assert cache.hits == 1 - assert cache.misses == 1 - event(f"maxsize={cache_size}") - - @given( - num_updates=st.integers(min_value=1, max_value=50), - ) - def test_cache_update_existing_key_property(self, num_updates: int) -> None: - """Updating existing cache entry doesn't increase size. - - Property: Repeatedly formatting same message keeps cache size at 1. 
- """ - bundle = FluentBundle("en", cache=CacheConfig(size=10)) - bundle.add_resource("msg = Hello") - cache = bundle._cache - assert cache is not None # Type narrowing for mypy - - # Format same message multiple times - for _ in range(num_updates): - bundle.format_pattern("msg") - - # Cache size should be 1 (same entry updated) - assert len(cache) == 1 - assert cache.hits == num_updates - 1 - assert cache.misses == 1 - event(f"updates={num_updates}") - - @given( - args_list=st.lists( - st.dictionaries( - keys=st.text(alphabet="abcdefghij", min_size=1, max_size=3), - values=st.integers(min_value=0, max_value=100), - min_size=0, - max_size=3, - ), - min_size=1, - max_size=20, - ) - ) - def test_cache_key_uniqueness_property(self, args_list: list[dict[str, int]]) -> None: - """Each unique args dict creates separate cache entry. - - Property: Distinct args → distinct cache keys → separate entries. - """ - bundle = FluentBundle("en", cache=CacheConfig(size=100), use_isolating=False, strict=False) - bundle.add_resource("msg = { $a } { $b } { $c }") - cache = bundle._cache - assert cache is not None # Type narrowing for mypy - - # Format with different args - for args in args_list: - bundle.format_pattern("msg", args) - - # Cache size equals number of unique args - unique_args = len({tuple(sorted(args.items())) for args in args_list}) - assert len(cache) == min(unique_args, 100) # Min with cache_size - event(f"unique_args={unique_args}") - - @given( - message_ids=st.lists( - st.text(alphabet="abcdefghij", min_size=3, max_size=10), - min_size=1, - max_size=20, - unique=True, - ) - ) - def test_cache_message_id_isolation_property( - self, message_ids: list[str] - ) -> None: - """Different message IDs create separate cache entries. - - Property: Each message_id → separate cache entry. 
- """ - bundle = FluentBundle("en", cache=CacheConfig(size=100)) - - # Add all messages - ftl_source = "\n".join([f"{msg_id} = Message {i}" for i, msg_id in enumerate(message_ids)]) - bundle.add_resource(ftl_source) - cache = bundle._cache - assert cache is not None # Type narrowing for mypy - - # Format all messages - for msg_id in message_ids: - bundle.format_pattern(msg_id) - - # Cache should have one entry per message - assert len(cache) == min(len(message_ids), 100) - event(f"msg_count={len(message_ids)}") - - @given( - attributes=st.lists( - st.one_of(st.none(), st.text(alphabet="abcdefghij", min_size=1, max_size=10)), - min_size=1, - max_size=10, - ) - ) - def test_cache_attribute_isolation_property( - self, attributes: list[str | None] - ) -> None: - """Different attributes create separate cache entries. - - Property: Each attribute → separate cache entry. - """ - bundle = FluentBundle("en", cache=CacheConfig(size=100), use_isolating=False) - - # Create message with multiple attributes - attrs_ftl = "\n ".join([f".{attr} = Attr {attr}" for attr in attributes if attr]) - bundle.add_resource(f"msg = Value\n {attrs_ftl}") - cache = bundle._cache - assert cache is not None # Type narrowing for mypy - - # Format with different attributes - seen_attrs = set() - for attr in attributes: - bundle.format_pattern("msg", attribute=attr) - seen_attrs.add(attr) - - # Cache should have one entry per unique attribute - assert len(cache) == len(seen_attrs) - event(f"attr_count={len(seen_attrs)}") - - @given( - num_operations=st.integers(min_value=0, max_value=100), - ) - def test_cache_size_property_consistency(self, num_operations: int) -> None: - """Cache size property matches internal state. - - Property: cache.size == len(cache._cache). 
- """ - bundle = FluentBundle("en", cache=CacheConfig(size=100)) - - # Add messages - ftl_source = "\n".join([f"msg{i} = Message {i}" for i in range(num_operations)]) - bundle.add_resource(ftl_source) - cache = bundle._cache - assert cache is not None # Type narrowing for mypy - - # Format messages - for i in range(num_operations): - bundle.format_pattern(f"msg{i}") - - # size property should match len() and stats - assert cache.size == len(cache) - stats = bundle.get_cache_stats() - assert stats is not None - assert cache.size == stats["size"] - event(f"entries={num_operations}") - - -@pytest.mark.fuzz -class TestCacheTypeCollisionPrevention: - """Tests for type collision prevention in cache keys. - - Python's hash equality means hash(1) == hash(True) == hash(1.0), which would - cause cache collisions when these values produce different formatted outputs. - The cache uses type-tagged tuples to prevent this. - """ - - def test_bool_int_produce_different_cache_entries(self) -> None: - """Boolean True and integer 1 produce distinct cache entries. - - In Fluent, True formats as "true" while 1 formats as "1". Without type - tagging, Python's hash equality would cause cache collision. - """ - bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) - bundle.add_resource("msg = { $v }") - - # Format with True first - result_bool, _ = bundle.format_pattern("msg", {"v": True}) - # Format with 1 (would collide without type tagging) - result_int, _ = bundle.format_pattern("msg", {"v": 1}) - - # Results must differ - bool formats as "true", int as "1" - assert result_bool == "true" - assert result_int == "1" - - # Cache should have 2 entries (not 1 due to collision) - stats = bundle.get_cache_stats() - assert stats is not None - assert stats["size"] == 2 - - def test_int_decimal_produce_different_cache_entries(self) -> None: - """Integer 1 and Decimal('1') produce distinct cache entries. 
- - Without type tagging, hash(1) == hash(Decimal('1')) would cause collision. - """ - bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) - bundle.add_resource("msg = { $v }") - - # Format with int first - _result_int, _ = bundle.format_pattern("msg", {"v": 1}) - # Format with Decimal (would collide without type tagging) - _result_decimal, _ = bundle.format_pattern("msg", {"v": Decimal(1)}) - - # Cache should have 2 entries - stats = bundle.get_cache_stats() - assert stats is not None - assert stats["size"] == 2 - - def test_bool_false_int_zero_distinct(self) -> None: - """Boolean False and integer 0 produce distinct cache entries. - - hash(False) == hash(0) in Python. - """ - bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) - bundle.add_resource("msg = { $v }") - - result_bool, _ = bundle.format_pattern("msg", {"v": False}) - result_int, _ = bundle.format_pattern("msg", {"v": 0}) - - # bool formats as "false", int as "0" - assert result_bool == "false" - assert result_int == "0" - - stats = bundle.get_cache_stats() - assert stats is not None - assert stats["size"] == 2 - - def test_cache_hit_returns_correct_typed_value(self) -> None: - """Cache hit returns value for correct type, not hash-equivalent type. - - After caching with int 1, looking up with bool True must NOT return - the cached "1", but cache miss and format "true". 
- """ - bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) - bundle.add_resource("msg = { $v }") - - # Cache with int 1 - bundle.format_pattern("msg", {"v": 1}) - - # Look up with bool True - must NOT be a cache hit for the int entry - result, _ = bundle.format_pattern("msg", {"v": True}) - - # If type tagging works, this returns "true" not "1" - assert result == "true" - - @given(st.booleans(), st.integers()) - def test_bool_int_always_distinct(self, b: bool, i: int) -> None: - """PROPERTY: Any bool and int pair with same Python hash produce distinct cache entries.""" - # Only test when hash would collide - if hash(b) != hash(i): - return - - bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) - bundle.add_resource("msg = { $v }") - - # Format both - bundle.format_pattern("msg", {"v": b}) - bundle.format_pattern("msg", {"v": i}) - - # Should be 2 entries despite hash equality - stats = bundle.get_cache_stats() - assert stats is not None - assert stats["size"] == 2 - event(f"bool={b}") - - @given(st.integers(), st.decimals(allow_nan=False, allow_infinity=False)) - def test_int_decimal_always_distinct_when_equal(self, i: int, d: Decimal) -> None: - """PROPERTY: Int and Decimal with same numeric value produce distinct cache entries.""" - # Only test when values are hash-equal (hash(n) == hash(Decimal(n)) in Python) - try: - if hash(i) != hash(d): - return - except (TypeError, ValueError): - return - - bundle = FluentBundle("en", cache=CacheConfig(), use_isolating=False) - bundle.add_resource("msg = { $v }") - - # Format both - bundle.format_pattern("msg", {"v": i}) - bundle.format_pattern("msg", {"v": d}) - - # Should be 2 entries despite hash equality - stats = bundle.get_cache_stats() - assert stats is not None - assert stats["size"] == 2 - event(f"int_value={i}") +"""Aggregated runtime cache property test surface.""" + +from tests.runtime_cache_property_cases.formatcache_properties_via_fluent_bundle import * # noqa: F403 - 
re-export split test surface +from tests.runtime_cache_property_cases.property_tests_basic_invariants import * # noqa: F403 - re-export split test surface +from tests.runtime_cache_property_cases.property_tests_init_parameters import * # noqa: F403 - re-export split test surface +from tests.runtime_cache_property_cases.property_tests_key_handling import * # noqa: F403 - re-export split test surface +from tests.runtime_cache_property_cases.property_tests_lru_eviction import * # noqa: F403 - re-export split test surface +from tests.runtime_cache_property_cases.property_tests_robustness import * # noqa: F403 - re-export split test surface +from tests.runtime_cache_property_cases.property_tests_statistics import * # noqa: F403 - re-export split test surface diff --git a/tests/test_runtime_function_bridge.py b/tests/test_runtime_function_bridge.py index ae302882..502fb83e 100644 --- a/tests/test_runtime_function_bridge.py +++ b/tests/test_runtime_function_bridge.py @@ -1,1161 +1,14 @@ -"""Tests for runtime.function_bridge: FunctionRegistry, FunctionSignature, edge cases.""" - -from __future__ import annotations - -from decimal import Decimal -from typing import Any - -import pytest - -from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError -from ftllexengine.runtime.function_bridge import ( - _FTL_REQUIRES_LOCALE_ATTR, - FluentValue, - FunctionRegistry, - FunctionSignature, - fluent_function, -) - -# ============================================================================ -# HELPER FUNCTIONS FOR TESTING -# ============================================================================ - - -def sample_function(value: int, *, minimum_fraction_digits: int = 0) -> str: - """Sample function with snake_case parameters.""" - return f"{value:.{minimum_fraction_digits}f}" - - -def simple_function(text: str) -> str: - """Simple function with single parameter.""" - return text.upper() - - -def positional_only_function(value: int, /) -> str: - """Function with 
positional-only parameter.""" - return str(value * 2) - - -def mixed_params_function( - value: int, /, *, use_grouping: bool = False, date_style: str = "short" -) -> str: - """Function with mixed parameter types.""" - result = str(value) - if use_grouping: - result = f"{value:,}" - return f"{result} ({date_style})" - - -# ============================================================================ -# FUNCTION SIGNATURE TESTS -# ============================================================================ - - -class TestFunctionSignature: - """Test FunctionSignature dataclass.""" - - def test_create_function_signature(self) -> None: - """Create FunctionSignature with all fields.""" - sig = FunctionSignature( - python_name="test_func", - ftl_name="TEST", - param_mapping=(("minimumValue", "minimum_value"),), - callable=str, - ) - - assert sig.python_name == "test_func" - assert sig.ftl_name == "TEST" - assert sig.param_mapping == (("minimumValue", "minimum_value"),) - - def test_function_signature_immutable(self) -> None: - """FunctionSignature is immutable.""" - sig = FunctionSignature( - python_name="test", - ftl_name="TEST", - param_mapping=(), - callable=lambda: "test", - ) - - with pytest.raises(AttributeError): - sig.python_name = "new_name" # type: ignore[misc] - - -# ============================================================================ -# FUNCTION REGISTRY BASIC TESTS -# ============================================================================ - - -class TestFunctionRegistryBasic: - """Test basic FunctionRegistry functionality.""" - - def test_create_registry(self) -> None: - """Create empty function registry.""" - registry = FunctionRegistry() - - assert not registry.has_function("NUMBER") - - def test_register_function_with_default_name(self) -> None: - """Register function with auto-generated FTL name.""" - registry = FunctionRegistry() - - def number(value: int) -> str: - return str(value) - - registry.register(number) - - assert 
registry.has_function("NUMBER") - assert registry.get_python_name("NUMBER") == "number" - - def test_register_function_with_custom_ftl_name(self) -> None: - """Register function with custom FTL name.""" - registry = FunctionRegistry() - - registry.register(sample_function, ftl_name="NUM_FORMAT") - - assert registry.has_function("NUM_FORMAT") - assert not registry.has_function("SAMPLE_FUNCTION") - - def test_register_function_with_custom_param_map(self) -> None: - """Register function with custom parameter mappings.""" - registry = FunctionRegistry() - - def custom_func(arg1: int, *, special_arg: str = "") -> str: - return f"{arg1}:{special_arg}" - - registry.register( - custom_func, - ftl_name="CUSTOM", - param_map={"customArg": "special_arg"}, - ) - - result = registry.call("CUSTOM", [42], {"customArg": "test"}) - assert result == "42:test" - - def test_register_inject_locale_function_with_incompatible_signature(self) -> None: - """Register function with inject_locale=True but wrong signature raises TypeError. - - Regression test for API-REGISTRY-SIG-MISMATCH-001. - Functions marked with inject_locale=True must have at least 2 positional - parameters to receive (value, locale_code). Registration should fail-fast - rather than allowing runtime errors. 
- """ - from ftllexengine.runtime.function_bridge import ( - fluent_function, - ) - - @fluent_function(inject_locale=True) - def bad_func(value: int) -> str: - """Only 1 positional param - incompatible with locale injection.""" - return str(value) - - registry = FunctionRegistry() - - with pytest.raises(TypeError, match="inject_locale=True requires at least 2 positional"): - registry.register(bad_func, ftl_name="BAD") - - def test_register_inject_locale_function_with_compatible_signature(self) -> None: - """Register function with inject_locale=True and correct signature succeeds.""" - from ftllexengine.runtime.function_bridge import ( - fluent_function, - ) - - @fluent_function(inject_locale=True) - def good_func(value: int, locale_code: str) -> str: - """2 positional params - compatible with locale injection.""" - return f"{value}@{locale_code}" - - registry = FunctionRegistry() - registry.register(good_func, ftl_name="GOOD") - - assert registry.has_function("GOOD") - - -# ============================================================================ -# PARAMETER NAME CONVERSION TESTS -# ============================================================================ - - -class TestParameterNameConversion: - """Test snake_case <-> camelCase conversion.""" - - def test_to_camel_case_single_word(self) -> None: - """Convert single word (no change).""" - result = FunctionRegistry._to_camel_case("value") - - assert result == "value" - - def test_to_camel_case_two_words(self) -> None: - """Convert two_words to twoWords.""" - result = FunctionRegistry._to_camel_case("minimum_value") - - assert result == "minimumValue" - - def test_to_camel_case_multiple_words(self) -> None: - """Convert multiple_word_name to multipleWordName.""" - result = FunctionRegistry._to_camel_case("minimum_fraction_digits") - - assert result == "minimumFractionDigits" - - def test_to_camel_case_already_camel(self) -> None: - """Convert camelCase (no underscores) stays same.""" - result = 
FunctionRegistry._to_camel_case("alreadyCamel") - - assert result == "alreadyCamel" - - - -# ============================================================================ -# FUNCTION CALLING TESTS -# ============================================================================ - - -class TestFunctionCalling: - """Test calling registered functions.""" - - def test_call_function_with_positional_args(self) -> None: - """Call function with only positional arguments.""" - registry = FunctionRegistry() - registry.register(simple_function, ftl_name="UPPER") - - result = registry.call("UPPER", ["hello"], {}) - - assert result == "HELLO" - - def test_call_function_with_named_args(self) -> None: - """Call function with named arguments.""" - registry = FunctionRegistry() - registry.register(sample_function, ftl_name="FORMAT") - - # FTL: FORMAT($value, minimumFractionDigits: 2) - result = registry.call("FORMAT", [42], {"minimumFractionDigits": 2}) - - assert result == "42.00" - - def test_call_function_with_mixed_args(self) -> None: - """Call function with both positional and named arguments.""" - registry = FunctionRegistry() - registry.register(mixed_params_function, ftl_name="MIX") - - result = registry.call("MIX", [1000], {"useGrouping": True, "dateStyle": "long"}) - assert isinstance(result, str) - assert "1,000" in result - assert "long" in result - - def test_call_function_auto_converts_camel_to_snake(self) -> None: - """Function call auto-converts FTL camelCase to Python snake_case.""" - registry = FunctionRegistry() - - def test_func(*, minimum_value: int = 0, maximum_value: int = 100) -> str: - return f"{minimum_value}-{maximum_value}" - - registry.register(test_func, ftl_name="RANGE") - - # FTL uses camelCase: minimumValue, maximumValue - result = registry.call("RANGE", [], {"minimumValue": 5, "maximumValue": 50}) - - assert result == "5-50" - - def test_call_nonexistent_function_raises_error(self) -> None: - """Calling non-existent function raises FrozenFluentError 
with RESOLUTION category.""" - registry = FunctionRegistry() - - with pytest.raises(FrozenFluentError, match="Function 'NONEXISTENT' not found") as exc_info: - registry.call("NONEXISTENT", [], {}) - assert exc_info.value.category == ErrorCategory.RESOLUTION - - def test_call_function_that_raises_exception(self) -> None: - """Function that raises exception is wrapped in FrozenFluentError.""" - registry = FunctionRegistry() - - def failing_func(_value: int) -> str: - msg = "Something went wrong" - raise ValueError(msg) - - registry.register(failing_func, ftl_name="FAIL") - - with pytest.raises(FrozenFluentError, match="Function 'FAIL' failed") as exc_info: - registry.call("FAIL", [42], {}) - assert exc_info.value.category == ErrorCategory.RESOLUTION - - -# ============================================================================ -# AUTO-GENERATION PARAMETER MAPPING TESTS -# ============================================================================ - - -class TestAutoParameterMapping: - """Test automatic parameter mapping generation.""" - - def test_auto_map_snake_case_params(self) -> None: - """Auto-generate mappings for snake_case parameters.""" - registry = FunctionRegistry() - - def func(*, minimum_value: int = 0, maximum_value: int = 100) -> str: - return f"{minimum_value}:{maximum_value}" - - registry.register(func, ftl_name="FUNC") - - # Should auto-map: minimumValue -> minimum_value, maximumValue -> maximum_value - result = registry.call("FUNC", [], {"minimumValue": 1, "maximumValue": 10}) - assert result == "1:10" - - def test_auto_map_skips_self_parameter(self) -> None: - """Auto-mapping skips 'self' parameter.""" - - class TestClass: - def method(self, value: int) -> str: - return str(value) - - registry = FunctionRegistry() - obj = TestClass() - registry.register(obj.method, ftl_name="METHOD") - - result = registry.call("METHOD", [42], {}) - assert result == "42" - - def test_auto_map_with_positional_only_marker(self) -> None: - """Auto-mapping skips 
positional-only marker '/'.""" - registry = FunctionRegistry() - - registry.register(positional_only_function, ftl_name="POS") - - result = registry.call("POS", [21], {}) - assert result == "42" - - def test_custom_param_map_overrides_auto_map(self) -> None: - """Custom parameter mapping overrides auto-generated mapping.""" - registry = FunctionRegistry() - - def func(*, minimum_value: int = 0) -> str: - return str(minimum_value) - - # Auto would create: minimumValue -> minimum_value - # Custom override: minVal -> minimum_value - registry.register( - func, - ftl_name="FUNC", - param_map={"minVal": "minimum_value"}, - ) - - result = registry.call("FUNC", [], {"minVal": 42}) - assert result == "42" - - -# ============================================================================ -# REGISTRY QUERY TESTS -# ============================================================================ - - -class TestRegistryQueries: - """Test registry query methods.""" - - def test_has_function_returns_true_when_registered(self) -> None: - """has_function returns True for registered function.""" - registry = FunctionRegistry() - registry.register(simple_function, ftl_name="UPPER") - - assert registry.has_function("UPPER") - - def test_has_function_returns_false_when_not_registered(self) -> None: - """has_function returns False for unregistered function.""" - registry = FunctionRegistry() - - assert not registry.has_function("UNKNOWN") - - def test_get_python_name_returns_name_when_registered(self) -> None: - """get_python_name returns Python function name.""" - registry = FunctionRegistry() - registry.register(simple_function, ftl_name="UPPER") - - python_name = registry.get_python_name("UPPER") - - assert python_name == "simple_function" - - def test_get_python_name_returns_none_when_not_registered(self) -> None: - """get_python_name returns None for unregistered function.""" - registry = FunctionRegistry() - - python_name = registry.get_python_name("UNKNOWN") - - assert python_name 
is None - - -# ============================================================================ -# INTROSPECTION API TESTS -# ============================================================================ - - -class TestFunctionRegistryIntrospection: - """Test FunctionRegistry introspection methods.""" - - def test_list_functions_empty_registry(self) -> None: - """list_functions returns empty list for empty registry.""" - registry = FunctionRegistry() - - functions = registry.list_functions() - - assert functions == [] - - def test_list_functions_single_function(self) -> None: - """list_functions returns single function name.""" - registry = FunctionRegistry() - registry.register(simple_function, ftl_name="UPPER") - - functions = registry.list_functions() - - assert functions == ["UPPER"] - - def test_list_functions_multiple_functions(self) -> None: - """list_functions returns all registered function names.""" - registry = FunctionRegistry() - registry.register(simple_function, ftl_name="FUNC1") - registry.register(sample_function, ftl_name="FUNC2") - registry.register(positional_only_function, ftl_name="FUNC3") - - functions = registry.list_functions() - - assert set(functions) == {"FUNC1", "FUNC2", "FUNC3"} - assert len(functions) == 3 - - def test_get_function_info_existing_function(self) -> None: - """get_function_info returns metadata for registered function.""" - registry = FunctionRegistry() - registry.register(sample_function, ftl_name="FORMAT") - - info = registry.get_function_info("FORMAT") - - assert info is not None - assert info.python_name == "sample_function" - assert info.ftl_name == "FORMAT" - assert isinstance(info.param_mapping, tuple) - assert "minimumFractionDigits" in info.param_dict - assert info.param_dict["minimumFractionDigits"] == "minimum_fraction_digits" - assert callable(info.callable) - - def test_get_function_info_nonexistent_function(self) -> None: - """get_function_info returns None for unregistered function.""" - registry = 
FunctionRegistry() - - info = registry.get_function_info("NONEXISTENT") - - assert info is None - - def test_iter_empty_registry(self) -> None: - """Iterating empty registry yields no names.""" - registry = FunctionRegistry() - - names = list(registry) - - assert names == [] - - def test_iter_single_function(self) -> None: - """Iterating registry yields function names.""" - registry = FunctionRegistry() - registry.register(simple_function, ftl_name="UPPER") - - names = list(registry) - - assert names == ["UPPER"] - - def test_iter_multiple_functions(self) -> None: - """Iterating registry yields all function names.""" - registry = FunctionRegistry() - registry.register(simple_function, ftl_name="FUNC1") - registry.register(sample_function, ftl_name="FUNC2") - registry.register(positional_only_function, ftl_name="FUNC3") - - names = list(registry) - - assert set(names) == {"FUNC1", "FUNC2", "FUNC3"} - - def test_iter_for_loop(self) -> None: - """Can iterate registry in for loop.""" - registry = FunctionRegistry() - registry.register(simple_function, ftl_name="A") - registry.register(sample_function, ftl_name="B") - - collected_names = [] - for name in registry: - collected_names.append(name) - - assert set(collected_names) == {"A", "B"} - - def test_len_empty_registry(self) -> None: - """len() returns 0 for empty registry.""" - registry = FunctionRegistry() - - assert len(registry) == 0 - - def test_len_single_function(self) -> None: - """len() returns 1 for registry with one function.""" - registry = FunctionRegistry() - registry.register(simple_function, ftl_name="FUNC") - - assert len(registry) == 1 - - def test_len_multiple_functions(self) -> None: - """len() returns correct count for multiple functions.""" - registry = FunctionRegistry() - registry.register(simple_function, ftl_name="F1") - registry.register(sample_function, ftl_name="F2") - registry.register(positional_only_function, ftl_name="F3") - - assert len(registry) == 3 - - def 
test_len_after_overwrite(self) -> None: - """len() doesn't double-count after overwriting function.""" - registry = FunctionRegistry() - registry.register(simple_function, ftl_name="FUNC") - registry.register(sample_function, ftl_name="FUNC") - - assert len(registry) == 1 - - def test_contains_registered_function(self) -> None: - """'in' operator returns True for registered function.""" - registry = FunctionRegistry() - registry.register(simple_function, ftl_name="UPPER") - - assert "UPPER" in registry - - def test_contains_unregistered_function(self) -> None: - """'in' operator returns False for unregistered function.""" - registry = FunctionRegistry() - - assert "NONEXISTENT" not in registry - - def test_contains_case_sensitive(self) -> None: - """'in' operator is case-sensitive.""" - registry = FunctionRegistry() - registry.register(simple_function, ftl_name="UPPER") - - assert "UPPER" in registry - assert "upper" not in registry - assert "Upper" not in registry - - def test_introspection_integration(self) -> None: - """Combine introspection methods for function discovery.""" - registry = FunctionRegistry() - registry.register(simple_function, ftl_name="FUNC1") - registry.register(sample_function, ftl_name="FUNC2") - - # Check count - assert len(registry) == 2 - - # List all functions - functions = registry.list_functions() - assert len(functions) == 2 - - # Iterate and inspect each function - for name in registry: - assert name in registry - info = registry.get_function_info(name) - assert info is not None - assert info.ftl_name == name - - def test_copy_preserves_introspection(self) -> None: - """Copied registry preserves introspection capabilities.""" - original = FunctionRegistry() - original.register(simple_function, ftl_name="FUNC1") - original.register(sample_function, ftl_name="FUNC2") - - copied = original.copy() - - # Both registries have same functions - assert len(original) == len(copied) - assert set(original) == set(copied) - assert 
original.list_functions() == copied.list_functions() - - # Modifying copy doesn't affect original - copied.register(positional_only_function, ftl_name="FUNC3") - assert len(copied) == 3 - assert len(original) == 2 - - -# ============================================================================ -# EDGE CASES AND INTEGRATION TESTS -# ============================================================================ - - -class TestFunctionBridgeEdgeCases: - """Test edge cases and corner scenarios.""" - - def test_register_multiple_functions(self) -> None: - """Register multiple functions in same registry.""" - registry = FunctionRegistry() - - def func1(x: int) -> str: - return str(x) - - def func2(x: int) -> str: - return str(x * 2) - - registry.register(func1, ftl_name="F1") - registry.register(func2, ftl_name="F2") - - assert registry.has_function("F1") - assert registry.has_function("F2") - assert registry.call("F1", [5], {}) == "5" - assert registry.call("F2", [5], {}) == "10" - - def test_overwrite_registered_function(self) -> None: - """Registering same FTL name twice overwrites previous.""" - registry = FunctionRegistry() - - def func1(_x: int) -> str: - return "first" - - def func2(_x: int) -> str: - return "second" - - registry.register(func1, ftl_name="FUNC") - registry.register(func2, ftl_name="FUNC") - - result = registry.call("FUNC", [1], {}) - assert result == "second" - - def test_empty_parameter_name(self) -> None: - """Handle empty parameter names gracefully.""" - result = FunctionRegistry._to_camel_case("") - assert result == "" - - def test_parameter_with_numbers(self) -> None: - """Handle parameter names with numbers.""" - result = FunctionRegistry._to_camel_case("param_123_test") - assert result == "param123Test" - - def test_call_with_unmapped_parameter(self) -> None: - """Call with parameter not in mapping passes through unchanged.""" - registry = FunctionRegistry() - - def func(**kwargs: Any) -> str: - return str(kwargs.get("unknownParam", 
"default")) - - registry.register(func, ftl_name="FUNC") - - # unknownParam not in auto-mapping, but should pass through - result = registry.call("FUNC", [], {"unknownParam": "custom"}) - assert result == "custom" - - -# ============================================================================ -# REAL-WORLD USAGE TESTS -# ============================================================================ - - -class TestRealWorldUsage: - """Test realistic usage scenarios.""" - - def test_number_formatting_function(self) -> None: - """Test NUMBER-like function with real parameters.""" - registry = FunctionRegistry() - - def number_format( - value: object, - *, - minimum_fraction_digits: int = 0, # noqa: ARG001 - unused - maximum_fraction_digits: int = 3, - use_grouping: bool = False, - ) -> str: - formatted = f"{Decimal(str(value)):.{maximum_fraction_digits}f}" - if use_grouping: - # Simple grouping simulation - parts = formatted.split(".") - parts[0] = f"{int(parts[0]):,}" - formatted = ".".join(parts) - return formatted - - registry.register(number_format, ftl_name="NUMBER") - - # FTL: { NUMBER($price, minimumFractionDigits: 2, useGrouping: true) } - result = registry.call( - "NUMBER", - [Decimal("1234.5")], - {"minimumFractionDigits": 2, "useGrouping": True}, - ) - assert isinstance(result, str) - assert "1,234" in result - - def test_datetime_formatting_function(self) -> None: - """Test DATETIME-like function with style parameters.""" - registry = FunctionRegistry() - - def datetime_format( - value: str, *, date_style: str = "short", time_style: str = "short" - ) -> str: - return f"{value} ({date_style}/{time_style})" - - registry.register(datetime_format, ftl_name="DATETIME") - - # FTL: { DATETIME($date, dateStyle: "long", timeStyle: "medium") } - result = registry.call( - "DATETIME", - ["2024-01-15"], - {"dateStyle": "long", "timeStyle": "medium"}, - ) - - assert result == "2024-01-15 (long/medium)" - - -# 
============================================================================ -# EDGE CASES (from test_function_bridge_edge_cases.py) -# ============================================================================ - - -class TestFrozenRegistryLines160To164: - """Test lines 160-164: TypeError when registering on frozen registry.""" - - def test_register_on_frozen_registry_raises_type_error(self) -> None: - """Test register() raises TypeError on frozen registry (lines 160-164).""" - registry = FunctionRegistry() - - # Freeze the registry - registry.freeze() - - # Try to register a function on frozen registry - def my_func(value: str, locale_code: str, /) -> str: # noqa: ARG001 - unused - return value - - # Should raise TypeError with specific message - with pytest.raises( - TypeError, - match=r"Cannot modify frozen registry.*create_default_registry", - ): - registry.register(my_func, ftl_name="MYFUNC") - - -class TestParameterCollisionLines188To193: - """Test lines 188-193: ValueError on parameter name collision.""" - - def test_register_with_parameter_collision_raises_value_error(self) -> None: - """Test register() raises ValueError on parameter collision (lines 188-193).""" - registry = FunctionRegistry() - - # Create a function with parameters that will collide after stripping underscores - # Both `_value` and `value` would map to camelCase `value` - def colliding_func( - val: str, - locale_code: str, # noqa: ARG001 - unused - /, - _test_param: int = 0, # Will strip to `test_param` -> `testParam` - test_param: int = 0, # Also maps to `testParam` # noqa: ARG001 - unused - ) -> str: - return val - - # Should raise ValueError about parameter collision - with pytest.raises(ValueError, match=r"Parameter name collision.*testParam"): - registry.register(colliding_func, ftl_name="COLLIDE") - - -class TestFreezeMethodLine285: - """Test line 285: freeze() method.""" - - def test_freeze_sets_frozen_flag(self) -> None: - """Test freeze() sets _frozen = True (line 285).""" - 
registry = FunctionRegistry() - - # Initially not frozen - assert not registry.frozen - - # Freeze it - registry.freeze() - - # Should now be frozen - assert registry.frozen - - def test_freeze_prevents_registration(self) -> None: - """Test freeze() actually prevents further registration.""" - registry = FunctionRegistry() - - # Register a function before freezing - def func1(value: str, locale_code: str, /) -> str: # noqa: ARG001 - unused - return value - - registry.register(func1, ftl_name="FUNC1") - assert "FUNC1" in registry - - # Freeze the registry - registry.freeze() - - # Try to register another function - def func2(value: str, locale_code: str, /) -> str: # noqa: ARG001 - unused - return value - - # Should fail - with pytest.raises(TypeError): - registry.register(func2, ftl_name="FUNC2") - - # Original function still there - assert "FUNC1" in registry - # New function not added - assert "FUNC2" not in registry - - -class TestFrozenPropertyLine294: - """Test line 294: frozen property getter.""" - - def test_frozen_property_returns_false_initially(self) -> None: - """Test frozen property returns False for new registry (line 294).""" - registry = FunctionRegistry() - - # Should not be frozen initially - result = registry.frozen - - assert result is False - - def test_frozen_property_returns_true_after_freeze(self) -> None: - """Test frozen property returns True after freeze() (line 294).""" - registry = FunctionRegistry() - - # Freeze it - registry.freeze() - - # Property should return True - result = registry.frozen - - assert result is True - - def test_frozen_property_is_readonly(self) -> None: - """Test frozen property cannot be set directly.""" - registry = FunctionRegistry() - - # Should not be able to set frozen property - with pytest.raises(AttributeError): - registry.frozen = True # type: ignore[misc] - - -class TestFrozenRegistryCopyIntegration: - """Integration tests for frozen registry and copy().""" - - def 
test_copy_of_frozen_registry_is_mutable(self) -> None: - """Test copy() of frozen registry creates mutable copy.""" - registry = FunctionRegistry() - - # Register and freeze - def func1(value: str, locale_code: str, /) -> str: # noqa: ARG001 - unused - return value - - registry.register(func1, ftl_name="FUNC1") - registry.freeze() - - # Create copy - copy = registry.copy() - - # Copy should not be frozen - assert not copy.frozen - - # Should be able to register on copy - def func2(value: str, locale_code: str, /) -> str: # noqa: ARG001 - unused - return value - - copy.register(func2, ftl_name="FUNC2") - - # Copy has both functions - assert "FUNC1" in copy - assert "FUNC2" in copy - - # Original only has first function - assert "FUNC1" in registry - assert "FUNC2" not in registry - - -class TestFluentFunctionDecoratorWithParentheses: - """Test lines 134-148: fluent_function decorator WITH parentheses.""" - - def test_fluent_function_decorator_with_inject_locale_true(self) -> None: - """Test @fluent_function(inject_locale=True) decorator path (lines 134-148).""" - from ftllexengine.runtime.function_bridge import ( - fluent_function, - ) - - # Use decorator WITH parentheses - @fluent_function(inject_locale=True) - def my_format(value: str, locale_code: str, /) -> str: - return f"{value}_{locale_code}" - - # Verify the function works - result = my_format("test", "en_US") - assert result == "test_en_US" - - # Verify the locale injection marker was set - assert hasattr(my_format, "_ftl_requires_locale") - assert my_format._ftl_requires_locale is True - - def test_fluent_function_decorator_with_inject_locale_false(self) -> None: - """Test @fluent_function(inject_locale=False) decorator path.""" - from ftllexengine.runtime.function_bridge import ( - fluent_function, - ) - - # Use decorator WITH parentheses but inject_locale=False - @fluent_function(inject_locale=False) - def my_upper(value: str) -> str: - return value.upper() - - # Verify the function works - result = 
my_upper("test") - assert result == "TEST" - - # Verify the locale injection marker was NOT set - assert not getattr(my_upper, "_ftl_requires_locale", False) - - def test_fluent_function_decorator_without_parentheses(self) -> None: - """Test @fluent_function decorator WITHOUT parentheses (line 147).""" - from ftllexengine.runtime.function_bridge import ( - fluent_function, - ) - - # Use decorator WITHOUT parentheses - @fluent_function - def my_simple(value: str) -> str: - return value.lower() - - # Verify the function works - result = my_simple("TEST") - assert result == "test" - - # When used without parentheses and without inject_locale, should not set marker - assert not getattr(my_simple, "_ftl_requires_locale", False) - - -class TestRegisterWithUninspectableCallable: - """Test lines 258-264: ValueError when callable has no inspectable signature.""" - - def test_register_uninspectable_callable_raises_type_error(self) -> None: - """Test register() raises TypeError for callables without signatures (lines 258-264).""" - registry = FunctionRegistry() - - # Create a mock callable that signature() cannot inspect - class UninspectableCallable: - def __call__(self, *args: object, **kwargs: object) -> str: # noqa: ARG002 - unused - return "test" - - # Manually break signature inspection by making it raise ValueError - from unittest.mock import patch - - uninspectable = UninspectableCallable() - - with ( - patch( - "ftllexengine.runtime.function_registry_helpers.signature", - side_effect=ValueError("No signature"), - ), - pytest.raises( - TypeError, - match=r"Cannot register.*no inspectable signature.*param_mapping", - ), - ): - registry.register(uninspectable, ftl_name="UNINSPECTABLE") - - -class TestShouldInjectLocaleWithMissingFunction: - """Test lines 575-579: should_inject_locale when function not in registry.""" - - def test_should_inject_locale_returns_false_for_missing_function(self) -> None: - """Test should_inject_locale returns False for non-existent function 
(lines 575-576).""" - registry = FunctionRegistry() - - # Function doesn't exist in registry - result = registry.should_inject_locale("NONEXISTENT") - - # Should return False (not raise) - assert result is False - - def test_should_inject_locale_returns_false_for_function_without_marker(self) -> None: - """Test should_inject_locale when function has no marker. - - Returns False when function exists but has no marker (lines 578-579). - """ - registry = FunctionRegistry() - - # Register a function without locale injection marker - def my_func(value: str, locale_code: str, /) -> str: # noqa: ARG001 - unused - return value - - registry.register(my_func, ftl_name="CUSTOM") - - # Function exists, but doesn't have _ftl_requires_locale marker - result = registry.should_inject_locale("CUSTOM") - - # Should return False (lines 578-579: getattr returns False) - assert result is False - - def test_should_inject_locale_returns_true_for_function_with_marker(self) -> None: - """Test should_inject_locale returns True when function has marker set.""" - from ftllexengine.runtime.function_bridge import ( - fluent_function, - ) - - registry = FunctionRegistry() - - # Register a function with locale injection marker - @fluent_function(inject_locale=True) - def my_format(value: str, locale_code: str, /) -> str: - return f"{value}_{locale_code}" - - registry.register(my_format, ftl_name="MYFORMAT") - - # Function has marker, should return True - result = registry.should_inject_locale("MYFORMAT") - - assert result is True - - -class TestGetExpectedPositionalArgs: - """Test lines 605-608: get_expected_positional_args method.""" - - def test_get_expected_positional_args_for_builtin_function(self) -> None: - """Test get_expected_positional_args returns count for built-in (lines 605-608).""" - from ftllexengine.runtime.functions import ( - create_default_registry, - ) - - registry = create_default_registry() - - # NUMBER is a built-in function with 1 positional arg - result = 
registry.get_expected_positional_args("NUMBER") - - assert result == 1 - - def test_get_expected_positional_args_for_custom_function(self) -> None: - """Test get_expected_positional_args returns None for custom function.""" - registry = FunctionRegistry() - - # Register a custom function - def my_func(value: str, locale_code: str, /) -> str: # noqa: ARG001 - unused - return value - - registry.register(my_func, ftl_name="CUSTOM") - - # Custom function should return None (not in BUILTIN_FUNCTIONS) - result = registry.get_expected_positional_args("CUSTOM") - - assert result is None - - -class TestGetBuiltinMetadata: - """Test lines 626-628: get_builtin_metadata method.""" - - def test_get_builtin_metadata_for_builtin_function(self) -> None: - """Test get_builtin_metadata returns metadata for built-in (lines 626-628).""" - from ftllexengine.runtime.functions import ( - create_default_registry, - ) - - registry = create_default_registry() - - # NUMBER is a built-in function - metadata = registry.get_builtin_metadata("NUMBER") - - # Should return metadata object - assert metadata is not None - assert metadata.requires_locale is True - - def test_get_builtin_metadata_for_custom_function(self) -> None: - """Test get_builtin_metadata returns None for custom function.""" - registry = FunctionRegistry() - - # Register a custom function - def my_func(value: str, locale_code: str, /) -> str: # noqa: ARG001 - unused - return value - - registry.register(my_func, ftl_name="CUSTOM") - - # Custom function should return None - metadata = registry.get_builtin_metadata("CUSTOM") - - assert metadata is None - - -# ============================================================================ -# DECORATOR AND REGISTRY COVERAGE -# ============================================================================ - - -class TestFunctionBridgeCoverage: - """Test fluent_function decorator and FunctionRegistry coverage.""" - - def test_fluent_function_no_parentheses_usage(self) -> None: - """Using 
@fluent_function without parentheses applies decorator directly.""" - - @fluent_function - def my_upper(value: str) -> FluentValue: - return value.upper() - - result = my_upper("hello") - assert result == "HELLO" - - def test_fluent_function_with_parentheses_usage(self) -> None: - """Using @fluent_function() with parentheses works as factory.""" - - @fluent_function() - def my_lower(value: str) -> FluentValue: - return value.lower() - - result = my_lower("HELLO") - assert result == "hello" - - def test_fluent_function_with_locale_injection(self) -> None: - """Using @fluent_function(inject_locale=True) sets locale attribute.""" - - @fluent_function(inject_locale=True) - def locale_aware(value: str, locale: str) -> FluentValue: - return f"{value}@{locale}" - - assert hasattr(locale_aware, _FTL_REQUIRES_LOCALE_ATTR) - assert getattr(locale_aware, _FTL_REQUIRES_LOCALE_ATTR) is True - - def test_fluent_function_wrapper_returns_value(self) -> None: - """Wrapper function passes through the decorated function's return value.""" - - @fluent_function - def add_suffix(value: str, suffix: str = "!") -> FluentValue: - return f"{value}{suffix}" - - result = add_suffix("Hello", suffix="?") - assert result == "Hello?" 
- - def test_get_builtin_metadata_exists(self) -> None: - """get_builtin_metadata returns metadata for known built-in function.""" - registry = FunctionRegistry() - - meta = registry.get_builtin_metadata("NUMBER") - assert meta is not None - assert meta.requires_locale is True - - def test_get_builtin_metadata_not_exists(self) -> None: - """get_builtin_metadata returns None for unknown function name.""" - registry = FunctionRegistry() - - meta = registry.get_builtin_metadata("NONEXISTENT") - assert meta is None - - -class TestFunctionBridgeLeadingUnderscore: - """Test function parameter with leading underscore is preserved in mapping.""" - - def test_parameter_with_leading_underscore(self) -> None: - """Parameter with leading underscore is kept in param_mapping.""" - registry = FunctionRegistry() - - def test_func(_internal: str, public: str) -> str: # noqa: PT019 - intentional - return f"{_internal}:{public}" - - registry.register(test_func, ftl_name="TEST") - - sig = registry._functions["TEST"] # pylint: disable=protected-access - param_values = [v for _, v in sig.param_mapping] - assert "_internal" in param_values - - -class TestFunctionMetadataCallable: - """Test should_inject_locale returns False for unknown function names.""" - - def test_should_inject_locale_not_found(self) -> None: - """should_inject_locale returns False for unregistered function name.""" - registry = FunctionRegistry() - - def custom(val: str) -> str: - return val - - registry.register(custom, ftl_name="CUSTOM") - assert registry.should_inject_locale("NOTFOUND") is False +"""Aggregated runtime function bridge test surface.""" + +from tests.runtime_function_bridge_cases.auto_generation_parameter_mapping_tests import * # noqa: F403 - re-export split test surface +from tests.runtime_function_bridge_cases.decorator_and_registry_coverage import * # noqa: F403 - re-export split test surface +from tests.runtime_function_bridge_cases.edge_cases_and_integration_tests import * # noqa: F403 - 
re-export split test surface +from tests.runtime_function_bridge_cases.edge_cases_from_test_function_bridge_edge_cases_py import * # noqa: F403 - re-export split test surface +from tests.runtime_function_bridge_cases.function_calling_tests import * # noqa: F403 - re-export split test surface +from tests.runtime_function_bridge_cases.function_registry_basic_tests import * # noqa: F403 - re-export split test surface +from tests.runtime_function_bridge_cases.function_signature_tests import * # noqa: F403 - re-export split test surface +from tests.runtime_function_bridge_cases.helper_functions_for_testing import * # noqa: F403 - re-export split test surface +from tests.runtime_function_bridge_cases.introspection_api_tests import * # noqa: F403 - re-export split test surface +from tests.runtime_function_bridge_cases.parameter_name_conversion_tests import * # noqa: F403 - re-export split test surface +from tests.runtime_function_bridge_cases.real_world_usage_tests import * # noqa: F403 - re-export split test surface +from tests.runtime_function_bridge_cases.registry_query_tests import * # noqa: F403 - re-export split test surface diff --git a/tests/test_runtime_locale_context.py b/tests/test_runtime_locale_context.py index 39bbc6e8..cc256b35 100644 --- a/tests/test_runtime_locale_context.py +++ b/tests/test_runtime_locale_context.py @@ -1,1582 +1,7 @@ -"""Tests for LocaleContext - locale-aware formatting without global state. +"""Aggregated LocaleContext test surface.""" -Tests immutable locale configuration, thread-safe caching, CLDR-compliant -formatting for numbers, dates, and currency via Babel integration. 
- -Covers: -- Factory methods (create, create_or_raise) and construction guard -- Cache management (identity, LRU eviction, double-check pattern) -- Number formatting (grouping, decimals, special values, validation) -- DateTime formatting (styles, patterns, ISO string input) -- Currency formatting (symbol/code/name display, patterns, boundary values) -- Internal helpers (_get_iso_code_pattern) -- Long locale code handling -- Babel import error paths - -Python 3.13+. -""" - -from __future__ import annotations - -import logging -import sys -import threading -from datetime import UTC, datetime -from decimal import Decimal -from typing import Any, Literal, get_args -from unittest.mock import MagicMock, PropertyMock, patch - -import pytest -from babel import Locale -from babel import dates as babel_dates -from babel import numbers as babel_numbers - -import ftllexengine.core.babel_compat as _bc -from ftllexengine.constants import MAX_LOCALE_CACHE_SIZE -from ftllexengine.core.babel_compat import BabelImportError -from ftllexengine.core.locale_utils import normalize_locale -from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError -from ftllexengine.runtime.locale_context import LocaleContext - -# ============================================================================ -# Construction Guard Tests -# ============================================================================ - - -class TestLocaleContextConstructionGuard: - """Test __post_init__ validation prevents direct construction.""" - - def test_direct_construction_without_token_raises(self) -> None: - """Direct construction without factory token raises TypeError.""" - babel_locale = Locale.parse("en_US") - - with pytest.raises(TypeError) as exc_info: - LocaleContext( - locale_code="en-US", - _babel_locale=babel_locale, - ) - - error_msg = str(exc_info.value) - assert "LocaleContext.create()" in error_msg - assert "LocaleContext.create_or_raise()" in error_msg - assert "direct construction" in 
error_msg - - def test_direct_construction_with_wrong_token_raises(self) -> None: - """Direct construction with invalid token raises TypeError.""" - babel_locale = Locale.parse("en_US") - wrong_token = object() - - with pytest.raises(TypeError) as exc_info: - LocaleContext( - locale_code="en-US", - _babel_locale=babel_locale, - _factory_token=wrong_token, - ) - - assert "LocaleContext.create()" in str(exc_info.value) - - def test_direct_construction_with_none_token_raises(self) -> None: - """Direct construction with None token raises TypeError.""" - babel_locale = Locale.parse("en_US") - - with pytest.raises(TypeError) as exc_info: - LocaleContext( - locale_code="en-US", - _babel_locale=babel_locale, - _factory_token=None, - ) - - error_msg = str(exc_info.value) - assert "LocaleContext.create()" in error_msg - assert "direct construction" in error_msg - - def test_factory_methods_bypass_guard(self) -> None: - """Factory methods bypass __post_init__ guard successfully.""" - ctx1 = LocaleContext.create("en-US") - assert isinstance(ctx1, LocaleContext) - - ctx2 = LocaleContext.create_or_raise("de-DE") - assert isinstance(ctx2, LocaleContext) - - -# ============================================================================ -# Cache Management Tests -# ============================================================================ - - -class TestLocaleContextCacheManagement: - """Test LocaleContext cache operations.""" - - def test_clear_cache_empties_cache(self) -> None: - """clear_cache() empties the cache.""" - LocaleContext.clear_cache() - LocaleContext.create("en-US") - LocaleContext.create("de-DE") - assert LocaleContext.cache_size() > 0 - - LocaleContext.clear_cache() - assert LocaleContext.cache_size() == 0 - - def test_cache_size_returns_count(self) -> None: - """cache_size() returns number of cached instances.""" - LocaleContext.clear_cache() - assert LocaleContext.cache_size() == 0 - - LocaleContext.create("en-US") - assert LocaleContext.cache_size() == 1 - - 
LocaleContext.create("de-DE") - assert LocaleContext.cache_size() == 2 - - def test_cache_info_returns_dict(self) -> None: - """cache_info() returns dictionary with expected keys.""" - LocaleContext.clear_cache() - LocaleContext.create("en-US") - LocaleContext.create("de-DE") - - info = LocaleContext.cache_info() - - assert isinstance(info, dict) - assert "size" in info - assert "max_size" in info - assert "locales" in info - assert isinstance(info["locales"], tuple) - assert info["size"] == 2 - - def test_cache_info_after_clear(self) -> None: - """cache_info() returns empty after clearing.""" - LocaleContext.clear_cache() - LocaleContext.create("en-US") - - LocaleContext.clear_cache() - info = LocaleContext.cache_info() - - assert info["size"] == 0 - assert info["locales"] == () - - def test_cache_returns_same_instance(self) -> None: - """Cache returns the same instance for same locale.""" - LocaleContext.clear_cache() - - ctx1 = LocaleContext.create("en-US") - ctx2 = LocaleContext.create("en-US") - - assert ctx1 is ctx2 - - def test_cache_double_check_pattern(self) -> None: - """Cache double-check pattern returns existing instance.""" - from ftllexengine.core.locale_utils import ( # noqa: PLC0415 - import inside function - normalize_locale, - ) - from ftllexengine.runtime.locale_context import ( # noqa: PLC0415 - import inside function - _FACTORY_TOKEN, - ) - - LocaleContext.clear_cache() - - cache_key = normalize_locale("en-RACE-TEST") - pre_inserted_ctx = LocaleContext( - locale_code="en-RACE-TEST", - _babel_locale=Locale.parse("en_US"), - _factory_token=_FACTORY_TOKEN, - ) - - original_parse = Locale.parse - - def parse_with_insertion( - code: str, *args: Any, **kwargs: Any - ) -> Locale: - with LocaleContext._cache_lock: - if cache_key not in LocaleContext._cache: - LocaleContext._cache[cache_key] = ( - pre_inserted_ctx - ) - return original_parse(code, *args, **kwargs) - - with patch.object( - Locale, "parse", side_effect=parse_with_insertion - ): - result = 
LocaleContext.create("en-RACE-TEST") - - assert result is pre_inserted_ctx - - def test_cache_thread_safety(self) -> None: - """Cache is thread-safe under concurrent access.""" - LocaleContext.clear_cache() - - results: list[LocaleContext] = [] - - def create_context() -> None: - ctx = LocaleContext.create("en-US") - results.append(ctx) - - thread1 = threading.Thread(target=create_context) - thread2 = threading.Thread(target=create_context) - - thread1.start() - thread2.start() - thread1.join() - thread2.join() - - assert len(results) == 2 - assert results[0] is results[1] - - def test_cache_eviction_on_max_size(self) -> None: - """Cache evicts LRU entry when max size reached.""" - LocaleContext.clear_cache() - - locales = ["en-US"] + [ - f"de-DE-x-variant{i}" - for i in range(MAX_LOCALE_CACHE_SIZE) - ] - - for locale in locales[:MAX_LOCALE_CACHE_SIZE]: - LocaleContext.create(locale) - - assert ( - LocaleContext.cache_size() == MAX_LOCALE_CACHE_SIZE - ) - - LocaleContext.create(locales[MAX_LOCALE_CACHE_SIZE]) - - assert ( - LocaleContext.cache_size() == MAX_LOCALE_CACHE_SIZE - ) - - info = LocaleContext.cache_info() - locales_tuple = info["locales"] - assert isinstance(locales_tuple, tuple) - assert "en_US" not in locales_tuple - - def test_clear_cache_and_recreate(self) -> None: - """Cache clearing and recreation works correctly.""" - LocaleContext.clear_cache() - - ctx1 = LocaleContext.create("fr-FR") - assert ctx1.locale_code == "fr_fr" - - ctx2 = LocaleContext.create("fr-FR") - assert ctx1 is ctx2 - - LocaleContext.clear_cache() - ctx3 = LocaleContext.create("fr-FR") - assert ctx1 is not ctx3 - - -# ============================================================================ -# Factory Methods Tests -# ============================================================================ - - -class TestLocaleContextCreate: - """Test LocaleContext.create() factory with graceful fallback.""" - - def test_create_valid_locale(self) -> None: - """create() returns 
LocaleContext for valid locale.""" - ctx = LocaleContext.create("en-US") - assert isinstance(ctx, LocaleContext) - assert ctx.locale_code == "en_us" - - def test_create_unknown_locale_returns_context(self) -> None: - """create() returns LocaleContext for unknown locale.""" - LocaleContext.clear_cache() - result = LocaleContext.create("xx-UNKNOWN") - - assert isinstance(result, LocaleContext) - assert result.locale_code == "xx_unknown" - assert result.is_fallback is True - - def test_create_unknown_locale_warns( - self, caplog: pytest.LogCaptureFixture - ) -> None: - """create() logs warning for unknown locale.""" - LocaleContext.clear_cache() - - with caplog.at_level(logging.WARNING): - LocaleContext.create("xx_INVALID") - - assert any( - "Unknown locale" in r.message - or "xx_INVALID" in r.message - for r in caplog.records - ) - - def test_create_invalid_format_raises(self) -> None: - """create() rejects structurally invalid locale boundary values.""" - LocaleContext.clear_cache() - - with pytest.raises(ValueError, match=r"Invalid locale_code: '!!!INVALID@@@'"): - LocaleContext.create("!!!INVALID@@@") - - def test_create_unknown_locale_uses_en_us(self) -> None: - """create() uses en_US formatting for unknown locales.""" - ctx = LocaleContext.create("invalid-locale-xyz") - locale = ctx.babel_locale - - assert locale.language == "en" - - -class TestLocaleContextCreateOrRaise: - """Test create_or_raise() factory with strict validation.""" - - def test_create_or_raise_valid_locale(self) -> None: - """create_or_raise() returns LocaleContext for valid locale.""" - ctx = LocaleContext.create_or_raise("en-US") - assert isinstance(ctx, LocaleContext) - assert ctx.locale_code == "en_us" - assert ctx.is_fallback is False - - def test_create_or_raise_unknown_locale_raises(self) -> None: - """create_or_raise() raises ValueError for unknown locale.""" - with pytest.raises( - ValueError, match=r"Unknown locale identifier" - ): - LocaleContext.create_or_raise("xx-INVALID") - - 
def test_create_or_raise_invalid_format_raises(self) -> None: - """create_or_raise() raises ValueError for invalid format.""" - with pytest.raises(ValueError, match=r"Invalid locale_code: 'not a valid locale'"): - LocaleContext.create_or_raise( - "not a valid locale" - ) - - def test_create_or_raise_error_contains_locale_code( - self, - ) -> None: - """create_or_raise() error message includes locale code.""" - test_locales = ["bad-locale", "xyz-123"] - - for locale_code in test_locales: - with pytest.raises( - ValueError, match="locale" - ) as exc_info: - LocaleContext.create_or_raise(locale_code) - - assert normalize_locale(locale_code) in str(exc_info.value) - - -# ============================================================================ -# Babel Import Error Tests -# ============================================================================ - - -class TestLocaleContextBabelImportErrors: - """Test ImportError paths when Babel is not installed.""" - - def test_create_raises_babel_import_error(self) -> None: - """create() raises BabelImportError when Babel unavailable.""" - LocaleContext.clear_cache() - - babel_module = sys.modules.pop("babel", None) - babel_core = sys.modules.pop("babel.core", None) - babel_dates_mod = sys.modules.pop("babel.dates", None) - babel_nums = sys.modules.pop("babel.numbers", None) - - # Reset sentinel so _check_babel_available() re-evaluates under the mock - _bc._babel_available = None - - try: - with patch.dict(sys.modules, {"babel": None}): - original_import = __import__ - - def mock_import( - name: str, - globals_dict: ( - dict[str, object] | None - ) = None, - locals_dict: ( - dict[str, object] | None - ) = None, - fromlist: tuple[str, ...] 
= (), - level: int = 0, - ) -> object: - if name == "babel": - err = ModuleNotFoundError("No module named 'babel'") - err.name = "babel" - raise err - return original_import( - name, - globals_dict, - locals_dict, - fromlist, - level, - ) - - with patch( - "builtins.__import__", - side_effect=mock_import, - ): - with pytest.raises( - BabelImportError - ) as exc_info: - LocaleContext.create("en-US") - - assert "LocaleContext.create" in str( - exc_info.value - ) - finally: - if babel_module is not None: - sys.modules["babel"] = babel_module - if babel_core is not None: - sys.modules["babel.core"] = babel_core - if babel_dates_mod is not None: - sys.modules["babel.dates"] = babel_dates_mod - if babel_nums is not None: - sys.modules["babel.numbers"] = babel_nums - # Reset sentinel so subsequent tests reinitialize with Babel available - _bc._babel_available = None - LocaleContext.clear_cache() - - def test_create_or_raise_raises_babel_import_error( - self, - ) -> None: - """create_or_raise() raises BabelImportError.""" - babel_module = sys.modules.pop("babel", None) - babel_core = sys.modules.pop("babel.core", None) - babel_dates_mod = sys.modules.pop("babel.dates", None) - babel_nums = sys.modules.pop("babel.numbers", None) - - # Reset sentinel so _check_babel_available() re-evaluates under the mock - _bc._babel_available = None - - try: - with patch.dict(sys.modules, {"babel": None}): - original_import = __import__ - - def mock_import( - name: str, - globals_dict: ( - dict[str, object] | None - ) = None, - locals_dict: ( - dict[str, object] | None - ) = None, - fromlist: tuple[str, ...] 
= (), - level: int = 0, - ) -> object: - if name == "babel": - err = ModuleNotFoundError("No module named 'babel'") - err.name = "babel" - raise err - return original_import( - name, - globals_dict, - locals_dict, - fromlist, - level, - ) - - with patch( - "builtins.__import__", - side_effect=mock_import, - ): - with pytest.raises( - BabelImportError - ) as exc_info: - LocaleContext.create_or_raise("en-US") - - assert "create_or_raise" in str( - exc_info.value - ) - finally: - if babel_module is not None: - sys.modules["babel"] = babel_module - if babel_core is not None: - sys.modules["babel.core"] = babel_core - if babel_dates_mod is not None: - sys.modules["babel.dates"] = babel_dates_mod - if babel_nums is not None: - sys.modules["babel.numbers"] = babel_nums - # Reset sentinel so subsequent tests reinitialize with Babel available - _bc._babel_available = None - - -# ============================================================================ -# Number Formatting Tests -# ============================================================================ - - -class TestFormatNumber: - """Test format_number() with various locales and parameters.""" - - def test_format_number_en_us_grouping(self) -> None: - """format_number() formats with grouping for en-US.""" - ctx = LocaleContext.create("en-US") - result = ctx.format_number(Decimal("1234.5"), use_grouping=True) - assert "1,234" in result or "1234" in result - - def test_format_number_de_de_grouping(self) -> None: - """format_number() formats with grouping for de-DE.""" - ctx = LocaleContext.create("de-DE") - result = ctx.format_number(Decimal("1234.5"), use_grouping=True) - assert "1.234" in result or "1234" in result - - def test_format_number_fixed_decimals(self) -> None: - """format_number() formats with fixed decimal places.""" - ctx = LocaleContext.create("en-US") - - result = ctx.format_number( - Decimal("1234.5"), - minimum_fraction_digits=2, - maximum_fraction_digits=2, - ) - assert result == "1,234.50" - - 
result = ctx.format_number( - Decimal("1234.567"), - minimum_fraction_digits=0, - maximum_fraction_digits=0, - ) - assert result == "1,235" - assert "." not in result - - def test_format_number_fixed_three_decimals(self) -> None: - """format_number() with fixed 3 decimal places.""" - ctx = LocaleContext.create("en-US") - result = ctx.format_number( - Decimal("123.4"), - minimum_fraction_digits=3, - maximum_fraction_digits=3, - ) - assert result == "123.400" - - def test_format_number_custom_pattern(self) -> None: - """format_number() respects custom pattern.""" - ctx = LocaleContext.create("en-US") - result = ctx.format_number( - Decimal("-1234.56"), pattern="#,##0.00;(#,##0.00)" - ) - assert "1,234.56" in result or "1234.56" in result - - def test_format_number_preserves_decimal_precision( - self, - ) -> None: - """format_number() preserves large decimal precision.""" - ctx = LocaleContext.create("en-US") - - large_decimal = Decimal("123456789.123456789") - result = ctx.format_number( - large_decimal, - minimum_fraction_digits=2, - maximum_fraction_digits=2, - ) - - assert result == "123,456,789.12" - assert result.count(".") == 1 - decimal_part = result.split(".")[-1] - assert len(decimal_part) == 2 - - def test_format_number_with_decimal_type(self) -> None: - """format_number() with Decimal type for fixed decimals.""" - ctx = LocaleContext.create("de-DE") - - value = Decimal("1234.5") - result = ctx.format_number( - value, - minimum_fraction_digits=2, - maximum_fraction_digits=2, - ) - - assert "," in result - assert result == "1.234,50" - - def test_format_number_error_raises_formatting_error( - self, monkeypatch: pytest.MonkeyPatch - ) -> None: - """format_number() raises FrozenFluentError on error.""" - def mock_format_decimal( - *_args: object, **_kwargs: object - ) -> None: - msg = "Mocked format error" - raise ValueError(msg) - - monkeypatch.setattr( - babel_numbers, - "format_decimal", - mock_format_decimal, - ) - - ctx = LocaleContext.create("en-US") - 
with pytest.raises(FrozenFluentError) as exc_info: - ctx.format_number(Decimal("123.45")) - - assert ( - exc_info.value.category == ErrorCategory.FORMATTING - ) - assert exc_info.value.fallback_value == "123.45" - - -# ============================================================================ -# Number Formatting Validation Tests -# ============================================================================ - - -class TestFormatNumberDigitValidation: - """Test format_number() digit parameter validation.""" - - def test_minimum_fraction_digits_negative_raises( - self, - ) -> None: - """Raises ValueError for negative minimum.""" - ctx = LocaleContext.create("en-US") - with pytest.raises( - ValueError, - match=r"minimum_fraction_digits must be", - ): - ctx.format_number( - Decimal("123.45"), minimum_fraction_digits=-1 - ) - - def test_minimum_fraction_digits_exceeds_max_raises( - self, - ) -> None: - """Raises ValueError when exceeding MAX_FORMAT_DIGITS.""" - from ftllexengine.constants import ( # noqa: PLC0415 - import inside function - MAX_FORMAT_DIGITS, - ) - - ctx = LocaleContext.create("en-US") - with pytest.raises( - ValueError, - match=r"minimum_fraction_digits must be", - ): - ctx.format_number( - Decimal("123.45"), - minimum_fraction_digits=MAX_FORMAT_DIGITS + 1, - ) - - def test_maximum_fraction_digits_negative_raises( - self, - ) -> None: - """Raises ValueError for negative maximum.""" - ctx = LocaleContext.create("en-US") - with pytest.raises( - ValueError, - match=r"maximum_fraction_digits must be", - ): - ctx.format_number( - Decimal("123.45"), maximum_fraction_digits=-1 - ) - - def test_maximum_fraction_digits_exceeds_max_raises( - self, - ) -> None: - """Raises ValueError when exceeding MAX_FORMAT_DIGITS.""" - from ftllexengine.constants import ( # noqa: PLC0415 - import inside function - MAX_FORMAT_DIGITS, - ) - - ctx = LocaleContext.create("en-US") - with pytest.raises( - ValueError, - match=r"maximum_fraction_digits must be", - ): - 
ctx.format_number( - Decimal("123.45"), - maximum_fraction_digits=MAX_FORMAT_DIGITS + 1, - ) - - -class TestFormatNumberSpecialValues: - """Test format_number() with special Decimal values.""" - - def test_format_number_positive_infinity(self) -> None: - """format_number() handles positive infinity.""" - ctx = LocaleContext.create("en-US") - result = ctx.format_number(Decimal("Infinity")) - assert isinstance(result, str) - assert len(result) > 0 - - def test_format_number_negative_infinity(self) -> None: - """format_number() handles negative infinity.""" - ctx = LocaleContext.create("en-US") - result = ctx.format_number(Decimal("-Infinity")) - assert isinstance(result, str) - assert len(result) > 0 - - def test_format_number_nan(self) -> None: - """format_number() handles NaN.""" - ctx = LocaleContext.create("en-US") - result = ctx.format_number(Decimal("NaN")) - assert isinstance(result, str) - assert len(result) > 0 - - def test_format_number_infinity_with_grouping(self) -> None: - """format_number() handles infinity with use_grouping.""" - ctx = LocaleContext.create("en-US") - result = ctx.format_number( - Decimal("Infinity"), use_grouping=False - ) - assert isinstance(result, str) - assert len(result) > 0 - - def test_format_number_nan_with_custom_pattern(self) -> None: - """format_number() handles NaN with custom pattern.""" - ctx = LocaleContext.create("en-US") - result = ctx.format_number( - Decimal("NaN"), pattern="#,##0.00" - ) - assert isinstance(result, str) - assert len(result) > 0 - - -# ============================================================================ -# DateTime Formatting Tests -# ============================================================================ - - -class TestFormatDatetime: - """Test format_datetime() with various locales and parameters.""" - - def test_format_datetime_en_us_short(self) -> None: - """format_datetime() with short style for en-US.""" - ctx = LocaleContext.create("en-US") - dt = datetime(2025, 10, 27, 14, 30, 
tzinfo=UTC) - result = ctx.format_datetime(dt, date_style="short") - assert "10" in result or "27" in result - - def test_format_datetime_de_de_short(self) -> None: - """format_datetime() with short style for de-DE.""" - ctx = LocaleContext.create("de-DE") - dt = datetime(2025, 10, 27, 14, 30, tzinfo=UTC) - result = ctx.format_datetime(dt, date_style="short") - assert "27" in result or "10" in result - - def test_format_datetime_custom_pattern(self) -> None: - """format_datetime() respects custom pattern.""" - ctx = LocaleContext.create("en-US") - dt = datetime(2025, 10, 27, 14, 30, tzinfo=UTC) - result = ctx.format_datetime(dt, pattern="yyyy-MM-dd") - assert "2025" in result - assert "10" in result - assert "27" in result - - def test_format_datetime_from_iso_string(self) -> None: - """format_datetime() accepts ISO 8601 string.""" - ctx = LocaleContext.create("en-US") - result = ctx.format_datetime( - "2025-10-27", date_style="short" - ) - assert "10" in result or "27" in result - - def test_format_datetime_invalid_string_raises( - self, - ) -> None: - """format_datetime() raises for invalid datetime string.""" - ctx = LocaleContext.create("en-US") - with pytest.raises(FrozenFluentError) as exc_info: - ctx.format_datetime( - "not-a-date", date_style="short" - ) - assert ( - exc_info.value.category == ErrorCategory.FORMATTING - ) - assert "not ISO 8601 format" in str(exc_info.value) - - def test_format_datetime_with_time_style(self) -> None: - """format_datetime() formats date and time together.""" - ctx = LocaleContext.create("en-US") - dt = datetime(2025, 10, 27, 14, 30, tzinfo=UTC) - result = ctx.format_datetime( - dt, date_style="short", time_style="short" - ) - assert "10" in result or "27" in result - has_time = ( - "14" in result - or "2" in result - or "30" in result - ) - assert has_time - - def test_format_datetime_string_pattern(self) -> None: - """format_datetime() handles string datetime_pattern.""" - ctx = LocaleContext.create("en-US") - dt = 
datetime(2025, 10, 27, 14, 30, tzinfo=UTC) - - with patch.object( - ctx.babel_locale.datetime_formats, "get" - ) as mock_get: - mock_get.return_value = "{1} at {0}" - result = ctx.format_datetime( - dt, date_style="medium", time_style="short" - ) - assert "at" in result - - def test_format_datetime_object_without_format_method( - self, - ) -> None: - """format_datetime() when pattern lacks format().""" - ctx = LocaleContext.create("en-US") - dt = datetime(2025, 7, 15, 10, 30, 0, tzinfo=UTC) - - class PatternWithoutFormat: - """Mock pattern without format() method.""" - - def __str__(self) -> str: - return "{1} @ {0}" - - mock_pattern = PatternWithoutFormat() - assert not hasattr(mock_pattern, "format") - - with patch.object( - ctx.babel_locale.datetime_formats, - "get", - return_value=mock_pattern, - ): - result = ctx.format_datetime( - dt, date_style="medium", time_style="short" - ) - assert " @ " in result - - def test_format_datetime_error_raises_formatting_error( - self, monkeypatch: pytest.MonkeyPatch - ) -> None: - """format_datetime() raises FrozenFluentError on error.""" - def mock_format_date( - *_args: object, **_kwargs: object - ) -> None: - msg = "Mocked format error" - raise ValueError(msg) - - monkeypatch.setattr( - babel_dates, "format_date", mock_format_date - ) - - ctx = LocaleContext.create("en-US") - dt = datetime(2025, 10, 27, 14, 30, tzinfo=UTC) - - with pytest.raises(FrozenFluentError) as exc_info: - ctx.format_datetime(dt, date_style="short") - assert ( - exc_info.value.category == ErrorCategory.FORMATTING - ) - - -# ============================================================================ -# Currency Formatting Tests -# ============================================================================ - - -class TestFormatCurrency: - """Test format_currency() with various locales and parameters.""" - - def test_format_currency_en_us_symbol(self) -> None: - """format_currency() with symbol for en-US.""" - ctx = LocaleContext.create("en-US") - 
result = ctx.format_currency( - Decimal("123.45"), currency="EUR" - ) - assert "123" in result - - def test_format_currency_lv_lv_symbol(self) -> None: - """format_currency() with symbol for lv-LV.""" - ctx = LocaleContext.create("lv-LV") - result = ctx.format_currency( - Decimal("123.45"), currency="EUR" - ) - assert "123" in result - - def test_format_currency_code_display(self) -> None: - """format_currency() displays currency code.""" - ctx = LocaleContext.create("en-US") - result = ctx.format_currency( - Decimal("123.45"), - currency="USD", - currency_display="code", - ) - assert "USD" in result - assert "123.45" in result - - def test_format_currency_name_display(self) -> None: - """format_currency() displays currency name.""" - ctx = LocaleContext.create("en-US") - result = ctx.format_currency( - Decimal("123.45"), - currency="USD", - currency_display="name", - ) - assert isinstance(result, str) - - def test_format_currency_symbol_display_standard( - self, - ) -> None: - """format_currency() with explicit symbol display.""" - ctx = LocaleContext.create("en-US") - result = ctx.format_currency( - Decimal("123.45"), - currency="EUR", - currency_display="symbol", - ) - assert "123.45" in result - - def test_format_currency_custom_pattern(self) -> None: - """format_currency() respects custom pattern.""" - ctx = LocaleContext.create("en-US") - result = ctx.format_currency( - Decimal("1234.56"), - currency="USD", - pattern="#,##0.00 \xa4", - ) - assert "1,234.56" in result or "1234.56" in result - - def test_format_currency_error_raises_formatting_error( - self, monkeypatch: pytest.MonkeyPatch - ) -> None: - """format_currency() raises FrozenFluentError on error.""" - def mock_format_currency( - *_args: object, **_kwargs: object - ) -> None: - msg = "Mocked format error" - raise ValueError(msg) - - monkeypatch.setattr( - babel_numbers, - "format_currency", - mock_format_currency, - ) - - ctx = LocaleContext.create("en-US") - with pytest.raises(FrozenFluentError) as 
exc_info: - ctx.format_currency(Decimal("123.45"), currency="USD") - - assert ( - exc_info.value.category == ErrorCategory.FORMATTING - ) - assert "USD 123.45" in exc_info.value.fallback_value - - -# ============================================================================ -# Internal Helper Tests -# ============================================================================ - - -class TestGetIsoCodePattern: - """Test _get_iso_code_pattern() internal helper.""" - - def test_returns_string_or_none(self) -> None: - """_get_iso_code_pattern() returns string or None.""" - ctx = LocaleContext.create("en-US") - result = ctx._get_iso_code_pattern() - assert result is None or isinstance(result, str) - - def test_doubles_currency_sign(self) -> None: - """Doubles currency sign per CLDR spec.""" - ctx = LocaleContext.create("en-US") - result = ctx._get_iso_code_pattern() - if result is not None: - assert "\xa4\xa4" in result - - def test_none_when_no_standard(self) -> None: - """Returns None when standard pattern missing.""" - ctx = LocaleContext.create("en-US") - - mock_formats: dict[str, None] = {"standard": None} - mock_locale = MagicMock() - type(mock_locale).currency_formats = PropertyMock( - return_value=mock_formats - ) - - original_locale = ctx._babel_locale - object.__setattr__(ctx, "_babel_locale", mock_locale) - - try: - result = ctx._get_iso_code_pattern() - assert result is None - finally: - object.__setattr__( - ctx, "_babel_locale", original_locale - ) - - def test_none_when_no_pattern_attribute(self) -> None: - """Returns None when pattern attribute missing.""" - ctx = LocaleContext.create("en-US") - - mock_pattern = MagicMock(spec=[]) - mock_formats = {"standard": mock_pattern} - mock_locale = MagicMock() - type(mock_locale).currency_formats = PropertyMock( - return_value=mock_formats - ) - - original_locale = ctx._babel_locale - object.__setattr__(ctx, "_babel_locale", mock_locale) - - try: - result = ctx._get_iso_code_pattern() - assert result is None - 
finally: - object.__setattr__( - ctx, "_babel_locale", original_locale - ) - - def test_none_when_no_currency_placeholder( - self, caplog: pytest.LogCaptureFixture - ) -> None: - """Returns None and logs when no placeholder.""" - ctx = LocaleContext.create("en-US") - - mock_pattern = MagicMock() - mock_pattern.pattern = "#,##0.00" - mock_formats = {"standard": mock_pattern} - mock_locale = MagicMock() - type(mock_locale).currency_formats = PropertyMock( - return_value=mock_formats - ) - - original_locale = ctx._babel_locale - object.__setattr__(ctx, "_babel_locale", mock_locale) - - try: - with caplog.at_level(logging.DEBUG): - result = ctx._get_iso_code_pattern() - - assert result is None - assert any( - "lacks placeholder" in r.message - for r in caplog.records - ) - finally: - object.__setattr__( - ctx, "_babel_locale", original_locale - ) - - -# ============================================================================ -# Currency Pattern Fallback Tests -# ============================================================================ - - -class TestCurrencyPatternFallback: - """Test currency code display fallback paths.""" - - def test_code_display_with_invalid_pattern(self) -> None: - """Code display when pattern lacks placeholder.""" - ctx = LocaleContext.create("en-US") - - class MockPattern: - """Mock pattern without currency placeholder.""" - - pattern = "#,##0.00" - - with ( - patch.object( - ctx.babel_locale.currency_formats, - "get", - return_value=MockPattern(), - ), - patch( - "ftllexengine.runtime.locale_context.logger" - ) as mock_logger, - ): - result = ctx.format_currency( - Decimal("123.45"), - currency="USD", - currency_display="code", - ) - - assert isinstance(result, str) - assert "123" in result - mock_logger.debug.assert_called() - - def test_code_display_with_no_pattern_attribute( - self, - ) -> None: - """Code display when pattern lacks attribute.""" - ctx = LocaleContext.create("en-US") - - class MockPatternWithoutAttr: - """Mock pattern 
without pattern attribute.""" - - mock_obj = MockPatternWithoutAttr() - assert not hasattr(mock_obj, "pattern") - - with patch.object( - ctx.babel_locale.currency_formats, - "get", - return_value=mock_obj, - ): - result = ctx.format_currency( - Decimal("123.45"), - currency="USD", - currency_display="code", - ) - assert isinstance(result, str) - assert "123" in result - - def test_code_display_with_none_pattern(self) -> None: - """Code display when standard pattern is None.""" - ctx = LocaleContext.create("en-US") - - with patch.object( - ctx.babel_locale.currency_formats, - "get", - return_value=None, - ): - result = ctx.format_currency( - Decimal("123.45"), - currency="USD", - currency_display="code", - ) - assert isinstance(result, str) - assert "123" in result - - -# ============================================================================ -# Long Locale Code Tests -# ============================================================================ - - -class TestLongLocaleCodeCoverage: - """Tests for long locale codes exceeding BCP 47 length.""" - - def test_long_valid_locale_code_warns( - self, caplog: pytest.LogCaptureFixture - ) -> None: - """Long valid locale code triggers warning.""" - from ftllexengine.constants import ( # noqa: PLC0415 - import inside function - MAX_LOCALE_CODE_LENGTH, - ) - - LocaleContext.clear_cache() - - long_locale = "en-US-x-" + "a" * 30 - assert len(long_locale) > MAX_LOCALE_CODE_LENGTH - - with caplog.at_level(logging.WARNING): - ctx = LocaleContext.create(long_locale) - - assert any( - "exceeds typical BCP 47 length" in r.message - for r in caplog.records - ) - assert isinstance(ctx, LocaleContext) - - def test_long_unknown_locale_code_warns( - self, caplog: pytest.LogCaptureFixture - ) -> None: - """Long unknown locale code triggers specific warning.""" - from ftllexengine.constants import ( # noqa: PLC0415 - import inside function - MAX_LOCALE_CODE_LENGTH, - ) - - LocaleContext.clear_cache() - - long_unknown = ( - 
"xyz-verylongvariantthatshouldexceedlimit" - ) - assert len(long_unknown) > MAX_LOCALE_CODE_LENGTH - - with caplog.at_level(logging.WARNING): - ctx = LocaleContext.create(long_unknown) - - relevant = [ - r.message - for r in caplog.records - if "Unknown locale" in r.message - ] - assert any("exceeds" in msg for msg in relevant) - assert ctx.is_fallback is True - - def test_long_invalid_format_locale_code_rejected(self) -> None: - """Long structurally invalid locale code is rejected before runtime fallback.""" - from ftllexengine.constants import ( # noqa: PLC0415 - import inside function - MAX_LOCALE_CODE_LENGTH, - ) - - LocaleContext.clear_cache() - - long_invalid = ( - "!!!INVALID@@@FORMAT###TOOLONG###LOCALE" - ) - assert len(long_invalid) > MAX_LOCALE_CODE_LENGTH - - with pytest.raises(ValueError, match=r"Invalid locale_code:"): - LocaleContext.create(long_invalid) - - -# ============================================================================ -# Currency Boundary Value Tests -# ============================================================================ - - -class TestCurrencyBoundaryValues: - """Regression tests for currency formatting boundaries.""" - - @pytest.mark.parametrize("value", [ - Decimal(999), - Decimal("999.99"), - Decimal(1000), - Decimal("1000.00"), - Decimal("1000.01"), - Decimal(1001), - ]) - def test_currency_around_1000_boundary( - self, value: Decimal - ) -> None: - """Currency formatting works around 1000 boundary.""" - ctx = LocaleContext.create("en_US") - result = ctx.format_currency(value, currency="USD") - assert isinstance(result, str) - assert result - assert "$" in result or "USD" in result - - @pytest.mark.parametrize("locale", [ - "en_US", "de_DE", "fr_FR", "es_ES", "ja_JP", - "zh_CN", "ar_SA", "ru_RU", "pt_BR", "ko_KR", - "it_IT", "nl_NL", - ]) - def test_currency_1000_across_locales( - self, locale: str - ) -> None: - """Currency formatting for 1000 across locales.""" - ctx = LocaleContext.create(locale) - result = 
ctx.format_currency( - Decimal(1000), currency="USD" - ) - assert isinstance(result, str) - assert result - assert any(c.isdigit() for c in result) - - @pytest.mark.parametrize("value", [ - Decimal(-1000), - Decimal("-1000.00"), - ]) - def test_negative_1000_currency( - self, value: Decimal - ) -> None: - """Negative 1000 currency values format correctly.""" - ctx = LocaleContext.create("en_US") - result = ctx.format_currency(value, currency="USD") - assert isinstance(result, str) - assert result - assert "-" in result or "(" in result - - @pytest.mark.parametrize("currency", [ - "USD", "EUR", "GBP", "JPY", "CNY", "CHF", "CAD", - "AUD", - ]) - def test_currency_1000_multiple_currencies( - self, currency: str - ) -> None: - """Currency formatting for 1000 with currencies.""" - ctx = LocaleContext.create("en_US") - result = ctx.format_currency( - Decimal(1000), currency=currency - ) - assert isinstance(result, str) - assert result - assert any(c.isdigit() for c in result) - - def test_currency_1000_all_display_modes(self) -> None: - """Currency formatting 1000 with all display modes.""" - ctx = LocaleContext.create("en_US") - value = Decimal(1000) - - result_symbol = ctx.format_currency( - value, currency="USD", currency_display="symbol" - ) - assert "$" in result_symbol - - result_code = ctx.format_currency( - value, currency="USD", currency_display="code" - ) - assert "USD" in result_code - - result_name = ctx.format_currency( - value, currency="USD", currency_display="name" - ) - assert "dollar" in result_name.lower() - - def test_currency_integer_1000(self) -> None: - """Currency formatting handles int 1000.""" - ctx = LocaleContext.create("en_US") - result = ctx.format_currency(1000, currency="USD") - assert isinstance(result, str) - assert "$" in result or "USD" in result - - def test_currency_decimal_1000(self) -> None: - """Currency formatting handles Decimal 1000.""" - ctx = LocaleContext.create("en_US") - result = ctx.format_currency(Decimal(1000), 
currency="USD") - assert isinstance(result, str) - assert "$" in result or "USD" in result - - -# ============================================================================ -# CACHE INFO AND DATETIME PATTERN COVERAGE -# ============================================================================ - - -class TestLocaleContextCoverageExtra: - """Test LocaleContext cache_info and datetime formatting branches.""" - - def test_cache_info_returns_dict(self) -> None: - """cache_info() returns dict with size, max_size, and locales keys.""" - LocaleContext.clear_cache() - - LocaleContext.create("en-US") - LocaleContext.create("de-DE") - - info = LocaleContext.cache_info() - - assert isinstance(info, dict) - assert "size" in info - assert "max_size" in info - assert "locales" in info - assert info["size"] == 2 - locales = info["locales"] - assert isinstance(locales, tuple) - assert "en_us" in locales or "de_de" in locales - - def test_format_datetime_combined_styles(self) -> None: - """format_datetime with both date and time styles produces non-empty string.""" - ctx = LocaleContext.create("en-US") - - dt = datetime(2024, 6, 15, 14, 30, 0, tzinfo=UTC) - result = ctx.format_datetime(dt, date_style="medium", time_style="short") - - assert isinstance(result, str) - assert len(result) > 0 - - def test_format_datetime_date_only(self) -> None: - """format_datetime with date_style only produces non-empty string.""" - ctx = LocaleContext.create("en-US") - - dt = datetime(2024, 12, 25, 0, 0, 0, tzinfo=UTC) - result = ctx.format_datetime(dt, date_style="long") - - assert isinstance(result, str) - assert "2024" in result or "December" in result - - -class TestLocaleContextBranchCoverageExtra: - """Additional tests for locale_context formatting branch coverage.""" - - def test_format_datetime_with_string_pattern(self) -> None: - """format_datetime with both date and time styles covers combined-pattern path.""" - ctx = LocaleContext.create("en-US") - - dt = datetime(2024, 1, 15, 10, 30, 
0, tzinfo=UTC) - - result1 = ctx.format_datetime(dt, date_style="short", time_style="short") - assert isinstance(result1, str) - - result2 = ctx.format_datetime(dt, date_style="full", time_style="full") - assert isinstance(result2, str) - - def test_format_datetime_varied_styles(self) -> None: - """All standard datetime style combinations produce non-empty strings.""" - type _DateTimeStyle = Literal["short", "medium", "long", "full"] - styles: tuple[_DateTimeStyle, ...] = get_args(_DateTimeStyle) - - ctx = LocaleContext.create("en-US") - dt = datetime(2024, 3, 15, 10, 30, 0, tzinfo=UTC) - - for date_style in styles: - for time_style in styles: - result = ctx.format_datetime( - dt, date_style=date_style, time_style=time_style - ) - assert isinstance(result, str) - assert len(result) > 0 - - -class TestLocaleContextCacheRaceCondition: - """LocaleContext cache handles the double-check locking pattern.""" - - def test_cache_hit_in_double_check_pattern(self) -> None: - """Cache hit during the inner lock check returns the cached instance.""" - LocaleContext.clear_cache() - - locale_code = "en_US" - ctx = LocaleContext.create(locale_code) - - cache_key = normalize_locale(locale_code) - with LocaleContext._cache_lock: # pylint: disable=protected-access - LocaleContext._cache.clear() # pylint: disable=protected-access - LocaleContext._cache[cache_key] = ctx # pylint: disable=protected-access - - result = LocaleContext.create(locale_code) - assert result is ctx - - LocaleContext.clear_cache() - - -class TestLocaleContextDatetimePattern: - """LocaleContext formats datetime values using the locale's pattern.""" - - def test_datetime_pattern_without_format_method(self) -> None: - """format_datetime produces a non-empty string for en_US short styles.""" - LocaleContext.clear_cache() - - ctx = LocaleContext.create("en_US") - dt = datetime(2025, 6, 15, 14, 30, 0, tzinfo=UTC) - - result = ctx.format_datetime(dt, date_style="short", time_style="short") - assert result is not None - 
assert len(result) > 0 - - LocaleContext.clear_cache() - - -class TestLocaleContextCacheLimitCoverage: - """LocaleContext LRU cache eviction at MAX_LOCALE_CACHE_SIZE.""" - - def test_cache_at_limit_evicts_lru_entry(self) -> None: - """When cache reaches MAX_LOCALE_CACHE_SIZE, LRU entry is evicted on next create.""" - LocaleContext.clear_cache() - - locales_to_fill = [f"en_TEST{i:04d}" for i in range(MAX_LOCALE_CACHE_SIZE)] - - for locale_code in locales_to_fill: - ctx = LocaleContext.create(locale_code) - assert ctx is not None - - cache_size = LocaleContext.cache_size() - assert cache_size >= MAX_LOCALE_CACHE_SIZE - - size_before = cache_size - - ctx = LocaleContext.create("de_TESTOVERFLOW") - assert ctx is not None - - cache_size_after = LocaleContext.cache_size() - assert cache_size_after <= MAX_LOCALE_CACHE_SIZE - assert cache_size_after <= size_before + 1 - - LocaleContext.clear_cache() - - -class TestLocaleContextUnexpectedErrorPropagation: - """Unexpected errors propagate instead of being silently caught.""" - - def test_format_number_unexpected_error_propagates( - self, monkeypatch: pytest.MonkeyPatch - ) -> None: - """RuntimeError in format_number propagates for debugging.""" - ctx = LocaleContext.create_or_raise("en_US") - - def mock_format_decimal(*_args: object, **_kwargs: object) -> str: - msg = "Mocked RuntimeError for testing" - raise RuntimeError(msg) - - monkeypatch.setattr(babel_numbers, "format_decimal", mock_format_decimal) - - with pytest.raises(RuntimeError, match="Mocked RuntimeError"): - ctx.format_number(Decimal("123.45")) - - def test_format_currency_unexpected_error_propagates( - self, monkeypatch: pytest.MonkeyPatch - ) -> None: - """RuntimeError in format_currency propagates for debugging.""" - ctx = LocaleContext.create_or_raise("en_US") - - def mock_format_currency(*_args: object, **_kwargs: object) -> str: - msg = "Mocked RuntimeError for testing" - raise RuntimeError(msg) - - monkeypatch.setattr(babel_numbers, "format_currency", 
mock_format_currency) - - with pytest.raises(RuntimeError, match="Mocked RuntimeError"): - ctx.format_currency(Decimal(100), currency="USD") - - -class TestLocaleContextCustomPatternCoverage: - """Custom pattern and currency code fallback branches in format_currency.""" - - def test_format_currency_with_custom_pattern(self) -> None: - """Custom pattern in format_currency is applied to the result.""" - ctx = LocaleContext.create_or_raise("en_US") - - result = ctx.format_currency(Decimal("1234.56"), currency="USD", pattern="#,##0.00 \xa4") - - assert isinstance(result, str) - assert "1,234.56" in result or "1234.56" in result - - def test_format_currency_code_display_fallback( - self, caplog: pytest.LogCaptureFixture, monkeypatch: pytest.MonkeyPatch - ) -> None: - """format_currency logs debug when locale pattern lacks currency placeholder.""" - mock_locale = MagicMock() - mock_pattern = MagicMock() - mock_pattern.pattern = "#,##0.00" # No currency placeholder (missing \xa4) - mock_locale.currency_formats = {"standard": mock_pattern} - - ctx = LocaleContext.create_or_raise("en_US") - original_babel_locale = ctx._babel_locale # pylint: disable=protected-access - - object.__setattr__(ctx, "_babel_locale", mock_locale) - - monkeypatch.setattr( - babel_numbers, - "format_currency", - lambda *_args, **_kwargs: "$100.00", - ) - - try: - with caplog.at_level(logging.DEBUG): - result = ctx.format_currency( - Decimal(100), currency="USD", currency_display="code" - ) - - assert isinstance(result, str) - - assert any( - "lacks placeholder" in record.message - for record in caplog.records - ) - finally: - object.__setattr__(ctx, "_babel_locale", original_babel_locale) - - -class TestLocaleContextCurrencyCodeFallback: - """Currency code display fallback when standard_pattern is None or lacks attributes.""" - - def test_format_currency_code_no_standard_pattern( - self, monkeypatch: pytest.MonkeyPatch - ) -> None: - """format_currency falls through to default when standard_pattern 
is None.""" - ctx = LocaleContext.create_or_raise("en_US") - original_babel_locale = ctx._babel_locale # pylint: disable=protected-access - - mock_locale = MagicMock() - mock_locale.currency_formats = {"standard": None} - - object.__setattr__(ctx, "_babel_locale", mock_locale) - - monkeypatch.setattr( - babel_numbers, - "format_currency", - lambda *_args, **_kwargs: "$100.00", - ) - - try: - result = ctx.format_currency(Decimal(100), currency="USD", currency_display="code") - assert isinstance(result, str) - finally: - object.__setattr__(ctx, "_babel_locale", original_babel_locale) - - def test_format_currency_code_pattern_no_attr( - self, monkeypatch: pytest.MonkeyPatch - ) -> None: - """format_currency falls through to default when standard_pattern lacks 'pattern' attr.""" - ctx = LocaleContext.create_or_raise("en_US") - original_babel_locale = ctx._babel_locale # pylint: disable=protected-access - - mock_locale = MagicMock() - mock_pattern = object() # Plain object with no attributes - mock_locale.currency_formats = {"standard": mock_pattern} - - object.__setattr__(ctx, "_babel_locale", mock_locale) - - monkeypatch.setattr( - babel_numbers, - "format_currency", - lambda *_args, **_kwargs: "$100.00", - ) - - try: - result = ctx.format_currency(Decimal(100), currency="USD", currency_display="code") - assert isinstance(result, str) - finally: - object.__setattr__(ctx, "_babel_locale", original_babel_locale) +from tests.runtime_locale_context_cases.boundaries_and_extras import * # noqa: F403 - split module reuses shared support imports +from tests.runtime_locale_context_cases.construction_and_cache import * # noqa: F403 - split module reuses shared support imports +from tests.runtime_locale_context_cases.datetime_and_currency import * # noqa: F403 - split module reuses shared support imports +from tests.runtime_locale_context_cases.fallback_and_strict import * # noqa: F403 - split module reuses shared support imports +from 
tests.runtime_locale_context_cases.number_formatting import * # noqa: F403 - split module reuses shared support imports diff --git a/tests/test_runtime_plural_rules.py b/tests/test_runtime_plural_rules.py index 9c277510..2e70c6fc 100644 --- a/tests/test_runtime_plural_rules.py +++ b/tests/test_runtime_plural_rules.py @@ -1,1060 +1,14 @@ -"""Tests for plural_rules.py - CLDR plural category selection using Babel. - -Comprehensive property-based tests ensuring plural rule correctness across all locales -and number ranges. Critical for multilingual applications with proper pluralization. - -Property-Based Testing Strategy: - Uses Hypothesis to verify mathematical properties and CLDR compliance across - locale families (Germanic, Slavic, Romance, Semitic, etc.). - -Coverage: - - All CLDR plural categories (zero, one, two, few, many, other) - - 30+ representative locales across language families - - Edge cases (unknown locales, large numbers, decimals) - - Babel ImportError path for parser-only installations -""" - -from __future__ import annotations - -import sys -from decimal import Decimal -from unittest.mock import patch - -import pytest -from babel.core import UnknownLocaleError -from hypothesis import assume, event, example, given -from hypothesis import strategies as st - -import ftllexengine.core.babel_compat as _bc -from ftllexengine.runtime.plural_rules import select_plural_category - -# ============================================================================ -# Hypothesis Strategies -# ============================================================================ - -# Valid locale codes across language families -LOCALE_CODES = st.sampled_from([ - "en", "en_US", "en_GB", - "lv", "lv_LV", - "de", "de_DE", - "pl", "pl_PL", - "ru", "ru_RU", - "ar", "ar_SA", - "fr", "fr_FR", - "es", "es_ES", - "it", "it_IT", - "pt", "pt_PT", "pt_BR", - "zh", "zh_CN", - "ja", "ja_JP", - "ko", "ko_KR", - "hi", "hi_IN", - "bn", "bn_BD", - "vi", "vi_VN", - "tr", "tr_TR", - "th", 
"th_TH", - "uk", "uk_UA", -]) - -# Numbers strategy (integers and decimals) -NUMBERS = st.one_of( - st.integers(min_value=0, max_value=1000000), - st.decimals( - min_value=Decimal(0), max_value=Decimal(1000000), - allow_nan=False, allow_infinity=False, - ), -) - -# ============================================================================ -# Babel ImportError Tests (lines 67-70) -# ============================================================================ - - -class TestPluralRulesBabelImportError: - """Test ImportError path when Babel is not installed (lines 67-70).""" - - def test_select_plural_category_raises_babel_import_error_when_babel_unavailable( - self, - ) -> None: - """select_plural_category raises BabelImportError when Babel unavailable.""" - from ftllexengine.core.babel_compat import ( # noqa: PLC0415 - test assertion - BabelImportError, - ) - - # Temporarily hide babel from sys.modules - babel_module = sys.modules.pop("babel", None) - babel_core = sys.modules.pop("babel.core", None) - babel_dates = sys.modules.pop("babel.dates", None) - babel_numbers = sys.modules.pop("babel.numbers", None) - - # Reset sentinel so _check_babel_available() re-evaluates under the mock - _bc._babel_available = None - - try: - with patch.dict(sys.modules, {"babel": None, "babel.core": None}): - original_import = __import__ - - def mock_import_babel( - name: str, - globals_dict: dict[str, object] | None = None, - locals_dict: dict[str, object] | None = None, - fromlist: tuple[str, ...] 
= (), - level: int = 0, - ) -> object: - if name == "babel" or name.startswith("babel."): - err = ModuleNotFoundError("No module named 'babel'") - err.name = "babel" - raise err - return original_import(name, globals_dict, locals_dict, fromlist, level) - - with patch("builtins.__import__", side_effect=mock_import_babel): - with pytest.raises(BabelImportError) as exc_info: - select_plural_category(42, "en-US") - - assert "select_plural_category" in str(exc_info.value) - finally: - # Restore babel modules - if babel_module is not None: - sys.modules["babel"] = babel_module - if babel_core is not None: - sys.modules["babel.core"] = babel_core - if babel_dates is not None: - sys.modules["babel.dates"] = babel_dates - if babel_numbers is not None: - sys.modules["babel.numbers"] = babel_numbers - # Reset sentinel so subsequent tests reinitialize with Babel available - _bc._babel_available = None - - -# ============================================================================ -# Property Tests - Invariants -# ============================================================================ - - -class TestPluralRuleInvariants: - """Property-based tests for invariants that must hold for all plural rules.""" - - @given(n=NUMBERS, locale=LOCALE_CODES) - @example(n=0, locale="en_US") - @example(n=1, locale="en_US") - @example(n=2, locale="ar_SA") - def test_always_returns_valid_category(self, n: int | Decimal, locale: str) -> None: - """Plural selection always returns valid CLDR category. 
- - Property: For all n and locale, result ∈ {zero, one, two, few, many, other} - """ - result = select_plural_category(n, locale) - - valid_categories = {"zero", "one", "two", "few", "many", "other"} - assert result in valid_categories - - n_type = type(n).__name__ - event(f"category={result}") - event(f"n_type={n_type}") - event(f"locale={locale}") - - @given(n=NUMBERS, locale=LOCALE_CODES) - @example(n=42, locale="lv_LV") - def test_never_returns_none(self, n: int | Decimal, locale: str) -> None: - """Plural selection never returns None. - - Property: For all n and locale, result is not None - """ - result = select_plural_category(n, locale) - - assert result is not None - event(f"category={result}") - - @given(n=st.integers(min_value=0, max_value=1000), locale=LOCALE_CODES) - @example(n=1, locale="en_US") - @example(n=5, locale="ru_RU") - def test_integer_consistency(self, n: int, locale: str) -> None: - """Same integer always returns same category for same locale. - - Property: f(n, locale) = f(n, locale) (idempotence) - """ - result1 = select_plural_category(n, locale) - result2 = select_plural_category(n, locale) - - assert result1 == result2 - event(f"category={result1}") - event(f"locale={locale}") - - @given(n=NUMBERS) - @example(n=0) - @example(n=1) - @example(n=42) - def test_unknown_locale_defaults_to_cldr_root(self, n: int | Decimal) -> None: - """Unknown locale uses CLDR root rules (always 'other'). 
- - Property: For all n, select_plural_category(n, unknown) = "other" - """ - result = select_plural_category(n, "xx_XX") - - assert result == "other" - n_type = type(n).__name__ - event(f"n_type={n_type}") - - -# ============================================================================ -# Property Tests - Locale-Specific Rules -# ============================================================================ - - -class TestEnglishPluralRules: - """Property-based tests for English plural rules (one/other).""" - - @given(n=st.integers(min_value=2, max_value=1000)) - @example(n=2) - @example(n=100) - def test_integers_not_one_are_other(self, n: int) -> None: - """English: integers != 1 are 'other'. - - Property: For all n in Z where n != 1, category = "other" - """ - assume(n != 1) - - result = select_plural_category(n, "en") - - assert result == "other" - event(f"n={n}") - - def test_one_is_one(self) -> None: - """English: 1 is 'one'.""" - assert select_plural_category(1, "en") == "one" - - def test_zero_is_other(self) -> None: - """English: 0 is 'other'.""" - assert select_plural_category(0, "en") == "other" - - @given(n=st.decimals( - min_value=Decimal("0.1"), max_value=Decimal(1000), - allow_nan=False, allow_infinity=False, - )) - @example(n=Decimal("0.5")) - @example(n=Decimal("2.5")) - def test_decimals_are_other(self, n: Decimal) -> None: - """English: Decimals not equal to 1 are 'other'. 
- - Property: For all n in Q where n != 1, category = "other" - """ - assume(n != Decimal(1)) - - result = select_plural_category(n, "en") - - assert result == "other" - is_whole = n % 1 == 0 - event(f"decimal_is_whole={is_whole}") - - -class TestLatvianPluralRules: - """Property-based tests for Latvian plural rules (zero/one/other).""" - - def test_zero_is_zero(self) -> None: - """Latvian: 0 is 'zero'.""" - assert select_plural_category(0, "lv") == "zero" - - @given(n=st.integers(min_value=1, max_value=1000)) - @example(n=1) - @example(n=21) - @example(n=11) - def test_rules_consistency(self, n: int) -> None: - """Latvian: rules are consistent with CLDR. - - Property: Category determined by modulo operations per CLDR spec - """ - result = select_plural_category(n, "lv") - - i_mod_10 = n % 10 - i_mod_100 = n % 100 - - event(f"category={result}") - event(f"n_mod_10={i_mod_10}") - if i_mod_10 == 0: - assert result in {"zero", "other"} - elif i_mod_10 == 1 and i_mod_100 != 11: - assert result == "one" - else: - assert result in {"zero", "other"} - - -class TestSlavicPluralRules: - """Property-based tests for Slavic languages (Russian, Polish).""" - - @given(n=st.integers(min_value=1, max_value=1000)) - @example(n=1) - @example(n=21) - @example(n=11) - def test_one_rule(self, n: int) -> None: - """Slavic: numbers ending in 1 (but not 11) are 'one'. - - Property: n % 10 = 1 AND n % 100 ≠ 11 => category = "one" - """ - i_mod_10 = n % 10 - i_mod_100 = n % 100 - - result = select_plural_category(n, "ru") - - event(f"category={result}") - event(f"n_mod_10={i_mod_10}") - if i_mod_10 == 1 and i_mod_100 != 11: - assert result == "one" - - @given(n=st.integers(min_value=2, max_value=1000)) - @example(n=2) - @example(n=22) - @example(n=12) - def test_few_rule(self, n: int) -> None: - """Slavic: numbers ending in 2-4 (but not 12-14) are 'few'. 
- - Property: 2 ≤ n % 10 ≤ 4 AND NOT 12 ≤ n % 100 ≤ 14 => category = "few" - """ - i_mod_10 = n % 10 - i_mod_100 = n % 100 - - result = select_plural_category(n, "ru") - - event(f"category={result}") - event(f"n_mod_10={i_mod_10}") - if 2 <= i_mod_10 <= 4 and not 12 <= i_mod_100 <= 14: - assert result == "few" - - @given(n=st.integers(min_value=5, max_value=1000)) - @example(n=5) - @example(n=15) - @example(n=100) - def test_many_rule(self, n: int) -> None: - """Slavic: specific patterns are 'many'. - - Property: (n % 10 = 0) OR (5 ≤ n % 10 ≤ 9) OR (11 ≤ n % 100 ≤ 14) => category = "many" - """ - i_mod_10 = n % 10 - i_mod_100 = n % 100 - - result = select_plural_category(n, "ru") - - event(f"category={result}") - event(f"n_mod_10={i_mod_10}") - if i_mod_10 == 0 or 5 <= i_mod_10 <= 9 or 11 <= i_mod_100 <= 14: - assert result == "many" - - @given( - fraction=st.decimals( - min_value=Decimal("0.01"), max_value=Decimal("999.99"), - allow_nan=False, allow_infinity=False, - ) - ) - @example(fraction=Decimal("0.5")) - @example(fraction=Decimal("1.5")) - def test_fractional_numbers_return_other(self, fraction: Decimal) -> None: - """Slavic: fractional numbers return 'other'. 
- - Property: For all n in Q where n not in Z, category = "other" - """ - assume(fraction % 1 != 0) - - category = select_plural_category(fraction, "ru_RU") - - event(f"category={category}") - assert category == "other" - - -class TestArabicPluralRules: - """Property-based tests for Arabic plural rules (all 6 categories).""" - - def test_zero_is_zero(self) -> None: - """Arabic: 0 is 'zero'.""" - assert select_plural_category(0, "ar") == "zero" - - def test_one_is_one(self) -> None: - """Arabic: 1 is 'one'.""" - assert select_plural_category(1, "ar") == "one" - - def test_two_is_two(self) -> None: - """Arabic: 2 is 'two'.""" - assert select_plural_category(2, "ar") == "two" - - @given(n=st.integers(min_value=3, max_value=10)) - @example(n=3) - @example(n=10) - def test_three_to_ten_are_few(self, n: int) -> None: - """Arabic: 3-10 are 'few'. - - Property: 3 ≤ n ≤ 10 => category = "few" - """ - result = select_plural_category(n, "ar") - event(f"n={n}") - assert result == "few" - - @given(n=st.integers(min_value=11, max_value=99)) - @example(n=11) - @example(n=99) - def test_eleven_to_ninetynine_are_many(self, n: int) -> None: - """Arabic: 11-99 are 'many'. - - Property: 11 ≤ n ≤ 99 => category = "many" - """ - result = select_plural_category(n, "ar") - event(f"n={n}") - assert result == "many" - - @given(n=st.integers(min_value=100, max_value=1000)) - @example(n=100) - @example(n=500) - def test_hundreds_valid_category(self, n: int) -> None: - """Arabic: 100+ return valid category based on remainder. 
- - Property: For all n ≥ 100, category ∈ valid_categories - """ - result = select_plural_category(n, "ar") - event(f"category={result}") - assert result in {"zero", "one", "two", "few", "many", "other"} - - -# ============================================================================ -# Property Tests - Edge Cases -# ============================================================================ - - -class TestPluralRuleEdgeCases: - """Property-based tests for edge cases.""" - - @given(locale=st.text(min_size=1, max_size=10)) - @example(locale="invalid") - @example(locale="xx_YY") - def test_arbitrary_locale_never_crashes(self, locale: str) -> None: - """Arbitrary locale never crashes. - - Property: For all locale strings, select_plural_category does not raise - """ - result = select_plural_category(42, locale) - event(f"locale_len={len(locale)}") - event(f"category={result}") - assert isinstance(result, str) - - @given(n=st.decimals( - min_value=Decimal(-1000), max_value=Decimal(0), - allow_nan=False, allow_infinity=False, - )) - @example(n=Decimal(-1)) - @example(n=Decimal(-100)) - def test_negative_numbers_return_valid_category(self, n: Decimal) -> None: - """Negative numbers return valid category. - - Property: For all n < 0, category ∈ valid_categories - """ - result = select_plural_category(n, "en") - event(f"category={result}") - assert result in {"zero", "one", "two", "few", "many", "other"} - - @given(locale=LOCALE_CODES) - @example(locale="en_US") - @example(locale="ru_RU") - def test_very_large_numbers(self, locale: str) -> None: - """Very large numbers work correctly. 
- - Property: For all locales, large numbers return valid category - """ - result = select_plural_category(10**9, locale) - event(f"locale={locale}") - event(f"category={result}") - assert result in {"zero", "one", "two", "few", "many", "other"} - - -# ============================================================================ -# Property Tests - Metamorphic Properties -# ============================================================================ - - -class TestPluralRuleMetamorphic: - """Metamorphic property tests.""" - - @given( - n=st.integers(min_value=0, max_value=1000), - locale=st.sampled_from(["fr_FR", "it_IT", "pt_PT", "pt_BR"]), - ) - @example(n=1, locale="fr_FR") - @example(n=50, locale="it_IT") - def test_adding_hundred_preserves_validity_for_romance( - self, n: int, locale: str - ) -> None: - """For Romance languages, adding 100 preserves category validity. - - Metamorphic property: If f(n) is valid, then f(n+100) is also valid - """ - result1 = select_plural_category(n, locale) - result2 = select_plural_category(n + 100, locale) - - event(f"locale={locale}") - event(f"category_n={result1}") - valid = {"zero", "one", "two", "few", "many", "other"} - assert result1 in valid - assert result2 in valid - - @given(n=st.integers(min_value=1, max_value=100)) - @example(n=1) - @example(n=50) - def test_english_german_similarity_for_small_numbers(self, n: int) -> None: - """English and German have similar rules for small numbers. 
- - Metamorphic property: Both use only one/other categories - """ - en_result = select_plural_category(n, "en") - de_result = select_plural_category(n, "de") - - event(f"category_en={en_result}") - assert en_result in {"one", "other"} - assert de_result in {"one", "other"} - - if n == 1: - assert en_result == de_result == "one" - - -# ============================================================================ -# Decimal Support Tests -# ============================================================================ - - -class TestDecimalSupport: - """Test Decimal type support in plural category selection.""" - - @given(n=st.integers(min_value=0, max_value=1000)) - @example(n=0) - @example(n=1) - @example(n=5) - def test_decimal_matches_integer(self, n: int) -> None: - """Decimal and integer with same value produce same category. - - Property: For all n in Z, f(n) = f(Decimal(n)) - """ - int_result = select_plural_category(n, "en_US") - decimal_result = select_plural_category(Decimal(n), "en_US") - - event(f"category={int_result}") - assert int_result == decimal_result - - def test_decimal_one_is_one(self) -> None: - """Decimal(1) matches 'one' category in English.""" - result = select_plural_category(Decimal(1), "en_US") - assert result == "one" - - def test_decimal_zero_is_other(self) -> None: - """Decimal(0) matches 'other' category in English.""" - result = select_plural_category(Decimal(0), "en_US") - assert result == "other" - - def test_decimal_fractional_is_other(self) -> None: - """Decimal fractional values match 'other' category in English.""" - result = select_plural_category(Decimal("1.5"), "en_US") - assert result == "other" - - -# ============================================================================ -# Ultimate Fallback Tests -# ============================================================================ - - -class TestUltimateFallback: - """Test ultimate fallback when both locale and root fail.""" - - def 
test_ultimate_fallback_when_root_locale_also_fails(self) -> None: - """Return 'other' when even root locale loading fails (lines 83-87). - - This is defensive programming - should never happen with valid Babel installation. - """ - with patch("ftllexengine.core.locale_utils.get_babel_locale") as mock_get: - mock_get.side_effect = UnknownLocaleError("mocked failure") - - result = select_plural_category(42, "completely_invalid_locale") - assert result == "other" - - def test_ultimate_fallback_with_value_error(self) -> None: - """Return 'other' when get_babel_locale raises ValueError (lines 83-87).""" - with patch("ftllexengine.core.locale_utils.get_babel_locale") as mock_get: - mock_get.side_effect = ValueError("mocked failure") - - result = select_plural_category(1, "invalid") - assert result == "other" - - result = select_plural_category(0, "invalid") - assert result == "other" - - result = select_plural_category(100, "invalid") - assert result == "other" - - -# ============================================================================ -# Locale Format Tests -# ============================================================================ - - -class TestLocaleFormats: - """Test various locale code formats.""" - - def test_locale_case_insensitive(self) -> None: - """Locale code is case-insensitive.""" - result_upper = select_plural_category(0, "LV_LV") - result_lower = select_plural_category(0, "lv_lv") - result_mixed = select_plural_category(0, "Lv_LV") - - assert result_upper == "zero" - assert result_lower == "zero" - assert result_mixed == "zero" - - def test_short_locale_code_without_region(self) -> None: - """Short locale codes (without region) work correctly.""" - result = select_plural_category(0, "lv") - assert result == "zero" - - def test_bcp47_hyphen_format_supported(self) -> None: - """BCP-47 format with hyphens (en-US) works correctly.""" - result = select_plural_category(1, "en-US") - assert result == "one" - - result = select_plural_category(0, 
"lv-LV") - assert result == "zero" - - -# ============================================================================ -# Precision Parameter Tests -# ============================================================================ - - -class TestPrecisionParameter: - """Test precision parameter for CLDR v operand handling (lines 118-121). - - The precision parameter is critical for NUMBER() formatting. It controls - the CLDR v operand (fraction digit count), which affects plural category - selection in many locales. - - Key property: 1 (integer) vs 1.00 (precision=2) may have different plural - categories because they have different v values (v=0 vs v=2). - """ - - def test_precision_changes_english_one_to_other(self) -> None: - """English: precision converts 'one' to 'other' (lines 118-121). - - Critical case: 1 is "one" but 1.00 (with v=2) is "other" in English. - This is the primary use case for the precision parameter. - """ - result_no_precision = select_plural_category(1, "en_US") - result_with_precision = select_plural_category(1, "en_US", precision=2) - - assert result_no_precision == "one" - assert result_with_precision == "other" - - @given( - n=st.integers(min_value=0, max_value=1000), - precision=st.integers(min_value=1, max_value=10), - ) - @example(n=1, precision=1) - @example(n=1, precision=2) - @example(n=42, precision=5) - def test_precision_always_returns_valid_category( - self, n: int, precision: int - ) -> None: - """Precision parameter always returns valid CLDR category (lines 118-121). 
- - Property: For all n, precision, and locale, result in valid_categories - """ - result = select_plural_category(n, "en_US", precision=precision) - - event(f"category={result}") - event(f"precision={precision}") - valid_categories = {"zero", "one", "two", "few", "many", "other"} - assert result in valid_categories - - @given( - n=st.decimals( - min_value=Decimal(0), max_value=Decimal(100), allow_nan=False, allow_infinity=False - ), - precision=st.integers(min_value=1, max_value=6), - ) - @example(n=Decimal("1.5"), precision=2) - @example(n=Decimal("42.7"), precision=1) - def test_precision_with_fractional_decimals(self, n: Decimal, precision: int) -> None: - """Precision works correctly with Decimal inputs (lines 118-121). - - Property: Decimal values are quantized correctly for plural selection - """ - result = select_plural_category(n, "en_US", precision=precision) - - event(f"category={result}") - event(f"precision={precision}") - valid_categories = {"zero", "one", "two", "few", "many", "other"} - assert result in valid_categories - - @given( - n=st.integers(min_value=0, max_value=100), - precision=st.integers(min_value=1, max_value=8), - ) - @example(n=1, precision=1) - @example(n=1, precision=5) - def test_precision_with_decimals(self, n: int, precision: int) -> None: - """Precision works correctly with Decimal inputs (lines 118-121). 
- - Property: Decimal(n) with precision is handled correctly - """ - decimal_n = Decimal(n) - result = select_plural_category(decimal_n, "en_US", precision=precision) - - event(f"category={result}") - event(f"precision={precision}") - valid_categories = {"zero", "one", "two", "few", "many", "other"} - assert result in valid_categories - - def test_precision_one_formats_to_one_decimal_place(self) -> None: - """Precision=1 formats to one decimal place (lines 118-121).""" - result = select_plural_category(1, "en_US", precision=1) - assert result == "other" - - result = select_plural_category(5, "en_US", precision=1) - assert result == "other" - - def test_precision_zero_ignored(self) -> None: - """Precision=0 is ignored (condition precision > 0 on line 111). - - When precision=0, the code takes the else branch (line 124), not lines 118-121. - """ - result_no_precision = select_plural_category(1, "en_US") - result_precision_zero = select_plural_category(1, "en_US", precision=0) - - assert result_no_precision == "one" - assert result_precision_zero == "one" - - def test_precision_none_ignored(self) -> None: - """Precision=None is ignored (condition precision is not None on line 111). - - When precision=None, the code takes the else branch (line 124), not lines 118-121. - """ - result_no_precision = select_plural_category(1, "en_US") - result_precision_none = select_plural_category(1, "en_US", precision=None) - - assert result_no_precision == "one" - assert result_precision_none == "one" - - @given( - n=st.integers(min_value=0, max_value=100), - precision=st.integers(min_value=1, max_value=5), - locale=LOCALE_CODES, - ) - @example(n=1, precision=2, locale="en_US") - @example(n=1, precision=2, locale="ru_RU") - @example(n=0, precision=1, locale="lv_LV") - def test_precision_consistency_across_locales( - self, n: int, precision: int, locale: str - ) -> None: - """Precision produces consistent results across locales (lines 118-121). 
- - Property: Same (n, precision, locale) always returns same category - """ - result1 = select_plural_category(n, locale, precision=precision) - result2 = select_plural_category(n, locale, precision=precision) - - event(f"locale={locale}") - event(f"precision={precision}") - event(f"category={result1}") - assert result1 == result2 - - def test_precision_large_value(self) -> None: - """Precision handles large precision values correctly (lines 118-121).""" - result = select_plural_category(1, "en_US", precision=10) - assert result == "other" - - result = select_plural_category(42, "en_US", precision=15) - assert result == "other" - - @given( - n=st.integers(min_value=1, max_value=100), - precision=st.integers(min_value=1, max_value=6), - ) - @example(n=1, precision=1) - @example(n=21, precision=2) - @example(n=11, precision=1) - def test_precision_affects_slavic_rules( - self, n: int, precision: int - ) -> None: - """Precision affects Slavic plural rules (lines 118-121). - - In Slavic languages, integers have complex rules, but formatted decimals - typically fall into the "other" category. - """ - result_no_precision = select_plural_category(n, "ru_RU") - result_with_precision = select_plural_category(n, "ru_RU", precision=precision) - - event(f"category_no_precision={result_no_precision}") - event(f"category_with_precision={result_with_precision}") - event(f"precision={precision}") - valid_categories = {"zero", "one", "two", "few", "many", "other"} - assert result_no_precision in valid_categories - assert result_with_precision in valid_categories - - @given( - n=st.integers(min_value=0, max_value=10), - precision=st.integers(min_value=1, max_value=4), - ) - @example(n=0, precision=1) - @example(n=2, precision=2) - def test_precision_with_arabic_complex_rules( - self, n: int, precision: int - ) -> None: - """Precision works with Arabic's complex 6-category system (lines 118-121). 
- - Property: Precision affects category selection in all locale systems - """ - result = select_plural_category(n, "ar_SA", precision=precision) - - event(f"category={result}") - event(f"precision={precision}") - valid_categories = {"zero", "one", "two", "few", "many", "other"} - assert result in valid_categories - - def test_precision_quantization_correctness(self) -> None: - """Precision quantizes numbers correctly (lines 118-121). - - Verifies the Decimal quantization logic produces expected v operand. - """ - result = select_plural_category(5, "en_US", precision=2) - assert result == "other" - - result = select_plural_category(0, "en_US", precision=3) - assert result == "other" - - result = select_plural_category(100, "en_US", precision=1) - assert result == "other" - - -# ============================================================================ -# Rounding Consistency Tests (ROUND_HALF_UP) -# ============================================================================ - - -class TestRoundingConsistency: - """Tests that plural selection rounding matches formatting rounding. - - Both plural_rules.py and locale_context.py use ROUND_HALF_EVEN (Babel default) - so that the displayed number and its plural form always agree. - Half-values (x.5) round to the nearest even digit in both paths. 
- """ - - def test_half_value_rounds_even_for_plural(self) -> None: - """2.5 with precision=0 rounds to 2, selecting 'other' in English.""" - # 2.5 -> 2 (ROUND_HALF_EVEN: 2 is even), which is 'other' in English - result = select_plural_category(Decimal("2.5"), "en_US", precision=0) - assert result == "other" - - def test_half_value_3_5_rounds_up_for_plural(self) -> None: - """3.5 with precision=0 rounds to 4, selecting 'other' in English.""" - # 3.5 -> 4 (ROUND_HALF_EVEN: 4 is even) - result = select_plural_category(Decimal("3.5"), "en_US", precision=0) - assert result == "other" - - def test_half_value_0_5_rounds_to_zero_for_plural(self) -> None: - """0.5 with precision=0 rounds to 0, selecting 'other' in English.""" - # 0.5 -> 0 (ROUND_HALF_EVEN: 0 is even), which is 'other' in English - result = select_plural_category(Decimal("0.5"), "en_US", precision=0) - assert result == "other" - - def test_half_value_1_5_rounds_up_for_plural(self) -> None: - """1.5 with precision=0 rounds to 2, selecting 'other' in English.""" - # 1.5 -> 2 (ROUND_HALF_EVEN: 2 is even) - result = select_plural_category(Decimal("1.5"), "en_US", precision=0) - assert result == "other" - - def test_rounding_matches_formatting_at_half_values(self) -> None: - """Verify that Decimal quantization uses ROUND_HALF_EVEN, matching Babel. - - This is the core consistency property: the number displayed to the user - and the plural category selected must agree on rounding direction. 
- """ - from decimal import ROUND_HALF_EVEN # noqa: PLC0415 - import inside function - - test_cases = [ - (Decimal("0.5"), 0, Decimal(0)), - (Decimal("1.5"), 0, Decimal(2)), - (Decimal("2.5"), 0, Decimal(2)), - (Decimal("3.5"), 0, Decimal(4)), - (Decimal("1.005"), 2, Decimal("1.00")), - (Decimal("1.015"), 2, Decimal("1.02")), - (Decimal("2.445"), 2, Decimal("2.44")), - ] - - for value, precision, expected_rounded in test_cases: - quantizer = Decimal(10) ** -precision - rounded = value.quantize(quantizer, rounding=ROUND_HALF_EVEN) - assert rounded == expected_rounded, ( - f"Expected {value} with precision={precision} to round to " - f"{expected_rounded}, got {rounded}" - ) - - @given( - n=st.decimals( - min_value=Decimal(0), max_value=Decimal(100), allow_nan=False, allow_infinity=False - ), - precision=st.integers(min_value=0, max_value=4), - ) - @example(n=Decimal("0.5"), precision=0) - @example(n=Decimal("2.5"), precision=0) - @example(n=Decimal("3.5"), precision=0) - @example(n=Decimal("1.005"), precision=2) - def test_plural_rounding_direction_property( - self, n: Decimal, precision: int - ) -> None: - """Plural rounding direction matches ROUND_HALF_EVEN for all inputs. - - Property: The Decimal value used for plural selection must equal the - value obtained by ROUND_HALF_EVEN quantization. - """ - from decimal import ROUND_HALF_EVEN # noqa: PLC0415 - import inside function - - quantizer = Decimal(10) ** -precision - expected = n.quantize(quantizer, rounding=ROUND_HALF_EVEN) - - # The plural category must correspond to the ROUND_HALF_EVEN result. - # We verify indirectly: call select_plural_category with precision, - # then call again with the explicitly-rounded value (no precision). 
- category_via_precision = select_plural_category(n, "en_US", precision=precision) - category_via_rounded = select_plural_category(expected, "en_US") - - event(f"category_via_precision={category_via_precision}") - event(f"precision={precision}") - assert category_via_precision == category_via_rounded, ( - f"Rounding mismatch for n={n}, precision={precision}: " - f"precision path gave '{category_via_precision}', " - f"explicitly rounded {expected} gave '{category_via_rounded}'" - ) - - -# ============================================================================ -# SLAVIC PLURAL RULE COVERAGE -# ============================================================================ - - -class TestSlavicRuleReturnOther: - """Slavic plural rules return 'other' for numbers not matching one/few/many.""" - - def test_slavic_rule_return_other(self) -> None: - """Polish plural rules return 'many' or 'other' for 111 (ends in 1 but mod 100 == 11).""" - # 111 % 10 = 1, 111 % 100 = 11 - # Polish: 'one' requires mod_100 != 11, so 111 skips 'one' - # Polish: 'few' requires 2-4, so 111 skips 'few' - # Polish: 'many' covers 0 and 5-9 and 11-14; 111 does not match (mod_10 == 1) - # Remaining cases return 'other' - result = select_plural_category(111, "pl") - assert result in ["many", "other"] - - -# ============================================================================ -# Ordinal Plural Rule Tests -# ============================================================================ - - -class TestOrdinalPluralRules: - """Tests for the ordinal=True parameter using CLDR ordinal plural rules. - - Ordinal rules apply to rank/position contexts (1st, 2nd, 3rd, ...). - English ordinal rules: 1->one (1st), 2->two (2nd), 3->few (3rd), - 4+->other unless ends in 1/2/3 with specific exceptions. 
- """ - - def test_english_ordinal_one(self) -> None: - """English ordinal: 1 -> 'one' (1st).""" - assert select_plural_category(1, "en_US", ordinal=True) == "one" - - def test_english_ordinal_two(self) -> None: - """English ordinal: 2 -> 'two' (2nd).""" - assert select_plural_category(2, "en_US", ordinal=True) == "two" - - def test_english_ordinal_few(self) -> None: - """English ordinal: 3 -> 'few' (3rd).""" - assert select_plural_category(3, "en_US", ordinal=True) == "few" - - def test_english_ordinal_other(self) -> None: - """English ordinal: 4+ (no suffix rule) -> 'other' (4th).""" - assert select_plural_category(4, "en_US", ordinal=True) == "other" - - def test_english_ordinal_eleven(self) -> None: - """English ordinal: 11 -> 'other' (11th, not 11st).""" - assert select_plural_category(11, "en_US", ordinal=True) == "other" - - def test_english_ordinal_twelve(self) -> None: - """English ordinal: 12 -> 'other' (12th, not 12nd).""" - assert select_plural_category(12, "en_US", ordinal=True) == "other" - - def test_english_ordinal_thirteen(self) -> None: - """English ordinal: 13 -> 'other' (13th, not 13rd).""" - assert select_plural_category(13, "en_US", ordinal=True) == "other" - - def test_english_ordinal_twenty_one(self) -> None: - """English ordinal: 21 -> 'one' (21st).""" - assert select_plural_category(21, "en_US", ordinal=True) == "one" - - def test_english_ordinal_twenty_two(self) -> None: - """English ordinal: 22 -> 'two' (22nd).""" - assert select_plural_category(22, "en_US", ordinal=True) == "two" - - def test_english_ordinal_twenty_three(self) -> None: - """English ordinal: 23 -> 'few' (23rd).""" - assert select_plural_category(23, "en_US", ordinal=True) == "few" - - def test_ordinal_false_is_default(self) -> None: - """ordinal=False (default) uses cardinal rules, same as omitting the parameter.""" - assert ( - select_plural_category(1, "en_US") - == select_plural_category(1, "en_US", ordinal=False) - ) - # Ordinal and cardinal differ for n=2 in 
English: - # cardinal: 2 -> "other"; ordinal: 2 -> "two" - assert select_plural_category(2, "en_US") == "other" - assert select_plural_category(2, "en_US", ordinal=True) == "two" - - @given(n=st.integers(min_value=0, max_value=1000), locale=LOCALE_CODES) - @example(n=1, locale="en_US") - @example(n=2, locale="en_US") - @example(n=3, locale="en_US") - @example(n=11, locale="en_US") - def test_ordinal_always_returns_valid_category( - self, n: int, locale: str - ) -> None: - """Ordinal selection always returns a valid CLDR category. - - Property: For all n and locale, ordinal result in valid_categories - """ - result = select_plural_category(n, locale, ordinal=True) - - valid_categories = {"zero", "one", "two", "few", "many", "other"} - assert result in valid_categories - - event(f"category={result}") - event(f"locale={locale}") - - @given(n=st.integers(min_value=0, max_value=1000), locale=LOCALE_CODES) - def test_ordinal_deterministic(self, n: int, locale: str) -> None: - """Ordinal selection is deterministic: same inputs produce same result.""" - result1 = select_plural_category(n, locale, ordinal=True) - result2 = select_plural_category(n, locale, ordinal=True) - - assert result1 == result2 - event(f"category={result1}") - - @given(n=NUMBERS, locale=LOCALE_CODES) - def test_ordinal_never_crashes(self, n: int | Decimal, locale: str) -> None: - """Ordinal selection never crashes for any valid n/locale combination.""" - result = select_plural_category(n, locale, ordinal=True) - - assert isinstance(result, str) - event(f"category={result}") - event(f"n_type={type(n).__name__}") - - def test_ordinal_unknown_locale_falls_back_to_other(self) -> None: - """Unknown locale falls back to 'other' for ordinal rules.""" - result = select_plural_category(1, "xx_XX", ordinal=True) - assert result == "other" - - def test_ordinal_welsh_has_multiple_categories(self) -> None: - """Welsh (cy) ordinal rules use multiple categories (zero, one, two, few, many, other).""" - # Welsh 
ordinals use all 6 categories - results = {select_plural_category(n, "cy", ordinal=True) for n in range(10)} - # Welsh ordinals produce at least 2 distinct categories - assert len(results) >= 2 - valid = {"zero", "one", "two", "few", "many", "other"} - assert results <= valid +"""Aggregated runtime plural rules test surface.""" + +from tests.runtime_plural_rules_cases.babel_import_error_tests_lines_67_70 import * # noqa: F403 - re-export split test surface +from tests.runtime_plural_rules_cases.decimal_support_tests import * # noqa: F403 - re-export split test surface +from tests.runtime_plural_rules_cases.locale_format_tests import * # noqa: F403 - re-export split test surface +from tests.runtime_plural_rules_cases.ordinal_plural_rule_tests import * # noqa: F403 - re-export split test surface +from tests.runtime_plural_rules_cases.precision_parameter_tests import * # noqa: F403 - re-export split test surface +from tests.runtime_plural_rules_cases.property_tests_edge_cases import * # noqa: F403 - re-export split test surface +from tests.runtime_plural_rules_cases.property_tests_invariants import * # noqa: F403 - re-export split test surface +from tests.runtime_plural_rules_cases.property_tests_locale_specific_rules import * # noqa: F403 - re-export split test surface +from tests.runtime_plural_rules_cases.property_tests_metamorphic_properties import * # noqa: F403 - re-export split test surface +from tests.runtime_plural_rules_cases.rounding_consistency_tests_round_half_up import * # noqa: F403 - re-export split test surface +from tests.runtime_plural_rules_cases.slavic_plural_rule_coverage import * # noqa: F403 - re-export split test surface +from tests.runtime_plural_rules_cases.ultimate_fallback_tests import * # noqa: F403 - re-export split test surface diff --git a/tests/test_runtime_resolver_depth_cycles.py b/tests/test_runtime_resolver_depth_cycles.py index 844215d7..84dacdd4 100644 --- a/tests/test_runtime_resolver_depth_cycles.py +++ 
b/tests/test_runtime_resolver_depth_cycles.py @@ -1,1311 +1,12 @@ -"""Resolver depth limiting and cycle detection tests. - -Consolidates: -- test_resolver_cycles.py (direct/indirect/deep cycles, cycle detection properties) -- test_resolver_depth_limit.py (MAX_DEPTH enforcement, attribute chains) -- test_resolver_depth_guard_and_variants.py (guard edge cases, multi-placeables, - malformed NumberLiteral, fallback depth protection) -- test_resolver_expression_depth.py (SelectExpression depth, Placeable depth, mixed) -- test_resolver_expression_depth_and_select.py (ResolutionContext expression depth) -- test_resolver_expansion_budget.py (expansion budget DoS protection) -""" - -from __future__ import annotations - -import pytest -from hypothesis import event, given, settings -from hypothesis import strategies as st - -from ftllexengine.constants import FALLBACK_INVALID, MAX_DEPTH -from ftllexengine.diagnostics import DiagnosticCode, ErrorCategory, FrozenFluentError -from ftllexengine.runtime.bundle import FluentBundle -from ftllexengine.runtime.function_bridge import FunctionRegistry -from ftllexengine.runtime.resolution_context import GlobalDepthGuard, ResolutionContext -from ftllexengine.runtime.resolver import FluentResolver -from ftllexengine.syntax import ( - CallArguments, - FunctionReference, - Identifier, - Message, - NumberLiteral, - Pattern, - Placeable, - SelectExpression, - StringLiteral, - TextElement, - VariableReference, - Variant, -) -from ftllexengine.syntax.ast import InlineExpression - -# ============================================================================ -# ResolutionContext Tests -# ============================================================================ - - -class TestResolutionContext: - """Tests for ResolutionContext cycle detection.""" - - def test_push_pop_balance(self) -> None: - """Context push/pop maintains balanced state.""" - ctx = ResolutionContext() - - ctx.push("a") - ctx.push("b") - ctx.push("c") - - assert ctx.depth == 3 
- assert ctx.contains("a") - assert ctx.contains("b") - assert ctx.contains("c") - - assert ctx.pop() == "c" - assert ctx.pop() == "b" - assert ctx.pop() == "a" - - assert ctx.depth == 0 - assert not ctx.contains("a") - - def test_cycle_detection_o1(self) -> None: - """Cycle detection is O(1) via set.""" - ctx = ResolutionContext() - - for i in range(100): - ctx.push(f"msg{i}") - - assert ctx.contains("msg0") - assert ctx.contains("msg50") - assert ctx.contains("msg99") - assert not ctx.contains("msg100") - - def test_get_cycle_path(self) -> None: - """Cycle path includes full resolution stack.""" - ctx = ResolutionContext() - - ctx.push("a") - ctx.push("b") - ctx.push("c") - - path = ctx.get_cycle_path("a") - - assert path == ["a", "b", "c", "a"] - - -class TestResolutionContextExpressionDepth: - """Test ResolutionContext.expression_depth property.""" - - def test_expression_depth_property_initial(self) -> None: - """expression_depth property returns 0 initially.""" - context = ResolutionContext() - - assert context.expression_depth == 0 - - def test_expression_depth_property_after_increment(self) -> None: - """expression_depth property reflects guard depth after increment.""" - context = ResolutionContext() - - with context.expression_guard: - assert context.expression_depth == 1 - with context.expression_guard: - assert context.expression_depth == 2 - - assert context.expression_depth == 0 - - -class TestResolutionContextTrackExpansion: - """Direct tests for ResolutionContext.track_expansion() accumulation. - - Targets the expansion budget DoS protection: track_expansion() accumulates - character counts without raising. Callers check - ``total_chars > max_expansion_size`` after each call and generate - FrozenFluentError themselves (separation of state tracking from error policy). 
- """ - - def test_track_expansion_accumulates_correctly(self) -> None: - """track_expansion() accumulates total_chars without raising.""" - context = ResolutionContext(max_expansion_size=100) - - context.track_expansion(99) - assert context.total_chars == 99 - assert context.total_chars <= context.max_expansion_size - - # Exceeding budget is detectable by caller; no exception raised here - context.track_expansion(2) - assert context.total_chars == 101 - assert context.total_chars > context.max_expansion_size - - def test_track_expansion_exact_budget_limit_detectable(self) -> None: - """Exact budget limit is detectable by caller after track_expansion.""" - context = ResolutionContext(max_expansion_size=100) - - context.track_expansion(100) - assert context.total_chars == 100 - # At exactly the budget: caller may allow or deny based on policy - assert context.total_chars <= context.max_expansion_size - - # One more char pushes over the limit — caller detects via comparison - context.track_expansion(1) - assert context.total_chars == 101 - assert context.total_chars > context.max_expansion_size - - @given( - budget=st.integers(min_value=1, max_value=1000), - first_chunk=st.integers(min_value=0, max_value=500), - ) - @settings(max_examples=50) - def test_track_expansion_accumulates_accurately( - self, budget: int, first_chunk: int - ) -> None: - """Property: track_expansion() always accumulates total_chars precisely. - - For any budget and chunk sizes, total_chars must equal the exact sum of - all chunk arguments passed. The caller detects budget exhaustion via - ``total_chars > max_expansion_size``. 
- """ - context = ResolutionContext(max_expansion_size=budget) - - context.track_expansion(first_chunk) - assert context.total_chars == first_chunk - - over_budget = first_chunk > budget - event("boundary=at_or_over_budget" if over_budget else "boundary=under_budget") - - # Add one more chunk that guarantees budget is exceeded - second_chunk = budget - first_chunk + 1 - if second_chunk > 0: - context.track_expansion(second_chunk) - assert context.total_chars == first_chunk + second_chunk - assert context.total_chars > context.max_expansion_size - event("error_path=budget_exceeded") - - -# ============================================================================ -# Cycle Detection -# ============================================================================ - - -class TestDirectCycles: - """Tests for direct self-referential cycles.""" - - def test_message_references_itself(self) -> None: - """Direct cycle: message references itself.""" - bundle = FluentBundle("en-US", strict=False) - bundle.add_resource("self = { self }") - - result, errors = bundle.format_pattern("self") - - assert isinstance(result, str) - assert len(errors) > 0 - cyclic_errors = [ - e for e in errors - if isinstance(e, FrozenFluentError) and e.category == ErrorCategory.CYCLIC - ] - assert len(cyclic_errors) > 0 - - def test_term_references_itself(self) -> None: - """Direct cycle: term references itself.""" - bundle = FluentBundle("en-US", strict=False) - bundle.add_resource( - """ --self = { -self } -msg = { -self } -""" - ) - - result, errors = bundle.format_pattern("msg") - - assert isinstance(result, str) - assert len(errors) > 0 - - -class TestIndirectCycles: - """Tests for indirect cycles through chains.""" - - def test_two_message_cycle(self) -> None: - """Indirect cycle: a -> b -> a.""" - bundle = FluentBundle("en-US", strict=False) - bundle.add_resource( - """ -msg-a = { msg-b } -msg-b = { msg-a } -""" - ) - - result, errors = bundle.format_pattern("msg-a") - - assert 
isinstance(result, str) - assert len(errors) > 0 - cyclic_errors = [ - e for e in errors - if isinstance(e, FrozenFluentError) and e.category == ErrorCategory.CYCLIC - ] - assert len(cyclic_errors) > 0 - - def test_three_message_cycle(self) -> None: - """Indirect cycle: a -> b -> c -> a.""" - bundle = FluentBundle("en-US", strict=False) - bundle.add_resource( - """ -msg-a = { msg-b } -msg-b = { msg-c } -msg-c = { msg-a } -""" - ) - - result, errors = bundle.format_pattern("msg-a") - - assert isinstance(result, str) - assert len(errors) > 0 - - def test_term_to_message_cycle(self) -> None: - """Mixed cycle: term -> message -> term.""" - bundle = FluentBundle("en-US", strict=False) - bundle.add_resource( - """ --brand = { product } -product = { -brand } Browser -""" - ) - - result, _ = bundle.format_pattern("product") - - assert isinstance(result, str) - - -class TestDeepChains: - """Tests for deep non-cyclic chains.""" - - def test_chain_at_depth_limit(self) -> None: - """Chain shorter than MAX_DEPTH resolves to leaf value.""" - depth = min(MAX_DEPTH - 1, 50) - messages = [] - for i in range(depth): - if i < depth - 1: - messages.append(f"msg{i} = {{ msg{i + 1} }}") - else: - messages.append(f"msg{i} = End") - - bundle = FluentBundle("en-US") - bundle.add_resource("\n".join(messages)) - - result, _ = bundle.format_pattern("msg0") - - assert isinstance(result, str) - assert "End" in result - - def test_chain_exceeding_depth_limit(self) -> None: - """Chain exceeding MAX_DEPTH produces error.""" - depth = MAX_DEPTH + 10 - messages = [] - for i in range(depth): - if i < depth - 1: - messages.append(f"msg{i} = {{ msg{i + 1} }}") - else: - messages.append(f"msg{i} = End") - - bundle = FluentBundle("en-US", strict=False) - bundle.add_resource("\n".join(messages)) - - result, errors = bundle.format_pattern("msg0") - - assert isinstance(result, str) - assert len(errors) > 0 - - -# ============================================================================ -# MAX_DEPTH 
Enforcement -# ============================================================================ - - -class TestMaxDepthLimit: - """Tests for maximum resolution depth enforcement.""" - - def test_max_depth_constant_exists(self) -> None: - """MAX_DEPTH constant is defined and reasonable.""" - assert MAX_DEPTH == 100 - - def test_shallow_chain_succeeds(self) -> None: - """Chain of 5 messages resolves without error.""" - bundle = FluentBundle("en") - bundle.add_resource( - """ -m0 = { m1 } -m1 = { m2 } -m2 = { m3 } -m3 = { m4 } -m4 = Final value -""" - ) - - result, errors = bundle.format_pattern("m0") - - assert errors == () - assert "\u2068" in result or "Final value" in result - - def test_moderate_chain_succeeds(self) -> None: - """Chain of 50 messages resolves without error.""" - bundle = FluentBundle("en") - lines = [] - for i in range(49): - lines.append(f"m{i} = {{ m{i+1} }}") - lines.append("m49 = Done") - bundle.add_resource("\n".join(lines)) - - result, errors = bundle.format_pattern("m0") - - assert errors == () - assert "Done" in result - - def test_deep_chain_hits_limit(self) -> None: - """Chain exceeding MAX_DEPTH returns error.""" - bundle = FluentBundle("en", strict=False) - depth = MAX_DEPTH + 10 - lines = [] - for i in range(depth - 1): - lines.append(f"m{i} = {{ m{i+1} }}") - lines.append(f"m{depth-1} = Final") - bundle.add_resource("\n".join(lines)) - - _, errors = bundle.format_pattern("m0") - - assert len(errors) > 0 - depth_errors = [e for e in errors if isinstance(e, FrozenFluentError)] - assert len(depth_errors) > 0 - - def test_exactly_at_limit_succeeds(self) -> None: - """Chain of exactly MAX_DEPTH - 1 nesting levels succeeds.""" - bundle = FluentBundle("en") - depth = MAX_DEPTH - 1 - lines = [] - for i in range(depth - 1): - lines.append(f"m{i} = {{ m{i+1} }}") - lines.append(f"m{depth-1} = End") - bundle.add_resource("\n".join(lines)) - - result, _ = bundle.format_pattern("m0") - - assert "End" in result - - def 
test_depth_limit_error_message_contains_depth_info(self) -> None: - """Error message for depth limit references depth.""" - bundle = FluentBundle("en", strict=False) - depth = MAX_DEPTH + 5 - lines = [] - for i in range(depth - 1): - lines.append(f"msg{i} = {{ msg{i+1} }}") - lines.append(f"msg{depth-1} = End") - bundle.add_resource("\n".join(lines)) - - _, errors = bundle.format_pattern("msg0") - - assert len(errors) > 0 - error_str = str(errors[0]) - assert "depth" in error_str.lower() or "Maximum" in error_str - - def test_cyclic_detected_before_depth(self) -> None: - """Cyclic reference is detected before hitting depth limit.""" - bundle = FluentBundle("en", strict=False) - bundle.add_resource( - """ -a = { b } -b = { c } -c = { a } -""" - ) - - result, errors = bundle.format_pattern("a") - - assert len(errors) > 0 - assert "{" in result # Fallback format - - def test_independent_resolutions_dont_share_depth(self) -> None: - """Separate format_pattern calls have independent depth tracking.""" - bundle = FluentBundle("en") - bundle.add_resource( - """ -a1 = { a2 } -a2 = { a3 } -a3 = A Done - -b1 = { b2 } -b2 = B Done -""" - ) - - result_a, errors_a = bundle.format_pattern("a1") - result_b, errors_b = bundle.format_pattern("b1") - - assert errors_a == () - assert errors_b == () - assert "A Done" in result_a - assert "B Done" in result_b - - -class TestMaxDepthWithAttributes: - """Tests for depth limit with attribute access.""" - - def test_attribute_chain_counts_toward_depth(self) -> None: - """Message.attribute references count toward depth.""" - bundle = FluentBundle("en") - bundle.add_resource( - """ -m0 = Value - .attr = { m1.attr } -m1 = Value - .attr = { m2.attr } -m2 = Value - .attr = { m3.attr } -m3 = Value - .attr = Final -""" - ) - - result, errors = bundle.format_pattern("m0", attribute="attr") - - assert errors == () - assert "Final" in result - - -# ============================================================================ -# SelectExpression / 
Placeable / Mixed Depth Limits -# ============================================================================ - - -class TestSelectExpressionDepthLimit: - """Verify depth limiting for SelectExpression recursion through variants. - - Regression: SEC-RESOLVE-RECURSION-6. - """ - - def _create_nested_select_ast(self, depth: int) -> Message: - """Create a Message with SelectExpression nested to specified depth.""" - inner_pattern = Pattern(elements=(TextElement(value="innermost"),)) - current_pattern = inner_pattern - - for _ in range(depth): - select_expr = SelectExpression( - selector=VariableReference(id=Identifier(name="var")), - variants=( - Variant( - key=Identifier(name="one"), - value=current_pattern, - default=False, - ), - Variant( - key=Identifier(name="other"), - value=Pattern(elements=(TextElement(value="other"),)), - default=True, - ), - ), - ) - current_pattern = Pattern(elements=(Placeable(expression=select_expr),)) - - return Message( - id=Identifier(name="nested"), - value=current_pattern, - attributes=(), - comment=None, - ) - - def test_shallow_nesting_resolves_successfully(self) -> None: - """SelectExpression with shallow nesting resolves normally.""" - bundle = FluentBundle("en_US") - message = self._create_nested_select_ast(depth=5) - bundle._messages["nested"] = message - - result, errors = bundle.format_pattern("nested", {"var": "one"}) - - assert "innermost" in result - assert errors == () - - def test_deep_nesting_triggers_depth_limit(self) -> None: - """SelectExpression nested beyond MAX_DEPTH triggers depth limit.""" - bundle = FluentBundle("en_US", strict=False) - message = self._create_nested_select_ast(depth=MAX_DEPTH + 10) - bundle._messages["nested"] = message - - _result, errors = bundle.format_pattern("nested", {"var": "one"}) - - assert len(errors) >= 1 - error_messages = [str(e) for e in errors] - assert any("depth" in msg.lower() for msg in error_messages) - - def test_exact_max_depth_boundary(self) -> None: - """Behavior at 
exactly MAX_DEPTH does not crash.""" - bundle = FluentBundle("en_US", strict=False) - message = self._create_nested_select_ast(depth=MAX_DEPTH) - bundle._messages["nested"] = message - - result, _errors = bundle.format_pattern("nested", {"var": "one"}) - - assert result is not None - - def test_just_under_max_depth(self) -> None: - """Nesting just under MAX_DEPTH produces no depth errors.""" - bundle = FluentBundle("en_US") - message = self._create_nested_select_ast(depth=MAX_DEPTH - 5) - bundle._messages["nested"] = message - - _result, errors = bundle.format_pattern("nested", {"var": "one"}) - - depth_errors = [e for e in errors if "depth" in str(e).lower()] - assert len(depth_errors) == 0 - - -class TestNestedPlaceableDepthLimit: - """Verify depth limiting for nested Placeables like { { { x } } }.""" - - def _create_nested_placeable_ast(self, depth: int) -> Message: - """Create a Message with Placeables nested to specified depth.""" - inner_expr: InlineExpression = VariableReference(id=Identifier(name="var")) - current_expr: InlineExpression = inner_expr - - for _ in range(depth): - current_expr = Placeable(expression=current_expr) - - return Message( - id=Identifier(name="nested"), - value=Pattern(elements=(Placeable(expression=current_expr),)), - attributes=(), - comment=None, - ) - - def test_shallow_placeable_nesting_resolves(self) -> None: - """Shallow placeable nesting resolves normally.""" - bundle = FluentBundle("en_US") - message = self._create_nested_placeable_ast(depth=5) - bundle._messages["nested"] = message - - result, errors = bundle.format_pattern("nested", {"var": "hello"}) - - assert "hello" in result - assert errors == () - - def test_deep_placeable_nesting_triggers_limit(self) -> None: - """Deep placeable nesting triggers depth limit.""" - bundle = FluentBundle("en_US", strict=False) - message = self._create_nested_placeable_ast(depth=MAX_DEPTH + 10) - bundle._messages["nested"] = message - - _result, errors = bundle.format_pattern("nested", 
{"var": "hello"}) - - assert len(errors) >= 1 - - -class TestMixedNestingDepthLimit: - """Verify depth limiting for mixed SelectExpression and Placeable nesting.""" - - def _create_mixed_nesting_ast(self, select_depth: int, placeable_depth: int) -> Message: - """Create a Message mixing SelectExpression and Placeable nesting.""" - inner_expr: InlineExpression = VariableReference(id=Identifier(name="var")) - current_expr: InlineExpression = inner_expr - - for _ in range(placeable_depth): - current_expr = Placeable(expression=current_expr) - - current_pattern = Pattern(elements=(Placeable(expression=current_expr),)) - - for _ in range(select_depth): - select_expr = SelectExpression( - selector=VariableReference(id=Identifier(name="sel")), - variants=( - Variant( - key=Identifier(name="a"), - value=current_pattern, - default=False, - ), - Variant( - key=Identifier(name="b"), - value=Pattern(elements=(TextElement(value="b"),)), - default=True, - ), - ), - ) - current_pattern = Pattern(elements=(Placeable(expression=select_expr),)) - - return Message( - id=Identifier(name="mixed"), - value=current_pattern, - attributes=(), - comment=None, - ) - - def test_combined_nesting_exceeds_limit(self) -> None: - """Combined nesting exceeding MAX_DEPTH produces depth error.""" - bundle = FluentBundle("en_US", strict=False) - message = self._create_mixed_nesting_ast( - select_depth=MAX_DEPTH // 2 + 10, - placeable_depth=MAX_DEPTH // 2 + 10, - ) - bundle._messages["mixed"] = message - - _result, errors = bundle.format_pattern("mixed", {"var": "x", "sel": "a"}) - - assert len(errors) >= 1 - - -class TestDepthLimitWithCustomLimit: - """Verify custom depth limit configuration.""" - - def test_custom_lower_depth_limit(self) -> None: - """Custom lower depth limit triggers earlier than default.""" - bundle = FluentBundle("en_US", max_nesting_depth=10, strict=False) - - inner_pattern = Pattern(elements=(TextElement(value="inner"),)) - current_pattern = inner_pattern - - for _ in range(15): 
# 15 > 10 custom limit, < 100 default - select_expr = SelectExpression( - selector=NumberLiteral(value=1, raw="1"), - variants=( - Variant( - key=NumberLiteral(value=1, raw="1"), - value=current_pattern, - default=True, - ), - ), - ) - current_pattern = Pattern(elements=(Placeable(expression=select_expr),)) - - message = Message( - id=Identifier(name="test"), - value=current_pattern, - attributes=(), - comment=None, - ) - bundle._messages["test"] = message - - result, _errors = bundle.format_pattern("test", {}) - - assert result is not None - - -class TestDepthLimitPropertyBased: - """Property-based tests for depth limiting.""" - - @given(st.integers(min_value=1, max_value=50)) - @settings(max_examples=20) - def test_depth_under_limit_never_errors_on_depth(self, depth: int) -> None: - """Nesting under MAX_DEPTH produces no depth errors.""" - event(f"depth={depth}") - bundle = FluentBundle("en_US") - - inner_pattern = Pattern(elements=(TextElement(value="ok"),)) - current_pattern = inner_pattern - - for _ in range(depth): - select_expr = SelectExpression( - selector=NumberLiteral(value=1, raw="1"), - variants=( - Variant( - key=NumberLiteral(value=1, raw="1"), - value=current_pattern, - default=True, - ), - ), - ) - current_pattern = Pattern(elements=(Placeable(expression=select_expr),)) - - message = Message( - id=Identifier(name="test"), - value=current_pattern, - attributes=(), - comment=None, - ) - bundle._messages["test"] = message - - result, errors = bundle.format_pattern("test", {}) - - depth_errors = [e for e in errors if "depth" in str(e).lower()] - assert len(depth_errors) == 0 - assert "ok" in result - - -# ============================================================================ -# GlobalDepthGuard Edge Cases -# ============================================================================ - - -class TestGlobalDepthGuardEdgeCases: - """Coverage for GlobalDepthGuard.__exit__ defensive branch.""" - - def test_exit_without_enter(self) -> None: - """Guard 
exit without enter does not crash (defensive branch).""" - guard = GlobalDepthGuard(max_depth=100) - # _token remains None; __exit__ defensive branch covered. - guard.__exit__(None, None, None) - - def test_exit_returns_none(self) -> None: - """Guard __exit__ does not suppress exceptions.""" - guard = GlobalDepthGuard(max_depth=100) - with guard: - pass - - -# ============================================================================ -# Multi-Placeable Pattern Resolution -# ============================================================================ - - -class TestPatternMultiplePlaceables: - """Coverage for pattern with multiple consecutive placeables.""" - - def test_pattern_with_two_placeables_in_sequence(self) -> None: - """Pattern with consecutive placeables resolves all correctly.""" - pattern = Pattern( - elements=( - Placeable(expression=VariableReference(id=Identifier("first"))), - TextElement(value=" and "), - Placeable(expression=VariableReference(id=Identifier("second"))), - ) - ) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message( - message, {"first": "A", "second": "B"} - ) - - assert result == "A and B" - assert errors == () - - @given( - count=st.integers(min_value=2, max_value=10), - values=st.lists(st.text(min_size=1, max_size=10), min_size=2, max_size=10), - ) - def test_pattern_with_multiple_placeables_property( - self, count: int, values: list[str] - ) -> None: - """Property: Pattern with N placeables resolves all correctly.""" - event(f"count={count}") - values = values[:count] - if len(values) < count: - values.extend(["X"] * (count - len(values))) - - elements: list[TextElement | Placeable] = [] - for i in range(count): - if i > 0: - elements.append(TextElement(value=" ")) - elements.append( - 
Placeable(expression=VariableReference(id=Identifier(f"v{i}"))) - ) - - pattern = Pattern(elements=tuple(elements)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - args = {f"v{i}": values[i] for i in range(count)} - result, errors = resolver.resolve_message(message, args) - - assert errors == () - assert result == " ".join(values) - - -# ============================================================================ -# Malformed NumberLiteral in Variant Keys -# ============================================================================ - - -class TestVariantMatchingMalformedNumberLiteral: - """NumberLiteral.__post_init__ prevents construction with invalid raw strings. - - Previously, programmatically constructed ASTs could contain invalid - NumberLiteral.raw strings that bypassed the parser. NumberLiteral.__post_init__ - now enforces the invariant at construction time, making the resolver's - former InvalidOperation handler unreachable via normal API usage. 
-    """
-
-    def test_malformed_raw_rejected_at_construction(self) -> None:
-        """NumberLiteral rejects raw string that does not parse as a number."""
-        with pytest.raises(ValueError, match="not a valid number literal"):
-            NumberLiteral(value=42, raw="not_a_number")
-
-    def test_multiple_malformed_raws_all_rejected(self) -> None:
-        """NumberLiteral rejects each invalid raw string at construction time."""
-        for bad_raw in ("invalid1", "also_invalid", "not-a-number", "[1,2,3]"):
-            with pytest.raises(ValueError, match="not a valid number literal"):
-                NumberLiteral(value=1, raw=bad_raw)
-
-
-# ============================================================================
-# Fallback Depth Protection
-# ============================================================================
-
-
-class TestGetFallbackForPlaceableDepthProtection:
-    """Coverage for depth protection in _get_fallback_for_placeable."""
-
-    def _make_resolver(self) -> FluentResolver:
-        return FluentResolver(
-            locale="en",
-            messages={},
-            terms={},
-            function_registry=FunctionRegistry(),
-        )
-
-    def test_fallback_depth_zero_returns_invalid(self) -> None:
-        """Fallback with depth=0 returns FALLBACK_INVALID immediately."""
-        resolver = self._make_resolver()
-        select_expr = SelectExpression(
-            selector=VariableReference(id=Identifier("x")),
-            variants=(
-                Variant(
-                    key=Identifier("other"),
-                    value=Pattern(elements=(TextElement(value="v"),)),
-                    default=True,
-                ),
-            ),
-        )
-
-        result = resolver._get_fallback_for_placeable(select_expr, depth=0)
-
-        assert result == FALLBACK_INVALID
-
-    def test_fallback_negative_depth_returns_invalid(self) -> None:
-        """Fallback with negative depth returns FALLBACK_INVALID."""
-        resolver = self._make_resolver()
-
-        result = resolver._get_fallback_for_placeable(
-            VariableReference(id=Identifier("x")), depth=-1
-        )
-
-        assert result == FALLBACK_INVALID
-
-    @given(depth=st.integers(max_value=0))
-    def test_fallback_non_positive_depth_property(self, depth: int) -> None:
-        """Property:
Any non-positive depth returns FALLBACK_INVALID immediately.""" - event(f"depth={depth}") - resolver = self._make_resolver() - - result = resolver._get_fallback_for_placeable( - StringLiteral(value="test"), depth=depth - ) - - assert result == FALLBACK_INVALID - - def test_fallback_depth_one_processes_normally(self) -> None: - """Fallback with depth=1 processes expression normally.""" - resolver = self._make_resolver() - - result = resolver._get_fallback_for_placeable( - VariableReference(id=Identifier("count")), depth=1 - ) - - assert result == "{$count}" - - def test_fallback_select_expression_depth_decremented(self) -> None: - """SelectExpression fallback decrements depth for recursive call.""" - resolver = self._make_resolver() - select_expr = SelectExpression( - selector=VariableReference(id=Identifier("count")), - variants=( - Variant( - key=Identifier("x"), - value=Pattern(elements=(TextElement(value="variant"),)), - default=True, - ), - ), - ) - - # depth=1 → outer select processes, recursive selector call uses depth=0 - # which returns FALLBACK_INVALID; result should contain "{???} -> ..." - result = resolver._get_fallback_for_placeable(select_expr, depth=1) - - assert FALLBACK_INVALID in result - assert " -> ..." 
in result - - -# ============================================================================ -# Pattern Loop Expansion Budget -# ============================================================================ - - -class TestPatternLoopEarlyExit: - """Tests for pattern loop early-exit when budget exceeded.""" - - def test_pattern_loop_defensive_check_with_context_over_budget(self) -> None: - """Pattern loop defensive check triggers when total_chars > budget.""" - pattern = Pattern( - elements=( - TextElement(value="A" * 10), - TextElement(value="B" * 10), - ) - ) - message = Message(id=Identifier(name="test"), value=pattern, attributes=()) - registry = FunctionRegistry() - resolver = FluentResolver( - locale="en_US", - messages={"test": message}, - terms={}, - function_registry=registry, - max_expansion_size=50, - ) - - context = ResolutionContext(max_expansion_size=50) - context._total_chars = 60 # Simulate budget already exceeded - - result, errors = resolver.resolve_message(message, args={}, context=context) - - has_budget_error = any( - e.diagnostic is not None - and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED - for e in errors - ) - assert has_budget_error - assert len(result) == 0 or result == "{test}" - - def test_pattern_loop_exits_when_budget_already_exceeded(self) -> None: - """Pattern loop exits early if budget exceeded before next element.""" - pattern = Pattern( - elements=( - TextElement(value="A" * 50), - TextElement(value="B" * 50), - TextElement(value="C" * 50), - ) - ) - message = Message(id=Identifier(name="test"), value=pattern, attributes=()) - registry = FunctionRegistry() - resolver = FluentResolver( - locale="en_US", - messages={"test": message}, - terms={}, - function_registry=registry, - max_expansion_size=75, - ) - - result, errors = resolver.resolve_message(message, args={}) - - assert len(errors) > 0 - has_budget_error = any( - e.diagnostic is not None - and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED - 
for e in errors - ) - assert has_budget_error - assert "C" not in result - - def test_pattern_loop_early_exit_on_boundary(self) -> None: - """Pattern loop exits when total_chars exactly equals budget.""" - pattern = Pattern( - elements=( - TextElement(value="X" * 10), - TextElement(value="Y" * 10), - ) - ) - message = Message(id=Identifier(name="boundary"), value=pattern, attributes=()) - registry = FunctionRegistry() - resolver = FluentResolver( - locale="en_US", - messages={"boundary": message}, - terms={}, - function_registry=registry, - max_expansion_size=10, - ) - - _result, errors = resolver.resolve_message(message, args={}) - - assert len(errors) > 0 - has_budget_error = any( - e.diagnostic is not None - and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED - for e in errors - ) - assert has_budget_error - - @given( - element_count=st.integers(min_value=2, max_value=10), - chars_per_element=st.integers(min_value=5, max_value=20), - ) - @settings(max_examples=50) - def test_pattern_loop_early_exit_property( - self, element_count: int, chars_per_element: int - ) -> None: - """Property: Pattern loop always exits when budget exceeded.""" - event(f"element_count={element_count}") - - elements = tuple( - TextElement(value=f"{chr(65 + i)}" * chars_per_element) - for i in range(element_count) - ) - pattern = Pattern(elements=elements) - message = Message(id=Identifier(name="prop"), value=pattern, attributes=()) - - total_chars = element_count * chars_per_element - budget = total_chars // 2 - - event("budget_scenario=exceeded") - registry = FunctionRegistry() - resolver = FluentResolver( - locale="en_US", - messages={"prop": message}, - terms={}, - function_registry=registry, - max_expansion_size=budget, - ) - - result, errors = resolver.resolve_message(message, args={}) - - has_budget_error = any( - e.diagnostic is not None - and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED - for e in errors - ) - if has_budget_error: - 
event("error_path=early_exit_detected") - assert len(result) < total_chars - event("result_type=partial") - - -# ============================================================================ -# Placeable Expansion Budget Break -# ============================================================================ - - -class TestPlaceableExpansionBudgetBreak: - """Tests for Placeable exception handler break on expansion budget error.""" - - def test_placeable_expansion_budget_breaks_pattern_loop(self) -> None: - """Expansion budget error from Placeable breaks pattern resolution.""" - outer_pattern = Pattern( - elements=( - TextElement(value="Before"), - Placeable( - expression=VariableReference(id=Identifier(name="big_value")) - ), - TextElement(value="After"), # Must not be processed. - ) - ) - outer_message = Message( - id=Identifier(name="outer"), value=outer_pattern, attributes=() - ) - registry = FunctionRegistry() - resolver = FluentResolver( - locale="en_US", - messages={"outer": outer_message}, - terms={}, - function_registry=registry, - max_expansion_size=50, - ) - - result, errors = resolver.resolve_message( - outer_message, args={"big_value": "Z" * 100} - ) - - has_budget_error = any( - e.diagnostic is not None - and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED - for e in errors - ) - assert has_budget_error - assert "After" not in result - - def test_placeable_budget_error_via_select_expression(self) -> None: - """Expansion budget error from SelectExpression in Placeable breaks loop.""" - variants = ( - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern(elements=(TextElement(value="A" * 60),)), - default=True, - ), - ) - select_expr = SelectExpression( - selector=VariableReference(id=Identifier(name="count")), variants=variants - ) - pattern = Pattern( - elements=( - TextElement(value="Start"), - Placeable(expression=select_expr), - TextElement(value="End"), # Must not be processed. 
- ) - ) - message = Message(id=Identifier(name="select"), value=pattern, attributes=()) - registry = FunctionRegistry() - resolver = FluentResolver( - locale="en_US", - messages={"select": message}, - terms={}, - function_registry=registry, - max_expansion_size=40, - ) - - result, errors = resolver.resolve_message(message, args={"count": 1}) - - has_budget_error = any( - e.diagnostic is not None - and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED - for e in errors - ) - assert has_budget_error - assert "End" not in result - - def test_placeable_budget_error_via_function_call(self) -> None: - """Expansion budget error from function result in Placeable breaks loop.""" - def large_output() -> str: - return "LARGE" * 100 - - registry = FunctionRegistry() - registry.register(large_output, ftl_name="BIGFUNC") - - func_call = FunctionReference( - id=Identifier(name="BIGFUNC"), - arguments=CallArguments(positional=(), named=()), - ) - pattern = Pattern( - elements=( - TextElement(value="Prefix"), - Placeable(expression=func_call), - TextElement(value="Suffix"), # Must not be processed. 
- ) - ) - message = Message(id=Identifier(name="func"), value=pattern, attributes=()) - resolver = FluentResolver( - locale="en_US", - messages={"func": message}, - terms={}, - function_registry=registry, - max_expansion_size=100, - ) - - result, errors = resolver.resolve_message(message, args={}) - - has_budget_error = any( - e.diagnostic is not None - and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED - for e in errors - ) - assert has_budget_error - assert "Suffix" not in result - - @given( - variant_size=st.integers(min_value=50, max_value=200), - budget=st.integers(min_value=10, max_value=100), - ) - @settings(max_examples=30) - def test_placeable_budget_break_property( - self, variant_size: int, budget: int - ) -> None: - """Property: Placeable budget errors always break pattern loop.""" - event(f"variant_size={variant_size}") - event(f"budget={budget}") - - if variant_size <= budget: - event("skip=variant_fits_budget") - return - - variants = ( - Variant( - key=Identifier(name="key"), - value=Pattern(elements=(TextElement(value="X" * variant_size),)), - default=True, - ), - ) - select = SelectExpression( - selector=VariableReference(id=Identifier(name="var")), variants=variants - ) - pattern = Pattern( - elements=( - Placeable(expression=select), - TextElement(value="Marker"), # Must not appear. 
- ) - ) - message = Message(id=Identifier(name="test"), value=pattern, attributes=()) - registry = FunctionRegistry() - resolver = FluentResolver( - locale="en_US", - messages={"test": message}, - terms={}, - function_registry=registry, - max_expansion_size=budget, - ) - - result, errors = resolver.resolve_message(message, args={"var": "key"}) - - has_budget_error = any( - e.diagnostic is not None - and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED - for e in errors - ) - if has_budget_error: - event("error_path=budget_break") - assert "Marker" not in result - event("result_type=partial") - - -class TestExpansionBudgetIntegration: - """Integration tests for expansion budget across resolver components.""" - - def test_expansion_budget_with_isolating_marks(self) -> None: - """Expansion budget accounts for Unicode isolating marks.""" - pattern = Pattern( - elements=( - Placeable(expression=VariableReference(id=Identifier(name="v1"))), - Placeable(expression=VariableReference(id=Identifier(name="v2"))), - ) - ) - message = Message(id=Identifier(name="iso"), value=pattern, attributes=()) - registry = FunctionRegistry() - resolver = FluentResolver( - locale="en_US", - messages={"iso": message}, - terms={}, - function_registry=registry, - use_isolating=True, - max_expansion_size=15, - ) - - # Each variable: 5 chars content + 2 chars marks (FSI + PDI) = 7 chars - # Total: 14 chars (just under budget of 15) - _result, errors = resolver.resolve_message( - message, args={"v1": "AAAAA", "v2": "BBBBB"} - ) - assert len(errors) == 0 - - # 8-char values: 10 + 10 = 20 > 15 - _result2, errors2 = resolver.resolve_message( - message, - args={"v1": "AAAAAAAA", "v2": "BBBBBBBB"}, - ) - has_budget_error = any( - e.diagnostic is not None - and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED - for e in errors2 - ) - assert has_budget_error - - def test_expansion_budget_error_diagnostic_includes_counts(self) -> None: - """Expansion budget error diagnostic 
includes actual and limit values.""" - pattern = Pattern(elements=(TextElement(value="X" * 100),)) - message = Message(id=Identifier(name="err"), value=pattern, attributes=()) - registry = FunctionRegistry() - resolver = FluentResolver( - locale="en_US", - messages={"err": message}, - terms={}, - function_registry=registry, - max_expansion_size=50, - ) - - _result, errors = resolver.resolve_message(message, args={}) - - assert len(errors) > 0 - budget_error = next( - e - for e in errors - if e.diagnostic and e.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED - ) - assert budget_error.diagnostic is not None - diagnostic_str = str(budget_error.diagnostic) - assert "50" in diagnostic_str - assert "100" in diagnostic_str or "exceeded" in diagnostic_str.lower() +"""Aggregated runtime resolver depth cycles test surface.""" + +from tests.runtime_resolver_depth_cycles_cases.cycle_detection import * # noqa: F403 - re-export split test surface +from tests.runtime_resolver_depth_cycles_cases.fallback_depth_protection import * # noqa: F403 - re-export split test surface +from tests.runtime_resolver_depth_cycles_cases.global_depth_guard_edge_cases import * # noqa: F403 - re-export split test surface +from tests.runtime_resolver_depth_cycles_cases.malformed_number_literal_in_variant_keys import * # noqa: F403 - re-export split test surface +from tests.runtime_resolver_depth_cycles_cases.max_depth_enforcement import * # noqa: F403 - re-export split test surface +from tests.runtime_resolver_depth_cycles_cases.multi_placeable_pattern_resolution import * # noqa: F403 - re-export split test surface +from tests.runtime_resolver_depth_cycles_cases.pattern_loop_expansion_budget import * # noqa: F403 - re-export split test surface +from tests.runtime_resolver_depth_cycles_cases.placeable_expansion_budget_break import * # noqa: F403 - re-export split test surface +from tests.runtime_resolver_depth_cycles_cases.resolution_context_tests import * # noqa: F403 - re-export split 
test surface
+from tests.runtime_resolver_depth_cycles_cases.select_expression_placeable_mixed_depth_limits import * # noqa: F403 - re-export split test surface
diff --git a/tests/test_runtime_resolver_selection.py b/tests/test_runtime_resolver_selection.py
index bc86db7f..60b077e5 100644
--- a/tests/test_runtime_resolver_selection.py
+++ b/tests/test_runtime_resolver_selection.py
@@ -1,1570 +1,6 @@
-"""SelectExpression variant matching, pattern loop coverage, and numeric selector tests.
+"""Aggregated runtime resolver selection test surface."""

-Consolidates:
-- test_resolver_edge_cases.py: TestSelectExpressionEdgeCases, TestResolverErrorPaths
-- test_resolver_expression_depth_and_select.py: TestSelectExpressionEdgeCases
-- test_resolver_loop_and_numeric_selector.py: all
-- test_resolver_pattern_and_variant_matching.py: all
-- test_resolver_explicit_branches.py: all
-- test_resolver_placeable_and_numeric.py: TestVariantNumericMatching,
-  TestFallbackVariantNoVariants, TestSelectExpressionFallbackPaths,
-  TestNumericVariantEdgeCases, TestResolverFluentNumberVariantMatching
-- test_resolver_placeable_error_and_literal.py: TestNumberLiteralNonMatchingValue
-- test_resolver_fallback_and_terms.py: TestFormatValueComprehensive,
-  TestTextElementBranch, TestNumberLiteralVariantMatching
-- test_resolver_term_and_pattern_branches.py: TestNumberLiteralSelectorCoverage
-"""
-
-from __future__ import annotations
-
-from datetime import UTC, datetime
-from decimal import Decimal
-
-import pytest
-from hypothesis import event, given
-from hypothesis import strategies as st
-
-from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError
-from ftllexengine.runtime.bundle import FluentBundle
-from ftllexengine.runtime.function_bridge import FunctionRegistry
-from ftllexengine.runtime.resolution_context import ResolutionContext
-from ftllexengine.runtime.resolver import FluentResolver
-from ftllexengine.syntax.ast import (
-    CallArguments,
-    FunctionReference,
-
Identifier, - Message, - NumberLiteral, - Pattern, - Placeable, - SelectExpression, - StringLiteral, - TextElement, - VariableReference, - Variant, -) - -# ============================================================================ -# PATTERN LOOP CONTINUATION -# ============================================================================ - - -class TestPatternLoopContinuation: - """Coverage for pattern loop continuation (line 390->386).""" - - def test_empty_pattern_no_elements(self) -> None: - """Pattern with no elements exits loop immediately.""" - pattern = Pattern(elements=()) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - ) - - result, errors = resolver.resolve_message(message, {}) - assert result == "" - assert errors == () - - def test_pattern_text_then_placeable_then_text(self) -> None: - """Pattern with alternating Text/Placeable/Text elements.""" - pattern = Pattern( - elements=( - TextElement(value="Start "), - Placeable(expression=VariableReference(id=Identifier("var"))), - TextElement(value=" End"), - ) - ) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, {"var": "X"}) - assert result == "Start X End" - assert errors == () - - def test_pattern_only_text_elements(self) -> None: - """Pattern with only TextElements (no Placeables).""" - pattern = Pattern( - elements=( - TextElement(value="First "), - TextElement(value="Second "), - TextElement(value="Third"), - ) - ) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - ) - - 
result, errors = resolver.resolve_message(message, {}) - assert result == "First Second Third" - assert errors == () - - -class TestPatternResolutionBranches: - """Test pattern resolution loop continuation branches.""" - - def test_pattern_with_multiple_text_elements_covers_loop_continuation(self) -> None: - """Pattern with TextElement followed by another TextElement covers 404->400.""" - pattern = Pattern( - elements=( - TextElement(value="Hello "), - TextElement(value="World"), - ) - ) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, {}) - assert result == "Hello World" - assert errors == () - - def test_pattern_text_then_placeable_covers_loop_continuation(self) -> None: - """Pattern with TextElement followed by Placeable covers 404->400.""" - pattern = Pattern( - elements=( - TextElement(value="Value: "), - Placeable(expression=VariableReference(id=Identifier("x"))), - ) - ) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, {"x": "42"}) - assert "Value: " in result - assert "42" in result - assert errors == () - - def test_pattern_three_elements_ensures_multiple_loop_iterations(self) -> None: - """Pattern with three elements ensures loop continuation branch is hit.""" - ftl = """msg = Start { $var } End""" - bundle = FluentBundle("en", use_isolating=False) - bundle.add_resource(ftl) - - result, _ = bundle.format_pattern("msg", {"var": "middle"}) - assert result == "Start middle End" - - -class TestMatchCaseBranchCoverage: - """Test match/case control flow branches in resolver.""" - - def 
test_placeable_followed_by_text_in_pattern(self) -> None: - """Pattern with Placeable followed by TextElement tests 404->400 branch.""" - ftl = """msg = { $x } text""" - bundle = FluentBundle("en", use_isolating=False) - bundle.add_resource(ftl) - - result, _ = bundle.format_pattern("msg", {"x": "value"}) - assert result == "value text" - - def test_multiple_placeables_in_pattern(self) -> None: - """Pattern with multiple Placeables ensures loop continuation.""" - ftl = """msg = { $a }{ $b }""" - bundle = FluentBundle("en", use_isolating=False) - bundle.add_resource(ftl) - - result, _ = bundle.format_pattern("msg", {"a": "A", "b": "B"}) - assert result == "AB" - - def test_select_with_number_literal_then_identifier_variant(self) -> None: - """SelectExpression with NumberLiteral followed by Identifier variant covers 634->629.""" - ftl = """ -msg = { $val -> - [1] one - [2] two - *[other] default -} -""" - bundle = FluentBundle("en", use_isolating=False) - bundle.add_resource(ftl) - - result, _ = bundle.format_pattern("msg", {"val": "other"}) - assert result == "default" - - def test_select_number_literal_no_match_continues_to_next(self) -> None: - """SelectExpression where first NumberLiteral doesn't match, second does.""" - ftl = """ -msg = { $count -> - [10] ten - [20] twenty - [30] thirty - *[other] default -} -""" - bundle = FluentBundle("en", use_isolating=False) - bundle.add_resource(ftl) - - result, _ = bundle.format_pattern("msg", {"count": 20}) - assert result == "twenty" - - def test_select_with_isolating_enabled_exercises_placeable_branch(self) -> None: - """Pattern with use_isolating=True covers Placeable branch with isolation.""" - ftl = """msg = Prefix { $val } Suffix""" - bundle = FluentBundle("en", use_isolating=True) - bundle.add_resource(ftl) - - result, _ = bundle.format_pattern("msg", {"val": "middle"}) - assert "Prefix" in result - assert "middle" in result - assert "Suffix" in result - - -class TestTextElementBranch: - """Test TextElement branch 
in pattern resolution.""" - - def test_pattern_with_only_text_no_placeables(self) -> None: - """Pattern with only TextElement, no Placeable (line 286->282).""" - bundle = FluentBundle("en_US") - bundle.add_resource("simple = This is plain text with no variables") - - result, errors = bundle.format_pattern("simple") - assert result == "This is plain text with no variables" - assert errors == () - - -# ============================================================================ -# SELECT EXPRESSION EDGE CASES -# ============================================================================ - - -class TestSelectExpressionEdgeCases: - """Test edge cases in select expression resolution.""" - - def test_select_with_no_matching_variant_uses_default(self) -> None: - """Select with no match uses default variant.""" - ftl = """ -test = { $value -> - [one] One - *[other] Other -} -""" - bundle = FluentBundle("en", use_isolating=False) - bundle.add_resource(ftl) - - result, _ = bundle.format_pattern("test", {"value": "unknown"}) - assert "Other" in result - - def test_select_with_number_tries_plural_category(self) -> None: - """Select with number value tries plural category matching.""" - ftl = """ -test = { $count -> - [one] One item - *[other] Many items -} -""" - bundle = FluentBundle("en", use_isolating=False) - bundle.add_resource(ftl) - - result, _ = bundle.format_pattern("test", {"count": 1}) - assert "One item" in result - - result, _ = bundle.format_pattern("test", {"count": 5}) - assert "Many items" in result - - def test_select_with_no_default_raises_at_construction(self) -> None: - """SelectExpression with no default variant raises ValueError at construction.""" - with pytest.raises(ValueError, match="exactly one default variant"): - SelectExpression( - selector=VariableReference(id=Identifier(name="x")), - variants=( - Variant( - key=Identifier(name="a"), - value=Pattern(elements=(TextElement(value="A"),)), - default=False, - ), - Variant( - 
key=Identifier(name="b"), - value=Pattern(elements=(TextElement(value="B"),)), - default=False, - ), - ), - ) - - def test_select_with_empty_variants_raises_at_construction(self) -> None: - """SelectExpression with no variants raises ValueError at construction.""" - with pytest.raises(ValueError, match="at least one variant"): - SelectExpression( - selector=VariableReference(id=Identifier(name="x")), - variants=(), - ) - - def test_number_literal_rejects_invalid_raw(self) -> None: - """NumberLiteral.__post_init__ prevents construction with invalid raw strings. - - Previously, the resolver handled programmatically constructed ASTs where - NumberLiteral.raw was unparseable as Decimal. NumberLiteral now enforces - the invariant at construction time, making such ASTs impossible via the - normal API. - """ - with pytest.raises(ValueError, match="not a valid number literal"): - NumberLiteral(value=Decimal("0.0"), raw="invalid") - - def test_deeply_nested_select_expression_fallback(self) -> None: - """Deeply nested SelectExpression in fallback generation doesn't overflow.""" - from ftllexengine.runtime.functions import ( - create_default_registry, - ) - from ftllexengine.syntax.ast import Expression - - nested_select: Expression = VariableReference(id=Identifier(name="missing")) - for _ in range(100): - nested_select = SelectExpression( - selector=nested_select, # type: ignore[arg-type] - variants=( - Variant( - key=Identifier(name="key"), - value=Pattern(elements=(TextElement(value="Value"),)), - default=True, - ), - ), - ) - - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(Placeable(expression=nested_select),)), - attributes=(), - ) - - resolver = FluentResolver( - locale="en", - messages={"test": msg}, - terms={}, - function_registry=create_default_registry(), - use_isolating=False, - ) - - result, _ = resolver.resolve_message(msg, {}) - assert isinstance(result, str) - assert len(result) > 0 - - -class TestSelectVariantBranchCoverage: - 
"""Direct resolver internal calls for select expression branch coverage.""" - - def test_select_variant_loop_with_no_match_on_number_literal(self) -> None: - """Select expression where no NumberLiteral matches continues loop to default.""" - resolver = FluentResolver( - locale="en", - messages={}, - terms={}, - function_registry=FunctionRegistry(), - ) - - selector = NumberLiteral(value=5, raw="5") - variants = ( - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern(elements=()), - default=False, - ), - Variant( - key=NumberLiteral(value=2, raw="2"), - value=Pattern(elements=()), - default=False, - ), - Variant( - key=NumberLiteral(value=3, raw="3"), - value=Pattern(elements=()), - default=True, - ), - ) - - select_expr = SelectExpression(selector=selector, variants=variants) - context = ResolutionContext() - result = resolver._resolve_select_expression(select_expr, {}, [], context) - assert result == "" - - def test_pattern_elements_loop_with_text_only(self) -> None: - """Pattern resolution with only TextElement tests loop continuation.""" - resolver = FluentResolver( - locale="en", - messages={}, - terms={}, - function_registry=FunctionRegistry(), - ) - - pattern = Pattern( - elements=( - TextElement(value="Hello "), - TextElement(value="World"), - TextElement(value="!"), - ) - ) - - context = ResolutionContext() - result = resolver._resolve_pattern(pattern, {}, [], context) - assert result == "Hello World!" 
- - -# ============================================================================ -# NUMERIC VARIANT MATCHING -# ============================================================================ - - -class TestNumberLiteralVariantWithNonNumericSelector: - """Coverage for NumberLiteral variant key with non-numeric selector (line 616->611).""" - - def test_number_literal_variant_with_string_selector(self) -> None: - """SelectExpression with NumberLiteral variants but string selector falls to default.""" - select_expr = SelectExpression( - selector=VariableReference(id=Identifier("val")), - variants=( - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern(elements=(TextElement(value="one"),)), - default=False, - ), - Variant( - key=NumberLiteral(value=2, raw="2"), - value=Pattern(elements=(TextElement(value="two"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement(value="fallback"),)), - default=True, - ), - ), - ) - - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, {"val": "not_a_number"}) - assert result == "fallback" - assert errors == () - - def test_number_literal_variant_with_none_selector(self) -> None: - """SelectExpression with NumberLiteral variant but None selector falls to default.""" - select_expr = SelectExpression( - selector=VariableReference(id=Identifier("val")), - variants=( - Variant( - key=NumberLiteral(value=42, raw="42"), - value=Pattern(elements=(TextElement(value="forty-two"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement(value="default"),)), - default=True, - ), - ), - ) - - pattern = Pattern(elements=(Placeable(expression=select_expr),)) 
- message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, {"val": None}) - assert result == "default" - assert errors == () - - def test_number_literal_variant_with_bool_selector(self) -> None: - """Bool selector matches identifier variant, not NumberLiteral. - - Booleans are excluded from numeric matching (even though isinstance(True, int)) - because they should match [true]/[false] identifier variants, not number literals. - """ - select_expr = SelectExpression( - selector=VariableReference(id=Identifier("val")), - variants=( - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern(elements=(TextElement(value="number_one"),)), - default=False, - ), - Variant( - key=Identifier("true"), - value=Pattern(elements=(TextElement(value="bool_true"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement(value="fallback"),)), - default=True, - ), - ), - ) - - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, {"val": True}) - assert result == "bool_true" - assert errors == () - - def test_number_literal_variants_with_date_selector(self) -> None: - """SelectExpression with NumberLiteral variants but date selector falls to default.""" - from datetime import date - - select_expr = SelectExpression( - selector=VariableReference(id=Identifier("val")), - variants=( - Variant( - key=NumberLiteral(value=3, raw="3"), - value=Pattern(elements=(TextElement(value="three"),)), - default=False, - ), - Variant( - 
key=Identifier("other"), - value=Pattern(elements=(TextElement(value="not_numeric"),)), - default=True, - ), - ), - ) - - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, {"val": date(2024, 1, 1)}) - assert result == "not_numeric" - assert errors == () - - -class TestVariantMatchingBranches: - """Test variant matching loop continuation branches.""" - - def test_select_with_non_matching_number_literals_covers_loop_continuation( - self, - ) -> None: - """SelectExpression with non-matching NumberLiterals covers 634->629.""" - select_expr = SelectExpression( - selector=VariableReference(id=Identifier("num")), - variants=( - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern(elements=(TextElement(value="one"),)), - default=False, - ), - Variant( - key=NumberLiteral(value=2, raw="2"), - value=Pattern(elements=(TextElement(value="two"),)), - default=False, - ), - Variant( - key=NumberLiteral(value=3, raw="3"), - value=Pattern(elements=(TextElement(value="three"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement(value="default"),)), - default=True, - ), - ), - ) - - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, {"num": 99}) - assert result == "default" - assert errors == () - - def test_select_with_string_matching_identifier_after_number_literals(self) -> None: - """String selector skips NumberLiteral variants to match 
Identifier (634->629).""" - select_expr = SelectExpression( - selector=VariableReference(id=Identifier("status")), - variants=( - Variant( - key=NumberLiteral(value=100, raw="100"), - value=Pattern(elements=(TextElement(value="hundred"),)), - default=False, - ), - Variant( - key=NumberLiteral(value=200, raw="200"), - value=Pattern(elements=(TextElement(value="two_hundred"),)), - default=False, - ), - Variant( - key=Identifier("active"), - value=Pattern(elements=(TextElement(value="Active"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement(value="Other"),)), - default=True, - ), - ), - ) - - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, {"status": "active"}) - assert result == "Active" - assert errors == () - - def test_select_with_bool_selector_skips_number_literals(self) -> None: - """Bool selector skips NumberLiterals, matches Identifier (634->629).""" - select_expr = SelectExpression( - selector=VariableReference(id=Identifier("flag")), - variants=( - Variant( - key=NumberLiteral(value=0, raw="0"), - value=Pattern(elements=(TextElement(value="zero"),)), - default=False, - ), - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern(elements=(TextElement(value="one"),)), - default=False, - ), - Variant( - key=Identifier("true"), - value=Pattern(elements=(TextElement(value="yes"),)), - default=False, - ), - Variant( - key=Identifier("false"), - value=Pattern(elements=(TextElement(value="no"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement(value="unknown"),)), - default=True, - ), - ), - ) - - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - 
message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, {"flag": True}) - assert result == "yes" - assert errors == () - - -class TestVariantNumericMatching: - """Numeric variant matching (line 479->474 coverage).""" - - def test_exact_number_literal_match(self) -> None: - """Exact number match with NumberLiteral variant key.""" - selector = VariableReference(id=Identifier("count")) - variants = ( - Variant( - key=NumberLiteral(value=0, raw="0"), - value=Pattern(elements=(TextElement(value="zero items"),)), - default=False, - ), - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern(elements=(TextElement(value="one item"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement(value="many items"),)), - default=True, - ), - ) - select_expr = SelectExpression(selector=selector, variants=variants) - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - ) - - result, errors = resolver.resolve_message(message, {"count": 0}) - assert not errors - assert "zero items" in result - - result, errors = resolver.resolve_message(message, {"count": 1}) - assert not errors - assert "one item" in result - - def test_decimal_exact_match_in_variant(self) -> None: - """Decimal value matches NumberLiteral variant key.""" - selector = VariableReference(id=Identifier("amount")) - variants = ( - Variant( - key=NumberLiteral(value=Decimal("1.5"), raw="1.5"), - value=Pattern(elements=(TextElement(value="exact match"),)), - default=False, - ), - Variant( - key=Identifier("other"), - 
value=Pattern(elements=(TextElement(value="default"),)), - default=True, - ), - ) - select_expr = SelectExpression(selector=selector, variants=variants) - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - ) - - result, errors = resolver.resolve_message(message, {"amount": Decimal("1.5")}) - assert not errors - assert "exact match" in result - - def test_float_exact_match_in_variant(self) -> None: - """Float value matches NumberLiteral variant key.""" - selector = VariableReference(id=Identifier("price")) - variants = ( - Variant( - key=NumberLiteral(value=Decimal("9.99"), raw="9.99"), - value=Pattern(elements=(TextElement(value="special price"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement(value="regular price"),)), - default=True, - ), - ) - select_expr = SelectExpression(selector=selector, variants=variants) - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - ) - - result, errors = resolver.resolve_message(message, {"price": Decimal("9.99")}) - assert not errors - assert "special price" in result - - @given(number=st.integers(min_value=-100, max_value=100)) - def test_integer_exact_matching_property(self, number: int) -> None: - """Property: Integer selectors match NumberLiteral variants exactly.""" - event(f"number={number}") - selector = VariableReference(id=Identifier("n")) - variants = ( - Variant( - key=NumberLiteral(value=number, raw=str(number)), - value=Pattern(elements=(TextElement(value="matched"),)), - default=False, - ), - Variant( - key=Identifier("other"), - 
value=Pattern(elements=(TextElement(value="not matched"),)), - default=True, - ), - ) - select_expr = SelectExpression(selector=selector, variants=variants) - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - ) - - result, errors = resolver.resolve_message(message, {"n": number}) - assert not errors - assert "matched" in result - - -class TestNumericVariantEdgeCases: - """Edge cases for numeric variant matching.""" - - def test_boolean_does_not_match_number_variant(self) -> None: - """Boolean values do not match numeric variants (isinstance guard).""" - selector = VariableReference(id=Identifier("flag")) - variants = ( - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern(elements=(TextElement(value="numeric one"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement(value="default"),)), - default=True, - ), - ) - select_expr = SelectExpression(selector=selector, variants=variants) - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - ) - - result, errors = resolver.resolve_message(message, {"flag": True}) - assert not errors - assert "default" in result - - def test_none_selector_uses_default(self) -> None: - """None selector value falls through to default.""" - selector = VariableReference(id=Identifier("value")) - variants = ( - Variant( - key=Identifier("none"), - value=Pattern(elements=(TextElement(value="none variant"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement(value="default variant"),)), - default=True, - ), - ) - 
select_expr = SelectExpression(selector=selector, variants=variants) - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - ) - - result, errors = resolver.resolve_message(message, {"value": None}) - assert not errors - assert "default variant" in result - - @given( - decimal_str=st.decimals( - min_value=Decimal("-100.00"), - max_value=Decimal("100.00"), - allow_nan=False, - allow_infinity=False, - places=2, - ) - ) - def test_decimal_variant_matching_property(self, decimal_str: Decimal) -> None: - """Property: Decimal values match exactly when variant key matches.""" - sign = "negative" if decimal_str.is_signed() else "positive" - event(f"decimal_sign={sign}") - selector = VariableReference(id=Identifier("amount")) - str_repr = str(decimal_str) - variants = ( - Variant( - key=NumberLiteral(value=decimal_str, raw=str_repr), - value=Pattern(elements=(TextElement(value="exact"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement(value="default"),)), - default=True, - ), - ) - select_expr = SelectExpression(selector=selector, variants=variants) - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - ) - - result, errors = resolver.resolve_message(message, {"amount": decimal_str}) - assert not errors - assert "exact" in result - - -class TestNumberLiteralNonMatchingValue: - """Coverage for NumberLiteral with non-matching value (line 616->611).""" - - def test_number_literal_variants_first_no_match_second_matches(self) -> None: - """Multiple NumberLiteral variants where first doesn't match, second 
does.""" - select_expr = SelectExpression( - selector=VariableReference(id=Identifier("count")), - variants=( - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern(elements=(TextElement(value="one"),)), - default=False, - ), - Variant( - key=NumberLiteral(value=2, raw="2"), - value=Pattern(elements=(TextElement(value="two"),)), - default=False, - ), - Variant( - key=NumberLiteral(value=3, raw="3"), - value=Pattern(elements=(TextElement(value="three"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement(value="fallback"),)), - default=True, - ), - ), - ) - - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, {"count": 2}) - assert result == "two" - assert errors == () - - def test_number_literal_variants_all_no_match_uses_default(self) -> None: - """NumberLiteral variants all fail to match, use default.""" - select_expr = SelectExpression( - selector=VariableReference(id=Identifier("count")), - variants=( - Variant( - key=NumberLiteral(value=10, raw="10"), - value=Pattern(elements=(TextElement(value="ten"),)), - default=False, - ), - Variant( - key=NumberLiteral(value=20, raw="20"), - value=Pattern(elements=(TextElement(value="twenty"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement(value="default"),)), - default=True, - ), - ), - ) - - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = 
resolver.resolve_message(message, {"count": 5}) - assert result == "default" - assert errors == () - - def test_number_literal_with_decimal_no_match(self) -> None: - """NumberLiteral variants with Decimal selector that doesn't match.""" - select_expr = SelectExpression( - selector=VariableReference(id=Identifier("amount")), - variants=( - Variant( - key=NumberLiteral(value=100, raw="100"), - value=Pattern(elements=(TextElement(value="hundred"),)), - default=False, - ), - Variant( - key=NumberLiteral(value=200, raw="200"), - value=Pattern(elements=(TextElement(value="two_hundred"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement(value="other_amount"),)), - default=True, - ), - ), - ) - - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, {"amount": Decimal("150.50")}) - assert result == "other_amount" - assert errors == () - - def test_number_literal_decimal_no_exact_match(self) -> None: - """NumberLiteral variants with Decimal that doesn't exactly match.""" - select_expr = SelectExpression( - selector=VariableReference(id=Identifier("val")), - variants=( - Variant( - key=NumberLiteral(value=Decimal("1.0"), raw="1.0"), - value=Pattern(elements=(TextElement(value="one_point_oh"),)), - default=False, - ), - Variant( - key=NumberLiteral(value=Decimal("2.5"), raw="2.5"), - value=Pattern(elements=(TextElement(value="two_point_five"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement(value="other_decimal"),)), - default=True, - ), - ), - ) - - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - 
- resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, {"val": Decimal("3.7")}) - assert result == "other_decimal" - assert errors == () - - -class TestNumberLiteralSelectorCoverage: - """Test NumberLiteral selector branch in _find_exact_variant (branch 400->395).""" - - def test_number_literal_selector_exact_match(self) -> None: - """Branch 400->395 - Number literal variant exact matching.""" - bundle = FluentBundle("en_US", use_isolating=False) - - bundle.add_resource( - """ -items = { $count -> - [0] No items - [1] One item - [42] The answer - *[other] { $count } items -} -""" - ) - - result, _ = bundle.format_pattern("items", {"count": 0}) - assert "No items" in result - - result, _ = bundle.format_pattern("items", {"count": 1}) - assert "One item" in result - - result, _ = bundle.format_pattern("items", {"count": 42}) - assert "The answer" in result - - def test_number_literal_selector_no_match(self) -> None: - """Branch 400->395 - Number literal no match falls through to default.""" - bundle = FluentBundle("en_US", use_isolating=False) - - bundle.add_resource( - """ -level = { $num -> - [1] Level 1 - [2] Level 2 - *[other] Level unknown -} -""" - ) - - result, _ = bundle.format_pattern("level", {"num": 99}) - assert "Level unknown" in result - - def test_number_literal_with_float_selector(self) -> None: - """Branch 400->395 - Float selector matching number literals.""" - bundle = FluentBundle("en_US", use_isolating=False) - - bundle.add_resource( - """ -rating = { $stars -> - [1] Poor - [2] Fair - [3] Good - [4] Great - [5] Excellent - *[other] Unrated -} -""" - ) - - result, _ = bundle.format_pattern("rating", {"stars": Decimal(5)}) - assert "Excellent" in result - - result, _ = bundle.format_pattern("rating", {"stars": Decimal("3.5")}) - assert "Unrated" in result - - def 
test_number_literal_match_second_key(self) -> None: - """Branch 400->395 - Number literal match on second+ key (loop continuation).""" - bundle = FluentBundle("en_US", use_isolating=False) - - bundle.add_resource( - """ -score = { $points -> - [10] Ten points - [20] Twenty points - [30] Thirty points - *[other] Unknown -} -""" - ) - - result, _ = bundle.format_pattern("score", {"points": 20}) - assert "Twenty points" in result - - result, _ = bundle.format_pattern("score", {"points": 30}) - assert "Thirty points" in result - - -class TestNumberLiteralVariantMatching: - """Test exact number literal matching in select expressions.""" - - def test_exact_number_literal_match_with_integer(self) -> None: - """Exact match with integer NumberLiteral (line 479).""" - bundle = FluentBundle("en_US", use_isolating=False) - bundle.add_resource( - """ -msg = { $count -> - [0] zero items - [1] one item - [42] exactly forty-two - *[other] many items -} -""" - ) - - result, errors = bundle.format_pattern("msg", {"count": 42}) - assert result == "exactly forty-two" - assert errors == () - - def test_exact_number_literal_match_with_decimal_pi(self) -> None: - """Exact match with Decimal NumberLiteral value (pi example).""" - bundle = FluentBundle("en_US", use_isolating=False) - bundle.add_resource( - """ -msg = { $value -> - [3.14] pi - [2.71] euler - *[other] unknown -} -""" - ) - - result, errors = bundle.format_pattern("msg", {"value": Decimal("3.14")}) - assert result == "pi" - assert errors == () - - def test_exact_number_literal_match_with_decimal(self) -> None: - """Exact match with Decimal NumberLiteral (financial value precision).""" - bundle = FluentBundle("en_US", use_isolating=False) - bundle.add_resource( - """ -msg = { $amount -> - [99.99] special price - *[other] regular price -} -""" - ) - - result, errors = bundle.format_pattern("msg", {"amount": Decimal("99.99")}) - assert result == "special price" - assert errors == () - - -# 
============================================================================ -# SELECT EXPRESSION CONSTRUCTION AND FALLBACK -# ============================================================================ - - -class TestFallbackVariantNoVariants: - """Empty variant list and missing default error paths (lines 645-648).""" - - def test_select_expression_with_no_variants_rejected_at_construction(self) -> None: - """SelectExpression with empty variants is rejected by __post_init__.""" - selector = VariableReference(id=Identifier("count")) - with pytest.raises(ValueError, match="requires at least one variant"): - SelectExpression(selector=selector, variants=()) - - def test_select_expression_without_default_rejected_at_construction(self) -> None: - """SelectExpression without a default variant is rejected by __post_init__.""" - selector = VariableReference(id=Identifier("count")) - variant = Variant( - key=Identifier("one"), - value=Pattern(elements=(TextElement(value="one"),)), - default=False, - ) - with pytest.raises(ValueError, match="exactly one default variant"): - SelectExpression(selector=selector, variants=(variant,)) - - -class TestSelectExpressionFallbackPaths: - """Test fallback variant selection logic.""" - - def test_selector_error_uses_default_variant(self) -> None: - """When selector fails due to missing variable, uses default variant.""" - selector = VariableReference(id=Identifier("missing")) - variants = ( - Variant( - key=Identifier("one"), - value=Pattern(elements=(TextElement(value="variant one"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement(value="default variant"),)), - default=True, - ), - ) - select_expr = SelectExpression(selector=selector, variants=variants) - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - 
function_registry=FunctionRegistry(), - ) - - result, errors = resolver.resolve_message(message, {}) - assert "default variant" in result - assert len(errors) > 0 - - def test_selector_error_uses_default_variant_fallback(self) -> None: - """When selector fails, the marked default variant is selected.""" - selector = VariableReference(id=Identifier("missing")) - variants = ( - Variant( - key=Identifier("first"), - value=Pattern(elements=(TextElement(value="first variant"),)), - default=False, - ), - Variant( - key=Identifier("second"), - value=Pattern(elements=(TextElement(value="default variant"),)), - default=True, - ), - ) - select_expr = SelectExpression(selector=selector, variants=variants) - pattern = Pattern(elements=(Placeable(expression=select_expr),)) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=FunctionRegistry(), - ) - - result, _errors = resolver.resolve_message(message, {}) - assert "default variant" in result - - -# ============================================================================ -# FLUENT NUMBER VARIANT MATCHING -# ============================================================================ - - -class TestResolverFluentNumberVariantMatching: - """Test FluentNumber handling in variant selection.""" - - def test_fluent_number_matches_numeric_variant_key(self) -> None: - """FluentNumber value extraction for numeric variant matching (line 502).""" - bundle = FluentBundle("en", use_isolating=False) - bundle.add_resource( - """ -msg = { NUMBER($count) -> - [1000] Exactly one thousand - *[other] Other value -} -""" - ) - - result, errors = bundle.format_pattern("msg", {"count": 1000}) - assert len(errors) == 0 - assert "Exactly one thousand" in result - - def test_fluent_number_plural_category_selection(self) -> None: - """FluentNumber value extraction for CLDR plural matching (line 608).""" - bundle = 
FluentBundle("en", use_isolating=False) - bundle.add_resource( - """ -msg = { NUMBER($count) -> - [one] One item - *[other] Many items -} -""" - ) - - result, errors = bundle.format_pattern("msg", {"count": 1}) - assert len(errors) == 0 - assert "One item" in result - - def test_fluent_number_with_formatted_display(self) -> None: - """FluentNumber preserves numeric value for matching while showing formatted string.""" - bundle = FluentBundle("en", use_isolating=False) - bundle.add_resource( - """ -msg = { NUMBER($amount, minimumFractionDigits: 2) -> - [1000] Exactly one thousand - *[other] Other -} -""" - ) - - result, errors = bundle.format_pattern("msg", {"amount": 1000}) - assert len(errors) == 0 - assert "Exactly one thousand" in result - - -# ============================================================================ -# FORMAT VALUE COMPREHENSIVE -# ============================================================================ - - -class TestFormatValueComprehensive: - """Test _format_value with all FluentValue types.""" - - def _make_resolver(self) -> FluentResolver: - return FluentResolver( - locale="en_US", - messages={}, - terms={}, - function_registry=FunctionRegistry(), - use_isolating=False, - ) - - def test_format_value_with_string(self) -> None: - """Verify _format_value handles strings.""" - resolver = self._make_resolver() - assert resolver._format_value("test") == "test" - assert resolver._format_value("") == "" - - def test_format_value_with_bool_true(self) -> None: - """Verify _format_value handles True as 'true'.""" - assert self._make_resolver()._format_value(True) == "true" - - def test_format_value_with_bool_false(self) -> None: - """Verify _format_value handles False as 'false'.""" - assert self._make_resolver()._format_value(False) == "false" - - def test_format_value_with_int(self) -> None: - """Verify _format_value handles integers.""" - resolver = self._make_resolver() - assert resolver._format_value(42) == "42" - assert 
resolver._format_value(0) == "0" - assert resolver._format_value(-100) == "-100" - - def test_format_value_with_decimal(self) -> None: - """Verify _format_value handles Decimal values.""" - resolver = self._make_resolver() - assert resolver._format_value(Decimal("3.14")) == "3.14" - assert resolver._format_value(Decimal(0)) == "0" - assert resolver._format_value(Decimal("123.45")) == "123.45" - - def test_format_value_with_none(self) -> None: - """Verify _format_value handles None as empty string.""" - assert self._make_resolver()._format_value(None) == "" - - def test_format_value_with_datetime(self) -> None: - """Verify _format_value handles datetime via str().""" - dt = datetime(2025, 12, 11, 15, 30, 45, tzinfo=UTC) - result = self._make_resolver()._format_value(dt) - assert "2025" in result - assert "12" in result - assert "11" in result - - @given( - value=st.one_of( - st.text(), - st.integers(), - st.decimals(allow_nan=False, allow_infinity=False), - st.booleans(), - st.none(), - ) - ) - def test_format_value_never_raises(self, value: str | int | Decimal | bool | None) -> None: - """Property: _format_value never raises exceptions.""" - event(f"value_type={type(value).__name__}") - result = self._make_resolver()._format_value(value) - assert isinstance(result, str) - - -# ============================================================================ -# ERROR PATHS -# ============================================================================ - - -class TestResolverErrorPaths: - """Test error handling paths in resolver.""" - - def test_missing_variable_returns_error_message(self) -> None: - """Missing variable in select expression returns error with fallback.""" - ftl = """test = { $x -> - [a] Value A - *[b] Default -} -""" - bundle = FluentBundle("en", use_isolating=False, strict=False) - bundle.add_resource(ftl) - - result, errors = bundle.format_pattern("test", {}) - assert len(errors) > 0 - assert isinstance(errors[0], FrozenFluentError) - assert 
errors[0].category == ErrorCategory.REFERENCE - assert errors[0].diagnostic is not None - assert errors[0].diagnostic.code.name == "VARIABLE_NOT_PROVIDED" - assert result == "Default" - - -class TestPlaceableWithFormattingError: - """Coverage for Placeable exception path with FrozenFluentError FORMATTING.""" - - def test_placeable_formatting_error_with_fallback(self) -> None: - """Placeable that raises FrozenFluentError (FORMATTING) uses fallback value.""" - from ftllexengine.diagnostics import ( - FrozenErrorContext, - ) - - def raise_formatting_error(_value: str) -> str: - context = FrozenErrorContext( - input_value="test", - locale_code="en", - parse_type="number", - fallback_value="FALLBACK", - ) - msg = "Custom formatting error" - raise FrozenFluentError( - msg, - ErrorCategory.FORMATTING, - context=context, - ) - - registry = FunctionRegistry() - registry.register(raise_formatting_error, ftl_name="ERROR_FUNC") - - func_call = FunctionReference( - id=Identifier("ERROR_FUNC"), - arguments=CallArguments( - positional=(StringLiteral(value="test"),), - named=(), - ), - ) - - pattern = Pattern( - elements=( - TextElement(value="Before "), - Placeable(expression=func_call), - TextElement(value=" After"), - ) - ) - message = Message(id=Identifier("msg"), value=pattern, attributes=()) - - resolver = FluentResolver( - locale="en", - messages={"msg": message}, - terms={}, - function_registry=registry, - use_isolating=False, - ) - - result, errors = resolver.resolve_message(message, {}) - assert result == "Before FALLBACK After" - assert len(errors) == 1 - assert isinstance(errors[0], FrozenFluentError) - assert errors[0].category == ErrorCategory.FORMATTING +from tests.runtime_resolver_selection_cases.fallback_and_errors import * # noqa: F403 - split module reuses shared support imports +from tests.runtime_resolver_selection_cases.number_literal_edges import * # noqa: F403 - split module reuses shared support imports +from 
tests.runtime_resolver_selection_cases.numeric_matching import * # noqa: F403 - split module reuses shared support imports +from tests.runtime_resolver_selection_cases.pattern_resolution import * # noqa: F403 - split module reuses shared support imports diff --git a/tests/test_syntax_cursor.py b/tests/test_syntax_cursor.py index c1a9962a..e9b7e8b7 100644 --- a/tests/test_syntax_cursor.py +++ b/tests/test_syntax_cursor.py @@ -1,1002 +1,14 @@ -"""Tests for syntax.cursor: Cursor, ParseError, ParseResult, LineOffsetCache. - -Validates the immutable cursor pattern for type-safe parsing, line/column -computation, and the LineOffsetCache binary-search infrastructure. -""" - -from __future__ import annotations - -import pytest - -from ftllexengine.syntax.cursor import Cursor, LineOffsetCache, ParseError, ParseResult - -# ============================================================================ -# CURSOR BASIC TESTS -# ============================================================================ - - -class TestCursorBasic: - """Test basic cursor functionality.""" - - def test_create_cursor(self) -> None: - """Create cursor at position 0.""" - cursor = Cursor("hello", 0) - - assert cursor.source == "hello" - assert cursor.pos == 0 - assert not cursor.is_eof - - def test_create_cursor_at_middle(self) -> None: - """Create cursor at middle of source.""" - cursor = Cursor("hello", 2) - - assert cursor.pos == 2 - assert cursor.current == "l" - - def test_cursor_immutability(self) -> None: - """Cursor is immutable (frozen dataclass).""" - cursor = Cursor("hello", 0) - - with pytest.raises(AttributeError): - cursor.pos = 5 # type: ignore[misc] - - def test_cursor_negative_pos_raises_value_error(self) -> None: - """Cursor with negative pos raises ValueError (lines 95-96). - - Negative positions silently return characters from the end of the - source via Python indexing. The guard makes this construction error - explicit rather than allowing silent wrong-character access. 
- """ - with pytest.raises(ValueError, match="must be >= 0"): - Cursor("hello", -1) - - def test_cursor_pos_beyond_source_raises_value_error(self) -> None: - """Cursor with pos > len(source) raises ValueError (lines 98-102). - - advance() always clamps to len(source); constructing with a larger - value indicates a programming error, not a valid EOF position. - """ - with pytest.raises(ValueError, match="exceeds source length"): - Cursor("hello", 6) - - -# ============================================================================ -# EOF DETECTION -# ============================================================================ - - -class TestCursorEOF: - """Test EOF detection.""" - - def test_is_eof_false_at_start(self) -> None: - """is_eof is False at start of source.""" - cursor = Cursor("hello", 0) - - assert not cursor.is_eof - - def test_is_eof_false_in_middle(self) -> None: - """is_eof is False in middle of source.""" - cursor = Cursor("hello", 2) - - assert not cursor.is_eof - - def test_is_eof_true_at_end(self) -> None: - """is_eof is True at end of source.""" - cursor = Cursor("hello", 5) - - assert cursor.is_eof - - def test_construction_beyond_end_raises(self) -> None: - """Constructing a cursor with pos > len(source) raises ValueError. - - EOF is represented exclusively by pos == len(source). Positions beyond - the source length are construction errors: advance() always clamps to - len(source), so they cannot arise through normal cursor navigation. 
- """ - with pytest.raises(ValueError, match="exceeds source length"): - Cursor("hello", 10) - - def test_is_eof_true_for_empty_source(self) -> None: - """is_eof is True for empty source at position 0.""" - cursor = Cursor("", 0) - - assert cursor.is_eof - - -# ============================================================================ -# CURRENT CHARACTER ACCESS -# ============================================================================ - - -class TestCursorCurrent: - """Test current character access.""" - - def test_current_at_start(self) -> None: - """Get current character at start.""" - cursor = Cursor("hello", 0) - - assert cursor.current == "h" - - def test_current_in_middle(self) -> None: - """Get current character in middle.""" - cursor = Cursor("hello", 2) - - assert cursor.current == "l" - - def test_current_at_last_char(self) -> None: - """Get current character at last position.""" - cursor = Cursor("hello", 4) - - assert cursor.current == "o" - - def test_current_raises_eof_error_at_end(self) -> None: - """Accessing current at EOF raises EOFError.""" - cursor = Cursor("hello", 5) - - with pytest.raises(EOFError, match="Unexpected EOF"): - _ = cursor.current - - def test_current_raises_value_error_beyond_end(self) -> None: - """Constructing cursor beyond end raises ValueError, not EOFError. - - The valid way to reach EOF is pos == len(source); positions strictly - greater are rejected at construction time so .current is never reached. 
- """ - with pytest.raises(ValueError, match="exceeds source length"): - Cursor("hello", 10) - - def test_current_with_unicode(self) -> None: - """Get current character with Unicode.""" - cursor = Cursor("привет", 0) - - assert cursor.current == "п" - - def test_current_with_emoji(self) -> None: - """Get current character with emoji.""" - cursor = Cursor("hello 👋 world", 6) - - assert cursor.current == "👋" - - -# ============================================================================ -# PEEK OPERATIONS -# ============================================================================ - - -class TestCursorPeek: - """Test peek operations.""" - - def test_peek_current(self) -> None: - """Peek at current position (offset 0).""" - cursor = Cursor("hello", 0) - - assert cursor.peek(0) == "h" - - def test_peek_next(self) -> None: - """Peek at next position (offset 1).""" - cursor = Cursor("hello", 0) - - assert cursor.peek(1) == "e" - - def test_peek_multiple_ahead(self) -> None: - """Peek multiple positions ahead.""" - cursor = Cursor("hello", 0) - - assert cursor.peek(2) == "l" - assert cursor.peek(3) == "l" - assert cursor.peek(4) == "o" - - def test_peek_at_eof_returns_none(self) -> None: - """Peek at EOF returns None.""" - cursor = Cursor("hello", 5) - - assert cursor.peek(0) is None - - def test_peek_beyond_eof_returns_none(self) -> None: - """Peek beyond EOF returns None.""" - cursor = Cursor("hello", 3) - - assert cursor.peek(5) is None - - def test_peek_does_not_modify_cursor(self) -> None: - """Peek does not modify cursor position.""" - cursor = Cursor("hello", 0) - - _ = cursor.peek(3) - assert cursor.pos == 0 - assert cursor.current == "h" - - def test_peek_negative_offset_returns_none(self) -> None: - """Negative offset returns None. - - Without the target_pos < 0 guard, peek(-1) at pos=0 would compute - target_pos=-1, skip the >=len(source) check (since -1 < 5), and - return source[-1]="o" via Python negative indexing — a silent wrong - read. 
The guard makes look-behind attempts safe-but-unproductive. - """ - cursor = Cursor("hello", 0) - - assert cursor.peek(-1) is None - assert cursor.peek(-5) is None - - def test_peek_negative_offset_at_mid_source_returns_none(self) -> None: - """Negative offset whose magnitude exceeds pos returns None. - - At pos=2, peek(-3) yields target_pos=-1 < 0. Without the guard this - would silently return source[-1] ("o") instead of None. - """ - cursor = Cursor("hello", 2) - - # offset that undershoots the start of the source - assert cursor.peek(-3) is None - - def test_peek_negative_offset_exactly_at_start_returns_none(self) -> None: - """Negative offset equal to pos yields target_pos=0, which is valid. - - peek(-2) at pos=2 yields target_pos=0 which is within bounds and - returns the first character. Only offsets that produce negative - target_pos return None. - """ - cursor = Cursor("hello", 2) - - # target_pos = 0 — within bounds, returns first char - assert cursor.peek(-2) == "h" - # target_pos = -1 — before start, returns None - assert cursor.peek(-3) is None - - -# ============================================================================ -# ADVANCE OPERATIONS -# ============================================================================ - - -class TestCursorAdvance: - """Test cursor advancement.""" - - def test_advance_single_position(self) -> None: - """Advance cursor by 1 position.""" - cursor = Cursor("hello", 0) - - new_cursor = cursor.advance() - - assert new_cursor.pos == 1 - assert new_cursor.current == "e" - # Original unchanged - assert cursor.pos == 0 - - def test_advance_multiple_positions(self) -> None: - """Advance cursor by multiple positions.""" - cursor = Cursor("hello", 0) - - new_cursor = cursor.advance(3) - - assert new_cursor.pos == 3 - assert new_cursor.current == "l" - - def test_advance_to_eof(self) -> None: - """Advance cursor to EOF.""" - cursor = Cursor("hello", 0) - - new_cursor = cursor.advance(5) - - assert new_cursor.pos == 5 - 
assert new_cursor.is_eof - - def test_advance_beyond_eof_clamps_to_length(self) -> None: - """Advance beyond EOF clamps to source length.""" - cursor = Cursor("hello", 0) - - new_cursor = cursor.advance(100) - - assert new_cursor.pos == 5 - assert new_cursor.is_eof - - def test_advance_preserves_immutability(self) -> None: - """Advance creates new cursor, original unchanged.""" - cursor = Cursor("hello", 2) - - new_cursor = cursor.advance() - - assert cursor.pos == 2 - assert new_cursor.pos == 3 - assert cursor is not new_cursor - - def test_advance_zero_positions_raises(self) -> None: - """Advance by 0 raises ValueError — zero advance is a no-op and always a bug.""" - cursor = Cursor("hello", 2) - - with pytest.raises(ValueError, match="advance\\(\\) count must be >= 1, got 0"): - cursor.advance(0) - - def test_advance_negative_positions_raises(self) -> None: - """Advance by negative count raises ValueError. - - Negative advance is always a programming error: cursor.advance(-1) at - pos=0 would create Cursor(source, -1) which makes .current return - source[-1] (the last character), silently corrupting parser state. 
- """ - cursor = Cursor("hello", 2) - - with pytest.raises(ValueError, match="advance\\(\\) count must be >= 1, got -1"): - cursor.advance(-1) - - def test_advance_large_negative_positions_raises(self) -> None: - """Advance by large negative count raises ValueError.""" - cursor = Cursor("hello", 4) - - with pytest.raises(ValueError, match="advance\\(\\) count must be >= 1, got -100"): - cursor.advance(-100) - - -# ============================================================================ -# SLICE OPERATIONS -# ============================================================================ - - -class TestCursorSlice: - """Test cursor slice operations.""" - - def test_slice_to_from_start(self) -> None: - """Slice from start to middle.""" - cursor = Cursor("hello world", 0) - - text = cursor.slice_to(5) - - assert text == "hello" - - def test_slice_to_from_middle(self) -> None: - """Slice from middle position.""" - cursor = Cursor("hello world", 6) - - text = cursor.slice_to(11) - - assert text == "world" - - def test_slice_to_empty(self) -> None: - """Slice with same start and end returns empty string.""" - cursor = Cursor("hello", 2) - - text = cursor.slice_to(2) - - assert text == "" - - def test_slice_to_single_char(self) -> None: - """Slice single character.""" - cursor = Cursor("hello", 1) - - text = cursor.slice_to(2) - - assert text == "e" - - def test_slice_to_entire_source(self) -> None: - """Slice entire source from position 0.""" - cursor = Cursor("hello", 0) - - text = cursor.slice_to(5) - - assert text == "hello" - - def test_slice_to_with_unicode(self) -> None: - """Slice with Unicode characters.""" - cursor = Cursor("привет мир", 0) - - text = cursor.slice_to(6) - - assert text == "привет" - - -# ============================================================================ -# LINE AND COLUMN COMPUTATION -# ============================================================================ - - -class TestCursorLineCol: - """Test line and column computation.""" - 
- def test_compute_line_col_at_start(self) -> None: - """Compute line:col at start of source.""" - cursor = Cursor("hello", 0) - - line, col = cursor.compute_line_col() - - assert line == 1 - assert col == 1 - - def test_compute_line_col_in_first_line(self) -> None: - """Compute line:col in middle of first line.""" - cursor = Cursor("hello world", 6) - - line, col = cursor.compute_line_col() - - assert line == 1 - assert col == 7 - - def test_compute_line_col_at_newline(self) -> None: - """Compute line:col at newline character.""" - cursor = Cursor("hello\nworld", 5) - - line, col = cursor.compute_line_col() - - assert line == 1 - assert col == 6 - - def test_compute_line_col_after_newline(self) -> None: - """Compute line:col after newline (start of line 2).""" - cursor = Cursor("hello\nworld", 6) - - line, col = cursor.compute_line_col() - - assert line == 2 - assert col == 1 - - def test_compute_line_col_in_second_line(self) -> None: - """Compute line:col in middle of second line.""" - cursor = Cursor("hello\nworld", 9) - - line, col = cursor.compute_line_col() - - assert line == 2 - assert col == 4 - - def test_compute_line_col_multiple_lines(self) -> None: - """Compute line:col across multiple lines.""" - source = "line1\nline2\nline3\nline4" - cursor = Cursor(source, 12) # Start of line3 - - line, col = cursor.compute_line_col() - - assert line == 3 - assert col == 1 - - def test_compute_line_col_at_eof(self) -> None: - """Compute line:col at EOF.""" - cursor = Cursor("hello\nworld", 11) - - line, col = cursor.compute_line_col() - - assert line == 2 - assert col == 6 - - def test_line_col_property(self) -> None: - """Test line_col property convenience wrapper.""" - cursor = Cursor("hello\nworld", 9) - - line, col = cursor.compute_line_col() - - assert line == 2 - assert col == 4 - - -# ============================================================================ -# PARSE RESULT TESTS -# 
============================================================================ - - -class TestParseResult: - """Test ParseResult container.""" - - def test_create_parse_result(self) -> None: - """Create ParseResult with value and cursor.""" - cursor = Cursor("hello", 0) - result = ParseResult("h", cursor.advance()) - - assert result.value == "h" - assert result.cursor.pos == 1 - - def test_parse_result_immutability(self) -> None: - """ParseResult is immutable.""" - cursor = Cursor("hello", 0) - result = ParseResult("test", cursor) - - with pytest.raises(AttributeError): - result.value = "new" # type: ignore[misc] - - def test_parse_result_with_complex_value(self) -> None: - """ParseResult can hold complex types.""" - cursor = Cursor("hello", 3) - value = {"key": "value", "list": [1, 2, 3]} - result = ParseResult(value, cursor) - - assert result.value == {"key": "value", "list": [1, 2, 3]} - assert result.cursor.pos == 3 - - -# ============================================================================ -# PARSE ERROR TESTS -# ============================================================================ - - -class TestParseError: - """Test ParseError functionality.""" - - def test_create_parse_error(self) -> None: - """Create ParseError with message and cursor.""" - cursor = Cursor("hello", 2) - error = ParseError("Expected '}'", cursor) - - assert error.message == "Expected '}'" - assert error.cursor.pos == 2 - assert error.expected == () - - def test_create_parse_error_with_expected(self) -> None: - """Create ParseError with expected tokens.""" - cursor = Cursor("hello", 2) - error = ParseError("Unexpected", cursor, expected=("}", "]")) - - assert error.expected == ("}", "]") - - def test_parse_error_immutability(self) -> None: - """ParseError is immutable.""" - cursor = Cursor("hello", 2) - error = ParseError("Error", cursor) - - with pytest.raises(AttributeError): - error.message = "New error" # type: ignore[misc] - - def test_format_error_simple(self) -> None: - 
"""Format error without expected tokens.""" - cursor = Cursor("hello", 2) - error = ParseError("Expected '}'", cursor) - - formatted = error.format_error() - - assert "1:3:" in formatted - assert "Expected '}'" in formatted - - def test_format_error_with_expected(self) -> None: - """Format error with expected tokens.""" - cursor = Cursor("hello", 2) - error = ParseError("Unexpected token", cursor, expected=("}", "]")) - - formatted = error.format_error() - - assert "1:3:" in formatted - assert "Unexpected token" in formatted - assert "expected:" in formatted - assert "'}'" in formatted - assert "']'" in formatted - - def test_format_error_multiline_source(self) -> None: - """Format error with multiline source.""" - source = "line1\nline2\nline3" - cursor = Cursor(source, 8) # Middle of line2 - error = ParseError("Error here", cursor) - - formatted = error.format_error() - - assert "2:3:" in formatted - - def test_format_with_context_single_line(self) -> None: - """Format error with context for single line.""" - cursor = Cursor("hello world", 6) - error = ParseError("Expected '}'", cursor) - - formatted = error.format_with_context() - - assert "1:7:" in formatted - assert "hello world" in formatted - assert "^" in formatted - - def test_format_with_context_multiline(self) -> None: - """Format error with context showing multiple lines.""" - source = "line1\nline2\nline3\nline4" - cursor = Cursor(source, 8) # Middle of line2 - error = ParseError("Error", cursor) - - formatted = error.format_with_context() - - assert "2:3:" in formatted - assert "line1" in formatted - assert "line2" in formatted - assert "line3" in formatted - assert "^" in formatted - - def test_format_with_context_custom_context_lines(self) -> None: - """Format error with custom context line count.""" - source = "line1\nline2\nline3\nline4\nline5" - cursor = Cursor(source, 12) # Start of line3 - error = ParseError("Error", cursor) - - formatted = error.format_with_context(context_lines=1) - - assert 
"line2" in formatted - assert "line3" in formatted - assert "line4" in formatted - - def test_format_with_context_at_start(self) -> None: - """Format error with context at start of file.""" - source = "line1\nline2\nline3" - cursor = Cursor(source, 0) - error = ParseError("Error at start", cursor) - - formatted = error.format_with_context() - - assert "1:1:" in formatted - assert "line1" in formatted - assert "^" in formatted - - def test_format_with_context_at_end(self) -> None: - """Format error with context at end of file.""" - source = "line1\nline2\nline3" - cursor = Cursor(source, 17) # End of line3 - error = ParseError("Error at end", cursor) - - formatted = error.format_with_context() - - assert "line3" in formatted - assert "^" in formatted - - -# ============================================================================ -# EDGE CASES -# ============================================================================ - - -class TestCursorEdgeCases: - """Test cursor edge cases.""" - - def test_empty_source(self) -> None: - """Handle empty source string.""" - cursor = Cursor("", 0) - - assert cursor.is_eof - assert cursor.source == "" - - def test_single_character_source(self) -> None: - """Handle single character source.""" - cursor = Cursor("x", 0) - - assert cursor.current == "x" - assert not cursor.is_eof - - def test_cursor_with_only_newlines(self) -> None: - """Handle source with only newlines.""" - cursor = Cursor("\n\n\n", 0) - - assert cursor.current == "\n" - line, _ = cursor.compute_line_col() - assert line == 1 - - def test_cursor_with_tabs(self) -> None: - """Handle source with tabs.""" - cursor = Cursor("hello\tworld", 5) - - assert cursor.current == "\t" - - def test_cursor_with_mixed_whitespace(self) -> None: - """Handle source with mixed whitespace.""" - source = " \t\n \t\n" - cursor = Cursor(source, 4) - - line, col = cursor.compute_line_col() - assert line == 2 - assert col == 1 - - -# 
============================================================================ -# INTEGRATION TESTS -# ============================================================================ - - -class TestCursorIntegration: - """Test cursor in realistic parsing scenarios.""" - - def test_parse_identifier_pattern(self) -> None: - """Simulate parsing an identifier.""" - cursor = Cursor("hello_world = value", 0) - start_pos = cursor.pos - - # Advance while identifier characters - while (not cursor.is_eof and cursor.current.isalnum()) or cursor.current == "_": - cursor = cursor.advance() - - identifier = Cursor("hello_world = value", start_pos).slice_to(cursor.pos) - - assert identifier == "hello_world" - assert cursor.current == " " - - def test_parse_quoted_string_pattern(self) -> None: - """Simulate parsing a quoted string.""" - cursor = Cursor('"hello world"', 0) - - # Skip opening quote - cursor = cursor.advance() - start_pos = cursor.pos - - # Advance until closing quote - while not cursor.is_eof and cursor.current != '"': - cursor = cursor.advance() - - content = Cursor('"hello world"', start_pos).slice_to(cursor.pos) - - assert content == "hello world" - - def test_skip_whitespace_pattern(self) -> None: - """Simulate skipping whitespace.""" - cursor = Cursor(" hello", 0) - - # Skip whitespace - while not cursor.is_eof and cursor.current in " \t\n": - cursor = cursor.advance() - - assert cursor.current == "h" - assert cursor.pos == 3 - - def test_lookahead_pattern(self) -> None: - """Simulate lookahead for parser decision.""" - cursor = Cursor("hello = value", 5) - - # Check if next char is '=' - if cursor.peek(1) == "=": - cursor = cursor.advance(2) # Skip ' =' - - assert cursor.current == " " - assert cursor.pos == 7 - - def test_error_reporting_pattern(self) -> None: - """Simulate error reporting with line:col.""" - source = "line1\nline2 { $var\nline3" - cursor = Cursor(source, 18) # After $var - - error = ParseError("Expected '}'", cursor, expected=("}", )) - formatted 
= error.format_with_context() - - assert "2:13:" in formatted - assert "line2 { $var" in formatted - assert "^" in formatted - - -# ============================================================================ -# LINE OFFSET CACHE TESTS (from test_cursor_infrastructure.py) -# ============================================================================ - - -class TestLineOffsetCacheInit: - """LineOffsetCache builds the offset table correctly during initialization.""" - - def test_init_empty_source(self) -> None: - """Empty source produces a single-element offset table [(0,)].""" - cache = LineOffsetCache("") - - assert cache._source_len == 0 # pylint: disable=protected-access - assert cache._offsets == (0,) # pylint: disable=protected-access - - def test_init_single_line(self) -> None: - """Single-line source with no newlines produces offset table [(0,)].""" - cache = LineOffsetCache("hello world") - - assert cache._source_len == 11 # pylint: disable=protected-access - assert cache._offsets == (0,) # pylint: disable=protected-access - - def test_init_multiple_lines(self) -> None: - """Three-line source produces offsets at the start of each line.""" - cache = LineOffsetCache("line1\nline2\nline3") - - assert cache._source_len == 17 # pylint: disable=protected-access - assert cache._offsets == (0, 6, 12) # pylint: disable=protected-access - - def test_init_trailing_newline(self) -> None: - """Trailing newline produces a third entry for the (empty) final line.""" - cache = LineOffsetCache("line1\nline2\n") - - assert cache._source_len == 12 # pylint: disable=protected-access - assert cache._offsets == (0, 6, 12) # pylint: disable=protected-access - - def test_init_consecutive_newlines(self) -> None: - """Consecutive newlines create entries for each empty line.""" - cache = LineOffsetCache("a\n\nb") - - assert cache._source_len == 4 # pylint: disable=protected-access - assert cache._offsets == (0, 2, 3) # pylint: disable=protected-access - - def 
test_init_only_newlines(self) -> None: - """Source with only newlines creates an entry after each one.""" - cache = LineOffsetCache("\n\n\n") - - assert cache._source_len == 3 # pylint: disable=protected-access - assert cache._offsets == (0, 1, 2, 3) # pylint: disable=protected-access - - -class TestLineOffsetCacheGetLineCol: - """LineOffsetCache.get_line_col maps byte offsets to (line, column) pairs.""" - - def test_get_line_col_first_position(self) -> None: - """Position 0 maps to line 1, column 1.""" - cache = LineOffsetCache("hello\nworld") - - line, col = cache.get_line_col(0) - - assert line == 1 - assert col == 1 - - def test_get_line_col_middle_of_first_line(self) -> None: - """Position 2 on first line maps to line 1, column 3.""" - cache = LineOffsetCache("hello\nworld") - - line, col = cache.get_line_col(2) - - assert line == 1 - assert col == 3 - - def test_get_line_col_start_of_second_line(self) -> None: - """First position of second line maps to line 2, column 1.""" - cache = LineOffsetCache("hello\nworld") - - line, col = cache.get_line_col(6) - - assert line == 2 - assert col == 1 - - def test_get_line_col_middle_of_second_line(self) -> None: - """Middle of second line maps to the correct column.""" - cache = LineOffsetCache("hello\nworld") - - line, col = cache.get_line_col(8) - - assert line == 2 - assert col == 3 - - def test_get_line_col_at_newline(self) -> None: - """Position at the newline character maps to the end of that line.""" - cache = LineOffsetCache("hello\nworld") - - line, col = cache.get_line_col(5) - - assert line == 1 - assert col == 6 - - def test_get_line_col_at_end(self) -> None: - """Position at source length maps to the final line end position.""" - cache = LineOffsetCache("hello\nworld") - - line, col = cache.get_line_col(11) - - assert line == 2 - assert col == 6 - - def test_get_line_col_negative_position(self) -> None: - """Negative position is clamped to 0 (line 1, column 1).""" - cache = LineOffsetCache("hello\nworld") - 
- line, col = cache.get_line_col(-5) - - assert line == 1 - assert col == 1 - - def test_get_line_col_position_beyond_source(self) -> None: - """Position beyond source length is clamped to source length.""" - cache = LineOffsetCache("hello\nworld") - - line, col = cache.get_line_col(100) - - assert line == 2 - assert col == 6 - - def test_get_line_col_empty_source(self) -> None: - """Position 0 in empty source maps to line 1, column 1.""" - cache = LineOffsetCache("") - - line, col = cache.get_line_col(0) - - assert line == 1 - assert col == 1 - - def test_get_line_col_third_line(self) -> None: - """Position at the start of the third line maps correctly.""" - cache = LineOffsetCache("a\nb\nc\nd") - - line, col = cache.get_line_col(4) - - assert line == 3 - assert col == 1 - - def test_get_line_col_many_lines_binary_search(self) -> None: - """Binary search finds correct line across many lines.""" - source = "\n".join(f"line{i}" for i in range(10)) - cache = LineOffsetCache(source) - - assert cache.get_line_col(0) == (1, 1) - assert cache.get_line_col(24) == (5, 1) - assert cache.get_line_col(42) == (8, 1) - assert cache.get_line_col(54) == (10, 1) - - def test_get_line_col_long_lines(self) -> None: - """Long lines with many characters compute column correctly.""" - cache = LineOffsetCache("a" * 100 + "\n" + "b" * 50) - - line, col = cache.get_line_col(110) - - assert line == 2 - assert col == 10 - - def test_get_line_col_position_exactly_at_source_len(self) -> None: - """Position equal to source length maps to just past the last character.""" - source = "abc" - cache = LineOffsetCache(source) - - line, col = cache.get_line_col(3) - - assert line == 1 - assert col == 4 - - def test_get_line_col_consecutive_calls(self) -> None: - """Multiple consecutive calls all return correct values.""" - cache = LineOffsetCache("hello\nworld\n!") - - assert cache.get_line_col(0) == (1, 1) - assert cache.get_line_col(5) == (1, 6) - assert cache.get_line_col(6) == (2, 1) - assert 
cache.get_line_col(11) == (2, 6) - assert cache.get_line_col(12) == (3, 1) - - -class TestCursorSkipLineEnd: - """Cursor.skip_line_end recognizes LF as the only line-ending character.""" - - def test_skip_line_end_at_regular_char(self) -> None: - """At a regular character, skip_line_end returns self unchanged.""" - cursor = Cursor("hello\nworld", 0) - - result = cursor.skip_line_end() - - assert result.pos == 0 - assert result is cursor - - def test_skip_line_end_at_middle_char(self) -> None: - """At a middle character, skip_line_end returns self unchanged.""" - cursor = Cursor("hello\nworld", 2) - - result = cursor.skip_line_end() - - assert result.pos == 2 - assert result is cursor - - def test_skip_line_end_at_lf(self) -> None: - """At LF, skip_line_end advances past the newline.""" - cursor = Cursor("hello\nworld", 5) - - result = cursor.skip_line_end() - - assert result.pos == 6 - - def test_skip_line_end_cr_not_recognized(self) -> None: - """CR alone is not a recognized line ending; cursor stays put. - - Cursor expects LF-normalized input. CR must be converted before - creating a Cursor. FluentParserV1.parse() handles normalization. - """ - cursor = Cursor("hello\rworld", 5) - - result = cursor.skip_line_end() - - assert result.pos == 5 - assert result is cursor - - def test_skip_line_end_at_crlf_cr_position(self) -> None: - """At CR within CRLF, skip_line_end does not advance (CR not recognized). - - For proper handling, normalize input to LF before creating a Cursor. 
- """ - cursor = Cursor("hello\r\nworld", 5) - - result = cursor.skip_line_end() - - assert result.pos == 5 - assert result is cursor - - def test_skip_line_end_at_crlf_lf_position(self) -> None: - """At LF within CRLF, skip_line_end advances past the LF.""" - cursor = Cursor("hello\r\nworld", 6) - - result = cursor.skip_line_end() - - assert result.pos == 7 - - def test_skip_line_end_at_eof(self) -> None: - """At EOF, skip_line_end returns self unchanged.""" - cursor = Cursor("hello", 5) - - result = cursor.skip_line_end() - - assert result.pos == 5 - assert result is cursor +"""Aggregated syntax cursor test surface.""" + +from tests.syntax_cursor_cases.advance_operations import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_cases.current_character_access import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_cases.cursor_basic_tests import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_cases.edge_cases import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_cases.eof_detection import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_cases.integration_tests import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_cases.line_and_column_computation import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_cases.line_offset_cache_tests_from_test_cursor_infrastructure_py import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_cases.parse_error_tests import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_cases.parse_result_tests import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_cases.peek_operations import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_cases.slice_operations import * # noqa: F403 - re-export split test surface diff --git a/tests/test_syntax_cursor_property.py 
b/tests/test_syntax_cursor_property.py index 692e8be8..a34a178a 100644 --- a/tests/test_syntax_cursor_property.py +++ b/tests/test_syntax_cursor_property.py @@ -1,1041 +1,9 @@ -"""Hypothesis property-based tests for syntax.cursor module. - -Tests cursor immutability, EOF handling, navigation, and ParseResult/ParseError -properties. Combines targeted property tests with comprehensive contract verification. -""" - -from __future__ import annotations - -import pytest -from hypothesis import assume, event, given, settings -from hypothesis import strategies as st - -from ftllexengine.syntax.cursor import Cursor, ParseError, ParseResult - -# ============================================================================ -# HYPOTHESIS STRATEGIES -# ============================================================================ - - -# Strategy for source text - keep max_size for performance -source_text = st.text( - alphabet=st.characters(blacklist_categories=["Cc"], blacklist_characters=["\x00"]), - min_size=0, - max_size=200, # Keep practical bound for performance -) - -# Strategy for positions (will be constrained by source length) -positions = st.integers(min_value=0, max_value=500) - - -# ============================================================================ -# PROPERTY TESTS - IMMUTABILITY -# ============================================================================ - - -class TestCursorImmutability: - """Test cursor immutability properties.""" - - @given(source=source_text, pos=positions) - @settings(max_examples=200) - def test_cursor_is_immutable(self, source: str, pos: int) -> None: - """INVARIANT: Cursor is immutable - advance() returns NEW cursor.""" - assume(pos < len(source)) # Valid position - event(f"text_len={len(source)}") - - cursor = Cursor(source, pos) - original_pos = cursor.pos - - # Advance cursor - new_cursor = cursor.advance() - - # Original cursor unchanged - assert cursor.pos == original_pos - # New cursor has new position - assert 
new_cursor.pos == original_pos + 1 - - @given(source=source_text, pos=positions) - @settings(max_examples=200) - def test_advance_count_returns_new_cursor(self, source: str, pos: int) -> None: - """PROPERTY: advance(count) returns new cursor, original unchanged.""" - assume(pos < len(source)) - event(f"pos={pos}") - - cursor = Cursor(source, pos) - original_pos = cursor.pos - - # Advance by N - n = min(5, len(source) - pos) - new_cursor = cursor.advance(n) - - # Original unchanged - assert cursor.pos == original_pos - # New cursor advanced by N - assert new_cursor.pos == original_pos + n - - @given(source=source_text) - @settings(max_examples=100) - def test_cursor_advance_preserves_source(self, source: str) -> None: - """PROPERTY: advance() preserves source string.""" - event(f"text_len={len(source)}") - cursor = Cursor(source, 0) - - while not cursor.is_eof: - new_cursor = cursor.advance() - assert new_cursor.source == source - cursor = new_cursor - - -# ============================================================================ -# PROPERTY TESTS - EOF HANDLING -# ============================================================================ - - -class TestCursorEOF: - """Test EOF (End Of File) detection properties.""" - - @given(source=source_text) - @settings(max_examples=200) - def test_is_eof_true_at_end(self, source: str) -> None: - """PROPERTY: is_eof is True when pos >= len(source).""" - event(f"text_len={len(source)}") - cursor = Cursor(source, len(source)) - assert cursor.is_eof is True - - @given(source=source_text.filter(lambda s: len(s) > 0)) - @settings(max_examples=200) - def test_is_eof_false_before_end(self, source: str) -> None: - """PROPERTY: is_eof is False when pos < len(source).""" - event(f"text_len={len(source)}") - cursor = Cursor(source, 0) - assert cursor.is_eof is False - - @given(source=source_text) - @settings(max_examples=100) - def test_current_raises_eoferror_at_eof(self, source: str) -> None: - """PROPERTY: current raises EOFError 
when is_eof is True.""" - event(f"text_len={len(source)}") - cursor = Cursor(source, len(source)) - - if cursor.is_eof: - with pytest.raises(EOFError): - _ = cursor.current - - @given(source=source_text.filter(lambda s: len(s) > 0)) - @settings(max_examples=100) - def test_current_succeeds_before_eof(self, source: str) -> None: - """PROPERTY: current succeeds when is_eof is False.""" - event(f"text_len={len(source)}") - cursor = Cursor(source, 0) - - if not cursor.is_eof: - # Should not raise - char = cursor.current - assert isinstance(char, str) - assert len(char) == 1 - - @given(source=source_text.filter(lambda s: len(s) > 0)) - @settings(max_examples=100) - def test_advance_until_eof_reaches_end(self, source: str) -> None: - """PROPERTY: Advancing through source eventually reaches EOF.""" - event(f"text_len={len(source)}") - cursor = Cursor(source, 0) - - # Advance until EOF - for _ in range(len(source) + 1): - if cursor.is_eof: - break - cursor = cursor.advance() - - # Should be at EOF - assert cursor.is_eof is True - assert cursor.pos >= len(source) - - -# ============================================================================ -# PROPERTY TESTS - NAVIGATION -# ============================================================================ - - -class TestCursorNavigation: - """Test cursor navigation properties.""" - - @given(source=source_text, pos=positions) - @settings(max_examples=200) - def test_current_returns_char_at_position(self, source: str, pos: int) -> None: - """PROPERTY: current returns character at pos.""" - assume(pos < len(source)) - event(f"pos={pos}") - - cursor = Cursor(source, pos) - - if not cursor.is_eof: - assert cursor.current == source[pos] - - @given( - source=source_text.filter(lambda s: len(s) > 1), - n=st.integers(min_value=1, max_value=10), - ) - @settings(max_examples=100) - def test_advance_count_moves_by_count(self, source: str, n: int) -> None: - """PROPERTY: advance(k) moves position by k.""" - event(f"advance_count={n}") - 
cursor = Cursor(source, 0) - n_safe = min(n, len(source)) - - new_cursor = cursor.advance(n_safe) - - assert new_cursor.pos == cursor.pos + n_safe - - @given(source=source_text.filter(lambda s: len(s) > 0)) - @settings(max_examples=100) - def test_advance_once_equals_advance_one(self, source: str) -> None: - """PROPERTY: advance() == advance(1).""" - event(f"source_len={len(source)}") - cursor = Cursor(source, 0) - - cursor1 = cursor.advance() - cursor2 = cursor.advance(1) - - assert cursor1.pos == cursor2.pos - - @given( - source=source_text.filter(lambda s: len(s) > 2), - offset=st.integers(min_value=0, max_value=10), - ) - @settings(max_examples=100) - def test_peek_reads_ahead_without_advancing(self, source: str, offset: int) -> None: - """PROPERTY: peek(offset) reads ahead without changing position.""" - event(f"offset={offset}") - cursor = Cursor(source, 0) - - if offset < len(source): - peeked = cursor.peek(offset) - pos_after_peek = cursor.pos - - # Peek should not change position - assert pos_after_peek == 0 - # Peek should return correct character - assert peeked == source[offset] - - @given( - source=source_text.filter(lambda s: len(s) > 0), - start_pos=st.integers(min_value=0, max_value=50), - ) - @settings(max_examples=100) - def test_slice_to_extracts_substring(self, source: str, start_pos: int) -> None: - """PROPERTY: slice_to(end) extracts source[pos:end].""" - event(f"source_len={len(source)}") - start_pos = min(start_pos, len(source) - 1) - cursor = Cursor(source, start_pos) - - end_pos = min(start_pos + 5, len(source)) - extracted = cursor.slice_to(end_pos) - - assert extracted == source[start_pos:end_pos] - - -# ============================================================================ -# PROPERTY TESTS - LINE/COLUMN TRACKING -# ============================================================================ - - -class TestCursorLineColumn: - """Test line and column tracking properties.""" - - @given(source=source_text) - 
@settings(max_examples=100) - def test_line_starts_at_one(self, source: str) -> None: - """PROPERTY: Line numbers start at 1.""" - event(f"source_len={len(source)}") - cursor = Cursor(source, 0) - line, _ = cursor.compute_line_col() - - assert line >= 1 - - @given(source=source_text) - @settings(max_examples=100) - def test_column_starts_at_one(self, source: str) -> None: - """PROPERTY: Column numbers start at 1.""" - event(f"source_len={len(source)}") - cursor = Cursor(source, 0) - _, column = cursor.compute_line_col() - - assert column >= 1 - - @given(lines=st.lists(st.text(), min_size=1, max_size=10)) # Keep list bound for performance - @settings(max_examples=50) - def test_newline_increments_line_number(self, lines: list[str]) -> None: - """PROPERTY: Newlines increment line number.""" - event(f"line_count={len(lines)}") - source = "\n".join(lines) - - # Count newlines - newline_count = source.count("\n") - - # Advance to end - cursor_end = Cursor(source, len(source)) - line_end, _ = cursor_end.compute_line_col() - - # Line number should be newline_count + 1 - assert line_end == newline_count + 1 - - @given(source=source_text) - @settings(max_examples=50) - def test_compute_line_col_equals_property(self, source: str) -> None: - """PROPERTY: compute_line_col() returns same as line_col property.""" - event(f"source_len={len(source)}") - cursor = Cursor(source, min(len(source), 10)) - - result1 = cursor.compute_line_col() - result2 = cursor.compute_line_col() - - assert result1 == result2 - - -# ============================================================================ -# PROPERTY TESTS - ROBUSTNESS -# ============================================================================ - - -class TestCursorRobustness: - """Test cursor robustness with edge cases.""" - - @given(source=source_text) - @settings(max_examples=100) - def test_empty_source_is_eof(self, source: str) -> None: - """PROPERTY: Empty source is always EOF.""" - event(f"source_len={len(source)}") - if 
len(source) == 0: - cursor = Cursor(source, 0) - assert cursor.is_eof is True - - @given(source=source_text) - @settings(max_examples=100) - def test_position_at_end_is_eof(self, source: str) -> None: - """PROPERTY: pos == len(source) is the canonical EOF position.""" - event(f"source_len={len(source)}") - cursor = Cursor(source, len(source)) - assert cursor.is_eof is True - - @given(source=source_text, pos=st.integers(min_value=1, max_value=1000)) - @settings(max_examples=100) - def test_position_strictly_beyond_end_raises(self, source: str, pos: int) -> None: - """PROPERTY: pos > len(source) raises ValueError at construction. - - advance() always clamps to len(source), so positions strictly beyond - the source length cannot arise through normal cursor navigation and - indicate a construction error. - """ - assume(pos > len(source)) - event(f"excess={pos - len(source)}") - with pytest.raises(ValueError, match="exceeds source length"): - Cursor(source, pos) - - @given(source=source_text.filter(lambda s: len(s) > 0)) - @settings(max_examples=50) - def test_advance_at_eof_stays_at_eof(self, source: str) -> None: - """PROPERTY: Advancing at EOF stays at EOF.""" - event(f"source_len={len(source)}") - cursor = Cursor(source, len(source)) - assert cursor.is_eof is True - - # Advance should keep us at or past EOF - new_cursor = cursor.advance() - assert new_cursor.is_eof is True - - @given( - source=source_text.filter(lambda s: len(s) > 0), - offset=st.integers(min_value=0, max_value=100), - ) - @settings(max_examples=100) - def test_peek_beyond_eof_returns_none(self, source: str, offset: int) -> None: - """PROPERTY: peek(offset) returns None when offset >= remaining chars.""" - event(f"offset={offset}") - cursor = Cursor(source, 0) - - if offset >= len(source): - result = cursor.peek(offset) - assert result is None - - @given(source=source_text, count=st.integers(min_value=1, max_value=1000)) - @settings(max_examples=100) - def test_advance_clamps_at_eof(self, source: 
str, count: int) -> None: - """PROPERTY: advance(count) clamps position at source length.""" - event(f"advance_count={count}") - cursor = Cursor(source, 0) - - new_cursor = cursor.advance(count) - - # Position should not exceed source length - assert new_cursor.pos <= len(source) - - -# ============================================================================ -# PROPERTY TESTS - IDEMPOTENCE -# ============================================================================ - - -class TestCursorIdempotence: - """Test idempotent cursor operations.""" - - @given(source=source_text, pos=positions) - @settings(max_examples=100) - def test_is_eof_is_idempotent(self, source: str, pos: int) -> None: - """PROPERTY: Multiple is_eof calls return same value.""" - event(f"source_len={len(source)}") - # Clamp pos to the valid range [0, len(source)] - pos = min(pos, len(source)) - cursor = Cursor(source, pos) - - result1 = cursor.is_eof - result2 = cursor.is_eof - result3 = cursor.is_eof - - assert result1 == result2 == result3 - - @given(source=source_text.filter(lambda s: len(s) > 0)) - @settings(max_examples=100) - def test_current_is_idempotent(self, source: str) -> None: - """PROPERTY: Multiple current accesses return same character.""" - event(f"source_len={len(source)}") - cursor = Cursor(source, 0) - - if not cursor.is_eof: - char1 = cursor.current - char2 = cursor.current - char3 = cursor.current - - assert char1 == char2 == char3 - - @given( - source=source_text.filter(lambda s: len(s) > 2), - offset=st.integers(min_value=0, max_value=5), - ) - @settings(max_examples=100) - def test_peek_is_idempotent(self, source: str, offset: int) -> None: - """PROPERTY: Multiple peek calls return same result.""" - event(f"offset={offset}") - cursor = Cursor(source, 0) - - peek1 = cursor.peek(offset) - peek2 = cursor.peek(offset) - peek3 = cursor.peek(offset) - - assert peek1 == peek2 == peek3 - - @given(source=source_text, pos=st.integers(min_value=0, max_value=100)) - 
@settings(max_examples=100) - def test_line_col_is_idempotent(self, source: str, pos: int) -> None: - """PROPERTY: Multiple line_col accesses return same value.""" - event(f"source_len={len(source)}") - pos = min(pos, len(source)) - cursor = Cursor(source, pos) - - lc1 = cursor.compute_line_col() - lc2 = cursor.compute_line_col() - lc3 = cursor.compute_line_col() - - assert lc1 == lc2 == lc3 - - -# ============================================================================ -# CONTRACT TESTS (from test_cursor_comprehensive.py) -# ============================================================================ - - -class TestCursorImmutabilityContracts: - """Contract-level tests for Cursor immutability.""" - - def test_cursor_frozen(self) -> None: - """Property: Cursor instances are immutable (frozen).""" - cursor = Cursor(source="hello", pos=0) - - with pytest.raises((AttributeError, TypeError)): - cursor.pos = 1 # type: ignore[misc] - - @given(st.text(), st.integers(min_value=0, max_value=1000)) - def test_cursor_construction(self, source: str, pos: int) -> None: - """Property: Cursor can be constructed with any valid source and position.""" - event(f"input_len={len(source)}") - # Clamp position to valid range - pos = min(pos, len(source)) - cursor = Cursor(source=source, pos=pos) - assert cursor.source == source - assert cursor.pos == pos - - -class TestCursorEOFProperty: - """Property-based tests for Cursor.is_eof property.""" - - def test_is_eof_at_start_of_nonempty_string(self) -> None: - """Verify is_eof is False at start of non-empty string.""" - cursor = Cursor(source="hello", pos=0) - assert cursor.is_eof is False - - def test_is_eof_at_end_of_string(self) -> None: - """Verify is_eof is True at end of string.""" - cursor = Cursor(source="hello", pos=5) - assert cursor.is_eof is True - - def test_construction_beyond_end_raises(self) -> None: - """Verify constructing cursor with pos > len(source) raises ValueError.""" - with pytest.raises(ValueError, 
match="exceeds source length"): - Cursor(source="hello", pos=10) - - def test_is_eof_empty_string(self) -> None: - """Verify is_eof is True for empty string at position 0.""" - cursor = Cursor(source="", pos=0) - assert cursor.is_eof is True - - @given(st.text(min_size=1)) - def test_is_eof_middle_of_string(self, source: str) -> None: - """Property: is_eof is False in middle of string.""" - event(f"input_len={len(source)}") - mid_pos = len(source) // 2 - cursor = Cursor(source=source, pos=mid_pos) - if mid_pos < len(source): - assert cursor.is_eof is False - - -class TestCursorCurrentProperty: - """Property-based tests for Cursor.current property.""" - - def test_current_at_start(self) -> None: - """Verify current returns first character at position 0.""" - cursor = Cursor(source="hello", pos=0) - assert cursor.current == "h" - - def test_current_in_middle(self) -> None: - """Verify current returns character at current position.""" - cursor = Cursor(source="hello", pos=2) - assert cursor.current == "l" - - def test_current_raises_at_eof(self) -> None: - """Verify current raises EOFError at EOF.""" - cursor = Cursor(source="hello", pos=5) - with pytest.raises(EOFError, match="EOF"): - _ = cursor.current - - def test_construction_beyond_eof_raises(self) -> None: - """Verify construction with pos beyond source length raises ValueError. - - The valid range for pos is [0, len(source)]. Positions strictly greater - than len(source) are rejected at construction time. 
- """ - with pytest.raises(ValueError, match="exceeds source length"): - Cursor(source="hello", pos=10) - - @given( - st.text(min_size=1).flatmap( - lambda s: st.tuples(st.just(s), st.integers(min_value=0, max_value=len(s) - 1)) - ) - ) - def test_current_returns_correct_character(self, source_pos: tuple[str, int]) -> None: - """Property: current returns character at position if valid.""" - source, pos = source_pos - event(f"input_len={len(source)}") - event(f"offset={pos}") - cursor = Cursor(source=source, pos=pos) - assert cursor.current == source[pos] - - -class TestCursorPeekMethod: - """Property-based tests for Cursor.peek() method.""" - - def test_peek_at_current_position(self) -> None: - """Verify peek(0) returns current character.""" - cursor = Cursor(source="hello", pos=0) - assert cursor.peek(0) == "h" - - def test_peek_ahead_one(self) -> None: - """Verify peek(1) returns next character.""" - cursor = Cursor(source="hello", pos=0) - assert cursor.peek(1) == "e" - - def test_peek_beyond_eof_returns_none(self) -> None: - """Verify peek() returns None when peeking beyond EOF.""" - cursor = Cursor(source="hello", pos=4) - assert cursor.peek(1) is None - - def test_peek_at_eof_returns_none(self) -> None: - """Verify peek() returns None at EOF.""" - cursor = Cursor(source="hello", pos=5) - assert cursor.peek(0) is None - - @given(st.text(min_size=2), st.integers(min_value=0, max_value=10)) - def test_peek_with_various_offsets(self, source: str, offset: int) -> None: - """Property: peek(offset) returns correct character or None.""" - event(f"input_len={len(source)}") - event(f"offset={offset}") - cursor = Cursor(source=source, pos=0) - result = cursor.peek(offset) - - in_bounds = offset < len(source) - event(f"valid={in_bounds}") - if in_bounds: - assert result == source[offset] - else: - assert result is None - - @given( - source=st.text(min_size=1), - pos=st.integers(min_value=0, max_value=20), - offset=st.integers(min_value=-50, max_value=-1), - ) - def 
test_peek_negative_offset_always_returns_none_or_valid( - self, source: str, pos: int, offset: int - ) -> None: - """Property: peek(offset) with negative offset returns None or in-bounds char. - - Verifies the target_pos < 0 guard: negative offsets whose magnitude - exceeds pos must return None, never a character from the END of the source - (Python negative indexing trap). - """ - pos = min(pos, len(source)) - cursor = Cursor(source=source, pos=pos) - target_pos = pos + offset - result = cursor.peek(offset) - - if target_pos < 0: - event("outcome=negative_target_returns_none") - # Without the guard this would silently return source[target_pos] - # (a character from the END of source). Must be None. - assert result is None - elif target_pos >= len(source): - event("outcome=beyond_eof_returns_none") - assert result is None - else: - event("outcome=in_bounds_lookbehind") - assert result == source[target_pos] - - -class TestCursorAdvanceMethod: - """Property-based tests for Cursor.advance() method.""" - - def test_advance_single_position(self) -> None: - """Verify advance() moves cursor by 1 position.""" - cursor = Cursor(source="hello", pos=0) - new_cursor = cursor.advance() - assert new_cursor.pos == 1 - assert cursor.pos == 0 # Original unchanged - - def test_advance_multiple_positions(self) -> None: - """Verify advance(count) moves cursor by count positions.""" - cursor = Cursor(source="hello", pos=0) - new_cursor = cursor.advance(3) - assert new_cursor.pos == 3 - - def test_advance_clamped_at_eof(self) -> None: - """Verify advance() clamps position at EOF.""" - cursor = Cursor(source="hello", pos=3) - new_cursor = cursor.advance(10) - assert new_cursor.pos == 5 # Clamped to len(source) - - def test_advance_from_eof_stays_at_eof(self) -> None: - """Verify advance() from EOF stays at EOF.""" - cursor = Cursor(source="hello", pos=5) - new_cursor = cursor.advance() - assert new_cursor.pos == 5 - - @given( - st.text(), - st.integers(min_value=0, max_value=100), - 
st.integers(min_value=1, max_value=10), - ) - def test_advance_returns_new_cursor(self, source: str, pos: int, count: int) -> None: - """Property: advance() returns new cursor, original unchanged.""" - event(f"input_len={len(source)}") - event(f"offset={pos}") - pos = min(pos, len(source)) - cursor = Cursor(source=source, pos=pos) - new_cursor = cursor.advance(count) - - # Original unchanged - assert cursor.pos == pos - # New cursor advanced (clamped at len(source)) - expected_pos = min(pos + count, len(source)) - assert new_cursor.pos == expected_pos - - -class TestCursorSliceToMethod: - """Property-based tests for Cursor.slice_to() method.""" - - def test_slice_to_simple_range(self) -> None: - """Verify slice_to() extracts substring.""" - cursor = Cursor(source="hello world", pos=0) - text = cursor.slice_to(5) - assert text == "hello" - - def test_slice_to_from_middle(self) -> None: - """Verify slice_to() works from middle position.""" - cursor = Cursor(source="hello world", pos=6) - text = cursor.slice_to(11) - assert text == "world" - - def test_slice_to_empty_range(self) -> None: - """Verify slice_to() returns empty string for empty range.""" - cursor = Cursor(source="hello", pos=2) - text = cursor.slice_to(2) - assert text == "" - - @given(st.text(min_size=1)) - def test_slice_to_full_string(self, source: str) -> None: - """Property: slice_to(len(source)) from pos=0 returns full string.""" - event(f"input_len={len(source)}") - cursor = Cursor(source=source, pos=0) - text = cursor.slice_to(len(source)) - assert text == source - - -class TestCursorSkipSpacesMethod: - """Property-based tests for Cursor.skip_spaces() method.""" - - def test_skip_spaces_no_spaces(self) -> None: - """Verify skip_spaces() returns same cursor when no spaces.""" - cursor = Cursor(source="hello", pos=0) - new_cursor = cursor.skip_spaces() - assert new_cursor.pos == 0 - - def test_skip_spaces_leading_spaces(self) -> None: - """Verify skip_spaces() skips leading spaces.""" - cursor = 
Cursor(source=" hello", pos=0) - new_cursor = cursor.skip_spaces() - assert new_cursor.pos == 3 - assert new_cursor.current == "h" - - def test_skip_spaces_all_spaces(self) -> None: - """Verify skip_spaces() handles all-space string.""" - cursor = Cursor(source=" ", pos=0) - new_cursor = cursor.skip_spaces() - assert new_cursor.is_eof is True - - def test_skip_spaces_only_space_not_tab(self) -> None: - """Verify skip_spaces() only skips space (U+0020), not tab.""" - cursor = Cursor(source=" \thello", pos=0) - new_cursor = cursor.skip_spaces() - assert new_cursor.pos == 2 - assert new_cursor.current == "\t" - - def test_skip_spaces_not_newline(self) -> None: - """Verify skip_spaces() does not skip newlines.""" - cursor = Cursor(source=" \nhello", pos=0) - new_cursor = cursor.skip_spaces() - assert new_cursor.pos == 2 - assert new_cursor.current == "\n" - - -class TestCursorSkipWhitespaceMethod: - """Property-based tests for Cursor.skip_whitespace() method.""" - - def test_skip_whitespace_no_whitespace(self) -> None: - """Verify skip_whitespace() returns same cursor when no whitespace.""" - cursor = Cursor(source="hello", pos=0) - new_cursor = cursor.skip_whitespace() - assert new_cursor.pos == 0 - - def test_skip_whitespace_mixed_whitespace(self) -> None: - """Verify skip_whitespace() skips space and newline. - - Note: CR is normalized to LF at parser entry, so skip_whitespace - only needs to handle space and LF. 
- """ - cursor = Cursor(source=" \n hello", pos=0) - new_cursor = cursor.skip_whitespace() - assert new_cursor.pos == 5 - assert new_cursor.current == "h" - - def test_skip_whitespace_all_whitespace(self) -> None: - """Verify skip_whitespace() handles all-whitespace string.""" - cursor = Cursor(source=" \n ", pos=0) - new_cursor = cursor.skip_whitespace() - assert new_cursor.is_eof is True - - def test_skip_whitespace_not_tab(self) -> None: - """Verify skip_whitespace() does not skip tab.""" - cursor = Cursor(source=" \n\thello", pos=0) - new_cursor = cursor.skip_whitespace() - assert new_cursor.pos == 2 - assert new_cursor.current == "\t" - - -class TestCursorExpectMethod: - """Property-based tests for Cursor.expect() method.""" - - def test_expect_match(self) -> None: - """Verify expect() returns new cursor when character matches.""" - cursor = Cursor(source="hello", pos=0) - new_cursor = cursor.expect("h") - assert new_cursor is not None - assert new_cursor.pos == 1 - - def test_expect_no_match(self) -> None: - """Verify expect() returns None when character does not match.""" - cursor = Cursor(source="hello", pos=0) - result = cursor.expect("x") - assert result is None - - def test_expect_at_eof(self) -> None: - """Verify expect() returns None at EOF.""" - cursor = Cursor(source="hello", pos=5) - result = cursor.expect("h") - assert result is None - - @given(st.text(min_size=1), st.characters()) - def test_expect_various_characters(self, source: str, char: str) -> None: - """Property: expect() behavior depends on current character.""" - event(f"input_len={len(source)}") - cursor = Cursor(source=source, pos=0) - result = cursor.expect(char) - - matched = source[0] == char - event(f"valid={matched}") - if matched: - assert result is not None - assert result.pos == 1 - else: - assert result is None - - -class TestCursorComputeLineColMethod: - """Property-based tests for Cursor.compute_line_col() method.""" - - def test_compute_line_col_first_line_first_col(self) -> 
None: - """Verify compute_line_col() returns (1, 1) at position 0.""" - cursor = Cursor(source="hello", pos=0) - line, col = cursor.compute_line_col() - assert line == 1 - assert col == 1 - - def test_compute_line_col_first_line_later_col(self) -> None: - """Verify compute_line_col() returns correct column on first line.""" - cursor = Cursor(source="hello", pos=2) - line, col = cursor.compute_line_col() - assert line == 1 - assert col == 3 - - def test_compute_line_col_second_line(self) -> None: - """Verify compute_line_col() returns (2, 1) at start of second line.""" - cursor = Cursor(source="line1\nline2", pos=6) - line, col = cursor.compute_line_col() - assert line == 2 - assert col == 1 - - def test_compute_line_col_second_line_middle(self) -> None: - """Verify compute_line_col() returns correct position on second line.""" - cursor = Cursor(source="line1\nline2", pos=8) - line, col = cursor.compute_line_col() - assert line == 2 - assert col == 3 - - def test_compute_line_col_multiple_lines(self) -> None: - """Verify compute_line_col() handles multiple newlines.""" - cursor = Cursor(source="a\nb\nc\nd", pos=6) # Position at 'd' - line, col = cursor.compute_line_col() - assert line == 4 - assert col == 1 - - -class TestParseResultDataclass: - """Property-based tests for ParseResult dataclass.""" - - def test_parse_result_frozen(self) -> None: - """Property: ParseResult instances are immutable (frozen).""" - cursor = Cursor(source="test", pos=0) - result: ParseResult[str] = ParseResult(value="parsed", cursor=cursor) - - with pytest.raises((AttributeError, TypeError)): - result.value = "changed" # type: ignore[misc] - - @given(st.text(), st.text(), st.integers(min_value=0, max_value=100)) - def test_parse_result_construction_string( - self, value: str, source: str, pos: int - ) -> None: - """Property: ParseResult can be constructed with string values.""" - event(f"input_len={len(source)}") - event(f"offset={pos}") - pos = min(pos, len(source)) - cursor = 
Cursor(source=source, pos=pos) - result: ParseResult[str] = ParseResult(value=value, cursor=cursor) - assert result.value == value - assert result.cursor is cursor - - @given(st.integers()) - def test_parse_result_construction_int(self, value: int) -> None: - """Property: ParseResult can be constructed with int values.""" - event(f"value={value}") - cursor = Cursor(source="test", pos=0) - result: ParseResult[int] = ParseResult(value=value, cursor=cursor) - assert result.value == value - - def test_parse_result_generic_type(self) -> None: - """Verify ParseResult works with various types.""" - cursor = Cursor(source="test", pos=0) - - # String type - str_result: ParseResult[str] = ParseResult(value="hello", cursor=cursor) - assert str_result.value == "hello" - - # List type - list_result: ParseResult[list[int]] = ParseResult(value=[1, 2, 3], cursor=cursor) - assert list_result.value == [1, 2, 3] - - # Tuple type - tuple_result: ParseResult[tuple[str, int]] = ParseResult( - value=("test", 42), cursor=cursor - ) - assert tuple_result.value == ("test", 42) - - -class TestParseErrorDataclass: - """Property-based tests for ParseError dataclass.""" - - def test_parse_error_frozen(self) -> None: - """Property: ParseError instances are immutable (frozen).""" - cursor = Cursor(source="test", pos=0) - error = ParseError(message="error", cursor=cursor) - - with pytest.raises((AttributeError, TypeError)): - error.message = "changed" # type: ignore[misc] - - @given(st.text(), st.text()) - def test_parse_error_construction_minimal(self, message: str, source: str) -> None: - """Property: ParseError can be constructed with message and cursor only.""" - event(f"input_len={len(source)}") - cursor = Cursor(source=source, pos=0) - error = ParseError(message=message, cursor=cursor) - assert error.message == message - assert error.cursor is cursor - assert error.expected == () - - def test_parse_error_construction_with_expected(self) -> None: - """Verify ParseError can be constructed with 
expected tokens.""" - cursor = Cursor(source="test", pos=0) - error = ParseError(message="error", cursor=cursor, expected=("}", "]")) - assert error.expected == ("}", "]") - - -class TestParseErrorFormatError: - """Property-based tests for ParseError.format_error() method.""" - - def test_format_error_simple(self) -> None: - """Verify format_error() returns formatted error string.""" - cursor = Cursor(source="hello", pos=2) - error = ParseError(message="Test error", cursor=cursor) - formatted = error.format_error() - assert "1:3:" in formatted - assert "Test error" in formatted - - def test_format_error_with_expected(self) -> None: - """Verify format_error() includes expected tokens.""" - cursor = Cursor(source="hello", pos=0) - error = ParseError(message="Unexpected", cursor=cursor, expected=("}", "]")) - formatted = error.format_error() - assert "expected:" in formatted - assert "'}'" in formatted - assert "']'" in formatted - - def test_format_error_multiline_source(self) -> None: - """Verify format_error() shows correct line number for multiline source.""" - cursor = Cursor(source="line1\nline2\nline3", pos=6) # Start of line2 - error = ParseError(message="Error on line 2", cursor=cursor) - formatted = error.format_error() - assert "2:1:" in formatted - - -class TestParseErrorFormatWithContext: - """Property-based tests for ParseError.format_with_context() method.""" - - def test_format_with_context_simple(self) -> None: - """Verify format_with_context() shows source context.""" - cursor = Cursor(source="hello world", pos=6) - error = ParseError(message="Test error", cursor=cursor) - formatted = error.format_with_context() - - assert "1:7: Test error" in formatted - assert "hello world" in formatted - assert "^" in formatted # Pointer - - def test_format_with_context_multiline(self) -> None: - """Verify format_with_context() shows multiple lines.""" - source = "line1\nline2\nline3" - cursor = Cursor(source=source, pos=6) # Start of line2 - error = 
ParseError(message="Error", cursor=cursor) - formatted = error.format_with_context() - - assert "line1" in formatted - assert "line2" in formatted - assert "line3" in formatted - assert "^" in formatted - - def test_format_with_context_custom_context_lines(self) -> None: - """Verify format_with_context() respects context_lines parameter.""" - source = "line1\nline2\nline3\nline4\nline5" - cursor = Cursor(source=source, pos=12) # Line 3 - error = ParseError(message="Error", cursor=cursor) - - # With context_lines=1, should show lines 2-4 - formatted = error.format_with_context(context_lines=1) - assert "line2" in formatted - assert "line3" in formatted - assert "line4" in formatted - - def test_format_with_context_pointer_alignment(self) -> None: - """Verify format_with_context() aligns pointer correctly.""" - cursor = Cursor(source="hello", pos=2) - error = ParseError(message="Error", cursor=cursor) - formatted = error.format_with_context() - - lines = formatted.split("\n") - # Find the line with hello and the pointer line - for i, line in enumerate(lines): - if "hello" in line and i + 1 < len(lines): - # Next line should have pointer at correct position - pointer_line = lines[i + 1] - # The pointer should be at column 3 (accounting for line number prefix) - assert "^" in pointer_line - - -class TestCursorIntegrationContracts: - """Integration contract tests for Cursor methods working together.""" - - def test_cursor_parse_word(self) -> None: - """Integration: Use cursor to parse a word.""" - cursor = Cursor(source="hello world", pos=0) - start_pos = cursor.pos - - # Advance until space - while not cursor.is_eof and cursor.current != " ": - cursor = cursor.advance() - - # Extract word - word = Cursor(source="hello world", pos=start_pos).slice_to(cursor.pos) - assert word == "hello" - - def test_cursor_skip_and_parse(self) -> None: - """Integration: Skip whitespace then parse.""" - cursor = Cursor(source=" hello", pos=0) - - # Skip spaces - cursor = 
cursor.skip_spaces() - - # Parse word - start_pos = cursor.pos - while not cursor.is_eof and cursor.current.isalpha(): - cursor = cursor.advance() - - word = Cursor(source=" hello", pos=start_pos).slice_to(cursor.pos) - assert word == "hello" - - def test_cursor_peek_and_expect(self) -> None: - """Integration: Use peek to look ahead, then expect.""" - cursor = Cursor(source="hello", pos=0) - - # Peek ahead - assert cursor.peek(0) == "h" - assert cursor.peek(1) == "e" - - # Expect and advance - new_cursor = cursor.expect("h") - assert new_cursor is not None - assert new_cursor.current == "e" +"""Aggregated syntax cursor property test surface.""" + +from tests.syntax_cursor_property_cases.contract_tests_from_test_cursor_comprehensive_py import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_property_cases.property_tests_eof_handling import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_property_cases.property_tests_idempotence import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_property_cases.property_tests_immutability import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_property_cases.property_tests_line_column_tracking import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_property_cases.property_tests_navigation import * # noqa: F403 - re-export split test surface +from tests.syntax_cursor_property_cases.property_tests_robustness import * # noqa: F403 - re-export split test surface diff --git a/tests/test_syntax_parser_core.py b/tests/test_syntax_parser_core.py index 10d29b0a..a3b8ba1d 100644 --- a/tests/test_syntax_parser_core.py +++ b/tests/test_syntax_parser_core.py @@ -1,1502 +1,10 @@ -"""Core parser tests: blank line detection, comment merging, DoS protection, error recovery. 
- -Tests for ``ftllexengine.syntax.parser.core``: - -- ``_has_blank_line_between``: Region-based newline detection for comment merging -- ``_CommentAccumulator``: Span handling and content joining for adjacent comments -- ``FluentParserV1``: Comment merging, term/message/junk parsing, DoS limits, - nesting depth clamping, source size validation, error recovery, - parse_stream incremental entry parsing -""" - -from __future__ import annotations - -import logging -import sys - -import pytest -from hypothesis import event, given -from hypothesis import strategies as st - -from ftllexengine.constants import MAX_SOURCE_SIZE -from ftllexengine.diagnostics import DiagnosticCode -from ftllexengine.enums import CommentType -from ftllexengine.syntax.ast import Comment, Junk, Message, Span, Term -from ftllexengine.syntax.parser.core import ( - FluentParserV1, - _CommentAccumulator, - _has_blank_line_between, -) - -# ============================================================================ -# TestBlankLineDetection -# ============================================================================ - - -class TestBlankLineDetection: - """Direct tests for ``_has_blank_line_between``. - - The function checks whether a region of the source string contains - at least one newline character. After parse_comment consumes the - trailing newline, any remaining newline in the gap indicates a - blank line was present between comments. 
- """ - - # -- Positive: regions containing newlines ---------------------------- - - def test_empty_region_has_no_blank_line(self) -> None: - """Empty region (start == end) contains no newline.""" - source = "content" - assert _has_blank_line_between(source, 0, 0) is False - - def test_consecutive_newlines(self) -> None: - """Two consecutive newlines in region are detected.""" - source = "\n\n" - assert _has_blank_line_between(source, 0, len(source)) is True - - def test_single_newline_in_region(self) -> None: - """Single newline indicates blank line (trailing LF already consumed).""" - source = "line1\nline2" - assert _has_blank_line_between(source, 0, len(source)) is True - - def test_newline_space_newline(self) -> None: - """Newline-space-newline sequence contains a newline.""" - source = "line1\n \nline2" - assert _has_blank_line_between(source, 0, len(source)) is True - - def test_multiple_spaces_between_newlines(self) -> None: - """Multiple spaces between newlines still contains newlines.""" - source = "start\n \nend" - assert _has_blank_line_between(source, 0, len(source)) is True - - def test_consecutive_newlines_at_start(self) -> None: - """Consecutive newlines at start of region.""" - source = "\n\ncontent" - assert _has_blank_line_between(source, 0, len(source)) is True - - def test_newline_at_end_only(self) -> None: - """Single newline at end of content is detected.""" - source = "content\n" - assert _has_blank_line_between(source, 0, len(source)) is True - - def test_alternating_newlines_and_spaces(self) -> None: - """Alternating pattern of newlines and spaces.""" - source = "\n \n \n" - assert _has_blank_line_between(source, 0, len(source)) is True - - def test_content_between_newlines(self) -> None: - """Content between newlines does not prevent newline detection.""" - source = "\nX\n" - assert _has_blank_line_between(source, 0, len(source)) is True - - def test_tab_between_newlines(self) -> None: - """Tab between newlines does not prevent newline 
detection.""" - source = "\n\t\n" - assert _has_blank_line_between(source, 0, len(source)) is True - - # -- Negative: regions without newlines -------------------------------- - - def test_spaces_only_no_newlines(self) -> None: - """Region with only spaces has no newline.""" - source = "content content" - assert _has_blank_line_between(source, 7, 12) is False - - def test_no_newline_ascii_content(self) -> None: - """Plain ASCII content without newlines.""" - source = "abcdefghijklmnop" - assert _has_blank_line_between(source, 0, len(source)) is False - - def test_mixed_whitespace_no_newline(self) -> None: - """Mixed spaces without newline in subregion.""" - source = "start end" - assert _has_blank_line_between(source, 5, 9) is False - - # -- Region boundary handling ------------------------------------------ - - def test_blank_line_partially_in_region(self) -> None: - """Region containing newlines is detected.""" - source = "prefix\n\nsuffix" - assert _has_blank_line_between(source, 6, 8) is True - - def test_blank_line_before_region(self) -> None: - """Newlines before region are not detected.""" - source = "\n\ncontent" - assert _has_blank_line_between(source, 2, len(source)) is False - - def test_blank_line_after_region(self) -> None: - """Newlines after region are not detected.""" - source = "content\n\n" - assert _has_blank_line_between(source, 0, 7) is False - - # -- Comment merging gap scenarios ------------------------------------- - - def test_comment_gap_two_newlines(self) -> None: - """Two newlines in a row create a blank line gap.""" - source = "\n\n" - assert _has_blank_line_between(source, 0, len(source)) is True - - def test_comment_gap_empty(self) -> None: - """Zero-length gap between consecutive comments has no blank line.""" - comment1_end = len("# Comment1\n") - source = "# Comment1\n# Comment2\n" - assert _has_blank_line_between( - source, comment1_end, comment1_end - ) is False - - def test_comment_gap_whitespace_only_line(self) -> None: - 
"""Whitespace-only line between newlines is a blank line.""" - source = "\n \n" - assert _has_blank_line_between(source, 0, len(source)) is True - - -# ============================================================================ -# TestCommentMerging -# ============================================================================ - - -class TestCommentMerging: - """Comment merging via ``FluentParserV1`` and ``_CommentAccumulator``. - - Adjacent single-hash comments without blank lines between them are - merged into a single Comment node. Different comment types (``#``, - ``##``, ``###``) are never merged. Blank lines separate comment groups. - """ - - # -- Parser-level merging ---------------------------------------------- - - def test_adjacent_comments_merge(self) -> None: - """Adjacent single-hash comments merge into one.""" - parser = FluentParserV1() - resource = parser.parse("# Line 1\n# Line 2\n# Line 3\n") - assert len(resource.entries) == 1 - comment = resource.entries[0] - assert isinstance(comment, Comment) - assert "Line 1" in comment.content - assert "Line 2" in comment.content - assert "Line 3" in comment.content - - def test_different_comment_types_dont_merge(self) -> None: - """Comments of different types are not merged.""" - parser = FluentParserV1() - resource = parser.parse("\n# Single\n## Group\n") - assert len(resource.entries) == 2 - c1 = resource.entries[0] - c2 = resource.entries[1] - assert isinstance(c1, Comment) - assert isinstance(c2, Comment) - assert c1.type == CommentType.COMMENT - assert c2.type == CommentType.GROUP - - def test_comments_separated_by_multiple_blank_lines(self) -> None: - """Multiple blank lines prevent merging.""" - parser = FluentParserV1() - resource = parser.parse("\n# First\n\n\n# Second\n") - comments = [ - e for e in resource.entries if isinstance(e, Comment) - ] - assert len(comments) == 2 - - def test_comments_separated_by_content(self) -> None: - """Non-comment content between comments prevents merging.""" - 
parser = FluentParserV1() - resource = parser.parse( - "\n# First comment\ntext\n# Second comment\n" - ) - comments = [ - e for e in resource.entries if isinstance(e, Comment) - ] - assert len(comments) == 2 - - def test_content_between_comments_separates(self) -> None: - """Text content between comments causes separation.""" - parser = FluentParserV1() - resource = parser.parse("# Comment1\ntext content here\n# Comment2") - comments = [ - e for e in resource.entries if isinstance(e, Comment) - ] - assert len(comments) == 2 - - def test_multiple_newlines_with_content(self) -> None: - """Multiple newlines with interspersed content separates.""" - parser = FluentParserV1() - resource = parser.parse("\n# First\n\n\nx\n# Second") - comments = [ - e for e in resource.entries if isinstance(e, Comment) - ] - assert len(comments) == 2 - - def test_newline_content_newline_pattern(self) -> None: - """Pattern: newline, content, newline separates comments.""" - parser = FluentParserV1() - resource = parser.parse("# First\nx\n\n# Second") - comments = [ - e for e in resource.entries if isinstance(e, Comment) - ] - assert len(comments) == 2 - - def test_merged_comment_span_covers_all(self) -> None: - """Merged comment span starts at first and ends at last.""" - parser = FluentParserV1() - resource = parser.parse("# Line 1\n# Line 2\n# Line 3") - comments = [ - e for e in resource.entries if isinstance(e, Comment) - ] - assert len(comments) == 1 - merged = comments[0] - assert merged.span is not None - assert merged.span.start == 0 - - def test_blank_line_with_spaces_between_comments(self) -> None: - """Comments with single blank line (containing spaces).""" - parser = FluentParserV1() - resource = parser.parse("# First\n\n# Second") - comments = [ - e for e in resource.entries if isinstance(e, Comment) - ] - assert len(comments) >= 1 - - # -- _CommentAccumulator span edge cases ------------------------------- - - def test_accumulator_finalize_last_span_only(self) -> None: - 
"""Finalize when first_span is None but last_span is not.""" - first = Comment( - content="First", type=CommentType.COMMENT, span=None, - ) - acc = _CommentAccumulator(first) - second = Comment( - content="Second", - type=CommentType.COMMENT, - span=Span(start=10, end=30), - ) - acc.add(second) - result = acc.finalize() - assert result.content == "First\nSecond" - assert result.span is not None - assert result.span.start == 10 - assert result.span.end == 30 - - def test_accumulator_finalize_neither_span(self) -> None: - """Finalize when both spans are None.""" - first = Comment( - content="No span 1", type=CommentType.GROUP, span=None, - ) - acc = _CommentAccumulator(first) - second = Comment( - content="No span 2", type=CommentType.GROUP, span=None, - ) - acc.add(second) - result = acc.finalize() - assert result.content == "No span 1\nNo span 2" - assert result.type == CommentType.GROUP - assert result.span is None - - def test_accumulator_finalize_both_spans(self) -> None: - """Finalize when both first and last have spans.""" - first = Comment( - content="A", - type=CommentType.COMMENT, - span=Span(start=0, end=5), - ) - acc = _CommentAccumulator(first) - second = Comment( - content="B", - type=CommentType.COMMENT, - span=Span(start=6, end=11), - ) - acc.add(second) - result = acc.finalize() - assert result.content == "A\nB" - assert result.span is not None - assert result.span.start == 0 - assert result.span.end == 11 - - # -- Comment attachment to terms --------------------------------------- - - def test_single_hash_comment_attached_to_term(self) -> None: - """Single-hash comment immediately before term is attached.""" - parser = FluentParserV1() - resource = parser.parse( - "# This comment should attach\n-my-term = Term Value\n" - ) - assert len(resource.entries) == 1 - entry = resource.entries[0] - assert isinstance(entry, Term) - assert entry.id.name == "my-term" - assert entry.comment is not None - assert isinstance(entry.comment, Comment) - assert 
entry.comment.type == CommentType.COMMENT - assert "This comment should attach" in entry.comment.content - - def test_multiple_comments_attached_to_term(self) -> None: - """Multiple adjacent comments merge and attach to term.""" - parser = FluentParserV1() - source = ( - "# Comment line 1\n# Comment line 2\n" - "# Comment line 3\n-my-term = Value\n" - ) - resource = parser.parse(source) - assert len(resource.entries) == 1 - entry = resource.entries[0] - assert isinstance(entry, Term) - assert entry.comment is not None - assert "Comment line 1" in entry.comment.content - assert "Comment line 2" in entry.comment.content - assert "Comment line 3" in entry.comment.content - - def test_group_comment_before_term_not_attached(self) -> None: - """Group comment (##) before term is not attached.""" - parser = FluentParserV1() - resource = parser.parse("## Group comment\n-my-term = Value\n") - assert len(resource.entries) == 2 - comment = resource.entries[0] - term = resource.entries[1] - assert isinstance(comment, Comment) - assert comment.type == CommentType.GROUP - assert isinstance(term, Term) - assert term.comment is None - - def test_comment_with_blank_lines_before_term_not_attached(self) -> None: - """Blank lines between comment and term prevent attachment.""" - parser = FluentParserV1() - resource = parser.parse("# Comment\n\n\n-my-term = Value\n") - assert len(resource.entries) == 2 - comment = resource.entries[0] - term = resource.entries[1] - assert isinstance(comment, Comment) - assert isinstance(term, Term) - assert term.comment is None - - # -- CRLF handling in comment merging ---------------------------------- - - def test_crlf_comments(self) -> None: - """Parser handles CRLF line endings in comment regions.""" - parser = FluentParserV1() - resource = parser.parse("# Comment 1\r\n\r\n# Comment 2") - assert resource is not None - comments = [ - e for e in resource.entries if isinstance(e, Comment) - ] - assert len(comments) >= 1 - - def 
test_cr_only_comments(self) -> None: - """Parser handles CR-only line endings in comment regions.""" - parser = FluentParserV1() - resource = parser.parse("# Comment 1\r\r# Comment 2") - assert resource is not None - comments = [ - e for e in resource.entries if isinstance(e, Comment) - ] - assert len(comments) >= 1 - - def test_spaces_between_crlf_newlines(self) -> None: - """Parser handles spaces between CRLF newlines.""" - parser = FluentParserV1() - resource = parser.parse("# Comment 1\r\n \r\n# Comment 2") - assert resource is not None - comments = [ - e for e in resource.entries if isinstance(e, Comment) - ] - assert len(comments) >= 1 - - def test_no_blank_line_adjacent_comments_merge(self) -> None: - """Adjacent comments with no blank line merge into one.""" - parser = FluentParserV1() - resource = parser.parse("# Comment 1\n# Comment 2") - comments = [ - e for e in resource.entries if isinstance(e, Comment) - ] - assert len(comments) == 1 - - -# ============================================================================ -# TestDoSProtection -# ============================================================================ - - -class TestDoSAbortBehavior: - """DoS abort behavior: max_parse_errors and abort thresholds. - - The parser aborts when the number of Junk entries exceeds - ``max_parse_errors``, preventing memory exhaustion from - severely malformed input. 
- """ - - # -- max_parse_errors: indented junk ----------------------------------- - - def test_abort_on_indented_junk( - self, caplog: pytest.LogCaptureFixture, - ) -> None: - """Parser aborts when indented junk count exceeds limit.""" - parser = FluentParserV1(max_parse_errors=3) - source = ( - " indented1\n# comment\n" - " indented2\n# comment\n" - " indented3\n# comment\n" - " indented4\n" - ) - with caplog.at_level(logging.WARNING): - result = parser.parse(source) - junk = [e for e in result.entries if isinstance(e, Junk)] - assert len(junk) == 3 - assert any( - "Parse aborted" in r.message for r in caplog.records - ) - assert any( - "exceeded maximum of 3 Junk entries" in r.message - for r in caplog.records - ) - - # -- max_parse_errors: failed comments --------------------------------- - - def test_abort_on_failed_comments( - self, caplog: pytest.LogCaptureFixture, - ) -> None: - """Parser aborts when malformed comment count exceeds limit.""" - parser = FluentParserV1(max_parse_errors=2) - source = "####\n####\n####\n####\n" - with caplog.at_level(logging.WARNING): - result = parser.parse(source) - junk = [e for e in result.entries if isinstance(e, Junk)] - assert len(junk) == 2 - assert any( - "Parse aborted" in r.message for r in caplog.records - ) - assert any( - "exceeded maximum of 2 Junk entries" in r.message - for r in caplog.records - ) - - def test_malformed_comment_creates_junk_with_diagnostic(self) -> None: - """Malformed comment creates Junk with proper diagnostic.""" - parser = FluentParserV1() - result = parser.parse("#####\n") - assert len(result.entries) == 1 - junk_entry = result.entries[0] - assert isinstance(junk_entry, Junk) - assert junk_entry.content == "#####" - assert len(junk_entry.annotations) == 1 - assert ( - junk_entry.annotations[0].code - == DiagnosticCode.PARSE_JUNK.name - ) - assert "Invalid comment syntax" in junk_entry.annotations[0].message - - # -- max_parse_errors: message parse failures -------------------------- - - 
def test_abort_on_message_failures( - self, caplog: pytest.LogCaptureFixture, - ) -> None: - """Parser aborts when message parse failures exceed limit.""" - parser = FluentParserV1(max_parse_errors=3) - source = "msg1\nmsg2\nmsg3\nmsg4\nmsg5\n" - with caplog.at_level(logging.WARNING): - result = parser.parse(source) - junk = [e for e in result.entries if isinstance(e, Junk)] - assert len(junk) == 3 - assert any( - "Parse aborted" in r.message for r in caplog.records - ) - assert any( - "exceeded maximum of 3 Junk entries" in r.message - for r in caplog.records - ) - - def test_generic_parse_error_annotation(self) -> None: - """Generic parse error when nesting depth not exceeded.""" - parser = FluentParserV1() - result = parser.parse("invalid syntax here\n") - assert len(result.entries) == 1 - junk_entry = result.entries[0] - assert isinstance(junk_entry, Junk) - assert len(junk_entry.annotations) == 1 - annotation = junk_entry.annotations[0] - assert annotation.code == DiagnosticCode.PARSE_JUNK.name - assert annotation.message == "Parse error" - - # -- max_parse_errors: mixed junk types -------------------------------- - - def test_mixed_junk_types_count_toward_limit( - self, caplog: pytest.LogCaptureFixture, - ) -> None: - """All junk types count together toward the limit.""" - parser = FluentParserV1(max_parse_errors=4) - source = ( - " indented1\nmsg1 = ok\n####\n" - "invalid\nmsg2 = ok\n indented2\n####\n" - ) - with caplog.at_level(logging.WARNING): - result = parser.parse(source) - junk = [e for e in result.entries if isinstance(e, Junk)] - assert len(junk) == 4 - assert any( - "Parse aborted" in r.message for r in caplog.records - ) - - def test_depth_exceeded_counts_toward_limit( - self, caplog: pytest.LogCaptureFixture, - ) -> None: - """Depth exceeded errors count toward max_parse_errors.""" - parser = FluentParserV1( - max_nesting_depth=1, max_parse_errors=2, - ) - source = ( - "m1 = { { $x } }\nm2 = { { $y } }\n" - "m3 = { { $z } }\n" - ) - with 
caplog.at_level(logging.WARNING): - result = parser.parse(source) - junk = [e for e in result.entries if isinstance(e, Junk)] - assert len(junk) == 2 - depth_count = sum( - 1 - for entry in junk - for ann in entry.annotations - if ann.code - == DiagnosticCode.PARSE_NESTING_DEPTH_EXCEEDED.name - ) - assert depth_count >= 1 - - # -- max_parse_errors: boundary conditions ----------------------------- - - def test_disabled_max_parse_errors_never_aborts(self) -> None: - """Parser with max_parse_errors=0 never aborts.""" - parser = FluentParserV1(max_parse_errors=0) - source = "####\n" * 200 - result = parser.parse(source) - junk = [e for e in result.entries if isinstance(e, Junk)] - assert len(junk) == 200 - - def test_exact_boundary(self) -> None: - """Parser creates exactly max_parse_errors junk entries at limit.""" - parser = FluentParserV1(max_parse_errors=5) - source = "####\n" * 5 - result = parser.parse(source) - junk = [e for e in result.entries if isinstance(e, Junk)] - assert len(junk) == 5 - - def test_one_over_boundary( - self, caplog: pytest.LogCaptureFixture, - ) -> None: - """Parser with 6 errors and limit of 5 aborts at 5.""" - parser = FluentParserV1(max_parse_errors=5) - source = "####\n" * 6 - with caplog.at_level(logging.WARNING): - result = parser.parse(source) - junk = [e for e in result.entries if isinstance(e, Junk)] - assert len(junk) == 5 - assert any( - "Parse aborted" in r.message for r in caplog.records - ) - - # -- Log message content ----------------------------------------------- - - def test_log_suggests_fixing_source( - self, caplog: pytest.LogCaptureFixture, - ) -> None: - """DoS protection log mentions malformed FTL input.""" - parser = FluentParserV1(max_parse_errors=1) - source = "####\n####\n" - with caplog.at_level(logging.WARNING): - parser.parse(source) - assert any( - "severely malformed FTL input" in r.message - for r in caplog.records - ) - - def test_log_suggests_increasing_limit( - self, caplog: pytest.LogCaptureFixture, - 
) -> None: - """DoS protection log mentions increasing max_parse_errors.""" - parser = FluentParserV1(max_parse_errors=1) - source = "####\n####\n" - with caplog.at_level(logging.WARNING): - parser.parse(source) - assert any( - "increasing max_parse_errors" in r.message - for r in caplog.records - ) - - - -# ============================================================================ -# TestDoSLimitsAndValidation -# ============================================================================ - - -class TestDoSLimitsAndValidation: - """DoS protection: nesting depth, source size, parameter validation. - - Verifies nesting depth clamping, source size limits, and - constructor parameter validation. - """ - - # -- Nesting depth exceeded -------------------------------------------- - - def test_depth_exceeded_specific_annotation(self) -> None: - """Nesting depth exceeded produces specific diagnostic.""" - parser = FluentParserV1(max_nesting_depth=1) - source = "msg = { { $var } }\n" - result = parser.parse(source) - assert len(result.entries) == 1 - junk_entry = result.entries[0] - assert isinstance(junk_entry, Junk) - assert len(junk_entry.annotations) == 1 - annotation = junk_entry.annotations[0] - assert ( - annotation.code - == DiagnosticCode.PARSE_NESTING_DEPTH_EXCEEDED.name - ) - assert "Nesting depth limit exceeded" in annotation.message - assert "max: 1" in annotation.message - - # -- Recursion limit clamping ------------------------------------------ - - def test_clamps_excessive_nesting_depth( - self, caplog: pytest.LogCaptureFixture, - ) -> None: - """Excessive max_nesting_depth is clamped to safe limit.""" - recursion_limit = sys.getrecursionlimit() - max_safe_depth = recursion_limit - 50 - excessive_depth = recursion_limit + 100 - with caplog.at_level( - logging.WARNING, - logger="ftllexengine.syntax.parser.core", - ): - parser = FluentParserV1(max_nesting_depth=excessive_depth) - assert parser.max_nesting_depth == max_safe_depth - assert 
parser.max_nesting_depth < excessive_depth - assert len(caplog.records) == 1 - warning = caplog.records[0] - assert warning.levelname == "WARNING" - assert "max_nesting_depth" in warning.message - assert "exceeds Python recursion limit" in warning.message - assert "Clamping to" in warning.message - - def test_accepts_depth_within_recursion_limit( - self, caplog: pytest.LogCaptureFixture, - ) -> None: - """No warning when nesting depth is within safe limit.""" - with caplog.at_level( - logging.WARNING, - logger="ftllexengine.syntax.parser.core", - ): - parser = FluentParserV1(max_nesting_depth=50) - assert parser.max_nesting_depth == 50 - assert len(caplog.records) == 0 - - # -- Source size validation -------------------------------------------- - - def test_max_source_size_default(self) -> None: - """Default max_source_size equals MAX_SOURCE_SIZE constant.""" - parser = FluentParserV1() - assert parser.max_source_size == MAX_SOURCE_SIZE - - def test_max_source_size_custom(self) -> None: - """Custom max_source_size is stored.""" - parser = FluentParserV1(max_source_size=5000) - assert parser.max_source_size == 5000 - - def test_max_source_size_disabled(self) -> None: - """max_source_size=0 disables the limit.""" - parser = FluentParserV1(max_source_size=0) - assert parser.max_source_size == 0 - - def test_oversized_source_raises_value_error(self) -> None: - """parse() raises ValueError when source exceeds limit.""" - parser = FluentParserV1(max_source_size=100) - oversized = "a" * 101 - with pytest.raises( - ValueError, - match=( - r"Source length \(101 characters\) " - r"exceeds maximum \(100 characters\)" - ), - ): - parser.parse(oversized) - - def test_oversized_error_includes_config_hint(self) -> None: - """ValueError includes configuration hint.""" - parser = FluentParserV1(max_source_size=50) - with pytest.raises( - ValueError, - match="Configure max_source_size in FluentParserV1", - ): - parser.parse("x" * 51) - - def test_source_at_exact_limit(self) -> None: 
- """parse() allows source exactly at size limit.""" - parser = FluentParserV1(max_source_size=100) - result = parser.parse(("msg = value\n" * 8)[:100]) - assert result is not None - - def test_disabled_limit_accepts_large_source(self) -> None: - """max_source_size=0 accepts arbitrarily large source.""" - parser = FluentParserV1(max_source_size=0) - result = parser.parse("msg = " + ("x" * 100000)) - assert result is not None - - def test_none_limit_accepts_large_source(self) -> None: - """max_source_size=None accepts arbitrarily large source.""" - parser = FluentParserV1(max_source_size=None) - result = parser.parse("msg = " + ("y" * 100000)) - assert result is not None - - # -- Parameter validation ---------------------------------------------- - - def test_rejects_zero_nesting_depth(self) -> None: - """max_nesting_depth=0 raises ValueError.""" - with pytest.raises( - ValueError, - match=r"max_nesting_depth must be positive \(got 0\)", - ): - FluentParserV1(max_nesting_depth=0) - - def test_rejects_negative_nesting_depth(self) -> None: - """max_nesting_depth=-1 raises ValueError.""" - with pytest.raises( - ValueError, - match=r"max_nesting_depth must be positive \(got -1\)", - ): - FluentParserV1(max_nesting_depth=-1) - - def test_accepts_positive_nesting_depth(self) -> None: - """Positive max_nesting_depth is accepted.""" - parser = FluentParserV1(max_nesting_depth=50) - assert parser.max_nesting_depth == 50 - - def test_accepts_none_nesting_depth(self) -> None: - """None max_nesting_depth uses default.""" - parser = FluentParserV1(max_nesting_depth=None) - assert parser.max_nesting_depth > 0 - - -# ============================================================================ -# TestParserErrorRecoveryCore -# ============================================================================ - - -class TestParserCommentRecovery: - """Parser comment parsing edge cases and comment type handling. 
- - Verifies comment recovery, comment types (single, group, resource), - and edge cases like hash-only lines and EOF handling. - """ - - # -- Comment parsing edge cases ---------------------------------------- - - def test_comment_without_newline_at_eof(self) -> None: - """Comment without trailing newline at EOF.""" - parser = FluentParserV1() - resource = parser.parse("# This is a comment") - assert resource is not None - assert len(resource.entries) > 0 - - def test_hash_only_at_eof(self) -> None: - """Single hash at EOF.""" - parser = FluentParserV1() - resource = parser.parse("#") - assert resource is not None - - def test_hash_with_newline_at_eof(self) -> None: - """Hash followed by newline at EOF.""" - parser = FluentParserV1() - resource = parser.parse("#\n") - assert resource is not None - - def test_multiple_hashes_at_eof(self) -> None: - """Multiple hashes (###) at EOF.""" - parser = FluentParserV1() - resource = parser.parse("###") - assert resource is not None - - def test_hash_followed_by_valid_message(self) -> None: - """Recovery from hash-only line then valid message.""" - parser = FluentParserV1() - resource = parser.parse("#\nmsg = value") - assert resource is not None - assert len(resource.entries) > 0 - - def test_hash_blank_line_then_message(self) -> None: - """Recovery from hash, blank line, then message.""" - parser = FluentParserV1() - resource = parser.parse("#\n\nmsg = value") - assert resource is not None - assert len(resource.entries) > 0 - - def test_multiple_failed_comment_lines(self) -> None: - """Recovery from multiple consecutive hash-only lines.""" - parser = FluentParserV1() - resource = parser.parse("#\n#\n#\nmsg = value") - assert resource is not None - - # -- Comment types ----------------------------------------------------- - - def test_single_line_comment(self) -> None: - """Single-line comment before message.""" - parser = FluentParserV1() - resource = parser.parse("# This is a comment\nmsg = value") - assert resource is 
not None - assert len(resource.entries) >= 1 - - def test_group_comment(self) -> None: - """Group comment (##) before message.""" - parser = FluentParserV1() - resource = parser.parse("## Group comment\nmsg = value") - assert resource is not None - - def test_resource_comment(self) -> None: - """Resource comment (###) before message.""" - parser = FluentParserV1() - resource = parser.parse("### Resource comment\nmsg = value") - assert resource is not None - - def test_multiple_comment_types(self) -> None: - """Multiple comment types in one resource.""" - parser = FluentParserV1() - source = "# Comment 1\n## Comment 2\n### Comment 3\nmsg = value\n" - resource = parser.parse(source) - assert resource is not None - assert len(resource.entries) >= 1 - - - -# ============================================================================ -# TestParserEntryRecovery -# ============================================================================ - - -class TestParserEntryRecovery: - """Parser entry recovery: empty input, CRLF, messages, terms, junk. - - Verifies the parser handles empty/whitespace input, CRLF line endings, - message and term parsing basics, and junk creation for invalid content. 
- """ - - # -- Empty / whitespace ------------------------------------------------ - - def test_empty_source(self) -> None: - """Empty source produces empty resource.""" - parser = FluentParserV1() - resource = parser.parse("") - assert resource is not None - assert len(resource.entries) == 0 - - def test_whitespace_only(self) -> None: - """Whitespace-only source produces empty resource.""" - parser = FluentParserV1() - resource = parser.parse(" \n\n \n") - assert resource is not None - assert len(resource.entries) == 0 - - # -- CRLF handling ----------------------------------------------------- - - def test_crlf_line_endings(self) -> None: - """Parser handles CRLF line endings.""" - parser = FluentParserV1() - resource = parser.parse("msg1 = value1\r\nmsg2 = value2\r\n") - assert resource is not None - assert len(resource.entries) >= 2 - - # -- Message parsing --------------------------------------------------- - - def test_simple_message(self) -> None: - """Simple message parsing.""" - parser = FluentParserV1() - resource = parser.parse("msg = value") - assert resource is not None - assert len(resource.entries) == 1 - assert isinstance(resource.entries[0], Message) - - def test_multiple_messages(self) -> None: - """Multiple messages.""" - parser = FluentParserV1() - resource = parser.parse( - "msg1 = value1\nmsg2 = value2\nmsg3 = value3\n" - ) - assert resource is not None - assert len(resource.entries) == 3 - - # -- Term parsing ------------------------------------------------------ - - def test_simple_term(self) -> None: - """Simple term parsing.""" - parser = FluentParserV1() - resource = parser.parse("-term = value") - assert resource is not None - assert len(resource.entries) == 1 - assert isinstance(resource.entries[0], Term) - - def test_term_with_id(self) -> None: - """Term preserves identifier.""" - parser = FluentParserV1() - resource = parser.parse("-my-term = Term Value") - assert len(resource.entries) == 1 - assert isinstance(resource.entries[0], 
Term) - assert resource.entries[0].id.name == "my-term" - - def test_multiple_terms(self) -> None: - """Multiple terms.""" - parser = FluentParserV1() - source = "-term1 = Value 1\n-term2 = Value 2\n-term3 = Value 3\n" - resource = parser.parse(source) - assert len(resource.entries) == 3 - assert all(isinstance(e, Term) for e in resource.entries) - - def test_term_with_attributes(self) -> None: - """Term with attributes.""" - parser = FluentParserV1() - source = "-term = Main Value\n .attr = Attribute Value\n" - resource = parser.parse(source) - assert len(resource.entries) >= 1 - - def test_term_and_message_coexist(self) -> None: - """Terms and messages in same resource.""" - parser = FluentParserV1() - source = "-term = term value\nmsg = message value\n" - resource = parser.parse(source) - assert len(resource.entries) == 2 - - def test_failed_term_parsing(self) -> None: - """Parser handles failed term parsing (dash not followed by valid term).""" - parser = FluentParserV1() - result = parser.parse("- invalid\n") - assert result is not None - assert len(result.entries) > 0 - - # -- Junk handling ----------------------------------------------------- - - def test_junk_creates_entry(self) -> None: - """Unparseable content creates Junk entry.""" - parser = FluentParserV1() - resource = parser.parse("%%% invalid syntax") - assert resource is not None - assert len(resource.entries) > 0 - assert any(isinstance(e, Junk) for e in resource.entries) - - def test_junk_continues_parsing(self) -> None: - """Parser continues after junk entry.""" - parser = FluentParserV1() - resource = parser.parse("%%% invalid\nmsg = valid message\n") - assert resource is not None - assert len(resource.entries) >= 2 - - def test_multiline_junk(self) -> None: - """Multi-line junk handling.""" - parser = FluentParserV1() - source = "%%% line 1\n line 2\n line 3\nmsg = valid\n" - resource = parser.parse(source) - assert resource is not None - assert len(resource.entries) > 0 - - def 
test_junk_eof_with_trailing_spaces(self) -> None: - """Junk parsing handles trailing spaces at EOF.""" - parser = FluentParserV1() - resource = parser.parse("%%% invalid ") - assert resource is not None - assert len(resource.entries) > 0 - assert isinstance(resource.entries[0], Junk) - - def test_junk_trailing_spaces_at_eof(self) -> None: - """Junk with trailing spaces at EOF.""" - parser = FluentParserV1() - resource = parser.parse("invalid syntax ") - assert resource is not None - - def test_multiline_junk_ends_at_eof(self) -> None: - """Multiline junk ending at EOF.""" - parser = FluentParserV1() - source = "invalid line 1\n invalid line 2\n " - resource = parser.parse(source) - assert resource is not None - - -# ============================================================================ -# TestParserCoreHypothesis -# ============================================================================ - - -class TestParserCoreHypothesis: - """Property-based tests for parser core components. - - Uses Hypothesis to verify invariants across generated inputs. - All ``@given`` tests emit ``event()`` calls for HypoFuzz guidance. 
- """ - - # -- _has_blank_line_between properties -------------------------------- - - @given( - prefix=st.text( - alphabet=st.characters(blacklist_characters=["\n"]), - max_size=10, - ), - suffix=st.text( - alphabet=st.characters(blacklist_characters=["\n"]), - max_size=10, - ), - ) - def test_newline_pair_always_detected( - self, prefix: str, suffix: str, - ) -> None: - """Two consecutive newlines in region are always detected.""" - source = f"{prefix}\n\n{suffix}" - event(f"input_len={len(source)}") - assert _has_blank_line_between(source, 0, len(source)) is True - - @given(st.integers(min_value=2, max_value=10)) - def test_multiple_newlines_always_detected( - self, count: int, - ) -> None: - """Multiple consecutive newlines always detected.""" - event(f"boundary=newline_count_{count}") - source = "\n" * count - assert _has_blank_line_between(source, 0, len(source)) is True - - @given(st.integers(min_value=0, max_value=50)) - def test_spaces_only_never_blank(self, space_count: int) -> None: - """Spaces without newlines never produce a blank line.""" - event(f"boundary=space_count_{min(space_count, 10)}") - source = " " * space_count - assert _has_blank_line_between(source, 0, len(source)) is False - - @given( - st.text( - alphabet=st.characters( - blacklist_characters=["\n", " "], - min_codepoint=33, - max_codepoint=126, - ), - min_size=1, - max_size=20, - ) - ) - def test_ascii_no_newline_no_blank(self, text: str) -> None: - """ASCII text without newlines or spaces has no blank line.""" - event(f"input_len={len(text)}") - assert _has_blank_line_between(text, 0, len(text)) is False - - @given( - non_blank=st.characters( - blacklist_categories=("Zs", "Zl", "Zp"), - blacklist_characters=["\n"], - ) - ) - def test_non_blank_char_with_newlines( - self, non_blank: str, - ) -> None: - """Non-blank char between newlines: first newline is detected.""" - event("outcome=newline_detected") - source = f"\n{non_blank}\n" - assert _has_blank_line_between(source, 0, 
len(source)) is True - - @given( - lines=st.lists( - st.text( - alphabet=st.characters( - blacklist_categories=("Zs", "Zl", "Zp"), - blacklist_characters=["\n"], - ), - min_size=1, - max_size=5, - ), - min_size=1, - max_size=10, - ) - ) - def test_joined_lines_blank_iff_multiple( - self, lines: list[str], - ) -> None: - """Single-newline-joined non-ws lines: blank iff >1 line.""" - event(f"boundary=line_count_{len(lines)}") - source = "\n".join(lines) - result = _has_blank_line_between(source, 0, len(source)) - if len(lines) > 1: - assert result is True - else: - assert result is False - - @given( - non_blank_chars=st.lists( - st.characters( - blacklist_categories=("Zs", "Zl", "Zp"), - blacklist_characters=["\n"], - ), - min_size=1, - max_size=5, - ) - ) - def test_interleaved_newlines_always_detected( - self, non_blank_chars: list[str], - ) -> None: - """Interleaved newlines and non-blank chars: always has newline.""" - event(f"boundary=char_count_{len(non_blank_chars)}") - parts: list[str] = [] - for char in non_blank_chars: - parts.append("\n") - parts.append(char) - parts.append("\n") - source = "".join(parts) - assert _has_blank_line_between(source, 0, len(source)) is True - - # -- Parser hash-combination property ---------------------------------- - - @given( - st.text( - alphabet="#\n\r \t", min_size=1, max_size=50, - ) - ) - def test_hash_combinations_no_crash(self, source: str) -> None: - """Parser handles any combination of hashes and whitespace.""" - event(f"input_len={len(source)}") - parser = FluentParserV1() - resource = parser.parse(source) - assert resource is not None - assert isinstance(resource.entries, tuple) - has_entries = len(resource.entries) > 0 - event(f"outcome=has_entries_{has_entries}") - - # -- max_parse_errors property ----------------------------------------- - - @given(st.integers(min_value=1, max_value=10)) - def test_custom_limit_respected(self, limit: int) -> None: - """Parser aborts at exactly max_parse_errors limit.""" - 
event(f"boundary=limit_{limit}") - parser = FluentParserV1(max_parse_errors=limit) - source = "####\n" * (limit + 2) - result = parser.parse(source) - junk = [e for e in result.entries if isinstance(e, Junk)] - assert len(junk) == limit - - # -- Nesting depth property -------------------------------------------- - - @given(st.integers(min_value=1, max_value=5)) - def test_depth_exceeded_includes_limit( - self, depth_limit: int, - ) -> None: - """Depth exceeded diagnostic includes the configured limit.""" - event(f"boundary=depth_{depth_limit}") - parser = FluentParserV1(max_nesting_depth=depth_limit) - nesting = ( - "{ " * (depth_limit + 1) - + "$x" - + " }" * (depth_limit + 1) - ) - source = f"msg = {nesting}\n" - result = parser.parse(source) - junk = [e for e in result.entries if isinstance(e, Junk)] - assert len(junk) >= 1 - for entry in junk: - for ann in entry.annotations: - if ( - ann.code - == DiagnosticCode.PARSE_NESTING_DEPTH_EXCEEDED.name - ): - assert f"max: {depth_limit}" in ann.message - return - pytest.fail( - "Expected PARSE_NESTING_DEPTH_EXCEEDED annotation" - ) - - # -- Recursion limit clamping property --------------------------------- - - @given(depth_offset=st.integers(min_value=1, max_value=500)) - def test_any_excessive_depth_clamped( - self, depth_offset: int, - ) -> None: - """Any depth exceeding recursion limit is clamped.""" - event(f"boundary=offset_{min(depth_offset, 50)}") - recursion_limit = sys.getrecursionlimit() - max_safe = recursion_limit - 50 - excessive = recursion_limit + depth_offset - parser = FluentParserV1(max_nesting_depth=excessive) - assert parser.max_nesting_depth == max_safe - - # -- _CommentAccumulator span property --------------------------------- - - @given( - content1=st.text(min_size=1, max_size=50), - content2=st.text(min_size=1, max_size=50), - start=st.integers(min_value=0, max_value=1000), - end=st.integers(min_value=0, max_value=1000), - ) - def test_accumulator_span_combinations( - self, - content1: str, - 
content2: str, - start: int, - end: int, - ) -> None: - """Accumulator always produces valid Comment for any span config.""" - if end < start: - start, end = end, start - span = Span(start=start, end=end) - - for first_has in (True, False): - for last_has in (True, False): - event( - f"outcome=first_{first_has}_last_{last_has}" - ) - first = Comment( - content=content1, - type=CommentType.COMMENT, - span=span if first_has else None, - ) - acc = _CommentAccumulator(first) - second = Comment( - content=content2, - type=CommentType.COMMENT, - span=span if last_has else None, - ) - acc.add(second) - result = acc.finalize() - - assert content1 in result.content - assert content2 in result.content - assert "\n" in result.content - - if first_has or last_has: - assert result.span is not None - if first_has != last_has: - assert result.span == span - else: - assert result.span is None - - # -- Comment attachment to term property -------------------------------- - - @given( - comment_text=st.text( - min_size=1, - max_size=100, - alphabet=st.characters( - min_codepoint=32, - max_codepoint=126, - exclude_characters="#\n", - ), - ), - term_name=st.text( - min_size=1, - max_size=20, - alphabet=st.characters( - min_codepoint=ord("a"), - max_codepoint=ord("z"), - ), - ), - term_value=st.text( - min_size=1, - max_size=50, - alphabet=st.characters( - min_codepoint=32, - max_codepoint=126, - exclude_characters="\n{}", - ), - ), - ) - def test_comment_attaches_to_adjacent_term( - self, - comment_text: str, - term_name: str, - term_value: str, - ) -> None: - """Single-hash comment immediately before term is attached.""" - event("outcome=term_attachment") - parser = FluentParserV1() - source = f"# {comment_text}\n-{term_name} = {term_value}\n" - resource = parser.parse(source) - terms = [e for e in resource.entries if isinstance(e, Term)] - if terms: - term = terms[0] - assert term.comment is not None - assert comment_text in term.comment.content - - -# 
============================================================================ -# TestParseStream -# ============================================================================ - - -class TestParseStream: - """FluentParserV1.parse_stream incremental entry parsing.""" - - def test_empty_iterable_yields_nothing(self) -> None: - """Empty line iterable produces no entries.""" - parser = FluentParserV1() - assert list(parser.parse_stream([])) == [] - - def test_single_message_from_lines(self) -> None: - """Single message lines yield one Message entry.""" - parser = FluentParserV1() - lines = ["greeting = Hello\n"] - entries = list(parser.parse_stream(lines)) - assert len(entries) == 1 - assert isinstance(entries[0], Message) - assert entries[0].id.name == "greeting" - - def test_two_messages_blank_line_separated(self) -> None: - """Two messages separated by blank line are yielded in order.""" - parser = FluentParserV1() - lines = ["msg1 = One\n", "\n", "msg2 = Two\n"] - entries = list(parser.parse_stream(lines)) - assert len(entries) == 2 - assert isinstance(entries[0], Message) - assert isinstance(entries[1], Message) - assert entries[0].id.name == "msg1" - assert entries[1].id.name == "msg2" - - def test_adjacent_messages_same_chunk(self) -> None: - """Messages with no blank line between are in one chunk, both yielded.""" - parser = FluentParserV1() - lines = ["msg1 = One\n", "msg2 = Two\n"] - entries = list(parser.parse_stream(lines)) - msg_entries = [e for e in entries if isinstance(e, Message)] - assert len(msg_entries) == 2 - assert msg_entries[0].id.name == "msg1" - assert msg_entries[1].id.name == "msg2" - - def test_comment_attached_to_message(self) -> None: - """Comment immediately before message (no blank line) is attached.""" - parser = FluentParserV1() - lines = ["# A comment\n", "greeting = Hello\n"] - entries = list(parser.parse_stream(lines)) - messages = [e for e in entries if isinstance(e, Message)] - assert len(messages) == 1 - assert 
messages[0].comment is not None - assert "A comment" in messages[0].comment.content - - def test_standalone_comment_blank_line_before_message(self) -> None: - """Comment separated by blank line from message is a standalone Comment.""" - parser = FluentParserV1() - lines = ["# Standalone\n", "\n", "greeting = Hello\n"] - entries = list(parser.parse_stream(lines)) - assert len(entries) == 2 - assert isinstance(entries[0], Comment) - assert isinstance(entries[1], Message) - assert entries[1].comment is None - - def test_multiline_message_with_attributes(self) -> None: - """Message with attributes spanning multiple lines is yielded as one Message.""" - parser = FluentParserV1() - lines = ["submit =\n", " .label = Submit\n", " .tooltip = Click here\n"] - entries = list(parser.parse_stream(lines)) - messages = [e for e in entries if isinstance(e, Message)] - assert len(messages) == 1 - assert messages[0].id.name == "submit" - assert len(messages[0].attributes) == 2 - - def test_junk_entry_is_yielded(self) -> None: - """Unparseable content is yielded as Junk.""" - parser = FluentParserV1() - lines = [" indented = invalid\n"] - entries = list(parser.parse_stream(lines)) - junk_entries = [e for e in entries if isinstance(e, Junk)] - assert len(junk_entries) >= 1 - - def test_generator_input_accepted(self) -> None: - """Generator (not just list) is accepted as lines argument.""" - parser = FluentParserV1() - - def line_gen() -> object: - yield "msg = Value\n" - - entries = list(parser.parse_stream(line_gen())) # type: ignore[arg-type] - assert len(entries) == 1 - assert isinstance(entries[0], Message) - - def test_lines_without_trailing_newlines(self) -> None: - """Lines without trailing newlines are handled correctly.""" - parser = FluentParserV1() - lines = ["msg1 = One", "", "msg2 = Two"] - entries = list(parser.parse_stream(lines)) - msg_entries = [e for e in entries if isinstance(e, Message)] - assert len(msg_entries) == 2 - - def 
test_leading_blank_line_is_skipped(self) -> None: - """Blank line before any content is silently skipped. - - When a blank line is encountered with an empty accumulator chunk, the - elif chunk: branch evaluates to False and the loop continues to the next - line without flushing. This covers the 593->589 branch in parse_stream. - """ - parser = FluentParserV1() - lines = ["", "greeting = Hello\n"] - entries = list(parser.parse_stream(lines)) - msg_entries = [e for e in entries if isinstance(e, Message)] - assert len(msg_entries) == 1 - assert msg_entries[0].id.name == "greeting" - - def test_consecutive_blank_lines_between_messages(self) -> None: - """Consecutive blank lines between messages are handled correctly. - - After the first blank line flushes the accumulator, a second consecutive - blank line hits the elif chunk: False branch (empty accumulator) again, - exercising the 593->589 branch path a second time per stream. - """ - parser = FluentParserV1() - lines = ["msg1 = One\n", "\n", "\n", "msg2 = Two\n"] - entries = list(parser.parse_stream(lines)) - msg_entries = [e for e in entries if isinstance(e, Message)] - assert len(msg_entries) == 2 - assert msg_entries[0].id.name == "msg1" - assert msg_entries[1].id.name == "msg2" - - def test_term_is_yielded(self) -> None: - """Term entry is correctly parsed and yielded.""" - parser = FluentParserV1() - lines = ["-brand = Firefox\n"] - entries = list(parser.parse_stream(lines)) - terms = [e for e in entries if isinstance(e, Term)] - assert len(terms) == 1 - assert terms[0].id.name == "brand" - - @given( - names=st.lists( - st.text( - min_size=1, - max_size=20, - alphabet=st.characters( - min_codepoint=ord("a"), - max_codepoint=ord("z"), - ), - ), - min_size=1, - max_size=10, - ) - ) - def test_entry_count_matches_parse_stream_vs_parse( - self, names: list[str] - ) -> None: - """parse_stream yields same entry count as parse() for well-formed FTL.""" - event(f"msg_count={len(names)}") - source = "\n\n".join(f"{name} 
= Value" for name in names) + "\n" - parser = FluentParserV1() - stream_entries = list(parser.parse_stream(source.splitlines(keepends=True))) - full_entries = list(parser.parse(source).entries) - assert len(stream_entries) == len(full_entries) - - @given( - names=st.lists( - st.text( - min_size=1, - max_size=20, - alphabet=st.characters( - min_codepoint=ord("a"), - max_codepoint=ord("z"), - ), - ), - min_size=1, - max_size=10, - ) - ) - def test_message_ids_match_parse_stream_vs_parse( - self, names: list[str] - ) -> None: - """Message IDs from parse_stream match those from parse() for well-formed FTL.""" - event(f"msg_count={len(names)}") - # Deduplicate names to avoid overwrite warnings - unique_names = list(dict.fromkeys(names)) - source = "\n\n".join(f"{name} = Value" for name in unique_names) + "\n" - parser = FluentParserV1() - stream_ids = { - e.id.name for e in parser.parse_stream(source.splitlines(keepends=True)) - if isinstance(e, Message) - } - full_ids = {e.id.name for e in parser.parse(source).entries if isinstance(e, Message)} - assert stream_ids == full_ids +"""Aggregated syntax parser core test surface.""" + +from tests.syntax_parser_core_cases.blank_line_detection import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_core_cases.comment_merging import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_core_cases.do_slimits_and_validation import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_core_cases.do_sprotection import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_core_cases.parse_stream_cases import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_core_cases.parser_core_hypothesis import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_core_cases.parser_entry_recovery import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_core_cases.parser_error_recovery_core import * # noqa: F403 
- re-export split test surface diff --git a/tests/test_syntax_parser_error_recovery.py b/tests/test_syntax_parser_error_recovery.py index 0ce64c77..61e576f4 100644 --- a/tests/test_syntax_parser_error_recovery.py +++ b/tests/test_syntax_parser_error_recovery.py @@ -1,1095 +1,16 @@ -"""Error recovery, defensive code paths, and edge-case coverage for parser rules. - -Consolidated from 12 per-metric test files into a single semantic unit. -Covers: error paths, defensive/unreachable branches (via mocking), FluentParserV1 -integration for malformed input, and property-based edge-case tests. -""" - -from __future__ import annotations - -import logging -import sys -from unittest.mock import patch - -from ftllexengine.syntax.ast import ( - Attribute, - Identifier, - Junk, - Message, - MessageReference, - NumberLiteral, - Placeable, - StringLiteral, - TextElement, - VariableReference, -) -from ftllexengine.syntax.cursor import Cursor, ParseError, ParseResult -from ftllexengine.syntax.parser.core import FluentParserV1 -from ftllexengine.syntax.parser.rules import ( - ParseContext, - _parse_inline_hyphen, - _parse_inline_identifier, - parse_argument_expression, - parse_attribute, - parse_call_arguments, - parse_function_reference, - parse_inline_expression, - parse_message, - parse_pattern, - parse_placeable, - parse_select_expression, - parse_simple_pattern, - parse_term, - parse_term_reference, - parse_variant, - parse_variant_key, -) - -# ============================================================================ -# VARIANT KEY ERROR PATHS -# ============================================================================ - - -class TestVariantKeyErrorPaths: - """Error paths in parse_variant_key and parse_variant.""" - - def test_negative_sign_both_fail(self) -> None: - """Hyphen: parse_number fails, parse_identifier fails too.""" - cursor = Cursor("-", 0) - result = parse_variant_key(cursor) - assert result is None - - def 
test_negative_sign_identifier_fallback_via_mock(self) -> None: - """Lines 878-879: Number fails, identifier succeeds (defensive). - - Structurally unreachable without mocking because if cursor starts - with '-', parse_identifier also fails (can't start with '-'). - """ - with ( - patch( - "ftllexengine.syntax.parser.expressions.parse_number" - ) as mock_num, - patch( - "ftllexengine.syntax.parser.expressions.parse_identifier" - ) as mock_id, - ): - mock_num.return_value = ParseError("forced failure", Cursor("-test", 0)) - mock_id.return_value = ParseResult( - "test", Cursor("test", 4) - ) - cursor = Cursor("-test", 0) - result = parse_variant_key(cursor) - assert result is not None - - def test_variant_missing_opening_bracket(self) -> None: - """parse_variant: no '[' at start.""" - assert parse_variant(Cursor("one", 0)) is None - - def test_variant_missing_closing_bracket(self) -> None: - """parse_variant: no ']' after key.""" - assert parse_variant(Cursor("[one", 0)) is None - - def test_variant_invalid_key(self) -> None: - """parse_variant: invalid key character.""" - assert parse_variant(Cursor("[@]", 0)) is None - - def test_select_no_variants(self) -> None: - """parse_select_expression: immediate close, no variants.""" - sel = VariableReference(id=Identifier("count")) - assert parse_select_expression(Cursor("}", 0), sel, 0) is None - - def test_select_no_default_variant(self) -> None: - """parse_select_expression: variants without default.""" - sel = VariableReference(id=Identifier("count")) - result = parse_select_expression( - Cursor("[one] item\n}", 0), sel, 0 - ) - assert result is None - - -# ============================================================================ -# ARGUMENT EXPRESSION ERROR PATHS -# ============================================================================ - - -class TestArgumentExpressionErrorPaths: - """Error paths in parse_argument_expression.""" - - def test_eof_returns_none(self) -> None: - """EOF at argument position.""" - 
assert parse_argument_expression(Cursor("", 0)) is None - - def test_invalid_char_returns_none(self) -> None: - """Invalid character (@) returns None.""" - assert parse_argument_expression(Cursor("@", 0)) is None - - def test_term_ref_fails_line_1105(self) -> None: - """Line 1105: Term reference parse fails (hyphen + identifier).""" - result = parse_argument_expression(Cursor("-x.123)", 0)) - assert result is None - - def test_term_ref_bare_hyphen_fails(self) -> None: - """Hyphen followed by ')' fails term and number parse.""" - assert parse_argument_expression(Cursor("-)", 0)) is None - - def test_number_fails_defensive_line_1120(self) -> None: - """Line 1120: parse_number returns None on digit (defensive). - - Requires mocking because parse_number is robust for digit start. - """ - with patch( - "ftllexengine.syntax.parser.expressions.parse_number" - ) as mock: - mock.return_value = ParseError("forced failure", Cursor("9)", 0)) - assert parse_argument_expression(Cursor("9)", 0)) is None - - def test_identifier_fails_defensive_line_1139(self) -> None: - """Line 1139: parse_identifier returns None (defensive). - - Requires mocking because is_identifier_start guarantees success. 
- """ - with patch( - "ftllexengine.syntax.parser.expressions.parse_identifier" - ) as mock: - mock.return_value = ParseError("forced failure", Cursor("x)", 0)) - assert parse_argument_expression(Cursor("x)", 0)) is None - - def test_function_ref_fails_line_1150(self) -> None: - """Line 1150: parse_function_reference returns None.""" - assert parse_argument_expression( - Cursor("FUNC(@)", 0) - ) is None - - def test_function_ref_succeeds(self) -> None: - """Function reference parsing succeeds.""" - result = parse_argument_expression(Cursor("NUMBER(42)", 0)) - assert result is not None - - def test_uppercase_no_paren_is_message_ref(self) -> None: - """Uppercase identifier without '(' is MessageReference.""" - result = parse_argument_expression(Cursor("NUMBER", 0)) - assert result is not None - assert isinstance(result.value, MessageReference) - assert result.value.id.name == "NUMBER" - - def test_uppercase_open_paren_at_eof(self) -> None: - """Uppercase + '(' but incomplete call.""" - assert parse_argument_expression(Cursor("NUMBER(", 0)) is None - - def test_negative_number_succeeds(self) -> None: - """Negative number parses as NumberLiteral.""" - result = parse_argument_expression(Cursor("-123", 0)) - assert result is not None - assert isinstance(result.value, NumberLiteral) - - def test_positive_number_succeeds(self) -> None: - """Digit-start parses as NumberLiteral.""" - result = parse_argument_expression(Cursor("42", 0)) - assert result is not None - assert isinstance(result.value, NumberLiteral) - - def test_string_literal_argument(self) -> None: - """String literal in argument position.""" - result = parse_argument_expression(Cursor('"text"', 0)) - assert result is not None - assert isinstance(result.value, StringLiteral) - - def test_inline_placeable_argument(self) -> None: - """Inline placeable { $var } in argument position.""" - result = parse_argument_expression(Cursor("{ $var }", 0)) - assert result is not None - assert isinstance(result.value, 
Placeable) - - def test_identifier_with_underscore(self) -> None: - """Identifier can contain underscore after letter.""" - result = parse_argument_expression(Cursor("my_var", 0)) - assert result is not None - assert isinstance(result.value, MessageReference) - - -# ============================================================================ -# CALL ARGUMENTS ERROR PATHS -# ============================================================================ - - -class TestCallArgumentsErrorPaths: - """Error paths in parse_call_arguments.""" - - def test_named_arg_name_not_identifier(self) -> None: - """Named argument name must be identifier (not variable).""" - result = parse_call_arguments(Cursor('$var: "value")', 0)) - assert result is None - - def test_duplicate_named_argument(self) -> None: - """Duplicate named argument names.""" - assert parse_call_arguments(Cursor("x: 1, x: 2)", 0)) is None - - def test_named_arg_missing_value(self) -> None: - """Expected value after ':' but got ')'.""" - assert parse_call_arguments(Cursor("x: )", 0)) is None - - def test_named_arg_value_parse_fails(self) -> None: - """Value expression parse fails after ':'.""" - assert parse_call_arguments(Cursor("x: @)", 0)) is None - - def test_named_arg_eof_after_colon(self) -> None: - """EOF after ':' in named argument.""" - assert parse_call_arguments(Cursor("x:", 0)) is None - - def test_positional_after_named(self) -> None: - """Positional args must come before named.""" - assert parse_call_arguments(Cursor("x: 1, $var)", 0)) is None - - def test_named_arg_non_literal_value(self) -> None: - """Named argument value must be literal.""" - assert parse_call_arguments( - Cursor("x: $var)", 0) - ) is None - - def test_trailing_comma(self) -> None: - """Trailing comma in argument list.""" - result = parse_call_arguments(Cursor("1, 2, )", 0)) - assert result is not None - assert len(result.value.positional) == 2 - - def test_argument_expression_fails_in_loop(self) -> None: - """Argument expression 
fails at '@'.""" - assert parse_call_arguments(Cursor("@)", 0)) is None - - -# ============================================================================ -# INLINE EXPRESSION AND HELPER ERROR PATHS -# ============================================================================ - - -class TestInlineExpressionErrorPaths: - """Error paths in inline expression helpers.""" - - def test_inline_hyphen_all_fail(self) -> None: - """_parse_inline_hyphen: both term and number fail.""" - assert _parse_inline_hyphen(Cursor("-", 0)) is None - - def test_inline_hyphen_term_attr_fails_line_1365(self) -> None: - """Line 1365: Term reference fails (invalid attribute).""" - result = _parse_inline_hyphen(Cursor("-x.123", 0)) - assert result is None - - def test_inline_identifier_function_fails(self) -> None: - """_parse_inline_identifier: function parse fails.""" - assert _parse_inline_identifier( - Cursor("func(@)", 0) - ) is None - - def test_inline_identifier_parse_fails(self) -> None: - """_parse_inline_identifier: parse_identifier fails.""" - assert _parse_inline_identifier(Cursor("123", 0)) is None - - def test_inline_expression_eof(self) -> None: - """parse_inline_expression: EOF returns None.""" - assert parse_inline_expression(Cursor("", 0)) is None - - def test_inline_expression_invalid_char(self) -> None: - """parse_inline_expression: invalid character returns None.""" - assert parse_inline_expression(Cursor("@", 0)) is None - - def test_inline_expression_variable_fails(self) -> None: - """parse_inline_expression: '$' but identifier fails.""" - assert parse_inline_expression(Cursor("$", 0)) is None - - def test_inline_expression_nested_placeable_fails(self) -> None: - """parse_inline_expression: nested placeable fails.""" - assert parse_inline_expression(Cursor("{ @ }", 0)) is None - - def test_inline_expression_message_attr_fails(self) -> None: - """Message reference attribute parsing fails (invalid attr).""" - cursor = Cursor("msg.-test", 0) - result = 
parse_inline_expression(cursor) - assert result is None or ( - result is not None and hasattr(result, "value") - ) - - -# ============================================================================ -# PLACEABLE ERROR PATHS -# ============================================================================ - - -class TestPlaceableErrorPaths: - """Error paths in parse_placeable.""" - - def test_depth_exceeded(self) -> None: - """Nesting depth exceeded returns None.""" - ctx = ParseContext(max_nesting_depth=1, current_depth=2) - assert parse_placeable(Cursor("$var}", 0), ctx) is None - - def test_expression_parse_fails(self) -> None: - """Expression fails at '@'.""" - assert parse_placeable(Cursor("@}", 0)) is None - - def test_select_parse_fails(self) -> None: - """Select expression fails (no variants).""" - assert parse_placeable(Cursor("$var -> }", 0)) is None - - def test_select_missing_closing_brace(self) -> None: - """Select expression without closing }.""" - result = parse_placeable( - Cursor("$var -> [one] 1 *[other] N", 0) - ) - assert result is None - - def test_simple_expression_missing_closing_brace(self) -> None: - """Simple expression without closing }.""" - assert parse_placeable(Cursor("$var", 0)) is None - - def test_valid_selector_with_select_line_1585(self) -> None: - """Line 1585: Valid selector with select expression.""" - result = parse_placeable( - Cursor("$n -> [one] One *[other] Many}", 0) - ) - assert result is not None - - def test_hyphen_not_arrow(self) -> None: - """'-' but not '->' skips to simple close.""" - result = parse_placeable(Cursor("$var - }", 0)) - # Malformed, may return None or partial - assert result is None or result is not None - - -# ============================================================================ -# FUNCTION REFERENCE ERROR PATHS -# ============================================================================ - - -class TestFunctionReferenceErrorPaths: - """Error paths in parse_function_reference.""" - - def 
test_identifier_parse_fails(self) -> None: - """Non-identifier character at start.""" - assert parse_function_reference(Cursor("123", 0)) is None - - def test_missing_opening_paren(self) -> None: - """Valid name but no '('.""" - assert parse_function_reference(Cursor("FUNC", 0)) is None - - def test_missing_closing_paren(self) -> None: - """Arguments but no closing ')'.""" - assert parse_function_reference(Cursor("FUNC($x", 0)) is None - - def test_arguments_parse_fails(self) -> None: - """Call arguments fail at '@'.""" - assert parse_function_reference( - Cursor("FUNC(@)", 0) - ) is None - - def test_depth_exceeded(self) -> None: - """Nesting depth exceeded.""" - ctx = ParseContext(max_nesting_depth=1, current_depth=2) - assert parse_function_reference( - Cursor("FUNC($x)", 0), ctx - ) is None - - -# ============================================================================ -# TERM REFERENCE ERROR PATHS -# ============================================================================ - - -class TestTermReferenceErrorPaths: - """Error paths in parse_term_reference.""" - - def test_missing_hyphen(self) -> None: - """No '-' at start.""" - assert parse_term_reference(Cursor("brand", 0)) is None - - def test_identifier_fails_after_hyphen(self) -> None: - """Identifier parse fails after '-'.""" - assert parse_term_reference(Cursor("-", 0)) is None - - def test_attribute_identifier_fails(self) -> None: - """Attribute identifier parse fails after '.'.""" - assert parse_term_reference(Cursor("-brand.", 0)) is None - - def test_arguments_parse_fails(self) -> None: - """Call arguments fail for term args.""" - assert parse_term_reference( - Cursor("-brand(@)", 0) - ) is None - - def test_arguments_missing_closing_paren_1449(self) -> None: - """Lines 1449-1450: Expected ')' after term arguments.""" - result = parse_term_reference( - Cursor("-brand(case: 'nom'", 0) - ) - assert result is None - - def test_depth_exceeded_with_arguments(self) -> None: - """Depth exceeded when 
parsing term arguments.""" - ctx = ParseContext(max_nesting_depth=2) - nested = ctx.enter_nesting().enter_nesting() - result = parse_term_reference( - Cursor('-brand(case: "nom")', 0), nested - ) - assert result is None - - def test_without_arguments_at_depth_limit(self) -> None: - """Term ref without args succeeds at depth limit.""" - ctx = ParseContext(max_nesting_depth=2) - nested = ctx.enter_nesting().enter_nesting() - result = parse_term_reference(Cursor("-brand", 0), nested) - assert result is not None - assert result.value.id.name == "brand" - - def test_with_arguments_succeeds(self) -> None: - """Term ref with arguments below depth limit.""" - result = parse_term_reference( - Cursor('-term(case: "gen")', 0) - ) - assert result is not None - assert result.value.arguments is not None - - -# ============================================================================ -# DEFENSIVE MOCKING TESTS (UNREACHABLE CODE PATHS) -# ============================================================================ - - -class TestDefensiveMocking: - """Defensive None checks for unreachable code paths. - - These lines are structurally unreachable in normal execution but - exist as guardrails against future refactoring. 
- """ - - def test_parse_message_attrs_returns_none(self) -> None: - """parse_message_attributes returns None (defensive).""" - with patch( - "ftllexengine.syntax.parser.entries" - ".parse_message_attributes" - ) as mock: - mock.return_value = None - assert parse_message( - Cursor("hello = value", 0) - ) is None - - def test_parse_attribute_pattern_returns_none(self) -> None: - """parse_pattern returns None in parse_attribute (defensive).""" - with patch( - "ftllexengine.syntax.parser.entries.parse_pattern" - ) as mock: - mock.return_value = None - assert parse_attribute( - Cursor(".attr = value", 0) - ) is None - - def test_parse_term_pattern_returns_none(self) -> None: - """parse_pattern returns None in parse_term (defensive).""" - with patch( - "ftllexengine.syntax.parser.entries.parse_pattern" - ) as mock: - mock.return_value = None - assert parse_term( - Cursor("-brand = value", 0) - ) is None - - def test_parse_term_attrs_returns_none_line_2038(self) -> None: - """Line 2038: parse_message_attributes returns None in term.""" - with patch( - "ftllexengine.syntax.parser.entries" - ".parse_message_attributes" - ) as mock: - mock.return_value = None - assert parse_term( - Cursor("-brand = value", 0) - ) is None - - def test_parse_message_pattern_returns_none(self) -> None: - """parse_pattern returns None in parse_message (defensive).""" - with patch( - "ftllexengine.syntax.parser.entries.parse_pattern" - ) as mock: - mock.return_value = None - assert parse_message( - Cursor("hello = value", 0) - ) is None - - -# ============================================================================ -# PARSER INTEGRATION - MALFORMED INPUT -# ============================================================================ - - -class TestParserMalformedInput: - """FluentParserV1 integration for error recovery on malformed FTL.""" - - def test_four_hash_comment_recovery(self) -> None: - """Invalid >3 hash comment is recovered as junk.""" - parser = FluentParserV1() - res = 
parser.parse( - "#### Invalid\nkey = value" - ) - assert any( - hasattr(e, "id") and e.id.name == "key" - for e in res.entries - ) - - def test_multiple_junk_entries(self) -> None: - """Multiple malformed entries create multiple junk entries.""" - parser = FluentParserV1() - res = parser.parse( - "!!!invalid1\n!!!invalid2\nkey = value\n" - ) - assert any( - hasattr(e, "id") and e.id.name == "key" - for e in res.entries - ) - - def test_junk_with_unicode(self) -> None: - """Junk entries with non-ASCII characters.""" - parser = FluentParserV1() - res = parser.parse("¡¡¡ invalid\nkey = value\n") - assert len(res.entries) >= 1 - - def test_empty_variant_key(self) -> None: - """Empty variant key [].""" - parser = FluentParserV1() - res = parser.parse( - "msg = { $c -> [] x *[o] O }\n" - ) - assert len(res.entries) >= 1 - - def test_unclosed_variant_bracket(self) -> None: - """Unclosed variant bracket.""" - parser = FluentParserV1() - res = parser.parse( - "msg = { $c -> [unclosed X *[o] O }\n" - ) - assert len(res.entries) >= 1 - - def test_select_missing_arrow(self) -> None: - """Select expression without '->'.""" - parser = FluentParserV1() - res = parser.parse( - "msg = { $val\n [one] One\n *[other] Other\n}\n" - ) - junk = [e for e in res.entries if isinstance(e, Junk)] - assert len(junk) >= 1 - - def test_unclosed_placeable(self) -> None: - """Unclosed placeable creates junk.""" - parser = FluentParserV1() - res = parser.parse("msg = { $value") - assert isinstance(res.entries[0], Junk) - - def test_invalid_variant_syntax(self) -> None: - """Invalid variant syntax (missing '[').""" - parser = FluentParserV1() - res = parser.parse( - "msg = { $c ->\n one] One\n *[other] O\n}\n" - ) - junk = [e for e in res.entries if isinstance(e, Junk)] - assert len(junk) >= 1 - - def test_empty_placeable(self) -> None: - """Empty placeable { }.""" - parser = FluentParserV1() - res = parser.parse("key = { }") - assert res is not None - - def test_standalone_attribute(self) -> None: 
- """Attribute without Message/Term creates junk.""" - parser = FluentParserV1() - res = parser.parse(" .attr = Value") - assert isinstance(res.entries[0], Junk) - - def test_invalid_term_name(self) -> None: - """Term '-' without valid identifier.""" - parser = FluentParserV1() - res = parser.parse("- = Invalid") - assert len(res.entries) >= 1 - - def test_message_without_equals(self) -> None: - """Message identifier without '=' creates junk.""" - parser = FluentParserV1() - res = parser.parse("test Hello") - assert isinstance(res.entries[0], Junk) - - def test_identifier_starting_with_number(self) -> None: - """Identifier starting with number creates junk.""" - parser = FluentParserV1() - res = parser.parse("123invalid = Value") - assert isinstance(res.entries[0], Junk) - - def test_eof_after_equals(self) -> None: - """EOF after '=' sign.""" - parser = FluentParserV1() - res = parser.parse("msg =") - assert len(res.entries) > 0 - - def test_eof_after_identifier(self) -> None: - """File ends right after message ID.""" - parser = FluentParserV1() - res = parser.parse("msg") - assert len(res.entries) > 0 - - def test_multiple_errors_creates_multiple_junk(self) -> None: - """Multiple errors create junk interleaved with valid entries.""" - parser = FluentParserV1() - res = parser.parse( - "invalid1 Missing\nvalid = Good\n" - "invalid2 Also\nanother = OK\n" - ) - assert len(res.entries) == 4 - junk_count = sum( - 1 for e in res.entries if isinstance(e, Junk) - ) - assert junk_count == 2 - - -class TestParserMalformedExpressions: - """FluentParserV1 integration for malformed expressions.""" - - def test_invalid_selector_variable(self) -> None: - """$ followed by invalid character in selector.""" - parser = FluentParserV1() - res = parser.parse( - "msg = { $-invalid -> *[key] Value }" - ) - assert any(isinstance(e, Junk) for e in res.entries) - - def test_unclosed_string_literal_in_selector(self) -> None: - """Unclosed string literal in selector.""" - parser = 
FluentParserV1() - res = parser.parse( - 'msg = { "unclosed -> *[key] Value }' - ) - assert any(isinstance(e, Junk) for e in res.entries) - - def test_function_no_parens(self) -> None: - """UPPERCASE without parens is MessageReference.""" - parser = FluentParserV1() - res = parser.parse("key = { FUNC }") - msg = res.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - p = msg.value.elements[0] - assert isinstance(p, Placeable) - assert isinstance(p.expression, MessageReference) - - def test_function_missing_argument(self) -> None: - """Function with incomplete arguments.""" - parser = FluentParserV1() - res = parser.parse("key = { UPPERCASE( }") - assert res is not None - - def test_function_invalid_argument(self) -> None: - """Function with @invalid argument.""" - parser = FluentParserV1() - res = parser.parse("key = { FUNC(@invalid) }") - assert res is not None - - def test_term_ref_invalid_identifier(self) -> None: - """Term reference '-#' with invalid identifier.""" - parser = FluentParserV1() - res = parser.parse("key = { -# }") - assert len(res.entries) >= 1 - - def test_lowercase_function_call(self) -> None: - """Lowercase identifier with () is now valid per spec.""" - parser = FluentParserV1() - res = parser.parse("key = { lowercase() }") - assert len(res.entries) >= 1 - - def test_nested_malformed(self) -> None: - """Deeply malformed nested structures.""" - parser = FluentParserV1() - res = parser.parse( - "key1 = { $v -> [a] { FUNC( *[b] X }\nkey2 = ok\n" - ) - assert len(res.entries) >= 1 - - def test_term_reference_arguments_unclosed(self) -> None: - """Term arguments without closing ')'.""" - parser = FluentParserV1() - res = parser.parse("key = { -term(arg ") - assert res is not None - - def test_named_argument_number_as_name(self) -> None: - """Number as named argument name.""" - parser = FluentParserV1() - res = parser.parse('key = { FUNC(123: "value") }') - assert res is not None - - def 
test_duplicate_named_argument_via_parser(self) -> None: - """Duplicate named argument names via parser.""" - parser = FluentParserV1() - res = parser.parse("key = { FUNC(foo: 1, foo: 2) }") - assert res is not None - - def test_positional_after_named_via_parser(self) -> None: - """Positional after named argument.""" - parser = FluentParserV1() - res = parser.parse("key = { FUNC(name: 1, 2) }") - assert res is not None - - def test_named_arg_missing_value_via_parser(self) -> None: - """Named argument missing value.""" - parser = FluentParserV1() - res = parser.parse("key = { FUNC(name:) }") - assert res is not None - - def test_incomplete_number_at_eof(self) -> None: - """Number literal at EOF without closing brace.""" - parser = FluentParserV1() - res = parser.parse("msg = { 42") - assert len(res.entries) > 0 - - def test_number_multiple_decimal_points(self) -> None: - """Number with multiple decimal points.""" - parser = FluentParserV1() - res = parser.parse("msg = { 1.2.3 }") - assert len(res.entries) >= 1 - - def test_select_with_empty_variant_value(self) -> None: - """Select expression with empty variant value.""" - parser = FluentParserV1() - res = parser.parse( - "test = { $c ->\n [one]\n *[other] O\n}\n" - ) - assert len(res.entries) >= 1 - - -# ============================================================================ -# MESSAGE REFERENCE WITH ATTRIBUTES -# ============================================================================ - - -class TestMessageReferenceWithAttribute: - """Coverage for lowercase message references with .attribute syntax.""" - - def test_msg_dot_attr_inline(self) -> None: - """Parse { msg.attr } in inline expression.""" - parser = FluentParserV1() - res = parser.parse("key = { msg.attr }") - msg = res.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - p = msg.value.elements[0] - assert isinstance(p, Placeable) - ref = p.expression - assert isinstance(ref, MessageReference) - assert ref.id.name == "msg" 
- assert ref.attribute is not None - assert ref.attribute.name == "attr" - - def test_msg_dot_attr_in_attribute_value(self) -> None: - """Parse { msg.help } in message attribute value.""" - parser = FluentParserV1() - res = parser.parse( - "key = Value\n .tooltip = { msg.help }\n" - ) - msg = res.entries[0] - assert isinstance(msg, Message) - attr = msg.attributes[0] - assert isinstance(attr, Attribute) - p = attr.value.elements[0] - assert isinstance(p, Placeable) - ref = p.expression - assert isinstance(ref, MessageReference) - assert ref.attribute is not None - assert ref.attribute.name == "help" - - def test_msg_dot_missing_attr_name(self) -> None: - """{ msg. } with missing attribute name.""" - parser = FluentParserV1() - res = parser.parse("key = { msg. }") - assert len(res.entries) >= 1 - - def test_msg_dot_invalid_attr(self) -> None: - """{ msg.@ } with invalid attribute.""" - parser = FluentParserV1() - res = parser.parse("key = { msg.@ }") - assert res is not None - - def test_msg_dot_hash_attr(self) -> None: - """{ msg.# } with invalid attribute.""" - parser = FluentParserV1() - res = parser.parse("key = { msg.# }") - assert len(res.entries) >= 1 - - def test_mixed_identifiers_with_attributes(self) -> None: - """Various identifier cases with attributes.""" - parser = FluentParserV1() - cases = [ - ("key = { foo.bar }", "foo", "bar"), - ("key = { a.b }", "a", "b"), - ("key = { msg123.attr456 }", "msg123", "attr456"), - ] - for source, exp_msg, exp_attr in cases: - res = parser.parse(source) - msg = res.entries[0] - assert isinstance(msg, Message), f"Failed: {source}" - assert msg.value is not None - p = msg.value.elements[0] - assert isinstance(p, Placeable) - ref = p.expression - assert isinstance(ref, MessageReference) - assert ref.id.name == exp_msg - assert ref.attribute is not None - assert ref.attribute.name == exp_attr - - -# ============================================================================ -# DEBUG LOGGING -# 
============================================================================ - - -class TestDebugLogging: - """Tests for debug logging coverage (junk creation).""" - - def test_junk_creation_triggers_debug_log(self) -> None: - """Debug logging when creating Junk entries.""" - logging.basicConfig( - level=logging.DEBUG, stream=sys.stderr, force=True - ) - try: - parser = FluentParserV1() - res = parser.parse("invalid { syntax") - assert len(res.entries) >= 1 - except KeyError: - pass - finally: - logging.basicConfig( - level=logging.WARNING, force=True - ) - - -# ============================================================================ -# WHITESPACE AND LINE ENDING EDGE CASES -# ============================================================================ - - -class TestWhitespaceAndLineEndings: - """Whitespace, CRLF, and formatting edge cases.""" - - def test_crlf_multiline(self) -> None: - """CRLF (\\r\\n) line endings in multiline pattern.""" - parser = FluentParserV1() - res = parser.parse( - "key =\r\n Line one\r\n Line two\r\n" - ) - msg = res.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - assert len(msg.value.elements) >= 2 - - def test_mixed_line_endings(self) -> None: - """Mixed \\r\\n and \\n line endings.""" - parser = FluentParserV1() - res = parser.parse( - "k1 = v1\r\nk2 = v2\nk3 = v3" - ) - assert len(res.entries) == 3 - - def test_tabs_in_pattern(self) -> None: - """Tabs in pattern are literal text.""" - parser = FluentParserV1() - res = parser.parse("key = value\twith\ttabs") - assert len(res.entries) == 1 - - def test_multiple_blank_lines(self) -> None: - """Multiple consecutive blank lines between entries.""" - parser = FluentParserV1() - res = parser.parse("k1 = v1\n\n\n\nk2 = v2") - assert len(res.entries) == 2 - - def test_empty_source(self) -> None: - """Empty source produces empty resource.""" - parser = FluentParserV1() - res = parser.parse("") - assert len(res.entries) == 0 - - def 
test_windows_crlf_entries(self) -> None: - """Windows CRLF between entries.""" - parser = FluentParserV1() - res = parser.parse("test = Hello\r\nworld = World\r\n") - assert len(res.entries) == 2 - - def test_text_with_stop_char_bracket(self) -> None: - """Text stops at '[' bracket.""" - parser = FluentParserV1() - res = parser.parse("key = text[bracket") - msg = res.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - text_vals = [ - e.value for e in msg.value.elements - if isinstance(e, TextElement) - ] - assert any("text" in v for v in text_vals) - - -# ============================================================================ -# PATTERN CONTINUATION EDGE CASES -# ============================================================================ - - -class TestPatternContinuationEdgeCases: - """Pattern continuation and text accumulation edge cases.""" - - def test_pattern_line_691_placeable_continuation(self) -> None: - """Placeable then continuation creates new text element.""" - result = parse_pattern(Cursor("{$x}\n {$y}", 0)) - assert result is not None - - def test_pattern_continuation_after_placeable(self) -> None: - """Continuation text as new element after placeable.""" - result = parse_pattern( - Cursor("{$var}\n continuation", 0) - ) - assert result is not None - assert len(result.value.elements) >= 2 - - def test_continuation_at_start(self) -> None: - """Continuation at start of pattern.""" - result = parse_pattern(Cursor("\n {$x}", 0)) - assert result is not None - - def test_simple_pattern_continuation_before_placeable(self) -> None: - """text accumulation before placeable in simple pattern.""" - result = parse_simple_pattern( - Cursor("hello\n world{$x}", 0) - ) - assert result is not None - - def test_simple_pattern_continuation_at_end(self) -> None: - """text accumulation finalized at end of simple pattern.""" - result = parse_simple_pattern( - Cursor("hello\n world", 0) - ) - assert result is not None - - def 
test_pattern_at_eof_no_newline(self) -> None: - """Pattern ends at EOF without newline.""" - parser = FluentParserV1() - res = parser.parse("key = value") - assert len(res.entries) == 1 - - def test_pattern_ending_at_variant_marker(self) -> None: - """Pattern ends at start of variant marker.""" - parser = FluentParserV1() - res = parser.parse("key = text\n [") - assert len(res.entries) >= 1 - - def test_select_with_malformed_arrow_eof(self) -> None: - """Incomplete arrow at EOF.""" - parser = FluentParserV1() - res = parser.parse("key = { $var -") - assert len(res.entries) >= 1 - - def test_function_with_trailing_comma(self) -> None: - """Function call with trailing comma.""" - parser = FluentParserV1() - res = parser.parse("key = { FUNC(a, b,) }") - assert len(res.entries) >= 1 - - -# ============================================================================ -# PARSER INTEGRATION SUITE -# ============================================================================ - - -class TestParserIntegration: - """Integration tests combining multiple edge cases.""" - - def test_complex_resource(self) -> None: - """FTL resource exercising multiple edge cases.""" - parser = FluentParserV1() - res = parser.parse( - "# Comment\n" - "msg = Value\n" - " .a = Short attr\n" - "\n" - "-t = Term\n" - "\n" - "select = { $n ->\n" - " [0] Zero\n" - " [1] One\n" - " *[other] Other\n" - "}\n" - "\n" - "func = { FUNC() }\n" - "\n" - "complex = { $a }{ $b } text { UPPER($c) }\n" - ) - assert len(res.entries) >= 5 - - def test_select_with_number_and_identifier_keys(self) -> None: - """Select with both number and identifier variant keys.""" - parser = FluentParserV1() - res = parser.parse( - "msg = { $c ->\n" - " [0] Zero\n" - " [1] One\n" - " [42] Forty-two\n" - " *[other] Other\n" - "}\n" - ) - assert len(res.entries) >= 1 - - def test_select_identifier_keys(self) -> None: - """Select with identifier variant keys.""" - parser = FluentParserV1() - res = parser.parse( - "msg = { $v ->\n" - " 
[yes] Affirmative\n" - " *[no] Negative\n" - "}\n" - ) - assert len(res.entries) >= 1 - - def test_variant_key_negative_hyphen_not_number(self) -> None: - """Variant key starts with - but isn't a number.""" - parser = FluentParserV1() - res = parser.parse( - "msg = { $s ->\n" - " [-not-a-number] Value\n" - " *[default] Default\n" - "}\n" - ) - assert len(res.entries) >= 1 - - def test_term_attribute_selection(self) -> None: - """Select on term attribute.""" - parser = FluentParserV1() - res = parser.parse( - "-term = Term\n" - " .attr = a\n" - "msg = { -term.attr -> *[a] Value }\n" - ) - assert len(res.entries) >= 1 - - def test_term_reference_arguments_via_parser(self) -> None: - """Term reference with arguments.""" - parser = FluentParserV1() - res = parser.parse( - "msg = { -term(case: 'accusative') }" - ) - assert len(res.entries) >= 1 - - def test_pattern_with_only_placeables(self) -> None: - """Pattern with adjacent placeables.""" - parser = FluentParserV1() - res = parser.parse("msg = { $a }{ $b }{ $c }") - assert len(res.entries) > 0 - - def test_function_variations(self) -> None: - """Function with various argument combinations.""" - parser = FluentParserV1() - for src in [ - "m = { FUNC() }", - "m = { FUNC($a, $b, $c) }", - 'm = { FUNC(key: "value", ot: "data") }', - 'm = { FUNC($p1, $p2, named: "value") }', - ]: - res = parser.parse(src) - assert len(res.entries) > 0, f"Failed: {src}" +"""Aggregated syntax parser error recovery test surface.""" + +from tests.syntax_parser_error_recovery_cases.argument_expression_error_paths import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_error_recovery_cases.call_arguments_error_paths import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_error_recovery_cases.debug_logging import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_error_recovery_cases.defensive_mocking_tests_unreachable_code_paths import * # noqa: F403 - re-export split test 
surface +from tests.syntax_parser_error_recovery_cases.function_reference_error_paths import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_error_recovery_cases.inline_expression_and_helper_error_paths import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_error_recovery_cases.message_reference_with_attributes import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_error_recovery_cases.parser_integration_malformed_input import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_error_recovery_cases.parser_integration_suite import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_error_recovery_cases.pattern_continuation_edge_cases import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_error_recovery_cases.placeable_error_paths import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_error_recovery_cases.term_reference_error_paths import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_error_recovery_cases.variant_key_error_paths import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_error_recovery_cases.whitespace_and_line_ending_edge_cases import * # noqa: F403 - re-export split test surface diff --git a/tests/test_syntax_parser_expressions.py b/tests/test_syntax_parser_expressions.py index a297cc2c..71979073 100644 --- a/tests/test_syntax_parser_expressions.py +++ b/tests/test_syntax_parser_expressions.py @@ -1,1473 +1,16 @@ -"""Tests for parser expression and placeable handling. 
- -Tests expression parsing functions: parse_variable_reference, -parse_variant_key, parse_variant, parse_select_expression, -parse_argument_expression, parse_call_arguments, parse_function_reference, -parse_term_reference, parse_inline_expression, parse_placeable, and -associated helpers (_parse_inline_hyphen, _parse_inline_identifier, -_parse_inline_number_literal, _parse_inline_string_literal, -_parse_message_attribute, _is_variant_marker, _is_valid_variant_key_char, -_trim_pattern_blank_lines, validate_message_content). -""" - -from __future__ import annotations - -from typing import cast - -from hypothesis import event, example, given -from hypothesis import strategies as st - -from ftllexengine.runtime.bundle import FluentBundle -from ftllexengine.syntax.ast import ( - Attribute, - Identifier, - Message, - MessageReference, - NumberLiteral, - Pattern, - Placeable, - SelectExpression, - StringLiteral, - TermReference, - TextElement, - VariableReference, - Variant, -) -from ftllexengine.syntax.cursor import Cursor -from ftllexengine.syntax.parser import FluentParserV1 -from ftllexengine.syntax.parser.rules import _MAX_LOOKAHEAD_CHARS as MAX_LOOKAHEAD_CHARS -from ftllexengine.syntax.parser.rules import ( - ParseContext, - _is_valid_variant_key_char, - _is_variant_marker, - _parse_inline_hyphen, - _parse_inline_identifier, - _parse_inline_number_literal, - _parse_inline_string_literal, - _parse_message_attribute, - _trim_pattern_blank_lines, - parse_argument_expression, - parse_call_arguments, - parse_function_reference, - parse_inline_expression, - parse_message, - parse_pattern, - parse_placeable, - parse_select_expression, - parse_simple_pattern, - parse_term_reference, - parse_variable_reference, - parse_variant, - parse_variant_key, - validate_message_content, -) - -# ============================================================================ -# VARIABLE REFERENCE -# ============================================================================ - - -class 
TestParseVariableReference: - """Tests for parse_variable_reference error and success paths.""" - - def test_no_dollar_sign(self) -> None: - """Returns None without '$' prefix.""" - assert parse_variable_reference(Cursor("name", 0)) is None - - def test_at_eof(self) -> None: - """Returns None at EOF.""" - assert parse_variable_reference(Cursor("", 0)) is None - - def test_dollar_only(self) -> None: - """Returns None with just '$' (no identifier).""" - assert parse_variable_reference(Cursor("$ ", 0)) is None - - def test_dollar_followed_by_digit(self) -> None: - """Returns None with '$' followed by digit.""" - assert parse_variable_reference(Cursor("$123", 0)) is None - - def test_valid_variable_reference(self) -> None: - """Parses valid '$name' as VariableReference.""" - result = parse_variable_reference(Cursor("$var", 0)) - assert result is not None - assert isinstance(result.value, VariableReference) - assert result.value.id.name == "var" - - @given(st.text(min_size=1).filter(lambda t: not t.startswith("$"))) - @example("") - @example("x") - def test_no_dollar_prefix_property(self, text: str) -> None: - """Non-$ prefixed text always returns None.""" - event(f"first_char={repr(text[:1]) if text else 'eof'}") - cursor = Cursor(text, 0) - result = parse_variable_reference(cursor) - assert result is None - - @given(st.text(max_size=0)) - @example("$") - @example("$123") - @example("$ ") - def test_dollar_without_valid_identifier_property( - self, suffix: str - ) -> None: - """'$' plus invalid identifier always returns None.""" - event(f"suffix_len={len(suffix)}") - text = "$" + suffix - cursor = Cursor(text, 0) - result = parse_variable_reference(cursor) - if result is not None: - assert isinstance(result.value, VariableReference) - - -# ============================================================================ -# VARIANT KEY & VARIANT MARKER -# ============================================================================ - - -class TestIsValidVariantKeyChar: - 
"""Tests for _is_valid_variant_key_char helper.""" - - @given(st.sampled_from([".", "-", "_"])) - def test_special_chars_in_variant_keys(self, char: str) -> None: - """Special character handling follows identifier rules.""" - event(f"char={char!r}") - if char == "_": - assert _is_valid_variant_key_char(char, is_first=True) - else: - assert not _is_valid_variant_key_char(char, is_first=True) - assert _is_valid_variant_key_char(char, is_first=False) - - -class TestIsVariantMarker: - """Tests for _is_variant_marker lookahead logic.""" - - def test_eof_cursor_returns_false(self) -> None: - """EOF cursor returns False.""" - assert not _is_variant_marker(Cursor("", 0)) - - def test_empty_brackets_not_variant(self) -> None: - """Empty [] is not a variant key.""" - assert not _is_variant_marker(Cursor("[]", 0)) - - def test_bracket_at_eof_after_closing(self) -> None: - """Valid variant when ] at EOF.""" - assert _is_variant_marker(Cursor("[one]", 0)) - - def test_bracket_followed_by_newline(self) -> None: - """Valid variant when ] followed by newline.""" - assert _is_variant_marker(Cursor("[one]\n", 0)) - - def test_bracket_followed_by_closing_brace(self) -> None: - """Valid variant when ] followed by }.""" - assert _is_variant_marker(Cursor("[one]}", 0)) - - def test_bracket_followed_by_open_bracket(self) -> None: - """Valid variant when ] followed by [.""" - assert _is_variant_marker(Cursor("[one][two]", 0)) - - def test_bracket_followed_by_asterisk(self) -> None: - """Valid variant when ] followed by *.""" - assert _is_variant_marker(Cursor("[one]*[other]", 0)) - - def test_bracket_with_comma_not_variant(self) -> None: - """Comma makes it literal text, not variant.""" - assert not _is_variant_marker(Cursor("[1, 2]", 0)) - - def test_bracket_with_invalid_char_not_variant(self) -> None: - """Invalid char for identifier/number.""" - assert not _is_variant_marker(Cursor("[in@valid]", 0)) - - def test_bracket_exceeds_lookahead(self) -> None: - """Exceeded lookahead before 
finding ].""" - long_text = "[" + "a" * (MAX_LOOKAHEAD_CHARS + 10) - assert not _is_variant_marker(Cursor(long_text, 0)) - - def test_lookahead_exhausted_in_whitespace_scan(self) -> None: - """Lookahead exhausted while skipping whitespace after ].""" - text = "[one]" + " " * (MAX_LOOKAHEAD_CHARS + 10) - result = _is_variant_marker(Cursor(text, 0)) - assert isinstance(result, bool) - - def test_non_bracket_non_asterisk_returns_false(self) -> None: - """Non-[ non-* character returns False.""" - assert not _is_variant_marker(Cursor("x", 0)) - - def test_variant_marker_with_leading_space(self) -> None: - """Leading space after '[' is valid per Fluent EBNF.""" - assert _is_variant_marker(Cursor("[ one]", 0)) - - def test_variant_marker_with_multiple_leading_spaces(self) -> None: - """Multiple leading spaces after '[' are valid.""" - assert _is_variant_marker(Cursor("[ other]", 0)) - - @given( - num_spaces=st.integers(min_value=1, max_value=10), - key=st.sampled_from( - ["one", "other", "few", "many", "zero", "0", "42"] - ), - ) - def test_variant_marker_leading_spaces_property( - self, num_spaces: int, key: str - ) -> None: - """Any number of leading spaces in variant key is valid.""" - event(f"num_spaces={num_spaces}") - event(f"key_type={'digit' if key.isdigit() else 'ident'}") - source = f"[{' ' * num_spaces}{key}]" - assert _is_variant_marker(Cursor(source, 0)) - - -class TestParseVariantKey: - """Tests for parse_variant_key paths.""" - - def test_identifier_variant_key(self) -> None: - """Identifier parsed as variant key.""" - result = parse_variant_key(Cursor("abc", 0)) - assert result is not None - assert isinstance(result.value, Identifier) - assert result.value.name == "abc" - - def test_identifier_from_bracket(self) -> None: - """Variant key parsed from inside brackets.""" - result = parse_variant_key(Cursor("[abc]", 1)) - assert result is not None - assert isinstance(result.value, Identifier) - - def test_number_variant_key(self) -> None: - """Number parsed 
as variant key.""" - result = parse_variant_key(Cursor("42", 0)) - assert result is not None - assert isinstance(result.value, NumberLiteral) - - def test_negative_number_fallback_fails(self) -> None: - """Hyphen followed by non-digit: both number and identifier fail.""" - assert parse_variant_key(Cursor("-foo", 0)) is None - - def test_hyphen_alone_fails(self) -> None: - """Hyphen alone fails both number and identifier parse.""" - assert parse_variant_key(Cursor("-", 0)) is None - - def test_invalid_start_char_fails(self) -> None: - """Characters invalid for both number and identifier fail.""" - assert parse_variant_key(Cursor("???", 1)) is None - - @given(st.integers(min_value=0, max_value=1000)) - @example(42) - @example(-42) - @example(0) - def test_numeric_variant_key_property(self, num: int) -> None: - """Numeric variant keys parsed correctly.""" - event(f"num={num}") - result = parse_variant_key(Cursor(str(num), 0)) - if result is not None: - assert isinstance( - result.value, (NumberLiteral, Identifier) - ) - - -class TestTrimPatternBlankLines: - """Tests for _trim_pattern_blank_lines edge cases.""" - - def test_empty_returns_empty(self) -> None: - """Empty list returns empty tuple.""" - assert _trim_pattern_blank_lines([]) == () - - def test_single_placeable_preserved(self) -> None: - """Placeable-only pattern is preserved.""" - placeable = Placeable( - expression=VariableReference(id=Identifier("x")) - ) - result = _trim_pattern_blank_lines([placeable]) - assert len(result) == 1 - assert result[0] == placeable - - def test_text_with_content_after_newline_preserved(self) -> None: - """Content after last newline is preserved.""" - elements = cast( - "list[TextElement | Placeable]", - [TextElement(value="Hello\nWorld")], - ) - result = _trim_pattern_blank_lines(elements) - assert len(result) == 1 - assert isinstance(result[0], TextElement) - assert result[0].value == "Hello\nWorld" - - def test_trailing_blank_line_removed(self) -> None: - """Trailing blank 
line is removed.""" - elements = cast( - "list[TextElement | Placeable]", - [TextElement(value="Content\n \n")], - ) - result = _trim_pattern_blank_lines(elements) - assert len(result) == 1 - assert isinstance(result[0], TextElement) - assert result[0].value == "Content" - - def test_leading_all_whitespace_removed(self) -> None: - """First element all whitespace is removed.""" - elements = cast( - "list[TextElement | Placeable]", - [TextElement(value=" "), TextElement(value="content")], - ) - result = _trim_pattern_blank_lines(elements) - assert len(result) == 1 - assert isinstance(result[0], TextElement) - assert result[0].value == "content" - - def test_trailing_all_whitespace_removed(self) -> None: - """Last element all whitespace after trimming is removed.""" - elements = cast( - "list[TextElement | Placeable]", - [TextElement(value="content"), TextElement(value="\n ")], - ) - result = _trim_pattern_blank_lines(elements) - assert len(result) == 1 - assert isinstance(result[0], TextElement) - assert result[0].value == "content" - - -# ============================================================================ -# VARIANT & SELECT EXPRESSION -# ============================================================================ - - -class TestParseVariant: - """Tests for parse_variant error paths.""" - - def test_missing_opening_bracket(self) -> None: - """Returns None when '[' is missing.""" - assert parse_variant(Cursor("one", 0)) is None - - def test_missing_closing_bracket(self) -> None: - """Returns None when ']' is missing.""" - assert parse_variant(Cursor("[one", 0)) is None - - def test_invalid_key(self) -> None: - """Returns None when variant key is invalid.""" - assert parse_variant(Cursor("[@]", 0)) is None - - def test_variant_with_pattern(self) -> None: - """Variant with text pattern succeeds.""" - result = parse_variant(Cursor("[one] item", 0)) - assert result is not None - assert isinstance(result.value, Variant) - - def 
test_variant_with_empty_pattern(self) -> None: - """Variant with empty pattern succeeds.""" - result = parse_variant(Cursor("[one] ", 0)) - assert result is not None or result is None - - -class TestParseSelectExpression: - """Tests for parse_select_expression validation and EOF handling.""" - - def test_no_variants_returns_none(self) -> None: - """Must have at least one variant.""" - selector = VariableReference(id=Identifier("count")) - result = parse_select_expression( - Cursor("}", 0), selector, 0 - ) - assert result is None - - def test_no_default_variant_returns_none(self) -> None: - """Must have exactly one default variant.""" - selector = VariableReference(id=Identifier("count")) - result = parse_select_expression( - Cursor("[one] item\n}", 0), selector, 0 - ) - assert result is None - - def test_multiple_defaults_returns_none(self) -> None: - """Multiple default variants detected.""" - selector = VariableReference(id=Identifier("count")) - result = parse_select_expression( - Cursor("*[one] One\n*[other] Other", 0), selector, 0 - ) - assert result is None - - def test_variant_parse_fails_in_loop(self) -> None: - """Variant parse failure in loop returns None.""" - selector = VariableReference(id=Identifier("x")) - result = parse_select_expression( - Cursor("[@]", 0), selector, 0 - ) - assert result is None - - def test_eof_after_variant_whitespace(self) -> None: - """EOF reached after skip_blank between variants.""" - source = "*[other] value\n\n\n" - selector = VariableReference(id=None) # type: ignore[arg-type] - result = parse_select_expression( - Cursor(source, 0), selector, start_pos=0, - context=ParseContext(), - ) - assert result is not None - assert len(result.value.variants) == 1 - assert result.cursor.is_eof - - def test_eof_multiple_blank_lines_after_variant(self) -> None: - """EOF with multiple blank lines after variant.""" - source = "*[other] text\n\n\n\n" - selector = VariableReference(id=None) # type: ignore[arg-type] - result = 
parse_select_expression( - Cursor(source, 0), selector, start_pos=0, - context=ParseContext(), - ) - assert result is not None - assert len(result.value.variants) == 1 - assert result.cursor.is_eof - - def test_eof_single_newline_after_variant(self) -> None: - """EOF with single newline after variant.""" - source = "*[default] value\n" - selector = VariableReference(id=None) # type: ignore[arg-type] - result = parse_select_expression( - Cursor(source, 0), selector, start_pos=0, - context=ParseContext(), - ) - assert result is not None - assert len(result.value.variants) == 1 - assert result.cursor.is_eof - - def test_eof_empty_pattern_variant(self) -> None: - """Variant with empty pattern followed by EOF.""" - source = "*[other]\n\n" - selector = VariableReference(id=None) # type: ignore[arg-type] - result = parse_select_expression( - Cursor(source, 0), selector, start_pos=0, - context=ParseContext(), - ) - assert result is not None - assert len(result.value.variants) == 1 - assert len(result.value.variants[0].value.elements) == 0 - assert result.cursor.is_eof - - def test_eof_multiple_variants(self) -> None: - """Multiple variants with EOF after last one.""" - source = "[one] singular\n*[other] plural\n\n" - selector = VariableReference(id=None) # type: ignore[arg-type] - result = parse_select_expression( - Cursor(source, 0), selector, start_pos=0, - context=ParseContext(), - ) - assert result is not None - assert len(result.value.variants) == 2 - assert result.cursor.is_eof - - def test_eof_complex_pattern(self) -> None: - """Complex pattern in variant, then EOF.""" - source = "*[other] You have items\n\n" - selector = VariableReference(id=None) # type: ignore[arg-type] - result = parse_select_expression( - Cursor(source, 0), selector, start_pos=0, - context=ParseContext(), - ) - assert result is not None - assert len(result.value.variants) == 1 - assert result.cursor.is_eof - - def test_immediate_eof(self) -> None: - """EOF immediately after arrow position.""" - 
selector = VariableReference(id=None) # type: ignore[arg-type] - result = parse_select_expression( - Cursor("", 0), selector, start_pos=0, - context=ParseContext(), - ) - assert result is None - - def test_whitespace_then_eof(self) -> None: - """Only whitespace after arrow, then EOF.""" - selector = VariableReference(id=None) # type: ignore[arg-type] - result = parse_select_expression( - Cursor(" \n ", 0), selector, start_pos=0, - context=ParseContext(), - ) - assert result is None - - def test_variant_leading_spaces_integration(self) -> None: - """Variant keys with leading spaces via parse_message.""" - source = ( - "msg = {$count ->\n" - " [ one] item\n" - " *[other] items\n}" - ) - result = parse_message(Cursor(source, 0), ParseContext()) - assert result is not None - message = result.value - assert message.value is not None - assert len(message.value.elements) == 1 - placeable = message.value.elements[0] - assert isinstance(placeable, Placeable) - assert isinstance(placeable.expression, SelectExpression) - - def test_multiline_select_complex_spacing(self) -> None: - """Complex spacing and continuation in variant patterns.""" - source = ( - "msg = {$count ->\n" - " [ zero]\n" - " No items\n" - " [one]\n" - " {$count} item\n" - " *[other]\n" - " {$count} items\n" - "}" - ) - result = parse_message(Cursor(source, 0), ParseContext()) - assert result is not None - assert result.value.value is not None - - @given(st.integers(min_value=1, max_value=20)) - @example(1) - @example(5) - @example(20) - def test_eof_variable_newlines_property( - self, num_newlines: int - ) -> None: - """Various numbers of trailing newlines trigger EOF handling.""" - event(f"num_newlines={num_newlines}") - source = f"*[other] value{'\\n' * num_newlines}" - # Build actual newlines - source = "*[other] value" + "\n" * num_newlines - selector = VariableReference(id=None) # type: ignore[arg-type] - result = parse_select_expression( - Cursor(source, 0), selector, start_pos=0, - 
context=ParseContext(), - ) - assert result is not None - assert len(result.value.variants) == 1 - assert result.cursor.is_eof - - @given(st.text(alphabet="\n", min_size=1, max_size=50)) - @example("\n") - @example("\n\n\n") - @example("\n\n\n\n\n") - def test_eof_arbitrary_newlines_property( - self, whitespace: str - ) -> None: - """Arbitrary newline sequences after variant trigger EOF.""" - event(f"ws_len={len(whitespace)}") - source = f"*[other] text{whitespace}" - selector = VariableReference(id=None) # type: ignore[arg-type] - result = parse_select_expression( - Cursor(source, 0), selector, start_pos=0, - context=ParseContext(), - ) - assert result is not None - assert len(result.value.variants) == 1 - assert result.cursor.is_eof - - @given( - st.lists( - st.sampled_from( - ["[one]", "[two]", "[zero]", "*[other]"] - ), - min_size=1, - max_size=5, - ) - ) - @example(["*[other]"]) - @example(["[one]", "*[other]"]) - def test_variant_configurations_property( - self, variant_keys: list[str] - ) -> None: - """Various variant configurations with EOF handling.""" - num_keys = len(variant_keys) - has_default = any("*" in k for k in variant_keys) - event(f"num_variants={num_keys}") - event(f"has_default={has_default}") - variants_text = "\n".join( - f"{key} text" for key in variant_keys - ) - source = f"{variants_text}\n\n" - selector = VariableReference(id=None) # type: ignore[arg-type] - result = parse_select_expression( - Cursor(source, 0), selector, start_pos=0, - context=ParseContext(), - ) - default_count = sum( - 1 for key in variant_keys if "*" in key - ) - if default_count == 1: - assert result is not None - assert len(result.value.variants) == len(variant_keys) - assert result.cursor.is_eof - else: - assert result is None - - -# ============================================================================ -# ARGUMENT EXPRESSION & CALL ARGUMENTS -# ============================================================================ - - -class 
TestParseArgumentExpression: - """Tests for parse_argument_expression dispatch paths.""" - - def test_eof_returns_none(self) -> None: - """EOF returns None.""" - assert parse_argument_expression(Cursor("", 0)) is None - - def test_string_literal(self) -> None: - """Parses string literal argument.""" - result = parse_argument_expression(Cursor('"text"', 0)) - assert result is not None - assert isinstance(result.value, StringLiteral) - - def test_negative_number(self) -> None: - """Parses negative number argument.""" - result = parse_argument_expression(Cursor("-123", 0)) - assert result is not None - assert isinstance(result.value, NumberLiteral) - - def test_term_reference(self) -> None: - """Parses term reference (-brand) argument.""" - result = parse_argument_expression(Cursor("-brand", 0)) - assert result is not None - assert isinstance(result.value, TermReference) - - def test_positive_number(self) -> None: - """Parses positive number argument.""" - result = parse_argument_expression(Cursor("42", 0)) - assert result is not None - assert isinstance(result.value, NumberLiteral) - - def test_inline_placeable(self) -> None: - """Parses inline placeable { expr } argument.""" - result = parse_argument_expression(Cursor("{ $var }", 0)) - assert result is not None - assert isinstance(result.value, Placeable) - - def test_message_reference_no_paren(self) -> None: - """Identifier without '(' parsed as MessageReference.""" - result = parse_argument_expression(Cursor("msg:", 0)) - assert result is not None - assert isinstance(result.value, MessageReference) - - def test_invalid_char_returns_none(self) -> None: - """Invalid character returns None.""" - assert parse_argument_expression(Cursor("@", 0)) is None - - def test_variable_reference_fails(self) -> None: - """'$' alone fails variable reference.""" - assert parse_argument_expression(Cursor("$", 0)) is None - - def test_string_literal_fails(self) -> None: - """Unclosed quote fails string literal.""" - assert 
parse_argument_expression(Cursor('"', 0)) is None - - def test_term_reference_fails(self) -> None: - """'-' alone fails term reference.""" - assert parse_argument_expression(Cursor("-", 0)) is None - - def test_negative_number_invalid(self) -> None: - """'-x' fails both term reference and number parse.""" - result = parse_argument_expression(Cursor("-x", 0)) - assert result is None or result is not None - - def test_placeable_fails(self) -> None: - """Invalid placeable content fails.""" - assert parse_argument_expression( - Cursor("{ @ }", 0) - ) is None - - def test_identifier_fails(self) -> None: - """Non-identifier start character fails.""" - assert parse_argument_expression(Cursor("@)", 0)) is None - - def test_function_reference_fails(self) -> None: - """Function reference with invalid args fails.""" - assert parse_argument_expression( - Cursor("FUNC(@)", 0) - ) is None - - def test_term_ref_fails_hyphen_only(self) -> None: - """Hyphen alone in argument position.""" - assert parse_argument_expression(Cursor("-)", 0)) is None - - def test_number_after_digit(self) -> None: - """Digit start parses as number.""" - result = parse_argument_expression(Cursor("0)", 0)) - assert result is not None - - def test_function_ref_fails_lower(self) -> None: - """Lowercase identifier with paren fails function ref.""" - result = parse_argument_expression(Cursor("func (", 0)) - assert result is None - - -class TestParseCallArguments: - """Tests for parse_call_arguments error paths.""" - - def test_named_arg_not_identifier(self) -> None: - """Named argument name must be identifier.""" - result = parse_call_arguments(Cursor('$var: "value")', 0)) - assert result is None - - def test_duplicate_named_argument(self) -> None: - """Duplicate named argument names fail.""" - assert parse_call_arguments( - Cursor("x: 1, x: 2)", 0) - ) is None - - def test_named_arg_missing_value(self) -> None: - """Expected value after ':'.""" - assert parse_call_arguments( - Cursor("x: )", 0) - ) is None - 
- def test_named_arg_value_parse_fails(self) -> None: - """Value expression parse fails.""" - assert parse_call_arguments( - Cursor("x: @)", 0) - ) is None - - def test_named_arg_non_literal_value(self) -> None: - """Named argument value must be literal.""" - assert parse_call_arguments( - Cursor("x: $var)", 0) - ) is None - - def test_positional_after_named_error(self) -> None: - """Positional args must come before named.""" - assert parse_call_arguments( - Cursor("x: 1, $var)", 0) - ) is None - - def test_trailing_comma(self) -> None: - """Trailing comma handled gracefully.""" - result = parse_call_arguments(Cursor("1, 2, )", 0)) - assert result is not None - assert len(result.value.positional) == 2 - - def test_argument_expression_fails(self) -> None: - """Argument expression parse fails.""" - assert parse_call_arguments(Cursor("@)", 0)) is None - - def test_named_arg_eof_after_colon(self) -> None: - """EOF after ':' in named argument.""" - assert parse_call_arguments(Cursor("x:", 0)) is None - - -# ============================================================================ -# FUNCTION REFERENCE -# ============================================================================ - - -class TestParseFunctionReference: - """Tests for parse_function_reference paths.""" - - def test_valid_function(self) -> None: - """Valid function reference parses successfully.""" - result = parse_function_reference(Cursor("NUMBER(42)", 0)) - assert result is not None - - def test_function_with_named_args(self) -> None: - """Function with named arguments parses.""" - result = parse_function_reference( - Cursor('NUMBER(42, style: "percent")', 0) - ) - assert result is not None - - def test_missing_opening_paren(self) -> None: - """Returns None when '(' is missing.""" - assert parse_function_reference(Cursor("FUNC", 0)) is None - - def test_missing_closing_paren(self) -> None: - """Returns None when ')' is missing.""" - assert parse_function_reference( - Cursor("FUNC($x", 0) - ) is None 
- - def test_no_identifier(self) -> None: - """Returns None when identifier is missing.""" - assert parse_function_reference(Cursor(" ", 0)) is None - - def test_non_identifier_start(self) -> None: - """Returns None for non-identifier start.""" - assert parse_function_reference(Cursor("123", 0)) is None - - def test_depth_exceeded(self) -> None: - """Returns None when nesting depth exceeded.""" - context = ParseContext(max_nesting_depth=1, current_depth=2) - result = parse_function_reference( - Cursor("FUNC($x)", 0), context - ) - assert result is None - - def test_arguments_parse_fails(self) -> None: - """Returns None when call arguments fail.""" - assert parse_function_reference( - Cursor("FUNC(@)", 0) - ) is None - - def test_no_closing_paren_after_args(self) -> None: - """Function with incomplete arguments.""" - assert parse_function_reference( - Cursor("NUMBER(", 0) - ) is None - - def test_invalid_arg_syntax(self) -> None: - """Function with invalid argument syntax.""" - assert parse_function_reference( - Cursor("FUNC(,,,)", 0) - ) is None - - -# ============================================================================ -# TERM REFERENCE -# ============================================================================ - - -class TestParseTermReference: - """Tests for parse_term_reference paths.""" - - def test_valid_term(self) -> None: - """Valid term reference parses.""" - result = parse_term_reference(Cursor("-brand", 0)) - assert result is not None - assert result.value.id.name == "brand" - - def test_term_with_attribute(self) -> None: - """Term with .attribute access.""" - result = parse_term_reference(Cursor("-brand.short", 0)) - assert result is not None - assert result.value.attribute is not None - - def test_missing_hyphen(self) -> None: - """Returns None without '-' prefix.""" - assert parse_term_reference(Cursor("brand", 0)) is None - - def test_no_identifier_after_hyphen(self) -> None: - """Returns None when identifier missing after '-'.""" - 
assert parse_term_reference(Cursor("-", 0)) is None - - def test_no_identifier_with_spaces(self) -> None: - """Returns None with spaces after '-'.""" - assert parse_term_reference(Cursor("- ", 0)) is None - - def test_attribute_parse_fails(self) -> None: - """Dot without attribute name returns None.""" - assert parse_term_reference(Cursor("-term.", 0)) is None - - def test_attribute_with_spaces_fails(self) -> None: - """Dot followed by whitespace returns None.""" - assert parse_term_reference( - Cursor("-brand. ", 0) - ) is None - - def test_arguments_parse_fails(self) -> None: - """Invalid arguments return None.""" - assert parse_term_reference( - Cursor("-brand(@)", 0) - ) is None - - def test_arguments_missing_closing_paren(self) -> None: - """Missing ')' after term arguments.""" - assert parse_term_reference( - Cursor("-brand(case: 'nom'", 0) - ) is None - - def test_missing_closing_paren_no_args(self) -> None: - """Missing ')' after open paren.""" - assert parse_term_reference(Cursor("-brand(", 0)) is None - - def test_depth_exceeded(self) -> None: - """Returns None when nesting depth exceeded.""" - context = ParseContext(max_nesting_depth=1, current_depth=2) - result = parse_term_reference( - Cursor("-brand(case: 'nom')", 0), context - ) - assert result is None - - def test_attribute_identifier_parse_fails(self) -> None: - """Attribute identifier parse fails after dot.""" - assert parse_term_reference(Cursor("-brand.", 0)) is None - - -# ============================================================================ -# INLINE EXPRESSION HELPERS -# ============================================================================ - - -class TestInlineExpressionHelpers: - """Tests for inline expression helper functions.""" - - def test_inline_string_literal(self) -> None: - """String literal inline expression.""" - result = _parse_inline_string_literal(Cursor('"text"', 0)) - assert result is not None - assert isinstance(result.value, StringLiteral) - - def 
test_inline_string_literal_fails(self) -> None: - """Unclosed string literal returns None.""" - assert _parse_inline_string_literal(Cursor('"', 0)) is None - - def test_inline_number_literal(self) -> None: - """Number literal inline expression.""" - result = _parse_inline_number_literal(Cursor("42", 0)) - assert result is not None - assert isinstance(result.value, NumberLiteral) - - def test_inline_number_single_digit(self) -> None: - """Single digit number parses.""" - result = _parse_inline_number_literal(Cursor("1", 0)) - assert result is not None - - def test_inline_hyphen_term(self) -> None: - """Hyphen-prefixed term reference.""" - result = _parse_inline_hyphen(Cursor("-brand", 0)) - assert result is not None - assert isinstance(result.value, TermReference) - - def test_inline_hyphen_number(self) -> None: - """Hyphen-prefixed negative number.""" - result = _parse_inline_hyphen(Cursor("-123", 0)) - assert result is not None - assert isinstance(result.value, NumberLiteral) - - def test_inline_hyphen_fails(self) -> None: - """Hyphen alone returns None.""" - assert _parse_inline_hyphen(Cursor("-", 0)) is None - - def test_message_attribute_with_dot(self) -> None: - """Parse .attribute suffix.""" - attr, _ = _parse_message_attribute(Cursor(".attr", 0)) - assert attr is not None - assert isinstance(attr, Identifier) - - def test_message_attribute_no_dot(self) -> None: - """No dot returns None.""" - attr, _ = _parse_message_attribute(Cursor("x", 0)) - assert attr is None - - def test_message_attribute_identifier_fails(self) -> None: - """Dot followed by non-identifier returns None.""" - attr, _ = _parse_message_attribute(Cursor(".123", 0)) - assert attr is None - - def test_inline_identifier_function_call(self) -> None: - """Identifier followed by '(' is function call.""" - result = _parse_inline_identifier(Cursor("FUNC($x)", 0)) - assert result is not None - - def test_inline_identifier_message_ref(self) -> None: - """Identifier as message reference.""" - result = 
_parse_inline_identifier(Cursor("msg", 0)) - assert result is not None - assert isinstance(result.value, MessageReference) - - def test_inline_identifier_with_attribute(self) -> None: - """Message reference with attribute.""" - result = _parse_inline_identifier(Cursor("msg.attr", 0)) - assert result is not None - assert isinstance(result.value, MessageReference) - assert result.value.attribute is not None - - def test_inline_identifier_non_ident_start(self) -> None: - """Non-identifier start returns None.""" - assert _parse_inline_identifier(Cursor("123", 0)) is None - - def test_inline_identifier_function_fails(self) -> None: - """Lowercase function with invalid args fails.""" - assert _parse_inline_identifier( - Cursor("func(@)", 0) - ) is None - - -# ============================================================================ -# INLINE EXPRESSION -# ============================================================================ - - -class TestParseInlineExpression: - """Tests for parse_inline_expression dispatch.""" - - def test_eof_returns_none(self) -> None: - """EOF returns None.""" - assert parse_inline_expression(Cursor("", 0)) is None - - def test_variable_reference(self) -> None: - """'$' dispatches to variable reference.""" - result = parse_inline_expression(Cursor("$var", 0)) - assert result is not None - assert isinstance(result.value, VariableReference) - - def test_variable_reference_fails(self) -> None: - """'$' alone fails.""" - assert parse_inline_expression(Cursor("$", 0)) is None - - def test_string_literal(self) -> None: - """Quote dispatches to string literal.""" - result = parse_inline_expression(Cursor('"text"', 0)) - assert result is not None - assert isinstance(result.value, StringLiteral) - - def test_hyphen_dispatch(self) -> None: - """'-' dispatches to hyphen handler.""" - result = parse_inline_expression(Cursor("-brand", 0)) - assert result is not None - - def test_nested_placeable(self) -> None: - """'{' dispatches to nested 
placeable.""" - result = parse_inline_expression(Cursor("{ $var }", 0)) - assert result is not None - assert isinstance(result.value, Placeable) - - def test_nested_placeable_fails(self) -> None: - """Invalid nested placeable fails.""" - assert parse_inline_expression( - Cursor("{ @ }", 0) - ) is None - - def test_digit_dispatch(self) -> None: - """Digit dispatches to number literal.""" - result = parse_inline_expression(Cursor("42", 0)) - assert result is not None - assert isinstance(result.value, NumberLiteral) - - def test_identifier_dispatch(self) -> None: - """Identifier dispatches to message reference.""" - result = parse_inline_expression(Cursor("msg", 0)) - assert result is not None - assert isinstance(result.value, MessageReference) - - def test_invalid_char_returns_none(self) -> None: - """Invalid character returns None.""" - assert parse_inline_expression(Cursor("@", 0)) is None - - def test_inline_expression_past_eof(self) -> None: - """Cursor past content returns None.""" - result = parse_inline_expression(Cursor("$", 1)) - assert result is None - - -# ============================================================================ -# PLACEABLE -# ============================================================================ - - -class TestParsePlaceable: - """Tests for parse_placeable paths.""" - - def test_simple_variable(self) -> None: - """Parses simple variable placeable.""" - result = parse_placeable(Cursor("$var}", 0)) - assert result is not None - assert isinstance(result.value.expression, VariableReference) - - def test_depth_exceeded(self) -> None: - """Returns None when nesting depth exceeded.""" - context = ParseContext(max_nesting_depth=1, current_depth=2) - assert parse_placeable( - Cursor("$var}", 0), context - ) is None - - def test_expression_fails(self) -> None: - """Invalid expression content returns None.""" - assert parse_placeable(Cursor("@}", 0)) is None - - def test_whitespace_only(self) -> None: - """Only whitespace inside braces 
returns None.""" - assert parse_placeable(Cursor(" }", 1)) is None - - def test_empty_content(self) -> None: - """Empty content returns None.""" - assert parse_placeable(Cursor("}", 0)) is None - - def test_select_valid_selector(self) -> None: - """Select expression with valid selector.""" - result = parse_placeable( - Cursor("$x -> [one] 1 *[other] N}", 0) - ) - assert result is not None - - def test_select_expression_fails(self) -> None: - """Select expression parse fails (no variants).""" - assert parse_placeable(Cursor("$var -> }", 0)) is None - - def test_select_missing_closing_brace(self) -> None: - """Missing '}' after select expression.""" - assert parse_placeable( - Cursor("$var -> [one] 1 *[other] N", 0) - ) is None - - def test_simple_expression_missing_brace(self) -> None: - """Missing '}' after simple expression.""" - assert parse_placeable(Cursor("$var", 0)) is None - - def test_function_followed_by_hyphen(self) -> None: - """Function selector with hyphen (not ->) returns None.""" - assert parse_placeable( - Cursor("NUMBER(42)-}", 0) - ) is None - - def test_function_followed_by_hyphen_eof(self) -> None: - """Function selector with hyphen at EOF returns None.""" - assert parse_placeable( - Cursor("NUMBER(42)-", 0) - ) is None - - def test_message_ref_with_hyphen_in_name(self) -> None: - """Message ref with hyphen in identifier name.""" - result = parse_placeable(Cursor("msg-}", 0)) - assert result is not None - - def test_nested_opening_braces(self) -> None: - """Multiple nested opening braces fail.""" - assert parse_placeable(Cursor("{{{", 1)) is None - - def test_incomplete_expression(self) -> None: - """Incomplete expression returns None.""" - assert parse_placeable(Cursor("NUMBER", 0)) is None - - -# ============================================================================ -# VALIDATE MESSAGE CONTENT -# ============================================================================ - - -class TestValidateMessageContent: - """Tests for 
validate_message_content.""" - - def test_empty_pattern_with_attributes_valid(self) -> None: - """No pattern but with attributes is valid.""" - pattern = Pattern(elements=()) - attributes = [ - Attribute( - id=Identifier("attr"), - value=Pattern( - elements=(TextElement("val"),) - ), - ) - ] - assert validate_message_content(pattern, attributes) - - def test_pattern_no_attributes_valid(self) -> None: - """Pattern with no attributes is valid.""" - pattern = Pattern(elements=(TextElement("value"),)) - assert validate_message_content(pattern, []) - - def test_no_pattern_no_attributes_invalid(self) -> None: - """Neither pattern nor attributes is invalid.""" - assert not validate_message_content( - Pattern(elements=()), [] - ) - - -# ============================================================================ -# PARSE CONTEXT -# ============================================================================ - - -class TestParseContextDepthExceeded: - """Tests for ParseContext._depth_exceeded_flag edge case.""" - - def test_mark_depth_exceeded_with_none_flag(self) -> None: - """Handle _depth_exceeded_flag being None gracefully.""" - context = object.__new__(ParseContext) - object.__setattr__(context, "max_nesting_depth", 5) - object.__setattr__(context, "current_depth", 0) - object.__setattr__(context, "_depth_exceeded_flag", None) - context.mark_depth_exceeded() - assert context._depth_exceeded_flag is None - - -# ============================================================================ -# LINE-TARGETED COVERAGE (parse_simple_pattern / parse_pattern) -# ============================================================================ - - -class TestSimplePatternLineCoverage: - """Targeted line coverage for parse_simple_pattern.""" - - def test_accumulated_text_before_placeable_prepend(self) -> None: - """Accumulated text merged with last element before placeable.""" - result = parse_simple_pattern( - Cursor("First\n continued{$var}", 0) - ) - assert result is not None - - 
def test_accumulated_text_before_placeable_new(self) -> None: - """Accumulated text as new element before placeable.""" - result = parse_simple_pattern( - Cursor("\n start{$var}", 0) - ) - assert result is not None - - def test_finalize_accumulated_merged(self) -> None: - """Finalize accumulated text merged with existing element.""" - result = parse_simple_pattern( - Cursor("Text\n more continuation", 0) - ) - assert result is not None - - def test_finalize_accumulated_new_element(self) -> None: - """Finalize accumulated text as new element.""" - result = parse_simple_pattern( - Cursor("{$var}\n ending text", 0) - ) - assert result is not None - - def test_variant_continuation_extra_spaces(self) -> None: - """Variant value with extra indent before placeable.""" - source = ( - "msg = {$count ->\n" - " [one] Items:\n" - " {$count}\n" - " *[other] Items\n" - "}" - ) - result = parse_message(Cursor(source, 0), ParseContext()) - assert result is not None - assert isinstance(result.value, Message) - - def test_variant_trailing_accumulated_spaces(self) -> None: - """Variant ending with accumulated extra spaces.""" - source = ( - "msg = {$count ->\n" - " [one] Items\n\n" - " *[other] More\n" - "}" - ) - result = parse_message(Cursor(source, 0), ParseContext()) - assert result is not None - assert isinstance(result.value, Message) - - -class TestPatternLineCoverage: - """Targeted line coverage for parse_pattern.""" - - def test_accumulated_as_new_element(self) -> None: - """Accumulated continuation becomes new element.""" - result = parse_pattern( - Cursor("{$x}\n text after placeable", 0) - ) - assert result is not None - - def test_finalize_merged(self) -> None: - """Finalize merged with existing element.""" - result = parse_pattern( - Cursor("Text\n final continuation", 0) - ) - assert result is not None - - def test_finalize_new_element(self) -> None: - """Finalize as new element.""" - result = parse_pattern( - Cursor("{$x}\n final", 0) - ) - assert result is not None - 
- -# ============================================================================ -# INTEGRATION VIA FLUENTBUNDLE -# ============================================================================ - - -class TestExpressionsIntegration: - """Integration tests via FluentBundle for expression paths.""" - - def test_function_name_not_uppercase(self) -> None: - """Lowercase function name fails, soft recovery.""" - bundle = FluentBundle("en_US", strict=False) - bundle.add_resource("msg = { lowercase() }") - result, errors = bundle.format_pattern("msg") - assert len(errors) > 0 or "{" in result - - def test_function_missing_paren(self) -> None: - """UPPERCASE without paren treated as message reference, soft recovery.""" - bundle = FluentBundle("en_US", strict=False) - bundle.add_resource("msg = { NUMBER }") - result, errors = bundle.format_pattern("msg") - assert "{NUMBER}" in result or len(errors) > 0 - - def test_string_literal_selector(self) -> None: - """String literal as selector in select expression.""" - bundle = FluentBundle("en_US") - bundle.add_resource( - 'msg = {"test" ->\n' - " [test] Matched\n" - " *[other] Other\n" - "}" - ) - result, _ = bundle.format_pattern("msg") - assert "Matched" in result or "test" in result - - def test_number_literal_selector(self) -> None: - """Number literal as selector.""" - bundle = FluentBundle("en_US") - bundle.add_resource( - "msg = {42 ->\n" - " [42] Exact match\n" - " *[other] Other\n" - "}" - ) - result, _ = bundle.format_pattern("msg") - assert result is not None - - def test_nested_selects(self) -> None: - """Nested select expressions.""" - bundle = FluentBundle("en_US") - bundle.add_resource( - "msg = {NUMBER(1) ->\n" - " [one] {NUMBER(2) ->\n" - " [one] One-One\n" - " *[other] One-Other\n" - " }\n" - " *[other] Other\n" - "}" - ) - result, _ = bundle.format_pattern("msg") - assert result is not None - - def test_function_with_multiple_args(self) -> None: - """Function call with multiple named arguments, soft recovery.""" 
- bundle = FluentBundle("en_US", strict=False) - bundle.add_resource( - 'msg = {NUMBER(42, style: "percent")}' - ) - result, _ = bundle.format_pattern("msg") - assert result is not None - - def test_attribute_access(self) -> None: - """Message attribute reference in placeable.""" - bundle = FluentBundle("en_US") - bundle.add_resource( - "base = Base\n" - " .attr = Attribute\n\n" - "msg = {base.attr}" - ) - result, _ = bundle.format_pattern("msg") - assert "Attribute" in result - - def test_term_attribute_selector(self) -> None: - """Term attribute as selector.""" - bundle = FluentBundle("en_US") - bundle.add_resource( - "-brand = Firefox\n" - " .version = 1\n\n" - "msg = {-brand.version ->\n" - " [1] Version One\n" - " *[other] Other Version\n" - "}" - ) - result, _ = bundle.format_pattern("msg") - assert result is not None - - def test_deeply_nested_expressions(self) -> None: - """Deep nesting of expressions.""" - bundle = FluentBundle("en_US") - bundle.add_resource( - "msg = {NUMBER(1) ->\n" - " [one] {NUMBER(2) ->\n" - " [one] {NUMBER(3) ->\n" - " [one] Deep\n" - " *[other] Level3\n" - " }\n" - " *[other] Level2\n" - " }\n" - " *[other] Level1\n" - "}" - ) - result, _ = bundle.format_pattern("msg") - assert result is not None - - def test_select_missing_arrow(self) -> None: - """Select expression without -> operator, soft recovery.""" - bundle = FluentBundle("en_US", strict=False) - bundle.add_resource( - "msg = {NUMBER(1)\n" - " [one] One\n" - " *[other] Other\n" - "}" - ) - result, _errors = bundle.format_pattern("msg") - assert result is not None - - def test_select_missing_default_via_bundle(self) -> None: - """Select without default variant via bundle, soft recovery.""" - bundle = FluentBundle("en_US", strict=False) - bundle.add_resource( - "msg = {NUMBER(1) ->\n" - " [one] One\n" - " [two] Two\n" - "}" - ) - result, _errors = bundle.format_pattern("msg") - assert result is not None - - def test_unicode_expression(self) -> None: - """Unicode characters in 
expressions.""" - bundle = FluentBundle("en_US") - bundle.add_resource( - 'msg = {"Hello \\u4E16\\u754C" ->\n' - " *[other] Unicode test\n" - "}" - ) - result, _ = bundle.format_pattern("msg") - assert result is not None - - -# ============================================================================ -# PARSER/RULES BRANCH COVERAGE -# ============================================================================ - - -class TestParserRulesCoverage: - """Test parser/rules.py coverage gaps for function arguments.""" - - def test_placeable_as_function_argument(self) -> None: - """Placeable inside function call arguments parses successfully.""" - parser = FluentParserV1() - ftl = 'msg = { NUMBER({ "5" }) }' - - resource = parser.parse(ftl) - - assert len(resource.entries) == 1 - msg = resource.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - - def test_function_reference_as_argument(self) -> None: - """Function reference inside function arguments parses without crash.""" - parser = FluentParserV1() - ftl = "msg = { NUMBER(UPPER($val)) }" - - resource = parser.parse(ftl) - - assert len(resource.entries) >= 1 - - def test_uppercase_identifier_not_function(self) -> None: - """Uppercase identifier without parentheses is treated as message reference.""" - parser = FluentParserV1() - ftl = "msg = { THIS }" - - resource = parser.parse(ftl) - - assert len(resource.entries) == 1 - msg = resource.entries[0] - assert isinstance(msg, Message) - - -class TestParserRulesBranchCoverage: - """Additional tests for parser/rules branch coverage.""" - - def test_parse_complex_select_with_functions(self) -> None: - """Complex select expression with function calls in variants parses correctly.""" - parser = FluentParserV1() - ftl = """ -complex = { $gender -> - [male] Mr. { $lastName } - [female] Ms. 
{ $lastName } - *[other] { $firstName } { $lastName } -} -""" - resource = parser.parse(ftl) - assert len(resource.entries) == 1 - - def test_parse_nested_function_calls(self) -> None: - """NUMBER with string literal argument parses correctly.""" - parser = FluentParserV1() - ftl = 'msg = { NUMBER("123.45") }' - - resource = parser.parse(ftl) - assert len(resource.entries) == 1 +"""Aggregated syntax parser expressions test surface.""" + +from tests.syntax_parser_expressions_cases.argument_expression_call_arguments import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_expressions_cases.function_reference import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_expressions_cases.inline_expression import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_expressions_cases.inline_expression_helpers import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_expressions_cases.integration_via_fluentbundle import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_expressions_cases.line_targeted_coverage_parse_simple_pattern_parse_pattern import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_expressions_cases.message_content_validation import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_expressions_cases.parse_context import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_expressions_cases.parser_rules_branch_coverage import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_expressions_cases.placeable import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_expressions_cases.term_reference import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_expressions_cases.variable_reference import * # noqa: F403 - re-export split test surface +from tests.syntax_parser_expressions_cases.variant_key_variant_marker import * # noqa: F403 - 
re-export split test surface +from tests.syntax_parser_expressions_cases.variant_select_expression import * # noqa: F403 - re-export split test surface diff --git a/tests/test_syntax_parser_patterns.py b/tests/test_syntax_parser_patterns.py index b616e333..3119ab3d 100644 --- a/tests/test_syntax_parser_patterns.py +++ b/tests/test_syntax_parser_patterns.py @@ -1,1443 +1,8 @@ -"""Tests for parser pattern and whitespace handling. - -Tests whitespace utilities (skip_blank_inline, skip_blank, -is_indented_continuation, skip_multiline_pattern_start) and pattern -parsing (parse_pattern, parse_simple_pattern) including multiline -continuation, blank line handling, text accumulation, variant delimiter -lookahead, and CRLF normalization. -""" - -from __future__ import annotations - -from unittest.mock import patch - -import pytest -from hypothesis import event, example, given -from hypothesis import strategies as st - -from ftllexengine import parse_ftl -from ftllexengine.runtime.bundle import FluentBundle -from ftllexengine.syntax.ast import ( - Message, - Pattern, - Placeable, - SelectExpression, - Term, - TextElement, -) -from ftllexengine.syntax.cursor import Cursor -from ftllexengine.syntax.parser.rules import ( - ParseContext, - parse_message, - parse_pattern, - parse_simple_pattern, - parse_variant, -) -from ftllexengine.syntax.parser.whitespace import ( - is_indented_continuation, - skip_blank, - skip_blank_inline, - skip_multiline_pattern_start, -) - -# ============================================================================ -# WHITESPACE UTILITIES -# ============================================================================ - - -class TestSkipBlankInline: - """Tests for skip_blank_inline (U+0020 only, per FTL spec).""" - - def test_no_spaces(self) -> None: - """Returns same position when no spaces.""" - cursor = Cursor(source="hello", pos=0) - assert skip_blank_inline(cursor).pos == 0 - - def test_leading_spaces(self) -> None: - """Skips leading spaces.""" - 
cursor = Cursor(source=" hello", pos=0) - result = skip_blank_inline(cursor) - assert result.pos == 3 - assert result.current == "h" - - def test_all_spaces(self) -> None: - """Handles all-space string.""" - cursor = Cursor(source=" ", pos=0) - assert skip_blank_inline(cursor).is_eof is True - - def test_stops_at_tab(self) -> None: - """Does NOT skip tabs.""" - cursor = Cursor(source=" \thello", pos=0) - result = skip_blank_inline(cursor) - assert result.pos == 2 - assert result.current == "\t" - - def test_stops_at_newline(self) -> None: - """Does NOT skip newlines.""" - cursor = Cursor(source=" \nhello", pos=0) - result = skip_blank_inline(cursor) - assert result.pos == 2 - assert result.current == "\n" - - def test_at_eof(self) -> None: - """Handles EOF.""" - cursor = Cursor(source="", pos=0) - assert skip_blank_inline(cursor).is_eof - - -class TestSkipBlank: - """Tests for skip_blank (spaces and line endings).""" - - def test_no_whitespace(self) -> None: - """Returns same position when no whitespace.""" - cursor = Cursor(source="hello", pos=0) - assert skip_blank(cursor).pos == 0 - - def test_spaces_only(self) -> None: - """Skips spaces.""" - cursor = Cursor(source=" hello", pos=0) - result = skip_blank(cursor) - assert result.pos == 3 - assert result.current == "h" - - def test_newlines_only(self) -> None: - """Skips newlines.""" - cursor = Cursor(source="\n\nhello", pos=0) - result = skip_blank(cursor) - assert result.pos == 2 - assert result.current == "h" - - def test_mixed_whitespace(self) -> None: - """Skips mixed spaces and newlines.""" - cursor = Cursor(source=" \n hello", pos=0) - result = skip_blank(cursor) - assert result.pos == 6 - assert result.current == "h" - - def test_all_whitespace(self) -> None: - """Handles all-whitespace string.""" - cursor = Cursor(source=" \n ", pos=0) - assert skip_blank(cursor).is_eof is True - - def test_stops_at_tab(self) -> None: - """Does NOT skip tabs.""" - cursor = Cursor(source=" \n\thello", pos=0) - result = 
skip_blank(cursor) - assert result.pos == 2 - assert result.current == "\t" - - def test_normalized_crlf(self) -> None: - """Handles CRLF normalized to LF.""" - cursor = Cursor(source="\nhello", pos=0) - result = skip_blank(cursor) - assert result.pos == 1 - assert result.current == "h" - - def test_at_eof(self) -> None: - """Handles EOF.""" - cursor = Cursor(source="", pos=0) - assert skip_blank(cursor).is_eof - - -class TestIsIndentedContinuation: - """Tests for is_indented_continuation detection.""" - - def test_true_for_indented_line(self) -> None: - """Returns True for indented line after newline.""" - cursor = Cursor(source="\n hello", pos=0) - assert is_indented_continuation(cursor) is True - - def test_false_no_indentation(self) -> None: - """Returns False without indentation.""" - cursor = Cursor(source="\nhello", pos=0) - assert is_indented_continuation(cursor) is False - - def test_false_bracket(self) -> None: - """Returns False for line starting with [ (variant).""" - cursor = Cursor(source="\n [variant]", pos=0) - assert is_indented_continuation(cursor) is False - - def test_false_asterisk(self) -> None: - """Returns False for line starting with * (default variant).""" - cursor = Cursor(source="\n *[default]", pos=0) - assert is_indented_continuation(cursor) is False - - def test_false_dot(self) -> None: - """Returns False for line starting with . 
(attribute).""" - cursor = Cursor(source="\n .attribute", pos=0) - assert is_indented_continuation(cursor) is False - - def test_false_not_at_newline(self) -> None: - """Returns False when not at newline.""" - cursor = Cursor(source="hello", pos=0) - assert is_indented_continuation(cursor) is False - - def test_false_at_eof(self) -> None: - """Returns False at EOF.""" - cursor = Cursor(source="", pos=0) - assert is_indented_continuation(cursor) is False - - def test_normalized_line_ending(self) -> None: - """Works with normalized LF line endings.""" - cursor = Cursor(source="\n hello", pos=0) - assert is_indented_continuation(cursor) is True - - def test_eof_after_newline(self) -> None: - """Returns False for newline at EOF.""" - cursor = Cursor(source="\n", pos=0) - assert is_indented_continuation(cursor) is False - - def test_only_spaces_after_newline(self) -> None: - """Empty indented line is considered a valid continuation.""" - cursor = Cursor(source="\n ", pos=0) - assert is_indented_continuation(cursor) is True - - def test_tab_indentation_rejected(self) -> None: - """Returns False for tab indentation.""" - cursor = Cursor(source="\n\thello", pos=0) - assert is_indented_continuation(cursor) is False - - -class TestSkipMultilinePatternStart: - """Tests for skip_multiline_pattern_start.""" - - def test_inline_pattern(self) -> None: - """Handles inline pattern (no newline).""" - cursor = Cursor(source=" value", pos=0) - new_cursor, indent = skip_multiline_pattern_start(cursor) - assert new_cursor.pos == 2 - assert new_cursor.current == "v" - assert indent == 0 - - def test_multiline_pattern(self) -> None: - """Handles multiline pattern (newline + indent).""" - cursor = Cursor(source="\n value", pos=0) - new_cursor, indent = skip_multiline_pattern_start(cursor) - assert new_cursor.pos == 3 - assert new_cursor.current == "v" - assert indent == 2 - - def test_no_continuation(self) -> None: - """Stops at non-continuation newline.""" - cursor = 
Cursor(source="\nvalue", pos=0) - new_cursor, indent = skip_multiline_pattern_start(cursor) - assert new_cursor.pos == 0 - assert new_cursor.current == "\n" - assert indent == 0 - - def test_empty_input(self) -> None: - """Handles empty input.""" - cursor = Cursor(source="", pos=0) - new_cursor, indent = skip_multiline_pattern_start(cursor) - assert new_cursor.is_eof - assert indent == 0 - - def test_no_leading_spaces(self) -> None: - """Handles no leading spaces.""" - cursor = Cursor(source="value", pos=0) - new_cursor, indent = skip_multiline_pattern_start(cursor) - assert new_cursor.pos == 0 - assert new_cursor.current == "v" - assert indent == 0 - - def test_normalized_line_ending(self) -> None: - """Handles normalized LF line endings.""" - cursor = Cursor(source="\n value", pos=0) - new_cursor, indent = skip_multiline_pattern_start(cursor) - assert new_cursor.current == "v" - assert indent == 2 - - def test_stops_at_bracket(self) -> None: - """Stops at bracket (variant marker).""" - cursor = Cursor(source="\n [variant]", pos=0) - new_cursor, indent = skip_multiline_pattern_start(cursor) - assert new_cursor.pos == 0 - assert new_cursor.current == "\n" - assert indent == 0 - - def test_inline_spaces_then_newline(self) -> None: - """Handles inline spaces then newline.""" - cursor = Cursor(source=" \nvalue", pos=0) - new_cursor, indent = skip_multiline_pattern_start(cursor) - assert new_cursor.pos == 2 - assert new_cursor.current == "\n" - assert indent == 0 - - def test_only_newline(self) -> None: - """Handles only newline.""" - cursor = Cursor(source="\n", pos=0) - new_cursor, indent = skip_multiline_pattern_start(cursor) - assert new_cursor.pos == 0 - assert indent == 0 - - -class TestWhitespaceSpecCompliance: - """Spec compliance, integration, and edge cases for whitespace.""" - - def test_blank_inline_only_u0020(self) -> None: - """blank_inline ONLY accepts U+0020 (space).""" - assert skip_blank_inline(Cursor(" text", 0)).pos == 3 - assert 
skip_blank_inline(Cursor("\ttext", 0)).pos == 0 - - def test_blank_accepts_lf(self) -> None: - """blank accepts LF line endings.""" - assert skip_blank(Cursor("\ntext", 0)).current == "t" - - def test_blank_rejects_cr(self) -> None: - """Standalone CR is NOT whitespace per Fluent spec.""" - assert skip_blank(Cursor("\rtext", 0)).current == "\r" - - def test_continuation_special_chars(self) -> None: - """Special starting characters correctly identified.""" - assert is_indented_continuation(Cursor("\n [", 0)) is False - assert is_indented_continuation(Cursor("\n *", 0)) is False - assert is_indented_continuation(Cursor("\n .", 0)) is False - assert is_indented_continuation(Cursor("\n a", 0)) is True - - def test_carriage_return_not_whitespace(self) -> None: - """CR alone is not skipped by skip_blank.""" - cursor = Cursor(source="\rhello", pos=0) - assert skip_blank(cursor).current == "\r" - - def test_inline_pattern_integration(self) -> None: - """Simulate parsing message with inline pattern.""" - cursor = Cursor(source="hello = World", pos=5) - cursor = skip_blank_inline(cursor) - assert cursor.current == "=" - cursor = cursor.advance() - cursor, indent = skip_multiline_pattern_start(cursor) - assert cursor.current == "W" - assert indent == 0 - - def test_multiline_pattern_integration(self) -> None: - """Simulate parsing message with multiline pattern.""" - cursor = Cursor(source="hello =\n World", pos=5) - cursor = skip_blank_inline(cursor) - assert cursor.current == "=" - cursor = cursor.advance() - cursor, indent = skip_multiline_pattern_start(cursor) - assert cursor.current == "W" - assert indent == 2 - - def test_select_expression_with_blank(self) -> None: - """Simulate parsing select expression with blank lines.""" - cursor = Cursor(source=" \n \n [variant]", pos=0) - cursor = skip_blank(cursor) - assert cursor.current == "[" - - def test_continuation_detection_in_pattern(self) -> None: - """Detect continuation vs attribute.""" - c1 = Cursor(source="\n 
continued text", pos=0) - assert is_indented_continuation(c1) is True - c2 = Cursor(source="\n .attribute = value", pos=0) - assert is_indented_continuation(c2) is False - - -# ============================================================================ -# PARSE_SIMPLE_PATTERN -# ============================================================================ - - -class TestParseSimplePattern: - """Tests for parse_simple_pattern basic behavior.""" - - def test_with_variable(self) -> None: - """Parses pattern with variable reference.""" - cursor = Cursor("Hello {$name}", 0) - result = parse_simple_pattern(cursor) - assert result is not None - assert len(result.value.elements) == 2 - - def test_stops_at_bracket(self) -> None: - """Bracket lookahead: [key]rest is literal text.""" - cursor = Cursor("Value[key]rest", 0) - result = parse_simple_pattern(cursor) - assert result is not None - assert result.value.elements[0].value == "Value[key]rest" # type: ignore[union-attr] - assert result.cursor.is_eof - - # [key] followed by } IS a variant marker - cursor = Cursor("Value [one]}", 0) - result = parse_simple_pattern(cursor) - assert result is not None - assert result.value.elements[0].value == "Value " # type: ignore[union-attr] - assert result.cursor.current == "[" - - def test_stops_at_asterisk(self) -> None: - """Asterisk lookahead: *[ is variant, * alone is literal.""" - cursor = Cursor("Text*[other]", 0) - result = parse_simple_pattern(cursor) - assert result is not None - assert result.cursor.current == "*" - - cursor = Cursor("Text*rest", 0) - result = parse_simple_pattern(cursor) - assert result is not None - assert result.value.elements[0].value == "Text*rest" # type: ignore[union-attr] - - def test_stops_at_brace(self) -> None: - """Stops at } (expression end).""" - cursor = Cursor("Value}rest", 0) - result = parse_simple_pattern(cursor) - assert result is not None - assert result.cursor.current == "}" - - def test_placeable_parse_fails(self) -> None: - """Returns 
None when placeable parsing fails.""" - cursor = Cursor("Text {invalid", 0) - with patch( - "ftllexengine.syntax.parser.expressions.parse_placeable", - return_value=None, - ): - result = parse_simple_pattern(cursor) - assert result is None - - def test_variant_markers_lookahead(self) -> None: - """Variant markers vs literal text disambiguation.""" - # *[other] IS a variant marker - cursor = Cursor("*[other]", 0) - result = parse_simple_pattern(cursor) - assert result is not None - assert len(result.value.elements) == 0 - assert result.cursor.current == "*" - - # [INFO] followed by text is literal - cursor = Cursor("[INFO] message", 0) - result = parse_simple_pattern(cursor) - assert result is not None - assert result.value.elements[0].value == "[INFO] message" # type: ignore[union-attr] - - def test_malformed_placeable_returns_none(self) -> None: - """Malformed placeable ({@) returns None.""" - cursor = Cursor("text{@", 0) - result = parse_simple_pattern(cursor) - assert result is None - - def test_in_select_expression(self) -> None: - """parse_simple_pattern as used in select expression variants.""" - bundle = FluentBundle("en_US") - bundle.add_resource("""msg = {NUMBER(1) -> - [one] One item - *[other] Many items -}""") - result, _ = bundle.format_pattern("msg") - assert "item" in result - - -class TestSimplePatternTextAccDirect: - """Tests for text_acc paths in parse_simple_pattern (Cursor-direct).""" - - def test_text_then_continuation_then_placeable(self) -> None: - """Accumulated text merged with prior element before placeable.""" - result = parse_simple_pattern(Cursor("hello\n {$x}", 0)) - assert result is not None - assert len(result.value.elements) >= 2 - - def test_continuation_then_placeable_no_prior(self) -> None: - """Continuation before placeable with no prior elements.""" - result = parse_simple_pattern(Cursor("\n {$x}", 0)) - assert result is not None - - def test_placeable_then_continuation_then_placeable(self) -> None: - """Placeable, 
continuation, then another placeable.""" - result = parse_simple_pattern(Cursor("{$a}\n {$b}", 0)) - assert result is not None - - def test_text_then_continuation_at_end(self) -> None: - """Text followed by trailing continuation.""" - result = parse_simple_pattern(Cursor("hello\n ", 0)) - assert result is not None - - def test_continuation_at_end_no_prior(self) -> None: - """Trailing continuation with no prior elements.""" - result = parse_simple_pattern(Cursor("\n ", 0)) - assert result is not None - - def test_placeable_then_continuation_at_end(self) -> None: - """Placeable then trailing continuation.""" - result = parse_simple_pattern(Cursor("{$x}\n ", 0)) - assert result is not None - - def test_complex_continuation_before_placeable(self) -> None: - """Multiple continuations before placeable.""" - text = "start\n line1\n line2\n {$x}" - result = parse_simple_pattern(Cursor(text, 0)) - assert result is not None - - def test_multiple_placeables_with_continuations(self) -> None: - """Multiple placeables separated by continuations.""" - result = parse_simple_pattern(Cursor("{$a}\n {$b}\n {$c}", 0)) - assert result is not None - - def test_blank_continuation_lines(self) -> None: - """Blank lines between continuations.""" - result = parse_simple_pattern(Cursor("text\n\n continued", 0)) - assert result is not None - - def test_continuation_before_placeable_with_text(self) -> None: - """Leading spaces then text then placeable.""" - cursor = Cursor(" continuation{$var}", 0) - result = parse_simple_pattern(cursor) - assert result is not None - assert len(result.value.elements) >= 2 - - def test_placeable_continuation_text_placeable(self) -> None: - """Placeable, continuation with text, then another placeable.""" - cursor = Cursor("{$x}\n text{$y}", 0) - result = parse_simple_pattern(cursor) - assert result is not None - assert len(result.value.elements) >= 3 - - def test_continuation_before_text_no_prior(self) -> None: - """Leading spaces then text, no prior elements.""" 
- cursor = Cursor(" line1\n line2", 0) - result = parse_simple_pattern(cursor) - assert result is not None - - def test_finalize_continuation_no_prior(self) -> None: - """Finalize accumulated text when no prior elements.""" - cursor = Cursor(" just continuation", 0) - result = parse_simple_pattern(cursor) - assert result is not None - assert len(result.value.elements) >= 1 - - def test_finalize_continuation_last_is_placeable(self) -> None: - """Finalize accumulated text when last element is placeable.""" - cursor = Cursor("{$x}\n continuation", 0) - result = parse_simple_pattern(cursor) - assert result is not None - assert len(result.value.elements) >= 2 - - def test_direct_text_acc_finalization(self) -> None: - """Extra spaces accumulated then stop character triggers finalization.""" - source = "a\n b\n }" - result = parse_simple_pattern(Cursor(source, 0)) - assert result is not None - assert len(result.value.elements) >= 1 - - -class TestSimplePatternTextAccVariant: - """Tests for text_acc in variant/message context (parse_ftl/parse_message).""" - - def test_extra_spaces_before_placeable(self) -> None: - """Extra indentation before placeable in variant pattern.""" - ftl = """msg = { $n -> - [one] - first - {$count} - *[other] items -}""" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - - def test_trailing_extra_spaces(self) -> None: - """Trailing extra spaces at end of variant pattern.""" - ftl = """msg = { $n -> - [one] - item - - *[other] items -}""" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - - def test_continuation_extra_spaces_then_placeable(self) -> None: - """Extra spaces before placeable via parse_message.""" - source = """msg = {$n -> - [one] Line1 - Line2 - {$var} - *[other] Items -}""" - cursor = Cursor(source, 0) - result = parse_message(cursor, ParseContext()) - assert result is not None - message = 
result.value - assert isinstance(message, Message) - assert message.value is not None - - def test_continuation_spaces_only_then_placeable(self) -> None: - """Blank continuation creating extra_spaces, then text+placeable.""" - source = """msg = {$n -> - [one] Start - - text {$x} - *[other] End -}""" - cursor = Cursor(source, 0) - result = parse_message(cursor, ParseContext()) - assert result is not None - assert isinstance(result.value, Message) - - def test_trailing_extra_spaces_via_message(self) -> None: - """Variant ending with only accumulated extra spaces.""" - variant_one = ( - "[one] Text\n MoreText\n " - ) - variant_other = "*[other] Items" - source = ( - f"msg = {{$n ->\n {variant_one}" - f"\n {variant_other}\n}}" - ) - cursor = Cursor(source, 0) - result = parse_message(cursor, ParseContext()) - assert result is not None - assert isinstance(result.value, Message) - assert result.value.value is not None - - def test_extra_spaces_at_close_brace(self) -> None: - """Trailing extra spaces ending at close brace.""" - source = """msg = {$n -> - *[other] Text - -}""" - cursor = Cursor(source, 0) - result = parse_message(cursor, ParseContext()) - assert result is not None - assert isinstance(result.value, Message) - - def test_complex_spacing_finalization(self) -> None: - """Multiple continuations ending with accumulated spaces.""" - source = """msg = {$count -> - [one] Line one - Line two - Line three - - *[other] Other -}""" - cursor = Cursor(source, 0) - result = parse_message(cursor, ParseContext()) - assert result is not None - message = result.value - assert isinstance(message, Message) - assert message.value is not None - placeable = message.value.elements[0] - assert isinstance(placeable, Placeable) - assert isinstance(placeable.expression, SelectExpression) - - def test_variant_ending_with_continuation(self) -> None: - """Variant ending with continuation extra spaces.""" - ftl = """msg = { $n -> - [one] value - text - - [two] other - *[three] default 
-}""" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - - def test_variant_extra_indent_then_next(self) -> None: - """Variant with extra indent followed by next variant.""" - ftl = """msg = { $n -> - [one] - line1 - - [two] line2 - *[other] other -}""" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - - -# ============================================================================ -# PARSE_PATTERN -# ============================================================================ - - -class TestParsePatternBasic: - """Tests for parse_pattern basic behavior.""" - - def test_no_text_before_newline(self) -> None: - """Empty pattern at newline (cursor.pos == text_start).""" - result = parse_pattern(Cursor("\n", 0)) - assert result is not None - assert len(result.value.elements) == 0 - - def test_placeable_then_newline(self) -> None: - """Placeable immediately followed by newline.""" - result = parse_pattern(Cursor("{$var}\n", 0)) - assert result is not None - assert len(result.value.elements) == 1 - - def test_placeable_parse_fails(self) -> None: - """Returns None when parse_placeable fails.""" - cursor = Cursor("Text {invalid", 0) - with patch( - "ftllexengine.syntax.parser.expressions.parse_placeable", - return_value=None, - ): - result = parse_pattern(cursor) - assert result is None - - def test_stop_char_not_placeable(self) -> None: - """Pattern with stop character that's not '{'.""" - bundle = FluentBundle("en_US") - bundle.add_resource("msg = Value\n") - result, errors = bundle.format_pattern("msg") - assert not errors - assert "Value" in result - - def test_empty_pattern_with_attribute(self) -> None: - """Empty pattern followed by attribute.""" - bundle = FluentBundle("en_US") - bundle.add_resource("msg =\n .attr = Attribute\n") - result, errors = bundle.format_pattern("msg", attribute="attr") - assert not errors - assert 
"Attribute" in result - - def test_pattern_at_eof(self) -> None: - """Pattern at EOF without trailing newline.""" - bundle = FluentBundle("en_US") - bundle.add_resource("msg = Value at EOF") - result, errors = bundle.format_pattern("msg") - assert not errors - assert "Value at EOF" in result - - -class TestParsePatternTopLevelDelimiters: - """Tests for top-level pattern delimiter handling. - - In top-level patterns (not inside select expressions), characters - like }, [, * are literal text, not structural delimiters. - """ - - def test_close_brace_is_text(self) -> None: - """} is literal text in top-level patterns.""" - result = parse_pattern(Cursor("}text", 0)) - assert result is not None - assert len(result.value.elements) == 1 - assert result.value.elements[0].value == "}text" # type: ignore[union-attr] - - def test_bracket_is_text(self) -> None: - """[ is literal text in top-level patterns.""" - result = parse_pattern(Cursor("[text", 0)) - assert result is not None - assert len(result.value.elements) == 1 - assert result.value.elements[0].value == "[text" # type: ignore[union-attr] - - def test_asterisk_is_text(self) -> None: - """* is literal text in top-level patterns.""" - result = parse_pattern(Cursor("*text", 0)) - assert result is not None - assert len(result.value.elements) == 1 - assert result.value.elements[0].value == "*text" # type: ignore[union-attr] - - def test_special_char_sequences(self) -> None: - """Multiple delimiters are all literal text.""" - result = parse_pattern(Cursor("}}]]", 0)) - assert result is not None - assert len(result.value.elements) == 1 - assert result.value.elements[0].value == "}}]]" # type: ignore[union-attr] - - def test_stop_char_advances_cursor(self) -> None: - """] at position 0 advances cursor to prevent infinite loop.""" - result = parse_pattern(Cursor("]", 0)) - assert result is not None - assert result.cursor.pos >= 1 or result.cursor.is_eof - - def test_includes_special_chars_combined(self) -> None: - """All 
delimiter characters are literal in top-level patterns.""" - for delimiter in ["}", "[", "*"]: - result = parse_pattern(Cursor(f"text{delimiter}more", 0)) - assert result is not None - assert len(result.value.elements) == 1 - expected = f"text{delimiter}more" - assert result.value.elements[0].value == expected # type: ignore[union-attr] - - -class TestParsePatternContinuation: - """Tests for continuation handling in parse_pattern.""" - - def test_crlf_multiline(self) -> None: - """CRLF in multiline continuation.""" - bundle = FluentBundle("en_US") - bundle.add_resource("msg = First line\r\n Second line") - result, _ = bundle.format_pattern("msg") - assert "First line" in result - assert "Second line" in result - - def test_cr_only_continuation(self) -> None: - """CR (old Mac style) at continuation.""" - cursor = Cursor("msg = First\r Second", 6) - result = parse_pattern(cursor) - assert result is not None - assert len(result.value.elements) > 0 - - def test_continuation_after_placeable(self) -> None: - """Multiline continuation after placeable adds space element.""" - bundle = FluentBundle("en_US") - bundle.add_resource("msg = {NUMBER(5)}\n continued text") - result, _ = bundle.format_pattern("msg") - assert "5" in result - assert "continued text" in result - - def test_extra_spaces_before_placeable(self) -> None: - """Extra indentation before placeable in top-level pattern.""" - ftl = "msg =\n first\n {$var}" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - has_placeable = any( - isinstance(e, Placeable) for e in msg.value.elements - ) - assert has_placeable - - def test_trailing_extra_spaces(self) -> None: - """Trailing extra spaces at end of top-level pattern.""" - ftl = "msg =\n first\n " - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - assert len(msg.value.elements) >= 1 - - def test_extra_indent_preserved(self) -> 
None: - """Extra indentation beyond common indent is preserved.""" - ftl = "msg =\n first\n second" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - text = "".join( - e.value - for e in msg.value.elements # type: ignore[union-attr] - if isinstance(e, TextElement) - ) - assert "first" in text - assert "second" in text - - def test_varying_extra_indent(self) -> None: - """Multiple lines with varying extra indentation.""" - ftl = "msg =\n base\n extra4\n extra8" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - assert len(msg.value.elements) >= 1 - - def test_accumulated_spaces_prepended(self) -> None: - """Accumulated extra spaces prepended to following text.""" - ftl = "msg =\n first\n more text" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - text = "".join( - e.value - for e in msg.value.elements # type: ignore[union-attr] - if isinstance(e, TextElement) - ) - assert "first" in text - assert "more text" in text - - def test_multiple_continuations_varying_indent(self) -> None: - """Multiple continuation lines with varying extra indentation.""" - ftl = "msg =\n l1\n l2\n l3\n l4" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - text = "".join( - e.value - for e in msg.value.elements # type: ignore[union-attr] - if isinstance(e, TextElement) - ) - for line in ["l1", "l2", "l3", "l4"]: - assert line in text - - def test_continuation_new_element_no_prior(self) -> None: - """Accumulated continuation before text, no prior elements.""" - result = parse_pattern(Cursor(" continuation\n more", 0)) - assert result is not None - - def test_continuation_new_element_last_placeable(self) -> None: - """Accumulated continuation merged after placeable.""" - result = parse_pattern(Cursor("{$x}\n text more", 0)) - assert result is not None - - def 
test_finalize_continuation_no_prior(self) -> None: - """Finalize accumulated text when no prior elements.""" - result = parse_pattern(Cursor(" only continuation", 0)) - assert result is not None - - def test_finalize_continuation_last_placeable(self) -> None: - """Finalize accumulated text when last is placeable.""" - result = parse_pattern(Cursor("{$x}\n final", 0)) - assert result is not None - - def test_empty_pattern_continuation(self) -> None: - """Continuation with empty elements list (newline at pos 0).""" - result = parse_pattern(Cursor("\n text", 0)) - assert result is not None - - def test_term_extra_indent_before_placeable(self) -> None: - """Term with extra indentation before placeable.""" - ftl = "-term =\n first\n {$var}" - resource = parse_ftl(ftl) - term = resource.entries[0] - assert isinstance(term, Term) - assert term.value is not None - has_placeable = any( - isinstance(e, Placeable) for e in term.value.elements - ) - assert has_placeable - - -# ============================================================================ -# MULTILINE BLANK LINES -# ============================================================================ - - -class TestMultilineBlankLines: - """Tests for blank line handling in multiline patterns.""" - - def test_single_blank_line_before_content(self) -> None: - """Single blank line before content strips indentation.""" - ftl = "msg =\n\n value" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - assert msg.value.elements[0].value == "value" # type: ignore[union-attr] - - def test_multiple_blank_lines_before_content(self) -> None: - """Multiple blank lines before content strips indentation.""" - ftl = "msg =\n\n\n\n value" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - assert msg.value.elements[0].value == "value" # type: ignore[union-attr] - - def test_with_subsequent_lines(self) -> None: - """Blank line before 
content with subsequent lines.""" - ftl = "msg =\n\n first\n second" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - text = "".join( - e.value - for e in msg.value.elements # type: ignore[union-attr] - if isinstance(e, TextElement) - ) - assert text == "first\nsecond" - - def test_with_extra_indentation(self) -> None: - """Blank line before content preserves extra indentation.""" - ftl = "msg =\n\n first\n second" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - text = "".join( - e.value - for e in msg.value.elements # type: ignore[union-attr] - if isinstance(e, TextElement) - ) - assert text == "first\n second" - - def test_bundle_format(self) -> None: - """FluentBundle correctly formats with blank line before content.""" - bundle = FluentBundle("en_US") - bundle.add_resource("msg =\n\n Hello World") - result, errors = bundle.format_pattern("msg") - assert not errors - assert result == "Hello World" - - def test_with_placeable(self) -> None: - """Blank line before content with placeable.""" - bundle = FluentBundle("en_US") - bundle.add_resource("msg =\n\n Hello { $name }") - result, errors = bundle.format_pattern( - "msg", {"name": "Alice"} - ) - assert not errors - assert "Hello" in result - assert "Alice" in result - - def test_blank_line_at_end(self) -> None: - """Blank line at end of pattern handled correctly.""" - ftl = "msg =\n first\n\n second" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - text = "".join( - e.value - for e in msg.value.elements # type: ignore[union-attr] - if isinstance(e, TextElement) - ) - assert "first" in text - assert "second" in text - - def test_mixed_blank_lines(self) -> None: - """Blank lines at various positions.""" - ftl = "msg =\n\n first\n\n second\n\n third" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - text = "".join( - e.value - for e in 
msg.value.elements # type: ignore[union-attr] - if isinstance(e, TextElement) - ) - assert "first" in text - assert "second" in text - assert "third" in text - - def test_term_blank_line_before_content(self) -> None: - """Term with blank line before content.""" - ftl = "-brand =\n\n Firefox" - resource = parse_ftl(ftl) - term = resource.entries[0] - assert isinstance(term, Term) - text = "".join( - e.value - for e in term.value.elements - if isinstance(e, TextElement) - ) - assert text == "Firefox" - - def test_multiple_blank_lines_in_continuation(self) -> None: - """Multiple consecutive blank lines within continuation.""" - ftl = "msg =\n first\n\n\n second" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - text = "".join( - e.value - for e in msg.value.elements # type: ignore[union-attr] - if isinstance(e, TextElement) - ) - assert "first" in text - assert "second" in text - - def test_term_blank_lines_in_continuation(self) -> None: - """Term with blank lines in continuation.""" - ftl = "-term =\n\n\n content" - resource = parse_ftl(ftl) - term = resource.entries[0] - assert isinstance(term, Term) - text = "".join( - e.value - for e in term.value.elements - if isinstance(e, TextElement) - ) - assert text == "content" - - def test_placeable_after_blanks_with_extra_indent(self) -> None: - """Placeable after blank lines with extra indentation.""" - ftl = "msg =\n text\n\n\n {$var}" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - has_text = any( - isinstance(e, TextElement) for e in msg.value.elements - ) - has_placeable = any( - isinstance(e, Placeable) for e in msg.value.elements - ) - assert has_text - assert has_placeable - - def test_only_extra_spaces_no_content(self) -> None: - """Continuation with only extra spaces, no actual content.""" - ftl = "msg =\n text\n\n more" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert 
isinstance(msg, Message) - text = "".join( - e.value - for e in msg.value.elements # type: ignore[union-attr] - if isinstance(e, TextElement) - ) - assert "text" in text - assert "more" in text - - def test_complex_mixed_pattern(self) -> None: - """Complex pattern mixing all edge cases.""" - ftl = "msg =\n\n\n first\n\n {$var}\n\n\n last" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - has_text = any( - isinstance(e, TextElement) for e in msg.value.elements - ) - has_placeable = any( - isinstance(e, Placeable) for e in msg.value.elements - ) - assert has_text - assert has_placeable - - def test_original_regression(self) -> None: - """FTL-GRAMMAR-001: blank line sets common_indent to 0.""" - ftl = "msg =\n\n value" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - element = msg.value.elements[0] # type: ignore[union-attr] - assert isinstance(element, TextElement) - assert element.value == "value", ( - f"common_indent bug: expected 'value', got " - f"'{element.value}'" - ) - - def test_regression_variant_simple_pattern(self) -> None: - """Regression: parse_simple_pattern blank line indent.""" - ftl = """msg = { $n -> - [one] - - item - *[other] items -}""" - bundle = FluentBundle("en_US") - bundle.add_resource(ftl) - result, errors = bundle.format_pattern("msg", {"n": 1}) - assert not errors - assert "item" in result - assert " item" not in result - - @pytest.mark.parametrize( - ("ftl", "expected"), - [ - ("msg =\n\n x", "x"), - ("msg =\n\n\n x", "x"), - ("msg =\n\n\n\n\n x", "x"), - ("msg =\n\n x", "x"), - ("msg =\n\n x", "x"), - ], - ) - def test_parametrized_blank_line_scenarios( - self, ftl: str, expected: str - ) -> None: - """Various blank line scenarios all strip indentation.""" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - text = "".join( - e.value - for e in msg.value.elements # type: 
ignore[union-attr] - if isinstance(e, TextElement) - ) - assert text == expected - - -# ============================================================================ -# VARIANT DELIMITER LOOKAHEAD -# ============================================================================ - - -class TestVariantDelimiterLookahead: - """Tests for variant delimiter (* and [) in pattern text.""" - - def test_asterisk_literal_in_variant(self) -> None: - """'*' without '[' is treated as literal text.""" - bundle = FluentBundle("en_US", use_isolating=False) - bundle.add_resource(""" -count = { $n -> - [one] 1 * item - *[other] { $n } * items -} -""") - result, errors = bundle.format_pattern("count", {"n": 1}) - assert "1 * item" in result - assert not errors - - def test_bracket_not_starting_variant(self) -> None: - """'[' not followed by valid key is treated as literal.""" - bundle = FluentBundle("en_US", use_isolating=False) - bundle.add_resource(""" -msg = { $type -> - [info] [INFO] message - *[other] [?] 
unknown -} -""") - result, errors = bundle.format_pattern( - "msg", {"type": "info"} - ) - assert "[INFO] message" in result - assert not errors - - def test_math_expression_in_variant(self) -> None: - """Math-like expressions with * and [ in variant text.""" - bundle = FluentBundle("en_US", use_isolating=False) - bundle.add_resource(""" -calc = { $op -> - [mul] Result: 3 * 5 = 15 - [arr] Array: [1, 2, 3] - *[other] Unknown operation -} -""") - result, _ = bundle.format_pattern("calc", {"op": "mul"}) - assert "3 * 5 = 15" in result - - result, _ = bundle.format_pattern("calc", {"op": "arr"}) - assert "[1, 2, 3]" in result - - def test_asterisk_bracket_is_variant(self) -> None: - """'*[' still correctly marks default variant.""" - bundle = FluentBundle("en_US", use_isolating=False) - bundle.add_resource(""" -example = { $x -> - [a] Value A - *[b] Default B -} -""") - result, errors = bundle.format_pattern( - "example", {"x": "unknown"} - ) - assert not errors - assert "Default B" in result - - def test_numeric_variant_key(self) -> None: - """[123] treated as variant key.""" - bundle = FluentBundle("en_US", use_isolating=False) - bundle.add_resource(""" -indexed = { $i -> - [0] Zero - [1] One - *[2] Default -} -""") - result, errors = bundle.format_pattern("indexed", {"i": 0}) - assert not errors - assert "Zero" in result - - def test_complex_asterisk_and_brackets(self) -> None: - """Both * and [] as literals in variant text.""" - bundle = FluentBundle("en_US", use_isolating=False) - bundle.add_resource(""" -complex = { $mode -> - [matrix] See [matrix * vector] for details - [calc] Compute a * b + c - *[other] No special chars -} -""") - result, _ = bundle.format_pattern( - "complex", {"mode": "matrix"} - ) - assert "[matrix * vector]" in result - - def test_variant_pattern_fails(self) -> None: - """parse_variant returns None on malformed input.""" - cursor = Cursor("[one] {@", 0) - assert parse_variant(cursor) is None - - -# 
============================================================================ -# HYPOTHESIS PROPERTY TESTS -# ============================================================================ - - -class TestPatternsHypothesis: - """Property-based tests for pattern and whitespace handling.""" - - @given(st.integers(min_value=0, max_value=100)) - def test_skip_blank_inline_various_counts( - self, space_count: int - ) -> None: - """Any number of spaces skipped by skip_blank_inline.""" - event(f"space_count={space_count}") - source = " " * space_count + "hello" - cursor = Cursor(source=source, pos=0) - assert skip_blank_inline(cursor).pos == space_count - - @given(st.integers(min_value=1, max_value=20)) - def test_is_indented_continuation_various( - self, indent_count: int - ) -> None: - """Any indentation level detected as continuation.""" - event(f"indent_count={indent_count}") - source = "\n" + " " * indent_count + "text" - cursor = Cursor(source=source, pos=0) - assert is_indented_continuation(cursor) is True - - @given( - extra_indent=st.integers(min_value=1, max_value=12), - base_indent=st.integers(min_value=4, max_value=8), - ) - def test_extra_spaces_before_placeable( - self, extra_indent: int, base_indent: int - ) -> None: - """Extra indentation before placeable is preserved.""" - boundary = "deep" if extra_indent > 8 else "shallow" - event(f"boundary={boundary}") - event(f"base_indent={base_indent}") - base = " " * base_indent - extra = " " * (base_indent + extra_indent) - ftl = f"msg =\n{base}text\n{extra}{{$var}}" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - elements = msg.value.elements - assert len(elements) >= 2 - assert isinstance(elements[-1], Placeable) - - @given(trailing_spaces=st.integers(min_value=1, max_value=20)) - def test_trailing_spaces_handled( - self, trailing_spaces: int - ) -> None: - """Patterns with trailing spaces parse successfully.""" - 
event(f"trailing_spaces={trailing_spaces}") - spaces = " " * trailing_spaces - ftl = f"msg =\n text\n{spaces}" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - - @given( - base_indent=st.integers(min_value=4, max_value=8), - extra_indent=st.integers(min_value=1, max_value=8), - ) - def test_extra_indent_handling( - self, base_indent: int, extra_indent: int - ) -> None: - """Extra indentation correctly accumulated.""" - event(f"extra_indent={extra_indent}") - event(f"base_indent={base_indent}") - base = " " * base_indent - extra = " " * (base_indent + extra_indent) - ftl = f"msg =\n{base}first\n{extra}second" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - text = "".join( - e.value - for e in msg.value.elements # type: ignore[union-attr] - if isinstance(e, TextElement) - ) - assert "first" in text - assert "second" in text - - @given( - num_lines=st.integers(min_value=2, max_value=5), - indent_base=st.integers(min_value=4, max_value=8), - ) - def test_multiline_extra_indent_accumulation( - self, num_lines: int, indent_base: int - ) -> None: - """Multiple lines with extra indent accumulate correctly.""" - event(f"num_lines={num_lines}") - event(f"indent_base={indent_base}") - lines_ftl = [f"line{i}" for i in range(num_lines)] - base = " " * indent_base - ftl_lines = ["msg ="] - ftl_lines.append(f"{base}{lines_ftl[0]}") - for i in range(1, num_lines): - extra = " " * (i % 3) - ftl_lines.append(f"{base}{extra}{lines_ftl[i]}") - ftl = "\n".join(ftl_lines) - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - text = "".join( - e.value - for e in msg.value.elements # type: ignore[union-attr] - if isinstance(e, TextElement) - ) - for line_text in lines_ftl: - assert line_text in text - - @given( - extra_indent=st.integers(min_value=1, max_value=8), - base_indent=st.integers(min_value=4, max_value=8), - ) - def 
test_term_extra_indent( - self, extra_indent: int, base_indent: int - ) -> None: - """Terms handle extra indentation like messages.""" - event(f"extra_indent={extra_indent}") - event(f"base_indent={base_indent}") - base = " " * base_indent - extra = " " * (base_indent + extra_indent) - ftl = f"-term =\n{base}first\n{extra}second" - resource = parse_ftl(ftl) - term = resource.entries[0] - assert isinstance(term, Term) - text = "".join( - e.value - for e in term.value.elements - if isinstance(e, TextElement) - ) - assert "first" in text - assert "second" in text - - @given( - num_blank_lines=st.integers(min_value=1, max_value=10), - indent_size=st.integers(min_value=1, max_value=8), - ) - def test_blank_lines_and_indentation( - self, num_blank_lines: int, indent_size: int - ) -> None: - """Any blank lines before content strip indent.""" - event(f"num_blank_lines={num_blank_lines}") - event(f"indent_size={indent_size}") - blank_lines = "\n" * num_blank_lines - indent = " " * indent_size - ftl = f"msg ={blank_lines}{indent}content" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - assert msg.value is not None - assert msg.value.elements[0].value == "content" # type: ignore[union-attr] - - @given( - content=st.text( - min_size=1, - max_size=50, - alphabet="abcdefghijklmnopqrstuvwxyz", - ) - ) - def test_content_preserved_after_blank_lines( - self, content: str - ) -> None: - """Content after blank lines is preserved exactly.""" - event(f"content_length={len(content)}") - ftl = f"msg =\n\n {content}" - resource = parse_ftl(ftl) - msg = resource.entries[0] - assert isinstance(msg, Message) - text = "".join( - e.value - for e in msg.value.elements # type: ignore[union-attr] - if isinstance(e, TextElement) - ) - assert text == content - - @example("Hello") - @example("Line1\nLine2") - @given(st.text(min_size=1, max_size=50)) - def test_parse_simple_pattern_property( - self, text: str - ) -> None: - """parse_simple_pattern handles 
arbitrary text."""
-        if not text or text[0] in ("}", "[", "*"):
-            return
-        has_newline = "\n" in text
-        event(f"has_newline={has_newline}")
-        cursor = Cursor(text, 0)
-        result = parse_simple_pattern(cursor)
-        outcome = "parsed" if result else "none"
-        event(f"outcome={outcome}")
-        assert result is None or isinstance(result.value, Pattern)
-
-    @example("value")
-    @example("{$x}")
-    @given(st.text(min_size=1, max_size=50))
-    def test_parse_pattern_property(self, text: str) -> None:
-        """parse_pattern handles arbitrary text."""
-        has_placeable = "{" in text
-        event(f"has_placeable={has_placeable}")
-        cursor = Cursor(text, 0)
-        result = parse_pattern(cursor)
-        outcome = "parsed" if result else "none"
-        event(f"outcome={outcome}")
-        assert result is None or isinstance(result.value, Pattern)
+"""Aggregated syntax parser patterns test surface."""
+
+from tests.syntax_parser_patterns_cases.hypothesis_property_tests import *  # noqa: F403 - re-export split test surface
+from tests.syntax_parser_patterns_cases.multiline_blank_lines import *  # noqa: F403 - re-export split test surface
+from tests.syntax_parser_patterns_cases.parse_pattern_cases import *  # noqa: F403 - re-export split test surface
+from tests.syntax_parser_patterns_cases.parse_simple_pattern_cases import *  # noqa: F403 - re-export split test surface
+from tests.syntax_parser_patterns_cases.variant_delimiter_lookahead import *  # noqa: F403 - re-export split test surface
+from tests.syntax_parser_patterns_cases.whitespace_utilities import *  # noqa: F403 - re-export split test surface
diff --git a/tests/test_syntax_parser_property.py b/tests/test_syntax_parser_property.py
index 32b3acab..af0dc1f1 100644
--- a/tests/test_syntax_parser_property.py
+++ b/tests/test_syntax_parser_property.py
@@ -1,2777 +1,6 @@
-"""Hypothesis property-based tests for Fluent parser.
+"""Aggregated parser property test surface."""
-Focus on parser robustness, error recovery, and invariant properties.
-Comprehensive coverage of FTL syntax elements, edge cases, and error recovery.
-Includes round-trip, metamorphic, structural, and malformed-input properties.
-"""
-
-from __future__ import annotations
-
-from decimal import Decimal
-
-from hypothesis import assume, event, example, given, settings
-from hypothesis import strategies as st
-
-from ftllexengine.syntax.ast import (
-    Comment,
-    Junk,
-    Message,
-    Resource,
-    Term,
-)
-from ftllexengine.syntax.parser import FluentParserV1
-from ftllexengine.syntax.serializer import FluentSerializer
-from tests.strategies import (
-    ftl_identifiers as shared_ftl_identifiers,
-)
-from tests.strategies import (
-    ftl_simple_text,
-)
-
-# ============================================================================
-# HYPOTHESIS STRATEGIES
-# ============================================================================
-
-# Valid FTL identifiers (using st.from_regex per hypothesis.md)
-ftl_identifiers = st.from_regex(r"[a-z][a-z0-9_-]*", fullmatch=True)
-
-# Valid variable names (same as identifiers)
-variable_names = ftl_identifiers
-
-# Text content without FTL special characters - remove arbitrary max_size
-safe_text = st.text(
-    alphabet=st.characters(
-        blacklist_categories=["Cc"],
-        blacklist_characters=["{", "}", "[", "]", "$", "-", "*", ".", "#", "\n"],
-    ),
-    min_size=1,
-).filter(lambda s: s.strip())
-
-# Numbers for numeric literals - remove arbitrary bounds
-numbers = st.integers()
-decimals = st.decimals(
-    allow_nan=False,
-    allow_infinity=False,
-)
-
-# Attribute names
-attribute_names = ftl_identifiers
-
-# Variant keys - use st.from_regex, remove arbitrary max_size
-variant_keys = st.from_regex(r"[a-z][a-z0-9]*", fullmatch=True)
-
-
-class TestParserRobustness:
-    """Property-based tests for parser robustness."""
-
-    @given(
-        # Use ftl_identifiers strategy - cleaner and unconstrained
-        identifier=ftl_identifiers,
-    )
-    @settings(max_examples=200)
-    def test_simple_message_always_parses(self, identifier: str)
-> None: - """Simple message with valid identifier always parses successfully.""" - source = f"{identifier} = value" - parser = FluentParserV1() - resource = parser.parse(source) - - # Should always produce a resource - assert resource is not None - assert hasattr(resource, "entries") - # Should have exactly one entry (the message) - assert len(resource.entries) == 1 - # That entry should be a Message - assert isinstance(resource.entries[0], Message) - - # Emit event for identifier characteristics (HypoFuzz guidance) - if "-" in identifier: - event("identifier=has_hyphen") - if "_" in identifier: - event("identifier=has_underscore") - if any(c.isdigit() for c in identifier): - event("identifier=has_digit") - - @given( - identifier=st.text( - alphabet=st.characters( - whitelist_categories=("Lu", "Ll"), min_codepoint=97, max_codepoint=122 - ), - min_size=1, - max_size=20, - ).filter(lambda x: x[0].isalpha()), - value=st.text( - alphabet=st.characters(blacklist_categories=["Cc"], blacklist_characters="{}\n"), - min_size=0, - max_size=100, - ), - ) - @settings(max_examples=200) - def test_message_with_arbitrary_value_parses( - self, identifier: str, value: str - ) -> None: - """Messages with arbitrary (non-special) text values parse.""" - source = f"{identifier} = {value}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - # Should have at least one entry - assert len(resource.entries) >= 1 - # First entry should be a Message (possibly with junk value) - first_entry = resource.entries[0] - assert isinstance(first_entry, (Message, Junk)) - - # Emit events for HypoFuzz guidance - event(f"entry_type={type(first_entry).__name__}") - if len(value) > 50: - event("value_length=long") - elif len(value) > 10: - event("value_length=medium") - else: - event("value_length=short") - - @given( - comment_text=st.text( - alphabet=st.characters(blacklist_categories=["Cc"], blacklist_characters="#"), - min_size=0, - max_size=100, - ), - ) - 
@settings(max_examples=150) - def test_single_line_comment_always_parses(self, comment_text: str) -> None: - """Single-line comments with arbitrary text parse successfully.""" - source = f"# {comment_text}\nkey = value" - parser = FluentParserV1() - resource = parser.parse(source) - - # Should parse (comment + message) - assert resource is not None - assert len(resource.entries) >= 1 - - # Emit events for HypoFuzz guidance - if len(comment_text) > 50: - event("comment_length=long") - elif len(comment_text) > 10: - event("comment_length=medium") - else: - event("comment_length=short") - - @given( - num_newlines=st.integers(min_value=0, max_value=10), - ) - @settings(max_examples=50) - def test_blank_lines_do_not_affect_parsing(self, num_newlines: int) -> None: - """Multiple blank lines should not affect parsing.""" - source = f"key1 = value1{'\\n' * num_newlines}key2 = value2" - parser = FluentParserV1() - resource = parser.parse(source) - - # Should parse both messages regardless of blank lines - assert resource is not None - # Should have at least one entry (message or junk) - assert len(resource.entries) >= 1 - # Check that we have Messages and/or Junk (not empty) - for entry in resource.entries: - assert isinstance(entry, (Message, Junk, Comment)) - - # Emit events for HypoFuzz guidance - if num_newlines == 0: - event("blank_lines=none") - elif num_newlines <= 2: - event("blank_lines=few") - else: - event("blank_lines=many") - - @given( - invalid_start=st.text( - alphabet=st.characters(whitelist_categories=("P", "S")), - min_size=1, - max_size=5, - ).filter(lambda x: x[0] not in "#-"), - ) - @settings(max_examples=100) - def test_invalid_entry_creates_junk(self, invalid_start: str) -> None: - """Invalid entry start characters create junk entries.""" - source = f"{invalid_start} invalid\nkey = value" - parser = FluentParserV1() - resource = parser.parse(source) - - # Should recover and parse something (message or junk) - assert resource is not None - # Parser 
should produce entries (even if junk) - assert len(resource.entries) >= 1 - - # Emit events for HypoFuzz guidance - has_junk = any(isinstance(e, Junk) for e in resource.entries) - event(f"recovery={'has_junk' if has_junk else 'no_junk'}") - - -class TestParserInvariants: - """Metamorphic and invariant properties of the parser.""" - - @given( - source=st.text( - alphabet=st.characters( - whitelist_categories=("Lu", "Ll"), - min_codepoint=32, - max_codepoint=126, - ), - min_size=0, - max_size=500, - ), - ) - @settings(max_examples=200) - def test_parser_never_crashes(self, source: str) -> None: - """Parser should never crash, regardless of input.""" - parser = FluentParserV1() - - # Should not raise exceptions - parser always returns a resource - resource = parser.parse(source) - assert resource is not None - - # Emit events for entry type distribution (HypoFuzz guidance) - junk_count = sum(1 for e in resource.entries if isinstance(e, Junk)) - msg_count = sum(1 for e in resource.entries if isinstance(e, Message)) - if junk_count > 0: - event(f"parse_result=has_junk_{min(junk_count, 5)}") - if msg_count > 0: - event(f"parse_result=has_messages_{min(msg_count, 5)}") - if len(resource.entries) == 0: - event("parse_result=empty") - - @given( - identifier=st.text( - alphabet=st.characters( - whitelist_categories=("Lu", "Ll"), min_codepoint=97, max_codepoint=122 - ), - min_size=1, - max_size=20, - ).filter(lambda x: x[0].isalpha()), - ) - @settings(max_examples=100) - def test_parse_idempotence(self, identifier: str) -> None: - """Parsing the same source twice yields equivalent results.""" - source = f"{identifier} = value" - parser = FluentParserV1() - - resource1 = parser.parse(source) - resource2 = parser.parse(source) - - # Both should have same number of entries - assert len(resource1.entries) == len(resource2.entries) - - # Emit events for HypoFuzz guidance - if len(identifier) > 10: - event("identifier_length=long") - elif len(identifier) > 5: - 
event("identifier_length=medium") - else: - event("identifier_length=short") - - @given( - whitespace=st.text(alphabet=st.sampled_from([" ", "\t"]), min_size=0, max_size=10), - ) - @settings(max_examples=100) - def test_leading_whitespace_invariance(self, whitespace: str) -> None: - """Leading whitespace on continuation lines is significant.""" - # Indented continuation should be treated as continuation - source1 = "key = value" - source2 = f"key = value\n{whitespace} continuation" - - parser = FluentParserV1() - resource1 = parser.parse(source1) - resource2 = parser.parse(source2) - - # Both should parse (resource2 might have continuation) - assert resource1 is not None - assert resource2 is not None - - # Emit events for HypoFuzz guidance - has_tabs = "\t" in whitespace - has_spaces = " " in whitespace - if has_tabs and has_spaces: - event("whitespace_type=mixed") - elif has_tabs: - event("whitespace_type=tabs") - elif has_spaces: - event("whitespace_type=spaces") - else: - event("whitespace_type=none") - - -class TestParserEdgeCases: - """Edge cases and boundary conditions.""" - - @given( - num_hashes=st.integers(min_value=1, max_value=10), - ) - @settings(max_examples=50) - def test_comment_hash_count_validation(self, num_hashes: int) -> None: - """Comments with different hash counts are handled correctly.""" - source = f"{'#' * num_hashes} Comment\nkey = value" - parser = FluentParserV1() - resource = parser.parse(source) - - # Should handle any number of hashes (1-3 valid, >3 creates junk) - assert resource is not None - # Should have at least one entry (comment/message or junk) - assert len(resource.entries) >= 1 - - # Emit events for HypoFuzz guidance - if num_hashes == 1: - event("comment_type=standalone") - elif num_hashes == 2: - event("comment_type=group") - elif num_hashes == 3: - event("comment_type=resource") - else: - event("comment_type=invalid_many_hashes") - - @given( - depth=st.integers(min_value=1, max_value=5), - ) - @settings(max_examples=50) 
- def test_nested_placeables_parse(self, depth: int) -> None: - """Nested placeables up to reasonable depth parse.""" - # Create nested variable references (simplified test - just validates parsing) - inner = "$var" - source = f"key = {{ {inner} }}" - - parser = FluentParserV1() - resource = parser.parse(source) - - # Should parse (might create errors for invalid syntax) - assert resource is not None - - # Emit depth event for HypoFuzz guidance - event(f"depth={depth}") - - @given( - num_variants=st.integers(min_value=1, max_value=10), - ) - @settings(max_examples=50) - def test_select_expression_variant_count(self, num_variants: int) -> None: - """Select expressions with varying variant counts parse.""" - # Generate variants - variants = "\n".join([f" [{i}] Variant {i}" for i in range(num_variants)]) - source = f"key = {{ $num ->\\n{variants}\\n *[other] Default\\n}}" - - parser = FluentParserV1() - resource = parser.parse(source) - - # Should parse - assert resource is not None - - # Emit variant count event for HypoFuzz guidance - event(f"variant_count={min(num_variants, 10)}") - - def test_empty_source_produces_empty_resource(self) -> None: - """Empty source produces resource with no entries.""" - parser = FluentParserV1() - resource = parser.parse("") - - assert resource is not None - assert len(resource.entries) == 0 - - def test_only_whitespace_produces_empty_resource(self) -> None: - """Source with only whitespace produces empty or junk resource.""" - parser = FluentParserV1() - resource = parser.parse(" \n\t\n \n") - - assert resource is not None - # Whitespace-only source may produce empty resource (this is valid) - - @given( - identifier=st.text( - alphabet=st.characters( - whitelist_categories=("Lu", "Ll"), min_codepoint=97, max_codepoint=122 - ), - min_size=1, - max_size=20, - ).filter(lambda x: x[0].isalpha()), - num_attributes=st.integers(min_value=1, max_value=5), - ) - @settings(max_examples=100) - def test_message_with_multiple_attributes( - self, 
identifier: str, num_attributes: int - ) -> None: - """Messages with multiple attributes parse correctly.""" - attributes = "\n".join( - [f" .attr{i} = Value {i}" for i in range(num_attributes)] - ) - source = f"{identifier} = Main value\n{attributes}" - - parser = FluentParserV1() - resource = parser.parse(source) - - # Should parse message with attributes - assert resource is not None - # Should have at least one entry (the message) - assert len(resource.entries) >= 1 - # First entry should be a Message - first_entry = resource.entries[0] - assert isinstance(first_entry, (Message, Junk)) - - # Emit events for HypoFuzz guidance - event(f"attribute_count={min(num_attributes, 5)}") - - -class TestParserRecovery: - """Test error recovery and resilience.""" - - @given( - num_errors=st.integers(min_value=1, max_value=5), - ) - @settings(max_examples=50) - def test_multiple_errors_recovery(self, num_errors: int) -> None: - """Parser recovers from multiple consecutive errors.""" - # Create multiple invalid lines followed by valid message - invalid_lines = "\n".join([f"!!! 
invalid {i}" for i in range(num_errors)]) - source = f"{invalid_lines}\nkey = value" - - parser = FluentParserV1() - resource = parser.parse(source) - - # Should create junk entries and recover - assert resource is not None - # Should have at least one entry (junk from invalid lines and/or message) - assert len(resource.entries) >= 1 - - # Emit events for HypoFuzz guidance - event(f"error_count={min(num_errors, 5)}") - junk_count = sum(1 for e in resource.entries if isinstance(e, Junk)) - event(f"junk_entries={min(junk_count, 5)}") - - @given( - unicode_char=st.characters(min_codepoint=0x1F600, max_codepoint=0x1F64F), - ) - @settings(max_examples=50) - def test_unicode_emoji_in_values(self, unicode_char: str) -> None: - """Unicode emoji characters in values are handled.""" - source = f"key = Hello {unicode_char}" - parser = FluentParserV1() - resource = parser.parse(source) - - # Should parse - assert resource is not None - - # Emit events for HypoFuzz guidance - event("unicode=emoji") - - def test_very_long_identifier(self) -> None: - """Very long identifiers are handled.""" - long_id = "a" * 1000 - source = f"{long_id} = value" - parser = FluentParserV1() - resource = parser.parse(source) - - # Should parse (or create junk if too long) - assert resource is not None - - def test_very_long_value(self) -> None: - """Very long values are handled.""" - long_value = "value " * 1000 - source = f"key = {long_value}" - parser = FluentParserV1() - resource = parser.parse(source) - - # Should parse - assert resource is not None - - -# ============================================================================ -# VARIABLE REFERENCES -# ============================================================================ - - -class TestVariableReferenceParsing: - """Property tests for variable reference parsing.""" - - @given(var_name=variable_names) - @settings(max_examples=200) - def test_simple_variable_reference_parses(self, var_name: str) -> None: - """PROPERTY: { $var } always 
parses successfully.""" - source = f"msg = {{ ${var_name} }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - assert len(resource.entries) > 0 - - # Emit events for HypoFuzz guidance - event("variable_position=only") - if len(var_name) > 10: - event("var_name_length=long") - else: - event("var_name_length=short") - - @given(var_name=variable_names, text=safe_text) - @settings(max_examples=150) - def test_variable_with_surrounding_text(self, var_name: str, text: str) -> None: - """PROPERTY: Text { $var } text parses correctly.""" - source = f"msg = {text} {{ ${var_name} }} {text}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("variable_position=middle") - - @given( - var1=variable_names, - var2=variable_names, - ) - @settings(max_examples=150) - def test_multiple_variable_references(self, var1: str, var2: str) -> None: - """PROPERTY: Multiple { $var1 } { $var2 } parse correctly.""" - source = f"msg = {{ ${var1} }} {{ ${var2} }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("variable_count=2") - if var1 == var2: - event("variable_uniqueness=same") - else: - event("variable_uniqueness=different") - - @given(var_name=variable_names) - @settings(max_examples=100) - def test_variable_at_message_start(self, var_name: str) -> None: - """PROPERTY: Message starting with { $var } parses.""" - source = f"msg = {{ ${var_name} }} text" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("variable_position=start") - - @given(var_name=variable_names) - @settings(max_examples=100) - def test_variable_at_message_end(self, var_name: str) -> None: - """PROPERTY: Message ending with { $var } parses.""" - source = f"msg = text {{ ${var_name} }}" - parser = 
FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("variable_position=end") - - @given(var_name=variable_names) - @settings(max_examples=100) - def test_variable_only_message(self, var_name: str) -> None: - """PROPERTY: Message with only { $var } parses.""" - source = f"msg = {{ ${var_name} }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("variable_position=only") - - @given( - var_name=variable_names, - count=st.integers(min_value=2, max_value=10), - ) - @settings(max_examples=50) - def test_repeated_variable_references(self, var_name: str, count: int) -> None: - """PROPERTY: Same variable referenced multiple times parses.""" - refs = " ".join([f"{{ ${var_name} }}" for _ in range(count)]) - source = f"msg = {refs}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event(f"variable_count={min(count, 10)}") - event("variable_uniqueness=repeated") - - -# ============================================================================ -# PLACEABLES -# ============================================================================ - - -class TestPlaceableParsing: - """Property tests for placeable expression parsing.""" - - @given(text=safe_text) - @settings(max_examples=150) - def test_placeable_with_string_literal(self, text: str) -> None: - """PROPERTY: { "string" } parses as placeable.""" - # Escape quotes in text - escaped = text.replace('"', '\\"') - source = f'msg = {{ "{escaped}" }}' - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("placeable_type=string_literal") - - @given(number=numbers) - @settings(max_examples=150) - def test_placeable_with_number_literal(self, number: int) -> None: - """PROPERTY: { 123 } parses as 
placeable.""" - source = f"msg = {{ {number} }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("placeable_type=number_literal") - if number < 0: - event("number_sign=negative") - elif number == 0: - event("number_sign=zero") - else: - event("number_sign=positive") - - @given( - msg_id=ftl_identifiers, - var_name=variable_names, - ) - @settings(max_examples=100) - def test_placeable_with_message_reference( - self, msg_id: str, var_name: str - ) -> None: - """PROPERTY: { message-id } parses as message reference.""" - source = f"{msg_id} = value\nmsg = {{ {var_name} }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("placeable_type=message_ref") - - @given( - var_name=variable_names, - count=st.integers(min_value=1, max_value=5), - ) - @settings(max_examples=50) - def test_consecutive_placeables(self, var_name: str, count: int) -> None: - """PROPERTY: Multiple consecutive placeables parse.""" - placeables = "".join([f"{{ ${var_name}{i} }}" for i in range(count)]) - source = f"msg = {placeables}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event(f"consecutive_placeables={min(count, 5)}") - - @given( - var_name=variable_names, - whitespace=st.text(alphabet=" \t", min_size=0, max_size=5), - ) - @settings(max_examples=100) - def test_placeable_internal_whitespace( - self, var_name: str, whitespace: str - ) -> None: - """PROPERTY: Whitespace inside { } is handled.""" - source = f"msg = {{{whitespace}${var_name}{whitespace}}}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - if len(whitespace) == 0: - event("internal_whitespace=none") - elif "\t" in whitespace: - event("internal_whitespace=has_tabs") - else: - 
event("internal_whitespace=spaces_only") - - -# ============================================================================ -# SELECT EXPRESSIONS -# ============================================================================ - - -class TestSelectExpressionParsing: - """Property tests for select expression parsing.""" - - @given(var_name=variable_names) - @settings(max_examples=150) - def test_minimal_select_expression(self, var_name: str) -> None: - """PROPERTY: Minimal select { $var -> *[other] X } parses.""" - source = f"msg = {{ ${var_name} ->\n *[other] Default\n}}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("select_variant_count=1") - event("select_type=minimal") - - @given( - var_name=variable_names, - key1=variant_keys, - key2=variant_keys, - ) - @settings(max_examples=150) - def test_select_with_multiple_variants( - self, var_name: str, key1: str, key2: str - ) -> None: - """PROPERTY: Select with multiple variants parses.""" - source = f"""msg = {{ ${var_name} -> - [{key1}] Value1 - [{key2}] Value2 - *[other] Default -}}""" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("select_variant_count=3") - if key1 == key2: - event("variant_keys=duplicate") - else: - event("variant_keys=unique") - - @given( - var_name=variable_names, - count=st.integers(min_value=1, max_value=10), - ) - @settings(max_examples=50) - def test_select_with_many_variants(self, var_name: str, count: int) -> None: - """PROPERTY: Select with many variants parses.""" - variants = "\n".join([f" [key{i}] Value{i}" for i in range(count)]) - source = f"msg = {{ ${var_name} ->\n{variants}\n *[other] Default\n}}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event(f"select_variant_count={min(count + 1, 10)}") - - 
@given(var_name=variable_names, text=safe_text) - @settings(max_examples=100) - def test_select_variant_with_text(self, var_name: str, text: str) -> None: - """PROPERTY: Select variant values can contain text.""" - source = f"msg = {{ ${var_name} ->\n *[other] {text}\n}}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("variant_value_type=text") - - @given( - var_name=variable_names, - var_in_variant=variable_names, - ) - @settings(max_examples=100) - def test_select_variant_with_placeable( - self, var_name: str, var_in_variant: str - ) -> None: - """PROPERTY: Select variant can contain placeables.""" - source = f"msg = {{ ${var_name} ->\n *[other] Text {{ ${var_in_variant} }}\n}}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("variant_value_type=with_placeable") - - @given(var_name=variable_names, number=numbers) - @settings(max_examples=100) - def test_select_with_numeric_keys(self, var_name: str, number: int) -> None: - """PROPERTY: Select with numeric variant keys parses.""" - source = f"msg = {{ ${var_name} ->\n [{number}] Exact\n *[other] Default\n}}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("variant_key_type=numeric") - if number < 0: - event("numeric_key_sign=negative") - elif number == 0: - event("numeric_key_sign=zero") - else: - event("numeric_key_sign=positive") - - -# ============================================================================ -# TERMS -# ============================================================================ - - -class TestTermParsing: - """Property tests for term definition and reference parsing.""" - - @given(term_id=ftl_identifiers) - @settings(max_examples=150) - def test_simple_term_definition(self, term_id: str) -> None: - """PROPERTY: 
-term = value parses as term.""" - source = f"-{term_id} = Term value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - if len(resource.entries) > 0: - # Should be a Term entry - entry = resource.entries[0] - assert isinstance(entry, (Term, Message)) # Could be either - - # Emit events for HypoFuzz guidance - event(f"entry_type={type(entry).__name__}") - event("term_structure=simple") - - @given(term_id=ftl_identifiers, text=safe_text) - @settings(max_examples=100) - def test_term_with_text_value(self, term_id: str, text: str) -> None: - """PROPERTY: Term with text value parses.""" - source = f"-{term_id} = {text}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("term_structure=with_text") - - @given(term_id=ftl_identifiers, var_name=variable_names) - @settings(max_examples=100) - def test_term_with_placeable(self, term_id: str, var_name: str) -> None: - """PROPERTY: Term with placeable parses.""" - source = f"-{term_id} = Value {{ ${var_name} }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("term_structure=with_placeable") - - @given(term_id=ftl_identifiers, attr_name=attribute_names) - @settings(max_examples=100) - def test_term_with_attribute(self, term_id: str, attr_name: str) -> None: - """PROPERTY: Term with attribute parses.""" - source = f"-{term_id} = Value\n .{attr_name} = Attribute value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("term_structure=with_attribute") - - @given( - msg_id=ftl_identifiers, - term_id=ftl_identifiers, - ) - @settings(max_examples=100) - def test_message_referencing_term(self, msg_id: str, term_id: str) -> None: - """PROPERTY: Message can reference term { -term }.""" - source = f"-{term_id} = 
Term\n{msg_id} = {{ -{term_id} }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("term_ref_type=simple") - - @given( - msg_id=ftl_identifiers, - term_id=ftl_identifiers, - attr_name=attribute_names, - ) - @settings(max_examples=100) - def test_term_attribute_reference( - self, msg_id: str, term_id: str, attr_name: str - ) -> None: - """PROPERTY: Term attribute reference { -term.attr } parses.""" - source = ( - f"-{term_id} = Term\n" - f" .{attr_name} = Attr\n" - f"{msg_id} = {{ -{term_id}.{attr_name} }}" - ) - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("term_ref_type=with_attribute") - - -# ============================================================================ -# STRING LITERALS -# ============================================================================ - - -class TestStringLiteralParsing: - """Property tests for string literal parsing.""" - - @given(text=safe_text) - @settings(max_examples=150) - def test_simple_string_literal(self, text: str) -> None: - """PROPERTY: "text" parses as string literal.""" - escaped = text.replace('"', '\\"').replace("\\", "\\\\") - source = f'msg = {{ "{escaped}" }}' - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - if len(text) == 0: - event("string_length=empty") - elif len(text) <= 10: - event("string_length=short") - elif len(text) <= 50: - event("string_length=medium") - else: - event("string_length=long") - - def test_empty_string_literal(self) -> None: - """PROPERTY: Empty string "" parses.""" - source = 'msg = { "" }' - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - @given(char=st.characters(min_codepoint=32, max_codepoint=126)) - @settings(max_examples=100) - def 
test_string_with_single_char(self, char: str) -> None: - """PROPERTY: Single character strings parse.""" - if char == '"': - escaped = '\\"' - elif char == "\\": - escaped = "\\\\" - else: - escaped = char - source = f'msg = {{ "{escaped}" }}' - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - if char in ('"', "\\"): - event("char_type=special_escape") - elif char.isalpha(): - event("char_type=alpha") - elif char.isdigit(): - event("char_type=digit") - else: - event("char_type=other") - - @given( - unicode_char=st.characters(min_codepoint=0x0100, max_codepoint=0xFFFF), - ) - @settings(max_examples=100) - def test_string_with_unicode(self, unicode_char: str) -> None: - """PROPERTY: String literals with Unicode parse.""" - source = f'msg = {{ "{unicode_char}" }}' - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - codepoint = ord(unicode_char) - if codepoint < 0x0800: - event("unicode_range=latin_extended") - elif codepoint < 0x3000: - event("unicode_range=mid_bmp") - else: - event("unicode_range=cjk_symbols") - - -# ============================================================================ -# NUMBER LITERALS -# ============================================================================ - - -class TestNumberLiteralParsing: - """Property tests for number literal parsing.""" - - @given(number=numbers) - @settings(max_examples=200) - def test_integer_literal(self, number: int) -> None: - """PROPERTY: Integer literals parse correctly.""" - source = f"msg = {{ {number} }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - if number < 0: - event("integer_sign=negative") - elif number == 0: - event("integer_sign=zero") - else: - event("integer_sign=positive") - if abs(number) > 1000000: - event("integer_magnitude=large") 
- - @given(decimal=decimals) - @settings(max_examples=150) - def test_decimal_literal(self, decimal: Decimal) -> None: - """PROPERTY: Decimal literals parse correctly.""" - # Use fixed-point notation to avoid scientific notation in FTL source - num_str = format(decimal, "f") - # Filter out strings that are too long for the parser - assume(len(num_str) <= 50) - source = f"msg = {{ {num_str} }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - if decimal < Decimal(0): - event("decimal_sign=negative") - elif decimal == Decimal(0): - event("decimal_sign=zero") - else: - event("decimal_sign=positive") - # Check if it's a whole number decimal (use str to avoid overflow on huge Decimals) - _, _, frac_part = num_str.lstrip("-").partition(".") - if not frac_part or all(c == "0" for c in frac_part): - event("decimal_type=whole") - else: - event("decimal_type=fractional") - - def test_zero_literal(self) -> None: - """PROPERTY: Zero literal parses.""" - source = "msg = { 0 }" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - @given(number=st.integers(min_value=0, max_value=1000000)) - @settings(max_examples=100) - def test_positive_integer(self, number: int) -> None: - """PROPERTY: Positive integers parse.""" - source = f"msg = {{ {number} }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("integer_sign=positive") - if number > 100000: - event("integer_magnitude=large") - elif number > 1000: - event("integer_magnitude=medium") - else: - event("integer_magnitude=small") - - @given(number=st.integers(min_value=-1000000, max_value=-1)) - @settings(max_examples=100) - def test_negative_integer(self, number: int) -> None: - """PROPERTY: Negative integers parse.""" - source = f"msg = {{ {number} }}" - parser = FluentParserV1() - resource = 
parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("integer_sign=negative") - if abs(number) > 100000: - event("integer_magnitude=large") - elif abs(number) > 1000: - event("integer_magnitude=medium") - else: - event("integer_magnitude=small") - - -# ============================================================================ -# MESSAGE STRUCTURE -# ============================================================================ - - -class TestMessageStructure: - """Property tests for message structure parsing.""" - - @given(msg_id=ftl_identifiers, text=safe_text) - @settings(max_examples=150) - def test_message_with_value_only(self, msg_id: str, text: str) -> None: - """PROPERTY: Message with only value parses.""" - source = f"{msg_id} = {text}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("message_structure=value_only") - - @given( - msg_id=ftl_identifiers, - attr_name=attribute_names, - text=safe_text, - ) - @settings(max_examples=150) - def test_message_with_single_attribute( - self, msg_id: str, attr_name: str, text: str - ) -> None: - """PROPERTY: Message with one attribute parses.""" - source = f"{msg_id} = Value\n .{attr_name} = {text}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("message_structure=value_and_attribute") - event("attribute_count=1") - - @given( - msg_id=ftl_identifiers, - count=st.integers(min_value=2, max_value=5), - ) - @settings(max_examples=50) - def test_message_with_multiple_attributes( - self, msg_id: str, count: int - ) -> None: - """PROPERTY: Message with multiple attributes parses.""" - attrs = "\n".join([f" .attr{i} = Value{i}" for i in range(count)]) - source = f"{msg_id} = Main\n{attrs}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit 
events for HypoFuzz guidance - event("message_structure=value_and_attributes") - event(f"attribute_count={min(count, 5)}") - - @given(msg_id=ftl_identifiers, attr_name=attribute_names) - @settings(max_examples=100) - def test_message_attribute_only(self, msg_id: str, attr_name: str) -> None: - """PROPERTY: Message with only attributes (no value) parses.""" - source = f"{msg_id} =\n .{attr_name} = Attribute value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("message_structure=attribute_only") - - @given( - msg_id=ftl_identifiers, - var_name=variable_names, - ) - @settings(max_examples=100) - def test_message_value_with_placeable( - self, msg_id: str, var_name: str - ) -> None: - """PROPERTY: Message value with placeable parses.""" - source = f"{msg_id} = Text {{ ${var_name} }} more" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("message_structure=value_with_placeable") - - -# ============================================================================ -# COMMENTS -# ============================================================================ - - -class TestCommentParsing: - """Property tests for comment parsing.""" - - @given(text=safe_text) - @settings(max_examples=150) - def test_standalone_comment(self, text: str) -> None: - """PROPERTY: Standalone comment parses.""" - source = f"# {text}\n\nmsg = value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("comment_level=standalone") - - @given(text=safe_text) - @settings(max_examples=100) - def test_group_comment(self, text: str) -> None: - """PROPERTY: Group comment ## parses.""" - source = f"## {text}\n\nmsg = value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for 
HypoFuzz guidance - event("comment_level=group") - - @given(text=safe_text) - @settings(max_examples=100) - def test_resource_comment(self, text: str) -> None: - """PROPERTY: Resource comment ### parses.""" - source = f"### {text}\n\nmsg = value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("comment_level=resource") - - @given( - text=safe_text, - count=st.integers(min_value=1, max_value=5), - ) - @settings(max_examples=50) - def test_multiple_comment_lines(self, text: str, count: int) -> None: - """PROPERTY: Multiple consecutive comment lines parse.""" - comments = "\n".join([f"# {text} {i}" for i in range(count)]) - source = f"{comments}\n\nmsg = value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event(f"comment_lines={min(count, 5)}") - - @given(msg_id=ftl_identifiers, text=safe_text) - @settings(max_examples=100) - def test_comment_attached_to_message(self, msg_id: str, text: str) -> None: - """PROPERTY: Comment immediately before message parses.""" - source = f"# {text}\n{msg_id} = value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("comment_position=attached") - - -# ============================================================================ -# WHITESPACE HANDLING -# ============================================================================ - - -class TestWhitespaceHandling: - """Property tests for whitespace handling.""" - - @given( - msg_id=ftl_identifiers, - spaces=st.integers(min_value=0, max_value=10), - ) - @settings(max_examples=100) - def test_spaces_before_equals(self, msg_id: str, spaces: int) -> None: - """PROPERTY: Spaces before = are handled.""" - source = f"{msg_id}{' ' * spaces}= value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert 
resource is not None - - # Emit events for HypoFuzz guidance - event("whitespace_position=before_equals") - if spaces == 0: - event("space_count=none") - elif spaces <= 3: - event("space_count=few") - else: - event("space_count=many") - - @given( - msg_id=ftl_identifiers, - spaces=st.integers(min_value=0, max_value=10), - ) - @settings(max_examples=100) - def test_spaces_after_equals(self, msg_id: str, spaces: int) -> None: - """PROPERTY: Spaces after = are handled.""" - source = f"{msg_id} ={' ' * spaces}value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("whitespace_position=after_equals") - if spaces == 0: - event("space_count=none") - elif spaces <= 3: - event("space_count=few") - else: - event("space_count=many") - - @given( - msg_id=ftl_identifiers, - indent=st.integers(min_value=4, max_value=12), - ) - @settings(max_examples=50) - def test_attribute_indentation(self, msg_id: str, indent: int) -> None: - """PROPERTY: Attribute indentation is handled.""" - source = f"{msg_id} = value\n{' ' * indent}.attr = value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("whitespace_type=indentation") - if indent == 4: - event("indent_level=minimal") - elif indent <= 8: - event("indent_level=standard") - else: - event("indent_level=deep") - - @given( - msg_id=ftl_identifiers, - blank_lines=st.integers(min_value=0, max_value=5), - ) - @settings(max_examples=50) - def test_blank_lines_between_messages( - self, msg_id: str, blank_lines: int - ) -> None: - """PROPERTY: Blank lines between messages don't affect parsing.""" - source = f"{msg_id}1 = value1{chr(10) * blank_lines}{msg_id}2 = value2" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("whitespace_type=blank_lines") - if blank_lines == 0: 
- event("blank_line_count=none") - elif blank_lines == 1: - event("blank_line_count=single") - else: - event("blank_line_count=multiple") - - @given( - msg_id=ftl_identifiers, - trailing_spaces=st.integers(min_value=0, max_value=10), - ) - @settings(max_examples=50) - def test_trailing_whitespace(self, msg_id: str, trailing_spaces: int) -> None: - """PROPERTY: Trailing whitespace is handled.""" - source = f"{msg_id} = value{' ' * trailing_spaces}\n" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("whitespace_position=trailing") - if trailing_spaces == 0: - event("space_count=none") - elif trailing_spaces <= 3: - event("space_count=few") - else: - event("space_count=many") - - -# ============================================================================ -# FUNCTION CALLS -# ============================================================================ - - -class TestFunctionCallParsing: - """Property tests for function call parsing.""" - - @given(var_name=variable_names) - @settings(max_examples=150) - def test_number_function_call(self, var_name: str) -> None: - """PROPERTY: NUMBER($var) parses correctly.""" - source = f"msg = {{ NUMBER(${var_name}) }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("function_name=NUMBER") - event("function_arg_type=variable") - - @given(var_name=variable_names) - @settings(max_examples=150) - def test_datetime_function_call(self, var_name: str) -> None: - """PROPERTY: DATETIME($var) parses correctly.""" - source = f"msg = {{ DATETIME(${var_name}) }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("function_name=DATETIME") - event("function_arg_type=variable") - - @given(var_name=variable_names) - @settings(max_examples=100) - def 
test_function_with_named_arg(self, var_name: str) -> None: - """PROPERTY: FUNC($var, opt: val) parses.""" - source = f"msg = {{ NUMBER(${var_name}, minimumFractionDigits: 2) }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("function_options=with_named") - event("option_value_type=numeric") - - @given(var_name=variable_names, number=numbers) - @settings(max_examples=100) - def test_function_with_numeric_option(self, var_name: str, number: int) -> None: - """PROPERTY: Function with numeric option parses.""" - source = f"msg = {{ NUMBER(${var_name}, minimumFractionDigits: {number}) }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("function_options=with_numeric") - if number < 0: - event("option_value_sign=negative") - elif number == 0: - event("option_value_sign=zero") - else: - event("option_value_sign=positive") - - @given(var_name=variable_names) - @settings(max_examples=100) - def test_function_with_string_option(self, var_name: str) -> None: - """PROPERTY: Function with string option parses.""" - source = f'msg = {{ DATETIME(${var_name}, style: "long") }}' - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("function_options=with_string") - event("option_value_type=string") - - @given( - var_name=variable_names, - count=st.integers(min_value=1, max_value=5), - ) - @settings(max_examples=50) - def test_function_with_multiple_options(self, var_name: str, count: int) -> None: - """PROPERTY: Function with multiple options parses.""" - options = ", ".join([f"opt{i}: {i}" for i in range(count)]) - source = f"msg = {{ NUMBER(${var_name}, {options}) }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - 
event("function_options=multiple") - event(f"option_count={min(count, 5)}") - - @given(func_name=ftl_identifiers, var_name=variable_names) - @settings(max_examples=100) - def test_custom_function_call(self, func_name: str, var_name: str) -> None: - """PROPERTY: Custom function calls parse.""" - # Note: uppercase function names required - source = f"msg = {{ {func_name.upper()}(${var_name}) }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("function_name=CUSTOM") - if len(func_name) <= 5: - event("function_name_length=short") - else: - event("function_name_length=long") - - @given(number=numbers) - @settings(max_examples=50) - def test_function_with_number_literal_arg(self, number: int) -> None: - """PROPERTY: Function with number literal argument parses.""" - source = f"msg = {{ NUMBER({number}) }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("function_arg_type=literal") - if number < 0: - event("literal_sign=negative") - elif number == 0: - event("literal_sign=zero") - else: - event("literal_sign=positive") - - @given(var_name=variable_names) - @settings(max_examples=50) - def test_nested_function_calls(self, var_name: str) -> None: - """PROPERTY: Nested function calls parse (if supported).""" - # Most parsers support simple nesting - source = f"msg = {{ NUMBER(${var_name}) }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("function_nesting=simple") - - -# ============================================================================ -# MESSAGE REFERENCES -# ============================================================================ - - -class TestMessageReferenceParsing: - """Property tests for message reference parsing.""" - - @given(msg_id1=ftl_identifiers, 
msg_id2=ftl_identifiers) - @settings(max_examples=150) - def test_simple_message_reference(self, msg_id1: str, msg_id2: str) -> None: - """PROPERTY: { msg-id } references another message.""" - source = f"{msg_id1} = Value1\n{msg_id2} = {{ {msg_id1} }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("msg_ref_type=simple") - if msg_id1 == msg_id2: - event("msg_ref_self=true") - else: - event("msg_ref_self=false") - - @given( - msg_id1=ftl_identifiers, - msg_id2=ftl_identifiers, - attr_name=attribute_names, - ) - @settings(max_examples=100) - def test_message_attribute_reference( - self, msg_id1: str, msg_id2: str, attr_name: str - ) -> None: - """PROPERTY: { msg.attr } references message attribute.""" - source = ( - f"{msg_id1} = Value\n" - f" .{attr_name} = Attr\n" - f"{msg_id2} = {{ {msg_id1}.{attr_name} }}" - ) - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("msg_ref_type=with_attribute") - - @given( - msg_id=ftl_identifiers, - count=st.integers(min_value=2, max_value=5), - ) - @settings(max_examples=50) - def test_multiple_message_references(self, msg_id: str, count: int) -> None: - """PROPERTY: Multiple message references in one pattern parse.""" - refs = " ".join([f"{{ {msg_id}{i} }}" for i in range(count)]) - # Create referenced messages - messages = "\n".join([f"{msg_id}{i} = Value{i}" for i in range(count)]) - source = f"{messages}\nfinal = {refs}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("msg_ref_type=multiple") - event(f"msg_ref_count={min(count, 5)}") - - @given(msg_id1=ftl_identifiers, msg_id2=ftl_identifiers, text=safe_text) - @settings(max_examples=100) - def test_message_reference_with_text( - self, msg_id1: str, msg_id2: str, text: str - ) -> None: - """PROPERTY: 
Message reference mixed with text parses.""" - source = f"{msg_id1} = Value\n{msg_id2} = {text} {{ {msg_id1} }} {text}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("msg_ref_type=mixed_with_text") - if len(text) == 0: - event("surrounding_text=empty") - else: - event("surrounding_text=present") - - -# ============================================================================ -# IDENTIFIER VALIDATION -# ============================================================================ - - -class TestIdentifierValidation: - """Property tests for identifier validation.""" - - @given( - prefix=st.text( - alphabet=st.characters(min_codepoint=97, max_codepoint=122), - min_size=1, - max_size=5, - ), - number=st.integers(min_value=0, max_value=999), - ) - @settings(max_examples=150) - def test_identifier_with_number_suffix(self, prefix: str, number: int) -> None: - """PROPERTY: Identifiers can have numeric suffixes.""" - msg_id = f"{prefix}{number}" - source = f"{msg_id} = value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("identifier_type=with_number_suffix") - if number == 0: - event("number_suffix=zero") - elif number < 10: - event("number_suffix=single_digit") - elif number < 100: - event("number_suffix=two_digit") - else: - event("number_suffix=three_digit") - - @given( - parts=st.lists( - st.text( - alphabet=st.characters(min_codepoint=97, max_codepoint=122), - min_size=1, - max_size=5, - ), - min_size=2, - max_size=5, - ), - ) - @settings(max_examples=100) - def test_identifier_with_hyphens(self, parts: list[str]) -> None: - """PROPERTY: Identifiers with hyphens parse.""" - msg_id = "-".join(parts) - source = f"{msg_id} = value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - 
event("identifier_type=with_hyphens") - event(f"identifier_parts={min(len(parts), 5)}") - - @given( - parts=st.lists( - st.text( - alphabet=st.characters(min_codepoint=97, max_codepoint=122), - min_size=1, - max_size=5, - ), - min_size=2, - max_size=5, - ), - ) - @settings(max_examples=100) - def test_identifier_with_underscores(self, parts: list[str]) -> None: - """PROPERTY: Identifiers with underscores parse.""" - msg_id = "_".join(parts) - source = f"{msg_id} = value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("identifier_type=with_underscores") - event(f"identifier_parts={min(len(parts), 5)}") - - @given(length=st.integers(min_value=1, max_value=100)) - @settings(max_examples=50) - def test_identifier_length_handling(self, length: int) -> None: - """PROPERTY: Identifiers of various lengths parse.""" - msg_id = "a" * length - source = f"{msg_id} = value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("identifier_type=length_test") - if length == 1: - event("identifier_length=minimal") - elif length <= 10: - event("identifier_length=short") - elif length <= 50: - event("identifier_length=medium") - else: - event("identifier_length=long") - - @given( - msg_id=ftl_identifiers, - uppercase_count=st.integers(min_value=0, max_value=5), - ) - @settings(max_examples=100) - def test_identifier_case_sensitivity( - self, msg_id: str, uppercase_count: int - ) -> None: - """PROPERTY: Identifier case is preserved.""" - # Mix case by uppercasing some characters - chars = list(msg_id) - for i in range(min(uppercase_count, len(chars))): - chars[i] = chars[i].upper() - mixed_case_id = "".join(chars) - source = f"{mixed_case_id} = value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - 
event("identifier_type=mixed_case") - if uppercase_count == 0: - event("case_mix=all_lower") - elif uppercase_count >= len(chars): - event("case_mix=all_upper") - else: - event("case_mix=mixed") - - -# ============================================================================ -# ESCAPE SEQUENCES -# ============================================================================ - - -class TestEscapeSequenceParsing: - """Property tests for escape sequence handling.""" - - def test_unicode_escape_basic(self) -> None: - """PROPERTY: Basic Unicode escapes parse.""" - source = r'msg = { "\u0041" }' - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - @given( - codepoint=st.integers( - min_value=0x0020, - max_value=0xD7FF, - ), # Valid Unicode range - ) - @settings(max_examples=100) - def test_unicode_escape_various_codepoints(self, codepoint: int) -> None: - """PROPERTY: Unicode escapes for various codepoints parse.""" - source = f'msg = {{ "\\u{codepoint:04X}" }}' - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("escape_type=unicode") - if codepoint < 0x0080: - event("codepoint_range=ascii") - elif codepoint < 0x0800: - event("codepoint_range=latin_extended") - elif codepoint < 0x3000: - event("codepoint_range=mid_bmp") - else: - event("codepoint_range=cjk_symbols") - - def test_escaped_quote_in_string(self) -> None: - """PROPERTY: Escaped quotes in strings parse.""" - source = r'msg = { "He said \"Hello\"" }' - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - def test_escaped_backslash_in_string(self) -> None: - """PROPERTY: Escaped backslashes parse.""" - source = r'msg = { "Path: C:\\Windows" }' - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - def test_escaped_braces_in_text(self) -> None: - """PROPERTY: Escaped braces in text parse.""" - 
source = r"msg = Literal \{ and \} braces" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - -# ============================================================================ -# LINE ENDING HANDLING -# ============================================================================ - - -class TestLineEndingHandling: - """Property tests for line ending handling.""" - - @given(msg_id=ftl_identifiers) - @settings(max_examples=100) - def test_unix_line_endings(self, msg_id: str) -> None: - """PROPERTY: Unix \\n line endings parse correctly.""" - source = f"{msg_id}1 = value1\n{msg_id}2 = value2\n" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("line_ending_type=unix") - - @given(msg_id=ftl_identifiers) - @settings(max_examples=100) - def test_windows_line_endings(self, msg_id: str) -> None: - """PROPERTY: Windows \\r\\n line endings parse correctly.""" - source = f"{msg_id}1 = value1\r\n{msg_id}2 = value2\r\n" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("line_ending_type=windows") - - @given(msg_id=ftl_identifiers) - @settings(max_examples=100) - def test_old_mac_line_endings(self, msg_id: str) -> None: - """PROPERTY: Old Mac \\r line endings parse.""" - source = f"{msg_id}1 = value1\r{msg_id}2 = value2\r" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("line_ending_type=old_mac") - - @given(msg_id=ftl_identifiers) - @settings(max_examples=50) - def test_mixed_line_endings(self, msg_id: str) -> None: - """PROPERTY: Mixed line endings are handled.""" - source = f"{msg_id}1 = value1\n{msg_id}2 = value2\r\n{msg_id}3 = value3\r" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for 
HypoFuzz guidance - event("line_ending_type=mixed") - - @given(msg_id=ftl_identifiers) - @settings(max_examples=50) - def test_no_final_newline(self, msg_id: str) -> None: - """PROPERTY: Source without final newline parses.""" - source = f"{msg_id} = value" # No trailing newline - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("line_ending_type=no_final") - - -# ============================================================================ -# UTF-8 BOM HANDLING -# ============================================================================ - - -class TestUTF8BOMHandling: - """Property tests for UTF-8 BOM handling.""" - - @given(msg_id=ftl_identifiers) - @settings(max_examples=100) - def test_utf8_bom_at_start(self, msg_id: str) -> None: - """PROPERTY: UTF-8 BOM at file start is handled.""" - bom = "\ufeff" - source = f"{bom}{msg_id} = value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("bom_presence=with_bom") - - @given(msg_id=ftl_identifiers) - @settings(max_examples=50) - def test_source_without_bom(self, msg_id: str) -> None: - """PROPERTY: Source without BOM parses normally.""" - source = f"{msg_id} = value" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("bom_presence=without_bom") - - @given(msg_id=ftl_identifiers, text=safe_text) - @settings(max_examples=50) - def test_bom_only_at_start(self, msg_id: str, text: str) -> None: - """PROPERTY: BOM only valid at file start.""" - source = f"{msg_id} = {text}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("bom_presence=no_bom_with_content") - if len(text) == 0: - event("text_content=empty") - else: - event("text_content=present") - - -# 
============================================================================ -# PATTERN ELEMENT BOUNDARIES -# ============================================================================ - - -class TestPatternElementBoundaries: - """Property tests for pattern element boundaries.""" - - @given(var_name=variable_names, text=safe_text) - @settings(max_examples=100) - def test_text_placeable_boundary(self, var_name: str, text: str) -> None: - """PROPERTY: Boundary between text and placeable is correct.""" - source = f"msg = {text}{{ ${var_name} }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("boundary_type=text_placeable") - if len(text) == 0: - event("prefix_text=empty") - else: - event("prefix_text=present") - - @given(var_name=variable_names, text=safe_text) - @settings(max_examples=100) - def test_placeable_text_boundary(self, var_name: str, text: str) -> None: - """PROPERTY: Boundary between placeable and text is correct.""" - source = f"msg = {{ ${var_name} }}{text}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("boundary_type=placeable_text") - if len(text) == 0: - event("suffix_text=empty") - else: - event("suffix_text=present") - - @given( - var1=variable_names, - var2=variable_names, - ) - @settings(max_examples=100) - def test_placeable_placeable_boundary(self, var1: str, var2: str) -> None: - """PROPERTY: Adjacent placeables have correct boundary.""" - source = f"msg = {{ ${var1} }}{{ ${var2} }}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("boundary_type=placeable_placeable") - if var1 == var2: - event("adjacent_vars=same") - else: - event("adjacent_vars=different") - - @given(text1=safe_text, text2=safe_text) - @settings(max_examples=50) - def 
test_text_text_concatenation(self, text1: str, text2: str) -> None: - """PROPERTY: Consecutive text elements are handled.""" - source = f"msg = {text1} {text2}" - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("boundary_type=text_text") - total_len = len(text1) + len(text2) - if total_len == 0: - event("combined_text=empty") - elif total_len <= 20: - event("combined_text=short") - else: - event("combined_text=long") - - -# ============================================================================ -# MULTILINE PATTERNS -# ============================================================================ - - -class TestMultilinePatterns: - """Property tests for multiline pattern handling.""" - - @given(msg_id=ftl_identifiers, lines=st.lists(safe_text, min_size=2, max_size=5)) - @settings(max_examples=100) - def test_multiline_text_value(self, msg_id: str, lines: list[str]) -> None: - """PROPERTY: Multiline text values parse.""" - # Indent continuation lines - text_lines = [lines[0]] + [f" {line}" for line in lines[1:]] - source = f"{msg_id} =\n" + "\n".join(text_lines) - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("multiline_type=text_only") - event(f"line_count={min(len(lines), 5)}") - - @given( - msg_id=ftl_identifiers, - var_name=variable_names, - lines=st.lists(safe_text, min_size=2, max_size=5), - ) - @settings(max_examples=50) - def test_multiline_with_placeables( - self, msg_id: str, var_name: str, lines: list[str] - ) -> None: - """PROPERTY: Multiline patterns with placeables parse.""" - text_lines = [f"{lines[0]} {{ ${var_name} }}"] + [ - f" {line}" for line in lines[1:] - ] - source = f"{msg_id} =\n" + "\n".join(text_lines) - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - 
event("multiline_type=with_placeables") - event(f"line_count={min(len(lines), 5)}") - - @given( - msg_id=ftl_identifiers, - indent=st.integers(min_value=4, max_value=12), - ) - @settings(max_examples=50) - def test_multiline_indentation_consistency( - self, msg_id: str, indent: int - ) -> None: - """PROPERTY: Consistent indentation in multiline patterns.""" - source = ( - f"{msg_id} =\n" - f"{' ' * indent}Line 1\n" - f"{' ' * indent}Line 2\n" - f"{' ' * indent}Line 3" - ) - parser = FluentParserV1() - resource = parser.parse(source) - - assert resource is not None - - # Emit events for HypoFuzz guidance - event("multiline_type=consistent_indent") - if indent == 4: - event("indent_level=minimal") - elif indent <= 8: - event("indent_level=standard") - else: - event("indent_level=deep") - - -# ============================================================================ -# ROUND-TRIP PROPERTIES -# ============================================================================ - - -class TestParserRoundTrip: - """Property: parse(serialize(parse(source))) preserves AST structure.""" - - @given( - msg_id=shared_ftl_identifiers(), - msg_value=ftl_simple_text(), - ) - @settings(max_examples=1000) - def test_simple_message_roundtrip( - self, msg_id: str, msg_value: str - ) -> None: - """Simple messages round-trip through serialize/parse.""" - parser = FluentParserV1() - serializer = FluentSerializer() - - ftl_source = f"{msg_id} = {msg_value}" - resource1 = parser.parse(ftl_source) - entry_count = len(resource1.entries) - event(f"entry_count={entry_count}") - - assert entry_count > 0 - - serialized = serializer.serialize(resource1) - resource2 = parser.parse(serialized) - - assert len(resource2.entries) == entry_count - if isinstance(resource1.entries[0], Message) and isinstance( - resource2.entries[0], Message - ): - assert ( - resource1.entries[0].id.name - == resource2.entries[0].id.name - ) - - @given( - msg_id=shared_ftl_identifiers(), - var_name=shared_ftl_identifiers(), 
- prefix=ftl_simple_text(), - suffix=ftl_simple_text(), - ) - @settings(max_examples=500) - def test_variable_interpolation_roundtrip( - self, - msg_id: str, - var_name: str, - prefix: str, - suffix: str, - ) -> None: - """Messages with variable interpolation round-trip.""" - parser = FluentParserV1() - serializer = FluentSerializer() - - ftl_source = ( - f"{msg_id} = {prefix} {{ ${var_name} }} {suffix}" - ) - resource1 = parser.parse(ftl_source) - has_junk = any( - isinstance(e, Junk) for e in resource1.entries - ) - event( - f"outcome={'has_junk' if has_junk else 'roundtrip_clean'}" - ) - - assert not has_junk - - serialized = serializer.serialize(resource1) - resource2 = parser.parse(serialized) - - assert not any( - isinstance(e, Junk) for e in resource2.entries - ) - - -# ============================================================================ -# METAMORPHIC PROPERTIES -# ============================================================================ - - -class TestParserMetamorphicProperties: - """Metamorphic properties: predictable relations between inputs.""" - - @given( - value1=ftl_simple_text(), - value2=ftl_simple_text(), - ) - @settings(max_examples=300) - def test_concatenation_preserves_message_count( - self, value1: str, value2: str - ) -> None: - """Separate messages in one source produce two entries.""" - parser = FluentParserV1() - separate_source = f"m1 = {value1}\nm2 = {value2}" - r1 = parser.parse(separate_source) - - non_junk = [ - e for e in r1.entries if not isinstance(e, Junk) - ] - msg_count = len(non_junk) - event(f"non_junk_count={msg_count}") - assert msg_count == 2 - - @given( - msg_id=shared_ftl_identifiers(), - msg_value=ftl_simple_text(), - newlines=st.integers(min_value=1, max_value=5), - ) - @settings(max_examples=200) - def test_blank_line_count_independence( - self, msg_id: str, msg_value: str, newlines: int - ) -> None: - """Blank lines between messages do not affect parse result.""" - parser = FluentParserV1() - separator = 
"\n" * newlines - ftl_source = f"m1 = test{separator}{msg_id} = {msg_value}" - - resource = parser.parse(ftl_source) - messages = [ - e for e in resource.entries if isinstance(e, Message) - ] - event(f"separator_newlines={newlines}") - assert len(messages) == 2 - - @given( - msg_id=shared_ftl_identifiers(), - msg_value=ftl_simple_text(), - ) - @settings(max_examples=300) - def test_deterministic_parsing( - self, msg_id: str, msg_value: str - ) -> None: - """Parsing same input twice yields identical results.""" - source = f"{msg_id} = {msg_value}" - parser = FluentParserV1() - result1 = parser.parse(source) - result2 = parser.parse(source) - - assert len(result1.entries) == len(result2.entries) - for e1, e2 in zip( - result1.entries, result2.entries, strict=True - ): - assert isinstance(e1, type(e2)) - event(f"entry_count={len(result1.entries)}") - - -# ============================================================================ -# STRUCTURAL PROPERTIES -# ============================================================================ - - -class TestParserStructuralProperties: - """Properties about AST structure produced by parser.""" - - @given( - msg_id=shared_ftl_identifiers(), - msg_value=ftl_simple_text(), - ) - @settings(max_examples=300) - def test_message_has_required_fields( - self, msg_id: str, msg_value: str - ) -> None: - """Parsed Messages have all required fields set.""" - parser = FluentParserV1() - ftl_source = f"{msg_id} = {msg_value}" - resource = parser.parse(ftl_source) - messages = [ - e for e in resource.entries if isinstance(e, Message) - ] - - assert len(messages) > 0 - msg = messages[0] - assert msg.id is not None - assert msg.id.name == msg_id - assert msg.value is not None - event(f"attribute_count={len(msg.attributes)}") - - @given( - msg_id=shared_ftl_identifiers(), - attr_name=shared_ftl_identifiers(), - attr_value=ftl_simple_text(), - ) - @settings(max_examples=200) - def test_attribute_parsing_structure( - self, msg_id: str, attr_name: 
str, attr_value: str - ) -> None: - """Messages with attributes parse into correct structure.""" - parser = FluentParserV1() - ftl = f"{msg_id} =\n .{attr_name} = {attr_value}" - resource = parser.parse(ftl) - messages = [ - e for e in resource.entries if isinstance(e, Message) - ] - - has_attr = ( - bool(messages) - and bool(messages[0].attributes) - ) - event( - f"outcome={'has_attr' if has_attr else 'no_attr'}" - ) - if has_attr: - attr = messages[0].attributes[0] - assert attr.id.name == attr_name - - @given( - term_id=shared_ftl_identifiers(), - term_value=ftl_simple_text(), - ) - @settings(max_examples=300) - def test_term_parsing_structure( - self, term_id: str, term_value: str - ) -> None: - """Terms with leading hyphen parse correctly.""" - parser = FluentParserV1() - ftl_source = f"-{term_id} = {term_value}" - resource = parser.parse(ftl_source) - - terms = [ - e for e in resource.entries if isinstance(e, Term) - ] - event(f"term_count={len(terms)}") - assert len(terms) > 0 - assert terms[0].id.name == term_id - - @given( - msg_id=shared_ftl_identifiers(), - nesting_depth=st.integers(min_value=1, max_value=10), - ) - @settings(max_examples=200) - def test_nested_placeable_depth( - self, msg_id: str, nesting_depth: int - ) -> None: - """Parser handles nested placeables up to depth limit.""" - parser = FluentParserV1() - open_braces = "{ " * nesting_depth - close_braces = " }" * nesting_depth - ftl_source = f"{msg_id} = {open_braces}$x{close_braces}" - - resource = parser.parse(ftl_source) - event(f"nesting_depth={nesting_depth}") - assert len(resource.entries) > 0 - - @given(source=st.text(min_size=0, max_size=500)) - @settings(max_examples=2000) - def test_parser_always_returns_resource( - self, source: str - ) -> None: - """Parser handles arbitrary input without crashing.""" - parser = FluentParserV1() - try: - result = parser.parse(source) - assert isinstance(result, Resource) - event(f"entry_count={len(result.entries)}") - except RecursionError: - pass 
- - @given( - msg_id=shared_ftl_identifiers(), - msg_value=ftl_simple_text(), - leading_ws=st.text(alphabet=" \t", max_size=10), - trailing_ws=st.text(alphabet=" \t", max_size=10), - ) - @settings(max_examples=300) - def test_whitespace_around_message( - self, msg_id: str, msg_value: str, - leading_ws: str, trailing_ws: str, - ) -> None: - """Leading/trailing whitespace does not change message ID.""" - parser = FluentParserV1() - ftl1 = f"{msg_id} = {msg_value}" - ftl2 = ( - f"{leading_ws}{msg_id} = {msg_value}{trailing_ws}" - ) - - resource1 = parser.parse(ftl1) - resource2 = parser.parse(ftl2) - - msgs1 = [ - e for e in resource1.entries - if isinstance(e, Message) - ] - msgs2 = [ - e for e in resource2.entries - if isinstance(e, Message) - ] - ws_type = "mixed" if leading_ws and trailing_ws else "one" - event(f"whitespace_padding={ws_type}") - if msgs1 and msgs2: - assert msgs1[0].id.name == msgs2[0].id.name - - -# ============================================================================ -# MALFORMED INPUT PROPERTIES -# ============================================================================ - - -@st.composite -def malformed_placeable(draw: st.DrawFn) -> str: - """Generate placeables with strategic syntax errors.""" - corruptions = [ - "{", # Missing content - "{ ", # Space but no content - "{ $", # Incomplete variable - "{ $v", # Incomplete variable name - "{ $var", # Missing closing } - '{ "', # Incomplete string literal - '{ "text', # Unterminated string - "{ -", # Incomplete term ref - "{ -t", # Incomplete term name - "{ -term", # Missing closing } - "{ 1.", # Malformed number - "{ 1.2.", # Invalid number format - "{ FUNC", # Missing parentheses - "{ FUNC(", # Incomplete function - "{ FUNC($", # Incomplete arg - "{ msg.", # Missing attr name - "{ msg.@", # Invalid attr name - "{ $x ->", # Incomplete select - "{ $x -> [", # Incomplete variant - "{ $x -> [a]", # Missing pattern - ] - return draw(st.sampled_from(corruptions)) - - -@st.composite -def 
malformed_function_call(draw: st.DrawFn) -> str: - """Generate function calls with strategic syntax errors.""" - func_name = draw( - st.sampled_from(["FUNC", "NUMBER", "DATETIME"]) - ) - corruptions = [ - f"{func_name}", - f"{func_name}(", - f"{func_name}($", - f"{func_name}($v", - f"{func_name}(1.2.", - f'{{ {func_name}("', - f"{func_name}(@", - f"{func_name}(a:", - f"{func_name}(a: )", - f"{func_name}(123: x)", - f"{func_name}(a: 1, a: 2)", - f"{func_name}(x: 1, 2)", - ] - return draw(st.sampled_from(corruptions)) - - -@st.composite -def malformed_select_expression(draw: st.DrawFn) -> str: - """Generate select expressions with strategic errors.""" - var = draw(st.sampled_from(["$x", "$count", "$num"])) - corruptions = [ - f"{{ {var} ->", - f"{{ {var} -> [", - f"{{ {var} -> [@", - f"{{ {var} -> [a]", - f"{{ {var} -> [a] Text", - f"{{ {var} -> [a] {{ msg.", - f"{{ {var} -> [one] X *[other] Y", - ] - return draw(st.sampled_from(corruptions)) - - -@st.composite -def malformed_term_input(draw: st.DrawFn) -> str: - """Generate terms with strategic syntax errors.""" - corruptions = [ - "-", - "- ", - "-@invalid", - "-term", - "-term =", - "-term = val\n .", - "-term = val\n .@", - ] - return draw(st.sampled_from(corruptions)) - - -@st.composite -def malformed_term_reference(draw: st.DrawFn) -> str: - """Generate term references with strategic errors.""" - corruptions = [ - "{ -", - "{ - }", - "{ -@bad }", - "{ -term(", - "{ -term(x", - "{ -term.", - ] - return draw(st.sampled_from(corruptions)) - - -@st.composite -def malformed_attribute(draw: st.DrawFn) -> str: - """Generate attributes with strategic errors.""" - corruptions = [ - " .", - " .@", - " . 
= val", - " .attr", - " .attr =", - ] - return draw(st.sampled_from(corruptions)) - - -class TestMalformedPlaceables: - """Parser handles malformed placeables without crashing.""" - - @given( - msg_id=shared_ftl_identifiers(), - placeable=malformed_placeable(), - ) - @settings(max_examples=100, deadline=None) - @example(msg_id="key", placeable="{ msg.") - @example(msg_id="key", placeable='{ "') - @example(msg_id="key", placeable="{ 1.2.") - def test_malformed_placeables( - self, msg_id: str, placeable: str - ) -> None: - """Parser recovers from malformed placeables.""" - source = f"{msg_id} = {placeable}" - event(f"placeable_len={len(placeable)}") - parser = FluentParserV1() - - try: - resource = parser.parse(source) - assert resource is not None - except RecursionError: - assume(False) - - -class TestMalformedFunctionCalls: - """Parser handles malformed function calls gracefully.""" - - @given( - msg_id=shared_ftl_identifiers(), - func_call=malformed_function_call(), - ) - @settings(max_examples=80, deadline=None) - @example(msg_id="key", func_call="FUNC($") - @example(msg_id="key", func_call="FUNC(1.2.") - @example(msg_id="key", func_call='{ FUNC("') - @example(msg_id="key", func_call="FUNC(@bad)") - @example(msg_id="key", func_call="FUNC(a:)") - @example(msg_id="key", func_call="FUNC") - def test_malformed_function_calls( - self, msg_id: str, func_call: str - ) -> None: - """Parser recovers from malformed function calls.""" - source = f"{msg_id} = {{ {func_call} }}" - event(f"func_call_len={len(func_call)}") - parser = FluentParserV1() - resource = parser.parse(source) - assert resource is not None - - -class TestMalformedSelectExpressions: - """Parser handles malformed select expressions.""" - - @given( - msg_id=shared_ftl_identifiers(), - select=malformed_select_expression(), - ) - @settings(max_examples=50, deadline=None) - @example(msg_id="key", select="{ $x -> [@") - @example(msg_id="key", select="{ $x -> [a] Text") - @example( - msg_id="key", - select="{ 
$x -> [a] { msg.", - ) - def test_malformed_select_expressions( - self, msg_id: str, select: str - ) -> None: - """Parser recovers from malformed selects.""" - source = f"{msg_id} = {select}" - event(f"select_len={len(select)}") - parser = FluentParserV1() - resource = parser.parse(source) - assert resource is not None - - -class TestMalformedTerms: - """Parser handles malformed terms and term references.""" - - @given(term_def=malformed_term_input()) - @settings(max_examples=40, deadline=None) - @example(term_def="-@invalid") - @example(term_def="-term = val\n .") - def test_malformed_term_definitions( - self, term_def: str - ) -> None: - """Parser recovers from malformed term definitions.""" - event(f"input_len={len(term_def)}") - parser = FluentParserV1() - resource = parser.parse(term_def) - assert resource is not None - - @given( - msg_id=shared_ftl_identifiers(), - term_ref=malformed_term_reference(), - ) - @settings(max_examples=40, deadline=None) - @example(msg_id="key", term_ref="{ -") - @example(msg_id="key", term_ref="{ -term(") - def test_malformed_term_references( - self, msg_id: str, term_ref: str - ) -> None: - """Parser recovers from malformed term references.""" - source = f"{msg_id} = {term_ref}" - event(f"term_ref_len={len(term_ref)}") - parser = FluentParserV1() - resource = parser.parse(source) - assert resource is not None - - -class TestMalformedAttributes: - """Parser handles malformed attributes.""" - - @given( - msg_id=shared_ftl_identifiers(), - attr_line=malformed_attribute(), - ) - @settings(max_examples=30, deadline=None) - @example(msg_id="key", attr_line=" .") - def test_malformed_attributes( - self, msg_id: str, attr_line: str - ) -> None: - """Parser recovers from malformed attributes.""" - source = f"{msg_id} = value\n{attr_line}" - event(f"attr_line_len={len(attr_line)}") - parser = FluentParserV1() - resource = parser.parse(source) - assert resource is not None - - -class TestSpecialCharacterSequences: - """Parser handles 
arbitrary special character sequences.""" - - @given( - text=st.text( - alphabet="{}$-.[]*\n\r\t ", - min_size=1, - max_size=30, - ) - ) - @settings(max_examples=200, deadline=None) - def test_arbitrary_special_char_sequences( - self, text: str - ) -> None: - """Parser never crashes on special FTL character combos.""" - assume(text.strip()) - parser = FluentParserV1() - try: - resource = parser.parse(text) - assert resource is not None - event(f"entry_count={len(resource.entries)}") - except RecursionError: - assume(False) - - @given( - msg_id=shared_ftl_identifiers(), - value=st.text( - alphabet=( - "abcdefghijklmnopqrstuvwxyz" - "ABCDEFGHIJKLMNOPQRSTUVWXYZ" - "0123456789{}$-. " - ), - min_size=1, - max_size=40, - ), - ) - @settings(max_examples=150, deadline=None) - def test_complex_value_patterns( - self, msg_id: str, value: str - ) -> None: - """Parser handles complex patterns in values.""" - source = f"{msg_id} = {value}" - parser = FluentParserV1() - try: - resource = parser.parse(source) - assert resource is not None - has_junk = any( - isinstance(e, Junk) for e in resource.entries - ) - event( - f"outcome={'junk' if has_junk else 'clean'}" - ) - except RecursionError: - assume(False) - - @given( - ftl_source=st.text(min_size=0, max_size=100) - ) - @settings(max_examples=300, deadline=None) - def test_universal_crash_resistance( - self, ftl_source: str - ) -> None: - """Parser never crashes on any input.""" - parser = FluentParserV1() - try: - resource = parser.parse(ftl_source) - assert resource is not None - assert hasattr(resource, "entries") - event(f"entry_count={len(resource.entries)}") - except RecursionError: - assume(False) - - @given( - msg_id=shared_ftl_identifiers(), - placeable_content=st.text( - alphabet=( - "abcdefghijklmnopqrstuvwxyz" - "ABCDEFGHIJKLMNOPQRSTUVWXYZ" - "0123456789$-. 
" - ), - min_size=0, - max_size=20, - ), - ) - @settings(max_examples=100, deadline=None) - def test_deterministic_placeable_parsing( - self, msg_id: str, placeable_content: str - ) -> None: - """Parsing same placeable twice gives same result.""" - source = f"{msg_id} = {{ {placeable_content} }}" - parser1 = FluentParserV1() - parser2 = FluentParserV1() - try: - result1 = parser1.parse(source) - result2 = parser2.parse(source) - assert len(result1.entries) == len(result2.entries) - for e1, e2 in zip( - result1.entries, result2.entries, strict=True - ): - assert isinstance(e1, type(e2)) - event(f"entry_count={len(result1.entries)}") - except RecursionError: - assume(False) +from tests.syntax_parser_property_cases.core import * # noqa: F403 - re-export split parser property tests +from tests.syntax_parser_property_cases.grammar_boundaries import * # noqa: F403 - re-export split parser property tests +from tests.syntax_parser_property_cases.roundtrip_and_malformed import * # noqa: F403 - re-export split parser property tests +from tests.syntax_parser_property_cases.syntax_elements import * # noqa: F403 - re-export split parser property tests diff --git a/tests/test_syntax_serializer.py b/tests/test_syntax_serializer.py deleted file mode 100644 index 0ba1a254..00000000 --- a/tests/test_syntax_serializer.py +++ /dev/null @@ -1,3039 +0,0 @@ -"""Tests for syntax.serializer: FluentSerializer, serialize(), edge cases, internal helpers. - -Validates serialization of AST nodes back to FTL syntax, including control character -escaping, depth limits, junk entries, multiline patterns, and classify/escape internals. 
-""" - -from __future__ import annotations - -import pytest -from hypothesis import assume, event, example, given -from hypothesis import strategies as st - -from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError -from ftllexengine.enums import CommentType -from ftllexengine.syntax import serialize -from ftllexengine.syntax.ast import ( - Annotation, - Attribute, - CallArguments, - Comment, - FunctionReference, - Identifier, - Junk, - Message, - MessageReference, - NamedArgument, - NumberLiteral, - Pattern, - Placeable, - Resource, - SelectExpression, - StringLiteral, - Term, - TermReference, - TextElement, - VariableReference, - Variant, -) -from ftllexengine.syntax.parser import FluentParserV1 -from ftllexengine.syntax.serializer import ( - FluentSerializer, - SerializationDepthError, - SerializationValidationError, - _classify_line, - _escape_text, - _LineKind, # Private import for internal unit tests - _validate_resource, -) - -# ============================================================================ -# BASIC SERIALIZATION TESTS -# ============================================================================ - - -class TestSerializerBasic: - """Test basic serializer functionality.""" - - def test_serialize_empty_resource(self) -> None: - """Serialize empty resource.""" - resource = Resource(entries=()) - - result = serialize(resource) - - assert result == "" - - def test_serializer_class_directly(self) -> None: - """Use FluentSerializer class directly.""" - serializer = FluentSerializer() - resource = Resource(entries=()) - - result = serializer.serialize(resource) - - assert result == "" - - -# ============================================================================ -# MESSAGE SERIALIZATION -# ============================================================================ - - -class TestSerializerMessage: - """Test message serialization.""" - - def test_serialize_simple_message(self) -> None: - """Serialize message with text only.""" - msg 
= Message( - id=Identifier(name="hello"), - value=Pattern(elements=(TextElement(value="Hello, World!"),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert result == "hello = Hello, World!\n" - - def test_serialize_message_with_variable(self) -> None: - """Serialize message with variable interpolation.""" - msg = Message( - id=Identifier(name="greeting"), - value=Pattern( - elements=( - TextElement(value="Hello, "), - Placeable(expression=VariableReference(id=Identifier(name="name"))), - TextElement(value="!"), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert result == "greeting = Hello, { $name }!\n" - - def test_serialize_message_without_value(self) -> None: - """Serialize message without value (only attributes).""" - msg = Message( - id=Identifier(name="test"), - value=None, - attributes=( - Attribute( - id=Identifier(name="attr"), - value=Pattern(elements=(TextElement(value="Value"),)), - ), - ), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "test" in result - assert ".attr = Value" in result - - def test_serialize_message_with_comment(self) -> None: - """Serialize message with associated comment.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="Test"),)), - attributes=(), - comment=Comment(content="This is a comment", type=CommentType.COMMENT), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "# This is a comment\n" in result - assert "test = Test\n" in result - - def test_serialize_message_with_attributes(self) -> None: - """Serialize message with attributes.""" - msg = Message( - id=Identifier(name="button"), - value=Pattern(elements=(TextElement(value="Save"),)), - attributes=( - Attribute( - id=Identifier(name="tooltip"), - value=Pattern(elements=(TextElement(value="Click to save"),)), - ), - Attribute( - 
id=Identifier(name="aria-label"), - value=Pattern(elements=(TextElement(value="Save button"),)), - ), - ), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "button = Save\n" in result - assert " .tooltip = Click to save\n" in result - assert " .aria-label = Save button\n" in result - - def test_serialize_multiple_messages(self) -> None: - """Serialize multiple messages with blank line separation.""" - resource = Resource( - entries=( - Message( - id=Identifier(name="hello"), - value=Pattern(elements=(TextElement(value="Hello"),)), - attributes=(), - ), - Message( - id=Identifier(name="goodbye"), - value=Pattern(elements=(TextElement(value="Goodbye"),)), - attributes=(), - ), - ) - ) - - result = serialize(resource) - - assert "hello = Hello\n" in result - assert "goodbye = Goodbye\n" in result - - -# ============================================================================ -# TERM SERIALIZATION -# ============================================================================ - - -class TestSerializerTerm: - """Test term serialization.""" - - def test_serialize_simple_term(self) -> None: - """Serialize simple term.""" - term = Term( - id=Identifier(name="brand"), - value=Pattern(elements=(TextElement(value="Firefox"),)), - attributes=(), - ) - resource = Resource(entries=(term,)) - - result = serialize(resource) - - assert result == "-brand = Firefox\n" - - def test_serialize_term_with_attributes(self) -> None: - """Serialize term with attributes.""" - term = Term( - id=Identifier(name="brand"), - value=Pattern(elements=(TextElement(value="Firefox"),)), - attributes=( - Attribute( - id=Identifier(name="version"), - value=Pattern(elements=(TextElement(value="120"),)), - ), - ), - ) - resource = Resource(entries=(term,)) - - result = serialize(resource) - - assert "-brand = Firefox\n" in result - assert " .version = 120\n" in result - - def test_serialize_term_with_comment(self) -> None: - """Serialize term with comment.""" - term 
= Term( - id=Identifier(name="brand"), - value=Pattern(elements=(TextElement(value="Firefox"),)), - attributes=(), - comment=Comment(content="Brand name", type=CommentType.COMMENT), - ) - resource = Resource(entries=(term,)) - - result = serialize(resource) - - assert "# Brand name\n" in result - assert "-brand = Firefox\n" in result - - -# ============================================================================ -# COMMENT AND JUNK SERIALIZATION -# ============================================================================ - - -class TestSerializerCommentJunk: - """Test comment and junk serialization.""" - - def test_serialize_standalone_comment(self) -> None: - """Serialize standalone comment.""" - comment = Comment(content="This is a comment", type=CommentType.COMMENT) - resource = Resource(entries=(comment,)) - - result = serialize(resource) - - assert result == "# This is a comment\n" - - def test_serialize_group_comment(self) -> None: - """Serialize group comment (##).""" - comment = Comment(content="Group comment", type=CommentType.GROUP) - resource = Resource(entries=(comment,)) - - result = serialize(resource) - - assert result == "## Group comment\n" - - def test_serialize_resource_comment(self) -> None: - """Serialize resource comment (###).""" - comment = Comment(content="Resource comment", type=CommentType.RESOURCE) - resource = Resource(entries=(comment,)) - - result = serialize(resource) - - assert result == "### Resource comment\n" - - def test_serialize_multiline_comment(self) -> None: - """Serialize multi-line comment.""" - comment = Comment(content="Line 1\nLine 2\nLine 3", type=CommentType.COMMENT) - resource = Resource(entries=(comment,)) - - result = serialize(resource) - - assert "# Line 1\n# Line 2\n# Line 3\n" in result - - def test_serialize_multiline_comment_with_empty_lines(self) -> None: - """Serialize comment with empty lines (no trailing space on empty lines).""" - comment = Comment(content="Line 1\n\nLine 3", 
type=CommentType.COMMENT) - resource = Resource(entries=(comment,)) - - result = serialize(resource) - - # Empty line should not have trailing space - "# \n" is wrong, "#\n" is correct - assert "# Line 1\n#\n# Line 3\n" in result - assert "# \n" not in result # No trailing space on empty comment lines - - def test_serialize_comment_only_empty_lines(self) -> None: - """Serialize comment that is only empty lines.""" - comment = Comment(content="\n\n", type=CommentType.COMMENT) - resource = Resource(entries=(comment,)) - - result = serialize(resource) - - # All lines should be just "#\n" without trailing space - assert result == "#\n#\n#\n" - assert "# \n" not in result - - def test_serialize_junk(self) -> None: - """Serialize junk entry.""" - junk = Junk(content="invalid { syntax") - resource = Resource(entries=(junk,)) - - result = serialize(resource) - - assert result == "invalid { syntax\n" - - -# ============================================================================ -# EXPRESSION SERIALIZATION -# ============================================================================ - - -class TestSerializerExpressions: - """Test expression serialization.""" - - def test_serialize_string_literal(self) -> None: - """Serialize string literal in placeable.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=(Placeable(expression=StringLiteral(value="test value")),) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert '{ "test value" }' in result - - def test_serialize_string_literal_with_escapes(self) -> None: - """Serialize string literal with escape characters.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable(expression=StringLiteral(value='quote: " backslash: \\')), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert r'{ "quote: \" backslash: \\" }' in result - - def 
test_serialize_number_literal(self) -> None: - """Serialize number literal.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(Placeable(expression=NumberLiteral(value=42, raw="42")),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "{ 42 }" in result - - def test_serialize_variable_reference(self) -> None: - """Serialize variable reference.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable(expression=VariableReference(id=Identifier(name="count"))), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "{ $count }" in result - - -# ============================================================================ -# REFERENCE SERIALIZATION -# ============================================================================ - - -class TestSerializerReferences: - """Test reference serialization.""" - - def test_serialize_message_reference_simple(self) -> None: - """Serialize message reference without attribute.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=MessageReference( - id=Identifier(name="other"), attribute=None - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "{ other }" in result - - def test_serialize_message_reference_with_attribute(self) -> None: - """Serialize message reference with attribute.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=MessageReference( - id=Identifier(name="button"), - attribute=Identifier(name="tooltip"), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "{ button.tooltip }" in result - - def test_serialize_term_reference_simple(self) -> None: - """Serialize term reference without attribute.""" - msg = 
Message( - id=Identifier(name="welcome"), - value=Pattern( - elements=( - TextElement(value="Welcome to "), - Placeable( - expression=TermReference( - id=Identifier(name="brand"), attribute=None, arguments=None - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "{ -brand }" in result - - def test_serialize_term_reference_with_attribute(self) -> None: - """Serialize term reference with attribute.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=TermReference( - id=Identifier(name="brand"), - attribute=Identifier(name="version"), - arguments=None, - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "{ -brand.version }" in result - - def test_serialize_term_reference_with_arguments(self) -> None: - """Serialize term reference with call arguments.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=TermReference( - id=Identifier(name="brand"), - attribute=None, - arguments=CallArguments( - positional=(NumberLiteral(value=1, raw="1"),), named=() - ), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "{ -brand(1) }" in result - - -# ============================================================================ -# FUNCTION REFERENCE SERIALIZATION -# ============================================================================ - - -class TestSerializerFunctionReference: - """Test function reference serialization.""" - - def test_serialize_function_no_args(self) -> None: - """Serialize function call with no arguments.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=FunctionReference( - id=Identifier(name="NOW"), - arguments=CallArguments(positional=(), named=()), - ) - ), - ) - ), - attributes=(), - ) - 
resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "{ NOW() }" in result - - def test_serialize_function_with_positional_args(self) -> None: - """Serialize function with positional arguments.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=FunctionReference( - id=Identifier(name="NUMBER"), - arguments=CallArguments( - positional=( - VariableReference(id=Identifier(name="value")), - ), - named=(), - ), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "{ NUMBER($value) }" in result - - def test_serialize_function_with_multiple_positional_args(self) -> None: - """Serialize function with multiple positional arguments.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=FunctionReference( - id=Identifier(name="TEST"), - arguments=CallArguments( - positional=( - NumberLiteral(value=1, raw="1"), - NumberLiteral(value=2, raw="2"), - StringLiteral(value="three"), - ), - named=(), - ), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert '{ TEST(1, 2, "three") }' in result - - def test_serialize_function_with_named_args(self) -> None: - """Serialize function with named arguments.""" - msg = Message( - id=Identifier(name="price"), - value=Pattern( - elements=( - Placeable( - expression=FunctionReference( - id=Identifier(name="NUMBER"), - arguments=CallArguments( - positional=(), - named=( - NamedArgument( - name=Identifier(name="minimumFractionDigits"), - value=NumberLiteral(value=2, raw="2"), - ), - ), - ), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "{ NUMBER(minimumFractionDigits: 2) }" in result - - def test_serialize_function_with_mixed_args(self) -> None: - """Serialize function with both positional and named 
arguments.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=FunctionReference( - id=Identifier(name="DATETIME"), - arguments=CallArguments( - positional=( - VariableReference(id=Identifier(name="date")), - ), - named=( - NamedArgument( - name=Identifier(name="dateStyle"), - value=StringLiteral(value="long"), - ), - NamedArgument( - name=Identifier(name="timeStyle"), - value=StringLiteral(value="short"), - ), - ), - ), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "{ DATETIME($date, dateStyle: " in result - assert 'timeStyle: "short") }' in result - - -# ============================================================================ -# SELECT EXPRESSION SERIALIZATION -# ============================================================================ - - -class TestSerializerSelectExpression: - """Test select expression serialization.""" - - def test_serialize_simple_select(self) -> None: - """Serialize select expression with variants.""" - msg = Message( - id=Identifier(name="emails"), - value=Pattern( - elements=( - Placeable( - expression=SelectExpression( - selector=VariableReference(id=Identifier(name="count")), - variants=( - Variant( - key=Identifier(name="one"), - value=Pattern( - elements=(TextElement(value="one email"),) - ), - default=False, - ), - Variant( - key=Identifier(name="other"), - value=Pattern( - elements=(TextElement(value="many emails"),) - ), - default=True, - ), - ), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "{ $count ->" in result - assert "[one] one email" in result - assert "*[other] many emails" in result - - def test_serialize_select_with_number_keys(self) -> None: - """Serialize select expression with numeric variant keys.""" - msg = Message( - id=Identifier(name="items"), - value=Pattern( - elements=( - Placeable( - 
expression=SelectExpression( - selector=VariableReference(id=Identifier(name="count")), - variants=( - Variant( - key=NumberLiteral(value=0, raw="0"), - value=Pattern( - elements=(TextElement(value="no items"),) - ), - default=False, - ), - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern( - elements=(TextElement(value="one item"),) - ), - default=False, - ), - Variant( - key=Identifier(name="other"), - value=Pattern( - elements=(TextElement(value="many items"),) - ), - default=True, - ), - ), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "[0] no items" in result - assert "[1] one item" in result - assert "*[other] many items" in result - - -# ============================================================================ -# COMPLEX INTEGRATION TESTS -# ============================================================================ - - -class TestSerializerIntegration: - """Test serializer with complex AST structures.""" - - def test_serialize_mixed_resource(self) -> None: - """Serialize resource with comments, messages, and terms.""" - resource = Resource( - entries=( - Comment(content="Header comment", type=CommentType.COMMENT), - Message( - id=Identifier(name="hello"), - value=Pattern(elements=(TextElement(value="Hello"),)), - attributes=(), - ), - Term( - id=Identifier(name="brand"), - value=Pattern(elements=(TextElement(value="Firefox"),)), - attributes=(), - ), - ) - ) - - result = serialize(resource) - - assert "# Header comment\n" in result - assert "hello = Hello\n" in result - assert "-brand = Firefox\n" in result - - def test_serialize_complex_message_with_select(self) -> None: - """Serialize message with select expression and variables.""" - msg = Message( - id=Identifier(name="user-files"), - value=Pattern( - elements=( - TextElement(value="User "), - Placeable(expression=VariableReference(id=Identifier(name="name"))), - TextElement(value=" has "), - Placeable( - 
expression=SelectExpression( - selector=VariableReference(id=Identifier(name="count")), - variants=( - Variant( - key=Identifier(name="one"), - value=Pattern( - elements=(TextElement(value="one file"),) - ), - default=False, - ), - Variant( - key=Identifier(name="other"), - value=Pattern( - elements=( - Placeable( - expression=VariableReference( - id=Identifier(name="count") - ) - ), - TextElement(value=" files"), - ) - ), - default=True, - ), - ), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "User { $name } has { $count ->" in result - assert "[one] one file" in result - assert "*[other] { $count } files" in result - - def test_serialize_message_with_all_features(self) -> None: - """Serialize message using all features.""" - msg = Message( - id=Identifier(name="complex"), - value=Pattern( - elements=( - TextElement(value="Price: "), - Placeable( - expression=FunctionReference( - id=Identifier(name="NUMBER"), - arguments=CallArguments( - positional=( - VariableReference(id=Identifier(name="price")), - ), - named=( - NamedArgument( - name=Identifier(name="minimumFractionDigits"), - value=NumberLiteral(value=2, raw="2"), - ), - ), - ), - ) - ), - ) - ), - attributes=( - Attribute( - id=Identifier(name="tooltip"), - value=Pattern(elements=(TextElement(value="Product price"),)), - ), - ), - comment=Comment(content="Displays formatted price", type=CommentType.COMMENT), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "# Displays formatted price\n" in result - assert "complex = Price: { NUMBER($price, minimumFractionDigits: 2) }\n" in result - assert " .tooltip = Product price\n" in result - - -# ============================================================================ -# TEXT ELEMENT BRACE SERIALIZATION TESTS -# ============================================================================ - - -class TestTextElementBraceSerialization: - """Test that literal 
braces in TextElements are serialized per Fluent Spec 1.0. - - Per Fluent Spec: Backslash has no escaping power in TextElements. - Literal braces MUST be expressed as StringLiterals within Placeables: - - { must be serialized as {"{"} (Placeable containing StringLiteral) - - } must be serialized as {"}"} (Placeable containing StringLiteral) - - This produces valid FTL that compliant parsers accept. - """ - - def test_open_brace_becomes_string_literal_placeable(self) -> None: - """Open brace { in text becomes {"{"} per Fluent spec.""" - msg = Message( - id=Identifier(name="brace"), - value=Pattern(elements=(TextElement(value="Use {variable} syntax"),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - # Braces become StringLiteral Placeables: { "{" }variable{ "}" } - assert 'brace = Use { "{" }variable{ "}" } syntax\n' in result - - def test_close_brace_becomes_string_literal_placeable(self) -> None: - """Close brace } in text becomes {"}"} per Fluent spec.""" - msg = Message( - id=Identifier(name="json"), - value=Pattern(elements=(TextElement(value='{"key": "value"}'),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - # Both { and } become StringLiteral Placeables - assert '{ "{" }' in result - assert '{ "}" }' in result - # Full pattern: { "{" }"key": "value"{ "}" } - assert 'json = { "{" }"key": "value"{ "}" }\n' in result - - def test_backslash_not_escaped_in_text_elements(self) -> None: - """Backslash has no special meaning in TextElements per Fluent spec. - - Per spec: backslash only has escaping power in StringLiterals, - not in TextElements. A backslash in text is preserved as-is. 
- """ - msg = Message( - id=Identifier(name="path"), - value=Pattern(elements=(TextElement(value="C:\\Users\\file.txt"),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - # Backslash preserved as-is (no escaping in TextElements) - assert "path = C:\\Users\\file.txt\n" in result - - def test_backslash_before_brace_preserved(self) -> None: - """Backslash before brace: backslash preserved, brace becomes placeable.""" - msg = Message( - id=Identifier(name="escaped"), - value=Pattern(elements=(TextElement(value="Literal \\{ brace"),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - # Backslash preserved, brace becomes StringLiteral Placeable - assert 'escaped = Literal \\{ "{" } brace\n' in result - - def test_preserve_text_without_braces(self) -> None: - """Text without braces should not be modified.""" - msg = Message( - id=Identifier(name="plain"), - value=Pattern(elements=(TextElement(value="Hello, World!"),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert "plain = Hello, World!\n" in result - - def test_mixed_text_and_placeables(self) -> None: - """Text with literal braces alongside real placeables.""" - msg = Message( - id=Identifier(name="mixed"), - value=Pattern( - elements=( - TextElement(value="JSON: {key} = "), - Placeable(expression=VariableReference(id=Identifier(name="value"))), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - # Literal braces become StringLiteral Placeables, real placeable unchanged - assert 'mixed = JSON: { "{" }key{ "}" } = { $value }\n' in result - - def test_multiple_consecutive_braces(self) -> None: - """Multiple consecutive braces each become separate placeables.""" - msg = Message( - id=Identifier(name="multi"), - value=Pattern(elements=(TextElement(value="{{nested}}"),)), - attributes=(), - ) - resource = 
Resource(entries=(msg,)) - - result = serialize(resource) - - # Each brace becomes its own placeable - assert 'multi = { "{" }{ "{" }' in result - assert '{ "}" }{ "}" }' in result - - def test_brace_at_start_of_text(self) -> None: - """Brace at start of text element.""" - msg = Message( - id=Identifier(name="start"), - value=Pattern(elements=(TextElement(value="{start"),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert 'start = { "{" }start\n' in result - - def test_brace_at_end_of_text(self) -> None: - """Brace at end of text element.""" - msg = Message( - id=Identifier(name="end"), - value=Pattern(elements=(TextElement(value="end}"),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert 'end = end{ "}" }\n' in result - - def test_only_braces(self) -> None: - """Text containing only braces.""" - msg = Message( - id=Identifier(name="braces"), - value=Pattern(elements=(TextElement(value="{}"),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - assert 'braces = { "{" }{ "}" }\n' in result - - -# ============================================================================ -# IDENTIFIER VALIDATION TESTS -# ============================================================================ - - -class TestIdentifierValidation: - """Test identifier validation during serialization.""" - - def test_invalid_message_id_rejected(self) -> None: - """Invalid message identifier rejected when validate=True. - - Regression test for SER-INVALID-OUTPUT-001. - Parser-produced ASTs have valid identifiers, but programmatically - constructed ASTs can contain arbitrary strings. Serializer should - validate identifiers when validate=True. 
- """ - msg = Message( - id=Identifier(name="invalid message with spaces"), - value=Pattern(elements=(TextElement(value="Test"),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - with pytest.raises(SerializationValidationError, match="Invalid identifier"): - serialize(resource, validate=True) - - def test_invalid_variable_reference_rejected(self) -> None: - """Invalid variable identifier rejected when validate=True.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=VariableReference( - id=Identifier(name="my var") # Space invalid - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - with pytest.raises(SerializationValidationError, match="Invalid identifier"): - serialize(resource, validate=True) - - def test_invalid_identifier_allowed_when_validation_disabled(self) -> None: - """Invalid identifier allowed when validate=False.""" - msg = Message( - id=Identifier(name="invalid id"), - value=Pattern(elements=(TextElement(value="Test"),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - # Should not raise exception - result = serialize(resource, validate=False) - assert "invalid id" in result - - def test_valid_identifier_with_hyphens_and_underscores(self) -> None: - """Valid identifiers with hyphens and underscores pass validation.""" - msg = Message( - id=Identifier(name="valid-id_123"), - value=Pattern(elements=(TextElement(value="Test"),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource, validate=True) - assert "valid-id_123" in result - - -# ============================================================================ -# EDGE CASES AND INTERNAL HELPERS (from test_serializer_edge_cases.py) -# ============================================================================ - - -class TestControlCharacterEscaping: - """Test StringLiteral escaping of all control characters.""" - - def 
test_del_character_escaped_as_unicode(self) -> None: - """DEL character (0x7F) serialized as \\u007F escape sequence.""" - # DEL is a control character that needs Unicode escaping - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable(expression=StringLiteral(value="before\x7Fafter")), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - # DEL must be escaped as \u007F - assert r"\u007F" in result - assert "before" in result - assert "after" in result - - def test_nul_character_escaped(self) -> None: - """NUL character (0x00) serialized as \\u0000 escape sequence.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable(expression=StringLiteral(value="a\x00b")), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - assert r"\u0000" in result - - def test_bel_character_escaped(self) -> None: - """BEL character (0x07) serialized as \\u0007 escape sequence.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable(expression=StringLiteral(value="ring\x07bell")), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - assert r"\u0007" in result - - def test_vertical_tab_escaped(self) -> None: - """Vertical tab (0x0B) serialized as \\u000B escape sequence.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable(expression=StringLiteral(value="a\x0Bb")), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - assert r"\u000B" in result - - def test_form_feed_escaped(self) -> None: - """Form feed (0x0C) serialized as \\u000C escape sequence.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable(expression=StringLiteral(value="page\x0Cbreak")), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - 
result = serialize(resource) - assert r"\u000C" in result - - def test_escape_character_escaped(self) -> None: - """ESC character (0x1B) serialized as \\u001B escape sequence.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable(expression=StringLiteral(value="before\x1Bafter")), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - assert r"\u001B" in result - - @given( - control_char=st.one_of( - st.integers(min_value=0x00, max_value=0x1F), # C0 control characters - st.just(0x7F), # DEL - ) - ) - @example(control_char=0x7F) # Ensure DEL is explicitly tested - @example(control_char=0x00) # NUL - @example(control_char=0x01) # SOH - @example(control_char=0x1F) # Unit separator - def test_all_control_characters_escaped_property(self, control_char: int) -> None: - """All control characters (0x00-0x1F, 0x7F) escaped as Unicode.""" - is_del = control_char == 0x7F - event(f"control_char=0x{control_char:02X}") - event(f"is_del={is_del}") - char = chr(control_char) - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable(expression=StringLiteral(value=f"a{char}b")), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - - # Verify Unicode escape present - expected_escape = f"\\u{control_char:04X}" - assert expected_escape in result - - # Verify the raw control character is NOT in the output - # (it should be escaped) - # Exception: newline/tab which might be normalized by string handling - if char not in "\n\t": - assert char not in result - - -class TestSerializationDepthLimitWithoutValidation: - """Test depth limit enforcement during serialization when validation is disabled. - - Per serializer.py lines 297-299, the serialize method has a try/except - that catches DepthLimitExceededError during the _serialize_resource call. - This is distinct from the validation phase depth check. - - To trigger this: - 1. 
Disable validation (validate=False) - 2. Create AST with nesting that exceeds max_depth - 3. Depth guard triggers during serialization, not validation - """ - - def test_depth_exceeded_during_serialization_not_validation(self) -> None: - """Depth limit enforced during serialization even when validation disabled.""" - # Create deeply nested Placeables beyond the limit - # Start with innermost expression - max_depth = 5 - inner_expr: StringLiteral | Placeable = StringLiteral(value="deep") - - # Build nested Placeables: each Placeable adds one depth level - for _ in range(max_depth + 1): # Exceed limit by 1 - inner_expr = Placeable(expression=inner_expr) - - # Type narrowing: at this point inner_expr is definitely a Placeable - inner_placeable: Placeable = inner_expr # type: ignore[assignment] - - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(inner_placeable,)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - # Validation is disabled - should still catch depth during serialization - with pytest.raises(SerializationDepthError, match="nesting exceeds maximum depth"): - serialize(resource, validate=False, max_depth=max_depth) - - def test_depth_exactly_at_limit_succeeds_without_validation(self) -> None: - """AST exactly at depth limit serializes successfully without validation.""" - max_depth = 5 - inner_expr: StringLiteral | Placeable = StringLiteral(value="ok") - - # Build nested Placeables exactly at limit - for _ in range(max_depth): - inner_expr = Placeable(expression=inner_expr) - - # Type narrowing: at this point inner_expr is definitely a Placeable - inner_placeable: Placeable = inner_expr # type: ignore[assignment] - - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(inner_placeable,)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - # Should succeed - exactly at limit - result = serialize(resource, validate=False, max_depth=max_depth) - assert "ok" in result - - @given( - 
depth_over_limit=st.integers(min_value=1, max_value=10), - max_depth=st.integers(min_value=3, max_value=20), - ) - @example(depth_over_limit=1, max_depth=5) - @example(depth_over_limit=5, max_depth=10) - def test_serialization_depth_property( - self, depth_over_limit: int, max_depth: int - ) -> None: - """Serialization depth limit enforced regardless of validation setting.""" - total = max_depth + depth_over_limit - event(f"max_depth={max_depth}") - event(f"depth_over_limit={depth_over_limit}") - event(f"total_nesting={total}") - # Build AST exceeding depth limit - inner_expr: StringLiteral | Placeable = StringLiteral(value="x") - for _ in range(max_depth + depth_over_limit): - inner_expr = Placeable(expression=inner_expr) - - # Type narrowing: at this point inner_expr is definitely a Placeable - inner_placeable: Placeable = inner_expr # type: ignore[assignment] - - msg = Message( - id=Identifier(name="m"), - value=Pattern(elements=(inner_placeable,)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - # Should raise SerializationDepthError - with pytest.raises(SerializationDepthError): - serialize(resource, validate=False, max_depth=max_depth) - - -class TestJunkWithLeadingWhitespace: - """Test Junk entry serialization with leading whitespace. - - Per serializer.py line 321, when a Junk entry follows another entry - and the Junk content starts with whitespace, the separator logic takes - a different path (pass statement, no additional separator added). 
- - This tests the specific branch: isinstance(entry, Junk) and entry.content[0] in "\\n " - """ - - def test_junk_with_leading_newline_after_message(self) -> None: - """Junk with leading newline after message skips adding separator.""" - msg = Message( - id=Identifier(name="hello"), - value=Pattern(elements=(TextElement(value="World"),)), - attributes=(), - ) - # Junk with leading newline - parser includes preceding whitespace - junk = Junk(content="\ninvalid junk content") - resource = Resource(entries=(msg, junk)) - - result = serialize(resource) - - # Should not have double newline - Junk content already starts with \n - # Result should be: "hello = World\n\ninvalid junk content\n" - # But since Junk already has \n, we don't add another separator - assert "hello = World\n" in result - assert "\ninvalid junk content" in result - # Should NOT have triple newline - assert "\n\n\n" not in result - - def test_junk_with_leading_space_after_message(self) -> None: - """Junk with leading space after message skips adding separator.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="value"),)), - attributes=(), - ) - # Junk with leading space - junk = Junk(content=" some junk") - resource = Resource(entries=(msg, junk)) - - result = serialize(resource) - - # Junk already has leading space, so separator is skipped - assert "test = value\n some junk" in result - - def test_junk_without_leading_whitespace_gets_separator(self) -> None: - """Junk without leading whitespace gets normal separator.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="value"),)), - attributes=(), - ) - # Junk WITHOUT leading whitespace - junk = Junk(content="junk content") - resource = Resource(entries=(msg, junk)) - - result = serialize(resource) - - # Normal separator added - assert "test = value\n" in result - assert "\njunk content" in result - - def test_empty_junk_content_gets_separator(self) -> None: - 
"""Empty Junk content gets normal separator (no [0] index access).""" - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="value"),)), - attributes=(), - ) - # Empty junk - entry.content[0] won't be accessed due to short-circuit - junk = Junk(content="") - resource = Resource(entries=(msg, junk)) - - result = serialize(resource) - - # Empty junk still gets separator - assert "test = value\n" in result - - @given( - leading_char=st.sampled_from(["\n", " ", "\t", "j"]), - has_content=st.booleans(), - ) - @example(leading_char="\n", has_content=True) - @example(leading_char=" ", has_content=True) - @example(leading_char="j", has_content=True) - def test_junk_separator_logic_property( - self, leading_char: str, has_content: bool - ) -> None: - """Junk separator logic handles various leading characters correctly.""" - is_ws = leading_char in ("\n", " ", "\t") - event(f"leading_char_is_whitespace={is_ws}") - event(f"has_content={has_content}") - msg = Message( - id=Identifier(name="m"), - value=Pattern(elements=(TextElement(value="v"),)), - attributes=(), - ) - - junk = ( - Junk(content=f"{leading_char}content") - if has_content - else Junk(content="") - ) - - resource = Resource(entries=(msg, junk)) - - # Should not raise - serialization should handle all cases - result = serialize(resource) - assert isinstance(result, str) - assert "m = v" in result - - -class TestPatternWithoutBraces: - """Test Pattern serialization path when text has no braces. - - Per serializer.py line 483->467, there's an else branch when text - contains neither { nor } characters. This tests the optimization path - that emits text directly without brace handling. 
- """ - - def test_text_without_braces_direct_output(self) -> None: - """Text without braces takes direct output path.""" - msg = Message( - id=Identifier(name="plain"), - value=Pattern( - elements=( - TextElement(value="No braces here, just plain text!"), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - - # Should contain the text as-is (no brace escaping needed) - assert "No braces here, just plain text!" in result - # Should NOT have any brace-related escaping - assert '{ "{" }' not in result - assert '{ "}" }' not in result - - def test_text_with_only_safe_punctuation(self) -> None: - """Text with punctuation but no braces serializes directly.""" - msg = Message( - id=Identifier(name="punct"), - value=Pattern( - elements=( - TextElement(value="Hello, world! How are you?"), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - - assert "Hello, world! How are you?" in result - # No brace escaping - assert '{ "{" }' not in result - - def test_text_with_numbers_and_symbols(self) -> None: - """Text with numbers and safe symbols serializes directly.""" - msg = Message( - id=Identifier(name="data"), - value=Pattern( - elements=( - TextElement(value="Price: $42.00 (20% off)"), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - - assert "Price: $42.00 (20% off)" in result - - @given( - text=st.text( - alphabet=st.characters( - whitelist_categories=("Lu", "Ll", "Nd", "Zs"), - whitelist_characters="!@#$%^&*()_+-=[]|;:'\",.<>?/~`", - ), - min_size=1, - max_size=100, - ).filter(lambda t: "{" not in t and "}" not in t) - ) - @example(text="Simple text without any braces") - @example(text="Numbers 123 and symbols !@#") - def test_brace_free_text_property(self, text: str) -> None: - """Text without braces always serializes without brace escaping.""" - event(f"input_len={len(text)}") - assume(text.strip()) # Non-empty after 
stripping - # Leading whitespace gets wrapped in a StringLiteral placeable for - # roundtrip correctness (see _serialize_pattern); not this test's concern. - assume(not text[0].isspace()) - - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value=text),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - # Should contain the original text - assert text in result - # Should NOT have brace escaping since input has no braces - assert '{ "{" }' not in result or "{" in text # Only if original had them - assert '{ "}" }' not in result or "}" in text - - -class TestMultilinePatternIndentation: - """Test multi-line pattern indentation handling. - - Per serializer.py lines 474-475, newlines in TextElements are replaced - with newline + 4-space indentation for FTL continuation lines. - """ - - def test_multiline_text_indented(self) -> None: - """Newlines in TextElement followed by 4-space indentation.""" - msg = Message( - id=Identifier(name="multi"), - value=Pattern( - elements=( - TextElement(value="Line 1\nLine 2\nLine 3"), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - - # Each newline should be followed by 4 spaces (continuation indent) - assert "Line 1\n Line 2\n Line 3" in result - - def test_multiline_with_braces_indented_and_escaped(self) -> None: - """Multiline text with braces: both indentation and brace escaping.""" - msg = Message( - id=Identifier(name="complex"), - value=Pattern( - elements=( - TextElement(value="First {line}\nSecond }line"), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - - # Should have indentation AND brace escaping - assert "First" in result - assert "Second" in result - assert '{ "{" }' in result # { escaped - assert '{ "}" }' in result # } escaped - # Newline creates indentation - assert "\n " in result - - @given( - lines=st.lists( - st.text( - 
alphabet=st.characters( - whitelist_categories=("Lu", "Ll", "Nd", "Zs"), - min_codepoint=0x20, # Printable ASCII and above - ), - min_size=1, - max_size=50, - ).filter(lambda t: "{" not in t and "}" not in t), - min_size=2, - max_size=5, - ) - ) - @example(lines=["First line", "Second line", "Third line"]) - def test_multiline_indentation_property(self, lines: list[str]) -> None: - """Multiline patterns always indent continuation lines with 4 spaces.""" - event(f"line_count={len(lines)}") - assume(all(line.strip() for line in lines)) # Non-empty lines - # Leading whitespace on the first line gets wrapped in a StringLiteral - # placeable for roundtrip correctness; not this test's concern. - assume(not lines[0][0].isspace()) - - text = "\n".join(lines) - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value=text),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - - # After first line, each line should be indented with 4 spaces - for i, line in enumerate(lines): - if i == 0: - # First line not indented - assert lines[0] in result - else: - # Subsequent lines indented - assert f"\n {line}" in result or line in result - - -class TestMixedPatternElements: - """Test Pattern serialization with mixed TextElement and Placeable elements. - - This ensures the elif branch at line 483 is properly covered when - iterating through pattern elements that alternate between types. 
- """ - - def test_mixed_text_and_placeable_elements(self) -> None: - """Pattern with alternating TextElement and Placeable elements.""" - msg = Message( - id=Identifier(name="mixed"), - value=Pattern( - elements=( - TextElement(value="Start "), - Placeable(expression=VariableReference(id=Identifier(name="var1"))), - TextElement(value=" middle "), - Placeable(expression=VariableReference(id=Identifier(name="var2"))), - TextElement(value=" end"), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - - assert "Start { $var1 } middle { $var2 } end" in result - - def test_multiple_consecutive_placeables(self) -> None: - """Pattern with consecutive Placeable elements (no text between).""" - msg = Message( - id=Identifier(name="consecutive"), - value=Pattern( - elements=( - Placeable(expression=VariableReference(id=Identifier(name="a"))), - Placeable(expression=VariableReference(id=Identifier(name="b"))), - Placeable(expression=VariableReference(id=Identifier(name="c"))), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - - assert "{ $a }{ $b }{ $c }" in result - - def test_text_then_multiple_placeables(self) -> None: - """Pattern starting with text followed by multiple placeables.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - TextElement(value="Prefix: "), - Placeable(expression=StringLiteral(value="one")), - Placeable(expression=StringLiteral(value="two")), - Placeable(expression=StringLiteral(value="three")), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - - assert 'Prefix: { "one" }{ "two" }{ "three" }' in result - - @given( - num_text=st.integers(min_value=1, max_value=5), - num_placeable=st.integers(min_value=1, max_value=5), - ) - @example(num_text=3, num_placeable=2) - @example(num_text=1, num_placeable=4) - def test_mixed_pattern_property(self, num_text: int, num_placeable: int) 
-> None: - """Patterns with varying numbers of text and placeable elements serialize correctly.""" - event(f"num_text={num_text}") - event(f"num_placeable={num_placeable}") - elements: list[TextElement | Placeable] = [] - - # Alternate between text and placeable - for i in range(max(num_text, num_placeable)): - if i < num_text: - elements.append(TextElement(value=f"text{i} ")) - if i < num_placeable: - elements.append( - Placeable( - expression=VariableReference(id=Identifier(name=f"v{i}")) - ) - ) - - msg = Message( - id=Identifier(name="m"), - value=Pattern(elements=tuple(elements)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource) - assert "m = " in result - - -class TestSelectExpressionVariantKeys: - """Test SelectExpression with both Identifier and NumberLiteral variant keys. - - Ensures match statement at line 619-623 covers both cases completely, - including exit paths (622->625). - """ - - def test_select_with_identifier_keys_only(self) -> None: - """SelectExpression with all Identifier variant keys.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=SelectExpression( - selector=VariableReference(id=Identifier(name="count")), - variants=( - Variant( - key=Identifier(name="one"), - value=Pattern( - elements=(TextElement(value="One item"),) - ), - default=False, - ), - Variant( - key=Identifier(name="other"), - value=Pattern( - elements=(TextElement(value="Many items"),) - ), - default=True, - ), - ), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - - assert "[one]" in result - assert "*[other]" in result - assert "One item" in result - assert "Many items" in result - - def test_select_with_number_keys_only(self) -> None: - """SelectExpression with all NumberLiteral variant keys.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=SelectExpression( - 
selector=VariableReference(id=Identifier(name="count")), - variants=( - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern( - elements=(TextElement(value="Exactly one"),) - ), - default=False, - ), - Variant( - key=NumberLiteral(value=0, raw="0"), - value=Pattern( - elements=(TextElement(value="Zero"),) - ), - default=True, - ), - ), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - - assert "[1]" in result - assert "*[0]" in result - assert "Exactly one" in result - assert "Zero" in result - - def test_select_with_mixed_identifier_and_number_keys(self) -> None: - """SelectExpression with both Identifier and NumberLiteral keys.""" - msg = Message( - id=Identifier(name="mixed"), - value=Pattern( - elements=( - Placeable( - expression=SelectExpression( - selector=VariableReference(id=Identifier(name="val")), - variants=( - Variant( - key=NumberLiteral(value=0, raw="0"), - value=Pattern( - elements=(TextElement(value="Zero"),) - ), - default=False, - ), - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern( - elements=(TextElement(value="One"),) - ), - default=False, - ), - Variant( - key=Identifier(name="other"), - value=Pattern( - elements=(TextElement(value="Other"),) - ), - default=True, - ), - ), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - - # Both NumberLiteral and Identifier cases exercised - assert "[0]" in result - assert "[1]" in result - assert "*[other]" in result - - -class TestFunctionReferenceValidation: - """Test FunctionReference validation path coverage. - - Ensures the FunctionReference case at line 183-193 in _validate_expression - is fully covered, including exit paths (185->exit). 
- """ - - def test_function_reference_with_positional_args_validated(self) -> None: - """FunctionReference with positional arguments passes validation.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=FunctionReference( - id=Identifier(name="NUMBER"), - arguments=CallArguments( - positional=( - VariableReference(id=Identifier(name="count")), - ), - named=(), - ), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - # Should validate successfully - result = serialize(resource, validate=True) - assert "NUMBER($count)" in result - - def test_function_reference_with_named_args_validated(self) -> None: - """FunctionReference with named arguments passes validation.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=FunctionReference( - id=Identifier(name="DATETIME"), - arguments=CallArguments( - positional=(), - named=( - NamedArgument( - name=Identifier(name="month"), - value=StringLiteral(value="long"), - ), - NamedArgument( - name=Identifier(name="day"), - value=StringLiteral(value="numeric"), - ), - ), - ), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - # Should validate successfully - result = serialize(resource, validate=True) - assert "DATETIME" in result - assert 'month: "long"' in result - assert 'day: "numeric"' in result - - def test_function_reference_with_mixed_args_validated(self) -> None: - """FunctionReference with both positional and named args validated.""" - msg = Message( - id=Identifier(name="formatted"), - value=Pattern( - elements=( - Placeable( - expression=FunctionReference( - id=Identifier(name="NUMBER"), - arguments=CallArguments( - positional=( - VariableReference(id=Identifier(name="amount")), - ), - named=( - NamedArgument( - name=Identifier(name="style"), - value=StringLiteral(value="currency"), - ), - NamedArgument( - name=Identifier(name="currency"), - 
value=StringLiteral(value="USD"), - ), - ), - ), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - result = serialize(resource, validate=True) - assert "NUMBER($amount" in result - assert 'style: "currency"' in result - assert 'currency: "USD"' in result - - -# ============================================================================= -# _classify_line unit tests (covers lines 358, 361) -# ============================================================================= - - -class TestClassifyLine: - """Direct unit tests for _classify_line continuation line classifier.""" - - def test_empty_line(self) -> None: - """Empty string classified as EMPTY.""" - kind, ws_len = _classify_line("") - assert kind is _LineKind.EMPTY - assert ws_len == 0 - - def test_whitespace_only_single_space(self) -> None: - """Single space classified as WHITESPACE_ONLY.""" - kind, ws_len = _classify_line(" ") - assert kind is _LineKind.WHITESPACE_ONLY - assert ws_len == 0 - - def test_whitespace_only_multiple_spaces(self) -> None: - """Multiple spaces classified as WHITESPACE_ONLY.""" - kind, ws_len = _classify_line(" ") - assert kind is _LineKind.WHITESPACE_ONLY - assert ws_len == 0 - - def test_syntax_leading_dot_no_whitespace(self) -> None: - """Dot at position 0 classified as SYNTAX_LEADING with ws_len=0.""" - kind, ws_len = _classify_line(".") - assert kind is _LineKind.SYNTAX_LEADING - assert ws_len == 0 - - def test_syntax_leading_dot_with_whitespace(self) -> None: - """Dot preceded by spaces classified as SYNTAX_LEADING.""" - kind, ws_len = _classify_line(" .attr") - assert kind is _LineKind.SYNTAX_LEADING - assert ws_len == 3 - - def test_syntax_leading_asterisk(self) -> None: - """Asterisk preceded by spaces classified as SYNTAX_LEADING.""" - kind, ws_len = _classify_line(" *") - assert kind is _LineKind.SYNTAX_LEADING - assert ws_len == 3 - - def test_syntax_leading_bracket(self) -> None: - """Open bracket preceded by spaces classified as 
SYNTAX_LEADING.""" - kind, ws_len = _classify_line(" [key]") - assert kind is _LineKind.SYNTAX_LEADING - assert ws_len == 2 - - def test_normal_text(self) -> None: - """Regular text classified as NORMAL.""" - kind, ws_len = _classify_line("hello") - assert kind is _LineKind.NORMAL - assert ws_len == 0 - - def test_normal_text_with_leading_whitespace(self) -> None: - """Text with leading whitespace but non-syntax first char is NORMAL.""" - kind, ws_len = _classify_line(" hello") - assert kind is _LineKind.NORMAL - assert ws_len == 0 - - def test_dot_after_text_is_normal(self) -> None: - """Dot NOT as first non-ws character is NORMAL.""" - kind, ws_len = _classify_line("x.y") - assert kind is _LineKind.NORMAL - assert ws_len == 0 - - -# ============================================================================= -# _escape_text unit tests (covers brace escaping paths) -# ============================================================================= - - -class TestEscapeText: - """Direct unit tests for _escape_text brace escaping.""" - - def test_no_braces(self) -> None: - """Text without braces passes through unchanged.""" - output: list[str] = [] - _escape_text("hello world", output) - assert "".join(output) == "hello world" - - def test_open_brace(self) -> None: - """Open brace escaped as StringLiteral placeable.""" - output: list[str] = [] - _escape_text("before{after", output) - assert "".join(output) == 'before{ "{" }after' - - def test_close_brace(self) -> None: - """Close brace escaped as StringLiteral placeable.""" - output: list[str] = [] - _escape_text("x}y", output) - assert "".join(output) == 'x{ "}" }y' - - def test_both_braces(self) -> None: - """Both brace types escaped.""" - output: list[str] = [] - _escape_text("{}", output) - assert "".join(output) == '{ "{" }{ "}" }' - - def test_empty_text(self) -> None: - """Empty text produces no output.""" - output: list[str] = [] - _escape_text("", output) - assert output == [] - - def 
test_only_open_brace(self) -> None: - """Single open brace.""" - output: list[str] = [] - _escape_text("{", output) - assert "".join(output) == '{ "{" }' - - def test_braces_in_middle_of_text(self) -> None: - """Braces surrounded by text.""" - output: list[str] = [] - _escape_text("a{b}c", output) - assert "".join(output) == 'a{ "{" }b{ "}" }c' - - def test_consecutive_braces(self) -> None: - """Multiple consecutive braces.""" - output: list[str] = [] - _escape_text("{{", output) - assert "".join(output) == '{ "{" }{ "{" }' - - -# ============================================================================= -# _emit_classified_line integration tests (covers lines 742-751) -# ============================================================================= - - -class TestEmitClassifiedLineCoverage: - """Roundtrip tests that exercise _emit_classified_line branches.""" - - _parser = FluentParserV1() - - def _roundtrip_check(self, result: str) -> None: - """Verify parse-serialize roundtrip produces no Junk and is idempotent.""" - reparsed = self._parser.parse(result) - assert not any(isinstance(e, Junk) for e in reparsed.entries) - s2 = serialize(reparsed) - assert result == s2 - - def test_whitespace_only_continuation_line(self) -> None: - """Multiline text with whitespace-only continuation (lines 742-744).""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=(TextElement(value="line1\n \nline3"),) - ), - attributes=(), - ) - result = serialize(Resource(entries=(msg,))) - assert '{ " " }' in result - self._roundtrip_check(result) - - def test_syntax_leading_dot(self) -> None: - """Continuation line with dot as first non-ws char (lines 746-751).""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=(TextElement(value="line1\n.attr"),) - ), - attributes=(), - ) - result = serialize(Resource(entries=(msg,))) - assert '{ "." 
}' in result - self._roundtrip_check(result) - - def test_syntax_leading_with_ws_prefix(self) -> None: - """Syntax char preceded by whitespace (ws_len > 0 branch). - - Content spaces before the syntax char are wrapped in a StringLiteral - placeable so the parser cannot absorb them as structural indent. - """ - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=(TextElement(value="line1\n .something"),) - ), - attributes=(), - ) - result = serialize(Resource(entries=(msg,))) - assert '{ " " }' in result # Content spaces wrapped - assert '{ "." }' in result # Syntax char wrapped - self._roundtrip_check(result) - - def test_syntax_leading_with_remaining_text(self) -> None: - """Syntax char followed by additional text (remaining branch).""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=(TextElement(value="line1\n*default value"),) - ), - attributes=(), - ) - result = serialize(Resource(entries=(msg,))) - assert '{ "*" }' in result - assert "default value" in result - self._roundtrip_check(result) - - def test_syntax_leading_bracket_with_content(self) -> None: - """Bracket syntax char with trailing content.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=(TextElement(value="line1\n[not a variant"),) - ), - attributes=(), - ) - result = serialize(Resource(entries=(msg,))) - assert '{ "[" }' in result - self._roundtrip_check(result) - - def test_syntax_leading_ws_prefix_roundtrip_promoted(self) -> None: - """Content spaces before syntax char survive parse-serialize roundtrip. - - Promoted from Atheris fuzzer finding (finding_0001): convergence failure - S(AST) != S(P(S(AST))) when continuation line had content whitespace - before a wrapped syntax character. The parser absorbed content spaces - as structural indent during common-indent stripping. 
- """ - msg = Message( - id=Identifier(name="fuec"), - value=Pattern( - elements=( - TextElement(value=" dS7aQ\n .h?Q"), - ) - ), - attributes=(), - ) - result = serialize(Resource(entries=(msg,))) - # Leading 4 spaces wrapped at pattern level - assert '{ " " }' in result - # Content spaces before syntax char wrapped at line level - assert '{ " " }' in result - assert '{ "." }' in result - self._roundtrip_check(result) - - def test_syntax_char_only_no_remaining(self) -> None: - """Continuation line is just a syntax char, no remaining text (750->exit).""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=(TextElement(value="line1\n."),) - ), - attributes=(), - ) - result = serialize(Resource(entries=(msg,))) - assert '{ "." }' in result - reparsed = self._parser.parse(result) - assert not any(isinstance(e, Junk) for e in reparsed.entries) - - -# ============================================================================= -# Pattern edge cases (covers lines 643->645, 699-700, 723->690, 871->874) -# ============================================================================= - - -class TestPatternEmissionEdgeCases: - """Tests for pattern serialization edge cases.""" - - _parser = FluentParserV1() - - def test_first_text_element_all_spaces(self) -> None: - """First TextElement is all spaces: leading_ws consumed entirely (lines 699-700).""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - TextElement(value=" "), - Placeable( - expression=VariableReference(id=Identifier(name="x")) - ), - ) - ), - attributes=(), - ) - result = serialize(Resource(entries=(msg,))) - assert '{ " " }' in result - assert "$x" in result - - def test_placeable_not_last_element(self) -> None: - """Placeable followed by TextElement (loop continuation 723->690).""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - Placeable( - expression=VariableReference( - id=Identifier(name="name") - ) - ), - 
TextElement(value=" said hello"), - ) - ), - attributes=(), - ) - result = serialize(Resource(entries=(msg,))) - assert "$name" in result - assert "said hello" in result - - def test_intra_element_separate_line_trigger(self) -> None: - """Single TextElement with embedded newline + NORMAL leading ws (643->645).""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - TextElement( - value="line1\n normal with leading whitespace" - ), - ) - ), - attributes=(), - ) - result = serialize(Resource(entries=(msg,))) - # Separate-line mode activated: pattern starts on new line - assert result.startswith("test = \n ") - # Roundtrip - reparsed = self._parser.parse(result) - assert not any(isinstance(e, Junk) for e in reparsed.entries) - - def _roundtrip_convergence(self, source: str) -> None: - """Verify S(P(x)) == S(P(S(P(x)))) for an FTL source string.""" - parsed = self._parser.parse(source) - s1 = serialize(parsed) - reparsed = self._parser.parse(s1) - s2 = serialize(reparsed) - assert s1 == s2, f"Convergence failure:\nS1: {s1!r}\nS2: {s2!r}" - - def test_cross_element_ws_only_no_separate_line_promoted(self) -> None: - """WHITESPACE_ONLY cross-element does not trigger separate-line mode. - - Promoted from Atheris roundtrip fuzzer finding: convergence failure - S(P(x)) != S(P(S(P(x)))) when a whitespace-only TextElement followed - a newline-ending TextElement. The cross-element check triggered - separate-line mode; the serializer wrapped the spaces in a Placeable; - on re-parse the Placeable was opaque to the cross-element check, - so separate-line mode did not trigger, producing different output. 
- """ - self._roundtrip_convergence("aaaaa =\n h\n \n") - - def test_cross_element_ws_only_direct_ast(self) -> None: - """Cross-element WHITESPACE_ONLY: inline mode, content wrapped.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - TextElement(value="h\n"), - TextElement(value=" "), - ) - ), - attributes=(), - ) - result = serialize(Resource(entries=(msg,))) - # Should NOT use separate-line mode (WHITESPACE_ONLY handled by wrapping) - assert result.startswith("test = h") - assert '{ " " }' in result - reparsed = self._parser.parse(result) - assert not any(isinstance(e, Junk) for e in reparsed.entries) - s2 = serialize(reparsed) - assert result == s2 - - def test_cross_element_syntax_leading_no_separate_line(self) -> None: - """Cross-element SYNTAX_LEADING: inline mode, content wrapped.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - TextElement(value="h\n"), - TextElement(value=" .dotcontent"), - ) - ), - attributes=(), - ) - result = serialize(Resource(entries=(msg,))) - # Should NOT use separate-line mode (SYNTAX_LEADING handled by wrapping) - assert result.startswith("test = h") - assert '{ " " }' in result - assert '{ "." 
}' in result - reparsed = self._parser.parse(result) - assert not any(isinstance(e, Junk) for e in reparsed.entries) - s2 = serialize(reparsed) - assert result == s2 - - def test_cross_element_normal_still_triggers_separate_line(self) -> None: - """Cross-element NORMAL: separate-line mode correctly activates.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern( - elements=( - TextElement(value="h\n"), - TextElement(value=" normal text"), - ) - ), - attributes=(), - ) - result = serialize(Resource(entries=(msg,))) - # NORMAL with leading whitespace needs separate-line mode - assert result.startswith("test = \n ") - reparsed = self._parser.parse(result) - assert not any(isinstance(e, Junk) for e in reparsed.entries) - - def test_number_literal_variant_key(self) -> None: - """SelectExpression with NumberLiteral variant key (line 871->874).""" - sel = SelectExpression( - selector=VariableReference(id=Identifier(name="count")), - variants=( - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern( - elements=(TextElement(value="one item"),) - ), - ), - Variant( - key=Identifier(name="other"), - value=Pattern( - elements=(TextElement(value="many items"),) - ), - default=True, - ), - ), - ) - msg = Message( - id=Identifier(name="items"), - value=Pattern( - elements=(Placeable(expression=sel),) - ), - attributes=(), - ) - result = serialize(Resource(entries=(msg,))) - assert "[1]" in result - assert "[other]" in result - - -# ============================================================================= -# Defensive re-raise tests (covers lines 286, 449) -# ============================================================================= - - -class TestDefensiveReRaises: - """Test defensive re-raise paths for non-RESOLUTION FrozenFluentError.""" - - def test_validate_resource_non_resolution_error_propagates( - self, monkeypatch: pytest.MonkeyPatch - ) -> None: - """Non-RESOLUTION FrozenFluentError re-raised from validation (line 286).""" - msg = Message( 
- id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="hello"),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - def fake_validate_pattern( - _pattern: Pattern, - _context: str, - _depth_guard: object, - ) -> None: - raise FrozenFluentError( - message="Test non-resolution error", - category=ErrorCategory.PARSE, - ) - - monkeypatch.setattr( - "ftllexengine.syntax.serializer._validate_pattern", - fake_validate_pattern, - ) - - with pytest.raises(FrozenFluentError, match="non-resolution"): - _validate_resource(resource) - - def test_serialize_non_resolution_error_propagates( - self, monkeypatch: pytest.MonkeyPatch - ) -> None: - """Non-RESOLUTION FrozenFluentError re-raised from serialization (line 449).""" - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="hello"),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - def fake_serialize_resource( - _self: object, - _node: Resource, - _output: list[str], - _depth_guard: object, - ) -> None: - raise FrozenFluentError( - message="Test serialize non-resolution", - category=ErrorCategory.PARSE, - ) - - monkeypatch.setattr( - "ftllexengine.syntax.serializer.FluentSerializer._serialize_resource", - fake_serialize_resource, - ) - - with pytest.raises(FrozenFluentError, match="serialize non-resolution"): - serialize(resource, validate=False) - - -class TestSerializerSelectorDepthGuard: - """Serializer wraps SelectExpression selector in depth guard. - - A well-formed SelectExpression with a variable selector serializes normally. - A malformed AST where SelectExpression is nested as its own selector (impossible - in parsed FTL, but constructible via the API) must raise SerializationDepthError - before triggering a RecursionError. 
- """ - - def test_valid_select_expression_serializes(self) -> None: - """SelectExpression with variable selector serializes to valid FTL.""" - from ftllexengine.syntax import serialize # noqa: PLC0415 - import inside function - - select = SelectExpression( - selector=VariableReference(id=Identifier("x")), - variants=( - Variant( - key=NumberLiteral(raw="1", value=1), - value=Pattern(elements=(TextElement("One"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement("Other"),)), - default=True, - ), - ), - ) - msg = Message( - id=Identifier("msg"), - value=Pattern(elements=(Placeable(expression=select),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - result = serialize(resource) - assert "msg" in result - assert "$x ->" in result - - def test_deeply_nested_selector_raises_depth_error(self) -> None: - """Malformed deeply-nested SelectExpression selector raises SerializationDepthError.""" - from ftllexengine.syntax import serialize # noqa: PLC0415 - import inside function - - def make_nested_select(depth: int) -> SelectExpression: - if depth == 0: - return SelectExpression( - selector=VariableReference(id=Identifier("x")), - variants=( - Variant( - key=Identifier("a"), - value=Pattern(elements=(TextElement("A"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement("B"),)), - default=True, - ), - ), - ) - inner = make_nested_select(depth - 1) - # Intentionally malformed: SelectExpression as its own selector - return SelectExpression( - selector=inner, # type: ignore[arg-type] - variants=( - Variant( - key=Identifier("a"), - value=Pattern(elements=(TextElement("A"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement("B"),)), - default=True, - ), - ), - ) - - nested = make_nested_select(150) - msg = Message( - id=Identifier("msg"), - value=Pattern(elements=(Placeable(expression=nested),)), - attributes=(), - ) - 
resource = Resource(entries=(msg,)) - - with pytest.raises(SerializationDepthError): - serialize(resource, max_depth=50) - - -# ============================================================================ -# SERIALIZER BRANCH COVERAGE -# ============================================================================ - - -class TestSerializerBranchCoverage: - """Test serializer branch coverage.""" - - def test_serialize_junk_entry(self) -> None: - """Junk entry without annotations is serialized as its raw content.""" - junk = Junk(content="invalid content here") - resource = Resource(entries=(junk,)) - - result = serialize(resource) - - assert "invalid content here" in result - - def test_serialize_text_without_braces(self) -> None: - """Message with plain text content serializes correctly.""" - message = Message( - id=Identifier("simple"), - value=Pattern(elements=(TextElement("Plain text without braces"),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - result = serialize(resource) - - assert "simple = Plain text without braces" in result - - def test_serialize_text_with_braces(self) -> None: - """Text with literal braces serializes via StringLiteral placeables.""" - message = Message( - id=Identifier("braced"), - value=Pattern( - elements=( - TextElement("a"), - Placeable(expression=StringLiteral(value="{")), - TextElement("b"), - Placeable(expression=StringLiteral(value="}")), - TextElement("c"), - ) - ), - attributes=(), - ) - resource = Resource(entries=(message,)) - - result = serialize(resource) - - assert "braced" in result - - def test_serialize_select_expression(self) -> None: - """SelectExpression with default variant serializes with *[other].""" - select = SelectExpression( - selector=VariableReference(id=Identifier("count")), - variants=( - Variant( - key=Identifier("one"), - value=Pattern(elements=(TextElement("One item"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement("Many 
items"),)), - default=True, - ), - ), - ) - - message = Message( - id=Identifier("plural"), - value=Pattern(elements=(Placeable(expression=select),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - result = serialize(resource) - - assert "plural" in result - assert "*[other]" in result - - def test_serialize_number_literal_variant_key(self) -> None: - """Variant with NumberLiteral key serializes as [1].""" - select = SelectExpression( - selector=VariableReference(id=Identifier("num")), - variants=( - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern(elements=(TextElement("One"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement("Other"),)), - default=True, - ), - ), - ) - - message = Message( - id=Identifier("numkey"), - value=Pattern(elements=(Placeable(expression=select),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - result = serialize(resource) - - assert "[1]" in result - assert "*[other]" in result - - def test_serialize_nested_placeable(self) -> None: - """Nested Placeable (Placeable inside Placeable) serializes with double braces.""" - inner = Placeable(expression=VariableReference(id=Identifier("inner"))) - outer = Placeable(expression=inner) - - message = Message( - id=Identifier("nested"), - value=Pattern(elements=(outer,)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - result = serialize(resource) - - assert "{ { $inner } }" in result - - -class TestSerializerBranchCoverageExtended: - """Extended serializer branch coverage tests.""" - - def test_serialize_message_with_comment(self) -> None: - """Message with attached comment serializes comment on separate line.""" - comment = Comment(content="Message comment", type=CommentType.COMMENT) - message = Message( - id=Identifier("commented"), - value=Pattern(elements=(TextElement("Value"),)), - attributes=(), - comment=comment, - ) - resource = Resource(entries=(message,)) - - result 
= serialize(resource) - - assert "# Message comment" in result - assert "commented = Value" in result - - def test_serialize_term_with_attributes(self) -> None: - """Term with attributes serializes term and all attributes.""" - term = Term( - id=Identifier("brand"), - value=Pattern(elements=(TextElement("Firefox"),)), - attributes=( - Attribute( - id=Identifier("gender"), - value=Pattern(elements=(TextElement("masculine"),)), - ), - ), - ) - resource = Resource(entries=(term,)) - - result = serialize(resource) - - assert "-brand = Firefox" in result - assert ".gender = masculine" in result - - def test_serialize_function_with_named_args(self) -> None: - """Function call with named arguments serializes with correct syntax.""" - func_ref = FunctionReference( - id=Identifier("NUMBER"), - arguments=CallArguments( - positional=(VariableReference(id=Identifier("count")),), - named=( - NamedArgument( - name=Identifier("style"), - value=StringLiteral(value="percent"), - ), - ), - ), - ) - - message = Message( - id=Identifier("percent"), - value=Pattern(elements=(Placeable(expression=func_ref),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - result = serialize(resource) - - assert "NUMBER" in result - assert "style" in result - - -class TestSerializerJunkCoverage: - """Junk entry with annotations serializes raw content.""" - - def test_serialize_junk_entry_with_annotations(self) -> None: - """Junk entry with annotations is serialized as its raw content.""" - junk = Junk( - content="this is not valid FTL", - annotations=( - Annotation(code="E0003", message="Expected token: ="), - ), - ) - resource = Resource(entries=(junk,)) - - ftl = serialize(resource) - - assert "this is not valid FTL" in ftl - - -class TestSerializerPlaceableCoverage: - """Placeable serialization branch coverage.""" - - def test_serialize_placeable_in_pattern(self) -> None: - """Placeable in pattern serializes with correct brace syntax.""" - placeable = 
Placeable(expression=VariableReference(id=Identifier(name="name"))) - pattern = Pattern(elements=(TextElement(value="Hello "), placeable)) - message = Message( - id=Identifier(name="greeting"), - value=pattern, - attributes=(), - comment=None, - ) - resource = Resource(entries=(message,)) - - ftl = serialize(resource) - - assert "{ $name }" in ftl - - -class TestSerializerNumberLiteralVariantKeyCoverage: - """NumberLiteral variant key serialization.""" - - def test_serialize_select_with_number_literal_key(self) -> None: - """Variant with NumberLiteral keys serializes as [1], [2], *[other].""" - variant1 = Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern(elements=(TextElement(value="one item"),)), - default=False, - ) - variant2 = Variant( - key=NumberLiteral(value=2, raw="2"), - value=Pattern(elements=(TextElement(value="two items"),)), - default=False, - ) - variant_other = Variant( - key=Identifier(name="other"), - value=Pattern(elements=(TextElement(value="many items"),)), - default=True, - ) - select_expr = SelectExpression( - selector=VariableReference(id=Identifier(name="count")), - variants=(variant1, variant2, variant_other), - ) - placeable = Placeable(expression=select_expr) - pattern = Pattern(elements=(placeable,)) - message = Message( - id=Identifier(name="items"), - value=pattern, - attributes=(), - comment=None, - ) - resource = Resource(entries=(message,)) - - ftl = serialize(resource) - - assert "[1]" in ftl - assert "[2]" in ftl - assert "*[other]" in ftl - - -class TestSerializerStringLiteralEscapesCoverage: - """Control character escaping in StringLiteral values.""" - - def test_serialize_string_literal_with_tab(self) -> None: - """Tab character in StringLiteral is escaped as \\u0009.""" - expr = StringLiteral(value="Hello\tWorld") - placeable = Placeable(expression=expr) - pattern = Pattern(elements=(placeable,)) - message = Message( - id=Identifier(name="tabbed"), - value=pattern, - attributes=(), - comment=None, - ) - resource 
= Resource(entries=(message,)) - - ftl = serialize(resource) - - assert "\\u0009" in ftl - - def test_serialize_string_literal_with_newline(self) -> None: - """Newline character in StringLiteral is escaped as \\u000A.""" - expr = StringLiteral(value="Line1\nLine2") - placeable = Placeable(expression=expr) - pattern = Pattern(elements=(placeable,)) - message = Message( - id=Identifier(name="multiline"), - value=pattern, - attributes=(), - comment=None, - ) - resource = Resource(entries=(message,)) - - ftl = serialize(resource) - - assert "\\u000A" in ftl - - def test_serialize_string_literal_with_carriage_return(self) -> None: - """Carriage return in StringLiteral is escaped as \\u000D.""" - expr = StringLiteral(value="Line1\rLine2") - placeable = Placeable(expression=expr) - pattern = Pattern(elements=(placeable,)) - message = Message( - id=Identifier(name="crlf"), - value=pattern, - attributes=(), - comment=None, - ) - resource = Resource(entries=(message,)) - - ftl = serialize(resource) - - assert "\\u000D" in ftl - - -class TestSerializerBranchExhaustive: - """Exhaustive branch coverage for serializer dispatch.""" - - def test_serialize_empty_pattern(self) -> None: - """Pattern with no elements serializes as empty value line.""" - pattern = Pattern(elements=()) - message = Message( - id=Identifier(name="empty"), - value=pattern, - attributes=(), - comment=None, - ) - resource = Resource(entries=(message,)) - - ftl = serialize(resource) - assert "empty = \n" in ftl - - def test_serialize_text_only_pattern(self) -> None: - """Pattern with multiple TextElements concatenates them.""" - pattern = Pattern( - elements=( - TextElement(value="Hello "), - TextElement(value="World"), - TextElement(value="!"), - ) - ) - message = Message( - id=Identifier(name="greeting"), - value=pattern, - attributes=(), - comment=None, - ) - resource = Resource(entries=(message,)) - - ftl = serialize(resource) - assert "greeting = Hello World!\n" in ftl - - def 
test_serialize_multiple_junk_entries(self) -> None: - """Multiple Junk entries all serialize as their raw content.""" - junk1 = Junk(content="bad syntax 1") - junk2 = Junk(content="bad syntax 2") - resource = Resource(entries=(junk1, junk2)) - - ftl = serialize(resource) - assert "bad syntax 1" in ftl - assert "bad syntax 2" in ftl - - def test_serialize_select_number_only_variants(self) -> None: - """Select with only NumberLiteral keys serializes correctly.""" - variants = ( - Variant( - key=NumberLiteral(value=0, raw="0"), - value=Pattern(elements=(TextElement(value="zero"),)), - default=False, - ), - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern(elements=(TextElement(value="one"),)), - default=True, - ), - ) - select_expr = SelectExpression( - selector=VariableReference(id=Identifier(name="n")), - variants=variants, - ) - placeable = Placeable(expression=select_expr) - pattern = Pattern(elements=(placeable,)) - message = Message( - id=Identifier(name="count"), - value=pattern, - attributes=(), - comment=None, - ) - resource = Resource(entries=(message,)) - - ftl = serialize(resource) - assert "[0] zero" in ftl - assert "*[1] one" in ftl - - -class TestSerializerVariantCountProperty: - """Property-based test for serializer with varying variant counts.""" - - @given( - st.integers(min_value=1, max_value=10), - ) - def test_serializer_handles_multiple_variants(self, variant_count: int) -> None: - """Serializer handles select expressions with varying variant counts.""" - event(f"variant_count={variant_count}") - variants = [ - Variant( - key=Identifier(f"key{i}"), - value=Pattern(elements=(TextElement(f"Value {i}"),)), - default=(i == variant_count - 1), - ) - for i in range(variant_count) - ] - - select = SelectExpression( - selector=VariableReference(id=Identifier("sel")), - variants=tuple(variants), - ) - - message = Message( - id=Identifier("multi"), - value=Pattern(elements=(Placeable(expression=select),)), - attributes=(), - ) - resource = 
Resource(entries=(message,)) - - result = serialize(resource) - - for i in range(variant_count): - assert f"key{i}" in result diff --git a/tests/test_syntax_serializer_branches.py b/tests/test_syntax_serializer_branches.py new file mode 100644 index 00000000..2279de5d --- /dev/null +++ b/tests/test_syntax_serializer_branches.py @@ -0,0 +1,638 @@ +"""Tests for syntax.serializer: FluentSerializer, serialize(), edge cases, internal helpers. + +Validates serialization of AST nodes back to FTL syntax, including control character +escaping, depth limits, junk entries, multiline patterns, and classify/escape internals. +""" + +from __future__ import annotations + +import pytest +from hypothesis import event, given +from hypothesis import strategies as st + +from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError +from ftllexengine.enums import CommentType +from ftllexengine.syntax import serialize +from ftllexengine.syntax.ast import ( + Annotation, + Attribute, + CallArguments, + Comment, + FunctionReference, + Identifier, + Junk, + Message, + NamedArgument, + NumberLiteral, + Pattern, + Placeable, + Resource, + SelectExpression, + StringLiteral, + Term, + TextElement, + VariableReference, + Variant, +) +from ftllexengine.syntax.serializer import ( + SerializationDepthError, + _validate_resource, +) + +# ============================================================================= +# Defensive re-raise tests (covers lines 286, 449) +# ============================================================================= + + +class TestDefensiveReRaises: + """Test defensive re-raise paths for non-RESOLUTION FrozenFluentError.""" + + def test_validate_resource_non_resolution_error_propagates( + self, monkeypatch: pytest.MonkeyPatch + ) -> None: + """Non-RESOLUTION FrozenFluentError re-raised from validation (line 286).""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="hello"),)), + attributes=(), + ) + resource = 
Resource(entries=(msg,)) + + def fake_validate_pattern( + _pattern: Pattern, + _context: str, + _depth_guard: object, + ) -> None: + raise FrozenFluentError( + message="Test non-resolution error", + category=ErrorCategory.PARSE, + ) + + monkeypatch.setattr( + "ftllexengine.syntax.serializer._validate_pattern", + fake_validate_pattern, + ) + + with pytest.raises(FrozenFluentError, match="non-resolution"): + _validate_resource(resource) + + def test_serialize_non_resolution_error_propagates( + self, monkeypatch: pytest.MonkeyPatch + ) -> None: + """Non-RESOLUTION FrozenFluentError re-raised from serialization (line 449).""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="hello"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + def fake_serialize_resource( + _self: object, + _node: Resource, + _output: list[str], + _depth_guard: object, + ) -> None: + raise FrozenFluentError( + message="Test serialize non-resolution", + category=ErrorCategory.PARSE, + ) + + monkeypatch.setattr( + "ftllexengine.syntax.serializer.FluentSerializer._serialize_resource", + fake_serialize_resource, + ) + + with pytest.raises(FrozenFluentError, match="serialize non-resolution"): + serialize(resource, validate=False) + + +class TestSerializerSelectorDepthGuard: + """Serializer wraps SelectExpression selector in depth guard. + + A well-formed SelectExpression with a variable selector serializes normally. + A malformed AST where SelectExpression is nested as its own selector (impossible + in parsed FTL, but constructible via the API) must raise SerializationDepthError + before triggering a RecursionError. 
+ """ + + def test_valid_select_expression_serializes(self) -> None: + """SelectExpression with variable selector serializes to valid FTL.""" + from ftllexengine.syntax import serialize # noqa: PLC0415 - import inside function + + select = SelectExpression( + selector=VariableReference(id=Identifier("x")), + variants=( + Variant( + key=NumberLiteral(raw="1", value=1), + value=Pattern(elements=(TextElement("One"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement("Other"),)), + default=True, + ), + ), + ) + msg = Message( + id=Identifier("msg"), + value=Pattern(elements=(Placeable(expression=select),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + assert "msg" in result + assert "$x ->" in result + + def test_deeply_nested_selector_raises_depth_error(self) -> None: + """Malformed deeply-nested SelectExpression selector raises SerializationDepthError.""" + from ftllexengine.syntax import serialize # noqa: PLC0415 - import inside function + + def make_nested_select(depth: int) -> SelectExpression: + if depth == 0: + return SelectExpression( + selector=VariableReference(id=Identifier("x")), + variants=( + Variant( + key=Identifier("a"), + value=Pattern(elements=(TextElement("A"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement("B"),)), + default=True, + ), + ), + ) + inner = make_nested_select(depth - 1) + # Intentionally malformed: SelectExpression as its own selector + return SelectExpression( + selector=inner, # type: ignore[arg-type] + variants=( + Variant( + key=Identifier("a"), + value=Pattern(elements=(TextElement("A"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement("B"),)), + default=True, + ), + ), + ) + + nested = make_nested_select(150) + msg = Message( + id=Identifier("msg"), + value=Pattern(elements=(Placeable(expression=nested),)), + attributes=(), + ) + 
resource = Resource(entries=(msg,)) + + with pytest.raises(SerializationDepthError): + serialize(resource, max_depth=50) + + +# ============================================================================ +# SERIALIZER BRANCH COVERAGE +# ============================================================================ + + +class TestSerializerBranchCoverage: + """Test serializer branch coverage.""" + + def test_serialize_junk_entry(self) -> None: + """Junk entry without annotations is serialized as its raw content.""" + junk = Junk(content="invalid content here") + resource = Resource(entries=(junk,)) + + result = serialize(resource) + + assert "invalid content here" in result + + def test_serialize_text_without_braces(self) -> None: + """Message with plain text content serializes correctly.""" + message = Message( + id=Identifier("simple"), + value=Pattern(elements=(TextElement("Plain text without braces"),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + result = serialize(resource) + + assert "simple = Plain text without braces" in result + + def test_serialize_text_with_braces(self) -> None: + """Text with literal braces serializes via StringLiteral placeables.""" + message = Message( + id=Identifier("braced"), + value=Pattern( + elements=( + TextElement("a"), + Placeable(expression=StringLiteral(value="{")), + TextElement("b"), + Placeable(expression=StringLiteral(value="}")), + TextElement("c"), + ) + ), + attributes=(), + ) + resource = Resource(entries=(message,)) + + result = serialize(resource) + + assert "braced" in result + + def test_serialize_select_expression(self) -> None: + """SelectExpression with default variant serializes with *[other].""" + select = SelectExpression( + selector=VariableReference(id=Identifier("count")), + variants=( + Variant( + key=Identifier("one"), + value=Pattern(elements=(TextElement("One item"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement("Many 
items"),)), + default=True, + ), + ), + ) + + message = Message( + id=Identifier("plural"), + value=Pattern(elements=(Placeable(expression=select),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + result = serialize(resource) + + assert "plural" in result + assert "*[other]" in result + + def test_serialize_number_literal_variant_key(self) -> None: + """Variant with NumberLiteral key serializes as [1].""" + select = SelectExpression( + selector=VariableReference(id=Identifier("num")), + variants=( + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement("One"),)), + default=False, + ), + Variant( + key=Identifier("other"), + value=Pattern(elements=(TextElement("Other"),)), + default=True, + ), + ), + ) + + message = Message( + id=Identifier("numkey"), + value=Pattern(elements=(Placeable(expression=select),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + result = serialize(resource) + + assert "[1]" in result + assert "*[other]" in result + + def test_serialize_nested_placeable(self) -> None: + """Nested Placeable (Placeable inside Placeable) serializes with double braces.""" + inner = Placeable(expression=VariableReference(id=Identifier("inner"))) + outer = Placeable(expression=inner) + + message = Message( + id=Identifier("nested"), + value=Pattern(elements=(outer,)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + result = serialize(resource) + + assert "{ { $inner } }" in result + + +class TestSerializerBranchCoverageExtended: + """Extended serializer branch coverage tests.""" + + def test_serialize_message_with_comment(self) -> None: + """Message with attached comment serializes comment on separate line.""" + comment = Comment(content="Message comment", type=CommentType.COMMENT) + message = Message( + id=Identifier("commented"), + value=Pattern(elements=(TextElement("Value"),)), + attributes=(), + comment=comment, + ) + resource = Resource(entries=(message,)) + + result 
= serialize(resource) + + assert "# Message comment" in result + assert "commented = Value" in result + + def test_serialize_term_with_attributes(self) -> None: + """Term with attributes serializes term and all attributes.""" + term = Term( + id=Identifier("brand"), + value=Pattern(elements=(TextElement("Firefox"),)), + attributes=( + Attribute( + id=Identifier("gender"), + value=Pattern(elements=(TextElement("masculine"),)), + ), + ), + ) + resource = Resource(entries=(term,)) + + result = serialize(resource) + + assert "-brand = Firefox" in result + assert ".gender = masculine" in result + + def test_serialize_function_with_named_args(self) -> None: + """Function call with named arguments serializes with correct syntax.""" + func_ref = FunctionReference( + id=Identifier("NUMBER"), + arguments=CallArguments( + positional=(VariableReference(id=Identifier("count")),), + named=( + NamedArgument( + name=Identifier("style"), + value=StringLiteral(value="percent"), + ), + ), + ), + ) + + message = Message( + id=Identifier("percent"), + value=Pattern(elements=(Placeable(expression=func_ref),)), + attributes=(), + ) + resource = Resource(entries=(message,)) + + result = serialize(resource) + + assert "NUMBER" in result + assert "style" in result + + +class TestSerializerJunkCoverage: + """Junk entry with annotations serializes raw content.""" + + def test_serialize_junk_entry_with_annotations(self) -> None: + """Junk entry with annotations is serialized as its raw content.""" + junk = Junk( + content="this is not valid FTL", + annotations=(Annotation(code="E0003", message="Expected token: ="),), + ) + resource = Resource(entries=(junk,)) + + ftl = serialize(resource) + + assert "this is not valid FTL" in ftl + + +class TestSerializerPlaceableCoverage: + """Placeable serialization branch coverage.""" + + def test_serialize_placeable_in_pattern(self) -> None: + """Placeable in pattern serializes with correct brace syntax.""" + placeable = 
Placeable(expression=VariableReference(id=Identifier(name="name"))) + pattern = Pattern(elements=(TextElement(value="Hello "), placeable)) + message = Message( + id=Identifier(name="greeting"), + value=pattern, + attributes=(), + comment=None, + ) + resource = Resource(entries=(message,)) + + ftl = serialize(resource) + + assert "{ $name }" in ftl + + +class TestSerializerNumberLiteralVariantKeyCoverage: + """NumberLiteral variant key serialization.""" + + def test_serialize_select_with_number_literal_key(self) -> None: + """Variant with NumberLiteral keys serializes as [1], [2], *[other].""" + variant1 = Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="one item"),)), + default=False, + ) + variant2 = Variant( + key=NumberLiteral(value=2, raw="2"), + value=Pattern(elements=(TextElement(value="two items"),)), + default=False, + ) + variant_other = Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="many items"),)), + default=True, + ) + select_expr = SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=(variant1, variant2, variant_other), + ) + placeable = Placeable(expression=select_expr) + pattern = Pattern(elements=(placeable,)) + message = Message( + id=Identifier(name="items"), + value=pattern, + attributes=(), + comment=None, + ) + resource = Resource(entries=(message,)) + + ftl = serialize(resource) + + assert "[1]" in ftl + assert "[2]" in ftl + assert "*[other]" in ftl + + +class TestSerializerStringLiteralEscapesCoverage: + """Control character escaping in StringLiteral values.""" + + def test_serialize_string_literal_with_tab(self) -> None: + """Tab character in StringLiteral is escaped as \\u0009.""" + expr = StringLiteral(value="Hello\tWorld") + placeable = Placeable(expression=expr) + pattern = Pattern(elements=(placeable,)) + message = Message( + id=Identifier(name="tabbed"), + value=pattern, + attributes=(), + comment=None, + ) + resource 
= Resource(entries=(message,)) + + ftl = serialize(resource) + + assert "\\u0009" in ftl + + def test_serialize_string_literal_with_newline(self) -> None: + """Newline character in StringLiteral is escaped as \\u000A.""" + expr = StringLiteral(value="Line1\nLine2") + placeable = Placeable(expression=expr) + pattern = Pattern(elements=(placeable,)) + message = Message( + id=Identifier(name="multiline"), + value=pattern, + attributes=(), + comment=None, + ) + resource = Resource(entries=(message,)) + + ftl = serialize(resource) + + assert "\\u000A" in ftl + + def test_serialize_string_literal_with_carriage_return(self) -> None: + """Carriage return in StringLiteral is escaped as \\u000D.""" + expr = StringLiteral(value="Line1\rLine2") + placeable = Placeable(expression=expr) + pattern = Pattern(elements=(placeable,)) + message = Message( + id=Identifier(name="crlf"), + value=pattern, + attributes=(), + comment=None, + ) + resource = Resource(entries=(message,)) + + ftl = serialize(resource) + + assert "\\u000D" in ftl + + +class TestSerializerBranchExhaustive: + """Exhaustive branch coverage for serializer dispatch.""" + + def test_serialize_empty_pattern(self) -> None: + """Pattern with no elements serializes as empty value line.""" + pattern = Pattern(elements=()) + message = Message( + id=Identifier(name="empty"), + value=pattern, + attributes=(), + comment=None, + ) + resource = Resource(entries=(message,)) + + ftl = serialize(resource) + assert "empty = \n" in ftl + + def test_serialize_text_only_pattern(self) -> None: + """Pattern with multiple TextElements concatenates them.""" + pattern = Pattern( + elements=( + TextElement(value="Hello "), + TextElement(value="World"), + TextElement(value="!"), + ) + ) + message = Message( + id=Identifier(name="greeting"), + value=pattern, + attributes=(), + comment=None, + ) + resource = Resource(entries=(message,)) + + ftl = serialize(resource) + assert "greeting = Hello World!\n" in ftl + + def 
test_serialize_multiple_junk_entries(self) -> None: + """Multiple Junk entries all serialize as their raw content.""" + junk1 = Junk(content="bad syntax 1") + junk2 = Junk(content="bad syntax 2") + resource = Resource(entries=(junk1, junk2)) + + ftl = serialize(resource) + assert "bad syntax 1" in ftl + assert "bad syntax 2" in ftl + + def test_serialize_select_number_only_variants(self) -> None: + """Select with only NumberLiteral keys serializes correctly.""" + variants = ( + Variant( + key=NumberLiteral(value=0, raw="0"), + value=Pattern(elements=(TextElement(value="zero"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="one"),)), + default=True, + ), + ) + select_expr = SelectExpression( + selector=VariableReference(id=Identifier(name="n")), + variants=variants, + ) + placeable = Placeable(expression=select_expr) + pattern = Pattern(elements=(placeable,)) + message = Message( + id=Identifier(name="count"), + value=pattern, + attributes=(), + comment=None, + ) + resource = Resource(entries=(message,)) + + ftl = serialize(resource) + assert "[0] zero" in ftl + assert "*[1] one" in ftl + + +class TestSerializerVariantCountProperty: + """Property-based test for serializer with varying variant counts.""" + + @given( + st.integers(min_value=1, max_value=10), + ) + def test_serializer_handles_multiple_variants(self, variant_count: int) -> None: + """Serializer handles select expressions with varying variant counts.""" + event(f"variant_count={variant_count}") + variants = [ + Variant( + key=Identifier(f"key{i}"), + value=Pattern(elements=(TextElement(f"Value {i}"),)), + default=(i == variant_count - 1), + ) + for i in range(variant_count) + ] + + select = SelectExpression( + selector=VariableReference(id=Identifier("sel")), + variants=tuple(variants), + ) + + message = Message( + id=Identifier("multi"), + value=Pattern(elements=(Placeable(expression=select),)), + attributes=(), + ) + resource = 
Resource(entries=(message,)) + + result = serialize(resource) + + for i in range(variant_count): + assert f"key{i}" in result diff --git a/tests/test_syntax_serializer_core.py b/tests/test_syntax_serializer_core.py new file mode 100644 index 00000000..4f485934 --- /dev/null +++ b/tests/test_syntax_serializer_core.py @@ -0,0 +1,851 @@ +"""Tests for syntax.serializer: FluentSerializer, serialize(), edge cases, internal helpers. + +Validates serialization of AST nodes back to FTL syntax, including control character +escaping, depth limits, junk entries, multiline patterns, and classify/escape internals. +""" + +from __future__ import annotations + +from ftllexengine.enums import CommentType +from ftllexengine.syntax import serialize +from ftllexengine.syntax.ast import ( + Attribute, + CallArguments, + Comment, + FunctionReference, + Identifier, + Junk, + Message, + MessageReference, + NamedArgument, + NumberLiteral, + Pattern, + Placeable, + Resource, + SelectExpression, + StringLiteral, + Term, + TermReference, + TextElement, + VariableReference, + Variant, +) +from ftllexengine.syntax.serializer import ( + FluentSerializer, +) + +# ============================================================================ +# BASIC SERIALIZATION TESTS +# ============================================================================ + + +class TestSerializerBasic: + """Test basic serializer functionality.""" + + def test_serialize_empty_resource(self) -> None: + """Serialize empty resource.""" + resource = Resource(entries=()) + + result = serialize(resource) + + assert result == "" + + def test_serializer_class_directly(self) -> None: + """Use FluentSerializer class directly.""" + serializer = FluentSerializer() + resource = Resource(entries=()) + + result = serializer.serialize(resource) + + assert result == "" + + +# ============================================================================ +# MESSAGE SERIALIZATION +# 
============================================================================ + + +class TestSerializerMessage: + """Test message serialization.""" + + def test_serialize_simple_message(self) -> None: + """Serialize message with text only.""" + msg = Message( + id=Identifier(name="hello"), + value=Pattern(elements=(TextElement(value="Hello, World!"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert result == "hello = Hello, World!\n" + + def test_serialize_message_with_variable(self) -> None: + """Serialize message with variable interpolation.""" + msg = Message( + id=Identifier(name="greeting"), + value=Pattern( + elements=( + TextElement(value="Hello, "), + Placeable(expression=VariableReference(id=Identifier(name="name"))), + TextElement(value="!"), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert result == "greeting = Hello, { $name }!\n" + + def test_serialize_message_without_value(self) -> None: + """Serialize message without value (only attributes).""" + msg = Message( + id=Identifier(name="test"), + value=None, + attributes=( + Attribute( + id=Identifier(name="attr"), + value=Pattern(elements=(TextElement(value="Value"),)), + ), + ), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "test" in result + assert ".attr = Value" in result + + def test_serialize_message_with_comment(self) -> None: + """Serialize message with associated comment.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="Test"),)), + attributes=(), + comment=Comment(content="This is a comment", type=CommentType.COMMENT), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "# This is a comment\n" in result + assert "test = Test\n" in result + + def test_serialize_message_with_attributes(self) -> None: + """Serialize message with attributes.""" + msg = Message( + 
id=Identifier(name="button"), + value=Pattern(elements=(TextElement(value="Save"),)), + attributes=( + Attribute( + id=Identifier(name="tooltip"), + value=Pattern(elements=(TextElement(value="Click to save"),)), + ), + Attribute( + id=Identifier(name="aria-label"), + value=Pattern(elements=(TextElement(value="Save button"),)), + ), + ), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "button = Save\n" in result + assert " .tooltip = Click to save\n" in result + assert " .aria-label = Save button\n" in result + + def test_serialize_multiple_messages(self) -> None: + """Serialize multiple messages with blank line separation.""" + resource = Resource( + entries=( + Message( + id=Identifier(name="hello"), + value=Pattern(elements=(TextElement(value="Hello"),)), + attributes=(), + ), + Message( + id=Identifier(name="goodbye"), + value=Pattern(elements=(TextElement(value="Goodbye"),)), + attributes=(), + ), + ) + ) + + result = serialize(resource) + + assert "hello = Hello\n" in result + assert "goodbye = Goodbye\n" in result + + +# ============================================================================ +# TERM SERIALIZATION +# ============================================================================ + + +class TestSerializerTerm: + """Test term serialization.""" + + def test_serialize_simple_term(self) -> None: + """Serialize simple term.""" + term = Term( + id=Identifier(name="brand"), + value=Pattern(elements=(TextElement(value="Firefox"),)), + attributes=(), + ) + resource = Resource(entries=(term,)) + + result = serialize(resource) + + assert result == "-brand = Firefox\n" + + def test_serialize_term_with_attributes(self) -> None: + """Serialize term with attributes.""" + term = Term( + id=Identifier(name="brand"), + value=Pattern(elements=(TextElement(value="Firefox"),)), + attributes=( + Attribute( + id=Identifier(name="version"), + value=Pattern(elements=(TextElement(value="120"),)), + ), + ), + ) + resource = 
Resource(entries=(term,)) + + result = serialize(resource) + + assert "-brand = Firefox\n" in result + assert " .version = 120\n" in result + + def test_serialize_term_with_comment(self) -> None: + """Serialize term with comment.""" + term = Term( + id=Identifier(name="brand"), + value=Pattern(elements=(TextElement(value="Firefox"),)), + attributes=(), + comment=Comment(content="Brand name", type=CommentType.COMMENT), + ) + resource = Resource(entries=(term,)) + + result = serialize(resource) + + assert "# Brand name\n" in result + assert "-brand = Firefox\n" in result + + +# ============================================================================ +# COMMENT AND JUNK SERIALIZATION +# ============================================================================ + + +class TestSerializerCommentJunk: + """Test comment and junk serialization.""" + + def test_serialize_standalone_comment(self) -> None: + """Serialize standalone comment.""" + comment = Comment(content="This is a comment", type=CommentType.COMMENT) + resource = Resource(entries=(comment,)) + + result = serialize(resource) + + assert result == "# This is a comment\n" + + def test_serialize_group_comment(self) -> None: + """Serialize group comment (##).""" + comment = Comment(content="Group comment", type=CommentType.GROUP) + resource = Resource(entries=(comment,)) + + result = serialize(resource) + + assert result == "## Group comment\n" + + def test_serialize_resource_comment(self) -> None: + """Serialize resource comment (###).""" + comment = Comment(content="Resource comment", type=CommentType.RESOURCE) + resource = Resource(entries=(comment,)) + + result = serialize(resource) + + assert result == "### Resource comment\n" + + def test_serialize_multiline_comment(self) -> None: + """Serialize multi-line comment.""" + comment = Comment(content="Line 1\nLine 2\nLine 3", type=CommentType.COMMENT) + resource = Resource(entries=(comment,)) + + result = serialize(resource) + + assert "# Line 1\n# Line 2\n# 
Line 3\n" in result + + def test_serialize_multiline_comment_with_empty_lines(self) -> None: + """Serialize comment with empty lines (no trailing space on empty lines).""" + comment = Comment(content="Line 1\n\nLine 3", type=CommentType.COMMENT) + resource = Resource(entries=(comment,)) + + result = serialize(resource) + + # Empty line should not have trailing space - "# \n" is wrong, "#\n" is correct + assert "# Line 1\n#\n# Line 3\n" in result + assert "# \n" not in result # No trailing space on empty comment lines + + def test_serialize_comment_only_empty_lines(self) -> None: + """Serialize comment that is only empty lines.""" + comment = Comment(content="\n\n", type=CommentType.COMMENT) + resource = Resource(entries=(comment,)) + + result = serialize(resource) + + # All lines should be just "#\n" without trailing space + assert result == "#\n#\n#\n" + assert "# \n" not in result + + def test_serialize_junk(self) -> None: + """Serialize junk entry.""" + junk = Junk(content="invalid { syntax") + resource = Resource(entries=(junk,)) + + result = serialize(resource) + + assert result == "invalid { syntax\n" + + +# ============================================================================ +# EXPRESSION SERIALIZATION +# ============================================================================ + + +class TestSerializerExpressions: + """Test expression serialization.""" + + def test_serialize_string_literal(self) -> None: + """Serialize string literal in placeable.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=StringLiteral(value="test value")),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert '{ "test value" }' in result + + def test_serialize_string_literal_with_escapes(self) -> None: + """Serialize string literal with escape characters.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + 
elements=(Placeable(expression=StringLiteral(value='quote: " backslash: \\')),) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert r'{ "quote: \" backslash: \\" }' in result + + def test_serialize_number_literal(self) -> None: + """Serialize number literal.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=NumberLiteral(value=42, raw="42")),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "{ 42 }" in result + + def test_serialize_variable_reference(self) -> None: + """Serialize variable reference.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=(Placeable(expression=VariableReference(id=Identifier(name="count"))),) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "{ $count }" in result + + +# ============================================================================ +# REFERENCE SERIALIZATION +# ============================================================================ + + +class TestSerializerReferences: + """Test reference serialization.""" + + def test_serialize_message_reference_simple(self) -> None: + """Serialize message reference without attribute.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable( + expression=MessageReference(id=Identifier(name="other"), attribute=None) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "{ other }" in result + + def test_serialize_message_reference_with_attribute(self) -> None: + """Serialize message reference with attribute.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable( + expression=MessageReference( + id=Identifier(name="button"), + attribute=Identifier(name="tooltip"), + ) + ), + ) + ), + attributes=(), + ) + resource 
= Resource(entries=(msg,)) + + result = serialize(resource) + + assert "{ button.tooltip }" in result + + def test_serialize_term_reference_simple(self) -> None: + """Serialize term reference without attribute.""" + msg = Message( + id=Identifier(name="welcome"), + value=Pattern( + elements=( + TextElement(value="Welcome to "), + Placeable( + expression=TermReference( + id=Identifier(name="brand"), attribute=None, arguments=None + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "{ -brand }" in result + + def test_serialize_term_reference_with_attribute(self) -> None: + """Serialize term reference with attribute.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable( + expression=TermReference( + id=Identifier(name="brand"), + attribute=Identifier(name="version"), + arguments=None, + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "{ -brand.version }" in result + + def test_serialize_term_reference_with_arguments(self) -> None: + """Serialize term reference with call arguments.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable( + expression=TermReference( + id=Identifier(name="brand"), + attribute=None, + arguments=CallArguments( + positional=(NumberLiteral(value=1, raw="1"),), named=() + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "{ -brand(1) }" in result + + +# ============================================================================ +# FUNCTION REFERENCE SERIALIZATION +# ============================================================================ + + +class TestSerializerFunctionReference: + """Test function reference serialization.""" + + def test_serialize_function_no_args(self) -> None: + """Serialize function call with no arguments.""" + msg = Message( + 
id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable( + expression=FunctionReference( + id=Identifier(name="NOW"), + arguments=CallArguments(positional=(), named=()), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "{ NOW() }" in result + + def test_serialize_function_with_positional_args(self) -> None: + """Serialize function with positional arguments.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable( + expression=FunctionReference( + id=Identifier(name="NUMBER"), + arguments=CallArguments( + positional=(VariableReference(id=Identifier(name="value")),), + named=(), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "{ NUMBER($value) }" in result + + def test_serialize_function_with_multiple_positional_args(self) -> None: + """Serialize function with multiple positional arguments.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable( + expression=FunctionReference( + id=Identifier(name="TEST"), + arguments=CallArguments( + positional=( + NumberLiteral(value=1, raw="1"), + NumberLiteral(value=2, raw="2"), + StringLiteral(value="three"), + ), + named=(), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert '{ TEST(1, 2, "three") }' in result + + def test_serialize_function_with_named_args(self) -> None: + """Serialize function with named arguments.""" + msg = Message( + id=Identifier(name="price"), + value=Pattern( + elements=( + Placeable( + expression=FunctionReference( + id=Identifier(name="NUMBER"), + arguments=CallArguments( + positional=(), + named=( + NamedArgument( + name=Identifier(name="minimumFractionDigits"), + value=NumberLiteral(value=2, raw="2"), + ), + ), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + 
+ result = serialize(resource) + + assert "{ NUMBER(minimumFractionDigits: 2) }" in result + + def test_serialize_function_with_mixed_args(self) -> None: + """Serialize function with both positional and named arguments.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable( + expression=FunctionReference( + id=Identifier(name="DATETIME"), + arguments=CallArguments( + positional=(VariableReference(id=Identifier(name="date")),), + named=( + NamedArgument( + name=Identifier(name="dateStyle"), + value=StringLiteral(value="long"), + ), + NamedArgument( + name=Identifier(name="timeStyle"), + value=StringLiteral(value="short"), + ), + ), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "{ DATETIME($date, dateStyle: " in result + assert 'timeStyle: "short") }' in result + + +# ============================================================================ +# SELECT EXPRESSION SERIALIZATION +# ============================================================================ + + +class TestSerializerSelectExpression: + """Test select expression serialization.""" + + def test_serialize_simple_select(self) -> None: + """Serialize select expression with variants.""" + msg = Message( + id=Identifier(name="emails"), + value=Pattern( + elements=( + Placeable( + expression=SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=( + Variant( + key=Identifier(name="one"), + value=Pattern(elements=(TextElement(value="one email"),)), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="many emails"),)), + default=True, + ), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "{ $count ->" in result + assert "[one] one email" in result + assert "*[other] many emails" in result + + def 
test_serialize_select_with_number_keys(self) -> None: + """Serialize select expression with numeric variant keys.""" + msg = Message( + id=Identifier(name="items"), + value=Pattern( + elements=( + Placeable( + expression=SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=( + Variant( + key=NumberLiteral(value=0, raw="0"), + value=Pattern(elements=(TextElement(value="no items"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="one item"),)), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="many items"),)), + default=True, + ), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "[0] no items" in result + assert "[1] one item" in result + assert "*[other] many items" in result + + +# ============================================================================ +# COMPLEX INTEGRATION TESTS +# ============================================================================ + + +class TestSerializerIntegration: + """Test serializer with complex AST structures.""" + + def test_serialize_mixed_resource(self) -> None: + """Serialize resource with comments, messages, and terms.""" + resource = Resource( + entries=( + Comment(content="Header comment", type=CommentType.COMMENT), + Message( + id=Identifier(name="hello"), + value=Pattern(elements=(TextElement(value="Hello"),)), + attributes=(), + ), + Term( + id=Identifier(name="brand"), + value=Pattern(elements=(TextElement(value="Firefox"),)), + attributes=(), + ), + ) + ) + + result = serialize(resource) + + assert "# Header comment\n" in result + assert "hello = Hello\n" in result + assert "-brand = Firefox\n" in result + + def test_serialize_complex_message_with_select(self) -> None: + """Serialize message with select expression and variables.""" + msg = Message( + 
id=Identifier(name="user-files"), + value=Pattern( + elements=( + TextElement(value="User "), + Placeable(expression=VariableReference(id=Identifier(name="name"))), + TextElement(value=" has "), + Placeable( + expression=SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=( + Variant( + key=Identifier(name="one"), + value=Pattern(elements=(TextElement(value="one file"),)), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern( + elements=( + Placeable( + expression=VariableReference( + id=Identifier(name="count") + ) + ), + TextElement(value=" files"), + ) + ), + default=True, + ), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "User { $name } has { $count ->" in result + assert "[one] one file" in result + assert "*[other] { $count } files" in result + + def test_serialize_message_with_all_features(self) -> None: + """Serialize message using all features.""" + msg = Message( + id=Identifier(name="complex"), + value=Pattern( + elements=( + TextElement(value="Price: "), + Placeable( + expression=FunctionReference( + id=Identifier(name="NUMBER"), + arguments=CallArguments( + positional=(VariableReference(id=Identifier(name="price")),), + named=( + NamedArgument( + name=Identifier(name="minimumFractionDigits"), + value=NumberLiteral(value=2, raw="2"), + ), + ), + ), + ) + ), + ) + ), + attributes=( + Attribute( + id=Identifier(name="tooltip"), + value=Pattern(elements=(TextElement(value="Product price"),)), + ), + ), + comment=Comment(content="Displays formatted price", type=CommentType.COMMENT), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "# Displays formatted price\n" in result + assert "complex = Price: { NUMBER($price, minimumFractionDigits: 2) }\n" in result + assert " .tooltip = Product price\n" in result diff --git a/tests/test_syntax_serializer_helpers.py 
b/tests/test_syntax_serializer_helpers.py new file mode 100644 index 00000000..d2aeda70 --- /dev/null +++ b/tests/test_syntax_serializer_helpers.py @@ -0,0 +1,424 @@ +"""Tests for syntax.serializer: FluentSerializer, serialize(), edge cases, internal helpers. + +Validates serialization of AST nodes back to FTL syntax, including control character +escaping, depth limits, junk entries, multiline patterns, and classify/escape internals. +""" + +from __future__ import annotations + +from ftllexengine.syntax import serialize +from ftllexengine.syntax.ast import ( + Identifier, + Junk, + Message, + NumberLiteral, + Pattern, + Placeable, + Resource, + SelectExpression, + TextElement, + VariableReference, + Variant, +) +from ftllexengine.syntax.parser import FluentParserV1 +from ftllexengine.syntax.serializer_lines import _classify_line, _escape_text, _LineKind + +# ============================================================================= +# _classify_line unit tests (covers lines 358, 361) +# ============================================================================= + + +class TestClassifyLine: + """Direct unit tests for _classify_line continuation line classifier.""" + + def test_empty_line(self) -> None: + """Empty string classified as EMPTY.""" + kind, ws_len = _classify_line("") + assert kind is _LineKind.EMPTY + assert ws_len == 0 + + def test_whitespace_only_single_space(self) -> None: + """Single space classified as WHITESPACE_ONLY.""" + kind, ws_len = _classify_line(" ") + assert kind is _LineKind.WHITESPACE_ONLY + assert ws_len == 0 + + def test_whitespace_only_multiple_spaces(self) -> None: + """Multiple spaces classified as WHITESPACE_ONLY.""" + kind, ws_len = _classify_line(" ") + assert kind is _LineKind.WHITESPACE_ONLY + assert ws_len == 0 + + def test_syntax_leading_dot_no_whitespace(self) -> None: + """Dot at position 0 classified as SYNTAX_LEADING with ws_len=0.""" + kind, ws_len = _classify_line(".") + assert kind is _LineKind.SYNTAX_LEADING + 
assert ws_len == 0 + + def test_syntax_leading_dot_with_whitespace(self) -> None: + """Dot preceded by spaces classified as SYNTAX_LEADING.""" + kind, ws_len = _classify_line(" .attr") + assert kind is _LineKind.SYNTAX_LEADING + assert ws_len == 3 + + def test_syntax_leading_asterisk(self) -> None: + """Asterisk preceded by spaces classified as SYNTAX_LEADING.""" + kind, ws_len = _classify_line(" *") + assert kind is _LineKind.SYNTAX_LEADING + assert ws_len == 3 + + def test_syntax_leading_bracket(self) -> None: + """Open bracket preceded by spaces classified as SYNTAX_LEADING.""" + kind, ws_len = _classify_line(" [key]") + assert kind is _LineKind.SYNTAX_LEADING + assert ws_len == 2 + + def test_normal_text(self) -> None: + """Regular text classified as NORMAL.""" + kind, ws_len = _classify_line("hello") + assert kind is _LineKind.NORMAL + assert ws_len == 0 + + def test_normal_text_with_leading_whitespace(self) -> None: + """Text with leading whitespace but non-syntax first char is NORMAL.""" + kind, ws_len = _classify_line(" hello") + assert kind is _LineKind.NORMAL + assert ws_len == 0 + + def test_dot_after_text_is_normal(self) -> None: + """Dot NOT as first non-ws character is NORMAL.""" + kind, ws_len = _classify_line("x.y") + assert kind is _LineKind.NORMAL + assert ws_len == 0 + + +# ============================================================================= +# _escape_text unit tests (covers brace escaping paths) +# ============================================================================= + + +class TestEscapeText: + """Direct unit tests for _escape_text brace escaping.""" + + def test_no_braces(self) -> None: + """Text without braces passes through unchanged.""" + output: list[str] = [] + _escape_text("hello world", output) + assert "".join(output) == "hello world" + + def test_open_brace(self) -> None: + """Open brace escaped as StringLiteral placeable.""" + output: list[str] = [] + _escape_text("before{after", output) + assert "".join(output) == 
'before{ "{" }after' + + def test_close_brace(self) -> None: + """Close brace escaped as StringLiteral placeable.""" + output: list[str] = [] + _escape_text("x}y", output) + assert "".join(output) == 'x{ "}" }y' + + def test_both_braces(self) -> None: + """Both brace types escaped.""" + output: list[str] = [] + _escape_text("{}", output) + assert "".join(output) == '{ "{" }{ "}" }' + + def test_empty_text(self) -> None: + """Empty text produces no output.""" + output: list[str] = [] + _escape_text("", output) + assert output == [] + + def test_only_open_brace(self) -> None: + """Single open brace.""" + output: list[str] = [] + _escape_text("{", output) + assert "".join(output) == '{ "{" }' + + def test_braces_in_middle_of_text(self) -> None: + """Braces surrounded by text.""" + output: list[str] = [] + _escape_text("a{b}c", output) + assert "".join(output) == 'a{ "{" }b{ "}" }c' + + def test_consecutive_braces(self) -> None: + """Multiple consecutive braces.""" + output: list[str] = [] + _escape_text("{{", output) + assert "".join(output) == '{ "{" }{ "{" }' + + +# ============================================================================= +# _emit_classified_line integration tests (covers lines 742-751) +# ============================================================================= + + +class TestEmitClassifiedLineCoverage: + """Roundtrip tests that exercise _emit_classified_line branches.""" + + _parser = FluentParserV1() + + def _roundtrip_check(self, result: str) -> None: + """Verify parse-serialize roundtrip produces no Junk and is idempotent.""" + reparsed = self._parser.parse(result) + assert not any(isinstance(e, Junk) for e in reparsed.entries) + s2 = serialize(reparsed) + assert result == s2 + + def test_whitespace_only_continuation_line(self) -> None: + """Multiline text with whitespace-only continuation (lines 742-744).""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="line1\n \nline3"),)), + 
attributes=(), + ) + result = serialize(Resource(entries=(msg,))) + assert '{ " " }' in result + self._roundtrip_check(result) + + def test_syntax_leading_dot(self) -> None: + """Continuation line with dot as first non-ws char (lines 746-751).""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="line1\n.attr"),)), + attributes=(), + ) + result = serialize(Resource(entries=(msg,))) + assert '{ "." }' in result + self._roundtrip_check(result) + + def test_syntax_leading_with_ws_prefix(self) -> None: + """Syntax char preceded by whitespace (ws_len > 0 branch). + + Content spaces before the syntax char are wrapped in a StringLiteral + placeable so the parser cannot absorb them as structural indent. + """ + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="line1\n .something"),)), + attributes=(), + ) + result = serialize(Resource(entries=(msg,))) + assert '{ " " }' in result # Content spaces wrapped + assert '{ "." }' in result # Syntax char wrapped + self._roundtrip_check(result) + + def test_syntax_leading_with_remaining_text(self) -> None: + """Syntax char followed by additional text (remaining branch).""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="line1\n*default value"),)), + attributes=(), + ) + result = serialize(Resource(entries=(msg,))) + assert '{ "*" }' in result + assert "default value" in result + self._roundtrip_check(result) + + def test_syntax_leading_bracket_with_content(self) -> None: + """Bracket syntax char with trailing content.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="line1\n[not a variant"),)), + attributes=(), + ) + result = serialize(Resource(entries=(msg,))) + assert '{ "[" }' in result + self._roundtrip_check(result) + + def test_syntax_leading_ws_prefix_roundtrip_promoted(self) -> None: + """Content spaces before syntax char survive parse-serialize roundtrip. 
+ + Promoted from Atheris fuzzer finding (finding_0001): convergence failure + S(AST) != S(P(S(AST))) when continuation line had content whitespace + before a wrapped syntax character. The parser absorbed content spaces + as structural indent during common-indent stripping. + """ + msg = Message( + id=Identifier(name="fuec"), + value=Pattern(elements=(TextElement(value=" dS7aQ\n .h?Q"),)), + attributes=(), + ) + result = serialize(Resource(entries=(msg,))) + # Leading 4 spaces wrapped at pattern level + assert '{ " " }' in result + # Content spaces before syntax char wrapped at line level + assert '{ " " }' in result + assert '{ "." }' in result + self._roundtrip_check(result) + + def test_syntax_char_only_no_remaining(self) -> None: + """Continuation line is just a syntax char, no remaining text (750->exit).""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="line1\n."),)), + attributes=(), + ) + result = serialize(Resource(entries=(msg,))) + assert '{ "." 
}' in result + reparsed = self._parser.parse(result) + assert not any(isinstance(e, Junk) for e in reparsed.entries) + + +# ============================================================================= +# Pattern edge cases (covers lines 643->645, 699-700, 723->690, 871->874) +# ============================================================================= + + +class TestPatternEmissionEdgeCases: + """Tests for pattern serialization edge cases.""" + + _parser = FluentParserV1() + + def test_first_text_element_all_spaces(self) -> None: + """First TextElement is all spaces: leading_ws consumed entirely (lines 699-700).""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + TextElement(value=" "), + Placeable(expression=VariableReference(id=Identifier(name="x"))), + ) + ), + attributes=(), + ) + result = serialize(Resource(entries=(msg,))) + assert '{ " " }' in result + assert "$x" in result + + def test_placeable_not_last_element(self) -> None: + """Placeable followed by TextElement (loop continuation 723->690).""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable(expression=VariableReference(id=Identifier(name="name"))), + TextElement(value=" said hello"), + ) + ), + attributes=(), + ) + result = serialize(Resource(entries=(msg,))) + assert "$name" in result + assert "said hello" in result + + def test_intra_element_separate_line_trigger(self) -> None: + """Single TextElement with embedded newline + NORMAL leading ws (643->645).""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="line1\n normal with leading whitespace"),)), + attributes=(), + ) + result = serialize(Resource(entries=(msg,))) + # Separate-line mode activated: pattern starts on new line + assert result.startswith("test = \n ") + # Roundtrip + reparsed = self._parser.parse(result) + assert not any(isinstance(e, Junk) for e in reparsed.entries) + + def _roundtrip_convergence(self, source: str) -> 
None: + """Verify S(P(x)) == S(P(S(P(x)))) for an FTL source string.""" + parsed = self._parser.parse(source) + s1 = serialize(parsed) + reparsed = self._parser.parse(s1) + s2 = serialize(reparsed) + assert s1 == s2, f"Convergence failure:\nS1: {s1!r}\nS2: {s2!r}" + + def test_cross_element_ws_only_no_separate_line_promoted(self) -> None: + """WHITESPACE_ONLY cross-element does not trigger separate-line mode. + + Promoted from Atheris roundtrip fuzzer finding: convergence failure + S(P(x)) != S(P(S(P(x)))) when a whitespace-only TextElement followed + a newline-ending TextElement. The cross-element check triggered + separate-line mode; the serializer wrapped the spaces in a Placeable; + on re-parse the Placeable was opaque to the cross-element check, + so separate-line mode did not trigger, producing different output. + """ + self._roundtrip_convergence("aaaaa =\n h\n \n") + + def test_cross_element_ws_only_direct_ast(self) -> None: + """Cross-element WHITESPACE_ONLY: inline mode, content wrapped.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + TextElement(value="h\n"), + TextElement(value=" "), + ) + ), + attributes=(), + ) + result = serialize(Resource(entries=(msg,))) + # Should NOT use separate-line mode (WHITESPACE_ONLY handled by wrapping) + assert result.startswith("test = h") + assert '{ " " }' in result + reparsed = self._parser.parse(result) + assert not any(isinstance(e, Junk) for e in reparsed.entries) + s2 = serialize(reparsed) + assert result == s2 + + def test_cross_element_syntax_leading_no_separate_line(self) -> None: + """Cross-element SYNTAX_LEADING: inline mode, content wrapped.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + TextElement(value="h\n"), + TextElement(value=" .dotcontent"), + ) + ), + attributes=(), + ) + result = serialize(Resource(entries=(msg,))) + # Should NOT use separate-line mode (SYNTAX_LEADING handled by wrapping) + assert result.startswith("test = h") + 
assert '{ " " }' in result + assert '{ "." }' in result + reparsed = self._parser.parse(result) + assert not any(isinstance(e, Junk) for e in reparsed.entries) + s2 = serialize(reparsed) + assert result == s2 + + def test_cross_element_normal_still_triggers_separate_line(self) -> None: + """Cross-element NORMAL: separate-line mode correctly activates.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + TextElement(value="h\n"), + TextElement(value=" normal text"), + ) + ), + attributes=(), + ) + result = serialize(Resource(entries=(msg,))) + # NORMAL with leading whitespace needs separate-line mode + assert result.startswith("test = \n ") + reparsed = self._parser.parse(result) + assert not any(isinstance(e, Junk) for e in reparsed.entries) + + def test_number_literal_variant_key(self) -> None: + """SelectExpression with NumberLiteral variant key (line 871->874).""" + sel = SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=( + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="one item"),)), + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="many items"),)), + default=True, + ), + ), + ) + msg = Message( + id=Identifier(name="items"), + value=Pattern(elements=(Placeable(expression=sel),)), + attributes=(), + ) + result = serialize(Resource(entries=(msg,))) + assert "[1]" in result + assert "[other]" in result diff --git a/tests/test_syntax_serializer_patterns.py b/tests/test_syntax_serializer_patterns.py new file mode 100644 index 00000000..348f173c --- /dev/null +++ b/tests/test_syntax_serializer_patterns.py @@ -0,0 +1,427 @@ +"""Tests for syntax.serializer: FluentSerializer, serialize(), edge cases, internal helpers. + +Validates serialization of AST nodes back to FTL syntax, including control character +escaping, depth limits, junk entries, multiline patterns, and classify/escape internals. 
+""" + +from __future__ import annotations + +from hypothesis import assume, event, example, given +from hypothesis import strategies as st + +from ftllexengine.syntax import serialize +from ftllexengine.syntax.ast import ( + CallArguments, + FunctionReference, + Identifier, + Message, + NamedArgument, + NumberLiteral, + Pattern, + Placeable, + Resource, + SelectExpression, + StringLiteral, + TextElement, + VariableReference, + Variant, +) + + +class TestMultilinePatternIndentation: + """Test multi-line pattern indentation handling. + + Per serializer.py lines 474-475, newlines in TextElements are replaced + with newline + 4-space indentation for FTL continuation lines. + """ + + def test_multiline_text_indented(self) -> None: + """Newlines in TextElement followed by 4-space indentation.""" + msg = Message( + id=Identifier(name="multi"), + value=Pattern(elements=(TextElement(value="Line 1\nLine 2\nLine 3"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + + # Each newline should be followed by 4 spaces (continuation indent) + assert "Line 1\n Line 2\n Line 3" in result + + def test_multiline_with_braces_indented_and_escaped(self) -> None: + """Multiline text with braces: both indentation and brace escaping.""" + msg = Message( + id=Identifier(name="complex"), + value=Pattern(elements=(TextElement(value="First {line}\nSecond }line"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + + # Should have indentation AND brace escaping + assert "First" in result + assert "Second" in result + assert '{ "{" }' in result # { escaped + assert '{ "}" }' in result # } escaped + # Newline creates indentation + assert "\n " in result + + @given( + lines=st.lists( + st.text( + alphabet=st.characters( + whitelist_categories=("Lu", "Ll", "Nd", "Zs"), + min_codepoint=0x20, # Printable ASCII and above + ), + min_size=1, + max_size=50, + ).filter(lambda t: "{" not in t and "}" not in t), + 
min_size=2, + max_size=5, + ) + ) + @example(lines=["First line", "Second line", "Third line"]) + def test_multiline_indentation_property(self, lines: list[str]) -> None: + """Multiline patterns always indent continuation lines with 4 spaces.""" + event(f"line_count={len(lines)}") + assume(all(line.strip() for line in lines)) # Non-empty lines + # Leading whitespace on the first line gets wrapped in a StringLiteral + # placeable for roundtrip correctness; not this test's concern. + assume(not lines[0][0].isspace()) + + text = "\n".join(lines) + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value=text),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + # After first line, each line should be indented with 4 spaces + for i, line in enumerate(lines): + if i == 0: + # First line not indented + assert lines[0] in result + else: + # Subsequent lines indented + assert f"\n {line}" in result or line in result + + +class TestMixedPatternElements: + """Test Pattern serialization with mixed TextElement and Placeable elements. + + This ensures the elif branch at line 483 is properly covered when + iterating through pattern elements that alternate between types. 
+ """ + + def test_mixed_text_and_placeable_elements(self) -> None: + """Pattern with alternating TextElement and Placeable elements.""" + msg = Message( + id=Identifier(name="mixed"), + value=Pattern( + elements=( + TextElement(value="Start "), + Placeable(expression=VariableReference(id=Identifier(name="var1"))), + TextElement(value=" middle "), + Placeable(expression=VariableReference(id=Identifier(name="var2"))), + TextElement(value=" end"), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + + assert "Start { $var1 } middle { $var2 } end" in result + + def test_multiple_consecutive_placeables(self) -> None: + """Pattern with consecutive Placeable elements (no text between).""" + msg = Message( + id=Identifier(name="consecutive"), + value=Pattern( + elements=( + Placeable(expression=VariableReference(id=Identifier(name="a"))), + Placeable(expression=VariableReference(id=Identifier(name="b"))), + Placeable(expression=VariableReference(id=Identifier(name="c"))), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + + assert "{ $a }{ $b }{ $c }" in result + + def test_text_then_multiple_placeables(self) -> None: + """Pattern starting with text followed by multiple placeables.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + TextElement(value="Prefix: "), + Placeable(expression=StringLiteral(value="one")), + Placeable(expression=StringLiteral(value="two")), + Placeable(expression=StringLiteral(value="three")), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + + assert 'Prefix: { "one" }{ "two" }{ "three" }' in result + + @given( + num_text=st.integers(min_value=1, max_value=5), + num_placeable=st.integers(min_value=1, max_value=5), + ) + @example(num_text=3, num_placeable=2) + @example(num_text=1, num_placeable=4) + def test_mixed_pattern_property(self, num_text: int, num_placeable: int) 
-> None: + """Patterns with varying numbers of text and placeable elements serialize correctly.""" + event(f"num_text={num_text}") + event(f"num_placeable={num_placeable}") + elements: list[TextElement | Placeable] = [] + + # Alternate between text and placeable + for i in range(max(num_text, num_placeable)): + if i < num_text: + elements.append(TextElement(value=f"text{i} ")) + if i < num_placeable: + elements.append( + Placeable(expression=VariableReference(id=Identifier(name=f"v{i}"))) + ) + + msg = Message( + id=Identifier(name="m"), + value=Pattern(elements=tuple(elements)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + assert "m = " in result + + +class TestSelectExpressionVariantKeys: + """Test SelectExpression with both Identifier and NumberLiteral variant keys. + + Ensures match statement at line 619-623 covers both cases completely, + including exit paths (622->625). + """ + + def test_select_with_identifier_keys_only(self) -> None: + """SelectExpression with all Identifier variant keys.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable( + expression=SelectExpression( + selector=VariableReference(id=Identifier(name="count")), + variants=( + Variant( + key=Identifier(name="one"), + value=Pattern(elements=(TextElement(value="One item"),)), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="Many items"),)), + default=True, + ), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + + assert "[one]" in result + assert "*[other]" in result + assert "One item" in result + assert "Many items" in result + + def test_select_with_number_keys_only(self) -> None: + """SelectExpression with all NumberLiteral variant keys.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable( + expression=SelectExpression( + 
selector=VariableReference(id=Identifier(name="count")), + variants=( + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="Exactly one"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=0, raw="0"), + value=Pattern(elements=(TextElement(value="Zero"),)), + default=True, + ), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + + assert "[1]" in result + assert "*[0]" in result + assert "Exactly one" in result + assert "Zero" in result + + def test_select_with_mixed_identifier_and_number_keys(self) -> None: + """SelectExpression with both Identifier and NumberLiteral keys.""" + msg = Message( + id=Identifier(name="mixed"), + value=Pattern( + elements=( + Placeable( + expression=SelectExpression( + selector=VariableReference(id=Identifier(name="val")), + variants=( + Variant( + key=NumberLiteral(value=0, raw="0"), + value=Pattern(elements=(TextElement(value="Zero"),)), + default=False, + ), + Variant( + key=NumberLiteral(value=1, raw="1"), + value=Pattern(elements=(TextElement(value="One"),)), + default=False, + ), + Variant( + key=Identifier(name="other"), + value=Pattern(elements=(TextElement(value="Other"),)), + default=True, + ), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + + # Both NumberLiteral and Identifier cases exercised + assert "[0]" in result + assert "[1]" in result + assert "*[other]" in result + + +class TestFunctionReferenceValidation: + """Test FunctionReference validation path coverage. + + Ensures the FunctionReference case at line 183-193 in _validate_expression + is fully covered, including exit paths (185->exit). 
+ """ + + def test_function_reference_with_positional_args_validated(self) -> None: + """FunctionReference with positional arguments passes validation.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable( + expression=FunctionReference( + id=Identifier(name="NUMBER"), + arguments=CallArguments( + positional=(VariableReference(id=Identifier(name="count")),), + named=(), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + # Should validate successfully + result = serialize(resource, validate=True) + assert "NUMBER($count)" in result + + def test_function_reference_with_named_args_validated(self) -> None: + """FunctionReference with named arguments passes validation.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable( + expression=FunctionReference( + id=Identifier(name="DATETIME"), + arguments=CallArguments( + positional=(), + named=( + NamedArgument( + name=Identifier(name="month"), + value=StringLiteral(value="long"), + ), + NamedArgument( + name=Identifier(name="day"), + value=StringLiteral(value="numeric"), + ), + ), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + # Should validate successfully + result = serialize(resource, validate=True) + assert "DATETIME" in result + assert 'month: "long"' in result + assert 'day: "numeric"' in result + + def test_function_reference_with_mixed_args_validated(self) -> None: + """FunctionReference with both positional and named args validated.""" + msg = Message( + id=Identifier(name="formatted"), + value=Pattern( + elements=( + Placeable( + expression=FunctionReference( + id=Identifier(name="NUMBER"), + arguments=CallArguments( + positional=(VariableReference(id=Identifier(name="amount")),), + named=( + NamedArgument( + name=Identifier(name="style"), + value=StringLiteral(value="currency"), + ), + NamedArgument( + name=Identifier(name="currency"), + 
value=StringLiteral(value="USD"), + ), + ), + ), + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource, validate=True) + assert "NUMBER($amount" in result + assert 'style: "currency"' in result + assert 'currency: "USD"' in result diff --git a/tests/test_syntax_serializer_roundtrip.py b/tests/test_syntax_serializer_roundtrip.py index 59396dac..ec2895c4 100644 --- a/tests/test_syntax_serializer_roundtrip.py +++ b/tests/test_syntax_serializer_roundtrip.py @@ -1,1037 +1,7 @@ -"""Roundtrip tests for syntax.serializer: parse(serialize(ast)) == ast. +"""Aggregated syntax serializer roundtrip test surface.""" -Validates both the parser and serializer simultaneously, covering programmatic -ASTs with embedded newlines, whitespace preservation, and convergence stability. -""" - -from __future__ import annotations - -import pytest -from hypothesis import assume, event, example, given, settings -from hypothesis import strategies as st - -from ftllexengine.enums import CommentType -from ftllexengine.syntax import parse, serialize -from ftllexengine.syntax.ast import ( - Comment, - Identifier, - Junk, - Message, - NumberLiteral, - Pattern, - Placeable, - Resource, - SelectExpression, - TextElement, - VariableReference, - Variant, -) -from ftllexengine.syntax.parser import FluentParserV1 -from ftllexengine.syntax.serializer import FluentSerializer - -from .strategies import ( - ftl_comments, - ftl_message_nodes, - ftl_patterns, - ftl_resources, - ftl_select_expressions, - ftl_variable_references, -) - -# ============================================================================ -# SIMPLE ROUNDTRIP TESTS (Example-Based) -# ============================================================================ - - -def test_roundtrip_simple_message(): - """Round-trip a simple message with text only.""" - # Create AST - msg = Message( - id=Identifier(name="hello"), - value=Pattern(elements=(TextElement(value="Hello, World!"),)), - 
attributes=(), - ) - resource = Resource(entries=(msg,)) - - # Serialize and parse back - serialized = serialize(resource) - reparsed = parse(serialized) - - # Should be structurally identical - assert len(reparsed.entries) == 1 - assert isinstance(reparsed.entries[0], Message) - assert reparsed.entries[0].id.name == "hello" - - -def test_roundtrip_message_with_variable(): - """Round-trip a message with variable interpolation.""" - msg = Message( - id=Identifier(name="greeting"), - value=Pattern( - elements=( - TextElement(value="Hello, "), - Placeable(expression=VariableReference(id=Identifier(name="name"))), - TextElement(value="!"), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - serialized = serialize(resource) - reparsed = parse(serialized) - - assert len(reparsed.entries) == 1 - assert isinstance(reparsed.entries[0], Message) - assert reparsed.entries[0].id.name == "greeting" - # Verify pattern has 3 elements - pattern = reparsed.entries[0].value - assert pattern is not None - assert len(pattern.elements) == 3 - - -def test_roundtrip_select_expression(): - """Round-trip a message with select expression (plurals).""" - msg = Message( - id=Identifier(name="emails"), - value=Pattern( - elements=( - Placeable( - expression=SelectExpression( - selector=VariableReference(id=Identifier(name="count")), - variants=( - Variant( - key=Identifier(name="one"), - value=Pattern(elements=(TextElement(value="one email"),)), - default=False, - ), - Variant( - key=Identifier(name="other"), - value=Pattern( - elements=( - Placeable( - expression=VariableReference( - id=Identifier(name="count") - ) - ), - TextElement(value=" emails"), - ) - ), - default=True, - ), - ), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - serialized = serialize(resource) - reparsed = parse(serialized) - - assert len(reparsed.entries) == 1 - assert isinstance(reparsed.entries[0], Message) - - -def test_roundtrip_numeric_variant(): - """Round-trip 
select expression with numeric variant keys.""" - msg = Message( - id=Identifier(name="items"), - value=Pattern( - elements=( - Placeable( - expression=SelectExpression( - selector=VariableReference(id=Identifier(name="count")), - variants=( - Variant( - key=NumberLiteral(value=0, raw="0"), - value=Pattern(elements=(TextElement(value="no items"),)), - default=False, - ), - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern(elements=(TextElement(value="one item"),)), - default=False, - ), - Variant( - key=Identifier(name="other"), - value=Pattern(elements=(TextElement(value="many items"),)), - default=True, - ), - ), - ) - ), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - serialized = serialize(resource) - reparsed = parse(serialized) - - assert len(reparsed.entries) == 1 - msg_parsed = reparsed.entries[0] - assert isinstance(msg_parsed, Message) - assert msg_parsed.id.name == "items" - - -def test_roundtrip_comment(): - """Round-trip standalone comment. - - NOTE: Parser does not currently support standalone comments - they are - silently ignored during parsing. This test documents the limitation. - When parser support is added, this test should pass. 
- """ - comment = Comment(content=" This is a comment", type=CommentType.COMMENT) - resource = Resource(entries=(comment,)) - - serialized = serialize(resource) - # Serializer correctly outputs: "# This is a comment\n" - assert serialized == "# This is a comment\n" - - # Per Fluent spec: Comments are preserved in AST - reparsed = parse(serialized) - - # Spec-conformant behavior: Comments are preserved - assert len(reparsed.entries) == 1 - assert isinstance(reparsed.entries[0], Comment) - assert reparsed.entries[0].content == comment.content - - -def test_roundtrip_junk(): - """Round-trip junk (invalid syntax preserved).""" - junk = Junk(content="invalid syntax here {") - resource = Resource(entries=(junk,)) - - serialized = serialize(resource) - reparsed = parse(serialized) - - # Junk gets reparsed as junk - assert len(reparsed.entries) >= 1 - # At least one entry should be junk - assert any(isinstance(e, Junk) for e in reparsed.entries) - - -def test_roundtrip_junk_with_leading_whitespace(): - """Round-trip junk with leading whitespace without redundant newlines. - - Tests that the serializer does not add redundant separators before Junk - entries when the Junk content already includes leading whitespace. - The parser includes preceding whitespace in Junk.content for containment. 
- """ - # Parse FTL with message followed by blank lines and indented junk - source = "msg = hello\n\n bad" - resource = parse(source) - - # Serialize and re-parse - serialized = serialize(resource) - reparsed = parse(serialized) - - # Verify file doesn't grow on multiple roundtrips (key invariant) - serialized2 = serialize(reparsed) - assert len(serialized2) == len(serialized), ( - "File size should remain stable across roundtrips (no whitespace inflation)" - ) - - # Verify multiple roundtrips converge to stable output - serialized3 = serialize(parse(serialized2)) - assert serialized3 == serialized2, ( - "Serialization should be idempotent after first roundtrip" - ) - - -def test_roundtrip_multiple_messages(): - """Round-trip resource with multiple messages.""" - msg1 = Message( - id=Identifier(name="hello"), - value=Pattern(elements=(TextElement(value="Hello!"),)), - attributes=(), - ) - msg2 = Message( - id=Identifier(name="goodbye"), - value=Pattern(elements=(TextElement(value="Goodbye!"),)), - attributes=(), - ) - resource = Resource(entries=(msg1, msg2)) - - serialized = serialize(resource) - reparsed = parse(serialized) - - # Should have at least 2 messages - messages = [e for e in reparsed.entries if isinstance(e, Message)] - assert len(messages) >= 2 - - -def test_roundtrip_mixed_entries(): - """Round-trip resource with messages and standalone comments. - - When Comments appear as separate entries in the AST (not as message.comment), - they are standalone comments and should remain standalone after roundtrip. - The serializer preserves this by adding 2 blank lines between a standalone - comment and the following message/term. 
- """ - entries = ( - Comment(content=" Header comment", type=CommentType.COMMENT), - Message( - id=Identifier(name="app-name"), - value=Pattern(elements=(TextElement(value="MyApp"),)), - attributes=(), - ), - Comment(content=" Another comment", type=CommentType.COMMENT), - Message( - id=Identifier(name="version"), - value=Pattern(elements=(TextElement(value="1.0.0"),)), - attributes=(), - ), - ) - resource = Resource(entries=entries) - - serialized = serialize(resource) - reparsed = parse(serialized) - - # Standalone comments remain standalone after roundtrip - standalone_comments = [e for e in reparsed.entries if isinstance(e, Comment)] - messages = [e for e in reparsed.entries if isinstance(e, Message)] - assert len(standalone_comments) == 2 # Comments remain standalone - assert len(messages) == 2 # Messages survive roundtrip - - # Messages should NOT have attached comments (comments are standalone) - assert messages[0].comment is None - assert messages[1].comment is None - - # Comment content is preserved - assert "Header comment" in standalone_comments[0].content - assert "Another comment" in standalone_comments[1].content - - -def test_roundtrip_attached_comments(): - """Round-trip resource with attached comments. - - When Comments are set as message.comment (not as separate entries), - they are attached comments and should remain attached after roundtrip. 
- """ - entries = ( - Message( - id=Identifier(name="app-name"), - value=Pattern(elements=(TextElement(value="MyApp"),)), - attributes=(), - comment=Comment(content=" Attached to app-name", type=CommentType.COMMENT), - ), - Message( - id=Identifier(name="version"), - value=Pattern(elements=(TextElement(value="1.0.0"),)), - attributes=(), - comment=Comment(content=" Attached to version", type=CommentType.COMMENT), - ), - ) - resource = Resource(entries=entries) - - serialized = serialize(resource) - reparsed = parse(serialized) - - # No standalone comments - all attached - standalone_comments = [e for e in reparsed.entries if isinstance(e, Comment)] - messages = [e for e in reparsed.entries if isinstance(e, Message)] - assert len(standalone_comments) == 0 # No standalone comments - assert len(messages) == 2 # Messages survive roundtrip - - # Comments remain attached to their messages - assert messages[0].comment is not None - assert "Attached to app-name" in messages[0].comment.content - assert messages[1].comment is not None - assert "Attached to version" in messages[1].comment.content - - -def test_roundtrip_empty_resource(): - """Round-trip empty resource.""" - resource = Resource(entries=()) - - serialized = serialize(resource) - reparsed = parse(serialized) - - assert len(reparsed.entries) == 0 - - -def test_roundtrip_message_with_only_placeable(): - """Round-trip message with only a placeable (no text).""" - msg = Message( - id=Identifier(name="count"), - value=Pattern( - elements=(Placeable(expression=VariableReference(id=Identifier(name="num"))),) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - serialized = serialize(resource) - reparsed = parse(serialized) - - assert len(reparsed.entries) == 1 - assert isinstance(reparsed.entries[0], Message) - - -def test_roundtrip_complex_pattern(): - """Round-trip message with complex pattern (text + variables). - - NOTE: Parser creates spurious Junk entry for trailing period. 
- This is a parser quirk - the message itself parses correctly. - """ - msg = Message( - id=Identifier(name="user-info"), - value=Pattern( - elements=( - TextElement(value="User "), - Placeable(expression=VariableReference(id=Identifier(name="name"))), - TextElement(value=" has "), - Placeable(expression=VariableReference(id=Identifier(name="count"))), - TextElement(value=" items"), # Removed trailing period to avoid parser quirk - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - serialized = serialize(resource) - reparsed = parse(serialized) - - # Message parses correctly (ignore spurious Junk entries) - messages = [e for e in reparsed.entries if isinstance(e, Message)] - assert len(messages) == 1 - msg_parsed = messages[0] - assert isinstance(msg_parsed, Message) - assert msg_parsed.value is not None - assert len(msg_parsed.value.elements) == 5 - - -# ============================================================================ -# WHITESPACE PRESERVATION ROUNDTRIP TESTS -# ============================================================================ - - -def test_roundtrip_multiline_leading_whitespace(): - """Round-trip preserves leading whitespace after newlines. - - Tests fix for IMPL-SERIALIZER-ROUNDTRIP-CORRUPTION-001: when TextElement - with leading whitespace follows element ending with newline, serializer - must emit pattern on separate line to preserve the whitespace semantically. 
- """ - # Pattern: "Line 1\n Line 2" (2 leading spaces on line 2) - msg = Message( - id=Identifier(name="code-block"), - value=Pattern( - elements=( - TextElement(value="Line 1\n"), - TextElement(value=" Line 2"), # 2-space indent - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - serialized = serialize(resource) - reparsed = parse(serialized) - - # Extract reparsed pattern content - messages = [e for e in reparsed.entries if isinstance(e, Message)] - assert len(messages) == 1 - pattern = messages[0].value - assert pattern is not None - - # Reconstruct the pattern content from elements - content = "".join( - elem.value for elem in pattern.elements if isinstance(elem, TextElement) - ) - assert "Line 1\n" in content - assert " Line 2" in content # 2 spaces preserved - - -def test_roundtrip_code_example_indent(): - """Round-trip preserves code example indentation. - - Tests common use case of embedding code examples in localization strings. - """ - # Multi-line code example with indentation - msg = Message( - id=Identifier(name="code-example"), - value=Pattern( - elements=( - TextElement(value="Example:\n"), - TextElement(value=" def hello():\n"), - TextElement(value=" print('Hi')"), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - serialized = serialize(resource) - reparsed = parse(serialized) - - messages = [e for e in reparsed.entries if isinstance(e, Message)] - assert len(messages) == 1 - pattern = messages[0].value - assert pattern is not None - - content = "".join( - elem.value for elem in pattern.elements if isinstance(elem, TextElement) - ) - # Verify indentation preserved - assert " def hello():" in content - assert " print('Hi')" in content - - -def test_roundtrip_whitespace_idempotent(): - """Multiple roundtrips produce identical output (idempotency). - - Tests that whitespace handling doesn't cause drift across roundtrips. 
- """ - msg = Message( - id=Identifier(name="formatted"), - value=Pattern( - elements=( - TextElement(value="Header:\n"), - TextElement(value=" Item 1\n"), - TextElement(value=" Item 2"), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - # First roundtrip - serialized1 = serialize(resource) - reparsed1 = parse(serialized1) - - # Second roundtrip - serialized2 = serialize(reparsed1) - reparsed2 = parse(serialized2) - - # Third roundtrip - serialized3 = serialize(reparsed2) - - # Output should stabilize after first roundtrip - assert serialized2 == serialized3, "Serialization should be idempotent" - - -def test_roundtrip_mixed_whitespace_and_placeables(): - """Round-trip preserves whitespace with interleaved placeables.""" - msg = Message( - id=Identifier(name="mixed"), - value=Pattern( - elements=( - TextElement(value="Results for "), - Placeable(expression=VariableReference(id=Identifier(name="query"))), - TextElement(value=":\n"), - TextElement(value=" - First result\n"), - TextElement(value=" - "), - Placeable(expression=VariableReference(id=Identifier(name="count"))), - TextElement(value=" more"), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - serialized = serialize(resource) - reparsed = parse(serialized) - - messages = [e for e in reparsed.entries if isinstance(e, Message)] - assert len(messages) == 1 - pattern = messages[0].value - assert pattern is not None - - # Verify structure preserved - should have TextElements with whitespace - text_elements = [e for e in pattern.elements if isinstance(e, TextElement)] - text_content = "".join(e.value for e in text_elements) - - # Check whitespace preservation - assert ":\n" in text_content - assert " - First result\n" in text_content or " -" in text_content - - -def test_roundtrip_tab_indentation(): - """Round-trip preserves tab indentation.""" - msg = Message( - id=Identifier(name="tabbed"), - value=Pattern( - elements=( - TextElement(value="Data:\n"), - 
TextElement(value="\tColumn 1\n"), - TextElement(value="\t\tNested"), - ) - ), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - serialized = serialize(resource) - reparsed = parse(serialized) - - messages = [e for e in reparsed.entries if isinstance(e, Message)] - assert len(messages) == 1 - pattern = messages[0].value - assert pattern is not None - - content = "".join( - elem.value for elem in pattern.elements if isinstance(elem, TextElement) - ) - assert "\tColumn 1" in content - assert "\t\tNested" in content - - -def test_roundtrip_preserves_parsed_whitespace(): - """Parse and serialize preserves original whitespace from FTL source. - - Tests the full cycle: FTL source -> parse -> serialize -> parse -> serialize - """ - # FTL with intentional indentation - source = """\ -code-snippet = - Example code: - if True: - print("hello") -""" - parsed = parse(source) - serialized = serialize(parsed) - reparsed = parse(serialized) - serialized2 = serialize(reparsed) - - # Should stabilize - assert serialized == serialized2, "Roundtrip should be stable" - - # Verify semantic content preserved - messages = [e for e in reparsed.entries if isinstance(e, Message)] - assert len(messages) == 1 - pattern = messages[0].value - assert pattern is not None - - content = "".join( - elem.value for elem in pattern.elements if isinstance(elem, TextElement) - ) - # Original indentation relationships should be preserved - assert "Example code:" in content - assert "print(" in content - - -def test_roundtrip_compact_messages_no_blank_lines(): - """Roundtrip of compact messages preserves no-blank-line format. - - Tests the fix for NAME-SERIALIZER-SPACING-001 where serializer was adding - redundant newlines between Message/Term entries. 
- """ - # Compact FTL with no blank lines between messages - source = "msg1 = First\nmsg2 = Second\nmsg3 = Third" - - parsed = parse(source) - serialized = serialize(parsed) - - # Serialized output should maintain compact format (no blank lines) - assert serialized == "msg1 = First\nmsg2 = Second\nmsg3 = Third\n" - - # Verify roundtrip preserves structure - reparsed = parse(serialized) - assert len(reparsed.entries) == 3 - messages = [e for e in reparsed.entries if isinstance(e, Message)] - assert len(messages) == 3 - assert messages[0].id.name == "msg1" - assert messages[1].id.name == "msg2" - assert messages[2].id.name == "msg3" - - -def test_comment_message_separation_preserved(): - """Comment->Message still gets blank line to prevent attachment. - - Tests the fix for NAME-SERIALIZER-SPACING-001 ensures Comment separation - logic is preserved (blank lines prevent comment attachment on re-parse). - """ - # Standalone comment followed by message (with blank line) - source = "# Standalone comment\n\nmsg = Value" - - parsed = parse(source) - serialized = serialize(parsed) - - # Should preserve blank line between comment and message - # The blank line prevents the comment from being attached to the message - assert "\n\n" in serialized - - # Verify roundtrip: comment should remain standalone - reparsed = parse(serialized) - comments = [e for e in reparsed.entries if isinstance(e, Comment)] - messages = [e for e in reparsed.entries if isinstance(e, Message)] - - assert len(comments) == 1 - assert len(messages) == 1 - # Message should NOT have an attached comment - assert messages[0].comment is None - - -def test_roundtrip_mixed_spacing_preserved(): - """Mixed spacing patterns are preserved during roundtrip.""" - # Mix of compact messages and separated entries - source = "msg1 = First\nmsg2 = Second\n\n# Comment\n\nmsg3 = Third" - - parsed = parse(source) - serialized = serialize(parsed) - reparsed = parse(serialized) - - # Should have 3 messages and 1 comment - 
messages = [e for e in reparsed.entries if isinstance(e, Message)] - comments = [e for e in reparsed.entries if isinstance(e, Comment)] - - assert len(messages) == 3 - assert len(comments) == 1 - - # First two messages should be compact (consecutive) - # Comment should be standalone (not attached) - # Third message should be after comment - assert messages[0].id.name == "msg1" - assert messages[1].id.name == "msg2" - assert messages[2].id.name == "msg3" - - -# ============================================================================ -# PROPERTY-BASED ROUNDTRIP TESTS (Hypothesis) -# ============================================================================ - - -@given(ftl_message_nodes()) -@settings(max_examples=30) -def test_roundtrip_property_messages(message: Message) -> None: - """Property: All generated messages round-trip successfully.""" - resource = Resource(entries=(message,)) - - serialized = serialize(resource) - reparsed = parse(serialized) - - messages = [e for e in reparsed.entries if isinstance(e, Message)] - assert len(messages) >= 1 - assert messages[0].id.name == message.id.name - has_attrs = len(message.attributes) > 0 - event(f"has_attributes={has_attrs}") - event("outcome=message_roundtrip") - - -@given(ftl_patterns()) -@settings(max_examples=30) -def test_roundtrip_property_patterns(pattern: Pattern) -> None: - """Property: All generated patterns round-trip in messages.""" - msg = Message( - id=Identifier(name="test"), value=pattern, attributes=() - ) - resource = Resource(entries=(msg,)) - - serialized = serialize(resource) - reparsed = parse(serialized) - - assert len(reparsed.entries) >= 1 - event(f"element_count={len(pattern.elements)}") - event("outcome=pattern_roundtrip") - - -@given(ftl_select_expressions()) -@settings(max_examples=20) -def test_roundtrip_property_select_expressions( - select_expr: SelectExpression, -) -> None: - """Property: All generated select expressions round-trip.""" - msg = Message( - 
id=Identifier(name="test"), - value=Pattern(elements=(Placeable(expression=select_expr),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - serialized = serialize(resource) - reparsed = parse(serialized) - - assert len(reparsed.entries) >= 1 - event(f"variant_count={len(select_expr.variants)}") - event("outcome=select_roundtrip") - - -@given(ftl_comments()) -@settings(max_examples=30) -def test_roundtrip_property_comments(comment_str: str) -> None: - """Property: All generated comments serialize correctly.""" - if comment_str.startswith("### "): - comment_type = CommentType.RESOURCE - content = comment_str[4:] - elif comment_str.startswith("## "): - comment_type = CommentType.GROUP - content = comment_str[3:] - else: - comment_type = CommentType.COMMENT - content = comment_str[2:] - - comment_node = Comment(content=content, type=comment_type) - resource = Resource(entries=(comment_node,)) - - serialized = serialize(resource) - assert isinstance(serialized, str) - assert serialized.startswith("#") - - _ = parse(serialized) - event(f"comment_type={comment_type.name}") - event("outcome=comment_roundtrip") - - -@given(ftl_resources()) -@settings(max_examples=20) -def test_roundtrip_property_complete_resources( - resource: Resource, -) -> None: - """Property: All generated resources round-trip successfully.""" - serialized = serialize(resource) - reparsed = parse(serialized) - - original_messages = [ - e for e in resource.entries if isinstance(e, Message) - ] - reparsed_messages = [ - e for e in reparsed.entries if isinstance(e, Message) - ] - - original_ids = {msg.id.name for msg in original_messages} - reparsed_ids = {msg.id.name for msg in reparsed_messages} - assert original_ids.issubset(reparsed_ids) - event(f"entry_count={len(resource.entries)}") - event("outcome=resource_roundtrip") - - -@given(ftl_variable_references()) -@settings(max_examples=30) -def test_roundtrip_property_variable_references( - var_ref: VariableReference, -) -> None: - 
"""Property: Variable references round-trip in placeables.""" - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(Placeable(expression=var_ref),)), - attributes=(), - ) - resource = Resource(entries=(msg,)) - - serialized = serialize(resource) - reparsed = parse(serialized) - - assert len(reparsed.entries) >= 1 - event(f"var_name={var_ref.id.name}") - event("outcome=varref_roundtrip") - - -# ============================================================================ -# SERIALIZER VALIDITY TESTS -# ============================================================================ - - -@given(ftl_resources()) -@settings(max_examples=30) -def test_serializer_produces_valid_ftl(resource: Resource) -> None: - """Property: Serialized output always produces parseable FTL.""" - serialized = serialize(resource) - - assert isinstance(serialized, str) - - result = parse(serialized) - assert isinstance(result, Resource) - event(f"entry_count={len(resource.entries)}") - event("outcome=valid_ftl") - - -@given(ftl_message_nodes()) -@settings(max_examples=30) -def test_serializer_deterministic(message: Message) -> None: - """Property: Same AST always produces same serialized output.""" - resource = Resource(entries=(message,)) - - serialized1 = serialize(resource) - serialized2 = serialize(resource) - - assert serialized1 == serialized2 - event("outcome=deterministic") - - -# ============================================================================ -# PROGRAMMATIC AST ROUNDTRIPS (from test_serializer_programmatic_roundtrip.py) -# ============================================================================ - - -_parser = FluentParserV1() -_serializer = FluentSerializer() - - -def _roundtrip_pattern_value(pattern_text: str) -> str: - """Create a programmatic AST, serialize, parse, and return pattern value.""" - msg = Message( - id=Identifier(name="msg", span=None), - value=Pattern(elements=(TextElement(value=pattern_text),)), - attributes=(), - comment=None, - 
span=None, - ) - resource = Resource(entries=(msg,)) - serialized = _serializer.serialize(resource) - parsed = _parser.parse(serialized) - entry = parsed.entries[0] - assert hasattr(entry, "value") - assert entry.value is not None - return "".join( - el.value for el in entry.value.elements # type: ignore[union-attr] - ) - - -class TestEmbeddedNewlineWhitespace: - """Roundtrip preservation of embedded newlines with significant whitespace.""" - - def test_five_space_indent(self) -> None: - """Embedded newline with 5-space indent preserved through roundtrip.""" - original = "foo\n bar" - assert _roundtrip_pattern_value(original) == original - - def test_four_space_indent(self) -> None: - """Embedded newline with exactly 4-space indent (boundary case).""" - original = "foo\n bar" - assert _roundtrip_pattern_value(original) == original - - def test_single_space_indent(self) -> None: - """Embedded newline with single space indent.""" - original = "foo\n bar" - assert _roundtrip_pattern_value(original) == original - - def test_multiple_newlines_varying_indent(self) -> None: - """Multiple embedded newlines with different indentation levels.""" - original = "a\n b\n c\n d" - assert _roundtrip_pattern_value(original) == original - - def test_no_whitespace_after_newline(self) -> None: - """Embedded newline without whitespace does not trigger separate-line.""" - original = "hello\nworld" - assert _roundtrip_pattern_value(original) == original - - def test_trailing_newline_no_whitespace(self) -> None: - """Trailing newline at end of text element.""" - original = "hello\n" - result = _roundtrip_pattern_value(original) - # Trailing newline may be normalized during parse - assert result.rstrip("\n") == "hello" - - def test_tab_after_newline(self) -> None: - """Tab character after newline (not space, no separate-line needed). - - Only space characters trigger separate-line serialization per the - FTL spec's whitespace handling (tab is not continuation indent). 
- """ - original = "foo\n\tbar" - assert _roundtrip_pattern_value(original) == original - - -def _extract_element_values(resource: Resource) -> list[str]: - """Extract text element values from the first entry's pattern.""" - entry = resource.entries[0] - assert hasattr(entry, "value") - assert entry.value is not None - return [el.value for el in entry.value.elements] # type: ignore[union-attr] - - -class TestParserProducedRoundtrip: - """Verify existing parser-produced roundtrip behavior is preserved.""" - - def test_separate_line_with_extra_indent(self) -> None: - """Parser-produced AST from FTL with extra indentation.""" - ftl = "msg =\n foo\n bar\n" - resource = _parser.parse(ftl) - serialized = _serializer.serialize(resource) - resource2 = _parser.parse(serialized) - assert _extract_element_values(resource) == _extract_element_values(resource2) - - def test_inline_start_multiline(self) -> None: - """Inline pattern start with continuation line.""" - ftl = "msg = foo\n bar\n" - resource = _parser.parse(ftl) - serialized = _serializer.serialize(resource) - resource2 = _parser.parse(serialized) - assert _extract_element_values(resource) == _extract_element_values(resource2) - - -class TestSerializerStability: - """Serialize-parse-serialize stability (idempotence after first roundtrip).""" - - @given( - indent=st.integers(min_value=1, max_value=12), - line_count=st.integers(min_value=2, max_value=5), - ) - @settings(max_examples=100) - @example(indent=1, line_count=2) - @example(indent=4, line_count=2) - @example(indent=5, line_count=3) - def test_embedded_indent_stability(self, indent: int, line_count: int) -> None: - """After first roundtrip, subsequent roundtrips are stable. - - Constructs patterns with N lines, each indented by `indent` spaces. - After initial serialize-parse, the result must be stable on - subsequent serialize-parse cycles. 
- """ - event(f"indent={indent}") - event(f"line_count={line_count}") - lines = [f"{' ' * indent}line{i}" if i > 0 else "first" for i in range(line_count)] - original = "\n".join(lines) - - # First roundtrip - first_rt = _roundtrip_pattern_value(original) - - # Second roundtrip from the first result - msg2 = Message( - id=Identifier(name="msg", span=None), - value=Pattern(elements=(TextElement(value=first_rt),)), - attributes=(), - comment=None, - span=None, - ) - resource2 = Resource(entries=(msg2,)) - serialized2 = _serializer.serialize(resource2) - parsed2 = _parser.parse(serialized2) - entry2 = parsed2.entries[0] - assert hasattr(entry2, "value") - assert entry2.value is not None - second_rt = "".join( - el.value for el in entry2.value.elements # type: ignore[union-attr] - ) - - # Stability: second roundtrip equals first roundtrip - assert first_rt == second_rt, ( - f"Roundtrip not stable: first={first_rt!r}, second={second_rt!r}" - ) - - -# ============================================================================ -# Identifier Roundtrip (Fuzz-marked: deadline=None) -# ============================================================================ - - -@pytest.mark.fuzz -@given(st.text(alphabet="abcdefghijklmnopqrstuvwxyz", min_size=1, max_size=20)) -@settings(max_examples=50, deadline=None) -def test_serialize_parse_identifiers(identifier: str) -> None: - """Property: valid identifiers survive serialize->parse round-trip. 
- - FUZZ: run with ./scripts/fuzz_hypofuzz.sh --deep or pytest -m fuzz - """ - assume(identifier[0].isalpha()) - assume(all(c.isalnum() or c == "-" for c in identifier)) - - ftl_source = f"{identifier} = Test value" - resource = parse(ftl_source) - - assume(len(resource.entries) > 0) - assume(not isinstance(resource.entries[0], Junk)) - - serialized = serialize(resource) - resource2 = parse(serialized) - - event(f"id_len={len(identifier)}") - assert resource2 is not None - assert len(resource2.entries) == len(resource.entries) - event("outcome=e2e_id_roundtrip_success") +from tests.syntax_serializer_roundtrip_cases.identifier_roundtrip_fuzz_marked_deadline_none import * # noqa: F403 - re-export split test surface +from tests.syntax_serializer_roundtrip_cases.property_based_roundtrip_tests_hypothesis import * # noqa: F403 - re-export split test surface +from tests.syntax_serializer_roundtrip_cases.serializer_validity_tests import * # noqa: F403 - re-export split test surface +from tests.syntax_serializer_roundtrip_cases.simple_roundtrip_tests_example_based import * # noqa: F403 - re-export split test surface +from tests.syntax_serializer_roundtrip_cases.whitespace_preservation_roundtrip_tests import * # noqa: F403 - re-export split test surface diff --git a/tests/test_syntax_serializer_text_validation.py b/tests/test_syntax_serializer_text_validation.py new file mode 100644 index 00000000..d7b65c61 --- /dev/null +++ b/tests/test_syntax_serializer_text_validation.py @@ -0,0 +1,665 @@ +"""Tests for syntax.serializer: FluentSerializer, serialize(), edge cases, internal helpers. + +Validates serialization of AST nodes back to FTL syntax, including control character +escaping, depth limits, junk entries, multiline patterns, and classify/escape internals. 
+""" + +from __future__ import annotations + +import pytest +from hypothesis import assume, event, example, given +from hypothesis import strategies as st + +from ftllexengine.syntax import serialize +from ftllexengine.syntax.ast import ( + Identifier, + Junk, + Message, + Pattern, + Placeable, + Resource, + StringLiteral, + TextElement, + VariableReference, +) +from ftllexengine.syntax.serializer import ( + SerializationDepthError, + SerializationValidationError, +) + +# ============================================================================ +# TEXT ELEMENT BRACE SERIALIZATION TESTS +# ============================================================================ + + +class TestTextElementBraceSerialization: + """Test that literal braces in TextElements are serialized per Fluent Spec 1.0. + + Per Fluent Spec: Backslash has no escaping power in TextElements. + Literal braces MUST be expressed as StringLiterals within Placeables: + - { must be serialized as {"{"} (Placeable containing StringLiteral) + - } must be serialized as {"}"} (Placeable containing StringLiteral) + + This produces valid FTL that compliant parsers accept. 
+ """ + + def test_open_brace_becomes_string_literal_placeable(self) -> None: + """Open brace { in text becomes {"{"} per Fluent spec.""" + msg = Message( + id=Identifier(name="brace"), + value=Pattern(elements=(TextElement(value="Use {variable} syntax"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + # Braces become StringLiteral Placeables: { "{" }variable{ "}" } + assert 'brace = Use { "{" }variable{ "}" } syntax\n' in result + + def test_close_brace_becomes_string_literal_placeable(self) -> None: + """Close brace } in text becomes {"}"} per Fluent spec.""" + msg = Message( + id=Identifier(name="json"), + value=Pattern(elements=(TextElement(value='{"key": "value"}'),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + # Both { and } become StringLiteral Placeables + assert '{ "{" }' in result + assert '{ "}" }' in result + # Full pattern: { "{" }"key": "value"{ "}" } + assert 'json = { "{" }"key": "value"{ "}" }\n' in result + + def test_backslash_not_escaped_in_text_elements(self) -> None: + """Backslash has no special meaning in TextElements per Fluent spec. + + Per spec: backslash only has escaping power in StringLiterals, + not in TextElements. A backslash in text is preserved as-is. 
+ """ + msg = Message( + id=Identifier(name="path"), + value=Pattern(elements=(TextElement(value="C:\\Users\\file.txt"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + # Backslash preserved as-is (no escaping in TextElements) + assert "path = C:\\Users\\file.txt\n" in result + + def test_backslash_before_brace_preserved(self) -> None: + """Backslash before brace: backslash preserved, brace becomes placeable.""" + msg = Message( + id=Identifier(name="escaped"), + value=Pattern(elements=(TextElement(value="Literal \\{ brace"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + # Backslash preserved, brace becomes StringLiteral Placeable + assert 'escaped = Literal \\{ "{" } brace\n' in result + + def test_preserve_text_without_braces(self) -> None: + """Text without braces should not be modified.""" + msg = Message( + id=Identifier(name="plain"), + value=Pattern(elements=(TextElement(value="Hello, World!"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert "plain = Hello, World!\n" in result + + def test_mixed_text_and_placeables(self) -> None: + """Text with literal braces alongside real placeables.""" + msg = Message( + id=Identifier(name="mixed"), + value=Pattern( + elements=( + TextElement(value="JSON: {key} = "), + Placeable(expression=VariableReference(id=Identifier(name="value"))), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + # Literal braces become StringLiteral Placeables, real placeable unchanged + assert 'mixed = JSON: { "{" }key{ "}" } = { $value }\n' in result + + def test_multiple_consecutive_braces(self) -> None: + """Multiple consecutive braces each become separate placeables.""" + msg = Message( + id=Identifier(name="multi"), + value=Pattern(elements=(TextElement(value="{{nested}}"),)), + attributes=(), + ) + resource = 
Resource(entries=(msg,)) + + result = serialize(resource) + + # Each brace becomes its own placeable + assert 'multi = { "{" }{ "{" }' in result + assert '{ "}" }{ "}" }' in result + + def test_brace_at_start_of_text(self) -> None: + """Brace at start of text element.""" + msg = Message( + id=Identifier(name="start"), + value=Pattern(elements=(TextElement(value="{start"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert 'start = { "{" }start\n' in result + + def test_brace_at_end_of_text(self) -> None: + """Brace at end of text element.""" + msg = Message( + id=Identifier(name="end"), + value=Pattern(elements=(TextElement(value="end}"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert 'end = end{ "}" }\n' in result + + def test_only_braces(self) -> None: + """Text containing only braces.""" + msg = Message( + id=Identifier(name="braces"), + value=Pattern(elements=(TextElement(value="{}"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + assert 'braces = { "{" }{ "}" }\n' in result + + +# ============================================================================ +# IDENTIFIER VALIDATION TESTS +# ============================================================================ + + +class TestIdentifierValidation: + """Test identifier validation during serialization.""" + + def test_invalid_message_id_rejected(self) -> None: + """Invalid message identifier rejected when validate=True. + + Regression test for SER-INVALID-OUTPUT-001. + Parser-produced ASTs have valid identifiers, but programmatically + constructed ASTs can contain arbitrary strings. Serializer should + validate identifiers when validate=True. 
+ """ + msg = Message( + id=Identifier(name="invalid message with spaces"), + value=Pattern(elements=(TextElement(value="Test"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + with pytest.raises(SerializationValidationError, match="Invalid identifier"): + serialize(resource, validate=True) + + def test_invalid_variable_reference_rejected(self) -> None: + """Invalid variable identifier rejected when validate=True.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern( + elements=( + Placeable( + expression=VariableReference( + id=Identifier(name="my var") # Space invalid + ) + ), + ) + ), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + with pytest.raises(SerializationValidationError, match="Invalid identifier"): + serialize(resource, validate=True) + + def test_invalid_identifier_allowed_when_validation_disabled(self) -> None: + """Invalid identifier allowed when validate=False.""" + msg = Message( + id=Identifier(name="invalid id"), + value=Pattern(elements=(TextElement(value="Test"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + # Should not raise exception + result = serialize(resource, validate=False) + assert "invalid id" in result + + def test_valid_identifier_with_hyphens_and_underscores(self) -> None: + """Valid identifiers with hyphens and underscores pass validation.""" + msg = Message( + id=Identifier(name="valid-id_123"), + value=Pattern(elements=(TextElement(value="Test"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource, validate=True) + assert "valid-id_123" in result + + +# ============================================================================ +# EDGE CASES AND INTERNAL HELPERS (from test_serializer_edge_cases.py) +# ============================================================================ + + +class TestControlCharacterEscaping: + """Test StringLiteral escaping of all control characters.""" + + def 
test_del_character_escaped_as_unicode(self) -> None: + """DEL character (0x7F) serialized as \\u007F escape sequence.""" + # DEL is a control character that needs Unicode escaping + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=StringLiteral(value="before\x7fafter")),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + # DEL must be escaped as \u007F + assert r"\u007F" in result + assert "before" in result + assert "after" in result + + def test_nul_character_escaped(self) -> None: + """NUL character (0x00) serialized as \\u0000 escape sequence.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=StringLiteral(value="a\x00b")),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + assert r"\u0000" in result + + def test_bel_character_escaped(self) -> None: + """BEL character (0x07) serialized as \\u0007 escape sequence.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=StringLiteral(value="ring\x07bell")),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + assert r"\u0007" in result + + def test_vertical_tab_escaped(self) -> None: + """Vertical tab (0x0B) serialized as \\u000B escape sequence.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=StringLiteral(value="a\x0bb")),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + assert r"\u000B" in result + + def test_form_feed_escaped(self) -> None: + """Form feed (0x0C) serialized as \\u000C escape sequence.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=StringLiteral(value="page\x0cbreak")),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + assert r"\u000C" in result + + 
def test_escape_character_escaped(self) -> None: + """ESC character (0x1B) serialized as \\u001B escape sequence.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=StringLiteral(value="before\x1bafter")),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + assert r"\u001B" in result + + @given( + control_char=st.one_of( + st.integers(min_value=0x00, max_value=0x1F), # C0 control characters + st.just(0x7F), # DEL + ) + ) + @example(control_char=0x7F) # Ensure DEL is explicitly tested + @example(control_char=0x00) # NUL + @example(control_char=0x01) # SOH + @example(control_char=0x1F) # Unit separator + def test_all_control_characters_escaped_property(self, control_char: int) -> None: + """All control characters (0x00-0x1F, 0x7F) escaped as Unicode.""" + is_del = control_char == 0x7F + event(f"control_char=0x{control_char:02X}") + event(f"is_del={is_del}") + char = chr(control_char) + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(Placeable(expression=StringLiteral(value=f"a{char}b")),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + + # Verify Unicode escape present + expected_escape = f"\\u{control_char:04X}" + assert expected_escape in result + + # Verify the raw control character is NOT in the output + # (it should be escaped) + # Exception: newline/tab which might be normalized by string handling + if char not in "\n\t": + assert char not in result + + +class TestSerializationDepthLimitWithoutValidation: + """Test depth limit enforcement during serialization when validation is disabled. + + Per serializer.py lines 297-299, the serialize method has a try/except + that catches DepthLimitExceededError during the _serialize_resource call. + This is distinct from the validation phase depth check. + + To trigger this: + 1. Disable validation (validate=False) + 2. 
Create AST with nesting that exceeds max_depth + 3. Depth guard triggers during serialization, not validation + """ + + def test_depth_exceeded_during_serialization_not_validation(self) -> None: + """Depth limit enforced during serialization even when validation disabled.""" + # Create deeply nested Placeables beyond the limit + # Start with innermost expression + max_depth = 5 + inner_expr: StringLiteral | Placeable = StringLiteral(value="deep") + + # Build nested Placeables: each Placeable adds one depth level + for _ in range(max_depth + 1): # Exceed limit by 1 + inner_expr = Placeable(expression=inner_expr) + + # Type narrowing: at this point inner_expr is definitely a Placeable + inner_placeable: Placeable = inner_expr # type: ignore[assignment] + + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(inner_placeable,)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + # Validation is disabled - should still catch depth during serialization + with pytest.raises(SerializationDepthError, match="nesting exceeds maximum depth"): + serialize(resource, validate=False, max_depth=max_depth) + + def test_depth_exactly_at_limit_succeeds_without_validation(self) -> None: + """AST exactly at depth limit serializes successfully without validation.""" + max_depth = 5 + inner_expr: StringLiteral | Placeable = StringLiteral(value="ok") + + # Build nested Placeables exactly at limit + for _ in range(max_depth): + inner_expr = Placeable(expression=inner_expr) + + # Type narrowing: at this point inner_expr is definitely a Placeable + inner_placeable: Placeable = inner_expr # type: ignore[assignment] + + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(inner_placeable,)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + # Should succeed - exactly at limit + result = serialize(resource, validate=False, max_depth=max_depth) + assert "ok" in result + + @given( + depth_over_limit=st.integers(min_value=1, 
max_value=10), + max_depth=st.integers(min_value=3, max_value=20), + ) + @example(depth_over_limit=1, max_depth=5) + @example(depth_over_limit=5, max_depth=10) + def test_serialization_depth_property(self, depth_over_limit: int, max_depth: int) -> None: + """Serialization depth limit enforced regardless of validation setting.""" + total = max_depth + depth_over_limit + event(f"max_depth={max_depth}") + event(f"depth_over_limit={depth_over_limit}") + event(f"total_nesting={total}") + # Build AST exceeding depth limit + inner_expr: StringLiteral | Placeable = StringLiteral(value="x") + for _ in range(max_depth + depth_over_limit): + inner_expr = Placeable(expression=inner_expr) + + # Type narrowing: at this point inner_expr is definitely a Placeable + inner_placeable: Placeable = inner_expr # type: ignore[assignment] + + msg = Message( + id=Identifier(name="m"), + value=Pattern(elements=(inner_placeable,)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + # Should raise SerializationDepthError + with pytest.raises(SerializationDepthError): + serialize(resource, validate=False, max_depth=max_depth) + + +class TestJunkWithLeadingWhitespace: + """Test Junk entry serialization with leading whitespace. + + Per serializer.py line 321, when a Junk entry follows another entry + and the Junk content starts with whitespace, the separator logic takes + a different path (pass statement, no additional separator added). 
+ + This tests the specific branch: isinstance(entry, Junk) and entry.content[0] in "\\n " + """ + + def test_junk_with_leading_newline_after_message(self) -> None: + """Junk with leading newline after message skips adding separator.""" + msg = Message( + id=Identifier(name="hello"), + value=Pattern(elements=(TextElement(value="World"),)), + attributes=(), + ) + # Junk with leading newline - parser includes preceding whitespace + junk = Junk(content="\ninvalid junk content") + resource = Resource(entries=(msg, junk)) + + result = serialize(resource) + + # Should not have double newline - Junk content already starts with \n + # Result should be: "hello = World\n\ninvalid junk content\n" + # But since Junk already has \n, we don't add another separator + assert "hello = World\n" in result + assert "\ninvalid junk content" in result + # Should NOT have triple newline + assert "\n\n\n" not in result + + def test_junk_with_leading_space_after_message(self) -> None: + """Junk with leading space after message skips adding separator.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="value"),)), + attributes=(), + ) + # Junk with leading space + junk = Junk(content=" some junk") + resource = Resource(entries=(msg, junk)) + + result = serialize(resource) + + # Junk already has leading space, so separator is skipped + assert "test = value\n some junk" in result + + def test_junk_without_leading_whitespace_gets_separator(self) -> None: + """Junk without leading whitespace gets normal separator.""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="value"),)), + attributes=(), + ) + # Junk WITHOUT leading whitespace + junk = Junk(content="junk content") + resource = Resource(entries=(msg, junk)) + + result = serialize(resource) + + # Normal separator added + assert "test = value\n" in result + assert "\njunk content" in result + + def test_empty_junk_content_gets_separator(self) -> None: + 
"""Empty Junk content gets normal separator (no [0] index access).""" + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value="value"),)), + attributes=(), + ) + # Empty junk - entry.content[0] won't be accessed due to short-circuit + junk = Junk(content="") + resource = Resource(entries=(msg, junk)) + + result = serialize(resource) + + # Empty junk still gets separator + assert "test = value\n" in result + + @given( + leading_char=st.sampled_from(["\n", " ", "\t", "j"]), + has_content=st.booleans(), + ) + @example(leading_char="\n", has_content=True) + @example(leading_char=" ", has_content=True) + @example(leading_char="j", has_content=True) + def test_junk_separator_logic_property(self, leading_char: str, has_content: bool) -> None: + """Junk separator logic handles various leading characters correctly.""" + is_ws = leading_char in ("\n", " ", "\t") + event(f"leading_char_is_whitespace={is_ws}") + event(f"has_content={has_content}") + msg = Message( + id=Identifier(name="m"), + value=Pattern(elements=(TextElement(value="v"),)), + attributes=(), + ) + + junk = Junk(content=f"{leading_char}content") if has_content else Junk(content="") + + resource = Resource(entries=(msg, junk)) + + # Should not raise - serialization should handle all cases + result = serialize(resource) + assert isinstance(result, str) + assert "m = v" in result + + +class TestPatternWithoutBraces: + """Test Pattern serialization path when text has no braces. + + Per serializer.py line 483->467, there's an else branch when text + contains neither { nor } characters. This tests the optimization path + that emits text directly without brace handling. 
+ """ + + def test_text_without_braces_direct_output(self) -> None: + """Text without braces takes direct output path.""" + msg = Message( + id=Identifier(name="plain"), + value=Pattern(elements=(TextElement(value="No braces here, just plain text!"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + + # Should contain the text as-is (no brace escaping needed) + assert "No braces here, just plain text!" in result + # Should NOT have any brace-related escaping + assert '{ "{" }' not in result + assert '{ "}" }' not in result + + def test_text_with_only_safe_punctuation(self) -> None: + """Text with punctuation but no braces serializes directly.""" + msg = Message( + id=Identifier(name="punct"), + value=Pattern(elements=(TextElement(value="Hello, world! How are you?"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + + assert "Hello, world! How are you?" in result + # No brace escaping + assert '{ "{" }' not in result + + def test_text_with_numbers_and_symbols(self) -> None: + """Text with numbers and safe symbols serializes directly.""" + msg = Message( + id=Identifier(name="data"), + value=Pattern(elements=(TextElement(value="Price: $42.00 (20% off)"),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + result = serialize(resource) + + assert "Price: $42.00 (20% off)" in result + + @given( + text=st.text( + alphabet=st.characters( + whitelist_categories=("Lu", "Ll", "Nd", "Zs"), + whitelist_characters="!@#$%^&*()_+-=[]|;:'\",.<>?/~`", + ), + min_size=1, + max_size=100, + ).filter(lambda t: "{" not in t and "}" not in t) + ) + @example(text="Simple text without any braces") + @example(text="Numbers 123 and symbols !@#") + def test_brace_free_text_property(self, text: str) -> None: + """Text without braces always serializes without brace escaping.""" + event(f"input_len={len(text)}") + assume(text.strip()) # Non-empty after stripping + # Leading whitespace gets 
wrapped in a StringLiteral placeable for + # roundtrip correctness (see _serialize_pattern); not this test's concern. + assume(not text[0].isspace()) + + msg = Message( + id=Identifier(name="test"), + value=Pattern(elements=(TextElement(value=text),)), + attributes=(), + ) + resource = Resource(entries=(msg,)) + + result = serialize(resource) + + # Should contain the original text + assert text in result + # Should NOT have brace escaping since input has no braces + assert '{ "{" }' not in result or "{" in text # Only if original had them + assert '{ "}" }' not in result or "}" in text diff --git a/tests/test_syntax_validator.py b/tests/test_syntax_validator.py index 66b99320..bd838fc8 100644 --- a/tests/test_syntax_validator.py +++ b/tests/test_syntax_validator.py @@ -1,2257 +1,6 @@ -"""Tests for syntax.validator: SemanticValidator, validate(), semantic correctness per spec.""" +"""Aggregated syntax validator test surface.""" -from __future__ import annotations - -from decimal import Decimal - -import pytest - -from ftllexengine import FluentBundle -from ftllexengine.diagnostics import ValidationResult -from ftllexengine.diagnostics.codes import DiagnosticCode -from ftllexengine.enums import CommentType -from ftllexengine.introspection import FunctionCallInfo, introspect_message -from ftllexengine.syntax.ast import ( - Annotation, - Attribute, - CallArguments, - Comment, - FunctionReference, - Identifier, - Junk, - Message, - NamedArgument, - NumberLiteral, - Pattern, - Placeable, - Resource, - SelectExpression, - Span, - Term, - TermReference, - TextElement, - VariableReference, - Variant, -) -from ftllexengine.syntax.parser import FluentParserV1 -from ftllexengine.syntax.validator import ( - _VALIDATION_MESSAGES, - SemanticValidator, - validate, -) - -# ============================================================================ -# ENTRY VALIDATION TESTS -# ============================================================================ - - -class 
TestMessageValidation: - """Test message entry validation.""" - - def test_message_with_value_and_attributes(self) -> None: - """Message with value and attributes validates correctly.""" - parser = FluentParserV1() - resource = parser.parse(""" -msg = Hello World - .attr1 = Attribute 1 - .attr2 = Attribute 2 -""") - result = validate(resource) - assert result.is_valid - - def test_message_with_only_attributes_no_value(self) -> None: - """Message with no value, only attributes (valid per Fluent spec). - - Tests line 171->175 branch when message.value is None. - """ - parser = FluentParserV1() - resource = parser.parse(""" -msg = - .attr1 = Attribute value - .attr2 = Another attribute -""") - result = validate(resource) - assert result.is_valid - assert len(result.annotations) == 0 - - def test_message_with_plain_text_only(self) -> None: - """Message with plain text value validates.""" - parser = FluentParserV1() - resource = parser.parse("msg = Plain text value") - result = validate(resource) - assert result.is_valid - - def test_message_with_placeables(self) -> None: - """Message with variable references validates. - - Tests line 171-172 (message.value exists branch). - """ - parser = FluentParserV1() - resource = parser.parse("msg = Hello { $name }, you have { $count } messages") - - validator = SemanticValidator() - result = validator.validate(resource) - - assert result.is_valid - - def test_message_with_value_explicit_validation_path(self) -> None: - """Message with value takes the validation path. - - Explicitly tests line 171->172 branch (if message.value: path). 
- """ - # Create message with explicit value pattern - message = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="Has value"),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - validator = SemanticValidator() - result = validator.validate(resource) - - assert result.is_valid - - def test_message_without_value_explicit_validation_path(self) -> None: - """Message without value skips value validation. - - Explicitly tests line 171->175 branch (when message.value is None). - """ - # Create message with no value (only attributes) - message = Message( - id=Identifier(name="test"), - value=None, - attributes=( - Attribute( - id=Identifier(name="attr"), - value=Pattern(elements=(TextElement(value="Attribute value"),)), - ), - ), - ) - resource = Resource(entries=(message,)) - - validator = SemanticValidator() - result = validator.validate(resource) - - assert result.is_valid - - -class TestTermValidation: - """Test term entry validation.""" - - def test_term_with_value_validates(self) -> None: - """Term with value is valid per Fluent spec.""" - parser = FluentParserV1() - resource = parser.parse("-brand = Firefox") - result = validate(resource) - assert result.is_valid - - def test_term_with_value_and_attributes(self) -> None: - """Term with value and attributes validates. - - Tests line 202 - term attribute validation. - """ - parser = FluentParserV1() - resource = parser.parse(""" --brand = Firefox - .short = FX - .long = Mozilla Firefox -""") - result = validate(resource) - assert result.is_valid - - def test_term_without_value_constructor_validation(self) -> None: - """Term without value raises ValueError at construction. - - The AST enforces that terms must have values. - Tests the invariant that validator assumes terms always have values. 
- """ - with pytest.raises(ValueError, match="Term must have a value pattern"): - Term( - id=Identifier(name="test"), - value=None, # type: ignore[arg-type] # Invalid per spec - attributes=(), - span=Span(start=0, end=10), - ) - - def test_term_without_value_validator_defensive_check(self) -> None: - """Validator defensively checks for term without value. - - Tests lines 188-195 (defensive validation even though AST prevents it). - This tests the validator's defensive programming - if AST validation - is ever bypassed, validator should still catch the error. - """ - # Create a Term object bypassing __post_init__ validation - # This is defensive testing - ensures validator catches errors - # even if AST validation fails - term = object.__new__(Term) - object.__setattr__(term, "id", Identifier(name="broken")) - object.__setattr__(term, "value", None) # Invalid per spec - object.__setattr__(term, "attributes", ()) - object.__setattr__(term, "span", Span(start=0, end=10)) - - resource = Resource(entries=(term,)) - validator = SemanticValidator() - result = validator.validate(resource) - - # Validator should catch the missing value - assert not result.is_valid - errors = [a for a in result.annotations if "TERM_NO_VALUE" in a.code] - assert len(errors) > 0 - - -class TestCommentAndJunkValidation: - """Test Comment and Junk entry handling.""" - - def test_comment_entries_pass_validation(self) -> None: - """Comments require no validation and pass through. - - Tests line 156-157 (Comment case in _validate_entry). - """ - comment = Comment(content="# Test comment", type=CommentType.COMMENT) - resource = Resource(entries=(comment,)) - result = validate(resource) - assert result.is_valid - assert len(result.annotations) == 0 - - def test_junk_entries_pass_validation(self) -> None: - """Junk already represents parse errors, no further validation needed. - - Tests line 158-159 and 158->exit (Junk case in _validate_entry). 
- """ - junk = Junk(content="invalid syntax", annotations=()) - resource = Resource(entries=(junk,)) - - validator = SemanticValidator() - result = validator.validate(resource) - - # Validator doesn't add errors for junk (already invalid at parse level) - assert result.is_valid - assert len(result.annotations) == 0 - - def test_resource_with_junk_from_parser(self) -> None: - """Parser-generated junk entries are handled correctly.""" - parser = FluentParserV1() - # Invalid FTL syntax produces Junk entries - resource = parser.parse("msg = { invalid syntax here }") - result = validate(resource) - # Validator doesn't crash on junk - assert isinstance(result, ValidationResult) - - def test_multiple_junk_entries_in_resource(self) -> None: - """Multiple junk entries all pass through validator. - - Ensures Junk case exit path is exercised. - """ - junk1 = Junk(content="bad syntax 1", annotations=()) - junk2 = Junk(content="bad syntax 2", annotations=()) - junk3 = Junk(content="bad syntax 3", annotations=()) - - resource = Resource(entries=(junk1, junk2, junk3)) - validator = SemanticValidator() - result = validator.validate(resource) - - # All junk entries pass through without adding validation errors - assert result.is_valid - - def test_junk_entry_isolated_validation(self) -> None: - """Single junk entry validates in isolation. - - Explicitly tests line 158-159 Junk case and exit path. - This test isolates the Junk validation path to ensure - branch coverage tools detect the 158->exit path. 
- """ - from ftllexengine.core.depth_guard import DepthGuard - - # Create a Junk entry - junk = Junk(content="isolated junk", annotations=()) - - # Validate with fresh validator instance - validator = SemanticValidator() - errors: list[Annotation] = [] - depth_guard = DepthGuard(max_depth=100) - - # Call _validate_entry directly to ensure this specific path is measured - validator._validate_entry(junk, errors, depth_guard) - - # Junk should not add any validation errors - assert len(errors) == 0 - - -class TestEmptyResourceValidation: - """Test empty resource boundary condition.""" - - def test_empty_resource_is_valid(self) -> None: - """Empty resource (no entries) is valid.""" - resource = Resource(entries=()) - result = validate(resource) - assert result.is_valid - assert len(result.annotations) == 0 - - -# ============================================================================ -# PATTERN ELEMENT VALIDATION TESTS -# ============================================================================ - - -class TestTextElementValidation: - """Test TextElement validation.""" - - def test_text_elements_require_no_validation(self) -> None: - """Plain text elements need no validation. - - Tests line 245-246 and 247->exit (TextElement case in _validate_pattern_element). - """ - parser = FluentParserV1() - resource = parser.parse("msg = Plain text without any placeables") - - validator = SemanticValidator() - result = validator.validate(resource) - - assert result.is_valid - - def test_text_with_special_characters(self) -> None: - """Text elements with special characters validate.""" - parser = FluentParserV1() - resource = parser.parse(r"msg = Text with special: !@#$%^&*()_+-=[]|;',./<>?") - result = validate(resource) - assert isinstance(result, ValidationResult) - - def test_text_element_explicit_validation_path(self) -> None: - """Text element explicitly exercises validation path. - - Ensures TextElement case and exit path (line 247->exit) are covered. 
- """ - # Create message with explicit TextElement - text_elem = TextElement(value="Explicit text element") - message = Message( - id=Identifier(name="test"), - value=Pattern(elements=(text_elem,)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - validator = SemanticValidator() - result = validator.validate(resource) - - # TextElement requires no validation, should be valid - assert result.is_valid - - def test_multiple_text_elements_in_pattern(self) -> None: - """Pattern with multiple TextElements validates. - - Multiple invocations of TextElement path. - """ - text1 = TextElement(value="First ") - text2 = TextElement(value="Second ") - text3 = TextElement(value="Third") - - message = Message( - id=Identifier(name="test"), - value=Pattern(elements=(text1, text2, text3)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - validator = SemanticValidator() - result = validator.validate(resource) - - assert result.is_valid - - def test_text_element_isolated_validation(self) -> None: - """Single TextElement validates in isolation. - - Explicitly tests line 245-246 TextElement case and exit path. - This test isolates the TextElement validation path to ensure - branch coverage tools detect the 247->exit path. - """ - from ftllexengine.core.depth_guard import DepthGuard - - # Create TextElement - text_elem = TextElement(value="isolated text") - - # Validate with fresh validator instance - validator = SemanticValidator() - errors: list[Annotation] = [] - depth_guard = DepthGuard(max_depth=100) - - # Call _validate_pattern_element directly to ensure this specific path is measured - validator._validate_pattern_element(text_elem, errors, "test", depth_guard) - - # TextElement should not add any validation errors - assert len(errors) == 0 - - def test_junk_entry_isolated_direct_call(self) -> None: - """Junk entry validated through direct method call. - - Alternative approach to ensure 158->exit branch is covered. 
- """ - from ftllexengine.core.depth_guard import DepthGuard - - junk = Junk(content="direct call junk", annotations=()) - - validator = SemanticValidator() - errors: list[Annotation] = [] - depth_guard = DepthGuard(max_depth=100) - - # Direct call to _validate_entry with Junk - validator._validate_entry(junk, errors, depth_guard) - - assert len(errors) == 0 - - -class TestPlaceableValidation: - """Test Placeable validation including nested cases.""" - - def test_placeable_with_variable_reference(self) -> None: - """Placeable containing variable reference validates.""" - parser = FluentParserV1() - resource = parser.parse("msg = Hello { $name }") - result = validate(resource) - assert result.is_valid - - def test_nested_placeables(self) -> None: - """Nested placeables validate recursively. - - Tests lines 293-294 (Placeable as inline expression). - """ - # Manually construct nested placeables - inner = Placeable(expression=VariableReference(id=Identifier(name="x"))) - outer = Placeable(expression=inner) - message = Message( - id=Identifier(name="msg"), - value=Pattern(elements=(outer,)), - attributes=(), - ) - resource = Resource(entries=(message,)) - result = validate(resource) - assert result.is_valid - - -# ============================================================================ -# INLINE EXPRESSION VALIDATION TESTS -# ============================================================================ - - -class TestStringAndNumberLiteralValidation: - """Test literal value validation.""" - - def test_string_literal_always_valid(self) -> None: - """String literals require no validation.""" - parser = FluentParserV1() - resource = parser.parse('msg = { "Hello" }') - result = validate(resource) - assert result.is_valid - - def test_number_literal_always_valid(self) -> None: - """Number literals require no validation.""" - parser = FluentParserV1() - resource = parser.parse("msg = { 42 }") - result = validate(resource) - assert result.is_valid - - -class 
TestVariableReferenceValidation: - """Test variable reference validation.""" - - def test_variable_reference_always_valid(self) -> None: - """Variable references require no semantic validation.""" - parser = FluentParserV1() - resource = parser.parse("msg = { $var }") - result = validate(resource) - assert result.is_valid - - -class TestMessageReferenceValidation: - """Test message reference validation.""" - - def test_message_reference_validates(self) -> None: - """Message references are always valid semantically. - - Tests line 287 (MessageReference case in _validate_inline_expression). - Message references cannot have arguments (enforced by grammar). - """ - parser = FluentParserV1() - resource = parser.parse("msg = { other-msg }") - result = validate(resource) - assert result.is_valid - - def test_message_reference_with_attribute(self) -> None: - """Message reference with attribute access validates.""" - parser = FluentParserV1() - resource = parser.parse("msg = { other-msg.attr }") - result = validate(resource) - assert result.is_valid - - -class TestTermReferenceValidation: - """Test term reference validation.""" - - def test_term_reference_without_arguments(self) -> None: - """Term reference without arguments validates.""" - parser = FluentParserV1() - resource = parser.parse("msg = { -brand }") - result = validate(resource) - assert result.is_valid - - def test_term_reference_with_named_arguments(self) -> None: - """Term reference with named arguments validates.""" - parser = FluentParserV1() - resource = parser.parse('msg = { -brand(case: "nominative") }') - result = validate(resource) - assert result.is_valid - - def test_term_reference_with_positional_arguments_warns(self) -> None: - """Term reference with positional arguments emits warning. - - Tests lines 310-324 (_validate_term_reference with positional args). - Per Fluent spec, positional args to terms are ignored at runtime. 
- """ - # Manually construct term reference with positional args - args = CallArguments( - positional=(NumberLiteral(value=1, raw="1"),), - named=(), - ) - term_ref = TermReference( - id=Identifier(name="brand"), - arguments=args, - attribute=None, - ) - message = Message( - id=Identifier(name="msg"), - value=Pattern(elements=(Placeable(expression=term_ref),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - result = validate(resource) - - # Should emit warning about positional args being ignored - assert not result.is_valid - warnings = [a for a in result.annotations if "positional arguments" in a.message.lower()] - assert len(warnings) > 0 - - def test_term_reference_with_attribute_and_arguments(self) -> None: - """Term reference with attribute access and arguments validates.""" - parser = FluentParserV1() - resource = parser.parse('msg = { -brand.short(case: "genitive") }') - result = validate(resource) - assert result.is_valid - - -class TestFunctionReferenceValidation: - """Test function reference validation.""" - - def test_function_reference_without_arguments(self) -> None: - """Function reference without arguments validates.""" - # Manually construct function call without arguments - func_ref = FunctionReference( - id=Identifier(name="BUILTIN"), - arguments=CallArguments(positional=(), named=()), - ) - message = Message( - id=Identifier(name="msg"), - value=Pattern(elements=(Placeable(expression=func_ref),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - result = validate(resource) - assert result.is_valid - - def test_function_reference_with_positional_arguments(self) -> None: - """Function reference with positional arguments validates. - - Tests lines 365-366 (positional arg validation in _validate_call_arguments). 
- """ - parser = FluentParserV1() - resource = parser.parse("msg = { NUMBER($count) }") - result = validate(resource) - assert result.is_valid - - def test_function_reference_with_named_arguments(self) -> None: - """Function reference with named arguments validates.""" - parser = FluentParserV1() - resource = parser.parse("msg = { NUMBER($count, minimumFractionDigits: 2) }") - result = validate(resource) - assert result.is_valid - - -# ============================================================================ -# CALL ARGUMENTS VALIDATION TESTS -# ============================================================================ - - -class TestCallArgumentsValidation: - """Test call arguments validation.""" - - def test_duplicate_named_arguments_invalid(self) -> None: - """Function call with duplicate named arguments is invalid. - - Tests duplicate detection in _validate_call_arguments. - """ - # Manually construct function with duplicate named args - args = CallArguments( - positional=(), - named=( - NamedArgument( - name=Identifier(name="option"), - value=NumberLiteral(value=1, raw="1"), - ), - NamedArgument( - name=Identifier(name="option"), # Duplicate! 
- value=NumberLiteral(value=2, raw="2"), - ), - ), - ) - func_ref = FunctionReference( - id=Identifier(name="NUMBER"), - arguments=args, - ) - message = Message( - id=Identifier(name="msg"), - value=Pattern(elements=(Placeable(expression=func_ref),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - result = validate(resource) - - # Should detect duplicate named argument - assert not result.is_valid - errors = [a for a in result.annotations if "DUPLICATE" in a.code] - assert len(errors) > 0 - - def test_mixed_positional_and_named_arguments(self) -> None: - """Function with both positional and named arguments validates.""" - parser = FluentParserV1() - resource = parser.parse("msg = { NUMBER($val, minimumFractionDigits: 2) }") - result = validate(resource) - assert result.is_valid - - def test_nested_expressions_in_arguments(self) -> None: - """Nested expressions in arguments validate recursively.""" - parser = FluentParserV1() - resource = parser.parse("msg = { NUMBER({ $count }) }") - result = validate(resource) - assert result.is_valid - - -# ============================================================================ -# SELECT EXPRESSION VALIDATION TESTS -# ============================================================================ - - -class TestSelectExpressionValidation: - """Test select expression validation.""" - - def test_select_with_valid_default_variant(self) -> None: - """Select expression with exactly one default variant validates.""" - parser = FluentParserV1() - resource = parser.parse(""" -msg = { $count -> - [one] One item - *[other] Many items -} -""") - result = validate(resource) - assert result.is_valid - - def test_select_without_variants_constructor_validation(self) -> None: - """SelectExpression without variants raises ValueError at construction. - - Tests AST __post_init__ validation that enforces at least one variant. - Tests assumption that validator can rely on this invariant. 
- """ - with pytest.raises(ValueError, match="at least one variant"): - SelectExpression( - selector=VariableReference(id=Identifier(name="count")), - variants=(), - ) - - def test_select_without_variants_validator_defensive_check(self) -> None: - """Validator catches empty-variants SelectExpression constructed via object.__new__. - - SelectExpression.__post_init__ enforces non-empty variants at construction. - The validator's check is intentional defense-in-depth for ASTs that bypass - __post_init__ (e.g., via object.__new__ + object.__setattr__). - """ - # Create SelectExpression bypassing __post_init__ validation - select = object.__new__(SelectExpression) - object.__setattr__(select, "selector", VariableReference(id=Identifier(name="x"))) - object.__setattr__(select, "variants", ()) # Invalid per spec - object.__setattr__(select, "span", None) - - message = Message( - id=Identifier(name="msg"), - value=Pattern(elements=(Placeable(expression=select),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - validator = SemanticValidator() - result = validator.validate(resource) - - # Validator should catch missing variants - assert not result.is_valid - errors = [a for a in result.annotations if "NO_VARIANTS" in a.code] - assert len(errors) > 0 - - def test_select_with_multiple_defaults_constructor_validation(self) -> None: - """SelectExpression with multiple defaults raises ValueError. - - Tests AST __post_init__ validation. - """ - variants = ( - Variant( - key=Identifier(name="one"), - value=Pattern(elements=(TextElement(value="One"),)), - default=True, # First default - ), - Variant( - key=Identifier(name="other"), - value=Pattern(elements=(TextElement(value="Other"),)), - default=True, # Second default - invalid! 
- ), - ) - with pytest.raises(ValueError, match="exactly one default variant"): - SelectExpression( - selector=VariableReference(id=Identifier(name="count")), - variants=variants, - ) - - def test_select_with_zero_defaults_validator_defensive_check(self) -> None: - """Validator catches zero-default SelectExpression constructed via object.__new__. - - SelectExpression.__post_init__ enforces exactly one default at construction. - The validator's check is intentional defense-in-depth for ASTs that bypass - __post_init__ (e.g., via object.__new__ + object.__setattr__). - """ - # Create SelectExpression with zero defaults (bypassing __post_init__) - variant = object.__new__(Variant) - object.__setattr__(variant, "key", Identifier(name="one")) - object.__setattr__(variant, "value", Pattern(elements=(TextElement(value="One"),))) - object.__setattr__(variant, "default", False) # No default! - object.__setattr__(variant, "span", None) - - select = object.__new__(SelectExpression) - object.__setattr__(select, "selector", VariableReference(id=Identifier(name="x"))) - object.__setattr__(select, "variants", (variant,)) - object.__setattr__(select, "span", None) - - message = Message( - id=Identifier(name="msg"), - value=Pattern(elements=(Placeable(expression=select),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - validator = SemanticValidator() - result = validator.validate(resource) - - # Validator should catch default count != 1 - assert not result.is_valid - errors = [a for a in result.annotations if "NO_DEFAULT" in a.code] - assert len(errors) > 0 - - def test_select_with_duplicate_variant_keys_invalid(self) -> None: - """Select expression with duplicate variant keys is invalid. - - Tests line 418 (duplicate variant key detection). 
- """ - # Manually construct select with duplicate keys - variants = ( - Variant( - key=Identifier(name="one"), - value=Pattern(elements=(TextElement(value="First one"),)), - default=False, - ), - Variant( - key=Identifier(name="one"), # Duplicate! - value=Pattern(elements=(TextElement(value="Second one"),)), - default=False, - ), - Variant( - key=Identifier(name="other"), - value=Pattern(elements=(TextElement(value="Other"),)), - default=True, - ), - ) - select = SelectExpression( - selector=VariableReference(id=Identifier(name="x")), - variants=variants, - ) - message = Message( - id=Identifier(name="msg"), - value=Pattern(elements=(Placeable(expression=select),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - result = validate(resource) - - # Should detect duplicate variant key - assert not result.is_valid - errors = [ - a - for a in result.annotations - if "DUPLICATE" in a.code or "duplicate" in a.message.lower() - ] - assert len(errors) > 0 - - def test_select_with_numeric_variant_keys(self) -> None: - """Select expression with numeric variant keys validates.""" - parser = FluentParserV1() - resource = parser.parse(""" -msg = { $count -> - [0] Zero - [1] One - *[other] Many -} -""") - result = validate(resource) - assert result.is_valid - - def test_select_with_duplicate_numeric_keys_different_forms(self) -> None: - """Numeric variant keys 1 and 1.0 are duplicates. - - Tests Decimal normalization in _variant_key_to_string. - """ - # Manually construct select with 1 and 1.0 (should be duplicates) - variants = ( - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern(elements=(TextElement(value="One"),)), - default=False, - ), - Variant( - key=NumberLiteral(value=Decimal("1.0"), raw="1.0"), # Duplicate! 
- value=Pattern(elements=(TextElement(value="One point zero"),)), - default=False, - ), - Variant( - key=Identifier(name="other"), - value=Pattern(elements=(TextElement(value="Other"),)), - default=True, - ), - ) - select = SelectExpression( - selector=VariableReference(id=Identifier(name="x")), - variants=variants, - ) - message = Message( - id=Identifier(name="msg"), - value=Pattern(elements=(Placeable(expression=select),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - result = validate(resource) - - # Should detect duplicate (1 and 1.0 are same value) - assert not result.is_valid - - def test_select_nested_in_variant(self) -> None: - """Nested select expressions validate recursively.""" - parser = FluentParserV1() - resource = parser.parse(""" -msg = { $x -> - [one] { $y -> - [a] One-A - *[b] One-B - } - *[other] Other -} -""") - result = validate(resource) - assert result.is_valid - - -# ============================================================================ -# VARIANT KEY NORMALIZATION TESTS -# ============================================================================ - - -class TestVariantKeyNormalization: - """Test variant key normalization and Decimal handling.""" - - def test_decimal_normalization_for_numeric_keys(self) -> None: - """Numeric keys are normalized using Decimal for comparison. - - 100 (int, raw="100") and 1E+2 (Decimal, raw="1E2") are the same numeric - value after Decimal normalization; the validator must detect them as - duplicate variant keys. - """ - variants = ( - Variant( - key=NumberLiteral(value=100, raw="100"), - value=Pattern(elements=(TextElement(value="Hundred"),)), - default=False, - ), - Variant( - # Decimal("1E2") == Decimal("100") after normalization. - # raw="1E2" is a valid Decimal literal; value must be Decimal, not int, - # because int("1E2") fails. Both normalize to format("f") = "100". 
- key=NumberLiteral(value=Decimal("1E2"), raw="1E2"), - value=Pattern(elements=(TextElement(value="Also hundred"),)), - default=False, - ), - Variant( - key=Identifier(name="other"), - value=Pattern(elements=(TextElement(value="Other"),)), - default=True, - ), - ) - select = SelectExpression( - selector=VariableReference(id=Identifier(name="x")), - variants=variants, - ) - message = Message( - id=Identifier(name="msg"), - value=Pattern(elements=(Placeable(expression=select),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - result = validate(resource) - - # Should detect as duplicates after normalization - assert not result.is_valid - - def test_number_literal_rejects_invalid_raw(self) -> None: - """NumberLiteral.__post_init__ rejects raw strings that do not parse as numbers. - - The validator's former fallback (returning key.raw on Decimal conversion failure) - is now unreachable because NumberLiteral enforces the raw/value invariant at - construction time. - """ - with pytest.raises(ValueError, match="not a valid number literal"): - NumberLiteral(value=Decimal(0), raw="not-a-number") - - def test_number_literal_rejects_non_finite_decimal(self) -> None: - """NumberLiteral.__post_init__ rejects non-finite Decimal values. - - Infinity and NaN are not valid FTL number literal values. - The validator's former exception handling for format(Infinity, 'f') is now - unreachable because NumberLiteral rejects non-finite Decimals at construction. 
- """ - with pytest.raises(ValueError, match="not a finite number"): - NumberLiteral(value=Decimal("Infinity"), raw="Infinity") - - -# ============================================================================ -# VALIDATION RESULT TESTS -# ============================================================================ - - -class TestValidationResultFactory: - """Test ValidationResult factory methods.""" - - def test_validation_result_valid_factory(self) -> None: - """ValidationResult.valid() creates valid result.""" - result = ValidationResult.valid() - assert result.is_valid is True - assert len(result.annotations) == 0 - - def test_validation_result_invalid_factory(self) -> None: - """ValidationResult.invalid() creates invalid result.""" - annotation = Annotation( - code="E0001", - message="Test error", - span=Span(start=0, end=1), - ) - result = ValidationResult.invalid(annotations=(annotation,)) - assert result.is_valid is False - assert len(result.annotations) == 1 - - def test_validation_result_from_annotations_empty(self) -> None: - """ValidationResult.from_annotations() with empty tuple is valid.""" - result = ValidationResult.from_annotations(()) - assert result.is_valid is True - assert len(result.annotations) == 0 - - def test_validation_result_from_annotations_with_errors(self) -> None: - """ValidationResult.from_annotations() with errors is invalid.""" - annotations = ( - Annotation(code="E0001", message="Error 1", span=Span(start=0, end=1)), - Annotation(code="E0002", message="Error 2", span=Span(start=2, end=3)), - ) - result = ValidationResult.from_annotations(annotations) - assert not result.is_valid - assert len(result.annotations) == 2 - - -class TestValidationResultProperties: - """Test ValidationResult properties.""" - - def test_annotations_are_immutable_tuples(self) -> None: - """Annotations are stored as tuples (immutable).""" - annotation = Annotation( - code="E0001", - message="Error", - span=Span(start=0, end=1), - ) - result = 
ValidationResult.invalid(annotations=(annotation,)) - assert isinstance(result.annotations, tuple) - - def test_is_valid_true_means_no_errors(self) -> None: - """is_valid=True implies no error-level annotations.""" - result = ValidationResult.valid() - assert result.is_valid is True - assert len(result.annotations) == 0 - - -# ============================================================================ -# ERROR MESSAGE HANDLING TESTS -# ============================================================================ - - -class TestErrorMessageHandling: - """Test error message generation and diagnostic codes.""" - - def test_validation_messages_dict_exists(self) -> None: - """_VALIDATION_MESSAGES dict contains error message templates.""" - assert isinstance(_VALIDATION_MESSAGES, dict) - assert len(_VALIDATION_MESSAGES) > 0 - - def test_diagnostic_codes_for_validation_exist(self) -> None: - """Validation-related DiagnosticCodes are defined.""" - expected_codes = [ - DiagnosticCode.VALIDATION_TERM_NO_VALUE, - DiagnosticCode.VALIDATION_SELECT_NO_DEFAULT, - DiagnosticCode.VALIDATION_SELECT_NO_VARIANTS, - DiagnosticCode.VALIDATION_VARIANT_DUPLICATE, - DiagnosticCode.VALIDATION_NAMED_ARG_DUPLICATE, - ] - for code in expected_codes: - assert isinstance(code, DiagnosticCode) - assert code.value >= 5000 # Validation codes in 5000+ range - - def test_error_message_fallback_for_unknown_code(self) -> None: - """Error message uses fallback for unknown diagnostic code. - - Tests line 129->133 in _add_error method. 
- """ - # Create an annotation with a code not in _VALIDATION_MESSAGES - validator = SemanticValidator() - errors: list[Annotation] = [] - - # Use a diagnostic code that won't be in the validation messages dict - # Call the _add_error method directly (accessing private method for testing) - validator._add_error( - errors, - DiagnosticCode.MESSAGE_NOT_FOUND, # Not a validation code - span=Span(start=0, end=1), - ) - - # Should have added an error with fallback message - assert len(errors) == 1 - assert errors[0].message == "Unknown validation error" - - -# ============================================================================ -# VALIDATOR STATE MANAGEMENT TESTS -# ============================================================================ - - -class TestValidatorStateManagement: - """Test validator internal state handling.""" - - def test_validator_reusable_across_validations(self) -> None: - """Validator can validate multiple resources without state leakage.""" - parser = FluentParserV1() - validator = SemanticValidator() - - # First validation - resource1 = parser.parse("msg1 = Value 1") - result1 = validator.validate(resource1) - assert result1.is_valid - - # Second validation should not be affected by first - resource2 = parser.parse("msg2 = Value 2") - result2 = validator.validate(resource2) - assert result2.is_valid - - def test_validator_results_independent(self) -> None: - """Validating one resource doesn't affect validation of another.""" - parser = FluentParserV1() - validator = SemanticValidator() - - resource1 = parser.parse("msg1 = Value 1") - resource2 = parser.parse("msg2 = Value 2") - - result1_first = validator.validate(resource1) - validator.validate(resource2) # Validate resource2 - result1_again = validator.validate(resource1) # Validate resource1 again - - # Results for same resource should be identical - assert result1_first.is_valid == result1_again.is_valid - assert len(result1_first.annotations) == len(result1_again.annotations) - - 
-# ============================================================================ -# INTEGRATION TESTS -# ============================================================================ - - -class TestValidatorIntegration: - """Integration tests combining multiple validation aspects.""" - - def test_complex_message_with_all_features(self) -> None: - """Complex message with multiple features validates correctly.""" - parser = FluentParserV1() - resource = parser.parse(""" -# Comment -greeting = Hello { $name }, you have { $count -> - [0] no messages - [1] one message - *[other] { NUMBER($count) } messages -}! - .formal = Dear { $name }, you have { NUMBER($count) } message(s). - --brand = Firefox - .short = FX - -status = - .online = Online now - .offline = Offline - -invalid junk entry -""") - result = validate(resource) - # Should handle all entry types and complex patterns - assert isinstance(result, ValidationResult) - - def test_deeply_nested_structures(self) -> None: - """Deeply nested select expressions validate without issues.""" - parser = FluentParserV1() - resource = parser.parse(""" -msg = { $a -> - [1] { $b -> - [1] { $c -> - [1] Triple nested - *[other] C-other - } - *[other] B-other - } - *[other] A-other -} -""") - result = validate(resource) - assert isinstance(result, ValidationResult) - - def test_multiple_entries_with_mixed_validity(self) -> None: - """Resource with mix of valid and invalid entries.""" - # Construct resource with some invalid entries - valid_message = Message( - id=Identifier(name="valid"), - value=Pattern(elements=(TextElement(value="Valid"),)), - attributes=(), - ) - - # Invalid: duplicate named args - invalid_func = FunctionReference( - id=Identifier(name="NUMBER"), - arguments=CallArguments( - positional=(), - named=( - NamedArgument( - name=Identifier(name="opt"), - value=NumberLiteral(value=1, raw="1"), - ), - NamedArgument( - name=Identifier(name="opt"), # Duplicate - value=NumberLiteral(value=2, raw="2"), - ), - ), - ), - ) - 
invalid_message = Message( - id=Identifier(name="invalid"), - value=Pattern(elements=(Placeable(expression=invalid_func),)), - attributes=(), - ) - - resource = Resource(entries=(valid_message, invalid_message)) - result = validate(resource) - - # Should detect the invalid entry - assert not result.is_valid - assert len(result.annotations) > 0 - - -# ============================================================================ -# CONVENIENCE FUNCTION TESTS -# ============================================================================ - - -class TestConvenienceFunction: - """Test the validate() convenience function.""" - - def test_validate_function_creates_validator_internally(self) -> None: - """validate() function is a convenience wrapper.""" - parser = FluentParserV1() - resource = parser.parse("msg = Value") - - # Use convenience function - result = validate(resource) - - assert isinstance(result, ValidationResult) - assert result.is_valid - - def test_validate_function_same_result_as_validator_class(self) -> None: - """validate() function produces same result as SemanticValidator.""" - parser = FluentParserV1() - resource = parser.parse("msg = Hello World") - - # Use convenience function - result1 = validate(resource) - - # Use validator class - validator = SemanticValidator() - result2 = validator.validate(resource) - - assert result1.is_valid == result2.is_valid - assert len(result1.annotations) == len(result2.annotations) - - -# ============================================================================ -# SEMANTIC VALIDATION (from test_semantic_validation.py) -# ============================================================================ - - -class TestValidationFramework: - """Test the validation framework itself.""" - - def test_validator_initialization(self): - """Test validator can be created.""" - validator = SemanticValidator() - assert validator is not None - - def test_validate_empty_resource(self): - """Empty resource should be valid.""" - parser 
= FluentParserV1() - resource = parser.parse("") - - result = validate(resource) - assert result.is_valid - assert len(result.annotations) == 0 - - def test_validate_returns_result(self): - """Validate function returns ValidationResult.""" - parser = FluentParserV1() - resource = parser.parse("msg = value") - - result = validate(resource) - assert isinstance(result, ValidationResult) - assert hasattr(result, "is_valid") - assert hasattr(result, "annotations") - - -class TestMessageValidationHighLevel: - """Test message validation rules.""" - - def test_valid_simple_message(self): - """Simple message should be valid.""" - parser = FluentParserV1() - resource = parser.parse("hello = Hello, world!") - - result = validate(resource) - assert result.is_valid - assert len(result.annotations) == 0 - - def test_valid_message_with_variable(self): - """Message with variable should be valid.""" - parser = FluentParserV1() - resource = parser.parse("welcome = Welcome, { $name }!") - - result = validate(resource) - assert result.is_valid - - def test_valid_message_with_attribute(self): - """Message with attribute should be valid.""" - parser = FluentParserV1() - resource = parser.parse(""" -msg = Value - .tooltip = Tooltip text -""") - - result = validate(resource) - assert result.is_valid - - def test_valid_message_reference(self): - """Message referencing another message should be valid.""" - parser = FluentParserV1() - resource = parser.parse("msg = { other-msg }") - - result = validate(resource) - assert result.is_valid - - def test_valid_message_reference_with_attribute(self): - """Message.attr reference should be valid.""" - parser = FluentParserV1() - resource = parser.parse("msg = { other.attr }") - - result = validate(resource) - assert result.is_valid - - -class TestTermValidationHighLevel: - """Test term validation rules.""" - - def test_valid_simple_term(self): - """Simple term should be valid.""" - parser = FluentParserV1() - resource = parser.parse("-brand = 
Firefox") - - result = validate(resource) - assert result.is_valid - - def test_valid_term_with_attribute(self): - """Term with attribute should be valid.""" - parser = FluentParserV1() - resource = parser.parse(""" --brand = Firefox - .gender = masculine -""") - - result = validate(resource) - assert result.is_valid - - def test_valid_term_reference_with_arguments(self): - """Term reference with call arguments should be valid.""" - parser = FluentParserV1() - # Note: This tests that if the parser creates a TermReference with arguments, - # the validator accepts it - resource = parser.parse("msg = { -term() }") - - result = validate(resource) - # Should be valid - terms can be parameterized - assert result.is_valid - - -class TestSelectExpressionValidationHighLevel: - """Test select expression validation rules.""" - - def test_valid_select_with_default(self): - """Select with default variant should be valid.""" - parser = FluentParserV1() - resource = parser.parse(""" -msg = { $count -> - [one] One item - *[other] Many items -} -""") - - result = validate(resource) - assert result.is_valid - - def test_valid_select_multiple_variants(self): - """Select with multiple non-default variants should be valid.""" - parser = FluentParserV1() - resource = parser.parse(""" -msg = { $count -> - [zero] No items - [one] One item - [two] Two items - *[other] Many items -} -""") - - result = validate(resource) - assert result.is_valid - - def test_invalid_select_no_default(self): - """Parser rejects select without default variant (syntactic validation). - - Note: This is now a parser-level validation, not semantic validation. - The parser creates Junk for select expressions without default variants - per FTL spec requirements, so semantic validation never sees them. - - This test verifies the parser correctly enforces this rule. 
- """ - from ftllexengine.syntax.ast import Junk - - parser = FluentParserV1() - # Try to parse select without default - resource = parser.parse(""" -msg = { $count -> - [one] One item - [two] Two items -} -""") - - # Parser should create Junk (syntactic error) - assert len(resource.entries) >= 1 - assert isinstance(resource.entries[0], Junk) - - # Verify error annotation exists - junk = resource.entries[0] - assert len(junk.annotations) > 0 - # Generic error message (detailed info removed) - - def test_invalid_duplicate_variant_keys(self): - """Select with duplicate variant keys should be invalid.""" - parser = FluentParserV1() - resource = parser.parse(""" -msg = { $count -> - [one] First one - [one] Second one (duplicate) - *[other] Many -} -""") - - result = validate(resource) - - # Should detect duplicate keys - if not result.is_valid: - assert any("VALIDATION_VARIANT_DUPLICATE" in ann.code for ann in result.annotations) - else: - # Parser might have deduped, which is also acceptable - pass - - def test_high_precision_numeric_variants_not_false_duplicate(self): - """High-precision numeric variant keys are treated as distinct. - - Regression test for SEM-VALIDATOR-PRECISION-001. - Validator should use NumberLiteral.raw (original string) for comparison, - not NumberLiteral.value (Decimal), to preserve precision. - This matches resolver behavior. - """ - parser = FluentParserV1() - resource = parser.parse(""" -msg = { $x -> - [0.10000000000000001] precise - [0.1] rounded - *[other] default -} -""") - - result = validate(resource) - - # These keys should NOT be treated as duplicates because they have - # different source representations even though their numeric values are - # close. The validator should accept this as valid FTL. 
- assert result.is_valid - - -class TestFunctionValidationHighLevel: - """Test function reference validation rules.""" - - def test_valid_function_no_args(self): - """Function with no arguments should be valid.""" - parser = FluentParserV1() - resource = parser.parse("msg = { FUNC() }") - - result = validate(resource) - assert result.is_valid - - def test_valid_function_positional_args(self): - """Function with positional arguments should be valid.""" - parser = FluentParserV1() - resource = parser.parse("msg = { NUMBER($count) }") - - result = validate(resource) - assert result.is_valid - - def test_valid_function_named_args(self): - """Function with named arguments should be valid.""" - parser = FluentParserV1() - resource = parser.parse("msg = { NUMBER($count, minimumFractionDigits: 2) }") - - result = validate(resource) - assert result.is_valid - - def test_valid_function_mixed_args(self): - """Function with positional and named arguments should be valid.""" - parser = FluentParserV1() - resource = parser.parse('msg = { DATETIME($date, hour: "numeric", minute: "numeric") }') - - result = validate(resource) - assert result.is_valid - - def test_invalid_duplicate_named_args(self): - """Function with duplicate named arguments should be invalid.""" - parser = FluentParserV1() - resource = parser.parse("msg = { FUNC(x: 1, x: 2) }") - - result = validate(resource) - - # Should detect duplicate named arguments - if not result.is_valid: - assert any("E0010" in ann.code for ann in result.annotations) - - -class TestRealWorldScenarios: - """Test validation on real-world FTL patterns.""" - - def test_complex_message_with_select(self): - """Complex message with select should validate.""" - parser = FluentParserV1() - resource = parser.parse(""" -emails = { $unreadEmails -> - [one] You have one unread email - *[other] You have { $unreadEmails } unread emails -} -""") - - result = validate(resource) - assert result.is_valid - - def 
test_message_with_multiple_placeables(self): - """Message with multiple placeables should validate.""" - parser = FluentParserV1() - resource = parser.parse("msg = Hello { $firstName } { $lastName }!") - - result = validate(resource) - assert result.is_valid - - def test_nested_select_expressions(self): - """Nested select expressions should validate.""" - parser = FluentParserV1() - resource = parser.parse(""" -msg = { $gender -> - [male] { $count -> - [one] He has one item - *[other] He has { $count } items - } - *[female] { $count -> - [one] She has one item - *[other] She has { $count } items - } -} -""") - - result = validate(resource) - assert result.is_valid - - def test_term_reference_in_message(self): - """Term reference in message should validate.""" - parser = FluentParserV1() - resource = parser.parse(""" --brand = Firefox -welcome = Welcome to { -brand }! -""") - - result = validate(resource) - assert result.is_valid - - def test_message_with_function_and_select(self): - """Message combining function call and select should validate.""" - parser = FluentParserV1() - resource = parser.parse(""" -msg = Updated { DATETIME($date, month: "long", year: "numeric") } - { $status -> - [active] Active - *[inactive] Inactive -} -""") - - result = validate(resource) - assert result.is_valid - - -class TestEdgeCases: - """Test edge cases in validation.""" - - def test_comment_only_resource(self): - """Resource with only comments should be valid.""" - parser = FluentParserV1() - resource = parser.parse(""" -# This is a comment -## This is a group comment -### This is a resource comment -""") - - result = validate(resource) - assert result.is_valid - - def test_message_with_only_attributes(self): - """Message with only attributes (no value) should be valid.""" - parser = FluentParserV1() - resource = parser.parse(""" -msg = - .attr1 = Value 1 - .attr2 = Value 2 -""") - - result = validate(resource) - # This should be valid per spec - assert result.is_valid - - def 
test_empty_pattern(self): - """Message with empty value should be valid.""" - parser = FluentParserV1() - resource = parser.parse("msg = ") - - result = validate(resource) - # Empty pattern is syntactically valid - assert result.is_valid - - def test_junk_entries_ignored(self): - """Junk entries should not be validated (already errors).""" - parser = FluentParserV1() - resource = parser.parse(""" -valid = Value -invalid { syntax -also-valid = Another value -""") - - result = validate(resource) - # Should validate the valid entries, ignore junk - assert result.is_valid - - -class TestValidatorState: - """Test validator state management.""" - - def test_validator_reusable(self): - """Validator should be reusable across multiple validations.""" - validator = SemanticValidator() - parser = FluentParserV1() - - resource1 = parser.parse("msg1 = Value 1") - result1 = validator.validate(resource1) - assert result1.is_valid - - resource2 = parser.parse("msg2 = Value 2") - result2 = validator.validate(resource2) - assert result2.is_valid - - # Errors shouldn't accumulate - assert len(result1.annotations) == 0 - assert len(result2.annotations) == 0 - - def test_validate_function_is_stateless(self): - """Module-level validate() function should be stateless.""" - parser = FluentParserV1() - - result1 = validate(parser.parse("msg1 = Value 1")) - result2 = validate(parser.parse("msg2 = Value 2")) - - assert result1.is_valid - assert result2.is_valid - - -class TestValidationErrorCodes: - """Test that error codes are descriptive and consistent.""" - - def test_diagnostic_codes_are_unique(self): - """All validation DiagnosticCode values should be unique.""" - from ftllexengine.diagnostics.codes import DiagnosticCode - - # Get all validation-related codes (5000-5199 range) - validation_codes = [ - code for code in DiagnosticCode - if code.value >= 5000 and code.value < 5200 - ] - values = [code.value for code in validation_codes] - assert len(values) == len(set(values)), 
"DiagnosticCode values must be unique" - - def test_validation_messages_exist(self): - """All validation codes should have messages in _VALIDATION_MESSAGES.""" - from ftllexengine.diagnostics.codes import DiagnosticCode - from ftllexengine.syntax.validator import _VALIDATION_MESSAGES - - for code, message in _VALIDATION_MESSAGES.items(): - assert isinstance(code, DiagnosticCode), f"{code} should be DiagnosticCode" - assert len(message) > 5, f"Message for {code.name} should be descriptive" - assert message[0].isupper(), f"Message for {code.name} should start with uppercase" - - -class TestAttributeGranularCycleDetection: - """Attribute-granular cycle detection prevents false positives. - - A message referencing its own attribute (msg = { msg.tooltip }) is NOT a cycle. - Only true self-references (msg = { msg }) or cross-message cycles are cyclic. - This distinction prevents spurious warnings for common FTL patterns. - """ - - def test_cross_attribute_reference_not_cyclic(self) -> None: - """Message value referencing its own attribute is not a circular reference.""" - bundle = FluentBundle("en") - ftl = "msg = { msg.tooltip }\n .tooltip = Tooltip text\n" - result = bundle.validate_resource(ftl) - circular_warnings = [w for w in result.warnings if "ircular" in w.message] - assert len(circular_warnings) == 0 - - def test_true_self_reference_detected(self) -> None: - """Message value referencing itself is a circular reference.""" - bundle = FluentBundle("en") - ftl = "msg = { msg }\n" - result = bundle.validate_resource(ftl) - circular_warnings = [w for w in result.warnings if "ircular" in w.message] - assert len(circular_warnings) > 0 - - def test_term_attribute_self_reference_detected(self) -> None: - """Term attribute referencing itself is a circular reference.""" - bundle = FluentBundle("en") - ftl = "-term = Value\n .attr = { -term.attr }\n" - result = bundle.validate_resource(ftl) - circular_warnings = [w for w in result.warnings if "ircular" in w.message] - 
assert len(circular_warnings) > 0 - - def test_cross_term_cycle_detected(self) -> None: - """Cross-term mutual references produce a circular reference warning.""" - bundle = FluentBundle("en") - ftl = "-a = { -b }\n-b = { -a }\n" - result = bundle.validate_resource(ftl) - circular_warnings = [w for w in result.warnings if "ircular" in w.message] - assert len(circular_warnings) > 0 - - -# ============================================================================ -# VALIDATION EDGE CASES (from test_semantic_validation_edge_cases.py) -# ============================================================================ - - -class TestTermPositionalArgsWarning: - """Tests for VAL-TERM-POSITIONAL-ARGS-001 resolution. - - SemanticValidator emits warning when term references include positional - arguments, which are silently ignored at runtime per Fluent spec. - """ - - def test_term_reference_positional_args_triggers_warning(self) -> None: - """Term reference with positional args emits validation warning.""" - parser = FluentParserV1() - ftl_source = """ --brand = Acme Corp -msg = Welcome to { -brand($var) } -""" - resource = parser.parse(ftl_source) - - validator = SemanticValidator() - result = validator.validate(resource) - - # Should have warning about positional args - # Annotation.code is a string (enum name), not DiagnosticCode enum - warning_codes = [a.code for a in result.annotations] - assert "VALIDATION_TERM_POSITIONAL_ARGS" in warning_codes - - def test_term_reference_named_args_no_warning(self) -> None: - """Term reference with only named args does NOT emit warning.""" - parser = FluentParserV1() - ftl_source = """ --brand = { $case -> - [nominative] Acme Corp - *[other] Acme Corp -} -msg = Welcome to { -brand(case: "nominative") } -""" - resource = parser.parse(ftl_source) - - validator = SemanticValidator() - result = validator.validate(resource) - - # Should NOT have warning about positional args - warning_codes = [a.code for a in result.annotations] - assert 
"VALIDATION_TERM_POSITIONAL_ARGS" not in warning_codes - - def test_term_reference_mixed_args_triggers_warning(self) -> None: - """Term reference with mixed positional and named args emits warning.""" - parser = FluentParserV1() - ftl_source = """ --brand = Acme Corp -msg = Welcome to { -brand($var, extra: "value") } -""" - resource = parser.parse(ftl_source) - - validator = SemanticValidator() - result = validator.validate(resource) - - warning_codes = [a.code for a in result.annotations] - assert "VALIDATION_TERM_POSITIONAL_ARGS" in warning_codes - - def test_term_reference_no_args_no_warning(self) -> None: - """Term reference without arguments does NOT emit warning.""" - parser = FluentParserV1() - ftl_source = """ --brand = Acme Corp -msg = Welcome to { -brand } -""" - resource = parser.parse(ftl_source) - - validator = SemanticValidator() - result = validator.validate(resource) - - # Should NOT have warning about positional args - warning_codes = [a.code for a in result.annotations] - assert "VALIDATION_TERM_POSITIONAL_ARGS" not in warning_codes - - def test_warning_message_contains_term_name(self) -> None: - """Warning message identifies the term reference causing the warning.""" - parser = FluentParserV1() - ftl_source = """ --my_special_term = Test -msg = { -my_special_term($x) } -""" - resource = parser.parse(ftl_source) - - validator = SemanticValidator() - result = validator.validate(resource) - - annotations = [ - a - for a in result.annotations - if a.code == "VALIDATION_TERM_POSITIONAL_ARGS" - ] - assert len(annotations) == 1 - assert "-my_special_term" in annotations[0].message - assert "positional arguments are ignored" in annotations[0].message - - -class TestFunctionCallInfoPositionalArgVarsRename: - """Tests for SEM-INTROSPECTION-DATA-LOSS-001 resolution. - - FunctionCallInfo.positional_args renamed to positional_arg_vars to - clarify that it contains only variable reference names, not all arguments. 
- """ - - def test_positional_arg_vars_field_exists(self) -> None: - """FunctionCallInfo has positional_arg_vars field.""" - info = FunctionCallInfo( - name="NUMBER", - positional_arg_vars=("amount", "extra"), - named_args=frozenset({"minimumFractionDigits"}), - span=None, - ) - assert info.positional_arg_vars == ("amount", "extra") - - def test_positional_arg_vars_contains_only_variable_names(self) -> None: - """positional_arg_vars only contains VariableReference names.""" - parser = FluentParserV1() - # FTL with function that has mixed positional args (variable and literal) - ftl_source = 'msg = { NUMBER($var, "literal") }' - resource = parser.parse(ftl_source) - msg = resource.entries[0] - assert isinstance(msg, Message) - - result = introspect_message(msg) - func = next(iter(result.functions)) - - # Only variable reference name should be present, not "literal" - assert func.positional_arg_vars == ("var",) - - def test_introspect_message_extracts_positional_arg_vars(self) -> None: - """introspect_message correctly populates positional_arg_vars.""" - bundle = FluentBundle("en") - bundle.add_resource("price = { NUMBER($amount, minimumFractionDigits: 2) }") - - info = bundle.introspect_message("price") - funcs = list(info.functions) - assert len(funcs) == 1 - - func = funcs[0] - assert func.name == "NUMBER" - assert "amount" in func.positional_arg_vars - assert "minimumFractionDigits" in func.named_args - - def test_positional_arg_vars_multiple_variables(self) -> None: - """positional_arg_vars captures multiple variable references.""" - parser = FluentParserV1() - ftl_source = "msg = { FUNC($a, $b, $c) }" - resource = parser.parse(ftl_source) - msg = resource.entries[0] - assert isinstance(msg, Message) - - result = introspect_message(msg) - func = next(iter(result.functions)) - - assert set(func.positional_arg_vars) == {"a", "b", "c"} - - -class TestCrossResourceCycleDetection: - """Tests for VAL-CROSS-RESOURCE-CYCLES-001 resolution. 
- - FluentBundle.validate_resource() now detects cycles involving dependencies - OF existing bundle entries, not just their names. - """ - - def test_simple_cross_resource_cycle_detected(self) -> None: - """Cycle through dependencies of existing entry is detected. - - Scenario: - - Resource 1: msg_a = { msg_b } - - Resource 2: msg_b = { msg_a } - - When validating Resource 2, msg_b references msg_a which is in the bundle. - Since msg_a's dependencies (msg_b) now complete a cycle, it should be detected. - """ - bundle = FluentBundle("en", use_isolating=False) - - # Add first resource: msg_a depends on msg_b (not yet defined) - bundle.add_resource("msg_a = { msg_b }") - - # Now validate second resource that completes the cycle - result = bundle.validate_resource("msg_b = { msg_a }") - - # Should detect the circular reference - warning_texts = " ".join(w.message for w in result.warnings) - assert "Circular" in warning_texts - - def test_term_cross_resource_cycle_detected(self) -> None: - """Cycle through term dependencies is detected. - - Scenario: - - Resource 1: -term_a = { -term_b } - - Resource 2: -term_b = { -term_a } - """ - bundle = FluentBundle("en", use_isolating=False) - - # Add first resource: term_a depends on term_b - bundle.add_resource("-term_a = { -term_b }") - - # Validate second resource that completes the cycle - result = bundle.validate_resource("-term_b = { -term_a }") - - # Should detect the circular reference - warning_texts = " ".join(w.message for w in result.warnings) - assert "Circular" in warning_texts - - def test_mixed_message_term_cross_resource_cycle_detected(self) -> None: - """Cycle involving both messages and terms across resources is detected. 
- - Scenario: - - Resource 1: -brand = { greeting } - - Resource 2: greeting = { -brand } - """ - bundle = FluentBundle("en", use_isolating=False) - - # Add first resource: term depends on message - bundle.add_resource("-brand = { greeting }") - - # Validate second resource that completes the cycle - result = bundle.validate_resource("greeting = { -brand }") - - # Should detect the circular reference - warning_texts = " ".join(w.message for w in result.warnings) - assert "Circular" in warning_texts - - def test_no_false_positive_for_valid_cross_resource(self) -> None: - """Valid cross-resource references don't trigger false positives. - - Scenario: - - Resource 1: msg_a = Hello - - Resource 2: msg_b = { msg_a } - - This is a valid dependency chain, not a cycle. - """ - bundle = FluentBundle("en", use_isolating=False) - - # Add first resource: msg_a has no dependencies - bundle.add_resource("msg_a = Hello") - - # Validate second resource that references msg_a - result = bundle.validate_resource("msg_b = { msg_a }") - - # Should NOT have circular reference warnings - warning_texts = " ".join(w.message for w in result.warnings) - assert "Circular" not in warning_texts - - def test_transitive_cross_resource_cycle_detected(self) -> None: - """Transitive cycles across resources are detected. 
- - Scenario: - - Resource 1: msg_a = { msg_b }, msg_b = { msg_c } - - Resource 2: msg_c = { msg_a } - """ - bundle = FluentBundle("en", use_isolating=False) - - # Add first resource with chain msg_a -> msg_b -> msg_c (incomplete) - bundle.add_resource(""" -msg_a = { msg_b } -msg_b = { msg_c } -""") - - # Validate second resource that completes the cycle - result = bundle.validate_resource("msg_c = { msg_a }") - - # Should detect the circular reference - warning_texts = " ".join(w.message for w in result.warnings) - assert "Circular" in warning_texts - - def test_bundle_deps_tracking_accuracy(self) -> None: - """Internal _msg_deps and _term_deps are correctly populated.""" - bundle = FluentBundle("en", use_isolating=False) - - # Add resources with various dependencies - bundle.add_resource(""" --brand = Acme Corp --slogan = { -brand } -welcome = Hello { -brand } -goodbye = { welcome } - { -slogan } -""") - - # pylint: disable=protected-access - # Verify _term_deps - assert "brand" in bundle._term_deps - assert bundle._term_deps["brand"] == set() - - assert "slogan" in bundle._term_deps - assert "term:brand" in bundle._term_deps["slogan"] - - # Verify _msg_deps - assert "welcome" in bundle._msg_deps - assert "term:brand" in bundle._msg_deps["welcome"] - - assert "goodbye" in bundle._msg_deps - assert "msg:welcome" in bundle._msg_deps["goodbye"] - assert "term:slogan" in bundle._msg_deps["goodbye"] - # pylint: enable=protected-access - - -# ============================================================================ -# VALIDATOR BRANCH COVERAGE -# ============================================================================ - - -class TestValidatorBranchCoverage: - """Test SemanticValidator branch coverage.""" - - def test_validate_junk_entry_passthrough(self) -> None: - """Junk entry in validation passes through without error.""" - junk = Junk(content="invalid") - resource = Resource(entries=(junk,)) - - validator = SemanticValidator() - result = 
validator.validate(resource) - - assert result is not None - - def test_validate_comment_entry_passthrough(self) -> None: - """Comment entry in validation passes through successfully.""" - comment = Comment(content="This is a comment", type=CommentType.COMMENT) - resource = Resource(entries=(comment,)) - - validator = SemanticValidator() - result = validator.validate(resource) - - assert result.is_valid - - def test_validate_message_without_value(self) -> None: - """Message with value=None and attributes validates without crash.""" - attr = Attribute( - id=Identifier("hint"), - value=Pattern(elements=(TextElement("Hint text"),)), - ) - message = Message( - id=Identifier("noValue"), - value=None, - attributes=(attr,), - ) - resource = Resource(entries=(message,)) - - validator = SemanticValidator() - result = validator.validate(resource) - - assert result is not None - - -class TestTermWithoutValueRejected: - """Term with None value is rejected at construction time by __post_init__.""" - - def test_term_without_value_via_manual_ast(self) -> None: - """Term constructor raises ValueError when value is None.""" - with pytest.raises(ValueError, match="Term must have a value pattern"): - Term( - id=Identifier(name="empty-term"), - value=None, # type: ignore[arg-type] - attributes=(), - ) - - -class TestPlaceableExpressionValidation: - """Validator processes the expression inside a Placeable.""" - - def test_placeable_expression_validation(self) -> None: - """Validation processes Placeable's inner expression (hits validate_expression path).""" - ftl = """ -message = Text { $variable } more text -""" - resource = FluentParserV1().parse(ftl) - result = validate(resource) - - assert result.is_valid - - -class TestDuplicateNamedArguments: - """Validator detects duplicate named argument names in function calls.""" - - def test_duplicate_named_arguments(self) -> None: - """Function with duplicate named arg names produces validation annotation.""" - func_ref = FunctionReference( 
- id=Identifier(name="NUMBER"), - arguments=CallArguments( - positional=(NumberLiteral(value=42, raw="42"),), - named=( - NamedArgument( - name=Identifier(name="minimumFractionDigits"), - value=NumberLiteral(value=2, raw="2"), - ), - NamedArgument( - name=Identifier(name="minimumFractionDigits"), # Duplicate - value=NumberLiteral(value=3, raw="3"), - ), - ), - ), - ) - - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(Placeable(expression=func_ref),)), - attributes=(), - comment=None, - span=(0, 0), # type: ignore[arg-type] - ) - - resource = Resource(entries=(msg,)) - - validator = SemanticValidator() - result = validator.validate(resource) - - assert len(result.annotations) > 0 or not result.is_valid - - -class TestSelectExpressionNoVariants: - """SelectExpression with zero variants is rejected by __post_init__.""" - - def test_select_expression_no_variants(self) -> None: - """SelectExpression constructor raises ValueError when variants is empty.""" - with pytest.raises(ValueError, match="SelectExpression requires at least one variant"): - SelectExpression( - selector=VariableReference(id=Identifier(name="count")), - variants=(), - ) - - -class TestNestedPlaceableValidation: - """Validator processes nested Placeables (Placeable as expression of Placeable).""" - - def test_nested_placeable_validation(self) -> None: - """Validator traverses nested Placeables without error.""" - inner_placeable = Placeable( - expression=VariableReference(id=Identifier(name="count")) - ) - outer_placeable = Placeable(expression=inner_placeable) - - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(outer_placeable,)), - attributes=(), - comment=None, - span=(0, 0), # type: ignore[arg-type] - ) - - resource = Resource(entries=(msg,)) - - validator = SemanticValidator() - result = validator.validate(resource) - - assert result.is_valid - - -class TestValidatorBranchCoverageExtended: - """Extended validator branch coverage tests.""" - - def 
test_validate_term_with_attributes(self) -> None: - """Validator handles term with attributes without error.""" - term = Term( - id=Identifier("brand"), - value=Pattern(elements=(TextElement("Firefox"),)), - attributes=( - Attribute( - id=Identifier("gender"), - value=Pattern(elements=(TextElement("m"),)), - ), - ), - ) - resource = Resource(entries=(term,)) - - validator = SemanticValidator() - result = validator.validate(resource) - - assert result is not None - - def test_validate_message_with_select_in_attribute(self) -> None: - """Validator processes message with SelectExpression in attribute.""" - select = SelectExpression( - selector=VariableReference(id=Identifier("count")), - variants=( - Variant( - key=Identifier("one"), - value=Pattern(elements=(TextElement("One"),)), - default=False, - ), - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement("Other"),)), - default=True, - ), - ), - ) - - message = Message( - id=Identifier("msg"), - value=Pattern(elements=(TextElement("Main"),)), - attributes=( - Attribute( - id=Identifier("count"), - value=Pattern(elements=(Placeable(expression=select),)), - ), - ), - ) - resource = Resource(entries=(message,)) - - validator = SemanticValidator() - result = validator.validate(resource) - - assert result is not None - - -# ============================================================================ -# DEFENSE-IN-DEPTH: PLACEABLE AS SELECTOR GUARD (validator.py:421-422) -# ============================================================================ - - -class TestPlaceableAsSelectorDefenseGuard: - """Validator defense-in-depth: Placeable used as SelectExpression selector. - - The SelectorExpression type alias excludes Placeable at the type level, so - normal construction cannot produce this state. However, deserialization or - object.__setattr__ bypass can. 
The validator re-checks this invariant at - line 420-422 via a widened ``object`` guard to avoid mypy unreachable - detection while still catching adversarial ASTs at runtime. - - Covers validator.py:422 (``self._add_error(errors, VALIDATION_PLACEABLE_SELECTOR)``). - """ - - def test_placeable_as_selector_adds_error(self) -> None: - """Validator adds VALIDATION_PLACEABLE_SELECTOR error when selector is a Placeable.""" - # Build a valid SelectExpression first, then bypass __post_init__ to - # inject a Placeable as the selector — this is the adversarial path. - valid_select = SelectExpression( - selector=VariableReference(id=Identifier("count")), - variants=( - Variant( - key=Identifier("other"), - value=Pattern(elements=(TextElement("Other"),)), - default=True, - ), - ), - ) - # Bypass __post_init__: inject a Placeable as the selector - nested_literal = Placeable( - expression=VariableReference(id=Identifier("nested")) - ) - object.__setattr__(valid_select, "selector", nested_literal) - - message = Message( - id=Identifier("msg"), - value=Pattern(elements=(Placeable(expression=valid_select),)), - attributes=(), - ) - resource = Resource(entries=(message,)) - - validator = SemanticValidator() - result = validator.validate(resource) - - assert result is not None - # _add_error stores Annotation.code as DiagnosticCode.name (str), not enum. - # Errors from SemanticValidator appear in result.annotations, not result.errors. 
- selector_errors = [ - a for a in result.annotations - if a.code == DiagnosticCode.VALIDATION_PLACEABLE_SELECTOR.name - ] - assert len(selector_errors) == 1 +from tests.syntax_validator_cases.entries import * # noqa: F403 - re-export split syntax validator tests +from tests.syntax_validator_cases.high_level import * # noqa: F403 - re-export split syntax validator tests +from tests.syntax_validator_cases.regressions import * # noqa: F403 - re-export split syntax validator tests +from tests.syntax_validator_cases.results import * # noqa: F403 - re-export split syntax validator tests diff --git a/tests/test_syntax_validator_property.py b/tests/test_syntax_validator_property.py index e5557a28..6e40ff04 100644 --- a/tests/test_syntax_validator_property.py +++ b/tests/test_syntax_validator_property.py @@ -11,6 +11,7 @@ from ftllexengine.diagnostics import ValidationResult from ftllexengine.diagnostics.codes import DiagnosticCode from ftllexengine.syntax.ast import ( + Annotation, Identifier, Message, Pattern, @@ -90,7 +91,7 @@ class TestDeepNestingValidation: Targets missing coverage in nested validation logic. """ - @given(ftl_deeply_nested_selects(max_depth=10)) + @given(select_expr=ftl_deeply_nested_selects(max_depth=10)) @settings(max_examples=100) def test_validator_handles_deep_nesting(self, select_expr): """STRESS: Deep nesting doesn't crash validator.""" @@ -106,7 +107,7 @@ def test_validator_handles_deep_nesting(self, select_expr): event(f"outcome={'valid' if result.is_valid else 'invalid'}") assert isinstance(result, ValidationResult) - @given(ftl_deeply_nested_selects(max_depth=5)) + @given(select_expr=ftl_deeply_nested_selects(max_depth=5)) @settings(max_examples=100) def test_deeply_nested_selects_validate_correctly(self, select_expr): """PROPERTY: Deeply nested selects validate (may have errors).""" @@ -242,7 +243,7 @@ class TestValidatorStateManagement: Targets lines 331, 361-362, 367, 395: State reset between validations. 
""" - @given(st.lists(ftl_resources(), min_size=2, max_size=5)) + @given(st.lists(semantic_validation_resources(), min_size=2, max_size=5)) @settings(max_examples=100) def test_validator_state_resets_between_calls(self, resources): """PROPERTY: Validator state resets between validate() calls.""" @@ -254,7 +255,7 @@ def test_validator_state_resets_between_calls(self, resources): for result in results: assert isinstance(result, ValidationResult) - @given(ftl_resources(), ftl_resources()) + @given(semantic_validation_resources(), semantic_validation_resources()) @settings(max_examples=100) def test_validator_results_independent(self, resource1, resource2): """PROPERTY: Validating resource1 doesn't affect resource2.""" @@ -829,7 +830,7 @@ def test_add_error_uses_default_message_when_none(self) -> None: expected_msg = _VALIDATION_MESSAGES[code] validator = SemanticValidator() - errors: list = [] + errors: list[Annotation] = [] # pylint: disable=protected-access validator._add_error(errors, code) # No message argument @@ -843,7 +844,7 @@ def test_add_error_uses_custom_message_when_provided(self) -> None: ) validator = SemanticValidator() - errors: list = [] + errors: list[Annotation] = [] custom_msg = "Custom validation error message" validator._add_error( # pylint: disable=protected-access errors, @@ -861,7 +862,7 @@ def test_add_error_unknown_code_uses_fallback(self) -> None: ) validator = SemanticValidator() - errors: list = [] + errors: list[Annotation] = [] # Use a code NOT in _VALIDATION_MESSAGES validator._add_error( # pylint: disable=protected-access errors, diff --git a/tests/test_syntax_visitor.py b/tests/test_syntax_visitor.py index 5ac863bf..58713884 100644 --- a/tests/test_syntax_visitor.py +++ b/tests/test_syntax_visitor.py @@ -1,1149 +1,12 @@ -"""Tests for syntax.visitor: ASTVisitor traversal, dispatch, and defensive branches.""" - -from __future__ import annotations - -from dataclasses import dataclass -from typing import Any - -from ftllexengine.enums 
import CommentType -from ftllexengine.syntax.ast import ( - Attribute, - CallArguments, - Comment, - FunctionReference, - Identifier, - Junk, - Message, - MessageReference, - NamedArgument, - NumberLiteral, - Pattern, - Placeable, - Resource, - SelectExpression, - StringLiteral, - Term, - TermReference, - TextElement, - VariableReference, - Variant, -) -from ftllexengine.syntax.visitor import ASTTransformer, ASTVisitor - -# ============================================================================ -# HELPER VISITORS -# ============================================================================ - - -class CountingVisitor(ASTVisitor): - """Counts visits to each node type.""" - - def __init__(self) -> None: - """Initialize counters.""" - super().__init__() - self.counts: dict[str, int] = {} - - def visit(self, node: Any) -> Any: - """Track each visit.""" - node_type = type(node).__name__ - self.counts[node_type] = self.counts.get(node_type, 0) + 1 - return super().visit(node) - - -class CollectingVisitor(ASTVisitor): - """Collects all identifiers visited.""" - - def __init__(self) -> None: - """Initialize collection.""" - super().__init__() - self.identifiers: list[str] = [] - - def visit_Identifier(self, node: Identifier) -> Any: - """Collect identifier names.""" - self.identifiers.append(node.name) - return self.generic_visit(node) - - -class TransformingVisitor(ASTVisitor): - """Transforms text to uppercase.""" - - def visit_TextElement(self, node: TextElement) -> TextElement: - """Transform text to uppercase.""" - return TextElement(value=node.value.upper()) - - -# ============================================================================ -# BASIC VISITOR TESTS -# ============================================================================ - - -class TestASTVisitorBasic: - """Test basic visitor functionality.""" - - def test_visit_dispatches_to_specific_method(self) -> None: - """Visitor dispatches to visit_NodeType method.""" - visitor = CountingVisitor() - 
node = Identifier(name="test") - - visitor.visit(node) - - assert visitor.counts["Identifier"] == 1 - - def test_generic_visit_returns_node(self) -> None: - """Generic visit returns node unchanged.""" - visitor = ASTVisitor() - node = Identifier(name="test") - - result = visitor.generic_visit(node) - - assert result is node - - -# ============================================================================ -# RESOURCE AND ENTRY NODES -# ============================================================================ - - -class TestVisitorResource: - """Test visiting Resource nodes.""" - - def test_visit_empty_resource(self) -> None: - """Visit empty resource.""" - visitor = CountingVisitor() - resource = Resource(entries=()) - - visitor.visit(resource) - - assert visitor.counts["Resource"] == 1 - - def test_visit_resource_with_messages(self) -> None: - """Visit resource with multiple messages.""" - visitor = CountingVisitor() - resource = Resource( - entries=( - Message( - id=Identifier(name="hello"), - value=Pattern(elements=(TextElement(value="Hello"),)), - attributes=(), - ), - Message( - id=Identifier(name="goodbye"), - value=Pattern(elements=(TextElement(value="Goodbye"),)), - attributes=(), - ), - ) - ) - - visitor.visit(resource) - - assert visitor.counts["Resource"] == 1 - assert visitor.counts["Message"] == 2 - assert visitor.counts["Identifier"] == 2 - assert visitor.counts["Pattern"] == 2 - assert visitor.counts["TextElement"] == 2 - - -class TestVisitorMessage: - """Test visiting Message nodes.""" - - def test_visit_simple_message(self) -> None: - """Visit message with text only.""" - visitor = CountingVisitor() - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="Test"),)), - attributes=(), - ) - - visitor.visit(msg) - - assert visitor.counts["Message"] == 1 - assert visitor.counts["Identifier"] == 1 - assert visitor.counts["Pattern"] == 1 - assert visitor.counts["TextElement"] == 1 - - def 
test_visit_message_with_attributes(self) -> None: - """Visit message with attributes.""" - visitor = CountingVisitor() - msg = Message( - id=Identifier(name="button"), - value=Pattern(elements=(TextElement(value="Save"),)), - attributes=( - Attribute( - id=Identifier(name="tooltip"), - value=Pattern(elements=(TextElement(value="Click to save"),)), - ), - ), - ) - - visitor.visit(msg) - - assert visitor.counts["Message"] == 1 - assert visitor.counts["Attribute"] == 1 - assert visitor.counts["Identifier"] == 2 # message + attribute - - def test_visit_message_with_comment(self) -> None: - """Visit message with comment.""" - visitor = CountingVisitor() - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="Test"),)), - attributes=(), - comment=Comment(content="This is a comment", type=CommentType.COMMENT), - ) - - visitor.visit(msg) - - assert visitor.counts["Message"] == 1 - assert visitor.counts["Comment"] == 1 - - def test_visit_message_without_value(self) -> None: - """Visit message without value (only attributes).""" - visitor = CountingVisitor() - msg = Message( - id=Identifier(name="test"), - value=None, - attributes=( - Attribute( - id=Identifier(name="attr"), - value=Pattern(elements=(TextElement(value="Value"),)), - ), - ), - ) - - visitor.visit(msg) - - assert visitor.counts["Message"] == 1 - assert visitor.counts["Attribute"] == 1 - # No Pattern count for message value (it's None) - assert visitor.counts["Pattern"] == 1 # From attribute - - -class TestVisitorTerm: - """Test visiting Term nodes.""" - - def test_visit_simple_term(self) -> None: - """Visit term with text only.""" - visitor = CountingVisitor() - term = Term( - id=Identifier(name="brand"), - value=Pattern(elements=(TextElement(value="Firefox"),)), - attributes=(), - ) - - visitor.visit(term) - - assert visitor.counts["Term"] == 1 - assert visitor.counts["Identifier"] == 1 - assert visitor.counts["Pattern"] == 1 - - def test_visit_term_with_attributes(self) -> 
None: - """Visit term with attributes.""" - visitor = CountingVisitor() - term = Term( - id=Identifier(name="brand"), - value=Pattern(elements=(TextElement(value="Firefox"),)), - attributes=( - Attribute( - id=Identifier(name="version"), - value=Pattern(elements=(TextElement(value="120"),)), - ), - ), - ) - - visitor.visit(term) - - assert visitor.counts["Term"] == 1 - assert visitor.counts["Attribute"] == 1 - - def test_visit_term_with_comment(self) -> None: - """Visit term with comment.""" - visitor = CountingVisitor() - term = Term( - id=Identifier(name="brand"), - value=Pattern(elements=(TextElement(value="Firefox"),)), - attributes=(), - comment=Comment(content="Brand name", type=CommentType.COMMENT), - ) - - visitor.visit(term) - - assert visitor.counts["Term"] == 1 - assert visitor.counts["Comment"] == 1 - - -class TestVisitorAttribute: - """Test visiting Attribute nodes.""" - - def test_visit_attribute(self) -> None: - """Visit attribute node.""" - visitor = CountingVisitor() - attr = Attribute( - id=Identifier(name="tooltip"), - value=Pattern(elements=(TextElement(value="Help text"),)), - ) - - visitor.visit(attr) - - assert visitor.counts["Attribute"] == 1 - assert visitor.counts["Identifier"] == 1 - assert visitor.counts["Pattern"] == 1 - - -class TestVisitorCommentJunk: - """Test visiting Comment and Junk nodes.""" - - def test_visit_comment(self) -> None: - """Visit comment node.""" - visitor = CountingVisitor() - comment = Comment(content="This is a comment", type=CommentType.COMMENT) - - visitor.visit(comment) - - assert visitor.counts["Comment"] == 1 - - def test_visit_junk(self) -> None: - """Visit junk node.""" - visitor = CountingVisitor() - junk = Junk(content="invalid { syntax") - - visitor.visit(junk) - - assert visitor.counts["Junk"] == 1 - - -# ============================================================================ -# PATTERN AND ELEMENT NODES -# ============================================================================ - - -class 
TestVisitorPattern: - """Test visiting Pattern nodes.""" - - def test_visit_pattern_with_text(self) -> None: - """Visit pattern with text elements.""" - visitor = CountingVisitor() - pattern = Pattern(elements=(TextElement(value="Hello"),)) - - visitor.visit(pattern) - - assert visitor.counts["Pattern"] == 1 - assert visitor.counts["TextElement"] == 1 - - def test_visit_pattern_with_mixed_elements(self) -> None: - """Visit pattern with text and placeables.""" - visitor = CountingVisitor() - pattern = Pattern( - elements=( - TextElement(value="Hello, "), - Placeable(expression=VariableReference(id=Identifier(name="name"))), - TextElement(value="!"), - ) - ) - - visitor.visit(pattern) - - assert visitor.counts["Pattern"] == 1 - assert visitor.counts["TextElement"] == 2 - assert visitor.counts["Placeable"] == 1 - assert visitor.counts["VariableReference"] == 1 - - -class TestVisitorTextElement: - """Test visiting TextElement nodes.""" - - def test_visit_text_element(self) -> None: - """Visit text element.""" - visitor = CountingVisitor() - text = TextElement(value="Hello, World!") - - visitor.visit(text) - - assert visitor.counts["TextElement"] == 1 - - -class TestVisitorPlaceable: - """Test visiting Placeable nodes.""" - - def test_visit_placeable_with_variable(self) -> None: - """Visit placeable containing variable.""" - visitor = CountingVisitor() - placeable = Placeable(expression=VariableReference(id=Identifier(name="var"))) - - visitor.visit(placeable) - - assert visitor.counts["Placeable"] == 1 - assert visitor.counts["VariableReference"] == 1 - assert visitor.counts["Identifier"] == 1 - - -# ============================================================================ -# EXPRESSION NODES -# ============================================================================ - - -class TestVisitorLiterals: - """Test visiting literal expression nodes.""" - - def test_visit_string_literal(self) -> None: - """Visit string literal.""" - visitor = CountingVisitor() - literal 
= StringLiteral(value="test") - - visitor.visit(literal) - - assert visitor.counts["StringLiteral"] == 1 - - def test_visit_number_literal(self) -> None: - """Visit number literal.""" - visitor = CountingVisitor() - literal = NumberLiteral(value=42, raw="42") - - visitor.visit(literal) - - assert visitor.counts["NumberLiteral"] == 1 - - -class TestVisitorReferences: - """Test visiting reference expression nodes.""" - - def test_visit_variable_reference(self) -> None: - """Visit variable reference.""" - visitor = CountingVisitor() - ref = VariableReference(id=Identifier(name="count")) - - visitor.visit(ref) - - assert visitor.counts["VariableReference"] == 1 - assert visitor.counts["Identifier"] == 1 - - def test_visit_message_reference_simple(self) -> None: - """Visit message reference without attribute.""" - visitor = CountingVisitor() - ref = MessageReference(id=Identifier(name="hello"), attribute=None) - - visitor.visit(ref) - - assert visitor.counts["MessageReference"] == 1 - assert visitor.counts["Identifier"] == 1 - - def test_visit_message_reference_with_attribute(self) -> None: - """Visit message reference with attribute.""" - visitor = CountingVisitor() - ref = MessageReference( - id=Identifier(name="button"), attribute=Identifier(name="tooltip") - ) - - visitor.visit(ref) - - assert visitor.counts["MessageReference"] == 1 - assert visitor.counts["Identifier"] == 2 - - def test_visit_term_reference_simple(self) -> None: - """Visit term reference without attribute or arguments.""" - visitor = CountingVisitor() - ref = TermReference(id=Identifier(name="brand"), attribute=None, arguments=None) - - visitor.visit(ref) - - assert visitor.counts["TermReference"] == 1 - assert visitor.counts["Identifier"] == 1 - - def test_visit_term_reference_with_attribute(self) -> None: - """Visit term reference with attribute.""" - visitor = CountingVisitor() - ref = TermReference( - id=Identifier(name="brand"), - attribute=Identifier(name="version"), - arguments=None, - ) - - 
visitor.visit(ref) - - assert visitor.counts["TermReference"] == 1 - assert visitor.counts["Identifier"] == 2 - - def test_visit_term_reference_with_arguments(self) -> None: - """Visit term reference with arguments.""" - visitor = CountingVisitor() - ref = TermReference( - id=Identifier(name="brand"), - attribute=None, - arguments=CallArguments(positional=(), named=()), - ) - - visitor.visit(ref) - - assert visitor.counts["TermReference"] == 1 - assert visitor.counts["CallArguments"] == 1 - - -class TestVisitorFunctionReference: - """Test visiting FunctionReference nodes.""" - - def test_visit_function_reference_no_args(self) -> None: - """Visit function with no arguments.""" - visitor = CountingVisitor() - func = FunctionReference( - id=Identifier(name="NUMBER"), arguments=CallArguments(positional=(), named=()) - ) - - visitor.visit(func) - - assert visitor.counts["FunctionReference"] == 1 - assert visitor.counts["Identifier"] == 1 - assert visitor.counts["CallArguments"] == 1 - - def test_visit_function_reference_with_args(self) -> None: - """Visit function with positional arguments.""" - visitor = CountingVisitor() - func = FunctionReference( - id=Identifier(name="NUMBER"), - arguments=CallArguments( - positional=(VariableReference(id=Identifier(name="value")),), named=() - ), - ) - - visitor.visit(func) - - assert visitor.counts["FunctionReference"] == 1 - assert visitor.counts["CallArguments"] == 1 - assert visitor.counts["VariableReference"] == 1 - - -class TestVisitorSelectExpression: - """Test visiting SelectExpression nodes.""" - - def test_visit_select_expression(self) -> None: - """Visit select expression with variants.""" - visitor = CountingVisitor() - select = SelectExpression( - selector=VariableReference(id=Identifier(name="count")), - variants=( - Variant( - key=Identifier(name="one"), - value=Pattern(elements=(TextElement(value="one item"),)), - default=False, - ), - Variant( - key=Identifier(name="other"), - 
value=Pattern(elements=(TextElement(value="many items"),)), - default=True, - ), - ), - ) - - visitor.visit(select) - - assert visitor.counts["SelectExpression"] == 1 - assert visitor.counts["VariableReference"] == 1 - assert visitor.counts["Variant"] == 2 - assert visitor.counts["Pattern"] == 2 - - -class TestVisitorVariant: - """Test visiting Variant nodes.""" - - def test_visit_variant_with_identifier_key(self) -> None: - """Visit variant with identifier key.""" - visitor = CountingVisitor() - variant = Variant( - key=Identifier(name="one"), - value=Pattern(elements=(TextElement(value="one"),)), - default=False, - ) - - visitor.visit(variant) - - assert visitor.counts["Variant"] == 1 - assert visitor.counts["Identifier"] == 1 - assert visitor.counts["Pattern"] == 1 - - def test_visit_variant_with_number_key(self) -> None: - """Visit variant with number literal key.""" - visitor = CountingVisitor() - variant = Variant( - key=NumberLiteral(value=0, raw="0"), - value=Pattern(elements=(TextElement(value="none"),)), - default=False, - ) - - visitor.visit(variant) - - assert visitor.counts["Variant"] == 1 - assert visitor.counts["NumberLiteral"] == 1 - - -# ============================================================================ -# CALL ARGUMENTS -# ============================================================================ - - -class TestVisitorCallArguments: - """Test visiting CallArguments nodes.""" - - def test_visit_call_arguments_empty(self) -> None: - """Visit call arguments with no args.""" - visitor = CountingVisitor() - args = CallArguments(positional=(), named=()) - - visitor.visit(args) - - assert visitor.counts["CallArguments"] == 1 - - def test_visit_call_arguments_positional(self) -> None: - """Visit call arguments with positional args.""" - visitor = CountingVisitor() - args = CallArguments( - positional=( - VariableReference(id=Identifier(name="x")), - NumberLiteral(value=42, raw="42"), - ), - named=(), - ) - - visitor.visit(args) - - assert 
visitor.counts["CallArguments"] == 1 - assert visitor.counts["VariableReference"] == 1 - assert visitor.counts["NumberLiteral"] == 1 - - def test_visit_call_arguments_named(self) -> None: - """Visit call arguments with named args.""" - visitor = CountingVisitor() - args = CallArguments( - positional=(), - named=( - NamedArgument( - name=Identifier(name="param"), - value=StringLiteral(value="value"), - ), - ), - ) - - visitor.visit(args) - - assert visitor.counts["CallArguments"] == 1 - assert visitor.counts["NamedArgument"] == 1 - assert visitor.counts["StringLiteral"] == 1 - - -class TestVisitorNamedArgument: - """Test visiting NamedArgument nodes.""" - - def test_visit_named_argument(self) -> None: - """Visit named argument.""" - visitor = CountingVisitor() - arg = NamedArgument( - name=Identifier(name="minimumFractionDigits"), value=NumberLiteral(value=2, raw="2") - ) - - visitor.visit(arg) - - assert visitor.counts["NamedArgument"] == 1 - assert visitor.counts["Identifier"] == 1 - assert visitor.counts["NumberLiteral"] == 1 - - -class TestVisitorIdentifier: - """Test visiting Identifier nodes.""" - - def test_visit_identifier(self) -> None: - """Visit identifier.""" - visitor = CountingVisitor() - ident = Identifier(name="test") - - visitor.visit(ident) - - assert visitor.counts["Identifier"] == 1 - - -# ============================================================================ -# VISITOR CUSTOMIZATION -# ============================================================================ - - -class TestVisitorCustomization: - """Test custom visitor implementations.""" - - def test_collecting_visitor(self) -> None: - """Custom visitor can collect specific data.""" - visitor = CollectingVisitor() - resource = Resource( - entries=( - Message( - id=Identifier(name="hello"), - value=Pattern( - elements=( - TextElement(value="Hello, "), - Placeable( - expression=VariableReference(id=Identifier(name="name")) - ), - ) - ), - attributes=(), - ), - Message( - 
id=Identifier(name="goodbye"), - value=Pattern(elements=(TextElement(value="Goodbye"),)), - attributes=(), - ), - ) - ) - - visitor.visit(resource) - - assert "hello" in visitor.identifiers - assert "goodbye" in visitor.identifiers - assert "name" in visitor.identifiers - - def test_transforming_visitor(self) -> None: - """Custom visitor can transform nodes.""" - visitor = TransformingVisitor() - text = TextElement(value="hello") - - result = visitor.visit(text) - - assert isinstance(result, TextElement) - assert result.value == "HELLO" - - -# ============================================================================ -# COMPLEX INTEGRATION TESTS -# ============================================================================ - - -class TestVisitorIntegration: - """Test visitor with complex AST structures.""" - - def test_visit_complex_message_with_select(self) -> None: - """Visit message with select expression and multiple variants.""" - visitor = CountingVisitor() - msg = Message( - id=Identifier(name="emails"), - value=Pattern( - elements=( - Placeable( - expression=SelectExpression( - selector=VariableReference(id=Identifier(name="count")), - variants=( - Variant( - key=Identifier(name="one"), - value=Pattern( - elements=(TextElement(value="one email"),) - ), - default=False, - ), - Variant( - key=Identifier(name="other"), - value=Pattern( - elements=( - Placeable( - expression=VariableReference( - id=Identifier(name="count") - ) - ), - TextElement(value=" emails"), - ) - ), - default=True, - ), - ), - ) - ), - ) - ), - attributes=(), - ) - - visitor.visit(msg) - - assert visitor.counts["Message"] == 1 - assert visitor.counts["SelectExpression"] == 1 - assert visitor.counts["Variant"] == 2 - assert visitor.counts["VariableReference"] == 2 # selector + in variant - - def test_visit_message_with_function_call(self) -> None: - """Visit message with function call.""" - visitor = CountingVisitor() - msg = Message( - id=Identifier(name="price"), - value=Pattern( - 
elements=( - TextElement(value="Price: "), - Placeable( - expression=FunctionReference( - id=Identifier(name="NUMBER"), - arguments=CallArguments( - positional=( - VariableReference(id=Identifier(name="value")), - ), - named=( - NamedArgument( - name=Identifier(name="minimumFractionDigits"), - value=NumberLiteral(value=2, raw="2"), - ), - ), - ), - ) - ), - ) - ), - attributes=(), - ) - - visitor.visit(msg) - - assert visitor.counts["Message"] == 1 - assert visitor.counts["FunctionReference"] == 1 - assert visitor.counts["CallArguments"] == 1 - assert visitor.counts["NamedArgument"] == 1 - - def test_visit_resource_with_mixed_entries(self) -> None: - """Visit resource with messages, terms, comments, and junk.""" - visitor = CountingVisitor() - resource = Resource( - entries=( - Comment(content="Header comment", type=CommentType.COMMENT), - Message( - id=Identifier(name="hello"), - value=Pattern(elements=(TextElement(value="Hello"),)), - attributes=(), - ), - Term( - id=Identifier(name="brand"), - value=Pattern(elements=(TextElement(value="Firefox"),)), - attributes=(), - ), - Junk(content="invalid syntax"), - ) - ) - - visitor.visit(resource) - - assert visitor.counts["Resource"] == 1 - assert visitor.counts["Comment"] == 1 - assert visitor.counts["Message"] == 1 - assert visitor.counts["Term"] == 1 - assert visitor.counts["Junk"] == 1 - - -# ============================================================================ -# DEFENSIVE BRANCHES (from test_visitor_branch_coverage.py) -# ============================================================================ - - -@dataclass(frozen=True) -class MockFieldContainer: - """Mock container without __dataclass_fields__ for testing defensive branches.""" - - value: str - - -class PlainObject: - """Plain object without dataclass fields for testing defensive branches.""" - - def __init__(self, data: str) -> None: - """Initialize with data.""" - self.data = data - - -class TestGenericVisitDefensiveBranches: - """Test defensive 
branches in generic_visit for non-ASTNode values.""" - - def test_generic_visit_tuple_with_non_dataclass_items(self) -> None: - """Test line 214->212: tuple containing items without __dataclass_fields__. - - This tests the defensive branch where a tuple field contains items that - are not ASTNodes (don't have __dataclass_fields__). - """ - - class CountingVisitor(ASTVisitor): - """Visitor that counts visits.""" - - def __init__(self) -> None: - """Initialize visitor.""" - super().__init__() - self.visit_count = 0 - - def visit(self, node): - """Count each visit.""" - self.visit_count += 1 - return super().visit(node) - - # Create a message with normal structure - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="Test"),)), - attributes=(), - ) - - # Monkey-patch the elements tuple to include a non-ASTNode item - # This is testing a defensive code path that shouldn't happen in normal usage - # but guards against malformed AST structures - modified_elements = ( - TextElement(value="First"), - MockFieldContainer(value="not_an_astnode"), # No __dataclass_fields__ - TextElement(value="Last"), - ) - - # Use object.__setattr__ to bypass frozen dataclass protection - object.__setattr__(msg.value, "elements", modified_elements) - - visitor = CountingVisitor() - visitor.generic_visit(msg) - - # The visitor should visit the Message, Pattern, Identifier, and the two TextElements - # but NOT the MockFieldContainer (it lacks __dataclass_fields__) - # Visit count: Message (1) + Identifier (1) + Pattern (1) + 2 TextElements (2) = 5 - assert visitor.visit_count == 5 - - def test_generic_visit_tuple_with_mixed_items(self) -> None: - """Test tuple containing mix of ASTNodes and non-ASTNodes. - - This comprehensively tests the line 214 branch logic where we check - each tuple item for __dataclass_fields__. 
- """ - - class VisitOrderTracker(ASTVisitor): - """Track order of visits.""" - - def __init__(self) -> None: - """Initialize tracker.""" - super().__init__() - self.visit_order: list[str] = [] - - def visit(self, node): - """Record visit order.""" - node_name = type(node).__name__ - if node_name == "TextElement": - text_value = getattr(node, "value", "") - self.visit_order.append(f"TextElement:{text_value}") - else: - self.visit_order.append(node_name) - return super().visit(node) - - # Create pattern with mixed elements - pattern = Pattern( - elements=( - TextElement(value="A"), - TextElement(value="B"), - ) - ) - - # Inject non-ASTNode items into the tuple - mixed_elements = ( - TextElement(value="A"), - "string_value", # Not an ASTNode, will be skipped - TextElement(value="B"), - 123, # int, will be skipped by primitive check - ) - - object.__setattr__(pattern, "elements", mixed_elements) - - visitor = VisitOrderTracker() - visitor.generic_visit(pattern) - - # Should visit TextElement:A and TextElement:B, skipping string and int - assert "TextElement:A" in visitor.visit_order - assert "TextElement:B" in visitor.visit_order - # String and int should not appear - assert "str" not in visitor.visit_order - assert "int" not in visitor.visit_order - - def test_generic_visit_non_tuple_non_dataclass_field(self) -> None: - """Test line 217->203: single field that is an object without __dataclass_fields__. 
- - This tests the defensive else branch where a field value is: - - Not None - - Not a primitive (str, int, float, bool) - - Not a tuple - - Not an ASTNode (no __dataclass_fields__) - """ - # Create a message - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="Test"),)), - attributes=(), - ) - - # Replace the 'comment' field (normally None or Comment ASTNode) with a - # plain object that doesn't have __dataclass_fields__ - plain_obj = PlainObject(data="test") - object.__setattr__(msg, "comment", plain_obj) - - class VisitorTracker(ASTVisitor): - """Track what gets visited.""" - - def __init__(self) -> None: - """Initialize tracker.""" - super().__init__() - self.visited_types: set[str] = set() - - def visit(self, node): - """Track visits.""" - self.visited_types.add(type(node).__name__) - return super().visit(node) - - visitor = VisitorTracker() - visitor.generic_visit(msg) - - # Should have visited Message's children (Identifier, Pattern, TextElement) - # but NOT the PlainObject (it doesn't have __dataclass_fields__) - assert "Identifier" in visitor.visited_types - assert "Pattern" in visitor.visited_types - assert "TextElement" in visitor.visited_types - assert "PlainObject" not in visitor.visited_types - - -# ============================================================================ -# VISITOR BRANCH COVERAGE -# ============================================================================ - - -class TestVisitorBranchCoverage: - """Test visitor branch coverage for tuple fields and primitive fields.""" - - def test_visit_node_with_empty_tuple_field(self) -> None: - """Visitor handles message with empty attributes tuple.""" - message = Message( - id=Identifier("empty"), - value=Pattern(elements=(TextElement("Value"),)), - attributes=(), - ) - - class CountingVisitorLocal(ASTVisitor): - """Visitor that counts all nodes visited.""" - - def __init__(self) -> None: - """Initialize counter.""" - super().__init__() - 
self.visit_count = 0 - - def visit(self, node: Any) -> Any: - """Count each visit.""" - self.visit_count += 1 - return super().visit(node) - - visitor = CountingVisitorLocal() - visitor.visit(message) - - assert visitor.visit_count > 0 - - def test_visit_node_with_primitive_fields(self) -> None: - """Visitor dispatches to visit_Identifier for Identifier nodes.""" - ident = Identifier("test") - - class FieldInspector(ASTVisitor): - """Visitor that tracks Identifier visits.""" - - def __init__(self) -> None: - """Initialize tracker.""" - super().__init__() - self.visited_identifier = False - - def visit_Identifier(self, node: Identifier) -> Any: - """Record that Identifier was visited.""" - self.visited_identifier = True - return self.generic_visit(node) - - visitor = FieldInspector() - visitor.visit(ident) - - assert visitor.visited_identifier - - def test_visit_node_with_none_field(self) -> None: - """Visitor handles message with comment=None field gracefully.""" - message = Message( - id=Identifier("noComment"), - value=Pattern(elements=(TextElement("Val"),)), - attributes=(), - comment=None, - ) - - visitor = ASTVisitor() - result = visitor.visit(message) - - assert result is not None - - -class TestVisitorBranchCoverageExtended: - """Extended visitor branch coverage tests.""" - - def test_visit_resource_with_mixed_entries(self) -> None: - """Visitor traverses Resource with mix of messages, terms, comments, and junk.""" - resource = Resource( - entries=( - Comment(content="File comment", type=CommentType.RESOURCE), - Message( - id=Identifier("msg"), - value=Pattern(elements=(TextElement("Value"),)), - attributes=(), - ), - Term( - id=Identifier("term"), - value=Pattern(elements=(TextElement("Term"),)), - attributes=(), - ), - Junk(content="invalid"), - ) - ) - - visitor = ASTVisitor() - result = visitor.visit(resource) - - assert result is not None - - def test_visit_with_dataclass_fields(self) -> None: - """Visitor traverses nodes with int and bool dataclass 
fields.""" - num_lit = NumberLiteral(value=42, raw="42") - - variant = Variant( - key=num_lit, - value=Pattern(elements=(TextElement("Forty-two"),)), - default=True, - ) - - select = SelectExpression( - selector=VariableReference(id=Identifier("num")), - variants=(variant,), - ) - - message = Message( - id=Identifier("select"), - value=Pattern(elements=(Placeable(expression=select),)), - attributes=(), - ) - - visitor = ASTVisitor() - result = visitor.visit(message) - - assert result is not None - - -class TestTransformerListExpansion: - """ASTTransformer that returns a list from a visit method expands elements.""" - - def test_transform_list_with_multiple_results(self) -> None: - """Transformer returning a list from visit_TextElement expands pattern elements.""" - - class ListExpandingTransformer(ASTTransformer): - """Transformer that returns a list instead of a single node.""" - - def visit_TextElement(self, node: TextElement) -> Any: - """Return two nodes in place of one.""" - return [ - TextElement(value=node.value.upper()), - TextElement(value=" "), - ] - - pattern = Pattern(elements=( - TextElement(value="hello"), - TextElement(value="world"), - )) - - transformer = ListExpandingTransformer() - result = transformer.visit(pattern) - - assert isinstance(result, Pattern) - assert len(result.elements) > 2 +"""Aggregated syntax visitor test surface.""" + +from tests.syntax_visitor_cases.basic_visitor_tests import * # noqa: F403 - re-export split test surface +from tests.syntax_visitor_cases.call_arguments import * # noqa: F403 - re-export split test surface +from tests.syntax_visitor_cases.complex_integration_tests import * # noqa: F403 - re-export split test surface +from tests.syntax_visitor_cases.defensive_branches_from_test_visitor_branch_coverage_py import * # noqa: F403 - re-export split test surface +from tests.syntax_visitor_cases.expression_nodes import * # noqa: F403 - re-export split test surface +from tests.syntax_visitor_cases.helper_visitors import * 
# noqa: F403 - re-export split test surface +from tests.syntax_visitor_cases.pattern_and_element_nodes import * # noqa: F403 - re-export split test surface +from tests.syntax_visitor_cases.resource_and_entry_nodes import * # noqa: F403 - re-export split test surface +from tests.syntax_visitor_cases.visitor_branch_coverage import * # noqa: F403 - re-export split test surface +from tests.syntax_visitor_cases.visitor_customization import * # noqa: F403 - re-export split test surface diff --git a/tests/test_syntax_visitor_transformer.py b/tests/test_syntax_visitor_transformer.py index 10a425c5..0c808466 100644 --- a/tests/test_syntax_visitor_transformer.py +++ b/tests/test_syntax_visitor_transformer.py @@ -1,1366 +1,11 @@ -"""Tests for syntax.visitor: ASTTransformer transformation, validation, and error cases.""" - -from __future__ import annotations - -import pytest -from hypothesis import event, given, settings -from hypothesis import strategies as st - -from ftllexengine.syntax.ast import ( - Attribute, - CallArguments, - FunctionReference, - Identifier, - Message, - MessageReference, - NamedArgument, - NumberLiteral, - Pattern, - Placeable, - Resource, - SelectExpression, - StringLiteral, - Term, - TermReference, - TextElement, - VariableReference, - Variant, -) -from ftllexengine.syntax.visitor import ASTTransformer, ASTVisitor - - -class UppercaseIdentifierTransformer(ASTTransformer): - """Test transformer that uppercases all identifiers.""" - - def visit_Identifier(self, node: Identifier) -> Identifier: - """Uppercase identifier names.""" - return Identifier(name=node.name.upper()) - - -class TestTermTransformation: - """Test Term node transformation (line 303).""" - - def test_transform_term_with_value(self) -> None: - """Transform a Term with value and attributes.""" - term = Term( - id=Identifier(name="brand"), - value=Pattern(elements=(TextElement(value="Acme Corp"),)), - attributes=( - Attribute( - id=Identifier(name="legal"), - 
value=Pattern(elements=(TextElement(value="Acme Corporation"),)), - ), - ), - comment=None, - ) - - transformer = UppercaseIdentifierTransformer() - result = transformer.visit(term) - - # Should transform all identifiers to uppercase - assert isinstance(result, Term) - assert result.id.name == "BRAND" - assert result.attributes[0].id.name == "LEGAL" - - -class TestSelectExpressionTransformation: - """Test SelectExpression transformation (line 315).""" - - def test_transform_select_expression(self) -> None: - """Transform SelectExpression with variants.""" - select = SelectExpression( - selector=VariableReference(id=Identifier(name="count")), - variants=( - Variant( - key=Identifier(name="one"), - value=Pattern(elements=(TextElement(value="one item"),)), - default=False, - ), - Variant( - key=Identifier(name="other"), - value=Pattern(elements=(TextElement(value="many items"),)), - default=True, - ), - ), - ) - - transformer = UppercaseIdentifierTransformer() - result = transformer.visit(select) - - # Should transform all identifiers - assert isinstance(result, SelectExpression) - assert result.selector.id.name == "COUNT" # type: ignore[union-attr] - assert result.variants[0].key.name == "ONE" # type: ignore[union-attr] - assert result.variants[1].key.name == "OTHER" # type: ignore[union-attr] - - -class TestVariantTransformation: - """Test Variant transformation (line 321).""" - - def test_transform_variant(self) -> None: - """Transform Variant with key and value.""" - variant = Variant( - key=Identifier(name="zero"), - value=Pattern( - elements=( - Placeable( - expression=VariableReference(id=Identifier(name="count")) - ), - ) - ), - ) - - transformer = UppercaseIdentifierTransformer() - result = transformer.visit(variant) - - # Should transform identifiers in key and value - assert isinstance(result, Variant) - assert result.key.name == "ZERO" # type: ignore[union-attr] - assert result.value.elements[0].expression.id.name == "COUNT" # type: ignore[union-attr] - - 
-class TestFunctionReferenceTransformation: - """Test FunctionReference transformation (line 324).""" - - def test_transform_function_reference(self) -> None: - """Transform FunctionReference with arguments.""" - func_ref = FunctionReference( - id=Identifier(name="number"), - arguments=CallArguments( - positional=(VariableReference(id=Identifier(name="amount")),), - named=( - NamedArgument( - name=Identifier(name="minimumFractionDigits"), - value=NumberLiteral(value=2, raw="2"), - ), - ), - ), - ) - - transformer = UppercaseIdentifierTransformer() - result = transformer.visit(func_ref) - - # Should transform all identifiers - assert isinstance(result, FunctionReference) - assert result.id.name == "NUMBER" - assert result.arguments.positional[0].id.name == "AMOUNT" # type: ignore[union-attr] - assert result.arguments.named[0].name.name == "MINIMUMFRACTIONDIGITS" - - -class TestMessageReferenceTransformation: - """Test MessageReference transformation (line 330).""" - - def test_transform_message_reference_without_attribute(self) -> None: - """Transform MessageReference without attribute.""" - msg_ref = MessageReference( - id=Identifier(name="welcome"), attribute=None - ) - - transformer = UppercaseIdentifierTransformer() - result = transformer.visit(msg_ref) - - assert isinstance(result, MessageReference) - assert result.id.name == "WELCOME" - assert result.attribute is None - - def test_transform_message_reference_with_attribute(self) -> None: - """Transform MessageReference with attribute.""" - msg_ref = MessageReference( - id=Identifier(name="welcome"), - attribute=Identifier(name="tooltip"), - ) - - transformer = UppercaseIdentifierTransformer() - result = transformer.visit(msg_ref) - - assert isinstance(result, MessageReference) - assert result.id.name == "WELCOME" - assert result.attribute.name == "TOOLTIP" # type: ignore[union-attr] - - -class TestTermReferenceTransformation: - """Test TermReference transformation (line 336).""" - - def 
test_transform_term_reference_simple(self) -> None: - """Transform TermReference without attribute or arguments.""" - term_ref = TermReference( - id=Identifier(name="brand"), - attribute=None, - arguments=None, - ) - - transformer = UppercaseIdentifierTransformer() - result = transformer.visit(term_ref) - - assert isinstance(result, TermReference) - assert result.id.name == "BRAND" - assert result.attribute is None - assert result.arguments is None - - def test_transform_term_reference_with_attribute_and_arguments(self) -> None: - """Transform TermReference with attribute and arguments.""" - term_ref = TermReference( - id=Identifier(name="brand"), - attribute=Identifier(name="legal"), - arguments=CallArguments( - positional=(), - named=( - NamedArgument( - name=Identifier(name="case"), - value=StringLiteral(value="upper"), - ), - ), - ), - ) - - transformer = UppercaseIdentifierTransformer() - result = transformer.visit(term_ref) - - assert isinstance(result, TermReference) - assert result.id.name == "BRAND" - assert result.attribute.name == "LEGAL" # type: ignore[union-attr] - assert result.arguments.named[0].name.name == "CASE" # type: ignore[union-attr] - - -class TestVariableReferenceTransformation: - """Test VariableReference transformation (line 343).""" - - def test_transform_variable_reference(self) -> None: - """Transform VariableReference.""" - var_ref = VariableReference(id=Identifier(name="userName")) - - transformer = UppercaseIdentifierTransformer() - result = transformer.visit(var_ref) - - assert isinstance(result, VariableReference) - assert result.id.name == "USERNAME" - - -class TestCallArgumentsTransformation: - """Test CallArguments transformation (line 345).""" - - def test_transform_call_arguments(self) -> None: - """Transform CallArguments: positional args are visited, named arg names are visited. 
- - Named arg values are FTLLiteral (StringLiteral | NumberLiteral) leaf nodes; - the transformer returns them unchanged (generic_visit returns leaf nodes as-is). - The identifier in named arg NAME is visited and uppercased. - """ - call_args = CallArguments( - positional=( - VariableReference(id=Identifier(name="value")), - NumberLiteral(value=42, raw="42"), - ), - named=( - NamedArgument( - name=Identifier(name="option"), - value=StringLiteral(value="opt_value"), - ), - ), - ) - - transformer = UppercaseIdentifierTransformer() - result = transformer.visit(call_args) - - assert isinstance(result, CallArguments) - assert result.positional[0].id.name == "VALUE" # type: ignore[union-attr] - assert result.positional[1].value == 42 # type: ignore[union-attr] - assert result.named[0].name.name == "OPTION" - # Literal value is a leaf node; returned unchanged by generic_visit - assert result.named[0].value == StringLiteral(value="opt_value") - - -class TestNamedArgumentTransformation: - """Test NamedArgument transformation (line 351).""" - - def test_transform_named_argument(self) -> None: - """Transform NamedArgument: name identifier is visited; literal value is unchanged. - - Named arg values are FTLLiteral (StringLiteral | NumberLiteral); generic_visit - returns leaf nodes as-is. The identifier in the name field is visited. 
- """ - named_arg = NamedArgument( - name=Identifier(name="minimumFractionDigits"), - value=NumberLiteral(value=2, raw="2"), - ) - - transformer = UppercaseIdentifierTransformer() - result = transformer.visit(named_arg) - - assert isinstance(result, NamedArgument) - assert result.name.name == "MINIMUMFRACTIONDIGITS" - # Literal value is a leaf node; returned unchanged - assert result.value == NumberLiteral(value=2, raw="2") - - -class TestAttributeTransformation: - """Test Attribute transformation (line 353).""" - - def test_transform_attribute(self) -> None: - """Transform Attribute with id and value.""" - attr = Attribute( - id=Identifier(name="tooltip"), - value=Pattern( - elements=( - Placeable( - expression=VariableReference(id=Identifier(name="text")) - ), - ) - ), - ) - - transformer = UppercaseIdentifierTransformer() - result = transformer.visit(attr) - - assert isinstance(result, Attribute) - assert result.id.name == "TOOLTIP" - assert result.value.elements[0].expression.id.name == "TEXT" # type: ignore[union-attr] - - -class TestTransformListEdgeCases: - """Test _transform_list method edge cases.""" - - def test_transform_empty_tuple(self) -> None: - """Transform empty tuple.""" - pattern = Pattern(elements=()) - - transformer = UppercaseIdentifierTransformer() - result = transformer.visit(pattern) - - assert isinstance(result, Pattern) - assert result.elements == () - - def test_transform_large_list(self) -> None: - """Transform large list of elements.""" - elements = tuple( - Placeable(expression=VariableReference(id=Identifier(name=f"var{i}"))) - for i in range(100) - ) - pattern = Pattern(elements=elements) - - transformer = UppercaseIdentifierTransformer() - result = transformer.visit(pattern) - - assert isinstance(result, Pattern) - assert len(result.elements) == 100 - # All identifiers should be uppercased - for i, elem in enumerate(result.elements): - assert elem.expression.id.name == f"VAR{i}".upper() # type: ignore[union-attr] - - -class 
TestTransformerPropertyBased: - """Property-based tests for Transformer.""" - - @given( - st.text( - min_size=1, - max_size=20, - alphabet=st.characters(min_codepoint=97, max_codepoint=122), - ) - ) - @settings(max_examples=50) - def test_identifier_transformation_is_idempotent(self, name: str) -> None: - """Transforming twice yields same result (idempotency).""" - identifier = Identifier(name=name) - transformer = UppercaseIdentifierTransformer() - - result1 = transformer.visit(identifier) - assert isinstance(result1, Identifier), f"Expected Identifier, got {type(result1)}" - result2 = transformer.visit(result1) - assert isinstance(result2, Identifier), f"Expected Identifier, got {type(result2)}" - - event("outcome=idempotent") - - # Uppercasing twice should give same result - assert result1.name == result2.name - assert result1.name == name.upper() - - @given( - st.lists( - st.text( - min_size=1, - max_size=10, - alphabet=st.characters(min_codepoint=97, max_codepoint=122), - ), - min_size=0, - max_size=20, - ) - ) - @settings(max_examples=30) - def test_transform_pattern_with_variable_count(self, names: list[str]) -> None: - """Transform pattern with arbitrary number of variables.""" - elements = tuple( - Placeable(expression=VariableReference(id=Identifier(name=name))) - for name in names - ) - pattern = Pattern(elements=elements) - - transformer = UppercaseIdentifierTransformer() - result = transformer.visit(pattern) - assert isinstance(result, Pattern), f"Expected Pattern, got {type(result)}" - - event(f"element_count={len(names)}") - - assert len(result.elements) == len(names) - for i, name in enumerate(names): - elem = result.elements[i] - assert isinstance(elem, Placeable), f"Expected Placeable, got {type(elem)}" - assert isinstance(elem.expression, VariableReference), ( - f"Expected VariableReference, got {type(elem.expression)}" - ) - assert elem.expression.id.name == name.upper() - - -# 
============================================================================ -# ERROR CASES AND DEFENSIVE BRANCHES (from test_visitor_error_cases.py) -# ============================================================================ - - - - -class NoneReturningTransformer(ASTTransformer): - """Transformer that incorrectly returns None for required scalar fields.""" - - def __init__(self, target_node_type: str) -> None: - """Initialize with target node type to return None for. - - Args: - target_node_type: Node type to return None for (e.g., "Identifier") - """ - super().__init__() - self.target_node_type = target_node_type - - def visit_Identifier(self, node: Identifier) -> Identifier | None: - """Return None for Identifier (invalid for required fields).""" - if self.target_node_type == "Identifier": - return None - return node - - -class ListReturningTransformer(ASTTransformer): - """Transformer that incorrectly returns list for scalar fields.""" - - def __init__(self, target_node_type: str) -> None: - """Initialize with target node type to return list for. 
- - Args: - target_node_type: Node type to return list for (e.g., "Identifier") - """ - super().__init__() - self.target_node_type = target_node_type - - def visit_Identifier(self, node: Identifier) -> Identifier | list[Identifier]: - """Return list of Identifiers (invalid for scalar fields).""" - if self.target_node_type == "Identifier": - return [node, Identifier(name="extra")] - return node - - def visit_Pattern(self, node: Pattern) -> Pattern | list[Pattern]: - """Return list of Patterns (invalid for scalar fields).""" - if self.target_node_type == "Pattern": - return [node, Pattern(elements=())] - return self.generic_visit(node) # type: ignore[return-value] - - -# ============================================================================ -# TESTS FOR _validate_scalar_result ERROR CASES -# ============================================================================ - - -class TestValidateScalarResultErrors: - """Test error cases in _validate_scalar_result (lines 318-331).""" - - def test_none_for_required_message_id_raises_typeerror(self) -> None: - """Returning None for Message.id raises TypeError (lines 318-323).""" - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="Hello"),)), - attributes=(), - ) - - transformer = NoneReturningTransformer("Identifier") - - with pytest.raises(TypeError) as exc_info: - transformer.visit(msg) - - assert "Cannot assign None to required scalar field 'Message.id'" in str( - exc_info.value - ) - assert "Required scalar fields must have a single ASTNode" in str( - exc_info.value - ) - - def test_none_for_required_term_value_raises_typeerror(self) -> None: - """Returning None for Term.value raises TypeError (lines 318-323).""" - - class NonePatternTransformer(ASTTransformer): - def visit_Pattern(self, _node: Pattern) -> None: - """Return None for Pattern (invalid for Term.value).""" - return - - term = Term( - id=Identifier(name="brand"), - 
value=Pattern(elements=(TextElement(value="Firefox"),)), - attributes=(), - ) - - transformer = NonePatternTransformer() - - with pytest.raises(TypeError) as exc_info: - transformer.visit(term) - - assert "Cannot assign None to required scalar field 'Term.value'" in str( - exc_info.value - ) - - def test_list_for_scalar_message_id_raises_typeerror(self) -> None: - """Returning list for Message.id raises TypeError (lines 325-331).""" - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="Hello"),)), - attributes=(), - ) - - transformer = ListReturningTransformer("Identifier") - - with pytest.raises(TypeError) as exc_info: - transformer.visit(msg) - - error_msg = str(exc_info.value) - assert "Cannot assign list to scalar field 'Message.id'" in error_msg - assert "Scalar fields require a single ASTNode" in error_msg - assert "Got 2 nodes:" in error_msg - assert "['Identifier', 'Identifier']" in error_msg - - def test_list_for_scalar_term_value_raises_typeerror(self) -> None: - """Returning list for Term.value raises TypeError (lines 325-331).""" - term = Term( - id=Identifier(name="brand"), - value=Pattern(elements=(TextElement(value="Firefox"),)), - attributes=(), - ) - - transformer = ListReturningTransformer("Pattern") - - with pytest.raises(TypeError) as exc_info: - transformer.visit(term) - - error_msg = str(exc_info.value) - assert "Cannot assign list to scalar field 'Term.value'" in error_msg - assert "Got 2 nodes:" in error_msg - assert "['Pattern', 'Pattern']" in error_msg - - def test_list_for_scalar_placeable_expression_raises_typeerror(self) -> None: - """Returning list for Placeable.expression raises TypeError (lines 325-331).""" - - class ListVariableRefTransformer(ASTTransformer): - def visit_VariableReference( - self, node: VariableReference - ) -> list[VariableReference]: - """Return list of VariableReferences.""" - return [node, VariableReference(id=Identifier(name="extra"))] - - placeable = Placeable( - 
expression=VariableReference(id=Identifier(name="count")) - ) - - transformer = ListVariableRefTransformer() - - with pytest.raises(TypeError) as exc_info: - transformer.visit(placeable) - - error_msg = str(exc_info.value) - assert ( - "Cannot assign list to scalar field 'Placeable.expression'" in error_msg - ) - assert "['VariableReference', 'VariableReference']" in error_msg - - -# ============================================================================ -# TESTS FOR _validate_optional_scalar_result ERROR CASES -# ============================================================================ - - -class TestValidateOptionalScalarResultErrors: - """Test error cases in _validate_optional_scalar_result (lines 360-366).""" - - def test_list_for_optional_message_value_raises_typeerror(self) -> None: - """Returning list for Message.value (optional) raises TypeError (lines 360-366).""" - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="Hello"),)), - attributes=(), - ) - - transformer = ListReturningTransformer("Pattern") - - with pytest.raises(TypeError) as exc_info: - transformer.visit(msg) - - error_msg = str(exc_info.value) - assert ( - "Cannot assign list to optional scalar field 'Message.value'" in error_msg - ) - assert "Scalar fields require a single ASTNode or None" in error_msg - assert "Got 2 nodes:" in error_msg - - def test_list_for_optional_message_reference_attribute_raises_typeerror( - self, - ) -> None: - """Returning list for MessageReference.attribute raises TypeError (lines 360-366).""" - msg_ref = MessageReference( - id=Identifier(name="button"), attribute=Identifier(name="tooltip") - ) - - transformer = ListReturningTransformer("Identifier") - - # The error will occur when visiting the attribute field - with pytest.raises(TypeError) as exc_info: - transformer.visit(msg_ref) - - error_msg = str(exc_info.value) - # Could be Message.id or MessageReference.attribute depending on traversal order - assert "Cannot 
assign list to" in error_msg - assert "scalar field" in error_msg - - -# ============================================================================ -# TESTS FOR GENERIC_VISIT BRANCH COVERAGE -# ============================================================================ - - -class TestGenericVisitBranchCoverage: - """Test branch coverage in generic_visit (lines 214, 217).""" - - def test_generic_visit_skips_none_values(self) -> None: - """Generic visit skips None field values (branch coverage for line 207).""" - # Message with value=None but with attribute (valid per spec), and comment=None - msg = Message( - id=Identifier(name="test"), - value=None, - attributes=( - Attribute( - id=Identifier(name="attr"), - value=Pattern(elements=(TextElement(value="val"),)), - ), - ), - comment=None, - ) - - visitor = ASTVisitor() - result = visitor.generic_visit(msg) - - # Should complete without error (None values are skipped) - assert result is msg - - def test_generic_visit_skips_string_fields(self) -> None: - """Generic visit skips string fields (branch coverage for line 207).""" - # TextElement has a string 'value' field - text = TextElement(value="Hello, World!") - - visitor = ASTVisitor() - result = visitor.generic_visit(text) - - # Should complete without error (string fields are skipped) - assert result is text - - def test_generic_visit_skips_int_fields(self) -> None: - """Generic visit skips int fields (branch coverage for line 207).""" - # Create a node with int field (custom test node) - # Since AST doesn't have many int fields directly, use a workaround - # Actually, Identifier just has 'name' (str), so let's use a different approach - - # The coverage here is about ensuring we skip non-ASTNode fields - # Let's verify by checking the behavior is correct - ident = Identifier(name="test") - - visitor = ASTVisitor() - result = visitor.generic_visit(ident) - - assert result is ident - - def test_generic_visit_tuple_with_non_astnode_items(self) -> None: - """Generic 
visit skips tuple items without __dataclass_fields__ (line 214 branch). - - This tests the negative branch of: - if hasattr(item, "__dataclass_fields__"): - """ - - class TupleFieldVisitor(ASTVisitor): - """Visitor that tracks tuple processing.""" - - def __init__(self) -> None: - """Initialize visitor.""" - super().__init__() - self.visited_types: list[str] = [] - - def visit(self, node): - """Track visited node types.""" - self.visited_types.append(type(node).__name__) - return super().visit(node) - - # Pattern has elements tuple, which normally contains ASTNodes - # We'll create a normal pattern and verify tuple processing - pattern = Pattern( - elements=( - TextElement(value="Hello"), - TextElement(value="World"), - ) - ) - - visitor = TupleFieldVisitor() - visitor.generic_visit(pattern) - - # Should have visited the TextElements in the tuple - assert "TextElement" in visitor.visited_types - - def test_generic_visit_non_tuple_non_astnode_field(self) -> None: - """Generic visit handles non-tuple, non-ASTNode single fields (line 217 branch). 
- - This tests the negative branch of: - elif hasattr(value, "__dataclass_fields__"): - """ - # All our AST nodes have either ASTNode children or primitive fields - # The negative branch is when a field is a primitive (str, int, bool) - - # Let's create a scenario with a field that's not an ASTNode - # Actually, this is already covered by string/int tests above - - # The key is to ensure we don't crash on non-ASTNode single values - msg = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="Test"),)), - attributes=(), - ) - - visitor = ASTVisitor() - result = visitor.generic_visit(msg) - - assert result is msg - - -# ============================================================================ -# TESTS FOR _TRANSFORM_LIST EDGE CASES -# ============================================================================ - - -class TestTransformListNodeManagement: - """Test edge cases in _transform_list (line 552 and match branches).""" - - def test_transform_list_with_none_removal(self) -> None: - """_transform_list handles None results (node removal).""" - - class RemoveFirstElementTransformer(ASTTransformer): - """Remove first element from pattern.""" - - def __init__(self) -> None: - """Initialize transformer.""" - super().__init__() - self.first_text_seen = False - - def visit_TextElement(self, node: TextElement) -> TextElement | None: - """Remove first text element.""" - if not self.first_text_seen: - self.first_text_seen = True - return None - return node - - pattern = Pattern( - elements=( - TextElement(value="First"), - TextElement(value="Second"), - TextElement(value="Third"), - ) - ) - - transformer = RemoveFirstElementTransformer() - result = transformer.visit(pattern) - - assert isinstance(result, Pattern) - assert len(result.elements) == 2 - assert result.elements[0].value == "Second" # type: ignore[union-attr] - assert result.elements[1].value == "Third" # type: ignore[union-attr] - - def test_transform_list_with_expansion(self) 
-> None: - """_transform_list handles list results (node expansion).""" - - class DuplicateTextElementTransformer(ASTTransformer): - """Duplicate text elements.""" - - def visit_TextElement(self, node: TextElement) -> list[TextElement]: - """Duplicate each text element.""" - return [node, TextElement(value=f"{node.value}_copy")] - - pattern = Pattern( - elements=( - TextElement(value="Hello"), - TextElement(value="World"), - ) - ) - - transformer = DuplicateTextElementTransformer() - result = transformer.visit(pattern) - - assert isinstance(result, Pattern) - assert len(result.elements) == 4 - assert result.elements[0].value == "Hello" # type: ignore[union-attr] - assert result.elements[1].value == "Hello_copy" # type: ignore[union-attr] - assert result.elements[2].value == "World" # type: ignore[union-attr] - assert result.elements[3].value == "World_copy" # type: ignore[union-attr] - - def test_transform_list_with_single_replacement(self) -> None: - """_transform_list handles single ASTNode results (replacement, line 552).""" - - class UppercaseTextTransformer(ASTTransformer): - """Uppercase text elements.""" - - def visit_TextElement(self, node: TextElement) -> TextElement: - """Uppercase text.""" - return TextElement(value=node.value.upper()) - - pattern = Pattern( - elements=( - TextElement(value="hello"), - TextElement(value="world"), - ) - ) - - transformer = UppercaseTextTransformer() - result = transformer.visit(pattern) - - assert isinstance(result, Pattern) - assert len(result.elements) == 2 - assert result.elements[0].value == "HELLO" # type: ignore[union-attr] - assert result.elements[1].value == "WORLD" # type: ignore[union-attr] - - def test_transform_list_mixed_operations(self) -> None: - """_transform_list handles mix of None, list, and single node returns.""" - - class MixedTransformer(ASTTransformer): - """Transform with mixed return types.""" - - def __init__(self) -> None: - """Initialize transformer.""" - super().__init__() - 
self.element_count = 0 - - def visit_TextElement( - self, node: TextElement - ) -> TextElement | None | list[TextElement]: - """Return different types based on position.""" - self.element_count += 1 - - match self.element_count: - case 1: - # Remove first element - return None - case 2: - # Expand second element - return [ - TextElement(value=f"{node.value}_a"), - TextElement(value=f"{node.value}_b"), - ] - case _: - # Keep remaining elements (single node) - return node - - pattern = Pattern( - elements=( - TextElement(value="first"), - TextElement(value="second"), - TextElement(value="third"), - ) - ) - - transformer = MixedTransformer() - result = transformer.visit(pattern) - - assert isinstance(result, Pattern) - # First removed, second expanded to 2, third kept = 3 elements - assert len(result.elements) == 3 - assert result.elements[0].value == "second_a" # type: ignore[union-attr] - assert result.elements[1].value == "second_b" # type: ignore[union-attr] - assert result.elements[2].value == "third" # type: ignore[union-attr] - - -# ============================================================================ -# ADDITIONAL COVERAGE TESTS -# ============================================================================ - - -class TestAdditionalCoverage: - """Additional tests to ensure complete coverage.""" - - def test_validate_scalar_result_all_field_types(self) -> None: - """Test _validate_scalar_result for various required scalar fields.""" - - class AlwaysNoneTransformer(ASTTransformer): - def visit_Identifier(self, _node: Identifier) -> None: - """Always return None.""" - return - - # Test various nodes with required scalar Identifier fields - test_cases: list[tuple[str, VariableReference | Attribute]] = [ - ( - "VariableReference.id", - VariableReference(id=Identifier(name="test")), - ), - ( - "Attribute.id", - Attribute( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="val"),)), - ), - ), - ] - - transformer = 
AlwaysNoneTransformer() - - for _field_name, node in test_cases: - with pytest.raises(TypeError) as exc_info: - transformer.visit(node) - - # Should raise error mentioning the field cannot be None - assert "Cannot assign None to required scalar field" in str(exc_info.value) - - -# ============================================================================ -# TRANSFORM LIST TYPE VALIDATION (from test_transformer_type_validation.py) -# ============================================================================ - - -def _make_resource(*messages: Message) -> Resource: - """Create a Resource with the given messages.""" - return Resource(entries=messages) - - -def _make_simple_message(name: str, text: str) -> Message: - """Create a simple message with a text pattern.""" - return Message( - id=Identifier(name=name, span=None), - value=Pattern(elements=(TextElement(value=text),)), - attributes=(), - comment=None, - span=None, - ) - - -class TestTransformListTypeValidation: - """_transform_list rejects wrong-typed nodes.""" - - def test_message_in_pattern_elements_rejected(self) -> None: - """Message node in Pattern.elements raises TypeError. - - Pattern.elements expects TextElement | Placeable. Producing a Message - violates the field type constraint. - """ - - class BadTransformer(ASTTransformer): - def visit_TextElement(self, node: TextElement) -> Message: - return _make_simple_message("wrong", "bad") - - resource = _make_resource(_make_simple_message("msg", "hello")) - transformer = BadTransformer() - - with pytest.raises(TypeError, match=r"Pattern\.elements.*TextElement \| Placeable"): - transformer.transform(resource) - - def test_text_element_in_resource_entries_rejected(self) -> None: - """TextElement in Resource.entries raises TypeError. - - Resource.entries expects Message | Term | Comment | Junk. 
- """ - - class BadTransformer(ASTTransformer): - def visit_Message(self, node: Message) -> TextElement: - return TextElement(value="not a message") - - resource = _make_resource(_make_simple_message("msg", "hello")) - transformer = BadTransformer() - - with pytest.raises(TypeError, match=r"Resource\.entries.*Message \| Term"): - transformer.transform(resource) - - def test_message_in_call_arguments_named_rejected(self) -> None: - """Message in CallArguments.named raises TypeError. - - CallArguments.named expects NamedArgument only. - """ - - class BadTransformer(ASTTransformer): - def visit_NamedArgument(self, node: NamedArgument) -> Message: - return _make_simple_message("wrong", "bad") - - func_ref = FunctionReference( - id=Identifier(name="NUMBER", span=None), - arguments=CallArguments( - positional=(VariableReference(id=Identifier(name="x", span=None), span=None),), - named=( - NamedArgument( - name=Identifier(name="style", span=None), - value=StringLiteral(value="decimal", span=None), - span=None, - ), - ), - ), - span=None, - ) - msg = Message( - id=Identifier(name="msg", span=None), - value=Pattern( - elements=(Placeable(expression=func_ref),), - ), - attributes=(), - comment=None, - span=None, - ) - resource = _make_resource(msg) - transformer = BadTransformer() - - with pytest.raises(TypeError, match=r"CallArguments\.named.*NamedArgument"): - transformer.transform(resource) - - -class TestTransformListTypeValidationExpand: - """_transform_list validates types in expanded lists.""" - - def test_expanded_list_with_wrong_type_rejected(self) -> None: - """List expansion with wrong type raises TypeError. - - When visit_* returns a list, each element must match expected types. 
- """ - - class ExpandBadTransformer(ASTTransformer): - def visit_TextElement( - self, node: TextElement - ) -> list[Message]: - return [_make_simple_message("wrong", "bad")] - - resource = _make_resource(_make_simple_message("msg", "hello")) - transformer = ExpandBadTransformer() - - with pytest.raises(TypeError, match=r"Pattern\.elements"): - transformer.transform(resource) - - -class TestTransformListTypeValidationValid: - """Valid transformations pass type validation.""" - - def test_identity_transform_succeeds(self) -> None: - """Identity transformer (no changes) passes validation.""" - resource = _make_resource( - _make_simple_message("msg1", "hello"), - _make_simple_message("msg2", "world"), - ) - transformer = ASTTransformer() - result = transformer.transform(resource) - assert isinstance(result, Resource) - - def test_correct_type_replacement_succeeds(self) -> None: - """Replacing TextElement with another TextElement passes validation.""" - - class UpperTransformer(ASTTransformer): - def visit_TextElement(self, node: TextElement) -> TextElement: - return TextElement(value=node.value.upper()) - - resource = _make_resource(_make_simple_message("msg", "hello")) - transformer = UpperTransformer() - result = transformer.transform(resource) - assert isinstance(result, Resource) - elements = result.entries[0].value.elements # type: ignore[union-attr] - assert elements[0].value == "HELLO" # type: ignore[union-attr] - - def test_none_removal_succeeds(self) -> None: - """Removing elements via None passes validation (no type check needed).""" - - class RemoveTransformer(ASTTransformer): - def visit_Message(self, node: Message) -> None: - return None - - resource = _make_resource( - _make_simple_message("msg1", "hello"), - _make_simple_message("msg2", "world"), - ) - transformer = RemoveTransformer() - result = transformer.transform(resource) - assert isinstance(result, Resource) - assert len(result.entries) == 0 - - def test_correct_expansion_succeeds(self) -> None: 
- """Expanding one Message into two Messages passes validation.""" - - class DuplicateTransformer(ASTTransformer): - def visit_Message(self, node: Message) -> list[Message]: - copy = Message( - id=Identifier(name=node.id.name + "_copy", span=None), - value=node.value, - attributes=(), - comment=None, - span=None, - ) - return [node, copy] - - resource = _make_resource(_make_simple_message("msg", "hello")) - transformer = DuplicateTransformer() - result = transformer.transform(resource) - assert isinstance(result, Resource) - assert len(result.entries) == 2 - entry0 = result.entries[0] - entry1 = result.entries[1] - assert isinstance(entry0, Message) - assert isinstance(entry1, Message) - assert entry0.id.name == "msg" - assert entry1.id.name == "msg_copy" - - -# ============================================================================ -# SCALAR FIELD VALIDATION (from test_transformer_validation.py) -# ============================================================================ - - -class TestASTTransformerValidation: - """Tests for ASTTransformer scalar field validation.""" - - def test_scalar_field_accepts_single_node(self) -> None: - """Scalar field accepts single ASTNode return value.""" - class RenameIdentifierTransformer(ASTTransformer): - def visit_Identifier(self, node: Identifier) -> Identifier: - return Identifier(name="renamed") - - message = Message( - id=Identifier(name="hello"), - value=Pattern(elements=(TextElement(value="World"),)), - attributes=(), - ) - - transformer = RenameIdentifierTransformer() - transformed = transformer.transform(message) - - # Transformation should succeed - assert isinstance(transformed, Message) - assert transformed.id.name == "renamed" - - def test_scalar_field_rejects_none(self) -> None: - """Scalar field assignment rejects None return value.""" - class RemoveIdentifierTransformer(ASTTransformer): - def visit_Identifier(self, node: Identifier) -> None: - return None # Invalid: scalar field requires node - - message = 
Message( - id=Identifier(name="hello"), - value=Pattern(elements=(TextElement(value="World"),)), - attributes=(), - ) - - transformer = RemoveIdentifierTransformer() - - with pytest.raises(TypeError) as exc_info: - transformer.transform(message) - - error_msg = str(exc_info.value) - assert "Cannot assign None to required scalar field" in error_msg - assert "Message.id" in error_msg - assert "Required scalar fields must have a single ASTNode" in error_msg - - def test_scalar_field_rejects_list(self) -> None: - """Scalar field assignment rejects list[ASTNode] return value.""" - class ExpandIdentifierTransformer(ASTTransformer): - def visit_Identifier(self, node: Identifier) -> list[Identifier]: - return [ # Invalid: scalar field requires single node - Identifier(name="id1"), - Identifier(name="id2"), - ] - - message = Message( - id=Identifier(name="hello"), - value=Pattern(elements=(TextElement(value="World"),)), - attributes=(), - ) - - transformer = ExpandIdentifierTransformer() - - with pytest.raises(TypeError) as exc_info: - transformer.transform(message) - - error_msg = str(exc_info.value) - assert "Cannot assign list to scalar field" in error_msg - assert "Message.id" in error_msg - assert "Got 2 nodes" in error_msg - - def test_collection_field_accepts_list(self) -> None: - """Collection field accepts list[ASTNode] return value via _transform_list.""" - class ExpandTextElementTransformer(ASTTransformer): - def visit_TextElement(self, node: TextElement) -> list[TextElement]: - # Valid: Pattern.elements is a collection field - return [ - TextElement(value="Hello"), - TextElement(value=" "), - TextElement(value="World"), - ] - - pattern = Pattern(elements=(TextElement(value="HelloWorld"),)) - - transformer = ExpandTextElementTransformer() - transformed = transformer.transform(pattern) - - # Transformation should succeed - assert isinstance(transformed, Pattern) - assert len(transformed.elements) == 3 - first_element = transformed.elements[0] - assert 
isinstance(first_element, TextElement) - assert first_element.value == "Hello" - - def test_optional_scalar_field_accepts_none_when_original_is_none(self) -> None: - """Optional scalar fields (e.g., Message.value) accept None when original has attributes.""" - from ftllexengine.syntax.ast import Attribute # noqa: PLC0415 - import inside function - - # Message without value but with attribute (valid per spec) - message = Message( - id=Identifier(name="empty"), - value=None, # Optional field - attributes=( - Attribute( - id=Identifier(name="attr"), - value=Pattern(elements=(TextElement(value="val"),)), - ), - ), - ) - - class NoOpTransformer(ASTTransformer): - pass - - transformer = NoOpTransformer() - transformed = transformer.transform(message) - - # Transformation should succeed - assert isinstance(transformed, Message) - assert transformed.value is None - - def test_optional_scalar_field_accepts_none_when_transformer_removes(self) -> None: - """Optional scalar fields accept None return value to remove existing value.""" - from ftllexengine.enums import CommentType # noqa: PLC0415 - import inside function - from ftllexengine.syntax.ast import Comment # noqa: PLC0415 - import inside function - - # Message with comment (optional field) - message = Message( - id=Identifier(name="hello"), - value=Pattern(elements=(TextElement(value="World"),)), - attributes=(), - comment=Comment(content="A comment", type=CommentType.COMMENT), - ) - - class RemoveCommentTransformer(ASTTransformer): - def visit_Comment(self, node: Comment) -> None: - return None # Valid: removes optional comment field - - transformer = RemoveCommentTransformer() - transformed = transformer.transform(message) - - # Transformation should succeed with comment removed - assert isinstance(transformed, Message) - assert transformed.comment is None - assert transformed.id.name == "hello" - - def test_placeable_expression_validation(self) -> None: - """Placeable.expression validates scalar field assignment.""" 
- class RemoveExpressionTransformer(ASTTransformer): - def visit_VariableReference(self, node: VariableReference) -> None: - return None # Invalid: Placeable.expression requires node - - placeable = Placeable(expression=VariableReference(id=Identifier(name="var"))) - - transformer = RemoveExpressionTransformer() - - with pytest.raises(TypeError) as exc_info: - transformer.transform(placeable) - - error_msg = str(exc_info.value) - assert "Cannot assign None to required scalar field" in error_msg - assert "Placeable.expression" in error_msg - - def test_error_message_shows_node_types_for_list(self) -> None: - """Error message for list assignment shows node types.""" - class MultipleIdentifiersTransformer(ASTTransformer): - def visit_Identifier(self, node: Identifier) -> list[Identifier]: - return [ - Identifier(name="a"), - Identifier(name="b"), - Identifier(name="c"), - ] - - message = Message( - id=Identifier(name="test"), - value=Pattern(elements=(TextElement(value="Test"),)), - attributes=(), - ) - - transformer = MultipleIdentifiersTransformer() - - with pytest.raises(TypeError) as exc_info: - transformer.transform(message) - - error_msg = str(exc_info.value) - assert "Got 3 nodes" in error_msg - assert "['Identifier', 'Identifier', 'Identifier']" in error_msg - - def test_nested_transformation_validates_all_levels(self) -> None: - """Validation applies recursively at all nesting levels.""" - class RemoveNestedIdentifierTransformer(ASTTransformer): - def visit_Identifier(self, node: Identifier) -> Identifier | None: - if node.name == "var": - return None # Invalid for scalar field - return node - - # Nested structure: Message -> Pattern -> Placeable -> VariableReference -> Identifier - message = Message( - id=Identifier(name="msg"), - value=Pattern( - elements=( - Placeable(expression=VariableReference(id=Identifier(name="var"))), - ) - ), - attributes=(), - ) - - transformer = RemoveNestedIdentifierTransformer() - - with pytest.raises(TypeError) as exc_info: - 
transformer.transform(message) - - # Error should be raised when trying to assign None to VariableReference.id - error_msg = str(exc_info.value) - assert "Cannot assign None to required scalar field" in error_msg - assert "VariableReference.id" in error_msg - - def test_validation_with_generic_visit(self) -> None: - """Validation works with default generic_visit (no custom visit methods).""" - class BreakScalarFieldTransformer(ASTTransformer): - def visit_Identifier(self, node: Identifier) -> None: - return None - - # Use a complex node to test generic_visit path - from ftllexengine.syntax.ast import ( # noqa: PLC0415 - import inside function - NumberLiteral, - SelectExpression, - Variant, - ) - - select_expr = SelectExpression( - selector=VariableReference(id=Identifier(name="count")), - variants=( - Variant( - key=NumberLiteral(value=1, raw="1"), - value=Pattern(elements=(TextElement(value="one"),)), - default=True, - ), - ), - ) - - transformer = BreakScalarFieldTransformer() - - with pytest.raises(TypeError) as exc_info: - transformer.transform(select_expr) - - # Should fail on SelectExpression.selector -> VariableReference.id - error_msg = str(exc_info.value) - assert "Cannot assign None to required scalar field" in error_msg +"""Aggregated syntax visitor transformer test surface.""" + +from tests.syntax_visitor_transformer_cases.additional_coverage_tests import * # noqa: F403 - re-export split test surface +from tests.syntax_visitor_transformer_cases.core import * # noqa: F403 - re-export split test surface +from tests.syntax_visitor_transformer_cases.error_cases_and_defensive_branches_from_test_visitor_error_cases_py import * # noqa: F403 - re-export split test surface +from tests.syntax_visitor_transformer_cases.scalar_field_validation_from_test_transformer_validation_py import * # noqa: F403 - re-export split test surface +from tests.syntax_visitor_transformer_cases.tests_for_generic_visit_branch_coverage import * # noqa: F403 - re-export split test 
surface +from tests.syntax_visitor_transformer_cases.tests_for_transform_list_edge_cases import * # noqa: F403 - re-export split test surface +from tests.syntax_visitor_transformer_cases.tests_for_validate_optional_scalar_result_error_cases import * # noqa: F403 - re-export split test surface +from tests.syntax_visitor_transformer_cases.tests_for_validate_scalar_result_error_cases import * # noqa: F403 - re-export split test surface +from tests.syntax_visitor_transformer_cases.transform_list_type_validation_from_test_transformer_type_validation_py import * # noqa: F403 - re-export split test surface diff --git a/tests/test_validation_resource.py b/tests/test_validation_resource.py index 1993ee8e..22a01adc 100644 --- a/tests/test_validation_resource.py +++ b/tests/test_validation_resource.py @@ -1,1417 +1,11 @@ -"""Tests for validation.resource: validate_resource(), graph algorithms, edge cases.""" - -from __future__ import annotations - -from unittest.mock import patch - -from hypothesis import event, given -from hypothesis import strategies as st - -from ftllexengine.diagnostics import DiagnosticCode -from ftllexengine.syntax import ( - Identifier, - Junk, - Message, - MessageReference, - Pattern, - Placeable, - Term, - TermReference, - TextElement, -) -from ftllexengine.syntax.cursor import LineOffsetCache -from ftllexengine.validation.resource import ( - _detect_circular_references, - _extract_syntax_errors, - validate_resource, -) -from ftllexengine.validation.resource_graph import ( - _compute_longest_paths, - build_dependency_graph, -) - - -class TestSyntaxErrorExtraction: - """Test extraction of syntax errors from Junk entries.""" - - def test_single_junk_entry_creates_validation_error(self) -> None: - """Test that Junk entry is converted to ValidationError.""" - ftl = "invalid junk entry" - result = validate_resource(ftl) - - # Should have syntax error - assert len(result.errors) > 0 - assert any("parse" in err.code.name.lower() for err in result.errors) - 
- def test_multiple_junk_entries_create_multiple_errors(self) -> None: - """Test multiple Junk entries create multiple errors.""" - ftl = """ -invalid entry 1 -also bad = { broken -another junk line -""" - result = validate_resource(ftl) - # Should have multiple errors (exact count depends on parser) - assert len(result.errors) >= 1 - - def test_junk_with_span_information(self) -> None: - """Test that Junk errors include position information.""" - ftl = "msg = { broken syntax }" - result = validate_resource(ftl) - - # Should have error with position info - if len(result.errors) > 0: - error = result.errors[0] - # Line/column may be set - assert error.code is not None - - @given( - st.text(min_size=1, max_size=50).filter( - lambda s: "=" not in s and "{" not in s and "}" not in s - ) - ) - def test_invalid_syntax_property(self, invalid_text: str) -> None: - """PROPERTY: Invalid FTL syntax produces validation errors. - - Events emitted: - - has_errors={bool}: Whether validation produced errors - - has_whitespace={bool}: Whether input contains whitespace - """ - - result = validate_resource(invalid_text) - - # Emit events for semantic coverage - event(f"has_errors={len(result.errors) > 0}") - event(f"has_whitespace={any(c.isspace() for c in invalid_text)}") - - # Either parses as comment/junk or has errors - assert result is not None - - -class TestDuplicateIdDetection: - """Test duplicate message and term ID detection.""" - - def test_duplicate_message_ids_produce_warning(self) -> None: - """Test duplicate message IDs create warnings.""" - ftl = """ -msg = First value -msg = Second value -""" - result = validate_resource(ftl) - - # Should have warning about duplicate - assert len(result.warnings) > 0 - assert any( - "duplicate" in warn.message.lower() and "msg" in warn.message.lower() - for warn in result.warnings - ) - - def test_duplicate_term_ids_produce_warning(self) -> None: - """Test duplicate term IDs create warnings.""" - ftl = """ --term = First value --term 
= Second value -""" - result = validate_resource(ftl) - - # Should have warning - assert len(result.warnings) > 0 - assert any("duplicate" in warn.message.lower() for warn in result.warnings) - - def test_no_duplicate_warning_for_unique_ids(self) -> None: - """Test no duplicate warnings when IDs are unique.""" - ftl = """ -msg1 = First -msg2 = Second --term1 = Term one --term2 = Term two -""" - result = validate_resource(ftl) - - # Should not have duplicate warnings - duplicate_warnings = [ - w for w in result.warnings if "duplicate" in w.message.lower() - ] - assert len(duplicate_warnings) == 0 - - @given( - st.lists( - st.from_regex(r"[a-z]+", fullmatch=True), - min_size=2, - max_size=5, - ) - ) - def test_multiple_duplicates_property(self, ids: list[str]) -> None: - """PROPERTY: Multiple duplicate IDs all produce warnings. - - Events emitted: - - duplicate_count={n}: Number of duplicate entries (len - 1) - """ - - # Create FTL with all same ID - ftl_lines = [f"{ids[0]} = Value {i}" for i in range(len(ids))] - ftl = "\n".join(ftl_lines) - - # Emit event for duplicate count - event(f"duplicate_count={len(ids) - 1}") - - result = validate_resource(ftl) - # Should have warnings (at least len(ids) - 1 duplicates) - if len(ids) > 1: - assert len(result.warnings) >= 1 - - -class TestMessageWithoutValue: - """Test validation of messages without values (only attributes).""" - - def test_message_with_only_attributes_produces_warning(self) -> None: - """Test message with no value but attributes gets warning.""" - ftl = """ -msg = - .attr1 = Value 1 - .attr2 = Value 2 -""" - result = validate_resource(ftl) - - # Per FTL spec, message can have only attributes (valid) - # But implementation may warn about this pattern - # Check it doesn't crash - assert result is not None - - def test_message_with_value_and_attributes_no_warning(self) -> None: - """Test message with both value and attributes is valid.""" - ftl = """ -msg = Value - .attr = Attribute -""" - result = 
validate_resource(ftl) - - # Should be valid - no warnings about structure - assert result is not None - assert result.is_valid - - -class TestUndefinedReferenceDetection: - """Test detection of undefined message and term references.""" - - def test_undefined_message_reference_produces_warning(self) -> None: - """Test reference to undefined message produces warning.""" - ftl = """ -msg = { other } -""" - result = validate_resource(ftl) - - # Should warn about undefined reference - assert len(result.warnings) > 0 - assert any( - "undefined" in warn.message.lower() or "reference" in warn.message.lower() - for warn in result.warnings - ) - - def test_undefined_term_reference_produces_warning(self) -> None: - """Test reference to undefined term produces warning.""" - ftl = """ -msg = { -undefined } -""" - result = validate_resource(ftl) - - # Should warn about undefined term - assert len(result.warnings) > 0 - assert any("undefined" in warn.message.lower() for warn in result.warnings) - - def test_defined_message_reference_no_warning(self) -> None: - """Test reference to defined message produces no warning.""" - ftl = """ -other = Other message -msg = { other } -""" - result = validate_resource(ftl) - - # Should not warn about this reference - undefined_warnings = [ - w for w in result.warnings if "undefined" in w.message.lower() - ] - assert len(undefined_warnings) == 0 - - def test_defined_term_reference_no_warning(self) -> None: - """Test reference to defined term produces no warning.""" - ftl = """ --brand = Firefox -msg = { -brand } -""" - result = validate_resource(ftl) - - undefined_warnings = [ - w for w in result.warnings if "undefined" in w.message.lower() - ] - assert len(undefined_warnings) == 0 - - def test_term_referencing_undefined_message(self) -> None: - """Test term that references undefined message.""" - ftl = """ --term = { undefined } -""" - result = validate_resource(ftl) - - # Should warn - assert any("undefined" in w.message.lower() for w in 
result.warnings) - - def test_term_referencing_undefined_term(self) -> None: - """Test term that references undefined term.""" - ftl = """ --term1 = { -term2 } -""" - result = validate_resource(ftl) - - # Should warn - assert any("undefined" in w.message.lower() for w in result.warnings) - - -class TestCircularReferenceDetection: - """Test detection of circular dependencies.""" - - def test_direct_message_self_reference(self) -> None: - """Test message referencing itself.""" - ftl = """ -msg = { msg } -""" - result = validate_resource(ftl) - - # Should detect cycle - assert any("circular" in w.message.lower() for w in result.warnings) - - def test_indirect_message_cycle(self) -> None: - """Test indirect message cycle (A -> B -> A).""" - ftl = """ -a = { b } -b = { a } -""" - result = validate_resource(ftl) - - # Should detect cycle - assert any("circular" in w.message.lower() for w in result.warnings) - - def test_three_way_message_cycle(self) -> None: - """Test three-way message cycle (A -> B -> C -> A).""" - ftl = """ -a = { b } -b = { c } -c = { a } -""" - result = validate_resource(ftl) - - # Should detect cycle - assert any("circular" in w.message.lower() for w in result.warnings) - - def test_direct_term_self_reference(self) -> None: - """Test term referencing itself.""" - ftl = """ --term = { -term } -""" - result = validate_resource(ftl) - - # Should detect cycle - assert any("circular" in w.message.lower() for w in result.warnings) - - def test_indirect_term_cycle(self) -> None: - """Test indirect term cycle.""" - ftl = """ --a = { -b } --b = { -a } -""" - result = validate_resource(ftl) - - # Should detect cycle - assert any("circular" in w.message.lower() for w in result.warnings) - - def test_no_cycle_in_tree_structure(self) -> None: - """Test tree structure (no cycles) produces no warnings.""" - ftl = """ -base = Base -a = { base } -b = { base } -c = { a } -""" - result = validate_resource(ftl) - - # Should not warn about cycles - circular_warnings = [ 
- w for w in result.warnings if "circular" in w.message.lower() - ] - assert len(circular_warnings) == 0 - - -class TestValidationResultStructure: - """Test ValidationResult structure and properties.""" - - def test_valid_ftl_has_no_errors(self) -> None: - """Test valid FTL produces is_valid=True.""" - ftl = """ -msg = Hello --term = World -""" - result = validate_resource(ftl) - - assert result.is_valid - assert len(result.errors) == 0 - - def test_parse_error_sets_is_valid_false(self) -> None: - """Test parse errors set is_valid=False.""" - ftl = "invalid junk" - result = validate_resource(ftl) - - # Should have errors and be invalid - # (unless parser treats it as comment) - if len(result.errors) > 0: - assert not result.is_valid - - def test_warnings_dont_affect_is_valid(self) -> None: - """Test warnings don't set is_valid=False.""" - ftl = """ -msg = { undefined } -""" - result = validate_resource(ftl) - - # May have warnings but no errors - if len(result.errors) == 0: - assert result.is_valid - - def test_validation_result_has_all_fields(self) -> None: - """Test ValidationResult has all expected fields.""" - ftl = "msg = Test" - result = validate_resource(ftl) - - assert hasattr(result, "errors") - assert hasattr(result, "warnings") - assert hasattr(result, "annotations") - assert hasattr(result, "is_valid") - - assert isinstance(result.errors, tuple) - assert isinstance(result.warnings, tuple) - assert isinstance(result.annotations, tuple) - - -class TestCustomParserInstance: - """Test validate_resource with custom parser.""" - - def test_validate_with_custom_parser(self) -> None: - """Test validate_resource accepts custom parser.""" - from ftllexengine.syntax.parser import FluentParserV1 - - parser = FluentParserV1() - ftl = "msg = Test" - - result = validate_resource(ftl, parser=parser) - assert result is not None - assert result.is_valid - - def test_validate_creates_default_parser_if_none(self) -> None: - """Test validate_resource creates parser if not 
provided.""" - ftl = "msg = Test" - result = validate_resource(ftl) - - assert result is not None - - -class TestEdgeCases: - """Test edge cases and boundary conditions.""" - - def test_empty_resource(self) -> None: - """Test validation of empty resource.""" - ftl = "" - result = validate_resource(ftl) - - assert result is not None - # Empty resource is valid - assert result.is_valid - - def test_only_comments(self) -> None: - """Test resource with only comments.""" - ftl = """ -# Comment 1 -## Comment 2 -### Comment 3 -""" - result = validate_resource(ftl) - - assert result.is_valid - assert len(result.errors) == 0 - - def test_comments_and_valid_entries(self) -> None: - """Test mixed comments and entries.""" - ftl = """ -# Header comment -msg = Value - -## Section --term = Term value -""" - result = validate_resource(ftl) - - assert result.is_valid - - def test_whitespace_only(self) -> None: - """Test resource with only whitespace.""" - ftl = " \n\n \n" # Spaces only, no tabs (tabs can be invalid FTL) - result = validate_resource(ftl) - - # Should be valid or may have parse errors depending on tab handling - assert result is not None - - @given( - st.text( - alphabet=st.sampled_from(" \n\r"), # Only safe whitespace chars - min_size=0, - max_size=100, - ) - ) - def test_whitespace_property(self, whitespace: str) -> None: - """PROPERTY: Whitespace-only resources don't crash validation. 
- - Events emitted: - - length_category={bucket}: Length bucket (empty, short, medium, long) - - has_newlines={bool}: Whether input contains newlines - """ - - # Emit events for semantic coverage - length = len(whitespace) - if length == 0: - event("length_category=empty") - elif length < 10: - event("length_category=short") - elif length < 50: - event("length_category=medium") - else: - event("length_category=long") - - event(f"has_newlines={'\n' in whitespace}") - - result = validate_resource(whitespace) - # Should not crash (may have errors, but completes) - assert result is not None - - -class TestComplexScenarios: - """Test complex validation scenarios.""" - - def test_large_resource_with_multiple_issues(self) -> None: - """Test resource with multiple types of issues.""" - ftl = """ -# Valid comment -msg1 = Value - -# Duplicate ID -msg1 = Second value - -# Undefined reference -msg2 = { undefined } - -# Circular reference -a = { b } -b = { a } - -# Invalid syntax -invalid junk - -# Valid term --term = Term -""" - result = validate_resource(ftl) - - # Should collect all issues - assert len(result.errors) + len(result.warnings) > 0 - - def test_deeply_nested_references(self) -> None: - """Test chain of references without cycles.""" - ftl = """ -msg1 = Value -msg2 = { msg1 } -msg3 = { msg2 } -msg4 = { msg3 } -msg5 = { msg4 } -""" - result = validate_resource(ftl) - - # Should be valid (no cycles) - circular_warnings = [ - w for w in result.warnings if "circular" in w.message.lower() - ] - assert len(circular_warnings) == 0 - - def test_message_and_term_with_same_base_name(self) -> None: - """Test message and term can have same name (different namespaces).""" - ftl = """ -brand = Message --brand = Term -msg = { brand } and { -brand } -""" - result = validate_resource(ftl) - - # Should be valid - different namespaces - undefined_warnings = [ - w for w in result.warnings if "undefined" in w.message.lower() - ] - assert len(undefined_warnings) == 0 - - -class 
TestValidationIntegration: - """Integration tests combining multiple validation passes.""" - - def test_all_validation_passes_execute(self) -> None: - """Test all validation passes execute in sequence.""" - ftl = """ -# Syntax error -invalid - -# Duplicate -msg = First -msg = Second - -# Undefined reference -ref = { missing } - -# Circular reference -c1 = { c2 } -c2 = { c1 } -""" - result = validate_resource(ftl) - - # Should have collected issues from all passes - total_issues = len(result.errors) + len(result.warnings) - assert total_issues > 0 - - @given( - st.lists( - st.from_regex(r"[a-z]+", fullmatch=True), - min_size=1, - max_size=10, - unique=True, - ) - ) - def test_valid_messages_property(self, identifiers: list[str]) -> None: - """PROPERTY: Valid messages with unique IDs validate successfully. - - Events emitted: - - message_count={n}: Number of messages in resource - """ - - ftl_lines = [f"{id_} = Value for {id_}" for id_ in identifiers] - ftl = "\n".join(ftl_lines) - - # Emit event for message count - event(f"message_count={len(identifiers)}") - - result = validate_resource(ftl) - - # Should be valid - assert result.is_valid - assert len(result.errors) == 0 - - -# ============================================================================ -# LINE 113: Test Message Without Value or Attributes -# ============================================================================ - - -class TestMessageWithoutValueOrAttributes: - """Test validation of message with neither value nor attributes (line 113).""" - - def test_message_without_value_or_attributes_raises_at_construction(self) -> None: - """Message with neither value nor attributes raises ValueError at construction. - - The __post_init__ validation now enforces this invariant at construction - time rather than deferring to the validator. 
- """ - import pytest - - from ftllexengine.syntax.ast import Identifier, Message - - with pytest.raises(ValueError, match="must have a value or at least one attribute"): - Message( - id=Identifier("empty_msg"), - value=None, - attributes=(), - ) - - -# ============================================================================ -# BRANCH COVERAGE: Test Missing Branches -# ============================================================================ - - -class TestMissingBranchCoverage: - """Test missing branch coverage in resource.py.""" - - def test_junk_without_span_line_56(self) -> None: - """Test Junk entry without span (branch 56->60). - - Line 56: if entry.span - When span is None, line/column remain None. - """ - from ftllexengine.syntax.ast import Junk, Resource - from ftllexengine.validation.resource import _extract_syntax_errors - - # Create Junk with no span - junk_no_span = Junk(content="invalid", span=None) - resource = Resource(entries=(junk_no_span,)) - - # Extract errors with LineOffsetCache - errors = _extract_syntax_errors(resource, LineOffsetCache("source")) - - # Should have error with line=None, column=None - assert len(errors) == 1 - assert errors[0].line is None - assert errors[0].column is None - - def test_term_references_undefined_message_line_187(self) -> None: - """Test term referencing undefined message (branch 187->186). - - Line 187: if ref not in messages_dict - This tests the loop iteration when a term references a message. - Branch 187->186 is when the message DOES exist (if condition is False). 
- """ - from ftllexengine.syntax.ast import ( - Identifier, - Message, - MessageReference, - Pattern, - Placeable, - Term, - TextElement, - ) - from ftllexengine.validation.resource import _check_undefined_references - - # Create message that exists - existing_message = Message( - id=Identifier("existing_msg"), - value=Pattern(elements=(TextElement("text"),)), - attributes=(), - ) - - # Create term that references the existing message - term_with_msg_ref = Term( - id=Identifier("myterm"), - value=Pattern(elements=( - TextElement("text"), - Placeable( - expression=MessageReference(id=Identifier("existing_msg")) - ), # Reference to message that EXISTS - )), - attributes=(), - ) - - messages_dict = {"existing_msg": existing_message} # Message exists - terms_dict = {"myterm": term_with_msg_ref} - - # Check references with empty LineOffsetCache for AST-only testing - warnings = _check_undefined_references(messages_dict, terms_dict, LineOffsetCache("")) - - # Should have NO warnings (message exists) - # This tests branch 187->186 (if condition is False, continue to next iteration) - undefined_warnings = [w for w in warnings if "undefined" in w.message.lower()] - assert len(undefined_warnings) == 0 - - def test_duplicate_cycle_detection_line_243(self) -> None: - """Test cycle deduplication for messages. - - Verifies that the unified graph cycle detection produces exactly one - warning per unique cycle, not multiple warnings for the same cycle - detected from different starting points. - - Uses unified cross-type cycle detection. 
- """ - from ftllexengine.syntax.ast import ( - Identifier, - Message, - MessageReference, - Pattern, - Placeable, - ) - - # Create circular messages: a -> b -> a - msg_a = Message( - id=Identifier("a"), - value=Pattern( - elements=(Placeable(expression=MessageReference(id=Identifier("b"))),) - ), - attributes=(), - ) - msg_b = Message( - id=Identifier("b"), - value=Pattern( - elements=(Placeable(expression=MessageReference(id=Identifier("a"))),) - ), - attributes=(), - ) - - messages_dict = {"a": msg_a, "b": msg_b} - terms_dict: dict[str, Term] = {} - - # Build dependency graph - graph = build_dependency_graph(messages_dict, terms_dict) - # Call the real function without mocking - warnings = _detect_circular_references(graph) - - # Should only have 1 warning (cycle a -> b -> a is detected once) - circular_warnings = [w for w in warnings if "circular" in w.message.lower()] - assert len(circular_warnings) == 1 - # Should mention both messages in the cycle - warning_msg = circular_warnings[0].message.lower() - assert "a" in warning_msg or "b" in warning_msg - - def test_duplicate_cycle_detection_line_257(self) -> None: - """Test cycle deduplication for terms. - - Verifies that term-only cycles are detected and deduplicated properly - in the unified graph. - - Uses unified cross-type cycle detection. 
- """ - from ftllexengine.syntax.ast import ( - Identifier, - Pattern, - Placeable, - Term, - TermReference, - ) - - # Create circular terms: -ta -> -tb -> -ta - term_a = Term( - id=Identifier("ta"), - value=Pattern( - elements=(Placeable(expression=TermReference(id=Identifier("tb"))),) - ), - attributes=(), - ) - term_b = Term( - id=Identifier("tb"), - value=Pattern( - elements=(Placeable(expression=TermReference(id=Identifier("ta"))),) - ), - attributes=(), - ) - - messages_dict: dict[str, Message] = {} - terms_dict = {"ta": term_a, "tb": term_b} - - # Build dependency graph - graph = build_dependency_graph(messages_dict, terms_dict) - # Call the real function without mocking - warnings = _detect_circular_references(graph) - - # Should only have 1 warning (cycle ta -> tb -> ta is detected once) - circular_warnings = [w for w in warnings if "circular" in w.message.lower()] - assert len(circular_warnings) == 1 - # Should mention both terms in the cycle - warning_msg = circular_warnings[0].message.lower() - assert "ta" in warning_msg or "tb" in warning_msg - - -# ============================================================================ -# API BOUNDARY VALIDATION: TypeError for Non-String Input -# ============================================================================ - - -class TestAPIBoundaryValidation: - """Test API boundary validation for validate_resource. - - Tests defensive type checking at API boundaries (lines 760-764). - """ - - def test_validate_resource_raises_typeerror_for_bytes(self) -> None: - """Test validate_resource raises TypeError when passed bytes instead of str. - - Type hints are not enforced at runtime. Users may incorrectly pass bytes - when the API expects str. The function defensively checks and raises - TypeError with a helpful message. - - Covers lines 760-764 and branch [759, 760]. 
- """ - import pytest - - # Pass bytes instead of str (common mistake when reading files) - source_bytes = b"msg = Hello" - - with pytest.raises( - TypeError, - match=r"source must be str, not bytes.*Decode bytes to str", - ): - validate_resource(source_bytes) # type: ignore[arg-type] - - def test_validate_resource_raises_typeerror_for_none(self) -> None: - """Test validate_resource raises TypeError when passed None.""" - import pytest - - with pytest.raises( - TypeError, - match=r"source must be str, not NoneType", - ): - validate_resource(None) # type: ignore[arg-type] - - def test_validate_resource_raises_typeerror_for_int(self) -> None: - """Test validate_resource raises TypeError when passed int.""" - import pytest - - with pytest.raises( - TypeError, - match=r"source must be str, not int", - ): - validate_resource(42) # type: ignore[arg-type] - - def test_validate_resource_raises_typeerror_for_list(self) -> None: - """Test validate_resource raises TypeError when passed list.""" - import pytest - - with pytest.raises( - TypeError, - match=r"source must be str, not list", - ): - validate_resource(["msg = Hello"]) # type: ignore[arg-type] - - @given( - st.one_of( - st.binary(min_size=1, max_size=50), - st.integers(), - st.lists(st.text()), - st.none(), - ) - ) - def test_validate_resource_rejects_non_string_types_property( - self, invalid_input: bytes | int | list[str] | None - ) -> None: - """PROPERTY: validate_resource rejects all non-string types with TypeError. 
- - Events emitted: - - input_type={type}: Type of invalid input tested - """ - import pytest - - # Emit event for input type diversity - event(f"input_type={type(invalid_input).__name__}") - - with pytest.raises(TypeError, match=r"source must be str"): - validate_resource(invalid_input) # type: ignore[arg-type] - - -# ============================================================================ -# _compute_longest_paths: Diamond Pattern (Line 556) -# ============================================================================ - - -class TestComputeLongestPathsDiamondPattern: - """Tests for _compute_longest_paths with diamond dependency patterns. - - Targets line 556: continue when node already in longest_path during - stack processing (not outer loop). - """ - - def test_diamond_pattern_triggers_inner_continue(self) -> None: - """Diamond pattern: A->B, A->C->B causes B to be encountered twice. - - When DFS processes A: - 1. Descends to B first, computes longest_path[B] - 2. Descends to C, which references B - 3. C tries to process B, but B is already in longest_path - 4. This triggers line 556: continue (inner stack check) - - This is different from outer loop skip (line 545-546). 
- """ - # Create diamond: msg_a -> msg_b, msg_a -> msg_c -> msg_b - graph = { - "msg:a": {"msg:b", "msg:c"}, - "msg:b": set(), - "msg:c": {"msg:b"}, - } - - result = _compute_longest_paths(graph) - - # All nodes should be processed - assert "msg:a" in result - assert "msg:b" in result - assert "msg:c" in result - - # msg_b has no dependencies: depth 0 - assert result["msg:b"][0] == 0 - # msg_c depends on msg_b: depth 1 - assert result["msg:c"][0] == 1 - # msg_a has longest path through msg_c: depth 2 - assert result["msg:a"][0] == 2 - - def test_multi_level_diamond_pattern(self) -> None: - """Multi-level diamond: A->B->D, A->C->D ensures deep graph traversal.""" - graph = { - "msg:a": {"msg:b", "msg:c"}, - "msg:b": {"msg:d"}, - "msg:c": {"msg:d"}, - "msg:d": set(), - } - - result = _compute_longest_paths(graph) - - # msg_d is leaf: depth 0 - assert result["msg:d"][0] == 0 - # msg_b and msg_c both depend on msg_d: depth 1 - assert result["msg:b"][0] == 1 - assert result["msg:c"][0] == 1 - # msg_a depends on msg_b/msg_c: depth 2 - assert result["msg:a"][0] == 2 - - def test_complex_dag_with_shared_nodes(self) -> None: - """Complex DAG: A->B->E, A->C->E, A->D->E ensures multiple paths converge.""" - graph = { - "msg:a": {"msg:b", "msg:c", "msg:d"}, - "msg:b": {"msg:e"}, - "msg:c": {"msg:e"}, - "msg:d": {"msg:e"}, - "msg:e": set(), - } - - result = _compute_longest_paths(graph) - - # msg_e is referenced by 3 nodes - assert result["msg:e"][0] == 0 - assert result["msg:b"][0] == 1 - assert result["msg:c"][0] == 1 - assert result["msg:d"][0] == 1 - assert result["msg:a"][0] == 2 - - @given( - num_intermediate=st.integers(min_value=2, max_value=5), - ) - def test_diamond_pattern_property(self, num_intermediate: int) -> None: - """Property: Diamond with N intermediate nodes all converging to same leaf. 
- - Pattern: root -> {node1, node2, ..., nodeN} -> leaf - - Events emitted: - - num_intermediate={n}: Number of intermediate nodes - """ - # Emit event for fuzzer guidance - event(f"num_intermediate={num_intermediate}") - - graph: dict[str, set[str]] = { - "msg:root": {f"msg:mid{i}" for i in range(num_intermediate)}, - "msg:leaf": set(), - } - for i in range(num_intermediate): - graph[f"msg:mid{i}"] = {"msg:leaf"} - - result = _compute_longest_paths(graph) - - # Leaf has no dependencies - assert result["msg:leaf"][0] == 0 - # All intermediate nodes have depth 1 - for i in range(num_intermediate): - assert result[f"msg:mid{i}"][0] == 1 - # Root has depth 2 - assert result["msg:root"][0] == 2 - - -# ============================================================================ -# _compute_longest_paths: Cycle/Back-Edge Handling (Line 554-555) -# ============================================================================ - - -class TestComputeLongestPathsCycleHandling: - """Tests for _compute_longest_paths with cycles (back-edge detection). - - Targets line 554-555: continue when node in in_stack (back-edge detection). - This is different from diamond patterns - actual cycles, not DAGs. - """ - - def test_simple_two_node_cycle(self) -> None: - """Two-node cycle: A->B->A triggers back-edge detection. - - When DFS processes A: - 1. Push (A, 0), mark A in_stack - 2. Push (B, 0), mark B in_stack - 3. B references A, so push (A, 0) - 4. 
A is already in in_stack -> triggers line 554 second condition - """ - graph = { - "msg:a": {"msg:b"}, - "msg:b": {"msg:a"}, - } - - result = _compute_longest_paths(graph) - - # Both nodes should be processed - assert "msg:a" in result - assert "msg:b" in result - - # Cycle is broken by back-edge detection - # A depends on B (depth 1), B's back-edge to A is skipped (depth 0) - assert result["msg:a"][0] == 1 - assert result["msg:b"][0] == 0 - - def test_three_node_cycle(self) -> None: - """Three-node cycle: A->B->C->A triggers back-edge on longer path.""" - graph = { - "msg:a": {"msg:b"}, - "msg:b": {"msg:c"}, - "msg:c": {"msg:a"}, - } - - result = _compute_longest_paths(graph) - - # All nodes processed - assert "msg:a" in result - assert "msg:b" in result - assert "msg:c" in result - - # Cycle is broken at C (back-edge to A skipped) - # A->B->C, C's back-edge to A is ignored - assert result["msg:a"][0] == 2 - assert result["msg:b"][0] == 1 - assert result["msg:c"][0] == 0 - - def test_self_referencing_node(self) -> None: - """Self-reference: A->A is simplest cycle case.""" - graph = { - "msg:a": {"msg:a"}, - } - - result = _compute_longest_paths(graph) - - assert "msg:a" in result - # Self-reference creates back-edge immediately - assert result["msg:a"][0] == 0 - - def test_cycle_with_tail(self) -> None: - """Cycle with tail: D->A->B->C->A (D leads into cycle).""" - graph = { - "msg:d": {"msg:a"}, - "msg:a": {"msg:b"}, - "msg:b": {"msg:c"}, - "msg:c": {"msg:a"}, - } - - result = _compute_longest_paths(graph) - - # All nodes processed - assert len(result) == 4 - - # D is outside cycle, has longest path through cycle - assert result["msg:d"][0] >= 3 - - @given( - cycle_size=st.integers(min_value=2, max_value=6), - ) - def test_cycle_property(self, cycle_size: int) -> None: - """Property: N-node cycle should not cause infinite loop. 
- - Creates a cycle: 0->1->2->...->N-1->0 - - Events emitted: - - cycle_size={n}: Size of the cycle - """ - # Emit event for fuzzer guidance - event(f"cycle_size={cycle_size}") - - graph: dict[str, set[str]] = {} - for i in range(cycle_size): - next_node = (i + 1) % cycle_size - graph[f"msg:n{i}"] = {f"msg:n{next_node}"} - - result = _compute_longest_paths(graph) - - # All nodes should be processed (no infinite loop) - assert len(result) == cycle_size - - # Each node should have finite depth - for i in range(cycle_size): - depth, _path = result[f"msg:n{i}"] - assert depth < cycle_size # Depth bounded by cycle size - - -# ============================================================================ -# _detect_circular_references: Duplicate Cycle Keys (Branch 425) -# ============================================================================ - - -class TestDetectCircularReferencesDuplicateCycleKeys: - """Tests for _detect_circular_references duplicate cycle key handling. - - Targets branch 425->423: if cycle_key not in seen_cycle_keys (false branch). 
- """ - - def test_duplicate_cycle_from_detect_cycles(self) -> None: - """Mock detect_cycles to return duplicate cycles for defensive code test.""" - # Create a simple cycle - graph = { - "msg:a": {"msg:b"}, - "msg:b": {"msg:a"}, - } - - # Mock detect_cycles to yield the same cycle twice - with patch("ftllexengine.validation.resource.detect_cycles") as mock_detect: - # Return same cycle twice to test deduplication logic - cycle = ["msg:a", "msg:b", "msg:a"] - mock_detect.return_value = iter([cycle, cycle]) - - warnings = _detect_circular_references(graph) - - # Should deduplicate and return only one warning - assert len(warnings) == 1 - assert warnings[0].code == DiagnosticCode.VALIDATION_CIRCULAR_REFERENCE - - def test_cycle_key_deduplication_with_permutations(self) -> None: - """Cycle keys should deduplicate permutations (A->B->A == B->A->B).""" - # This tests the make_cycle_key function indirectly - # Create a self-referencing cycle to ensure consistent behavior - graph = { - "msg:x": {"msg:y"}, - "msg:y": {"msg:z"}, - "msg:z": {"msg:x"}, - } - - warnings = _detect_circular_references(graph) - - # Should detect exactly one cycle (not multiple rotations) - assert len(warnings) == 1 - cycle_warnings = [ - w for w in warnings - if w.code == DiagnosticCode.VALIDATION_CIRCULAR_REFERENCE - ] - assert len(cycle_warnings) == 1 - - -# ============================================================================ -# _detect_circular_references: Malformed Node Formatting (Branch 434) -# ============================================================================ - - -class TestDetectCircularReferencesMalformedNodes: - """Tests for _detect_circular_references with malformed graph nodes. - - Targets branch 434->431: node doesn't start with "msg:" or "term:". 
- """ - - def test_malformed_node_in_cycle_skipped_in_formatting(self) -> None: - """Malformed nodes (no msg:/term: prefix) handled gracefully in formatting.""" - # Directly test with malformed graph (shouldn't happen in practice) - # This tests defensive programming - graph = { - "msg:a": {"malformed_node"}, - "malformed_node": {"msg:a"}, - } - - # Mock detect_cycles to return a cycle with malformed node - with patch("ftllexengine.validation.resource.detect_cycles") as mock_detect: - cycle = ["msg:a", "malformed_node", "msg:a"] - mock_detect.return_value = iter([cycle]) - - warnings = _detect_circular_references(graph) - - # Should still create a warning - assert len(warnings) == 1 - assert warnings[0].code == DiagnosticCode.VALIDATION_CIRCULAR_REFERENCE - - # Context should only contain properly formatted nodes - # "malformed_node" should be skipped (no prefix match) - assert warnings[0].context is not None - # The formatted output should contain "a" but not include malformed_node - # (since it doesn't match msg: or term: prefixes) - assert "a" in warnings[0].context - - def test_mixed_valid_and_malformed_nodes_in_cycle(self) -> None: - """Cycle with mix of valid and malformed nodes formats valid ones only.""" - graph = { - "msg:valid1": {"term:valid2"}, - "term:valid2": {"bad_node"}, - "bad_node": {"msg:valid1"}, - } - - with patch("ftllexengine.validation.resource.detect_cycles") as mock_detect: - cycle = ["msg:valid1", "term:valid2", "bad_node", "msg:valid1"] - mock_detect.return_value = iter([cycle]) - - warnings = _detect_circular_references(graph) - - assert len(warnings) == 1 - assert warnings[0].context is not None - # Should format valid nodes - assert "valid1" in warnings[0].context - assert "-valid2" in warnings[0].context - # bad_node should be skipped in formatting (no prefix) - - -# ============================================================================ -# Integration Tests with Real FTL Structures -# 
============================================================================ - - -class TestValidationResourceCompleteIntegration: - """Integration tests combining edge cases using real FTL AST structures.""" - - def test_diamond_dependency_in_real_messages(self) -> None: - """Diamond pattern with real Message objects.""" - # Create: msgA -> msgB, msgA -> msgC -> msgB - msg_b = Message( - id=Identifier("msgB"), - value=Pattern(elements=(TextElement(value="Base message"),)), - attributes=(), - ) - msg_c = Message( - id=Identifier("msgC"), - value=Pattern( - elements=(Placeable(expression=MessageReference(id=Identifier("msgB"))),) - ), - attributes=(), - ) - msg_a = Message( - id=Identifier("msgA"), - value=Pattern( - elements=( - Placeable(expression=MessageReference(id=Identifier("msgB"))), - TextElement(value=" and "), - Placeable(expression=MessageReference(id=Identifier("msgC"))), - ) - ), - attributes=(), - ) - - messages_dict = {"msgA": msg_a, "msgB": msg_b, "msgC": msg_c} - terms_dict: dict[str, Term] = {} - - # Build dependency graph - graph = build_dependency_graph(messages_dict, terms_dict) - - # Compute longest paths (exercises diamond pattern) - result = _compute_longest_paths(graph) - - # msgB is referenced by both msgA and msgC - assert "msg:msgB" in result - assert result["msg:msgB"][0] == 0 - assert result["msg:msgC"][0] == 1 - assert result["msg:msgA"][0] == 2 - - def test_cross_type_diamond_message_and_term(self) -> None: - """Diamond with cross-type references: msg -> term, msg -> msg -> term.""" - # Create: msgA -> termB, msgA -> msgC -> termB - term_b = Term( - id=Identifier("termB"), - value=Pattern(elements=(TextElement(value="Term value"),)), - attributes=(), - ) - msg_c = Message( - id=Identifier("msgC"), - value=Pattern( - elements=(Placeable(expression=TermReference(id=Identifier("termB"))),) - ), - attributes=(), - ) - msg_a = Message( - id=Identifier("msgA"), - value=Pattern( - elements=( - 
Placeable(expression=TermReference(id=Identifier("termB"))), - TextElement(value=" via "), - Placeable(expression=MessageReference(id=Identifier("msgC"))), - ) - ), - attributes=(), - ) - - messages_dict = {"msgA": msg_a, "msgC": msg_c} - terms_dict = {"termB": term_b} - - # Build dependency graph - graph = build_dependency_graph(messages_dict, terms_dict) - - # Compute longest paths - result = _compute_longest_paths(graph) - - # termB is referenced by both msgA and msgC - assert "term:termB" in result - assert result["term:termB"][0] == 0 - assert result["msg:msgC"][0] == 1 - assert result["msg:msgA"][0] == 2 - - @given( - num_messages=st.integers(min_value=3, max_value=8), - ) - def test_property_complex_dependency_graphs(self, num_messages: int) -> None: - """Property: Complex dependency graphs always compute without errors. - - Events emitted: - - num_messages={n}: Number of messages in graph - """ - # Emit event for fuzzer guidance - event(f"num_messages={num_messages}") - - # Create a chain with some cross-references - messages_dict: dict[str, Message] = {} - - for i in range(num_messages): - if i == num_messages - 1: - # Last message has no references - value = Pattern(elements=(TextElement(value="End"),)) - elif i % 2 == 0: - # Even messages reference next message - value = Pattern( - elements=( - Placeable( - expression=MessageReference(id=Identifier(f"msg{i+1}")) - ), - ) - ) - else: - # Odd messages reference last message (creates diamond-like structure) - value = Pattern( - elements=( - Placeable( - expression=MessageReference( - id=Identifier(f"msg{num_messages-1}") - ) - ), - ) - ) - - messages_dict[f"msg{i}"] = Message( - id=Identifier(f"msg{i}"), - value=value, - attributes=(), - ) - - terms_dict: dict[str, Term] = {} - - # Build and compute - should not raise - graph = build_dependency_graph(messages_dict, terms_dict) - result = _compute_longest_paths(graph) - - # All messages should be in result - assert len(result) >= num_messages - - -class 
TestValidationResourceEdgeCases: - """Coverage for validation/resource.py edge cases.""" - - def test_junk_without_span(self) -> None: - """Junk entry without span uses None for line/column.""" - junk = Junk(content="invalid", span=None) - - class MockResource: - def __init__(self) -> None: - self.entries = [junk] - - errors = _extract_syntax_errors( - MockResource(), "invalid" # type: ignore[arg-type] - ) - assert len(errors) > 0 - assert errors[0].line is None - - def test_validation_with_invalid_ftl(self) -> None: - """Validation handles malformed FTL gracefully.""" - result = validate_resource("msg = { $val ->") - assert result is not None - - def test_cycle_deduplication(self) -> None: - """Circular references are detected without duplicates.""" - ftl = "\na = { b }\nb = { a }\nc = { d }\nd = { c }\n" - result = validate_resource(ftl) - circular = [ - w for w in result.warnings - if "circular" in w.message.lower() - ] - assert len(circular) >= 2 +"""Aggregated validation resource test surface.""" + +from tests.validation_resource_cases.api_boundary_validation_type_error_for_non_string_input import * # noqa: F403 - re-export split test surface +from tests.validation_resource_cases.branch_coverage_test_missing_branches import * # noqa: F403 - re-export split test surface +from tests.validation_resource_cases.compute_longest_paths_cycle_back_edge_handling_line_554_555 import * # noqa: F403 - re-export split test surface +from tests.validation_resource_cases.compute_longest_paths_diamond_pattern_line_556 import * # noqa: F403 - re-export split test surface +from tests.validation_resource_cases.core import * # noqa: F403 - re-export split test surface +from tests.validation_resource_cases.detect_circular_references_duplicate_cycle_keys_branch_425 import * # noqa: F403 - re-export split test surface +from tests.validation_resource_cases.detect_circular_references_malformed_node_formatting_branch_434 import * # noqa: F403 - re-export split test surface +from 
tests.validation_resource_cases.integration_tests_with_real_ftl_structures import * # noqa: F403 - re-export split test surface +from tests.validation_resource_cases.line_113_test_message_without_value_or_attributes import * # noqa: F403 - re-export split test surface diff --git a/tests/test_validation_resource_dependency_graph.py b/tests/test_validation_resource_dependency_graph.py index d8862560..6ee2478e 100644 --- a/tests/test_validation_resource_dependency_graph.py +++ b/tests/test_validation_resource_dependency_graph.py @@ -1,1041 +1,5 @@ -"""Dependency graph construction tests for validation/resource_graph.py. +"""Aggregated validation resource dependency graph test surface.""" -Tests attribute-qualified reference resolution and known entry dependency -propagation to achieve 100% coverage of build_dependency_graph and -related helper functions. - -Coverage targets: -- Lines 507-509: _resolve_msg_ref with attribute-qualified references -- Lines 519-521: _resolve_term_ref with attribute-qualified references -- Line 572: known_msg_deps dependency propagation -- Line 582: known_term_deps dependency propagation -""" - -from __future__ import annotations - -from hypothesis import event, given -from hypothesis import strategies as st - -from ftllexengine.syntax.ast import ( - Attribute, - Identifier, - Message, - MessageReference, - Pattern, - Placeable, - SelectExpression, - Term, - TermReference, - TextElement, - Variant, -) -from ftllexengine.validation.resource import _detect_circular_references -from ftllexengine.validation.resource_graph import build_dependency_graph - - -class TestAttributeQualifiedMessageReferences: - """Test attribute-qualified message reference resolution (lines 507-509).""" - - def test_undefined_attribute_qualified_message_reference(self) -> None: - """Attribute-qualified reference to undefined message returns None. - - Tests branch 508->513: When "." 
is in ref but base message doesn't - exist in messages_dict or known_messages, _resolve_msg_ref returns None - and the reference is NOT added to the dependency graph. - """ - # Message referencing undefined message's attribute - ref_msg = Message( - id=Identifier("referrer"), - value=Pattern( - elements=( - Placeable( - expression=MessageReference( - id=Identifier("undefined"), - attribute=Identifier("tooltip"), - ) - ), - ) - ), - attributes=(), - ) - - messages_dict = {"referrer": ref_msg} - terms_dict: dict[str, Term] = {} - - graph = build_dependency_graph(messages_dict, terms_dict) - - # Should have "msg:referrer" node but NO dependency (undefined.tooltip ignored) - assert "msg:referrer" in graph - # The dependency set should be empty (undefined reference not added) - assert len(graph["msg:referrer"]) == 0 - # Should NOT have "msg:undefined.tooltip" node - assert "msg:undefined.tooltip" not in graph["msg:referrer"] - - def test_message_attribute_reference_creates_qualified_node(self) -> None: - """Message referencing another message's attribute creates qualified node. - - Tests lines 507-509: When a message reference contains "." (attribute - qualification), split it and create "msg:base.attr" node if base exists. 
- """ - # Create base message with an attribute - base_msg = Message( - id=Identifier("base"), - value=Pattern(elements=(TextElement("value"),)), - attributes=( - Attribute( - id=Identifier("tooltip"), - value=Pattern(elements=(TextElement("tooltip text"),)), - ), - ), - ) - - # Create message that references base message's attribute - ref_msg = Message( - id=Identifier("referrer"), - value=Pattern( - elements=( - TextElement("text "), - Placeable( - expression=MessageReference( - id=Identifier("base"), - attribute=Identifier("tooltip"), - ) - ), - ) - ), - attributes=(), - ) - - messages_dict = {"base": base_msg, "referrer": ref_msg} - terms_dict: dict[str, Term] = {} - - graph = build_dependency_graph(messages_dict, terms_dict) - - # Should have "msg:referrer" node with dependency on "msg:base.tooltip" - assert "msg:referrer" in graph - assert "msg:base.tooltip" in graph["msg:referrer"] - - def test_message_attribute_reference_with_known_messages(self) -> None: - """Message referencing known message's attribute creates qualified node. - - Tests lines 507-509 with known_messages parameter: attribute-qualified - reference to a known message should resolve correctly. - """ - # Current resource has message referencing known message's attribute - ref_msg = Message( - id=Identifier("current"), - value=Pattern( - elements=( - Placeable( - expression=MessageReference( - id=Identifier("known"), - attribute=Identifier("attr"), - ) - ), - ) - ), - attributes=(), - ) - - messages_dict = {"current": ref_msg} - terms_dict: dict[str, Term] = {} - known_messages = frozenset({"known"}) - - graph = build_dependency_graph( - messages_dict, - terms_dict, - known_messages=known_messages, - ) - - # Should resolve "known.attr" to "msg:known.attr" node - assert "msg:current" in graph - assert "msg:known.attr" in graph["msg:current"] - - def test_bare_message_reference_creates_unqualified_node(self) -> None: - """Bare message reference (no attribute) creates unqualified node. 
- - Regression test: ensure bare references still work correctly after - attribute-qualified support. - """ - msg_a = Message( - id=Identifier("a"), - value=Pattern(elements=(TextElement("value"),)), - attributes=(), - ) - msg_b = Message( - id=Identifier("b"), - value=Pattern( - elements=(Placeable(expression=MessageReference(id=Identifier("a"))),) - ), - attributes=(), - ) - - messages_dict = {"a": msg_a, "b": msg_b} - terms_dict: dict[str, Term] = {} - - graph = build_dependency_graph(messages_dict, terms_dict) - - # Should have "msg:b" -> "msg:a" (no attribute qualification) - assert "msg:b" in graph - assert "msg:a" in graph["msg:b"] - - -class TestAttributeQualifiedTermReferences: - """Test attribute-qualified term reference resolution (lines 519-521).""" - - def test_undefined_attribute_qualified_term_reference(self) -> None: - """Attribute-qualified reference to undefined term returns None. - - Tests branch 520->524: When "." is in ref but base term doesn't - exist in terms_dict or known_terms, _resolve_term_ref returns None - and the reference is NOT added to the dependency graph. - """ - # Message referencing undefined term's attribute - msg = Message( - id=Identifier("msg"), - value=Pattern( - elements=( - Placeable( - expression=TermReference( - id=Identifier("undefined"), - attribute=Identifier("variant"), - ) - ), - ) - ), - attributes=(), - ) - - messages_dict = {"msg": msg} - terms_dict: dict[str, Term] = {} - - graph = build_dependency_graph(messages_dict, terms_dict) - - # Should have "msg:msg" node but NO dependency (undefined term ignored) - assert "msg:msg" in graph - # The dependency set should be empty (undefined reference not added) - assert len(graph["msg:msg"]) == 0 - # Should NOT have "term:undefined.variant" node - assert "term:undefined.variant" not in graph["msg:msg"] - - def test_term_attribute_reference_creates_qualified_node(self) -> None: - """Message referencing term's attribute creates qualified node. 
- - Tests lines 519-521: When a term reference contains "." (attribute - qualification), split it and create "term:base.attr" node if base exists. - """ - # Create base term with an attribute - base_term = Term( - id=Identifier("brand"), - value=Pattern(elements=(TextElement("Firefox"),)), - attributes=( - Attribute( - id=Identifier("short"), - value=Pattern(elements=(TextElement("FF"),)), - ), - ), - ) - - # Create message that references term's attribute - msg = Message( - id=Identifier("welcome"), - value=Pattern( - elements=( - TextElement("Welcome to "), - Placeable( - expression=TermReference( - id=Identifier("brand"), - attribute=Identifier("short"), - ) - ), - ) - ), - attributes=(), - ) - - messages_dict = {"welcome": msg} - terms_dict = {"brand": base_term} - - graph = build_dependency_graph(messages_dict, terms_dict) - - # Should have "msg:welcome" node with dependency on "term:brand.short" - assert "msg:welcome" in graph - assert "term:brand.short" in graph["msg:welcome"] - - def test_term_attribute_reference_with_known_terms(self) -> None: - """Message referencing known term's attribute creates qualified node. - - Tests lines 519-521 with known_terms parameter: attribute-qualified - reference to a known term should resolve correctly. 
- """ - # Current resource has message referencing known term's attribute - msg = Message( - id=Identifier("current"), - value=Pattern( - elements=( - Placeable( - expression=TermReference( - id=Identifier("known_term"), - attribute=Identifier("variant"), - ) - ), - ) - ), - attributes=(), - ) - - messages_dict = {"current": msg} - terms_dict: dict[str, Term] = {} - known_terms = frozenset({"known_term"}) - - graph = build_dependency_graph( - messages_dict, - terms_dict, - known_terms=known_terms, - ) - - # Should resolve "known_term.variant" to "term:known_term.variant" node - assert "msg:current" in graph - assert "term:known_term.variant" in graph["msg:current"] - - def test_bare_term_reference_creates_unqualified_node(self) -> None: - """Bare term reference (no attribute) creates unqualified node. - - Regression test: ensure bare term references still work correctly. - """ - term_brand = Term( - id=Identifier("brand"), - value=Pattern(elements=(TextElement("Firefox"),)), - attributes=(), - ) - msg = Message( - id=Identifier("welcome"), - value=Pattern( - elements=( - Placeable(expression=TermReference(id=Identifier("brand"))), - ) - ), - attributes=(), - ) - - messages_dict = {"welcome": msg} - terms_dict = {"brand": term_brand} - - graph = build_dependency_graph(messages_dict, terms_dict) - - # Should have "msg:welcome" -> "term:brand" (no attribute qualification) - assert "msg:welcome" in graph - assert "term:brand" in graph["msg:welcome"] - - -class TestKnownMessageDependencies: - """Test known_msg_deps dependency propagation (line 572).""" - - def test_known_message_with_dependencies_propagates_to_graph(self) -> None: - """Known message with dependencies adds them to graph. - - Tests line 572: When known_msg_deps is provided and contains the - known message ID, copy those dependencies into the graph. 
- """ - # Current resource has a simple message - current_msg = Message( - id=Identifier("current"), - value=Pattern(elements=(TextElement("value"),)), - attributes=(), - ) - - messages_dict = {"current": current_msg} - terms_dict: dict[str, Term] = {} - known_messages = frozenset({"known_a", "known_b"}) - - # known_a has dependencies on known_b and a term - known_msg_deps: dict[str, frozenset[str]] = { - "known_a": frozenset({"msg:known_b", "term:some_term"}), - } - - graph = build_dependency_graph( - messages_dict, - terms_dict, - known_messages=known_messages, - known_msg_deps=known_msg_deps, - ) - - # Should have known_a in graph with its dependencies - assert "msg:known_a" in graph - assert graph["msg:known_a"] == {"msg:known_b", "term:some_term"} - - def test_known_message_without_deps_entry_gets_empty_set(self) -> None: - """Known message not in known_msg_deps gets empty dependency set. - - Tests line 574: When known message is NOT in known_msg_deps dict, - it gets an empty set (no dependencies). - """ - current_msg = Message( - id=Identifier("current"), - value=Pattern(elements=(TextElement("value"),)), - attributes=(), - ) - - messages_dict = {"current": current_msg} - terms_dict: dict[str, Term] = {} - known_messages = frozenset({"known_orphan"}) - - # known_msg_deps exists but doesn't contain "known_orphan" - known_msg_deps: dict[str, frozenset[str]] = { - "some_other_msg": frozenset({"msg:dependency"}), - } - - graph = build_dependency_graph( - messages_dict, - terms_dict, - known_messages=known_messages, - known_msg_deps=known_msg_deps, - ) - - # Should have known_orphan in graph with empty dependencies - assert "msg:known_orphan" in graph - assert graph["msg:known_orphan"] == set() - - def test_known_message_already_in_graph_not_overwritten(self) -> None: - """Known message already in graph from current resource is not overwritten. 
- - Tests the "if node_key not in graph" guard at line 569: if a known - message is also defined in the current resource, the current resource - definition takes precedence. - """ - # Current resource defines "shared" message - shared_msg = Message( - id=Identifier("shared"), - value=Pattern( - elements=( - Placeable(expression=MessageReference(id=Identifier("local"))), - ) - ), - attributes=(), - ) - local_msg = Message( - id=Identifier("local"), - value=Pattern(elements=(TextElement("value"),)), - attributes=(), - ) - - messages_dict = {"shared": shared_msg, "local": local_msg} - terms_dict: dict[str, Term] = {} - known_messages = frozenset({"shared"}) # "shared" is also in known - - # known_msg_deps says "shared" depends on something else - known_msg_deps: dict[str, frozenset[str]] = { - "shared": frozenset({"msg:different_dependency"}), - } - - graph = build_dependency_graph( - messages_dict, - terms_dict, - known_messages=known_messages, - known_msg_deps=known_msg_deps, - ) - - # Current resource definition should win - "shared" depends on "local" - assert "msg:shared" in graph - assert "msg:local" in graph["msg:shared"] - # Should NOT have the known_msg_deps dependency - assert "msg:different_dependency" not in graph["msg:shared"] - - -class TestKnownTermDependencies: - """Test known_term_deps dependency propagation (line 582).""" - - def test_known_term_with_dependencies_propagates_to_graph(self) -> None: - """Known term with dependencies adds them to graph. - - Tests line 582: When known_term_deps is provided and contains the - known term ID, copy those dependencies into the graph. 
- """ - # Current resource has a simple message - current_msg = Message( - id=Identifier("current"), - value=Pattern(elements=(TextElement("value"),)), - attributes=(), - ) - - messages_dict = {"current": current_msg} - terms_dict: dict[str, Term] = {} - known_terms = frozenset({"known_term_a", "known_term_b"}) - - # known_term_a has dependencies - known_term_deps: dict[str, frozenset[str]] = { - "known_term_a": frozenset({"term:known_term_b", "msg:some_msg"}), - } - - graph = build_dependency_graph( - messages_dict, - terms_dict, - known_terms=known_terms, - known_term_deps=known_term_deps, - ) - - # Should have known_term_a in graph with its dependencies - assert "term:known_term_a" in graph - assert graph["term:known_term_a"] == {"term:known_term_b", "msg:some_msg"} - - def test_known_term_without_deps_entry_gets_empty_set(self) -> None: - """Known term not in known_term_deps gets empty dependency set. - - Tests line 584: When known term is NOT in known_term_deps dict, - it gets an empty set (no dependencies). - """ - current_msg = Message( - id=Identifier("current"), - value=Pattern(elements=(TextElement("value"),)), - attributes=(), - ) - - messages_dict = {"current": current_msg} - terms_dict: dict[str, Term] = {} - known_terms = frozenset({"known_orphan_term"}) - - # known_term_deps exists but doesn't contain "known_orphan_term" - known_term_deps: dict[str, frozenset[str]] = { - "some_other_term": frozenset({"term:dependency"}), - } - - graph = build_dependency_graph( - messages_dict, - terms_dict, - known_terms=known_terms, - known_term_deps=known_term_deps, - ) - - # Should have known_orphan_term in graph with empty dependencies - assert "term:known_orphan_term" in graph - assert graph["term:known_orphan_term"] == set() - - def test_known_term_already_in_graph_not_overwritten(self) -> None: - """Known term already in graph from current resource is not overwritten. 
- - Tests the "if node_key not in graph" guard at line 579: if a known - term is also defined in the current resource, the current resource - definition takes precedence. - """ - # Current resource defines "shared_term" term - shared_term = Term( - id=Identifier("shared_term"), - value=Pattern( - elements=( - Placeable(expression=TermReference(id=Identifier("local_term"))), - ) - ), - attributes=(), - ) - local_term = Term( - id=Identifier("local_term"), - value=Pattern(elements=(TextElement("value"),)), - attributes=(), - ) - - messages_dict: dict[str, Message] = {} - terms_dict = {"shared_term": shared_term, "local_term": local_term} - known_terms = frozenset({"shared_term"}) # "shared_term" is also in known - - # known_term_deps says "shared_term" depends on something else - known_term_deps: dict[str, frozenset[str]] = { - "shared_term": frozenset({"term:different_dependency"}), - } - - graph = build_dependency_graph( - messages_dict, - terms_dict, - known_terms=known_terms, - known_term_deps=known_term_deps, - ) - - # Current resource definition should win - assert "term:shared_term" in graph - assert "term:local_term" in graph["term:shared_term"] - # Should NOT have the known_term_deps dependency - assert "term:different_dependency" not in graph["term:shared_term"] - - -class TestCrossResourceCycleDetectionWithDependencies: - """Integration test: cross-resource cycle detection with known deps.""" - - def test_cross_resource_cycle_detected_via_known_deps(self) -> None: - """Cycle spanning current and known resources detected. - - Integration test: Current resource references known message, known - message (via known_msg_deps) references current resource, creating - a cross-resource cycle. 
- """ - # Current resource: msg_a -> known_b - msg_a = Message( - id=Identifier("a"), - value=Pattern( - elements=( - Placeable(expression=MessageReference(id=Identifier("b"))), - ) - ), - attributes=(), - ) - - messages_dict = {"a": msg_a} - terms_dict: dict[str, Term] = {} - known_messages = frozenset({"b"}) - - # Known message "b" references "a" (creating cycle: a -> b -> a) - known_msg_deps: dict[str, frozenset[str]] = { - "b": frozenset({"msg:a"}), - } - - graph = build_dependency_graph( - messages_dict, - terms_dict, - known_messages=known_messages, - known_msg_deps=known_msg_deps, - ) - - # Detect cycles - warnings = _detect_circular_references(graph) - - # Should detect the cross-resource cycle - circular_warnings = [w for w in warnings if "circular" in w.message.lower()] - assert len(circular_warnings) == 1 - # Should mention both messages in the cycle - warning_msg = circular_warnings[0].message.lower() - assert ("a" in warning_msg and "b" in warning_msg) or "circular" in warning_msg - - -class TestAttributeReferenceProperties: - """Property-based tests for attribute-qualified references.""" - - @given( - st.from_regex(r"[a-z]+", fullmatch=True), - st.from_regex(r"[a-z]+", fullmatch=True), - ) - def test_message_attribute_reference_roundtrip( - self, base_id: str, attr_id: str - ) -> None: - """PROPERTY: Message attribute reference creates qualified graph node. - - Attribute-qualified message reference "base.attr" should always - create a "msg:base.attr" node when "base" exists. 
- - Events emitted: - - id_length_base={bucket}: Length category of base identifier - - id_length_attr={bucket}: Length category of attribute identifier - """ - # Emit events for identifier length diversity - event(f"id_length_base={'short' if len(base_id) <= 3 else 'long'}") - event(f"id_length_attr={'short' if len(attr_id) <= 3 else 'long'}") - - base_msg = Message( - id=Identifier(base_id), - value=Pattern(elements=(TextElement("value"),)), - attributes=( - Attribute( - id=Identifier(attr_id), - value=Pattern(elements=(TextElement("attr value"),)), - ), - ), - ) - - ref_msg = Message( - id=Identifier("ref"), - value=Pattern( - elements=( - Placeable( - expression=MessageReference( - id=Identifier(base_id), - attribute=Identifier(attr_id), - ) - ), - ) - ), - attributes=(), - ) - - messages_dict = {base_id: base_msg, "ref": ref_msg} - terms_dict: dict[str, Term] = {} - - graph = build_dependency_graph(messages_dict, terms_dict) - - # Property: qualified node exists - expected_node = f"msg:{base_id}.{attr_id}" - assert "msg:ref" in graph - assert expected_node in graph["msg:ref"] - - @given( - st.from_regex(r"[a-z]+", fullmatch=True), - st.from_regex(r"[a-z]+", fullmatch=True), - ) - def test_term_attribute_reference_roundtrip( - self, base_id: str, attr_id: str - ) -> None: - """PROPERTY: Term attribute reference creates qualified graph node. - - Attribute-qualified term reference "-base.attr" should always - create a "term:base.attr" node when "-base" exists. 
- - Events emitted: - - term_id_length_base={bucket}: Length category of base term identifier - - term_id_length_attr={bucket}: Length category of attribute identifier - """ - # Emit events for identifier length diversity - event(f"term_id_length_base={'short' if len(base_id) <= 3 else 'long'}") - event(f"term_id_length_attr={'short' if len(attr_id) <= 3 else 'long'}") - - base_term = Term( - id=Identifier(base_id), - value=Pattern(elements=(TextElement("value"),)), - attributes=( - Attribute( - id=Identifier(attr_id), - value=Pattern(elements=(TextElement("attr value"),)), - ), - ), - ) - - msg = Message( - id=Identifier("msg"), - value=Pattern( - elements=( - Placeable( - expression=TermReference( - id=Identifier(base_id), - attribute=Identifier(attr_id), - ) - ), - ) - ), - attributes=(), - ) - - messages_dict = {"msg": msg} - terms_dict = {base_id: base_term} - - graph = build_dependency_graph(messages_dict, terms_dict) - - # Property: qualified node exists - expected_node = f"term:{base_id}.{attr_id}" - assert "msg:msg" in graph - assert expected_node in graph["msg:msg"] - - -class TestComplexAttributeReferences: - """Test complex scenarios with attribute references.""" - - def test_message_with_multiple_attribute_references(self) -> None: - """Message referencing multiple attributes from different messages.""" - msg_a = Message( - id=Identifier("a"), - value=Pattern(elements=(TextElement("A"),)), - attributes=( - Attribute( - id=Identifier("tooltip"), - value=Pattern(elements=(TextElement("A tooltip"),)), - ), - ), - ) - - msg_b = Message( - id=Identifier("b"), - value=Pattern(elements=(TextElement("B"),)), - attributes=( - Attribute( - id=Identifier("label"), - value=Pattern(elements=(TextElement("B label"),)), - ), - ), - ) - - # Message referencing multiple attributes - msg_complex = Message( - id=Identifier("complex"), - value=Pattern( - elements=( - TextElement("Value"), - Placeable( - expression=MessageReference( - id=Identifier("a"), - 
attribute=Identifier("tooltip"), - ) - ), - TextElement(" and "), - Placeable( - expression=MessageReference( - id=Identifier("b"), - attribute=Identifier("label"), - ) - ), - ) - ), - attributes=(), - ) - - messages_dict = {"a": msg_a, "b": msg_b, "complex": msg_complex} - terms_dict: dict[str, Term] = {} - - graph = build_dependency_graph(messages_dict, terms_dict) - - # Should have dependencies on both qualified attributes - assert "msg:complex" in graph - assert "msg:a.tooltip" in graph["msg:complex"] - assert "msg:b.label" in graph["msg:complex"] - - def test_message_attribute_itself_has_references(self) -> None: - """Message attribute containing references creates attribute-level node.""" - base_msg = Message( - id=Identifier("base"), - value=Pattern(elements=(TextElement("base value"),)), - attributes=(), - ) - - # Message with attribute that references another message - msg_with_attr_ref = Message( - id=Identifier("complex"), - value=Pattern(elements=(TextElement("value"),)), - attributes=( - Attribute( - id=Identifier("tooltip"), - value=Pattern( - elements=( - TextElement("See "), - Placeable(expression=MessageReference(id=Identifier("base"))), - ) - ), - ), - ), - ) - - messages_dict = {"base": base_msg, "complex": msg_with_attr_ref} - terms_dict: dict[str, Term] = {} - - graph = build_dependency_graph(messages_dict, terms_dict) - - # Should have "msg:complex.tooltip" node with dependency on "msg:base" - assert "msg:complex.tooltip" in graph - assert "msg:base" in graph["msg:complex.tooltip"] - - def test_select_expression_in_attribute_creates_variant_dependencies(self) -> None: - """Attribute with select expression creates variant-level dependencies.""" - base_msg = Message( - id=Identifier("base"), - value=Pattern(elements=(TextElement("base"),)), - attributes=(), - ) - - # Message with attribute containing select expression - msg_with_select_attr = Message( - id=Identifier("selector"), - value=Pattern(elements=(TextElement("value"),)), - attributes=( 
- Attribute( - id=Identifier("dynamic"), - value=Pattern( - elements=( - Placeable( - expression=SelectExpression( - selector=MessageReference(id=Identifier("base")), - variants=( - Variant( - key=Identifier("one"), - value=Pattern( - elements=(TextElement("variant"),) - ), - default=True, - ), - ), - ) - ), - ) - ), - ), - ), - ) - - messages_dict = {"base": base_msg, "selector": msg_with_select_attr} - terms_dict: dict[str, Term] = {} - - graph = build_dependency_graph(messages_dict, terms_dict) - - # Should have "msg:selector.dynamic" node with dependency on "msg:base" - assert "msg:selector.dynamic" in graph - assert "msg:base" in graph["msg:selector.dynamic"] - - -# ============================================================================ -# CYCLE DETECTION BRANCH COVERAGE -# ============================================================================ - - -class TestValidationResourceBranchCoverage: - """Test validation/resource.py cycle detection branch coverage.""" - - def test_cycle_detection_loop_iterations(self) -> None: - """Cycle detection handles term-to-term cycle correctly.""" - msg_a = Message( - id=Identifier("a"), - value=Pattern( - elements=( - Placeable( - expression=TermReference(id=Identifier("x")) - ), - ) - ), - attributes=(), - ) - msg_b = Message( - id=Identifier("b"), - value=Pattern( - elements=( - Placeable( - expression=TermReference(id=Identifier("y")) - ), - ) - ), - attributes=(), - ) - - term_x = Term( - id=Identifier("x"), - value=Pattern( - elements=( - Placeable( - expression=TermReference(id=Identifier("y")) - ), - ) - ), - attributes=(), - ) - term_y = Term( - id=Identifier("y"), - value=Pattern( - elements=( - Placeable( - expression=TermReference(id=Identifier("x")) - ), - ) - ), - attributes=(), - ) - - messages_dict = {"a": msg_a, "b": msg_b} - terms_dict = {"x": term_x, "y": term_y} - - graph = build_dependency_graph(messages_dict, terms_dict) - warnings = _detect_circular_references(graph) - - cycle_warnings = [w for 
w in warnings if "circular" in w.message.lower()] - assert len(cycle_warnings) >= 1 - - def test_cross_type_cycle_detection(self) -> None: - """Cycle detection finds message-to-term-to-message cycle.""" - msg_a = Message( - id=Identifier("a"), - value=Pattern( - elements=( - Placeable(expression=TermReference(id=Identifier("t"))), - ) - ), - attributes=(), - ) - - term_t = Term( - id=Identifier("t"), - value=Pattern( - elements=( - Placeable(expression=MessageReference(id=Identifier("a"))), - ) - ), - attributes=(), - ) - - messages_dict = {"a": msg_a} - terms_dict = {"t": term_t} - - graph = build_dependency_graph(messages_dict, terms_dict) - warnings = _detect_circular_references(graph) - - assert any("circular" in w.message.lower() for w in warnings) - - -class TestResourceValidationBranchCoverageExtended: - """Extended resource validation branch coverage tests.""" - - def test_cycle_detection_with_multiple_independent_cycles(self) -> None: - """Cycle detection finds both of two independent cycles in the same resource.""" - msg_a = Message( - id=Identifier("a"), - value=Pattern( - elements=(Placeable(expression=MessageReference(id=Identifier("b"))),) - ), - attributes=(), - ) - msg_b = Message( - id=Identifier("b"), - value=Pattern( - elements=(Placeable(expression=MessageReference(id=Identifier("a"))),) - ), - attributes=(), - ) - - msg_x = Message( - id=Identifier("x"), - value=Pattern( - elements=(Placeable(expression=MessageReference(id=Identifier("y"))),) - ), - attributes=(), - ) - msg_y = Message( - id=Identifier("y"), - value=Pattern( - elements=(Placeable(expression=MessageReference(id=Identifier("x"))),) - ), - attributes=(), - ) - - messages_dict = {"a": msg_a, "b": msg_b, "x": msg_x, "y": msg_y} - terms_dict: dict[str, Term] = {} - - graph = build_dependency_graph(messages_dict, terms_dict) - warnings = _detect_circular_references(graph) - - cycle_warnings = [w for w in warnings if "circular" in w.message.lower()] - assert len(cycle_warnings) >= 2 - 
- def test_no_cycles_in_linear_chain(self) -> None: - """Linear reference chain without cycles produces no cycle warnings.""" - msg_a = Message( - id=Identifier("a"), - value=Pattern( - elements=(Placeable(expression=MessageReference(id=Identifier("b"))),) - ), - attributes=(), - ) - msg_b = Message( - id=Identifier("b"), - value=Pattern( - elements=(Placeable(expression=MessageReference(id=Identifier("c"))),) - ), - attributes=(), - ) - msg_c = Message( - id=Identifier("c"), - value=Pattern( - elements=(Placeable(expression=MessageReference(id=Identifier("d"))),) - ), - attributes=(), - ) - msg_d = Message( - id=Identifier("d"), - value=Pattern(elements=(TextElement("End"),)), - attributes=(), - ) - - messages_dict = {"a": msg_a, "b": msg_b, "c": msg_c, "d": msg_d} - terms_dict: dict[str, Term] = {} - - graph = build_dependency_graph(messages_dict, terms_dict) - warnings = _detect_circular_references(graph) - - cycle_warnings = [w for w in warnings if "circular" in w.message.lower()] - assert len(cycle_warnings) == 0 +from tests.validation_resource_dependency_graph_cases.core import * # noqa: F403 - re-export split test surface +from tests.validation_resource_dependency_graph_cases.core_2 import * # noqa: F403 - re-export split test surface +from tests.validation_resource_dependency_graph_cases.cycle_detection_branch_coverage import * # noqa: F403 - re-export split test surface diff --git a/tests/validation_resource_cases/__init__.py b/tests/validation_resource_cases/__init__.py new file mode 100644 index 00000000..391e81ed --- /dev/null +++ b/tests/validation_resource_cases/__init__.py @@ -0,0 +1,54 @@ +"""Tests for validation.resource: validate_resource(), graph algorithms, edge cases.""" + +from __future__ import annotations + +from unittest.mock import patch + +from hypothesis import event, given +from hypothesis import strategies as st + +from ftllexengine.diagnostics import DiagnosticCode +from ftllexengine.syntax import ( + Identifier, + Junk, + Message, + 
MessageReference, + Pattern, + Placeable, + Term, + TermReference, + TextElement, +) +from ftllexengine.syntax.cursor import LineOffsetCache +from ftllexengine.validation.resource import ( + _detect_circular_references, + _extract_syntax_errors, + validate_resource, +) +from ftllexengine.validation.resource_graph import ( + _compute_longest_paths, + build_dependency_graph, +) + +__all__ = [ + "DiagnosticCode", + "Identifier", + "Junk", + "LineOffsetCache", + "Message", + "MessageReference", + "Pattern", + "Placeable", + "Term", + "TermReference", + "TextElement", + "_compute_longest_paths", + "_detect_circular_references", + "_extract_syntax_errors", + "build_dependency_graph", + "event", + "given", + "patch", + "st", + "validate_resource", +] diff --git a/tests/validation_resource_cases/api_boundary_validation_type_error_for_non_string_input.py b/tests/validation_resource_cases/api_boundary_validation_type_error_for_non_string_input.py new file mode 100644 index 00000000..02cf5f6e --- /dev/null +++ b/tests/validation_resource_cases/api_boundary_validation_type_error_for_non_string_input.py @@ -0,0 +1,89 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_validation_resource.py.""" + +from tests.validation_resource_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# API BOUNDARY VALIDATION: TypeError for Non-String Input +# ============================================================================ + + +class TestAPIBoundaryValidation: + """Test API boundary validation for validate_resource. + + Tests defensive type checking at API boundaries (lines 760-764). + """ + + def test_validate_resource_raises_typeerror_for_bytes(self) -> None: + """Test validate_resource raises TypeError when passed bytes instead of str. + + Type hints are not enforced at runtime. Users may incorrectly pass bytes + when the API expects str. 
The function defensively checks and raises + TypeError with a helpful message. + + Covers lines 760-764 and branch [759, 760]. + """ + import pytest + + # Pass bytes instead of str (common mistake when reading files) + source_bytes = b"msg = Hello" + + with pytest.raises( + TypeError, + match=r"source must be str, not bytes.*Decode bytes to str", + ): + validate_resource(source_bytes) # type: ignore[arg-type] + + def test_validate_resource_raises_typeerror_for_none(self) -> None: + """Test validate_resource raises TypeError when passed None.""" + import pytest + + with pytest.raises( + TypeError, + match=r"source must be str, not NoneType", + ): + validate_resource(None) # type: ignore[arg-type] + + def test_validate_resource_raises_typeerror_for_int(self) -> None: + """Test validate_resource raises TypeError when passed int.""" + import pytest + + with pytest.raises( + TypeError, + match=r"source must be str, not int", + ): + validate_resource(42) # type: ignore[arg-type] + + def test_validate_resource_raises_typeerror_for_list(self) -> None: + """Test validate_resource raises TypeError when passed list.""" + import pytest + + with pytest.raises( + TypeError, + match=r"source must be str, not list", + ): + validate_resource(["msg = Hello"]) # type: ignore[arg-type] + + @given( + st.one_of( + st.binary(min_size=1, max_size=50), + st.integers(), + st.lists(st.text()), + st.none(), + ) + ) + def test_validate_resource_rejects_non_string_types_property( + self, invalid_input: bytes | int | list[str] | None + ) -> None: + """PROPERTY: validate_resource rejects all non-string types with TypeError. 
+ + Events emitted: + - input_type={type}: Type of invalid input tested + """ + import pytest + + # Emit event for input type diversity + event(f"input_type={type(invalid_input).__name__}") + + with pytest.raises(TypeError, match=r"source must be str"): + validate_resource(invalid_input) # type: ignore[arg-type] diff --git a/tests/validation_resource_cases/branch_coverage_test_missing_branches.py b/tests/validation_resource_cases/branch_coverage_test_missing_branches.py new file mode 100644 index 00000000..a1079cf4 --- /dev/null +++ b/tests/validation_resource_cases/branch_coverage_test_missing_branches.py @@ -0,0 +1,176 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_validation_resource.py.""" + +from tests.validation_resource_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# BRANCH COVERAGE: Test Missing Branches +# ============================================================================ + + +class TestMissingBranchCoverage: + """Test missing branch coverage in resource.py.""" + + def test_junk_without_span_line_56(self) -> None: + """Test Junk entry without span (branch 56->60). + + Line 56: if entry.span + When span is None, line/column remain None. + """ + from ftllexengine.syntax.ast import Junk, Resource + from ftllexengine.validation.resource import _extract_syntax_errors + + # Create Junk with no span + junk_no_span = Junk(content="invalid", span=None) + resource = Resource(entries=(junk_no_span,)) + + # Extract errors with LineOffsetCache + errors = _extract_syntax_errors(resource, LineOffsetCache("source")) + + # Should have error with line=None, column=None + assert len(errors) == 1 + assert errors[0].line is None + assert errors[0].column is None + + def test_term_references_undefined_message_line_187(self) -> None: + """Test term referencing undefined message (branch 187->186). 
+ + Line 187: if ref not in messages_dict + This tests the loop iteration when a term references a message. + Branch 187->186 is when the message DOES exist (if condition is False). + """ + from ftllexengine.syntax.ast import ( + Identifier, + Message, + MessageReference, + Pattern, + Placeable, + Term, + TextElement, + ) + from ftllexengine.validation.resource import _check_undefined_references + + # Create message that exists + existing_message = Message( + id=Identifier("existing_msg"), + value=Pattern(elements=(TextElement("text"),)), + attributes=(), + ) + + # Create term that references the existing message + term_with_msg_ref = Term( + id=Identifier("myterm"), + value=Pattern(elements=( + TextElement("text"), + Placeable( + expression=MessageReference(id=Identifier("existing_msg")) + ), # Reference to message that EXISTS + )), + attributes=(), + ) + + messages_dict = {"existing_msg": existing_message} # Message exists + terms_dict = {"myterm": term_with_msg_ref} + + # Check references with empty LineOffsetCache for AST-only testing + warnings = _check_undefined_references(messages_dict, terms_dict, LineOffsetCache("")) + + # Should have NO warnings (message exists) + # This tests branch 187->186 (if condition is False, continue to next iteration) + undefined_warnings = [w for w in warnings if "undefined" in w.message.lower()] + assert len(undefined_warnings) == 0 + + def test_duplicate_cycle_detection_line_243(self) -> None: + """Test cycle deduplication for messages. + + Verifies that the unified graph cycle detection produces exactly one + warning per unique cycle, not multiple warnings for the same cycle + detected from different starting points. + + Uses unified cross-type cycle detection. 
+ """ + from ftllexengine.syntax.ast import ( + Identifier, + Message, + MessageReference, + Pattern, + Placeable, + ) + + # Create circular messages: a -> b -> a + msg_a = Message( + id=Identifier("a"), + value=Pattern( + elements=(Placeable(expression=MessageReference(id=Identifier("b"))),) + ), + attributes=(), + ) + msg_b = Message( + id=Identifier("b"), + value=Pattern( + elements=(Placeable(expression=MessageReference(id=Identifier("a"))),) + ), + attributes=(), + ) + + messages_dict = {"a": msg_a, "b": msg_b} + terms_dict: dict[str, Term] = {} + + # Build dependency graph + graph = build_dependency_graph(messages_dict, terms_dict) + # Call the real function without mocking + warnings = _detect_circular_references(graph) + + # Should only have 1 warning (cycle a -> b -> a is detected once) + circular_warnings = [w for w in warnings if "circular" in w.message.lower()] + assert len(circular_warnings) == 1 + # Should mention both messages in the cycle + warning_msg = circular_warnings[0].message.lower() + assert "a" in warning_msg or "b" in warning_msg + + def test_duplicate_cycle_detection_line_257(self) -> None: + """Test cycle deduplication for terms. + + Verifies that term-only cycles are detected and deduplicated properly + in the unified graph. + + Uses unified cross-type cycle detection. 
+ """ + from ftllexengine.syntax.ast import ( + Identifier, + Pattern, + Placeable, + Term, + TermReference, + ) + + # Create circular terms: -ta -> -tb -> -ta + term_a = Term( + id=Identifier("ta"), + value=Pattern( + elements=(Placeable(expression=TermReference(id=Identifier("tb"))),) + ), + attributes=(), + ) + term_b = Term( + id=Identifier("tb"), + value=Pattern( + elements=(Placeable(expression=TermReference(id=Identifier("ta"))),) + ), + attributes=(), + ) + + messages_dict: dict[str, Message] = {} + terms_dict = {"ta": term_a, "tb": term_b} + + # Build dependency graph + graph = build_dependency_graph(messages_dict, terms_dict) + # Call the real function without mocking + warnings = _detect_circular_references(graph) + + # Should only have 1 warning (cycle ta -> tb -> ta is detected once) + circular_warnings = [w for w in warnings if "circular" in w.message.lower()] + assert len(circular_warnings) == 1 + # Should mention both terms in the cycle + warning_msg = circular_warnings[0].message.lower() + assert "ta" in warning_msg or "tb" in warning_msg diff --git a/tests/validation_resource_cases/compute_longest_paths_cycle_back_edge_handling_line_554_555.py b/tests/validation_resource_cases/compute_longest_paths_cycle_back_edge_handling_line_554_555.py new file mode 100644 index 00000000..4e6b7a7e --- /dev/null +++ b/tests/validation_resource_cases/compute_longest_paths_cycle_back_edge_handling_line_554_555.py @@ -0,0 +1,120 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_validation_resource.py.""" + +from tests.validation_resource_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# _compute_longest_paths: Cycle/Back-Edge Handling (Line 554-555) +# ============================================================================ + + +class TestComputeLongestPathsCycleHandling: + """Tests for _compute_longest_paths with cycles (back-edge detection). 
+ + Targets line 554-555: continue when node in in_stack (back-edge detection). + This is different from diamond patterns - actual cycles, not DAGs. + """ + + def test_simple_two_node_cycle(self) -> None: + """Two-node cycle: A->B->A triggers back-edge detection. + + When DFS processes A: + 1. Push (A, 0), mark A in_stack + 2. Push (B, 0), mark B in_stack + 3. B references A, so push (A, 0) + 4. A is already in in_stack -> triggers line 554 second condition + """ + graph = { + "msg:a": {"msg:b"}, + "msg:b": {"msg:a"}, + } + + result = _compute_longest_paths(graph) + + # Both nodes should be processed + assert "msg:a" in result + assert "msg:b" in result + + # Cycle is broken by back-edge detection + # A depends on B (depth 1), B's back-edge to A is skipped (depth 0) + assert result["msg:a"][0] == 1 + assert result["msg:b"][0] == 0 + + def test_three_node_cycle(self) -> None: + """Three-node cycle: A->B->C->A triggers back-edge on longer path.""" + graph = { + "msg:a": {"msg:b"}, + "msg:b": {"msg:c"}, + "msg:c": {"msg:a"}, + } + + result = _compute_longest_paths(graph) + + # All nodes processed + assert "msg:a" in result + assert "msg:b" in result + assert "msg:c" in result + + # Cycle is broken at C (back-edge to A skipped) + # A->B->C, C's back-edge to A is ignored + assert result["msg:a"][0] == 2 + assert result["msg:b"][0] == 1 + assert result["msg:c"][0] == 0 + + def test_self_referencing_node(self) -> None: + """Self-reference: A->A is simplest cycle case.""" + graph = { + "msg:a": {"msg:a"}, + } + + result = _compute_longest_paths(graph) + + assert "msg:a" in result + # Self-reference creates back-edge immediately + assert result["msg:a"][0] == 0 + + def test_cycle_with_tail(self) -> None: + """Cycle with tail: D->A->B->C->A (D leads into cycle).""" + graph = { + "msg:d": {"msg:a"}, + "msg:a": {"msg:b"}, + "msg:b": {"msg:c"}, + "msg:c": {"msg:a"}, + } + + result = _compute_longest_paths(graph) + + # All nodes processed + assert len(result) == 4 + + # D is 
outside cycle, has longest path through cycle + assert result["msg:d"][0] >= 3 + + @given( + cycle_size=st.integers(min_value=2, max_value=6), + ) + def test_cycle_property(self, cycle_size: int) -> None: + """Property: N-node cycle should not cause infinite loop. + + Creates a cycle: 0->1->2->...->N-1->0 + + Events emitted: + - cycle_size={n}: Size of the cycle + """ + # Emit event for fuzzer guidance + event(f"cycle_size={cycle_size}") + + graph: dict[str, set[str]] = {} + for i in range(cycle_size): + next_node = (i + 1) % cycle_size + graph[f"msg:n{i}"] = {f"msg:n{next_node}"} + + result = _compute_longest_paths(graph) + + # All nodes should be processed (no infinite loop) + assert len(result) == cycle_size + + # Each node should have finite depth + for i in range(cycle_size): + depth, _path = result[f"msg:n{i}"] + assert depth < cycle_size # Depth bounded by cycle size diff --git a/tests/validation_resource_cases/compute_longest_paths_diamond_pattern_line_556.py b/tests/validation_resource_cases/compute_longest_paths_diamond_pattern_line_556.py new file mode 100644 index 00000000..60f4b721 --- /dev/null +++ b/tests/validation_resource_cases/compute_longest_paths_diamond_pattern_line_556.py @@ -0,0 +1,117 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_validation_resource.py.""" + +from tests.validation_resource_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# _compute_longest_paths: Diamond Pattern (Line 556) +# ============================================================================ + + +class TestComputeLongestPathsDiamondPattern: + """Tests for _compute_longest_paths with diamond dependency patterns. + + Targets line 556: continue when node already in longest_path during + stack processing (not outer loop). + """ + + def test_diamond_pattern_triggers_inner_continue(self) -> None: + """Diamond pattern: A->B, A->C->B causes B to be encountered twice. 
+ + When DFS processes A: + 1. Descends to B first, computes longest_path[B] + 2. Descends to C, which references B + 3. C tries to process B, but B is already in longest_path + 4. This triggers line 556: continue (inner stack check) + + This is different from outer loop skip (line 545-546). + """ + # Create diamond: msg_a -> msg_b, msg_a -> msg_c -> msg_b + graph = { + "msg:a": {"msg:b", "msg:c"}, + "msg:b": set(), + "msg:c": {"msg:b"}, + } + + result = _compute_longest_paths(graph) + + # All nodes should be processed + assert "msg:a" in result + assert "msg:b" in result + assert "msg:c" in result + + # msg_b has no dependencies: depth 0 + assert result["msg:b"][0] == 0 + # msg_c depends on msg_b: depth 1 + assert result["msg:c"][0] == 1 + # msg_a has longest path through msg_c: depth 2 + assert result["msg:a"][0] == 2 + + def test_multi_level_diamond_pattern(self) -> None: + """Multi-level diamond: A->B->D, A->C->D ensures deep graph traversal.""" + graph = { + "msg:a": {"msg:b", "msg:c"}, + "msg:b": {"msg:d"}, + "msg:c": {"msg:d"}, + "msg:d": set(), + } + + result = _compute_longest_paths(graph) + + # msg_d is leaf: depth 0 + assert result["msg:d"][0] == 0 + # msg_b and msg_c both depend on msg_d: depth 1 + assert result["msg:b"][0] == 1 + assert result["msg:c"][0] == 1 + # msg_a depends on msg_b/msg_c: depth 2 + assert result["msg:a"][0] == 2 + + def test_complex_dag_with_shared_nodes(self) -> None: + """Complex DAG: A->B->E, A->C->E, A->D->E ensures multiple paths converge.""" + graph = { + "msg:a": {"msg:b", "msg:c", "msg:d"}, + "msg:b": {"msg:e"}, + "msg:c": {"msg:e"}, + "msg:d": {"msg:e"}, + "msg:e": set(), + } + + result = _compute_longest_paths(graph) + + # msg_e is referenced by 3 nodes + assert result["msg:e"][0] == 0 + assert result["msg:b"][0] == 1 + assert result["msg:c"][0] == 1 + assert result["msg:d"][0] == 1 + assert result["msg:a"][0] == 2 + + @given( + num_intermediate=st.integers(min_value=2, max_value=5), + ) + def 
test_diamond_pattern_property(self, num_intermediate: int) -> None: + """Property: Diamond with N intermediate nodes all converging to same leaf. + + Pattern: root -> {node1, node2, ..., nodeN} -> leaf + + Events emitted: + - num_intermediate={n}: Number of intermediate nodes + """ + # Emit event for fuzzer guidance + event(f"num_intermediate={num_intermediate}") + + graph: dict[str, set[str]] = { + "msg:root": {f"msg:mid{i}" for i in range(num_intermediate)}, + "msg:leaf": set(), + } + for i in range(num_intermediate): + graph[f"msg:mid{i}"] = {"msg:leaf"} + + result = _compute_longest_paths(graph) + + # Leaf has no dependencies + assert result["msg:leaf"][0] == 0 + # All intermediate nodes have depth 1 + for i in range(num_intermediate): + assert result[f"msg:mid{i}"][0] == 1 + # Root has depth 2 + assert result["msg:root"][0] == 2 diff --git a/tests/validation_resource_cases/core.py b/tests/validation_resource_cases/core.py new file mode 100644 index 00000000..595e7c83 --- /dev/null +++ b/tests/validation_resource_cases/core.py @@ -0,0 +1,581 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_validation_resource.py.""" + +from tests.validation_resource_cases import * # noqa: F403 - shared split test support + + +class TestSyntaxErrorExtraction: + """Test extraction of syntax errors from Junk entries.""" + + def test_single_junk_entry_creates_validation_error(self) -> None: + """Test that Junk entry is converted to ValidationError.""" + ftl = "invalid junk entry" + result = validate_resource(ftl) + + # Should have syntax error + assert len(result.errors) > 0 + assert any("parse" in err.code.name.lower() for err in result.errors) + + def test_multiple_junk_entries_create_multiple_errors(self) -> None: + """Test multiple Junk entries create multiple errors.""" + ftl = """ +invalid entry 1 +also bad = { broken +another junk line +""" + result = validate_resource(ftl) + # Should have multiple errors (exact count depends on parser) + assert 
len(result.errors) >= 1 + + def test_junk_with_span_information(self) -> None: + """Test that Junk errors include position information.""" + ftl = "msg = { broken syntax }" + result = validate_resource(ftl) + + # Should have error with position info + if len(result.errors) > 0: + error = result.errors[0] + # Line/column may be set + assert error.code is not None + + @given( + st.text(min_size=1, max_size=50).filter( + lambda s: "=" not in s and "{" not in s and "}" not in s + ) + ) + def test_invalid_syntax_property(self, invalid_text: str) -> None: + """PROPERTY: Invalid FTL syntax produces validation errors. + + Events emitted: + - has_errors={bool}: Whether validation produced errors + - has_whitespace={bool}: Whether input contains whitespace + """ + + result = validate_resource(invalid_text) + + # Emit events for semantic coverage + event(f"has_errors={len(result.errors) > 0}") + event(f"has_whitespace={any(c.isspace() for c in invalid_text)}") + + # Either parses as comment/junk or has errors + assert result is not None + + +class TestDuplicateIdDetection: + """Test duplicate message and term ID detection.""" + + def test_duplicate_message_ids_produce_warning(self) -> None: + """Test duplicate message IDs create warnings.""" + ftl = """ +msg = First value +msg = Second value +""" + result = validate_resource(ftl) + + # Should have warning about duplicate + assert len(result.warnings) > 0 + assert any( + "duplicate" in warn.message.lower() and "msg" in warn.message.lower() + for warn in result.warnings + ) + + def test_duplicate_term_ids_produce_warning(self) -> None: + """Test duplicate term IDs create warnings.""" + ftl = """ +-term = First value +-term = Second value +""" + result = validate_resource(ftl) + + # Should have warning + assert len(result.warnings) > 0 + assert any("duplicate" in warn.message.lower() for warn in result.warnings) + + def test_no_duplicate_warning_for_unique_ids(self) -> None: + """Test no duplicate warnings when IDs are 
unique.""" + ftl = """ +msg1 = First +msg2 = Second +-term1 = Term one +-term2 = Term two +""" + result = validate_resource(ftl) + + # Should not have duplicate warnings + duplicate_warnings = [ + w for w in result.warnings if "duplicate" in w.message.lower() + ] + assert len(duplicate_warnings) == 0 + + @given( + st.lists( + st.from_regex(r"[a-z]+", fullmatch=True), + min_size=2, + max_size=5, + ) + ) + def test_multiple_duplicates_property(self, ids: list[str]) -> None: + """PROPERTY: Multiple duplicate IDs all produce warnings. + + Events emitted: + - duplicate_count={n}: Number of duplicate entries (len - 1) + """ + + # Create FTL with all same ID + ftl_lines = [f"{ids[0]} = Value {i}" for i in range(len(ids))] + ftl = "\n".join(ftl_lines) + + # Emit event for duplicate count + event(f"duplicate_count={len(ids) - 1}") + + result = validate_resource(ftl) + # Should have warnings (at least len(ids) - 1 duplicates) + if len(ids) > 1: + assert len(result.warnings) >= 1 + + +class TestMessageWithoutValue: + """Test validation of messages without values (only attributes).""" + + def test_message_with_only_attributes_produces_warning(self) -> None: + """Test message with no value but attributes gets warning.""" + ftl = """ +msg = + .attr1 = Value 1 + .attr2 = Value 2 +""" + result = validate_resource(ftl) + + # Per FTL spec, message can have only attributes (valid) + # But implementation may warn about this pattern + # Check it doesn't crash + assert result is not None + + def test_message_with_value_and_attributes_no_warning(self) -> None: + """Test message with both value and attributes is valid.""" + ftl = """ +msg = Value + .attr = Attribute +""" + result = validate_resource(ftl) + + # Should be valid - no warnings about structure + assert result is not None + assert result.is_valid + + +class TestUndefinedReferenceDetection: + """Test detection of undefined message and term references.""" + + def test_undefined_message_reference_produces_warning(self) -> None: + 
"""Test reference to undefined message produces warning.""" + ftl = """ +msg = { other } +""" + result = validate_resource(ftl) + + # Should warn about undefined reference + assert len(result.warnings) > 0 + assert any( + "undefined" in warn.message.lower() or "reference" in warn.message.lower() + for warn in result.warnings + ) + + def test_undefined_term_reference_produces_warning(self) -> None: + """Test reference to undefined term produces warning.""" + ftl = """ +msg = { -undefined } +""" + result = validate_resource(ftl) + + # Should warn about undefined term + assert len(result.warnings) > 0 + assert any("undefined" in warn.message.lower() for warn in result.warnings) + + def test_defined_message_reference_no_warning(self) -> None: + """Test reference to defined message produces no warning.""" + ftl = """ +other = Other message +msg = { other } +""" + result = validate_resource(ftl) + + # Should not warn about this reference + undefined_warnings = [ + w for w in result.warnings if "undefined" in w.message.lower() + ] + assert len(undefined_warnings) == 0 + + def test_defined_term_reference_no_warning(self) -> None: + """Test reference to defined term produces no warning.""" + ftl = """ +-brand = Firefox +msg = { -brand } +""" + result = validate_resource(ftl) + + undefined_warnings = [ + w for w in result.warnings if "undefined" in w.message.lower() + ] + assert len(undefined_warnings) == 0 + + def test_term_referencing_undefined_message(self) -> None: + """Test term that references undefined message.""" + ftl = """ +-term = { undefined } +""" + result = validate_resource(ftl) + + # Should warn + assert any("undefined" in w.message.lower() for w in result.warnings) + + def test_term_referencing_undefined_term(self) -> None: + """Test term that references undefined term.""" + ftl = """ +-term1 = { -term2 } +""" + result = validate_resource(ftl) + + # Should warn + assert any("undefined" in w.message.lower() for w in result.warnings) + + +class 
TestCircularReferenceDetection: + """Test detection of circular dependencies.""" + + def test_direct_message_self_reference(self) -> None: + """Test message referencing itself.""" + ftl = """ +msg = { msg } +""" + result = validate_resource(ftl) + + # Should detect cycle + assert any("circular" in w.message.lower() for w in result.warnings) + + def test_indirect_message_cycle(self) -> None: + """Test indirect message cycle (A -> B -> A).""" + ftl = """ +a = { b } +b = { a } +""" + result = validate_resource(ftl) + + # Should detect cycle + assert any("circular" in w.message.lower() for w in result.warnings) + + def test_three_way_message_cycle(self) -> None: + """Test three-way message cycle (A -> B -> C -> A).""" + ftl = """ +a = { b } +b = { c } +c = { a } +""" + result = validate_resource(ftl) + + # Should detect cycle + assert any("circular" in w.message.lower() for w in result.warnings) + + def test_direct_term_self_reference(self) -> None: + """Test term referencing itself.""" + ftl = """ +-term = { -term } +""" + result = validate_resource(ftl) + + # Should detect cycle + assert any("circular" in w.message.lower() for w in result.warnings) + + def test_indirect_term_cycle(self) -> None: + """Test indirect term cycle.""" + ftl = """ +-a = { -b } +-b = { -a } +""" + result = validate_resource(ftl) + + # Should detect cycle + assert any("circular" in w.message.lower() for w in result.warnings) + + def test_no_cycle_in_tree_structure(self) -> None: + """Test tree structure (no cycles) produces no warnings.""" + ftl = """ +base = Base +a = { base } +b = { base } +c = { a } +""" + result = validate_resource(ftl) + + # Should not warn about cycles + circular_warnings = [ + w for w in result.warnings if "circular" in w.message.lower() + ] + assert len(circular_warnings) == 0 + + +class TestValidationResultStructure: + """Test ValidationResult structure and properties.""" + + def test_valid_ftl_has_no_errors(self) -> None: + """Test valid FTL produces 
is_valid=True.""" + ftl = """ +msg = Hello +-term = World +""" + result = validate_resource(ftl) + + assert result.is_valid + assert len(result.errors) == 0 + + def test_parse_error_sets_is_valid_false(self) -> None: + """Test parse errors set is_valid=False.""" + ftl = "invalid junk" + result = validate_resource(ftl) + + # Should have errors and be invalid + # (unless parser treats it as comment) + if len(result.errors) > 0: + assert not result.is_valid + + def test_warnings_dont_affect_is_valid(self) -> None: + """Test warnings don't set is_valid=False.""" + ftl = """ +msg = { undefined } +""" + result = validate_resource(ftl) + + # May have warnings but no errors + if len(result.errors) == 0: + assert result.is_valid + + def test_validation_result_has_all_fields(self) -> None: + """Test ValidationResult has all expected fields.""" + ftl = "msg = Test" + result = validate_resource(ftl) + + assert hasattr(result, "errors") + assert hasattr(result, "warnings") + assert hasattr(result, "annotations") + assert hasattr(result, "is_valid") + + assert isinstance(result.errors, tuple) + assert isinstance(result.warnings, tuple) + assert isinstance(result.annotations, tuple) + + +class TestCustomParserInstance: + """Test validate_resource with custom parser.""" + + def test_validate_with_custom_parser(self) -> None: + """Test validate_resource accepts custom parser.""" + from ftllexengine.syntax.parser import FluentParserV1 + + parser = FluentParserV1() + ftl = "msg = Test" + + result = validate_resource(ftl, parser=parser) + assert result is not None + assert result.is_valid + + def test_validate_creates_default_parser_if_none(self) -> None: + """Test validate_resource creates parser if not provided.""" + ftl = "msg = Test" + result = validate_resource(ftl) + + assert result is not None + + +class TestEdgeCases: + """Test edge cases and boundary conditions.""" + + def test_empty_resource(self) -> None: + """Test validation of empty resource.""" + ftl = "" + result = 
validate_resource(ftl) + + assert result is not None + # Empty resource is valid + assert result.is_valid + + def test_only_comments(self) -> None: + """Test resource with only comments.""" + ftl = """ +# Comment 1 +## Comment 2 +### Comment 3 +""" + result = validate_resource(ftl) + + assert result.is_valid + assert len(result.errors) == 0 + + def test_comments_and_valid_entries(self) -> None: + """Test mixed comments and entries.""" + ftl = """ +# Header comment +msg = Value + +## Section +-term = Term value +""" + result = validate_resource(ftl) + + assert result.is_valid + + def test_whitespace_only(self) -> None: + """Test resource with only whitespace.""" + ftl = " \n\n \n" # Spaces only, no tabs (tabs can be invalid FTL) + result = validate_resource(ftl) + + # Should be valid or may have parse errors depending on tab handling + assert result is not None + + @given( + st.text( + alphabet=st.sampled_from(" \n\r"), # Only safe whitespace chars + min_size=0, + max_size=100, + ) + ) + def test_whitespace_property(self, whitespace: str) -> None: + """PROPERTY: Whitespace-only resources don't crash validation. 
+ + Events emitted: + - length_category={bucket}: Length bucket (empty, short, medium, long) + - has_newlines={bool}: Whether input contains newlines + """ + + # Emit events for semantic coverage + length = len(whitespace) + if length == 0: + event("length_category=empty") + elif length < 10: + event("length_category=short") + elif length < 50: + event("length_category=medium") + else: + event("length_category=long") + + event(f"has_newlines={'\n' in whitespace}") + + result = validate_resource(whitespace) + # Should not crash (may have errors, but completes) + assert result is not None + + +class TestComplexScenarios: + """Test complex validation scenarios.""" + + def test_large_resource_with_multiple_issues(self) -> None: + """Test resource with multiple types of issues.""" + ftl = """ +# Valid comment +msg1 = Value + +# Duplicate ID +msg1 = Second value + +# Undefined reference +msg2 = { undefined } + +# Circular reference +a = { b } +b = { a } + +# Invalid syntax +invalid junk + +# Valid term +-term = Term +""" + result = validate_resource(ftl) + + # Should collect all issues + assert len(result.errors) + len(result.warnings) > 0 + + def test_deeply_nested_references(self) -> None: + """Test chain of references without cycles.""" + ftl = """ +msg1 = Value +msg2 = { msg1 } +msg3 = { msg2 } +msg4 = { msg3 } +msg5 = { msg4 } +""" + result = validate_resource(ftl) + + # Should be valid (no cycles) + circular_warnings = [ + w for w in result.warnings if "circular" in w.message.lower() + ] + assert len(circular_warnings) == 0 + + def test_message_and_term_with_same_base_name(self) -> None: + """Test message and term can have same name (different namespaces).""" + ftl = """ +brand = Message +-brand = Term +msg = { brand } and { -brand } +""" + result = validate_resource(ftl) + + # Should be valid - different namespaces + undefined_warnings = [ + w for w in result.warnings if "undefined" in w.message.lower() + ] + assert len(undefined_warnings) == 0 + + +class 
TestValidationIntegration: + """Integration tests combining multiple validation passes.""" + + def test_all_validation_passes_execute(self) -> None: + """Test all validation passes execute in sequence.""" + ftl = """ +# Syntax error +invalid + +# Duplicate +msg = First +msg = Second + +# Undefined reference +ref = { missing } + +# Circular reference +c1 = { c2 } +c2 = { c1 } +""" + result = validate_resource(ftl) + + # Should have collected issues from all passes + total_issues = len(result.errors) + len(result.warnings) + assert total_issues > 0 + + @given( + st.lists( + st.from_regex(r"[a-z]+", fullmatch=True), + min_size=1, + max_size=10, + unique=True, + ) + ) + def test_valid_messages_property(self, identifiers: list[str]) -> None: + """PROPERTY: Valid messages with unique IDs validate successfully. + + Events emitted: + - message_count={n}: Number of messages in resource + """ + + ftl_lines = [f"{id_} = Value for {id_}" for id_ in identifiers] + ftl = "\n".join(ftl_lines) + + # Emit event for message count + event(f"message_count={len(identifiers)}") + + result = validate_resource(ftl) + + # Should be valid + assert result.is_valid + assert len(result.errors) == 0 diff --git a/tests/validation_resource_cases/detect_circular_references_duplicate_cycle_keys_branch_425.py b/tests/validation_resource_cases/detect_circular_references_duplicate_cycle_keys_branch_425.py new file mode 100644 index 00000000..7a1b48bd --- /dev/null +++ b/tests/validation_resource_cases/detect_circular_references_duplicate_cycle_keys_branch_425.py @@ -0,0 +1,55 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_validation_resource.py.""" + +from tests.validation_resource_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# _detect_circular_references: Duplicate Cycle Keys (Branch 425) +# ============================================================================ + + +class 
TestDetectCircularReferencesDuplicateCycleKeys: + """Tests for _detect_circular_references duplicate cycle key handling. + + Targets branch 425->423: if cycle_key not in seen_cycle_keys (false branch). + """ + + def test_duplicate_cycle_from_detect_cycles(self) -> None: + """Mock detect_cycles to return duplicate cycles for defensive code test.""" + # Create a simple cycle + graph = { + "msg:a": {"msg:b"}, + "msg:b": {"msg:a"}, + } + + # Mock detect_cycles to yield the same cycle twice + with patch("ftllexengine.validation.resource.detect_cycles") as mock_detect: + # Return same cycle twice to test deduplication logic + cycle = ["msg:a", "msg:b", "msg:a"] + mock_detect.return_value = iter([cycle, cycle]) + + warnings = _detect_circular_references(graph) + + # Should deduplicate and return only one warning + assert len(warnings) == 1 + assert warnings[0].code == DiagnosticCode.VALIDATION_CIRCULAR_REFERENCE + + def test_cycle_key_deduplication_with_permutations(self) -> None: + """Cycle keys should deduplicate permutations (A->B->A == B->A->B).""" + # This tests the make_cycle_key function indirectly + # Create a three-node cycle (x -> y -> z -> x) to ensure consistent behavior + graph = { + "msg:x": {"msg:y"}, + "msg:y": {"msg:z"}, + "msg:z": {"msg:x"}, + } + + warnings = _detect_circular_references(graph) + + # Should detect exactly one cycle (not multiple rotations) + assert len(warnings) == 1 + cycle_warnings = [ + w for w in warnings + if w.code == DiagnosticCode.VALIDATION_CIRCULAR_REFERENCE + ] + assert len(cycle_warnings) == 1 diff --git a/tests/validation_resource_cases/detect_circular_references_malformed_node_formatting_branch_434.py b/tests/validation_resource_cases/detect_circular_references_malformed_node_formatting_branch_434.py new file mode 100644 index 00000000..0152d38a --- /dev/null +++ b/tests/validation_resource_cases/detect_circular_references_malformed_node_formatting_branch_434.py @@ -0,0 +1,62 @@ +# mypy: ignore-errors +"""Split test cases from 
tests/test_validation_resource.py.""" + +from tests.validation_resource_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# _detect_circular_references: Malformed Node Formatting (Branch 434) +# ============================================================================ + + +class TestDetectCircularReferencesMalformedNodes: + """Tests for _detect_circular_references with malformed graph nodes. + + Targets branch 434->431: node doesn't start with "msg:" or "term:". + """ + + def test_malformed_node_in_cycle_skipped_in_formatting(self) -> None: + """Malformed nodes (no msg:/term: prefix) handled gracefully in formatting.""" + # Directly test with malformed graph (shouldn't happen in practice) + # This tests defensive programming + graph = { + "msg:a": {"malformed_node"}, + "malformed_node": {"msg:a"}, + } + + # Mock detect_cycles to return a cycle with malformed node + with patch("ftllexengine.validation.resource.detect_cycles") as mock_detect: + cycle = ["msg:a", "malformed_node", "msg:a"] + mock_detect.return_value = iter([cycle]) + + warnings = _detect_circular_references(graph) + + # Should still create a warning + assert len(warnings) == 1 + assert warnings[0].code == DiagnosticCode.VALIDATION_CIRCULAR_REFERENCE + + # Context should only contain properly formatted nodes + # "malformed_node" should be skipped (no prefix match) + assert warnings[0].context is not None + # The formatted output should contain "a" but not include malformed_node + # (since it doesn't match msg: or term: prefixes) + assert "a" in warnings[0].context + + def test_mixed_valid_and_malformed_nodes_in_cycle(self) -> None: + """Cycle with mix of valid and malformed nodes formats valid ones only.""" + graph = { + "msg:valid1": {"term:valid2"}, + "term:valid2": {"bad_node"}, + "bad_node": {"msg:valid1"}, + } + + with patch("ftllexengine.validation.resource.detect_cycles") as mock_detect: + cycle = 
["msg:valid1", "term:valid2", "bad_node", "msg:valid1"] + mock_detect.return_value = iter([cycle]) + + warnings = _detect_circular_references(graph) + + assert len(warnings) == 1 + assert warnings[0].context is not None + # Should format valid nodes + assert "valid1" in warnings[0].context + assert "-valid2" in warnings[0].context diff --git a/tests/validation_resource_cases/integration_tests_with_real_ftl_structures.py b/tests/validation_resource_cases/integration_tests_with_real_ftl_structures.py new file mode 100644 index 00000000..5c3a1303 --- /dev/null +++ b/tests/validation_resource_cases/integration_tests_with_real_ftl_structures.py @@ -0,0 +1,187 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_validation_resource.py.""" + +from tests.validation_resource_cases import * # noqa: F403 - shared split test support + + # bad_node should be skipped in formatting (no prefix) + + +# ============================================================================ +# Integration Tests with Real FTL Structures +# ============================================================================ + + +class TestValidationResourceCompleteIntegration: + """Integration tests combining edge cases using real FTL AST structures.""" + + def test_diamond_dependency_in_real_messages(self) -> None: + """Diamond pattern with real Message objects.""" + # Create: msgA -> msgB, msgA -> msgC -> msgB + msg_b = Message( + id=Identifier("msgB"), + value=Pattern(elements=(TextElement(value="Base message"),)), + attributes=(), + ) + msg_c = Message( + id=Identifier("msgC"), + value=Pattern( + elements=(Placeable(expression=MessageReference(id=Identifier("msgB"))),) + ), + attributes=(), + ) + msg_a = Message( + id=Identifier("msgA"), + value=Pattern( + elements=( + Placeable(expression=MessageReference(id=Identifier("msgB"))), + TextElement(value=" and "), + Placeable(expression=MessageReference(id=Identifier("msgC"))), + ) + ), + attributes=(), + ) + + messages_dict = {"msgA": msg_a, 
"msgB": msg_b, "msgC": msg_c} + terms_dict: dict[str, Term] = {} + + # Build dependency graph + graph = build_dependency_graph(messages_dict, terms_dict) + + # Compute longest paths (exercises diamond pattern) + result = _compute_longest_paths(graph) + + # msgB is referenced by both msgA and msgC + assert "msg:msgB" in result + assert result["msg:msgB"][0] == 0 + assert result["msg:msgC"][0] == 1 + assert result["msg:msgA"][0] == 2 + + def test_cross_type_diamond_message_and_term(self) -> None: + """Diamond with cross-type references: msg -> term, msg -> msg -> term.""" + # Create: msgA -> termB, msgA -> msgC -> termB + term_b = Term( + id=Identifier("termB"), + value=Pattern(elements=(TextElement(value="Term value"),)), + attributes=(), + ) + msg_c = Message( + id=Identifier("msgC"), + value=Pattern( + elements=(Placeable(expression=TermReference(id=Identifier("termB"))),) + ), + attributes=(), + ) + msg_a = Message( + id=Identifier("msgA"), + value=Pattern( + elements=( + Placeable(expression=TermReference(id=Identifier("termB"))), + TextElement(value=" via "), + Placeable(expression=MessageReference(id=Identifier("msgC"))), + ) + ), + attributes=(), + ) + + messages_dict = {"msgA": msg_a, "msgC": msg_c} + terms_dict = {"termB": term_b} + + # Build dependency graph + graph = build_dependency_graph(messages_dict, terms_dict) + + # Compute longest paths + result = _compute_longest_paths(graph) + + # termB is referenced by both msgA and msgC + assert "term:termB" in result + assert result["term:termB"][0] == 0 + assert result["msg:msgC"][0] == 1 + assert result["msg:msgA"][0] == 2 + + @given( + num_messages=st.integers(min_value=3, max_value=8), + ) + def test_property_complex_dependency_graphs(self, num_messages: int) -> None: + """Property: Complex dependency graphs always compute without errors. 
+ + Events emitted: + - num_messages={n}: Number of messages in graph + """ + # Emit event for fuzzer guidance + event(f"num_messages={num_messages}") + + # Create a chain with some cross-references + messages_dict: dict[str, Message] = {} + + for i in range(num_messages): + if i == num_messages - 1: + # Last message has no references + value = Pattern(elements=(TextElement(value="End"),)) + elif i % 2 == 0: + # Even messages reference next message + value = Pattern( + elements=( + Placeable( + expression=MessageReference(id=Identifier(f"msg{i+1}")) + ), + ) + ) + else: + # Odd messages reference last message (creates diamond-like structure) + value = Pattern( + elements=( + Placeable( + expression=MessageReference( + id=Identifier(f"msg{num_messages-1}") + ) + ), + ) + ) + + messages_dict[f"msg{i}"] = Message( + id=Identifier(f"msg{i}"), + value=value, + attributes=(), + ) + + terms_dict: dict[str, Term] = {} + + # Build and compute - should not raise + graph = build_dependency_graph(messages_dict, terms_dict) + result = _compute_longest_paths(graph) + + # All messages should be in result + assert len(result) >= num_messages + + +class TestValidationResourceEdgeCases: + """Coverage for validation/resource.py edge cases.""" + + def test_junk_without_span(self) -> None: + """Junk entry without span uses None for line/column.""" + junk = Junk(content="invalid", span=None) + + class MockResource: + def __init__(self) -> None: + self.entries = [junk] + + errors = _extract_syntax_errors( + MockResource(), "invalid" # type: ignore[arg-type] + ) + assert len(errors) > 0 + assert errors[0].line is None + + def test_validation_with_invalid_ftl(self) -> None: + """Validation handles malformed FTL gracefully.""" + result = validate_resource("msg = { $val ->") + assert result is not None + + def test_cycle_deduplication(self) -> None: + """Circular references are detected without duplicates.""" + ftl = "\na = { b }\nb = { a }\nc = { d }\nd = { c }\n" + result = 
validate_resource(ftl) + circular = [ + w for w in result.warnings + if "circular" in w.message.lower() + ] + assert len(circular) >= 2 diff --git a/tests/validation_resource_cases/line_113_test_message_without_value_or_attributes.py b/tests/validation_resource_cases/line_113_test_message_without_value_or_attributes.py new file mode 100644 index 00000000..561d00c1 --- /dev/null +++ b/tests/validation_resource_cases/line_113_test_message_without_value_or_attributes.py @@ -0,0 +1,29 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_validation_resource.py.""" + +from tests.validation_resource_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# LINE 113: Test Message Without Value or Attributes +# ============================================================================ + + +class TestMessageWithoutValueOrAttributes: + """Test validation of message with neither value nor attributes (line 113).""" + + def test_message_without_value_or_attributes_raises_at_construction(self) -> None: + """Message with neither value nor attributes raises ValueError at construction. + + The __post_init__ validation now enforces this invariant at construction + time rather than deferring to the validator. + """ + import pytest + + from ftllexengine.syntax.ast import Identifier, Message + + with pytest.raises(ValueError, match="must have a value or at least one attribute"): + Message( + id=Identifier("empty_msg"), + value=None, + attributes=(), + ) diff --git a/tests/validation_resource_dependency_graph_cases/__init__.py b/tests/validation_resource_dependency_graph_cases/__init__.py new file mode 100644 index 00000000..c193290f --- /dev/null +++ b/tests/validation_resource_dependency_graph_cases/__init__.py @@ -0,0 +1,52 @@ +"""Dependency graph construction tests for validation/resource_graph.py. 
+ +Tests attribute-qualified reference resolution and known entry dependency +propagation to achieve 100% coverage of build_dependency_graph and +related helper functions. + +Coverage targets: +- Lines 507-509: _resolve_msg_ref with attribute-qualified references +- Lines 519-521: _resolve_term_ref with attribute-qualified references +- Line 572: known_msg_deps dependency propagation +- Line 582: known_term_deps dependency propagation +""" + +from __future__ import annotations + +from hypothesis import event, given +from hypothesis import strategies as st + +from ftllexengine.syntax.ast import ( + Attribute, + Identifier, + Message, + MessageReference, + Pattern, + Placeable, + SelectExpression, + Term, + TermReference, + TextElement, + Variant, +) +from ftllexengine.validation.resource import _detect_circular_references +from ftllexengine.validation.resource_graph import build_dependency_graph + +__all__ = [ + "Attribute", + "Identifier", + "Message", + "MessageReference", + "Pattern", + "Placeable", + "SelectExpression", + "Term", + "TermReference", + "TextElement", + "Variant", + "_detect_circular_references", + "build_dependency_graph", + "event", + "given", + "st", +] diff --git a/tests/validation_resource_dependency_graph_cases/core.py b/tests/validation_resource_dependency_graph_cases/core.py new file mode 100644 index 00000000..e6b0c150 --- /dev/null +++ b/tests/validation_resource_dependency_graph_cases/core.py @@ -0,0 +1,691 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_validation_resource_dependency_graph.py.""" + +from tests.validation_resource_dependency_graph_cases import * # noqa: F403 - shared split test support + + +class TestAttributeQualifiedMessageReferences: + """Test attribute-qualified message reference resolution (lines 507-509).""" + + def test_undefined_attribute_qualified_message_reference(self) -> None: + """Attribute-qualified reference to undefined message returns None. + + Tests branch 508->513: When "." 
is in ref but base message doesn't + exist in messages_dict or known_messages, _resolve_msg_ref returns None + and the reference is NOT added to the dependency graph. + """ + # Message referencing undefined message's attribute + ref_msg = Message( + id=Identifier("referrer"), + value=Pattern( + elements=( + Placeable( + expression=MessageReference( + id=Identifier("undefined"), + attribute=Identifier("tooltip"), + ) + ), + ) + ), + attributes=(), + ) + + messages_dict = {"referrer": ref_msg} + terms_dict: dict[str, Term] = {} + + graph = build_dependency_graph(messages_dict, terms_dict) + + # Should have "msg:referrer" node but NO dependency (undefined.tooltip ignored) + assert "msg:referrer" in graph + # The dependency set should be empty (undefined reference not added) + assert len(graph["msg:referrer"]) == 0 + # Should NOT have "msg:undefined.tooltip" node + assert "msg:undefined.tooltip" not in graph["msg:referrer"] + + def test_message_attribute_reference_creates_qualified_node(self) -> None: + """Message referencing another message's attribute creates qualified node. + + Tests lines 507-509: When a message reference contains "." (attribute + qualification), split it and create "msg:base.attr" node if base exists. 
+ """ + # Create base message with an attribute + base_msg = Message( + id=Identifier("base"), + value=Pattern(elements=(TextElement("value"),)), + attributes=( + Attribute( + id=Identifier("tooltip"), + value=Pattern(elements=(TextElement("tooltip text"),)), + ), + ), + ) + + # Create message that references base message's attribute + ref_msg = Message( + id=Identifier("referrer"), + value=Pattern( + elements=( + TextElement("text "), + Placeable( + expression=MessageReference( + id=Identifier("base"), + attribute=Identifier("tooltip"), + ) + ), + ) + ), + attributes=(), + ) + + messages_dict = {"base": base_msg, "referrer": ref_msg} + terms_dict: dict[str, Term] = {} + + graph = build_dependency_graph(messages_dict, terms_dict) + + # Should have "msg:referrer" node with dependency on "msg:base.tooltip" + assert "msg:referrer" in graph + assert "msg:base.tooltip" in graph["msg:referrer"] + + def test_message_attribute_reference_with_known_messages(self) -> None: + """Message referencing known message's attribute creates qualified node. + + Tests lines 507-509 with known_messages parameter: attribute-qualified + reference to a known message should resolve correctly. + """ + # Current resource has message referencing known message's attribute + ref_msg = Message( + id=Identifier("current"), + value=Pattern( + elements=( + Placeable( + expression=MessageReference( + id=Identifier("known"), + attribute=Identifier("attr"), + ) + ), + ) + ), + attributes=(), + ) + + messages_dict = {"current": ref_msg} + terms_dict: dict[str, Term] = {} + known_messages = frozenset({"known"}) + + graph = build_dependency_graph( + messages_dict, + terms_dict, + known_messages=known_messages, + ) + + # Should resolve "known.attr" to "msg:known.attr" node + assert "msg:current" in graph + assert "msg:known.attr" in graph["msg:current"] + + def test_bare_message_reference_creates_unqualified_node(self) -> None: + """Bare message reference (no attribute) creates unqualified node. 
+ + Regression test: ensure bare references still work correctly after + attribute-qualified support. + """ + msg_a = Message( + id=Identifier("a"), + value=Pattern(elements=(TextElement("value"),)), + attributes=(), + ) + msg_b = Message( + id=Identifier("b"), + value=Pattern( + elements=(Placeable(expression=MessageReference(id=Identifier("a"))),) + ), + attributes=(), + ) + + messages_dict = {"a": msg_a, "b": msg_b} + terms_dict: dict[str, Term] = {} + + graph = build_dependency_graph(messages_dict, terms_dict) + + # Should have "msg:b" -> "msg:a" (no attribute qualification) + assert "msg:b" in graph + assert "msg:a" in graph["msg:b"] + + +class TestAttributeQualifiedTermReferences: + """Test attribute-qualified term reference resolution (lines 519-521).""" + + def test_undefined_attribute_qualified_term_reference(self) -> None: + """Attribute-qualified reference to undefined term returns None. + + Tests branch 520->524: When "." is in ref but base term doesn't + exist in terms_dict or known_terms, _resolve_term_ref returns None + and the reference is NOT added to the dependency graph. + """ + # Message referencing undefined term's attribute + msg = Message( + id=Identifier("msg"), + value=Pattern( + elements=( + Placeable( + expression=TermReference( + id=Identifier("undefined"), + attribute=Identifier("variant"), + ) + ), + ) + ), + attributes=(), + ) + + messages_dict = {"msg": msg} + terms_dict: dict[str, Term] = {} + + graph = build_dependency_graph(messages_dict, terms_dict) + + # Should have "msg:msg" node but NO dependency (undefined term ignored) + assert "msg:msg" in graph + # The dependency set should be empty (undefined reference not added) + assert len(graph["msg:msg"]) == 0 + # Should NOT have "term:undefined.variant" node + assert "term:undefined.variant" not in graph["msg:msg"] + + def test_term_attribute_reference_creates_qualified_node(self) -> None: + """Message referencing term's attribute creates qualified node. 
+ + Tests lines 519-521: When a term reference contains "." (attribute + qualification), split it and create "term:base.attr" node if base exists. + """ + # Create base term with an attribute + base_term = Term( + id=Identifier("brand"), + value=Pattern(elements=(TextElement("Firefox"),)), + attributes=( + Attribute( + id=Identifier("short"), + value=Pattern(elements=(TextElement("FF"),)), + ), + ), + ) + + # Create message that references term's attribute + msg = Message( + id=Identifier("welcome"), + value=Pattern( + elements=( + TextElement("Welcome to "), + Placeable( + expression=TermReference( + id=Identifier("brand"), + attribute=Identifier("short"), + ) + ), + ) + ), + attributes=(), + ) + + messages_dict = {"welcome": msg} + terms_dict = {"brand": base_term} + + graph = build_dependency_graph(messages_dict, terms_dict) + + # Should have "msg:welcome" node with dependency on "term:brand.short" + assert "msg:welcome" in graph + assert "term:brand.short" in graph["msg:welcome"] + + def test_term_attribute_reference_with_known_terms(self) -> None: + """Message referencing known term's attribute creates qualified node. + + Tests lines 519-521 with known_terms parameter: attribute-qualified + reference to a known term should resolve correctly. 
+ """ + # Current resource has message referencing known term's attribute + msg = Message( + id=Identifier("current"), + value=Pattern( + elements=( + Placeable( + expression=TermReference( + id=Identifier("known_term"), + attribute=Identifier("variant"), + ) + ), + ) + ), + attributes=(), + ) + + messages_dict = {"current": msg} + terms_dict: dict[str, Term] = {} + known_terms = frozenset({"known_term"}) + + graph = build_dependency_graph( + messages_dict, + terms_dict, + known_terms=known_terms, + ) + + # Should resolve "known_term.variant" to "term:known_term.variant" node + assert "msg:current" in graph + assert "term:known_term.variant" in graph["msg:current"] + + def test_bare_term_reference_creates_unqualified_node(self) -> None: + """Bare term reference (no attribute) creates unqualified node. + + Regression test: ensure bare term references still work correctly. + """ + term_brand = Term( + id=Identifier("brand"), + value=Pattern(elements=(TextElement("Firefox"),)), + attributes=(), + ) + msg = Message( + id=Identifier("welcome"), + value=Pattern( + elements=( + Placeable(expression=TermReference(id=Identifier("brand"))), + ) + ), + attributes=(), + ) + + messages_dict = {"welcome": msg} + terms_dict = {"brand": term_brand} + + graph = build_dependency_graph(messages_dict, terms_dict) + + # Should have "msg:welcome" -> "term:brand" (no attribute qualification) + assert "msg:welcome" in graph + assert "term:brand" in graph["msg:welcome"] + + +class TestKnownMessageDependencies: + """Test known_msg_deps dependency propagation (line 572).""" + + def test_known_message_with_dependencies_propagates_to_graph(self) -> None: + """Known message with dependencies adds them to graph. + + Tests line 572: When known_msg_deps is provided and contains the + known message ID, copy those dependencies into the graph. 
+ """ + # Current resource has a simple message + current_msg = Message( + id=Identifier("current"), + value=Pattern(elements=(TextElement("value"),)), + attributes=(), + ) + + messages_dict = {"current": current_msg} + terms_dict: dict[str, Term] = {} + known_messages = frozenset({"known_a", "known_b"}) + + # known_a has dependencies on known_b and a term + known_msg_deps: dict[str, frozenset[str]] = { + "known_a": frozenset({"msg:known_b", "term:some_term"}), + } + + graph = build_dependency_graph( + messages_dict, + terms_dict, + known_messages=known_messages, + known_msg_deps=known_msg_deps, + ) + + # Should have known_a in graph with its dependencies + assert "msg:known_a" in graph + assert graph["msg:known_a"] == {"msg:known_b", "term:some_term"} + + def test_known_message_without_deps_entry_gets_empty_set(self) -> None: + """Known message not in known_msg_deps gets empty dependency set. + + Tests line 574: When known message is NOT in known_msg_deps dict, + it gets an empty set (no dependencies). + """ + current_msg = Message( + id=Identifier("current"), + value=Pattern(elements=(TextElement("value"),)), + attributes=(), + ) + + messages_dict = {"current": current_msg} + terms_dict: dict[str, Term] = {} + known_messages = frozenset({"known_orphan"}) + + # known_msg_deps exists but doesn't contain "known_orphan" + known_msg_deps: dict[str, frozenset[str]] = { + "some_other_msg": frozenset({"msg:dependency"}), + } + + graph = build_dependency_graph( + messages_dict, + terms_dict, + known_messages=known_messages, + known_msg_deps=known_msg_deps, + ) + + # Should have known_orphan in graph with empty dependencies + assert "msg:known_orphan" in graph + assert graph["msg:known_orphan"] == set() + + def test_known_message_already_in_graph_not_overwritten(self) -> None: + """Known message already in graph from current resource is not overwritten. 
+ + Tests the "if node_key not in graph" guard at line 569: if a known + message is also defined in the current resource, the current resource + definition takes precedence. + """ + # Current resource defines "shared" message + shared_msg = Message( + id=Identifier("shared"), + value=Pattern( + elements=( + Placeable(expression=MessageReference(id=Identifier("local"))), + ) + ), + attributes=(), + ) + local_msg = Message( + id=Identifier("local"), + value=Pattern(elements=(TextElement("value"),)), + attributes=(), + ) + + messages_dict = {"shared": shared_msg, "local": local_msg} + terms_dict: dict[str, Term] = {} + known_messages = frozenset({"shared"}) # "shared" is also in known + + # known_msg_deps says "shared" depends on something else + known_msg_deps: dict[str, frozenset[str]] = { + "shared": frozenset({"msg:different_dependency"}), + } + + graph = build_dependency_graph( + messages_dict, + terms_dict, + known_messages=known_messages, + known_msg_deps=known_msg_deps, + ) + + # Current resource definition should win - "shared" depends on "local" + assert "msg:shared" in graph + assert "msg:local" in graph["msg:shared"] + # Should NOT have the known_msg_deps dependency + assert "msg:different_dependency" not in graph["msg:shared"] + + +class TestKnownTermDependencies: + """Test known_term_deps dependency propagation (line 582).""" + + def test_known_term_with_dependencies_propagates_to_graph(self) -> None: + """Known term with dependencies adds them to graph. + + Tests line 582: When known_term_deps is provided and contains the + known term ID, copy those dependencies into the graph. 
+ """ + # Current resource has a simple message + current_msg = Message( + id=Identifier("current"), + value=Pattern(elements=(TextElement("value"),)), + attributes=(), + ) + + messages_dict = {"current": current_msg} + terms_dict: dict[str, Term] = {} + known_terms = frozenset({"known_term_a", "known_term_b"}) + + # known_term_a has dependencies + known_term_deps: dict[str, frozenset[str]] = { + "known_term_a": frozenset({"term:known_term_b", "msg:some_msg"}), + } + + graph = build_dependency_graph( + messages_dict, + terms_dict, + known_terms=known_terms, + known_term_deps=known_term_deps, + ) + + # Should have known_term_a in graph with its dependencies + assert "term:known_term_a" in graph + assert graph["term:known_term_a"] == {"term:known_term_b", "msg:some_msg"} + + def test_known_term_without_deps_entry_gets_empty_set(self) -> None: + """Known term not in known_term_deps gets empty dependency set. + + Tests line 584: When known term is NOT in known_term_deps dict, + it gets an empty set (no dependencies). + """ + current_msg = Message( + id=Identifier("current"), + value=Pattern(elements=(TextElement("value"),)), + attributes=(), + ) + + messages_dict = {"current": current_msg} + terms_dict: dict[str, Term] = {} + known_terms = frozenset({"known_orphan_term"}) + + # known_term_deps exists but doesn't contain "known_orphan_term" + known_term_deps: dict[str, frozenset[str]] = { + "some_other_term": frozenset({"term:dependency"}), + } + + graph = build_dependency_graph( + messages_dict, + terms_dict, + known_terms=known_terms, + known_term_deps=known_term_deps, + ) + + # Should have known_orphan_term in graph with empty dependencies + assert "term:known_orphan_term" in graph + assert graph["term:known_orphan_term"] == set() + + def test_known_term_already_in_graph_not_overwritten(self) -> None: + """Known term already in graph from current resource is not overwritten. 
+ + Tests the "if node_key not in graph" guard at line 579: if a known + term is also defined in the current resource, the current resource + definition takes precedence. + """ + # Current resource defines "shared_term" term + shared_term = Term( + id=Identifier("shared_term"), + value=Pattern( + elements=( + Placeable(expression=TermReference(id=Identifier("local_term"))), + ) + ), + attributes=(), + ) + local_term = Term( + id=Identifier("local_term"), + value=Pattern(elements=(TextElement("value"),)), + attributes=(), + ) + + messages_dict: dict[str, Message] = {} + terms_dict = {"shared_term": shared_term, "local_term": local_term} + known_terms = frozenset({"shared_term"}) # "shared_term" is also in known + + # known_term_deps says "shared_term" depends on something else + known_term_deps: dict[str, frozenset[str]] = { + "shared_term": frozenset({"term:different_dependency"}), + } + + graph = build_dependency_graph( + messages_dict, + terms_dict, + known_terms=known_terms, + known_term_deps=known_term_deps, + ) + + # Current resource definition should win + assert "term:shared_term" in graph + assert "term:local_term" in graph["term:shared_term"] + # Should NOT have the known_term_deps dependency + assert "term:different_dependency" not in graph["term:shared_term"] + + +class TestCrossResourceCycleDetectionWithDependencies: + """Integration test: cross-resource cycle detection with known deps.""" + + def test_cross_resource_cycle_detected_via_known_deps(self) -> None: + """Cycle spanning current and known resources detected. + + Integration test: Current resource references known message, known + message (via known_msg_deps) references current resource, creating + a cross-resource cycle. 
+ """ + # Current resource: msg_a -> known_b + msg_a = Message( + id=Identifier("a"), + value=Pattern( + elements=( + Placeable(expression=MessageReference(id=Identifier("b"))), + ) + ), + attributes=(), + ) + + messages_dict = {"a": msg_a} + terms_dict: dict[str, Term] = {} + known_messages = frozenset({"b"}) + + # Known message "b" references "a" (creating cycle: a -> b -> a) + known_msg_deps: dict[str, frozenset[str]] = { + "b": frozenset({"msg:a"}), + } + + graph = build_dependency_graph( + messages_dict, + terms_dict, + known_messages=known_messages, + known_msg_deps=known_msg_deps, + ) + + # Detect cycles + warnings = _detect_circular_references(graph) + + # Should detect the cross-resource cycle + circular_warnings = [w for w in warnings if "circular" in w.message.lower()] + assert len(circular_warnings) == 1 + # Should mention both messages in the cycle + warning_msg = circular_warnings[0].message.lower() + assert ("a" in warning_msg and "b" in warning_msg) or "circular" in warning_msg + + +class TestAttributeReferenceProperties: + """Property-based tests for attribute-qualified references.""" + + @given( + st.from_regex(r"[a-z]+", fullmatch=True), + st.from_regex(r"[a-z]+", fullmatch=True), + ) + def test_message_attribute_reference_roundtrip( + self, base_id: str, attr_id: str + ) -> None: + """PROPERTY: Message attribute reference creates qualified graph node. + + Attribute-qualified message reference "base.attr" should always + create a "msg:base.attr" node when "base" exists. 
+ + Events emitted: + - id_length_base={bucket}: Length category of base identifier + - id_length_attr={bucket}: Length category of attribute identifier + """ + # Emit events for identifier length diversity + event(f"id_length_base={'short' if len(base_id) <= 3 else 'long'}") + event(f"id_length_attr={'short' if len(attr_id) <= 3 else 'long'}") + + base_msg = Message( + id=Identifier(base_id), + value=Pattern(elements=(TextElement("value"),)), + attributes=( + Attribute( + id=Identifier(attr_id), + value=Pattern(elements=(TextElement("attr value"),)), + ), + ), + ) + + ref_msg = Message( + id=Identifier("ref"), + value=Pattern( + elements=( + Placeable( + expression=MessageReference( + id=Identifier(base_id), + attribute=Identifier(attr_id), + ) + ), + ) + ), + attributes=(), + ) + + messages_dict = {base_id: base_msg, "ref": ref_msg} + terms_dict: dict[str, Term] = {} + + graph = build_dependency_graph(messages_dict, terms_dict) + + # Property: qualified node exists + expected_node = f"msg:{base_id}.{attr_id}" + assert "msg:ref" in graph + assert expected_node in graph["msg:ref"] + + @given( + st.from_regex(r"[a-z]+", fullmatch=True), + st.from_regex(r"[a-z]+", fullmatch=True), + ) + def test_term_attribute_reference_roundtrip( + self, base_id: str, attr_id: str + ) -> None: + """PROPERTY: Term attribute reference creates qualified graph node. + + Attribute-qualified term reference "-base.attr" should always + create a "term:base.attr" node when "-base" exists. 
+ + Events emitted: + - term_id_length_base={bucket}: Length category of base term identifier + - term_id_length_attr={bucket}: Length category of attribute identifier + """ + # Emit events for identifier length diversity + event(f"term_id_length_base={'short' if len(base_id) <= 3 else 'long'}") + event(f"term_id_length_attr={'short' if len(attr_id) <= 3 else 'long'}") + + base_term = Term( + id=Identifier(base_id), + value=Pattern(elements=(TextElement("value"),)), + attributes=( + Attribute( + id=Identifier(attr_id), + value=Pattern(elements=(TextElement("attr value"),)), + ), + ), + ) + + msg = Message( + id=Identifier("msg"), + value=Pattern( + elements=( + Placeable( + expression=TermReference( + id=Identifier(base_id), + attribute=Identifier(attr_id), + ) + ), + ) + ), + attributes=(), + ) + + messages_dict = {"msg": msg} + terms_dict = {base_id: base_term} + + graph = build_dependency_graph(messages_dict, terms_dict) + + # Property: qualified node exists + expected_node = f"term:{base_id}.{attr_id}" + assert "msg:msg" in graph + assert expected_node in graph["msg:msg"] diff --git a/tests/validation_resource_dependency_graph_cases/core_2.py b/tests/validation_resource_dependency_graph_cases/core_2.py new file mode 100644 index 00000000..9537e4e0 --- /dev/null +++ b/tests/validation_resource_dependency_graph_cases/core_2.py @@ -0,0 +1,146 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_validation_resource_dependency_graph.py.""" + +from tests.validation_resource_dependency_graph_cases import * # noqa: F403 - shared split test support + + +class TestComplexAttributeReferences: + """Test complex scenarios with attribute references.""" + + def test_message_with_multiple_attribute_references(self) -> None: + """Message referencing multiple attributes from different messages.""" + msg_a = Message( + id=Identifier("a"), + value=Pattern(elements=(TextElement("A"),)), + attributes=( + Attribute( + id=Identifier("tooltip"), + 
value=Pattern(elements=(TextElement("A tooltip"),)), + ), + ), + ) + + msg_b = Message( + id=Identifier("b"), + value=Pattern(elements=(TextElement("B"),)), + attributes=( + Attribute( + id=Identifier("label"), + value=Pattern(elements=(TextElement("B label"),)), + ), + ), + ) + + # Message referencing multiple attributes + msg_complex = Message( + id=Identifier("complex"), + value=Pattern( + elements=( + TextElement("Value"), + Placeable( + expression=MessageReference( + id=Identifier("a"), + attribute=Identifier("tooltip"), + ) + ), + TextElement(" and "), + Placeable( + expression=MessageReference( + id=Identifier("b"), + attribute=Identifier("label"), + ) + ), + ) + ), + attributes=(), + ) + + messages_dict = {"a": msg_a, "b": msg_b, "complex": msg_complex} + terms_dict: dict[str, Term] = {} + + graph = build_dependency_graph(messages_dict, terms_dict) + + # Should have dependencies on both qualified attributes + assert "msg:complex" in graph + assert "msg:a.tooltip" in graph["msg:complex"] + assert "msg:b.label" in graph["msg:complex"] + + def test_message_attribute_itself_has_references(self) -> None: + """Message attribute containing references creates attribute-level node.""" + base_msg = Message( + id=Identifier("base"), + value=Pattern(elements=(TextElement("base value"),)), + attributes=(), + ) + + # Message with attribute that references another message + msg_with_attr_ref = Message( + id=Identifier("complex"), + value=Pattern(elements=(TextElement("value"),)), + attributes=( + Attribute( + id=Identifier("tooltip"), + value=Pattern( + elements=( + TextElement("See "), + Placeable(expression=MessageReference(id=Identifier("base"))), + ) + ), + ), + ), + ) + + messages_dict = {"base": base_msg, "complex": msg_with_attr_ref} + terms_dict: dict[str, Term] = {} + + graph = build_dependency_graph(messages_dict, terms_dict) + + # Should have "msg:complex.tooltip" node with dependency on "msg:base" + assert "msg:complex.tooltip" in graph + assert "msg:base" in 
graph["msg:complex.tooltip"] + + def test_select_expression_in_attribute_creates_variant_dependencies(self) -> None: + """Attribute with select expression creates variant-level dependencies.""" + base_msg = Message( + id=Identifier("base"), + value=Pattern(elements=(TextElement("base"),)), + attributes=(), + ) + + # Message with attribute containing select expression + msg_with_select_attr = Message( + id=Identifier("selector"), + value=Pattern(elements=(TextElement("value"),)), + attributes=( + Attribute( + id=Identifier("dynamic"), + value=Pattern( + elements=( + Placeable( + expression=SelectExpression( + selector=MessageReference(id=Identifier("base")), + variants=( + Variant( + key=Identifier("one"), + value=Pattern( + elements=(TextElement("variant"),) + ), + default=True, + ), + ), + ) + ), + ) + ), + ), + ), + ) + + messages_dict = {"base": base_msg, "selector": msg_with_select_attr} + terms_dict: dict[str, Term] = {} + + graph = build_dependency_graph(messages_dict, terms_dict) + + # Should have "msg:selector.dynamic" node with dependency on "msg:base" + assert "msg:selector.dynamic" in graph + assert "msg:base" in graph["msg:selector.dynamic"] diff --git a/tests/validation_resource_dependency_graph_cases/cycle_detection_branch_coverage.py b/tests/validation_resource_dependency_graph_cases/cycle_detection_branch_coverage.py new file mode 100644 index 00000000..01d93fd3 --- /dev/null +++ b/tests/validation_resource_dependency_graph_cases/cycle_detection_branch_coverage.py @@ -0,0 +1,182 @@ +# mypy: ignore-errors +"""Split test cases from tests/test_validation_resource_dependency_graph.py.""" + +from tests.validation_resource_dependency_graph_cases import * # noqa: F403 - shared split test support + +# ============================================================================ +# CYCLE DETECTION BRANCH COVERAGE +# ============================================================================ + + +class TestValidationResourceBranchCoverage: + """Test 
validation/resource.py cycle detection branch coverage.""" + + def test_cycle_detection_loop_iterations(self) -> None: + """Cycle detection handles term-to-term cycle correctly.""" + msg_a = Message( + id=Identifier("a"), + value=Pattern( + elements=( + Placeable( + expression=TermReference(id=Identifier("x")) + ), + ) + ), + attributes=(), + ) + msg_b = Message( + id=Identifier("b"), + value=Pattern( + elements=( + Placeable( + expression=TermReference(id=Identifier("y")) + ), + ) + ), + attributes=(), + ) + + term_x = Term( + id=Identifier("x"), + value=Pattern( + elements=( + Placeable( + expression=TermReference(id=Identifier("y")) + ), + ) + ), + attributes=(), + ) + term_y = Term( + id=Identifier("y"), + value=Pattern( + elements=( + Placeable( + expression=TermReference(id=Identifier("x")) + ), + ) + ), + attributes=(), + ) + + messages_dict = {"a": msg_a, "b": msg_b} + terms_dict = {"x": term_x, "y": term_y} + + graph = build_dependency_graph(messages_dict, terms_dict) + warnings = _detect_circular_references(graph) + + cycle_warnings = [w for w in warnings if "circular" in w.message.lower()] + assert len(cycle_warnings) >= 1 + + def test_cross_type_cycle_detection(self) -> None: + """Cycle detection finds message-to-term-to-message cycle.""" + msg_a = Message( + id=Identifier("a"), + value=Pattern( + elements=( + Placeable(expression=TermReference(id=Identifier("t"))), + ) + ), + attributes=(), + ) + + term_t = Term( + id=Identifier("t"), + value=Pattern( + elements=( + Placeable(expression=MessageReference(id=Identifier("a"))), + ) + ), + attributes=(), + ) + + messages_dict = {"a": msg_a} + terms_dict = {"t": term_t} + + graph = build_dependency_graph(messages_dict, terms_dict) + warnings = _detect_circular_references(graph) + + assert any("circular" in w.message.lower() for w in warnings) + + +class TestResourceValidationBranchCoverageExtended: + """Extended resource validation branch coverage tests.""" + + def 
test_cycle_detection_with_multiple_independent_cycles(self) -> None: + """Cycle detection finds both of two independent cycles in the same resource.""" + msg_a = Message( + id=Identifier("a"), + value=Pattern( + elements=(Placeable(expression=MessageReference(id=Identifier("b"))),) + ), + attributes=(), + ) + msg_b = Message( + id=Identifier("b"), + value=Pattern( + elements=(Placeable(expression=MessageReference(id=Identifier("a"))),) + ), + attributes=(), + ) + + msg_x = Message( + id=Identifier("x"), + value=Pattern( + elements=(Placeable(expression=MessageReference(id=Identifier("y"))),) + ), + attributes=(), + ) + msg_y = Message( + id=Identifier("y"), + value=Pattern( + elements=(Placeable(expression=MessageReference(id=Identifier("x"))),) + ), + attributes=(), + ) + + messages_dict = {"a": msg_a, "b": msg_b, "x": msg_x, "y": msg_y} + terms_dict: dict[str, Term] = {} + + graph = build_dependency_graph(messages_dict, terms_dict) + warnings = _detect_circular_references(graph) + + cycle_warnings = [w for w in warnings if "circular" in w.message.lower()] + assert len(cycle_warnings) >= 2 + + def test_no_cycles_in_linear_chain(self) -> None: + """Linear reference chain without cycles produces no cycle warnings.""" + msg_a = Message( + id=Identifier("a"), + value=Pattern( + elements=(Placeable(expression=MessageReference(id=Identifier("b"))),) + ), + attributes=(), + ) + msg_b = Message( + id=Identifier("b"), + value=Pattern( + elements=(Placeable(expression=MessageReference(id=Identifier("c"))),) + ), + attributes=(), + ) + msg_c = Message( + id=Identifier("c"), + value=Pattern( + elements=(Placeable(expression=MessageReference(id=Identifier("d"))),) + ), + attributes=(), + ) + msg_d = Message( + id=Identifier("d"), + value=Pattern(elements=(TextElement("End"),)), + attributes=(), + ) + + messages_dict = {"a": msg_a, "b": msg_b, "c": msg_c, "d": msg_d} + terms_dict: dict[str, Term] = {} + + graph = build_dependency_graph(messages_dict, terms_dict) + warnings = 
_detect_circular_references(graph) + + cycle_warnings = [w for w in warnings if "circular" in w.message.lower()] + assert len(cycle_warnings) == 0 diff --git a/uv.lock b/uv.lock index ab5de8f2..cac2377d 100644 --- a/uv.lock +++ b/uv.lock @@ -75,16 +75,16 @@ wheels = [ [[package]] name = "build" -version = "1.4.3" +version = "1.5.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "colorama", marker = "os_name == 'nt'" }, { name = "packaging" }, { name = "pyproject-hooks" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/3f/16/4b272700dea44c1d2e8ca963ebb3c684efe22b3eba8cfa31c5fdb60de707/build-1.4.3.tar.gz", hash = "sha256:5aa4231ae0e807efdf1fd0623e07366eca2ab215921345a2e38acdd5d0fa0a74", size = 89314, upload-time = "2026-04-10T21:25:40.857Z" } +sdist = { url = "https://files.pythonhosted.org/packages/78/e0/df5e171f685f82f37b12e1f208064e24244911079d7b767447d1af7e0d70/build-1.5.0.tar.gz", hash = "sha256:302c22c3ba2a0fd5f3911918651341ebb3896176cbdec15bd421f80b1afc7647", size = 89796, upload-time = "2026-04-30T03:18:25.17Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/b2/30/f169e1d8b2071beaf8b97088787e30662b1d8fb82f8c0941d14678c0cbf1/build-1.4.3-py3-none-any.whl", hash = "sha256:1bc22b19b383303de8f2c8554c9a32894a58d3f185fe3756b0b20d255bee9a38", size = 26171, upload-time = "2026-04-10T21:25:39.671Z" }, + { url = "https://files.pythonhosted.org/packages/0d/fe/6bea5c9162869c5beba5d9c8abbed835ec85bf1ec1fba05a3822325c45f3/build-1.5.0-py3-none-any.whl", hash = "sha256:13f3eecb844759ab66efec90ca17639bbf14dc06cb2fdf37a9010322d9c50a6f", size = 26018, upload-time = "2026-04-30T03:18:23.644Z" }, ] [[package]] @@ -301,7 +301,7 @@ wheels = [ [[package]] name = "ftllexengine" -version = "0.165.0" +version = "0.166.0" source = { editable = "." 
} [package.optional-dependencies] @@ -326,6 +326,7 @@ dev = [ ] fuzz = [ { name = "hypofuzz" }, + { name = "hypothesis", extra = ["cli"] }, ] release = [ { name = "build" }, @@ -340,18 +341,21 @@ provides-extras = ["babel"] atheris = [{ name = "atheris", marker = "python_full_version < '3.14'", specifier = ">=3.0.0" }] dev = [ { name = "babel", specifier = ">=2.18.0,<3.0.0" }, - { name = "hypothesis", specifier = ">=6.152.1" }, + { name = "hypothesis", specifier = ">=6.152.4" }, { name = "mypy", specifier = ">=1.20.2" }, { name = "psutil", specifier = ">=7.2.2" }, { name = "pytest", specifier = ">=9.0.3" }, { name = "pytest-benchmark", specifier = ">=5.2.3" }, { name = "pytest-cov", specifier = ">=7.1.0" }, - { name = "ruff", specifier = ">=0.15.11" }, + { name = "ruff", specifier = ">=0.15.12" }, { name = "types-psutil", specifier = ">=7.2.2.20260408" }, ] -fuzz = [{ name = "hypofuzz", specifier = ">=25.11.1" }] +fuzz = [ + { name = "hypofuzz", specifier = ">=25.11.1" }, + { name = "hypothesis", extras = ["cli"], specifier = ">=6.152.4" }, +] release = [ - { name = "build", specifier = ">=1.4.3" }, + { name = "build", specifier = ">=1.5.0" }, { name = "twine", specifier = ">=6.2.0" }, ] @@ -432,14 +436,14 @@ wheels = [ [[package]] name = "hypothesis" -version = "6.152.1" +version = "6.152.4" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "sortedcontainers" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/64/b1/c32bcddb9aab9e3abc700f1f56faf14e7655c64a16ca47701a57362276ea/hypothesis-6.152.1.tar.gz", hash = "sha256:4f4ed934eee295dd84ee97592477d23e8dc03e9f12ae0ee30a4e7c9ef3fca3b0", size = 465029, upload-time = "2026-04-14T22:29:24.062Z" } +sdist = { url = "https://files.pythonhosted.org/packages/fa/c7/3147bd903d6b18324a016d43a259cf5b4bb4545e1ead6773dc8a0374e70a/hypothesis-6.152.4.tar.gz", hash = "sha256:31c8f9ce619716f543e2710b489b1633c833586641d9e6c94cee03f109a5afc4", size = 466444, upload-time = "2026-04-27T20:18:37.594Z" } 
wheels = [ - { url = "https://files.pythonhosted.org/packages/5d/83/860fb3075e00b0fc19a22a2301bc3c96f00437558c3911bdd0a3573a4a53/hypothesis-6.152.1-py3-none-any.whl", hash = "sha256:40a3619d9e0cb97b018857c7986f75cf5de2e5ec0fa8a0b172d00747758f749e", size = 530752, upload-time = "2026-04-14T22:29:20.893Z" }, + { url = "https://files.pythonhosted.org/packages/19/89/0f50dd0d92e8a7dffc24f69ab910ff81db89b2f082ba42682bd57695e4d2/hypothesis-6.152.4-py3-none-any.whl", hash = "sha256:e730fd93c7578182efadc7f90b3c5437ee4d55edf738930eb5043c81ac1d97e8", size = 532145, upload-time = "2026-04-27T20:18:35.043Z" }, ] [package.optional-dependencies] @@ -1047,27 +1051,27 @@ wheels = [ [[package]] name = "ruff" -version = "0.15.11" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/e4/8d/192f3d7103816158dfd5ea50d098ef2aec19194e6cbccd4b3485bdb2eb2d/ruff-0.15.11.tar.gz", hash = "sha256:f092b21708bf0e7437ce9ada249dfe688ff9a0954fc94abab05dcea7dcd29c33", size = 4637264, upload-time = "2026-04-16T18:46:26.58Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/02/1e/6aca3427f751295ab011828e15e9bf452200ac74484f1db4be0197b8170b/ruff-0.15.11-py3-none-linux_armv6l.whl", hash = "sha256:e927cfff503135c558eb581a0c9792264aae9507904eb27809cdcff2f2c847b7", size = 10607943, upload-time = "2026-04-16T18:46:05.967Z" }, - { url = "https://files.pythonhosted.org/packages/e7/26/1341c262e74f36d4e84f3d6f4df0ac68cd53331a66bfc5080daa17c84c0b/ruff-0.15.11-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:7a1b5b2938d8f890b76084d4fa843604d787a912541eae85fd7e233398bbb73e", size = 10988592, upload-time = "2026-04-16T18:46:00.742Z" }, - { url = "https://files.pythonhosted.org/packages/03/71/850b1d6ffa9564fbb6740429bad53df1094082fe515c8c1e74b6d8d05f18/ruff-0.15.11-py3-none-macosx_11_0_arm64.whl", hash = "sha256:d4176f3d194afbdaee6e41b9ccb1a2c287dba8700047df474abfbe773825d1cb", size = 10338501, upload-time = "2026-04-16T18:46:03.723Z" }, - { url = 
"https://files.pythonhosted.org/packages/f2/11/cc1284d3e298c45a817a6aadb6c3e1d70b45c9b36d8d9cce3387b495a03a/ruff-0.15.11-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:3b17c886fb88203ced3afe7f14e8d5ae96e9d2f4ccc0ee66aa19f2c2675a27e4", size = 10670693, upload-time = "2026-04-16T18:46:41.941Z" }, - { url = "https://files.pythonhosted.org/packages/ce/9e/f8288b034ab72b371513c13f9a41d9ba3effac54e24bfb467b007daee2ca/ruff-0.15.11-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:49fafa220220afe7758a487b048de4c8f9f767f37dfefad46b9dd06759d003eb", size = 10416177, upload-time = "2026-04-16T18:46:21.717Z" }, - { url = "https://files.pythonhosted.org/packages/85/71/504d79abfd3d92532ba6bbe3d1c19fada03e494332a59e37c7c2dabae427/ruff-0.15.11-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f2ab8427e74a00d93b8bda1307b1e60970d40f304af38bccb218e056c220120d", size = 11221886, upload-time = "2026-04-16T18:46:15.086Z" }, - { url = "https://files.pythonhosted.org/packages/43/5a/947e6ab7a5ad603d65b474be15a4cbc6d29832db5d762cd142e4e3a74164/ruff-0.15.11-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:195072c0c8e1fc8f940652073df082e37a5d9cb43b4ab1e4d0566ab8977a13b7", size = 12075183, upload-time = "2026-04-16T18:46:07.944Z" }, - { url = "https://files.pythonhosted.org/packages/9f/a1/0b7bb6268775fdd3a0818aee8efd8f5b4e231d24dd4d528ced2534023182/ruff-0.15.11-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:a3a0996d486af3920dec930a2e7daed4847dfc12649b537a9335585ada163e9e", size = 11516575, upload-time = "2026-04-16T18:46:31.687Z" }, - { url = "https://files.pythonhosted.org/packages/30/c3/bb5168fc4d233cc06e95f482770d0f3c87945a0cd9f614b90ea8dc2f2833/ruff-0.15.11-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:1bef2cb556d509259f1fe440bb9cd33c756222cf0a7afe90d15edf0866702431", size = 11306537, upload-time = "2026-04-16T18:46:36.988Z" }, - { url = 
"https://files.pythonhosted.org/packages/e4/92/4cfae6441f3967317946f3b788136eecf093729b94d6561f963ed810c82e/ruff-0.15.11-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:030d921a836d7d4a12cf6e8d984a88b66094ccb0e0f17ddd55067c331191bf19", size = 11296813, upload-time = "2026-04-16T18:46:24.182Z" }, - { url = "https://files.pythonhosted.org/packages/43/26/972784c5dde8313acde8ac71ba8ac65475b85db4a2352a76c9934361f9bc/ruff-0.15.11-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:0e783b599b4577788dbbb66b9addcef87e9a8832f4ce0c19e34bf55543a2f890", size = 10633136, upload-time = "2026-04-16T18:46:39.802Z" }, - { url = "https://files.pythonhosted.org/packages/5b/53/3985a4f185020c2f367f2e08a103032e12564829742a1b417980ce1514a0/ruff-0.15.11-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:ae90592246625ba4a34349d68ec28d4400d75182b71baa196ddb9f82db025ef5", size = 10424701, upload-time = "2026-04-16T18:46:10.381Z" }, - { url = "https://files.pythonhosted.org/packages/d3/57/bf0dfb32241b56c83bb663a826133da4bf17f682ba8c096973065f6e6a68/ruff-0.15.11-py3-none-musllinux_1_2_i686.whl", hash = "sha256:1f111d62e3c983ed20e0ca2e800f8d77433a5b1161947df99a5c2a3fb60514f0", size = 10873887, upload-time = "2026-04-16T18:46:29.157Z" }, - { url = "https://files.pythonhosted.org/packages/02/05/e48076b2a57dc33ee8c7a957296f97c744ca891a8ffb4ffb1aaa3b3f517d/ruff-0.15.11-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:06f483d6646f59eaffba9ae30956370d3a886625f511a3108994000480621d1c", size = 11404316, upload-time = "2026-04-16T18:46:19.462Z" }, - { url = "https://files.pythonhosted.org/packages/88/27/0195d15fe7a897cbcba0904792c4b7c9fdd958456c3a17d2ea6093716a9a/ruff-0.15.11-py3-none-win32.whl", hash = "sha256:476a2aa56b7da0b73a3ee80b6b2f0e19cce544245479adde7baa65466664d5f3", size = 10655535, upload-time = "2026-04-16T18:46:12.47Z" }, - { url = "https://files.pythonhosted.org/packages/3a/5e/c927b325bd4c1d3620211a4b96f47864633199feed60fa936025ab27e090/ruff-0.15.11-py3-none-win_amd64.whl", hash = 
"sha256:8b6756d88d7e234fb0c98c91511aae3cd519d5e3ed271cae31b20f39cb2a12a3", size = 11779692, upload-time = "2026-04-16T18:46:17.268Z" }, - { url = "https://files.pythonhosted.org/packages/63/b6/aeadee5443e49baa2facd51131159fd6301cc4ccfc1541e4df7b021c37dd/ruff-0.15.11-py3-none-win_arm64.whl", hash = "sha256:063fed18cc1bbe0ee7393957284a6fe8b588c6a406a285af3ee3f46da2391ee4", size = 11032614, upload-time = "2026-04-16T18:46:34.487Z" }, +version = "0.15.12" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/99/43/3291f1cc9106f4c63bdce7a8d0df5047fe8422a75b091c16b5e9355e0b11/ruff-0.15.12.tar.gz", hash = "sha256:ecea26adb26b4232c0c2ca19ccbc0083a68344180bba2a600605538ce51a40a6", size = 4643852, upload-time = "2026-04-24T18:17:14.305Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/c3/6e/e78ffb61d4686f3d96ba3df2c801161843746dcbcbb17a1e927d4829312b/ruff-0.15.12-py3-none-linux_armv6l.whl", hash = "sha256:f86f176e188e94d6bdbc09f09bfd9dc729059ad93d0e7390b5a73efe19f8861c", size = 10640713, upload-time = "2026-04-24T18:17:22.841Z" }, + { url = "https://files.pythonhosted.org/packages/ae/08/a317bc231fb9e7b93e4ef3089501e51922ff88d6936ce5cf870c4fe55419/ruff-0.15.12-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:e3bcd123364c3770b8e1b7baaf343cc99a35f197c5c6e8af79015c666c423a6c", size = 11069267, upload-time = "2026-04-24T18:17:30.105Z" }, + { url = "https://files.pythonhosted.org/packages/aa/a4/f828e9718d3dce1f5f11c39c4f65afd32783c8b2aebb2e3d259e492c47bd/ruff-0.15.12-py3-none-macosx_11_0_arm64.whl", hash = "sha256:fe87510d000220aa1ed530d4448a7c696a0cae1213e5ec30e5874287b66557b5", size = 10397182, upload-time = "2026-04-24T18:17:07.177Z" }, + { url = "https://files.pythonhosted.org/packages/71/e0/3310fc6d1b5e1fdea22bf3b1b807c7e187b581021b0d7d4514cccdb5fb71/ruff-0.15.12-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:84a1630093121375a3e2a95b4a6dc7b59e2b4ee76216e32d81aae550a832d002", 
size = 10758012, upload-time = "2026-04-24T18:16:55.759Z" }, + { url = "https://files.pythonhosted.org/packages/11/c1/a606911aee04c324ddaa883ae418f3569792fd3c4a10c50e0dd0a2311e1e/ruff-0.15.12-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:fb129f40f114f089ebe0ca56c0d251cf2061b17651d464bb6478dc01e69f11f5", size = 10447479, upload-time = "2026-04-24T18:16:51.677Z" }, + { url = "https://files.pythonhosted.org/packages/9d/68/4201e8444f0894f21ab4aeeaee68aa4f10b51613514a20d80bd628d57e88/ruff-0.15.12-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:b0c862b172d695db7598426b8af465e7e9ac00a3ea2a3630ee67eb82e366aaa6", size = 11234040, upload-time = "2026-04-24T18:17:16.529Z" }, + { url = "https://files.pythonhosted.org/packages/34/ff/8a6d6cf4ccc23fd67060874e832c18919d1557a0611ebef03fdb01fff11e/ruff-0.15.12-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:2849ea9f3484c3aca43a82f484210370319e7170df4dfe4843395ddf6c57bc33", size = 12087377, upload-time = "2026-04-24T18:17:04.944Z" }, + { url = "https://files.pythonhosted.org/packages/85/f6/c669cf73f5152f623d34e69866a46d5e6185816b19fcd5b6dd8a2d299922/ruff-0.15.12-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:9e77c7e51c07fe396826d5969a5b846d9cd4c402535835fb6e21ce8b28fef847", size = 11367784, upload-time = "2026-04-24T18:17:25.409Z" }, + { url = "https://files.pythonhosted.org/packages/e8/39/c61d193b8a1daaa8977f7dea9e8d8ba866e02ea7b65d32f6861693aa4c12/ruff-0.15.12-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:83b2f4f2f3b1026b5fb449b467d9264bf22067b600f7b6f41fc5958909f449d0", size = 11344088, upload-time = "2026-04-24T18:17:12.258Z" }, + { url = "https://files.pythonhosted.org/packages/c2/8d/49afab3645e31e12c590acb6d3b5b69d7aab5b81926dbaf7461f9441f37a/ruff-0.15.12-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:9ba3b8f1afd7e2e43d8943e55f249e13f9682fde09711644a6e7290eb4f3e339", size = 11271770, upload-time = 
"2026-04-24T18:17:02.457Z" }, + { url = "https://files.pythonhosted.org/packages/46/06/33f41fe94403e2b755481cdfb9b7ef3e4e0ed031c4581124658d935d52b4/ruff-0.15.12-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:e852ba9fdc890655e1d78f2df1499efbe0e54126bd405362154a75e2bde159c5", size = 10719355, upload-time = "2026-04-24T18:17:27.648Z" }, + { url = "https://files.pythonhosted.org/packages/0d/59/18aa4e014debbf559670e4048e39260a85c7fcee84acfd761ac01e7b8d35/ruff-0.15.12-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:dd8aed930da53780d22fc70bdf84452c843cf64f8cb4eb38984319c24c5cd5fd", size = 10462758, upload-time = "2026-04-24T18:17:32.347Z" }, + { url = "https://files.pythonhosted.org/packages/25/e7/cc9f16fd0f3b5fddcbd7ec3d6ae30c8f3fde1047f32a4093a98d633c6570/ruff-0.15.12-py3-none-musllinux_1_2_i686.whl", hash = "sha256:01da3988d225628b709493d7dc67c3b9b12c0210016b08690ef9bd27970b262b", size = 10953498, upload-time = "2026-04-24T18:17:20.674Z" }, + { url = "https://files.pythonhosted.org/packages/72/7a/a9ba7f98c7a575978698f4230c5e8cc54bbc761af34f560818f933dafa0c/ruff-0.15.12-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:9cae0f92bd5700d1213188b31cd3bdd2b315361296d10b96b8e2337d3d11f53e", size = 11447765, upload-time = "2026-04-24T18:17:09.755Z" }, + { url = "https://files.pythonhosted.org/packages/ea/f9/0ae446942c846b8266059ad8a30702a35afae55f5cdc54c5adf8d7afdc27/ruff-0.15.12-py3-none-win32.whl", hash = "sha256:d0185894e038d7043ba8fd6aee7499ece6462dc0ea9f1e260c7451807c714c20", size = 10657277, upload-time = "2026-04-24T18:17:18.591Z" }, + { url = "https://files.pythonhosted.org/packages/33/f1/9614e03e1cdcbf9437570b5400ced8a720b5db22b28d8e0f1bda429f660d/ruff-0.15.12-py3-none-win_amd64.whl", hash = "sha256:c87a162d61ab3adca47c03f7f717c68672edec7d1b5499e652331780fe74950d", size = 11837758, upload-time = "2026-04-24T18:17:00.113Z" }, + { url = 
"https://files.pythonhosted.org/packages/c0/98/6beb4b351e472e5f4c4613f7c35a5290b8be2497e183825310c4c3a3984b/ruff-0.15.12-py3-none-win_arm64.whl", hash = "sha256:a538f7a82d061cee7be55542aca1d86d1393d55d81d4fcc314370f4340930d4f", size = 11120821, upload-time = "2026-04-24T18:16:57.979Z" }, ] [[package]]