feat: per-library scheduled metadata refresh jobs by AshDevFr · Pull Request #9 · AshDevFr/codex

AshDevFr · 2026-05-04T00:45:53Z

Summary

Adds a per-library scheduled metadata refresh system: typed background jobs that periodically re-fetch and re-apply metadata from configured providers, with field-group/allowlist control, optional re-matching, and a dry-run mode.
Introduces a new library_jobs entity, repository, task handler, and scheduler wiring, replacing the earlier single per-library refresh config with N typed jobs per library.
Exposes full CRUD + run-now + dry-run endpoints under /api/v1/libraries/.../jobs, and a new web UI for managing jobs, editing refresh settings, and previewing dry-run results.

Details

DB / migration: new library_jobs table (m20260503_000071_add_metadata_refresh_config) with entity (library_jobs.rs) and repository (library_jobs.rs).
Services: typed job model and validation in src/services/library_jobs/; metadata apply path extended with field-group resolver and dry-run support (apply.rs, field_groups.rs, refresh_planner.rs).
Tasks / scheduler: new refresh_library_metadata handler (refresh_library_metadata.rs) and per-library scheduling in scheduler/mod.rs; supports MatchingStrategy including a re-match path.
API: CRUD, run-now, and dry-run endpoints in handlers/library_jobs.rs, wired under routes/libraries.rs; per-provider field-allowlist overrides supported.
Web: new Library Jobs page (pages/LibraryJobs.tsx), job list, metadata refresh editor, and dry-run modal (components/library-jobs/); recency guard surfaced in hours rather than seconds.
Tests: API integration tests (tests/api/library_jobs.rs) and apply/dry-run service tests (tests/services/apply_dry_run.rs).

Lay the groundwork for scheduled, scoped, per-library metadata refreshes by adding a `metadata_refresh_config` JSON column to the `libraries` table, a strongly typed `MetadataRefreshConfig` with safe defaults (disabled, "use existing source IDs only", default field groups ratings/status/counts), and library-repository accessors that always return a usable config (default on NULL or parse error, with the parse error logged). The schema reserves a `per_provider_overrides` slot for future per-provider field allowlists so the eventual override feature won't need a migration. A `MetadataRefreshConfigPatch` and `merge_partial` helper are included for the upcoming PATCH endpoint; they are gated with `#[allow(dead_code)]` until that endpoint lands. Tests cover serde round-trips, partial JSON, parse-helper edge cases, PATCH merge semantics including null clears, and end-to-end repository read/write/overwrite plus default-on-NULL and invalid-JSON fallback.
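The "always return a usable config" behaviour can be sketched as follows. This is a minimal illustration, not the PR's code: the struct is a simplified stand-in, the `config_or_default` helper name is hypothetical, and the parse step is modelled as an already-computed `Result` so the example needs no JSON crate.

```rust
/// Hypothetical, simplified stand-in for the PR's `MetadataRefreshConfig`.
#[derive(Debug, Clone, PartialEq)]
pub struct MetadataRefreshConfig {
    pub enabled: bool,
    pub existing_source_ids_only: bool,
    pub field_groups: Vec<String>,
}

impl Default for MetadataRefreshConfig {
    /// Safe defaults as described above: disabled, existing source IDs only,
    /// and the ratings/status/counts field groups.
    fn default() -> Self {
        Self {
            enabled: false,
            existing_source_ids_only: true,
            field_groups: vec![
                "ratings".to_string(),
                "status".to_string(),
                "counts".to_string(),
            ],
        }
    }
}

/// Hypothetical accessor: `None` models a NULL column, `Some(Err(_))` models
/// a stored value that failed to parse. Both fall back to the default, and
/// the parse error is logged rather than propagated.
pub fn config_or_default(
    stored: Option<Result<MetadataRefreshConfig, String>>,
) -> MetadataRefreshConfig {
    match stored {
        Some(Ok(config)) => config,
        Some(Err(err)) => {
            eprintln!("invalid metadata_refresh_config, falling back to default: {}", err);
            MetadataRefreshConfig::default()
        }
        None => MetadataRefreshConfig::default(),
    }
}
```

With this shape, callers never handle a missing or corrupt config themselves; they always receive either the stored value or the disabled default.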

Wire the work behind the scheduled metadata refresh: a new RefreshLibraryMetadata task variant, a stateless RefreshPlanner that decides which (series, provider) pairs to touch, and a worker handler that fetches per-series metadata from plugins and applies it through the existing MetadataApplier. The planner resolves "plugin:<name>" provider strings to enabled plugins (unresolved entries surface as plan-level skips), batches external-id lookups, and emits typed skip reasons (NoExternalId, RecentlySynced, ProviderUnavailable). The handler walks the plan sequentially with a per-pair timeout, isolates per-series errors via a typed PairError, bumps series_external_ids.last_synced_at after each successful apply so the recency guard works on the next run, and returns a stable RunSummary JSON. TaskProgressEvent updates are emitted per pair so the frontend can show live progress. The handler is registered in the worker alongside PluginAutoMatch and inherits the optional ThumbnailService for cover updates. Loose-mode re-matching is intentionally deferred until a MatchingStrategy enum lands; the current behavior counts those pairs as skipped_no_external_id. Tests cover the planner's filter combinations and the handler's short-circuit, error, and skip paths.
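A rough sketch of how a stateless planner might classify series for one provider. The skip-reason names come from the description above; everything else (`Planned`, `ExternalId`, `plan_provider`, integer timestamps, and emitting `ProviderUnavailable` per series) is an assumption made for illustration.

```rust
use std::collections::HashMap;

/// Skip reasons named in the commit message.
#[derive(Debug, PartialEq)]
pub enum SkipReason {
    NoExternalId,
    RecentlySynced,
    ProviderUnavailable,
}

/// Hypothetical plan entry: either refresh the pair or skip it with a reason.
#[derive(Debug, PartialEq)]
pub enum Planned {
    Refresh { series_id: u32, external_id: String },
    Skip { series_id: u32, reason: SkipReason },
}

/// Hypothetical stored link between a series and the provider.
pub struct ExternalId {
    pub external_id: String,
    pub last_synced_at: Option<u64>, // seconds since epoch, assumed
}

/// Pure function: all inputs are passed in, nothing is fetched or written.
pub fn plan_provider(
    series_ids: &[u32],
    provider_enabled: bool,
    external_ids: &HashMap<u32, ExternalId>, // batched lookup result
    now: u64,
    skip_recently_synced_within_s: u64,
) -> Vec<Planned> {
    series_ids
        .iter()
        .map(|&series_id| {
            if !provider_enabled {
                return Planned::Skip { series_id, reason: SkipReason::ProviderUnavailable };
            }
            let Some(link) = external_ids.get(&series_id) else {
                return Planned::Skip { series_id, reason: SkipReason::NoExternalId };
            };
            // Recency guard: skip pairs synced within the configured window.
            let recently_synced = link
                .last_synced_at
                .map_or(false, |ts| now.saturating_sub(ts) < skip_recently_synced_within_s);
            if recently_synced {
                Planned::Skip { series_id, reason: SkipReason::RecentlySynced }
            } else {
                Planned::Refresh { series_id, external_id: link.external_id.clone() }
            }
        })
        .collect()
}
```

Because the handler bumps `last_synced_at` after each successful apply, a pair planned as `Refresh` on one run would normally come back as `RecentlySynced` if the next run starts inside the window.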

…etadata apply

Introduce the user-facing field-group taxonomy and the dry-run plumbing the scheduled per-library refresh needs to surface "what would change?" to admins before they commit to a schedule.

- New `field_groups` module with a closed `FieldGroup` enum (Identifiers, Descriptive, Status, Counts, Ratings, Cover, Tags, Genres, AgeRating, Classification, Publisher, ExternalRefs) and a `fields_for_groups` resolver that expands group names to the camelCase fields the applier recognises. A self-test pins every returned field to a real `should_apply_field` call site so the mapping cannot drift silently.
- `MetadataApplier::apply` gains `ApplyOptions::dry_run` and a `MetadataApplyResult::dry_run_report` carrying `FieldChange` entries. Every write site is gated: dry-run records the prospective change instead of touching the DB. Locks and permission denials still flow through `skipped_fields` via the same code path, so the report stays focused on writes that would actually happen.
- HTTP series apply endpoint accepts `dryRun` in the request body and returns `dryRunReport` in the response (omitted on real applies). External-ID upsert is skipped on dry-run so a preview never mutates state. Book apply explicitly rejects `dryRun=true` with 400 because `BookMetadataApplier` does not honour the flag yet.
- `refresh_planner::fields_filter_from_config` now delegates to the field-group resolver, so a config like `{ field_groups: ["ratings","status"] }` correctly produces `{"rating","externalRatings","status","year"}` for the applier.

Tests cover the field-group mapping, dry-run write gating with row-state snapshots, the new apply API contract, and the planner's group expansion. Existing apply call sites are updated to set `dry_run: false` so the default behaviour is unchanged.
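The group expansion and the dry-run write gate can be sketched together. This is an illustrative reduction, not the PR's implementation: only two of the twelve groups are shown, the split of fields between `Ratings` and `Status` is a guess (the text only states their combined expansion), and `apply_field` is a hypothetical helper standing in for one gated write site.

```rust
use std::collections::BTreeSet;

/// Two of the twelve groups, enough to reproduce the example in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FieldGroup {
    Ratings,
    Status,
}

impl FieldGroup {
    /// Assumed per-group split; only the union of both groups is documented.
    pub fn fields(self) -> &'static [&'static str] {
        match self {
            FieldGroup::Ratings => &["rating", "externalRatings"],
            FieldGroup::Status => &["status", "year"],
        }
    }
}

/// Expand a list of groups to the set of camelCase field names.
pub fn fields_for_groups(groups: &[FieldGroup]) -> BTreeSet<&'static str> {
    groups.iter().flat_map(|g| g.fields().iter().copied()).collect()
}

/// One prospective write recorded during a dry run.
#[derive(Debug, PartialEq)]
pub struct FieldChange {
    pub field: &'static str,
    pub before: Option<String>,
    pub after: Option<String>,
}

/// Hypothetical gated write site: on dry-run the change is recorded and the
/// stored value is left untouched; otherwise the value is written.
pub fn apply_field(
    field: &'static str,
    slot: &mut Option<String>,
    after: Option<String>,
    dry_run: bool,
    report: &mut Vec<FieldChange>,
) {
    if *slot == after {
        return; // nothing would change, so nothing to write or report
    }
    if dry_run {
        report.push(FieldChange { field, before: slot.clone(), after });
    } else {
        *slot = after;
    }
}
```

Routing both modes through the same function is what keeps the preview honest: the dry run exercises the same comparison as the real write and differs only in the final branch.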

…efresh

Introduce an explicit `MatchingStrategy::{ExistingExternalIdOnly, AllowReMatch}` enum on `ApplyOptions` so the scheduled refresh can opt into "don't re-match, only refresh series that already have a stored provider ID" behavior, while every existing manual-apply call site keeps its historical "search if no ID" semantics via the `AllowReMatch` default.

The `RefreshLibraryMetadataHandler` now picks the strategy from `metadata_refresh_config.existing_source_ids_only`. Strict mode keeps the planner's plan-time gating; loose mode invokes a new `rematch_external_id` helper that calls `metadata/series/match`, then feeds the returned external ID into the existing `metadata/series/get` + `MetadataApplier` flow. A match miss is surfaced as a new `no_match_candidate` bucket on `RunSummary` so it stays distinguishable from the strict-mode `no_external_id` skip.

A new `RefreshSkipReason` taxonomy module provides a stable, serde-friendly superset of the planner's narrower skip enum, with a `From<PlannerSkipReason>` lift so planner reasons can be widened into the public taxonomy that the upcoming HTTP dry-run endpoint will surface.

Manual-apply call sites in the v1 plugin handlers and `plugin_auto_match` were converted to `..Default::default()` so they pick up the new field without changing behavior. Tests cover the default-strategy guard, the loose-mode re-match branch, the skip-reason identifiers and serde shape, and the planner-to-public lift.
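The strict versus loose branching might look roughly like this. The enum and its `AllowReMatch` default follow the description; `strategy_from_config`, `Resolution`, and `resolve_external_id` are hypothetical names, and the plugin's match call is modelled as a closure.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MatchingStrategy {
    ExistingExternalIdOnly,
    AllowReMatch,
}

impl Default for MatchingStrategy {
    /// Manual-apply call sites keep their "search if no ID" behaviour.
    fn default() -> Self {
        MatchingStrategy::AllowReMatch
    }
}

/// Hypothetical mapping from the config flag to a strategy.
pub fn strategy_from_config(existing_source_ids_only: bool) -> MatchingStrategy {
    if existing_source_ids_only {
        MatchingStrategy::ExistingExternalIdOnly
    } else {
        MatchingStrategy::AllowReMatch
    }
}

/// Hypothetical outcome of resolving which external ID to fetch.
#[derive(Debug, PartialEq)]
pub enum Resolution {
    Use(String),
    NoExternalId,     // strict mode, nothing stored
    NoMatchCandidate, // loose mode, re-match found nothing
}

/// `rematch` stands in for the call to the plugin's match method and is only
/// invoked in loose mode when no ID is stored.
pub fn resolve_external_id(
    strategy: MatchingStrategy,
    stored: Option<String>,
    rematch: impl FnOnce() -> Option<String>,
) -> Resolution {
    match (stored, strategy) {
        (Some(id), _) => Resolution::Use(id),
        (None, MatchingStrategy::ExistingExternalIdOnly) => Resolution::NoExternalId,
        (None, MatchingStrategy::AllowReMatch) => {
            rematch().map_or(Resolution::NoMatchCandidate, Resolution::Use)
        }
    }
}
```

Keeping `NoExternalId` and `NoMatchCandidate` as separate outcomes mirrors the two distinct buckets on the run summary.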

Register a cron entry per library whose `metadata_refresh_config.enabled = true`, firing `RefreshLibraryMetadata { library_id }` on each tick. The new loader runs from `start()` so `reload_schedules()` rebuilds refresh entries alongside the scan ones with no extra wiring on the API side. The cron-job closure runs a skip-if-already-running guard before enqueuing: if a `pending` or `processing` `refresh_library_metadata` task already exists for the same library, the firing is dropped with a warn log. This is belt and suspenders on top of `TaskRepository::enqueue` deduplication so operators see an explicit log line when a cron tick fires while the previous run is still draining. Per-library config errors (invalid cron, invalid timezone, missing library row) log a warning and skip just that library instead of aborting scheduler boot. Per-library timezone overrides fall back to the server default when parsing fails, matching the existing `add_library_schedule` behavior. Adds integration tests for enabled/disabled configs, per-library timezone, invalid cron tolerance, multi-library coexistence, and reload pickup, plus unit tests on the in-flight detector.
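The skip-if-already-running guard reduces to a small predicate over the task table. The sketch below uses an in-memory `Vec` in place of the real task repository; `TaskRow`, `TaskStatus`, `refresh_in_flight`, and `on_cron_tick` are hypothetical names chosen for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Pending,
    Processing,
    Completed,
    Failed,
}

/// Hypothetical, minimal view of a row in the task table.
pub struct TaskRow {
    pub task_type: String,
    pub library_id: u32,
    pub status: TaskStatus,
}

/// True when a pending or processing refresh already exists for the library.
pub fn refresh_in_flight(tasks: &[TaskRow], library_id: u32) -> bool {
    tasks.iter().any(|t| {
        t.task_type == "refresh_library_metadata"
            && t.library_id == library_id
            && matches!(t.status, TaskStatus::Pending | TaskStatus::Processing)
    })
}

/// What the cron closure does on each tick: drop the firing with a warning if
/// a run is still draining, otherwise enqueue. Returns whether it enqueued.
pub fn on_cron_tick(tasks: &mut Vec<TaskRow>, library_id: u32) -> bool {
    if refresh_in_flight(tasks, library_id) {
        eprintln!("refresh already in flight for library {}, dropping tick", library_id);
        return false;
    }
    tasks.push(TaskRow {
        task_type: "refresh_library_metadata".to_string(),
        library_id,
        status: TaskStatus::Pending,
    });
    true
}
```

The guard is scoped per library, so a long-running refresh in one library does not block ticks for another.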

Expose the per-library scheduled metadata refresh to HTTP under a new `Metadata Refresh` OpenAPI tag:

- GET/PATCH /api/v1/libraries/{id}/metadata-refresh: read and partially update the config; PATCH validates cron, timezone, field groups, and providers (disabled plugins accepted, missing rejected) and reloads the scheduler so changes apply without a restart.
- POST /api/v1/libraries/{id}/metadata-refresh/run-now: enqueue the task immediately; shares the scheduler's skip-if-already-running guard and returns 409 when a refresh is in flight.
- POST /api/v1/libraries/{id}/metadata-refresh/dry-run: synchronous preview bounded by sampleSize (default 5, cap 20) with a per-pair plugin timeout. Plans against an optional configOverride, fetches via metadata/series/get (or match in loose mode), and runs MetadataApplier with dry_run=true so no DB writes happen.
- GET /api/v1/metadata-refresh/field-groups: static catalog of the 12 field groups with id, label, and concrete camelCase fields, derived from FieldGroup::all() so adding a variant extends the API automatically.

The PATCH DTO uses PatchValue for nullable fields (timezone, per_provider_overrides) so absent leaves the value untouched, explicit null clears, and a value sets. Per-pair plugin failures during dry-run surface as a single skip-row in the sample so the UI can render "this series couldn't be previewed" instead of silently dropping it.

Includes integration and unit tests covering happy paths, validation failures, permission gating, the run-now conflict guard, and the field-group catalog shape.
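The three-state PATCH semantics and the sample-size bound can be sketched as below. The variant names, the `apply_to` method, and `effective_sample_size` are assumptions for illustration; the PR only states the behaviour (absent, null, value) and the numbers (default 5, cap 20), and the deserialization side is omitted.

```rust
/// Three-state value for nullable PATCH fields. Variant names are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum PatchValue<T> {
    Absent,   // key missing from the request body: leave untouched
    Null,     // key present with explicit null: clear
    Value(T), // key present with a value: set
}

impl<T> Default for PatchValue<T> {
    fn default() -> Self {
        PatchValue::Absent
    }
}

impl<T> PatchValue<T> {
    /// Apply the patch to a nullable stored field.
    pub fn apply_to(self, target: &mut Option<T>) {
        match self {
            PatchValue::Absent => {}
            PatchValue::Null => *target = None,
            PatchValue::Value(v) => *target = Some(v),
        }
    }
}

/// Dry-run sample bound: default 5 when omitted, never more than 20.
pub fn effective_sample_size(requested: Option<usize>) -> usize {
    requested.unwrap_or(5).min(20)
}
```

A plain `Option<T>` cannot distinguish "not sent" from "sent as null", which is why a PATCH DTO for nullable fields needs the extra state.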

Surface the scheduled metadata-refresh feature in the library settings flow. A new "Metadata Refresh" tab in the edit-library modal lets admins configure the schedule (enable + cron preset or custom expression + timezone), choose which field groups to refresh, pick metadata providers, and tune safety options (existing-source-IDs-only, recency cutoff, max concurrency). A "Preview changes" action drives a synchronous dry-run and renders the resulting deltas as `field: before -> after` rows, with locked or otherwise-skipped fields shown alongside their reason. Unresolved providers are highlighted so typos and disabled plugins are easy to spot. "Run now" enqueues the refresh task and reuses the existing task-progress SSE channel to render an inline progress alert; the button stays disabled while a refresh is in flight to avoid piling work onto an active run. Includes a typed API client, TanStack Query hooks (config get/update, run-now, dry-run, field-groups), and component/hook tests covering hydration, save, run-now, dry-run rendering, and edge states.

…ed refresh

Lets users override the library-wide field group selection on a per-provider basis, e.g. trust AniList for ratings only while MangaBaka refreshes status and counts under the same schedule.

Backend:

- New `fields_filter_for_provider(config, provider)` resolver returns the override's expanded field set when one exists for the given `"plugin:<name>"` key, else falls back to the library-wide filter.
- `RefreshLibraryMetadataHandler` and the dry-run HTTP handler now compute the field filter per `(series, plugin)` pair instead of once globally.
- PATCH and dry-run config validation extended: override map keys must resolve to an installed plugin; each override's field_groups must be a known FieldGroup. Errors mention both the provider and the bad input so the UI can highlight the right row.

Frontend:

- Per-provider override UI in MetadataRefreshSettings: each selected provider gets an indented expandable card with a Custom / Inherits-library badge, a field-group MultiSelect, and a "Reset to inherit" button.
- Save and Preview filter the override map to currently-selected providers and send `null` when empty so the server can clear stale entries on persist.

Tests added at the planner, API, and component levels. Pre-existing dead-code warnings on the branch were also annotated so clippy --all-targets -D warnings stays clean.
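The override-then-fallback lookup is simple enough to sketch directly. `RefreshConfig` and `expand` are hypothetical stand-ins, the lowercase provider keys are assumed spellings, and the group-to-field table covers only two groups with a guessed split.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical, reduced config: a library-wide selection plus overrides
/// keyed by "plugin:<name>".
pub struct RefreshConfig {
    pub field_groups: Vec<String>,
    pub per_provider_overrides: HashMap<String, Vec<String>>,
}

/// Illustrative expansion for two groups only; unknown names expand to
/// nothing here, whereas the PR rejects them during validation.
fn expand(groups: &[String]) -> BTreeSet<&'static str> {
    let mut out = BTreeSet::new();
    for group in groups {
        let fields: &[&'static str] = match group.as_str() {
            "ratings" => &["rating", "externalRatings"],
            "status" => &["status", "year"],
            _ => &[],
        };
        out.extend(fields.iter().copied());
    }
    out
}

/// Use the provider's override when present, else the library-wide filter.
pub fn fields_filter_for_provider(
    config: &RefreshConfig,
    provider: &str,
) -> BTreeSet<&'static str> {
    match config.per_provider_overrides.get(provider) {
        Some(groups) => expand(groups),
        None => expand(&config.field_groups),
    }
}
```

In this sketch an override replaces the library-wide selection for that provider rather than merging with it, which matches the "Custom / Inherits-library" wording in the UI description.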

Earlier phases of the scheduled-refresh feature added speculative plumbing in anticipation of later wiring. The later phases took different shapes; the leftover pieces were never wired and clutter the type surface.

Removed:

- `ApplyOptions::matching_strategy` field. The applier never branched on it; the handlers correctly resolve the strategy before calling `apply`. Stripping the field also removes four `..Default::default()` spreads and the misleading "informational here" comments at the manual-apply call sites.
- `field_groups::group_for_field`. Added for a UI label feature; the frontend computes that mapping client-side from the field-groups endpoint and never called this helper.
- `RefreshSkipReason` enum and its module. The dry-run DTO went with raw `reason: String` and the task handler emits its own JSON, so the "stable public taxonomy" had no consumer.

Also removed the now-dead re-exports `MetadataRefreshConfigPatch`, `PlannerSkipReason`, `SkippedRefresh`, `fields_for_groups`, and `fields_filter_from_config`, along with the `#[allow(unused_imports)]` and `#[allow(dead_code)]` annotations that were keeping them quiet. No behavior change; clippy --all-targets -D warnings stays clean without the allowlist suppressions.

…g with N typed jobs

Pivot the scheduled metadata refresh from "one config per library" to a generic library_jobs table that supports N independent jobs per library, each scoped to a single provider with its own cron, field selection, and safety options. The table is type-discriminated (`type` + `config` JSON) to leave room for future job types (scan, cleanup, indexing) without schema changes.

Backend

- Replace the metadata_refresh_config column on libraries with a new library_jobs table (id, library_id, type, name, enabled, cron_schedule, timezone, config, last_run_*) with FK cascade and indexes on library_id / enabled / type. The original migration is rewritten in place; the prior config column never shipped.
- Add a typed LibraryJobConfig discriminated enum + MetadataRefreshJobConfig payload with a RefreshScope (SeriesOnly, BooksOnly, SeriesAndBooks). Only SeriesOnly is honoured at runtime; the other variants are reserved schema fields rejected by validator and handler.
- Rewrite RefreshPlanner for single-provider plans with a typed PlanFailure for plugin resolution errors.
- Rewrite the RefreshLibraryMetadata task to take job_id, decode the row's config, runtime-check provider capabilities, and write back last_run_at / last_run_status / last_run_message on completion.
- Replace the per-library scheduler loader with a per-job loader and switch the skip-if-already-running guard to query tasks.params->job_id.
- Replace the /libraries/{id}/metadata-refresh endpoints with generic CRUD at /libraries/{id}/jobs (oneOf-tagged config), plus per-job run-now and dry-run endpoints. The field-groups catalog moves to /library-jobs/metadata-refresh/field-groups.
- Surface plugin capabilities on /plugins/actions so the editor can filter scope options per provider.

Frontend

- New LibraryJobsPage at /libraries/:id/jobs with list view and add/edit/delete/run-now/preview actions.
- Scope-aware editor: read-only label when the chosen provider supports one content type, segmented control when it supports both, auto-correct with notification on provider change, BooksOnly / SeriesAndBooks hard-disabled with a "coming soon" badge.
- Persistent visible field-group → fields mapping (no hover required), Advanced individual-field disclosure, and a "Will write N fields" union preview.
- Remove the Metadata Refresh tab from LibraryModal; navigation now goes through LibraryActionsMenu → Scheduled Jobs.

Tests added across migration, repo, planner, handler, scheduler, API, and frontend. OpenAPI spec regenerated.
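The type-discriminated job config and the "reserved scope" rule can be illustrated as follows. The type names come from the commit message; the payload fields, the `"metadata_refresh"` discriminator string, and the `validate` function are assumptions, and JSON encoding is left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefreshScope {
    SeriesOnly,
    BooksOnly,      // reserved, rejected for now
    SeriesAndBooks, // reserved, rejected for now
}

/// Hypothetical payload shape; the real struct likely carries more options.
#[derive(Debug, Clone, PartialEq)]
pub struct MetadataRefreshJobConfig {
    pub provider: String,
    pub scope: RefreshScope,
    pub field_groups: Vec<String>,
}

/// One variant per job type; adding scan or cleanup jobs later would mean a
/// new variant rather than a schema change.
#[derive(Debug, Clone, PartialEq)]
pub enum LibraryJobConfig {
    MetadataRefresh(MetadataRefreshJobConfig),
}

impl LibraryJobConfig {
    /// Value for the row's `type` column (the exact string is an assumption).
    pub fn type_name(&self) -> &'static str {
        match self {
            LibraryJobConfig::MetadataRefresh(_) => "metadata_refresh",
        }
    }
}

/// Hypothetical validator: accept only the scope that is honoured at runtime.
pub fn validate(config: &LibraryJobConfig) -> Result<(), String> {
    match config {
        LibraryJobConfig::MetadataRefresh(cfg) => match cfg.scope {
            RefreshScope::SeriesOnly => Ok(()),
            other => Err(format!("scope {:?} is reserved and not supported yet", other)),
        },
    }
}
```

Rejecting the reserved variants in the validator (and again in the handler, per the description) lets the schema carry them from day one without any job silently doing less than its config claims.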

The "Skip if synced within" field on the metadata refresh job editor took a value in seconds, which forced users to think in seconds at the form level (3600, 21600, 86400) for what is conceptually an hours-scale threshold. The seconds presentation also made it harder to spot when the default 3600 (= 1h) was masking a "freshly synced, all skipped" outcome on a brand-new library. The wire format and persisted value are unchanged: skipRecentlySyncedWithinS still ships in seconds, the planner cutoff math is unchanged, and the default of 3600s is preserved. Only the input and label change: divide by 3600 on display, multiply by 3600 on save.
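The display/save conversion is plain arithmetic. The real change lives in the web form; it is sketched here in Rust only to stay consistent with the backend examples, and the function names and the decision to accept fractional hours are assumptions.

```rust
const SECONDS_PER_HOUR: u64 = 3600;

/// Default recency window, unchanged by this commit: 3600 s = 1 h.
pub const DEFAULT_SKIP_RECENTLY_SYNCED_WITHIN_S: u64 = 3600;

/// Stored seconds to the hours value shown in the form.
pub fn seconds_to_display_hours(seconds: u64) -> f64 {
    seconds as f64 / SECONDS_PER_HOUR as f64
}

/// Hours entered in the form back to the seconds value sent on the wire.
/// Rounds to the nearest second; negative input saturates to 0.
pub fn display_hours_to_seconds(hours: f64) -> u64 {
    (hours * SECONDS_PER_HOUR as f64).round() as u64
}
```

Under this sketch the presets mentioned above (3600, 21600, 86400) display as 1, 6, and 24 hours and convert back to the same second values on save.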

cloudflare-workers-and-pages · 2026-05-04T00:51:17Z

Deploying codex with Cloudflare Pages

Latest commit:	`6e87d56`
Status:	✅ Deploy successful!
Preview URL:	https://05efe317.codex-asm.pages.dev
Branch Preview URL:	https://scheduled-metadata-refresh.codex-asm.pages.dev

…JSON string

The seed config sample showed plugin credentials as a one-line flow mapping, which invites users to wrap it in quotes, turning it into a YAML string. The encryption layer then stores that raw string instead of an object, and the plugin receives garbage when it looks up `credentials.api_key`. Switch the example to block-mapping form and add a comment warning against quoting the value.

AshDevFr added 11 commits May 3, 2026 10:31

AshDevFr merged commit e001e3f into main May 4, 2026
17 checks passed

AshDevFr deleted the scheduled-metadata-refresh branch May 4, 2026 01:49

AshDevFr merged 12 commits into main from scheduled-metadata-refresh
