feat: per-library scheduled metadata refresh jobs #9

Merged: AshDevFr merged 12 commits into main from scheduled-metadata-refresh on May 4, 2026

AshDevFr (Owner) commented May 4, 2026

Summary

  • Adds a per-library scheduled metadata refresh system: typed background jobs that periodically re-fetch and re-apply metadata from configured providers, with field-group/allowlist control, optional re-matching, and a dry-run mode.
  • Introduces a new library_jobs entity, repository, task handler, and scheduler wiring, replacing the earlier single per-library refresh config with N typed jobs per library.
  • Exposes full CRUD + run-now + dry-run endpoints under /api/v1/libraries/.../jobs, and a new web UI for managing jobs, editing refresh settings, and previewing dry-run results.

Details

AshDevFr added 11 commits May 3, 2026 10:31
Lay the groundwork for scheduled, scoped, per-library metadata refreshes
by adding a `metadata_refresh_config` JSON column to the `libraries`
table, a strongly typed `MetadataRefreshConfig` with safe defaults
(disabled, "use existing source IDs only", default field groups
ratings/status/counts), and library-repository accessors that always
return a usable config (default on NULL or parse error, with the parse
error logged).

The schema reserves a `per_provider_overrides` slot for future
per-provider field allowlists so the eventual override feature won't
need a migration. A `MetadataRefreshConfigPatch` and `merge_partial`
helper are included for the upcoming PATCH endpoint; they are gated
with `#[allow(dead_code)]` until that endpoint lands.

Tests cover serde round-trips, partial JSON, parse-helper edge cases,
PATCH merge semantics including null clears, and end-to-end repository
read/write/overwrite plus default-on-NULL and invalid-JSON fallback.
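
The "always return a usable config" behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the struct carries only three of the fields named in the messages, `config_or_default` is a hypothetical helper name, and the JSON parser is passed in as a closure so the sketch stays free of external crates.

```rust
/// Hypothetical, trimmed-down mirror of the `MetadataRefreshConfig` described above.
#[derive(Debug, Clone, PartialEq)]
pub struct MetadataRefreshConfig {
    pub enabled: bool,
    pub existing_source_ids_only: bool,
    pub field_groups: Vec<String>,
}

impl Default for MetadataRefreshConfig {
    /// Safe defaults: disabled, existing source IDs only, ratings/status/counts.
    fn default() -> Self {
        Self {
            enabled: false,
            existing_source_ids_only: true,
            field_groups: vec!["ratings".into(), "status".into(), "counts".into()],
        }
    }
}

/// Returns a usable config for any column state: NULL and unparsable JSON
/// both fall back to the defaults, and the parse error is logged.
/// `parse` stands in for the real JSON decoding (an assumption of this sketch).
pub fn config_or_default<P>(raw: Option<&str>, parse: P) -> MetadataRefreshConfig
where
    P: Fn(&str) -> Result<MetadataRefreshConfig, String>,
{
    match raw {
        None => MetadataRefreshConfig::default(),
        Some(text) => parse(text).unwrap_or_else(|err| {
            eprintln!("invalid metadata_refresh_config, using defaults: {err}");
            MetadataRefreshConfig::default()
        }),
    }
}
```

Callers never see an error from the accessor, which keeps a single corrupt row from breaking scheduler or API code paths that only need a config to read.
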

Wire the work behind the scheduled metadata refresh: a new
RefreshLibraryMetadata task variant, a stateless RefreshPlanner that
decides which (series, provider) pairs to touch, and a worker handler
that fetches per-series metadata from plugins and applies it through
the existing MetadataApplier.

The planner resolves "plugin:<name>" provider strings to enabled
plugins (unresolved entries surface as plan-level skips), batches
external-id lookups, and emits typed skip reasons (NoExternalId,
RecentlySynced, ProviderUnavailable). The handler walks the plan
sequentially with a per-pair timeout, isolates per-series errors via a
typed PairError, bumps series_external_ids.last_synced_at after each
successful apply so the recency guard works on the next run, and
returns a stable RunSummary JSON. TaskProgressEvent updates are emitted
per pair so the frontend can show live progress.

The handler is registered in the worker alongside PluginAutoMatch and
inherits the optional ThumbnailService for cover updates. Loose-mode
re-matching is intentionally deferred until a MatchingStrategy enum
lands; the current behavior counts those pairs as skipped_no_external_id.

Tests cover the planner's filter combinations and the handler's
short-circuit, error, and skip paths.
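
To make the planner's typed skip reasons concrete, here is a minimal sketch of the decision for one provider. The three skip variants come from the message above; everything else (`Planned`, `ExternalId`, `plan_for_provider`, integer series IDs, Unix-second timestamps) is assumed for illustration. The real planner reports an unresolved provider once at plan level, whereas this sketch tags each series to stay short.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SkipReason {
    NoExternalId,
    RecentlySynced,
    ProviderUnavailable,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Planned {
    Refresh { series_id: u32, external_id: String },
    Skip { series_id: u32, reason: SkipReason },
}

/// Stored link between a series and the provider (hypothetical shape).
pub struct ExternalId {
    pub external_id: String,
    /// Unix seconds of the last successful apply, if any.
    pub last_synced_at: Option<u64>,
}

/// Decides, per series, whether to refresh or skip and why.
pub fn plan_for_provider(
    series_ids: &[u32],
    provider_enabled: bool,
    external_ids: &HashMap<u32, ExternalId>,
    now: u64,
    skip_recent_within_s: u64,
) -> Vec<Planned> {
    series_ids
        .iter()
        .map(|&series_id| {
            if !provider_enabled {
                return Planned::Skip { series_id, reason: SkipReason::ProviderUnavailable };
            }
            let Some(link) = external_ids.get(&series_id) else {
                return Planned::Skip { series_id, reason: SkipReason::NoExternalId };
            };
            // Recency guard: skip pairs synced inside the cutoff window.
            let recent = link
                .last_synced_at
                .is_some_and(|t| now.saturating_sub(t) < skip_recent_within_s);
            if recent {
                Planned::Skip { series_id, reason: SkipReason::RecentlySynced }
            } else {
                Planned::Refresh { series_id, external_id: link.external_id.clone() }
            }
        })
        .collect()
}
```

Because the function takes `now` and the lookup map as arguments, it stays stateless and can be tested without a database, which matches the "stateless RefreshPlanner" framing above.
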

…etadata apply

Introduce the user-facing field-group taxonomy and the dry-run plumbing
the scheduled per-library refresh needs to surface "what would change?"
to admins before they commit to a schedule.

- New `field_groups` module with a closed `FieldGroup` enum (Identifiers,
  Descriptive, Status, Counts, Ratings, Cover, Tags, Genres, AgeRating,
  Classification, Publisher, ExternalRefs) and a `fields_for_groups`
  resolver that expands group names to the camelCase fields the applier
  recognises. A self-test pins every returned field to a real
  `should_apply_field` call site so the mapping cannot drift silently.
- `MetadataApplier::apply` gains `ApplyOptions::dry_run` and a
  `MetadataApplyResult::dry_run_report` carrying `FieldChange`
  entries. Every write site is gated: dry-run records the prospective
  change instead of touching the DB. Locks and permission denials still
  flow through `skipped_fields` via the same code path, so the report
  stays focused on writes that would actually happen.
- HTTP series apply endpoint accepts `dryRun` in the request body and
  returns `dryRunReport` in the response (omitted on real applies).
  External-ID upsert is skipped on dry-run so a preview never mutates
  state. Book apply explicitly rejects `dryRun=true` with 400 because
  `BookMetadataApplier` does not honour the flag yet.
- `refresh_planner::fields_filter_from_config` now delegates to the
  field-group resolver, so a config like
  `{ field_groups: ["ratings","status"] }` correctly produces
  `{"rating","externalRatings","status","year"}` for the applier.
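
The group-to-field expansion in the last bullet can be sketched as below. Only the two groups whose fields are spelled out in this message (`ratings` and `status`) are modelled; the other ten variants and their field lists are omitted rather than guessed, and the `parse` helper is an assumption of this sketch.

```rust
use std::collections::BTreeSet;

/// Two of the twelve groups; the rest are left out because their
/// field lists are not given in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FieldGroup {
    Ratings,
    Status,
}

impl FieldGroup {
    /// Maps a config identifier to a group; unknown names yield `None`.
    pub fn parse(id: &str) -> Option<Self> {
        match id {
            "ratings" => Some(Self::Ratings),
            "status" => Some(Self::Status),
            _ => None,
        }
    }

    /// The camelCase applier fields covered by this group.
    pub fn fields(self) -> &'static [&'static str] {
        match self {
            Self::Ratings => &["rating", "externalRatings"],
            Self::Status => &["status", "year"],
        }
    }
}

/// Expands a list of groups into the de-duplicated set of fields.
pub fn fields_for_groups(groups: &[FieldGroup]) -> BTreeSet<&'static str> {
    groups.iter().flat_map(|g| g.fields().iter().copied()).collect()
}
```

Using a closed enum with an exhaustive `match` means a newly added variant will not compile until its field list is filled in, which is one way to keep the mapping from drifting.
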

Tests cover the field-group mapping, dry-run write gating with row-state
snapshots, the new apply API contract, and the planner's group
expansion. Existing apply call sites are updated to set
`dry_run: false` so the default behaviour is unchanged.
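
The write gating described above might look like this for a single field. It is a sketch under assumptions: `SeriesRow`, `ApplyResult`, and `apply_status` are invented names, only the `status` field is modelled, and skipping unchanged values is this sketch's choice rather than confirmed applier behaviour.

```rust
/// One prospective change, as a dry-run report entry.
#[derive(Debug, Clone, PartialEq)]
pub struct FieldChange {
    pub field: &'static str,
    pub before: Option<String>,
    pub after: Option<String>,
}

#[derive(Debug, Default)]
pub struct ApplyResult {
    pub applied_fields: Vec<&'static str>,
    pub dry_run_report: Vec<FieldChange>,
}

/// Stand-in for the persisted series metadata row.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct SeriesRow {
    pub status: Option<String>,
}

/// Gated write site: a dry run records the change, a real run performs it.
pub fn apply_status(row: &mut SeriesRow, incoming: &str, dry_run: bool, result: &mut ApplyResult) {
    if row.status.as_deref() == Some(incoming) {
        return; // nothing would change, so nothing is reported or written
    }
    if dry_run {
        result.dry_run_report.push(FieldChange {
            field: "status",
            before: row.status.clone(),
            after: Some(incoming.to_string()),
        });
    } else {
        row.status = Some(incoming.to_string());
        result.applied_fields.push("status");
    }
}
```

Keeping the comparison ahead of the `dry_run` branch means preview and real runs agree on which fields count as changes.
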

…efresh

Introduce an explicit `MatchingStrategy::{ExistingExternalIdOnly,
AllowReMatch}` enum on `ApplyOptions` so the scheduled refresh can opt into
"don't re-match — only refresh series that already have a stored provider
ID" behavior, while every existing manual-apply call site keeps its
historical "search if no ID" semantics via the `AllowReMatch` default.

The `RefreshLibraryMetadataHandler` now picks the strategy from
`metadata_refresh_config.existing_source_ids_only`. Strict mode keeps the
planner's plan-time gating; loose mode invokes a new `rematch_external_id`
helper that calls `metadata/series/match`, then feeds the returned
external ID into the existing `metadata/series/get` + `MetadataApplier`
flow. A match miss is surfaced as a new `no_match_candidate` bucket on
`RunSummary` so it stays distinguishable from the strict-mode
`no_external_id` skip.

A new `RefreshSkipReason` taxonomy module provides a stable, serde-friendly
superset of the planner's narrower skip enum, with a `From<PlannerSkipReason>`
lift so planner reasons can be widened into the public taxonomy that the
upcoming HTTP dry-run endpoint will surface.

Manual-apply call sites in the v1 plugin handlers and `plugin_auto_match`
were converted to `..Default::default()` so they pick up the new field
without changing behavior. Tests cover the default-strategy guard, the
loose-mode re-match branch, the skip-reason identifiers and serde shape,
and the planner-to-public lift.
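
A minimal sketch of how the two strategies could resolve an external ID. The enum and its default come from the message above; `Resolution`, `resolve_external_id`, and `strategy_from_config` are hypothetical names, and the `rematch` closure stands in for the `metadata/series/match` plugin call.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum MatchingStrategy {
    /// Strict: only refresh series that already have a stored provider ID.
    ExistingExternalIdOnly,
    /// Loose: search for a match when no ID is stored (historical default).
    #[default]
    AllowReMatch,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Resolution {
    UseId(String),
    /// Strict mode and nothing stored.
    NoExternalId,
    /// Loose mode, but the provider returned no candidate.
    NoMatchCandidate,
}

/// Picks the strategy from the `existing_source_ids_only` config flag.
pub fn strategy_from_config(existing_source_ids_only: bool) -> MatchingStrategy {
    if existing_source_ids_only {
        MatchingStrategy::ExistingExternalIdOnly
    } else {
        MatchingStrategy::AllowReMatch
    }
}

/// Resolves the ID to fetch with; `rematch` is only called in loose mode
/// when no ID is stored.
pub fn resolve_external_id<F>(
    stored: Option<&str>,
    strategy: MatchingStrategy,
    rematch: F,
) -> Resolution
where
    F: FnOnce() -> Option<String>,
{
    match (stored, strategy) {
        (Some(id), _) => Resolution::UseId(id.to_string()),
        (None, MatchingStrategy::ExistingExternalIdOnly) => Resolution::NoExternalId,
        (None, MatchingStrategy::AllowReMatch) => match rematch() {
            Some(id) => Resolution::UseId(id),
            None => Resolution::NoMatchCandidate,
        },
    }
}
```

Keeping `NoExternalId` and `NoMatchCandidate` as separate outcomes mirrors the separate `RunSummary` buckets, so an operator can tell "never linked" apart from "searched and found nothing".
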

Register a cron entry per library whose `metadata_refresh_config.enabled = true`,
firing `RefreshLibraryMetadata { library_id }` on each tick. The new loader
runs from `start()` so `reload_schedules()` rebuilds refresh entries alongside
the scan ones with no extra wiring on the API side.

The cron-job closure runs a skip-if-already-running guard before enqueuing:
if a `pending` or `processing` `refresh_library_metadata` task already exists
for the same library, the firing is dropped with a warn log. This is belt
and suspenders on top of `TaskRepository::enqueue` deduplication so operators
see an explicit log line when a cron tick fires while the previous run is
still draining.

Per-library config errors (invalid cron, invalid timezone, missing library
row) log a warning and skip just that library instead of aborting scheduler
boot. Per-library timezone overrides fall back to the server default when
parsing fails, matching the existing `add_library_schedule` behavior.

Adds integration tests for enabled/disabled configs, per-library timezone,
invalid cron tolerance, multi-library coexistence, and reload pickup, plus
unit tests on the in-flight detector.
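
The skip-if-already-running guard can be illustrated with an in-memory task list. This is only a sketch: `TaskRow`, `refresh_in_flight`, and `on_cron_tick` are assumed names, and the real guard queries the task table rather than a `Vec`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Pending,
    Processing,
    Completed,
    Failed,
}

/// Simplified stand-in for a row in the task queue.
pub struct TaskRow {
    pub task_type: String,
    pub library_id: u32,
    pub status: TaskStatus,
}

/// True when a refresh for this library is still pending or processing.
pub fn refresh_in_flight(tasks: &[TaskRow], library_id: u32) -> bool {
    tasks.iter().any(|t| {
        t.task_type == "refresh_library_metadata"
            && t.library_id == library_id
            && matches!(t.status, TaskStatus::Pending | TaskStatus::Processing)
    })
}

/// Cron-tick body: drop the firing with a warning if a run is in flight,
/// otherwise enqueue. Returns whether a task was enqueued.
pub fn on_cron_tick(tasks: &mut Vec<TaskRow>, library_id: u32) -> bool {
    if refresh_in_flight(tasks.as_slice(), library_id) {
        eprintln!("warn: refresh for library {library_id} still running, skipping tick");
        return false;
    }
    tasks.push(TaskRow {
        task_type: "refresh_library_metadata".to_string(),
        library_id,
        status: TaskStatus::Pending,
    });
    true
}
```

The check and the enqueue are not atomic here, which is why queue-level deduplication still matters as the second line of defence.
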

Expose the per-library scheduled metadata refresh to HTTP under a new
`Metadata Refresh` OpenAPI tag:

- GET/PATCH /api/v1/libraries/{id}/metadata-refresh — read and partially
  update the config; PATCH validates cron, timezone, field groups, and
  providers (disabled plugins accepted, missing rejected) and reloads the
  scheduler so changes apply without a restart.
- POST /api/v1/libraries/{id}/metadata-refresh/run-now — enqueue the task
  immediately; shares the scheduler's skip-if-already-running guard and
  returns 409 when a refresh is in flight.
- POST /api/v1/libraries/{id}/metadata-refresh/dry-run — synchronous
  preview bounded by sampleSize (default 5, cap 20) with a per-pair plugin
  timeout. Plans against an optional configOverride, fetches via
  metadata/series/get (or match in loose mode), and runs MetadataApplier
  with dry_run=true so no DB writes happen.
- GET /api/v1/metadata-refresh/field-groups — static catalog of the 12
  field groups with id, label, and concrete camelCase fields, derived from
  FieldGroup::all() so adding a variant extends the API automatically.

The PATCH DTO uses PatchValue for nullable fields (timezone,
per_provider_overrides) so absent leaves the value untouched, explicit
null clears, and a value sets. Per-pair plugin failures during dry-run
surface as a single skip-row in the sample so the UI can render
"this series couldn't be previewed" instead of silently dropping it.

Includes integration and unit tests covering happy paths, validation
failures, permission gating, the run-now conflict guard, and the
field-group catalog shape.
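
The three-state PATCH semantics and the dry-run sample bound can be sketched like this. The variant names and the `apply_to` and `effective_sample_size` helpers are assumptions; only the absent/null/value behaviour and the default of 5 with a cap of 20 come from the message above.

```rust
/// Three-state value for nullable PATCH fields.
#[derive(Debug, Clone, PartialEq)]
pub enum PatchValue<T> {
    /// Key missing from the request body: leave the stored value untouched.
    Absent,
    /// Explicit `null`: clear the stored value.
    Null,
    /// Concrete value: overwrite the stored value.
    Value(T),
}

impl<T> PatchValue<T> {
    /// Applies the patch to an optional stored field.
    pub fn apply_to(self, target: &mut Option<T>) {
        match self {
            PatchValue::Absent => {}
            PatchValue::Null => *target = None,
            PatchValue::Value(v) => *target = Some(v),
        }
    }
}

/// Dry-run sample size: default 5 when omitted, capped at 20.
pub fn effective_sample_size(requested: Option<u32>) -> u32 {
    requested.unwrap_or(5).min(20)
}
```

A plain `Option<T>` cannot express this, because a missing key and an explicit `null` would both decode to `None`.
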

Surface the scheduled metadata-refresh feature in the library settings flow.
A new "Metadata Refresh" tab in the edit-library modal lets admins configure
the schedule (enable + cron preset or custom expression + timezone), choose
which field groups to refresh, pick metadata providers, and tune safety
options (existing-source-IDs-only, recency cutoff, max concurrency).

A "Preview changes" action drives a synchronous dry-run and renders the
resulting deltas as `field: before -> after` rows, with locked or
otherwise-skipped fields shown alongside their reason. Unresolved providers
are highlighted so typos and disabled plugins are easy to spot.

"Run now" enqueues the refresh task and reuses the existing task-progress
SSE channel to render an inline progress alert; the button stays disabled
while a refresh is in flight to avoid piling work onto an active run.

Includes a typed API client, TanStack Query hooks (config get/update,
run-now, dry-run, field-groups), and component/hook tests covering
hydration, save, run-now, dry-run rendering, and edge states.

…ed refresh

Lets users override the library-wide field group selection on a
per-provider basis, e.g. trust AniList for ratings only while
MangaBaka refreshes status and counts under the same schedule.

Backend:
- New `fields_filter_for_provider(config, provider)` resolver returns
  the override's expanded field set when one exists for the given
  `"plugin:<name>"` key, else falls back to the library-wide filter.
- `RefreshLibraryMetadataHandler` and the dry-run HTTP handler now
  compute the field filter per `(series, plugin)` pair instead of
  once globally.
- PATCH and dry-run config validation extended: override map keys
  must resolve to an installed plugin; each override's field_groups
  must be a known FieldGroup. Errors mention both the provider and
  the bad input so the UI can highlight the right row.

Frontend:
- Per-provider override UI in MetadataRefreshSettings: each selected
  provider gets an indented expandable card with a Custom /
  Inherits-library badge, a field-group MultiSelect, and a
  "Reset to inherit" button.
- Save and Preview filter the override map to currently-selected
  providers and send `null` when empty so the server can clear stale
  entries on persist.

Tests added at the planner, API, and component levels. Pre-existing
dead-code warnings on the branch were also annotated so
clippy --all-targets -D warnings stays clean.
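
The override-then-fallback lookup might be sketched as follows. The struct shapes, the `expand` helper, and the plugin keys used in the usage note are assumptions; only the `ratings` and `status` expansions stated earlier in this PR are modelled.

```rust
use std::collections::{BTreeSet, HashMap};

/// Per-provider override (hypothetical shape).
pub struct ProviderOverride {
    pub field_groups: Vec<String>,
}

pub struct RefreshConfig {
    /// Library-wide selection.
    pub field_groups: Vec<String>,
    /// Keyed by `"plugin:<name>"`.
    pub per_provider_overrides: Option<HashMap<String, ProviderOverride>>,
}

/// Expands group names into applier fields (two groups only in this sketch).
fn expand(groups: &[String]) -> BTreeSet<&'static str> {
    let mut fields = BTreeSet::new();
    for group in groups {
        match group.as_str() {
            "ratings" => fields.extend(["rating", "externalRatings"]),
            "status" => fields.extend(["status", "year"]),
            _ => {}
        }
    }
    fields
}

/// Uses the provider's override when present, else the library-wide groups.
pub fn fields_filter_for_provider(config: &RefreshConfig, provider: &str) -> BTreeSet<&'static str> {
    let groups = config
        .per_provider_overrides
        .as_ref()
        .and_then(|overrides| overrides.get(provider))
        .map(|o| &o.field_groups)
        .unwrap_or(&config.field_groups);
    expand(groups)
}
```

With an override of `["ratings"]` under a key such as `plugin:anilist` (a made-up key), that provider writes only rating fields, while any provider without an entry inherits the library-wide selection.
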

Earlier phases of the scheduled-refresh feature added speculative
plumbing in anticipation of later wiring. The later phases took
different shapes; the leftover pieces were never wired and clutter
the type surface.

Removed:
- `ApplyOptions::matching_strategy` field. The applier never branched
  on it; the handlers correctly resolve the strategy before calling
  `apply`. Stripping the field also removes four `..Default::default()`
  spreads and the misleading "informational here" comments at the
  manual-apply call sites.
- `field_groups::group_for_field`. Added for a UI label feature; the
  frontend computes that mapping client-side from the field-groups
  endpoint and never called this helper.
- `RefreshSkipReason` enum and its module. The dry-run DTO went with
  raw `reason: String` and the task handler emits its own JSON, so
  the "stable public taxonomy" had no consumer.

Also removed the now-dead re-exports `MetadataRefreshConfigPatch`,
`PlannerSkipReason`, `SkippedRefresh`, `fields_for_groups`, and
`fields_filter_from_config`, along with the `#[allow(unused_imports)]`
and `#[allow(dead_code)]` annotations that were keeping them quiet.
No behavior change; clippy --all-targets -D warnings stays clean
without the allowlist suppressions.

…g with N typed jobs

Pivot the scheduled metadata refresh from "one config per library" to a
generic library_jobs table that supports N independent jobs per library,
each scoped to a single provider with its own cron, field selection, and
safety options. The table is type-discriminated (`type` + `config` JSON)
to leave room for future job types (scan, cleanup, indexing) without
schema changes.

Backend
- Replace the metadata_refresh_config column on libraries with a new
  library_jobs table (id, library_id, type, name, enabled, cron_schedule,
  timezone, config, last_run_*) with FK cascade and indexes on
  library_id / enabled / type. The original migration is rewritten in
  place; the prior config column never shipped.
- Add a typed LibraryJobConfig discriminated enum + MetadataRefreshJobConfig
  payload with a RefreshScope (SeriesOnly, BooksOnly, SeriesAndBooks).
  Only SeriesOnly is honoured at runtime; the other variants are reserved
  schema fields rejected by validator and handler.
- Rewrite RefreshPlanner for single-provider plans with a typed PlanFailure
  for plugin resolution errors.
- Rewrite the RefreshLibraryMetadata task to take job_id, decode the row's
  config, runtime-check provider capabilities, and write back
  last_run_at / last_run_status / last_run_message on completion.
- Replace the per-library scheduler loader with a per-job loader and
  switch the skip-if-already-running guard to query tasks.params->job_id.
- Replace the /libraries/{id}/metadata-refresh endpoints with generic
  CRUD at /libraries/{id}/jobs (oneOf-tagged config), plus per-job
  run-now and dry-run endpoints. The field-groups catalog moves to
  /library-jobs/metadata-refresh/field-groups.
- Surface plugin capabilities on /plugins/actions so the editor can
  filter scope options per provider.

Frontend
- New LibraryJobsPage at /libraries/:id/jobs with list view and
  add/edit/delete/run-now/preview actions.
- Scope-aware editor: read-only label when the chosen provider supports
  one content type, segmented control when it supports both, auto-correct
  with notification on provider change, BooksOnly / SeriesAndBooks
  hard-disabled with a "coming soon" badge.
- Persistent visible field-group → fields mapping (no hover required),
  Advanced individual-field disclosure, and a "Will write N fields"
  union preview.
- Remove the Metadata Refresh tab from LibraryModal; navigation now goes
  through LibraryActionsMenu → Scheduled Jobs.

Tests added across migration, repo, planner, handler, scheduler, API,
and frontend. OpenAPI spec regenerated.
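
A rough sketch of the type-discriminated job config and the scope check. The type names follow the message above, but the `metadata_refresh` tag string, the struct fields, and the `validate` function are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefreshScope {
    SeriesOnly,
    BooksOnly,
    SeriesAndBooks,
}

#[derive(Debug, Clone, PartialEq)]
pub struct MetadataRefreshJobConfig {
    pub provider: String,
    pub scope: RefreshScope,
    pub field_groups: Vec<String>,
}

/// One variant per job type; future types would add variants here
/// instead of new columns.
#[derive(Debug, Clone, PartialEq)]
pub enum LibraryJobConfig {
    MetadataRefresh(MetadataRefreshJobConfig),
}

impl LibraryJobConfig {
    /// Value stored in the table's `type` column (tag string is assumed).
    pub fn type_tag(&self) -> &'static str {
        match self {
            Self::MetadataRefresh(_) => "metadata_refresh",
        }
    }
}

/// Rejects the reserved scopes that are not honoured at runtime yet.
pub fn validate(config: &LibraryJobConfig) -> Result<(), String> {
    match config {
        LibraryJobConfig::MetadataRefresh(job) => match job.scope {
            RefreshScope::SeriesOnly => Ok(()),
            other => Err(format!("scope {other:?} is reserved and not supported yet")),
        },
    }
}
```

Rejecting the reserved scopes in the validator keeps them out of persisted rows, and the handler-side check mentioned above guards against rows written some other way.
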

The "Skip if synced within" field on the metadata refresh job editor
took a value in seconds, which forced users to think in seconds at the
form level (3600, 21600, 86400) for what is conceptually an hours-scale
threshold. The seconds presentation also made it harder to spot when
the default 3600 (= 1h) was masking a "freshly synced, all skipped"
outcome on a brand-new library.

The wire format and persisted value are unchanged: skipRecentlySyncedWithinS
still ships in seconds, the planner cutoff math is unchanged, and the
default of 3600s is preserved. Only the input and label change: divide
by 3600 on display, multiply by 3600 on save.
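
The display/save conversion is simple arithmetic. The sketch below writes it in Rust for illustration, although the actual change lives in the web form; the function names are hypothetical, and rounding to whole seconds on save is an assumption.

```rust
const SECONDS_PER_HOUR: u64 = 3600;

/// Stored seconds to the hours shown in the form.
pub fn seconds_to_display_hours(seconds: u64) -> f64 {
    seconds as f64 / SECONDS_PER_HOUR as f64
}

/// Hours typed in the form to the seconds sent on the wire.
/// Fractional hours are rounded to the nearest whole second.
pub fn display_hours_to_seconds(hours: f64) -> u64 {
    (hours * SECONDS_PER_HOUR as f64).round() as u64
}
```

The default of 3600 seconds therefore shows as 1 hour, and the three values quoted above (3600, 21600, 86400) show as 1, 6, and 24.
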

cloudflare-workers-and-pages Bot commented May 4, 2026

Deploying codex with Cloudflare Pages

Latest commit: 6e87d56
Status: ✅  Deploy successful!
Preview URL: https://05efe317.codex-asm.pages.dev
Branch Preview URL: https://scheduled-metadata-refresh.codex-asm.pages.dev

…JSON string

The seed config sample showed plugin credentials as a one-line flow mapping,
which invites users to wrap it in quotes — turning it into a YAML string.
The encryption layer then stores that raw string instead of an object, and
the plugin receives garbage when it looks up `credentials.api_key`.

Switch the example to block-mapping form and add a comment warning against
quoting the value.

AshDevFr merged commit e001e3f into main on May 4, 2026
17 checks passed
AshDevFr deleted the scheduled-metadata-refresh branch on May 4, 2026 01:49