feat: per-library scheduled metadata refresh jobs #9
Merged
Lay the groundwork for scheduled, scoped, per-library metadata refreshes by adding a `metadata_refresh_config` JSON column to the `libraries` table, a strongly typed `MetadataRefreshConfig` with safe defaults (disabled, "use existing source IDs only", default field groups ratings/status/counts), and library-repository accessors that always return a usable config (default on NULL or parse error, with the parse error logged). The schema reserves a `per_provider_overrides` slot for future per-provider field allowlists so the eventual override feature won't need a migration. A `MetadataRefreshConfigPatch` and `merge_partial` helper are included for the upcoming PATCH endpoint; they are gated with `#[allow(dead_code)]` until that endpoint lands. Tests cover serde round-trips, partial JSON, parse-helper edge cases, PATCH merge semantics including null clears, and end-to-end repository read/write/overwrite plus default-on-NULL and invalid-JSON fallback.
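A minimal sketch of the "always return a usable config" accessor described above. The struct is a simplified, hypothetical stand-in for `MetadataRefreshConfig`, and the `parse` closure stands in for the real JSON deserializer; neither reflects the PR's actual signatures.

```rust
/// Simplified, hypothetical stand-in for the PR's `MetadataRefreshConfig`.
#[derive(Debug, Clone, PartialEq)]
pub struct MetadataRefreshConfig {
    pub enabled: bool,
    pub existing_source_ids_only: bool,
    pub field_groups: Vec<String>,
}

impl Default for MetadataRefreshConfig {
    /// Safe defaults named in the commit message: disabled, existing source
    /// IDs only, and the ratings/status/counts field groups.
    fn default() -> Self {
        Self {
            enabled: false,
            existing_source_ids_only: true,
            field_groups: vec!["ratings".into(), "status".into(), "counts".into()],
        }
    }
}

/// Returns a usable config whatever the column holds: the default on NULL,
/// and the default plus a logged error when parsing fails.
/// `parse` is an assumption standing in for the real deserializer.
pub fn config_or_default<E: std::fmt::Display>(
    raw: Option<&str>,
    parse: impl Fn(&str) -> Result<MetadataRefreshConfig, E>,
) -> MetadataRefreshConfig {
    match raw {
        None => MetadataRefreshConfig::default(),
        Some(text) => parse(text).unwrap_or_else(|err| {
            eprintln!("invalid metadata_refresh_config, using defaults: {err}");
            MetadataRefreshConfig::default()
        }),
    }
}
```

Callers never see a missing or malformed config; they only see defaults, which keeps a bad row from disabling the rest of the library code path.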
Wire the work behind the scheduled metadata refresh: a new RefreshLibraryMetadata task variant, a stateless RefreshPlanner that decides which (series, provider) pairs to touch, and a worker handler that fetches per-series metadata from plugins and applies it through the existing MetadataApplier. The planner resolves "plugin:<name>" provider strings to enabled plugins (unresolved entries surface as plan-level skips), batches external-id lookups, and emits typed skip reasons (NoExternalId, RecentlySynced, ProviderUnavailable). The handler walks the plan sequentially with a per-pair timeout, isolates per-series errors via a typed PairError, bumps series_external_ids.last_synced_at after each successful apply so the recency guard works on the next run, and returns a stable RunSummary JSON. TaskProgressEvent updates are emitted per pair so the frontend can show live progress. The handler is registered in the worker alongside PluginAutoMatch and inherits the optional ThumbnailService for cover updates. Loose-mode re-matching is intentionally deferred until a MatchingStrategy enum lands; the current behavior counts those pairs as skipped_no_external_id. Tests cover the planner's filter combinations and the handler's short-circuit, error, and skip paths.
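The planner's per-pair decision can be sketched roughly as follows. The skip-reason names come from the commit message; the `ExternalId` row shape, the `plan_pair` function, and the use of Unix seconds are assumptions made for illustration only.

```rust
/// Typed skip reasons named in the commit message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SkipReason {
    NoExternalId,
    RecentlySynced,
    ProviderUnavailable,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Decision {
    Refresh { external_id: String },
    Skip(SkipReason),
}

/// Hypothetical, simplified view of a `series_external_ids` row.
pub struct ExternalId {
    pub external_id: String,
    /// Unix seconds of the last successful apply, if any.
    pub last_synced_at: Option<u64>,
}

/// Stateless decision for one (series, provider) pair.
pub fn plan_pair(
    provider_enabled: bool,
    stored: Option<&ExternalId>,
    now: u64,
    skip_recently_synced_within_s: u64,
) -> Decision {
    if !provider_enabled {
        return Decision::Skip(SkipReason::ProviderUnavailable);
    }
    let link = match stored {
        Some(link) => link,
        None => return Decision::Skip(SkipReason::NoExternalId),
    };
    if let Some(synced) = link.last_synced_at {
        // Recency guard: skip pairs synced inside the configured window.
        if now.saturating_sub(synced) < skip_recently_synced_within_s {
            return Decision::Skip(SkipReason::RecentlySynced);
        }
    }
    Decision::Refresh { external_id: link.external_id.clone() }
}
```

Because the handler bumps `last_synced_at` after each successful apply, a pair refreshed on one run falls into `RecentlySynced` on the next run until the window elapses.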
…etadata apply
Introduce the user-facing field-group taxonomy and the dry-run plumbing
the scheduled per-library refresh needs to surface "what would change?"
to admins before they commit to a schedule.
- New `field_groups` module with a closed `FieldGroup` enum (Identifiers,
Descriptive, Status, Counts, Ratings, Cover, Tags, Genres, AgeRating,
Classification, Publisher, ExternalRefs) and a `fields_for_groups`
resolver that expands group names to the camelCase fields the applier
recognises. A self-test pins every returned field to a real
`should_apply_field` call site so the mapping cannot drift silently.
- `MetadataApplier::apply` gains `ApplyOptions::dry_run` and a
`MetadataApplyResult::dry_run_report` carrying `FieldChange`
entries. Every write site is gated: dry-run records the prospective
change instead of touching the DB. Locks and permission denials still
flow through `skipped_fields` via the same code path, so the report
stays focused on writes that would actually happen.
- HTTP series apply endpoint accepts `dryRun` in the request body and
returns `dryRunReport` in the response (omitted on real applies).
External-ID upsert is skipped on dry-run so a preview never mutates
state. Book apply explicitly rejects `dryRun=true` with 400 because
`BookMetadataApplier` does not honour the flag yet.
- `refresh_planner::fields_filter_from_config` now delegates to the
field-group resolver, so a config like
`{ field_groups: ["ratings","status"] }` correctly produces
`{"rating","externalRatings","status","year"}` for the applier.
Tests cover the field-group mapping, dry-run write gating with row-state
snapshots, the new apply API contract, and the planner's group
expansion. Existing apply call sites are updated to set
`dry_run: false` so the default behaviour is unchanged.
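The write gating described above might look roughly like this. It is a sketch over an in-memory map rather than the real `MetadataApplier`: the `apply` signature, the `ApplyResult` shape, and the string-valued fields are all assumptions.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One prospective change recorded by a dry run.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FieldChange {
    pub field: String,
    pub before: Option<String>,
    pub after: String,
}

/// Hypothetical, simplified result type.
#[derive(Debug, Default)]
pub struct ApplyResult {
    pub applied_fields: Vec<String>,
    pub skipped_fields: Vec<String>,
    pub dry_run_report: Vec<FieldChange>,
}

/// Applies `incoming` to `row`. Locked fields are skipped through the same
/// path in both modes; every write site is gated on `dry_run`.
pub fn apply(
    row: &mut BTreeMap<String, String>,
    incoming: &BTreeMap<String, String>,
    locked: &BTreeSet<String>,
    dry_run: bool,
) -> ApplyResult {
    let mut result = ApplyResult::default();
    for (field, after) in incoming {
        if locked.contains(field) {
            result.skipped_fields.push(field.clone());
            continue;
        }
        if dry_run {
            // Record the prospective change instead of touching the row.
            result.dry_run_report.push(FieldChange {
                field: field.clone(),
                before: row.get(field).cloned(),
                after: after.clone(),
            });
        } else {
            row.insert(field.clone(), after.clone());
            result.applied_fields.push(field.clone());
        }
    }
    result
}
```

A test in the style the commit describes would snapshot the row, run with `dry_run = true`, and assert the row is unchanged while the report lists only the unlocked fields.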
…efresh
Introduce an explicit `MatchingStrategy::{ExistingExternalIdOnly,
AllowReMatch}` enum on `ApplyOptions` so the scheduled refresh can opt into
"don't re-match — only refresh series that already have a stored provider
ID" behavior, while every existing manual-apply call site keeps its
historical "search if no ID" semantics via the `AllowReMatch` default.
The `RefreshLibraryMetadataHandler` now picks the strategy from
`metadata_refresh_config.existing_source_ids_only`. Strict mode keeps the
planner's plan-time gating; loose mode invokes a new `rematch_external_id`
helper that calls `metadata/series/match`, then feeds the returned
external ID into the existing `metadata/series/get` + `MetadataApplier`
flow. A match miss is surfaced as a new `no_match_candidate` bucket on
`RunSummary` so it stays distinguishable from the strict-mode
`no_external_id` skip.
A new `RefreshSkipReason` taxonomy module provides a stable, serde-friendly
superset of the planner's narrower skip enum, with a `From<PlannerSkipReason>`
lift so planner reasons can be widened into the public taxonomy that the
upcoming HTTP dry-run endpoint will surface.
Manual-apply call sites in the v1 plugin handlers and `plugin_auto_match`
were converted to `..Default::default()` so they pick up the new field
without changing behavior. Tests cover the default-strategy guard, the
loose-mode re-match branch, the skip-reason identifiers and serde shape,
and the planner-to-public lift.
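As a rough illustration of the strict and loose branches, the sketch below resolves an external ID for one series. `MatchingStrategy` and its variants are from the commit message; `Resolution`, `from_config`, `resolve_external_id`, and the `rematch` closure (standing in for the `metadata/series/match` call) are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MatchingStrategy {
    ExistingExternalIdOnly,
    AllowReMatch,
}

impl Default for MatchingStrategy {
    /// Manual-apply call sites keep the historical "search if no ID" behavior.
    fn default() -> Self {
        MatchingStrategy::AllowReMatch
    }
}

impl MatchingStrategy {
    /// Maps the `existing_source_ids_only` config flag to a strategy.
    pub fn from_config(existing_source_ids_only: bool) -> Self {
        if existing_source_ids_only {
            MatchingStrategy::ExistingExternalIdOnly
        } else {
            MatchingStrategy::AllowReMatch
        }
    }
}

/// Hypothetical outcome type; the last two mirror the `no_external_id`
/// and `no_match_candidate` buckets.
#[derive(Debug, PartialEq, Eq)]
pub enum Resolution {
    UseExternalId(String),
    SkippedNoExternalId,
    NoMatchCandidate,
}

/// `rematch` stands in for the plugin match call and is only invoked in
/// loose mode when no ID is stored.
pub fn resolve_external_id(
    strategy: MatchingStrategy,
    stored: Option<&str>,
    rematch: impl FnOnce() -> Option<String>,
) -> Resolution {
    match (stored, strategy) {
        (Some(id), _) => Resolution::UseExternalId(id.to_string()),
        (None, MatchingStrategy::ExistingExternalIdOnly) => Resolution::SkippedNoExternalId,
        (None, MatchingStrategy::AllowReMatch) => match rematch() {
            Some(id) => Resolution::UseExternalId(id),
            None => Resolution::NoMatchCandidate,
        },
    }
}
```

Keeping the miss outcomes as separate variants is what lets the run summary distinguish "never had an ID" from "searched and found nothing".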
Register a cron entry per library whose `metadata_refresh_config.enabled = true`,
firing `RefreshLibraryMetadata { library_id }` on each tick. The new loader
runs from `start()` so `reload_schedules()` rebuilds refresh entries alongside
the scan ones with no extra wiring on the API side.
The cron-job closure runs a skip-if-already-running guard before enqueuing:
if a `pending` or `processing` `refresh_library_metadata` task already exists
for the same library, the firing is dropped with a warn log. This is belt
and suspenders on top of `TaskRepository::enqueue` deduplication so operators
see an explicit log line when a cron tick fires while the previous run is
still draining.
Per-library config errors (invalid cron, invalid timezone, missing library
row) log a warning and skip just that library instead of aborting scheduler
boot. Per-library timezone overrides fall back to the server default when
parsing fails, matching the existing `add_library_schedule` behavior.
Adds integration tests for enabled/disabled configs, per-library timezone,
invalid cron tolerance, multi-library coexistence, and reload pickup, plus
unit tests on the in-flight detector.
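The skip-if-already-running guard can be sketched against an in-memory queue. The task type string and the `pending`/`processing` statuses are from the commit message; the `Task` struct, the integer library ID, and both function names are assumptions, and the real guard queries the task repository instead.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Pending,
    Processing,
    Completed,
    Failed,
}

/// Hypothetical, simplified task row.
#[derive(Debug, Clone)]
pub struct Task {
    pub task_type: &'static str,
    pub library_id: u32,
    pub status: TaskStatus,
}

const REFRESH_TASK: &str = "refresh_library_metadata";

/// In-flight detector: true when a pending or processing refresh task
/// already exists for the library.
pub fn refresh_in_flight(tasks: &[Task], library_id: u32) -> bool {
    tasks.iter().any(|t| {
        t.task_type == REFRESH_TASK
            && t.library_id == library_id
            && matches!(t.status, TaskStatus::Pending | TaskStatus::Processing)
    })
}

/// Cron-tick body: drops the firing with a warning when a run is in
/// flight, otherwise enqueues. Returns whether a task was enqueued.
pub fn on_cron_tick(queue: &mut Vec<Task>, library_id: u32) -> bool {
    if refresh_in_flight(queue.as_slice(), library_id) {
        eprintln!("WARN: refresh already in flight for library {library_id}, skipping tick");
        return false;
    }
    queue.push(Task {
        task_type: REFRESH_TASK,
        library_id,
        status: TaskStatus::Pending,
    });
    true
}
```

The guard is scoped per library, so a long-running refresh on one library does not block ticks for another.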
Expose the per-library scheduled metadata refresh to HTTP under a new
`Metadata Refresh` OpenAPI tag:
- GET/PATCH /api/v1/libraries/{id}/metadata-refresh — read and partially
update the config; PATCH validates cron, timezone, field groups, and
providers (disabled plugins accepted, missing rejected) and reloads the
scheduler so changes apply without a restart.
- POST /api/v1/libraries/{id}/metadata-refresh/run-now — enqueue the task
immediately; shares the scheduler's skip-if-already-running guard and
returns 409 when a refresh is in flight.
- POST /api/v1/libraries/{id}/metadata-refresh/dry-run — synchronous
preview bounded by sampleSize (default 5, cap 20) with a per-pair plugin
timeout. Plans against an optional configOverride, fetches via
metadata/series/get (or match in loose mode), and runs MetadataApplier
with dry_run=true so no DB writes happen.
- GET /api/v1/metadata-refresh/field-groups — static catalog of the 12
field groups with id, label, and concrete camelCase fields, derived from
FieldGroup::all() so adding a variant extends the API automatically.
The PATCH DTO uses PatchValue for nullable fields (timezone,
per_provider_overrides) so absent leaves the value untouched, explicit
null clears, and a value sets. Per-pair plugin failures during dry-run
surface as a single skip-row in the sample so the UI can render
"this series couldn't be previewed" instead of silently dropping it.
Includes integration and unit tests covering happy paths, validation
failures, permission gating, the run-now conflict guard, and the
field-group catalog shape.
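The absent / null / value semantics of `PatchValue` can be illustrated with a small tri-state enum. This is a sketch of the idea only: the variant names and the `apply_to` helper are assumptions, and the real type also needs serde support that is omitted here.

```rust
/// Tri-state patch value for nullable fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PatchValue<T> {
    /// Key missing from the request body: leave the stored value untouched.
    Absent,
    /// Key present with an explicit `null`: clear the stored value.
    Null,
    /// Key present with a value: overwrite the stored value.
    Value(T),
}

impl<T> PatchValue<T> {
    /// Merges this patch entry into a nullable stored field.
    pub fn apply_to(self, target: &mut Option<T>) {
        match self {
            PatchValue::Absent => {}
            PatchValue::Null => *target = None,
            PatchValue::Value(v) => *target = Some(v),
        }
    }
}
```

A plain `Option<T>` cannot carry this distinction, because a missing key and an explicit `null` both collapse to `None`.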
Surface the scheduled metadata-refresh feature in the library settings flow. A new "Metadata Refresh" tab in the edit-library modal lets admins configure the schedule (enable + cron preset or custom expression + timezone), choose which field groups to refresh, pick metadata providers, and tune safety options (existing-source-IDs-only, recency cutoff, max concurrency). A "Preview changes" action drives a synchronous dry-run and renders the resulting deltas as `field: before -> after` rows, with locked or otherwise-skipped fields shown alongside their reason. Unresolved providers are highlighted so typos and disabled plugins are easy to spot. "Run now" enqueues the refresh task and reuses the existing task-progress SSE channel to render an inline progress alert; the button stays disabled while a refresh is in flight to avoid piling work onto an active run. Includes a typed API client, TanStack Query hooks (config get/update, run-now, dry-run, field-groups), and component/hook tests covering hydration, save, run-now, dry-run rendering, and edge states.
…ed refresh
Lets users override the library-wide field group selection on a per-provider basis, e.g. trust AniList for ratings only while MangaBaka refreshes status and counts under the same schedule.
Backend:
- New `fields_filter_for_provider(config, provider)` resolver returns the override's expanded field set when one exists for the given `"plugin:<name>"` key, else falls back to the library-wide filter.
- `RefreshLibraryMetadataHandler` and the dry-run HTTP handler now compute the field filter per `(series, plugin)` pair instead of once globally.
- PATCH and dry-run config validation extended: override map keys must resolve to an installed plugin; each override's field_groups must be a known FieldGroup. Errors mention both the provider and the bad input so the UI can highlight the right row.
Frontend:
- Per-provider override UI in MetadataRefreshSettings: each selected provider gets an indented expandable card with a Custom / Inherits-library badge, a field-group MultiSelect, and a "Reset to inherit" button.
- Save and Preview filter the override map to currently-selected providers and send `null` when empty so the server can clear stale entries on persist.
Tests added at the planner, API, and component levels. Pre-existing dead-code warnings on the branch were also annotated so `clippy --all-targets -D warnings` stays clean.
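A sketch of the override-then-fallback resolution, under stated assumptions: the `RefreshConfig` struct is simplified, and the group-to-field table covers only `ratings` and `status`, split the way the earlier `{"rating","externalRatings","status","year"}` example suggests. The real resolver works from the full `FieldGroup` enum.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified config shape.
pub struct RefreshConfig {
    pub field_groups: Vec<String>,
    /// Keyed by provider string such as "plugin:anilist".
    pub per_provider_overrides: BTreeMap<String, Vec<String>>,
}

/// Assumed mapping for two groups only; unknown groups expand to nothing.
fn fields_for_group(group: &str) -> &'static [&'static str] {
    match group {
        "ratings" => &["rating", "externalRatings"],
        "status" => &["status", "year"],
        _ => &[],
    }
}

/// Returns the override's expanded field set when one exists for
/// `provider`, else the library-wide set.
pub fn fields_filter_for_provider(
    config: &RefreshConfig,
    provider: &str,
) -> BTreeSet<&'static str> {
    let groups = config
        .per_provider_overrides
        .get(provider)
        .unwrap_or(&config.field_groups);
    groups
        .iter()
        .flat_map(|g| fields_for_group(g).iter().copied())
        .collect()
}
```

Computing the filter per `(series, plugin)` pair then amounts to calling this resolver with the pair's provider key before each apply.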
Earlier phases of the scheduled-refresh feature added speculative plumbing in anticipation of later wiring. The later phases took different shapes; the leftover pieces were never wired and clutter the type surface.
Removed:
- `ApplyOptions::matching_strategy` field. The applier never branched on it; the handlers correctly resolve the strategy before calling `apply`. Stripping the field also removes four `..Default::default()` spreads and the misleading "informational here" comments at the manual-apply call sites.
- `field_groups::group_for_field`. Added for a UI label feature; the frontend computes that mapping client-side from the field-groups endpoint and never called this helper.
- `RefreshSkipReason` enum and its module. The dry-run DTO went with raw `reason: String` and the task handler emits its own JSON, so the "stable public taxonomy" had no consumer.
Also removed the now-dead re-exports `MetadataRefreshConfigPatch`, `PlannerSkipReason`, `SkippedRefresh`, `fields_for_groups`, and `fields_filter_from_config`, along with the `#[allow(unused_imports)]` and `#[allow(dead_code)]` annotations that were keeping them quiet.
No behavior change; `clippy --all-targets -D warnings` stays clean without the allowlist suppressions.
…g with N typed jobs
Pivot the scheduled metadata refresh from "one config per library" to a
generic library_jobs table that supports N independent jobs per library,
each scoped to a single provider with its own cron, field selection, and
safety options. The table is type-discriminated (`type` + `config` JSON)
to leave room for future job types (scan, cleanup, indexing) without
schema changes.
Backend
- Replace the metadata_refresh_config column on libraries with a new
library_jobs table (id, library_id, type, name, enabled, cron_schedule,
timezone, config, last_run_*) with FK cascade and indexes on
library_id / enabled / type. The original migration is rewritten in
place; the prior config column never shipped.
- Add a typed LibraryJobConfig discriminated enum + MetadataRefreshJobConfig
payload with a RefreshScope (SeriesOnly, BooksOnly, SeriesAndBooks).
Only SeriesOnly is honoured at runtime; the other variants are reserved
schema fields rejected by validator and handler.
- Rewrite RefreshPlanner for single-provider plans with a typed PlanFailure
for plugin resolution errors.
- Rewrite the RefreshLibraryMetadata task to take job_id, decode the row's
config, runtime-check provider capabilities, and write back
last_run_at / last_run_status / last_run_message on completion.
- Replace the per-library scheduler loader with a per-job loader and
switch the skip-if-already-running guard to query tasks.params->job_id.
- Replace the /libraries/{id}/metadata-refresh endpoints with generic
CRUD at /libraries/{id}/jobs (oneOf-tagged config), plus per-job
run-now and dry-run endpoints. The field-groups catalog moves to
/library-jobs/metadata-refresh/field-groups.
- Surface plugin capabilities on /plugins/actions so the editor can
filter scope options per provider.
Frontend
- New LibraryJobsPage at /libraries/:id/jobs with list view and
add/edit/delete/run-now/preview actions.
- Scope-aware editor: read-only label when the chosen provider supports
one content type, segmented control when it supports both, auto-correct
with notification on provider change, BooksOnly / SeriesAndBooks
hard-disabled with a "coming soon" badge.
- Persistent visible field-group → fields mapping (no hover required),
Advanced individual-field disclosure, and a "Will write N fields"
union preview.
- Remove the Metadata Refresh tab from LibraryModal; navigation now goes
through LibraryActionsMenu → Scheduled Jobs.
Tests added across migration, repo, planner, handler, scheduler, API,
and frontend. OpenAPI spec regenerated.
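The type-discriminated config and the reserved-scope rejection could be modeled along these lines. `LibraryJobConfig`, `MetadataRefreshJobConfig`, and `RefreshScope` are named in the commit message; the struct fields, the `"metadata_refresh"` discriminator string, and the `validate` function are assumptions for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefreshScope {
    SeriesOnly,
    BooksOnly,
    SeriesAndBooks,
}

/// Hypothetical, simplified payload for a metadata refresh job.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MetadataRefreshJobConfig {
    pub provider: String,
    pub scope: RefreshScope,
    pub field_groups: Vec<String>,
}

/// One variant per job type; future types (scan, cleanup, indexing)
/// would be added here without a schema change.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LibraryJobConfig {
    MetadataRefresh(MetadataRefreshJobConfig),
}

impl LibraryJobConfig {
    /// Value for the `type` column (the exact string is an assumption).
    pub fn job_type(&self) -> &'static str {
        match self {
            LibraryJobConfig::MetadataRefresh(_) => "metadata_refresh",
        }
    }
}

/// Rejects scopes that are reserved in the schema but not honoured at runtime.
pub fn validate(config: &LibraryJobConfig) -> Result<(), String> {
    match config {
        LibraryJobConfig::MetadataRefresh(cfg) => match cfg.scope {
            RefreshScope::SeriesOnly => Ok(()),
            other => Err(format!("scope {other:?} is reserved and not supported yet")),
        },
    }
}
```

Running the same check in both the validator and the handler means a row written by some other path still cannot trigger an unsupported scope at run time.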
The "Skip if synced within" field on the metadata refresh job editor took a value in seconds, which forced users to think in seconds at the form level (3600, 21600, 86400) for what is conceptually an hours-scale threshold. The seconds presentation also made it harder to spot when the default 3600 (= 1h) was masking a "freshly synced, all skipped" outcome on a brand-new library. The wire format and persisted value are unchanged: skipRecentlySyncedWithinS still ships in seconds, the planner cutoff math is unchanged, and the default of 3600s is preserved. Only the input and label change: divide by 3600 on display, multiply by 3600 on save.
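The display/save conversion is plain arithmetic; the sketch below states it in Rust for consistency with the backend, although the actual change lives in the frontend form. The function names are hypothetical, and accepting fractional hours is an assumption.

```rust
const SECONDS_PER_HOUR: f64 = 3_600.0;

/// Display side: the stored seconds value shown as hours.
pub fn seconds_to_display_hours(seconds: u64) -> f64 {
    seconds as f64 / SECONDS_PER_HOUR
}

/// Save side: hours from the form converted back to the seconds that
/// `skipRecentlySyncedWithinS` carries on the wire.
pub fn hours_to_wire_seconds(hours: f64) -> u64 {
    (hours * SECONDS_PER_HOUR).round() as u64
}
```

With these helpers the default of 3600 seconds displays as 1 hour, and entering 6 or 24 hours saves 21600 or 86400 seconds.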
Deploying codex with
| Latest commit: | 6e87d56 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://05efe317.codex-asm.pages.dev |
| Branch Preview URL: | https://scheduled-metadata-refresh.codex-asm.pages.dev |
…JSON string
The seed config sample showed plugin credentials as a one-line flow mapping, which invites users to wrap it in quotes, turning it into a YAML string. The encryption layer then stores that raw string instead of an object, and the plugin receives garbage when it looks up `credentials.api_key`. Switch the example to block-mapping form and add a comment warning against quoting the value.
Summary
- `library_jobs` entity, repository, task handler, and scheduler wiring, replacing the earlier single per-library refresh config with N typed jobs per library.
- Endpoints under `/api/v1/libraries/.../jobs`, and a new web UI for managing jobs, editing refresh settings, and previewing dry-run results.
Details
- `library_jobs` table (`m20260503_000071_add_metadata_refresh_config`) with entity (`library_jobs.rs`) and repository (`library_jobs.rs`).
- `refresh_library_metadata` handler (`refresh_library_metadata.rs`) and per-library scheduling in `scheduler/mod.rs`; supports `MatchingStrategy` including a re-match path.