fix(scan): match audiobook files by embedded tags when path heuristics fail #688

Open
rknall wants to merge 1 commit into Listenarrs:canary from rknall:fix/scan-embedded-tag-matching
rknall commented Jun 15, 2026

Summary

Audiobooks whose on-disk layout does not encode the title or author in the folder or filename (e.g. AudioBookShelf-style libraries with series-creator folders and numbered episode filenames) were left with zero linked files after a scan, even when the files sat in the correct BasePath. The per-audiobook scan attributed files purely by path/name heuristics: the filename or folder must contain the title, or the path must contain the author. Those heuristics reject files in such layouts.

This adds an embedded-tag confirmation fallback: for candidates the path heuristics reject, the scan reads the file's embedded ID3/MP4 tags (via the bundled ffprobe, reusing PathMetadataParser) and links a file when its embedded ASIN matches the audiobook (definitive), or when both title and author agree after normalization.
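The decision described above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's `MatchEmbeddedTags` code: the `Tags` type, the `normalize` helper, and the exact normalization rules (lowercasing, accent and punctuation stripping) are assumptions made for the example, and the subtitle tolerance mentioned in the tests is not modelled here.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tags:
    """Hypothetical container for the fields compared (names are illustrative)."""
    asin: Optional[str] = None
    title: Optional[str] = None
    author: Optional[str] = None


def normalize(text: Optional[str]) -> str:
    """Lowercase, drop accents, turn punctuation runs into single spaces."""
    if not text:
        return ""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def match_embedded_tags(tags: Optional[Tags], book: Tags) -> bool:
    """Return True when the file's tags attribute it to the audiobook."""
    if tags is None:
        return False
    # An equal, non-empty ASIN is treated as definitive
    # (compared ignoring case and surrounding whitespace).
    file_asin = (tags.asin or "").strip().upper()
    book_asin = (book.asin or "").strip().upper()
    if file_asin and file_asin == book_asin:
        return True
    # Softer fallback: BOTH title and author must agree after normalization.
    title, author = normalize(tags.title), normalize(tags.author)
    return (
        bool(title)
        and bool(author)
        and title == normalize(book.title)
        and author == normalize(book.author)
    )
```

Requiring both title and author in the fallback keeps an author-only or title-only coincidence from linking a file, which matches the negative cases listed under Testing.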

Changes

Added

  • ScanBackgroundService.MatchEmbeddedTags — a pure, unit-tested decision for attributing a file to an audiobook from its embedded tags (ASIN = definitive; title+author = softer fallback).
  • Embedded-tag confirmation fallback in the per-audiobook scan, reusing PathMetadataParser.ReadEmbeddedTagsAsync + the bundled ffprobe.
  • Unit tests covering the match logic.

Changed

  • Scan tag reads run concurrently with bounded parallelism (min(4, CPUs)); the match decision is applied on a single thread.
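A rough illustration of that shape (concurrent reads, single-threaded decision), again in Python rather than the project's own code. `link_by_tags`, `read_tags`, and `is_match` are hypothetical names introduced for this sketch; only the `min(4, CPUs)` bound comes from the description above.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional

TagDict = Optional[Dict[str, str]]


def link_by_tags(
    paths: Iterable[str],
    read_tags: Callable[[str], TagDict],
    is_match: Callable[[TagDict], bool],
) -> List[str]:
    """Read tags with bounded parallelism, then decide on the calling thread."""
    path_list = list(paths)
    # Bound the number of concurrent tag reads to min(4, CPU count).
    workers = min(4, os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with path_list.
        all_tags = list(pool.map(read_tags, path_list))
    # The match decision runs here, on a single thread, after the reads finish.
    return [p for p, tags in zip(path_list, all_tags) if is_match(tags)]
```

Keeping the decision out of the worker threads means the matching code needs no locking; only the I/O-bound tag reads are parallelized.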

Testing

  • New unit tests ScanBackgroundServiceTagMatchTests (9 cases: ASIN match incl. case/whitespace insensitivity, title+author when the ASIN differs, punctuation/subtitle tolerance, author-only and title-only negatives, null/empty tags) — all pass.
  • Manual: a real audiobook filed under a series-creator folder with a numbered filename (.../Elfie Donnelly/Bibi und Tina/61 - Retten die Biber/Bibi und Tina - 61 - Retten die Biber.mp3) went from 0 → 1 linked file, confirmed via the embedded ASIN.

Notes

  • The fallback only runs for candidates the existing path heuristics reject, so the common already-matching path is unchanged (no added ffprobe cost there).
  • Follow-ups intentionally out of scope: teaching PathMetadataParser to also read plain title/artist tags (it currently reads album/album_artist); and regional ASIN mismatches (file tag carries the .de ASIN while the record holds the .com ASIN), which belong with the regional-search work.
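To make the first note concrete, here is a minimal Python sketch of the control flow, assuming a simplified path heuristic (title in the filename or immediate folder name, or author anywhere in the path). `path_heuristic_accepts`, `attribute_files`, and `confirm_by_tags` are hypothetical names for illustration; the real heuristics in the scan may differ in detail.

```python
from pathlib import PurePosixPath
from typing import Callable, Iterable, List


def path_heuristic_accepts(path: str, title: str, author: str) -> bool:
    """Simplified stand-in for the existing path/name heuristics."""
    p = PurePosixPath(path)
    title_l, author_l = title.lower(), author.lower()
    # Title must appear in the filename or the immediate folder name.
    title_hit = bool(title_l) and (
        title_l in p.name.lower() or title_l in p.parent.name.lower()
    )
    # Author may appear anywhere in the path.
    author_hit = bool(author_l) and author_l in str(p).lower()
    return title_hit or author_hit


def attribute_files(
    paths: Iterable[str],
    title: str,
    author: str,
    confirm_by_tags: Callable[[str], bool],
) -> List[str]:
    """Link files by path first; consult embedded tags only for rejects."""
    linked = []
    for path in paths:
        if path_heuristic_accepts(path, title, author):
            linked.append(path)  # fast path: no tag read needed
        elif confirm_by_tags(path):
            linked.append(path)  # fallback: confirmed via embedded tags
    return linked
```

Because `confirm_by_tags` is only called in the `elif` branch, files that already match by path never trigger a tag read.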

…tics fail

The per-audiobook file scan attributed candidate files using only path/name
heuristics: a file was kept only if its filename or folder contained the
audiobook title, or its path contained the author. Layouts where the
folder/filename does not carry that information (e.g. AudioBookShelf-style
series-creator folders with numbered episode filenames) had every candidate
rejected, so correctly-placed files were never linked (0 files imported).

Add an embedded-tag confirmation fallback: for candidates the path heuristics
reject, read the file's ID3/MP4 tags (reusing PathMetadataParser via the
bundled ffprobe) and attribute the file when the embedded ASIN matches the
audiobook (definitive), or when both title and author agree after
normalization. Tags are read concurrently with a bounded degree of
parallelism; the match decision is applied on a single thread.

The match logic is factored into ScanBackgroundService.MatchEmbeddedTags and
covered by unit tests.
@rknall rknall requested a review from a team June 15, 2026 08:11