fix(scan): match audiobook files by embedded tags when path heuristics fail #688
Open
rknall wants to merge 1 commit into
…tics fail
The per-audiobook file scan attributed candidate files using only path/name heuristics: a file was kept only if its filename or folder contained the audiobook title, or its path contained the author. Layouts where the folder/filename does not carry that information (e.g. AudioBookShelf-style series-creator folders with numbered episode filenames) had every candidate rejected, so correctly-placed files were never linked (0 files imported).
Add an embedded-tag confirmation fallback: for candidates the path heuristics reject, read the file's ID3/MP4 tags (reusing PathMetadataParser via the bundled ffprobe) and attribute the file when the embedded ASIN matches the audiobook (definitive), or when both title and author agree after normalization.
Tags are read concurrently with a bounded degree of parallelism; the match decision is applied on a single thread. The match logic is factored into ScanBackgroundService.MatchEmbeddedTags and covered by unit tests.
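The fallback described in the commit message can be sketched roughly as follows. This is a language-neutral Python illustration, not the project's code: `Tags`, `normalize`, `titles_agree`, `match_embedded_tags`, and `attribute_rejected` are hypothetical names, the prefix-based subtitle tolerance is only an assumption about what "agree after normalization" means, and the tag reader is injected so the sketch does not depend on ffprobe.

```python
import os
import re
import unicodedata
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Tags:
    """Embedded tags of one file (hypothetical shape)."""
    asin: Optional[str] = None
    title: Optional[str] = None
    author: Optional[str] = None


def normalize(text: Optional[str]) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    if not text:
        return ""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(re.sub(r"[^\w\s]", " ", no_marks.lower()).split())


def titles_agree(a: str, b: str) -> bool:
    """Assumed subtitle tolerance: equal, or one is a word-prefix of the other."""
    return a == b or a.startswith(b + " ") or b.startswith(a + " ")


def match_embedded_tags(tags: Optional[Tags], asin: Optional[str],
                        title: Optional[str], author: Optional[str]) -> bool:
    """Pure decision: ASIN match is definitive, else title AND author must agree."""
    if tags is None:
        return False
    tag_asin = (tags.asin or "").strip().upper()
    if tag_asin and tag_asin == (asin or "").strip().upper():
        return True
    t_tag, t_want = normalize(tags.title), normalize(title)
    a_tag, a_want = normalize(tags.author), normalize(author)
    if not (t_tag and t_want and a_tag and a_want):
        return False  # title-only or author-only evidence is not enough
    return titles_agree(t_tag, t_want) and a_tag == a_want


def attribute_rejected(paths: Iterable[str],
                       read_tags: Callable[[str], Optional[Tags]],
                       asin: str, title: str, author: str) -> List[str]:
    """Read tags with bounded parallelism, then decide on the calling thread."""
    paths = list(paths)
    workers = min(4, os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        all_tags = list(pool.map(read_tags, paths))  # map preserves input order
    return [p for p, t in zip(paths, all_tags)
            if match_embedded_tags(t, asin, title, author)]
```

Keeping the decision function free of I/O is what makes it straightforward to unit-test; only `read_tags` would need to touch the file system in a real scan.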
Summary
Audiobooks whose on-disk layout does not encode the title/author in the folder or filename (e.g. AudioBookShelf-style libraries with series-creator folders and numbered episode filenames) were left with zero linked files after a scan, even when the files sat in the correct BasePath. The per-audiobook scan attributed files purely by path/name heuristics (filename/folder must contain the title, or the path must contain the author), which reject these files.
This adds an embedded-tag confirmation fallback: for candidates the path heuristics reject, the scan reads the file's embedded ID3/MP4 tags (via the bundled ffprobe, reusing PathMetadataParser) and links a file when its embedded ASIN matches the audiobook (definitive), or when both title and author agree after normalization.
Changes
Added
ScanBackgroundService.MatchEmbeddedTags: a pure, unit-tested decision for attributing a file to an audiobook from its embedded tags (ASIN = definitive; title+author = softer fallback).
… PathMetadataParser.ReadEmbeddedTagsAsync + the bundled ffprobe.
Changed
… tags are read concurrently with a bounded degree of parallelism (min(4, CPUs)); the match decision is applied on a single thread.
Testing
ScanBackgroundServiceTagMatchTests (9 cases: ASIN match incl. case/whitespace insensitivity, title+author when the ASIN differs, punctuation/subtitle tolerance, author-only and title-only negatives, null/empty tags): all pass.
… (.../Elfie Donnelly/Bibi und Tina/61 - Retten die Biber/Bibi und Tina - 61 - Retten die Biber.mp3) went from 0 → 1 linked file, confirmed via the embedded ASIN.
Notes
… PathMetadataParser to also read plain title/artist tags (it currently reads album/album_artist); and regional ASIN mismatches (the file tag carries the .de ASIN while the record holds the .com ASIN), which belong with the regional-search work.
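To illustrate the first note, here is a minimal Python sketch of reading container-level tags with ffprobe and falling back from album/album_artist to plain title/artist. It is not PathMetadataParser's implementation: `ffprobe_tags`, `parse_format_tags`, and `pick_title_author` are hypothetical names, and it assumes an `ffprobe` binary on the PATH rather than the bundled one.

```python
import json
import subprocess
from typing import Dict, Optional, Tuple


def parse_format_tags(ffprobe_json: str) -> Dict[str, str]:
    """Extract format-level tags from ffprobe JSON output, lowercasing the keys."""
    tags = json.loads(ffprobe_json).get("format", {}).get("tags", {})
    return {key.lower(): value for key, value in tags.items()}


def ffprobe_tags(path: str) -> Dict[str, str]:
    """Run ffprobe on one file and return its format-level tags."""
    completed = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return parse_format_tags(completed.stdout)


def pick_title_author(tags: Dict[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Prefer album/album_artist; fall back to plain title/artist (assumed order)."""
    title = tags.get("album") or tags.get("title")
    author = tags.get("album_artist") or tags.get("artist")
    return title, author
```

Whether album or title should win when both are present is a judgment call for the follow-up; the order above simply keeps the current album/album_artist behaviour as the first choice.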