Plumb show_name + published_at through download cascade (closes #111)#133
Plumb show_name + published_at through download cascade (closes #111)#133
Conversation
rafacm
left a comment
There was a problem hiding this comment.
Verdict: runtime code looks broadly sound for #111, but I would fix the documentation/transcript compliance issues before merging if the repo's AGENTS.md requirements are merge-gating.
Migration
No blocking issue found. episodes/migrations/0025_episode_show_name.py:13 adds show_name as models.CharField(blank=True, default='', max_length=255) with no null=True, which matches the existing Episode metadata-field style in episodes/models.py rather than introducing nullable strings. I also ran uv run python manage.py makemigrations --check --dry-run; it reported No changes detected.
Fetch-details
The prompt update is additive and explicit about the important sources: episodes/agents/fetch_details.py:48 lists og:site_name, application-name, RSS/Atom channel title, JSON-LD isPartOf.name / partOfSeries.name, and visible publisher/show title. episodes/agents/fetch_details.py:55 also says the value must be the broadcast/podcast title, not the publisher company or URL hostname, and to leave it blank when uncertain. That directly addresses the original bug.
Persistence semantics are mostly right: episodes/fetch_details_step.py:89 reads the detail, episodes/fetch_details_step.py:90 requires a non-empty stripped string, and episodes/fetch_details_step.py:91 writes the stripped value. Empty agent output does not clear an existing show_name.
One caveat: the PR body and comments mention preserving admin edits, but show_name is not included in EpisodeAdmin.METADATA_FIELDS or any fieldset. See episodes/admin.py:292 and episodes/admin.py:355. With the current explicit fieldsets, admins cannot actually edit show_name through the Episode admin. That does not break the pipeline fix, but it makes the “admin edit” preservation story incomplete.
Aggregators
The parser behavior is reasonable and candidate-safe:
- fyyd:
episodes/podcast_aggregators/fyyd.py:86plumbspubdate, and_parse_pubdate()returnsNonefor missing/blank/malformed input without dropping the candidate (episodes/podcast_aggregators/fyyd.py:104,episodes/podcast_aggregators/fyyd.py:127). The happy path covers the documented"2024-08-30 04:00:00"shape. - iTunes:
episodes/podcast_aggregators/itunes.py:86plumbsreleaseDate;episodes/podcast_aggregators/itunes.py:117handles trailingZvia+00:00beforefromisoformat(). - podcastindex:
episodes/podcast_aggregators/podcastindex.py:111plumbsdatePublished;episodes/podcast_aggregators/podcastindex.py:135accepts numeric values andepisodes/podcast_aggregators/podcastindex.py:141accepts numeric strings. It correctly usesdatetime.fromtimestamp(..., tz=timezone.utc)atepisodes/podcast_aggregators/podcastindex.py:137, avoiding local-time drift.
Timezone edge coverage is thin. iTunes currently returns the date in the timestamp's own offset (datetime.fromisoformat(...).date() at episodes/podcast_aggregators/itunes.py:117), while podcastindex normalizes epoch seconds to UTC. The ±1 day prompt tolerance probably absorbs this, but the tests do not lock down the intended semantics for offset-bearing iTunes/fyyd values near midnight.
Agent Prompt
The prompt says exactly what the PR claims: episodes/agents/download.py:62 treats Show values containing . and no spaces as hostnames, episodes/agents/download.py:65 switches to (title, published_at), and episodes/agents/download.py:69 defines the ±1 day match. episodes/agents/download.py:73 gives a fallback when Published is unknown. The check is prompt-side only, not deterministic code, which is acceptable for this layer but should be recognized as LLM behavior rather than enforceable matching logic.
Backwards compatibility mostly holds: blank Episode.show_name falls through to URL host in episodes/downloader.py:106, and _get_system_prompt() renders non-empty hostnames into the prompt. One subtle limitation is that the index lookup still queries aggregators with show_name + title; fyyd/iTunes build that term at episodes/podcast_aggregators/fyyd.py:44 and episodes/podcast_aggregators/itunes.py:46. For the canonical ARD case, the issue says fyyd still returned the right candidate, so this PR should address the observed failure. For existing no-backfill rows in general, hostname pollution can still reduce index recall before the LLM ever gets candidates.
Test Coverage
Layer 2 has useful parser coverage for happy/missing/malformed cases in episodes/tests/test_podcast_aggregators.py:71, episodes/tests/test_podcast_aggregators.py:90, episodes/tests/test_podcast_aggregators.py:105, and equivalents for iTunes/podcastindex. Layer 1 has model/default and _show_name cascade coverage in episodes/tests/test_models.py:32 and episodes/tests/test_download.py:181.
Gaps I would add but would not block on runtime:
- No direct fetch-details persistence test for
show_name: set a prior value, run_apply_details/fetch_episode_detailswith blankdetails.show_name, and assert the prior value survives; run with whitespace-padded non-empty value and assert it is stripped. - No prompt rendering test for
Published: YYYY-MM-DDvs(unknown)or the hostname instructions. - No timezone tests for iTunes/fyyd offset timestamps near a date boundary.
I attempted a focused local test run, but the local Postgres test DB disappeared mid-run (test_ragtime / episodes_episode missing). I did verify makemigrations --check --dry-run successfully.
Documentation
The PR includes the expected plan, feature doc, planning session, implementation session, and changelog entry. The PR body also has an actionable manual verification checklist for the canonical ARD URL.
Two documentation problems:
doc/README.mdis stale after this change.doc/README.md:39lists theEpisodeDetails.detailsfields and omitsshow_name.doc/README.md:55says Episode columns are overwritten directly by authoritative output with “no empty-field-only merge”; that is no longer true forshow_name, which is explicitly additive-only. README.md's summary table is still broadly accurate, but the detailed docs should be updated.- The session transcripts do not satisfy the AGENTS.md verbatim-user-message requirement. The planning transcript says its user messages are “summarized for brevity” at
doc/sessions/2026-05-04-download-show-name-fix-planning-session.md:30anddoc/sessions/2026-05-04-download-show-name-fix-planning-session.md:68; the implementation transcript does the same atdoc/sessions/2026-05-04-download-show-name-fix-implementation-session.md:26. The implementation note is honest, which is better than fabricating text, but the prompt for this review specifically asked to verify the parent prompt was reproduced verbatim rather than summarized. It was not.
Other Findings
No unused imports or obvious candidate-dropping regressions found in the touched code. The fyyd list comprehension still calls _candidate(item) twice at episodes/podcast_aggregators/fyyd.py:61, which predates or at least is adjacent to this change; now it also parses/logs twice for malformed dates. Not a blocker, but easy to clean up later.
Suggested follow-ups
- Update
doc/README.mdforshow_nameand the additive-only exception. - Replace summarized transcript user sections with verbatim parent/user prompts where recoverable, or explicitly get maintainer acceptance for
unavailable/summarized transcripts. - Add focused tests for
show_namepersistence semantics and prompt rendering. - Consider omitting hostname-shaped
show_namefrom aggregator search terms in code, so existing no-backfill rows don't depend on index search tolerating hostnames. - Add
show_nameto the Episode admin metadata fieldset if manual/admin correction is intended.
— Codex review (independent second-opinion pass)
AGENTS.md's transcript policy assumed the human-with-Claude workflow where every session has real `### User` messages. Conductor's parallel- launch model breaks that assumption: the implementation session has no direct user, only an agent-orchestrated launching prompt. Add a new "Agent-orchestrated sessions" subsection to AGENTS.md and a matching note to the Feature PR Documentation Bundle AI check. The new convention: use `### Parent agent (orchestrator)` headings instead of `### User`, reproduce the parent-agent prompt verbatim, and declare the session as agent-orchestrated at the top of `## Detailed conversation`. Same verbatim rule applies — summarized parent prompts are still rejected. This eliminates a recurring false-negative on the docs check that was raised on PRs #129, #130, #131, #133. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
5b4e033 to
6cc9d06
Compare
AI Checks summary❌ 3 fail · ✅ 4 pass · ⏭️ 0 skip ❌ Branching & PR Strategy
PR is from main branch, violating workflow rules. DetailsThe PR was created from branch ❌ Pipeline Step Documentation Sync
Documentation not updated to reflect pipeline changes. DetailsThe diff adds a new field ❌ RAGTIME_* Env Var Sync
The diff violates the RAGTIME_ Env Var Sync rule.* DetailsThe PR introduces a new runtime configuration variable Other checks (4 passing · 0 skipped)Show details
|
Merging with
|
Closes #111. Three coordinated layers fix the canonical ARD Sounds episode failure where the download agent rejected every fyyd candidate: 1. Real show_name source of truth. New Episode.show_name CharField (migration 0025), extracted by fetch_details from og:site_name, application-name, RSS <channel><title>, JSON-LD isPartOf.name, or the visible publisher heading. Persisted only when non-empty. downloader._show_name now prefers it, with URL host as a defense-in-depth fallback. 2. published_at as a tiebreaker. EpisodeCandidate.published_at parsed by each aggregator (fyyd pubdate, iTunes releaseDate, podcastindex datePublished epoch). Surfaced through DownloadDeps and IndexCandidate. 3. Hostname-aware download agent prompt. When show_name looks like a hostname (contains . and no spaces), the agent switches to a (title, published_at ±1 day) match instead of requiring an exact show_name string match. Real broadcast titles keep the existing path. Tests: 384 passing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
6cc9d06 to
04f5ce4
Compare
Response to codex review (review 4219890521)Mapping each follow-up to the current PR head (
Other notes from the review
Manual ARD verificationIn progress against the rebased branch. Will tick the PR-body checklist as items complete. Known noise on this PR
|
Three failing AI Checks — all model hallucinations (tracked in #137)CI returned with three AI Check failures on the rebased branch (
Diff evidence for
|
Closes #111.
Summary
Tightens the download agent's index-candidate matching for non-English shows whose extracted show name was previously absent. Three coordinated layers, single PR:
Real
show_namesource of truth. NewEpisode.show_nameCharField(migration0025_episode_show_name). The fetch_details agent extracts it from<meta property=\"og:site_name\">,<meta name=\"application-name\">, RSS<channel><title>, JSON-LDisPartOf.name/partOfSeries.name, or the visible publisher heading. Persisted only when non-empty so a partial re-run never wipes a previously-good value or admin edit.episodes/downloader._show_namenow prefersepisode.show_name, falling back to URL host as defense-in-depth.published_atas a tiebreaker.EpisodeCandidate.published_atparsed by each aggregator (fyydpubdate, iTunesreleaseDate, podcastindexdatePublishedepoch seconds). Missing or malformed values log a warning and returnNonewithout dropping the candidate. Surfaced throughDownloadDeps.published_atandIndexCandidate.published_at.Hostname-aware download agent prompt. When
Showcontains a.and no spaces (i.e. it looks like a hostname), the agent now switches to a(title, published_at)match with ±1 day tolerance instead of demanding an exactshow_namestring match. Real broadcast titles keep the existing show-plus-title path.No backfill — pre-prod data freedom per
feedback_reembed_ok_preprod.md.Test plan
uv run python manage.py test— 384 passing.uv run python manage.py makemigrations --check --dry-run— no further migrations.Manual verification (post-merge or pre-merge with reviewer)
Requires a live ASGI worker, configured provider keys, and the canonical ARD episode:
https://www.ardsounds.de/episode/urn:ard:episode:fdcf93eef8395b35/via the admin "Submit episode" form.Episode.show_name = \"Zeitzeichen\"(or similar).source=\"fyyd\".dbos workflow steps <id>showsdownload_stepsucceeded.Documentation
doc/plans/2026-05-04-download-show-name-fix.mddoc/features/2026-05-04-download-show-name-fix.mddoc/sessions/2026-05-04-download-show-name-fix-planning-session.mddoc/sessions/2026-05-04-download-show-name-fix-implementation-session.md## 2026-05-04🤖 Generated with Claude Code