Paper post: arXiv:2603.26644 (Lovick, Yallup, Handley — ALCS)#9

Open
williamjameshandley wants to merge 10 commits into main from paper/2603.26644

@williamjameshandley

Contributor

@tobyLovick — this PR adds a /papers/ post for arXiv:2603.26644 (ALCS), drafted by an orchestrator agent following the new paper skill baseline that lands in #6.

Two asks for you

This is both a paper post and a feedback opportunity for the skill itself. We want the next paper post to take less of the lead author's time, and the only way that gets better is if you tell us where the agent's draft fell short.

1. Fill the "How AI helped (or didn't)" section

The post body has a TODO block in that section with prompts. Edit the post to replace the TODO with your own prose. The agent deliberately did not speculate about your AI involvement — that section is the load-bearing differentiator from the arXiv abstract and only you can write it honestly. "Didn't help, this was done by hand" is a perfectly fine answer.

2. Narrate what you actually did to land this post

As you work through the edit, please write a short note (in this PR's conversation, not in the post body) describing your end-to-end workflow:

  • What did the agent get right? Which sentences/sections did you keep verbatim?
  • What did the agent get wrong? Where did its summary mislead, simplify badly, or invent? Quote the bits you had to rewrite.
  • What was missing? Did the skill ask you the wrong questions in the TODO block, or skip a question that mattered?
  • What was over-the-top? Did it produce text the post didn't need (e.g. the "What the paper does" para could be shorter / longer / different)?
  • Where did the friction sit? If you spent more than ~10 min on this PR, what consumed the time?

The point is to find the boundary between what the agent can do unsupervised and what genuinely needs you. Each round of feedback collapses that boundary further. Eventually a paper/<id> PR with a usable draft should land in your inbox the day a paper goes on arXiv, with a "How AI helped" interview prompt waiting and not much else for you to do — but only if we keep iterating on the skill from real feedback.

If something in the skill should change as a result of your feedback, propose it as a follow-up commit to this branch (or the skill's branch in #6), or just describe it and we'll edit.

Repository hygiene

Refs #3 #6

Replaces the auto-publish `script/posts/arxiv.py` workflow that produced
~88 unreviewed posts. New posts are drafted via this skill on a
`paper/<arxiv-id>` branch and merged only after human review by someone
who is not the paper's lead author.

Tone constraints baked in:
- exact paper title (no clever rewrites)
- no press-release language
- no AI-generated body text and no synthetic AI illustrations
- 200-500 words; the paper is the long-form artefact
- explicit "How AI helped (or didn't)" section is the differentiator;
  honest "didn't help" answers are acceptable
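
The mechanical parts of these constraints could be checked before review. The following is a minimal sketch, not part of the skill: `check_post` and its arguments are hypothetical names, and only the 200-500 word budget, the exact-title rule, and the section heading come from the text above.

```python
import re

MIN_WORDS, MAX_WORDS = 200, 500  # word budget quoted in the tone constraints


def check_post(body: str, post_title: str, paper_title: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # Constraint: exact paper title, no rewrites.
    if post_title.strip() != paper_title.strip():
        problems.append("title differs from the paper title")
    # Constraint: 200-500 words (whitespace-separated tokens, markup included).
    n_words = len(re.findall(r"\S+", body))
    if not MIN_WORDS <= n_words <= MAX_WORDS:
        problems.append(f"body has {n_words} words, expected {MIN_WORDS}-{MAX_WORDS}")
    # Constraint: the "How AI helped (or didn't)" section must be present.
    if "How AI helped" not in body:
        problems.append("missing 'How AI helped' section")
    return problems
```

Press-release language and AI-generated body text cannot be detected this way and still need a human reviewer.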

This is a baseline produced during the 5 May 2026 workshop. The first
real paper post written through it (Toby Lovick's latest ALCS paper) is
expected to drive the next iteration.

Refs #3

Removes 88 .md files from `_posts/` and 82 .png illustrations from
`assets/images/posts/`. These were generated by the legacy
`script/posts/arxiv.py` pipeline and shipped to `main` without human
review. The 5 May 2026 workshop converged on phasing out that pipeline
in favour of PR-reviewed paper posts via the `paper` skill (added
elsewhere in this PR).

Effect: `/papers/` now renders only its index intro (no posts listed)
until the first PR-reviewed post lands.

Recovery: anything worth re-publishing is recoverable from git history
(`git show <commit>:<path>`); copy back to `_posts/` only via the
`paper` skill workflow with human review.
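
As a sketch of the recovery step, assuming `git` is on the PATH and the caller supplies a commit from before the purge (for example the purge commit's parent), a purged file can be read out of history like this. The helper names are hypothetical; only the `git show <commit>:<path>` form comes from the message above.

```python
import subprocess
from pathlib import Path


def show_spec(commit: str, path: str) -> str:
    """Build the `<commit>:<path>` argument that `git show` expects."""
    return f"{commit}:{path}"


def recover_file(commit: str, path: str, dest: Path, repo: Path = Path(".")) -> Path:
    """Write the blob stored at `commit:path` to `dest`.

    `path` is relative to the repository root. Bytes are captured so that
    binary files such as .png illustrations survive unchanged.
    """
    blob = subprocess.run(
        ["git", "show", show_spec(commit, path)],
        cwd=repo, check=True, capture_output=True,
    ).stdout
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(blob)
    return dest
```

Writing to a scratch location rather than straight into `_posts/` keeps the recovered text out of the published site until it has been through review.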

Also gitignores the orchestrator-day `context/` scratch directory and
`.playwright-mcp/` browser cache, which were accidentally included in
the previous commit.

Toby Lovick, David Yallup, Will Handley: "Automatic Laplace Collapsed
Sampling: Scalable Marginalisation of Latent Parameters via Automatic
Differentiation". The first post produced through skills/paper, included
in this PR as a worked example of what the skill produces from a single
arXiv ID.

Per the skill's tone constraints: paper title verbatim; no synthetic AI
illustration; no press-release language; ~140 words on the science. The
"How AI helped" section is left as a TODO for Toby, because the skill
explicitly requires the lab author to fill it — it's the load-bearing
differentiator from the arXiv abstract and not the agent's voice.

Reviewer should not be Toby (the lead author). The TODO blocks need
filling before the post is "done"; merging this PR ships them publicly
as TODOs, which is intentional during the workshop iteration phase.

Refs #3

The post layout now keys off the `arxiv:` frontmatter field:
- a clickable arXiv-id badge renders near the top of the post header
- if `assets/images/papers/<arxiv-id>.png` exists, it auto-renders as a
  figure linked to the PDF, captioned "First page of arXiv:<id>"

This means paper post markdown no longer repeats the arXiv link as the
first body line — the badge handles it. Body content starts with the
human authors line, then "What the paper does", then "How AI helped".
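
The decision logic described above can be modelled in a few lines. This is an illustrative Python sketch only; the real implementation lives in the Liquid template `_layouts/post.html`, and the function name, dictionary keys, and arxiv.org URL patterns used here are assumptions rather than a copy of that template.

```python
from pathlib import Path


def paper_header(front_matter: dict, site_root: Path) -> dict:
    """Model the header decisions for one post; returns {} for non-paper posts."""
    arxiv_id = front_matter.get("arxiv")
    if not arxiv_id:
        return {}  # no `arxiv:` field: no badge, no figure
    header = {
        "badge_text": f"arXiv:{arxiv_id}",
        "badge_href": f"https://arxiv.org/abs/{arxiv_id}",
    }
    # The first-page figure is rendered only if the screenshot exists on disk.
    image = Path("assets/images/papers") / f"{arxiv_id}.png"
    if (site_root / image).exists():
        header["figure_src"] = image.as_posix()
        header["figure_href"] = f"https://arxiv.org/pdf/{arxiv_id}"
        header["figure_caption"] = f"First page of arXiv:{arxiv_id}"
    return header
```

The sketch treats the ID as a string. An unquoted YAML value such as `arxiv: 2603.26644` is parsed as a float, so IDs ending in zero would lose digits; quoting the value in the frontmatter avoids that.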

Includes:
- _layouts/post.html: rewritten header + first-page figure block;
  MathJax updated to v3 (was a broken reference to mathjax.org/latest)
- _sass/minima/custom-styles.scss: arxiv-badge and paper-firstpage styles
- assets/images/papers/2603.26644.png: rendered first page (pdftoppm at
  150 DPI) of the ALCS paper, ~340 KB
- skills/paper/SKILL.md: documents the load-bearing `arxiv:` field,
  the auto-included first-page screenshot, and the pdftoppm command;
  drops the previous "manual cropped figure" wording
- _posts/2026-03-27-2603.26644.md: drops the redundant first body line
  pointing at the paper (now the badge does that)
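
For reference, a first-page render at 150 DPI could be scripted as below. This is a sketch under assumptions: the exact command documented in `skills/paper/SKILL.md` is not shown in this PR text, so the flag set here (`-png`, `-r`, `-f`, `-l`, `-singlefile`, all standard Poppler `pdftoppm` options) and the helper names are illustrative.

```python
import subprocess
from pathlib import Path


def first_page_cmd(pdf: Path, out_png: Path, dpi: int = 150) -> list[str]:
    """Build a pdftoppm command that rasterises only page 1 to `out_png`."""
    prefix = out_png.with_suffix("")  # pdftoppm appends ".png" itself
    return [
        "pdftoppm", "-png", "-r", str(dpi),
        "-f", "1", "-l", "1",  # first and last page are both page 1
        "-singlefile",         # no page-number suffix on the output name
        str(pdf), str(prefix),
    ]


def render_first_page(pdf: Path, out_png: Path, dpi: int = 150) -> Path:
    """Run pdftoppm (must be installed) and return the output path."""
    out_png.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(first_page_cmd(pdf, out_png, dpi), check=True)
    return out_png
```

Calling `render_first_page(Path("2603.26644.pdf"), Path("assets/images/papers/2603.26644.png"))` would place the image where the layout looks for it.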

Refs #3

/papers/ list: each entry is now a card-row with the first-page
thumbnail on the left, date, arXiv badge, title, authors, and an
optional excerpt on the right. Posts without an arxiv frontmatter
field still render with a placeholder thumbnail box.

Post page: the first-page figure is now a 150px floated thumbnail
beside the header rather than a full-width inline figure. Caption
is hidden (the badge already labels the source). On narrow viewports
it falls back to being centered above the body.

Splitting PR #6 into two: this branch keeps the paper skill, the
auto-generated-post purge, the post-layout improvements (arXiv badge +
first-page thumbnail) and the /papers/ index card-row treatment.

Toby's ALCS paper post (`_posts/2026-03-27-2603.26644.md` and
`assets/images/papers/2603.26644.png`) lands in a follow-up PR
authored on his behalf so he can review and adjust the 'How AI helped'
section before merge.

First post produced by an agent following the skills/paper baseline,
ready for review by @tobyLovick (the lead author).

The post itself contains a TODO block for the 'How AI helped' section,
which only Toby can fill. The agent did NOT speculate about the AI
involvement in the research — that section is the load-bearing
differentiator from the arXiv abstract and must come from the lab
author.

Refs #3 #6
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@tobyLovick

tobyLovick commented May 5, 2026

The agent did fine with the summary. It missed one key point: this is a method for general inference, posed as an ML paper and tested on a set of community benchmarks (inferencegym). Otherwise it was fine. I would say I spent about 15 minutes on this, but that's because I wanted it done well and accurately, not because of friction caused by the skill. Compared to the time it took to write the paper, 15 minutes is not so bad, so I have no edits to suggest to the skill.

Post completed and ready to be merged.
