🚢 The One Where Shipping Got More Honest

Hiya friends,

I’m on my way to San Francisco, so this one stays quick. Two GitHub features made the moment before code ships less hand-wavy: coverage on PRs, and npm staged publishing. There’s also a paper that’s been rattling around in my head and a small experiment to back it up. If you’re at Microsoft Build, scroll down to find me.

🚢 What Shipped

Code coverage in pull requests is now in public preview

Coverage finally lives where it belongs: on the PR itself. GitHub Code Quality now shows the aggregate percentage of code covered directly on the pull request, so reviewers can check test completeness without switching to a separate tool. Upload a Cobertura report from your existing CI workflow with the actions/upload-code-coverage action. GitHub Apps and Actions workflows need the new code-quality:write fine-grained permission to push reports. It’s available today on github.com for Enterprise Cloud and Team, free during the preview. Enterprise Server isn’t included yet.
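
To make “aggregate percent covered” concrete, here’s a minimal Python sketch that computes overall line coverage from a Cobertura XML report. It assumes the standard Cobertura layout (<class> elements with a direct <lines> child) and that the aggregate is plain line coverage. The function name and the sample report are hypothetical, and GitHub may compute its number differently.

```python
import xml.etree.ElementTree as ET


def line_coverage_percent(cobertura_xml: str) -> float:
    """Overall line coverage (0-100) from a Cobertura XML report string (sketch)."""
    root = ET.fromstring(cobertura_xml)
    # (filename, line number) -> covered?; dedupes lines listed by more than one class
    lines_seen: dict = {}
    for cls in root.iter("class"):
        filename = cls.get("filename", "")
        # find() only looks at direct children, so per-method <lines> blocks are skipped
        lines = cls.find("lines")
        if lines is None:
            continue
        for line in lines.findall("line"):
            key = (filename, line.get("number", ""))
            hit = int(line.get("hits", "0")) > 0
            lines_seen[key] = lines_seen.get(key, False) or hit
    if not lines_seen:
        return 0.0
    return 100.0 * sum(lines_seen.values()) / len(lines_seen)


# Hypothetical report: one class, 4 lines, 3 of them hit.
SAMPLE = """<coverage lines-covered="3" lines-valid="4" line-rate="0.75">
  <packages><package name="app"><classes>
    <class name="util" filename="app/util.py" line-rate="0.75">
      <methods><method name="f" signature=""><lines><line number="1" hits="1"/></lines></method></methods>
      <lines>
        <line number="1" hits="1"/>
        <line number="2" hits="0"/>
        <line number="3" hits="4"/>
        <line number="4" hits="1"/>
      </lines>
    </class>
  </classes></package></packages>
</coverage>"""

print(f"{line_coverage_percent(SAMPLE):.1f}% covered")
```

Counting from the per-class <lines> rather than every <line> element matters: Cobertura repeats method lines inside <methods>, so a naive count would double them.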

Staged publishing and new install-time controls for npm

Two updates worth knowing in npm CLI 11.15.0. Staged publishing is generally available: instead of a direct npm publish going live, the tarball sits in a stage queue until a maintainer approves it with a 2FA challenge. Pair it with trusted publishing (OIDC) and lock CI workflows to stage-only, so non-interactive publishes never auto-release. Three new install flags (--allow-file, --allow-remote, --allow-directory) join --allow-git, each accepting all or none. Heads-up: the --allow-git default flips to none in npm CLI v12.
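
The four --allow-* flags gate dependencies by where they come from. Here’s a toy Python model of that idea. classify_spec and blocked_deps are hypothetical helpers, the classification is a rough approximation (npm’s real spec parser, npm-package-arg, handles many more cases such as hosted Git URLs), and treating an unlisted kind as all is an assumption of the sketch, not npm’s documented behavior.

```python
from typing import Dict, List, Tuple

TARBALL_EXTS = (".tgz", ".tar.gz", ".tar")


def classify_spec(spec: str) -> str:
    """Roughly classify an npm dependency spec (simplified, NOT npm's real parser)."""
    s = spec.strip()
    if s.startswith(("git+", "git://", "github:", "gitlab:", "bitbucket:")) or s.endswith(".git"):
        return "git"
    if s.startswith(("http://", "https://")):
        return "remote"
    if s.startswith("file:"):
        s = s[len("file:"):]
        return "file" if s.endswith(TARBALL_EXTS) else "directory"
    if s.startswith(("./", "../", "/", "~/")):
        return "file" if s.endswith(TARBALL_EXTS) else "directory"
    if "/" in s and ":" not in s and not s.startswith("@"):
        return "git"  # GitHub "user/repo" shorthand
    return "registry"  # semver ranges, tags, npm: aliases


def blocked_deps(deps: Dict[str, str], policy: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (name, kind) for each dependency whose kind the policy sets to 'none'."""
    blocked = []
    for name, spec in deps.items():
        kind = classify_spec(spec)
        # Hypothetical policy keys mirror the flag names: git, file, remote, directory.
        if kind != "registry" and policy.get(kind, "all") == "none":
            blocked.append((name, kind))
    return blocked


DEPS = {
    "left-pad": "^1.3.0",
    "my-fork": "github:me/lib",
    "local-utils": "file:../utils",
    "vendored": "file:./vendor/pkg-1.0.0.tgz",
    "cdn-pkg": "https://example.com/pkg-2.0.0.tgz",
}
POLICY = {"git": "none", "file": "all", "remote": "none", "directory": "all"}

print(blocked_deps(DEPS, POLICY))
```

Registry dependencies never hit the gate in this model; only specs that bypass the registry are subject to the all/none switches.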

📖 What I’m Reading

“Long Live the Librarian!” by Cho, Choi, Heo, Choi, Moon, Park, and Kim (arxiv.org/abs/2605.27787)

Multi-agent SWE systems waste a lot of output tokens re-exploring the same files. The paper quantifies why that hurts (output tokens cost 30 to 1,000x more energy than input or cached tokens) and proposes a persistent search sub-agent that tracks repo-search history and returns short references instead of full file excerpts. On SWE-Bench Verified, this cuts per-episode GPU energy by up to 25% without losing task performance.
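
Here’s a toy Python sketch of the core librarian idea: one search helper shared by every agent, which remembers what it has already returned and hands back a short reference instead of re-sending the excerpt. The Librarian class, the in-memory repo dict, and the reference format are all hypothetical, not the paper’s implementation, and the sketch assumes agents can resolve a reference from shared history.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Librarian:
    """Toy shared search sub-agent that remembers which files it has already returned."""

    repo: Dict[str, str]  # path -> contents; stands in for a real repository
    refs: Dict[str, int] = field(default_factory=dict)  # path -> reference id

    def search(self, term: str) -> List[str]:
        results = []
        for path, text in self.repo.items():
            if term not in text:
                continue
            if path in self.refs:
                # Already explored by some agent: send a short reference, not the excerpt.
                results.append(f"[ref {self.refs[path]}] {path} (already sent)")
            else:
                # First hit: record it in the shared history and send the full excerpt once.
                self.refs[path] = len(self.refs) + 1
                results.append(f"[ref {self.refs[path]}] {path}\n{text}")
        return results


lib = Librarian(repo={"src/auth.py": "def login(user): ...", "src/db.py": "def connect(): ..."})
print(lib.search("login"))  # reviewer A: full excerpt of src/auth.py
print(lib.search("login"))  # reviewer B: ['[ref 1] src/auth.py (already sent)']
```

The saving comes from the second call: the excerpt is sent once, and every later lookup costs a one-line reference.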

Worth your time if: you ship anything with more than one agent in it and you want to know where the waste lives.

🔧 What I’m Using

I read the Librarian paper and wanted dollar numbers, so I built github.com/AndreaGriffiths11/librarian-demo: a harness that runs 5 reviewer agents over a 16K-token PR on Claude Opus 4.6 through the GitHub Copilot proxy. I measured token counts in three modes and priced them: naive $1.32, prompt cache $0.45 (warm), and a librarian-pattern digest $0.49 (cold on every call). The cache wins on a warm series of calls; the librarian wins when your bot serves PRs from many devs across many repos, where the cache rarely stays warm.
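
Why does the winner flip? A cache only pays off when calls hit it warm. This Python sketch finds the warm-hit rate above which caching beats the librarian digest, assuming expected cache cost is a simple blend of warm and cold costs. The function is hypothetical, and the newsletter doesn’t report a cold-cache cost, so CACHE_COLD below is a placeholder to replace with your own measurement.

```python
from typing import Optional


def breakeven_hit_rate(cache_warm: float, cache_cold: float, librarian: float) -> Optional[float]:
    """Smallest warm-hit fraction p at which caching costs no more than the librarian digest.

    Expected cache cost is modeled as p * cache_warm + (1 - p) * cache_cold.
    Returns None if caching never catches up, even with every call warm.
    """
    if cache_cold <= librarian:
        return 0.0  # even a fully cold cache is no worse than the digest
    if cache_warm > librarian:
        return None  # even a fully warm cache costs more than the digest
    return (cache_cold - librarian) / (cache_cold - cache_warm)


CACHE_WARM = 0.45  # measured in the demo
LIBRARIAN = 0.49   # measured in the demo
CACHE_COLD = 1.32  # PLACEHOLDER: assumes a cold-cache run costs about what the naive run did

p = breakeven_hit_rate(CACHE_WARM, CACHE_COLD, LIBRARIAN)
print(f"cache needs >= {p:.0%} warm calls to beat the librarian digest")
```

With that placeholder, the cache needs roughly 95% of calls to land warm before it beats the digest: easy for one dev iterating on one PR, hard for a bot spread across many repos.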

✨ This Week

I’m at Microsoft Build through the week, demoing the GitHub app at the GitHub Commons pavilion. If you’re at the venue, come find me. Not in town? Register virtually at gh.io/microsoft-build and follow along. After Build I’m joining the GitHub Universe content committee to help pick this year’s speakers. Reply and let me know if you submitted a talk.

See you next week.

With gratitude,
Andrea