Main Branch

Fundamentals first, always

Articles

Adversarial Code Review with GitHub Models

AI agents write code. Some of it works. When you ask them to check their own work, they rationalize the bugs instead of catching them. So I built a second agent that watches the first.

Andrea Griffiths 5 min read 🌐 Read in Spanish 🌐 阅读中文版
Proof Agent AI Agents Code Review GitHub Actions GitHub Models

Leer en español → · 阅读中文版 →

Proof Agent FAIL verdict

AI agents write code. Some of it works. Some of it doesn’t. And when you ask them to check their own work? They rationalize the bugs instead of catching them.

Self-verification is a lie. The same model that made the mistake will defend it.

Proof Agent enforces a simple rule: the worker and the verifier are always separate. No agent verifies its own code. Ever.

This is adversarial code review — and it catches real bugs.

Try it yourself →

The Problem with Self-Verification

Here’s what happens when an AI agent reviews its own work:

Agent writes code:

function getUserById(id) {
  return db.query("SELECT * FROM users WHERE id = " + id);
}

You ask: “Is this safe?”

Agent responds: “Yes! This retrieves the user by ID. The query is simple and efficient.”

Reality: it’s vulnerable to SQL injection. An attacker can pass 1 OR 1=1 and dump the entire database.

The agent didn’t catch it because it wrote the code. It already rationalized the design choice. Asking it to verify is asking it to admit a mistake — and models are trained to sound confident, not to audit themselves.

How Adversarial Review Works

Proof Agent separates the worker from the verifier:

  1. Worker agent writes the code (GitHub Copilot, Claude, whatever)
  2. Verifier agent (separate LLM instance) runs an adversarial review
  3. Verifier checks for:
    • Correctness — Does it do what was requested?
    • Security — Vulnerabilities, exposed secrets, permission issues?
    • Bugs — Edge cases, error handling, regressions?
    • Facts — Are claims, version numbers, URLs verifiable?

The verifier must run commands to gather evidence. No PASS without proof.

Verdicts:

  • PASS — All checks passed with evidence
  • FAIL — Issues found. Specifics reported. Retry up to 3 times.
  • PARTIAL — Some checks passed, others couldn’t be verified

Real Example: Catching SQL Injection

I tested Proof Agent with intentionally vulnerable code: proof-agent-demo/pull/1

The first commit added a single file. Proof Agent ran but returned SKIP — only 1 file changed, no sensitive patterns matched.

The second commit added two more files, bringing the total to 3. That crossed the threshold. Proof Agent ran a full adversarial review and returned FAIL.

Files in the PR:

  • add-user-endpoint.js — SQL injection, plain-text passwords, error leakage
  • database-utils.js — Hardcoded database password, multiple SQL injection vulnerabilities
  • api-client.js — Hardcoded API keys, disabled SSL verification

Here’s a subset of Proof Agent’s findings (the full output flagged additional issues in database-utils.js):

❌ FAIL

Critical issues found. DO NOT MERGE.

Issues:

• File: src/add-user-endpoint.js, Line: 25
  Severity: CRITICAL
  Issue: SQL injection vulnerability - user input (username, email)
         directly used in an SQL statement without sanitization.
  Code: const user = await queryBuilder.updateUser('new', username, email);
  Fix: Use parameterized queries to prevent SQL injection.

• File: src/add-user-endpoint.js, Line: 37
  Severity: CRITICAL
  Issue: Plain-text password storage - password is stored without
         any hashing or encryption.
  Code: auth.registerUser(username, password);
  Fix: Use a secure hashing algorithm (e.g., bcrypt) before storage.

• File: src/api-client.js, Line: 6
  Severity: CRITICAL
  Issue: Hardcoded API credentials - API key and secret key
         exposed in source code.
  Code: this.apiKey = 'sk-first-to-find-this-wins-a-mona';
  Fix: Use environment variables or a secrets management system.

• File: src/api-client.js, Line: 24
  Severity: CRITICAL
  Issue: SSL verification disabled (rejectUnauthorized: false),
         vulnerable to man-in-the-middle attacks.
  Fix: Remove or set to true to enforce SSL verification.

The workflow blocked the PR merge until these issues are fixed.

Failed check blocking merge

The worker agent didn’t catch any of this. The verifier did — because it was looking for problems, not defending the code.

How to Add It to Your Repo

Zero setup. No API keys. No external services.

Add this workflow to .github/workflows/proof-agent.yml:

name: Proof Agent

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  pull-requests: write
  models: read

jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: AndreaGriffiths11/proof-agent@v1.0.2
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          base-ref: origin/main
          block-on-fail: true
          post-comment: true

That’s it. Every PR now gets adversarial verification.

Uses GitHub Models API (free tier). The built-in GITHUB_TOKEN has models: read permission — no PAT needed.

When It Runs

Auto-verifies when:

  • 3 or more non-excluded files changed
  • Any file matches a sensitive pattern: **/*auth*, **/*secret*, **/*permission*, **/Dockerfile, **/*.env*

Skips when:

  • Fewer than 3 files changed and none match sensitive patterns
  • Files matching **/.gitignore are excluded from the count

You can customize all thresholds, patterns, and retry behavior in proof-agent.yaml.

Why Adversarial Review Matters

AI agents are getting more autonomous. They write code, deploy infrastructure, and make architectural decisions. If they can’t be trusted to verify their own work, we need a second agent watching.

Proof Agent is the first step toward multi-agent safety:

  • One agent proposes
  • Another agent verifies
  • Neither trusts the other

This is how we’ll ship AI-generated code to production — with confidence, not blind trust.

Try It

Install it. Break something. See if it catches you.

About the Author: Andrea Griffiths is a Senior Developer Advocate at GitHub, where she helps engineering teams adopt and scale developer technologies. She's passionate about making technical concepts accessible—to both humans and AI agents. Connect with her on LinkedIn, GitHub, or Twitter/X. · Read in Spanish · 阅读中文版