AI writes more and more of our code, so we need a way to be sure it works: both the code and the product. Today we make the agent prove it.
What to have ready.
Install Claude in Chrome.
This is a Chrome extension that lets the agent drive a real browser. It can open your app, click, type, and read the screen, the console, and the network. We use it this afternoon to test the product the way a user sees it.
Install the Chrome extension, then connect it to Claude Code. Do this before the afternoon lab.
Where we are in the workflow.
An agent works in a loop.
Who checks it works?
Two kinds of quality.
Every part can pass. The product can still break where the parts join. So we check both: the parts and the whole.
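A minimal sketch of that gap, in Python. The two functions and the `title`/`name` field mismatch are hypothetical, invented only to illustrate the point. Each part passes its own check, but the join loses the title.

```python
from __future__ import annotations


def parse_form(form: dict[str, str]) -> dict[str, str]:
    """Part 1 (hypothetical): clean up the submitted form."""
    return {"title": form["title"].strip()}


def save_todo(record: dict[str, str]) -> str:
    """Part 2 (hypothetical): return the text that gets stored.

    It reads "name", while parse_form writes "title".
    """
    return record.get("name", "")


# Each part passes when checked alone.
part_one_ok = parse_form({"title": " buy milk "}) == {"title": "buy milk"}
part_two_ok = save_todo({"name": "buy milk"}) == "buy milk"

# The whole breaks where the parts join: the title is silently dropped.
whole_ok = save_todo(parse_form({"title": "buy milk"})) == "buy milk"
```

Only a check that runs the parts together sees the failure.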
Why test-first fits the loop.
Write the test first.
How we run TDD.
Spec the feature, then build it test-first.
- Start from the working to-do app: it already runs and passes its tests
- First, grill-with-docs the feature: spec “let users edit a to-do” before any code
- Then run /tdd from that spec: red, then green, one slice at a time
- Read the test, not just the check: does it match the spec we just wrote?
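One slice of the steps above might look like this. It is a sketch in Python, not the workshop app's real code: the `Todo` record, the `edit_todo` function, and the blank-title rule are all assumptions made for illustration. In a red-green cycle each test is written first and fails until `edit_todo` is filled in.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Todo:
    """Hypothetical to-do record; the real app's model may differ."""
    id: int
    title: str
    done: bool = False


def edit_todo(todos: list[Todo], todo_id: int, new_title: str) -> list[Todo]:
    """Return a new list in which the matching to-do has the new title."""
    title = new_title.strip()
    if not title:
        raise ValueError("title must not be empty")
    if not any(t.id == todo_id for t in todos):
        raise KeyError(todo_id)
    return [replace(t, title=title) if t.id == todo_id else t for t in todos]


# Slice 1: editing changes the target's title and nothing else.
def test_edit_changes_only_the_target_title() -> None:
    todos = [Todo(1, "buy milk"), Todo(2, "call mom", done=True)]
    result = edit_todo(todos, 1, "buy oat milk")
    assert result[0] == Todo(1, "buy oat milk")
    assert result[1] == todos[1]


# Slice 2: a blank title is rejected (an assumed rule, not from the spec).
def test_edit_rejects_blank_title() -> None:
    try:
        edit_todo([Todo(1, "buy milk")], 1, "   ")
    except ValueError:
        return
    raise AssertionError("expected ValueError for a blank title")
```

Each test names one sentence of the spec, which makes it easy to read the test against the spec later.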
The agent can fake a passing test.
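A small illustration of what a faked pass can look like, in Python. The `rename` function is hypothetical and deliberately broken. The first check never calls it, so it passes anyway. The second calls it and compares the result to the spec.

```python
def rename(title: str, new_title: str) -> str:
    """Hypothetical code under test. Deliberately broken: ignores new_title."""
    return title


def faked_test() -> bool:
    # Never calls rename; it only compares the expected value to itself.
    expected = "buy oat milk"
    return expected == "buy oat milk"


def honest_test() -> bool:
    # Calls the code under test and checks the result against the spec.
    return rename("buy milk", "buy oat milk") == "buy oat milk"
```

With the broken `rename`, `faked_test()` is true and `honest_test()` is false. A green check alone cannot tell the two apart, so read the test.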
Spec it, then build it test-first.
Two steps, in order. First write the spec with grill-with-docs. Then let /tdd build it, one slice at a time.
Why unit tests aren’t enough.
Code review tells you what changed. Only the browser shows what the page does when a real user loads it.
The agent drives a real browser.
A bug only the browser catches.
Now prove the edit screen in the browser.
- Build the edit screen: the form for the rules track one already proved
- Give it a rubric: click the pencil, change the title, save, and watch the row update
- Let it drive the real app: through Claude in Chrome, the way a user would
- Save the passing flow as a test: a Playwright run that re-checks it
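The saved flow from the last step could look roughly like this, using Playwright's Python sync API. This is a sketch under assumptions: the URL, the to-do titles, and the accessible names ("Edit", "Title", "Save") are hypothetical and must be changed to match the real app.

```python
from __future__ import annotations

# Hypothetical values: change these to match the real app.
APP_URL = "http://localhost:3000"
OLD_TITLE = "buy milk"
NEW_TITLE = "buy oat milk"


def check_edit_flow(base_url: str = APP_URL) -> None:
    """Replay the rubric: click the pencil, change the title, save, watch the row."""
    # Imported here so this file can be loaded without Playwright installed.
    from playwright.sync_api import expect, sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(base_url)

            # Click the pencil on the row we want to edit.
            row = page.get_by_role("listitem").filter(has_text=OLD_TITLE)
            row.get_by_role("button", name="Edit").click()

            # Change the title and save.
            page.get_by_label("Title").fill(NEW_TITLE)
            page.get_by_role("button", name="Save").click()

            # Watch the row update.
            updated = page.get_by_role("listitem").filter(has_text=NEW_TITLE)
            expect(updated).to_be_visible()
        finally:
            browser.close()
```

Calling `check_edit_flow()` with the app running re-checks the flow without the agent. It needs the `playwright` package and a Chromium build installed.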
Prove your feature in the browser.
Take the feature whose rules you built in Exercise A. Build a simple screen for it, then run /browser-testing to prove it in a real browser.
- Did it find anything that broke in the real UI?
- What could it not test on its own?
What we added today.
You now have two skills the team owns: /tdd, which makes the agent prove its code, and /browser-testing, which proves the product. Both run inside the agent's loop, on every task.
The agent does the checking now. You still own the judgment: what to build, and whether it's right.