SOLID principles make AI-generated code worse, not better.

A guide to TDD

Heavy abstraction fragments context across files and hides behavior behind interfaces. Agents do better with code that's locally readable. The conventional wisdom — "tell the agent to use SOLID, apply patterns" — produces code that looks principled and is painful to change.

Here's what actually works. I'll walk through it in Rust.

Step 1: Write the declaration first.

Like a C++ header file. Signatures, types, doc comments describing behavior. No implementation. This is the contract.

/// Token bucket rate limiter.
///
/// Tokens refill continuously at `rate` per second, capped at `burst`.
/// Thread-safe: concurrent callers never double-spend tokens.
pub struct RateLimiter { /* ... */ }

impl RateLimiter {
 /// `rate=0.0` means tokens never refill.
 /// `burst=0` means `try_acquire` always returns false.
 pub fn new(rate: f64, burst: u32) -> Self { todo!() }

 /// Consumes `n` tokens if available, returns true.
 /// Otherwise returns false and consumes nothing.
 pub fn try_acquire(&self, n: u32) -> bool { todo!() }
}

That's the spec. Types pin down what's allowed in and out. Doc comments pin down behavior, edge cases, and concurrency. An agent reading this knows exactly what to build against — and exactly what's out of scope.

The discipline of writing the header first is the same discipline that makes this work for humans: you commit to the shape before the details.

Step 2: Hand the header to an agent and ask for adversarial tests.

Prompt framing matters. "Write thorough tests" gets you happy paths. "Write tests designed to destroy this module" gets you the good stuff:

#[test]
fn rejects_when_burst_is_zero() {
 let rl = RateLimiter::new(100.0, 0);
 assert!(!rl.try_acquire(1));
}

#[test]
fn never_refills_when_rate_is_zero() {
 let rl = RateLimiter::new(0.0, 5);
 for _ in 0..5 { assert!(rl.try_acquire(1)); }
 std::thread::sleep(Duration::from_secs(1));
 assert!(!rl.try_acquire(1));
}

#[test]
fn concurrent_callers_never_double_spend() {
 let rl = Arc::new(RateLimiter::new(0.0, 1000));
 let acquired = AtomicUsize::new(0);
 thread::scope(|s| {
 for _ in 0..32 {
 s.spawn(|_| {
 for _ in 0..100 {
 if rl.try_acquire(1) { acquired.fetch_add(1, SeqCst); }
 }
 });
 }
 });
 assert_eq!(acquired.load(SeqCst), 1000); // exactly burst, no more
}

The third test is the interesting one. A naive implementation that reads tokens, checks, then writes — without holding a lock across the whole operation — fails it. That's the point. The test hunts for a real bug class the header's "never double-spend" comment promised.

Pair these with property-based tests using proptest:

proptest! {
 #[test]
 fn never_exceeds_burst(rate in 0.0..1000.0, burst in 0u32..10_000) {
 let rl = RateLimiter::new(rate, burst);
 std::thread::sleep(Duration::from_millis(100));
 let mut taken = 0;
 while rl.try_acquire(1) { taken += 1; if taken > burst as usize * 2 { break; } }
 prop_assert!(taken <= burst as usize);
 }
}

Invariants are harder to game than examples.

Step 3: Different agent, fresh context, implements against the header and tests.

The implementer sees the declaration and the tests but didn't write the tests. They can't pattern-match their way to a known-passing implementation because they're staring at tests probing for the exact bugs they'd otherwise introduce.

pub struct RateLimiter {
 rate: f64,
 burst: u32,
 state: Mutex<State>,
}

struct State { tokens: f64, last: Instant }

impl RateLimiter {
 pub fn try_acquire(&self, n: u32) -> bool {
 let mut s = self.state.lock().unwrap();
 let now = Instant::now();
 let elapsed = now.duration_since(s.last).as_secs_f64();
 s.tokens = (s.tokens + elapsed * self.rate).min(self.burst as f64);
 s.last = now;
 if s.tokens >= n as f64 { s.tokens -= n as f64; true } else { false }
 }
}

Locally readable. No traits for the sake of traits, no dependency injection, no Box<dyn TokenBucketStrategy>. The behavior is right there in twelve lines, matching the contract from step 1.

Step 4 (optional): A third agent reviews.

Reads the header, the tests, and the implementation. Asks: do the tests cover what the header promised? Is the implementation honest, or is it gaming the tests? This catches "passes tests, violates intent."

Why this works:

Agents that write both the contract and the implementation collude with themselves. They write tests that pass their implementation. Splitting the roles — header first, then tests, then implementation, each with the previous artifact as input — breaks the collusion at every step.

The header is doing extra work too. It forces you to commit to the API before you've thought about how to build it. Most bad agent code starts when the agent invents the interface and the implementation simultaneously and lets one shape the other.

The deeper point: the most reliable way to get good work from agents isn't better prompts. It's narrow, checkable steps with verification between them.

Header → adversarial tests → implementation → review.

It's TDD with role separation, anchored by an explicit contract. The contract and the role separation are doing the real work.

Join Nidheesh on Peerlist!

Join amazing folks like Nidheesh and thousands of other builders on Peerlist.