I Built an MCP Server That Lets AI Models Argue With Each Other

The story behind mcp-rubber-duck — from copy-pasting between AI tabs to multi-model debates

It started with too many tabs

Late last year I had four AI tabs open: ChatGPT, Claude, Gemini, and Groq. I was debugging a tricky race condition and I wanted a second opinion. Then a third. Then a fourth.

Every time, the same ritual: copy the question, switch tabs, paste, wait, read, switch back, compare. For one question. Now imagine doing this twenty times a day.

I thought: what if I could ask all of them at once, from inside my editor?

So I built a rubber duck that talks back

MCP Rubber Duck is an MCP (Model Context Protocol) server: it plugs into AI coding tools like Claude Desktop, Cursor, and VS Code. You ask a question, it fans it out to multiple LLMs in parallel, and you get all the answers back in one place.
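The fan-out step can be sketched in TypeScript roughly as follows. This is a minimal illustration, not the project's code: the `Duck` interface, the `fanOut` function, and the 30-second default timeout are all assumptions. Each duck gets its own `try`/`catch` and timeout, so one rate-limited or slow provider becomes an error entry instead of failing the whole batch.

```typescript
// Hypothetical interface: a "duck" is anything that can answer a prompt.
interface Duck {
  name: string;
  ask(prompt: string): Promise<string>;
}

interface DuckReply {
  duck: string;
  ok: boolean;
  answer?: string;
  error?: string;
  ms: number; // wall-clock time for this duck
}

// Send the same prompt to every duck at once and collect every reply,
// including failures and timeouts, in the same order as the input array.
async function fanOut(ducks: Duck[], prompt: string, timeoutMs = 30_000): Promise<DuckReply[]> {
  return Promise.all(
    ducks.map(async (duck): Promise<DuckReply> => {
      const start = Date.now();
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs} ms`)), timeoutMs);
      });
      try {
        // Whichever settles first wins: the duck's answer or the timeout.
        const answer = await Promise.race([duck.ask(prompt), timeout]);
        return { duck: duck.name, ok: true, answer, ms: Date.now() - start };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return { duck: duck.name, ok: false, error: message, ms: Date.now() - start };
      } finally {
        clearTimeout(timer); // don't keep the timer alive after the race settles
      }
    }),
  );
}
```

Because each reply carries its own `ok` flag and timing, the caller can still show three good answers side by side when the fourth provider is down.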

The name comes from rubber duck debugging — that old technique where you explain your problem to a rubber duck on your desk and the answer reveals itself. Except these ducks actually respond. And they disagree with each other.

The features I didn't plan

The side-by-side comparison was the obvious feature. But the interesting stuff came later.

Voting. I added a duck_vote tool where models vote on the best approach with reasoning and confidence scores. Turns out "3 out of 4 models agree" is a surprisingly useful signal when you're stuck between two approaches.
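A vote tally could look roughly like the sketch below, assuming each model returns a ballot with a choice, a confidence between 0 and 1, and its reasoning. The `Ballot` shape and the `tally` function are hypothetical and are not `duck_vote`'s actual schema. Besides the winner, the sketch returns the ducks that voted the other way.

```typescript
// Hypothetical ballot shape, for illustration only.
interface Ballot {
  duck: string;
  choice: string;
  confidence: number; // self-reported, 0..1
  reasoning: string;
}

interface Tally {
  winner: string;
  votes: number;         // ballots for the winner
  total: number;         // all ballots
  avgConfidence: number; // mean confidence of the winner's backers
  dissenters: string[];  // ducks that picked something else
}

function tally(ballots: Ballot[]): Tally {
  if (ballots.length === 0) throw new Error("no ballots");

  // Group ballots by the option they chose.
  const byChoice = new Map<string, Ballot[]>();
  for (const b of ballots) {
    const group = byChoice.get(b.choice) ?? [];
    group.push(b);
    byChoice.set(b.choice, group);
  }

  // Most votes wins; ties are broken by higher total confidence.
  const sumConf = (bs: Ballot[]) => bs.reduce((s, b) => s + b.confidence, 0);
  let winner = "";
  let best: Ballot[] = [];
  for (const [choice, group] of byChoice) {
    if (group.length > best.length || (group.length === best.length && sumConf(group) > sumConf(best))) {
      winner = choice;
      best = group;
    }
  }

  return {
    winner,
    votes: best.length,
    total: ballots.length,
    avgConfidence: sumConf(best) / best.length,
    dissenters: ballots.filter((b) => b.choice !== winner).map((b) => b.duck),
  };
}
```

With three Redis ballots and one PostgreSQL ballot, this yields `winner: "redis"`, `votes: 3`, `total: 4`, and the PostgreSQL voter in `dissenters`.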

Debates. This one was an accident. I was testing multi-turn conversations between models and realized: what if one model argues FOR an approach and another argues AGAINST, and then a third model judges? Oxford-style debates between AI models. It sounds ridiculous, but the disagreements surface edge cases I wouldn't have thought of.
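The debate flow described above fits in a short loop. In this hypothetical sketch a `Model` is any async function from prompt to reply; the prompt wording, the round count, and the `debate` function are illustrative assumptions, not the project's implementation. Turns run sequentially because each side has to see the other's latest argument before replying.

```typescript
// Hypothetical: any async prompt-to-reply function can take a seat.
type Model = (prompt: string) => Promise<string>;

interface Turn {
  side: "FOR" | "AGAINST";
  round: number;
  text: string;
}

// Oxford-style debate: one model argues FOR, another AGAINST, for a fixed
// number of rounds, each seeing the transcript so far; a third model judges.
async function debate(
  motion: string,
  pro: Model,
  con: Model,
  judge: Model,
  rounds = 2,
): Promise<{ transcript: Turn[]; verdict: string }> {
  const transcript: Turn[] = [];
  const render = () => transcript.map((t) => `[${t.side}, round ${t.round}] ${t.text}`).join("\n");

  for (let round = 1; round <= rounds; round++) {
    for (const [side, model] of [["FOR", pro], ["AGAINST", con]] as const) {
      const prompt =
        `Motion: ${motion}\nYou argue ${side} the motion. Rebut the other side and raise edge cases.\n` +
        `Transcript so far:\n${render() || "(none)"}`;
      // Await each turn so the next speaker sees it.
      transcript.push({ side, round, text: await model(prompt) });
    }
  }

  const verdict = await judge(
    `Motion: ${motion}\nJudge this debate. Say which side argued better and list the edge cases raised.\n${render()}`,
  );
  return { transcript, verdict };
}
```

Keeping the judge separate from both debaters means the verdict comes from a model that did not argue either side.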

CLI agents as ducks. Claude Code, Codex, and Gemini CLI all run as subprocesses now. No API keys, no per-token costs: your existing subscriptions do double duty. And for Claude specifically, this is the only way to put your subscription to work as a duck, because Anthropic blocks third-party SDK access to subscription credentials.
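Running a CLI agent as a duck mostly means spawning a process and capturing its stdout. The sketch below uses Node's `child_process.spawn`; the `askCli` helper and its timeout are hypothetical. It deliberately takes the command and arguments from the caller instead of guessing each agent's non-interactive flags, which differ between Claude Code, Codex, and Gemini CLI, and it assumes the agent prints its answer to stdout and exits with code 0 on success.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: run a CLI agent and resolve with its trimmed stdout.
// The caller supplies the exact command and arguments for their agent.
function askCli(command: string, args: string[], timeoutMs = 120_000): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ["ignore", "pipe", "pipe"] });
    let stdout = "";
    let stderr = "";
    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => { stdout += chunk; });
    child.stderr.on("data", (chunk: string) => { stderr += chunk; });

    // Agents can hang (e.g. waiting on a login prompt), so enforce a deadline.
    const timer = setTimeout(() => {
      child.kill();
      reject(new Error(`${command} timed out after ${timeoutMs} ms`));
    }, timeoutMs);

    // Fires if the binary can't be started at all, e.g. it isn't on PATH.
    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });

    child.on("close", (code) => {
      clearTimeout(timer);
      if (code === 0) resolve(stdout.trim());
      else reject(new Error(`${command} exited with ${code}: ${stderr.trim()}`));
    });
  });
}
```

In this setup the subprocess relies on the CLI's existing login, so the server itself never handles an API key.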

What surprised me

The most valuable output isn't when all models agree — it's when they disagree. If I ask "should I use Redis or PostgreSQL for this caching layer?" and all four ducks say Redis, that's reassuring but boring. If three say Redis and one says "actually, your access pattern suggests PostgreSQL with materialized views," that's when I learn something.

The debates feature captures this perfectly. Models are forced to argue opposing sides, which means they dig into edge cases and failure modes that none of them would mention in a standard response.

The numbers

- 136 GitHub stars
- 15 MCP tools (ask, compare, vote, debate, judge, iterate, and more)
- Works with any OpenAI-compatible API + 5 CLI agents
- Interactive HTML UIs for comparison, voting, and debate visualization
- Built in TypeScript, runs via npx or Docker

What I learned building it

MCP is underrated. The Model Context Protocol lets you extend AI tools in ways that feel native. My ducks show up as regular tools in Claude Desktop — users don't need to know there's a whole multi-LLM orchestration layer underneath.
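To make "regular tools" concrete: an MCP tool is advertised in the response to `tools/list` (a name, a description, and a JSON Schema for its arguments) and invoked through `tools/call`, both JSON-RPC 2.0 methods. The sketch below answers just those two messages by hand, with a hypothetical `ask_all_ducks` tool and an injected `askAll` function. It leaves out the `initialize` handshake and the stdio transport that a real server, typically built on an MCP SDK, also needs.

```typescript
// Minimal JSON-RPC 2.0 message shapes used by the two tool methods.
interface RpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

interface RpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result?: unknown;
  error?: { code: number; message: string };
}

// One hypothetical tool. To the host it is only a name, a description and an
// argument schema; whatever fan-out happens behind it stays invisible.
const tools = [
  {
    name: "ask_all_ducks",
    description: "Ask every configured model the same question and return all answers.",
    inputSchema: {
      type: "object",
      properties: { question: { type: "string" } },
      required: ["question"],
    },
  },
];

async function handle(req: RpcRequest, askAll: (q: string) => Promise<string[]>): Promise<RpcResponse> {
  switch (req.method) {
    case "tools/list":
      return { jsonrpc: "2.0", id: req.id, result: { tools } };
    case "tools/call": {
      const question = req.params?.arguments?.question;
      if (req.params?.name !== "ask_all_ducks" || typeof question !== "string") {
        return { jsonrpc: "2.0", id: req.id, error: { code: -32602, message: "Invalid params" } };
      }
      const answers = await askAll(question);
      // Tool results are a list of content blocks; plain text is the simplest kind.
      return {
        jsonrpc: "2.0",
        id: req.id,
        result: { content: answers.map((text) => ({ type: "text", text })) },
      };
    }
    default:
      return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } };
  }
}
```

Everything the host sees is in those two responses, which is why the orchestration layer can change freely without the client noticing.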

Start with the simplest version. The first version just called one LLM. Then two. Then I added comparison. Each feature earned its place because someone (usually me) actually needed it.

Open source compounds. Every time someone opens an issue or suggests a feature, the project gets better in ways I wouldn't have thought of. CLI ducks, MCP Bridge, consensus voting — all came from real usage patterns.

Try it

npx mcp-rubber-duck

GitHub: https://github.com/nesquikm/mcp-rubber-duck

The ducks are waiting. They have opinions.

Join Mikhail on Peerlist!
