Day 6 of 2026 Building!

Have you ever asked ChatGPT to write a report, read the output, and thought, "Eh, that's kind of generic"?
We've all been there. The standard "one-shot" approach to AI - prompt in, answer out - mimics a human blurting out the first thing that comes to mind. But real intelligence isn't just about generation; it's about refinement.
Today, I'm going to show you how to build a Self-Recursive Agent using Node.js, local LLMs (Ollama), and Langfuse. This agent doesn't just write; it critiques its own work, fixes mistakes, and improves recursively until it meets a high quality standard.
Why Recursion > Fine-Tuning
If you want better model outputs, the industry standard advice is often "Fine-tune a model on your data!"
Fine-tuning is powerful, but it's also:
Expensive: Requires GPUs and compute.
Rigid: The model only learns what you show it.
Hard to Debug: If it fails, you don't know why.
Self-Recursion is the alternative. Instead of training a smarter model, you build a smarter workflow. You create a loop where "dumber" models can sometimes outperform "smarter" ones on a given task, simply by having the chance to correct their own mistakes.
The Architecture: A Digital Newsroom
We simulate a team of experts working together:
The Planner (Llama 3.2): breaks the user's request into a JSON list of tasks.
The Executor (llama3.1:8b-instruct-q4_K_M): writes the content.
The Critic (gemma3:12b): reads the draft and critiques it against a strict rubric.
The Judge (gpt-oss:20b): assigns a numerical score (0-1).
If the score is below 0.9, the Critic's feedback is passed back to the Executor, and the loop runs again.
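The loop only works if the Planner's task list and the Judge's score can be parsed reliably, and local models often wrap their JSON in extra chatter. Here is a minimal sketch of that parsing step. It assumes the Planner is asked for a JSON array of strings and the Judge for an object like {"score": 0.85, "reasoning": "..."}; those shapes and the helper names (extractJson, parsePlan, parseJudgment, passesGate) are illustrative assumptions, not taken from the project's index.js.

```javascript
// Hypothetical helpers: the output shapes are assumptions, not the project's schema.

// Take everything from the first "[" or "{" to the last matching closing
// character and parse it, so prose before or after the JSON is ignored.
function extractJson(text) {
  const start = text.search(/[\[{]/);
  if (start === -1) throw new Error("No JSON found in model output");
  const close = text[start] === "[" ? "]" : "}";
  const end = text.lastIndexOf(close);
  if (end < start) throw new Error("Unterminated JSON in model output");
  return JSON.parse(text.slice(start, end + 1));
}

// Planner output: a non-empty JSON array of task strings.
function parsePlan(text) {
  const tasks = extractJson(text);
  if (!Array.isArray(tasks) || tasks.length === 0) {
    throw new Error("Planner must return a non-empty JSON array");
  }
  return tasks.map((task) => String(task).trim()).filter(Boolean);
}

// Judge output: a score that is clamped into the 0-1 range, plus reasoning.
function parseJudgment(text) {
  const { score, reasoning = "" } = extractJson(text);
  const value = typeof score === "string" ? parseFloat(score) : score;
  if (typeof value !== "number" || !Number.isFinite(value)) {
    throw new Error("Judge score is not a number");
  }
  return { score: Math.min(1, Math.max(0, value)), reasoning: String(reasoning) };
}

// The quality gate: pass once the score reaches the threshold.
const passesGate = (judgment, threshold = 0.9) => judgment.score >= threshold;

const plan = parsePlan('Sure! Here is the plan: ["Outline the report", "Draft each section"]');
const judgment = parseJudgment('{"score": 1.3, "reasoning": "Clear and complete"}');
console.log(plan, judgment, passesGate(judgment));
```

Throwing on malformed output, rather than defaulting to a score of 0, keeps a parsing failure from silently looking like a bad draft.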
The Code: The "Thinking" Loop
The magic happens inside a simple while loop in Node.js. Here is the simplified logic from our index.js:
let currentReport = "";
let feedback = "";
let score = 0.0;
let attempt = 0;
// Keep refining until we hit 90% quality or run out of retries
while (attempt < MAX_RETRIES && score < 0.9) {
  attempt++;
  // 1. Executor writes/rewrites, using the previous draft and critique
  currentReport = await executorModel.generate({
    tasks: plan,
    previousDraft: currentReport,
    criticFeedback: feedback // <--- The secret sauce
  });
  // 2. Critic reviews the work
  const critique = await criticModel.chat({
    prompt: `Analyze this draft for logic gaps: ${currentReport}`
  });
  feedback = critique; // carried into the next iteration
  // 3. Judge evaluates
  const judgment = await judgeModel.evaluate(currentReport, critique);
  score = judgment.score;
  // Send the score to Langfuse for tracking
  await langfuse.score({
    name: "quality-gate",
    value: score,
    comment: judgment.reasoning
  });
}
Notice how we feed criticFeedback back into the Executor? That is the recursion. The model effectively "learns" from the critique in real time, within the context window.
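To watch the control flow run end to end without any models installed, here is a self-contained sketch that swaps the Executor, Critic, and Judge for stub functions. The refine function, its options, and makeStubs are hypothetical names for illustration; in a real run each role would call a model through Ollama instead.

```javascript
// Hypothetical sketch: refine() and the stub roles are illustrative only.
async function refine({ plan, executor, critic, judge, threshold = 0.9, maxRetries = 5 }) {
  let draft = "";
  let feedback = "";
  let score = 0;
  let attempt = 0;
  const history = [];

  while (attempt < maxRetries && score < threshold) {
    attempt += 1;
    // The previous draft and critique go back in: this is the feedback loop.
    draft = await executor({ tasks: plan, previousDraft: draft, criticFeedback: feedback });
    feedback = await critic(draft);
    ({ score } = await judge(draft, feedback));
    history.push({ attempt, score });
  }
  return { draft, score, attempts: attempt, passed: score >= threshold, history };
}

// Stub roles: every rewrite that receives feedback "fixes" one open issue,
// and the stub judge scores the draft by how many issues are left.
function makeStubs() {
  const issues = ["You missed section X", "Tone is too casual"];
  const scoreByIssuesLeft = [0.95, 0.8, 0.6];
  let revision = 0;
  return {
    executor: async ({ criticFeedback }) => {
      if (criticFeedback) issues.shift();
      revision += 1;
      return `Draft ${revision}`;
    },
    critic: async () => issues[0] ?? "No remaining issues",
    judge: async () => ({ score: scoreByIssuesLeft[issues.length] }),
  };
}

refine({ plan: ["Write the report"], ...makeStubs() }).then((result) => {
  console.log(result.attempts, result.score, result.passed); // prints: 3 0.95 true
});
```

Passing the roles in as functions also makes the retry cap easy to exercise: with maxRetries set to 2, the same stubs stop at a score of 0.8 and report passed as false.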
Observability: Seeing the Brain Work 🧠
When you have loops inside loops, console.log doesn't cut it. You need to see the trace of execution.
This is where Langfuse comes in. It's an open-source LLM engineering platform that lets us visualize exactly what our agent is doing.
In our code, we wrap the entire process in a trace:
const trace = langfuse.trace({ name: "Recursive-Research-Agent" });
// ... later ...
const loopSpan = trace.span({ name: `Refinement_Loop_V${attempt}` });
The result? A beautiful graph that looks like this:
Trace Start
  Planner (Output: "Task List")
  Loop 1 (Score: 0.6)
    Executor → "Draft 1"
    Critic → "You missed section X"
  Loop 2 (Score: 0.8)
    Executor → "Draft 2 (Fixed X)"
    Critic → "Tone is too casual"
  Loop 3 (Score: 0.95 - PASS) ✅
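The nesting in the trace above comes from opening one span per loop iteration and hanging a child span off it for each role. Here is a rough sketch of that pattern. It assumes the trace and span objects expose span() to open a child and end() to close it with an output, as in the Langfuse SDK style used in this post; makeFakeTrace is a hypothetical in-memory stand-in so the snippet runs without Langfuse credentials, and tracedAttempt is an illustrative name, not a function from the project.

```javascript
// Hypothetical stand-in that mimics only span() and end(), recording an
// indented line per span so the resulting tree can be printed.
function makeFakeTrace(name) {
  const lines = [name];
  const node = (depth) => ({
    span({ name: spanName }) {
      const index = lines.push(`${"  ".repeat(depth)}${spanName}`) - 1;
      return {
        ...node(depth + 1),
        end({ output } = {}) {
          if (output !== undefined) lines[index] += ` -> ${JSON.stringify(output)}`;
        },
      };
    },
  });
  return { ...node(1), lines };
}

// One refinement pass: a parent span for the loop, child spans per role.
async function tracedAttempt(trace, attempt, { executor, critic, judge }) {
  const loopSpan = trace.span({ name: `Refinement_Loop_V${attempt}` });

  const executorSpan = loopSpan.span({ name: "Executor" });
  const draft = await executor();
  executorSpan.end({ output: draft });

  const criticSpan = loopSpan.span({ name: "Critic" });
  const feedback = await critic(draft);
  criticSpan.end({ output: feedback });

  const { score } = await judge(draft, feedback);
  loopSpan.end({ output: { score } });
  return { draft, feedback, score };
}

// In a real run, pass the object returned by langfuse.trace(...) instead.
const trace = makeFakeTrace("Recursive-Research-Agent");
tracedAttempt(trace, 1, {
  executor: async () => "Draft 1",
  critic: async () => "You missed section X",
  judge: async () => ({ score: 0.6 }),
}).then(() => console.log(trace.lines.join("\n")));
```

Ending each child span before opening the next keeps the recorded durations per role separate, which is what makes it possible to tell a slow Critic from a slow Executor.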
Without Langfuse, you're flying blind. With it, you can pinpoint exactly which model (Critic or Executor) is dropping the ball and adjust your prompts accordingly.
You don't need a massive cluster of H100s to build powerful AI. By chaining smaller, local models together in a self-correcting loop and monitoring them with tools like Langfuse, you can achieve results that rival those of much larger models.
The code is open source - go clone it, fire up Ollama, and watch your agent teach itself to be better! 🚀