Chris McKenzie

Jul 14, 2025 • 8 min read

Control the UI with Tool calling and Browser AI

How local LLMs can take real-time control of your UI — no servers, no round trips

Most AI apps today treat the frontend like a dumb terminal. The model lives on a server somewhere, shuttling tool calls to your backend and waiting on glue code to push changes into the UI — slow, brittle, and disconnected.

Browser.AI flips that model.

It runs locally, inside the browser. No server. No round trips. The LLM sees context, decides what to do, and triggers a function on the spot. A menu opens. A modal appears. A product drops into your cart. Direct control. Native speed.

This post walks through a real demo using window.ai and a prototype I built called Browser.AI. No hype—just code, running locally, reshaping how frontend and AI can work together.

The Shift to Local Tool-Calling

Tool calling isn’t new. OpenAI, Anthropic, and others all let you define functions the model can call with structured arguments. It’s how agents do real-world tasks.

But almost all of that happens on the server side. You send messages to a hosted model, maybe pipe in some RAG data, and wait for it to return a function call. Then your backend does the thing.

That’s reasonable — for backend logic.

But for UI control? It’s too indirect. Why involve a server to update a button, toggle a tab, or open a modal? The model already knows what it wants to do. Let it act locally.

The future I’m exploring flips that model: local AI, running on-device, calling browser-native tools that directly manipulate the DOM, app state, or user experience.

Instead of “call my API to fetch data” it becomes “scroll to this section.” “Show only vegan items.” “Prefill this form with inferred values.” In other words, the model becomes part of your frontend stack.

Browser.AI: An API for In-Browser Models

Imagine tapping into powerful AI models directly from the browser, using an API as familiar as window.fetch or window.localStorage.

Browser.AI is a working prototype that demonstrates how developers can run AI models directly on users’ devices through an API on the window object. This approach enhances privacy, speed, and offline capabilities for web apps.

const session = await window.ai.model.connect({ model: "llama3.2"});

const response = await session.chat({
  messages: [{ role: "user", content: "Hi!" }],
});
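Since window.ai only exists inside Browser.AI, a page probably wants to check for it before connecting. Here’s a minimal sketch of that guard. The helper names hasBrowserAI and connectIfAvailable are my own for illustration and not part of the API; the only call taken from the snippet above is window.ai.model.connect.

```javascript
// Returns true when the Browser.AI entry point exists on the given global object.
function hasBrowserAI(globalObject) {
  return typeof globalObject?.ai?.model?.connect === "function";
}

// Connects to a local model, or resolves to null so the page can keep its normal UI.
// Hypothetical helper: only ai.model.connect comes from the snippet above.
async function connectIfAvailable(globalObject, model = "llama3.2") {
  if (!hasBrowserAI(globalObject)) return null;
  return globalObject.ai.model.connect({ model });
}

// In a page: const session = await connectIfAvailable(window);
console.log(hasBrowserAI({})); // false
```

Passing the global object in, rather than reading window directly, keeps the check easy to exercise outside the browser.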

It’s designed around three key principles:

  • Keep it simple: An intuitive API that feels natural for JavaScript developers.

  • Flexibility and control for users and developers: Users choose which models to allow, and developers pick the models that fit their apps — ensuring cross-browser compatibility and avoiding vendor lock-in.

  • Boost performance and privacy: Running AI locally reduces latency, keeps data private, and enables offline functionality for a smoother experience.

Curious why I built Browser.AI? I unpack the motivation in the article: Bringing AI to the Browser: Transform Your Web App with On-Device Models.

Three Use Cases You Could Build Right Now

On-device LLMs open the door to everything from text summarization and content transformation to tool-calling and agent workflows. I’m very bullish on the future of on-device models — and I believe the best way to unlock their potential is by giving developers simple, powerful tools to build with them directly.

1. Conversational Navigation

“Show me vegan breakfast under $10.”

With local tool calls, the model parses the intent, filters the product list, and scrolls to the relevant section — without ever leaving the browser. You get hands-free, adaptive navigation that feels more like a smart assistant than a web page.
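Here’s a rough sketch of what the handler behind that prompt might look like. The menu data, the filterMenuFunc tool name, and its parameters are all assumptions for illustration, not the demo’s actual code.

```javascript
// Hypothetical menu data; a real app would read this from its own state.
const menu = [
  { id: 1, name: "Tofu scramble", category: "breakfast", vegan: true, price: 8.5 },
  { id: 2, name: "Bacon and eggs", category: "breakfast", vegan: false, price: 9 },
  { id: 3, name: "Avocado toast", category: "breakfast", vegan: true, price: 11 },
  { id: 4, name: "Lentil soup", category: "lunch", vegan: true, price: 7 },
];

// Hypothetical tool schema the model could call for "Show me vegan breakfast under $10."
const filterMenuTool = {
  type: "function",
  function: {
    name: "filterMenuFunc",
    description: "Filters the visible menu items",
    parameters: {
      type: "object",
      properties: {
        category: { type: "string" },
        vegan: { type: "boolean" },
        maxPrice: { type: "number", description: "Upper price limit in dollars" },
      },
      required: [],
    },
  },
};

// Pure filter: every argument is optional, so a missing one does not narrow the result.
function filterMenu(items, { category, vegan, maxPrice } = {}) {
  return items.filter(
    (item) =>
      (category === undefined || item.category === category) &&
      (vegan === undefined || item.vegan === vegan) &&
      (maxPrice === undefined || item.price <= maxPrice)
  );
}

const visible = filterMenu(menu, { category: "breakfast", vegan: true, maxPrice: 10 });
console.log(visible.map((item) => item.name)); // [ 'Tofu scramble' ]
```

In a page, the handler would re-render the list from the filtered items and could then call scrollIntoView() on the first match to handle the scrolling part.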

2. Personalized Content Transformation

A diabetic user lands on your site. The model rewrites product descriptions to highlight low-sugar options.

Because the model runs locally, it can reshape the UI in real time, with no data sent to a server — and no need to precompute every variant.
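A rewrite like that could be a single chat call per description. The sketch below is an assumption-heavy illustration: the prompt wording, the preference string, support for a system message, and reading the reply from response.choices[0].message.content are all my guesses rather than documented Browser.AI behavior.

```javascript
// Builds the chat messages for rewriting one product description.
// The preference string is hypothetical; a real app decides how to obtain it.
function buildRewriteMessages(description, preference) {
  return [
    {
      role: "system",
      content:
        "Rewrite the product description for the stated dietary preference. " +
        "Keep every fact unchanged and reply with the rewritten text only.",
    },
    { role: "user", content: `Preference: ${preference}\nDescription: ${description}` },
  ];
}

// Assumption: plain-text replies arrive at response.choices[0].message.content.
async function rewriteDescription(session, description, preference) {
  const response = await session.chat({
    messages: buildRewriteMessages(description, preference),
  });
  return response.choices[0].message.content;
}
```

Keeping the original text around and swapping in the rewrite only after it arrives means the page still reads fine if the model is slow or unavailable.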

3. Dynamic Component Injection

As a user scrolls, the model detects patterns and injects relevant widgets — a map, a calculator, a timeline — only when useful.

Now your UI becomes adaptive: not responsive in the CSS sense, but responsive to meaning.
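One way to keep widget injection safe is to let the model pick only from a fixed list. This is a minimal sketch under that assumption; the registry, the showWidgetFunc tool, and the container ids are hypothetical names I made up for illustration.

```javascript
// Hypothetical registry: widget name -> id of a hidden container already in the page.
const widgetRegistry = {
  map: "widget-map",
  calculator: "widget-calculator",
  timeline: "widget-timeline",
};

// Hypothetical tool schema; the enum limits the model to widgets that exist.
const showWidgetTool = {
  type: "function",
  function: {
    name: "showWidgetFunc",
    description: "Shows a widget that is relevant to what the user is reading",
    parameters: {
      type: "object",
      properties: {
        widget: { type: "string", enum: Object.keys(widgetRegistry) },
      },
      required: ["widget"],
    },
  },
};

// Resolves a suggested widget to its container id, or null for unknown names.
function resolveWidget(widget) {
  return Object.prototype.hasOwnProperty.call(widgetRegistry, widget)
    ? widgetRegistry[widget]
    : null;
}

console.log(resolveWidget("map")); // widget-map
console.log(resolveWidget("video")); // null
```

In a page, the handler would look up the resolved id with document.getElementById and clear the element’s hidden property, ignoring any call that resolves to null.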

Under the Hood

I built a simple demo to show how the Browser.AI API paired with tool calling can be used to control the UI.

The demo is a simple online restaurant menu that uses AI-powered dialogue to filter items by category, add or remove products from the cart, and run basic searches. It’s intentionally minimal — built to showcase what’s possible without getting bogged down in complexity. If this pattern sparks ideas, I encourage you to build on it and see how far you can take it.

The best way to follow along is to clone the repo and run it locally. If you’d rather not, you can view the final app here (it must run in Browser.AI).

Now that we have that out of the way, let’s take a look at what makes this possible.

Model

I’m using Meta’s Llama 3.2, but you can swap in any model that supports tool calling in Browser.AI (Settings → Model Settings or ⌘,).

Llama 3.2 3B is a 3-billion-parameter instruction-tuned model. It does well with dialogue, retrieval, and summarization, and supports a context window of up to 128K tokens, which is plenty for long inputs. It handles native function calling, supports single and multi-call flows, and works zero-shot out of the box. That said, it’s small, and you’ll hit its limits. Tool call accuracy is decent (approximately 80%), but it sometimes misfires: calling tools when it shouldn’t, formatting arguments incorrectly, or missing context entirely.

Bottom line: Llama 3.2 3B is fast, lightweight, and good enough for local UI logic, but it’s not perfect. Keep your scope tight, and it’ll deliver.

Tools

Tool calling allows language models to output structured function calls, rather than plain text. The model decides which tool to use and what arguments to pass — but it doesn’t execute anything itself. Your code handles the actual function call. It’s a way to let the model guide workflows without giving it full control. If this is new to you, I put together a quick explainer with JavaScript examples.

The model doesn’t run the function — it just suggests it.

Define Your Tools — Tools are plain JSON schemas. You tell the model what functions it can call and what arguments are expected.

const addToCartTool = {
  type: "function",
  function: {
    name: "addToCartFunc",
    description: "Adds a product to the cart",
    parameters: {
      type: "object",
      properties: {
        productId: { type: "number" },
      },
      required: ["productId"],
    },
  },
};
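With more than one tool, the schema boilerplate adds up quickly. Here’s a small sketch of a helper that produces the same shape as addToCartTool. The defineTool helper, the two tool names, and the category values are hypothetical and only meant to illustrate the pattern.

```javascript
// Wraps a parameter map in the same schema shape as addToCartTool.
// By default every listed property is treated as required.
function defineTool(name, description, properties, required = Object.keys(properties)) {
  return {
    type: "function",
    function: {
      name,
      description,
      parameters: { type: "object", properties, required },
    },
  };
}

// Hypothetical tools; names, descriptions, and categories are illustrative only.
const removeFromCartTool = defineTool("removeFromCartFunc", "Removes a product from the cart", {
  productId: { type: "number", description: "Id of the product to remove" },
});

const filterByCategoryTool = defineTool(
  "filterByCategoryFunc",
  "Shows only items in one menu category",
  { category: { type: "string", enum: ["drinks", "mains", "desserts"] } }
);

const tools = [removeFromCartTool, filterByCategoryTool];
console.log(tools.map((tool) => tool.function.name));
// [ 'removeFromCartFunc', 'filterByCategoryFunc' ]
```

Short descriptions and an enum for closed sets of values give a small model less room to invent arguments.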

Implement the Function — Your code runs the function once the model suggests it:

const toolFunctions = {
  addToCartFunc: ({ productId }) => {
    addToCart(productId);
    return "Added to cart.";
  },
};
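To make that concrete, here’s a self-contained sketch with a tiny in-memory cart behind the handlers. The Map-based cart and the removeFromCartFunc handler are assumptions for illustration; the demo’s real state management may look different.

```javascript
// Hypothetical in-memory cart keyed by product id, storing quantities.
const cart = new Map();

function addToCart(productId) {
  cart.set(productId, (cart.get(productId) ?? 0) + 1);
}

// Removes one unit and reports whether the product was in the cart at all.
function removeFromCart(productId) {
  const quantity = cart.get(productId) ?? 0;
  if (quantity <= 1) cart.delete(productId);
  else cart.set(productId, quantity - 1);
  return quantity > 0;
}

// Each handler returns a short message that the UI can show to the user.
const toolFunctions = {
  addToCartFunc: ({ productId }) => {
    addToCart(productId);
    return "Added to cart.";
  },
  removeFromCartFunc: ({ productId }) =>
    removeFromCart(productId) ? "Removed from cart." : "That item is not in the cart.",
};

console.log(toolFunctions.addToCartFunc({ productId: 7 })); // Added to cart.
console.log(toolFunctions.removeFromCartFunc({ productId: 7 })); // Removed from cart.
console.log(toolFunctions.removeFromCartFunc({ productId: 7 })); // That item is not in the cart.
```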

Execute the Call — After the chat completes, check for a tool_calls array. If present, run the function:

const toolCalls = response.choices[0].message.tool_calls ?? [];
for (const { function: { name, arguments: args } } of toolCalls) {
  toolFunctions[name]?.(typeof args === "string" ? JSON.parse(args) : args);
}
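Because a small model will sometimes name a tool that doesn’t exist or send malformed arguments, it may be worth checking each suggestion before running it. Here’s a minimal sketch; validateToolCall is a hypothetical helper of mine that only checks required keys and primitive types, not full JSON Schema.

```javascript
// Same schema shape as the addToCartTool defined earlier, repeated so this runs alone.
const addToCartTool = {
  type: "function",
  function: {
    name: "addToCartFunc",
    description: "Adds a product to the cart",
    parameters: {
      type: "object",
      properties: { productId: { type: "number" } },
      required: ["productId"],
    },
  },
};

const PRIMITIVE_TYPES = ["string", "number", "boolean"];

// Checks a suggested call against its tool schema before anything is executed.
function validateToolCall(name, rawArgs, tools) {
  const tool = tools.find((candidate) => candidate.function.name === name);
  if (!tool) return { ok: false, error: `Unknown tool: ${name}` };

  // Arguments may arrive as a JSON string or as an already parsed object.
  let args = rawArgs;
  if (typeof rawArgs === "string") {
    try {
      args = JSON.parse(rawArgs);
    } catch {
      return { ok: false, error: "Arguments are not valid JSON" };
    }
  }
  if (args === null || typeof args !== "object" || Array.isArray(args)) {
    return { ok: false, error: "Arguments must be an object" };
  }

  const { properties = {}, required = [] } = tool.function.parameters;
  for (const key of required) {
    if (!(key in args)) return { ok: false, error: `Missing argument: ${key}` };
  }
  for (const [key, value] of Object.entries(args)) {
    const expected = properties[key]?.type;
    if (PRIMITIVE_TYPES.includes(expected) && typeof value !== expected) {
      return { ok: false, error: `Argument ${key} should be ${expected}` };
    }
  }
  return { ok: true, args };
}

console.log(validateToolCall("addToCartFunc", '{"productId": 3}', [addToCartTool]));
// { ok: true, args: { productId: 3 } }
console.log(validateToolCall("addToCartFunc", { productId: "3" }, [addToCartTool]).error);
// Argument productId should be number
```

When validation fails, the simplest recovery is to ignore the call and ask the user to rephrase.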

That’s it. You just made your UI programmable by an on-device model.
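Putting the three steps together, a full round trip might look like the sketch below. It assumes session.chat accepts a tools array next to messages, which the earlier snippets don’t show, and it uses a stub session so it can run outside Browser.AI. handlePrompt and fakeSession are hypothetical names.

```javascript
// Sends one prompt to the model and runs whatever tools it suggests.
// Assumption: session.chat accepts a `tools` array alongside `messages`.
async function handlePrompt(session, prompt, tools, toolFunctions) {
  const response = await session.chat({
    messages: [{ role: "user", content: prompt }],
    tools,
  });

  const results = [];
  for (const call of response.choices[0].message.tool_calls ?? []) {
    const { name, arguments: rawArgs } = call.function;
    // Only run handlers we defined ourselves, never inherited properties.
    if (!Object.prototype.hasOwnProperty.call(toolFunctions, name)) continue;
    // Arguments may arrive as a JSON string or as an already parsed object.
    const args = typeof rawArgs === "string" ? JSON.parse(rawArgs) : rawArgs;
    results.push(toolFunctions[name](args));
  }
  return results;
}

// Stub session standing in for a real one; it always suggests the same call.
const fakeSession = {
  chat: async () => ({
    choices: [
      {
        message: {
          tool_calls: [{ function: { name: "addToCartFunc", arguments: { productId: 3 } } }],
        },
      },
    ],
  }),
};

handlePrompt(fakeSession, "Add coke to the cart", [], {
  addToCartFunc: ({ productId }) => `Added product ${productId} to cart.`,
}).then((results) => console.log(results)); // [ 'Added product 3 to cart.' ]
```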

Try It Yourself

I’ve put together a minimal demo: an online restaurant menu that responds to natural language prompts like:

  • “Add coke to the cart”

  • “Remove the coke”

  • “Show dessert options”

It uses Meta’s Llama 3.2 3B model, which is just 2GB and surprisingly capable for frontend tasks. Tool call accuracy is ~80%, and latency is low enough for real-time UI feedback.

👋 Not on a Mac? Watch the demo video instead.

Why This Matters

Yes, this is early-stage tech. The models are small. The tooling is rough.

But we’re standing at the edge of a major shift in frontend development:

  • From imperative UI updates to model-driven intent

  • From cloud-only models to local-first intelligence

  • From brittle click-flows to resilient adaptive interfaces

And most importantly: from dumb clients to intelligent, contextual, private-first experiences.

That’s not just an architectural shift. It’s a new mental model for how web apps are built.

Final thoughts

The model I used is just 3B parameters and ~2GB in size. It’s nowhere near the scale or capability of GPT-4 or Llama 4 Maverick. I could’ve gone bigger — Browser.AI supports larger models — but I optimized for speed and responsiveness over raw power.

And yeah, the demo is limited. The catalog is small. Only four functions. It’ll break if you throw too much at it.

But here’s what surprised me: even with a tiny model, I got a responsive, local-first UI agent working in the browser — with no backend, no glue code, and no nonsense. That’s not just a cool trick — it’s a signal.

If this is what’s possible today, imagine where we’ll be in a year.

So here’s your challenge:
Fork the demo. Push it further. Add voice with window.webkitSpeechRecognition. Hook in real-time data. Scale up the model. The foundation’s there; you just have to build on it.
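If you want a starting point for the voice part, here’s a minimal sketch. listenOnce is a hypothetical helper, and the recognition constructor is passed in as an argument so the function can be tried with a stub outside the browser.

```javascript
// Starts one-shot speech recognition and hands the transcript to onTranscript.
// Recognition is injected so the function can be exercised with a stub.
function listenOnce(Recognition, onTranscript) {
  const recognition = new Recognition();
  recognition.lang = "en-US";
  recognition.interimResults = false; // only deliver the final result
  recognition.onresult = (event) => {
    onTranscript(event.results[0][0].transcript);
  };
  recognition.start();
  return recognition;
}

// In a page, the prefixed constructor is the fallback:
// const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
// listenOnce(Recognition, (text) => sendToModel(text)); // sendToModel is hypothetical
```

One caveat: some browsers process recognition audio on a remote service, so this step may not be fully local even when the model is.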

👉 Get the code on GitHub and make it yours.


To stay connected and share your journey, feel free to reach out through the following channels:

  • 👨‍💼 LinkedIn: Join me for more insights into AI development and tech innovations.

  • 🤖 JavaScript + AI: Join the JavaScript and AI group and share what you’re working on.

  • 💻 GitHub: Explore my projects and contribute to ongoing work.

  • 📚 Medium: Follow my articles for more in-depth discussions on the intersection of JavaScript and AI.
