Introduction
OmniBrowser Agent is a lightweight, open-source library for building browser automation agents that run entirely in the browser. No server required — the AI planning runs locally via WebLLM or uses a simple heuristic planner.
The agent observes the current page state, plans the next action (click, type, scroll, etc.), executes it, and repeats until the goal is achieved or the step limit is reached.
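The loop described above can be sketched in a few lines. This is a conceptual illustration only, not the library's source: observe, plan, and execute are hypothetical stand-ins supplied by the caller, the `type` field on actions is an assumption, and the sketch is synchronous for brevity whereas the real agent is asynchronous.

```javascript
// Conceptual sketch of the observe -> plan -> execute loop.
// observe, plan, and execute are hypothetical callbacks, not library functions.
function runLoop({ observe, plan, execute, maxSteps = 10 }) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    const snapshot = observe();               // read the current page state
    const action = plan(snapshot, history);   // decide the next action
    if (action.type === "done") {
      return { status: "done", steps: history.length };
    }
    execute(action);                          // perform click, type, scroll, ...
    history.push(action);
  }
  // The step limit was reached before the planner signalled completion.
  return { status: "max-steps", steps: history.length };
}

// Usage with stubs: plan two clicks, then signal completion.
const result = runLoop({
  observe: () => ({ url: "about:blank" }),
  plan: (_snapshot, history) =>
    history.length < 2 ? { type: "click", selector: "#next" } : { type: "done" },
  execute: () => {},
});
console.log(result.status, result.steps);
```

Returning a status instead of throwing keeps the two exit conditions (goal achieved, step limit reached) explicit for the caller.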
Installation
From NPM
npm install @akshayram1/omnibrowser-agent
From CDN (no build step)
import { createBrowserAgent } from "https://unpkg.com/@akshayram1/omnibrowser-agent/dist/lib.js";
The library is a pure ES module. It works in any modern browser that supports import statements.
Quick Start
Create an agent, give it a goal, and start it:
const agent = createBrowserAgent(
  { goal: "Click the Sign Up button", mode: "autonomous" },
  {
    onStep(result) { console.log(result.message); },
    onDone(result) { console.log("Done:", result.message); }
  }
);
await agent.start();
Use mode: "human-approved" to review each action before it runs, which is useful for debugging.
createBrowserAgent(config, callbacks)
The main entry point. Returns an AgentSession object.
Parameters
- config.goal — The natural language goal for the agent
- config.mode — "autonomous" or "human-approved"
- config.planner — Planner configuration ({ kind: "heuristic" } or { kind: "webllm" })
- config.maxSteps — Maximum actions before stopping (default: 10)
- config.stepDelayMs — Delay between steps in ms (default: 500)
- config.scopeSelector — CSS selector to limit the agent's scope
AgentSession
The object returned by createBrowserAgent().
Methods
- start() — Begin executing toward the goal. Returns a Promise.
- stop() — Stop the agent after the current step.
- approve() — Approve a pending action (in human-approved mode).
Callbacks
- onStep(result) — Called after each step with status and action info
- onDone(result) — Called when the agent completes the goal
- onError(error) — Called on fatal errors
- onApprovalRequired(action) — Called when an action needs user approval
- onMaxStepsReached() — Called when the step limit is hit
PlannerConfig
Controls how the agent decides what to do next.
{
  kind: "webllm" | "heuristic",
  systemPrompt?: string,  // Custom system prompt for WebLLM
  model?: string          // WebLLM model ID
}
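A concrete value of this type might look like the following sketch. Only the field names come from the type above; the prompt text and the goal in the comment are illustrative, and the model ID is the one used in the WebLLM Integration example.

```javascript
// Hypothetical values; only kind, systemPrompt, and model are documented field names.
const plannerConfig = {
  kind: "webllm",
  systemPrompt:
    "You are a careful meeting-room booking assistant. " +
    "Only interact with the booking form and never submit without a selected room.",
  model: "Phi-3.5-mini-instruct-q4f16_1-MLC",
};

// Passed to the agent as config.planner, for example:
// createBrowserAgent({ goal: "Book room A for 3 pm", planner: plannerConfig }, callbacks);
console.log(plannerConfig.kind);
```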
The systemPrompt option lets you customize the agent's behavior, for example by making it a careful meeting-room booking assistant.
Actions
Actions the agent can perform on the page:
- click — Click an element by selector
- type — Type text into an input or textarea
- scroll — Scroll an element or the page
- navigate — Navigate to a URL
- extract — Read text content from an element
- focus — Focus an element
- wait — Wait a specified number of milliseconds
- done — Signal that the goal is complete
Modes
Autonomous
The agent plans and executes actions without asking for permission. Best for trusted environments and demos.
Human-Approved
Every action must be approved before execution. The onApprovalRequired callback fires, and the agent pauses until agent.approve() is called.
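A minimal sketch of wiring up the approval flow is shown below. It assumes that approve() and stop() can be called from inside the onApprovalRequired callback. The `ask` parameter is a hypothetical function that decides whether an action may run. createBrowserAgent is passed in as an argument so the sketch does not depend on how the library is loaded.

```javascript
// ask(action) is a hypothetical decision function returning true or false.
// Assumption: approve() and stop() may be called from inside the callback.
function startWithApproval(createBrowserAgent, goal, ask) {
  const agent = createBrowserAgent(
    { goal, mode: "human-approved" },
    {
      onApprovalRequired(action) {
        if (ask(action)) {
          agent.approve();  // let the pending action run
        } else {
          agent.stop();     // rejected: stop instead of continuing
        }
      },
      onDone(result) { console.log("Done:", result.message); },
    }
  );
  return agent;
}

// In a browser, a confirm dialog can serve as the reviewer:
// const agent = startWithApproval(createBrowserAgent, "Click the Sign Up button",
//   (action) => window.confirm("Run " + JSON.stringify(action) + "?"));
// await agent.start();
```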
Planners
Heuristic Planner
A rule-based planner that uses regex pattern matching to understand goals. No AI model needed. Works offline and is very fast, but limited to simple goals like "fill X with Y" and "click Z".
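To illustrate the idea of regex-based goal matching, here is a small sketch that handles the two goal shapes mentioned above. The patterns and the returned object shape (`type`, `target`, `text`) are assumptions made for this example; the library's actual rules may differ.

```javascript
// Illustrative rules only; not the library's real patterns.
function planFromGoal(goal) {
  const text = goal.trim();

  // "fill X with Y" -> type Y into the field described by X
  const fill = /^fill\s+(.+?)\s+with\s+(.+)$/i.exec(text);
  if (fill) return { type: "type", target: fill[1], text: fill[2] };

  // "click Z", "click the Z", "click on the Z" -> click the element described by Z
  const click = /^click\s+(?:on\s+)?(?:the\s+)?(.+)$/i.exec(text);
  if (click) return { type: "click", target: click[1] };

  return null; // goal is outside what these simple rules understand
}

console.log(planFromGoal("Fill email with jane@example.com"));
console.log(planFromGoal("Click the Sign Up button"));
```

Returning null for unmatched goals makes the limitation explicit: anything beyond these patterns needs a more capable planner.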
WebLLM Planner
Uses a local LLM (via WebLLM) running in the browser with WebGPU. Supports complex, multi-step goals with context awareness. Requires a WebGPU-capable browser and model download.
Recommended models:
- Llama 3.2 1B — Fast, ~600 MB, good for simple goals
- Llama 3.2 3B — Better reasoning, ~1.5 GB
- Phi-3.5 Mini — Great quality, ~2 GB
- Mistral 7B v0.3 — Balanced quality, ~4.1 GB
- Qwen2.5 7B — Strongest quality, ~4.3 GB
- Llama 3.1 8B — Strong reasoning, ~4.8 GB
Selectors & Scope
Use scopeSelector to restrict the agent to a portion of the page:
createBrowserAgent({
  goal: "Fill in the form",
  scopeSelector: "#my-form"
});
The observer will only collect candidates (inputs, buttons, links) within the scoped element. This prevents the agent from interacting with navigation, ads, or other unrelated UI.
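The effect of scoping can be sketched with standard DOM queries. This is not the library's observer: the candidate selector below is an assumption, and the document-like object is a parameter so the function can be exercised outside a browser.

```javascript
// Assumption: candidates are form controls, buttons, and links.
const CANDIDATE_SELECTOR = "input, textarea, select, button, a[href]";

function collectCandidates(doc, scopeSelector) {
  // Without a scope, search the whole document.
  const root = scopeSelector ? doc.querySelector(scopeSelector) : doc;
  if (!root) return []; // the scope element does not exist on this page
  return Array.from(root.querySelectorAll(CANDIDATE_SELECTOR));
}

// In a browser:
// const candidates = collectCandidates(document, "#my-form");
```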
WebLLM Integration
To use WebLLM, set up a bridge on window.__browserAgentWebLLM:
import * as webllm from "https://esm.run/@mlc-ai/web-llm";

const engine = await webllm.CreateMLCEngine("Phi-3.5-mini-instruct-q4f16_1-MLC");

window.__browserAgentWebLLM = {
  async plan(input) {
    const response = await engine.chat.completions.create({
      messages: [
        { role: "system", content: "You are a browser planner..." },
        { role: "user", content: `Goal: ${input.goal}` }
      ]
    });
    // parseAction is not defined in this snippet; supply a function that
    // turns the model's text output into an action object.
    return parseAction(response.choices[0].message.content);
  }
};
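The snippet above calls parseAction without defining it. One possible implementation is sketched here, under the assumption that the system prompt instructs the model to answer with a single JSON object; the example field names (`type`, `selector`) are also assumptions.

```javascript
// Hypothetical helper: extracts the JSON object from a model reply.
// Small models often wrap JSON in extra prose, so look for the outermost braces.
function parseAction(text) {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  // JSON.parse throws a SyntaxError if the extracted text is malformed.
  return JSON.parse(text.slice(start, end + 1));
}

const parsed = parseAction('Sure! {"type":"click","selector":"#signup"} Hope that helps.');
console.log(parsed.type, parsed.selector);
```

Throwing on unparseable output, rather than guessing, lets the caller decide whether to retry the model or stop.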
The agent uses window.__browserAgentWebLLM when you set planner to { kind: "webllm" }.
Custom Bridges
You can create custom planner bridges to connect any LLM — cloud APIs, local Ollama, or even rule-based systems:
window.__browserAgentWebLLM = {
  async plan(input) {
    // input contains: goal, history, snapshot, lastError, memory
    const response = await fetch("/api/plan", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(input)
    });
    return response.json();
  }
};
The bridge must return an action object matching the AgentAction type.
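Because a custom bridge can return arbitrary data, it may help to check the result before handing it to the agent. The sketch below uses the action names from the Actions list; the assumption that the action name lives in a `type` field is not confirmed by this page.

```javascript
// Action names from the Actions section; the `type` field name is an assumption.
const ACTION_TYPES = ["click", "type", "scroll", "navigate", "extract", "focus", "wait", "done"];

function isKnownAction(value) {
  return (
    typeof value === "object" &&
    value !== null &&
    ACTION_TYPES.includes(value.type)
  );
}

console.log(isKnownAction({ type: "click", selector: "#signup" }));
console.log(isKnownAction({ type: "deleteAccount" }));
```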
Reflection Loop
When the planner returns evaluation, memory, and next_goal fields alongside the action, the agent enters a reflection loop:
- evaluation — What happened in the last step
- memory — Key facts to carry forward
- next_goal — The refined sub-goal for the next step
This allows the agent to adapt to unexpected page states and recover from errors.
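As a rough illustration, a rule-based bridge could populate these fields as follows. Several details are assumptions: that the reflection fields sit in the same object as the action's own fields, that memory is a string, and that the selectors are meaningful for the page.

```javascript
// Sketch of a plan() result carrying reflection fields.
// input mirrors the bridge input: goal, history, snapshot, lastError, memory.
function planWithReflection(input) {
  const failed = Boolean(input.lastError);

  // Carry earlier notes forward and record any new failure.
  const notes = input.memory ? [input.memory] : [];
  if (failed) notes.push(`previous action failed: ${input.lastError}`);

  return {
    type: failed ? "scroll" : "click",        // hypothetical action fields
    selector: failed ? "body" : "#submit",
    evaluation: failed ? "Failure: the last step reported an error" : "Success: no error reported",
    memory: notes.join("; "),
    next_goal: failed ? "Scroll until the submit button is visible" : input.goal,
  };
}

console.log(planWithReflection({ goal: "Submit the form", lastError: "element not visible", memory: "" }));
```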
Security
Security best practices:
- Use scopeSelector to prevent interaction outside the target area
- Use human-approved mode for any real-world interaction
- Set a low maxSteps to limit damage from runaway agents
- Never pass untrusted user input directly as the agent goal
- Review WebLLM model outputs — small models can produce unexpected actions
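Several of these practices can be combined in one place. In the sketch below, users choose a task ID from a fixed table instead of supplying goal text, and every config gets human-approved mode, a low step limit, and a scope. The task table, its IDs, and its selectors are hypothetical.

```javascript
// Hypothetical task table: untrusted input selects a key, never the goal text.
const TASKS = {
  signup: { goal: "Click the Sign Up button", scopeSelector: "#signup-panel" },
  contact: { goal: "Fill in the form", scopeSelector: "#my-form" },
};

function safeConfig(taskId) {
  if (!Object.prototype.hasOwnProperty.call(TASKS, taskId)) {
    throw new Error(`Unknown task: ${taskId}`);
  }
  const task = TASKS[taskId];
  return {
    goal: task.goal,
    scopeSelector: task.scopeSelector,
    mode: "human-approved", // every action is reviewed before it runs
    maxSteps: 5,            // low limit to contain runaway behavior
  };
}

console.log(safeConfig("contact"));
```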
General Questions
Does this work without an internet connection?
The heuristic planner works fully offline. WebLLM requires an initial model download but runs locally after that.
Which browsers are supported?
Any modern browser supporting ES modules. WebLLM additionally requires WebGPU (Chrome 113+, Edge 113+). Firefox and Safari have experimental WebGPU support.
Can the agent fill multi-step forms?
Yes. Give a compound goal like "Fill name with Jane, email with jane@example.com, and submit the form". The agent breaks it into steps automatically.
Troubleshooting
WebLLM model won't load
Check that your browser supports WebGPU. Visit chrome://gpu and look for "WebGPU: Enabled". Also ensure you have enough GPU memory for the chosen model.
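The WebGPU check can also be done in code, falling back to the heuristic planner when WebGPU is missing. The sketch relies on navigator.gpu, the standard WebGPU entry point; the function name is hypothetical, and the navigator-like object is a parameter so the logic can be tried outside a browser.

```javascript
// navigator.gpu is undefined in browsers without WebGPU.
// A stricter check would also await nav.gpu.requestAdapter(), which resolves
// to null when no suitable GPU adapter is available.
function choosePlanner(nav) {
  const hasWebGPU = Boolean(nav && nav.gpu);
  return hasWebGPU ? { kind: "webllm" } : { kind: "heuristic" };
}

// In a browser:
// const planner = choosePlanner(navigator);
console.log(choosePlanner({ gpu: {} }).kind, choosePlanner({}).kind);
```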
Agent clicks the wrong element
Use scopeSelector to narrow the agent's view. Add clear id attributes to your interactive elements. The heuristic planner works best with descriptive IDs and labels.
Agent keeps repeating the same action
This usually means the action is silently failing. Check the onStep callback for error statuses. Enable the reflection loop by using WebLLM with a model that supports evaluation/memory fields.
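One way to catch this situation early is to watch for identical step messages in onStep. The sketch assumes that result.message describes the step, as in the Quick Start callbacks; the helper name and the threshold are hypothetical.

```javascript
// Returns a function that reports true once the same message
// has been seen `limit` times in a row.
function makeRepeatDetector(limit = 3) {
  let last = null;
  let count = 0;
  return function isStuck(result) {
    count = result.message === last ? count + 1 : 1;
    last = result.message;
    return count >= limit;
  };
}

// Possible use inside the callbacks object:
// const isStuck = makeRepeatDetector(3);
// onStep(result) { if (isStuck(result)) agent.stop(); }
```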