OmniBrowser Agent

Introduction

OmniBrowser Agent is a lightweight, open-source library for building browser automation agents that run entirely in the browser. No server is required — planning happens locally, either via WebLLM or a simple heuristic planner.

This library is designed for demos, prototyping, and educational purposes. It ships as a single ES module you can import from a CDN.

The agent observes the current page state, plans the next action (click, type, scroll, etc.), executes it, and repeats until the goal is achieved or the step limit is reached.
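That observe → plan → act cycle can be sketched in plain JavaScript. This is a simplified illustration, not the library's internal code; `observe`, `plan`, and `execute` here are stand-ins you would supply:

```javascript
// Simplified observe → plan → act loop (illustrative only).
// The real library adds step delays, callbacks, and error handling.
function runAgentLoop({ observe, plan, execute, maxSteps = 10 }) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    const snapshot = observe();             // current page state
    const action = plan(snapshot, history); // decide the next action
    if (action.type === "done") {
      return { status: "done", steps: history.length };
    }
    execute(action);
    history.push(action);
  }
  return { status: "max-steps", steps: history.length };
}

// Toy run: the planner clicks once, then reports done.
const result = runAgentLoop({
  observe: () => ({ buttons: ["Sign Up"] }),
  plan: (_snap, history) =>
    history.length === 0
      ? { type: "click", selector: "#signup" }
      : { type: "done" },
  execute: () => {},
});
```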

Installation

From NPM

npm install @akshayram1/omnibrowser-agent

From CDN (no build step)

import { createBrowserAgent }
  from "https://unpkg.com/@akshayram1/omnibrowser-agent/dist/lib.js";

The library is a pure ES module. It works in any modern browser that supports import statements.

Quick Start

Create an agent, give it a goal, and start it:

const agent = createBrowserAgent(
  { goal: "Click the Sign Up button", mode: "autonomous" },
  {
    onStep(result) { console.log(result.message); },
    onDone(result) { console.log("Done:", result.message); }
  }
);

await agent.start();
Use mode: "human-approved" to review each action before it runs — great for debugging.

createBrowserAgent(config, callbacks)

The main entry point. Returns an AgentSession object.

Parameters

  • config.goal — The natural language goal for the agent
  • config.mode — "autonomous" or "human-approved"
  • config.planner — Planner configuration ({ kind: "heuristic" } or { kind: "webllm" })
  • config.maxSteps — Maximum actions before stopping (default: 10)
  • config.stepDelayMs — Delay between steps in ms (default: 500)
  • config.scopeSelector — CSS selector to limit the agent's scope
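Putting the parameters together, a fully specified config might look like the following. The merge helper is illustrative, using the default values documented above:

```javascript
// Illustrative defaults merge — mirrors the documented defaults above.
const DEFAULTS = { mode: "autonomous", maxSteps: 10, stepDelayMs: 500 };

function withDefaults(config) {
  return { ...DEFAULTS, ...config };
}

const config = withDefaults({
  goal: "Fill in the signup form",
  mode: "human-approved",
  planner: { kind: "heuristic" },
  scopeSelector: "#signup-form",
});
```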

AgentSession

The object returned by createBrowserAgent().

Methods

  • start() — Begin executing toward the goal. Returns a Promise.
  • stop() — Stop the agent after the current step.
  • approve() — Approve a pending action (in human-approved mode).

Callbacks

  • onStep(result) — Called after each step with status and action info
  • onDone(result) — Called when the agent completes the goal
  • onError(error) — Called on fatal errors
  • onApprovalRequired(action) — Called when an action needs user approval
  • onMaxStepsReached() — Called when step limit is hit

PlannerConfig

Controls how the agent decides what to do next.

{
  kind: "webllm" | "heuristic",
  systemPrompt?: string,  // Custom system prompt for WebLLM
  model?: string           // WebLLM model ID
}
The systemPrompt option lets you customize the agent's behavior — for example, making it a careful meeting-room booking assistant.

Actions

Actions the agent can perform on the page:

  • click — Click an element by selector
  • type — Type text into an input or textarea
  • scroll — Scroll an element or the page
  • navigate — Navigate to a URL
  • extract — Read text content from an element
  • focus — Focus an element
  • wait — Wait a specified number of milliseconds
  • done — Signal that the goal is complete
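These action names suggest plain objects keyed by a `type` field. The shapes below are illustrative — field names beyond `type` are assumptions, not the library's confirmed schema:

```javascript
// Example action objects, one per documented action type
// (field names other than `type` are illustrative).
const actions = [
  { type: "click", selector: "#signup" },
  { type: "type", selector: "input[name=email]", text: "jane@example.com" },
  { type: "scroll", selector: "#results", direction: "down" },
  { type: "navigate", url: "https://example.com/pricing" },
  { type: "extract", selector: ".price" },
  { type: "focus", selector: "input[name=name]" },
  { type: "wait", ms: 500 },
  { type: "done" },
];

const KNOWN_TYPES = new Set([
  "click", "type", "scroll", "navigate", "extract", "focus", "wait", "done",
]);
const allKnown = actions.every((a) => KNOWN_TYPES.has(a.type));
```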

Modes

Autonomous

The agent plans and executes actions without asking for permission. Best for trusted environments and demos.

Human-Approved

Every action must be approved before execution. The onApprovalRequired callback fires, and the agent pauses until agent.approve() is called.

Always use human-approved mode when the agent interacts with real services or sensitive data.

Planners

Heuristic Planner

A rule-based planner that uses regex pattern matching to understand goals. No AI model needed. Works offline and is very fast, but limited to simple goals like "fill X with Y" and "click Z".
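The heuristic style can be pictured as a couple of regex rules mapping goal text to actions. This sketch is not the library's rule set — the real matcher is more extensive — but it shows the idea:

```javascript
// Minimal regex-based planner in the spirit of the heuristic planner.
// Two illustrative rules: "fill X with Y" and "click Z".
function heuristicPlan(goal) {
  let m = goal.match(/fill (\w+) with (.+)/i);
  if (m) {
    return { type: "type", selector: `[name=${m[1]}]`, text: m[2] };
  }
  m = goal.match(/click (?:the )?(.+?)(?: button)?$/i);
  if (m) {
    return { type: "click", selector: `#${m[1].replace(/\s+/g, "-").toLowerCase()}` };
  }
  return { type: "done" }; // no rule matched
}

const a1 = heuristicPlan("fill email with jane@example.com");
const a2 = heuristicPlan("Click the Sign Up button");
```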

WebLLM Planner

Uses a local LLM (via WebLLM) running in the browser with WebGPU. Supports complex, multi-step goals with context awareness. Requires a WebGPU-capable browser and model download.

Recommended models:

  • Llama 3.2 1B — Fast, ~600 MB, good for simple goals
  • Llama 3.2 3B — Better reasoning, ~1.5 GB
  • Phi-3.5 Mini — Great quality, ~2 GB
  • Mistral 7B v0.3 — Balanced quality, ~4.1 GB
  • Qwen2.5 7B — Strongest quality, ~4.3 GB
  • Llama 3.1 8B — Strong reasoning, ~4.8 GB

Selectors & Scope

Use scopeSelector to restrict the agent to a portion of the page:

createBrowserAgent({
  goal: "Fill in the form",
  scopeSelector: "#my-form"
});

The observer will only collect candidates (inputs, buttons, links) within the scoped element. This prevents the agent from interacting with navigation, ads, or other unrelated UI.
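The scoping behavior can be pictured as a filter over collected candidates. This is a DOM-free sketch — the real observer walks live elements and checks containment with the scoped node, but the filtering principle is the same:

```javascript
// DOM-free sketch of scope filtering: keep only candidates whose
// ancestor chain includes the scoped element's id.
function filterByScope(candidates, scopeId) {
  return candidates.filter((c) => c.ancestors.includes(scopeId));
}

const candidates = [
  { selector: "#name", ancestors: ["my-form", "main"] },
  { selector: "#submit", ancestors: ["my-form", "main"] },
  { selector: "#nav-login", ancestors: ["navbar"] }, // outside the form
];

const scoped = filterByScope(candidates, "my-form");
```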

WebLLM Integration

To use WebLLM, set up a bridge on window.__browserAgentWebLLM:

import * as webllm from "https://esm.run/@mlc-ai/web-llm";

const engine = await webllm.CreateMLCEngine("Phi-3.5-mini-instruct-q4f16_1-MLC");

window.__browserAgentWebLLM = {
  async plan(input) {
    const response = await engine.chat.completions.create({
      messages: [
        { role: "system", content: "You are a browser planner..." },
        { role: "user", content: `Goal: ${input.goal}` }
      ]
    });
    return parseAction(response.choices[0].message.content);
  }
};
The agent automatically detects window.__browserAgentWebLLM when you set planner to { kind: "webllm" }.
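The parseAction helper in the snippet above is left to you. One simple approach — an assumption, not part of the library — is to ask the model for a JSON action object and parse its reply defensively:

```javascript
// One possible parseAction: expect the model to answer with a JSON
// action object, tolerating surrounding prose or code fences.
function parseAction(text) {
  const match = text.match(/\{[\s\S]*\}/); // first {...} block in the reply
  if (!match) return { type: "done" };     // fall back to stopping safely
  try {
    const action = JSON.parse(match[0]);
    return typeof action.type === "string" ? action : { type: "done" };
  } catch {
    return { type: "done" };
  }
}

const parsed = parseAction('Sure! ```json\n{"type":"click","selector":"#signup"}\n```');
```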

Custom Bridges

You can create custom planner bridges to connect any LLM — cloud APIs, local Ollama, or even rule-based systems:

window.__browserAgentWebLLM = {
  async plan(input) {
    // input contains: goal, history, snapshot, lastError, memory
    const response = await fetch("/api/plan", {
      method: "POST",
      body: JSON.stringify(input)
    });
    return response.json();
  }
};

The bridge must return an action object matching the AgentAction type.
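Since the bridge's reply comes from an external model or API, it is worth validating before handing it back. A sketch, assuming the documented action types and falling back to a safe stop:

```javascript
// Sketch of validating a bridge's reply against the documented action types.
const ACTION_TYPES = new Set([
  "click", "type", "scroll", "navigate", "extract", "focus", "wait", "done",
]);

function toAgentAction(value) {
  if (value && ACTION_TYPES.has(value.type)) return value;
  return { type: "done" }; // safe fallback for malformed replies
}

const ok = toAgentAction({ type: "click", selector: "#buy" });
const bad = toAgentAction({ kind: "tap" }); // wrong field name
```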

Reflection Loop

When the planner returns evaluation, memory, and next_goal fields alongside the action, the agent enters a reflection loop:

  • evaluation — What happened in the last step
  • memory — Key facts to carry forward
  • next_goal — The refined sub-goal for the next step

This allows the agent to adapt to unexpected page states and recover from errors.
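Carrying those fields forward can be sketched as a small state fold — the field names match the list above, but the merge logic here is illustrative:

```javascript
// Sketch of carrying reflection fields across steps: when the planner
// returns evaluation/memory/next_goal, fold them into the next request.
function applyReflection(state, reply) {
  return {
    goal: reply.next_goal ?? state.goal,   // refined sub-goal, if any
    memory: reply.memory ?? state.memory,  // key facts carried forward
    lastEvaluation: reply.evaluation ?? null,
  };
}

let state = { goal: "Book a meeting room", memory: "", lastEvaluation: null };
state = applyReflection(state, {
  action: { type: "click", selector: "#room-b" },
  evaluation: "Room A is fully booked",
  memory: "Room A unavailable on Friday",
  next_goal: "Try Room B instead",
});
```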

Security

OmniBrowser Agent executes actions in the user's browser session. Always scope the agent narrowly and use human-approved mode for sensitive pages.

Security best practices:

  • Use scopeSelector to prevent interaction outside the target area
  • Use human-approved mode for any real-world interaction
  • Set a low maxSteps to limit damage from runaway agents
  • Never pass untrusted user input directly as the agent goal
  • Review WebLLM model outputs — small models can produce unexpected actions

General Questions

Does this work without an internet connection?

The heuristic planner works fully offline. WebLLM requires an initial model download but runs locally after that.

Which browsers are supported?

Any modern browser supporting ES modules. WebLLM additionally requires WebGPU (Chrome 113+, Edge 113+). Firefox and Safari have experimental WebGPU support.

Can the agent fill multi-step forms?

Yes. Give a compound goal like "Fill name with Jane, email with jane@example.com, and submit the form". The agent breaks it into steps automatically.
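A rough picture of that decomposition — purely illustrative, since the real planner splits goals itself — is breaking the sentence on commas and "and":

```javascript
// Illustrative splitter for compound goals: break on commas and "and".
function splitGoal(goal) {
  return goal
    .split(/,\s*(?:and\s+)?|\s+and\s+/i)
    .map((part) => part.trim())
    .filter(Boolean);
}

const subGoals = splitGoal(
  "Fill name with Jane, email with jane@example.com, and submit the form"
);
```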

Troubleshooting

WebLLM model won't load

Check that your browser supports WebGPU. Visit chrome://gpu and look for "WebGPU: Enabled". Also ensure you have enough GPU memory for the chosen model.

Agent clicks the wrong element

Use scopeSelector to narrow the agent's view. Add clear id attributes to your interactive elements. The heuristic planner works best with descriptive IDs and labels.

Agent keeps repeating the same action

This usually means the action is silently failing. Check the onStep callback for error statuses. Enable the reflection loop by using WebLLM with a model that supports evaluation/memory fields.
