omnibrowser-agent v0.2.35

Local-first browser AI automation library

OmniBrowser Agent plans and executes DOM actions entirely in the browser — no API keys, no cloud costs, no data leaving your machine. Wire in a WebLLM model and it reasons, remembers, and acts on any webpage.

Privacy-first · WebLLM + WebGPU · Reflection loop · Human-approved mode · Custom system prompt · Embeddable API
2 Agent Modes · 2 Planner Modes · 8 Action Types · MIT License

Use Cases

  • CRM profile lookup automation
  • Guided form-filling workflows
  • Assisted data extraction flows
  • Multi-step task automation

Core Engine

  • Observer: DOM snapshot + candidate elements
  • Planner: reflection → next action
  • Safety: safe / review / blocked gating
  • Executor: DOM actions with framework compat
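
The four stages compose into a single tick. As a rough sketch of how they fit together (all types and names here are illustrative, not the library's actual internals):

```typescript
// Illustrative types — the real AgentAction/PageSnapshot contracts are richer.
type Risk = "safe" | "review" | "blocked";
interface Action { type: string; selector?: string }

interface Engine {
  observe(): { candidates: string[] };
  plan(goal: string, snapshot: { candidates: string[] }, history: string[]): Action;
  assessRisk(action: Action): Risk;
  execute(action: Action): string;
}

// One tick: observe → plan → assess risk → execute (or pause / stop).
function tick(engine: Engine, goal: string, history: string[]): "executed" | "needs_approval" | "blocked" {
  const snapshot = engine.observe();
  const action = engine.plan(goal, snapshot, history);
  const risk = engine.assessRisk(action);
  if (risk === "blocked") return "blocked";          // never executes
  if (risk === "review") return "needs_approval";    // human-approved mode pauses here
  history.push(engine.execute(action));              // result feeds the next tick
  return "executed";
}
```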

See OmniBrowser Agent in Action

OmniBrowser Planner LLM

We fine-tuned a purpose-built planner model that runs entirely in your browser via WebLLM + WebGPU. No API keys, no cloud — the model weights download once and execute locally on your GPU.

Base Model

Qwen2.5-1.5B-Instruct — a compact, high-quality instruction-following LLM from Alibaba.

Fine-tuning

QLoRA fine-tuned on our custom OmniBrowser planner dataset — DOM snapshots paired with correct AgentAction JSON outputs.

Quantization

q4f16_1 via MLC-LLM — 4-bit weights, 16-bit activations. Optimized for WebGPU inference in the browser.

Size

~800 MB download. Runs on any device with WebGPU support (Chrome 113+, Edge, Safari 18+).
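
Because the model requires WebGPU, it is worth feature-detecting before triggering the download. A minimal check (the helper name is ours, not part of the library):

```typescript
// Hypothetical helper: detect the WebGPU API before downloading ~800 MB of weights.
// Takes the global object as a parameter so it stays testable outside the browser.
function hasWebGPU(g: { navigator?: { gpu?: unknown } }): boolean {
  return typeof g.navigator?.gpu !== "undefined";
}

// In the browser: if (!hasWebGPU(globalThis)) show an "unsupported device" message.
```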

What the model does

Given a user goal, page URL, visible DOM candidates, and action history — the model outputs a structured JSON response with evaluation, working memory, next goal, and the exact DOM action to execute:

{
  "evaluation": "Clicked the CRM tab — now on the contacts page.",
  "memory": "CRM tab active. Name field is #name, currently empty.",
  "nextGoal": "Type Jane Doe into the name field",
  "action": { "type": "type", "selector": "#name", "text": "Jane Doe", "clearFirst": true }
}

Training pipeline

The full training pipeline is open-source and reproducible:

  1. Generate a planner dataset from real DOM snapshots using notebook/scripts/generate_dataset.mjs
  2. Validate selector + action correctness with notebook/scripts/validate_dataset.mjs
  3. QLoRA fine-tune in Google Colab — see notebook/
  4. Merge adapters and quantize with mlc_llm convert_weight --quantization q4f16_1
  5. Upload to Hugging Face and load in WebLLM via custom appConfig

Load in your app

import * as webllm from "@mlc-ai/web-llm";
import { createWebLLMBridge } from "@akshayram1/omnibrowser-agent";

const appConfig = {
  model_list: [
    ...webllm.prebuiltAppConfig.model_list,
    {
      model: "https://huggingface.co/Akshayram1/omnibrowser-planner-1p5b-q4f16_1-MLC",
      model_id: "omnibrowser-planner-1p5b-q4f16_1",
      model_lib: webllm.modelLibURLPrefix + webllm.modelVersion
        + "/Qwen2-1.5B-Instruct-q4f16_1-ctx4k_cs1k-webgpu.wasm",
    },
  ],
};

const engine = await webllm.CreateMLCEngine(
  "omnibrowser-planner-1p5b-q4f16_1", { appConfig }
);
window.__browserAgentWebLLM = createWebLLMBridge(engine);

🤗 View on Hugging Face · Training Notebook · Model Test Demo

What's New in v0.2.35

This release implements the reflection-before-action pattern — the same loop used by leading browser agents — plus a new systemPrompt option so you can shape agent behaviour without rewriting the bridge.

Reflection Loop New

Before every action the agent now goes through a 4-step inner loop:

1 · Evaluate

What happened in the previous step? Did it succeed? What changed on the page?

2 · Remember

What key facts should be carried into the next step? Selector mappings, field values, task state.

3 · Plan

State the next goal in plain English before choosing an action.

4 · Act

Output the specific DOM action: click, type, navigate, scroll, etc.

The WebLLM bridge now returns the full reflection object:

{
  "evaluation": "The name field was filled successfully.",
  "memory":     "Name=#name done. Next: fill email at #email.",
  "next_goal":  "Type the email address into #email",
  "action":     { "type": "type", "selector": "#email", "text": "jane@example.com", "clearFirst": true }
}

The nextGoal field is surfaced in the live demo as a 💭 thought bubble before each action, so you can follow the agent's reasoning in real time.

Working Memory Across Steps New

The agent's memory string is automatically carried forward from one tick to the next inside AgentSession. The planner receives it as input.memory and can update it each step — giving the agent a scratchpad across the whole task.
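
In sketch form, the carry-forward looks roughly like this (hypothetical names — the real AgentSession handles this internally):

```typescript
// Illustrative sketch of memory threading across ticks — not the library's AgentSession.
interface PlannerResult { action: { type: string }; memory?: string }
interface Session { memory: string; history: string[] }

function runTick(session: Session, plan: (memory: string) => PlannerResult): void {
  // The planner receives the previous tick's memory as input...
  const result = plan(session.memory);
  // ...and whatever it returns becomes the scratchpad for the next tick.
  if (result.memory !== undefined) session.memory = result.memory;
  session.history.push(result.action.type);
}
```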

Custom System Prompt New

Pass your own system prompt directly in the planner config — no need to rewrite the bridge:

const agent = createBrowserAgent({
  goal: "Fill the checkout form",
  planner: {
    kind: "webllm",
    systemPrompt: "You are a careful checkout assistant. Never submit before all required fields are filled."
  }
});

New Exports New

  • parsePlannerResult(raw) — parse the full reflection+action JSON from raw LLM output, with fallback to bare AgentAction for backward compatibility.
  • PlannerResult type — { action, evaluation?, memory?, nextGoal? }

import { parsePlannerResult } from "@akshayram1/omnibrowser-agent";

const result = parsePlannerResult(llmRawOutput);
// result.action      → AgentAction
// result.evaluation  → string | undefined
// result.memory      → string | undefined
// result.nextGoal    → string | undefined

Backward Compatible

Existing bridges that return a bare AgentAction object still work without any changes. The library normalises both formats automatically.
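
A sketch of what that normalisation amounts to (illustrative — the library does the equivalent internally):

```typescript
// The two shapes a bridge may return, and a normaliser that accepts either.
interface AgentAction { type: string; selector?: string }
interface PlannerResult {
  action: AgentAction;
  evaluation?: string;
  memory?: string;
  nextGoal?: string;
}

function normalise(raw: AgentAction | PlannerResult): PlannerResult {
  // A full reflection object carries an `action` property;
  // a bare AgentAction has `type` at the top level instead.
  return "action" in raw ? raw : { action: raw };
}
```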

Docs

Everything you need to install, initialise, and run your first browser agent.

Installation

npm install @akshayram1/omnibrowser-agent

Quick Start

import { createBrowserAgent } from "@akshayram1/omnibrowser-agent";

const agent = createBrowserAgent(
  {
    goal: "Open CRM and find customer John Smith",
    mode: "human-approved",        // or "autonomous"
    planner: { kind: "heuristic" } // or "webllm"
  },
  {
    onStep:            (result, session) => console.log(result.message),
    onApprovalRequired:(action, session) => console.log("Needs approval:", action),
    onDone:            (result, session) => console.log("Done:", result.message),
    onError:           (err,    session) => console.error(err),
    onMaxStepsReached: (session)         => console.log("Max steps hit"),
  }
);

await agent.start();

// Resume after an approval prompt:
await agent.resume();

// Inspect state at any time:
console.log(agent.isRunning, agent.hasPendingAction);

// Stop:
agent.stop();

AbortSignal Support

const controller = new AbortController();
const agent = createBrowserAgent({ goal: "...", signal: controller.signal });
agent.start();

controller.abort(); // cancel from outside

Reading Reflection Fields

Every onStep result now includes optional reflection data from the planner:

onStep(result, session) {
  if (result.reflection?.nextGoal) {
    console.log("Agent thinking:", result.reflection.nextGoal);
  }
  if (result.reflection?.memory) {
    console.log("Agent memory:", result.reflection.memory);
  }
  console.log("Action:", result.message);
}

Agent Modes

human-approved

Pauses on review-rated actions and fires onApprovalRequired. Call agent.resume() to continue. Recommended for CRM, finance, and admin flows.

autonomous

Executes all safe and review actions without pausing. Best for rapid prototyping and demos.

Planner Modes

heuristic

Zero-dependency regex planner. Works fully offline. Best for simple, predictable goals: navigate, fill a field, click a button.
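
To make that concrete, a toy version of the pattern matching might look like this (the patterns here are simplified stand-ins, not the shipped planner's actual regexes):

```typescript
// Toy heuristic planner: match the goal string against a few regex patterns,
// fall back to signalling completion. Illustrative only.
interface Action { type: string; url?: string; selector?: string; text?: string }

function heuristicPlan(goal: string): Action {
  const nav = goal.match(/(?:open|go to|navigate to)\s+(https?:\S+)/i);
  if (nav) return { type: "navigate", url: nav[1] };

  const fill = goal.match(/(?:fill|type)\s+"?([^"]+?)"?\s+into\s+(\S+)/i);
  if (fill) return { type: "type", selector: fill[2], text: fill[1] };

  const search = goal.match(/search for\s+(.+)/i);
  if (search) return { type: "type", selector: "input[type=search], input", text: search[1] };

  // No pattern matched — report done rather than guess.
  return { type: "done" };
}
```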

webllm

On-device LLM via WebGPU through window.__browserAgentWebLLM. Fully private. Supports the reflection loop and custom system prompts.

Supported Actions

Action     Description                               Risk level
navigate   Navigate to a URL (http/https only)       safe
click      Click an element by CSS selector          safe / review
type       Type text into an input or textarea       safe / review
scroll     Scroll a container or the page            safe
focus      Focus an element (useful for dropdowns)   safe
wait       Pause for N milliseconds                  safe
extract    Extract text from an element              review
done       Signal task completion                    safe

Safety Model

  • safe — executes immediately in all modes.
  • review — pauses in human-approved mode; executes in autonomous. Triggered by actions on labels matching delete / submit / pay / confirm / transfer.
  • blocked — never executes. Triggered by javascript:, file:, or malformed URLs.
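
The gating above can be sketched as a small classifier (keyword list taken from the bullets; the function itself is illustrative, not safety.ts):

```typescript
// Keyword-based risk gating, as described above. Illustrative sketch only.
type Risk = "safe" | "review" | "blocked";

const REVIEW_KEYWORDS = /delete|submit|pay|confirm|transfer/i;

function assessRisk(action: { type: string; url?: string; label?: string }): Risk {
  if (action.type === "navigate") {
    try {
      const proto = new URL(action.url ?? "").protocol;
      // Only http/https navigations are allowed; javascript:, file:, etc. are blocked.
      return proto === "http:" || proto === "https:" ? "safe" : "blocked";
    } catch {
      return "blocked"; // malformed URL
    }
  }
  if (action.type === "extract") return "review";
  if (REVIEW_KEYWORDS.test(action.label ?? "")) return "review";
  return "safe";
}
```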
Chrome Extension
popup + background worker
Load the dist/ folder as an unpacked extension. Enter a goal in the popup, pick a mode, and hit Start. The background service worker drives the tick loop across tabs.
MV3 background worker
npm Library
createBrowserAgent()
Embed the agent directly into any web app. Import, configure, and wire up event callbacks. The same core engine powers both — zero duplication.
@akshayram1/omnibrowser-agent
both share the same core engine
one tick = observe → plan → assess risk → execute
01
Observe
observer.ts
Scans the live DOM. Filters invisible elements. Prioritises in-viewport candidates. Resolves ARIA labels. Returns a PageSnapshot.
02
Plan
planner.ts
Takes the goal, snapshot, and history. Returns the next AgentAction — plus optional reflection (evaluation, memory, nextGoal).
03
Assess risk
safety.ts
Every action gets a risk level: safe, review, or blocked. Risky actions pause for human approval in human-approved mode.
04
Execute
executor.ts
Performs the DOM action. Dispatches proper InputEvents for framework compat. Verifies success. Feeds errors back to the planner.
planner.ts
planNextAction(config, input)
  • URL, fill, search, click regex patterns
  • Fallback: first input → first button → done
  • WebLLM bridge via window.__browserAgentWebLLM
  • Returns PlannerResult with reflection fields
  • lastError fed back on retry
observer.ts
collectSnapshot()
  • Queries a, button, input, textarea, select…
  • Filters hidden & zero-dimension elements
  • In-viewport elements listed first
  • Resolves label via aria-labelledby, for/id, aria-label
  • Max 60 candidates, 1500-char text preview
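
The filtering and ordering can be modelled as a pure function over element geometry (a simplified sketch, not observer.ts itself):

```typescript
// Toy model of candidate selection: drop zero-dimension elements,
// list in-viewport candidates first, cap the total. Illustrative only.
interface Candidate { label: string; width: number; height: number; top: number }

function selectCandidates(all: Candidate[], viewportHeight: number, max = 60): Candidate[] {
  const visible = all.filter(c => c.width > 0 && c.height > 0); // hidden/zero-size dropped
  const inView = visible.filter(c => c.top >= 0 && c.top < viewportHeight);
  const offView = visible.filter(c => !(c.top >= 0 && c.top < viewportHeight));
  return [...inView, ...offView].slice(0, max); // in-viewport first, capped at `max`
}
```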
executor.ts
executeAction(action)
  • click — disabled-check before firing
  • type — InputEvent + change dispatch (React/Vue safe)
  • navigate — sets window.location.href
  • extract — verifies non-empty text
  • scroll, focus, wait actions
// safety.ts — assessRisk()
safe
navigate (http/s), click, scroll, wait, focus, done
review
extract, click/type on delete · pay · submit · transfer
blocked
javascript: · file: · malformed URLs
// planner modes
Heuristic — zero deps, works offline
Pure regex matching against your goal string. Handles navigate, fill, search, and click patterns. Falls back to the first visible input or button.
const agent = createBrowserAgent({
  goal: "search for John Smith",
  planner: { kind: "heuristic" }
});
WebLLM — on-device via WebGPU
Delegates to window.__browserAgentWebLLM. Fully private — no API calls, runs local inference via WebGPU. Wire in your own bridge implementation.
const agent = createBrowserAgent({
  goal: "open CRM and find customer",
  planner: { kind: "webllm", modelId: "Llama-3" }
}); // needs bridge on window
Executed
Action is safe. Runs immediately. Result message appended to session history. Next tick begins.
status: "executed"
Needs approval
Risk level is "review" in human-approved mode. Agent pauses, fires onApprovalRequired. Resume with agent.resume().
status: "needs_approval"
Blocked
Dangerous protocol or malformed URL. Action is rejected entirely. Session stops without executing anything.
status: "blocked"

Data Flow — One Tick

goal + history + memory
        │
        ▼
observer.collectSnapshot()   →  PageSnapshot (url, title, candidates[])
        │
        ▼
planner.planNextAction()     →  PlannerResult
                                  { action, evaluation?, memory?, nextGoal? }
        │
        ▼
safety.assessRisk(action)    →  safe | review | blocked
        │
   ┌────┴──────────────────────────┐
blocked                  review (human-approved mode)
   │                               │
  stop                   pause → user approves → resume()
                                   │
                              safe / approved
                                   │
                                   ▼
                executor.executeAction(action)  →  result string
                                   │
                                   ▼
                       session.history.push(result)
                       session.memory = plannerResult.memory
                       → next tick

WebLLM Bridge Contract

Attach an object to window.__browserAgentWebLLM before starting the agent. The bridge can return either the new PlannerResult format or a bare AgentAction (backward compatible).

window.__browserAgentWebLLM = {
  async plan(input, modelId) {
    // input.goal, input.snapshot, input.history,
    // input.lastError, input.memory, input.systemPrompt
    return {
      evaluation: "Previous step succeeded.",
      memory:     "Name field is #name.",
      next_goal:  "Fill the email field.",
      action: { "type": "type", "selector": "#email", "text": "jane@example.com", "clearFirst": true }
    };
  }
};

Current Limitations

  • No persistent long-term memory (IndexedDB) yet
  • No goal decomposition / multi-step task graphs yet
  • Risk scoring is keyword-based, not semantic
  • No selector healing or fallback strategy yet
@akshayram1/omnibrowser-agent
v0.2.35 · MIT · TypeScript
$ npm install @akshayram1/omnibrowser-agent
// Minimal setup — heuristic planner, autonomous mode
import { createBrowserAgent } from '@akshayram1/omnibrowser-agent';
 
const agent = createBrowserAgent({
  goal: "search for contact John Smith in CRM",
  mode: "autonomous",
  planner: { kind: "heuristic" }
});
 
await agent.start();
// Human-approved mode — agent pauses on risky actions
import { createBrowserAgent } from '@akshayram1/omnibrowser-agent';
 
const agent = createBrowserAgent({
  goal: "fill out the payment form",
  mode: "human-approved", // pauses on delete / submit / pay
  planner: { kind: "heuristic" },
  maxSteps: 20,
  stepDelayMs: 500
});
 
await agent.start();
 
// Later — after user reviews the pending action:
await agent.resume();
// Full event callback API
import { createBrowserAgent } from '@akshayram1/omnibrowser-agent';
 
const agent = createBrowserAgent(
  { goal: "open CRM and find customer", mode: "human-approved" },
  {
    onStart: (session) => console.log('started', session.id),
    onStep: (result, session) => console.log(result.message),
    onApprovalRequired: (action, session) => {
      console.log('Review:', action);
      // call agent.resume() after user confirms
    },
    onDone: (result) => console.log('Done:', result.message),
    onError: (err) => console.error(err),
    onMaxStepsReached: (session) => console.warn('max steps', session)
  }
);
 
await agent.start();
// AbortSignal — cancel from outside at any time
import { createBrowserAgent } from '@akshayram1/omnibrowser-agent';
 
const controller = new AbortController();
 
const agent = createBrowserAgent({
  goal: "extract all product prices",
  signal: controller.signal // wire in the abort signal
});
 
agent.start();
 
// Cancel externally (e.g. button click, timeout, unmount)
setTimeout(() => controller.abort(), 5000);
 
// Or call stop() directly:
agent.stop();
// WebLLM bridge with reflection loop
import * as webllm from "@mlc-ai/web-llm";
import { createBrowserAgent, parsePlannerResult } from "@akshayram1/omnibrowser-agent";
 
const engine = await webllm.CreateMLCEngine("Llama-3.2-3B-Instruct-q4f16_1-MLC");
 
window.__browserAgentWebLLM = {
  async plan(input, modelId) {
    const resp = await engine.chat.completions.create({
      messages: [{ role: "user", content: `Goal: ${input.goal}` }],
      temperature: 0, max_tokens: 200
    });
    return parsePlannerResult(resp.choices[0].message.content);
  }
};
exports
createBrowserAgent() · parseAction() · parsePlannerResult() · AgentAction · AgentMode · AgentSession · PlannerConfig · ContentResult · RiskLevel

Embedding Guide

Embed OmniBrowser Agent as a library in any web application. Full reference in docs/EMBEDDING.md.

Heuristic Planner (zero setup)

import { createBrowserAgent } from "@akshayram1/omnibrowser-agent";

const agent = createBrowserAgent(
  {
    goal: "Search contact Jane Doe and open profile",
    mode: "human-approved",
    planner: { kind: "heuristic" },
    maxSteps: 15,
    stepDelayMs: 400
  },
  {
    onStep:             (result) => console.log("step", result),
    onApprovalRequired: (action) => showApprovalModal(action),
    onDone:             (result) => console.log("done", result),
    onError:            (error)  => console.error(error)
  }
);

await agent.start();

// Approve a paused action:
await agent.approvePendingAction();

// Stop at any time:
agent.stop();

WebLLM Planner with Reflection

Load a WebLLM engine, wire the bridge, then start the agent. The bridge receives the full reflection input and should return the reflection+action object:

import * as webllm from "@mlc-ai/web-llm";
import { createBrowserAgent, parsePlannerResult } from "@akshayram1/omnibrowser-agent";

const engine = await webllm.CreateMLCEngine("Llama-3.2-3B-Instruct-q4f16_1-MLC");

window.__browserAgentWebLLM = {
  async plan(input, modelId) {
    const { goal, history, lastError, memory, systemPrompt } = input;

    const defaultSystem = `You are a browser automation agent.
Output ONLY a JSON object in this format:
{"evaluation":"...","memory":"...","next_goal":"...","action":{...}}`;

    const resp = await engine.chat.completions.create({
      messages: [
        { role: "system", content: systemPrompt || defaultSystem },
        { role: "user",   content: `Goal: "${goal}"\nHistory: ${history.slice(-4).join(" → ")}${memory ? "\nMemory: " + memory : ""}${lastError ? "\nLast error: " + lastError : ""}` }
      ],
      temperature: 0,
      max_tokens: 200
    });

    return parsePlannerResult(resp.choices[0].message.content);
  }
};

const agent = createBrowserAgent({
  goal: "Fill the checkout form with my details",
  planner: { kind: "webllm" }
}, {
  onStep(result) {
    if (result.reflection?.nextGoal) console.log("💭", result.reflection.nextGoal);
    console.log("✅", result.message);
  }
});

await agent.start();

Custom System Prompt

Shape the agent's personality or constraints without touching the bridge:

const agent = createBrowserAgent({
  goal: "Book a meeting room for tomorrow",
  planner: {
    kind: "webllm",
    systemPrompt: `You are a careful meeting room booking assistant.
Always confirm the room is available before clicking Book.
Never navigate away from the booking portal.`
  }
});

Bring Your Own Model (Colab Notebook)

Want a planner model tuned for OmniBrowser's DOM snapshots? Use the training + quantization assets in notebook/:

Flow: collect traces → fine-tune in Colab → run mlc_llm convert_weight --quantization q4f16_1 + gen_config → upload to HuggingFace → load via custom appConfig.

Custom WebLLM Model

To use a model that isn't in WebLLM's built-in list, compile it with MLC-LLM and register it via appConfig:

import * as webllm from "@mlc-ai/web-llm";

// 1. Describe your compiled model
const myModel = {
  model:    "https://huggingface.co/your-org/your-model-MLC/resolve/main/",
  model_id: "your-model-MLC",
  // point to the compiled .wasm lib (same arch as a similar built-in model)
  model_lib: webllm.modelLibURLPrefix + webllm.modelVersion
           + "/Mistral-7B-Instruct-v0.3-q4f16_1-ctx4k_cs1k-webgpu.wasm",
};

// 2. Merge with the prebuilt catalog so existing models still work
const engine = await webllm.CreateMLCEngine("your-model-MLC", {
  appConfig: {
    model_list: [
      ...webllm.prebuiltAppConfig.model_list,
      myModel,
    ],
  },
  initProgressCallback({ progress, text }) {
    console.log(Math.round(progress * 100) + "%", text);
  },
});

// 3. Wire it up as usual
window.__browserAgentWebLLM = {
  async plan(input) {
    const resp = await engine.chat.completions.create({
      messages: [{ role: "user", content: `Goal: ${input.goal}` }],
      temperature: 0,
      max_tokens: 200,
    });
    return parsePlannerResult(resp.choices[0].message.content);
  }
};

Steps to compile a custom model: (1) Install mlc-llm (llm.mlc.ai/docs/install/mlc_llm). (2) Run mlc_llm convert_weight + mlc_llm gen_config. (3) If your architecture has no compatible prebuilt kernel, compile with mlc_llm compile. (4) Upload weights + mlc-chat-config.json to a CORS-enabled host (e.g. HuggingFace).

Notes

  • The WebLLM bridge is not bundled — bring your own engine and attach it to window.__browserAgentWebLLM.
  • Use human-approved mode for CRM, finance, and admin actions.
  • Bridges returning a bare AgentAction still work — backward compatible.
  • For production apps, mount inside an authenticated shell and add your own permission checks.

Roadmap

Full roadmap in docs/ROADMAP.md.

v0.1

  • Extension runtime loop
  • Shared action contracts
  • Heuristic + WebLLM planner switch
  • Human-approved mode

v0.2 stable

  • New actions: scroll, focus
  • Improved heuristic planner with regex goal patterns
  • Better page observation (visibility filtering, up to 60 candidates)
  • Library API: resume(), isRunning, hasPendingAction, AbortSignal, onMaxStepsReached
  • CI pipeline with auto version bump on push to main

v0.2.35 current

  • Reflection-before-action pattern (evaluation → memory → next_goal → act)
  • Working memory carried across ticks via AgentSession.memory
  • parsePlannerResult() exported from library
  • systemPrompt option in PlannerConfig
  • Thought bubble (💭) messages in live demo
  • Chatbot UI redesign: tabs, typing indicator, right-aligned messages
  • Doc Viewer example with hidden tabs + side chat
  • Live Examples hub at /examples

v0.3

  • Expanded WebLLM model catalog (new 7B/8B options + compatibility matrix)
  • Improved model loading UX (recommended presets by speed/quality and device memory)
  • Enhanced default system prompts for safer, clearer multi-step planning
  • Prompt presets for common workflows (docs navigation, CRM form fill, task automation)

v1.0

  • Advanced prompt orchestration (goal-aware system prompt routing and contextual guardrails)
  • Functionality expansion: richer action toolkit and stronger extraction/navigation reliability
  • Adaptive planner behaviour (model-aware retries, fallback strategies, and recovery flows)
  • Evaluation suite for prompt and model quality across benchmark browser tasks

Contact

Maintainer: Akshay Chame

For feature requests or bugs, please open an issue on GitHub with reproduction steps.