Use Cases
- CRM profile lookup automation
- Guided form-filling workflows
- Assisted data extraction flows
- Multi-step task automation
OmniBrowser Agent plans and executes DOM actions entirely in the browser — no API keys, no cloud costs, no data leaving your machine. Wire in a WebLLM model and it reasons, remembers, and acts on any webpage.
We fine-tuned a purpose-built planner model that runs entirely in your browser via WebLLM + WebGPU. The model weights download once and execute locally on your GPU.
- Base model: Qwen2.5-1.5B-Instruct — a compact, high-quality instruction-following LLM from Alibaba.
- Fine-tuning: QLoRA on our custom OmniBrowser planner dataset — DOM snapshots paired with correct AgentAction JSON outputs.
- Quantization: q4f16_1 via MLC-LLM — 4-bit weights, 16-bit activations, optimized for WebGPU inference in the browser.
- Footprint: ~800 MB download. Runs on any device with WebGPU support (Chrome 113+, Edge, Safari 18+).
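Since the weights are large, it's worth feature-detecting WebGPU before starting the download. A minimal sketch — `checkWebGPU` is an illustrative helper, not part of the library:

```js
// Illustrative helper (not part of the library): feature-detect WebGPU
// before kicking off the ~800 MB weight download.
// `nav` defaults to the browser's global navigator.
async function checkWebGPU(nav = navigator) {
  if (!nav.gpu) {
    return { supported: false, reason: "navigator.gpu missing" };
  }
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { supported: false, reason: "no suitable GPU adapter" };
  }
  return { supported: true };
}
```

Call this on page load and fall back to the heuristic planner (or show an upgrade hint) when `supported` is false.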
Given a user goal, page URL, visible DOM candidates, and action history — the model outputs a structured JSON response with evaluation, working memory, next goal, and the exact DOM action to execute:
```json
{
  "evaluation": "Clicked the CRM tab — now on the contacts page.",
  "memory": "CRM tab active. Name field is #name, currently empty.",
  "nextGoal": "Type Jane Doe into the name field",
  "action": { "type": "type", "selector": "#name", "text": "Jane Doe", "clearFirst": true }
}
```
The full training pipeline is open-source and reproducible:
- `notebook/scripts/generate_dataset.mjs`
- `notebook/scripts/validate_dataset.mjs`
- `mlc_llm convert_weight --quantization q4f16_1`
- custom `appConfig` registration

Register and load the fine-tuned planner with a custom `appConfig`:

```js
import * as webllm from "@mlc-ai/web-llm";
import { createWebLLMBridge } from "@akshayram1/omnibrowser-agent";

const appConfig = {
  model_list: [
    ...webllm.prebuiltAppConfig.model_list,
    {
      model: "https://huggingface.co/Akshayram1/omnibrowser-planner-1p5b-q4f16_1-MLC",
      model_id: "omnibrowser-planner-1p5b-q4f16_1",
      model_lib: webllm.modelLibURLPrefix + webllm.modelVersion
        + "/Qwen2-1.5B-Instruct-q4f16_1-ctx4k_cs1k-webgpu.wasm",
    },
  ],
};

const engine = await webllm.CreateMLCEngine(
  "omnibrowser-planner-1p5b-q4f16_1", { appConfig }
);

window.__browserAgentWebLLM = createWebLLMBridge(engine);
```
This release implements the reflection-before-action pattern — the same loop used by leading browser agents — plus a new systemPrompt option so you can shape agent behaviour without rewriting the bridge.
Before every action the agent now goes through a 4-step inner loop:
1. Evaluation — what happened in the previous step? Did it succeed? What changed on the page?
2. Memory — what key facts should be carried into the next step? Selector mappings, field values, task state.
3. Next goal — state the next goal in plain English before choosing an action.
4. Action — output the specific DOM action: click, type, navigate, scroll, etc.
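The loop's inputs are assembled into the user message the planner sees each tick. A minimal sketch, assuming the field names of the bridge's `input` object (`goal`, `history`, `memory`, `lastError`):

```js
// Sketch: build the per-tick planner prompt from the bridge input.
// Field names mirror the documented bridge `input` object.
function buildPlannerPrompt({ goal, history = [], memory, lastError }) {
  let prompt = `Goal: "${goal}"`;
  prompt += `\nHistory: ${history.slice(-4).join(" → ")}`; // last 4 steps keep the context window small
  if (memory) prompt += `\nMemory: ${memory}`;             // scratchpad carried from the previous tick
  if (lastError) prompt += `\nLast error: ${lastError}`;   // lets the model self-correct
  return prompt;
}
```

Keeping only the last few history entries is a deliberate trade-off: the memory string is expected to summarise anything older that still matters.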
The WebLLM bridge now returns the full reflection object:
```json
{
  "evaluation": "The name field was filled successfully.",
  "memory": "Name=#name done. Next: fill email at #email.",
  "next_goal": "Type the email address into #email",
  "action": { "type": "type", "selector": "#email", "text": "jane@example.com", "clearFirst": true }
}
```
The nextGoal field is surfaced in the live demo as a 💭 thought bubble before each action, so you can follow the agent's reasoning in real time.
The agent's memory string is automatically carried forward from one tick to the next inside AgentSession. The planner receives it as input.memory and can update it each step — giving the agent a scratchpad across the whole task.
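The carry-forward can be pictured with a stripped-down loop. This is a sketch, not the real `AgentSession` — `runTicks` and its shape are illustrative only:

```js
// Minimal sketch (NOT the real AgentSession) of how the memory string
// is threaded from one tick to the next. `plan` stands in for the planner.
async function runTicks(plan, goal, maxSteps = 3) {
  const session = { history: [], memory: undefined };
  for (let i = 0; i < maxSteps; i++) {
    const result = await plan({ goal, history: session.history, memory: session.memory });
    session.history.push(result.action.type);
    if (result.memory !== undefined) session.memory = result.memory; // carry scratchpad forward
    if (result.action.type === "done") break;
  }
  return session;
}
```

Each tick the planner receives whatever memory the previous tick wrote, so selector mappings and task state survive page transitions.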
Pass your own system prompt directly in the planner config — no need to rewrite the bridge:
```js
const agent = createBrowserAgent({
  goal: "Fill the checkout form",
  planner: {
    kind: "webllm",
    systemPrompt: "You are a careful checkout assistant. Never submit before all required fields are filled."
  }
});
```
- `parsePlannerResult(raw)` — parse the full reflection+action JSON from raw LLM output, with fallback to a bare `AgentAction` for backward compatibility.
- `PlannerResult` type — `{ action, evaluation?, memory?, nextGoal? }`

```js
import { parsePlannerResult } from "@akshayram1/omnibrowser-agent";

const result = parsePlannerResult(llmRawOutput);
// result.action     → AgentAction
// result.evaluation → string | undefined
// result.memory     → string | undefined
// result.nextGoal   → string | undefined
```
Existing bridges that return a bare AgentAction object still work without any changes. The library normalises both formats automatically.
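The normalisation can be sketched as follows — this mirrors the documented behaviour, not the library's exact implementation (`normalizePlannerResult` is a hypothetical name; the library exports the real logic as `parsePlannerResult`):

```js
// Sketch of the dual-format normalisation described above.
// A bare AgentAction has a `type` but no nested `action` object;
// a full reflection result wraps the action and may use either
// `nextGoal` or `next_goal` as the key.
function normalizePlannerResult(parsed) {
  if (parsed && parsed.type && !parsed.action) {
    return { action: parsed }; // legacy bridge: bare AgentAction
  }
  return {
    action: parsed.action,
    evaluation: parsed.evaluation,
    memory: parsed.memory,
    nextGoal: parsed.nextGoal ?? parsed.next_goal, // accept both spellings
  };
}
```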
Everything you need to install, initialise, and run your first browser agent.
```sh
npm install @akshayram1/omnibrowser-agent
```
```js
import { createBrowserAgent } from "@akshayram1/omnibrowser-agent";

const agent = createBrowserAgent(
  {
    goal: "Open CRM and find customer John Smith",
    mode: "human-approved",         // or "autonomous"
    planner: { kind: "heuristic" }  // or "webllm"
  },
  {
    onStep: (result, session) => console.log(result.message),
    onApprovalRequired: (action, session) => console.log("Needs approval:", action),
    onDone: (result, session) => console.log("Done:", result.message),
    onError: (err, session) => console.error(err),
    onMaxStepsReached: (session) => console.log("Max steps hit"),
  }
);

await agent.start();

// Resume after an approval prompt:
await agent.resume();

// Inspect state at any time:
console.log(agent.isRunning, agent.hasPendingAction);

// Stop:
agent.stop();
```
```js
const controller = new AbortController();
const agent = createBrowserAgent({ goal: "...", signal: controller.signal });

agent.start();
controller.abort(); // cancel from outside
```
Every onStep result now includes optional reflection data from the planner:
```js
onStep(result, session) {
  if (result.reflection?.nextGoal) {
    console.log("Agent thinking:", result.reflection.nextGoal);
  }
  if (result.reflection?.memory) {
    console.log("Agent memory:", result.reflection.memory);
  }
  console.log("Action:", result.message);
}
```
- `human-approved` — pauses on review-rated actions and fires `onApprovalRequired`. Call `agent.resume()` to continue. Recommended for CRM, finance, and admin flows.
- `autonomous` — executes all safe and review actions without pausing. Best for rapid prototyping and demos.
- `heuristic` — zero-dependency regex planner. Works fully offline. Best for simple, predictable goals: navigate, fill a field, click a button.
- `webllm` — on-device LLM via WebGPU through `window.__browserAgentWebLLM`. Fully private. Supports the reflection loop and custom system prompts.
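The heuristic planner described above can be pictured as a handful of regex rules mapping a goal string to an action. This is a toy sketch in its spirit — the patterns here are invented for illustration; the real rules live in the library:

```js
// Toy regex planner in the spirit of the built-in heuristic planner.
// The patterns below are illustrative, not the library's actual rules.
function heuristicPlan(goal) {
  let m;
  if ((m = goal.match(/(?:open|go to|navigate to)\s+(https?:\/\/\S+)/i))) {
    return { type: "navigate", url: m[1] };
  }
  if ((m = goal.match(/type\s+"([^"]+)"\s+into\s+(\S+)/i))) {
    return { type: "type", selector: m[2], text: m[1], clearFirst: true };
  }
  if ((m = goal.match(/click\s+(\S+)/i))) {
    return { type: "click", selector: m[1] };
  }
  return { type: "done" }; // nothing matched — give up gracefully
}
```

This is why the heuristic planner suits only simple, predictable goals: anything outside the pattern vocabulary falls through to `done`.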
| Action | Description | Risk level |
|---|---|---|
| `navigate` | Navigate to a URL (http/https only) | safe |
| `click` | Click an element by CSS selector | safe / review |
| `type` | Type text into an input or textarea | safe / review |
| `scroll` | Scroll a container or the page | safe |
| `focus` | Focus an element (useful for dropdowns) | safe |
| `wait` | Pause for N milliseconds | safe |
| `extract` | Extract text from an element | review |
| `done` | Signal task completion | safe |
- review — pauses in human-approved mode; executes in autonomous. Triggered by actions on labels matching delete / submit / pay / confirm / transfer.
- blocked — javascript:, file:, or malformed URLs are never executed.

To try the extension, load the `dist/` folder as an unpacked extension. Enter a goal in the popup, pick a mode, and hit Start. The background service worker drives the tick loop across tabs.

```
goal + history + memory
        │
        ▼
observer.collectSnapshot()   → PageSnapshot (url, title, candidates[])
        │
        ▼
planner.planNextAction()     → PlannerResult
                               { action, evaluation?, memory?, nextGoal? }
        │
        ▼
safety.assessRisk(action)    → safe | review | blocked
        │
   ┌────┴──────────────────────────┐
blocked                     review (human-approved mode)
   │                               │
 stop           pause → user approves → resume()
        │
 safe / approved
        │
        ▼
executor.executeAction(action) → result string
        │
        ▼
session.history.push(result)
session.memory = plannerResult.memory
        → next tick
```
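The `safety.assessRisk` stage can be sketched from the documented triggers (blocked URL schemes; review-rated labels and the `extract` action). `assessRisk` here is an assumption-laden stand-in, not the library's actual code:

```js
// Sketch of the risk gate, using only the documented triggers.
// Labels matching these words escalate click/type actions to "review".
const REVIEW_LABEL = /delete|submit|pay|confirm|transfer/i;

function assessRisk(action, label = "") {
  if (action.type === "navigate") {
    try {
      const url = new URL(action.url);
      // Only http/https navigation is allowed.
      if (url.protocol !== "http:" && url.protocol !== "https:") return "blocked";
    } catch {
      return "blocked"; // malformed URL
    }
  }
  if ((action.type === "click" || action.type === "type") && REVIEW_LABEL.test(label)) {
    return "review";
  }
  if (action.type === "extract") return "review"; // per the actions table
  return "safe";
}
```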
Attach an object to window.__browserAgentWebLLM before starting the agent. The bridge can return either the new PlannerResult format or a bare AgentAction (backward compatible).
```js
window.__browserAgentWebLLM = {
  async plan(input, modelId) {
    // input.goal, input.snapshot, input.history,
    // input.lastError, input.memory, input.systemPrompt
    return {
      evaluation: "Previous step succeeded.",
      memory: "Name field is #name.",
      next_goal: "Fill the email field.",
      action: { type: "type", selector: "#email", text: "jane@example.com", clearFirst: true }
    };
  }
};
```
Embed OmniBrowser Agent as a library in any web application. Full reference in docs/EMBEDDING.md.
```js
import { createBrowserAgent } from "@akshayram1/omnibrowser-agent";

const agent = createBrowserAgent(
  {
    goal: "Search contact Jane Doe and open profile",
    mode: "human-approved",
    planner: { kind: "heuristic" },
    maxSteps: 15,
    stepDelayMs: 400
  },
  {
    onStep: (result) => console.log("step", result),
    onApprovalRequired: (action) => showApprovalModal(action),
    onDone: (result) => console.log("done", result),
    onError: (error) => console.error(error)
  }
);

await agent.start();

// Approve a paused action:
await agent.approvePendingAction();

// Stop at any time:
agent.stop();
```
Load a WebLLM engine, wire the bridge, then start the agent. The bridge receives the full reflection input and should return the reflection+action object:
```js
import * as webllm from "@mlc-ai/web-llm";
import { createBrowserAgent, parsePlannerResult } from "@akshayram1/omnibrowser-agent";

const engine = await webllm.CreateMLCEngine("Llama-3.2-3B-Instruct-q4f16_1-MLC");

window.__browserAgentWebLLM = {
  async plan(input, modelId) {
    const { goal, history, lastError, memory, systemPrompt } = input;
    const defaultSystem = `You are a browser automation agent.
Output ONLY a JSON object in this format:
{"evaluation":"...","memory":"...","next_goal":"...","action":{...}}`;
    const resp = await engine.chat.completions.create({
      messages: [
        { role: "system", content: systemPrompt || defaultSystem },
        { role: "user", content: `Goal: "${goal}"\nHistory: ${history.slice(-4).join(" → ")}${memory ? "\nMemory: " + memory : ""}${lastError ? "\nLast error: " + lastError : ""}` }
      ],
      temperature: 0,
      max_tokens: 200
    });
    return parsePlannerResult(resp.choices[0].message.content);
  }
};

const agent = createBrowserAgent({
  goal: "Fill the checkout form with my details",
  planner: { kind: "webllm" }
}, {
  onStep(result) {
    if (result.reflection?.nextGoal) console.log("💭", result.reflection.nextGoal);
    console.log("✅", result.message);
  }
});

await agent.start();
```
Shape the agent's personality or constraints without touching the bridge:
```js
const agent = createBrowserAgent({
  goal: "Book a meeting room for tomorrow",
  planner: {
    kind: "webllm",
    systemPrompt: `You are a careful meeting room booking assistant.
Always confirm the room is available before clicking Book.
Never navigate away from the booking portal.`
  }
});
```
Want a planner model tuned for OmniBrowser's DOM snapshots? Use the training + quantization assets in notebook/:
Flow: collect traces → fine-tune in Colab → run mlc_llm convert_weight --quantization q4f16_1 + gen_config → upload to HuggingFace → load via custom appConfig.
To use a model that isn't in WebLLM's built-in list, compile it with MLC-LLM and register it via appConfig:
```js
import * as webllm from "@mlc-ai/web-llm";

// 1. Describe your compiled model
const myModel = {
  model: "https://huggingface.co/your-org/your-model-MLC/resolve/main/",
  model_id: "your-model-MLC",
  // point to the compiled .wasm lib (same arch as a similar built-in model)
  model_lib: webllm.modelLibURLPrefix + webllm.modelVersion
    + "/Mistral-7B-Instruct-v0.3-q4f16_1-ctx4k_cs1k-webgpu.wasm",
};

// 2. Merge with the prebuilt catalog so existing models still work
const engine = await webllm.CreateMLCEngine("your-model-MLC", {
  appConfig: {
    model_list: [
      ...webllm.prebuiltAppConfig.model_list,
      myModel,
    ],
  },
  initProgressCallback({ progress, text }) {
    console.log(Math.round(progress * 100) + "%", text);
  },
});

// 3. Wire it up as usual
window.__browserAgentWebLLM = {
  async plan(input) {
    const resp = await engine.chat.completions.create({
      messages: [{ role: "user", content: `Goal: ${input.goal}` }],
      temperature: 0,
      max_tokens: 200,
    });
    return parsePlannerResult(resp.choices[0].message.content);
  }
};
```
Steps to compile a custom model:
1. Install mlc-llm — llm.mlc.ai/docs/install/mlc_llm
2. Run `mlc_llm convert_weight` + `mlc_llm gen_config`.
3. If your architecture has no compatible prebuilt kernel, compile with `mlc_llm compile`.
4. Upload weights + `mlc-chat-config.json` to a CORS-enabled host (e.g. HuggingFace).
- Wire custom models via `window.__browserAgentWebLLM`.
- Use `human-approved` mode for CRM, finance, and admin actions.
- Bridges that return a bare `AgentAction` still work — backward compatible.
- Full roadmap in docs/ROADMAP.md.
- New actions: `scroll`, `focus`
- Session controls: `resume()`, `isRunning`, `hasPendingAction`, `AbortSignal`, `onMaxStepsReached`
- Reflection loop (evaluation → memory → next_goal → act)
- `AgentSession.memory` carried across ticks
- `parsePlannerResult()` exported from the library
- `systemPrompt` option in `PlannerConfig`
- Examples in `/examples`

Maintainer: Akshay Chame
For feature requests or bugs, please open an issue on GitHub with reproduction steps.