# Browser Automation Tools for AI Agents — Comprehensive Research (Feb 2026)

> **TL;DR:** Browser Use is the most popular open-source agent framework (78k+ stars, 89.1% on the WebVoyager benchmark). Stagehand is the best TypeScript SDK, with self-healing and caching. Playwright MCP is the most token-efficient for coding agents. Agent Browser is the fastest CLI. For infra, Browserbase leads cloud; Steel leads self-hosted. For hybrid deterministic+AI workflows, watch Notte and Skyvern.

---

## Table of Contents
1. [Browser Use](#1-browser-use)
2. [Playwright MCP](#2-playwright-mcp)
3. [Stagehand (by Browserbase)](#3-stagehand-by-browserbase)
4. [Agent Browser (Vercel Labs)](#4-agent-browser-vercel-labs)
5. [Puppeteer + AI Wrappers](#5-puppeteer--ai-wrappers)
6. [Steel.dev](#6-steeldev)
7. [Browserbase](#7-browserbase)
8. [Emerging Tools](#8-emerging-tools)
9. [Comparison Matrix](#9-comparison-matrix)
10. [Recommendations](#10-recommendations-for-clawdbot)

---

## 1. Browser Use
**Website:** [browser-use.com](https://browser-use.com) | **GitHub:** [browser-use/browser-use](https://github.com/browser-use/browser-use) | **Stars:** ~78,000+ | **Language:** Python

### Architecture
- **DOM distillation** — strips pages down to essential interactive elements, sends structured text to LLM (not screenshots). Reduces token consumption significantly.
- Built on **Playwright** under the hood for full browser control (JS rendering, screenshots, network interception).
- **Multi-tab support** — agents can work across multiple tabs simultaneously.
- **Memory & context** — maintains conversation history and page context across navigation steps.
- **ChatBrowserUse:** the project's own LLM, purpose-built for browser automation tasks. The vendor claims 3-5x faster task completion than generic models.
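
To make the DOM distillation idea concrete, here is a minimal sketch using only Python's standard library. It is not Browser Use's implementation: the set of "interactive" tags, the `distill` helper, and the `[index] <tag> label` output format are all assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Tags treated as interactive. This set is an illustrative assumption.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
# Void tags have no closing tag, so they never collect inner text.
VOID_TAGS = {"input"}


class InteractiveElementDistiller(HTMLParser):
    """Keeps interactive elements and discards all other markup."""

    def __init__(self):
        super().__init__()
        self.elements = []  # kept elements, in document order
        self._open = []     # interactive elements whose text is still being read

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        element = {"tag": tag, "attrs": dict(attrs), "text_parts": []}
        self.elements.append(element)
        if tag not in VOID_TAGS:
            self._open.append(element)

    def handle_endtag(self, tag):
        if self._open and self._open[-1]["tag"] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._open:
            self._open[-1]["text_parts"].append(text)


def distill(html):
    """Return one indexed line per interactive element."""
    parser = InteractiveElementDistiller()
    parser.feed(html)
    parser.close()
    lines = []
    for index, element in enumerate(parser.elements):
        attrs = element["attrs"]
        label = (" ".join(element["text_parts"])
                 or attrs.get("placeholder")
                 or attrs.get("name")
                 or "")
        lines.append(f"[{index}] <{element['tag']}> {label}".rstrip())
    return "\n".join(lines)


PAGE = """
<div><h1>Welcome</h1><p>Long marketing copy the agent does not need.</p>
<a href="/login">Log in</a>
<input name="q" placeholder="Search">
<button><b>Go</b></button></div>
"""

print(distill(PAGE))
```

The heading and paragraph never reach the model; only three short lines do, which is where the token savings come from.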

### Speed
- Up to 20 browser steps per minute (with parallel agent integration).
- Cloud sandbox mode runs the agent next to the browser, which minimizes latency.
- ChatBrowserUse model optimized for speed; generic models (GPT-4o, Claude) are slower but work.

### Reliability
- **89.1% success rate on WebVoyager benchmark** (586 diverse web tasks) — current state-of-the-art for autonomous web interaction.
- Reddit sentiment is mixed: many users are impressed at first but report **reliability issues in repeated/production runs**. Common complaints are slowness with generic models and flaky behavior on complex sites.
- "Browser-use sucks" threads exist alongside enthusiastic adopters.

### Session Persistence
- ✅ **Full session persistence** via cloud profiles — saves cookies, auth, local storage across runs.
- Can reuse existing Chrome profiles with saved logins.
- `cloud_profile_id` parameter syncs auth profiles with remote browsers.
- Profile sync via CLI: `curl -fsSL https://browser-use.com/profile.sh | BROWSER_USE_API_KEY=XXXX sh`

### Cost Structure
| Tier | Price | Details |
|------|-------|---------|
| Open Source | Free | Self-hosted, you pay LLM costs |
| Pay As You Go | $10 free credits | $0.06/hr browser, $10/GB proxy, ~$0.002/agent step |
| Business | $400/mo (annual) | $6000/yr credits, 250 concurrent, 50% off everything |
| Scaleup | $2000/mo (annual) | $30k/yr credits, 500 concurrent, 60% off proxy |
| ChatBrowserUse LLM | Per-token | $0.20/1M input, $0.02/1M cached, $2.00/1M output |
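
As a rough worked example of the pay-as-you-go rates in the table, the sketch below adds up browser time, proxy traffic, and agent steps. The function name and the sample workload are hypothetical, the per-step rate is the approximate figure from the table, and the code assumes the $10 free credits are applied once against the total.

```python
from decimal import Decimal

# Rates copied from the Pay As You Go row above (USD).
BROWSER_RATE_PER_HOUR = Decimal("0.06")
PROXY_RATE_PER_GB = Decimal("10")
AGENT_STEP_RATE = Decimal("0.002")   # approximate, per the table
FREE_CREDITS = Decimal("10")         # assumed to apply once to the total


def estimate_payg_cost(browser_hours, proxy_gb, agent_steps):
    """Return (usage, billed) in USD for a pay-as-you-go workload."""
    usage = (BROWSER_RATE_PER_HOUR * Decimal(str(browser_hours))
             + PROXY_RATE_PER_GB * Decimal(str(proxy_gb))
             + AGENT_STEP_RATE * Decimal(str(agent_steps)))
    billed = max(Decimal("0"), usage - FREE_CREDITS)
    return usage, billed


# Hypothetical workload: 50 browser hours, 2 GB of proxy traffic, 10,000 steps.
usage, billed = estimate_payg_cost(browser_hours=50, proxy_gb=2, agent_steps=10_000)
print(f"usage ${usage:.2f}, billed after free credits ${billed:.2f}")
# usage $43.00, billed after free credits $33.00
```

In this example proxy traffic and agent steps each cost far more than browser time, so the hourly browser rate is unlikely to be the number that drives the bill.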

### Integration with Claude/Clawdbot
- **Model agnostic** — works with Claude, GPT-4o, Gemini, local models via LiteLLM.
- Has a **Claude Code Skill** — install via `curl` into `~/.claude/skills/browser-use/`.
- CLI mode (`browser-use open/click/type/screenshot`) keeps browser running between commands — perfect for agent integration.
- MCP support available.

### Verdict
**Best for:** Developers building custom Python-based AI browser agents. The most mature open-source option with the largest community. Production readiness improving but still has reliability quirks for complex workflows.

---

## 2. Playwright MCP
**GitHub:** [microsoft/playwright-mcp](https://github.com/microsoft/playwright-mcp) | **Stars:** ~18,000+ | **Language:** TypeScript/Node.js | **By:** Microsoft

### Architecture
- **Accessibility tree-based** — operates on structured accessibility snapshots, NOT screenshots or vision models.
- Deterministic tool application — avoids ambiguity of screenshot-based approaches.
- Exposes Playwright's full automation toolkit over Model Context Protocol (MCP).
- LLMs interact with pages through structured data — no vision model needed.
- Also has a **Vision Mode** (opt-in) using screenshots + coordinate-based interaction for computer use models.
- Microsoft recommends **CLI+SKILLS** over MCP for coding agents (more token-efficient).
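
For a sense of what "structured data over MCP" means on the wire, the sketch below builds the JSON-RPC 2.0 `tools/call` requests an MCP client would send. The `tool_call` helper is hypothetical, and the tool names `browser_navigate` and `browser_snapshot` and their arguments are assumptions; check the server's own tool listing for the exact names and schemas.

```python
import json


def tool_call(request_id, tool_name, arguments):
    """Build an MCP JSON-RPC 2.0 request that invokes one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumed tool names: navigate to a page, then ask for an accessibility snapshot.
requests = [
    tool_call(1, "browser_navigate", {"url": "https://example.com"}),
    tool_call(2, "browser_snapshot", {}),
]

for request in requests:
    print(json.dumps(request))
```

Each step is one request and one response, which is why multi-step tasks pay a round-trip per action.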

### Speed
- **Most token-efficient option:** the CLI+SKILLS route that Microsoft recommends avoids loading large tool schemas into context, and CLI commands are concise.
- Accessibility tree snapshots are lightweight compared to screenshots.
- Very fast for simple interactions; more overhead for complex multi-step reasoning since each step requires an MCP round-trip.

### Reliability
- Playwright itself is the gold standard (45.1% adoption among QA professionals — #1 framework).
- MCP server is mature at v1.0.10+ — works reliably for cross-browser testing, exploratory automation, and debugging workflows.
- **Best for persistent state, rich introspection, and iterative reasoning** over page structure.
- Long-running autonomous workflows work well due to continuous browser context.

### Session Persistence
- ✅ Browser stays open across MCP calls — continuous context.
- Manages page state, cookies, local storage throughout a session.
- Can connect to existing browser instances.
- **No built-in cloud session persistence** — you manage your own browser lifecycle.

### Cost Structure
- **Completely free and open source** (MIT license by Microsoft).
- You only pay for the LLM you use with it.
- No cloud service — runs locally.

### Integration with Claude/Clawdbot
- **First-class Claude Code support:** `claude mcp add playwright npx @playwright/mcp@latest`
- Works with Cursor, VS Code, Windsurf, GitHub Copilot, Codex, Goose, Gemini CLI, and more.
- One of the most widely supported MCP servers.
- **Comparable capability already ships in Clawdbot** via the built-in `browser` tool's Playwright-based automation.

### Verdict
**Best for:** Coding agents that need lightweight, token-efficient browser interaction. The most reliable underlying automation engine. Not an "agent framework" — it's a tool that agents use. Pair with an agent loop for autonomous workflows.

---

## 3. Stagehand (by Browserbase)
**Website:** [stagehand.dev](https://stagehand.dev) | **GitHub:** [browserbase/stagehand](https://github.com/browserbase/stagehand) | **Stars:** ~21,000+ | **Language:** TypeScript (Python SDK also available)

### Architecture
- **Hybrid approach** — lets developers choose when to use code vs. natural language.
- Three core primitives: `act()` (single actions), `extract()` (structured data), `agent()` (multi-step flows).
- **v3 architecture** (latest): Dropped Playwright dependency, now operates directly on **Chrome DevTools Protocol (CDP)**.
- **Self-healing + auto-caching**: Caches successful actions for replay without LLM inference. If cached selector breaks, auto-falls back to AI to find the new element.
- Context builder reduces token waste — feeds models only essential page data.
- Model-agnostic Agent Mode — works with any LLM or Computer Use Agent (CUA).
- SDKs in TypeScript, Python, Java, C#, Ruby, Rust (alpha).
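
The self-healing plus caching behavior can be sketched in a few lines. This is a conceptual model only, not Stagehand's API: `SelfHealingActionCache`, `fake_llm_resolver`, and the toy `page` dictionary are hypothetical names invented for the example, and the "LLM" is a stand-in function.

```python
from typing import Callable, Dict


class SelfHealingActionCache:
    """Maps an instruction to a selector; re-resolves only when the selector breaks."""

    def __init__(self, resolve_with_llm: Callable[[str], str]):
        self._resolve_with_llm = resolve_with_llm
        self._selectors: Dict[str, str] = {}
        self.llm_calls = 0  # counts how often the expensive path ran

    def selector_for(self, instruction: str,
                     selector_exists: Callable[[str], bool]) -> str:
        cached = self._selectors.get(instruction)
        if cached is not None and selector_exists(cached):
            return cached  # fast path: replay without inference
        # Cache miss or stale selector: fall back to the model and re-cache.
        self.llm_calls += 1
        fresh = self._resolve_with_llm(instruction)
        self._selectors[instruction] = fresh
        return fresh


# Toy page state standing in for a real browser.
page = {"selectors": {"#login-btn"}}


def fake_llm_resolver(instruction: str) -> str:
    """Stand-in for a model call: returns whichever selector the page has now."""
    return sorted(page["selectors"])[0]


def selector_exists(selector: str) -> bool:
    return selector in page["selectors"]


cache = SelfHealingActionCache(fake_llm_resolver)
first = cache.selector_for("click the login button", selector_exists)
second = cache.selector_for("click the login button", selector_exists)
page["selectors"] = {"#sign-in"}  # the site ships a redesign
healed = cache.selector_for("click the login button", selector_exists)

print(first, second, healed, cache.llm_calls)
# #login-btn #login-btn #sign-in 2
```

Three runs cost two model calls here: one for discovery and one after the redesign. Every other run is a plain cache hit.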

### Speed
- **v3 is 44% faster** than v2 across iframes and shadow-root interactions.
- CDP-direct means fewer websocket round-trips.
- Cached actions run **without LLM inference** — near-instant on repeated workflows.
- 500k+ weekly npm downloads.

### Reliability
- **Best-in-class for production automation** — purpose-built for stability over time.
- Self-healing means workflows survive website changes without breaking.
- "Write once, run forever" philosophy.
- Strong community adoption — used by production teams at scale.
- Dev sentiment very positive: "The feature that sold me — caching + self-healing."

### Session Persistence
- ✅ When used with Browserbase — full session management with cookies/localStorage across agent runs.
- Stealth mode + auto CAPTCHA solving.
- Session recordings for debugging (watch exactly what the agent did).
- Without Browserbase — can run locally with local Chromium.

### Cost Structure
- **Open source (MIT)** — free to use locally.
- Cloud features via **Browserbase** (see Browserbase pricing below).
- LLM costs are on you (but caching dramatically reduces them over time).

### Integration with Claude/Clawdbot
- MCP integration available — works with Claude, Cursor, etc.
- TypeScript-first — great for Node.js-based agent systems.
- Python SDK available for Python-based integrations.
- Can run locally (no Browserbase required) or in cloud.

### Verdict
**Best for:** Production TypeScript-based browser automations where reliability matters most. The self-healing + caching combo is genuinely unique. The best "middle ground" between full agent chaos and brittle deterministic scripts.

---

## 4. Agent Browser (Vercel Labs)
**Website:** [agent-browser.dev](https://agent-browser.dev) | **GitHub:** [vercel-labs/agent-browser](https://github.com/vercel-labs/agent-browser) | **Stars:** ~14,000+ | **Language:** Rust CLI + Node.js

### Architecture
- **CLI-first** — headless browser automation designed specifically for AI agents to invoke as shell commands.
- Three-layer architecture: **Rust CLI** (sub-millisecond parsing) → daemon → Chromium.
- Uses **accessibility tree with refs** (`@e1`, `@e2`) for deterministic element selection — best for AI context efficiency.
- 50+ commands covering navigation, forms, screenshots, network, storage.
- Compact text output optimized for AI context windows.
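
The ref-based selection idea can be illustrated with a small sketch that walks an accessibility tree and tags interactive nodes with `@e1`, `@e2`, and so on. The tree shape, the role list, and the output layout are assumptions made for this example; the tool's real snapshot format is not reproduced here.

```python
# Roles that receive a ref. This list is an illustrative assumption.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox"}


def snapshot_with_refs(root):
    """Flatten an accessibility tree into indented text plus a ref lookup table."""
    lines = []
    refs = {}

    def visit(node, depth):
        label = f'{node["role"]} "{node.get("name", "")}"'
        if node["role"] in INTERACTIVE_ROLES:
            ref = f"@e{len(refs) + 1}"
            refs[ref] = node  # the agent later says "click @e3" instead of a selector
            label += f" [{ref}]"
        lines.append("  " * depth + "- " + label)
        for child in node.get("children", []):
            visit(child, depth + 1)

    visit(root, 0)
    return "\n".join(lines), refs


# Hypothetical accessibility tree for a login page.
TREE = {
    "role": "document",
    "name": "Login",
    "children": [
        {"role": "textbox", "name": "Email"},
        {"role": "textbox", "name": "Password"},
        {"role": "button", "name": "Sign in"},
    ],
}

text, refs = snapshot_with_refs(TREE)
print(text)
print(refs["@e3"]["name"])
```

Because each ref maps to exactly one node in the snapshot it came from, the agent's next command is unambiguous, which is the determinism the section describes.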

### Speed
- **Fastest CLI** — native Rust binary, sub-millisecond command parsing.
- Daemon keeps browser running between commands — no startup overhead.
- Ideal for high-throughput coding agents that balance browser + code within limited context windows.

### Reliability
- Deterministic ref-based selection is more reliable than screenshot-based approaches.
- Session isolation — multiple browser instances with separate auth.
- Relatively new (emerged Jan 2026), so less battle-tested than Browser Use or Stagehand.
- Cross-platform: macOS, Linux, Windows.

### Session Persistence
- ✅ **Multiple isolated sessions** with separate auth via named sessions.
- Cookie/storage management per session.
- Browser stays running between commands.
- Session commands: `agent-browser session new/list/switch/cookie`.

### Cost Structure
- **Completely free and open source** (MIT, by Vercel Labs).
- No cloud service — runs locally.
- Only LLM costs apply.

### Integration with Claude/Clawdbot
- **Works with Claude Code, Cursor, GitHub Copilot, OpenAI Codex, Gemini CLI, Goose, Windsurf** out of the box.
- SKILL file available for Claude Code integration.
- Perfect for CLI-based agent workflows (Clawdbot's `exec` tool could invoke it directly).
- Most natural fit for "coding agent needs to check a web page" pattern.

### Verdict
**Best for:** CLI-native coding agents that need fast, deterministic browser control. The Rust speed is real. Ideal as a tool for Claude Code / Clawdbot exec rather than as a standalone agent framework.

---

## 5. Puppeteer + AI Wrappers
**GitHub:** [puppeteer/puppeteer](https://github.com/puppeteer/puppeteer) | **Stars:** ~90,300+ | **Language:** TypeScript/Node.js | **By:** Google

### Architecture
- Puppeteer provides high-level API to control Chrome over DevTools Protocol.
- AI wrappers typically combine Puppeteer with vision models:
  - **Screenshot → GPT-4V/Claude Vision** — capture screenshot, send to LLM, get next action.
  - **DOM extraction → LLM** — extract page content, send structured text.
  - **Midscene.js** — vision-driven UI automation layer on top of Puppeteer.
- Notable projects: `puppeteer-browser-agent`, MCP servers for Puppeteer (stealth variants).

### Speed
- Puppeteer itself is fast (direct CDP).
- AI wrapper overhead depends on approach — vision (slow, lots of tokens) vs. DOM extraction (faster).
- Generally slower than purpose-built frameworks like Browser Use or Stagehand, because nothing in the stack is optimized for AI workflows.

### Reliability
- Puppeteer is extremely mature and well-tested.
- AI wrappers add a fragility layer — screenshot-based approaches are more error-prone.
- No self-healing, no caching, no agent memory by default.
- Requires significant custom code to match Browser Use / Stagehand features.

### Session Persistence
- ✅ Puppeteer manages sessions, cookies, local storage natively.
- Persistent contexts available via `puppeteer.launch({ userDataDir: '...' })`.
- No cloud session management unless paired with Browserbase/Steel.

### Cost Structure
- **Free and open source** (Apache 2.0, by Google).
- LLM costs for AI wrapper.
- Build everything yourself.

### Integration with Claude/Clawdbot
- Puppeteer MCP servers exist (community-built).
- More manual work than Playwright MCP or Browser Use.
- Works but you're reinventing what purpose-built tools already provide.

### Verdict
**Use only if:** You have existing Puppeteer infrastructure. For new AI agent projects, Playwright MCP or Stagehand is the better starting point, since either one provides out of the box what a Puppeteer wrapper has to build by hand. Puppeteer is being superseded by Playwright in the AI agent ecosystem.

---

## 6. Steel.dev
**Website:** [steel.dev](https://steel.dev) | **GitHub:** [steel-dev/steel-browser](https://github.com/steel-dev/steel-browser) | **Stars:** ~6,400+ | **Language:** TypeScript/Node.js

### Architecture
- **Open-source browser API** — a "batteries-included" browser sandbox.
- Manages sessions, pages, and browser processes — you connect via CDP using Puppeteer, Playwright, or Selenium.
- REST API for browser operations: create sessions, scrape, screenshot, PDF, convert to markdown.
- Built-in stealth plugins, fingerprint management, proxy chain rotation.
- Session viewer for debugging (live + recorded).
- Docker-based deployment.

### Speed
- Fast for what it does — browser management, not AI reasoning.
- Focused on reducing infra setup time, not agent execution speed.
- Sessions can run up to 24 hours.

### Reliability
- Solid infrastructure layer — automatic cleanup, browser lifecycle management.
- In public beta — "evolving every day" per their own docs.
- Smaller community than Browserbase but active Discord.
- Good for self-hosted scenarios where you control everything.

### Session Persistence
- ✅ **Excellent** — core feature. Save and inject cookies + local storage to pick up where you left off.
- Session state maintained across requests.
- Persistent browser instances.
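
"Save and inject cookies + local storage" comes down to serializing session state and loading it back later. The sketch below shows that round trip with a hypothetical `SessionState` container; it does not use Steel's API, and the field names and sample values are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class SessionState:
    """Hypothetical container for the state worth carrying between runs."""
    cookies: List[Dict[str, str]] = field(default_factory=list)
    local_storage: Dict[str, str] = field(default_factory=dict)

    def dump(self) -> str:
        """Serialize to JSON so the state can be stored between runs."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def load(cls, raw: str) -> "SessionState":
        """Rebuild the state from JSON before injecting it into a new session."""
        data = json.loads(raw)
        return cls(cookies=data["cookies"], local_storage=data["local_storage"])


# Sample values are placeholders, not real credentials.
saved = SessionState(
    cookies=[{"name": "session_id", "value": "abc123", "domain": "example.com"}],
    local_storage={"theme": "dark"},
)
restored = SessionState.load(saved.dump())
print(restored == saved)
```

Serialized state like this holds live credentials, so it needs the same protection as a password when stored.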

### Cost Structure
- **Fully open source** — self-host for free.
- **Steel Cloud** (hosted) available at [app.steel.dev](https://app.steel.dev) — pricing not publicly listed (contact sales).
- One-click deploy to Railway or Render.
- Docker image available.

### Integration with Claude/Clawdbot
- Connect any Playwright/Puppeteer code to Steel's endpoint.
- REST API for session management.
- Good for Clawdbot if you want self-hosted browser infra without building from scratch.

### Verdict
**Best for:** Self-hosted browser infrastructure. If you want full control over your browser fleet without vendor lock-in and don't want to build session management from scratch. Pairs well with any agent framework (Browser Use, Stagehand, etc.) as the underlying browser layer.

---

## 7. Browserbase
**Website:** [browserbase.com](https://www.browserbase.com) | **Stagehand creator** | **Funding:** $40M Series B ($300M valuation, Jun 2025)

### Architecture
- **Cloud browser infrastructure** — serverless, managed Chrome instances optimized for AI agents.
- Not an agent framework itself — it's the browser layer that frameworks run on.
- Provides remote CDP endpoints that work with Playwright, Puppeteer, Selenium, Stagehand.
- 50 million sessions processed in 2025 across 1,000+ paying customers.
- Features: stealth mode, auto CAPTCHA solving, proxy rotation, session recording, session replay.
- "Director" no-code tool for building automations.

### Speed
- Cloud browsers optimized for AI workloads.
- Network-layer optimizations amplify speed when paired with Stagehand v3.
- Reliable global infrastructure — multi-region.

### Reliability
- **Most proven cloud browser platform** — 1,000+ paying customers, used by Microsoft's AI teams.
- Session recordings for debugging exactly what went wrong.
- 30-day data retention on paid plans.

### Session Persistence
- ✅ **Core feature** — persistent browser sessions with cookie/localStorage management across agent runs.
- Session duration: 15 min (free) to 6+ hours (paid).
- "Browserbase Agent Identity" feature (waitlist) for persistent bot identity.

### Cost Structure
| Plan | Price | Browser Hours | Concurrency | Stealth |
|------|-------|---------------|-------------|---------|
| Free | $0/mo | 1 hr | 1 | None |
| Developer | $20/mo | 100 hrs (then $0.12/hr) | 25 | Basic + CAPTCHA |
| Startup | $99/mo | 500 hrs (then $0.10/hr) | 100 | Basic + CAPTCHA |
| Scale | Custom | Usage-based | 250+ | Advanced + CAPTCHA |
| Proxies | — | $10-12/GB | — | — |
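
The included hours and overage rates in the table make the Developer versus Startup choice a small arithmetic problem. The sketch below compares the two on browser hours alone; the helper names are hypothetical, and it deliberately ignores concurrency limits, stealth tier, and proxy traffic.

```python
from decimal import Decimal

# Figures copied from the Developer and Startup rows above (USD).
PLANS = {
    "Developer": {"base": Decimal("20"), "included_hours": 100, "overage": Decimal("0.12")},
    "Startup": {"base": Decimal("99"), "included_hours": 500, "overage": Decimal("0.10")},
}


def monthly_cost(plan, browser_hours):
    """Base price plus overage for hours beyond the plan's included amount."""
    terms = PLANS[plan]
    extra_hours = max(Decimal("0"), Decimal(str(browser_hours)) - terms["included_hours"])
    return terms["base"] + terms["overage"] * extra_hours


def cheapest_plan(browser_hours):
    """Cheapest plan by price only; ties go to the first plan listed."""
    return min(PLANS, key=lambda name: monthly_cost(name, browser_hours))


for hours in (300, 2050, 2100):
    costs = {name: f"${monthly_cost(name, hours):.2f}" for name in PLANS}
    print(hours, costs, cheapest_plan(hours))
```

On hours alone the two plans cost the same at 2,050 hours ($254), and Developer is cheaper below that. In practice the 25 versus 100 concurrency limit is likely to force the upgrade long before the price does.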

### Integration with Claude/Clawdbot
- Stagehand (their SDK) has MCP support for Claude.
- Any Playwright/Puppeteer code points to their endpoint.
- Functions (serverless code execution) free on all plans.

### Verdict
**Best for:** Teams that need managed, scalable cloud browser infrastructure without ops overhead. The "AWS of browser infra" for AI agents. Pairs naturally with Stagehand but works with any framework.

---

## 8. Emerging Tools

### Skyvern
**Website:** [skyvern.com](https://www.skyvern.com) | **GitHub Stars:** ~20,000+ | **Language:** Python | **YC-backed**

- **Architecture:** LLMs + Computer Vision — uses both DOM understanding AND visual element detection.
- Multi-agent system: planning agent, acting agent, validation agent.
- **Unique:** Can work on websites it's never seen before without custom code.
- 85.8% on WebVoyager eval (Skyvern 2.0).
- **Pricing:** Free tier (~170 actions), Hobby $29/mo (~1,200 actions), Pro $149/mo.
- No-code workflow builder available.
- Reddit sentiment: impressive demos but **slower and more expensive** than Browser Use for equivalent tasks. Computer vision approach adds latency.

### Notte
**Website:** [notte.cc](https://www.notte.cc) | **GitHub:** [nottelabs/notte](https://github.com/nottelabs/notte) | **Language:** Python

- **Hybrid approach:** Use an AI agent to discover a workflow once, then convert it to deterministic code.
- Two modes: Agent mode (natural language) and Demonstration mode (manually click → generate code).
- "Only use the LLM for discovery or failures, not every run" — dramatically reduces cost.
- Close Playwright compatibility — mix web automation primitives with agents.
- Built-in CAPTCHA solving, proxies, anti-detection.
- **Most interesting newcomer** for production reliability — addresses the core "agents are unreliable" problem.

### Hyperbrowser
**Website:** [hyperbrowser.ai](https://www.hyperbrowser.ai) | **Cloud browser competitor to Browserbase**

- Cloud browsers for AI agents.
- HyperAgent SDK records actions during AI runs, enabling deterministic replay without LLM calls.
- Newer, less proven than Browserbase.
- Generally cheaper entry point.

### Firecrawl
**Website:** [firecrawl.dev](https://www.firecrawl.dev) | **GitHub Stars:** 82,000+

- Not a browser agent framework — it's a **web data extraction layer**.
- Search → Navigate → Extract structured data from any URL.
- Agent endpoint for autonomous navigation + extraction.
- Best for RAG pipelines and data extraction, not interactive web tasks.
- SOC 2 Type 2 compliant.
- Pricing: Free 500 credits, $16/mo+.

### Claude Computer Use + Cowork
- Anthropic's native approach — Claude controls a full desktop environment via screenshots + mouse/keyboard.
- **Claude for Chrome** extension — direct browser control from Claude Code.
- Slower than structured approaches (screenshot-per-step overhead).
- Most general-purpose but least efficient for repetitive browser tasks.

---

## 9. Comparison Matrix

| Tool | Type | Architecture | GitHub Stars | Session Persist | Local/Cloud | Speed | Reliability | Cost |
|------|------|-------------|-------------|----------------|-------------|-------|-------------|------|
| **Browser Use** | Agent Framework | DOM distillation | 78k+ | ✅ Cloud profiles | Both | Medium | Good (89% bench) | Free OSS / $0.002+/step cloud |
| **Playwright MCP** | MCP Server | Accessibility tree | 18k+ | ✅ (within session) | Local only | Fast (token-efficient) | Excellent | Free |
| **Stagehand** | SDK Framework | CDP + AI hybrid | 21k+ | ✅ (via Browserbase) | Both | Fast (v3 44% ↑) | Best (self-healing) | Free OSS / BB pricing |
| **Agent Browser** | CLI Tool | A11y tree + refs | 14k+ | ✅ Named sessions | Local only | Fastest (Rust) | Good | Free |
| **Puppeteer** | Low-level Lib | CDP + custom wrappers | 90k+ | ✅ User data dirs | Local | Medium | Good (mature) | Free |
| **Steel.dev** | Browser Infra | REST API + CDP | 6.4k+ | ✅ Cookie injection | Self-hosted/Cloud | N/A (infra) | Good (beta) | Free self-host |
| **Browserbase** | Cloud Infra | Serverless Chrome | N/A | ✅ Cross-session | Cloud only | Optimized | Excellent | $0-99+/mo |
| **Skyvern** | Agent Platform | LLM + Computer Vision | 20k+ | ✅ Stored creds | Both | Slower (vision) | Good (86% bench) | $0-149+/mo |
| **Notte** | Hybrid Framework | Playwright + AI | Newer | ✅ | Both | Fast (deterministic replay) | Promising | TBD |

---

## 10. Recommendations for Clawdbot

### For Clawdbot's Needs (autonomous browsing, logins, forms, scraping, web app interaction):

#### 🥇 Best Overall Stack: **Stagehand + Browserbase**
- Self-healing means workflows survive site changes.
- Caching means repeated tasks are near-instant (no LLM costs).
- Browserbase handles session persistence, stealth, CAPTCHAs.
- TypeScript-native fits Node.js agent architecture.
- MCP integration available for Claude.
- **Cost:** $20-99/mo for Browserbase + LLM costs (reduced by caching).

#### 🥈 Best Free/Local Stack: **Playwright MCP + Agent Browser**
- Playwright MCP for structured browser interaction from Claude Code.
- Agent Browser for fast CLI-based commands from exec tool.
- Both free, both local, both work with Clawdbot today.
- No session persistence across restarts without custom code.
- **Cost:** $0 (just LLM costs).

#### 🥉 Best Python Agent Stack: **Browser Use**
- Largest community, most examples, most integrations.
- Cloud API handles everything if you are willing to pay.
- Python-native — good if building Python-based agents.
- ChatBrowserUse model is cheap and fast for browser tasks.
- **Cost:** $10 free credits, then pay-as-you-go.

#### 🏅 Most Interesting for Production: **Notte** (watch closely)
- Hybrid approach solves the fundamental "agents are unreliable" problem.
- Record workflow once → deterministic replay → AI only for failures.
- Still early but the architecture is exactly right for production reliability.

### What Clawdbot Already Has
Clawdbot's built-in `browser` tool already provides Playwright-based browser automation (snapshot, screenshot, act, navigate, click, type, etc.). This is functionally similar to Playwright MCP. For most tasks, this is sufficient.

**When to upgrade:**
- Need stealth/anti-detection → Add Browserbase or Browser Use Cloud
- Need self-healing workflows → Add Stagehand
- Need fast CLI browser control for coding tasks → Add Agent Browser
- Need autonomous multi-step web tasks → Add Browser Use or Stagehand agent layer
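
The upgrade rules above can also be written as a small lookup table. In the sketch below the tool names come from the list; the need keys such as `"stealth"` and the `recommend` helper are hypothetical labels invented for the example.

```python
# Need -> tools to add, mirroring the "When to upgrade" list.
UPGRADE_PATHS = {
    "stealth": ["Browserbase", "Browser Use Cloud"],
    "self_healing": ["Stagehand"],
    "fast_cli": ["Agent Browser"],
    "autonomous_multi_step": ["Browser Use", "Stagehand"],
}

DEFAULT = ["built-in browser tool"]


def recommend(needs):
    """Return candidate additions for the given needs, without duplicates."""
    if not needs:
        return list(DEFAULT)  # the built-in tool covers the common case
    picks = []
    for need in needs:
        for tool in UPGRADE_PATHS[need]:
            if tool not in picks:
                picks.append(tool)
    return picks


print(recommend([]))
print(recommend(["self_healing", "autonomous_multi_step"]))
```

Stagehand appears under two needs but is returned once, so a combined requirement does not inflate the shortlist.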

---

*Research compiled Feb 18, 2026. Market is evolving rapidly — re-evaluate quarterly.*