# macOS Computer Use Tools for AI Agents — Deep Research (Feb 2026)

> **Context:** Evaluating the best "computer use" tools/frameworks for AI agents running on an always-on Mac mini M-series (specifically for Clawdbot/OpenClaw-style automation).

---

## Table of Contents

1. [Anthropic Computer Use](#1-anthropic-computer-use)
2. [Apple Accessibility APIs](#2-apple-accessibility-apis)
3. [Peekaboo](#3-peekaboo)
4. [Open Interpreter](#4-open-interpreter)
5. [Other Notable Frameworks](#5-other-notable-frameworks)
   - [macOS-use (browser-use)](#51-macos-use-browser-use)
   - [Agent S (Simular.ai)](#52-agent-s-simularai)
   - [C/ua (trycua)](#53-cua-trycua)
   - [mcp-server-macos-use (mediar-ai)](#54-mcp-server-macos-use-mediar-ai)
   - [mcp-remote-macos-use](#55-mcp-remote-macos-use)
   - [macOS Automator MCP (steipete)](#56-macos-automator-mcp-steipete)
   - [mac_computer_use (deedy)](#57-mac_computer_use-deedy)
6. [Comparison Matrix](#6-comparison-matrix)
7. [Recommendations for Mac Mini Agent Setup](#7-recommendations-for-mac-mini-agent-setup)
8. [Headless / SSH Considerations](#8-headless--ssh-considerations)

---

## 1. Anthropic Computer Use

**GitHub:** [anthropics/anthropic-quickstarts](https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo)
**Mac forks:** [deedy/mac_computer_use](https://github.com/deedy/mac_computer_use), [PallavAg/claude-computer-use-macos](https://github.com/PallavAg/claude-computer-use-macos), [newideas99/Anthropic-Computer-Use-MacOS](https://github.com/newideas99/Anthropic-Computer-Use-MacOS)

### How It Works

- **Screenshot-based.** The model receives screenshots and reasons about pixel coordinates.
- Claude sends actions (`mouse_move`, `click`, `type`, `screenshot`) to a local executor.
- On macOS, the executor uses `cliclick` for mouse/keyboard and `screencapture` for screenshots.
- The model identifies coordinates by "counting pixels" — trained specifically for coordinate estimation.
- Anthropic recommends XGA (1024×768) or WXGA (1280×800) resolution for best accuracy.
- The official demo uses Docker + Ubuntu (xdotool). macOS forks replace xdotool with `cliclick` and native `screencapture`.
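
The action-to-CLI mapping that a macOS executor performs can be sketched as below. This is an illustrative sketch, not the actual fork code; real executors also add coordinate scaling, key-name translation, and error handling. The `m:`/`c:`/`t:` prefixes are cliclick's own command syntax.

```python
import subprocess

def build_command(action: str, **kw) -> list[str]:
    """Translate a computer-use action into a macOS CLI invocation.
    Sketch only -- the real forks add scaling and key mapping."""
    if action == "screenshot":
        # -x suppresses the shutter sound; path is illustrative
        return ["screencapture", "-x", "/tmp/screen.png"]
    if action == "mouse_move":
        return ["cliclick", f"m:{kw['x']},{kw['y']}"]
    if action == "left_click":
        return ["cliclick", f"c:{kw['x']},{kw['y']}"]
    if action == "type":
        return ["cliclick", f"t:{kw['text']}"]
    raise ValueError(f"unsupported action: {action}")

def execute(action: str, **kw) -> None:
    """Run the mapped command (macOS only, with permissions granted)."""
    subprocess.run(build_command(action, **kw), check=True)
```

Each model response is parsed into one of these actions, executed, and followed by a fresh `screenshot` for the next inference step.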

### Speed / Latency

- **Slow.** Each action cycle involves: screenshot → upload image → API inference → parse response → execute action.
- A single click-and-verify cycle takes **3-8 seconds** depending on API latency.
- Multi-step tasks (e.g., open Safari, navigate, search) can take **30-120+ seconds**.
- Screenshot upload adds ~1-3s overhead per cycle (images are typically 100-500KB).

### Reliability

- **Moderate.** Coordinate estimation works well for large, distinct UI elements.
- Struggles with small buttons, dense UIs, and similar-looking elements.
- No DOM/accessibility tree awareness — purely visual. If the UI changes between screenshot and action, clicks can miss.
- Self-correction loop helps: the model takes a new screenshot after each action.
- Prone to **prompt injection** from on-screen text (major security concern).
- Simon Willison's testing (Oct 2024): works for simple tasks, fails on complex multi-step workflows.

### Setup Complexity

- **Moderate.** Requires: Python 3.12+, cliclick (`brew install cliclick`), an Anthropic API key, and macOS Accessibility permissions.
- Mac forks require cloning a repo + setting up a venv + environment variables.
- Some forks include a Streamlit UI for interactive testing.
- Must grant Terminal/Python Accessibility permissions in System Settings.

### Headless / SSH

- **Problematic.** `screencapture` requires WindowServer (a GUI session).
- Over pure SSH without a display, `screencapture` fails silently or returns black images.
- **Workaround:** Use an HDMI dummy plug + Screen Sharing (VNC), or connect via Apple Remote Desktop. `screencapture` then works against the VNC session.
- Not designed for headless operation.

### Cost

- **API costs only.** Anthropic API pricing (Feb 2026):
  - Claude Sonnet 4.5: $3/M input tokens, $15/M output tokens
  - Claude Opus 4.5: $5/M input tokens, $25/M output tokens
  - Each screenshot is ~1,500-3,000 tokens (image tokens)
  - A 10-step task might cost $0.05-0.30 depending on model and complexity
- Computer use itself is free — you run the executor locally.
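
The $0.05-0.30 figure can be sanity-checked with quick arithmetic. The per-screenshot and per-step output token counts below are assumptions chosen within the ranges stated above, and the estimate ignores conversation-history growth (earlier screenshots accumulating in context), which pushes real costs toward the high end.

```python
def estimate_task_cost(steps: int,
                       tokens_per_screenshot: int = 2_500,   # assumed, mid-range
                       output_tokens_per_step: int = 300,    # assumed
                       input_price_per_m: float = 3.0,       # Sonnet 4.5 input
                       output_price_per_m: float = 15.0) -> float:
    """Back-of-envelope dollar cost for a one-screenshot-per-step task."""
    input_tokens = steps * tokens_per_screenshot
    output_tokens = steps * output_tokens_per_step
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# A 10-step task on Sonnet 4.5: 25k image tokens in, 3k tokens out,
# landing in the middle of the $0.05-0.30 range quoted above.
```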

### Reddit Sentiment

- **Excited but cautious.** An r/Anthropic thread on unsandboxed Mac use got 24 upvotes, with comments calling it "dangerous but cool."
- r/macmini discussions show interest in buying Mac Minis specifically for this use case.
- Common complaints: slow, expensive at scale, not reliable enough for unsupervised use.
- Benjamin Anderson's blog post captures the zeitgeist: "Claude needs his own computer" — the coding agent + computer use convergence thesis.

---

## 2. Apple Accessibility APIs

**Documentation:** [Apple Mac Automation Scripting Guide](https://developer.apple.com/library/archive/documentation/LanguagesUtilities/Conceptual/MacAutomationScriptingGuide/AutomatetheUserInterface.html)

### How It Works

- **Accessibility tree-based.** macOS exposes every UI element (buttons, text fields, menus, etc.) through the Accessibility framework (AXUIElement API).
- **Three access methods:**
  1. **AppleScript / osascript:** `tell application "System Events" → tell process "Finder" → click button "OK"`. High-level scripting, easy to write.
  2. **JXA (JavaScript for Automation):** Same capabilities as AppleScript, written in JavaScript. Run via `osascript -l JavaScript`.
  3. **AXUIElement (C/Swift/Python via pyobjc):** Low-level programmatic access to the full accessibility tree. Can enumerate all UI elements, read properties (role, title, position, size), and perform actions (press, set value, etc.).
- Does NOT rely on screenshots — reads the actual UI element tree.
- Can traverse the entire hierarchy: Application → Window → Group → Button → etc.
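
Method 1 can be driven from any scripting language by shelling out to `osascript`. A minimal sketch: the "Finder"/"OK" process and button names are placeholders for your target app, and the actual click only fires on macOS with Accessibility permission granted to the calling process.

```python
import subprocess
import sys

def click_button_script(process: str, button: str) -> str:
    """Build an AppleScript one-liner that presses a button via
    System Events. Process/button names are caller-supplied."""
    return (
        f'tell application "System Events" to tell process "{process}" '
        f'to click button "{button}" of front window'
    )

cmd = ["osascript", "-e", click_button_script("Finder", "OK")]
if sys.platform == "darwin":          # meaningful only on macOS with
    subprocess.run(cmd, check=False)  # Accessibility permission granted
```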

### Speed / Latency

- **Fast.** AppleScript commands execute in **10-100ms**. AXUIElement API calls are typically **1-10ms**.
- No image capture, no network round-trip, no model inference.
- Menu clicks, text entry, window management — all near-instantaneous.
- Can enumerate hundreds of UI elements in <100ms.

### Reliability

- **High for supported apps.** Most native macOS apps and many Electron apps expose accessibility info.
- Apple's own apps (Finder, Safari, Mail, Calendar, Notes) have excellent accessibility support.
- Electron apps (VS Code, Slack, Discord) expose basic accessibility but may have gaps.
- Web content in browsers is accessible via accessibility APIs (each DOM element maps to an AX element).
- **Failure modes:** Apps with custom rendering (games, some media apps) may not expose UI elements. Some apps have broken accessibility annotations.

### Setup Complexity

- **Low.** AppleScript is built into macOS — no installation needed.
- `osascript` is available in every terminal.
- For Python access: `pip install pyobjc-framework-ApplicationServices`
- **Critical requirement:** Must enable Accessibility permissions for the calling application (Terminal, Python, etc.) in System Settings → Privacy & Security → Accessibility.
- For automation across apps: System Settings → Privacy & Security → Automation.

### Headless / SSH

- **Partially works.** AppleScript/osascript commands work over SSH **if** a GUI session is active (user logged in).
- AXUIElement requires WindowServer to be running.
- Works well with a headless Mac Mini + HDMI dummy plug + remote login session.
- `osascript` may throw "not allowed assistive access" errors over SSH — the calling process (sshd, bash) needs to be in the Accessibility allow list.
- **Workaround:** Save scripts as .app bundles, grant them Accessibility access, then invoke from SSH.

### Cost

- **Free.** Built into macOS, no API costs.

### Best For

- **Structured automation:** "Click the Save button in TextEdit" rather than "figure out what's on screen."
- **Fast, deterministic workflows** where you know the target app and UI structure.
- **Combining with an LLM:** Feed the accessibility tree to an LLM, let it decide which element to interact with. This is what Peekaboo, mcp-server-macos-use, and macOS-use all do under the hood.

### Limitations

- **No visual understanding.** Can't interpret images, charts, or custom-drawn content.
- **Fragile element references:** If an app updates, button names/positions may change.
- **Permission hell:** Each calling app needs separate Accessibility + Automation grants. Can't grant to `osascript` directly (it's not an .app).

---

## 3. Peekaboo

**GitHub:** [steipete/Peekaboo](https://github.com/steipete/Peekaboo)
**Website:** [peekaboo.boo](https://www.peekaboo.boo/)
**Author:** Peter Steinberger (well-known iOS/macOS developer)

### How It Works

- **Hybrid: screenshot + accessibility tree.** This is Peekaboo's killer feature.
- The `see` command captures a screenshot AND overlays element IDs from the accessibility tree, creating an annotated snapshot.
- The `click` command can target elements by accessibility ID, label text, or raw coordinates.
- **Full GUI automation suite:** click, type, press, hotkey, scroll, swipe, drag, move, window management, app control, menu interaction, dock control, dialog handling, Space switching.
- **Native Swift CLI** — compiled binary, not Python. Fast and deeply integrated with macOS APIs.
- **MCP server mode** — can be used as an MCP tool by Claude Desktop, Cursor, or any MCP client.
- **Agent mode** — `peekaboo agent` runs a natural-language multi-step automation loop (capture → LLM decide → act → repeat).
- Supports multiple AI providers: OpenAI, Claude, Grok, Gemini, Ollama (local).

### Speed / Latency

- **Fast.** Screenshot capture via ScreenCaptureKit is <100ms. Accessibility tree traversal is similarly fast.
- Individual click/type/press commands execute in **10-50ms**.
- Agent mode latency depends on the LLM provider (1-5s per step with cloud APIs).
- Much faster than pure screenshot-based approaches because clicks target element IDs, not pixel coordinates.

### Reliability

- **High.** Using accessibility IDs instead of pixel coordinates means:
  - Clicks don't miss due to resolution changes or slight UI shifts.
  - Elements are identified by semantic identity (button label, role), not visual appearance.
- The annotated snapshot approach gives the LLM **both** visual context and structural data — best of both worlds.
- Menu interaction, dialog handling, and window management are deeply integrated.
- Created by Peter Steinberger — high-quality Swift code, actively maintained.

### Setup Complexity

- **Low.** `brew install steipete/tap/peekaboo` — single command.
- Requires macOS 15+ (Sequoia), Screen Recording permission, and Accessibility permission.
- MCP server mode: `npx @steipete/peekaboo-mcp@beta` (zero-install for Node users).
- Configuration for AI providers via `peekaboo config`.

### Headless / SSH

- **Requires a GUI session** (ScreenCaptureKit and accessibility APIs need WindowServer).
- Works with a Mac Mini + HDMI dummy plug + Screen Sharing.
- Can be invoked over SSH if a GUI login session is active.
- The CLI nature makes it easy to script and automate remotely.

### Cost

- **Free and open-source** (MIT license).
- AI provider costs apply when using `peekaboo agent` or `peekaboo see --analyze`.
- Local models via Ollama = zero marginal cost.

### Reddit / Community Sentiment

- Very well-received in the macOS developer community.
- Peter Steinberger's reputation lends credibility.
- Described as "giving AI agents eyes on macOS."
- Praised for the hybrid screenshot+accessibility approach.
- Active development — regular releases with new features.

### Why Peekaboo Stands Out

- **Best-in-class for macOS-specific automation.** It's what a senior macOS developer would build if they were making the perfect agent tool.
- Complete command set: see, click, type, press, hotkey, scroll, swipe, drag, window, app, space, menu, menubar, dock, dialog.
- Runnable automation scripts (`.peekaboo.json`).
- Clean JSON output for programmatic consumption.

---

## 4. Open Interpreter

**Website:** [openinterpreter.com](https://www.openinterpreter.com/)
**GitHub:** [OpenInterpreter/open-interpreter](https://github.com/OpenInterpreter/open-interpreter)

### How It Works

- **Primarily code execution**, with experimental "OS mode" for GUI control.
- Normal mode: LLM generates Python/bash/JS code, executes it locally.
- **OS mode** (`interpreter --os`): Screenshot-based. Takes screenshots, sends to a vision model (GPT-4V, etc.), model reasons about actions, executes via pyautogui.
- Also includes 01 Light hardware — a portable voice interface that connects to a home computer.

### Speed / Latency

- Normal mode (code execution): **Fast** — direct code execution, limited by LLM inference time.
- OS mode: **Slow** — same screenshot→API→action loop as Anthropic Computer Use.
- OS mode is explicitly labeled "highly experimental."

### Reliability

- Normal mode: **Good** for code-centric tasks. LLM writes code that runs on your machine.
- OS mode: **Low.** Labeled as "work in progress." Community reports frequent failures.
- Single monitor only. No multi-display support in OS mode.
- Better at tasks that can be accomplished via code (file manipulation, API calls, data processing) than GUI interaction.

### Setup Complexity

- **Low.** `pip install open-interpreter` and `interpreter --os`.
- Requires Screen Recording permissions on macOS.
- API key for your chosen LLM provider.

### Headless / SSH

- Normal mode (code execution): **Works perfectly** over SSH.
- OS mode: **Requires GUI session** (uses pyautogui + screenshots).

### Cost

- **Free and open-source.**
- LLM API costs apply.

### Reddit Sentiment

- Community has cooled on Open Interpreter since the initial hype.
- OS mode is seen as a proof-of-concept, not production-ready.
- Normal mode (code execution) is valued but outcompeted by Claude Code, Cursor, etc.
- 01 Light hardware project had enthusiastic reception but unclear adoption.

### Verdict

- **Not recommended for computer use / GUI automation.** Its strength is code execution, and dedicated coding agents (Claude Code, Codex) do that better now.
- OS mode is too experimental and unreliable for production use.

---

## 5. Other Notable Frameworks

### 5.1 macOS-use (browser-use)

**GitHub:** [browser-use/macOS-use](https://github.com/browser-use/macOS-use)
**Install:** `pip install mlx-use`

**How it works:** Screenshot-based. It takes screenshots, sends them to a vision model (OpenAI/Anthropic/Gemini), receives actions back (click coordinates, type text, etc.), and executes them via pyautogui/AppleScript.

**Key details:**

- Spin-off from the popular browser-use project.
- Supports OpenAI, Anthropic, and Gemini APIs.
- Vision: plans to support local inference via Apple's MLX framework (not yet implemented).
- Works across ALL macOS apps, not just browsers.
- Early stage — "varying success rates depending on task prompt."
- **Security warning:** Can access credentials, stored passwords, and all UI components.

**Speed:** Slow (cloud API round-trip per action).
**Reliability:** Low-moderate. Early development.
**Setup:** `pip install mlx-use`, configure an API key.
**Headless:** Requires GUI session.
**Cost:** Free + API costs.
**Sentiment:** Exciting concept but immature. Reddit post got moderate engagement.

---

### 5.2 Agent S (Simular.ai)

**GitHub:** [simular-ai/Agent-S](https://github.com/simular-ai/Agent-S)
**Website:** [simular.ai](https://www.simular.ai/)

**How it works:** Multi-model system using **screenshot + grounding model + planning model.**

- Agent S3 (latest) uses a planning LLM (e.g., GPT-5, Claude) + a grounding model (UI-TARS-1.5-7B) for precise element location.
- The grounding model takes screenshots and returns precise coordinates for UI elements.
- Supports macOS, Windows, Linux.
- **State-of-the-art results:** Agent S3 was the first to surpass human performance on the OSWorld benchmark (72.6%).
- ICLR 2025 Best Paper Award.

**Key details:**

- Requires two models: a main reasoning model + a grounding model (UI-TARS-1.5-7B recommended).
- The grounding model can be self-hosted on Hugging Face Inference Endpoints.
- Optional local coding environment for code execution tasks.
- Uses pyautogui for actions + screenshots for perception.
- CLI interface: `agent_s --provider openai --model gpt-5-2025-08-07 --ground_provider huggingface ...`

**Speed:** Moderate. Two-model inference adds latency. The grounding model can be local for faster inference.
**Reliability:** **Highest reported.** 72.6% on OSWorld surpasses human performance.
**Setup:** Complex. Requires two models, API keys, and grounding model deployment.
**Headless:** Requires GUI session (pyautogui + screenshots).
**Cost:** Free (open source) + API costs for both models. UI-TARS-7B hosting adds cost.
**Sentiment:** Highly respected in the research community. ICLR paper, strong benchmarks. The "serious" option for computer use research.

---

### 5.3 C/ua (trycua)

**GitHub:** [trycua/cua](https://github.com/trycua/cua)
**Website:** [cua.ai](https://cua.ai/)
**YC company**

**How it works:** **Sandboxed virtual machines** for computer use agents.

- Runs macOS or Linux VMs on Apple Silicon using Apple's Virtualization.framework.
- Near-native performance (97% of native CPU speed reported).
- Provides a complete SDK for agents to control the VM: click, type, scroll, screenshot, accessibility tree.
- **CuaBot:** CLI tool that gives any coding agent (Claude Code, OpenClaw) a sandbox.
- Includes a benchmarking suite (cua-bench) for evaluating agents on OSWorld, ScreenSpot, etc.

**Key details:**

- `lume` — macOS/Linux VM management on Apple Silicon (their virtualization layer).
- `lumier` — Docker-compatible interface for Lume VMs.
- Agent SDK supports multiple models (Anthropic, OpenAI, etc.).
- Designed specifically for the "give your agent a computer" use case.
- Sandboxed = safe. The agent can't damage your host system.

**Speed:** Near-native. VM overhead is minimal on Apple Silicon.
**Reliability:** Good. The VM provides a consistent environment.
**Setup:** Moderate. `npx cuabot` for quick start, or programmatic setup via the Python SDK.
**Headless:** **Excellent.** VMs run headless by design. H.265 streaming for when you want to observe.
**Cost:** Free and open source (MIT). API costs for the AI model.
**Sentiment:** Strong interest on r/LocalLLaMA. "Docker for computer use agents" resonates. YC backing adds credibility.

**Why C/ua matters:** It solves the biggest problem with giving agents computer access — **safety.** The agent operates in an isolated VM and can't touch your host system. Perfect for always-on Mac Mini setups.

---

### 5.4 mcp-server-macos-use (mediar-ai)

**GitHub:** [mediar-ai/mcp-server-macos-use](https://github.com/mediar-ai/mcp-server-macos-use)

**How it works:** **Accessibility tree-based.** Swift MCP server that controls macOS apps through AXUIElement APIs.

- Every action (click, type, press key) is followed by an accessibility tree traversal, giving the LLM updated UI state.
- Tools: `open_application_and_traverse`, `click_and_traverse`, `type_and_traverse`, `press_key_and_traverse`, `refresh_traversal`.
- Communicates via stdin/stdout (MCP protocol).
- Uses the app's PID (process ID) for targeting.

**Speed:** Fast. Native Swift; accessibility APIs are low-latency.
**Reliability:** High for apps with good accessibility support.
**Setup:** Build with `swift build`, configure in Claude Desktop or any MCP client.
**Headless:** Requires GUI session (accessibility APIs need WindowServer).
**Cost:** Free and open source.
**Sentiment:** Niche but well-designed. Good for MCP-native workflows.

---

### 5.5 mcp-remote-macos-use

**GitHub:** [baryhuang/mcp-remote-macos-use](https://github.com/baryhuang/mcp-remote-macos-use)

**How it works:** **Screen Sharing-based remote control.** Uses the macOS Screen Sharing (VNC) protocol.

- Captures screenshots and sends input over the VNC connection.
- Doesn't require any software installed on the target Mac (just Screen Sharing enabled).
- Deployable via Docker.
- No extra API key needed — works with any MCP client/LLM.

**Speed:** Moderate (VNC overhead).
**Reliability:** Moderate. VNC-level interaction.
**Setup:** Enable Screen Sharing on the target Mac, configure env vars.
**Headless:** **Yes!** Designed for remote/headless operation via Screen Sharing.
**Cost:** Free.
**Sentiment:** Practical for remote Mac control scenarios.

---

### 5.6 macOS Automator MCP (steipete)

**GitHub:** [steipete/macos-automator-mcp](https://github.com/steipete/macos-automator-mcp)

**How it works:** **AppleScript/JXA execution via MCP.** Ships with 200+ pre-built automation recipes.

- Executes AppleScript or JXA (JavaScript for Automation) scripts.
- Knowledge base of common automations: toggle dark mode, extract URLs from Safari, manage windows, etc.
- Supports inline scripts, file-based scripts, and pre-built knowledge base scripts.
- TypeScript/Node.js implementation.
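
Under the hood, these recipes boil down to `osascript` invocations. A minimal illustration of the JXA path (the Safari one-liner is an assumed example of a typical recipe, not taken from the server's knowledge base):

```python
import subprocess
import sys

# JXA one-liner: ask Safari for the front tab's URL. Illustrative of
# the kind of script the knowledge base wraps.
jxa = 'Application("Safari").windows[0].currentTab.url()'
cmd = ["osascript", "-l", "JavaScript", "-e", jxa]

if sys.platform == "darwin":  # needs a GUI session + Automation permission
    subprocess.run(cmd, capture_output=True, text=True)
```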

**Speed:** Fast. AppleScript executes in milliseconds.
**Reliability:** High for scripted automations. Depends on script quality.
**Setup:** `npx @steipete/macos-automator-mcp@latest` — minimal.
**Headless:** Partially. AppleScript works over SSH with a GUI session active.
**Cost:** Free (MIT).
**Sentiment:** Great companion to Peekaboo. Same author (Peter Steinberger).

---

### 5.7 mac_computer_use (deedy)

**GitHub:** [deedy/mac_computer_use](https://github.com/deedy/mac_computer_use)

**How it works:** Fork of Anthropic's official computer-use demo, adapted for native macOS.

- Screenshot-based (screencapture + cliclick).
- Streamlit web UI.
- Multi-provider support (Anthropic, Bedrock, Vertex).
- Automatic resolution scaling.

**Speed:** Same as Anthropic Computer Use (slow — API round-trip per action).
**Reliability:** Same as Anthropic Computer Use (moderate).
**Setup:** Clone, pip install, set API key, run streamlit.
**Headless:** Same limitations (needs WindowServer).
**Cost:** Free + API costs.

---

## 6. Comparison Matrix

| Tool | Approach | Speed | Reliability | Setup | Headless | Cost | Best For |
|------|----------|-------|-------------|-------|----------|------|----------|
| **Anthropic Computer Use** | Screenshot + pixel coords | ⭐⭐ Slow | ⭐⭐⭐ Moderate | ⭐⭐⭐ Moderate | ⚠️ Needs GUI | API costs | General-purpose computer use |
| **Apple Accessibility APIs** | Accessibility tree | ⭐⭐⭐⭐⭐ Instant | ⭐⭐⭐⭐ High | ⭐⭐⭐⭐ Low | ⚠️ Partial | Free | Deterministic automation |
| **Peekaboo** | **Hybrid: screenshot + accessibility** | ⭐⭐⭐⭐ Fast | ⭐⭐⭐⭐ High | ⭐⭐⭐⭐⭐ Easy | ⚠️ Needs GUI | Free + API | **Best macOS agent tool** |
| **Open Interpreter** | Screenshots (OS mode) | ⭐⭐ Slow | ⭐⭐ Low | ⭐⭐⭐⭐ Easy | ⚠️ OS mode needs GUI | Free + API | Code execution (not GUI) |
| **macOS-use** | Screenshots + pyautogui | ⭐⭐ Slow | ⭐⭐ Low-Med | ⭐⭐⭐ Easy | ⚠️ Needs GUI | Free + API | Cross-app automation (experimental) |
| **Agent S3** | Screenshots + grounding model | ⭐⭐⭐ Moderate | ⭐⭐⭐⭐⭐ Highest | ⭐⭐ Complex | ⚠️ Needs GUI | Free + 2× API | Research / highest accuracy |
| **C/ua** | VM sandbox + screenshot/a11y | ⭐⭐⭐⭐ Fast | ⭐⭐⭐⭐ Good | ⭐⭐⭐ Moderate | ✅ Yes | Free + API | **Safest sandboxed option** |
| **mcp-server-macos-use** | Accessibility tree (Swift) | ⭐⭐⭐⭐⭐ Fast | ⭐⭐⭐⭐ High | ⭐⭐⭐ Moderate | ⚠️ Needs GUI | Free | MCP-native workflows |
| **mcp-remote-macos-use** | VNC screen sharing | ⭐⭐⭐ Moderate | ⭐⭐⭐ Moderate | ⭐⭐⭐ Easy | ✅ Yes | Free | Remote Mac control |
| **macOS Automator MCP** | AppleScript/JXA | ⭐⭐⭐⭐⭐ Instant | ⭐⭐⭐⭐ High | ⭐⭐⭐⭐⭐ Easy | ⚠️ Partial | Free | Scripted automations |

("Needs GUI" means a logged-in GUI session is required; per §8, that is workable on a headless Mac mini with an HDMI dummy plug.)

---

## 7. Recommendations for Mac Mini Agent Setup

### 🏆 Tier 1: Best Overall

**Peekaboo** is the clear winner for an always-on Mac Mini running AI agent automation.

**Why:**

- Hybrid approach (screenshot + accessibility tree) gives the best of both worlds
- Native Swift CLI = fast and deeply integrated with macOS
- MCP server mode works with any MCP client
- Complete automation toolkit (click, type, menu, window, dialog, etc.)
- Active development by a respected macOS developer
- Easy install (`brew install steipete/tap/peekaboo`)

**Recommended stack:**

```
Peekaboo (GUI automation)
  + macOS Automator MCP (AppleScript/JXA for scripted tasks)
  + Apple Accessibility APIs (direct AXUIElement for custom automation)
```

### 🥈 Tier 2: For Safety-Critical Use

**C/ua** if you need sandboxed execution (the agent can't damage your host system).

**Why:**

- VM isolation = peace of mind for unsupervised operation
- Near-native performance on Apple Silicon
- Works headless by design
- Good for running untrusted or experimental agents
- YC-backed, strong engineering

### 🥉 Tier 3: For Research / Maximum Accuracy

**Agent S3** if you need the highest possible task completion rate and are willing to invest in setup complexity.

**Why:**

- Best benchmark results (72.6% on OSWorld, surpassing human performance)
- Two-model approach provides better grounding
- Research-grade quality
- But: complex setup, higher API costs

### For Clawdbot/OpenClaw Specifically

The ideal integration path:

1. **Peekaboo MCP** as the primary computer-use tool (add to MCP config)
2. **macOS Automator MCP** for common scripted tasks (dark mode, app control, etc.)
3. **Apple Accessibility APIs** via `osascript` for quick deterministic actions
4. Fall back to **Anthropic Computer Use** for tasks requiring pure visual reasoning
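
For steps 1 and 2, the MCP registration in a Claude Desktop-style config might look like the sketch below. The `mcpServers` shape is the common client convention (verify against your client's docs), and the server names are arbitrary labels:

```json
{
  "mcpServers": {
    "peekaboo": {
      "command": "npx",
      "args": ["-y", "@steipete/peekaboo-mcp@beta"]
    },
    "macos-automator": {
      "command": "npx",
      "args": ["-y", "@steipete/macos-automator-mcp@latest"]
    }
  }
}
```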

---

## 8. Headless / SSH Considerations

Running computer-use tools on a headless Mac Mini is a **critical concern** for always-on setups:

### The Core Problem

macOS GUI automation tools (screenshots, accessibility APIs, pyautogui, cliclick) require:

1. **WindowServer** to be running (a GUI session must exist)
2. **A display** (real or virtual) for screenshots to capture

### Solutions

1. **HDMI Dummy Plug** ($5-15): Plugs into the HDMI port and tricks macOS into thinking a display is connected. This is **the most reliable solution** for headless Mac Minis.
2. **Apple Screen Sharing / VNC**: Enable Screen Sharing in System Settings. Connect from another Mac or use a VNC client. `screencapture` works against the active session.
3. **HDMI Dummy + Auto-Login**: Configure macOS to auto-login on boot and use the HDMI dummy plug for display emulation. The most robust setup for unattended operation.
4. **C/ua VMs**: Run the agent in a VM — it has its own virtual display. No dummy plug needed.
### What Works Over SSH (with GUI session active)

| Capability | Works Over SSH? |
|-----------|----------------|
| `osascript` / AppleScript | ✅ Yes (if Accessibility granted) |
| `screencapture` | ✅ Yes (with GUI session + display) |
| `cliclick` | ✅ Yes (with GUI session) |
| Peekaboo CLI | ✅ Yes (with GUI session) |
| pyautogui | ✅ Yes (with GUI session) |

### Recommended Headless Setup

```
Mac Mini M4 + HDMI Dummy Plug
├── Auto-login enabled
├── Screen Sharing enabled (for monitoring)
├── SSH enabled (for CLI access)
├── Peekaboo installed
├── Clawdbot/OpenClaw running via launchd (as a LaunchAgent, so it lives inside the GUI session)
└── HDMI dummy forces 1080p display for consistent screenshots
```

**Key tip from community:** Get the cheapest HDMI dummy plug you can find (Amazon, ~$8). Without it, the Mac Mini may boot into a low-resolution or no-display mode that breaks all screenshot-based automation.

---


## Sources

- [Anthropic Computer Use Docs](https://docs.anthropic.com/en/docs/build-with-claude/computer-use)
- [Simon Willison's Computer Use Analysis](https://simonwillison.net/2024/Oct/22/computer-use/)
- [Benjamin Anderson: Should I Buy Claude a Mac Mini?](https://benanderson.work/blog/claude-mac-mini/)
- [Peekaboo GitHub](https://github.com/steipete/Peekaboo)
- [C/ua GitHub](https://github.com/trycua/cua)
- [Agent S GitHub](https://github.com/simular-ai/Agent-S)
- [macOS-use GitHub](https://github.com/browser-use/macOS-use)
- [mcp-server-macos-use GitHub](https://github.com/mediar-ai/mcp-server-macos-use)
- [macOS Automator MCP GitHub](https://github.com/steipete/macos-automator-mcp)
- [Apple Accessibility Documentation](https://developer.apple.com/library/archive/documentation/LanguagesUtilities/Conceptual/MacAutomationScriptingGuide/AutomatetheUserInterface.html)
- Various Reddit threads (r/macmini, r/Anthropic, r/MacOS, r/LocalLLaMA)

---

*Last updated: February 18, 2026*