# macOS Computer Use Tools for AI Agents — Deep Research (Feb 2026)

> **Context:** Evaluating the best "computer use" tools/frameworks for AI agents running on an always-on Mac mini M-series (specifically for Clawdbot/OpenClaw-style automation).

---

## Table of Contents

1. [Anthropic Computer Use](#1-anthropic-computer-use)
2. [Apple Accessibility APIs](#2-apple-accessibility-apis)
3. [Peekaboo](#3-peekaboo)
4. [Open Interpreter](#4-open-interpreter)
5. [Other Frameworks](#5-other-notable-frameworks)
   - [macOS-use (browser-use)](#51-macos-use-browser-use)
   - [Agent S (Simular.ai)](#52-agent-s-simularai)
   - [C/ua (trycua)](#53-cua-trycua)
   - [mcp-server-macos-use (mediar-ai)](#54-mcp-server-macos-use-mediar-ai)
   - [mcp-remote-macos-use](#55-mcp-remote-macos-use)
   - [macOS Automator MCP (steipete)](#56-macos-automator-mcp-steipete)
   - [mac_computer_use (deedy)](#57-mac_computer_use-deedy)
6. [Comparison Matrix](#6-comparison-matrix)
7. [Recommendations for Mac Mini Agent Setup](#7-recommendations-for-mac-mini-agent-setup)
8. [Headless / SSH Considerations](#8-headless--ssh-considerations)

---

## 1. Anthropic Computer Use

**GitHub:** [anthropics/anthropic-quickstarts](https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo)
**Mac forks:** [deedy/mac_computer_use](https://github.com/deedy/mac_computer_use), [PallavAg/claude-computer-use-macos](https://github.com/PallavAg/claude-computer-use-macos), [newideas99/Anthropic-Computer-Use-MacOS](https://github.com/newideas99/Anthropic-Computer-Use-MacOS)

### How It Works

- **Screenshot-based.** The model receives screenshots and reasons about pixel coordinates.
- Claude sends actions (`mouse_move`, `click`, `type`, `screenshot`) to a local executor.
- On macOS, the executor uses `cliclick` for mouse/keyboard input and `screencapture` for screenshots.
- The model identifies coordinates by "counting pixels" — it is trained specifically for coordinate estimation.
- Anthropic recommends XGA (1024×768) or WXGA (1280×800) resolution for best accuracy.
- The official demo uses Docker + Ubuntu (xdotool). macOS forks replace xdotool with `cliclick` and native `screencapture`.

### Speed / Latency

- **Slow.** Each action cycle involves: screenshot → upload image → API inference → parse response → execute action.
- A single click-and-verify cycle takes **3-8 seconds** depending on API latency.
- Multi-step tasks (e.g., open Safari, navigate, search) can take **30-120+ seconds**.
- Screenshot upload adds ~1-3s of overhead per cycle (images are typically 100-500KB).

### Reliability

- **Moderate.** Coordinate estimation works well for large, distinct UI elements.
- Struggles with small buttons, dense UIs, and similar-looking elements.
- No DOM/accessibility-tree awareness — purely visual. If the UI changes between screenshot and action, clicks can miss.
- The self-correction loop helps: the model takes a new screenshot after each action.
- Prone to **prompt injection** from on-screen text (a major security concern).
- Simon Willison's testing (Oct 2024): works for simple tasks, fails on complex multi-step workflows.

### Setup Complexity

- **Moderate.** Requires: Python 3.12+, cliclick (`brew install cliclick`), an Anthropic API key, and macOS Accessibility permissions.
- Mac forks require cloning a repo, setting up a venv, and configuring environment variables.
- Some forks include a Streamlit UI for interactive testing.
- You must grant Terminal/Python Accessibility permissions in System Settings.

### Headless / SSH

- **Problematic.** `screencapture` requires WindowServer (a GUI session).
- Over pure SSH without a display, `screencapture` fails silently or returns black images.
- **Workaround:** Use an HDMI dummy plug + Screen Sharing (VNC), or connect via Apple Remote Desktop. `screencapture` then works against the VNC session.
- Not designed for headless operation.
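On the macOS forks, the per-step loop above boils down to mapping the model's action JSON onto `cliclick`/`screencapture` invocations. A minimal sketch of that mapping — the action names follow the list in "How It Works" above, and the dispatch itself is illustrative, not the forks' actual code:

```python
def action_to_command(action: dict) -> list[str]:
    """Translate a model-issued action into the CLI command the
    macOS executor would run. Sketch only; real executors also
    handle keys, drags, and scaling."""
    kind = action["action"]
    if kind == "screenshot":
        # -x suppresses the shutter sound
        return ["screencapture", "-x", "screen.png"]
    if kind == "mouse_move":
        x, y = action["coordinate"]
        return ["cliclick", f"m:{x},{y}"]
    if kind == "click":
        x, y = action["coordinate"]
        return ["cliclick", f"c:{x},{y}"]
    if kind == "type":
        return ["cliclick", f"t:{action['text']}"]
    raise ValueError(f"unsupported action: {kind}")

# In the real loop each command runs via subprocess, then a fresh
# screenshot is uploaded to the API for the next decision.
```

Each round trip carries the full cost of an image upload plus inference, which is where the 3-8 second per-step latency comes from.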
### Cost

- **API costs only.** Anthropic API pricing (Feb 2026):
  - Claude Sonnet 4.5: $3/M input tokens, $15/M output tokens
  - Claude Opus 4.5: $5/M input tokens, $25/M output tokens
- Each screenshot is ~1,500-3,000 tokens (image tokens).
- A 10-step task might cost $0.05-0.30 depending on model and complexity.
- Computer use itself is free — you run the executor locally.

### Reddit Sentiment

- **Excited but cautious.** An r/Anthropic thread on unsandboxed Mac use got 24 upvotes, with comments calling it "dangerous but cool."
- r/macmini discussions show interest in buying Mac Minis specifically for this use case.
- Common complaints: slow, expensive at scale, not reliable enough for unsupervised use.
- Benjamin Anderson's blog post captures the zeitgeist: "Claude needs his own computer" — the coding-agent + computer-use convergence thesis.

---

## 2. Apple Accessibility APIs

**Documentation:** [Apple Mac Automation Scripting Guide](https://developer.apple.com/library/archive/documentation/LanguagesUtilities/Conceptual/MacAutomationScriptingGuide/AutomatetheUserInterface.html)

### How It Works

- **Accessibility tree-based.** macOS exposes every UI element (buttons, text fields, menus, etc.) through the Accessibility framework (the AXUIElement API).
- **Three access methods:**
  1. **AppleScript / osascript:** `tell application "System Events" → tell process "Finder" → click button "OK"`. High-level scripting, easy to write.
  2. **JXA (JavaScript for Automation):** Same capabilities as AppleScript, written in JavaScript. Run via `osascript -l JavaScript`.
  3. **AXUIElement (C/Swift/Python via pyobjc):** Low-level programmatic access to the full accessibility tree. Can enumerate all UI elements, read properties (role, title, position, size), and perform actions (press, set value, etc.).
- Does NOT rely on screenshots — reads the actual UI element tree.
- Can traverse the entire hierarchy: Application → Window → Group → Button → etc.
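The first access method can be driven from any language by shelling out to `osascript`. A minimal sketch — the target process and button name are placeholders, and the calling process must already hold Accessibility permission:

```python
import subprocess

def build_click_script(process: str, button: str, window: int = 1) -> str:
    """Compose the System Events AppleScript that clicks a named button."""
    return (
        f'tell application "System Events" to tell process "{process}" '
        f'to click button "{button}" of window {window}'
    )

def run_applescript(script: str) -> str:
    """Execute a script via osascript (macOS only; needs a GUI session
    and an Accessibility grant for the calling process)."""
    out = subprocess.run(
        ["osascript", "-e", script],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

# Example (on macOS): run_applescript(build_click_script("TextEdit", "OK"))
```

Separating script construction from execution keeps the AppleScript layer easy to log and test before it ever touches the GUI.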
### Speed / Latency

- **Fast.** AppleScript commands execute in **10-100ms**. AXUIElement API calls are typically **1-10ms**.
- No image capture, no network round-trip, no model inference.
- Menu clicks, text entry, window management — all near-instantaneous.
- Can enumerate hundreds of UI elements in <100ms.

### Reliability

- **High for supported apps.** Most native macOS apps and many Electron apps expose accessibility info.
- Apple's own apps (Finder, Safari, Mail, Calendar, Notes) have excellent accessibility support.
- Electron apps (VS Code, Slack, Discord) expose basic accessibility but may have gaps.
- Web content in browsers is accessible via accessibility APIs (each DOM element maps to an AX element).
- **Failure modes:** Apps with custom rendering (games, some media apps) may not expose UI elements. Some apps have broken accessibility annotations.

### Setup Complexity

- **Low.** AppleScript is built into macOS — no installation needed.
- `osascript` is available in every terminal.
- For Python access: `pip install pyobjc-framework-ApplicationServices`
- **Critical requirement:** You must enable Accessibility permissions for the calling application (Terminal, Python, etc.) in System Settings → Privacy & Security → Accessibility.
- For automation across apps: System Settings → Privacy & Security → Automation.

### Headless / SSH

- **Partially works.** AppleScript/osascript commands work over SSH **if** a GUI session is active (a user is logged in).
- AXUIElement requires WindowServer to be running.
- Works well with a headless Mac Mini + HDMI dummy plug + remote login session.
- `osascript` may throw "not allowed assistive access" errors over SSH — the calling process (sshd, bash) needs to be in the Accessibility allow list.
- **Workaround:** Save scripts as .app bundles, grant them Accessibility access, then invoke them from SSH.

### Cost

- **Free.** Built into macOS, no API costs.
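A common way to combine the AXUIElement tree with an LLM is to flatten the hierarchy into compact text before sending it. A sketch of that flattening step — on macOS the nested node dicts would be built by walking AXUIElement children via pyobjc (`AXUIElementCopyAttributeValue`); here a hard-coded sample tree stands in:

```python
def flatten_ax_tree(node: dict, depth: int = 0) -> list[str]:
    """Flatten a nested accessibility tree into indented
    'Role "Title"' lines an LLM can pick targets from."""
    lines = [f'{"  " * depth}{node["role"]} "{node.get("title", "")}"']
    for child in node.get("children", []):
        lines.extend(flatten_ax_tree(child, depth + 1))
    return lines

# Sample standing in for a real AXUIElement traversal:
tree = {
    "role": "AXWindow", "title": "Untitled",
    "children": [
        {"role": "AXButton", "title": "OK"},
        {"role": "AXTextField", "title": "Name"},
    ],
}
print("\n".join(flatten_ax_tree(tree)))
```

Flattening to a few hundred short lines keeps the prompt small, which matters when the tree has hundreds of elements.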
### Best For

- **Structured automation:** "Click the Save button in TextEdit" rather than "figure out what's on screen."
- **Fast, deterministic workflows** where you know the target app and UI structure.
- **Combining with an LLM:** Feed the accessibility tree to an LLM and let it decide which element to interact with. This is what Peekaboo, mcp-server-macos-use, and macOS-use all do under the hood.

### Limitations

- **No visual understanding.** Can't interpret images, charts, or custom-drawn content.
- **Fragile element references:** If an app updates, button names/positions may change.
- **Permission hell:** Each calling app needs separate Accessibility + Automation grants. You can't grant access to `osascript` directly (it's not an .app).

---

## 3. Peekaboo

**GitHub:** [steipete/Peekaboo](https://github.com/steipete/Peekaboo)
**Website:** [peekaboo.boo](https://www.peekaboo.boo/)
**Author:** Peter Steinberger (well-known iOS/macOS developer)

### How It Works

- **Hybrid: screenshot + accessibility tree.** This is Peekaboo's killer feature.
- The `see` command captures a screenshot AND overlays element IDs from the accessibility tree, creating an annotated snapshot.
- The `click` command can target elements by accessibility ID, label text, or raw coordinates.
- **Full GUI automation suite:** click, type, press, hotkey, scroll, swipe, drag, move, window management, app control, menu interaction, dock control, dialog handling, Space switching.
- **Native Swift CLI** — a compiled binary, not Python. Fast and deeply integrated with macOS APIs.
- **MCP server mode** — can be used as an MCP tool by Claude Desktop, Cursor, or any MCP client.
- **Agent mode** — `peekaboo agent` runs a natural-language multi-step automation loop (capture → LLM decides → act → repeat).
- Supports multiple AI providers: OpenAI, Claude, Grok, Gemini, Ollama (local).

### Speed / Latency

- **Fast.** Screenshot capture via ScreenCaptureKit is <100ms. Accessibility tree traversal is similarly fast.
- Individual click/type/press commands execute in **10-50ms**.
- Agent mode latency depends on the LLM provider (1-5s per step with cloud APIs).
- Much faster than pure screenshot-based approaches because clicks target element IDs, not pixel coordinates.

### Reliability

- **High.** Using accessibility IDs instead of pixel coordinates means:
  - Clicks don't miss due to resolution changes or slight UI shifts.
  - Elements are identified by semantic identity (button label, role), not visual appearance.
- The annotated-snapshot approach gives the LLM **both** visual context and structural data — the best of both worlds.
- Menu interaction, dialog handling, and window management are deeply integrated.
- Created by Peter Steinberger — high-quality Swift code, actively maintained.

### Setup Complexity

- **Low.** `brew install steipete/tap/peekaboo` — a single command.
- Requires macOS 15+ (Sequoia), Screen Recording permission, and Accessibility permission.
- MCP server mode: `npx @steipete/peekaboo-mcp@beta` (zero-install for Node users).
- AI providers are configured via `peekaboo config`.

### Headless / SSH

- **Requires a GUI session** (ScreenCaptureKit and accessibility APIs need WindowServer).
- Works with a Mac Mini + HDMI dummy plug + Screen Sharing.
- Can be invoked over SSH if a GUI login session is active.
- The CLI nature makes it easy to script and automate remotely.

### Cost

- **Free and open-source** (MIT license).
- AI provider costs apply when using `peekaboo agent` or `peekaboo see --analyze`.
- Local models via Ollama = zero marginal cost.

### Reddit / Community Sentiment

- Very well received in the macOS developer community.
- Peter Steinberger's reputation lends credibility.
- Described as "giving AI agents eyes on macOS."
- Praised for the hybrid screenshot+accessibility approach.
- Active development — regular releases with new features.
### Why Peekaboo Stands Out

- **Best-in-class for macOS-specific automation.** It's what a senior macOS developer would build if they were making the perfect agent tool.
- Complete command set: see, click, type, press, hotkey, scroll, swipe, drag, window, app, space, menu, menubar, dock, dialog.
- Runnable automation scripts (`.peekaboo.json`).
- Clean JSON output for programmatic consumption.

---

## 4. Open Interpreter

**Website:** [openinterpreter.com](https://www.openinterpreter.com/)
**GitHub:** [OpenInterpreter/open-interpreter](https://github.com/OpenInterpreter/open-interpreter)

### How It Works

- **Primarily code execution**, with an experimental "OS mode" for GUI control.
- Normal mode: the LLM generates Python/bash/JS code and executes it locally.
- **OS mode** (`interpreter --os`): Screenshot-based. Takes screenshots, sends them to a vision model (GPT-4V, etc.), the model reasons about actions, and executes them via pyautogui.
- Also includes 01 Light hardware — a portable voice interface that connects to a home computer.

### Speed / Latency

- Normal mode (code execution): **Fast** — direct code execution, limited by LLM inference time.
- OS mode: **Slow** — the same screenshot → API → action loop as Anthropic Computer Use.
- OS mode is explicitly labeled "highly experimental."

### Reliability

- Normal mode: **Good** for code-centric tasks. The LLM writes code that runs on your machine.
- OS mode: **Low.** Labeled a "work in progress." Community reports frequent failures.
- Single monitor only — no multi-display support in OS mode.
- Better at tasks that can be accomplished via code (file manipulation, API calls, data processing) than GUI interaction.

### Setup Complexity

- **Low.** `pip install open-interpreter` and `interpreter --os`.
- Requires Screen Recording permission on macOS.
- An API key for your chosen LLM provider.

### Headless / SSH

- Normal mode (code execution): **Works perfectly** over SSH.
- OS mode: **Requires a GUI session** (uses pyautogui + screenshots).
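The normal-mode pattern — model emits code, host executes it and feeds the output back — is simple to sketch. This shows the generic loop step, not Open Interpreter's actual internals:

```python
import subprocess
import sys

def run_generated_code(code: str, timeout: int = 10) -> tuple[str, str]:
    """Run model-generated Python in a child interpreter and return
    (stdout, stderr) to feed back into the conversation. Unsandboxed —
    an always-on setup should isolate this step (e.g., in a VM)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.stdout, proc.stderr

out, err = run_generated_code("print(2 + 2)")
```

The captured output is what gives the model its feedback signal — which is why normal mode works well over SSH: no display is involved at any point.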
### Cost

- **Free and open-source.**
- LLM API costs apply.

### Reddit Sentiment

- The community has cooled on Open Interpreter since the initial hype.
- OS mode is seen as a proof of concept, not production-ready.
- Normal mode (code execution) is valued but outcompeted by Claude Code, Cursor, etc.
- The 01 Light hardware project had an enthusiastic reception but unclear adoption.

### Verdict

- **Not recommended for computer use / GUI automation.** Its strength is code execution, and dedicated coding agents (Claude Code, Codex) do that better now.
- OS mode is too experimental and unreliable for production use.

---

## 5. Other Notable Frameworks

### 5.1 macOS-use (browser-use)

**GitHub:** [browser-use/macOS-use](https://github.com/browser-use/macOS-use)
**Install:** `pip install mlx-use`

**How it works:** Screenshot-based. Takes screenshots, sends them to a vision model (OpenAI/Anthropic/Gemini), the model returns actions (click coordinates, type text, etc.), which are executed via pyautogui/AppleScript.

**Key details:**
- A spin-off from the popular browser-use project.
- Supports the OpenAI, Anthropic, and Gemini APIs.
- Vision: plans to support local inference via Apple's MLX framework (not yet implemented).
- Works across ALL macOS apps, not just browsers.
- Early stage — "varying success rates depending on task prompt."
- **Security warning:** Can access credentials, stored passwords, and all UI components.

**Speed:** Slow (cloud API round-trip per action).
**Reliability:** Low-moderate. Early development.
**Setup:** `pip install mlx-use`, configure an API key.
**Headless:** Requires a GUI session.
**Cost:** Free + API costs.
**Sentiment:** Exciting concept but immature. The Reddit post got moderate engagement.
---

### 5.2 Agent S (Simular.ai)

**GitHub:** [simular-ai/Agent-S](https://github.com/simular-ai/Agent-S)
**Website:** [simular.ai](https://www.simular.ai/)

**How it works:** A multi-model system using **screenshot + grounding model + planning model.**
- Agent S3 (latest) uses a planning LLM (e.g., GPT-5, Claude) + a grounding model (UI-TARS-1.5-7B) for precise element location.
- The grounding model takes screenshots and returns precise coordinates for UI elements.
- Supports macOS, Windows, and Linux.
- **State-of-the-art results:** Agent S3 was the first to surpass human performance on the OSWorld benchmark (72.6%).
- ICLR 2025 Best Paper Award.

**Key details:**
- Requires two models: a main reasoning model + a grounding model (UI-TARS-1.5-7B recommended).
- The grounding model can be self-hosted on Hugging Face Inference Endpoints.
- Optional local coding environment for code-execution tasks.
- Uses pyautogui for actions + screenshots for perception.
- CLI interface: `agent_s --provider openai --model gpt-5-2025-08-07 --ground_provider huggingface ...`

**Speed:** Moderate. Two-model inference adds latency; the grounding model can run locally for faster inference.
**Reliability:** **Highest reported.** 72.6% on OSWorld surpasses human performance.
**Setup:** Complex. Requires two models, API keys, and grounding-model deployment.
**Headless:** Requires a GUI session (pyautogui + screenshots).
**Cost:** Free (open source) + API costs for both models. UI-TARS-7B hosting adds cost.
**Sentiment:** Highly respected in the research community. ICLR paper, strong benchmarks. The "serious" option for computer-use research.

---

### 5.3 C/ua (trycua)

**GitHub:** [trycua/cua](https://github.com/trycua/cua)
**Website:** [cua.ai](https://cua.ai/)
**YC company**

**How it works:** **Sandboxed virtual machines** for computer-use agents.
- Runs macOS or Linux VMs on Apple Silicon using Apple's Virtualization.framework.
- Near-native performance (97% of native CPU speed reported).
- Provides a complete SDK for agents to control the VM: click, type, scroll, screenshot, accessibility tree.
- **CuaBot:** A CLI tool that gives any coding agent (Claude Code, OpenClaw) a sandbox.
- Includes a benchmarking suite (cua-bench) for evaluating agents on OSWorld, ScreenSpot, etc.

**Key details:**
- `lume` — macOS/Linux VM management on Apple Silicon (their virtualization layer).
- `lumier` — a Docker-compatible interface for Lume VMs.
- The Agent SDK supports multiple models (Anthropic, OpenAI, etc.).
- Designed specifically for the "give your agent a computer" use case.
- Sandboxed = safe. The agent can't damage your host system.

**Speed:** Near-native. VM overhead is minimal on Apple Silicon.
**Reliability:** Good. The VM provides a consistent environment.
**Setup:** Moderate. `npx cuabot` for a quick start, or programmatic setup via the Python SDK.
**Headless:** **Excellent.** VMs run headless by design, with H.265 streaming for when you want to observe.
**Cost:** Free and open source (MIT). API costs for the AI model.
**Sentiment:** Strong interest on r/LocalLLaMA. "Docker for computer use agents" resonates. YC backing adds credibility.

**Why C/ua matters:** It solves the biggest problem with giving agents computer access — **safety.** The agent operates in an isolated VM and can't touch your host system. Perfect for always-on Mac Mini setups.

---

### 5.4 mcp-server-macos-use (mediar-ai)

**GitHub:** [mediar-ai/mcp-server-macos-use](https://github.com/mediar-ai/mcp-server-macos-use)

**How it works:** **Accessibility tree-based.** A Swift MCP server that controls macOS apps through AXUIElement APIs.
- Every action (click, type, press key) is followed by an accessibility-tree traversal, giving the LLM updated UI state.
- Tools: open_application_and_traverse, click_and_traverse, type_and_traverse, press_key_and_traverse, refresh_traversal.
- Communicates via stdin/stdout (the MCP protocol).
- Uses the app's PID (process ID) for targeting.

**Speed:** Fast. Native Swift; accessibility APIs are low-latency.
**Reliability:** High for apps with good accessibility support.
**Setup:** Build with `swift build`, configure in Claude Desktop or any MCP client.
**Headless:** Requires a GUI session (accessibility APIs need WindowServer).
**Cost:** Free and open source.
**Sentiment:** Niche but well designed. Good for MCP-native workflows.

---

### 5.5 mcp-remote-macos-use

**GitHub:** [baryhuang/mcp-remote-macos-use](https://github.com/baryhuang/mcp-remote-macos-use)

**How it works:** **Screen Sharing-based remote control.** Uses the macOS Screen Sharing (VNC) protocol.
- Captures screenshots and sends input over the VNC connection.
- Doesn't require any software installed on the target Mac (just Screen Sharing enabled).
- Deployable via Docker.
- No extra API key needed — works with any MCP client/LLM.

**Speed:** Moderate (VNC overhead).
**Reliability:** Moderate. VNC-level interaction.
**Setup:** Enable Screen Sharing on the target Mac, configure env vars.
**Headless:** **Yes!** Designed for remote/headless operation via Screen Sharing.
**Cost:** Free.
**Sentiment:** Practical for remote Mac control scenarios.

---

### 5.6 macOS Automator MCP (steipete)

**GitHub:** [steipete/macos-automator-mcp](https://github.com/steipete/macos-automator-mcp)

**How it works:** **AppleScript/JXA execution via MCP.** Ships with 200+ pre-built automation recipes.
- Executes AppleScript or JXA (JavaScript for Automation) scripts.
- A knowledge base of common automations: toggle dark mode, extract URLs from Safari, manage windows, etc.
- Supports inline scripts, file-based scripts, and pre-built knowledge-base scripts.
- TypeScript/Node.js implementation.

**Speed:** Fast. AppleScript executes in milliseconds.
**Reliability:** High for scripted automations; depends on script quality.
**Setup:** `npx @steipete/macos-automator-mcp@latest` — minimal.
**Headless:** Partially. AppleScript works over SSH with a GUI session active.
**Cost:** Free (MIT).
**Sentiment:** A great companion to Peekaboo. Same author (Peter Steinberger).

---

### 5.7 mac_computer_use (deedy)

**GitHub:** [deedy/mac_computer_use](https://github.com/deedy/mac_computer_use)

**How it works:** A fork of Anthropic's official computer-use demo, adapted for native macOS.
- Screenshot-based (screencapture + cliclick).
- Streamlit web UI.
- Multi-provider support (Anthropic, Bedrock, Vertex).
- Automatic resolution scaling.

**Speed:** Same as Anthropic Computer Use (slow — API round-trip per action).
**Reliability:** Same as Anthropic Computer Use (moderate).
**Setup:** Clone, pip install, set API key, run streamlit.
**Headless:** Same limitations (needs WindowServer).
**Cost:** Free + API costs.

---

## 6. Comparison Matrix

| Tool | Approach | Speed | Reliability | Setup | Headless | Cost | Best For |
|------|----------|-------|-------------|-------|----------|------|----------|
| **Anthropic Computer Use** | Screenshot + pixel coords | ⭐⭐ Slow | ⭐⭐⭐ Moderate | ⭐⭐⭐ Moderate | ❌ Needs GUI | API costs | General-purpose computer use |
| **Apple Accessibility APIs** | Accessibility tree | ⭐⭐⭐⭐⭐ Instant | ⭐⭐⭐⭐ High | ⭐⭐⭐⭐ Low | ⚠️ Partial | Free | Deterministic automation |
| **Peekaboo** | **Hybrid: screenshot + accessibility** | ⭐⭐⭐⭐ Fast | ⭐⭐⭐⭐ High | ⭐⭐⭐⭐⭐ Easy | ⚠️ Needs GUI | Free + API | **Best macOS agent tool** |
| **Open Interpreter** | Screenshots (OS mode) | ⭐⭐ Slow | ⭐⭐ Low | ⭐⭐⭐⭐ Easy | ❌ OS mode needs GUI | Free + API | Code execution (not GUI) |
| **macOS-use** | Screenshots + pyautogui | ⭐⭐ Slow | ⭐⭐ Low-Med | ⭐⭐⭐ Easy | ❌ Needs GUI | Free + API | Cross-app automation (experimental) |
| **Agent S3** | Screenshots + grounding model | ⭐⭐⭐ Moderate | ⭐⭐⭐⭐⭐ Highest | ⭐⭐ Complex | ❌ Needs GUI | Free + 2× API | Research / highest accuracy |
| **C/ua** | VM sandbox + screenshot/a11y | ⭐⭐⭐⭐ Fast | ⭐⭐⭐⭐ Good | ⭐⭐⭐ Moderate | ✅ Yes | Free + API | **Safest sandboxed option** |
| **mcp-server-macos-use** | Accessibility tree (Swift) | ⭐⭐⭐⭐⭐ Fast | ⭐⭐⭐⭐ High | ⭐⭐⭐ Moderate | ⚠️ Needs GUI | Free | MCP-native workflows |
| **mcp-remote-macos-use** | VNC screen sharing | ⭐⭐⭐ Moderate | ⭐⭐⭐ Moderate | ⭐⭐⭐ Easy | ✅ Yes | Free | Remote Mac control |
| **macOS Automator MCP** | AppleScript/JXA | ⭐⭐⭐⭐⭐ Instant | ⭐⭐⭐⭐ High | ⭐⭐⭐⭐⭐ Easy | ⚠️ Partial | Free | Scripted automations |

---

## 7. Recommendations for Mac Mini Agent Setup

### 🏆 Tier 1: Best Overall

**Peekaboo** is the clear winner for an always-on Mac Mini running AI agent automation.

**Why:**
- The hybrid approach (screenshot + accessibility tree) gives the best of both worlds
- Native Swift CLI = fast and deeply integrated with macOS
- MCP server mode works with any MCP client
- Complete automation toolkit (click, type, menu, window, dialog, etc.)
- Active development by a respected macOS developer
- Easy install (`brew install steipete/tap/peekaboo`)

**Recommended stack:**

```
Peekaboo (GUI automation)
  + macOS Automator MCP (AppleScript/JXA for scripted tasks)
  + Apple Accessibility APIs (direct AXUIElement for custom automation)
```

### 🥈 Tier 2: For Safety-Critical Use

**C/ua** if you need sandboxed execution (the agent can't damage your host system).

**Why:**
- VM isolation = peace of mind for unsupervised operation
- Near-native performance on Apple Silicon
- Works headless by design
- Good for running untrusted or experimental agents
- YC-backed, strong engineering

### 🥉 Tier 3: For Research / Maximum Accuracy

**Agent S3** if you need the highest possible task-completion rate and are willing to invest in setup complexity.

**Why:**
- Best benchmark results (72.6% on OSWorld, surpassing human performance)
- The two-model approach provides better grounding
- Research-grade quality
- But: complex setup, higher API costs

### For Clawdbot/OpenClaw Specifically

The ideal integration path:
1. **Peekaboo MCP** as the primary computer-use tool (add to MCP config)
2. **macOS Automator MCP** for common scripted tasks (dark mode, app control, etc.)
3. **Apple Accessibility APIs** via `osascript` for quick deterministic actions
4. Fall back to **Anthropic Computer Use** for tasks requiring pure visual reasoning

---

## 8. Headless / SSH Considerations

Running computer-use tools on a headless Mac Mini is a **critical concern** for always-on setups.

### The Core Problem

macOS GUI automation tools (screenshots, accessibility APIs, pyautogui, cliclick) require:
1. **WindowServer** to be running (a GUI session must exist)
2. **A display** (real or virtual) for screenshots to capture

### Solutions

1. **HDMI dummy plug** ($5-15): Plugs into the HDMI port and tricks macOS into thinking a display is connected. This is **the most reliable solution** for headless Mac Minis.
2. **Apple Screen Sharing / VNC**: Enable Screen Sharing in System Settings. Connect from another Mac or use a VNC client. `screencapture` works against the active session.
3. **HDMI dummy + auto-login**: Configure macOS to auto-login on boot and use an HDMI dummy plug for display emulation. The most robust setup for unattended operation.
4. **C/ua VMs**: Run the agent in a VM — it has its own virtual display. No dummy plug needed.

### What Works Over SSH (with a GUI session active)

| Capability | Works Over SSH? |
|-----------|----------------|
| `osascript` / AppleScript | ✅ Yes (if Accessibility granted) |
| `screencapture` | ✅ Yes (with GUI session + display) |
| `cliclick` | ✅ Yes (with GUI session) |
| Peekaboo CLI | ✅ Yes (with GUI session) |
| pyautogui | ✅ Yes (with GUI session) |

### Recommended Headless Setup

```
Mac Mini M4 + HDMI Dummy Plug
├── Auto-login enabled
├── Screen Sharing enabled (for monitoring)
├── SSH enabled (for CLI access)
├── Peekaboo installed
├── Clawdbot/OpenClaw running as launch daemon
└── HDMI dummy forces 1080p display for consistent screenshots
```

**Key tip from the community:** Get the cheapest HDMI dummy plug you can find (Amazon, ~$8).
Without it, the Mac Mini may boot into a low-resolution or no-display mode that breaks all screenshot-based automation.

---

## Sources

- [Anthropic Computer Use Docs](https://docs.anthropic.com/en/docs/build-with-claude/computer-use)
- [Simon Willison's Computer Use Analysis](https://simonwillison.net/2024/Oct/22/computer-use/)
- [Benjamin Anderson: Should I Buy Claude a Mac Mini?](https://benanderson.work/blog/claude-mac-mini/)
- [Peekaboo GitHub](https://github.com/steipete/Peekaboo)
- [C/ua GitHub](https://github.com/trycua/cua)
- [Agent S GitHub](https://github.com/simular-ai/Agent-S)
- [macOS-use GitHub](https://github.com/browser-use/macOS-use)
- [mcp-server-macos-use GitHub](https://github.com/mediar-ai/mcp-server-macos-use)
- [macOS Automator MCP GitHub](https://github.com/steipete/macos-automator-mcp)
- [Apple Accessibility Documentation](https://developer.apple.com/library/archive/documentation/LanguagesUtilities/Conceptual/MacAutomationScriptingGuide/AutomatetheUserInterface.html)
- Various Reddit threads (r/macmini, r/Anthropic, r/MacOS, r/LocalLLaMA)

---

*Last updated: February 18, 2026*