clawdbot-workspace/REONOMY-AGENT-BROWSER-PLAN.md


# Reonomy Scraper - AGENT-BROWSER PLAN
**Date**: 2026-01-15
**Status**: Agent-browser confirmed working and ready to use

---
## 🎯 New Approach: Use Agent-Browser for Reonomy Scraper
### Why Agent-Browser Over Puppeteer
| Aspect | Puppeteer | Agent-Browser |
|--------|-----------|---------------|
| **Speed** | Fast (Rust CLI) | ⚡ Faster (Rust CLI + Playwright) |
| **Stability** | Medium (SPA timeouts) | ✅ High (Playwright engine) |
| **Refs** | ❌ No (CSS selectors) | ✅ Yes (deterministic @e1, @e2) |
| **Semantic Locators** | ❌ No | ✅ Yes (role, text, label, placeholder) |
| **State Persistence** | Manual (code changes) | ✅ Built-in (save/load) |
| **Sessions** | ❌ No (single instance) | ✅ Yes (parallel scrapers) |
| **API Compatibility** | ✅ Perfect (Node.js) | ✅ Perfect (Node.js) |
| **Eval Syntax** | Puppeteer `page.evaluate()` | ✅ Simple strings |

**Agent-Browser Wins:**
1. **Refs** — Snapshot once, use refs for all interactions (AI-friendly)
2. **Semantic Locators** — Find by role/text/label without CSS selectors
3. **State Persistence** — Login once, reuse across all scrapes (skip auth)
4. **Sessions** — Run parallel scrapers for different locations
5. **Playwright Engine** — More reliable than Puppeteer for SPAs
---
## 📋 Agent-Browser Workflow for Reonomy
### Step 1: Login (One-Time)
```bash
agent-browser open "https://app.reonomy.com/#!/login"
agent-browser snapshot -i # Get login form refs
agent-browser fill @e1 "henry@realestateenhanced.com"
agent-browser fill @e2 "9082166532"
agent-browser click @e3 # Click login button
agent-browser wait 15000
agent-browser state save "reonomy-auth-state.txt" # Save auth state
```
### Step 2: Load Saved State (Subsequent Runs)
```bash
# Skip login on future runs
agent-browser state load "reonomy-auth-state.txt"
```
### Step 3: Navigate to Search with Filters
```bash
# Use your search ID with phone+email filters
agent-browser open "https://app.reonomy.com/#!/search/504a2d13-d88f-4213-9ac6-a7c8bc7c20c6"
```
### Step 4: Extract Property IDs
```bash
# Get snapshot of search results
agent-browser snapshot -i
# Extract property links from refs
# (Parse JSON output to get all property IDs)
```
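The ID extraction in Step 4 can be sketched in Node.js. The exact snapshot output format hasn't been verified yet, so this assumes property URLs containing `/property/<uuid>` appear somewhere in the snapshot text; confirm against a real snapshot before relying on it:

```javascript
// Sketch: pull unique property IDs out of agent-browser snapshot text.
// Assumes property links of the form .../property/<uuid>/... appear in the output.
function extractPropertyIds(snapshotText) {
  const re = /\/property\/([0-9a-f-]{36})/g;
  const ids = new Set(); // dedupe repeated links to the same property
  let m;
  while ((m = re.exec(snapshotText)) !== null) {
    ids.add(m[1]);
  }
  return [...ids];
}
```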
### Step 5: Process Each Property (Dual-Tab Extraction)
**For each property:**
```bash
# Navigate to ownership page directly
agent-browser open "https://app.reonomy.com/#!/search/504a2d13-d88f-4213-9ac6-a7c8bc7c20c6/property/{property-id}/ownership"
# Wait for page to load
agent-browser wait 8000
# Get snapshot
agent-browser snapshot -i
# Extract from Builder and Lot tab
# (Address, City, State, ZIP, SF, Property Type)
# Wait a moment
agent-browser wait 2000
# Extract from Owner tab
# (Owner Names, Emails using mailto, Phones using your CSS selector)
# Screenshot for debugging
agent-browser screenshot "/tmp/property-{index}.png"
```
### Step 6: Save Results
```bash
# Output to JSON
# (Combine all property data into final JSON)
```
---
## 🎯 Key Selectors
### Email Extraction (Dual Approach)
```javascript
// Mailto links
Array.from(document.querySelectorAll('a[href^="mailto:"]')).map(a => a.href.replace('mailto:', ''))
// Text-based emails
/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g
```
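The two approaches can be combined into one deduplicating helper. A sketch that operates on raw HTML (e.g. returned by an `eval` of `document.body.innerHTML`) so it can run outside the page:

```javascript
// Combines both approaches over raw HTML: mailto hrefs plus visible-text
// emails, lower-cased and deduplicated.
function extractEmails(html) {
  const emails = new Set();
  const mailtoRe = /href="mailto:([^"?]+)/g; // stop at quote or ?subject=
  let m;
  while ((m = mailtoRe.exec(html)) !== null) emails.add(m[1].toLowerCase());
  const textRe = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
  for (const e of html.match(textRe) || []) emails.add(e.toLowerCase());
  return [...emails];
}
```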
### Phone Extraction (Your Provided Selector)
```css
p.MuiTypography-root.jss1797.jss1798.MuiTypography-body2
```
Note: `jss1797`/`jss1798` are build-generated JSS class hashes and can change between deploys. Verify the selector against a live snapshot, and consider falling back to the stable `MuiTypography-body2` class plus a phone-number regex.
### Owner Name Extraction
```javascript
// Text patterns
/Owns\s+(\d+)\s+properties?\s*([A-Z][a-z]+)/i
```
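Applied in Node.js, the pattern above yields the property count and the owner name token. Note the second capture grabs only a single word, so multi-word owner names will be truncated to their first token; treat this as a starting point to refine against real page text:

```javascript
// Scans page text for "Owns N properties <Name>" occurrences.
function extractOwners(text) {
  const re = /Owns\s+(\d+)\s+properties?\s*([A-Z][a-z]+)/gi;
  const owners = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    owners.push({ name: m[2], propertyCount: Number(m[1]) });
  }
  return owners;
}
```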
---
## 💡 Agent-Browser Commands to Implement
1. **Authentication**: `state save`, `state load`
2. **Navigation**: `open <url>`
3. **Snapshot**: `snapshot -i` (get refs)
4. **Extraction**: `eval <js_code>`
5. **Wait**: `wait <ms>` or `wait --text <string>`
6. **Screenshots**: `screenshot <path>`
7. **JSON Output**: `--json` flag for machine-readable output
---
## 📊 Data Structure
```json
{
  "scrapeDate": "2026-01-15",
  "searchId": "504a2d13-d88f-4213-9ac6-a7c8bc7c20c6",
  "properties": [
    {
      "propertyId": "...",
      "propertyUrl": "...",
      "address": "...",
      "city": "...",
      "state": "...",
      "zip": "...",
      "squareFootage": "...",
      "propertyType": "...",
      "ownerNames": ["..."],
      "emails": ["..."],
      "phones": ["..."]
    }
  ]
}
```
---
## 🔍 Verification Steps
Before creating script:
1. **Test agent-browser** with Reonomy login
2. **Snapshot search results** to verify property IDs appear
3. **Snapshot ownership page** to verify DOM structure
4. **Test your CSS selector**: `p.MuiTypography-root.jss1797.jss1798.MuiTypography-body2`
5. **Test email extraction**: Mailto links + text regex
6. **Test owner name extraction**: Regex patterns
---
## 💛 Implementation Questions
1. **Should I create the agent-browser script now?**
- Implement the workflow above
- Add ref-based navigation
- Implement state save/load
- Add dual-tab extraction (Builder and Lot + Owner)
- Use your CSS selector for phones
2. **Or should I wait for your manual verification?**
- You can test agent-browser manually with your search ID
- Share snapshot results so I can see actual DOM structure
- Verify CSS selector works for phones
3. **Any other requirements?**
- Google Sheets export via gog?
- CSV export format?
- Parallel scraping for multiple locations?
---
## 🚀 Benefits of Agent-Browser Approach
| Benefit | Description |
|---------|-------------|
| ✅ **Ref-based navigation** | Snapshot once, use deterministic refs |
| ✅ **State persistence** | Login once, skip auth on future runs |
| ✅ **Semantic locators** | Find by role/text/label, not brittle CSS selectors |
| ✅ **Playwright engine** | More stable than Puppeteer for SPAs |
| ✅ **Rust CLI speed** | Faster command execution |
| ✅ **JSON output** | Machine-readable for parsing |
| ✅ **Parallel sessions** | Run multiple scrapers at once |
---
**Ready to implement when you confirm!** 💛