
Reonomy Scraper - AGENT-BROWSER PLAN

Date: 2026-01-15
Status: Agent-browser confirmed working and ready to use


🎯 New Approach: Use Agent-Browser for Reonomy Scraper

Why Agent-Browser Over Puppeteer

| Aspect | Puppeteer | Agent-Browser |
|---|---|---|
| Speed | Fast (Node.js library) | Faster (Rust CLI + Playwright) |
| Stability | Medium (SPA timeouts) | High (Playwright engine) |
| Refs | No (CSS selectors) | Yes (deterministic `@e1`, `@e2`) |
| Semantic locators | No | Yes (role, text, label, placeholder) |
| State persistence | Manual (code changes) | Built-in (save/load) |
| Sessions | No (single instance) | Yes (parallel scrapers) |
| API compatibility | Perfect (Node.js) | Perfect (Node.js) |
| Eval syntax | `page.evaluate()` | Simple strings |

Agent-Browser Wins:

  1. Refs — Snapshot once, use refs for all interactions (AI-friendly)
  2. Semantic Locators — Find by role/text/label without CSS selectors
  3. State Persistence — Login once, reuse across all scrapes (skip auth)
  4. Sessions — Run parallel scrapers for different locations
  5. Playwright Engine — More reliable than Puppeteer for SPAs

📋 Agent-Browser Workflow for Reonomy

Step 1: Login (One-Time)

```shell
agent-browser open "https://app.reonomy.com/#!/login"
agent-browser snapshot -i  # Get login form refs
agent-browser fill @e1 "henry@realestateenhanced.com"
agent-browser fill @e2 "9082166532"
agent-browser click @e3  # Click login button
agent-browser wait 15000
agent-browser state save "reonomy-auth-state.txt"  # Save auth state
```

Step 2: Load Saved State (Subsequent Runs)

```shell
# Skip login on future runs
agent-browser state load "reonomy-auth-state.txt"
```

Step 3: Navigate to Search with Filters

```shell
# Use your search ID with phone+email filters
agent-browser open "https://app.reonomy.com/#!/search/504a2d13-d88f-4213-9ac6-a7c8bc7c20c6"
```

Step 4: Extract Property IDs

```shell
# Get snapshot of search results
agent-browser snapshot -i

# Extract property links from refs
# (Parse JSON output to get all property IDs)
```
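The exact shape of the snapshot output is not pinned down yet, so here is a minimal sketch that pulls property IDs out of the raw snapshot text, assuming property links contain `/property/{id}` as in the ownership URLs used in Step 5. The `[\w-]+` ID character set is a guess until verified against real output:

```javascript
// Pull unique property IDs out of raw snapshot output.
// Assumes links look like .../property/<id>/ownership (see Step 5);
// the ID character set ([\w-]+) is a guess until verified.
function extractPropertyIds(snapshotText) {
  const ids = new Set();
  const re = /\/property\/([\w-]+)/g;
  let m;
  while ((m = re.exec(snapshotText)) !== null) {
    ids.add(m[1]);
  }
  return [...ids];
}
```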

Step 5: Process Each Property (Dual-Tab Extraction)

For each property:

```shell
# Navigate to ownership page directly
agent-browser open "https://app.reonomy.com/#!/search/504a2d13-d88f-4213-9ac6-a7c8bc7c20c6/property/{property-id}/ownership"

# Wait for page to load
agent-browser wait 8000

# Get snapshot
agent-browser snapshot -i

# Extract from Builder and Lot tab
# (Address, City, State, ZIP, SF, Property Type)

# Wait a moment
agent-browser wait 2000

# Extract from Owner tab
# (Owner Names, Emails using mailto, Phones using your CSS selector)

# Screenshot for debugging
agent-browser screenshot "/tmp/property-{index}.png"
```

Step 6: Save Results

```shell
# Output to JSON
# (Combine all property data into final JSON)
```

🎯 Key Selectors

Email Extraction (Dual Approach)

```js
// Mailto links
Array.from(document.querySelectorAll('a[href^="mailto:"]')).map(a => a.href.replace('mailto:', ''))

// Text-based emails
/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g
```
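The two approaches can be merged and de-duplicated. Since this may run over raw snapshot text rather than inside the page, the mailto pass below is a regex over the HTML string — an assumption; in-page, the `querySelectorAll` form above is preferable:

```javascript
// Merge mailto-derived and text-derived emails, de-duplicated, lowercased.
function extractEmails(html) {
  const emails = new Set();
  // Mailto hrefs (regex stand-in for querySelectorAll outside the page).
  for (const m of html.matchAll(/href="mailto:([^"?]+)/g)) {
    emails.add(m[1].toLowerCase());
  }
  // Text-based emails (pattern from above).
  for (const m of html.matchAll(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g)) {
    emails.add(m[0].toLowerCase());
  }
  return [...emails];
}
```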

Phone Extraction (Your Provided Selector)

```css
p.MuiTypography-root.jss1797.jss1798.MuiTypography-body2
```
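The `jss1797`/`jss1798` classes are generated at runtime and can change between builds, so a text-pattern fallback is worth keeping alongside the selector. The US-format regex here is an assumption about how phone numbers render on the page:

```javascript
// Fallback phone extraction by US number pattern, in case the
// generated jss#### class names change between sessions.
function extractPhones(text) {
  const re = /\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/g;
  return [...new Set(text.match(re) || [])];
}
```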

Owner Name Extraction

```js
// Text pattern ("propert(?:y|ies)" also matches the singular "property")
/Owns\s+(\d+)\s+propert(?:y|ies)\s*([A-Z][a-z]+)/i
```
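Applied to snapshot text, owner extraction can be wrapped into a small function; the pattern variant here accepts both "property" and "properties". The assumption that the owner name directly follows the count is untested until verified against a real snapshot:

```javascript
// Parse "Owns N properties <Name>" patterns from snapshot text.
// Assumes the owner name immediately follows the property count.
function extractOwners(text) {
  const re = /Owns\s+(\d+)\s+propert(?:y|ies)\s*([A-Z][a-z]+)/gi;
  return [...text.matchAll(re)].map((m) => ({
    count: Number(m[1]),
    name: m[2],
  }));
}
```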

💡 Agent-Browser Commands to Implement

  1. Authentication: state save, state load
  2. Navigation: open <url>
  3. Snapshot: snapshot -i (get refs)
  4. Extraction: eval <js_code>
  5. Wait: wait <ms> or wait --text <string>
  6. Screenshots: screenshot <path>
  7. JSON Output: --json flag for machine-readable output

📊 Data Structure

```json
{
  "scrapeDate": "2026-01-15",
  "searchId": "504a2d13-d88f-4213-9ac6-a7c8bc7c20c6",
  "properties": [
    {
      "propertyId": "...",
      "propertyUrl": "...",
      "address": "...",
      "city": "...",
      "state": "...",
      "zip": "...",
      "squareFootage": "...",
      "propertyType": "...",
      "ownerNames": ["..."],
      "emails": ["..."],
      "phones": ["..."]
    }
  ]
}
```

🔍 Verification Steps

Before creating script:

  1. Test agent-browser with Reonomy login
  2. Snapshot search results to verify property IDs appear
  3. Snapshot ownership page to verify DOM structure
  4. Test your CSS selector: `p.MuiTypography-root.jss1797.jss1798.MuiTypography-body2`
  5. Test email extraction: Mailto links + text regex
  6. Test owner name extraction: Regex patterns

💛 Implementation Questions

  1. Should I create the agent-browser script now?

    • Implement the workflow above
    • Add ref-based navigation
    • Implement state save/load
    • Add dual-tab extraction (Builder and Lot + Owner)
    • Use your CSS selector for phones
  2. Or should I wait for your manual verification?

    • You can test agent-browser manually with your search ID
    • Share snapshot results so I can see actual DOM structure
    • Verify CSS selector works for phones
  3. Any other requirements?

    • Google Sheets export via gog?
    • CSV export format?
    • Parallel scraping for multiple locations?

🚀 Benefits of Agent-Browser Approach

| Benefit | Description |
|---|---|
| Ref-based navigation | Snapshot once, use deterministic refs |
| State persistence | Login once, skip auth on future runs |
| Semantic locators | Find by role/text/label, not brittle CSS selectors |
| Playwright engine | More stable than Puppeteer for SPAs |
| Rust CLI speed | Faster command execution |
| JSON output | Machine-readable for parsing |
| Parallel sessions | Run multiple scrapers at once |

Ready to implement when you confirm! 💛