# Reonomy Scraper - AGENT-BROWSER PLAN

**Date**: 2026-01-15
**Status**: Agent-browser confirmed working and ready to use

---

## 🎯 New Approach: Use Agent-Browser for Reonomy Scraper

### Why Agent-Browser Over Puppeteer

| Aspect | Puppeteer | Agent-Browser |
|--------|-----------|---------------|
| **Speed** | Fast (Rust CLI) | ⚡ Faster (Rust CLI + Playwright) |
| **Stability** | Medium (SPA timeouts) | ✅ High (Playwright engine) |
| **Refs** | ❌ No (CSS selectors) | ✅ Yes (deterministic @e1, @e2) |
| **Semantic Locators** | ❌ No | ✅ Yes (role, text, label, placeholder) |
| **State Persistence** | Manual (code changes) | ✅ Built-in (save/load) |
| **Sessions** | ❌ No (single instance) | ✅ Yes (parallel scrapers) |
| **API Compatibility** | ✅ Perfect (Node.js) | ✅ Perfect (Node.js) |
| **Eval Syntax** | Puppeteer `page.evaluate()` | ✅ Simple strings |

**Agent-Browser Wins:**

1. **Refs**: Snapshot once, use refs for all interactions (AI-friendly)
2. **Semantic Locators**: Find by role/text/label without CSS selectors
3. **State Persistence**: Login once, reuse across all scrapes (skip auth)
4. **Sessions**: Run parallel scrapers for different locations
5. **Playwright Engine**: More reliable than Puppeteer for SPAs

---

## 📋 Agent-Browser Workflow for Reonomy

### Step 1: Login (One-Time)

```bash
agent-browser open "https://app.reonomy.com/#!/login"
agent-browser snapshot -i                          # Get login form refs
agent-browser fill @e1 "henry@realestateenhanced.com"
agent-browser fill @e2 "9082166532"
agent-browser click @e3                            # Click login button
agent-browser wait 15000
agent-browser state save "reonomy-auth-state.txt"  # Save auth state
```

### Step 2: Load Saved State (Subsequent Runs)

```bash
# Skip login on future runs
agent-browser state load "reonomy-auth-state.txt"
```

### Step 3: Navigate to Search with Filters

```bash
# Use your search ID with phone+email filters
agent-browser open "https://app.reonomy.com/#!/search/504a2d13-d88f-4213-9ac6-a7c8bc7c20c6"
```

### Step 4: Extract Property IDs

```bash
# Get snapshot of search results
agent-browser snapshot -i

# Extract property links from refs
# (Parse JSON output to get all property IDs)
```

### Step 5: Process Each Property (Dual-Tab Extraction)

**For each property:**

```bash
# Navigate to ownership page directly
agent-browser open "https://app.reonomy.com/#!/search/504a2d13-d88f-4213-9ac6-a7c8bc7c20c6/property/{property-id}/ownership"

# Wait for page to load
agent-browser wait 8000

# Get snapshot
agent-browser snapshot -i

# Extract from Builder and Lot tab
# (Address, City, State, ZIP, SF, Property Type)

# Wait a moment
agent-browser wait 2000

# Extract from Owner tab
# (Owner Names, Emails using mailto, Phones using your CSS selector)

# Screenshot for debugging
agent-browser screenshot "/tmp/property-{index}.png"
```

### Step 6: Save Results

```bash
# Output to JSON
# (Combine all property data into final JSON)
```

---

## 🎯 Key Selectors

### Email Extraction (Dual Approach)

```javascript
// Mailto links
Array.from(document.querySelectorAll('a[href^="mailto:"]')).map(a => a.href.replace('mailto:', ''))

// Text-based emails
/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g
```
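The two email approaches above can be combined into one deduplicating merge step. A minimal sketch, assuming the DOM query is done separately (e.g. via `eval`) and its results are passed in; `mergeEmails` is a hypothetical helper name, not part of any existing script:

```javascript
// Merge mailto-derived and text-derived emails, lowercased and deduplicated.
function mergeEmails(mailtoHrefs, bodyText) {
  const fromMailto = mailtoHrefs
    .filter(h => h.startsWith('mailto:'))
    .map(h => h.replace('mailto:', '').split('?')[0]); // strip ?subject=... params
  const fromText = bodyText.match(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g) || [];
  return [...new Set([...fromMailto, ...fromText].map(e => e.toLowerCase()))];
}

// In-page usage (sketch):
// mergeEmails(
//   Array.from(document.querySelectorAll('a[href^="mailto:"]')).map(a => a.getAttribute('href')),
//   document.body.innerText
// )
```

Stripping `?subject=...` parameters and lowercasing before deduplication avoids counting the same address twice when it appears both as a link and as visible text.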
### Phone Extraction (Your Provided Selector)

```css
p.MuiTypography-root.jss1797.jss1798.MuiTypography-body2
```

### Owner Name Extraction

```javascript
// Text patterns
/Owns\s+(\d+)\s+properties?\s*([A-Z][a-z]+)/i
```

---

## 💡 Agent-Browser Commands to Implement

1. **Authentication**: `state save`, `state load`
2. **Navigation**: `open <url>`
3. **Snapshot**: `snapshot -i` (get refs)
4. **Extraction**: `eval <js>`
5. **Wait**: `wait <ms>` or `wait --text <text>`
6. **Screenshots**: `screenshot <path>`
7. **JSON Output**: `--json` flag for machine-readable output

---

## 📊 Data Structure

```json
{
  "scrapeDate": "2026-01-15",
  "searchId": "504a2d13-d88f-4213-9ac6-a7c8bc7c20c6",
  "properties": [
    {
      "propertyId": "...",
      "propertyUrl": "...",
      "address": "...",
      "city": "...",
      "state": "...",
      "zip": "...",
      "squareFootage": "...",
      "propertyType": "...",
      "ownerNames": ["..."],
      "emails": ["..."],
      "phones": ["..."]
    }
  ]
}
```

---

## 🔍 Verification Steps

Before creating the script:

1. **Test agent-browser** with the Reonomy login
2. **Snapshot search results** to verify property IDs appear
3. **Snapshot ownership page** to verify DOM structure
4. **Test your CSS selector**: `p.MuiTypography-root.jss1797.jss1798.MuiTypography-body2`
5. **Test email extraction**: Mailto links + text regex
6. **Test owner name extraction**: Regex patterns

---

## 💛 Implementation Questions

1. **Should I create the agent-browser script now?**
   - Implement the workflow above
   - Add ref-based navigation
   - Implement state save/load
   - Add dual-tab extraction (Builder and Lot + Owner)
   - Use your CSS selector for phones
2. **Or should I wait for your manual verification?**
   - You can test agent-browser manually with your search ID
   - Share snapshot results so I can see the actual DOM structure
   - Verify the CSS selector works for phones
3. **Any other requirements?**
   - Google Sheets export via gog?
   - CSV export format?
   - Parallel scraping for multiple locations?
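The regex checks in verification steps 5 and 6 can be exercised offline before touching the live site. A smoke-test sketch; the sample string below is invented for illustration, not real Reonomy markup:

```javascript
// Offline smoke test for the extraction regexes against an invented sample.
const ownerPattern = /Owns\s+(\d+)\s+properties?\s*([A-Z][a-z]+)/i;
const emailPattern = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;

const sample = 'Owns 12 properties Smith Holdings LLC contact@smithholdings.com';

const ownerMatch = sample.match(ownerPattern);
console.log(ownerMatch[1], ownerMatch[2]);  // property count, first name token
console.log(sample.match(emailPattern));    // every email-looking string
```

Note that the owner pattern only captures the first capitalized word after "properties", so multi-word owner names will need a follow-up pass on the live DOM structure.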
---

## 🚀 Benefits of Agent-Browser Approach

| Benefit | Description |
|---------|-------------|
| ✅ **Ref-based navigation** | Snapshot once, use deterministic refs |
| ✅ **State persistence** | Login once, skip auth on future runs |
| ✅ **Semantic locators** | Find by role/text/label, not brittle CSS selectors |
| ✅ **Playwright engine** | More stable than Puppeteer for SPAs |
| ✅ **Rust CLI speed** | Faster command execution |
| ✅ **JSON output** | Machine-readable for parsing |
| ✅ **Parallel sessions** | Run multiple scrapers at once |

---

**Ready to implement when you confirm!** 💛