Reonomy Scraper - AGENT-BROWSER PLAN
Date: 2026-01-15
Status: Agent-browser confirmed working and ready to use
🎯 New Approach: Use Agent-Browser for Reonomy Scraper
Why Agent-Browser Over Puppeteer
| Aspect | Puppeteer | Agent-Browser |
|---|---|---|
| Speed | Fast (Node.js) | ⚡ Faster (Rust CLI + Playwright) |
| Stability | Medium (SPA timeouts) | ✅ High (Playwright engine) |
| Refs | ❌ No (CSS selectors) | ✅ Yes (deterministic @e1, @e2) |
| Semantic Locators | ❌ No | ✅ Yes (role, text, label, placeholder) |
| State Persistence | Manual (code changes) | ✅ Built-in (save/load) |
| Sessions | ❌ No (single instance) | ✅ Yes (parallel scrapers) |
| API Compatibility | ✅ Perfect (Node.js) | ✅ Perfect (Node.js) |
| Eval Syntax | `page.evaluate()` callbacks | ✅ Simple strings |
Agent-Browser Wins:
- Refs — Snapshot once, use refs for all interactions (AI-friendly)
- Semantic Locators — Find by role/text/label without CSS selectors
- State Persistence — Login once, reuse across all scrapes (skip auth)
- Sessions — Run parallel scrapers for different locations
- Playwright Engine — More reliable than Puppeteer for SPAs
📋 Agent-Browser Workflow for Reonomy
Step 1: Login (One-Time)
```shell
agent-browser open "https://app.reonomy.com/#!/login"
agent-browser snapshot -i                              # Get login form refs
agent-browser fill @e1 "henry@realestateenhanced.com"
agent-browser fill @e2 "9082166532"
agent-browser click @e3                                # Click login button
agent-browser wait 15000
agent-browser state save "reonomy-auth-state.txt"      # Save auth state
```
Step 2: Load Saved State (Subsequent Runs)
```shell
# Skip login on future runs
agent-browser state load "reonomy-auth-state.txt"
```
Step 3: Navigate to Search with Filters
```shell
# Use your search ID with phone+email filters
agent-browser open "https://app.reonomy.com/#!/search/504a2d13-d88f-4213-9ac6-a7c8bc7c20c6"
```
Step 4: Extract Property IDs
```shell
# Get snapshot of search results
agent-browser snapshot -i

# Extract property links from refs
# (Parse JSON output to get all property IDs)
```
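The parsing step could be sketched in Node.js. The exact shape of the `snapshot -i` output is an assumption here; this simply pulls anything that looks like a Reonomy property URL out of the raw snapshot text, so adjust the pattern once a real snapshot has been inspected:

```javascript
// Sketch: extract unique property IDs from agent-browser snapshot output.
// ASSUMPTION: the snapshot text contains property links of the form
// ".../property/<id>/..." — verify against actual output before relying on this.
function extractPropertyIds(snapshotText) {
  const ids = new Set(); // Set de-duplicates repeated links
  const pattern = /\/property\/([a-f0-9-]+)/gi;
  let match;
  while ((match = pattern.exec(snapshotText)) !== null) {
    ids.add(match[1]);
  }
  return [...ids];
}
```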
Step 5: Process Each Property (Dual-Tab Extraction)
For each property:
```shell
# Navigate to ownership page directly
agent-browser open "https://app.reonomy.com/#!/search/504a2d13-d88f-4213-9ac6-a7c8bc7c20c6/property/{property-id}/ownership"

# Wait for page to load
agent-browser wait 8000

# Get snapshot
agent-browser snapshot -i

# Extract from Builder and Lot tab
# (Address, City, State, ZIP, SF, Property Type)

# Wait a moment
agent-browser wait 2000

# Extract from Owner tab
# (Owner Names, Emails using mailto, Phones using your CSS selector)

# Screenshot for debugging
agent-browser screenshot "/tmp/property-{index}.png"
```
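The per-property sequence above can be sketched as a small Node.js helper that builds the command list (a hypothetical `buildPropertyCommands`; each line would then be run via `child_process.execSync` or similar):

```javascript
// Sketch: build the agent-browser command sequence for one property.
// The search ID and /tmp screenshot path mirror the examples in this plan;
// wait times are the ones used in the workflow above.
function buildPropertyCommands(searchId, propertyId, index) {
  const base = `https://app.reonomy.com/#!/search/${searchId}`;
  return [
    `agent-browser open "${base}/property/${propertyId}/ownership"`,
    'agent-browser wait 8000',     // let the SPA render
    'agent-browser snapshot -i',   // capture refs for extraction
    'agent-browser wait 2000',     // pause before Owner tab extraction
    `agent-browser screenshot "/tmp/property-${index}.png"`,
  ];
}
```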
Step 6: Save Results
```shell
# Output to JSON
# (Combine all property data into final JSON)
```
🎯 Key Selectors
Email Extraction (Dual Approach)
```javascript
// Mailto links
Array.from(document.querySelectorAll('a[href^="mailto:"]')).map(a => a.href.replace('mailto:', ''))

// Text-based emails
/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g
```
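A sketch combining both approaches in one Node.js function. It works on a raw HTML string (e.g. the result of an `eval` returning `document.documentElement.outerHTML`), so it needs no DOM, and it de-duplicates case-insensitively:

```javascript
// Sketch: extract emails from raw HTML via mailto hrefs plus text regex.
function extractEmails(html) {
  const emails = new Set();
  // 1) mailto: hrefs (strip any ?subject=... query)
  for (const m of html.matchAll(/href=["']mailto:([^"'?]+)/gi)) {
    emails.add(m[1].toLowerCase());
  }
  // 2) plain-text emails anywhere in the markup
  for (const m of html.matchAll(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g)) {
    emails.add(m[0].toLowerCase());
  }
  return [...emails];
}
```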
Phone Extraction (Your Provided Selector)
```css
p.MuiTypography-root.jss1797.jss1798.MuiTypography-body2
```
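Because `jss1797`/`jss1798` are generated class names that can change between builds, a regex fallback on the page text may be worth keeping around. This is an illustrative sketch, not part of the confirmed workflow:

```javascript
// Sketch: fallback phone extraction via US phone-number patterns,
// in case the generated MUI/JSS class names change between deploys.
function extractPhones(text) {
  const phones = new Set();
  for (const m of text.matchAll(/\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/g)) {
    phones.add(m[0].replace(/\D/g, '')); // normalize to digits only
  }
  return [...phones];
}
```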
Owner Name Extraction
```javascript
// Text patterns
/Owns\s+(\d+)\s+properties?\s*([A-Z][a-z]+)/i
```
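Applied in Node.js, the pattern could look like this. Note this sketch drops the `/i` flag (it defeats the capitalized-name check) and extends the name group to multi-word names; both changes are my assumptions, to be verified against real page text:

```javascript
// Sketch: pull the property count and the owner name that follows
// "Owns N properties". Multi-word names are an assumed extension.
function extractOwner(text) {
  const m = text.match(/Owns\s+(\d+)\s+properties?\s*([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)/);
  return m ? { count: Number(m[1]), name: m[2] } : null;
}
```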
💡 Agent-Browser Commands to Implement
- Authentication: `state save`, `state load`
- Navigation: `open <url>`
- Snapshot: `snapshot -i` (get refs)
- Extraction: `eval <js_code>`
- Wait: `wait <ms>` or `wait --text <string>`
- Screenshots: `screenshot <path>`
- JSON Output: `--json` flag for machine-readable output
📊 Data Structure
```json
{
  "scrapeDate": "2026-01-15",
  "searchId": "504a2d13-d88f-4213-9ac6-a7c8bc7c20c6",
  "properties": [
    {
      "propertyId": "...",
      "propertyUrl": "...",
      "address": "...",
      "city": "...",
      "state": "...",
      "zip": "...",
      "squareFootage": "...",
      "propertyType": "...",
      "ownerNames": ["..."],
      "emails": ["..."],
      "phones": ["..."]
    }
  ]
}
```
🔍 Verification Steps
Before creating script:
- Test agent-browser with Reonomy login
- Snapshot search results to verify property IDs appear
- Snapshot ownership page to verify DOM structure
- Test your CSS selector: `p.MuiTypography-root.jss1797.jss1798.MuiTypography-body2`
- Test email extraction: mailto links + text regex
- Test owner name extraction: Regex patterns
💛 Implementation Questions
1. Should I create the agent-browser script now?
   - Implement the workflow above
   - Add ref-based navigation
   - Implement state save/load
   - Add dual-tab extraction (Builder and Lot + Owner)
   - Use your CSS selector for phones
2. Or should I wait for your manual verification?
   - You can test agent-browser manually with your search ID
   - Share snapshot results so I can see actual DOM structure
   - Verify CSS selector works for phones
3. Any other requirements?
   - Google Sheets export via gog?
   - CSV export format?
   - Parallel scraping for multiple locations?
🚀 Benefits of Agent-Browser Approach
| Benefit | Description |
|---|---|
| ✅ Ref-based navigation | Snapshot once, use deterministic refs |
| ✅ State persistence | Login once, skip auth on future runs |
| ✅ Semantic locators | Find by role/text/label, not brittle CSS selectors |
| ✅ Playwright engine | More stable than Puppeteer for SPAs |
| ✅ Rust CLI speed | Faster command execution |
| ✅ JSON output | Machine-readable for parsing |
| ✅ Parallel sessions | Run multiple scrapers at once |
Ready to implement when you confirm! 💛