Reonomy Scraper - AGENT-BROWSER PLAN
Date: 2026-01-15
Status: Agent-browser confirmed working and ready to use
🎯 New Approach: Use Agent-Browser for Reonomy Scraper
Why Agent-Browser Over Puppeteer
| Aspect | Puppeteer | Agent-Browser |
|---|---|---|
| Speed | Fast (Node.js) | ⚡ Faster (Rust CLI + Playwright) |
| Stability | Medium (SPA timeouts) | ✅ High (Playwright engine) |
| Refs | ❌ No (CSS selectors) | ✅ Yes (deterministic @e1, @e2) |
| Semantic Locators | ❌ No | ✅ Yes (role, text, label, placeholder) |
| State Persistence | Manual (code changes) | ✅ Built-in (save/load) |
| Sessions | ❌ No (single instance) | ✅ Yes (parallel scrapers) |
| API Compatibility | ✅ Perfect (Node.js) | ✅ Perfect (Node.js) |
| Eval Syntax | `page.evaluate()` callbacks | ✅ Simple strings |
Agent-Browser Wins:
- Refs — Snapshot once, use refs for all interactions (AI-friendly)
- Semantic Locators — Find by role/text/label without CSS selectors
- State Persistence — Login once, reuse across all scrapes (skip auth)
- Sessions — Run parallel scrapers for different locations
- Playwright Engine — More reliable than Puppeteer for SPAs
📋 Agent-Browser Workflow for Reonomy
Step 1: Login (One-Time)
```shell
agent-browser open "https://app.reonomy.com/#!/login"
agent-browser snapshot -i                              # Get login form refs
agent-browser fill @e1 "henry@realestateenhanced.com"
agent-browser fill @e2 "9082166532"
agent-browser click @e3                                # Click login button
agent-browser wait 15000
agent-browser state save "reonomy-auth-state.txt"      # Save auth state
```
Step 2: Load Saved State (Subsequent Runs)
```shell
# Skip login on future runs
agent-browser state load "reonomy-auth-state.txt"
```
Step 3: Navigate to Search with Filters
```shell
# Use your search ID with phone+email filters
agent-browser open "https://app.reonomy.com/#!/search/504a2d13-d88f-4213-9ac6-a7c8bc7c20c6"
```
Step 4: Extract Property IDs
```shell
# Get snapshot of search results
agent-browser snapshot -i

# Extract property links from refs
# (Parse JSON output to get all property IDs)
```
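The parsing step could be sketched in Node.js. The exact shape of the `snapshot -i` output is an assumption here; this simply pulls anything that looks like a Reonomy property URL out of the raw snapshot text, so adjust the pattern once a real snapshot has been inspected:

```javascript
// Sketch: extract unique property IDs from agent-browser snapshot output.
// ASSUMPTION: the snapshot text contains property links of the form
// ".../property/<id>/..." — verify against actual output before relying on this.
function extractPropertyIds(snapshotText) {
  const ids = new Set(); // Set de-duplicates repeated links
  const pattern = /\/property\/([a-f0-9-]+)/gi;
  let match;
  while ((match = pattern.exec(snapshotText)) !== null) {
    ids.add(match[1]);
  }
  return [...ids];
}
```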
Step 5: Process Each Property (Dual-Tab Extraction)
For each property:
```shell
# Navigate to ownership page directly
agent-browser open "https://app.reonomy.com/#!/search/504a2d13-d88f-4213-9ac6-a7c8bc7c20c6/property/{property-id}/ownership"

# Wait for page to load
agent-browser wait 8000

# Get snapshot
agent-browser snapshot -i

# Extract from Builder and Lot tab
# (Address, City, State, ZIP, SF, Property Type)

# Wait a moment
agent-browser wait 2000

# Extract from Owner tab
# (Owner Names, Emails using mailto, Phones using your CSS selector)

# Screenshot for debugging
agent-browser screenshot "/tmp/property-{index}.png"
```
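The per-property sequence above can be sketched as a small Node.js helper that builds the command list (a hypothetical `buildPropertyCommands`; each line would then be run via `child_process.execSync` or similar):

```javascript
// Sketch: build the agent-browser command sequence for one property.
// The search ID and /tmp screenshot path mirror the examples in this plan;
// wait times are the ones used in the workflow above.
function buildPropertyCommands(searchId, propertyId, index) {
  const base = `https://app.reonomy.com/#!/search/${searchId}`;
  return [
    `agent-browser open "${base}/property/${propertyId}/ownership"`,
    'agent-browser wait 8000',     // let the SPA render
    'agent-browser snapshot -i',   // capture refs for extraction
    'agent-browser wait 2000',     // pause before Owner tab extraction
    `agent-browser screenshot "/tmp/property-${index}.png"`,
  ];
}
```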
Step 6: Save Results
```shell
# Output to JSON
# (Combine all property data into final JSON)
```
🎯 Key Selectors
Email Extraction (Dual Approach)
```javascript
// Mailto links
Array.from(document.querySelectorAll('a[href^="mailto:"]')).map(a => a.href.replace('mailto:', ''))

// Text-based emails
/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g
```
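A sketch combining both approaches in one Node.js function. It works on a raw HTML string (e.g. the result of an `eval` returning `document.documentElement.outerHTML`), so it needs no DOM, and it de-duplicates case-insensitively:

```javascript
// Sketch: extract emails from raw HTML via mailto hrefs plus text regex.
function extractEmails(html) {
  const emails = new Set();
  // 1) mailto: hrefs (strip any ?subject=... query)
  for (const m of html.matchAll(/href=["']mailto:([^"'?]+)/gi)) {
    emails.add(m[1].toLowerCase());
  }
  // 2) plain-text emails anywhere in the markup
  for (const m of html.matchAll(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g)) {
    emails.add(m[0].toLowerCase());
  }
  return [...emails];
}
```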
Phone Extraction (Your Provided Selector)
```css
p.MuiTypography-root.jss1797.jss1798.MuiTypography-body2
```
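Because `jss1797`/`jss1798` are generated class names that can change between builds, a regex fallback on the page text may be worth keeping around. This is an illustrative sketch, not part of the confirmed workflow:

```javascript
// Sketch: fallback phone extraction via US phone-number patterns,
// in case the generated MUI/JSS class names change between deploys.
function extractPhones(text) {
  const phones = new Set();
  for (const m of text.matchAll(/\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/g)) {
    phones.add(m[0].replace(/\D/g, '')); // normalize to digits only
  }
  return [...phones];
}
```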
Owner Name Extraction
```javascript
// Text patterns
/Owns\s+(\d+)\s+properties?\s*([A-Z][a-z]+)/i
```
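Applied in Node.js, the pattern could look like this. Note this sketch drops the `/i` flag (it defeats the capitalized-name check) and extends the name group to multi-word names; both changes are my assumptions, to be verified against real page text:

```javascript
// Sketch: pull the property count and the owner name that follows
// "Owns N properties". Multi-word names are an assumed extension.
function extractOwner(text) {
  const m = text.match(/Owns\s+(\d+)\s+properties?\s*([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)/);
  return m ? { count: Number(m[1]), name: m[2] } : null;
}
```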
💡 Agent-Browser Commands to Implement
- Authentication: `state save`, `state load`
- Navigation: `open <url>`
- Snapshot: `snapshot -i` (get refs)
- Extraction: `eval <js_code>`
- Wait: `wait <ms>` or `wait --text <string>`
- Screenshots: `screenshot <path>`
- JSON Output: `--json` flag for machine-readable output
📊 Data Structure
```json
{
  "scrapeDate": "2026-01-15",
  "searchId": "504a2d13-d88f-4213-9ac6-a7c8bc7c20c6",
  "properties": [
    {
      "propertyId": "...",
      "propertyUrl": "...",
      "address": "...",
      "city": "...",
      "state": "...",
      "zip": "...",
      "squareFootage": "...",
      "propertyType": "...",
      "ownerNames": ["..."],
      "emails": ["..."],
      "phones": ["..."]
    }
  ]
}
```
🔍 Verification Steps
Before creating script:
- Test agent-browser with Reonomy login
- Snapshot search results to verify property IDs appear
- Snapshot ownership page to verify DOM structure
- Test your CSS selector: `p.MuiTypography-root.jss1797.jss1798.MuiTypography-body2`
- Test email extraction: mailto links + text regex
- Test owner name extraction: Regex patterns
💛 Implementation Questions
1. Should I create the agent-browser script now?
   - Implement the workflow above
   - Add ref-based navigation
   - Implement state save/load
   - Add dual-tab extraction (Builder and Lot + Owner)
   - Use your CSS selector for phones
2. Or should I wait for your manual verification?
   - You can test agent-browser manually with your search ID
   - Share snapshot results so I can see actual DOM structure
   - Verify CSS selector works for phones
3. Any other requirements?
   - Google Sheets export via gog?
   - CSV export format?
   - Parallel scraping for multiple locations?
🚀 Benefits of Agent-Browser Approach
| Benefit | Description |
|---|---|
| ✅ Ref-based navigation | Snapshot once, use deterministic refs |
| ✅ State persistence | Login once, skip auth on future runs |
| ✅ Semantic locators | Find by role/text/label, not brittle CSS selectors |
| ✅ Playwright engine | More stable than Puppeteer for SPAs |
| ✅ Rust CLI speed | Faster command execution |
| ✅ JSON output | Machine-readable for parsing |
| ✅ Parallel sessions | Run multiple scrapers at once |
Ready to implement when you confirm! 💛