Jake Shore df4aa799f8 Daily backup: 2026-01-24 - Workspace files including Discord bot automation research, Reonomy scraper versions, backup scripts, and project config

2026-01-24 05:09:55 -05:00

Reonomy Scraper - Complete Analysis & Memory

Last Updated: 2026-01-13 19:43Z

🎯 Critical URL Pattern Discovery

✅ Working URL Patterns

# Search Page (property list)
https://app.reonomy.com/#!/search/{search-id}

# Property Page (with tabs)
https://app.reonomy.com/#!/property/{property-id}

# Ownership Page (WITH CONTACT INFO) ← KEY!
https://app.reonomy.com/#!/search/{search-id}/property/{property-id}/ownership

Key Insight: Must use /ownership suffix to get emails/phones. Direct property pages don't show contact info.
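
As a minimal sketch of the three patterns above, the URLs can be assembled from a search ID and a property ID. The helper names here are hypothetical and not taken from any scraper version; only the URL shapes come from these notes.

```javascript
// Base of Reonomy's hash-bang routes, taken from the patterns above.
const REONOMY_BASE = 'https://app.reonomy.com/#!';

// Hypothetical helper names; only the URL shapes come from these notes.
function searchUrl(searchId) {
  return `${REONOMY_BASE}/search/${encodeURIComponent(searchId)}`;
}

function propertyUrl(propertyId) {
  return `${REONOMY_BASE}/property/${encodeURIComponent(propertyId)}`;
}

// The ownership page needs BOTH IDs; a bare property URL has no contact info.
function ownershipUrl(searchId, propertyId) {
  if (!searchId || !propertyId) {
    throw new Error('ownershipUrl needs both a search ID and a property ID');
  }
  return `${searchUrl(searchId)}/property/${encodeURIComponent(propertyId)}/ownership`;
}

console.log(ownershipUrl('SEARCH_ID', 'PROPERTY_ID'));
// prints "https://app.reonomy.com/#!/search/SEARCH_ID/property/PROPERTY_ID/ownership"
```

Throwing on a missing ID makes the "forgot the search ID" mistake from the early versions fail loudly instead of silently loading a page without contact info.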

📊 DOM Structure & Contact Selectors

Page Layout

Left Panel: Map view
Right Panel: Property cards (scrollable list)
Property Details Page: 3 tabs
1. Owner (RIGHT side, default tab) ← Contains contact info
2. Building and Lot (property details)
3. Occupants (tenant info)

Contact Info Extraction (PROVEN WORKING)

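// Emails (from manually tested property)
const emails = [];
document.querySelectorAll('a[href^="mailto:"]').forEach(a => {
  // Strip the scheme and any "?subject=..." query string
  const email = a.href.replace(/^mailto:/i, '').split('?')[0].trim();
  if (email.length > 5 && !emails.includes(email)) {
    emails.push(email); // Found email!
  }
});

// Phones (from manually tested property)
const phones = [];
document.querySelectorAll('a[href^="tel:"]').forEach(a => {
  const phone = a.href.replace(/^tel:/i, '').trim();
  if (phone.length > 7 && !phones.includes(phone)) {
    phones.push(phone); // Found phone!
  }
});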
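
To make the same filtering testable outside the browser, the logic can be moved into a function that only sees href strings. This is a sketch under assumptions: `contactsFromHrefs` is a hypothetical name, and lowercasing emails for de-duplication is a choice made here, not something the scraper is known to do.

```javascript
// Hypothetical helper: turns raw href strings into clean, de-duplicated contacts.
function contactsFromHrefs(hrefs) {
  const emails = new Set();
  const phones = new Set();
  for (const href of hrefs) {
    // Drop any "?subject=..." style query string before inspecting the value.
    const [target] = href.split('?');
    if (/^mailto:/i.test(target)) {
      const email = target.slice('mailto:'.length).trim().toLowerCase();
      if (email.length > 5) emails.add(email); // same length check as above
    } else if (/^tel:/i.test(target)) {
      const phone = target.slice('tel:'.length).trim();
      if (phone.length > 7) phones.add(phone); // same length check as above
    }
  }
  return { emails: [...emails], phones: [...phones] };
}

// In the page, the hrefs could be collected like this and passed in:
// const hrefs = [...document.querySelectorAll('a[href^="mailto:"], a[href^="tel:"]')]
//   .map(a => a.getAttribute('href'));
```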

Property Address Extraction

// From h1-h6 heading
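const heading = document.querySelector('h1, h2, h3, h4, h5, h6');
// querySelector returns null if no heading has rendered yet, so guard before reading it
const address = heading ? heading.textContent.trim() : null;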
// Format: "123 main st, city, ST 12345"
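
Since the output also lists city, state, and zip separately, the heading text has to be split somewhere. The following is one possible sketch, assuming the heading always follows the "street, city, ST 12345" format noted above; `splitAddress` is a hypothetical name, and headings in any other shape return null.

```javascript
// Hypothetical helper: splits "123 main st, city, ST 12345" into its parts.
// Returns null when the text does not match that shape.
function splitAddress(address) {
  const match = address
    .trim()
    .match(/^(.+),\s*([^,]+),\s*([A-Za-z]{2})\s+(\d{5}(?:-\d{4})?)$/);
  if (!match) return null;
  const [, street, city, state, zip] = match;
  return {
    street: street.trim(),
    city: city.trim(),
    state: state.toUpperCase(),
    zip,
  };
}

console.log(splitAddress('123 main st, city, ST 12345'));
// prints { street: '123 main st', city: 'city', state: 'ST', zip: '12345' }
```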

Owner Name Extraction

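// From page text
// Group 1 is the property count; group 2 is the text that follows "in"
const ownerPattern = /Owner:\s*(\d+)\s+properties?\s*in\s*([A-Za-z\s,]+(?:\s*,\s+[A-Z]{2})?)/i;
const ownerMatch = document.body.innerText.match(ownerPattern);
// match() returns null when the pattern is absent, so guard before indexing
const ownerName = ownerMatch?.[2]?.trim(); // e.g., "Helen Christian"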

🐛 Issues Encountered

Issue 1: Account Tier / Access Levels

Problem: When scraper navigates to /ownership URLs, it finds 0 emails/phones
Root Cause: Different properties may have different access levels based on:
- Premium/Free account tier
- Property type (commercial vs residential)
- Geographic location
- Whether you've previously viewed the property
Evidence: Manually inspected property showed 4 emails + 4 phones, but scraper found 0

Issue 2: Page Loading Timing

Problem: Contact info loads dynamically via JavaScript/AJAX after initial page load
Evidence: Reonomy uses SPA (Single Page Application) framework
Solution Needed: Increased wait times (10-15 seconds) + checking for specific selectors
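
One way to combine a longer wait with a selector check is to poll until contact links appear or a deadline passes, rather than sleeping for a fixed time. This is a minimal sketch: `waitFor` and `countContactLinks` are hypothetical names, and the 15 second default simply mirrors the upper bound mentioned above.

```javascript
// Generic polling helper (hypothetical name): resolves with the first truthy
// value returned by check(), or null once timeoutMs has elapsed.
async function waitFor(check, { timeoutMs = 15000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) return null;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}

// Counts mailto:/tel: links in a document-like object; 0 means "not loaded yet".
function countContactLinks(doc) {
  return doc.querySelectorAll('a[href^="mailto:"], a[href^="tel:"]').length;
}

// In the page context this could be used as:
// const found = await waitFor(() => countContactLinks(document));
// if (found === null) { /* timed out: no contact links ever appeared */ }
```

Polling returns as soon as the links exist, so fast pages are not slowed down, and a null result separates "timed out" from "loaded with contacts".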

Issue 3: Dynamic Property IDs

Problem: Property IDs extracted from search results may not be the most recent/current ones
Evidence: Different searches produce different property lists
Solution Needed: Check URL to confirm we're on correct search
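
The URL check could look roughly like the sketch below, which reads the search ID and property ID back out of the hash route and compares them with the expected values. Function names are hypothetical; the route shapes are the ones documented in the URL pattern section.

```javascript
// Hypothetical helper: pulls the IDs out of a Reonomy hash-bang URL.
function parseReonomyUrl(url) {
  const hashIndex = url.indexOf('#!');
  const route = hashIndex === -1 ? '' : url.slice(hashIndex + 2);
  const search = route.match(/\/search\/([^/?#]+)/);
  const property = route.match(/\/property\/([^/?#]+)/);
  return {
    searchId: search ? search[1] : null,
    propertyId: property ? property[1] : null,
    isOwnership: /\/ownership\/?$/.test(route),
  };
}

// True only when the browser is on the ownership page of the expected property
// inside the expected search.
function isExpectedOwnershipPage(url, searchId, propertyId) {
  const parsed = parseReonomyUrl(url);
  return (
    parsed.isOwnership &&
    parsed.searchId === searchId &&
    parsed.propertyId === propertyId
  );
}

// In the page context: isExpectedOwnershipPage(window.location.href, searchId, propertyId)
```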

📂 Scraper Versions

v1-v3.js - Basic (from earlier attempts)

❌ Wrong URL pattern (missing /search/{id})
❌ Wrong selectors (complex CSS)
❌ No contact info extraction

✅ Correct URL pattern: /search/{search-id}/property/{id}/ownership
❌ Navigates directly to /ownership without clicking through property
❌ Finds 0 emails/phones on all properties

v3-v4-v5-v6-v7-v8-v9 (various click-through attempts)

✅ All attempted to click property buttons first
❌ All found 0 emails/phones on properties
⚠️ Possible cause: Account access limitations, dynamic loading, wrong page state

v9 (LATEST) - Owner Tab Extraction (current best approach)

✅ Extracts data from Owner tab (right side, default view)
✅ No tab clicking needed - contact info is visible by default
✅ Extracts: address, city, state, zip, square footage, property type, owner names, emails, phones
✅ Correct URL pattern with /ownership suffix
✅ 8 second wait for content to load
✅ Click-through approach: property button → property page → extract Owner tab → go back → next property

File: reonomy-scraper-v9-owner-tab.js

🎯 Recommended Approach

Workflow (Based on manual inspection)

1. Log in to Reonomy
2. Navigate to search
3. Apply advanced filters (optional but helpful):
   - "Has Phone" checkbox
   - "Has Email" checkbox
4. Search for location (e.g., "Eatontown, NJ")
5. Extract property IDs from search results
6. For each property:
   - Click property button (navigate into property page)
   - Wait 5-8 seconds for page to load
   - Navigate to /ownership tab (CRITICAL - this is where contact info is!)
   - Wait 8-10 seconds for ownership tab content to load
   - Extract contact info:
     - Emails: a[href^="mailto:"]
     - Phones: a[href^="tel:"]
     - Owner name: From page text regex
     - Property address: From h1-h6 heading
   - Go back to search results
7. Repeat for next property
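
The per-property part of this workflow can be sketched as a loop that is independent of any browser automation library. Everything here is an assumption for illustration: `scrapeProperties` and the `steps` object (with `openProperty`, `openOwnership`, `extractContact`, `backToSearch`) are hypothetical names, and the real click, wait, and extract logic would be supplied by whichever library the scraper uses.

```javascript
// Hypothetical driver: `steps` supplies the browser-specific actions, so the
// loop itself stays free of any automation library.
async function scrapeProperties(propertyIds, steps, maxProperties = 20) {
  const leads = [];
  for (const propertyId of propertyIds.slice(0, maxProperties)) {
    try {
      await steps.openProperty(propertyId);   // click property, wait for load
      await steps.openOwnership(propertyId);  // go to /ownership, wait again
      const contact = await steps.extractContact();
      leads.push({ propertyId, ...contact });
    } catch (err) {
      // One bad property should not abort the whole run.
      console.warn(`Skipping ${propertyId}: ${err.message}`);
    } finally {
      await steps.backToSearch();             // always return to the results
    }
  }
  return leads;
}
```

Putting the "go back" step in `finally` keeps the scraper on the search results page even when a property fails, so the next iteration starts from a known state.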

Key Differences from Previous Attempts

| Aspect | Old Approach | New Approach (v9) |
|---|---|---|
| URL | `/property/{property-id}` | `/search/{search-id}/property/{property-id}/ownership` |
| Navigation | Direct to page | Click property → Go to ownership |
| View | Dashboard/Search | Owner tab (default right side) |
| Wait Time | 2-3 seconds | 8-10 seconds (longer) |
| Data Source | Not found | Owner tab content |

🚀 How to Use v9 Scraper

# Run with default settings (Eatontown, NJ)
cd /Users/jakeshore/.clawdbot/workspace
node reonomy-scraper-v9-owner-tab.js

# Run with custom location
REONOMY_LOCATION="Your City, ST" node reonomy-scraper-v9-owner-tab.js

# Run in visible mode (watch it work)
HEADLESS=false node reonomy-scraper-v9-owner-tab.js

Configuration Options

# Change email/password (set on the same command line so node actually receives them)
REONOMY_EMAIL="your-email@example.com" \
REONOMY_PASSWORD="yourpassword" \
node reonomy-scraper-v9-owner-tab.js

# Change max properties (default: 20)
MAX_PROPERTIES=50 node reonomy-scraper-v9-owner-tab.js
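
Inside the script, these variables would typically be read from `process.env`. The sketch below shows one way to do that. The `loadConfig` name is hypothetical, the defaults for location and max properties come from the notes above, and treating any value other than `false` as headless is an assumption.

```javascript
// Hypothetical config loader for the environment variables shown above.
function loadConfig(env = process.env) {
  const max = Number.parseInt(env.MAX_PROPERTIES ?? '', 10);
  return {
    email: env.REONOMY_EMAIL ?? '',
    password: env.REONOMY_PASSWORD ?? '',
    location: env.REONOMY_LOCATION || 'Eatontown, NJ', // documented default
    maxProperties: Number.isInteger(max) && max > 0 ? max : 20, // documented default
    headless: env.HEADLESS !== 'false', // assumption: headless unless HEADLESS=false
  };
}

const config = loadConfig();
console.log(`Scraping up to ${config.maxProperties} properties in ${config.location}`);
```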

Output

File: reonomy-leads-v9-owner-tab.json
Format: JSON with scrapeDate, location, searchId, leadCount, leads[]
Each lead contains:
- scrapeDate
- propertyId
- propertyUrl
- ownershipUrl (with /ownership suffix)
- address
- city, state, zip
- squareFootage
- propertyType
- ownerNames (array)
- emails (array)
- phones (array)
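
For reference, an object with this shape could be assembled as in the sketch below. `buildOutput` is a hypothetical name, the ISO 8601 timestamp format is an assumption (the notes do not say how scrapeDate is formatted), and missing fields are filled with null or empty arrays so every lead has the same keys.

```javascript
// Hypothetical builder for the output structure listed above.
function buildOutput(location, searchId, leads, now = new Date()) {
  const scrapeDate = now.toISOString(); // assumption: ISO 8601 timestamps
  return {
    scrapeDate,
    location,
    searchId,
    leadCount: leads.length,
    leads: leads.map(lead => ({
      scrapeDate,
      propertyId: lead.propertyId,
      propertyUrl: lead.propertyUrl ?? null,
      ownershipUrl: lead.ownershipUrl ?? null,
      address: lead.address ?? null,
      city: lead.city ?? null,
      state: lead.state ?? null,
      zip: lead.zip ?? null,
      squareFootage: lead.squareFootage ?? null,
      propertyType: lead.propertyType ?? null,
      ownerNames: lead.ownerNames ?? [],
      emails: lead.emails ?? [],
      phones: lead.phones ?? [],
    })),
  };
}

// Writing the file could then be done with Node's built-in fs module:
// require('fs').writeFileSync('reonomy-leads-v9-owner-tab.json',
//   JSON.stringify(buildOutput(location, searchId, leads), null, 2));
```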

🎯 What Makes v9 Different

1. Correct URL Pattern - Uses /search/{search-id}/property/{id}/ownership (not just /property/{id})
2. Owner Tab Extraction - Extracts from Owner tab content directly (no need to click "View Contact" button)
3. Click-Through Workflow - Property button → Navigate → Extract → Go back → Next property
4. Longer Wait Times - 10 second wait after navigation, 10 second wait after going to ownership tab
5. Full Data Extraction - Not just emails/phones, but also: address, city, state, zip, square footage, property type, owner names

🔧 If v9 Still Fails

Manual Debugging Steps

1. Run in visible mode to watch the browser
2. Check if the Owner tab is the default view (it should be)
3. Verify we're on the correct search results page
4. Check if property IDs are being extracted correctly
5. Look for any "Upgrade to view contact" or "Premium only" messages
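
The last check can be automated with a simple text scan, so a run that finds 0 contacts can report whether an access message was on the page. This is a sketch: the two phrases are the examples quoted above, not confirmed Reonomy wording, and `findPaywallMessages` is a hypothetical name.

```javascript
// Example phrases from the debugging steps; the real wording on the site may differ.
const PAYWALL_PHRASES = ['upgrade to view contact', 'premium only'];

// Returns the phrases found in the page text (case-insensitive), or [] if none.
function findPaywallMessages(pageText, phrases = PAYWALL_PHRASES) {
  const text = pageText.toLowerCase();
  return phrases.filter(phrase => text.includes(phrase.toLowerCase()));
}

// In the page context: findPaywallMessages(document.body.innerText)
console.log(findPaywallMessages('Please Upgrade to View Contact details'));
// prints [ 'upgrade to view contact' ]
```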

Alternative: Try Specific Properties

From your manually tested property that had contact info:

Search for: "Center Hill, FL" or specific address from that property
Navigate directly to that property's ownership tab

Alternative: Check "Recently Viewed Properties"

Your account shows "Recently Viewed Properties" on the home page. These properties may already have contact info unlocked, so they are worth testing first.

📝 Summary

We've learned:

✅ Correct URL pattern for contact info: /search/{search-id}/property/{property-id}/ownership
✅ Contact info is in Owner tab (right side, default)
✅ Emails: a[href^="mailto:"]
✅ Phones: a[href^="tel:"]
✅ Can extract: address, owner names, property details
⚠️ Contact info may be limited by account tier or property type

Current Best Approach: v9 Owner Tab Extractor

Next Step: Test v9 and see if it successfully finds contact info on properties that have it available.
