Reonomy Scraper - Complete Analysis & Memory

Last Updated: 2026-01-13 19:43Z


🎯 Critical URL Pattern Discovery

Working URL Patterns

# Search Page (property list)
https://app.reonomy.com/#!/search/{search-id}

# Property Page (with tabs)
https://app.reonomy.com/#!/property/{property-id}

# Ownership Page (WITH CONTACT INFO) ← KEY!
https://app.reonomy.com/#!/search/{search-id}/property/{property-id}/ownership

Key Insight: Must use /ownership suffix to get emails/phones. Direct property pages don't show contact info.
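
As a minimal sketch, the three URL shapes above could be built by small helpers like these. The function names and the placeholder IDs are illustrative assumptions and do not come from the scraper itself.

```javascript
// Hypothetical helpers; only the URL shapes come from the patterns above.
const BASE = 'https://app.reonomy.com/#!';

function searchUrl(searchId) {
  return `${BASE}/search/${encodeURIComponent(searchId)}`;
}

function propertyUrl(propertyId) {
  return `${BASE}/property/${encodeURIComponent(propertyId)}`;
}

// The only variant observed to show emails/phones.
function ownershipUrl(searchId, propertyId) {
  return `${searchUrl(searchId)}/property/${encodeURIComponent(propertyId)}/ownership`;
}

// Placeholder IDs for illustration only.
console.log(ownershipUrl('SEARCH-ID', 'PROPERTY-ID'));
```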


📊 DOM Structure & Contact Selectors

Page Layout

  • Left Panel: Map view
  • Right Panel: Property cards (scrollable list)
  • Property Details Page: 3 tabs
    1. Owner (RIGHT side, default tab) ← Contains contact info
    2. Building and Lot (property details)
    3. Occupants (tenant info)

Contact Info Extraction (PROVEN WORKING)

// Emails (from manually tested property)
const emails = [];
document.querySelectorAll('a[href^="mailto:"]').forEach(a => {
  // Strip the scheme and any "?subject=..." query string
  const email = a.href.replace(/^mailto:/, '').split('?')[0];
  if (email.length > 5 && !emails.includes(email)) {
    emails.push(email); // Found email!
  }
});

// Phones (from manually tested property)
const phones = [];
document.querySelectorAll('a[href^="tel:"]').forEach(a => {
  const phone = a.href.replace(/^tel:/, '');
  if (phone.length > 7 && !phones.includes(phone)) {
    phones.push(phone); // Found phone!
  }
});
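
The same extraction can be written as a pure function over href strings, which makes it testable outside the browser. This is a sketch only: `normalizeContacts` is a hypothetical name, and the lowercasing, "@" check, and digit-only phone cleanup are assumptions layered on top of the length checks above.

```javascript
// Hypothetical helper: turn raw href values into clean, de-duplicated contacts.
function normalizeContacts(hrefs) {
  const emails = new Set();
  const phones = new Set();
  for (const href of hrefs) {
    if (href.startsWith('mailto:')) {
      // Drop the scheme and any "?subject=..." query, then decode %-escapes.
      const email = decodeURIComponent(href.slice(7).split('?')[0]).trim().toLowerCase();
      if (email.length > 5 && email.includes('@')) emails.add(email);
    } else if (href.startsWith('tel:')) {
      // Keep only digits and "+" characters.
      const phone = href.slice(4).replace(/[^\d+]/g, '');
      if (phone.length > 7) phones.add(phone);
    }
  }
  return { emails: [...emails], phones: [...phones] };
}
```

In the page context it could be fed with `[...document.querySelectorAll('a[href^="mailto:"], a[href^="tel:"]')].map(a => a.getAttribute('href'))`.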

Property Address Extraction

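// From the first h1-h6 heading on the page
const heading = document.querySelector('h1, h2, h3, h4, h5, h6');
// querySelector returns null when no heading exists, so guard before reading it
const address = heading ? heading.textContent.trim() : '';
// Format: "123 main st, city, ST 12345"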
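
A heading in that format can be split into the address, city, state, and zip fields. The sketch below assumes the heading always ends in a two-letter state and a 5-digit (or ZIP+4) code; `parseAddress` is a hypothetical name, and headings that do not fit return `null`.

```javascript
// Hypothetical parser for headings shaped like "123 main st, city, ST 12345".
function parseAddress(text) {
  const m = text.trim().match(/^(.+),\s*([^,]+),\s*([A-Za-z]{2})\s+(\d{5}(?:-\d{4})?)$/);
  if (!m) return null; // heading did not match the expected format
  return {
    address: m[1].trim(),
    city: m[2].trim(),
    state: m[3].toUpperCase(),
    zip: m[4],
  };
}
```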

Owner Name Extraction

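// From page text
// Group 1 = property count, group 2 = the text that follows "in"
const ownerPattern = /Owner:\s*(\d+)\s+properties?\s*in\s*([A-Za-z\s,]+(?:\s*,\s+[A-Z]{2})?)/i;
const ownerMatch = document.body.innerText.match(ownerPattern);
// match() returns null when the pattern is absent, so guard before indexing
const ownerName = ownerMatch?.[2]?.trim() ?? null; // e.g., "Helen Christian"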

🐛 Issues Encountered

Issue 1: Account Tier / Access Levels

  • Problem: When scraper navigates to /ownership URLs, it finds 0 emails/phones
  • Suspected Cause: Different properties may have different access levels based on:
    • Premium/Free account tier
    • Property type (commercial vs residential)
    • Geographic location
    • Whether you've previously viewed the property
  • Evidence: Manually inspected property showed 4 emails + 4 phones, but scraper found 0

Issue 2: Page Loading Timing

  • Problem: Contact info loads dynamically via JavaScript/AJAX after the initial page load
  • Evidence: Reonomy is a single-page application (SPA); its routes live under the #!/ hash, so content renders after the initial page load
  • Solution Needed: Longer wait times (10-15 seconds) plus polling for specific selectors before extracting
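
One way to check for specific selectors instead of sleeping for a fixed time is a small polling loop. This is a library-independent sketch: `waitFor` is a hypothetical helper, and the 15-second default simply mirrors the upper bound mentioned above.

```javascript
// Hypothetical polling helper: resolve as soon as `check()` returns a truthy
// value, or reject once `timeoutMs` has elapsed.
async function waitFor(check, { timeoutMs = 15000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

In the page context the check could be `() => document.querySelectorAll('a[href^="mailto:"], a[href^="tel:"]').length`. A timeout does not prove a timing problem; it may equally mean the account cannot see contacts for that property (Issue 1).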

Issue 3: Dynamic Property IDs

  • Problem: Property IDs extracted from search results may not be the most recent/current ones
  • Evidence: Different searches produce different property lists
  • Solution Needed: Check the URL to confirm we're on the correct search before extracting IDs
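
Confirming the current search can be done by parsing the hash route. The sketch below assumes the URL shapes listed under "Working URL Patterns"; `parseReonomyUrl` and `isOnSearch` are hypothetical names.

```javascript
// Hypothetical helper: pull the search and property IDs out of a hash route.
function parseReonomyUrl(url) {
  const hash = url.split('#!')[1] || '';
  const m = hash.match(/^\/search\/([^/]+)(?:\/property\/([^/]+)(\/ownership)?)?/);
  if (!m) return null; // not a /search/... route
  return {
    searchId: m[1],
    propertyId: m[2] || null,
    isOwnership: Boolean(m[3]),
  };
}

// True only when the URL belongs to the search we expect to be on.
function isOnSearch(url, expectedSearchId) {
  const parsed = parseReonomyUrl(url);
  return parsed !== null && parsed.searchId === expectedSearchId;
}
```

In the browser, `isOnSearch(window.location.href, searchId)` could run before each batch of property IDs is collected.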

📂 Scraper Versions

v1-v3.js - Basic (from earlier attempts)

  • Wrong URL pattern (missing /search/{id})
  • Wrong selectors (complex CSS)
  • No contact info extraction

v2-v4-final.js - Direct Navigation (failed)

  • Correct URL pattern: /search/{search-id}/property/{id}/ownership
  • Navigates directly to /ownership without clicking through property
  • Finds 0 emails/phones on all properties

v3-v4-v5-v6-v7-v8-v9 (various click-through attempts)

  • All attempted to click property buttons first
  • All found 0 emails/phones on properties
  • ⚠️ Possible cause: Account access limitations, dynamic loading, wrong page state

v9 (LATEST) - Owner Tab Extraction (current best approach)

  • Extracts data from Owner tab (right side, default view)
  • No tab clicking needed - contact info is visible by default
  • Extracts: address, city, state, zip, square footage, property type, owner names, emails, phones
  • Correct URL pattern with /ownership suffix
  • 8 second wait for content to load
  • Click-through approach: property button → property page → extract Owner tab → go back → next property

File: reonomy-scraper-v9-owner-tab.js


Workflow (Based on manual inspection)

  1. Login to Reonomy
  2. Navigate to search
  3. Apply advanced filters (optional but helpful):
    • "Has Phone" checkbox
    • "Has Email" checkbox
  4. Search for location (e.g., "Eatontown, NJ")
  5. Extract property IDs from search results
  6. For each property:
    • Click property button (navigate into property page)
    • Wait 5-8 seconds for page to load
    • Navigate to /ownership tab (CRITICAL - this is where contact info is!)
    • Wait 8-10 seconds for ownership tab content to load
    • Extract contact info:
      • Emails: a[href^="mailto:"]
      • Phones: a[href^="tel:"]
      • Owner name: From page text regex
      • Property address: From h1-h6 heading
    • Go back to search results
  7. Repeat for next property
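
Step 6 of the workflow above can be sketched as a loop. The `driver` object is a hypothetical adapter around whatever browser automation library the scraper uses; none of its method names come from the actual v9 script, and the waits use the upper ends of the ranges listed above.

```javascript
// Sketch of the per-property loop; `driver` and its methods are hypothetical.
async function scrapeProperties(driver, searchId, propertyIds, maxProperties = 20) {
  const leads = [];
  for (const propertyId of propertyIds.slice(0, maxProperties)) {
    try {
      await driver.openProperty(propertyId);             // click the property button
      await driver.wait(8000);                           // let the property page load
      await driver.gotoOwnership(searchId, propertyId);  // the /ownership route
      await driver.wait(10000);                          // let contact info render
      const contacts = await driver.extractContacts();   // e.g. { emails, phones }
      leads.push({ propertyId, ...contacts });
    } catch (err) {
      // One bad property should not abort the whole run.
      leads.push({ propertyId, error: String(err.message || err) });
    } finally {
      await driver.backToSearch();                       // return to the results list
    }
  }
  return leads;
}
```

Recording the error per property keeps a partial run useful and shows which IDs failed.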

Key Differences from Previous Attempts

| Aspect      | Old Approach     | New Approach (v9)                    |
|-------------|------------------|--------------------------------------|
| URL         | /property/{id}   | /search/{id}/property/{id}/ownership |
| Navigation  | Direct to page   | Click property → Go to ownership     |
| View        | Dashboard/Search | Owner tab (default right side)       |
| Wait Time   | 2-3 seconds      | 8-10 seconds (longer)                |
| Data Source | Not found        | Owner tab content                    |

🚀 How to Use v9 Scraper

# Run with default settings (Eatontown, NJ)
cd /Users/jakeshore/.clawdbot/workspace
node reonomy-scraper-v9-owner-tab.js

# Run with custom location
REONOMY_LOCATION="Your City, ST" node reonomy-scraper-v9-owner-tab.js

# Run in visible mode (watch it work)
HEADLESS=false node reonomy-scraper-v9-owner-tab.js

Configuration Options

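# Change email/password (variables must be on the same command line as node,
# otherwise they are not passed to the script)
REONOMY_EMAIL="your-email@example.com" \
REONOMY_PASSWORD="yourpassword" \
node reonomy-scraper-v9-owner-tab.js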

# Change max properties (default: 20)
MAX_PROPERTIES=50 node reonomy-scraper-v9-owner-tab.js
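
Inside a Node script, these variables could be read with fallbacks roughly as follows. This is a sketch, not the v9 script's actual code: `loadConfig` is a hypothetical name, the two defaults (Eatontown, NJ and 20) come from the commands above, and treating every `HEADLESS` value other than "false" as headless is an assumption.

```javascript
// Hypothetical config loader built on the environment variables shown above.
function loadConfig(env = process.env) {
  const max = Number.parseInt(env.MAX_PROPERTIES ?? '', 10);
  return {
    email: env.REONOMY_EMAIL ?? '',
    password: env.REONOMY_PASSWORD ?? '',
    location: env.REONOMY_LOCATION || 'Eatontown, NJ',
    // Assumption: anything other than the literal string "false" means headless.
    headless: env.HEADLESS !== 'false',
    // Fall back to 20 when the value is missing, non-numeric, or not positive.
    maxProperties: Number.isInteger(max) && max > 0 ? max : 20,
  };
}
```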

Output

  • File: reonomy-leads-v9-owner-tab.json
  • Format: JSON with scrapeDate, location, searchId, leadCount, leads[]
  • Each lead contains:
    • scrapeDate
    • propertyId
    • propertyUrl
    • ownershipUrl (with /ownership suffix)
    • address
    • city, state, zip
    • squareFootage
    • propertyType
    • ownerNames (array)
    • emails (array)
    • phones (array)
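
The output shape listed above could be assembled like this. It is a sketch under assumptions: `makeLead` and `buildOutput` are hypothetical names, the ISO timestamp format for scrapeDate is a guess, and the empty defaults are placeholders for fields the page did not yield.

```javascript
// Hypothetical builders for the JSON structure described above.
function makeLead(searchId, propertyId, fields = {}, now = new Date()) {
  const base = 'https://app.reonomy.com/#!';
  return {
    scrapeDate: now.toISOString(), // assumed format
    propertyId,
    propertyUrl: `${base}/property/${propertyId}`,
    ownershipUrl: `${base}/search/${searchId}/property/${propertyId}/ownership`,
    address: '', city: '', state: '', zip: '',
    squareFootage: null,
    propertyType: '',
    ownerNames: [], emails: [], phones: [],
    ...fields, // extracted values override the empty defaults
  };
}

function buildOutput(location, searchId, leads, now = new Date()) {
  return {
    scrapeDate: now.toISOString(),
    location,
    searchId,
    leadCount: leads.length,
    leads,
  };
}
```

The result could then be written with `fs.writeFileSync('reonomy-leads-v9-owner-tab.json', JSON.stringify(output, null, 2))`.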

🎯 What Makes v9 Different

  1. Correct URL Pattern - Uses /search/{search-id}/property/{id}/ownership (not just /property/{id})
  2. Owner Tab Extraction - Extracts from Owner tab content directly (no need to click "View Contact" button)
  3. Click-Through Workflow - Property button → Navigate → Extract → Go back → Next property
  4. Longer Wait Times - 10 second wait after navigation, 10 second wait after going to ownership tab
  5. Full Data Extraction - Not just emails/phones, but also: address, city, state, zip, square footage, property type, owner names

🔧 If v9 Still Fails

Manual Debugging Steps

  1. Run in visible mode to watch the browser
  2. Check if the Owner tab is the default view (it should be)
  3. Verify we're on the correct search results page
  4. Check if property IDs are being extracted correctly
  5. Look for any "Upgrade to view contact" or "Premium only" messages
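
Step 5 can be automated with a simple text check. The two phrases come from the step above; the real wording on the page may differ, so treat the list as an assumption to adjust after a visible-mode run. `findPaywallPhrase` is a hypothetical name.

```javascript
// Hypothetical check for access-limit messaging in the page text.
const PAYWALL_PHRASES = ['upgrade to view contact', 'premium only'];

// Returns the first matching phrase, or null when none is present.
function findPaywallPhrase(pageText) {
  const text = pageText.toLowerCase();
  return PAYWALL_PHRASES.find(phrase => text.includes(phrase)) ?? null;
}
```

In the page context, `findPaywallPhrase(document.body.innerText)` returning a phrase would point toward Issue 1 (access level) rather than Issue 2 (timing).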

Alternative: Try Specific Properties

From your manually tested property that had contact info:

  • Search for: "Center Hill, FL" or specific address from that property
  • Navigate directly to that property's ownership tab

Alternative: Check "Recently Viewed Properties"

Your account shows "Recently Viewed Properties" on the home page. Contact info for these may already be accessible.


📝 Summary

We've learned:

  • Correct URL pattern for contact info: /search/{id}/property/{id}/ownership
  • Contact info is in Owner tab (right side, default)
  • Emails: a[href^="mailto:"]
  • Phones: a[href^="tel:"]
  • Can extract: address, owner names, property details
  • ⚠️ Contact info may be limited by account tier or property type

Current Best Approach: v9 Owner Tab Extractor

Next Step: Test v9 and see if it successfully finds contact info on properties that have it available.