# Reonomy Scraper - Complete Analysis & Memory

**Last Updated:** 2026-01-13 19:43Z

---

## 🎯 Critical URL Pattern Discovery

### ✅ Working URL Patterns
```
# Search Page (property list)
https://app.reonomy.com/#!/search/{search-id}

# Property Page (with tabs)
https://app.reonomy.com/#!/property/{property-id}

# Ownership Page (WITH CONTACT INFO) ← KEY!
https://app.reonomy.com/#!/search/{search-id}/property/{property-id}/ownership
```

**Key Insight:** Must use `/ownership` suffix to get emails/phones. Direct property pages don't show contact info.
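
A minimal sketch of filling in the ownership URL pattern. The helper name `buildOwnershipUrl` and its parameters are illustrative and not taken from any scraper version; both IDs are assumed to be opaque strings read from the search results.

```javascript
// Hypothetical helper: fills in the ownership URL pattern from the notes.
const REONOMY_BASE = 'https://app.reonomy.com/#!';

function buildOwnershipUrl(searchId, propertyId) {
  if (!searchId || !propertyId) {
    throw new Error('Both searchId and propertyId are required');
  }
  // Encode each ID so unexpected characters cannot break the hash route.
  const s = encodeURIComponent(searchId);
  const p = encodeURIComponent(propertyId);
  return `${REONOMY_BASE}/search/${s}/property/${p}/ownership`;
}

// Placeholder IDs, for illustration only.
console.log(buildOwnershipUrl('abc123', 'def456'));
// prints: https://app.reonomy.com/#!/search/abc123/property/def456/ownership
```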

---

## 📊 DOM Structure & Contact Selectors

### Page Layout
- **Left Panel**: Map view
- **Right Panel**: Property cards (scrollable list)
- **Property Details Page**: 3 tabs
  1. **Owner** (RIGHT side, default tab) ← Contains contact info
  2. **Building and Lot** (property details)
  3. **Occupants** (tenant info)

### Contact Info Extraction (PROVEN WORKING)
```javascript
// Emails (from manually tested property)
const emails = [];
document.querySelectorAll('a[href^="mailto:"]').forEach(a => {
  const email = a.href.replace('mailto:', '');
  if (email && email.length > 5) {
    emails.push(email); // Found email!
  }
});

// Phones (from manually tested property)
const phones = [];
document.querySelectorAll('a[href^="tel:"]').forEach(a => {
  const phone = a.href.replace('tel:', '');
  if (phone && phone.length > 7) {
    phones.push(phone); // Found phone!
  }
});
```
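
Raw `href` values can carry extras such as a `?subject=` query on `mailto:` links or percent-encoded characters, and the same contact may be linked more than once. A minimal sketch of a cleanup step, assuming the hrefs have already been collected as strings; `cleanContactHrefs` is a hypothetical helper, not part of the scraper.

```javascript
// Hypothetical helper: normalizes and de-duplicates mailto:/tel: hrefs.
function cleanContactHrefs(hrefs, scheme) {
  const prefix = `${scheme}:`;
  const seen = new Set();
  for (const href of hrefs) {
    if (typeof href !== 'string' || !href.toLowerCase().startsWith(prefix)) continue;
    // Drop the scheme and any query string such as ?subject=...
    let value = href.slice(prefix.length).split('?')[0];
    try {
      value = decodeURIComponent(value);
    } catch (err) {
      // Leave malformed percent-encoding as-is.
    }
    value = value.trim();
    // Compare emails case-insensitively so duplicates collapse.
    if (scheme === 'mailto') value = value.toLowerCase();
    if (value) seen.add(value);
  }
  return [...seen];
}

const cleanedEmails = cleanContactHrefs(
  ['mailto:Owner@Example.com?subject=Hi', 'mailto:owner@example.com', 'tel:+15551234567'],
  'mailto'
);
console.log(cleanedEmails); // [ 'owner@example.com' ]
```

Passing `'tel'` as the scheme applies the same cleanup to phone links without lowercasing them.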

### Property Address Extraction
```javascript
// From h1-h6 heading
const heading = document.querySelector('h1, h2, h3, h4, h5, h6');
// Guard against pages where no heading has rendered yet
const address = heading ? heading.textContent.trim() : null;
// Format: "123 main st, city, ST 12345"
```
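
The lead records store city, state, and zip separately, so the heading text has to be split. A minimal sketch, assuming the heading always follows the "street, city, ST 12345" format noted in the comment; `parseAddress` is a hypothetical helper and returns `null` for anything that does not match.

```javascript
// Hypothetical helper: splits "street, city, ST 12345" into its parts.
function parseAddress(text) {
  const pattern = /^(.+),\s*([^,]+),\s*([A-Za-z]{2})\s+(\d{5}(?:-\d{4})?)$/;
  const match = pattern.exec(text.trim());
  if (!match) return null;
  const [, street, city, state, zip] = match;
  return { street: street.trim(), city: city.trim(), state: state.toUpperCase(), zip };
}

console.log(parseAddress('123 main st, city, ST 12345'));
// { street: '123 main st', city: 'city', state: 'ST', zip: '12345' }
```

Because the street group is greedy, a street that itself contains a comma (for example a suite number) stays in `street`, and only the last two comma-separated parts are treated as city and state/zip.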

### Owner Name Extraction
```javascript
// From page text
const ownerPattern = /Owner:\s*(\d+)\s+properties?\s*in\s*([A-Za-z\s,]+(?:\s*,\s+[A-Z]{2})?)/i;
const ownerMatch = document.body.innerText.match(ownerPattern);
// match() returns null when the pattern is absent, so guard before indexing.
// Note: group 2 is the text that follows "in"; verify against the live page
// text that this is really the owner name.
const ownerName = ownerMatch?.[2]?.trim(); // e.g., "Helen Christian"
```

---

## 🐛 Issues Encountered

### Issue 1: Account Tier / Access Levels
- **Problem:** When the scraper navigates to `/ownership` URLs, it finds 0 emails/phones
- **Suspected Root Cause:** Different properties may have different access levels based on:
  - Premium/Free account tier
  - Property type (commercial vs residential)
  - Geographic location
  - Whether you've previously viewed the property
- **Evidence:** Manually inspected property showed 4 emails + 4 phones, but scraper found 0

### Issue 2: Page Loading Timing
- **Problem:** Contact info loads dynamically via JavaScript/AJAX after the initial page load
- **Evidence:** Reonomy is built as a Single Page Application (SPA)
- **Solution Needed:** Increased wait times (10-15 seconds) + checking for specific selectors
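
One way to combine a generous timeout with a selector check is to poll until the contact links appear instead of sleeping for a fixed time. A minimal, library-agnostic sketch: `waitUntil` is a hypothetical helper, and the `check` callback is assumed to wrap whatever the browser automation library offers for counting `a[href^="mailto:"]` or `a[href^="tel:"]` matches.

```javascript
// Hypothetical helper: polls `check` until it returns a truthy value or time runs out.
async function waitUntil(check, { timeoutMs = 15000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) return null; // timed out; the caller decides what to do
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}

// Usage with a stand-in check that succeeds on the third poll.
let polls = 0;
waitUntil(() => (++polls >= 3 ? polls : 0), { timeoutMs: 2000, intervalMs: 10 })
  .then(result => console.log('contact links found after polls:', result));
```

Returning `null` on timeout, rather than throwing, lets the caller record "no contact info appeared" for that property and move on.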

### Issue 3: Dynamic Property IDs
- **Problem:** Property IDs extracted from search results may not be the most recent/current ones
- **Evidence:** Different searches produce different property lists
- **Solution Needed:** Check the URL to confirm we're on the correct search
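
A minimal sketch of that URL check, assuming the hash routes follow the patterns listed under Working URL Patterns. `parseReonomyRoute` and `isOnExpectedSearch` are hypothetical helpers; they only inspect the URL string and know nothing about page state.

```javascript
// Hypothetical helper: pulls the search and property IDs out of a Reonomy hash route.
function parseReonomyRoute(url) {
  const hashIndex = url.indexOf('#!');
  if (hashIndex === -1) return null;
  const segments = url.slice(hashIndex + 2).split('/').filter(Boolean);
  const route = { searchId: null, propertyId: null, isOwnership: false };
  for (let i = 0; i < segments.length; i++) {
    if (segments[i] === 'search' && segments[i + 1]) route.searchId = segments[i + 1];
    if (segments[i] === 'property' && segments[i + 1]) route.propertyId = segments[i + 1];
  }
  route.isOwnership = segments[segments.length - 1] === 'ownership';
  return route;
}

// True only when the URL still belongs to the expected search.
function isOnExpectedSearch(url, expectedSearchId) {
  const route = parseReonomyRoute(url);
  return route !== null && route.searchId === expectedSearchId;
}

// Placeholder IDs, for illustration only.
const currentUrl = 'https://app.reonomy.com/#!/search/abc123/property/def456/ownership';
console.log(isOnExpectedSearch(currentUrl, 'abc123')); // true
```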

---

## 📂 Scraper Versions

### v1-v3.js - Basic (from earlier attempts)
- ❌ Wrong URL pattern (missing `/search/{id}`)
- ❌ Wrong selectors (complex CSS)
- ❌ No contact info extraction

### v2-v4-final.js - Direct Navigation (failed)
- ✅ Correct URL pattern: `/search/{search-id}/property/{id}/ownership`
- ❌ Navigates directly to `/ownership` without clicking through property
- ❌ Finds 0 emails/phones on all properties

### v3-v4-v5-v6-v7-v8-v9 (various click-through attempts)
- ✅ All attempted to click property buttons first
- ❌ All found 0 emails/phones on properties
- ⚠️ Possible cause: Account access limitations, dynamic loading, wrong page state

### v9 (LATEST) - Owner Tab Extraction (current best approach)
- ✅ Extracts data from **Owner tab** (right side, default view)
- ✅ No tab clicking needed - contact info is visible by default
- ✅ Extracts: address, city, state, zip, square footage, property type, owner names, emails, phones
- ✅ Correct URL pattern with `/ownership` suffix
- ✅ 8 second wait for content to load
- ✅ Click-through approach: property button → property page → extract Owner tab → go back → next property

**File:** `reonomy-scraper-v9-owner-tab.js`

---

## 🎯 Recommended Approach

### Workflow (Based on manual inspection)
1. **Login** to Reonomy
2. **Navigate** to search
3. **Apply advanced filters** (optional but helpful):
   - "Has Phone" checkbox
   - "Has Email" checkbox
4. **Search** for location (e.g., "Eatontown, NJ")
5. **Extract property IDs** from search results
6. **For each property**:
   - Click property button (navigate into property page)
   - Wait 5-8 seconds for page to load
   - Navigate to `/ownership` tab (CRITICAL - this is where contact info is!)
   - Wait 8-10 seconds for ownership tab content to load
   - Extract contact info:
     - Emails: `a[href^="mailto:"]`
     - Phones: `a[href^="tel:"]`
     - Owner name: From page text regex
     - Property address: From h1-h6 heading
   - Go back to search results
7. **Repeat** for next property
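
Steps 6 and 7 can be sketched as a loop over the extracted IDs. This is only an outline under stated assumptions: the `driver` object and its methods (`openProperty`, `openOwnershipTab`, `extractLead`, `backToSearch`) are hypothetical wrappers around whatever browser automation calls the scraper actually uses, and a failure on one property is logged and skipped rather than aborting the run.

```javascript
// Outline of the per-property loop; `driver` is a hypothetical abstraction.
async function scrapeProperties(propertyIds, driver, maxProperties = 20) {
  const leads = [];
  for (const propertyId of propertyIds.slice(0, maxProperties)) {
    try {
      await driver.openProperty(propertyId);     // click into the property page
      await driver.openOwnershipTab(propertyId); // go to the /ownership view
      leads.push(await driver.extractLead(propertyId));
    } catch (err) {
      console.error(`Skipping ${propertyId}: ${err.message}`);
    } finally {
      await driver.backToSearch();               // always return to the results list
    }
  }
  return leads;
}

// Stand-in driver so the outline can be run without a browser.
const fakeDriver = {
  openProperty: async () => {},
  openOwnershipTab: async id => {
    if (id === 'bad') throw new Error('no ownership tab');
  },
  extractLead: async id => ({ propertyId: id, emails: [], phones: [] }),
  backToSearch: async () => {},
};

scrapeProperties(['p1', 'bad', 'p2'], fakeDriver)
  .then(leads => console.log(leads.length)); // 2
```

The waits from the workflow would live inside the driver methods, so the loop itself stays free of timing details.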

### Key Differences from Previous Attempts
| Aspect | Old Approach | New Approach (v9) |
|---------|-------------|----------------|
| **URL** | `/property/{id}` | `/search/{id}/property/{id}/ownership` |
| **Navigation** | Direct to page | Click property → Go to ownership |
| **View** | Dashboard/Search | Owner tab (default right side) |
| **Wait Time** | 2-3 seconds | 8-10 seconds (longer) |
| **Data Source** | Not found | Owner tab content |

---

## 🚀 How to Use v9 Scraper

```bash
# Run with default settings (Eatontown, NJ)
cd /Users/jakeshore/.clawdbot/workspace
node reonomy-scraper-v9-owner-tab.js

# Run with custom location
REONOMY_LOCATION="Your City, ST" node reonomy-scraper-v9-owner-tab.js

# Run in visible mode (watch it work)
HEADLESS=false node reonomy-scraper-v9-owner-tab.js
```

### Configuration Options
```bash
# Change email/password
# (the assignments must be part of the same command, or exported, for node to see them)
REONOMY_EMAIL="your-email@example.com" \
REONOMY_PASSWORD="yourpassword" \
node reonomy-scraper-v9-owner-tab.js

# Change max properties (default: 20)
MAX_PROPERTIES=50 node reonomy-scraper-v9-owner-tab.js
```
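
On the Node side, these variables would typically be read from `process.env`. A minimal sketch, assuming the variable names shown in the commands; the `loadConfig` function is illustrative, and treating any `HEADLESS` value other than `false` as headless is an assumption, not confirmed behavior of the v9 script.

```javascript
// Hypothetical config loader mirroring the documented environment variables.
function loadConfig(env = process.env) {
  const maxProperties = Number.parseInt(env.MAX_PROPERTIES ?? '20', 10);
  return {
    email: env.REONOMY_EMAIL ?? null,
    password: env.REONOMY_PASSWORD ?? null,
    location: env.REONOMY_LOCATION ?? 'Eatontown, NJ', // documented default
    // Fall back to the documented default of 20 on non-numeric or non-positive input.
    maxProperties: Number.isInteger(maxProperties) && maxProperties > 0 ? maxProperties : 20,
    headless: env.HEADLESS !== 'false', // assumption: headless unless explicitly disabled
  };
}

console.log(loadConfig({ MAX_PROPERTIES: '50', HEADLESS: 'false' }));
```

Accepting `env` as a parameter keeps the function easy to exercise with a plain object instead of the real environment.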

### Output
- **File:** `reonomy-leads-v9-owner-tab.json`
- **Format:** JSON with scrapeDate, location, searchId, leadCount, leads[]
- **Each lead contains:**
  - scrapeDate
  - propertyId
  - propertyUrl
  - ownershipUrl (with `/ownership` suffix)
  - address
  - city, state, zip
  - squareFootage
  - propertyType
  - ownerNames (array)
  - emails (array)
  - phones (array)
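
A minimal sketch of assembling that structure, assuming `scrapeDate` is an ISO 8601 timestamp and `leadCount` equals `leads.length` (neither is confirmed by the v9 source). `buildOutput` is a hypothetical helper, and the sample lead uses placeholder values only.

```javascript
// Hypothetical helper: wraps scraped leads in the documented top-level structure.
function buildOutput(leads, { location, searchId, now = new Date() }) {
  return {
    scrapeDate: now.toISOString(), // assumption: ISO 8601 timestamp
    location,
    searchId,
    leadCount: leads.length,
    leads,
  };
}

// Placeholder lead showing the per-lead fields; all values are made up.
const sampleLead = {
  scrapeDate: '2026-01-13T19:43:00.000Z',
  propertyId: 'def456',
  propertyUrl: 'https://app.reonomy.com/#!/property/def456',
  ownershipUrl: 'https://app.reonomy.com/#!/search/abc123/property/def456/ownership',
  address: '123 main st',
  city: 'city',
  state: 'ST',
  zip: '12345',
  squareFootage: null,
  propertyType: null,
  ownerNames: [],
  emails: [],
  phones: [],
};

const output = buildOutput([sampleLead], { location: 'Eatontown, NJ', searchId: 'abc123' });
console.log(JSON.stringify(output, null, 2));
```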

---

## 🎯 What Makes v9 Different

1. **Correct URL Pattern** - Uses `/search/{search-id}/property/{id}/ownership` (not just `/property/{id}`)
2. **Owner Tab Extraction** - Extracts from Owner tab content directly (no need to click "View Contact" button)
3. **Click-Through Workflow** - Property button → Navigate → Extract → Go back → Next property
4. **Longer Wait Times** - 10 second wait after navigation, 10 second wait after going to ownership tab
5. **Full Data Extraction** - Not just emails/phones, but also: address, city, state, zip, square footage, property type, owner names

---

## 🔧 If v9 Still Fails

### Manual Debugging Steps
1. Run in visible mode to watch the browser
2. Check if the Owner tab is the default view (it should be)
3. Verify we're on the correct search results page
4. Check if property IDs are being extracted correctly
5. Look for any "Upgrade to view contact" or "Premium only" messages
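
Step 5 can also be automated by scanning the page text for gating messages. A minimal sketch, assuming the page text has already been captured as a string; `findAccessGateMessage` is a hypothetical helper, and the two phrases are the ones named in the checklist, not a verified list of Reonomy's actual wording.

```javascript
// Phrases from the debugging checklist; Reonomy's real wording may differ.
const GATE_PHRASES = ['upgrade to view contact', 'premium only'];

// Hypothetical helper: returns the first gating phrase found in the page text, or null.
function findAccessGateMessage(pageText) {
  const text = pageText.toLowerCase();
  return GATE_PHRASES.find(phrase => text.includes(phrase)) ?? null;
}

console.log(findAccessGateMessage('Upgrade to view contact details'));
// prints: upgrade to view contact
```

A non-null result on a property that returned 0 emails/phones would point toward the access-level explanation (Issue 1) rather than a timing problem (Issue 2).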

### Alternative: Try Specific Properties
From your manually tested property that had contact info:
- Search for: "Center Hill, FL" or specific address from that property
- Navigate directly to that property's ownership tab

### Alternative: Check "Recently Viewed Properties"
Your account shows "Recently Viewed Properties" on the home page - these may have guaranteed access to contact info

---

## 📝 Summary

**We've learned:**
- ✅ Correct URL pattern for contact info: `/search/{id}/property/{id}/ownership`
- ✅ Contact info is in **Owner tab** (right side, default)
- ✅ Emails: `a[href^="mailto:"]`
- ✅ Phones: `a[href^="tel:"]`
- ✅ Can extract: address, owner names, property details
- ⚠️ Contact info may be limited by account tier or property type

**Current Best Approach:** v9 Owner Tab Extractor

**Next Step:** Test v9 and see if it successfully finds contact info on properties that have it available.