Reonomy Scraper Update - Contact Extraction
Summary
The Reonomy scraper has been updated to extract email addresses and phone numbers from property and owner detail pages. Previously, the scraper only read the dashboard/search results page, which left the email and phone fields empty.
Changes Made
1. New Functions Added
extractPropertyContactInfo(page, propertyUrl)
- Visits each property detail page
- Extracts email and phone numbers using multiple selector strategies
- Uses regex fallback to find contact info in page text
- Returns a contact info object with: email, phone, ownerName, propertyAddress, propertyType, squareFootage
extractOwnerContactInfo(page, ownerUrl)
- Visits each owner detail page
- Extracts email and phone numbers using multiple selector strategies
- Uses regex fallback to find contact info in page text
- Returns a contact info object with: email, phone, ownerName, ownerLocation, propertyCount
extractLinksFromPage(page)
- Finds all property and owner links on the current page
- Extracts IDs from URLs and reconstructs full Reonomy URLs
- Removes duplicate URLs
- Returns arrays of property URLs and owner URLs
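The ID extraction and de-duplication steps above can be sketched as a pure function. This is a minimal illustration, not the scraper's actual code: `buildLinkLists` is a hypothetical name, the URL shapes are taken from the log examples later in this document, and the ID pattern (letters, digits, hyphens) is an assumption.

```javascript
// Hypothetical helper: turns raw href strings into de-duplicated Reonomy URLs.
// URL shapes follow the log examples in this document; the ID pattern is assumed.
const BASE_URL = "https://app.reonomy.com/#!";

function buildLinkLists(hrefs) {
  // Sets drop duplicate URLs while preserving first-seen order.
  const propertyUrls = new Set();
  const ownerUrls = new Set();

  for (const href of hrefs) {
    const property = href.match(/\/property\/([A-Za-z0-9-]+)/);
    const owner = href.match(/\/person\/([A-Za-z0-9-]+)/);
    // Rebuild a canonical URL from the ID so query strings and relative paths collapse.
    if (property) propertyUrls.add(`${BASE_URL}/property/${property[1]}`);
    if (owner) ownerUrls.add(`${BASE_URL}/person/${owner[1]}`);
  }

  return { propertyUrls: [...propertyUrls], ownerUrls: [...ownerUrls] };
}
```

If the `page` object is a Puppeteer or Playwright page (an assumption, the document does not name the library), the href list could come from something like `page.$$eval('a[href]', els => els.map(e => e.getAttribute('href')))`.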
2. Configuration Options Added
- MAX_PROPERTIES = 20: Limits the number of properties to scrape (rate limiting)
- MAX_OWNERS = 20: Limits the number of owners to scrape (rate limiting)
- PAGE_DELAY_MS = 3000: Delay between page visits (3 seconds) to avoid rate limiting
3. Updated Main Scraper Logic
The scraper now:
- Logs in to Reonomy
- Performs a search
- Extracts all property and owner links from the results page
- NEW: Visits each property page (up to MAX_PROPERTIES) to extract contact info
- NEW: Visits each owner page (up to MAX_OWNERS) to extract contact info
- Saves leads with populated email and phone fields
4. Enhanced Extraction Methods
For email detection:
- Multiple CSS selectors (a[href^="mailto:"], .email, [data-test*="email"], etc.)
- Regex patterns for email addresses
- Falls back to page text analysis
For phone detection:
- Multiple CSS selectors (a[href^="tel:"], .phone, [data-test*="phone"], etc.)
- Multiple regex patterns for US phone numbers
- Falls back to page text analysis
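As a rough sketch of the page-text fallback, the function below runs one email pattern and one US phone pattern over a string. The patterns and the name `findContactInText` are illustrative assumptions; the scraper's real regexes are not shown in this document and it may use several per field.

```javascript
// Illustrative patterns only; the scraper's actual regexes may differ.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;
// Optional +1 prefix, area code with or without parentheses, common separators.
const US_PHONE_RE = /(?:\+?1[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}/;

// Hypothetical helper: returns the first email and phone found in the text,
// or empty strings when nothing matches.
function findContactInText(text) {
  const email = text.match(EMAIL_RE);
  const phone = text.match(US_PHONE_RE);
  return {
    email: email ? email[0] : "",
    phone: phone ? phone[0] : "",
  };
}
```

Running selectors first and the text scan last keeps false positives down, since a loose phone pattern can also match unrelated 10-digit numbers on a page.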
Rate Limiting
The scraper now includes rate limiting to avoid being blocked:
- 3-second delay between page visits (PAGE_DELAY_MS)
- 0.5-second delay between saving each record
- Limits on total properties/owners scraped
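One way the limit, the delay, and the continue-on-failure behaviour might fit together is sketched below. `visitWithRateLimit` and `sleep` are hypothetical names, and `visitFn` stands in for either extraction function; the real control flow in reonomy-scraper.js may differ.

```javascript
// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: visits at most `limit` URLs, pausing `delayMs` between
// visits and skipping any page whose extraction throws.
async function visitWithRateLimit(urls, visitFn, { limit, delayMs }) {
  const results = [];
  const batch = urls.slice(0, limit);

  for (let i = 0; i < batch.length; i++) {
    console.log(`[${i + 1}/${batch.length}]`);
    try {
      results.push(await visitFn(batch[i]));
    } catch (err) {
      // A single failed page should not stop the whole run.
      console.warn(`Skipping ${batch[i]}: ${err.message}`);
    }
    // No need to wait after the final page.
    if (i < batch.length - 1) await sleep(delayMs);
  }

  return results;
}
```

Under these assumptions, the property pass would be called as `visitWithRateLimit(propertyUrls, (url) => extractPropertyContactInfo(page, url), { limit: MAX_PROPERTIES, delayMs: PAGE_DELAY_MS })`, and the owner pass the same way with MAX_OWNERS.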
Testing Instructions
Option 1: Using the wrapper script with 1Password
cd /Users/jakeshore/.clawdbot/workspace
./scrape-reonomy.sh --1password --location "New York, NY"
Option 2: Using the wrapper script with manual credentials
cd /Users/jakeshore/.clawdbot/workspace
./scrape-reonomy.sh --location "New York, NY"
You'll be prompted for your email and password.
Option 3: Direct execution with environment variables
cd /Users/jakeshore/.clawdbot/workspace
export REONOMY_EMAIL="your@email.com"
export REONOMY_PASSWORD="yourpassword"
export REONOMY_LOCATION="New York, NY"
node reonomy-scraper.js
Option 4: Run in headless mode
HEADLESS=true REONOMY_EMAIL="your@email.com" REONOMY_PASSWORD="yourpassword" node reonomy-scraper.js
Option 5: Save to JSON file (no Google Sheets)
REONOMY_EMAIL="your@email.com" REONOMY_PASSWORD="yourpassword" node reonomy-scraper.js
If the gog CLI is not set up, the scraper saves leads to reonomy-leads.json instead of Google Sheets.
Option 6: Use existing Google Sheet
REONOMY_EMAIL="your@email.com" REONOMY_PASSWORD="yourpassword" REONOMY_SHEET_ID="your-sheet-id" node reonomy-scraper.js
Expected Output
After running the scraper, you should see logs like:
[1/10]
🏠 Visiting property: https://app.reonomy.com/#!/property/xxx-xxx-xxx
📧 Email: owner@example.com
📞 Phone: (555) 123-4567
[2/10]
🏠 Visiting property: https://app.reonomy.com/#!/property/yyy-yyy-yyy
📧 Email: Not found
📞 Phone: Not found
[1/5]
👤 Visiting owner: https://app.reonomy.com/#!/person/zzz-zzz-zzz
📧 Email: another@example.com
📞 Phone: (555) 987-6543
The final reonomy-leads.json or Google Sheet should have populated email and phone fields.
Verification
After scraping, check the output:
If using JSON:
cat reonomy-leads.json | jq '.leads[] | select(.email != "" or .phone != "")'
If using Google Sheets:
Open the sheet at https://docs.google.com/spreadsheets/d/{sheet-id} and verify the Email and Phone columns are populated.
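For the JSON output, a count can be quicker to read than the full jq listing. The sketch below assumes the `{ "leads": [...] }` shape and the `email`/`phone` field names implied by the jq filter; `summarizeLeads` is a hypothetical helper, not part of the scraper.

```javascript
// Assumes the { leads: [...] } shape implied by the jq filter in this section.
function summarizeLeads(data) {
  const leads = Array.isArray(data.leads) ? data.leads : [];
  // Treat missing, non-string, and whitespace-only values as "not populated".
  const hasValue = (v) => typeof v === "string" && v.trim() !== "";

  return {
    total: leads.length,
    withEmail: leads.filter((l) => hasValue(l.email)).length,
    withPhone: leads.filter((l) => hasValue(l.phone)).length,
    withEither: leads.filter((l) => hasValue(l.email) || hasValue(l.phone)).length,
  };
}
```

It could be fed with `JSON.parse(fs.readFileSync("reonomy-leads.json", "utf8"))`. Unlike the jq filter, this check does not count a missing or null field as populated.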
Troubleshooting
"No leads extracted"
- The page structure may have changed
- Check the screenshot saved at /tmp/reonomy-no-leads.png
- Review the log file at reonomy-scraper.log
"Email/Phone not found"
- Not all properties/owners have contact information
- Reonomy may not display contact info for certain records
- The information may be behind a paywall or require higher access
Rate limiting errors
- Increase PAGE_DELAY_MS in the script (default is 3000ms)
- Decrease MAX_PROPERTIES and MAX_OWNERS (default is 20 each)
- Run the scraper in smaller batches
Key Features of the Updated Scraper
- Deep extraction: Visits each detail page to find contact info
- Multiple fallback strategies: Tries multiple selectors and regex patterns
- Rate limiting: Built-in delays to avoid blocking
- Configurable limits: Can adjust number of properties/owners to scrape
- Detailed logging: Shows progress for each page visited
- Error handling: Continues even if individual page extraction fails
Next Steps
- Test the scraper with your credentials
- Verify email and phone fields are populated
- Adjust limits (MAX_PROPERTIES, MAX_OWNERS) and delays (PAGE_DELAY_MS) as needed
- Review the extracted data quality and refine extraction patterns if needed