Jake Shore df4aa799f8 Daily backup: 2026-01-24 - Workspace files including Discord bot automation research, Reonomy scraper versions, backup scripts, and project config

2026-01-24 05:09:55 -05:00

5.8 KiB


Reonomy Scraper Update - Contact Extraction

Summary

The Reonomy scraper has been updated to extract email addresses and phone numbers from property and owner detail pages. Previously, it read only the dashboard/search results page, which does not show contact details, so the email and phone fields were always empty.

Changes Made

1. New Functions Added

`extractPropertyContactInfo(page, propertyUrl)`

Visits each property detail page
Extracts email and phone numbers using multiple selector strategies
Uses regex fallback to find contact info in page text
Returns a contact info object with: email, phone, ownerName, propertyAddress, propertyType, squareFootage

`extractOwnerContactInfo(page, ownerUrl)`

Visits each owner detail page
Extracts email and phone numbers using multiple selector strategies
Uses regex fallback to find contact info in page text
Returns a contact info object with: email, phone, ownerName, ownerLocation, propertyCount
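
The "multiple selector strategies" step shared by both functions could look roughly like the sketch below. It is not the scraper's actual code: `firstSelectorText` and `EMAIL_SELECTORS` are hypothetical names, the selector order is an assumption, and `page` is assumed to expose `$eval(selector, fn)` in the way Puppeteer and Playwright pages do (rejecting when nothing matches).

```javascript
// Selectors named in this document; the order they are tried in is an assumption.
const EMAIL_SELECTORS = ['a[href^="mailto:"]', ".email", '[data-test*="email"]'];

// Hypothetical helper: returns the first non-empty value found, or "".
// `page` is assumed to reject from $eval() when no element matches.
async function firstSelectorText(page, selectors) {
  for (const selector of selectors) {
    try {
      const raw = await page.$eval(selector, (el) => {
        const href = el.getAttribute("href") || "";
        // Prefer mailto:/tel: targets, otherwise use the visible text.
        return /^(mailto|tel):/i.test(href) ? href : el.textContent || "";
      });
      const value = raw.replace(/^(mailto|tel):/i, "").replace(/\?.*$/, "").trim();
      if (value) return value;
    } catch {
      // No element for this selector; try the next one.
    }
  }
  return ""; // caller can fall back to a regex over the page text
}
```

An empty string signals that no selector produced a value, which is where the regex fallback would take over.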

`extractLinksFromPage(page)`

Finds all property and owner links on the current page
Extracts IDs from URLs and reconstructs full Reonomy URLs
Removes duplicate URLs
Returns arrays of property URLs and owner URLs
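
The ID extraction, URL reconstruction, and de-duplication described above can be sketched as a pure function over raw `href` values. The URL shapes come from the log output shown later in this document; the helper name `buildDetailUrls` and the ID pattern `[\w-]+` are assumptions, not the script's actual implementation.

```javascript
// Base of the detail-page URLs as they appear in the sample logs.
const BASE_URL = "https://app.reonomy.com/#!";

// Hypothetical helper: turns raw hrefs into de-duplicated property/owner URLs.
function buildDetailUrls(hrefs) {
  const propertyUrls = new Set();
  const ownerUrls = new Set();

  for (const href of hrefs) {
    // Assumed ID format: letters, digits, underscores, and hyphens.
    const match = /\/(property|person)\/([\w-]+)/.exec(href);
    if (!match) continue; // not a property or owner link
    const [, kind, id] = match;
    const url = `${BASE_URL}/${kind}/${id}`;
    (kind === "property" ? propertyUrls : ownerUrls).add(url);
  }

  // Sets drop duplicates while keeping first-seen order.
  return { propertyUrls: [...propertyUrls], ownerUrls: [...ownerUrls] };
}
```

In the browser-driving code, the `hrefs` array would presumably come from something like `page.$$eval("a[href]", (as) => as.map((a) => a.getAttribute("href")))`.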

2. Configuration Options Added

MAX_PROPERTIES = 20 - Limits number of properties to scrape (rate limiting)
MAX_OWNERS = 20 - Limits number of owners to scrape (rate limiting)
PAGE_DELAY_MS = 3000 - Delay between page visits (3 seconds) to avoid rate limiting

3. Updated Main Scraper Logic

The scraper now:

Logs in to Reonomy
Performs a search
Extracts all property and owner links from the results page
NEW: Visits each property page (up to MAX_PROPERTIES) to extract contact info
NEW: Visits each owner page (up to MAX_OWNERS) to extract contact info
Saves leads with populated email and phone fields

4. Enhanced Extraction Methods

For email detection:

Multiple CSS selectors (a[href^="mailto:"], .email, [data-test*="email"], etc.)
Regex patterns for email addresses
Falls back to page text analysis

For phone detection:

Multiple CSS selectors (a[href^="tel:"], .phone, [data-test*="phone"], etc.)
Multiple regex patterns for US phone numbers
Falls back to page text analysis
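
As an illustration of the regex fallback, the sketch below pulls the first email address and US-style phone number out of a block of page text. The patterns are assumptions chosen for readability, not the ones in `reonomy-scraper.js`, and `extractContactFromText` is a hypothetical name.

```javascript
// Simple email pattern; intentionally permissive rather than RFC-complete.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

// US phone numbers such as (555) 123-4567, 555-123-4567, 555.123.4567,
// or +1 555 123 4567.
const PHONE_RE = /(?:\+?1[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}/;

// Hypothetical helper: returns "" for any field that is not found.
function extractContactFromText(text) {
  const email = text.match(EMAIL_RE);
  const phone = text.match(PHONE_RE);
  return {
    email: email ? email[0] : "",
    phone: phone ? phone[0] : "",
  };
}
```

A pattern this loose can also match any unrelated run of ten digits, which is one reason to try the CSS selectors first and treat the regex result as a last resort.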

Rate Limiting

The scraper now includes rate limiting to avoid being blocked:

3-second delay between page visits (PAGE_DELAY_MS)
0.5-second delay after each record is saved
Limits on total properties/owners scraped
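
A minimal sketch of how the cap and the delay could combine in one loop, assuming a `visit` callback that stands in for `extractPropertyContactInfo` or `extractOwnerContactInfo`. The helper names `sleep` and `visitWithRateLimit` are hypothetical; the constants mirror the defaults listed above.

```javascript
const MAX_PROPERTIES = 20;  // default cap from this document
const PAGE_DELAY_MS = 3000; // default delay from this document

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: visits at most `limit` URLs, pausing between visits.
// A failing page is logged and skipped so the run continues.
async function visitWithRateLimit(
  urls,
  visit,
  { limit = MAX_PROPERTIES, delayMs = PAGE_DELAY_MS } = {}
) {
  const results = [];
  const batch = urls.slice(0, limit);

  for (const [i, url] of batch.entries()) {
    console.log(`[${i + 1}/${batch.length}]`);
    try {
      results.push(await visit(url));
    } catch (err) {
      console.log(`  Skipping ${url}: ${err.message}`);
    }
    // No need to wait after the last page.
    if (i < batch.length - 1) await sleep(delayMs);
  }
  return results;
}
```

With the defaults, a full batch of 20 pages spends about 57 seconds in delays alone (19 pauses of 3 seconds), which is worth keeping in mind when raising the limits.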

Testing Instructions

Option 1: Using the wrapper script with 1Password

cd /Users/jakeshore/.clawdbot/workspace
./scrape-reonomy.sh --1password --location "New York, NY"

Option 2: Using the wrapper script with manual credentials

cd /Users/jakeshore/.clawdbot/workspace
./scrape-reonomy.sh --location "New York, NY"

You'll be prompted for your email and password.

Option 3: Direct execution with environment variables

cd /Users/jakeshore/.clawdbot/workspace
export REONOMY_EMAIL="your@email.com"
export REONOMY_PASSWORD="yourpassword"
export REONOMY_LOCATION="New York, NY"
node reonomy-scraper.js

Option 4: Run in headless mode

HEADLESS=true REONOMY_EMAIL="your@email.com" REONOMY_PASSWORD="yourpassword" node reonomy-scraper.js

Option 5: Save to JSON file (no Google Sheets)

REONOMY_EMAIL="your@email.com" REONOMY_PASSWORD="yourpassword" node reonomy-scraper.js

This is the same command as a normal run: if the gog CLI is not set up, the scraper saves leads to reonomy-leads.json instead of Google Sheets.

Option 6: Use existing Google Sheet

REONOMY_EMAIL="your@email.com" REONOMY_PASSWORD="yourpassword" REONOMY_SHEET_ID="your-sheet-id" node reonomy-scraper.js

Expected Output

After running the scraper, you should see logs like:

[1/10]
  🏠 Visiting property: https://app.reonomy.com/#!/property/xxx-xxx-xxx
    📧 Email: owner@example.com
    📞 Phone: (555) 123-4567

[2/10]
  🏠 Visiting property: https://app.reonomy.com/#!/property/yyy-yyy-yyy
    📧 Email: Not found
    📞 Phone: Not found

[1/5]
  👤 Visiting owner: https://app.reonomy.com/#!/person/zzz-zzz-zzz
    📧 Email: another@example.com
    📞 Phone: (555) 987-6543

The final reonomy-leads.json or Google Sheet should have populated email and phone fields.

Verification

After scraping, check the output:

If using JSON:

jq '.leads[] | select(.email != "" or .phone != "")' reonomy-leads.json
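
For a quick count instead of a full listing, a small Node sketch along these lines may help. It assumes the file has the `{ "leads": [...] }` shape implied by the jq filter and that empty fields are stored as empty strings; `summarizeLeads` is a hypothetical helper, not part of the scraper.

```javascript
// Hypothetical helper: counts leads with a non-empty email or phone field.
function summarizeLeads(data) {
  const leads = Array.isArray(data.leads) ? data.leads : [];
  const filled = (value) => typeof value === "string" && value.trim() !== "";
  return {
    total: leads.length,
    withEmail: leads.filter((lead) => filled(lead.email)).length,
    withPhone: leads.filter((lead) => filled(lead.phone)).length,
  };
}

// Example usage against a real output file (uncomment to run):
// const fs = require("fs");
// const data = JSON.parse(fs.readFileSync("reonomy-leads.json", "utf8"));
// console.log(summarizeLeads(data));
```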

If using Google Sheets:

Open the sheet at https://docs.google.com/spreadsheets/d/{sheet-id} and verify the Email and Phone columns are populated.

Troubleshooting

"No leads extracted"

The page structure may have changed
Check the screenshot saved at /tmp/reonomy-no-leads.png
Review the log file at reonomy-scraper.log

"Email/Phone not found"

Not all properties/owners have contact information
Reonomy may not display contact info for certain records
The information may be behind a paywall or require a higher access level on the account

Rate limiting errors

Increase PAGE_DELAY_MS in the script (default is 3000ms)
Decrease MAX_PROPERTIES and MAX_OWNERS (default is 20 each)
Run the scraper in smaller batches

Key Features of the Updated Scraper

Deep extraction: Visits each detail page to find contact info
Multiple fallback strategies: Tries multiple selectors and regex patterns
Rate limiting: Built-in delays to avoid blocking
Configurable limits: Can adjust number of properties/owners to scrape
Detailed logging: Shows progress for each page visited
Error handling: Continues even if individual page extraction fails

Next Steps

Test the scraper with your credentials
Verify email and phone fields are populated
Adjust limits (MAX_PROPERTIES, MAX_OWNERS) and delays (PAGE_DELAY_MS) as needed
Review the extracted data quality and refine extraction patterns if needed
