# Reonomy Scraper Update - Contact Extraction

## Summary

The Reonomy scraper has been updated to properly extract email and phone numbers from property and owner detail pages. Previously, the scraper only extracted data from the dashboard/search results page, resulting in empty email and phone fields.

## Changes Made

### 1. New Functions Added

#### `extractPropertyContactInfo(page, propertyUrl)`

- Visits each property detail page
- Extracts email and phone numbers using multiple selector strategies
- Uses regex fallback to find contact info in page text
- Returns a contact info object with: email, phone, ownerName, propertyAddress, propertyType, squareFootage

#### `extractOwnerContactInfo(page, ownerUrl)`

- Visits each owner detail page
- Extracts email and phone numbers using multiple selector strategies
- Uses regex fallback to find contact info in page text
- Returns a contact info object with: email, phone, ownerName, ownerLocation, propertyCount

#### `extractLinksFromPage(page)`

- Finds all property and owner links on the current page
- Extracts IDs from URLs and reconstructs full Reonomy URLs
- Removes duplicate URLs
- Returns arrays of property URLs and owner URLs

### 2. Configuration Options Added

- `MAX_PROPERTIES = 20` - Limits number of properties to scrape (rate limiting)
- `MAX_OWNERS = 20` - Limits number of owners to scrape (rate limiting)
- `PAGE_DELAY_MS = 3000` - Delay between page visits (3 seconds) to avoid rate limiting

### 3. Updated Main Scraper Logic

The scraper now:

1. Logs in to Reonomy
2. Performs a search
3. Extracts all property and owner links from the results page
4. **NEW**: Visits each property page (up to MAX_PROPERTIES) to extract contact info
5. **NEW**: Visits each owner page (up to MAX_OWNERS) to extract contact info
6. Saves leads with populated email and phone fields

### 4. Enhanced Extraction Methods

For email detection:

- Multiple CSS selectors (`a[href^="mailto:"]`, `.email`, `[data-test*="email"]`, etc.)
- Regex patterns for email addresses
- Falls back to page text analysis

For phone detection:

- Multiple CSS selectors (`a[href^="tel:"]`, `.phone`, `[data-test*="phone"]`, etc.)
- Multiple regex patterns for US phone numbers
- Falls back to page text analysis

## Rate Limiting

The scraper now includes rate limiting to avoid being blocked:

- 3-second delay between page visits (`PAGE_DELAY_MS`)
- 0.5-second delay between saving each record
- Limits on total properties/owners scraped

## Testing Instructions

### Option 1: Using the wrapper script with 1Password

```bash
cd /Users/jakeshore/.clawdbot/workspace
./scrape-reonomy.sh --1password --location "New York, NY"
```

### Option 2: Using the wrapper script with manual credentials

```bash
cd /Users/jakeshore/.clawdbot/workspace
./scrape-reonomy.sh --location "New York, NY"
```

You'll be prompted for your email and password.

### Option 3: Direct execution with environment variables

```bash
cd /Users/jakeshore/.clawdbot/workspace
export REONOMY_EMAIL="your@email.com"
export REONOMY_PASSWORD="yourpassword"
export REONOMY_LOCATION="New York, NY"
node reonomy-scraper.js
```

### Option 4: Run in headless mode

```bash
HEADLESS=true REONOMY_EMAIL="your@email.com" REONOMY_PASSWORD="yourpassword" node reonomy-scraper.js
```

### Option 5: Save to JSON file (no Google Sheets)

```bash
REONOMY_EMAIL="your@email.com" REONOMY_PASSWORD="yourpassword" node reonomy-scraper.js
```

If `gog` CLI is not set up, it will save to `reonomy-leads.json`.
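
Options 3 to 5 all pass settings through environment variables. As a rough sketch of how a Node script could read them, assuming nothing about the real implementation: the `loadConfig` helper and the shape of the object it returns are hypothetical and are not taken from `reonomy-scraper.js`.

```javascript
// Hypothetical helper, for illustration only.
// Reads the environment variables used in Options 3 to 5 from an
// env-like object (pass process.env in a real run).
function loadConfig(env) {
  // Treat the login credentials as required.
  const missing = ['REONOMY_EMAIL', 'REONOMY_PASSWORD'].filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return {
    email: env.REONOMY_EMAIL,
    password: env.REONOMY_PASSWORD,
    // No default location is assumed here; null means "not set".
    location: env.REONOMY_LOCATION || null,
    // Environment variables are strings, so compare against 'true'.
    headless: env.HEADLESS === 'true',
  };
}

// Example with a sample object instead of process.env.
const exampleConfig = loadConfig({
  REONOMY_EMAIL: 'your@email.com',
  REONOMY_PASSWORD: 'yourpassword',
  HEADLESS: 'true',
});
console.log(exampleConfig.headless, exampleConfig.location);
```

Passing the environment in as an argument, rather than reading `process.env` inside the function, keeps the helper easy to exercise with sample values.
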
### Option 6: Use existing Google Sheet

```bash
REONOMY_EMAIL="your@email.com" REONOMY_PASSWORD="yourpassword" REONOMY_SHEET_ID="your-sheet-id" node reonomy-scraper.js
```

## Expected Output

After running the scraper, you should see logs like:

```
[1/10] 🏠 Visiting property: https://app.reonomy.com/#!/property/xxx-xxx-xxx
📧 Email: owner@example.com
📞 Phone: (555) 123-4567
[2/10] 🏠 Visiting property: https://app.reonomy.com/#!/property/yyy-yyy-yyy
📧 Email: Not found
📞 Phone: Not found
[1/5] 👤 Visiting owner: https://app.reonomy.com/#!/person/zzz-zzz-zzz
📧 Email: another@example.com
📞 Phone: (555) 987-6543
```

The final `reonomy-leads.json` or Google Sheet should have populated `email` and `phone` fields.

## Verification

After scraping, check the output:

### If using JSON:

```bash
cat reonomy-leads.json | jq '.leads[] | select(.email != "" or .phone != "")'
```

### If using Google Sheets:

Open the sheet at `https://docs.google.com/spreadsheets/d/{sheet-id}` and verify the Email and Phone columns are populated.

## Troubleshooting

### "No leads extracted"

- The page structure may have changed
- Check the screenshot saved at `/tmp/reonomy-no-leads.png`
- Review the log file at `reonomy-scraper.log`

### "Email/Phone not found"

- Not all properties/owners have contact information
- Reonomy may not display contact info for certain records
- The information may be behind a paywall or require higher access

### Rate limiting errors

- Increase `PAGE_DELAY_MS` in the script (default is 3000ms)
- Decrease `MAX_PROPERTIES` and `MAX_OWNERS` (default is 20 each)
- Run the scraper in smaller batches

## Key Features of the Updated Scraper

1. **Deep extraction**: Visits each detail page to find contact info
2. **Multiple fallback strategies**: Tries multiple selectors and regex patterns
3. **Rate limiting**: Built-in delays to avoid blocking
4. **Configurable limits**: Can adjust number of properties/owners to scrape
5. **Detailed logging**: Shows progress for each page visited
6. **Error handling**: Continues even if individual page extraction fails

## Next Steps

1. Test the scraper with your credentials
2. Verify email and phone fields are populated
3. Adjust limits (`MAX_PROPERTIES`, `MAX_OWNERS`) and delays (`PAGE_DELAY_MS`) as needed
4. Review the extracted data quality and refine extraction patterns if needed
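
When refining extraction patterns (step 4), it can help to prototype the regex fallback described under Enhanced Extraction Methods against plain text before touching the scraper. The sketch below is a hypothetical helper, not the code in `reonomy-scraper.js`: the name `extractContactFromText` and both patterns are assumptions, and the phone pattern only covers common US formats.

```javascript
// Hypothetical regex fallback, for illustration only.
// Simple email pattern: local part, "@", domain, and a TLD of 2+ letters.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

// US phone pattern: optional +1 prefix, area code with or without
// parentheses, then 3 and 4 digits separated by space, dot, or hyphen.
const PHONE_RE = /(?:\+?1[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}/;

// Returns the first email and phone found in the text, or empty strings
// when nothing matches.
function extractContactFromText(text) {
  const emailMatch = text.match(EMAIL_RE);
  const phoneMatch = text.match(PHONE_RE);
  return {
    email: emailMatch ? emailMatch[0] : '',
    phone: phoneMatch ? phoneMatch[0] : '',
  };
}

const sampleText = 'Owner: Jane Doe. Reach her at owner@example.com or (555) 123-4567.';
console.log(extractContactFromText(sampleText));
```

In a browser-automation setup, the `text` argument would typically be the visible page text; how the real scraper obtains it is not shown in this document.
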