# Reonomy Lead Scraper

A browser automation tool that scrapes property and owner leads from [Reonomy](https://www.reonomy.com/) and exports them to Google Sheets.

## Features

- ✅ Automated login to Reonomy
- 🔍 Search for properties by location
- 📊 Extract lead data:
  - Owner Name
  - Property Address
  - City, State, ZIP
  - Property Type
  - Square Footage
  - Owner Location
  - Property Count
  - Property/Owner URLs
- 📈 Export to Google Sheets via `gog` CLI
- 🔐 Secure credential handling (environment variables or 1Password)
- 🖥️ Headless or visible browser mode
## Prerequisites

### Required Tools

1. **Node.js** (v14 or higher)

   ```bash
   # Check if installed
   node --version
   ```

2. **gog CLI** - Google Workspace command-line tool

   ```bash
   # Install via Homebrew
   brew install gog

   # Or from GitHub
   # https://github.com/stripe/gog

   # Authenticate
   gog auth login
   ```

3. **Puppeteer** (installed via npm with this script)

### Optional Tools

- **1Password CLI** (`op`) - For secure credential storage

  ```bash
  brew install --cask 1password-cli
  ```
## Installation

1. Clone or navigate to the workspace directory:

   ```bash
   cd /Users/jakeshore/.clawdbot/workspace
   ```

2. Install Node.js dependencies:

   ```bash
   npm install
   ```

3. Make the script executable (should already be done):

   ```bash
   chmod +x scrape-reonomy.sh
   ```
## Setup

### Option 1: Environment Variables (Recommended for Development)

Set your Reonomy credentials as environment variables:

```bash
export REONOMY_EMAIL="henry@realestateenhanced.com"
export REONOMY_PASSWORD="your_password_here"
```

Or add them to your shell profile (e.g., `~/.zshrc` or `~/.bash_profile`):

```bash
echo 'export REONOMY_EMAIL="henry@realestateenhanced.com"' >> ~/.zshrc
echo 'export REONOMY_PASSWORD="your_password_here"' >> ~/.zshrc
source ~/.zshrc
```
### Option 2: 1Password (Recommended for Production)

1. Create a 1Password item named "Reonomy"
2. Add fields:
   - `email`: Your Reonomy email
   - `password`: Your Reonomy password
3. Use the `--1password` flag when running the scraper:

```bash
./scrape-reonomy.sh --1password
```

### Option 3: Interactive Prompt

If you don't set credentials, the script will prompt you for them:

```bash
./scrape-reonomy.sh
```
## Usage

### Basic Usage

Run the scraper with default settings (searches "New York, NY"):

```bash
./scrape-reonomy.sh
```

### Search a Different Location

```bash
./scrape-reonomy.sh --location "Los Angeles, CA"
```

### Use an Existing Google Sheet

```bash
./scrape-reonomy.sh --sheet "1ABC123XYZ..."
```

### Run in Headless Mode (No Browser Window)

```bash
./scrape-reonomy.sh --headless
```

### Combined Options

```bash
# Search Chicago, use headless mode, save to an existing sheet
./scrape-reonomy.sh \
  --location "Chicago, IL" \
  --headless \
  --sheet "1ABC123XYZ..."
```

### Using 1Password

```bash
./scrape-reonomy.sh --1password --headless
```

### Direct Node.js Usage

You can also run the scraper directly with Node.js:

```bash
REONOMY_EMAIL="..." \
REONOMY_PASSWORD="..." \
REONOMY_LOCATION="Miami, FL" \
HEADLESS=true \
node reonomy-scraper.js
```
## Output

### Google Sheet

The scraper creates or appends to a Google Sheet with the following columns:

| Column | Description |
|--------|-------------|
| Scrape Date | Date the lead was scraped |
| Owner Name | Property owner's name |
| Property Address | Street address of the property |
| City | Property city |
| State | Property state |
| ZIP | Property ZIP code |
| Property Type | Type of property (e.g., "General Industrial") |
| Square Footage | Property size |
| Owner Location | Owner's location |
| Property Count | Number of properties owned |
| Property URL | Direct link to property page |
| Owner URL | Direct link to owner profile |
| Email | Owner email (if available) |
| Phone | Owner phone (if available) |
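If you post-process exported rows yourself, a small helper can keep the column order in one place. This is only a sketch, not code from `reonomy-scraper.js` — the `leadToRow` helper and the lead field names are hypothetical, mirroring the columns above:

```javascript
// Hypothetical helper: flatten a scraped lead object into a row
// matching the column order of the exported sheet.
const COLUMNS = [
  "scrapeDate", "ownerName", "propertyAddress", "city", "state", "zip",
  "propertyType", "squareFootage", "ownerLocation", "propertyCount",
  "propertyUrl", "ownerUrl", "email", "phone",
];

function leadToRow(lead) {
  // Missing optional fields (e.g. email/phone) become empty cells.
  return COLUMNS.map((key) => lead[key] ?? "");
}

const row = leadToRow({
  scrapeDate: "2024-05-01",
  ownerName: "Acme Holdings LLC",
  propertyAddress: "123 Main St",
  city: "Brooklyn",
  state: "NY",
  zip: "11201",
});
console.log(row.length); // 14 cells, padded with "" where data is missing
```

Keeping the column list in one array means adding a field later only touches one place.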
### Log File

Detailed logs are saved to:

```
/Users/jakeshore/.clawdbot/workspace/reonomy-scraper.log
```
## Command-Line Options

| Option | Description |
|--------|-------------|
| `-h, --help` | Show help message |
| `-l, --location LOC` | Search location (default: "New York, NY") |
| `-s, --sheet ID` | Google Sheet ID (creates new sheet if not provided) |
| `-H, --headless` | Run in headless mode (no browser window) |
| `--no-headless` | Run with visible browser |
| `--1password` | Fetch credentials from 1Password |
## Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `REONOMY_EMAIL` | Yes | Your Reonomy email address |
| `REONOMY_PASSWORD` | Yes | Your Reonomy password |
| `REONOMY_LOCATION` | No | Search location (default: "New York, NY") |
| `REONOMY_SHEET_ID` | No | Google Sheet ID (creates new sheet if not set) |
| `REONOMY_SHEET_TITLE` | No | Title for new sheet (default: "Reonomy Leads") |
| `HEADLESS` | No | Run in headless mode ("true" or "false") |
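Inside the scraper, these variables are typically read once with fallbacks to the defaults listed above. A minimal sketch of that pattern — the actual variable handling in `reonomy-scraper.js` may differ:

```javascript
// Read configuration from the environment, falling back to the
// documented defaults when a variable is unset.
function loadConfig(env = process.env) {
  return {
    email: env.REONOMY_EMAIL,                              // required
    password: env.REONOMY_PASSWORD,                        // required
    location: env.REONOMY_LOCATION || "New York, NY",
    sheetId: env.REONOMY_SHEET_ID || null,                 // null → create a new sheet
    sheetTitle: env.REONOMY_SHEET_TITLE || "Reonomy Leads",
    headless: env.HEADLESS === "true",                     // any other value → visible browser
  };
}

const config = loadConfig({
  REONOMY_EMAIL: "me@example.com",
  REONOMY_PASSWORD: "secret",
});
console.log(config.location); // "New York, NY"
```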
## Troubleshooting

### "Login failed" Error

- Verify your credentials are correct
- Check if Reonomy has changed their login process
- Try running without headless mode to see what's happening:

  ```bash
  ./scrape-reonomy.sh --no-headless
  ```

### "gog command failed" Error

- Ensure `gog` is installed and authenticated:

  ```bash
  gog auth login
  ```

- Check that your Google account has Google Sheets access

### "No leads extracted" Warning

- The page structure may have changed
- The search location might not have results
- Check the screenshot saved to `/tmp/reonomy-no-leads.png` or `/tmp/reonomy-error.png`

### Puppeteer Issues

If you encounter browser-related errors, try reinstalling Puppeteer:

```bash
npm install puppeteer --force
```
## Security Notes

### Credential Security

⚠️ **Important**: Never commit your credentials to version control!

**Best Practices:**

1. Use environment variables (set in your shell profile)
2. Use 1Password for production environments
3. Add `.env` files to `.gitignore`
4. Never hardcode credentials in scripts

### Recommended `.gitignore`

```gitignore
# Credentials
.env
.reonomy-credentials.*

# Logs
*.log
reonomy-scraper.log

# Screenshots
*.png
/tmp/reonomy-*.png

# Node
node_modules/
package-lock.json
```
## Advanced Usage

### Scheduled Scraping

You can set up a cron job to scrape automatically:

```bash
# Edit crontab
crontab -e

# Add a line to scrape every morning at 9 AM
0 9 * * * /Users/jakeshore/.clawdbot/workspace/scrape-reonomy.sh --headless --1password >> /tmp/reonomy-cron.log 2>&1
```

### Custom Search Parameters

The scraper currently searches by location. To customize:

1. Edit `reonomy-scraper.js`
2. Modify the `extractLeadsFromPage` function
3. Add filters for:
   - Property type
   - Price range
   - Building size
   - Owner type
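Until such filters exist in the search itself, a low-effort alternative is filtering the scraped leads client-side before export. A hypothetical sketch — the `filterLeads` helper and the lead field names are assumptions, not taken from `reonomy-scraper.js`:

```javascript
// Hypothetical post-scrape filter: keep only leads matching the
// given property type and minimum building size.
function filterLeads(leads, { propertyType, minSquareFootage = 0 } = {}) {
  return leads.filter((lead) => {
    if (propertyType && lead.propertyType !== propertyType) return false;
    return (lead.squareFootage ?? 0) >= minSquareFootage;
  });
}

const leads = [
  { ownerName: "A", propertyType: "General Industrial", squareFootage: 12000 },
  { ownerName: "B", propertyType: "Retail", squareFootage: 30000 },
  { ownerName: "C", propertyType: "General Industrial", squareFootage: 4000 },
];

const hits = filterLeads(leads, {
  propertyType: "General Industrial",
  minSquareFootage: 10000,
});
console.log(hits.map((l) => l.ownerName)); // [ 'A' ]
```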
### Integrating with Other Tools

The Google Sheet can be connected to:

- Google Data Studio for dashboards
- Zapier for automations
- Custom scripts for further processing
## Development

### File Structure

```
workspace/
├── reonomy-scraper.js    # Main scraper script
├── scrape-reonomy.sh     # Shell wrapper
├── package.json          # Node.js dependencies
├── README.md             # This file
├── reonomy-scraper.log   # Run logs
└── node_modules/         # Dependencies
```

### Testing

Test the scraper in visible mode first:

```bash
./scrape-reonomy.sh --no-headless --location "Brooklyn, NY"
```
### Extending the Scraper

To add new data fields:

1. Update the `headers` array in `initializeSheet()`
2. Update the `extractLeadsFromPage()` function
3. Add new parsing functions as needed
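For step 3, a parsing helper normalizes the strings the page renders into clean sheet values. For example, a square-footage parser — the `parseSquareFootage` name and the exact input formats are assumptions for illustration:

```javascript
// Hypothetical parsing helper: turn a displayed size string like
// "12,500 SF" into a plain number, or null when it can't be parsed.
function parseSquareFootage(text) {
  if (typeof text !== "string") return null;
  const match = text.replace(/,/g, "").match(/(\d+(?:\.\d+)?)/);
  return match ? Number(match[1]) : null;
}

console.log(parseSquareFootage("12,500 SF")); // 12500
console.log(parseSquareFootage("N/A"));       // null
```

Returning `null` instead of throwing keeps a single malformed listing from aborting a whole scrape.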
## Support

### Getting Help

- Check the log file: `reonomy-scraper.log`
- Run with a visible browser to see issues: `--no-headless`
- Check screenshots in the `/tmp/` directory

### Common Issues

| Issue | Solution |
|-------|----------|
| Login fails | Verify credentials, try manual login |
| No leads found | Try a different location, check search results |
| Google Sheets error | Run `gog auth login` to re-authenticate |
| Browser timeout | Increase the timeout in the script |
## License

This tool is for educational and personal use. Respect Reonomy's Terms of Service when scraping.

## Changelog

### v1.0.0 (Current)

- Initial release
- Automated login
- Location-based search
- Google Sheets export
- 1Password integration
- Headless mode support