# Reonomy Lead Scraper
A browser automation tool that scrapes property and owner leads from [Reonomy](https://www.reonomy.com/) and exports them to Google Sheets.
## Features
- ✅ Automated login to Reonomy
- 🔍 Search for properties by location
- 📊 Extract lead data:
  - Owner Name
  - Property Address
  - City, State, ZIP
  - Property Type
  - Square Footage
  - Owner Location
  - Property Count
  - Property/Owner URLs
- 📈 Export to Google Sheets via `gog` CLI
- 🔐 Secure credential handling (environment variables or 1Password)
- 🖥️ Headless or visible browser mode
## Prerequisites
### Required Tools
1. **Node.js** (v14 or higher)
```bash
# Check if installed
node --version
```
2. **gog CLI** - Google Workspace command-line tool
```bash
# Install via Homebrew
brew install gog
# Or from GitHub
# https://github.com/stripe/gog
# Authenticate
gog auth login
```
3. **Puppeteer** (installed via npm with this script)
### Optional Tools
- **1Password CLI** (`op`) - For secure credential storage
```bash
brew install --cask 1password-cli
```
## Installation
1. Clone or navigate to the workspace directory:
```bash
cd /Users/jakeshore/.clawdbot/workspace
```
2. Install Node.js dependencies:
```bash
npm install
```
3. Make the script executable (should already be done):
```bash
chmod +x scrape-reonomy.sh
```
## Setup
### Option 1: Environment Variables (Recommended for Development)
Set your Reonomy credentials as environment variables:
```bash
export REONOMY_EMAIL="henry@realestateenhanced.com"
export REONOMY_PASSWORD="your_password_here"
```
Or add to your shell profile (e.g., `~/.zshrc` or `~/.bash_profile`):
```bash
echo 'export REONOMY_EMAIL="henry@realestateenhanced.com"' >> ~/.zshrc
echo 'export REONOMY_PASSWORD="your_password_here"' >> ~/.zshrc
source ~/.zshrc
```
### Option 2: 1Password (Recommended for Production)
1. Create a 1Password item named "Reonomy"
2. Add fields:
- `email`: Your Reonomy email
- `password`: Your Reonomy password
3. Use the `--1password` flag when running the scraper:
```bash
./scrape-reonomy.sh --1password
```
### Option 3: Interactive Prompt
If you don't set credentials, the script will prompt you for them:
```bash
./scrape-reonomy.sh
```
## Usage
### Basic Usage
Run the scraper with default settings (searches "New York, NY"):
```bash
./scrape-reonomy.sh
```
### Search a Different Location
```bash
./scrape-reonomy.sh --location "Los Angeles, CA"
```
### Use Existing Google Sheet
```bash
./scrape-reonomy.sh --sheet "1ABC123XYZ..."
```
### Run in Headless Mode (No Browser Window)
```bash
./scrape-reonomy.sh --headless
```
### Combined Options
```bash
# Search Chicago, use headless mode, save to existing sheet
./scrape-reonomy.sh \
--location "Chicago, IL" \
--headless \
--sheet "1ABC123XYZ..."
```
### Using 1Password
```bash
./scrape-reonomy.sh --1password --headless
```
### Direct Node.js Usage
You can also run the scraper directly with Node.js:
```bash
REONOMY_EMAIL="..." \
REONOMY_PASSWORD="..." \
REONOMY_LOCATION="Miami, FL" \
HEADLESS=true \
node reonomy-scraper.js
```
## Output
### Google Sheet
The scraper creates or appends to a Google Sheet with the following columns:
| Column | Description |
|--------|-------------|
| Scrape Date | Date the lead was scraped |
| Owner Name | Property owner's name |
| Property Address | Street address of the property |
| City | Property city |
| State | Property state |
| ZIP | Property ZIP code |
| Property Type | Type of property (e.g., "General Industrial") |
| Square Footage | Property size |
| Owner Location | Owner's location |
| Property Count | Number of properties owned |
| Property URL | Direct link to property page |
| Owner URL | Direct link to owner profile |
| Email | Owner email (if available) |
| Phone | Owner phone (if available) |
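Fields like square footage and city/state/ZIP arrive from the page as display strings. A small normalizer keeps the sheet clean; the helpers below are illustrative sketches (the real parsing lives in `reonomy-scraper.js`, and Reonomy's actual display formats may differ):

```javascript
// "12,500 sq ft" -> 12500; returns null when no number is present.
function parseSquareFootage(text) {
  if (!text) return null;
  const match = String(text).replace(/,/g, '').match(/(\d+(?:\.\d+)?)/);
  return match ? Number(match[1]) : null;
}

// "Brooklyn, NY 11201" -> { city, state, zip }; best-effort, null parts when absent.
function parseCityStateZip(text) {
  const match = String(text || '').match(/^(.*?),\s*([A-Z]{2})\s*(\d{5})?/);
  if (!match) return { city: null, state: null, zip: null };
  return { city: match[1].trim(), state: match[2], zip: match[3] || null };
}
```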
### Log File
Detailed logs are saved to:
```
/Users/jakeshore/.clawdbot/workspace/reonomy-scraper.log
```
## Command-Line Options
| Option | Description |
|--------|-------------|
| `-h, --help` | Show help message |
| `-l, --location LOC` | Search location (default: "New York, NY") |
| `-s, --sheet ID` | Google Sheet ID (creates new sheet if not provided) |
| `-H, --headless` | Run in headless mode (no browser window) |
| `--no-headless` | Run with visible browser |
| `--1password` | Fetch credentials from 1Password |
## Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `REONOMY_EMAIL` | Yes | Your Reonomy email address |
| `REONOMY_PASSWORD` | Yes | Your Reonomy password |
| `REONOMY_LOCATION` | No | Search location (default: "New York, NY") |
| `REONOMY_SHEET_ID` | No | Google Sheet ID (creates new sheet if not set) |
| `REONOMY_SHEET_TITLE` | No | Title for new sheet (default: "Reonomy Leads") |
| `HEADLESS` | No | Run in headless mode ("true" or "false") |
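How these variables map to runtime settings can be sketched as follows. The `loadConfig` helper is illustrative (the script's internal structure may differ), but the defaults mirror the table above:

```javascript
// Hypothetical config loader: every optional variable falls back to its
// documented default.
function loadConfig(env = process.env) {
  return {
    email: env.REONOMY_EMAIL,                        // required
    password: env.REONOMY_PASSWORD,                  // required
    location: env.REONOMY_LOCATION || 'New York, NY',
    sheetId: env.REONOMY_SHEET_ID || null,           // null => create a new sheet
    sheetTitle: env.REONOMY_SHEET_TITLE || 'Reonomy Leads',
    headless: env.HEADLESS === 'true',               // anything else => visible browser
  };
}
```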
## Troubleshooting
### "Login failed" Error
- Verify your credentials are correct
- Check if Reonomy has changed their login process
- Try running without headless mode to see what's happening:
```bash
./scrape-reonomy.sh --no-headless
```
### "gog command failed" Error
- Ensure `gog` is installed and authenticated:
```bash
gog auth login
```
- Check your Google account has Google Sheets access
### "No leads extracted" Warning
- The page structure may have changed
- The search location might not have results
- Check the screenshot saved to `/tmp/reonomy-no-leads.png` or `/tmp/reonomy-error.png`
### Puppeteer Issues
If you encounter browser-related errors, try:
```bash
npm install puppeteer --force
```
## Security Notes
### Credential Security
⚠️ **Important**: Never commit your credentials to version control!
**Best Practices:**
1. Use environment variables (set in your shell profile)
2. Use 1Password for production environments
3. Add `.env` files to `.gitignore`
4. Never hardcode credentials in scripts
### Recommended `.gitignore`
```gitignore
# Credentials
.env
.reonomy-credentials.*
# Logs
*.log
reonomy-scraper.log
# Screenshots
*.png
/tmp/reonomy-*.png
# Node
node_modules/
package-lock.json
```
## Advanced Usage
### Scheduled Scraping
You can set up a cron job to scrape automatically:
```bash
# Edit crontab
crontab -e
# Add line to scrape every morning at 9 AM
0 9 * * * /Users/jakeshore/.clawdbot/workspace/scrape-reonomy.sh --headless --1password >> /tmp/reonomy-cron.log 2>&1
```
### Custom Search Parameters
The scraper currently searches by location. To customize:
1. Edit `reonomy-scraper.js`
2. Modify the `extractLeadsFromPage` function
3. Add filters for:
   - Property type
   - Price range
   - Building size
   - Owner type
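As a starting point, such filters could be applied after extraction. The sketch below is a hypothetical example (the field names assume leads shaped like the sheet columns; the criteria keys are illustrative):

```javascript
// Hypothetical post-extraction filter. `leads` is an array of objects with
// fields mirroring the sheet columns (propertyType, squareFootage, ...).
function filterLeads(leads, { propertyType, minSqft, maxSqft } = {}) {
  return leads.filter((lead) => {
    if (propertyType && lead.propertyType !== propertyType) return false;
    if (minSqft != null && !(lead.squareFootage >= minSqft)) return false;
    if (maxSqft != null && !(lead.squareFootage <= maxSqft)) return false;
    return true;
  });
}
```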
### Integrating with Other Tools
The Google Sheet can be connected to:
- Google Data Studio for dashboards
- Zapier for automations
- Custom scripts for further processing
## Development
### File Structure
```
workspace/
├── reonomy-scraper.js # Main scraper script
├── scrape-reonomy.sh # Shell wrapper
├── package.json # Node.js dependencies
├── README.md # This file
├── reonomy-scraper.log # Run logs
└── node_modules/ # Dependencies
```
### Testing
Test the scraper in visible mode first:
```bash
./scrape-reonomy.sh --no-headless --location "Brooklyn, NY"
```
### Extending the Scraper
To add new data fields:
1. Update the `headers` array in `initializeSheet()`
2. Update the `extractLeadsFromPage()` function
3. Add new parsing functions as needed
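For example, adding a hypothetical "Year Built" column would touch both places. The snippet below is a sketch only; the actual `headers` array and extraction code live in `reonomy-scraper.js`:

```javascript
// Step 1: append the new column to the headers array in initializeSheet()
// (abbreviated; keep the existing columns in place).
const headers = [
  'Scrape Date', 'Owner Name', 'Property Address', /* ...existing columns... */
  'Year Built', // new column appended at the end
];

// Step 2: a new parsing helper to call from extractLeadsFromPage().
// Pulls the first plausible 4-digit year (1800-2099) out of a display string.
function parseYearBuilt(text) {
  const match = String(text || '').match(/\b(18|19|20)\d{2}\b/);
  return match ? Number(match[0]) : null;
}
```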
## Support
### Getting Help
- Check the log file: `reonomy-scraper.log`
- Run with visible browser to see issues: `--no-headless`
- Check screenshots in `/tmp/` directory
### Common Issues
| Issue | Solution |
|-------|----------|
| Login fails | Verify credentials, try manual login |
| No leads found | Try a different location, check search results |
| Google Sheets error | Run `gog auth login` to re-authenticate |
| Browser timeout | Increase timeout in the script |
## License
This tool is for educational and personal use. Respect Reonomy's Terms of Service when scraping.
## Changelog
### v1.0.0 (Current)
- Initial release
- Automated login
- Location-based search
- Google Sheets export
- 1Password integration
- Headless mode support