# Reonomy DOM Analysis - Contact Info Extraction
## Key Findings
### URL Structure
The critical discovery is the **correct URL pattern** for accessing property ownership/contact info:
```
https://app.reonomy.com/#!/search/{search-id}/property/{property-id}/ownership
```
**Example:**
```
https://app.reonomy.com/#!/search/36724b2c-4352-47a1-bc34-619c09cefa72/property/e9437640-d098-53bb-8421-fffb43f78b7e/ownership
```
**Components:**
- `search-id`: `36724b2c-4352-47a1-bc34-619c09cefa72` (from the search query)
- `property-id`: `e9437640-d098-53bb-8421-fffb43f78b7e` (specific property)
- `view`: `ownership` (this is where contact info lives!)
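
To make the component breakdown concrete, here is a minimal sketch that splits a property URL into those three parts. The function name `parsePropertyUrl` is hypothetical, and the match ignores whatever precedes `/search/`, so it does not depend on the exact route prefix.

```javascript
// Split a Reonomy property URL into search-id, property-id, and view.
// Returns null when the URL does not match the expected pattern.
function parsePropertyUrl(url) {
  const match = url.match(
    /\/search\/([a-f0-9-]+)\/property\/([a-f0-9-]+)(?:\/([a-z]+))?/i
  );
  if (!match) return null;
  // `view` is optional in the pattern; default it to null when absent.
  const [, searchId, propertyId, view = null] = match;
  return { searchId, propertyId, view };
}

const parsed = parsePropertyUrl(
  'https://app.reonomy.com/#!/search/36724b2c-4352-47a1-bc34-619c09cefa72/property/e9437640-d098-53bb-8421-fffb43f78b7e/ownership'
);
console.log(parsed);
```

Returning `null` instead of throwing lets a caller skip pages that are not property views (for example the dashboard).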
### Contact Info Found on Ownership Page
**Email addresses (4 found):**
- johnsoh@centurylink.net
- helen.christian@sumter.k12.us
- helen.christian@sumter.k12.fl.us
- christj@sumter.k12.fl.us
**Phone numbers (4 found):**
- 352-568-0033
- 517-610-1861
- 352-793-3204
- 352-603-1369
### DOM Selectors for Contact Info
**Email:**
```javascript
document.querySelectorAll('a[href^="mailto:"]')
```
**Phone:**
```javascript
document.querySelectorAll('a[href^="tel:"]')
```
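
The raw `href` values from these selectors still carry the `mailto:` / `tel:` scheme and may repeat. A minimal sketch of a cleanup step, assuming the hrefs have already been collected into an array; `contactsFromHrefs` is a hypothetical helper and the sample values are made up:

```javascript
// Turn raw href values into clean, de-duplicated contact lists.
function contactsFromHrefs(hrefs) {
  const emails = new Set();
  const phones = new Set();
  for (const href of hrefs) {
    if (/^mailto:/i.test(href)) {
      // Drop the scheme and any "?subject=..." query part.
      const address = href.replace(/^mailto:/i, '').split('?')[0].trim();
      if (address) emails.add(address.toLowerCase());
    } else if (/^tel:/i.test(href)) {
      const number = href.replace(/^tel:/i, '').trim();
      if (number) phones.add(number);
    }
  }
  return { emails: [...emails], phones: [...phones] };
}

// In the page context, the hrefs could be collected with the selectors above:
//   const hrefs = Array.from(
//     document.querySelectorAll('a[href^="mailto:"], a[href^="tel:"]'),
//     a => a.getAttribute('href')
//   );
const sample = contactsFromHrefs([
  'mailto:Owner@example.com?subject=Hello',
  'mailto:owner@example.com',
  'tel:352-555-0100',
  'tel:352-555-0100',
  'https://example.com/about',
]);
console.log(sample);
```

Keeping the cleanup separate from the DOM query means it can be checked without a browser.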
### Property Details
**Property Address:**
```
288 east ln, center hill, FL 33514
```
**How to navigate between properties:**
- Each property page URL contains the property ID
- Appending `/ownership` to the property URL opens the ownership view, which holds the contact info
- Other tabs available: `/building`, `/sales`, `/debt`, `/tax`, `/demographics`, `/notes`
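
Since every tab is just a URL suffix, switching views can be sketched as a small URL builder. `PROPERTY_TABS` and `propertyTabUrl` are hypothetical names, and the `#!` route prefix follows the scraper code template in this document; adjust it if the live app differs.

```javascript
// Tabs noted in this analysis: the ownership view plus the other tab suffixes.
const PROPERTY_TABS = [
  'ownership', 'building', 'sales', 'debt', 'tax', 'demographics', 'notes',
];

// Build the URL for one tab of one property inside a given search.
function propertyTabUrl(searchId, propertyId, tab = 'ownership') {
  if (!PROPERTY_TABS.includes(tab)) {
    throw new Error(`Unknown tab: ${tab}`);
  }
  return `https://app.reonomy.com/#!/search/${searchId}/property/${propertyId}/${tab}`;
}

console.log(propertyTabUrl(
  '36724b2c-4352-47a1-bc34-619c09cefa72',
  'e9437640-d098-53bb-8421-fffb43f78b7e',
  'sales'
));
```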
### Scraper Strategy
**Correct approach:**
1. **Login** to Reonomy
2. **Perform search** for location
3. **Extract search-id** from resulting URL
4. **Find all property IDs** from search results page
5. **Navigate to each property's ownership view:**
```
https://app.reonomy.com/#!/search/{search-id}/property/{property-id}/ownership
```
6. **Extract contact info** from mailto: and tel: links
7. **Rate limit** with delays between requests
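
Steps 5 to 7 can be sketched as a sequential loop with a randomized pause between visits. This is only an outline under stated assumptions: `visit` is a caller-supplied function (hypothetical) that loads one ownership view and returns its contact info, and the 2000 ms base delay with up to 1000 ms of jitter is an arbitrary starting point, not a measured safe rate.

```javascript
// Resolve after `ms` milliseconds.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Base delay plus random jitter so requests are not evenly spaced.
function jitteredDelayMs(baseMs, jitterMs, rand = Math.random) {
  return baseMs + Math.floor(rand() * jitterMs);
}

// Visit each property ID in order, pausing between visits.
async function visitSequentially(
  propertyIds,
  visit,
  { baseMs = 2000, jitterMs = 1000 } = {}
) {
  const results = [];
  for (const [index, propertyId] of propertyIds.entries()) {
    try {
      results.push({ propertyId, ...(await visit(propertyId)) });
    } catch (error) {
      // Record the failure and keep going rather than aborting the whole run.
      results.push({ propertyId, error: error.message });
    }
    // No need to wait after the last property.
    if (index < propertyIds.length - 1) {
      await sleep(jitteredDelayMs(baseMs, jitterMs));
    }
  }
  return results;
}
```

Passing `visit` in as a parameter keeps the pacing logic independent of the browser automation library, so the loop can be exercised with a stub before pointing it at the real site.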
### What Was Wrong With Previous Scrapers
1. **Wrong URL pattern**: They were trying to access `/property/{id}` directly
- Correct: `/search/{search-id}/property/{property-id}/ownership`
2. **Wrong selectors**: They were looking for complex CSS classes when the simple `a[href^="mailto:"]` and `a[href^="tel:"]` selectors work
3. **Focus on wrong views**: They were checking the search results or dashboard, not the ownership tab
### Updated Scraper Code Template
```javascript
// Assumes `page` is a Puppeteer-style page object, already logged in and showing search results.
// Extract the search-id from the current URL.
const urlMatch = page.url().match(/search\/([a-f0-9-]+)/);
if (!urlMatch) {
  throw new Error(`No search-id found in URL: ${page.url()}`);
}
const searchId = urlMatch[1];

// TODO: find property IDs (needs research on how to get them from the search results page).
// `propertyId` below stands for one ID from that list.

// Visit the property's ownership view
const ownershipUrl = `https://app.reonomy.com/#!/search/${searchId}/property/${propertyId}/ownership`;
await page.goto(ownershipUrl, { waitUntil: 'networkidle2' });

// Extract contact info from mailto: and tel: links
const emails = await page.evaluate(() => {
  return Array.from(document.querySelectorAll('a[href^="mailto:"]'))
    .map(a => a.href.replace('mailto:', ''));
});
const phones = await page.evaluate(() => {
  return Array.from(document.querySelectorAll('a[href^="tel:"]'))
    .map(a => a.href.replace('tel:', ''));
});
```
### Next Steps
1. **Research**: How to extract property IDs from search results page?
- May need to check for specific button clicks or API calls
- Properties might be in a JSON object in window or loaded via XHR
2. **Update scraper** with correct URL pattern
3. **Test** with full property list
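
As a starting point for the research item above, one option is to scan whatever text the search results expose (link hrefs, page HTML, or a captured XHR response body) for `property/<uuid>` fragments. Whether the results page actually exposes IDs in this form is unverified; `propertyIdsFromText` is a hypothetical helper, and the second ID in the sample is made up.

```javascript
// Collect unique property IDs from any text containing "property/<uuid>".
function propertyIdsFromText(text) {
  const pattern =
    /property\/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/gi;
  const ids = new Set();
  for (const match of text.matchAll(pattern)) {
    ids.add(match[1].toLowerCase());
  }
  return [...ids];
}

const sampleText = `
  <a href="#!/search/36724b2c-4352-47a1-bc34-619c09cefa72/property/e9437640-d098-53bb-8421-fffb43f78b7e">A</a>
  <a href="#!/search/36724b2c-4352-47a1-bc34-619c09cefa72/property/e9437640-d098-53bb-8421-fffb43f78b7e/ownership">A again</a>
  {"url": "/property/00000000-0000-0000-0000-000000000001/building"}
`;
const foundIds = propertyIdsFromText(sampleText);
console.log(foundIds);
```

If the IDs turn out to live only in an API response, the same function could be applied to the response body text instead of the page HTML.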