# Reonomy DOM Analysis - Contact Info Extraction

## Key Findings

### URL Structure
The critical discovery is the **correct URL pattern** for accessing property ownership/contact info:

```
https://app.reonomy.com/!/search/{search-id}/property/{property-id}/ownership
```

**Example:**
```
https://app.reonomy.com/!/search/36724b2c-4352-47a1-bc34-619c09cefa72/property/e9437640-d098-53bb-8421-fffb43f78b7e/ownership
```

**Components:**
- `search-id`: `36724b2c-4352-47a1-bc34-619c09cefa72` (from the search query)
- `property-id`: `e9437640-d098-53bb-8421-fffb43f78b7e` (specific property)
- `view`: `ownership` (this is where contact info lives!)

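
The URL pattern can be sketched as a pair of small helpers. This is a minimal sketch, not code from an existing scraper: `buildOwnershipUrl` and `parsePropertyUrl` are hypothetical names, and the regular expression assumes both IDs contain only lowercase hex digits and hyphens, as in the example.

```javascript
// Base of the ownership URL pattern documented in this section.
const BASE = 'https://app.reonomy.com/!/search';

// Hypothetical helper: build the ownership URL for one property.
function buildOwnershipUrl(searchId, propertyId) {
  return `${BASE}/${searchId}/property/${propertyId}/ownership`;
}

// Hypothetical helper: split a property URL back into its components.
// Returns null when the URL does not match the pattern.
function parsePropertyUrl(url) {
  const match = url.match(
    /\/search\/([a-f0-9-]+)\/property\/([a-f0-9-]+)(?:\/([a-z]+))?/
  );
  if (!match) return null;
  return { searchId: match[1], propertyId: match[2], view: match[3] ?? null };
}

const url = buildOwnershipUrl(
  '36724b2c-4352-47a1-bc34-619c09cefa72',
  'e9437640-d098-53bb-8421-fffb43f78b7e'
);
console.log(url);
console.log(parsePropertyUrl(url));
```

Keeping the pattern in one place means a later change to the URL structure only needs to be made once.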
### Contact Info Found on Ownership Page

**Email addresses (4 found):**
- johnsoh@centurylink.net
- helen.christian@sumter.k12.us
- helen.christian@sumter.k12.fl.us
- christj@sumter.k12.fl.us

**Phone numbers (4 found):**
- 352-568-0033
- 517-610-1861
- 352-793-3204
- 352-603-1369

### DOM Selectors for Contact Info

**Email:**
```javascript
document.querySelectorAll('a[href^="mailto:"]')
```

**Phone:**
```javascript
document.querySelectorAll('a[href^="tel:"]')
```

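
The selectors return anchor elements, so the raw `href` values still need cleaning before they are usable. The sketch below shows one way to do that on plain strings. `cleanHrefs` and `safeDecode` are hypothetical helpers, and the handling of `?subject=...` parameters and percent-encoding is an assumption about what the links might contain, not something observed on the page.

```javascript
// Decode percent-encoding, falling back to the raw value if it is malformed.
function safeDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch {
    return value;
  }
}

// Hypothetical helper: turn raw href values into clean, de-duplicated values.
// `scheme` is 'mailto:' or 'tel:'.
function cleanHrefs(hrefs, scheme) {
  const values = hrefs
    .filter(href => href.toLowerCase().startsWith(scheme))
    .map(href => href.slice(scheme.length))    // drop the scheme prefix
    .map(value => value.split('?')[0])         // drop ?subject=... parameters
    .map(value => safeDecode(value).trim())
    .filter(value => value.length > 0);        // skip empty links
  return [...new Set(values)];                 // Set keeps first-seen order
}

console.log(cleanHrefs(['mailto:a@example.com?subject=Hi', 'mailto:a@example.com'], 'mailto:'));
console.log(cleanHrefs(['tel:555-0100', 'tel:555-0100', 'tel:'], 'tel:'));
```

De-duplicating here matters because the same address or number can be linked more than once on a page.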
### Property Details

**Property Address:**
```
288 east ln, center hill, FL 33514
```

**How to navigate between properties:**
- Property page: the URL contains the property ID
- Ownership view: the `/ownership` suffix gives the contact info
- Other tabs available: `/building`, `/sales`, `/debt`, `/tax`, `/demographics`, `/notes`

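
Since each tab is just a URL suffix, moving between tabs can be sketched as a string operation. `switchView` is a hypothetical helper, and the sketch assumes the tab name is always the last path segment, which has only been confirmed here for `/ownership`.

```javascript
// Tabs listed in this section, including the ownership view.
const VIEWS = ['ownership', 'building', 'sales', 'debt', 'tax', 'demographics', 'notes'];

// Hypothetical helper: point a property URL at a different tab.
function switchView(propertyUrl, view) {
  if (!VIEWS.includes(view)) {
    throw new Error(`Unknown view: ${view}`);
  }
  // Strip a trailing slash and any existing view suffix, then append the new one.
  const base = propertyUrl
    .replace(/\/$/, '')
    .replace(new RegExp(`/(${VIEWS.join('|')})$`), '');
  return `${base}/${view}`;
}

console.log(switchView('https://app.reonomy.com/!/search/s1/property/p1/ownership', 'sales'));
```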
### Scraper Strategy

**Correct approach:**

1. **Log in** to Reonomy
2. **Perform a search** for the target location
3. **Extract the search-id** from the resulting URL
4. **Find all property IDs** on the search results page
5. **Navigate to each property's ownership view:**
   ```
   https://app.reonomy.com/!/search/{search-id}/property/{property-id}/ownership
   ```
6. **Extract contact info** from `mailto:` and `tel:` links
7. **Rate limit** with delays between requests

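
The rate-limiting step can be sketched as a loop that pauses for a randomized interval between visits. All names here (`sleep`, `pickDelayMs`, `visitAll`) are hypothetical, and the 2 to 5 second default range is an assumed placeholder, not a measured safe value.

```javascript
// Resolve after `ms` milliseconds.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Pick a delay from minMs (inclusive) up to maxMs (exclusive).
// `rng` is injectable so the choice can be tested deterministically.
function pickDelayMs(minMs, maxMs, rng = Math.random) {
  return Math.floor(minMs + rng() * (maxMs - minMs));
}

// Hypothetical driver: run `visit` on each URL in order, pausing between requests.
// The default delay range is an assumption; tune it for the target site.
async function visitAll(urls, visit, minMs = 2000, maxMs = 5000) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    results.push(await visit(urls[i]));
    if (i < urls.length - 1) {
      await sleep(pickDelayMs(minMs, maxMs)); // no pause after the last URL
    }
  }
  return results;
}
```

In a scraper, `visit` would be the function that opens one ownership URL and returns the contact info found there. Visiting sequentially rather than in parallel keeps the request rate predictable.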
### What Was Wrong With Previous Scrapers

1. **Wrong URL pattern**: The scrapers tried to access `/property/{id}` directly.
   - Correct: `/search/{search-id}/property/{property-id}/ownership`

2. **Wrong selectors**: They looked for complex CSS classes when the simple `a[href^="mailto:"]` and `a[href^="tel:"]` selectors work.

3. **Focus on wrong views**: They checked the search results or dashboard, not the ownership tab.

### Updated Scraper Code Template

```javascript
// Assumes `page` is a Puppeteer page, after login and search.
// Extract the search-id from the current URL.
const urlMatch = page.url().match(/search\/([a-f0-9-]+)/);
if (!urlMatch) {
  throw new Error(`No search-id found in URL: ${page.url()}`);
}
const searchId = urlMatch[1];

// Find property IDs (needs research on how to get from search results page)
const propertyIds = []; // TODO: fill in once the extraction method is known

// Then visit each property's ownership view:
for (const propertyId of propertyIds) {
  const ownershipUrl = `https://app.reonomy.com/!/search/${searchId}/property/${propertyId}/ownership`;
  await page.goto(ownershipUrl, { waitUntil: 'networkidle2' });

  // Extract contact info
  const emails = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('a[href^="mailto:"]'))
      .map(a => a.href.replace('mailto:', ''));
  });

  const phones = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('a[href^="tel:"]'))
      .map(a => a.href.replace('tel:', ''));
  });

  console.log({ propertyId, emails, phones });
}
```

### Next Steps

1. **Research**: How to extract property IDs from the search results page?
   - May need to check for specific button clicks or API calls
   - Properties might be in a JSON object on `window` or loaded via XHR

2. **Update scraper** with the correct URL pattern

3. **Test** with the full property list

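
One possible starting point for the research item, offered as an unverified sketch: if the search results page renders links whose `href` contains `/property/{property-id}`, the IDs could be pulled out with a regular expression. Whether such links exist is an assumption that still needs checking against the page. `extractPropertyIds` is a hypothetical helper, and it assumes property IDs are UUID-shaped like the example ID in this document.

```javascript
// Matches a UUID-shaped property ID after "/property/" in a link.
const PROPERTY_ID_RE =
  /\/property\/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/i;

// Hypothetical helper: collect unique property IDs from a list of href strings.
function extractPropertyIds(hrefs) {
  const ids = hrefs
    .map(href => href.match(PROPERTY_ID_RE))
    .filter(match => match !== null)
    .map(match => match[1].toLowerCase());
  return [...new Set(ids)]; // de-duplicate, keeping first-seen order
}

console.log(extractPropertyIds([
  'https://app.reonomy.com/!/search/s1/property/e9437640-d098-53bb-8421-fffb43f78b7e',
  'https://app.reonomy.com/!/search/s1/property/e9437640-d098-53bb-8421-fffb43f78b7e/ownership',
  'https://app.reonomy.com/!/home',
]));
```

With Puppeteer, the input list could be gathered with something like `page.$$eval('a[href*="/property/"]', links => links.map(a => a.href))`. If the results turn out to be loaded via XHR instead, the same function could be applied to URLs or IDs found in the response data.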