# Reonomy Scraper Platform — Architecture

## Overview

Full-stack Reonomy property data extraction platform with:

- REST API server (Express.js)
- MCP Server (TypeScript, stdio transport)
- MCP App(s) with React UI
- Integration with the LocalBosses web app

## Components

### 1. API Server (`/api`)

Express.js REST API that orchestrates scraping.

**Endpoints:**

- `POST /api/scrape` — Start a scrape job with search + output config
- `GET /api/scrape/:jobId` — Check job status
- `GET /api/scrape/:jobId/results` — Get results (JSON)
- `GET /api/filters` — List all available Reonomy filter options
- `GET /api/exports/:jobId?format=csv|json` — Export results
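
As a rough illustration of the export endpoint, the sketch below serializes a job's results as CSV or JSON. It assumes each lead is a flat record of field values; `Row`, `csvField`, `toCsv` and `exportResults` are hypothetical names, and falling back to JSON when `format` is omitted is an assumption. Quoting follows RFC 4180: a field containing a comma, double quote or line break is wrapped in quotes, with embedded quotes doubled.

```typescript
// Assumed shape of one scraped lead: a flat record of field values.
type Row = Record<string, string | number | boolean | null | undefined>;

// Quote one field per RFC 4180: wrap it in double quotes when it contains a
// comma, double quote, CR or LF, and double any embedded quotes.
function csvField(value: Row[string]): string {
  if (value === null || value === undefined) return "";
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Serialize rows to CSV. Columns are the union of keys in first-seen order,
// so a lead with no email still lines up with leads that have one.
function toCsv(rows: Row[]): string {
  const columns: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!columns.includes(key)) columns.push(key);
    }
  }
  const lines = [columns.map(csvField).join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => csvField(row[c])).join(","));
  }
  return lines.join("\r\n");
}

// Choose a serializer from the ?format= query value (JSON when omitted, an
// assumption); any other value is rejected.
function exportResults(rows: Row[], format?: string): { contentType: string; body: string } {
  if (format === "csv") return { contentType: "text/csv", body: toCsv(rows) };
  if (format === undefined || format === "json") {
    return { contentType: "application/json", body: JSON.stringify(rows) };
  }
  throw new Error(`Unsupported export format: ${format}`);
}
```

Building the header from the union of keys keeps columns aligned when only some leads have a phone or email on file.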

**Auth:** API key header (`X-API-Key`)
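
A framework-agnostic sketch of the `X-API-Key` check, assuming the server holds a set of valid keys (how those keys are configured is not specified here). `checkApiKey` is a hypothetical helper, and answering 401 for a missing key versus 403 for an unknown one is a design choice rather than something this document mandates.

```typescript
// Node's http module lower-cases incoming header names; the value type
// mirrors Node's IncomingHttpHeaders, which allows string arrays.
type IncomingHeaders = Record<string, string | string[] | undefined>;

// Decide the status for a request: 401 when no key is sent, 403 when the key
// is not recognised, 200 when it matches one of the configured keys.
function checkApiKey(headers: IncomingHeaders, validKeys: ReadonlySet<string>): 200 | 401 | 403 {
  const raw = headers["x-api-key"];
  const key = Array.isArray(raw) ? raw[0] : raw;
  if (key === undefined || key === "") return 401;
  return validKeys.has(key) ? 200 : 403;
}
```

In Express this check would sit in a middleware mounted ahead of the `/api` routes, replying with the returned status whenever it is not 200. `Set.has` is not a constant-time comparison; Node's `crypto.timingSafeEqual` is an option if timing side channels matter.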

### 2. Scraper Engine (`/engine`)

Core scraping logic (evolved from v13), with modular extraction.

**Modules:**

- `auth.js` — Login, session management, state save/load
- `search-builder.js` — Translates filter config → Reonomy UI actions
- `extractor.js` — Modular tab extraction (only extracts what the user requested)
- `anti-detection.js` — Random delays, humanization, daily limits
- `queue.js` — Job queue with rate limiting
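
One way `anti-detection.js` could combine a daily cap with randomized pauses, sketched in TypeScript. `DailyLimiter`, `randomDelayMs` and `sleep` are hypothetical names; the cap value, the UTC day boundary and the in-memory counter (which resets on restart unless it is persisted, for example in the SQLite store) are all assumptions.

```typescript
// Hypothetical daily cap: counts property extractions per UTC calendar day and
// refuses further work once the cap is reached.
class DailyLimiter {
  private readonly cap: number;
  private day = "";
  private used = 0;

  constructor(cap: number) {
    this.cap = cap;
  }

  private dayKey(now: Date): string {
    return now.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  }

  /** Consume one unit of today's budget; returns false when the cap is exhausted. */
  tryConsume(now: Date = new Date()): boolean {
    const d = this.dayKey(now);
    if (d !== this.day) {
      this.day = d; // first call on a new day: reset the counter
      this.used = 0;
    }
    if (this.used >= this.cap) return false;
    this.used += 1;
    return true;
  }

  remaining(now: Date = new Date()): number {
    return this.dayKey(now) === this.day ? this.cap - this.used : this.cap;
  }
}

// Jittered delay in [minMs, maxMs] so page actions do not fire at a fixed
// cadence; rand is injectable so tests can make it deterministic.
function randomDelayMs(minMs: number, maxMs: number, rand: () => number = Math.random): number {
  return Math.round(minMs + rand() * (maxMs - minMs));
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

A scrape loop would call `tryConsume()` before each property and `await sleep(randomDelayMs(min, max))` between page actions.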

**Extraction modules (per tab):**

- `extract-building.js` — Building & Lot data
- `extract-owner.js` — Owner names, phones, emails (current v13 logic)
- `extract-sales.js` — Sale history
- `extract-debt.js` — Mortgage/lender info
- `extract-tax.js` — Tax assessed values
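
A sketch of how `extractor.js` might decide which tabs to open from an `OutputConfig`, so that a request for phone numbers alone never touches the Sales or Debt tabs. The group-to-tab mapping is an assumption (only the owner tab's phones and emails are stated above), and `tabsToVisit`, `extractProperty` and the `TabExtractor` signature are hypothetical.

```typescript
type Tab = "building" | "owner" | "sales" | "debt" | "tax";

// Simplified OutputConfig: the field lists are plain strings here.
interface OutputSelection {
  propertyInfo?: string[];
  ownerInfo?: string[];
  contactInfo?: string[];
  salesInfo?: string[];
  debtInfo?: string[];
  taxInfo?: string[];
}

// Assumed mapping from output group to the Reonomy tab holding that data.
const TAB_FOR_GROUP: Record<keyof OutputSelection, Tab> = {
  propertyInfo: "building",
  ownerInfo: "owner",
  contactInfo: "owner",
  salesInfo: "sales",
  debtInfo: "debt",
  taxInfo: "tax",
};

const TAB_ORDER: Tab[] = ["building", "owner", "sales", "debt", "tax"];

// Tabs that have at least one requested field, deduplicated, in TAB_ORDER.
function tabsToVisit(output: OutputSelection): Tab[] {
  const wanted = new Set<Tab>();
  for (const group of Object.keys(TAB_FOR_GROUP) as (keyof OutputSelection)[]) {
    if ((output[group]?.length ?? 0) > 0) wanted.add(TAB_FOR_GROUP[group]);
  }
  return TAB_ORDER.filter((tab) => wanted.has(tab));
}

// Hypothetical per-tab extractor: reads one tab for one property.
type TabExtractor = (propertyId: string) => Promise<Record<string, unknown>>;

// Visit only the needed tabs, one after another, and merge their fields.
async function extractProperty(
  propertyId: string,
  output: OutputSelection,
  extractors: Record<Tab, TabExtractor>,
): Promise<Record<string, unknown>> {
  const record: Record<string, unknown> = { propertyId };
  for (const tab of tabsToVisit(output)) {
    Object.assign(record, await extractors[tab](propertyId));
  }
  return record;
}
```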

### 3. MCP Server (`/mcp-server`)

TypeScript MCP server exposing Reonomy tools.

**Tools:**

- `reonomy_search` — Configure search filters, returns search ID + count
- `reonomy_scrape` — Start extraction from a search (with output config)
- `reonomy_get_results` — Fetch results for a job
- `reonomy_get_filters` — List all available filter options
- `reonomy_export` — Export results as CSV/JSON
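
For reference, this is roughly what the tool list could look like at the protocol level: an MCP `tools/list` response describes each tool by `name`, `description` and a JSON Schema `inputSchema`, and a tool call returns a `content` array with an optional `isError` flag. The tool names come from the list above; every argument name (`search`, `searchId`, `output`, `jobId`, `format`) is an assumption, and `TOOLS` and `textResult` are hypothetical.

```typescript
// Tool descriptor as it appears in a tools/list response.
interface ToolDef {
  name: string;
  description: string;
  inputSchema: { type: "object"; properties: Record<string, unknown>; required?: string[] };
}

// Argument shapes below are assumptions; only the names come from the design.
const TOOLS: ToolDef[] = [
  {
    name: "reonomy_search",
    description: "Configure search filters; returns a search ID and result count",
    inputSchema: { type: "object", properties: { search: { type: "object" } }, required: ["search"] },
  },
  {
    name: "reonomy_scrape",
    description: "Start extraction from a search with an output config",
    inputSchema: {
      type: "object",
      properties: { searchId: { type: "string" }, output: { type: "object" } },
      required: ["searchId"],
    },
  },
  {
    name: "reonomy_get_results",
    description: "Fetch results for a job",
    inputSchema: { type: "object", properties: { jobId: { type: "string" } }, required: ["jobId"] },
  },
  {
    name: "reonomy_get_filters",
    description: "List all available filter options",
    inputSchema: { type: "object", properties: {} },
  },
  {
    name: "reonomy_export",
    description: "Export results as CSV or JSON",
    inputSchema: {
      type: "object",
      properties: { jobId: { type: "string" }, format: { type: "string", enum: ["csv", "json"] } },
      required: ["jobId", "format"],
    },
  },
];

// Wrap a payload in the tool-result shape: text content plus an isError flag.
function textResult(data: unknown, isError = false) {
  return { content: [{ type: "text" as const, text: JSON.stringify(data, null, 2) }], isError };
}
```

The higher-level server API in `@modelcontextprotocol/sdk` typically generates these descriptors from tool registrations, so hand-written objects like these are mostly useful as documentation or test fixtures.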

**Resources:**

- `reonomy://app/search` — Search configuration UI
- `reonomy://app/results` — Results viewer UI
- `reonomy://app/dashboard` — Dashboard with stats
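
A possible shape for serving these resources: an MCP `resources/read` result is a `contents` array whose entries carry the `uri`, a `mimeType` and, for text resources, the `text`. The bundle file names and the `text/html` MIME type are assumptions, `readAppResource` is hypothetical, and `loadHtml` is injected so the caller decides where the built files live (for example `mcp-app/dist/`).

```typescript
// Hypothetical mapping from resource URI to a built single-file HTML bundle;
// the file names are placeholders.
const APP_BUNDLES: Record<string, string> = {
  "reonomy://app/search": "search.html",
  "reonomy://app/results": "results.html",
  "reonomy://app/dashboard": "dashboard.html",
};

// Shape of a resources/read result carrying text content.
interface ReadResult {
  contents: { uri: string; mimeType: string; text: string }[];
}

// Resolve a URI to its bundle and return the HTML; unknown URIs are rejected.
function readAppResource(uri: string, loadHtml: (file: string) => string): ReadResult {
  const file = APP_BUNDLES[uri];
  if (file === undefined) throw new Error(`Unknown resource: ${uri}`);
  return { contents: [{ uri, mimeType: "text/html", text: loadHtml(file) }] };
}
```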

### 4. MCP App(s) (`/mcp-app`)

React + Vite; each app is bundled into a single HTML file.

**Apps:**

- **Search Builder App** — Visual filter configuration
  - Location picker, property type checkboxes
  - Owner filters (phone/email toggles)
  - Building & Lot ranges
  - Output field selection
  - "Start Scrape" button
- **Results Viewer App** — Table/card view of scraped leads
  - Sortable/filterable data table
  - Expandable owner cards with contact info
  - Export to CSV button
  - Job status indicator
- **Dashboard App** — Scrape stats overview
  - Total leads, daily usage, job history
  - Properties by type chart
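
The Results Viewer's sortable/filterable table can be reduced to a pure function over the rows, sketched here. `LeadRow` keeps only a few illustrative columns, `viewRows` is a hypothetical name, and placing rows without a value for the sort column last is a design choice.

```typescript
// Hypothetical lead row; real rows would carry the fields chosen in OutputConfig.
interface LeadRow {
  address: string;
  owner: string;
  units?: number;
  lastSalePrice?: number;
}

type SortDir = "asc" | "desc";

// Case-insensitive text filter over all present fields, then a stable sort on
// one column; rows missing the sort value go last in either direction.
function viewRows(rows: LeadRow[], query: string, sortKey: keyof LeadRow, dir: SortDir): LeadRow[] {
  const q = query.trim().toLowerCase();
  const filtered =
    q === ""
      ? rows.slice()
      : rows.filter((r) =>
          Object.values(r).some((v) => v !== undefined && String(v).toLowerCase().includes(q)),
        );
  const sign = dir === "asc" ? 1 : -1;
  return filtered.sort((a, b) => {
    const x = a[sortKey];
    const y = b[sortKey];
    if (x === undefined && y === undefined) return 0;
    if (x === undefined) return 1;
    if (y === undefined) return -1;
    if (typeof x === "number" && typeof y === "number") return sign * (x - y);
    return sign * String(x).localeCompare(String(y));
  });
}
```

Returning a new array leaves the job's stored results untouched, which suits React state updates.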

### 5. LocalBosses Integration

- Add a Reonomy channel to the toolbar
- Wire the MCP apps into the iframe system
- Make the API endpoints accessible from the app

## Tech Stack

- **Runtime:** Node.js 22
- **API:** Express.js
- **MCP Server:** `@modelcontextprotocol/sdk` (TypeScript)
- **MCP App:** React 18 + Vite + Tailwind
- **Browser Automation:** agent-browser CLI
- **Queue:** In-memory (Bull optional for production)
- **Storage:** SQLite (better-sqlite3) for results + job tracking
- **Rate Limiting:** Built-in daily caps + per-request delays
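
A minimal in-memory queue in the spirit of the Queue entry above: jobs run one at a time in FIFO order, and their status is kept in a `Map` that `GET /api/scrape/:jobId` could read. `InMemoryQueue`, the status names and the `job-N` ID format are hypothetical; in the described stack, job state would also be persisted in SQLite, with Bull as the production option.

```typescript
type JobStatus = "queued" | "running" | "done" | "failed";

interface Job<T> {
  id: string;
  status: JobStatus;
  result?: T;
  error?: string;
}

// Hypothetical in-memory queue: FIFO, one job at a time, status kept in a Map.
class InMemoryQueue<T> {
  private readonly jobs = new Map<string, Job<T>>();
  private readonly pending: { id: string; run: () => Promise<T> }[] = [];
  private active = false;
  private nextId = 1;

  // Register a job and start draining if the worker is idle; returns the job ID.
  enqueue(run: () => Promise<T>): string {
    const id = `job-${this.nextId++}`;
    this.jobs.set(id, { id, status: "queued" });
    this.pending.push({ id, run });
    void this.drain();
    return id;
  }

  get(id: string): Job<T> | undefined {
    return this.jobs.get(id);
  }

  // Run pending jobs sequentially; a failure is recorded on that job and the
  // loop moves on to the next one.
  private async drain(): Promise<void> {
    if (this.active) return;
    this.active = true;
    while (this.pending.length > 0) {
      const next = this.pending.shift()!;
      const job = this.jobs.get(next.id)!;
      job.status = "running";
      try {
        job.result = await next.run();
        job.status = "done";
      } catch (err) {
        job.status = "failed";
        job.error = err instanceof Error ? err.message : String(err);
      }
    }
    this.active = false;
  }
}
```

Running jobs sequentially keeps scraping to a single browser session at a time, which fits the daily-cap and delay approach.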

## File Structure

```
reonomy-api/
├── ARCHITECTURE.md
├── package.json
├── tsconfig.json
├── src/
│   ├── server.ts              # Express API entry
│   ├── routes/
│   │   ├── scrape.ts          # /api/scrape endpoints
│   │   ├── filters.ts         # /api/filters
│   │   └── exports.ts         # /api/exports
│   ├── engine/
│   │   ├── auth.ts            # Reonomy auth
│   │   ├── search-builder.ts
│   │   ├── extractor.ts       # Orchestrates tab extraction
│   │   ├── anti-detection.ts
│   │   ├── queue.ts
│   │   └── extractors/
│   │       ├── building.ts
│   │       ├── owner.ts
│   │       ├── sales.ts
│   │       ├── debt.ts
│   │       └── tax.ts
│   ├── mcp/
│   │   ├── server.ts          # MCP server entry
│   │   ├── tools.ts           # Tool definitions
│   │   └── resources.ts       # Resource definitions
│   ├── db/
│   │   ├── schema.ts          # SQLite schema
│   │   └── queries.ts         # DB operations
│   └── types.ts               # Shared types
├── mcp-app/
│   ├── package.json
│   ├── vite.config.ts
│   ├── src/
│   │   ├── apps/
│   │   │   ├── SearchBuilder.tsx
│   │   │   ├── ResultsViewer.tsx
│   │   │   └── Dashboard.tsx
│   │   ├── components/
│   │   │   ├── FilterPanel.tsx
│   │   │   ├── PropertyCard.tsx
│   │   │   ├── DataTable.tsx
│   │   │   └── ExportButton.tsx
│   │   └── lib/
│   │       └── mcp-app-sdk.ts
│   └── dist/                  # Built HTML files
└── data/
    └── reonomy.db             # SQLite database
```

## Search Config Schema

```typescript
interface SearchConfig {
  location: string; // "Miami-Dade, FL"
  propertyTypes?: string[]; // ["Multifamily", "Multi Family (General)"]
  building?: {
    yearBuiltFrom?: number;
    yearBuiltUntil?: number;
    yearRenovatedFrom?: number;
    yearRenovatedUntil?: number;
    zoning?: string;
    lotSizeSfMin?: number;
    lotSizeSfMax?: number;
    lotSizeAcresMin?: number;
    lotSizeAcresMax?: number;
    opportunityZone?: boolean;
    totalUnitsMin?: number;
    totalUnitsMax?: number;
    buildingAreaMin?: number;
    buildingAreaMax?: number;
  };
  owner?: {
    nameOrCompany?: string;
    ownerType?: "Company" | "Person";
    includesPhone?: boolean;
    includesEmail?: boolean;
    includesMailingAddress?: boolean;
    portfolioMin?: number;
    portfolioMax?: number;
    ownerOccupied?: boolean;
    inStateOwner?: boolean;
    portfolioValueMin?: number;
    portfolioValueMax?: number;
    reportedOwner?: string;
    mailingAddress?: string;
  };
  occupants?: {
    name?: string;
    naicsSic?: string;
    website?: string;
  };
  sales?: {
    dateRange?: string;
    multiParcel?: boolean;
    priceMin?: number;
    priceMax?: number;
    pricePerSfMin?: number;
    pricePerSfMax?: number;
    likelyToSell?: boolean;
  };
  debt?: {
    amountMin?: number;
    amountMax?: number;
    originationFrom?: string;
    originationUntil?: string;
    maturityFrom?: string;
    maturityUntil?: string;
    lenderName?: string;
    cmbsLoan?: boolean;
  };
  distressed?: {
    auctionDateFrom?: string;
    auctionDateUntil?: string;
    preForeclosureCategory?: string;
    cmbsWatchlist?: boolean;
  };
}

interface OutputConfig {
  propertyInfo?: ("address" | "type" | "units" | "sqft" | "yearBuilt" | "lotSize" | "zoning" | "opportunityZone" | "apn" | "legal")[];
  ownerInfo?: ("name" | "company" | "portfolioSize" | "portfolioValue" | "ownerType")[];
  contactInfo?: ("phones" | "emails")[];
  salesInfo?: ("lastSaleDate" | "lastSalePrice" | "buyer" | "seller" | "deedType")[];
  debtInfo?: ("lender" | "loanType" | "mortgageAmount" | "maturityDate" | "interestType")[];
  taxInfo?: ("assessedValue" | "taxAmount")[];
}
```
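
Because the schema expresses ranges as `*Min`/`*Max` and `*From`/`*Until` pairs, the API could reject inverted ranges before any browser work starts. This validator is a sketch: `findInvertedRanges` is a hypothetical name, and comparing date strings lexicographically assumes ISO-8601 dates, which the schema does not specify.

```typescript
// Each filter group in SearchConfig (building, owner, sales, ...) is a flat
// record of optional numbers, strings and booleans.
type FilterGroup = Record<string, string | number | boolean | undefined>;

// Suffix pairs the schema uses for range bounds.
const RANGE_SUFFIXES: [string, string][] = [
  ["Min", "Max"],
  ["From", "Until"],
];

// Return "group.field" for every range whose lower bound exceeds its upper
// bound. Strings are compared lexicographically, which is only correct for
// ISO-8601 dates such as "2025-01-31" (an assumption about the format).
function findInvertedRanges(config: object): string[] {
  const problems: string[] = [];
  for (const [groupName, group] of Object.entries(config)) {
    // Skip top-level scalars (location) and arrays (propertyTypes).
    if (typeof group !== "object" || group === null || Array.isArray(group)) continue;
    const g = group as FilterGroup;
    for (const [key, low] of Object.entries(g)) {
      for (const [lowSuffix, highSuffix] of RANGE_SUFFIXES) {
        if (!key.endsWith(lowSuffix)) continue;
        const field = key.slice(0, -lowSuffix.length);
        const high = g[field + highSuffix];
        const inverted =
          (typeof low === "number" && typeof high === "number" && low > high) ||
          (typeof low === "string" && typeof high === "string" && low > high);
        if (inverted) problems.push(`${groupName}.${field}`);
      }
    }
  }
  return problems;
}
```

Returning field names rather than throwing on the first problem lets the API report every bad range in one 400 response.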