# Prediction Markets Edge Research
## Polymarket & Kalshi - Opportunities & Full Stack

*Generated 2026-01-22*

---

## PART 1: WHERE THE EDGE EXISTS

### 1. NICHE/SPECIALIZED KNOWLEDGE MARKETS

**Why edge exists:**
- Low liquidity = mispriced odds
- Fewer sharp traders
- Information asymmetry in specific domains

**Categories with highest potential:**

| Category | Why Edge | Examples |
|----------|----------|----------|
| **Economic indicators** | Data-driven, predictable releases | CPI, unemployment, GDP beats/misses |
| **Crypto technicals** | On-chain data available early | ETH price targets, Bitcoin halving outcomes |
| **Esports/specific sports** | Niche data sources, scouting intel | Dota 2 tournaments, League match outcomes |
| **Corporate events** | Insider/industry connections | CEO departures, acquisitions, earnings beats |
| **Geopolitical** | Local intel, language barriers | Election outcomes in non-US countries |

**Edge types:**
- **Data access**: You get data faster (e.g., Bloomberg Terminal vs free APIs)
- **Domain expertise**: You understand nuances (e.g., esports meta shifts)
- **Local intelligence**: On-the-ground knowledge (elections, protests)

---

### 2. TIME-SENSITIVE MARKETS (Information Velocity)

**Polymarket excels here - news moves odds FAST**

**Edge opportunities:**
- **Breaking news monitoring**: Reuters API, Bloomberg News, Twitter/X firehose
- **Economic data releases**: Federal Reserve, BLS, and BEA releases, captured the moment they publish at their scheduled times
- **On-chain signals**: Whale alerts, large transfers, protocol exploits
- **Social sentiment shifts**: Reddit trends, TikTok virality tracking

**Example workflow:**
```
Reuters API → Detect breaking news → Cross-reference market → Analyze mispricing → Execute trade
```

**Tools needed:**
- Real-time news feeds (Reuters, Bloomberg, NewsAPI)
- Sentiment analysis (VADER, BERT, custom ML models)
- Fast execution (Polymarket CLOB, Kalshi API)

---

### 3. CROSS-PLATFORM ARBITRAGE

**Why edge exists:**
- Polymarket and Kalshi don't always list the same events
- Same event, different platforms = price discrepancies
- Different user bases = different market efficiency

**Types of arbitrage:**
1. **Direct arbitrage**: Same outcome, different prices (rare but exists)
2. **Correlated arbitrage**: Related markets with pricing gaps
3. **Platform liquidity arbitrage**: Capitalize on platform-specific volume shocks

**Example:**
- Polymarket has "Fed rate cut in March 2026" at 65%
- Kalshi has "Fed funds rate below 4.5% by March 31 2026" at 58%
- If these are materially the same event, there's an edge: buying YES on Kalshi at $0.58 and NO on Polymarket at roughly $0.35 costs $0.93 for a $1.00 payout, about 7 cents per pair before fees

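The arithmetic in the example above can be sketched in a few lines. This is a minimal illustration, not a trading rule: the function name `arbitrage_edge` and the `fee_per_pair` parameter are hypothetical, NO is priced as `1 - YES` (which ignores the bid/ask spread), and the result only holds if both markets really resolve on the same outcome.

```python
def arbitrage_edge(yes_price_a: float, yes_price_b: float,
                   fee_per_pair: float = 0.0) -> float:
    """Locked-in profit per YES/NO pair, assuming identical resolution.

    Buys YES on the cheaper venue and NO on the dearer one. Prices are
    fractions in [0, 1]; NO is assumed to cost 1 - YES (no spread).
    """
    cheap_yes = min(yes_price_a, yes_price_b)
    dear_yes = max(yes_price_a, yes_price_b)
    cost = cheap_yes + (1.0 - dear_yes)   # YES leg + NO leg
    return 1.0 - cost - fee_per_pair      # exactly one leg pays out $1


# Prices from the Fed example: 65% on one venue, 58% on the other
edge = arbitrage_edge(0.65, 0.58)
print(f"edge per pair: ${edge:.2f}")
```

A negative return value means fees or the price gap leave nothing to capture.
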
**Full arbitrage stack:**
- `pmxtjs` or `@alango/dr-manhattan` for unified API
- Correlation detection engine
- Position sizing with platform-specific risk limits

---

### 4. LIQUIDITY & MARKET MAKING EDGE

**Why edge exists:**
- Many markets have thin order books
- Market makers can earn the spread
- Less competition on smaller markets

**Strategies:**
- **Passive market making**: Place limit orders on both sides of thin markets
- **Inventory management**: Hedge with correlated markets
- **Volatility trading**: Build straddle-like positions around major events (neither platform lists options, so this means pairing related contracts such as adjacent outcome ranges)

**Tools:**
- Polymarket CLOB API for order placement
- Kalshi API for limit orders
- Real-time price feeds

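A minimal sketch of passive market making with inventory management, combining the first two strategies above. Everything here is an assumption for illustration: the function name `make_quotes`, the one-cent tick, the 0.01 to 0.99 price bounds, and the linear skew rule. It computes prices only and places no orders.

```python
def make_quotes(mid: float, half_spread: float, inventory: float,
                max_inventory: float, skew: float = 0.02) -> tuple[float, float]:
    """Return (bid, ask) for a YES contract, rounded to a one-cent tick.

    Quotes shift down when long and up when short, so fills tend to pull
    inventory back toward zero. `skew` is the shift at full inventory.
    """
    # Inventory as a fraction of the limit, clamped to [-1, 1]
    load = max(-1.0, min(1.0, inventory / max_inventory))
    shift = -skew * load
    bid = round(min(max(mid + shift - half_spread, 0.01), 0.99), 2)
    ask = round(min(max(mid + shift + half_spread, 0.01), 0.99), 2)
    return bid, ask


print(make_quotes(0.50, 0.03, inventory=0, max_inventory=100))
print(make_quotes(0.50, 0.03, inventory=100, max_inventory=100))
```

With a full long position both quotes move down by `skew`, making the ask more likely to fill than the bid.
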
---

### 5. MODEL-BASED PREDICTIONS

**Where AI/ML shines:**

| Market Type | Model Approach | Data Sources |
|-------------|-----------------|--------------|
| Economic indicators | Time series forecasting (ARIMA, Prophet, LSTMs) | FRED API, Bloomberg historical |
| Elections | Poll aggregation + demographic weighting | 538, RealClearPolitics, district data |
| Crypto prices | On-chain metrics + sentiment | Dune Analytics, Glassnode, social APIs |
| Weather/climate | Ensemble meteorological models | NOAA, ECMWF, historical data |
| Sports outcomes | Elo ratings + player statistics | Statcast, ESPN APIs, scraping |

**Edge comes from:**
- Better data (non-obvious signals)
- Better models (ensemble, custom features)
- Faster updates (real-time re-training)

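As one concrete instance of the model approaches in the table, here is a sketch of the standard Elo expected-score and update formulas. The K-factor of 20 and the function names are illustrative choices, and a real model would add the player statistics the table mentions.

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 20.0) -> tuple[float, float]:
    """Return new (rating_a, rating_b); score_a is 1 for a win, 0 for a loss."""
    delta = k * (score_a - elo_expected(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# A 100-point favorite is expected to win about 64% of the time
print(round(elo_expected(1600, 1500), 3))
```

The expected score can be compared directly with a market's YES price to look for a gap.
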
---

## PART 2: THE FULL STACK

### Layer 0: Infrastructure

```
┌─────────────────────────────────────────────────────────────┐
│                     DATA INFRASTRUCTURE                     │
├─────────────────────────────────────────────────────────────┤
│  • Real-time APIs (news, markets, on-chain)                 │
│  • PostgreSQL/ClickHouse for historical data                │
│  • Redis for caching + rate limiting                        │
│  • Message queue (RabbitMQ/Redis Streams) for events        │
└─────────────────────────────────────────────────────────────┘
```

**Key components:**
- **Database**: PostgreSQL with TimescaleDB for time-series market data
- **Cache**: Redis for rate limiting, market snapshots, order book states
- **Queue**: RabbitMQ or Kafka for async job processing
- **Monitoring**: Prometheus + Grafana for system health, P&L tracking

---

### Layer 1: Data Ingestion

**Sources:**

| Source | API/Tool | Use Case |
|--------|----------|----------|
| Polymarket | `@polymarket/sdk`, `polymarket-gamma`, `@nevuamarkets/poly-websockets` | Market data, odds, volume, order book |
| Kalshi | `kalshi-typescript`, `@newyorkcompute/kalshi-core` | Market data, contract prices, fills |
| News | Reuters, Bloomberg, NewsAPI | Breaking news, sentiment |
| On-chain | Dune Analytics, The Graph, Whale Alert | Crypto-specific markets |
| Social | X (Twitter) API, Reddit API | Sentiment, trend detection |
| Economic | FRED API, BEA API, BLS API | Macro indicators |

**Ingestion pattern:**
```python
# Pseudocode: every helper below is a placeholder for your own implementation
async def ingest_polymarket_data():
    ws = connect_poly_websocket()
    async for msg in ws:
        process_market_update(msg)            # parse and normalize the update
        store_to_postgres(msg)                # persist for backtesting
        emit_to_queue(msg)                    # fan out to downstream workers
        trigger_signal_if_edge_detected(msg)  # hand off to Layer 2
```

---

### Layer 2: Signal Generation

**Three approaches:**

1. **Rule-based signals**
```javascript
// Example: Economic data beat
// marketProbability is a fraction in [0, 1]; emitSignal is a placeholder
if (actualCPI > forecastCPI && marketProbability < 0.80) {
  emitSignal({ market: "Fed hike July", action: "BUY YES", confidence: 0.85 });
}
```

2. **ML-based signals**
```python
import numpy as np

# Example: Ensemble prediction
# The three models are assumed to be trained elsewhere and to return
# one probability each for `features`
predictions = [
    xgboost_model.predict(features),
    lstm_model.predict(features),
    sentiment_model.predict(features),
]
# Weight each model by its historical accuracy
weighted_pred = np.average(predictions, weights=historical_accuracy)
if weighted_pred > market_prob + threshold:
    emit_signal(...)
```

3. **NLP-based signals** (for news/sentiment)
```python
# Example: Breaking news analysis
news_text = get_latest_news()
sentiment = transformer_model.predict(news_text)
entities = ner_model.extract(news_text)
if "Fed" in entities and sentiment > 0.7:
    # Bullish signal for Fed-related markets
    emit_signal(...)
```

**Signal validation:**
- Backtest against historical data
- Paper trade first, then go live with small size
- Track prediction accuracy by market category
- Adjust confidence thresholds over time

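Tracking prediction accuracy by market category, as listed above, could be done with a Brier score (mean squared error between predicted probability and the 0/1 outcome). The record layout and the function name `brier_by_category` are assumptions for this sketch.

```python
from collections import defaultdict


def brier_by_category(records):
    """Mean Brier score per category; lower is better, 0.25 matches a coin flip.

    records: iterable of (category, predicted_prob, outcome), outcome 0 or 1.
    """
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum of errors, count]
    for category, prob, outcome in records:
        totals[category][0] += (prob - outcome) ** 2
        totals[category][1] += 1
    return {cat: total / n for cat, (total, n) in totals.items()}


history = [("econ", 0.8, 1), ("econ", 0.6, 0), ("esports", 0.5, 1)]
print(brier_by_category(history))
```

A category whose score stays near or above 0.25 is not adding information beyond guessing 50%.
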
---

### Layer 3: Execution Engine

**Polymarket execution:**
```typescript
// Illustrative: confirm class and method names against the SDK version you install
import { PolyMarketSDK } from '@polymarket/sdk';

const sdk = new PolyMarketSDK({ apiKey: '...' });

// Place order
const order = await sdk.createOrder({
  marketId: '0x...',
  side: 'YES',
  price: 0.65,        // 65 cents
  size: 100,          // 100 contracts
  expiration: 86400   // 24 hours, in seconds
});
```

**Kalshi execution:**
```typescript
// Illustrative: confirm class and method names against the SDK version you install
import { KalshiSDK } from 'kalshi-typescript';

const sdk = new KalshiSDK({ apiKey: '...' });

// Place order
const order = await sdk.placeOrder({
  ticker: 'HIGH-CPI-2026',
  side: 'YES',
  count: 100,
  limit_price: 65 // cents
});
```

**Execution considerations:**
- **Slippage**: Thin markets = high slippage. Use limit orders with buffer.
- **Gas**: Polymarket settles in USDC on Polygon, where gas is paid in POL (formerly MATIC), not ETH. Keep a small POL buffer for on-chain operations.
- **Rate limits**: Both platforms have API rate limits. Implement backoff.
- **Position limits**: Don't overexpose to correlated markets.

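The backoff mentioned under rate limits could look like the sketch below: exponential backoff with full jitter around any callable. The wrapper name `with_backoff`, its defaults, and the choice of `ConnectionError`/`TimeoutError` as retryable are assumptions; substitute whatever exception your API client raises on an HTTP 429.

```python
import random
import time


def with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=30.0,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Run call(), retrying on the given exceptions with jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Cap doubles each attempt: 0.5s, 1s, 2s, ... up to max_delay
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter spreads out retries
```

Usage would be along the lines of `with_backoff(lambda: client.place_order(order))`, where `client` is your own API wrapper. The `sleep` parameter exists so tests can run without waiting.
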
---

### Layer 4: Risk Management

**Critical components:**

1. **Position sizing**
```
Kelly Criterion: f* = (bp - q) / b
where:
  b = net odds received on the wager (profit per unit staked, i.e. decimal odds - 1)
  p = probability of winning
  q = probability of losing (1 - p)
```

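Applied to a binary contract that pays $1, the formula above has net odds `b = (1 - price) / price`. A minimal sketch, with the function name `kelly_fraction` and the clamp at zero as illustrative choices:

```python
def kelly_fraction(p: float, price: float) -> float:
    """Kelly fraction of bankroll for buying a $1-payout contract at `price`.

    p is your estimated probability that the contract resolves in your favor.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    b = (1.0 - price) / price          # net odds: profit per dollar staked
    q = 1.0 - p
    return max(0.0, (b * p - q) / b)   # stake nothing when the edge is negative


# Believing 70% on a contract priced at 60 cents suggests 25% of bankroll
print(round(kelly_fraction(0.70, 0.60), 4))
```

The result simplifies to `(p - price) / (1 - price)`. Because `p` is itself an estimate, many traders stake only a fraction of this amount.
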
2. **Correlation matrix**
```sql
-- Track correlated positions
-- (assumes market_correlations carries the correlation and active columns)
SELECT m1.id AS market_id_1, m2.id AS market_id_2, mc.correlation
FROM market_correlations mc
JOIN markets m1 ON mc.market_id_1 = m1.id
JOIN markets m2 ON mc.market_id_2 = m2.id
WHERE mc.correlation > 0.7 AND mc.active = true;
```

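The `correlation` value queried above has to be computed somewhere. One simple option, sketched here under the assumption that you have aligned price histories for both markets, is the Pearson correlation of their price changes. The function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def price_change_correlation(prices_a, prices_b):
    """Correlate period-over-period price changes rather than raw levels."""
    changes_a = [b - a for a, b in zip(prices_a, prices_a[1:])]
    changes_b = [b - a for a, b in zip(prices_b, prices_b[1:])]
    return pearson(changes_a, changes_b)
```

Using changes instead of levels avoids calling two markets correlated just because both drift toward resolution.
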
3. **P&L tracking**
```sql
-- Daily P&L by strategy
SELECT
  date,
  strategy,
  SUM(pnl) as total_pnl,
  SUM(trades) as total_trades,
  SUM(pnl) / NULLIF(SUM(max_risk), 0) as roi
FROM daily_pnl
GROUP BY date, strategy;
```

4. **Stop-loss mechanisms**
```python
# Example: Auto-liquidation threshold
if current_pnl < -max_drawdown:
    liquidate_positions(reason="Max drawdown exceeded")
    halt_trading(reason="Risk limit")
```

---

### Layer 5: Monitoring & Analytics

**Dashboard metrics:**
- Real-time portfolio value
- Open positions + unrealized P&L
- Signal accuracy by category
- Win rate, ROI, Sharpe ratio
- Correlation heat map

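Two of the dashboard metrics above, win rate and Sharpe ratio, can be derived from a series of daily returns. This sketch assumes a zero risk-free rate and annualizes over 365 days on the assumption that these markets trade every day; the function name and dictionary keys are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def performance_summary(daily_returns, periods_per_year=365):
    """Win rate and annualized Sharpe ratio (risk-free rate assumed zero)."""
    wins = sum(1 for r in daily_returns if r > 0)
    avg, vol = mean(daily_returns), stdev(daily_returns)
    sharpe = (avg / vol) * sqrt(periods_per_year) if vol > 0 else float("nan")
    return {"win_rate": wins / len(daily_returns), "sharpe": sharpe}


summary = performance_summary([0.01, -0.005, 0.02, 0.0, 0.015])
print(summary["win_rate"])
```

`stdev` needs at least two observations, so the summary is only meaningful once a few days of history exist.
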
**Alerts:**
- Large price movements
- Unusual volume spikes
- Failed orders
- System health issues

**Backtesting:**
- Replay historical data
- Test strategies against past events
- Calculate hypothetical P&L
- Optimize hyperparameters

---

## PART 3: SPECIFIC EDGE STRATEGIES (with tech specs)

### Strategy 1: Economic Data Trading

**Markets:** "CPI above X%", "Fed funds rate above Y%", "GDP growth > 2%"

**Data sources:**
- BLS API (CPI, unemployment)
- BEA API (GDP, personal income)
- Federal Reserve (FOMC statements, rate decisions)

**Tech stack:**
```
BLS/BEA API → Parser → Compare to consensus → If beat: buy YES, if miss: buy NO
```

**Edge factor:** Data is released at scheduled times, so you can pre-position when your own forecast differs from the market consensus.

**Risk:** The market may already have priced the release in; look for subtle beats/misses.

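One way to judge whether a beat or miss is subtle or large is to scale it by how far past releases have landed from consensus. A minimal sketch, assuming you keep a history of past surprises (actual minus consensus) for the same indicator; the function name is hypothetical.

```python
from statistics import stdev


def standardized_surprise(actual: float, consensus: float,
                          past_surprises: list[float]) -> float:
    """Size of (actual - consensus) in units of historical surprise volatility."""
    return (actual - consensus) / stdev(past_surprises)


# CPI prints 3.4% against a 3.2% consensus
past = [0.1, -0.1, 0.2, -0.2, 0.0]
print(round(standardized_surprise(3.4, 3.2, past), 2))
```

Values near zero suggest the release is within normal noise, while large absolute values are the ones more likely to move a market.
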
---

### Strategy 2: Esports/Specialized Sports

**Markets:** "Team A wins tournament X", "Player Y scores Z points"

**Data sources:**
- Official game APIs (Riot, Valve)
- Esports data providers (Pandascore, Strafe)
- Team social media (lineup changes, roster swaps)
- Scouting reports, patch notes (meta shifts)

**Tech stack:**
```
Riot API + Social scraping → Team form analysis → Probability model → Trade
```

**Edge factor:** Most bettors don't watch games closely; insider knowledge of roster changes, practice schedules, etc.

**Risk:** Low liquidity; hard to exit positions.

---

### Strategy 3: Crypto On-Chain Signals

**Markets:** "BTC above $100K by X date", "ETH ETF approved by Y"

**Data sources:**
- Dune Analytics queries
- Whale Alert API
- Glassnode on-chain metrics
- Etherscan events

**Tech stack:**
```
Dune query → Whale movement detected → Cross-reference with market → Trade
```

**Edge factor:** On-chain data is transparent but not widely used by retail traders.

**Risk:** Manipulation (whale spoofing); correlation vs causation issues.

---

### Strategy 4: Cross-Platform Arbitrage

**Example workflow:**
```typescript
import { PolyMarketSDK } from '@polymarket/sdk';
import { KalshiSDK } from 'kalshi-typescript';

const poly = new PolyMarketSDK({ apiKey: '...' });
const kalshi = new KalshiSDK({ apiKey: '...' });

// Minimum price gap worth trading (illustrative value)
const threshold = 0.03;

// Get equivalent markets
const polyMarket = await poly.getMarket({ slug: 'fed-hike-july-2026' });
const kalshiMarket = await kalshi.getMarket({ ticker: 'FED-HIKE-JULY-2026' });

// Detect arbitrage (assumes both prices are normalized to the 0-1 range;
// the Kalshi example above quotes cents, so convert first)
if (polyMarket.price > kalshiMarket.price + threshold) {
  // Buy NO on Polymarket, YES on Kalshi
  await poly.createOrder({ marketId: polyMarket.id, side: 'NO' /* , price, size */ });
  await kalshi.placeOrder({ ticker: kalshiMarket.ticker, side: 'YES' /* , count, limit_price */ });
}
```

**Edge factor:** Information asymmetry between platforms; different user bases.

**Risk:** Execution risk (prices move between the two legs of the trade); correlated markets are not exactly equivalent.

---

## PART 4: RECOMMENDED STARTER STACK

### Minimal Viable Product (MVP)

```
1. MCP Servers (via mcporter)
   ├── @iqai/mcp-polymarket
   ├── @newyorkcompute/kalshi-mcp
   └── prediction-mcp (unified)

2. Data Pipeline
   ├── PostgreSQL (market data, trades, P&L)
   ├── Redis (caching, rate limiting)
   └── Simple cron jobs (data ingestion)

3. Signal Engine
   ├── Rule-based signals (start simple)
   ├── Sentiment analysis (optional)
   └── Backtesting framework

4. Execution
   ├── Polymarket SDK
   ├── Kalshi SDK
   └── Order queue with retry logic

5. Monitoring
   ├── Grafana dashboard
   ├── Discord alerts
   └── Daily P&L reports
```

### Production-Grade Stack

```
1. Infrastructure
   ├── Cloud (AWS/GCP)
   ├── Kubernetes (scalability)
   ├── PostgreSQL + TimescaleDB (time-series)
   ├── Redis Cluster
   └── RabbitMQ/Kafka

2. Data Ingestion
   ├── WebSocket connections (real-time)
   ├── REST APIs (historical)
   ├── Scrapers (social, news)
   └── ML feature pipeline

3. Signal Engine
   ├── Ensemble models (XGBoost + LSTM)
   ├── NLP for news/sentiment
   ├── Backtesting framework
   └── Hyperparameter optimization

4. Execution
   ├── Order management system
   ├── Position tracker
   ├── Risk engine
   └── Circuit breakers

5. Monitoring
   ├── Prometheus + Grafana
   ├── Slack/Discord alerts
   ├── P&L analytics
   └── Strategy performance dashboard
```

---

## PART 5: GETTING STARTED (Step-by-Step)

### Step 1: Install MCP servers
```bash
# Add via mcporter
mcporter add mcp-polymarket
mcporter add kalshi-mcp
mcporter add prediction-mcp
```

### Step 2: Set up database
```sql
-- Schema for markets, trades, signals
CREATE TABLE markets (
  id TEXT PRIMARY KEY,
  platform TEXT NOT NULL,
  slug TEXT NOT NULL,
  question TEXT,
  end_date TIMESTAMP,
  created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE trades (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  market_id TEXT REFERENCES markets(id),
  side TEXT NOT NULL,
  price NUMERIC NOT NULL,
  size NUMERIC NOT NULL,
  pnl NUMERIC,
  created_at TIMESTAMP DEFAULT NOW()
);
```

### Step 3: Build signal generator (start with rule-based)
```python
# signals/economic.py
def check_economic_signal(market_data, consensus, actual):
    """Return a trade signal when the release beats or misses consensus."""
    if actual > consensus and market_data['price'] < 0.8:
        return {'action': 'BUY_YES', 'confidence': 0.8}
    elif actual < consensus and market_data['price'] > 0.2:
        return {'action': 'BUY_NO', 'confidence': 0.8}
    return None
```

### Step 4: Implement execution
```typescript
// execute.ts
import { PolyMarketSDK } from '@polymarket/sdk';

// Shape of a signal handed to the executor (fields inferred from the call below)
interface Signal {
  marketId: string;
  side: 'YES' | 'NO';
  price: number;
  size: number;
}

async function executeSignal(signal: Signal) {
  const sdk = new PolyMarketSDK({ apiKey: process.env.POLY_API_KEY });
  const order = await sdk.createOrder({
    marketId: signal.marketId,
    side: signal.side,
    price: signal.price,
    size: signal.size
  });
  await logTrade(order); // logTrade: your own persistence helper (not shown)
}
```

### Step 5: Build backtester
```python
# backtest.py
def backtest_strategy(start_date, end_date):
    historical_data = load_historical_markets(start_date, end_date)
    results = []

    for market in historical_data:
        signal = generate_signal(market)
        if signal:
            outcome = get_market_outcome(market['id'])
            pnl = calculate_pnl(signal, outcome)
            # Store a dict per trade (a set literal of dicts would raise TypeError)
            results.append({'signal': signal, 'outcome': outcome, 'pnl': pnl})

    return analyze_results(results)
```

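The backtester above leaves `calculate_pnl` undefined. One possible version for binary contracts held to resolution is sketched below. The `price` and `size` keys on the signal, and the `fee_rate` parameter, are assumptions for this sketch (the Step 3 signal carries only `action` and `confidence`), so adapt the layout to your own signal format.

```python
def calculate_pnl(signal: dict, outcome: bool, fee_rate: float = 0.0) -> float:
    """P&L for a binary position held until the market resolves.

    signal: {'action': 'BUY_YES' or 'BUY_NO', 'price': YES price at entry
             as a fraction in (0, 1), 'size': number of contracts}
    outcome: True if the market resolved YES.
    """
    if signal["action"] == "BUY_YES":
        entry, won = signal["price"], outcome
    else:
        # NO is assumed to cost 1 - YES price (ignores the spread)
        entry, won = 1.0 - signal["price"], not outcome
    payout = 1.0 if won else 0.0
    gross = (payout - entry) * signal["size"]
    return gross - fee_rate * entry * signal["size"]


print(calculate_pnl({"action": "BUY_YES", "price": 0.6, "size": 100}, True))
```

Positions closed before resolution would need the exit price instead of the 0/1 payout.
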
### Step 6: Deploy and monitor
- Use cron/scheduler for regular data pulls
- Set up Discord alerts for signals and trades
- Daily P&L reports
- Weekly strategy review

---

## PART 6: KEY RISKS & MITIGATIONS

| Risk | Mitigation |
|------|------------|
| **Liquidity risk** | Skip markets too thin to exit, use limit orders, size positions to order book depth |
| **Execution risk** | Pre-test APIs, implement retry logic, have fallback mechanisms |
| **Model risk** | Backtest thoroughly, paper trade first, monitor live accuracy |
| **Platform risk** | Don't store large amounts on exchange, use API keys with limited permissions |
| **Correlation risk** | Track correlated positions, implement portfolio-level limits |
| **Regulatory risk** | Check terms of service, comply with local laws |
| **Market manipulation** | Be wary of wash trading, suspicious volume spikes |

---

## PART 7: NEXT ACTIONS

1. **Install MCP servers** - start with `prediction-mcp` for unified data access
2. **Pick a niche** - economic data, esports, or crypto (don't try everything)
3. **Build data pipeline** - PostgreSQL + simple ingestion scripts
4. **Start with rule-based signals** - easier to debug and understand
5. **Paper trade for 2-4 weeks** - validate before using real money
6. **Scale up gradually** - increase position sizes as confidence grows

---

*Ready to set up the stack? I can install MCP servers and start building the data pipeline.*