# Prediction Markets Edge Research
## Polymarket & Kalshi - Opportunities & Full Stack
*Generated 2026-01-22*
---
## PART 1: WHERE THE EDGE EXISTS
### 1. NICHE/SPECIALIZED KNOWLEDGE MARKETS
**Why edge exists:**
- Low liquidity = mispriced odds
- Fewer sharp traders
- Information asymmetry in specific domains
**Categories with highest potential:**
| Category | Why Edge | Examples |
|----------|----------|----------|
| **Economic indicators** | Data-driven, predictable releases | CPI, unemployment, GDP beats/misses |
| **Crypto technicals** | On-chain data available early | ETH price targets, Bitcoin halving outcomes |
| **Esports/specific sports** | Niche data sources, scouting intel | Dota 2 tournaments, League match outcomes |
| **Corporate events** | Insider/industry connections | CEO departures, acquisitions, earnings beats |
| **Geopolitical** | Local intel, language barriers | Election outcomes in non-US countries |
**Edge types:**
- **Data access**: You get data faster (e.g., Bloomberg Terminal vs free APIs)
- **Domain expertise**: You understand nuances (e.g., esports meta shifts)
- **Local intelligence**: On-the-ground knowledge (elections, protests)
---
### 2. TIME-SENSITIVE MARKETS (Information Velocity)
**Polymarket excels here - news moves odds FAST**
**Edge opportunities:**
- **Breaking news monitoring**: Reuters API, Bloomberg News, Twitter/X firehose
- **Economic data releases**: scheduled Federal Reserve, BLS, and BEA releases, ingested the moment they are published
- **On-chain signals**: Whale alerts, large transfers, protocol exploits
- **Social sentiment shifts**: Reddit trends, TikTok virality tracking
**Example workflow:**
```
Reuters API → Detect breaking news → Cross-reference market → Analyze mispricing → Execute trade
```
**Tools needed:**
- Real-time news feeds (Reuters, Bloomberg, NewsAPI)
- Sentiment analysis (VADER, BERT, custom ML models)
- Fast execution (Polymarket CLOB, Kalshi API)
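The workflow above can be sketched end to end. Everything below is a hypothetical stand-in: the keyword lists, thresholds, and function are illustrative, not a real news-feed or sentiment-model integration.

```python
# Hypothetical sketch: map a breaking headline to a candidate trade signal.
# Keyword lists and the min_edge threshold are illustrative, not tuned values.
RELEVANT_KEYWORDS = {
    "fed": ["rate", "fomc", "powell"],
    "cpi": ["inflation", "cpi", "prices"],
}

def headline_to_signal(headline: str, market_prob: float, model_prob: float,
                       min_edge: float = 0.05):
    """Return a trade signal if the headline touches a tracked topic and our
    estimated probability diverges from the market by at least min_edge."""
    text = headline.lower()
    topic = next(
        (t for t, words in RELEVANT_KEYWORDS.items()
         if any(w in text for w in words)),
        None,
    )
    if topic is None:
        return None
    edge = model_prob - market_prob
    if abs(edge) < min_edge:
        return None
    return {"topic": topic,
            "action": "BUY_YES" if edge > 0 else "BUY_NO",
            "edge": round(edge, 4)}
```

A real pipeline would replace the keyword match with NER plus a sentiment model, but the shape (detect relevance, estimate probability, compare to market, gate on minimum edge) stays the same.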
---
### 3. CROSS-PLATFORM ARBITRAGE
**Why edge exists:**
- Polymarket and Kalshi don't always have the same events
- Same event, different platforms = price discrepancies
- Different user bases = different market efficiency
**Types of arbitrage:**
1. **Direct arbitrage**: Same outcome, different prices (rare but exists)
2. **Correlated arbitrage**: Related markets with pricing gaps
3. **Platform liquidity arbitrage**: Capitalize on platform-specific volume shocks
**Example:**
- Polymarket has "Fed rate cut in March 2026" at 65%
- Kalshi has "Fed funds rate below 4.5% by March 31 2026" at 58%
- These contracts are related but not identical; if the 7-point gap is larger than the difference in contract terms justifies, there's an edge
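Assuming the two contracts really do settle on the same event, the arithmetic is simple: buy NO on the venue pricing YES higher and YES on the venue pricing it lower, and you collect $1 at resolution either way. A minimal sketch (ignores fees, gas, and slippage):

```python
def arb_profit(poly_yes: float, kalshi_yes: float) -> float:
    """Gross arbitrage profit per $1 of payoff when the same event trades at
    different YES prices on two venues: buy NO on the expensive venue and
    YES on the cheap one. Fees, gas, and slippage are ignored."""
    # NO on Polymarket costs (1 - YES price); YES on Kalshi costs its price.
    cost = (1 - poly_yes) + kalshi_yes
    return round(1.0 - cost, 4)
```

With the example prices above, the pair costs $0.35 + $0.58 = $0.93 for a guaranteed $1 payoff, roughly 7 cents gross per contract; fees and the risk that the contracts resolve differently eat into that.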
**Full arbitrage stack:**
- `pmxtjs` or `@alango/dr-manhattan` for unified API
- Correlation detection engine
- Position sizing with platform-specific risk limits
---
### 4. LIQUIDITY & MARKET MAKING EDGE
**Why edge exists:**
- Many markets have thin order books
- Market makers can earn the spread
- Less competition on smaller markets
**Strategies:**
- **Passive market making**: Place limit orders on both sides of thin markets
- **Inventory management**: Hedge with correlated markets
- **Volatility trading**: Buy options/straddles around major events
**Tools:**
- Polymarket CLOB API for order placement
- Kalshi API for limit orders
- Real-time price feeds
---
### 5. MODEL-BASED PREDICTIONS
**Where AI/ML shines:**
| Market Type | Model Approach | Data Sources |
|-------------|-----------------|--------------|
| Economic indicators | Time series forecasting (ARIMA, Prophet, LSTMs) | FRED API, Bloomberg historical |
| Elections | Poll aggregation + demographic weighting | 538, RealClearPolitics, district data |
| Crypto prices | On-chain metrics + sentiment | Dune Analytics, Glassnode, social APIs |
| Weather/climate | Ensemble meteorological models | NOAA, ECMWF, historical data |
| Sports outcomes | Elo ratings + player statistics | Statcast, ESPN APIs, scraping |
**Edge comes from:**
- Better data (non-obvious signals)
- Better models (ensemble, custom features)
- Faster updates (real-time re-training)
---
## PART 2: THE FULL STACK
### Layer 0: Infrastructure
```
┌─────────────────────────────────────────────────────────────┐
│ DATA INFRASTRUCTURE │
├─────────────────────────────────────────────────────────────┤
│ • Real-time APIs (news, markets, on-chain) │
│ • PostgreSQL/ClickHouse for historical data │
│ • Redis for caching + rate limiting │
│ • Message queue (RabbitMQ/Redis Streams) for events │
└─────────────────────────────────────────────────────────────┘
```
**Key components:**
- **Database**: PostgreSQL with TimescaleDB for time-series market data
- **Cache**: Redis for rate limiting, market snapshots, order book states
- **Queue**: RabbitMQ or Kafka for async job processing
- **Monitoring**: Prometheus + Grafana for system health, P&L tracking
---
### Layer 1: Data Ingestion
**Sources:**
| Source | API/Tool | Use Case |
|--------|----------|----------|
| Polymarket | `@polymarket/sdk`, `polymarket-gamma`, `@nevuamarkets/poly-websockets` | Market data, odds, volume, order book |
| Kalshi | `kalshi-typescript`, `@newyorkcompute/kalshi-core` | Market data, contract prices, fills |
| News | Reuters, Bloomberg, NewsAPI | Breaking news, sentiment |
| On-chain | Dune Analytics, The Graph, Whale Alert | Crypto-specific markets |
| Social | X (Twitter) API, Reddit API | Sentiment, trend detection |
| Economic | FRED API, BEA API, BLS API | Macro indicators |
**Ingestion pattern:**
```python
# Pseudocode
async def ingest_polymarket_data():
    ws = connect_poly_websocket()
    async for msg in ws:
        process_market_update(msg)
        store_to_postgres(msg)
        emit_to_queue(msg)
        trigger_signal_if_edge_detected(msg)
```
---
### Layer 2: Signal Generation
**Three approaches:**
1. **Rule-based signals**
```javascript
// Example: Economic data beat
if (actualCPI > forecastCPI && marketProbability < 0.80) {
  emitSignal({ market: "Fed hike July", action: "BUY YES", confidence: 0.85 });
}
```
2. **ML-based signals**
```python
# Example: Ensemble prediction
predictions = [
    xgboost_model.predict(features),
    lstm_model.predict(features),
    sentiment_model.predict(features),
]
weighted_pred = weighted_average(predictions, historical_accuracy)
if weighted_pred > market_prob + threshold:
    emit_signal(...)
```
3. **NLP-based signals** (for news/sentiment)
```python
# Example: Breaking news analysis
news_text = get_latest_news()
sentiment = transformer_model.predict(news_text)
entities = ner_model.extract(news_text)
if "Fed" in entities and sentiment > 0.7:
    emit_signal(...)  # bullish signal for Fed-related markets
```
**Signal validation:**
- Backtest against historical data
- Paper trade with small size first
- Track prediction accuracy by market category
- Adjust confidence thresholds over time
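One concrete way to track prediction accuracy by market category is a per-category Brier score: the mean squared error between your forecast probability and the 0/1 outcome. A minimal sketch, with the record field names as assumptions:

```python
from collections import defaultdict

def brier_by_category(records):
    """Mean Brier score per market category: average (forecast - outcome)^2,
    where outcome is 1 if the event happened, else 0. Lower is better;
    always guessing 0.5 scores 0.25, so you want to beat that."""
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum of errors, count]
    for rec in records:
        err = (rec["forecast"] - rec["outcome"]) ** 2
        totals[rec["category"]][0] += err
        totals[rec["category"]][1] += 1
    return {cat: round(s / n, 4) for cat, (s, n) in totals.items()}
```

Tracking this per category tells you which niches your signals actually have edge in, which is exactly the input the confidence-threshold adjustments need.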
---
### Layer 3: Execution Engine
**Polymarket execution:**
```typescript
import { PolyMarketSDK } from '@polymarket/sdk';

const sdk = new PolyMarketSDK({ apiKey: '...' });

// Place order
const order = await sdk.createOrder({
  marketId: '0x...',
  side: 'YES',
  price: 0.65,        // 65 cents
  size: 100,          // 100 contracts
  expiration: 86400   // 24 hours
});
```
**Kalshi execution:**
```typescript
import { KalshiSDK } from 'kalshi-typescript';

const sdk = new KalshiSDK({ apiKey: '...' });

// Place order
const order = await sdk.placeOrder({
  ticker: 'HIGH-CPI-2026',
  side: 'YES',
  count: 100,
  limit_price: 65  // cents
});
```
**Execution considerations:**
- **Slippage**: Thin markets = high slippage. Use limit orders with buffer.
- **Gas**: Polymarket requires ETH on Polygon for gas. Keep buffer.
- **Rate limits**: Both platforms have API rate limits. Implement backoff.
- **Position limits**: Don't overexpose to correlated markets.
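The rate-limit point above usually means wrapping every API call in exponential backoff. A minimal sketch; which exceptions count as retryable depends on each platform's SDK, so this version catches broadly:

```python
import random
import time

def with_backoff(call, max_attempts: int = 5, base_delay: float = 0.5):
    """Retry a rate-limited API call with exponential backoff plus jitter.
    `call` is any zero-argument function. A production version would only
    retry on rate-limit/transient errors, not on every exception."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Sleep 0.5s, 1s, 2s, ... plus up to 100ms of jitter.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Usage would look like `with_backoff(lambda: sdk.place_order(...))`; the jitter keeps many bots from retrying in lockstep after a shared rate-limit window resets.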
---
### Layer 4: Risk Management
**Critical components:**
1. **Position sizing**
```
Kelly Criterion: f* = (bp - q) / b
where:
  b = net odds received on the wager (profit per unit staked)
  p = probability of winning
  q = probability of losing (1 - p)
```
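The criterion above can be wrapped in a small helper. The cap is an assumption on my part (fractional Kelly is common practice because full Kelly is volatile), not something the formula itself prescribes:

```python
def kelly_fraction(p: float, b: float, cap: float = 0.25) -> float:
    """Kelly fraction f* = (b*p - q) / b, floored at 0 (never bet a
    negative-edge market) and capped at a fraction of bankroll.
    p: win probability, b: net odds (profit per unit staked)."""
    q = 1 - p
    f = (b * p - q) / b
    return max(0.0, min(round(f, 4), cap))
```

For a binary contract bought at price `c` (in dollars, 0 to 1), the net odds are `b = (1 - c) / c`: a YES at $0.60 pays $0.40 profit per $0.60 staked.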
2. **Correlation matrix**
```sql
-- Track correlated positions
SELECT m1.market_id, m2.market_id, correlation
FROM market_correlations mc
JOIN markets m1 ON mc.market_id_1 = m1.id
JOIN markets m2 ON mc.market_id_2 = m2.id
WHERE correlation > 0.7 AND active = true;
```
3. **P&L tracking**
```sql
-- Daily P&L by strategy
SELECT
  date,
  strategy,
  SUM(pnl) AS total_pnl,
  SUM(trades) AS total_trades,
  SUM(pnl) / NULLIF(SUM(max_risk), 0) AS roi
FROM daily_pnl
GROUP BY date, strategy;
```
4. **Stop-loss mechanisms**
```python
# Example: Auto-liquidation threshold
if current_pnl < -max_drawdown:
    liquidate_positions(reason="Max drawdown exceeded")
    halt_trading(reason="Risk limit")
```
---
### Layer 5: Monitoring & Analytics
**Dashboard metrics:**
- Real-time portfolio value
- Open positions + unrealized P&L
- Signal accuracy by category
- Win rate, ROI, Sharpe ratio
- Correlation heat map
**Alerts:**
- Large price movements
- Unusual volume spikes
- Failed orders
- System health issues
**Backtesting:**
- Replay historical data
- Test strategies against past events
- Calculate hypothetical P&L
- Optimize hyperparameters
---
## PART 3: SPECIFIC EDGE STRATEGIES (with tech specs)
### Strategy 1: Economic Data Trading
**Markets:** "CPI above X%", "Fed funds rate above Y%", "GDP growth > 2%"
**Data sources:**
- BLS API (CPI, unemployment)
- BEA API (GDP, personal income)
- Federal Reserve (FOMC statements, rate decisions)
**Tech stack:**
```
BLS/BEA API → Parser → Compare to consensus → If beat: buy YES, if miss: buy NO
```
**Edge factor:** Data is released at scheduled times; pre-position based on own analysis vs market consensus.
**Risk:** Market may have already priced in; look for subtle beats/misses.
---
### Strategy 2: Esports/Specialized Sports
**Markets:** "Team A wins tournament X", "Player Y scores Z points"
**Data sources:**
- Official game APIs (Riot, Valve)
- Esports data providers (Pandascore, Strafe)
- Team social media (lineup changes, roster swaps)
- Scouting reports, patch notes (meta shifts)
**Tech stack:**
```
Riot API + Social scraping → Team form analysis → Probability model → Trade
```
**Edge factor:** Most bettors don't watch games closely; insider knowledge of roster changes, practice schedules, etc.
**Risk:** Low liquidity; hard to exit positions.
---
### Strategy 3: Crypto On-Chain Signals
**Markets:** "BTC above $100K by X date", "ETH ETF approved by Y"
**Data sources:**
- Dune Analytics queries
- Whale Alert API
- Glassnode on-chain metrics
- Etherscan events
**Tech stack:**
```
Dune query → Whale movement detected → Cross-reference with market → Trade
```
**Edge factor:** On-chain data is transparent but not widely used by retail traders.
**Risk:** Manipulation (whale spoofing); correlation vs causation issues.
---
### Strategy 4: Cross-Platform Arbitrage
**Example workflow:**
```typescript
import { PolyMarketSDK } from '@polymarket/sdk';
import { KalshiSDK } from 'kalshi-typescript';

const poly = new PolyMarketSDK({ apiKey: '...' });
const kalshi = new KalshiSDK({ apiKey: '...' });

// Get equivalent markets
const polyMarket = await poly.getMarket({ slug: 'fed-hike-july-2026' });
const kalshiMarket = await kalshi.getMarket({ ticker: 'FED-HIKE-JULY-2026' });

// Detect arbitrage
if (polyMarket.price > kalshiMarket.price + threshold) {
  // Buy NO on Polymarket, YES on Kalshi
  await poly.createOrder({ marketId: polyMarket.id, side: 'NO', ... });
  await kalshi.placeOrder({ ticker: kalshiMarket.ticker, side: 'YES', ... });
}
```
**Edge factor:** Information asymmetry between platforms; different user bases.
**Risk:** Execution risk (prices move during trade); correlated markets not exactly equivalent.
---
## PART 4: RECOMMENDED STARTER STACK
### Minimal Viable Product (MVP)
```
1. MCP Servers (via mcporter)
├── @iqai/mcp-polymarket
├── @newyorkcompute/kalshi-mcp
└── prediction-mcp (unified)
2. Data Pipeline
├── PostgreSQL (market data, trades, P&L)
├── Redis (caching, rate limiting)
└── Simple cron jobs (data ingestion)
3. Signal Engine
├── Rule-based signals (start simple)
├── Sentiment analysis (optional)
└── Backtesting framework
4. Execution
├── Polymarket SDK
├── Kalshi SDK
└── Order queue with retry logic
5. Monitoring
├── Grafana dashboard
├── Discord alerts
└── Daily P&L reports
```
### Production-Grade Stack
```
1. Infrastructure
├── Cloud (AWS/GCP)
├── Kubernetes (scalability)
├── PostgreSQL + TimescaleDB (time-series)
├── Redis Cluster
└── RabbitMQ/Kafka
2. Data Ingestion
├── WebSocket connections (real-time)
├── REST APIs (historical)
├── Scrapers (social, news)
└── ML feature pipeline
3. Signal Engine
├── Ensemble models (XGBoost + LSTM)
├── NLP for news/sentiment
├── Backtesting framework
└── Hyperparameter optimization
4. Execution
├── Order management system
├── Position tracker
├── Risk engine
└── Circuit breakers
5. Monitoring
├── Prometheus + Grafana
├── Slack/Discord alerts
├── P&L analytics
└── Strategy performance dashboard
```
---
## PART 5: GETTING STARTED (Step-by-Step)
### Step 1: Install MCP servers
```bash
# Add via mcporter
mcporter add mcp-polymarket
mcporter add kalshi-mcp
mcporter add prediction-mcp
```
### Step 2: Set up database
```sql
-- Schema for markets, trades, signals
CREATE TABLE markets (
  id TEXT PRIMARY KEY,
  platform TEXT NOT NULL,
  slug TEXT NOT NULL,
  question TEXT,
  end_date TIMESTAMP,
  created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE trades (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  market_id TEXT REFERENCES markets(id),
  side TEXT NOT NULL,
  price NUMERIC NOT NULL,
  size NUMERIC NOT NULL,
  pnl NUMERIC,
  created_at TIMESTAMP DEFAULT NOW()
);
```
### Step 3: Build signal generator (start with rule-based)
```python
# signals/economic.py
def check_economic_signal(market_data, consensus, actual):
    if actual > consensus and market_data['price'] < 0.8:
        return {'action': 'BUY_YES', 'confidence': 0.8}
    elif actual < consensus and market_data['price'] > 0.2:
        return {'action': 'BUY_NO', 'confidence': 0.8}
    return None
```
### Step 4: Implement execution
```typescript
// execute.ts
import { PolyMarketSDK } from '@polymarket/sdk';

async function executeSignal(signal: Signal) {
  const sdk = new PolyMarketSDK({ apiKey: process.env.POLY_API_KEY });
  const order = await sdk.createOrder({
    marketId: signal.marketId,
    side: signal.side,
    price: signal.price,
    size: signal.size
  });
  await logTrade(order);
}
```
### Step 5: Build backtester
```python
# backtest.py
def backtest_strategy(start_date, end_date):
    historical_data = load_historical_markets(start_date, end_date)
    results = []
    for market in historical_data:
        signal = generate_signal(market)
        if signal:
            outcome = get_market_outcome(market['id'])
            pnl = calculate_pnl(signal, outcome)
            results.append({'signal': signal, 'outcome': outcome, 'pnl': pnl})
    return analyze_results(results)
```
### Step 6: Deploy and monitor
- Use cron/scheduler for regular data pulls
- Set up Discord alerts for signals and trades
- Daily P&L reports
- Weekly strategy review
---
## PART 6: KEY RISKS & MITIGATIONS
| Risk | Mitigation |
|------|------------|
| **Liquidity risk** | Avoid thin markets, use limit orders, size positions appropriately |
| **Execution risk** | Pre-test APIs, implement retry logic, have fallback mechanisms |
| **Model risk** | Backtest thoroughly, paper trade first, monitor live accuracy |
| **Platform risk** | Don't store large amounts on exchange, use API keys with limited permissions |
| **Correlation risk** | Track correlated positions, implement portfolio-level limits |
| **Regulatory risk** | Check terms of service, comply with local laws |
| **Market manipulation** | Be wary of wash trading and suspicious volume spikes |
---
## PART 7: NEXT ACTIONS
1. **Install MCP servers** - start with `prediction-mcp` for unified data access
2. **Pick a niche** - economic data, esports, or crypto (don't try everything)
3. **Build data pipeline** - PostgreSQL + simple ingestion scripts
4. **Start with rule-based signals** - easier to debug and understand
5. **Paper trade for 2-4 weeks** - validate before using real money
6. **Scale up gradually** - increase position sizes as confidence grows
---
*Ready to set up the stack? I can install MCP servers and start building the data pipeline.*