# Prediction Markets Edge Research

## Polymarket & Kalshi - Opportunities & Full Stack

*Generated 2026-01-22*

---

## PART 1: WHERE THE EDGE EXISTS

### 1. NICHE/SPECIALIZED KNOWLEDGE MARKETS

**Why edge exists:**
- Low liquidity = mispriced odds
- Fewer sharp traders
- Information asymmetry in specific domains

**Categories with highest potential:**

| Category | Why Edge | Examples |
|----------|----------|----------|
| **Economic indicators** | Data-driven, predictable releases | CPI, unemployment, GDP beats/misses |
| **Crypto technicals** | On-chain data available early | ETH price targets, Bitcoin halving outcomes |
| **Esports/specific sports** | Niche data sources, scouting intel | Dota 2 tournaments, League match outcomes |
| **Corporate events** | Insider/industry connections | CEO departures, acquisitions, earnings beats |
| **Geopolitical** | Local intel, language barriers | Election outcomes in non-US countries |

**Edge types:**
- **Data access**: You get data faster (e.g., Bloomberg Terminal vs free APIs)
- **Domain expertise**: You understand the nuances (e.g., esports meta shifts)
- **Local intelligence**: On-the-ground knowledge (elections, protests)

---

### 2. TIME-SENSITIVE MARKETS (Information Velocity)

**Polymarket excels here - news moves odds FAST.**

**Edge opportunities:**
- **Breaking news monitoring**: Reuters API, Bloomberg News, Twitter/X firehose
- **Economic data releases**: Federal Reserve, BLS, BEA releases with millisecond precision
- **On-chain signals**: Whale alerts, large transfers, protocol exploits
- **Social sentiment shifts**: Reddit trends, TikTok virality tracking

**Example workflow:**
```
Reuters API → Detect breaking news → Cross-reference market → Analyze mispricing → Execute trade
```

**Tools needed:**
- Real-time news feeds (Reuters, Bloomberg, NewsAPI)
- Sentiment analysis (VADER, BERT, custom ML models)
- Fast execution (Polymarket CLOB, Kalshi API)

---
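The workflow above can be sketched end to end. This is a minimal illustration, not a production detector: `detect_mispricing`, the keyword-to-market map, and the market snapshots below are hypothetical stand-ins for a real news feed and a real market API.

```python
# Minimal sketch of the news -> market cross-reference loop.
# All data here is hypothetical; a real system would consume a
# streaming news API and the platform's market/CLOB endpoints.

KEYWORD_TO_MARKET = {
    "fed": "fed-rate-decision",
    "cpi": "cpi-above-3pct",
}

def detect_mispricing(headline: str, markets: dict, edge_threshold: float = 0.05):
    """Return (market_id, edge) if a headline touches a market whose
    model probability is far enough from its current price, else None."""
    text = headline.lower()
    for keyword, market_id in KEYWORD_TO_MARKET.items():
        if keyword not in text:
            continue
        market = markets[market_id]
        # model_prob comes from your own model; price is the market's 0-1 odds
        edge = market["model_prob"] - market["price"]
        if abs(edge) >= edge_threshold:
            return market_id, round(edge, 4)
    return None

# Example: the model says 72% but the market trades at 61 cents
markets = {"fed-rate-decision": {"price": 0.61, "model_prob": 0.72},
           "cpi-above-3pct": {"price": 0.40, "model_prob": 0.41}}
print(detect_mispricing("Fed signals earlier rate cut", markets))
# -> ('fed-rate-decision', 0.11)
```

The threshold exists because small gaps are usually noise or fees, not edge; only gaps wide enough to survive slippage are worth executing.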
### 3. CROSS-PLATFORM ARBITRAGE

**Why edge exists:**
- Polymarket and Kalshi don't always list the same events
- Same event, different platforms = price discrepancies
- Different user bases = different market efficiency

**Types of arbitrage:**
1. **Direct arbitrage**: Same outcome, different prices (rare but it exists)
2. **Correlated arbitrage**: Related markets with pricing gaps
3. **Platform liquidity arbitrage**: Capitalize on platform-specific volume shocks

**Example:**
- Polymarket has "Fed rate cut in March 2026" at 65%
- Kalshi has "Fed funds rate below 4.5% by March 31, 2026" at 58%
- If these are materially the same event, there's an edge

**Full arbitrage stack:**
- `pmxtjs` or `@alango/dr-manhattan` for a unified API
- Correlation detection engine
- Position sizing with platform-specific risk limits

---

### 4. LIQUIDITY & MARKET MAKING EDGE

**Why edge exists:**
- Many markets have thin order books
- Market makers can earn the spread
- Less competition in smaller markets

**Strategies:**
- **Passive market making**: Place limit orders on both sides of thin markets
- **Inventory management**: Hedge with correlated markets
- **Volatility trading**: Buy options/straddles around major events

**Tools:**
- Polymarket CLOB API for order placement
- Kalshi API for limit orders
- Real-time price feeds

---
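Passive quoting can be reduced to a small sketch. `make_quotes`, the two-cent half-spread, and the one-cent tick are illustrative assumptions, not platform parameters:

```python
def make_quotes(fair_prob: float, half_spread: float = 0.02, tick: float = 0.01):
    """Quote a bid and ask around an estimated fair probability,
    clamped to the valid (0, 1) contract price range and rounded
    to an assumed one-cent tick size."""
    bid = max(tick, round(fair_prob - half_spread, 2))
    ask = min(1 - tick, round(fair_prob + half_spread, 2))
    return bid, ask

# Fair value 62 cents: quote 60 bid / 64 ask and earn the 4-cent
# spread on a filled round trip, carrying inventory risk in between.
print(make_quotes(0.62))  # -> (0.6, 0.64)
```

The maker's profit is `ask - bid` per round trip; the inventory held between fills is exactly the exposure the hedging and correlation tools above are meant to manage.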
### 5. MODEL-BASED PREDICTIONS

**Where AI/ML shines:**

| Market Type | Model Approach | Data Sources |
|-------------|----------------|--------------|
| Economic indicators | Time-series forecasting (ARIMA, Prophet, LSTMs) | FRED API, Bloomberg historical |
| Elections | Poll aggregation + demographic weighting | 538, RealClearPolitics, district data |
| Crypto prices | On-chain metrics + sentiment | Dune Analytics, Glassnode, social APIs |
| Weather/climate | Ensemble meteorological models | NOAA, ECMWF, historical data |
| Sports outcomes | Elo ratings + player statistics | Statcast, ESPN APIs, scraping |

**Edge comes from:**
- Better data (non-obvious signals)
- Better models (ensembles, custom features)
- Faster updates (real-time re-training)

---

## PART 2: THE FULL STACK

### Layer 0: Infrastructure

```
┌─────────────────────────────────────────────────────┐
│                 DATA INFRASTRUCTURE                 │
├─────────────────────────────────────────────────────┤
│ • Real-time APIs (news, markets, on-chain)          │
│ • PostgreSQL/ClickHouse for historical data         │
│ • Redis for caching + rate limiting                 │
│ • Message queue (RabbitMQ/Redis Streams) for events │
└─────────────────────────────────────────────────────┘
```

**Key components:**
- **Database**: PostgreSQL with TimescaleDB for time-series market data
- **Cache**: Redis for rate limiting, market snapshots, order book state
- **Queue**: RabbitMQ or Kafka for async job processing
- **Monitoring**: Prometheus + Grafana for system health and P&L tracking

---

### Layer 1: Data Ingestion

**Sources:**

| Source | API/Tool | Use Case |
|--------|----------|----------|
| Polymarket | `@polymarket/sdk`, `polymarket-gamma`, `@nevuamarkets/poly-websockets` | Market data, odds, volume, order book |
| Kalshi | `kalshi-typescript`, `@newyorkcompute/kalshi-core` | Market data, contract prices, fills |
| News | Reuters, Bloomberg, NewsAPI | Breaking news, sentiment |
| On-chain | Dune Analytics, The Graph, Whale Alert | Crypto-specific markets |
| Social | X (Twitter) API, Reddit API | Sentiment, trend detection |
| Economic | FRED API, BEA API, BLS API | Macro indicators |

**Ingestion pattern:**

```python
# Pseudocode
async def ingest_polymarket_data():
    ws = connect_poly_websocket()
    async for msg in ws:
        process_market_update(msg)
        store_to_postgres(msg)
        emit_to_queue(msg)
        trigger_signal_if_edge_detected(msg)
```

---

### Layer 2: Signal Generation

**Three approaches:**

1. **Rule-based signals**

   ```javascript
   // Example: Economic data beat
   if (actualCPI > forecastCPI && marketProbability < 0.80) {
     emitSignal({
       market: "Fed hike July",
       action: "BUY YES",
       confidence: 0.85
     });
   }
   ```

2. **ML-based signals**

   ```python
   # Example: Ensemble prediction
   predictions = [
       xgboost_model.predict(features),
       lstm_model.predict(features),
       sentiment_model.predict(features)
   ]
   weighted_pred = weighted_average(predictions, historical_accuracy)
   if weighted_pred > market_prob + threshold:
       emit_signal(...)
   ```

3. **NLP-based signals** (for news/sentiment)

   ```python
   # Example: Breaking news analysis
   news_text = get_latest_news()
   sentiment = transformer_model.predict(news_text)
   entities = ner_model.extract(news_text)
   if "Fed" in entities and sentiment > 0.7:
       emit_signal(...)  # Bullish signal for Fed-related markets
   ```

**Signal validation:**
- Backtest against historical data
- Paper trade with small size first
- Track prediction accuracy by market category
- Adjust confidence thresholds over time

---

### Layer 3: Execution Engine

**Polymarket execution:**

```typescript
import { PolyMarketSDK } from '@polymarket/sdk';

const sdk = new PolyMarketSDK({ apiKey: '...' });

// Place order
const order = await sdk.createOrder({
  marketId: '0x...',
  side: 'YES',
  price: 0.65,      // 65 cents
  size: 100,        // 100 contracts
  expiration: 86400 // 24 hours
});
```

**Kalshi execution:**

```typescript
import { KalshiSDK } from 'kalshi-typescript';

const sdk = new KalshiSDK({ apiKey: '...'
});

// Place order
const order = await sdk.placeOrder({
  ticker: 'HIGH-CPI-2026',
  side: 'YES',
  count: 100,
  limit_price: 65 // cents
});
```

**Execution considerations:**
- **Slippage**: Thin markets = high slippage. Use limit orders with a buffer.
- **Gas**: Polymarket settles on Polygon, so gas is paid in POL (formerly MATIC). Keep a buffer.
- **Rate limits**: Both platforms enforce API rate limits. Implement backoff.
- **Position limits**: Don't overexpose to correlated markets.

---

### Layer 4: Risk Management

**Critical components:**

1. **Position sizing**

   ```
   Kelly Criterion: f* = (bp - q) / b
   where:
     b = odds received on the wager (decimal)
     p = probability of winning
     q = probability of losing (1 - p)
   ```

2. **Correlation matrix**

   ```sql
   -- Track correlated positions
   SELECT mc.market_id_1, mc.market_id_2, mc.correlation
   FROM market_correlations mc
   JOIN markets m1 ON mc.market_id_1 = m1.id
   JOIN markets m2 ON mc.market_id_2 = m2.id
   WHERE mc.correlation > 0.7
     AND m1.active = true
     AND m2.active = true;
   ```

3. **P&L tracking**

   ```sql
   -- Daily P&L by strategy
   SELECT
     date,
     strategy,
     SUM(pnl) AS total_pnl,
     SUM(trades) AS total_trades,
     SUM(pnl) / NULLIF(SUM(max_risk), 0) AS roi
   FROM daily_pnl
   GROUP BY date, strategy;
   ```

4. **Stop-loss mechanisms**

   ```python
   # Example: Auto-liquidation threshold
   if current_pnl < -max_drawdown:
       liquidate_positions(reason="Max drawdown exceeded")
       halt_trading(reason="Risk limit")
   ```

---

### Layer 5: Monitoring & Analytics

**Dashboard metrics:**
- Real-time portfolio value
- Open positions + unrealized P&L
- Signal accuracy by category
- Win rate, ROI, Sharpe ratio
- Correlation heat map

**Alerts:**
- Large price movements
- Unusual volume spikes
- Failed orders
- System health issues

**Backtesting:**
- Replay historical data
- Test strategies against past events
- Calculate hypothetical P&L
- Optimize hyperparameters

---

## PART 3: SPECIFIC EDGE STRATEGIES (with tech specs)

### Strategy 1: Economic Data Trading

**Markets:** "CPI above X%", "Fed funds rate above Y%", "GDP growth > 2%"

**Data sources:**
- BLS API (CPI, unemployment)
- BEA API (GDP, personal income)
- Federal Reserve (FOMC statements, rate decisions)

**Tech stack:**
```
BLS/BEA API → Parser → Compare to consensus → If beat: buy YES; if miss: buy NO
```

**Edge factor:** Data is released at scheduled times; pre-position based on your own analysis vs the market consensus.

**Risk:** The market may have already priced it in; look for subtle beats/misses.

---

### Strategy 2: Esports/Specialized Sports

**Markets:** "Team A wins tournament X", "Player Y scores Z points"

**Data sources:**
- Official game APIs (Riot, Valve)
- Esports data providers (Pandascore, Strafe)
- Team social media (lineup changes, roster swaps)
- Scouting reports, patch notes (meta shifts)

**Tech stack:**
```
Riot API + Social scraping → Team form analysis → Probability model → Trade
```

**Edge factor:** Most bettors don't watch games closely; insider knowledge of roster changes, practice schedules, etc.

**Risk:** Low liquidity; hard to exit positions.
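Position sizing for thin markets like these follows directly from the Kelly formula in the risk-management layer. A minimal sketch; the half-Kelly default is a common conservatism tweak and an assumption here, not part of the criterion itself:

```python
def kelly_fraction(p: float, b: float, fraction: float = 0.5) -> float:
    """Kelly criterion f* = (b*p - q)/b with q = 1 - p, scaled by a
    fractional-Kelly multiplier (half-Kelly by default) to reduce
    drawdown risk. p is the win probability, b the net decimal odds.
    Returns the fraction of bankroll to stake (never negative)."""
    q = 1 - p
    f_star = (b * p - q) / b
    return max(0.0, round(f_star * fraction, 4))

# A YES contract at 40 cents pays 60 cents profit on a 40-cent stake,
# so b = 0.6 / 0.4 = 1.5. If your model says p = 0.50:
print(kelly_fraction(0.50, 1.5))  # -> 0.0833
```

Note that a negative edge (`b*p < q`) clamps to a zero stake: Kelly never tells you to take a bet you expect to lose.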
---

### Strategy 3: Crypto On-Chain Signals

**Markets:** "BTC above $100K by X date", "ETH ETF approved by Y"

**Data sources:**
- Dune Analytics queries
- Whale Alert API
- Glassnode on-chain metrics
- Etherscan events

**Tech stack:**
```
Dune query → Whale movement detected → Cross-reference with market → Trade
```

**Edge factor:** On-chain data is transparent but not widely used by retail traders.

**Risk:** Manipulation (whale spoofing); correlation-vs-causation issues.

---

### Strategy 4: Cross-Platform Arbitrage

**Example workflow:**

```typescript
import { PolyMarketSDK } from '@polymarket/sdk';
import { KalshiSDK } from 'kalshi-typescript';

const poly = new PolyMarketSDK({ apiKey: '...' });
const kalshi = new KalshiSDK({ apiKey: '...' });

// Get equivalent markets
const polyMarket = await poly.getMarket({ slug: 'fed-hike-july-2026' });
const kalshiMarket = await kalshi.getMarket({ ticker: 'FED-HIKE-JULY-2026' });

// Detect arbitrage
if (polyMarket.price > kalshiMarket.price + threshold) {
  // Buy NO on Polymarket, YES on Kalshi
  await poly.createOrder({ marketId: polyMarket.id, side: 'NO', ... });
  await kalshi.placeOrder({ ticker: kalshiMarket.ticker, side: 'YES', ... });
}
```

**Edge factor:** Information asymmetry between platforms; different user bases.

**Risk:** Execution risk (prices move mid-trade); correlated markets are not exactly equivalent.

---

## PART 4: RECOMMENDED STARTER STACK

### Minimal Viable Product (MVP)

```
1. MCP Servers (via mcporter)
   ├── @iqai/mcp-polymarket
   ├── @newyorkcompute/kalshi-mcp
   └── prediction-mcp (unified)

2. Data Pipeline
   ├── PostgreSQL (market data, trades, P&L)
   ├── Redis (caching, rate limiting)
   └── Simple cron jobs (data ingestion)

3. Signal Engine
   ├── Rule-based signals (start simple)
   ├── Sentiment analysis (optional)
   └── Backtesting framework

4. Execution
   ├── Polymarket SDK
   ├── Kalshi SDK
   └── Order queue with retry logic

5. Monitoring
   ├── Grafana dashboard
   ├── Discord alerts
   └── Daily P&L reports
```

### Production-Grade Stack

```
1. Infrastructure
   ├── Cloud (AWS/GCP)
   ├── Kubernetes (scalability)
   ├── PostgreSQL + TimescaleDB (time-series)
   ├── Redis Cluster
   └── RabbitMQ/Kafka

2. Data Ingestion
   ├── WebSocket connections (real-time)
   ├── REST APIs (historical)
   ├── Scrapers (social, news)
   └── ML feature pipeline

3. Signal Engine
   ├── Ensemble models (XGBoost + LSTM)
   ├── NLP for news/sentiment
   ├── Backtesting framework
   └── Hyperparameter optimization

4. Execution
   ├── Order management system
   ├── Position tracker
   ├── Risk engine
   └── Circuit breakers

5. Monitoring
   ├── Prometheus + Grafana
   ├── Slack/Discord alerts
   ├── P&L analytics
   └── Strategy performance dashboard
```

---

## PART 5: GETTING STARTED (Step-by-Step)

### Step 1: Install MCP servers

```bash
# Add via mcporter
mcporter add mcp-polymarket
mcporter add kalshi-mcp
mcporter add prediction-mcp
```

### Step 2: Set up database

```sql
-- Schema for markets, trades, signals
CREATE TABLE markets (
  id TEXT PRIMARY KEY,
  platform TEXT NOT NULL,
  slug TEXT NOT NULL,
  question TEXT,
  end_date TIMESTAMP,
  created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE trades (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  market_id TEXT REFERENCES markets(id),
  side TEXT NOT NULL,
  price NUMERIC NOT NULL,
  size NUMERIC NOT NULL,
  pnl NUMERIC,
  created_at TIMESTAMP DEFAULT NOW()
);
```

### Step 3: Build signal generator (start with rule-based)

```python
# signals/economic.py
def check_economic_signal(market_data, consensus, actual):
    if actual > consensus and market_data['price'] < 0.8:
        return {'action': 'BUY_YES', 'confidence': 0.8}
    elif actual < consensus and market_data['price'] > 0.2:
        return {'action': 'BUY_NO', 'confidence': 0.8}
    return None
```

### Step 4: Implement execution

```typescript
// execute.ts
import { PolyMarketSDK } from '@polymarket/sdk';

async function executeSignal(signal: Signal) {
  const sdk = new PolyMarketSDK({ apiKey: process.env.POLY_API_KEY });
  const order = await sdk.createOrder({
    marketId: signal.marketId,
    side: signal.side,
    price: signal.price,
    size: signal.size
  });

  await logTrade(order);
}
```

### Step 5: Build backtester

```python
# backtest.py
def backtest_strategy(start_date, end_date):
    historical_data = load_historical_markets(start_date, end_date)
    results = []
    for market in historical_data:
        signal = generate_signal(market)
        if signal:
            outcome = get_market_outcome(market['id'])
            pnl = calculate_pnl(signal, outcome)
            results.append({'signal': signal, 'outcome': outcome, 'pnl': pnl})
    return analyze_results(results)
```

### Step 6: Deploy and monitor

- Use cron/a scheduler for regular data pulls
- Set up Discord alerts for signals and trades
- Generate daily P&L reports
- Run a weekly strategy review

---

## PART 6: KEY RISKS & MITIGATIONS

| Risk | Mitigation |
|------|------------|
| **Liquidity risk** | Avoid thin markets, use limit orders, size positions appropriately |
| **Execution risk** | Pre-test APIs, implement retry logic, have fallback mechanisms |
| **Model risk** | Backtest thoroughly, paper trade first, monitor live accuracy |
| **Platform risk** | Don't store large balances on an exchange, use API keys with limited permissions |
| **Correlation risk** | Track correlated positions, implement portfolio-level limits |
| **Regulatory risk** | Check terms of service, comply with local laws |
| **Market manipulation** | Be wary of wash trading and suspicious volume spikes |

---

## PART 7: NEXT ACTIONS

1. **Install MCP servers** - start with `prediction-mcp` for unified data access
2. **Pick a niche** - economic data, esports, or crypto (don't try everything)
3. **Build the data pipeline** - PostgreSQL + simple ingestion scripts
4. **Start with rule-based signals** - easier to debug and understand
5. **Paper trade for 2-4 weeks** - validate before using real money
6. **Scale up gradually** - increase position sizes as confidence grows

---

*Ready to set up the stack? I can install MCP servers and start building the data pipeline.*