Prediction Markets Edge Research
Polymarket & Kalshi - Opportunities & Full Stack
Generated 2026-01-22
PART 1: WHERE THE EDGE EXISTS
1. NICHE/SPECIALIZED KNOWLEDGE MARKETS
Why edge exists:
- Low liquidity = mispriced odds
- Fewer sharp traders
- Information asymmetry in specific domains
Categories with highest potential:
| Category | Why Edge | Examples |
|---|---|---|
| Economic indicators | Data-driven, predictable releases | CPI, unemployment, GDP beats/misses |
| Crypto technicals | On-chain data available early | ETH price targets, Bitcoin halving outcomes |
| Esports/specific sports | Niche data sources, scouting intel | Dota 2 tournaments, League match outcomes |
| Corporate events | Insider/industry connections | CEO departures, acquisitions, earnings beats |
| Geopolitical | Local intel, language barriers | Election outcomes in non-US countries |
Edge types:
- Data access: You get data faster (e.g., Bloomberg Terminal vs free APIs)
- Domain expertise: You understand nuances (e.g., esports meta shifts)
- Local intelligence: On-the-ground knowledge (elections, protests)
2. TIME-SENSITIVE MARKETS (Information Velocity)
Polymarket excels here - news moves odds FAST
Edge opportunities:
- Breaking news monitoring: Reuters API, Bloomberg News, Twitter/X firehose
- Economic data releases: Federal Reserve, BLS, BEA releases with millisecond precision
- On-chain signals: Whale alerts, large transfers, protocol exploits
- Social sentiment shifts: Reddit trends, TikTok virality tracking
Example workflow:
Reuters API → Detect breaking news → Cross-reference market → Analyze mispricing → Execute trade
Tools needed:
- Real-time news feeds (Reuters, Bloomberg, NewsAPI)
- Sentiment analysis (VADER, BERT, custom ML models)
- Fast execution (Polymarket CLOB, Kalshi API)
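As a minimal sketch of the "detect breaking news → cross-reference market" step, a keyword watchlist can map headlines to candidate markets before any heavier sentiment model runs. The patterns and market slugs below are hypothetical placeholders:

```python
import re

# Hypothetical watchlist: regex pattern -> market slug to re-examine on a hit
WATCHLIST = {
    r"\b(CPI|inflation)\b": "cpi-above-3pct-2026",
    r"\bfed\b.*\b(cut|hike)\b": "fed-rate-move-2026",
}

def match_headline(headline: str) -> list[str]:
    """Return the market slugs whose pattern fires on this headline."""
    return [slug for pattern, slug in WATCHLIST.items()
            if re.search(pattern, headline, re.IGNORECASE)]
```

A hit only flags a market for analysis; the mispricing check and position sizing still happen downstream.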
3. CROSS-PLATFORM ARBITRAGE
Why edge exists:
- Polymarket and Kalshi don't always have the same events
- Same event, different platforms = price discrepancies
- Different user bases = different market efficiency
Types of arbitrage:
- Direct arbitrage: Same outcome, different prices (rare but exists)
- Correlated arbitrage: Related markets with pricing gaps
- Platform liquidity arbitrage: Capitalize on platform-specific volume shocks
Example:
- Polymarket has "Fed rate cut in March 2026" at 65%
- Kalshi has "Fed funds rate below 4.5% by March 31 2026" at 58%
- If these are materially the same event, there's an edge
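If the two contracts really do resolve identically, the riskless edge is just the price gap: buy NO on the richer platform and YES on the cheaper one, and the pair pays out exactly 1 whichever way the event resolves. A sketch, where the per-leg fee is an assumption you would replace with each platform's actual fee schedule:

```python
def arb_edge(p_yes_a: float, p_yes_b: float, fee_per_leg: float = 0.0) -> float:
    """Guaranteed profit per contract pair, assuming identical resolution.
    Buy NO at (1 - p_high) on the richer book, YES at p_low on the cheaper."""
    p_high, p_low = max(p_yes_a, p_yes_b), min(p_yes_a, p_yes_b)
    cost = (1.0 - p_high) + p_low + 2 * fee_per_leg
    return 1.0 - cost  # the pair always pays exactly 1 at resolution

# The Fed example above: 65% on Polymarket vs 58% on Kalshi
print(round(arb_edge(0.65, 0.58), 2))  # 0.07 per pair before fees
```

In practice the "materially the same" judgment is the hard part: any wording difference between the two contracts turns this into correlated, not riskless, exposure.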
Full arbitrage stack:
- @alango/dr-manhattan for a unified API
- Correlation detection engine
- Position sizing with platform-specific risk limits
4. LIQUIDITY & MARKET MAKING EDGE
Why edge exists:
- Many markets have thin order books
- Market makers can earn the spread
- Less competition on smaller markets
Strategies:
- Passive market making: Place limit orders on both sides of thin markets
- Inventory management: Hedge with correlated markets
- Volatility trading: Buy options/straddles around major events
Tools:
- Polymarket CLOB API for order placement
- Kalshi API for limit orders
- Real-time price feeds
5. MODEL-BASED PREDICTIONS
Where AI/ML shines:
| Market Type | Model Approach | Data Sources |
|---|---|---|
| Economic indicators | Time series forecasting (ARIMA, Prophet, LSTMs) | FRED API, Bloomberg historical |
| Elections | Poll aggregation + demographic weighting | 538, RealClearPolitics, district data |
| Crypto prices | On-chain metrics + sentiment | Dune Analytics, Glassnode, social APIs |
| Weather/climate | Ensemble meteorological models | NOAA, ECMWF, historical data |
| Sports outcomes | Elo ratings + player statistics | Statcast, ESPN APIs, scraping |
Edge comes from:
- Better data (non-obvious signals)
- Better models (ensemble, custom features)
- Faster updates (real-time re-training)
PART 2: THE FULL STACK
Layer 0: Infrastructure
┌─────────────────────────────────────────────────────────────┐
│ DATA INFRASTRUCTURE │
├─────────────────────────────────────────────────────────────┤
│ • Real-time APIs (news, markets, on-chain) │
│ • PostgreSQL/ClickHouse for historical data │
│ • Redis for caching + rate limiting │
│ • Message queue (RabbitMQ/Redis Streams) for events │
└─────────────────────────────────────────────────────────────┘
Key components:
- Database: PostgreSQL with TimescaleDB for time-series market data
- Cache: Redis for rate limiting, market snapshots, order book states
- Queue: RabbitMQ or Kafka for async job processing
- Monitoring: Prometheus + Grafana for system health, P&L tracking
Layer 1: Data Ingestion
Sources:
| Source | API/Tool | Use Case |
|---|---|---|
| Polymarket | @polymarket/sdk, polymarket-gamma, @nevuamarkets/poly-websockets | Market data, odds, volume, order book |
| Kalshi | kalshi-typescript, @newyorkcompute/kalshi-core | Market data, contract prices, fills |
| News | Reuters, Bloomberg, NewsAPI | Breaking news, sentiment |
| On-chain | Dune Analytics, The Graph, Whale Alert | Crypto-specific markets |
| Social | X (Twitter) API, Reddit API | Sentiment, trend detection |
| Economic | FRED API, BEA API, BLS API | Macro indicators |
Ingestion pattern:
# Pseudocode
async def ingest_polymarket_data():
    ws = await connect_poly_websocket()
    async for msg in ws:
        process_market_update(msg)
        await store_to_postgres(msg)
        await emit_to_queue(msg)
        trigger_signal_if_edge_detected(msg)
Layer 2: Signal Generation
Three approaches:
- Rule-based signals
// Example: Economic data beat
if (actualCPI > forecastCPI && marketProbability < 0.80) {
emitSignal({ market: "Fed hike July", action: "BUY YES", confidence: 0.85 });
}
- ML-based signals
# Example: Ensemble prediction
predictions = [
xgboost_model.predict(features),
lstm_model.predict(features),
sentiment_model.predict(features)
]
weighted_pred = weighted_average(predictions, historical_accuracy)
if weighted_pred > market_prob + threshold:
emit_signal(...)
- NLP-based signals (for news/sentiment)
# Example: Breaking news analysis
news_text = get_latest_news()
sentiment = transformer_model.predict(news_text)
entities = ner_model.extract(news_text)
if "Fed" in entities and sentiment > 0.7:
# Bullish signal for Fed-related markets
Signal validation:
- Backtest against historical data
- Paper trade with small size first
- Track prediction accuracy by market category
- Adjust confidence thresholds over time
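"Track prediction accuracy by market category" has a standard metric: the Brier score, the mean squared error of the predicted probability against the 0/1 outcome. A minimal stdlib tracker:

```python
from collections import defaultdict

def brier_by_category(records):
    """records: iterable of (category, predicted_prob, outcome), outcome in {0, 1}.
    Returns mean Brier score per category; lower is better,
    0.25 is what always guessing 50% would score."""
    sums, counts = defaultdict(float), defaultdict(int)
    for category, prob, outcome in records:
        sums[category] += (prob - outcome) ** 2
        counts[category] += 1
    return {cat: sums[cat] / counts[cat] for cat in sums}
```

Raising the confidence threshold for categories with persistently poor Brier scores is a simple, data-driven way to "adjust confidence thresholds over time".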
Layer 3: Execution Engine
Polymarket execution:
import { PolyMarketSDK } from '@polymarket/sdk';
const sdk = new PolyMarketSDK({ apiKey: '...' });
// Place order
const order = await sdk.createOrder({
marketId: '0x...',
side: 'YES',
price: 0.65, // 65 cents
size: 100, // 100 contracts
expiration: 86400 // 24 hours
});
Kalshi execution:
import { KalshiSDK } from 'kalshi-typescript';
const sdk = new KalshiSDK({ apiKey: '...' });
// Place order
const order = await sdk.placeOrder({
ticker: 'HIGH-CPI-2026',
side: 'YES',
count: 100,
limit_price: 65 // cents
});
Execution considerations:
- Slippage: Thin markets = high slippage. Use limit orders with buffer.
- Gas: Polymarket runs on Polygon, where gas is paid in POL (formerly MATIC), not ETH. Keep a buffer.
- Rate limits: Both platforms have API rate limits. Implement backoff.
- Position limits: Don't overexpose to correlated markets.
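The rate-limit point is worth code: both APIs reject bursts, so wrap order and data calls in exponential backoff with jitter. A generic stdlib sketch; the retry count and delays are illustrative, and in production you would catch the platform's specific rate-limit error rather than a bare Exception:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=0.5):
    """Run a zero-argument callable, retrying on failure with
    exponentially growing, jittered sleeps between attempts."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Usage is just wrapping any SDK call, e.g. `with_backoff(lambda: place_order(...))`.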
Layer 4: Risk Management
Critical components:
- Position sizing
Kelly Criterion: f* = (bp - q) / b
where:
b = net odds received on the wager (profit per unit staked)
p = probability of winning
q = 1 - p (probability of losing)
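For a binary contract bought at price c, the net odds are b = (1 - c) / c, so the formula reduces to a one-liner. Capping the stake (fractional Kelly) is standard practice because p comes from a noisy model; the 25% cap here is an assumption, not a rule:

```python
def kelly_fraction(p: float, price: float, cap: float = 0.25) -> float:
    """Fraction of bankroll to stake on a YES contract at `price`,
    given your estimated win probability `p`. Never bets when edge <= 0."""
    b = (1.0 - price) / price          # net odds: profit per unit staked
    f = (b * p - (1.0 - p)) / b        # f* = (bp - q) / b
    return max(0.0, min(f, cap))
```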
- Correlation matrix
-- Track correlated positions
SELECT m1.market_id, m2.market_id, correlation
FROM market_correlations mc
JOIN markets m1 ON mc.market_id_1 = m1.id
JOIN markets m2 ON mc.market_id_2 = m2.id
WHERE correlation > 0.7 AND active = true;
- P&L tracking
-- Daily P&L by strategy
SELECT
date,
strategy,
SUM(pnl) as total_pnl,
SUM(trades) as total_trades,
SUM(pnl) / NULLIF(SUM(max_risk), 0) as roi
FROM daily_pnl
GROUP BY date, strategy;
- Stop-loss mechanisms
# Example: Auto-liquidation threshold
if current_pnl < -max_drawdown:
liquidate_positions(reason="Max drawdown exceeded")
halt_trading(reason="Risk limit")
Layer 5: Monitoring & Analytics
Dashboard metrics:
- Real-time portfolio value
- Open positions + unrealized P&L
- Signal accuracy by category
- Win rate, ROI, Sharpe ratio
- Correlation heat map
Alerts:
- Large price movements
- Unusual volume spikes
- Failed orders
- System health issues
Backtesting:
- Replay historical data
- Test strategies against past events
- Calculate hypothetical P&L
- Optimize hyperparameters
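The Sharpe ratio from the dashboard list can be pinned down concretely: mean daily return over its standard deviation, annualized by the square root of periods per year. A stdlib sketch, assuming 365 trading periods per year (these markets trade continuously) and a risk-free rate of roughly zero:

```python
import statistics

def sharpe_ratio(daily_returns, periods_per_year=365):
    """Annualized Sharpe ratio of a daily return series.
    Compare the backtest value against live results: live is usually worse."""
    mu = statistics.fmean(daily_returns)
    sigma = statistics.stdev(daily_returns)
    return (mu / sigma) * periods_per_year ** 0.5
```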
PART 3: SPECIFIC EDGE STRATEGIES (with tech specs)
Strategy 1: Economic Data Trading
Markets: "CPI above X%", "Fed funds rate above Y%", "GDP growth > 2%"
Data sources:
- BLS API (CPI, unemployment)
- BEA API (GDP, personal income)
- Federal Reserve (FOMC statements, rate decisions)
Tech stack:
BLS/BEA API → Parser → Compare to consensus → If beat: buy YES, if miss: buy NO
Edge factor: Data is released at scheduled times; pre-position based on own analysis vs market consensus.
Risk: Market may have already priced in; look for subtle beats/misses.
Strategy 2: Esports/Specialized Sports
Markets: "Team A wins tournament X", "Player Y scores Z points"
Data sources:
- Official game APIs (Riot, Valve)
- Esports data providers (Pandascore, Strafe)
- Team social media (lineup changes, roster swaps)
- Scouting reports, patch notes (meta shifts)
Tech stack:
Riot API + Social scraping → Team form analysis → Probability model → Trade
Edge factor: Most bettors don't watch games closely; insider knowledge of roster changes, practice schedules, etc.
Risk: Low liquidity; hard to exit positions.
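The "probability model" step above can start as simply as Elo, already listed in the model table for sports outcomes. The standard expected-score formula converts a rating gap into a win probability you can compare against the market's implied odds:

```python
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score: P(A beats B) from the rating gap.
    A 400-point gap corresponds to roughly 10:1 odds."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
```

If the model says 0.70 and the market prices Team A's YES at 0.55, that gap (minus fees and your model's error bars) is the candidate edge.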
Strategy 3: Crypto On-Chain Signals
Markets: "BTC above $100K by X date", "ETH ETF approved by Y"
Data sources:
- Dune Analytics queries
- Whale Alert API
- Glassnode on-chain metrics
- Etherscan events
Tech stack:
Dune query → Whale movement detected → Cross-reference with market → Trade
Edge factor: On-chain data is transparent but not widely used by retail traders.
Risk: Manipulation (whale spoofing); correlation vs causation issues.
Strategy 4: Cross-Platform Arbitrage
Example workflow:
import { PolyMarketSDK } from '@polymarket/sdk';
import { KalshiSDK } from 'kalshi-typescript';
const poly = new PolyMarketSDK({ apiKey: '...' });
const kalshi = new KalshiSDK({ apiKey: '...' });
// Get equivalent markets
const polyMarket = await poly.getMarket({ slug: 'fed-hike-july-2026' });
const kalshiMarket = await kalshi.getMarket({ ticker: 'FED-HIKE-JULY-2026' });
// Detect arbitrage
if (polyMarket.price > kalshiMarket.price + threshold) {
// Buy NO on Polymarket, YES on Kalshi
await poly.createOrder({ marketId: polyMarket.id, side: 'NO', ... });
await kalshi.placeOrder({ ticker: kalshiMarket.ticker, side: 'YES', ... });
}
Edge factor: Information asymmetry between platforms; different user bases.
Risk: Execution risk (prices move during trade); correlated markets not exactly equivalent.
PART 4: RECOMMENDED STARTER STACK
Minimal Viable Product (MVP)
1. MCP Servers (via mcporter)
├── @iqai/mcp-polymarket
├── @newyorkcompute/kalshi-mcp
└── prediction-mcp (unified)
2. Data Pipeline
├── PostgreSQL (market data, trades, P&L)
├── Redis (caching, rate limiting)
└── Simple cron jobs (data ingestion)
3. Signal Engine
├── Rule-based signals (start simple)
├── Sentiment analysis (optional)
└── Backtesting framework
4. Execution
├── Polymarket SDK
├── Kalshi SDK
└── Order queue with retry logic
5. Monitoring
├── Grafana dashboard
├── Discord alerts
└── Daily P&L reports
Production-Grade Stack
1. Infrastructure
├── Cloud (AWS/GCP)
├── Kubernetes (scalability)
├── PostgreSQL + TimescaleDB (time-series)
├── Redis Cluster
└── RabbitMQ/Kafka
2. Data Ingestion
├── WebSocket connections (real-time)
├── REST APIs (historical)
├── Scrapers (social, news)
└── ML feature pipeline
3. Signal Engine
├── Ensemble models (XGBoost + LSTM)
├── NLP for news/sentiment
├── Backtesting framework
└── Hyperparameter optimization
4. Execution
├── Order management system
├── Position tracker
├── Risk engine
└── Circuit breakers
5. Monitoring
├── Prometheus + Grafana
├── Slack/Discord alerts
├── P&L analytics
└── Strategy performance dashboard
PART 5: GETTING STARTED (Step-by-Step)
Step 1: Install MCP servers
# Add via mcporter
mcporter add mcp-polymarket
mcporter add kalshi-mcp
mcporter add prediction-mcp
Step 2: Set up database
-- Schema for markets, trades, signals
CREATE TABLE markets (
id TEXT PRIMARY KEY,
platform TEXT NOT NULL,
slug TEXT NOT NULL,
question TEXT,
end_date TIMESTAMP,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE trades (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
market_id TEXT REFERENCES markets(id),
side TEXT NOT NULL,
price NUMERIC NOT NULL,
size NUMERIC NOT NULL,
pnl NUMERIC,
created_at TIMESTAMP DEFAULT NOW()
);
Step 3: Build signal generator (start with rule-based)
# signals/economic.py
def check_economic_signal(market_data, consensus, actual):
if actual > consensus and market_data['price'] < 0.8:
return {'action': 'BUY_YES', 'confidence': 0.8}
elif actual < consensus and market_data['price'] > 0.2:
return {'action': 'BUY_NO', 'confidence': 0.8}
return None
Step 4: Implement execution
// execute.ts
import { PolyMarketSDK } from '@polymarket/sdk';
async function executeSignal(signal: Signal) {
const sdk = new PolyMarketSDK({ apiKey: process.env.POLY_API_KEY });
const order = await sdk.createOrder({
marketId: signal.marketId,
side: signal.side,
price: signal.price,
size: signal.size
});
await logTrade(order);
}
Step 5: Build backtester
# backtest.py
def backtest_strategy(start_date, end_date):
historical_data = load_historical_markets(start_date, end_date)
results = []
for market in historical_data:
signal = generate_signal(market)
if signal:
outcome = get_market_outcome(market['id'])
pnl = calculate_pnl(signal, outcome)
results.append({'signal': signal, 'outcome': outcome, 'pnl': pnl})
return analyze_results(results)
Step 6: Deploy and monitor
- Use cron/scheduler for regular data pulls
- Set up Discord alerts for signals and trades
- Daily P&L reports
- Weekly strategy review
PART 6: KEY RISKS & MITIGATIONS
| Risk | Mitigation |
|---|---|
| Liquidity risk | Avoid thin markets, use limit orders, size positions appropriately |
| Execution risk | Pre-test APIs, implement retry logic, have fallback mechanisms |
| Model risk | Backtest thoroughly, paper trade first, monitor live accuracy |
| Platform risk | Don't store large amounts on exchange, use API keys with limited permissions |
| Correlation risk | Track correlated positions, implement portfolio-level limits |
| Regulatory risk | Check terms of service, comply with local laws |
| Market manipulation | Be wary of wash trading, suspicious volume spikes |
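Correlation risk from the table can be enforced in code: treat markets whose pairwise correlation exceeds a threshold as one bucket and cap exposure per bucket, not per market. A union-find sketch; the 0.7 threshold mirrors the SQL query in Layer 4 and is an assumption:

```python
def clustered_exposure(positions, correlations, threshold=0.7):
    """positions: {market_id: dollar exposure};
    correlations: {(id_a, id_b): correlation}.
    Returns total exposure per cluster of highly-correlated markets."""
    parent = {market: market for market in positions}

    def find(x):  # path-halving union-find
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), corr in correlations.items():
        if corr > threshold and a in parent and b in parent:
            parent[find(a)] = find(b)

    buckets = {}
    for market, exposure in positions.items():
        root = find(market)
        buckets[root] = buckets.get(root, 0.0) + exposure
    return buckets
```

A portfolio-level limit then applies to the largest bucket total rather than to any single position.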
PART 7: NEXT ACTIONS
- Install MCP servers - start with prediction-mcp for unified data access
- Pick a niche - economic data, esports, or crypto (don't try everything)
- Build data pipeline - PostgreSQL + simple ingestion scripts
- Start with rule-based signals - easier to debug and understand
- Paper trade for 2-4 weeks - validate before using real money
- Scale up gradually - increase position sizes as confidence grows
Ready to set up the stack? I can install MCP servers and start building the data pipeline.