
Prediction Markets Edge Research

Polymarket & Kalshi - Opportunities & Full Stack

Generated 2026-01-22


PART 1: WHERE THE EDGE EXISTS

1. NICHE/SPECIALIZED KNOWLEDGE MARKETS

Why edge exists:

  • Low liquidity = mispriced odds
  • Fewer sharp traders
  • Information asymmetry in specific domains

Categories with highest potential:

| Category | Why Edge | Examples |
|---|---|---|
| Economic indicators | Data-driven, predictable releases | CPI, unemployment, GDP beats/misses |
| Crypto technicals | On-chain data available early | ETH price targets, Bitcoin halving outcomes |
| Esports/specific sports | Niche data sources, scouting intel | Dota 2 tournaments, League match outcomes |
| Corporate events | Insider/industry connections | CEO departures, acquisitions, earnings beats |
| Geopolitical | Local intel, language barriers | Election outcomes in non-US countries |

Edge types:

  • Data access: You get data faster (e.g., Bloomberg Terminal vs free APIs)
  • Domain expertise: You understand nuances (e.g., esports meta shifts)
  • Local intelligence: On-the-ground knowledge (elections, protests)

2. TIME-SENSITIVE MARKETS (Information Velocity)

Polymarket excels here - news moves odds FAST

Edge opportunities:

  • Breaking news monitoring: Reuters API, Bloomberg News, Twitter/X firehose
  • Economic data releases: Federal Reserve, BLS, and BEA publish on fixed schedules; consume the numbers the instant they drop
  • On-chain signals: Whale alerts, large transfers, protocol exploits
  • Social sentiment shifts: Reddit trends, TikTok virality tracking

Example workflow:

Reuters API → Detect breaking news → Cross-reference market → Analyze mispricing → Execute trade

Tools needed:

  • Real-time news feeds (Reuters, Bloomberg, NewsAPI)
  • Sentiment analysis (VADER, BERT, custom ML models)
  • Fast execution (Polymarket CLOB, Kalshi API)
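Before any model-based sentiment pass, a cheap first filter is keyword matching of incoming headlines against the markets you track. A minimal sketch in Python; the `MARKET_KEYWORDS` tags and keyword sets are hypothetical and would need tuning:

```python
# Hypothetical first-stage filter: map a headline to market tags whose
# keyword set it overlaps, before spending compute on sentiment models.
MARKET_KEYWORDS = {
    "fed-rate": {"fed", "fomc", "rate", "powell"},
    "cpi": {"cpi", "inflation", "prices"},
}

def match_markets(headline: str) -> list[str]:
    """Return market tags whose keyword set overlaps the headline tokens."""
    tokens = set(headline.lower().replace(",", " ").split())
    return [tag for tag, kws in MARKET_KEYWORDS.items() if tokens & kws]
```

Headlines that match no tracked market are dropped immediately, keeping the expensive sentiment/NER stage off the hot path.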

3. CROSS-PLATFORM ARBITRAGE

Why edge exists:

  • Polymarket and Kalshi don't always have the same events
  • Same event, different platforms = price discrepancies
  • Different user bases = different market efficiency

Types of arbitrage:

  1. Direct arbitrage: Same outcome, different prices (rare but exists)
  2. Correlated arbitrage: Related markets with pricing gaps
  3. Platform liquidity arbitrage: Capitalize on platform-specific volume shocks

Example:

  • Polymarket has "Fed rate cut in March 2026" at 65%
  • Kalshi has "Fed funds rate below 4.5% by March 31 2026" at 58%
  • If these are materially the same event, there's an edge
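The arithmetic behind "materially the same event" can be sketched in a few lines of Python. Assuming binary contracts that pay $1, buying YES on the cheaper platform and NO on the dearer one locks in the gap, less fees (fee parameters here are placeholders):

```python
def arb_edge(yes_price_a: float, yes_price_b: float,
             fee_a: float = 0.0, fee_b: float = 0.0) -> float:
    """Guaranteed profit per contract pair from buying YES on the cheaper
    platform and NO on the dearer one, if the events are truly identical.

    Buying YES at the cheap price and NO at (1 - dear price) costs
    cheap + (1 - dear) per pair; one leg always pays $1, so
    edge = 1 - cost - fees.
    """
    cheap, dear = sorted([yes_price_a, yes_price_b])
    cost = cheap + (1.0 - dear)
    return 1.0 - cost - fee_a - fee_b
```

For the example above (65% vs 58%), `arb_edge(0.65, 0.58)` gives roughly $0.07 per pair before fees; real fees and slippage often eat most of that.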

Full arbitrage stack:

  • pmxtjs or @alango/dr-manhattan for unified API
  • Correlation detection engine
  • Position sizing with platform-specific risk limits

4. LIQUIDITY & MARKET MAKING EDGE

Why edge exists:

  • Many markets have thin order books
  • Market makers can earn the spread
  • Less competition on smaller markets

Strategies:

  • Passive market making: Place limit orders on both sides of thin markets
  • Inventory management: Hedge with correlated markets
  • Volatility trading: Buy options/straddles around major events
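A passive quoting loop reduces to: estimate fair value, quote both sides, and skew against inventory so fills push you back toward flat. A minimal sketch; the `skew_per_contract` value is an illustrative assumption, not a calibrated number:

```python
def make_quotes(fair_value: float, half_spread: float,
                inventory: int, skew_per_contract: float = 0.001):
    """Two-sided quotes around a fair-value estimate, skewed against
    current inventory so fills mean-revert the book toward flat."""
    mid = fair_value - inventory * skew_per_contract
    bid = max(0.01, round(mid - half_spread, 2))  # clamp inside (0, 1)
    ask = min(0.99, round(mid + half_spread, 2))
    return bid, ask
```

With zero inventory and a 3-cent half spread around 50%, this quotes 0.47/0.53; long 100 contracts, it shades both quotes down to encourage selling.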

Tools:

  • Polymarket CLOB API for order placement
  • Kalshi API for limit orders
  • Real-time price feeds

5. MODEL-BASED PREDICTIONS

Where AI/ML shines:

| Market Type | Model Approach | Data Sources |
|---|---|---|
| Economic indicators | Time series forecasting (ARIMA, Prophet, LSTMs) | FRED API, Bloomberg historical |
| Elections | Poll aggregation + demographic weighting | 538, RealClearPolitics, district data |
| Crypto prices | On-chain metrics + sentiment | Dune Analytics, Glassnode, social APIs |
| Weather/climate | Ensemble meteorological models | NOAA, ECMWF, historical data |
| Sports outcomes | Elo ratings + player statistics | Statcast, ESPN APIs, scraping |

Edge comes from:

  • Better data (non-obvious signals)
  • Better models (ensemble, custom features)
  • Faster updates (real-time re-training)

PART 2: THE FULL STACK

Layer 0: Infrastructure

┌─────────────────────────────────────────────────────────────┐
│                    DATA INFRASTRUCTURE                      │
├─────────────────────────────────────────────────────────────┤
│  • Real-time APIs (news, markets, on-chain)                │
│  • PostgreSQL/ClickHouse for historical data               │
│  • Redis for caching + rate limiting                        │
│  • Message queue (RabbitMQ/Redis Streams) for events       │
└─────────────────────────────────────────────────────────────┘

Key components:

  • Database: PostgreSQL with TimescaleDB for time-series market data
  • Cache: Redis for rate limiting, market snapshots, order book states
  • Queue: RabbitMQ or Kafka for async job processing
  • Monitoring: Prometheus + Grafana for system health, P&L tracking

Layer 1: Data Ingestion

Sources:

| Source | API/Tool | Use Case |
|---|---|---|
| Polymarket | @polymarket/sdk, polymarket-gamma, @nevuamarkets/poly-websockets | Market data, odds, volume, order book |
| Kalshi | kalshi-typescript, @newyorkcompute/kalshi-core | Market data, contract prices, fills |
| News | Reuters, Bloomberg, NewsAPI | Breaking news, sentiment |
| On-chain | Dune Analytics, The Graph, Whale Alert | Crypto-specific markets |
| Social | X (Twitter) API, Reddit API | Sentiment, trend detection |
| Economic | FRED API, BEA API, BLS API | Macro indicators |

Ingestion pattern:

# Pseudocode
async def ingest_polymarket_data():
    ws = connect_poly_websocket()
    async for msg in ws:
        process_market_update(msg)
        store_to_postgres(msg)
        emit_to_queue(msg)
        trigger_signal_if_edge_detected(msg)

Layer 2: Signal Generation

Three approaches:

  1. Rule-based signals

// Example: Economic data beat
if (actualCPI > forecastCPI && marketProbability < 0.80) {
    emitSignal({ market: "Fed hike July", action: "BUY YES", confidence: 0.85 });
}

  2. ML-based signals

# Example: Ensemble prediction
predictions = [
    xgboost_model.predict(features),
    lstm_model.predict(features),
    sentiment_model.predict(features),
]
weighted_pred = weighted_average(predictions, historical_accuracy)
if weighted_pred > market_prob + threshold:
    emit_signal(...)

  3. NLP-based signals (for news/sentiment)

# Example: Breaking news analysis
news_text = get_latest_news()
sentiment = transformer_model.predict(news_text)
entities = ner_model.extract(news_text)
if "Fed" in entities and sentiment > 0.7:
    emit_signal(...)  # bullish signal for Fed-related markets

Signal validation:

  • Backtest against historical data
  • Paper trade with small size first
  • Track prediction accuracy by market category
  • Adjust confidence thresholds over time
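For "track prediction accuracy by market category," the standard metric for probability forecasts is the Brier score (mean squared error against the 0/1 outcome). A minimal Python sketch; the record shape `(category, predicted_prob, outcome)` is an assumption of this example:

```python
from collections import defaultdict

def brier_score(forecasts):
    """Mean squared error between predicted probability and the realized
    0/1 outcome; lower is better (0.25 is the coin-flip baseline)."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

def brier_by_category(records):
    """records: iterable of (category, predicted_prob, outcome) tuples.
    Returns a per-category Brier score so weak niches stand out."""
    buckets = defaultdict(list)
    for cat, p, o in records:
        buckets[cat].append((p - o) ** 2)
    return {cat: sum(v) / len(v) for cat, v in buckets.items()}
```

Categories whose Brier score drifts above 0.25 are doing worse than a coin flip and should have their confidence thresholds raised or be retired.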

Layer 3: Execution Engine

Polymarket execution (illustrative; confirm method names against the current SDK docs):

import { PolyMarketSDK } from '@polymarket/sdk';

const sdk = new PolyMarketSDK({ apiKey: '...' });

// Place order
const order = await sdk.createOrder({
    marketId: '0x...',
    side: 'YES',
    price: 0.65, // 65 cents
    size: 100,   // 100 contracts
    expiration: 86400 // 24 hours
});

Kalshi execution (illustrative; confirm against the current SDK docs):

import { KalshiSDK } from 'kalshi-typescript';

const sdk = new KalshiSDK({ apiKey: '...' });

// Place order
const order = await sdk.placeOrder({
    ticker: 'HIGH-CPI-2026',
    side: 'YES',
    count: 100,
    limit_price: 65 // cents
});

Execution considerations:

  • Slippage: Thin markets = high slippage. Use limit orders with buffer.
  • Gas: Polymarket requires ETH on Polygon for gas. Keep buffer.
  • Rate limits: Both platforms have API rate limits. Implement backoff.
  • Position limits: Don't overexpose to correlated markets.
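"Implement backoff" for rate limits usually means exponential backoff with jitter. A minimal sketch in Python; the default delays are illustrative, and in production you would retry only on rate-limit/transient errors rather than every exception:

```python
import random
import time

def with_backoff(call, max_tries=5, base=0.5, cap=30.0):
    """Retry `call` with exponential backoff plus jitter.

    Delay doubles each attempt (base, 2*base, 4*base, ...) up to `cap`,
    scaled by random jitter so concurrent workers don't retry in lockstep.
    Re-raises the last exception if all attempts fail.
    """
    for attempt in range(max_tries):
        try:
            return call()
        except Exception:
            if attempt == max_tries - 1:
                raise
            delay = min(cap, base * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))
```

Wrap each API call site, e.g. `with_backoff(lambda: sdk.placeOrder(...))`.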

Layer 4: Risk Management

Critical components:

  1. Position sizing

Kelly Criterion: f* = (bp - q) / b
where:
  b = net odds received on the wager (b-to-1)
  p = probability of winning
  q = probability of losing (1 - p)

  2. Correlation matrix

-- Track correlated positions
SELECT m1.id AS market_a, m2.id AS market_b, mc.correlation
FROM market_correlations mc
JOIN markets m1 ON mc.market_id_1 = m1.id
JOIN markets m2 ON mc.market_id_2 = m2.id
WHERE mc.correlation > 0.7 AND m1.active AND m2.active;

  3. P&L tracking

-- Daily P&L by strategy
SELECT
    date,
    strategy,
    SUM(pnl) AS total_pnl,
    SUM(trades) AS total_trades,
    SUM(pnl) / NULLIF(SUM(max_risk), 0) AS roi
FROM daily_pnl
GROUP BY date, strategy;

  4. Stop-loss mechanisms

# Example: Auto-liquidation threshold
if current_pnl < -max_drawdown:
    liquidate_positions(reason="Max drawdown exceeded")
    halt_trading(reason="Risk limit")
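The Kelly formula above translates directly to binary contracts: buying YES at `price` pays out $1, so the net odds are b = (1 - price) / price. A minimal Python sketch; the half-Kelly default is a common hedge against overestimating p, not a rule from either platform:

```python
def kelly_fraction(p: float, price: float, fraction: float = 0.5) -> float:
    """Fraction of bankroll to stake on a binary contract bought at `price`
    (pays $1 if YES resolves). Uses f* = (bp - q) / b with net odds
    b = (1 - price) / price; returns 0 when there is no positive edge.
    `fraction` < 1 (half-Kelly here) tempers estimation error in p.
    """
    b = (1.0 - price) / price
    q = 1.0 - p
    f_star = (b * p - q) / b
    return max(0.0, f_star * fraction)
```

For example, a 70% model probability against a 60-cent price gives full Kelly of 25% of bankroll, so half-Kelly stakes 12.5%.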

Layer 5: Monitoring & Analytics

Dashboard metrics:

  • Real-time portfolio value
  • Open positions + unrealized P&L
  • Signal accuracy by category
  • Win rate, ROI, Sharpe ratio
  • Correlation heat map
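The win rate, ROI, and Sharpe figures above can be derived from a daily P&L series. A rough Python sketch under the usual assumptions (returns taken against a fixed capital base, ~252 trading days per year); a real dashboard would use time-weighted returns:

```python
import math
import statistics

def performance(daily_pnl, capital):
    """Win rate, ROI, and annualized Sharpe from a list of daily P&L
    figures against a fixed capital base. Sketch only, not a full
    attribution system."""
    returns = [p / capital for p in daily_pnl]
    win_rate = sum(r > 0 for r in returns) / len(returns)
    roi = sum(returns)
    sharpe = 0.0
    if len(returns) > 1 and statistics.stdev(returns) > 0:
        sharpe = (statistics.mean(returns) / statistics.stdev(returns)
                  * math.sqrt(252))  # annualize daily Sharpe
    return {"win_rate": win_rate, "roi": roi, "sharpe": sharpe}
```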

Alerts:

  • Large price movements
  • Unusual volume spikes
  • Failed orders
  • System health issues

Backtesting:

  • Replay historical data
  • Test strategies against past events
  • Calculate hypothetical P&L
  • Optimize hyperparameters

PART 3: SPECIFIC EDGE STRATEGIES (with tech specs)

Strategy 1: Economic Data Trading

Markets: "CPI above X%", "Fed funds rate above Y%", "GDP growth > 2%"

Data sources:

  • BLS API (CPI, unemployment)
  • BEA API (GDP, personal income)
  • Federal Reserve (FOMC statements, rate decisions)

Tech stack:

BLS/BEA API → Parser → Compare to consensus → If beat: buy YES, if miss: buy NO

Edge factor: Data is released at scheduled times; pre-position based on own analysis vs market consensus.

Risk: Market may have already priced in; look for subtle beats/misses.
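"Pre-position based on own analysis vs market consensus" comes down to a per-contract expected-value check. A minimal sketch (ignores fees and slippage, which matter on thin markets):

```python
def expected_value(own_prob: float, market_price: float) -> float:
    """Per-contract EV of buying YES at `market_price` when your analysis
    puts the true probability at `own_prob`. Wins pay (1 - price);
    losses cost the price paid. Simplifies to own_prob - market_price."""
    return own_prob * (1.0 - market_price) - (1.0 - own_prob) * market_price
```

If your CPI model implies 75% but the market trades at 65 cents, EV is +10 cents per contract; the "already priced in" risk above is exactly the case where `own_prob` is no better than the market's.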


Strategy 2: Esports/Specialized Sports

Markets: "Team A wins tournament X", "Player Y scores Z points"

Data sources:

  • Official game APIs (Riot, Valve)
  • Esports data providers (Pandascore, Strafe)
  • Team social media (lineup changes, roster swaps)
  • Scouting reports, patch notes (meta shifts)

Tech stack:

Riot API + Social scraping → Team form analysis → Probability model → Trade

Edge factor: Most bettors don't watch games closely; insider knowledge of roster changes, practice schedules, etc.

Risk: Low liquidity; hard to exit positions.
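The "Team form analysis → Probability model" step can start with plain Elo ratings (as listed in the model table above), since the Elo expected score is itself a probability you can compare to a market price. A minimal sketch; K = 32 is a conventional default, not tuned per game:

```python
def elo_win_prob(r_a: float, r_b: float) -> float:
    """Implied probability that A beats B, directly comparable to a
    market's YES price for 'Team A wins'."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """One Elo update after a match: score_a is 1 for an A win,
    0 for a loss, 0.5 for a draw."""
    delta = k * (score_a - elo_win_prob(r_a, r_b))
    return r_a + delta, r_b - delta
```

Roster swaps and patch notes are then handled as manual rating adjustments on top of the match-driven updates.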


Strategy 3: Crypto On-Chain Signals

Markets: "BTC above $100K by X date", "ETH ETF approved by Y"

Data sources:

  • Dune Analytics queries
  • Whale Alert API
  • Glassnode on-chain metrics
  • Etherscan events

Tech stack:

Dune query → Whale movement detected → Cross-reference with market → Trade

Edge factor: On-chain data is transparent but not widely used by retail traders.

Risk: Manipulation (whale spoofing); correlation vs causation issues.
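"Whale movement detected" needs a definition of unusual; a rolling z-score against recent flow history is a simple one. A sketch in Python; the threshold at which a z-score triggers a market cross-check is a judgment call, not shown here:

```python
import statistics

def flow_zscore(history, latest):
    """Z-score of the latest on-chain flow versus recent history; a large
    positive value flags a whale-sized move worth cross-referencing
    against open markets. Requires at least two history points."""
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    if sigma == 0:
        return 0.0  # flat history: nothing is 'unusual'
    return (latest - mu) / sigma
```

The spoofing risk above means a high z-score is a prompt for investigation, never an automatic trade.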


Strategy 4: Cross-Platform Arbitrage

Example workflow:

import { PolyMarketSDK } from '@polymarket/sdk';
import { KalshiSDK } from 'kalshi-typescript';

const poly = new PolyMarketSDK({ apiKey: '...' });
const kalshi = new KalshiSDK({ apiKey: '...' });

// Get equivalent markets
const polyMarket = await poly.getMarket({ slug: 'fed-hike-july-2026' });
const kalshiMarket = await kalshi.getMarket({ ticker: 'FED-HIKE-JULY-2026' });

// Detect arbitrage
if (polyMarket.price > kalshiMarket.price + threshold) {
    // Buy NO on Polymarket, YES on Kalshi
    await poly.createOrder({ marketId: polyMarket.id, side: 'NO', ... });
    await kalshi.placeOrder({ ticker: kalshiMarket.ticker, side: 'YES', ... });
}

Edge factor: Information asymmetry between platforms; different user bases.

Risk: Execution risk (prices move during trade); correlated markets not exactly equivalent.


PART 4: STACK OPTIONS

Minimal Viable Product (MVP)

1. MCP Servers (via mcporter)
   ├── @iqai/mcp-polymarket
   ├── @newyorkcompute/kalshi-mcp
   └── prediction-mcp (unified)

2. Data Pipeline
   ├── PostgreSQL (market data, trades, P&L)
   ├── Redis (caching, rate limiting)
   └── Simple cron jobs (data ingestion)

3. Signal Engine
   ├── Rule-based signals (start simple)
   ├── Sentiment analysis (optional)
   └── Backtesting framework

4. Execution
   ├── Polymarket SDK
   ├── Kalshi SDK
   └── Order queue with retry logic

5. Monitoring
   ├── Grafana dashboard
   ├── Discord alerts
   └── Daily P&L reports

Production-Grade Stack

1. Infrastructure
   ├── Cloud (AWS/GCP)
   ├── Kubernetes (scalability)
   ├── PostgreSQL + TimescaleDB (time-series)
   ├── Redis Cluster
   └── RabbitMQ/Kafka

2. Data Ingestion
   ├── WebSocket connections (real-time)
   ├── REST APIs (historical)
   ├── Scrapers (social, news)
   └── ML feature pipeline

3. Signal Engine
   ├── Ensemble models (XGBoost + LSTM)
   ├── NLP for news/sentiment
   ├── Backtesting framework
   └── Hyperparameter optimization

4. Execution
   ├── Order management system
   ├── Position tracker
   ├── Risk engine
   └── Circuit breakers

5. Monitoring
   ├── Prometheus + Grafana
   ├── Slack/Discord alerts
   ├── P&L analytics
   └── Strategy performance dashboard

PART 5: GETTING STARTED (Step-by-Step)

Step 1: Install MCP servers

# Add via mcporter
mcporter add mcp-polymarket
mcporter add kalshi-mcp
mcporter add prediction-mcp

Step 2: Set up database

-- Schema for markets, trades, signals
CREATE TABLE markets (
    id TEXT PRIMARY KEY,
    platform TEXT NOT NULL,
    slug TEXT NOT NULL,
    question TEXT,
    end_date TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE trades (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    market_id TEXT REFERENCES markets(id),
    side TEXT NOT NULL,
    price NUMERIC NOT NULL,
    size NUMERIC NOT NULL,
    pnl NUMERIC,
    created_at TIMESTAMP DEFAULT NOW()
);

Step 3: Build signal generator (start with rule-based)

# signals/economic.py
def check_economic_signal(market_data, consensus, actual):
    if actual > consensus and market_data['price'] < 0.8:
        return {'action': 'BUY_YES', 'confidence': 0.8}
    elif actual < consensus and market_data['price'] > 0.2:
        return {'action': 'BUY_NO', 'confidence': 0.8}
    return None

Step 4: Implement execution

// execute.ts
import { PolyMarketSDK } from '@polymarket/sdk';

async function executeSignal(signal: Signal) {
    const sdk = new PolyMarketSDK({ apiKey: process.env.POLY_API_KEY });
    const order = await sdk.createOrder({
        marketId: signal.marketId,
        side: signal.side,
        price: signal.price,
        size: signal.size
    });
    await logTrade(order);
}

Step 5: Build backtester

# backtest.py
def backtest_strategy(start_date, end_date):
    historical_data = load_historical_markets(start_date, end_date)
    results = []

    for market in historical_data:
        signal = generate_signal(market)
        if signal:
            outcome = get_market_outcome(market['id'])
            pnl = calculate_pnl(signal, outcome)
            results.append({'signal': signal, 'outcome': outcome, 'pnl': pnl})

    return analyze_results(results)

Step 6: Deploy and monitor

  • Use cron/scheduler for regular data pulls
  • Set up Discord alerts for signals and trades
  • Daily P&L reports
  • Weekly strategy review

PART 6: KEY RISKS & MITIGATIONS

| Risk | Mitigation |
|---|---|
| Liquidity risk | Avoid thin markets, use limit orders, size positions appropriately |
| Execution risk | Pre-test APIs, implement retry logic, have fallback mechanisms |
| Model risk | Backtest thoroughly, paper trade first, monitor live accuracy |
| Platform risk | Don't store large amounts on exchange, use API keys with limited permissions |
| Correlation risk | Track correlated positions, implement portfolio-level limits |
| Regulatory risk | Check terms of service, comply with local laws |
| Market manipulation | Be wary of wash trading, suspicious volume spikes |

PART 7: NEXT ACTIONS

  1. Install MCP servers - start with prediction-mcp for unified data access
  2. Pick a niche - economic data, esports, or crypto (don't try everything)
  3. Build data pipeline - PostgreSQL + simple ingestion scripts
  4. Start with rule-based signals - easier to debug and understand
  5. Paper trade for 2-4 weeks - validate before using real money
  6. Scale up gradually - increase position sizes as confidence grows

Ready to set up the stack? I can install MCP servers and start building the data pipeline.