kalshi backtest progress
===

this document tracks the development progress, algorithm details, and backtest results for the kalshi prediction market trading system.

last updated: 2026-01-22

backtest run #1
---

**date:** 2026-01-22
**period:** 2026-01-20 to 2026-01-22 (2 days)
**initial capital:** $10,000
**interval:** 1 hour

### results summary

| metric | strategy | random baseline | delta |
|--------|----------|-----------------|-------|
| total return | +$993.61 (+9.94%) | -$51.00 (-0.51%) | +$1,044.61 |
| sharpe ratio | 5.448 | -2.436 | +7.884 |
| max drawdown | 1.26% | 0.51% | +0.75% |
| win rate | 58.7% | 0.0% | +58.7% |
| total trades | 46 | 0 | +46 |
| avg trade pnl | $4.59 | $0.00 | +$4.59 |
| avg hold time | 5.4 hrs | 0.0 hrs | +5.4 hrs |

### notable trades

| ticker | entry | exit | side | pnl | hold time |
|--------|-------|------|------|-----|-----------|
| KXKHLGAME-26JAN21SEVHCS-SEV | $0.17 | $0.99 | Yes | +$81.98 | 1h |
| KXKHLGAME-26JAN21SEVHCS-HCS | $0.20 | $0.93 | Yes | +$72.98 | 1h |
| KXFIRSTSUPERBOWLSONG-26FEB09-DTM | $0.11 | $0.63 | Yes | +$51.99 | 2h |
| KXUCLBTTS-26JAN21OMLFC | $0.01 | $0.50 | Yes | +$49.00 | 1h |
| KXNCAAWBGAME-26JAN21BULAF-LAF | $0.43 | $0.80 | No | +$36.96 | 3h |

### worst trades

| ticker | entry | exit | side | pnl | hold time |
|--------|-------|------|------|-----|-----------|
| KXUCLBTTS-26JAN21GALATM | $0.40 | $0.01 | No | -$39.04 | 1h |
| KXUCLBTTS-26JAN21QARSGE | $0.35 | $0.01 | No | -$34.04 | 7h |
| KXFIRSTSUPERBOWLSONG-26FEB09-CAL | $0.35 | $0.07 | Yes | -$28.03 | 12h |
| KXUCLGAME-26JAN21ATAATH-ATA | $0.46 | $0.19 | No | -$27.05 | 14h |

algorithm architecture
===

pipeline overview
---

the system uses a modular pipeline architecture with four stages:

```
sources -> filters -> scorers -> selector
```

1. **sources** - retrieve market candidates from historical data
2. **filters** - remove unsuitable markets
3. **scorers** - compute feature scores for each candidate
4. **selector** - pick top-k candidates for trading

current configuration
---

### sources

| source | config |
|--------|--------|
| HistoricalMarketSource | lookback: 24 hours |

### filters

| filter | config | purpose |
|--------|--------|---------|
| LiquidityFilter | min_volume_24h: 100 | reject illiquid markets |
| TimeToCloseFilter | min: 2h, max: 720h | avoid expiring/distant markets |
| AlreadyPositionedFilter | max_position: 100 | prevent over-concentration |

typical filter stats (per interval):

- ~17,000 candidates retrieved
- ~10,000 pass liquidity filter (~58%)
- ~7,200 pass time filter (~72% of remaining)
- ~7,150 pass position filter (~99% of remaining)

### scorers

the pipeline runs 8 independent scorers that each contribute features:

| scorer | features | lookback | description |
|--------|----------|----------|-------------|
| MomentumScorer | `momentum` | 6h | price change over lookback window |
| MultiTimeframeMomentumScorer | `mtf_momentum`, `mtf_divergence`, `mtf_alignment` | 1h, 4h, 12h, 24h | multi-window momentum with divergence detection |
| MeanReversionScorer | `mean_reversion` | 24h | deviation from historical mean |
| BollingerMeanReversionScorer | `bollinger_reversion`, `bollinger_position` | 24h, 2.0 std | statistical band analysis |
| VolumeScorer | `volume` | 6h | log ratio of recent vs avg hourly volume |
| OrderFlowScorer | `order_flow` | - | buy/sell imbalance from taker_side |
| TimeDecayScorer | `time_decay` | - | time value decay factor |
| CategoryWeightedScorer | `final_score` | - | category-specific weighted ensemble |

### category-specific weights

the CategoryWeightedScorer applies different weight profiles based on market category:

**default weights:**

```
momentum: 0.20
mean_reversion: 0.20
volume: 0.15
time_decay: 0.10
order_flow: 0.15
bollinger: 0.10
mtf_momentum: 0.10
```

**politics:**

```
momentum: 0.35  (trend-following works)
mean_reversion: 0.10
order_flow: 0.15
mtf_momentum: 0.15
```

**weather:**

```
mean_reversion: 0.35  (converges to forecasts)
bollinger: 0.15
time_decay: 0.15
```

**sports:**

```
order_flow: 0.30  (sharp money matters)
momentum: 0.20
volume: 0.15
```

**economics/financial:**

```
momentum: 0.25
mean_reversion: 0.20
volume: 0.15
```

### selector

| selector | config |
|----------|--------|
| TopKSelector | k=5 (max_positions) |

execution logic
===

position sizing
---

uses fractional kelly criterion for position sizing:

```rust
kelly_fraction = 0.25     // use 25% of kelly optimal
max_position_pct = 0.25   // max 25% of portfolio per trade
min_position_size = 10    // minimum 10 contracts
max_position_size = 100   // maximum 100 contracts
```

**edge to probability mapping:**

```
win_prob = (1 + tanh(edge)) / 2
```

this smoothly maps scoring edge to estimated win probability.

**kelly formula:**

```
kelly = (odds * win_prob - (1 - win_prob)) / odds
position_value = bankroll * min(kelly * kelly_fraction, max_position_pct)
```

side selection
---

the executor picks the cheaper side based on signal direction:

- positive score (bullish) + yes_price < 0.5 -> buy YES
- positive score (bullish) + yes_price >= 0.5 -> buy NO
- negative score (bearish) + yes_price > 0.5 -> buy NO
- negative score (bearish) + yes_price <= 0.5 -> buy YES

rationale: buying the cheaper side gives better risk/reward ratio.

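
a minimal rust sketch of the sizing and side-selection rules above, not the actual executor code. several things here are assumptions that do not come from this document: `odds` is taken as the net payout per dollar staked, `(1 - price) / price`; non-positive kelly values size to zero; a score of exactly zero selects no side; and the names `win_prob`, `kelly`, `position_value`, `pick_side`, and `Side` are hypothetical.

```rust
// Sizing parameters copied from the configuration above.
const KELLY_FRACTION: f64 = 0.25;
const MAX_POSITION_PCT: f64 = 0.25;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Side {
    Yes,
    No,
}

/// Map a scoring edge to an estimated win probability.
fn win_prob(edge: f64) -> f64 {
    (1.0 + edge.tanh()) / 2.0
}

/// Full kelly fraction for a contract bought at `price` with win probability `p`.
/// ASSUMPTION: odds = (1 - price) / price, i.e. net payout per dollar staked.
fn kelly(price: f64, p: f64) -> f64 {
    let odds = (1.0 - price) / price;
    (odds * p - (1.0 - p)) / odds
}

/// Dollar value to allocate to one trade.
/// ASSUMPTION: invalid prices and non-positive kelly values size to zero.
fn position_value(bankroll: f64, price: f64, edge: f64) -> f64 {
    if !(price > 0.0 && price < 1.0) {
        return 0.0;
    }
    let k = kelly(price, win_prob(edge));
    if k.is_nan() || k <= 0.0 {
        return 0.0;
    }
    bankroll * (k * KELLY_FRACTION).min(MAX_POSITION_PCT)
}

/// Side selection, transcribed from the four rules above.
/// ASSUMPTION: a score of exactly zero (or NaN) yields no trade.
fn pick_side(score: f64, yes_price: f64) -> Option<Side> {
    if score > 0.0 {
        Some(if yes_price < 0.5 { Side::Yes } else { Side::No })
    } else if score < 0.0 {
        Some(if yes_price > 0.5 { Side::No } else { Side::Yes })
    } else {
        None
    }
}
```

under these assumptions, `position_value(10_000.0, 0.40, 0.5)` comes to roughly $1,379. two things fall out of the rules as written: because `kelly` is at most 1 and `kelly_fraction` equals `max_position_pct`, the 25% cap never reduces a position; and the four side rules pick the cheaper side regardless of score sign, so the sign only decides the outcome at `yes_price == 0.5`.
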
exit conditions
---

positions are closed when any of these trigger:

| condition | threshold | description |
|-----------|-----------|-------------|
| take_profit | +20% | lock in gains |
| stop_loss | -15% | limit downside |
| time_stop | 72 hours | prevent stale positions |
| score_reversal | < -0.3 | signal flipped against us |

slippage model
---

- 10 bps slippage applied to all fills
- limit orders rejected if fill price exceeds limit by 2x slippage

data characteristics
===

current dataset: `/mnt/work/kalshi-data/`

| file | size | description |
|------|------|-------------|
| markets.csv | 6.6 GB | market metadata, results, prices |
| trades.csv | 66 MB | individual trade records with taker_side |

trade record schema:

```
timestamp, ticker, price, volume, taker_side
```

market record schema:

```
ticker, title, category, open_time, close_time, result, status, yes_bid, yes_ask, volume, open_interest
```

known issues / future work
===

### issues

1. **empty categories** - return_by_category shows an empty string; need to verify category parsing from market data
2. **no trading on jan 20** - equity curve shows no activity until jan 21 04:00, likely due to insufficient trade history in the lookback window
3. **dead code warnings** - several unused scorers and filters (CorrelationScorer, MLEnsembleScorer, etc.) - cleanup needed

### planned improvements

- [ ] category parsing fix
- [ ] correlation scorer integration (granger causality between related markets)
- [ ] ML model integration (ONNX runtime ready, needs trained models)
- [ ] multi-day backtests with larger date ranges
- [ ] parameter optimization / grid search
- [ ] transaction cost analysis
- [ ] position-level attribution

appendix: scorer formulas
===

### momentum

```
momentum = price(t) - price(t - lookback_hours)
```

### mean reversion

```
mean = avg(prices over lookback_hours)
deviation = current_price - mean
mean_reversion = -deviation
```

### bollinger bands

```
mean = avg(prices)
std = stddev(prices)
upper_band = mean + 2.0 * std
lower_band = mean - 2.0 * std

if price >= upper_band:
    score = -(price - upper_band) / std
elif price <= lower_band:
    score = (lower_band - price) / std
else:
    score = -0.5 * (position - 0.5)  // weak mean reversion inside bands
```

`position` is not defined here; it is presumably the `bollinger_position` feature (the price's relative position between the bands).

### volume

```
avg_hourly_volume = total_volume / hours_since_open
recent_hourly_volume = recent_volume / lookback_hours
volume_score = ln(recent_hourly_volume / avg_hourly_volume)
```

### order flow

```
order_flow = (buy_volume - sell_volume) / (buy_volume + sell_volume)
```

### time decay

```
hours_remaining = time_to_close
time_decay = 1 - 1 / (hours_remaining / 24 + 1)
```

ranges from 0 (about to close) to ~1 (distant expiry).
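
the formulas above translate almost directly into rust. the sketch below covers momentum, mean reversion, volume, order flow, and time decay; it is illustrative rather than the scorers' actual code. assumptions: `prices` is an oldest-to-newest slice covering the lookback window, empty or degenerate inputs return `None` (or 0 for order flow with no volume), and all function names are hypothetical.

```rust
/// momentum = price(t) - price(t - lookback_hours).
/// ASSUMPTION: `prices` is ordered oldest to newest over the lookback window.
fn momentum(prices: &[f64]) -> Option<f64> {
    let newest = *prices.last()?;
    let oldest = *prices.first()?;
    Some(newest - oldest)
}

/// mean_reversion = -(current_price - mean of prices over the lookback).
fn mean_reversion(prices: &[f64]) -> Option<f64> {
    let current = *prices.last()?; // returns None for an empty slice
    let mean = prices.iter().sum::<f64>() / prices.len() as f64;
    Some(-(current - mean))
}

/// volume_score = ln(recent_hourly_volume / avg_hourly_volume).
/// ASSUMPTION: returns None when either rate is not strictly positive,
/// since the log ratio is undefined there.
fn volume_score(
    total_volume: f64,
    hours_since_open: f64,
    recent_volume: f64,
    lookback_hours: f64,
) -> Option<f64> {
    if hours_since_open <= 0.0 || lookback_hours <= 0.0 {
        return None;
    }
    let avg_hourly = total_volume / hours_since_open;
    let recent_hourly = recent_volume / lookback_hours;
    if avg_hourly > 0.0 && recent_hourly > 0.0 {
        Some((recent_hourly / avg_hourly).ln())
    } else {
        None
    }
}

/// order_flow = (buy - sell) / (buy + sell), in [-1, 1] for non-negative volumes.
/// ASSUMPTION: zero total volume scores as 0 (no imbalance).
fn order_flow(buy_volume: f64, sell_volume: f64) -> f64 {
    let total = buy_volume + sell_volume;
    if total == 0.0 {
        0.0
    } else {
        (buy_volume - sell_volume) / total
    }
}

/// time_decay = 1 - 1 / (hours_remaining / 24 + 1).
fn time_decay(hours_remaining: f64) -> f64 {
    1.0 - 1.0 / (hours_remaining / 24.0 + 1.0)
}
```

for example, `time_decay(24.0)` is 0.5 (one day to close sits at the midpoint of the range), and `order_flow(75.0, 25.0)` is 0.5 (three quarters of taker volume on the buy side).
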