2026-01-22 06:19:18 -07:00


kalshi backtest progress

this document tracks the development progress, algorithm details, and backtest results for the kalshi prediction market trading system.

last updated: 2026-01-22

backtest run #1

  • date: 2026-01-22
  • period: 2026-01-20 to 2026-01-22 (2 days)
  • initial capital: $10,000
  • interval: 1 hour

results summary

| metric | strategy | random baseline | delta |
|---|---|---|---|
| total return | +$993.61 (+9.94%) | -$51.00 (-0.51%) | +$1,044.61 |
| sharpe ratio | 5.448 | -2.436 | +7.884 |
| max drawdown | 1.26% | 0.51% | +0.75% |
| win rate | 58.7% | 0.0% | +58.7% |
| total trades | 46 | 0 | +46 |
| avg trade pnl | $4.59 | $0.00 | +$4.59 |
| avg hold time | 5.4 hrs | 0.0 hrs | +5.4 hrs |

notable trades

| ticker | entry | exit | side | pnl | hold time |
|---|---|---|---|---|---|
| KXKHLGAME-26JAN21SEVHCS-SEV | $0.17 | $0.99 | Yes | +$81.98 | 1h |
| KXKHLGAME-26JAN21SEVHCS-HCS | $0.20 | $0.93 | Yes | +$72.98 | 1h |
| KXFIRSTSUPERBOWLSONG-26FEB09-DTM | $0.11 | $0.63 | Yes | +$51.99 | 2h |
| KXUCLBTTS-26JAN21OMLFC | $0.01 | $0.50 | Yes | +$49.00 | 1h |
| KXNCAAWBGAME-26JAN21BULAF-LAF | $0.43 | $0.80 | No | +$36.96 | 3h |

worst trades

| ticker | entry | exit | side | pnl | hold time |
|---|---|---|---|---|---|
| KXUCLBTTS-26JAN21GALATM | $0.40 | $0.01 | No | -$39.04 | 1h |
| KXUCLBTTS-26JAN21QARSGE | $0.35 | $0.01 | No | -$34.04 | 7h |
| KXFIRSTSUPERBOWLSONG-26FEB09-CAL | $0.35 | $0.07 | Yes | -$28.03 | 12h |
| KXUCLGAME-26JAN21ATAATH-ATA | $0.46 | $0.19 | No | -$27.05 | 14h |

algorithm architecture

pipeline overview

the system uses a modular pipeline architecture with four stages:

sources -> filters -> scorers -> selector
  1. sources - retrieve market candidates from historical data
  2. filters - remove unsuitable markets
  3. scorers - compute feature scores for each candidate
  4. selector - pick top-k candidates for trading
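a minimal python sketch of the four stages, for illustration only. the Candidate type, the callable signatures, and treating a missing final_score as 0.0 are assumptions, not the project's actual interfaces; only the stage order and the top-k selection come from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Candidate:
    # hypothetical candidate record; the project's real market type may differ
    ticker: str
    features: dict[str, float] = field(default_factory=dict)


Source = Callable[[], list[Candidate]]
Filter = Callable[[Candidate], bool]              # True keeps the candidate
Scorer = Callable[[Candidate], dict[str, float]]  # returns features to merge


def run_pipeline(source: Source, filters: Iterable[Filter],
                 scorers: Iterable[Scorer], k: int) -> list[Candidate]:
    # 1. sources: retrieve candidates
    candidates = source()
    # 2. filters: keep only candidates that pass every filter, in order
    for keep in filters:
        candidates = [c for c in candidates if keep(c)]
    # 3. scorers: run in order so a later scorer (e.g. an ensemble)
    #    can read features written by earlier ones
    for score in scorers:
        for c in candidates:
            c.features.update(score(c))
    # 4. selector: top-k by final_score (missing score treated as 0.0)
    ranked = sorted(candidates,
                    key=lambda c: c.features.get("final_score", 0.0),
                    reverse=True)
    return ranked[:k]
```

running scorers in list order is what lets an ensemble scorer such as CategoryWeightedScorer read the features produced before it.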

current configuration

sources

| source | config |
|---|---|
| HistoricalMarketSource | lookback: 24 hours |

filters

| filter | config | purpose |
|---|---|---|
| LiquidityFilter | min_volume_24h: 100 | reject illiquid markets |
| TimeToCloseFilter | min: 2h, max: 720h | avoid expiring/distant markets |
| AlreadyPositionedFilter | max_position: 100 | prevent over-concentration |

typical filter stats (per interval):

  • ~17,000 candidates retrieved
  • ~10,000 pass liquidity filter (~58%)
  • ~7,200 pass time filter (~72% of remaining)
  • ~7,150 pass position filter (~99% of remaining)
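the three filters can be sketched as predicates applied in the order of the table. the Market fields (volume_24h, close_time, position) are hypothetical names, and the inclusive/exclusive choice for each bound is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Market:
    # hypothetical fields; names are illustrative only
    ticker: str
    volume_24h: int
    close_time: datetime
    position: int  # contracts currently held in this market


def liquidity_ok(m: Market, min_volume_24h: int = 100) -> bool:
    return m.volume_24h >= min_volume_24h


def time_to_close_ok(m: Market, now: datetime,
                     min_h: float = 2, max_h: float = 720) -> bool:
    hours = (m.close_time - now) / timedelta(hours=1)  # float hours
    return min_h <= hours <= max_h


def position_ok(m: Market, max_position: int = 100) -> bool:
    # assumed: reject markets where the maximum position is already held
    return m.position < max_position


def apply_filters(markets: list[Market],
                  now: datetime) -> tuple[list[Market], list[int]]:
    """apply filters in table order; also return survivor counts per stage."""
    counts = [len(markets)]
    for check in (liquidity_ok, lambda m: time_to_close_ok(m, now), position_ok):
        markets = [m for m in markets if check(m)]
        counts.append(len(markets))
    return markets, counts
```

the returned counts correspond to the per-interval survivor numbers listed above (retrieved, then after each filter).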

scorers

the pipeline runs 8 scorers: the first seven each compute independent features, and CategoryWeightedScorer combines them into final_score:

| scorer | features | lookback | description |
|---|---|---|---|
| MomentumScorer | momentum | 6h | price change over lookback window |
| MultiTimeframeMomentumScorer | mtf_momentum, mtf_divergence, mtf_alignment | 1h, 4h, 12h, 24h | multi-window momentum with divergence detection |
| MeanReversionScorer | mean_reversion | 24h | deviation from historical mean |
| BollingerMeanReversionScorer | bollinger_reversion, bollinger_position | 24h, 2.0 std | statistical band analysis |
| VolumeScorer | volume | 6h | log ratio of recent vs avg hourly volume |
| OrderFlowScorer | order_flow | - | buy/sell imbalance from taker_side |
| TimeDecayScorer | time_decay | - | time value decay factor |
| CategoryWeightedScorer | final_score | - | category-specific weighted ensemble |

category-specific weights

the CategoryWeightedScorer applies different weight profiles based on market category:

default weights:

momentum:        0.20
mean_reversion:  0.20
volume:          0.15
time_decay:      0.10
order_flow:      0.15
bollinger:       0.10
mtf_momentum:    0.10

politics:

momentum:        0.35  (trend-following works)
mean_reversion:  0.10
order_flow:      0.15
mtf_momentum:    0.15

weather:

mean_reversion:  0.35  (converges to forecasts)
bollinger:       0.15
time_decay:      0.15

sports:

order_flow:      0.30  (sharp money matters)
momentum:        0.20
volume:          0.15

economics/financial:

momentum:        0.25
mean_reversion:  0.20
volume:          0.15
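one plausible reading of CategoryWeightedScorer is a linear weighted sum of feature scores. the sketch below assumes that, assumes the bollinger weight applies to the bollinger_reversion feature, and falls back to the default profile for unknown categories. only the default profile is filled in, because the category profiles above do not list every weight.

```python
DEFAULT_WEIGHTS = {
    "momentum": 0.20,
    "mean_reversion": 0.20,
    "volume": 0.15,
    "time_decay": 0.10,
    "order_flow": 0.15,
    "bollinger": 0.10,
    "mtf_momentum": 0.10,
}

# assumption: the "bollinger" weight applies to the bollinger_reversion feature
FEATURE_FOR_WEIGHT = {"bollinger": "bollinger_reversion"}


def final_score(features: dict[str, float], category: str,
                profiles: dict[str, dict[str, float]]) -> float:
    """weighted sum of features; unknown categories use DEFAULT_WEIGHTS."""
    weights = profiles.get(category, DEFAULT_WEIGHTS)
    total = 0.0
    for name, w in weights.items():
        # a missing feature contributes 0.0 (assumption)
        total += w * features.get(FEATURE_FOR_WEIGHT.get(name, name), 0.0)
    return total
```

under this fallback rule, markets whose category parses as an empty string (see known issues) would all be scored with the default weights.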

selector

| selector | config |
|---|---|
| TopKSelector | k=5 (max_positions) |

execution logic

position sizing

position sizing uses a fractional kelly criterion:

kelly_fraction = 0.25          // use 25% of kelly optimal
max_position_pct = 0.25        // max 25% of portfolio per trade
min_position_size = 10         // minimum 10 contracts
max_position_size = 100        // maximum 100 contracts

edge to probability mapping:

win_prob = (1 + tanh(edge)) / 2

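this smoothly maps the scoring edge to an estimated win probability: an edge of 0 gives 0.5, and the probability approaches 1 or 0 as the edge grows large in either direction.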

kelly formula, where odds is the net odds (profit per $1 staked if the trade wins):

kelly = (odds * win_prob - (1 - win_prob)) / odds
position_value = bankroll * min(kelly * kelly_fraction, max_position_pct)
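the sizing steps can be combined into one helper. this is a hypothetical sketch: it assumes odds are the net odds of a contract that pays $1, i.e. (1 - price) / price, that a non-positive kelly value means no trade, and that position_value is converted to whole contracts and then clamped to the 10 to 100 contract range.

```python
import math

KELLY_FRACTION = 0.25
MAX_POSITION_PCT = 0.25
MIN_CONTRACTS = 10
MAX_CONTRACTS = 100


def win_prob(edge: float) -> float:
    # edge 0 -> 0.5, saturating toward 0 or 1 for large |edge|
    return (1.0 + math.tanh(edge)) / 2.0


def contracts_to_buy(edge: float, price: float, bankroll: float) -> int:
    """hypothetical helper; price is the cost of one contract in dollars (0 < price < 1)."""
    p = win_prob(edge)
    odds = (1.0 - price) / price  # assumed: net odds of a $1-payout contract
    kelly = (odds * p - (1.0 - p)) / odds
    if kelly <= 0:
        return 0  # assumed: no position when kelly finds no edge
    value = bankroll * min(kelly * KELLY_FRACTION, MAX_POSITION_PCT)
    n = int(value // price)
    # assumed: round small positions up to the minimum, cap large ones
    return max(MIN_CONTRACTS, min(n, MAX_CONTRACTS))
```

with a $10,000 bankroll the 100-contract cap binds long before max_position_pct does, so the percentage cap mainly matters for small bankrolls or expensive contracts.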

side selection

the executor picks a side from the signal direction and the current yes_price:

  • positive score (bullish) + yes_price < 0.5 -> buy YES
  • positive score (bullish) + yes_price >= 0.5 -> buy NO
  • negative score (bearish) + yes_price > 0.5 -> buy NO
  • negative score (bearish) + yes_price <= 0.5 -> buy YES

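rationale: the cheaper side pays more per dollar risked. a contract bought at $0.20 earns $0.80 of profit on $0.20 at risk (4:1) if it wins, while one bought at $0.80 earns only $0.20 on $0.80 (1:4).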
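the four side-selection rules as a function. this is a sketch; returning None for a score of exactly zero is an assumption, since that case is not specified.

```python
from typing import Optional


def choose_side(score: float, yes_price: float) -> Optional[str]:
    """side rules as listed above; score > 0 is bullish, score < 0 bearish."""
    if score > 0:
        return "YES" if yes_price < 0.5 else "NO"
    if score < 0:
        return "NO" if yes_price > 0.5 else "YES"
    return None  # assumed: no signal, no trade
```

as the rules are written, the result depends only on yes_price (YES below 0.5, NO above 0.5); the score's sign only matters at exactly 0.5, and in the second and fourth cases the position ends up opposite to the signal direction.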

exit conditions

positions are closed when any of these trigger:

| condition | threshold | description |
|---|---|---|
| take_profit | +20% | lock in gains |
| stop_loss | -15% | limit downside |
| time_stop | 72 hours | prevent stale positions |
| score_reversal | < -0.3 | signal flipped against us |
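a sketch of the exit check, assuming returns are measured on the entry price of the side held, that the score is oriented so a positive value favors that side, and that conditions are tested in the table's order.

```python
from typing import Optional


def exit_reason(entry_price: float, current_price: float,
                hours_held: float, score: float) -> Optional[str]:
    """return the first triggered exit condition, or None to keep holding."""
    ret = (current_price - entry_price) / entry_price
    if ret >= 0.20:
        return "take_profit"
    if ret <= -0.15:
        return "stop_loss"
    if hours_held >= 72:
        return "time_stop"
    if score < -0.3:
        return "score_reversal"
    return None
```

the stops are only evaluated once per backtest interval (1 hour here), so a position can gap well past -15% before it is closed; the worst trades above, which lost most of their entry price, are consistent with this.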

slippage model

  • 10 bps (0.10%) of slippage is applied to every fill
  • limit orders are rejected if the fill price exceeds the limit price by 2x the slippage (20 bps)
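a sketch of the slippage model, assuming slippage is a proportional price adjustment against the trader (up on buys, down on sells) and that the limit check applies to buy orders.

```python
SLIPPAGE_BPS = 10


def fill_price(price: float, is_buy: bool,
               slippage_bps: float = SLIPPAGE_BPS) -> float:
    # assumed: slippage moves the price against us by a fixed fraction
    s = slippage_bps / 10_000
    return price * (1 + s) if is_buy else price * (1 - s)


def limit_buy_accepted(fill: float, limit: float,
                       slippage_bps: float = SLIPPAGE_BPS) -> bool:
    # reject when the fill exceeds the limit by more than 2x the slippage
    return fill <= limit * (1 + 2 * slippage_bps / 10_000)
```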

data characteristics

current dataset: /mnt/work/kalshi-data/

| file | size | description |
|---|---|---|
| markets.csv | 6.6 GB | market metadata, results, prices |
| trades.csv | 66 MB | individual trade records with taker_side |

trade record schema:

timestamp, ticker, price, volume, taker_side

market record schema:

ticker, title, category, open_time, close_time, result, status,
yes_bid, yes_ask, volume, open_interest
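a streaming reader for trades.csv, sketched under assumptions: the file has a header row with the schema's column names, price is a decimal dollar amount, and volume is an integer count. reading row by row matters most for the 6.6 GB markets.csv, which the same pattern would handle.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Trade:
    timestamp: str   # kept as text; the timestamp format is not documented here
    ticker: str
    price: float     # assumed: dollars, e.g. 0.42
    volume: int      # assumed: integer contract count
    taker_side: str


def iter_trades(lines: Iterable[str]) -> Iterator[Trade]:
    """yield trades one row at a time instead of loading the whole file."""
    for row in csv.DictReader(lines):
        yield Trade(
            timestamp=row["timestamp"],
            ticker=row["ticker"],
            price=float(row["price"]),
            volume=int(row["volume"]),
            taker_side=row["taker_side"],
        )
```

usage: open the file with `open(path, newline="")` and pass the file object to iter_trades.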

known issues / future work

issues

  1. empty categories - return_by_category reports an empty string as the category; category parsing from the market data needs to be verified

  2. no trading on jan 20 - the equity curve shows no activity until jan 21 04:00, likely because the lookback windows do not yet contain enough trade history

  3. dead code warnings - several unused scorers and filters (CorrelationScorer, MLEnsembleScorer, etc.) trigger warnings and need cleanup

planned improvements

  • category parsing fix
  • correlation scorer integration (granger causality between related markets)
  • ML model integration (ONNX runtime ready, needs trained models)
  • multi-day backtests with larger date ranges
  • parameter optimization / grid search
  • transaction cost analysis
  • position-level attribution

appendix: scorer formulas

momentum

momentum = price(t) - price(t - lookback_hours)
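the momentum formula on an hourly price series (oldest first), assuming one price per hour and a neutral 0.0 when there is not yet enough history:

```python
def momentum(prices: list[float], lookback_hours: int = 6) -> float:
    """price(t) - price(t - lookback_hours) for an hourly series, oldest first."""
    if len(prices) <= lookback_hours:
        return 0.0  # assumed: not enough history yet
    return prices[-1] - prices[-1 - lookback_hours]
```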

mean reversion

mean = avg(prices over lookback_hours)
deviation = current_price - mean
mean_reversion = -deviation
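the same window-based approach for mean reversion; statistics.fmean is from the python standard library, and returning 0.0 for an empty window is an assumption:

```python
from statistics import fmean


def mean_reversion(prices: list[float]) -> float:
    """-(current_price - mean) over the lookback window (oldest first)."""
    if not prices:
        return 0.0  # assumed: neutral score with no data
    return -(prices[-1] - fmean(prices))
```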

bollinger bands

mean = avg(prices)
std = stddev(prices)
upper_band = mean + 2.0 * std
lower_band = mean - 2.0 * std
position = (price - lower_band) / (upper_band - lower_band)  // assumed: bollinger_position, 0 at lower band, 1 at upper

if price >= upper_band:
    score = -(price - upper_band) / std
elif price <= lower_band:
    score = (lower_band - price) / std
else:
    score = -0.5 * (position - 0.5)  // weak mean reversion inside bands
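a runnable version of the band logic, assuming a population standard deviation (statistics.pstdev) and position = (price - lower_band) / (upper_band - lower_band). the neutral output for short or flat windows is also an assumption, added because std = 0 would otherwise divide by zero.

```python
from statistics import fmean, pstdev


def bollinger(prices: list[float], k: float = 2.0) -> tuple[float, float]:
    """return (bollinger_reversion, bollinger_position) for the latest price."""
    if len(prices) < 2:
        return 0.0, 0.5  # assumed neutral output: not enough history
    mean = fmean(prices)
    std = pstdev(prices)
    if std == 0:
        return 0.0, 0.5  # assumed neutral output: flat window, no bands
    price = prices[-1]
    upper, lower = mean + k * std, mean - k * std
    position = (price - lower) / (upper - lower)
    if price >= upper:
        score = -(price - upper) / std
    elif price <= lower:
        score = (lower - price) / std
    else:
        score = -0.5 * (position - 0.5)  # weak mean reversion inside bands
    return score, position
```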

volume

avg_hourly_volume = total_volume / hours_since_open
recent_hourly_volume = recent_volume / lookback_hours
volume_score = ln(recent_hourly_volume / avg_hourly_volume)
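the volume score with guards, assuming a neutral 0.0 whenever either hourly average is zero (the logarithm is undefined there):

```python
import math


def volume_score(total_volume: float, hours_since_open: float,
                 recent_volume: float, lookback_hours: float = 6) -> float:
    """ln(recent hourly volume / average hourly volume since open)."""
    if hours_since_open <= 0 or total_volume <= 0 or recent_volume <= 0:
        return 0.0  # assumed: neutral when the ratio is undefined
    avg_hourly = total_volume / hours_since_open
    recent_hourly = recent_volume / lookback_hours
    return math.log(recent_hourly / avg_hourly)
```

for example, 1,000 contracts over 100 hours (10/h) against 120 in the last 6 hours (20/h) gives ln(2), about 0.693.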

order flow

order_flow = (buy_volume - sell_volume) / (buy_volume + sell_volume)
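order flow from trade records, assuming taker_side values of "yes" and "no" and counting yes-side takers as buys:

```python
def order_flow(trades: list[tuple[str, float]]) -> float:
    """trades are (taker_side, volume) pairs; result is in [-1, 1]."""
    buy = sum(v for side, v in trades if side == "yes")   # assumed: yes = buy
    sell = sum(v for side, v in trades if side == "no")   # assumed: no = sell
    total = buy + sell
    return (buy - sell) / total if total > 0 else 0.0
```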

time decay

hours_remaining = time_to_close
time_decay = 1 - 1 / (hours_remaining / 24 + 1)

the value increases from 0 (at close) toward 1 (distant expiry) and equals 0.5 when 24 hours remain.
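the time decay simplifies algebraically: 1 - 1 / (h / 24 + 1) = 1 - 24 / (h + 24) = h / (h + 24). a sketch using that form, with negative inputs clamped to 0 as an assumption:

```python
def time_decay(hours_remaining: float) -> float:
    """equivalent to 1 - 1 / (hours_remaining / 24 + 1)."""
    h = max(hours_remaining, 0.0)  # assumed: past-close inputs treated as 0
    return h / (h + 24.0)
```

with the TimeToCloseFilter bounds (2h to 720h), markets that reach the scorers get values between about 0.077 (2 / 26) and 0.968 (720 / 744).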