2026-01-22 06:19:18 -07:00



kalshi backtest progress

this document tracks the development progress, algorithm details, and backtest results for the kalshi prediction market trading system.

last updated: 2026-01-22

backtest run #1

date: 2026-01-22
period: 2026-01-20 to 2026-01-22 (2 days)
initial capital: $10,000
interval: 1 hour

results summary

metric	strategy	random baseline	delta
total return	+$993.61 (+9.94%)	-$51.00 (-0.51%)	+$1,044.61
sharpe ratio	5.448	-2.436	+7.884
max drawdown	1.26%	0.51%	+0.75%
win rate	58.7%	0.0%	+58.7%
total trades	46	0	+46
avg trade pnl	$4.59	$0.00	+$4.59
avg hold time	5.4 hrs	0.0 hrs	+5.4 hrs

notable trades

ticker	entry	exit	side	pnl	hold time
KXKHLGAME-26JAN21SEVHCS-SEV	$0.17	$0.99	Yes	+$81.98	1h
KXKHLGAME-26JAN21SEVHCS-HCS	$0.20	$0.93	Yes	+$72.98	1h
KXFIRSTSUPERBOWLSONG-26FEB09-DTM	$0.11	$0.63	Yes	+$51.99	2h
KXUCLBTTS-26JAN21OMLFC	$0.01	$0.50	Yes	+$49.00	1h
KXNCAAWBGAME-26JAN21BULAF-LAF	$0.43	$0.80	No	+$36.96	3h

worst trades

ticker	entry	exit	side	pnl	hold time
KXUCLBTTS-26JAN21GALATM	$0.40	$0.01	No	-$39.04	1h
KXUCLBTTS-26JAN21QARSGE	$0.35	$0.01	No	-$34.04	7h
KXFIRSTSUPERBOWLSONG-26FEB09-CAL	$0.35	$0.07	Yes	-$28.03	12h
KXUCLGAME-26JAN21ATAATH-ATA	$0.46	$0.19	No	-$27.05	14h

algorithm architecture

pipeline overview

the system uses a modular pipeline architecture with four stages:

sources -> filters -> scorers -> selector

sources - retrieve market candidates from historical data
filters - remove unsuitable markets
scorers - compute feature scores for each candidate
selector - pick top-k candidates for trading
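a minimal python sketch of how the four stages could compose, assuming every stage is a plain callable. `Candidate`, `run_pipeline`, and the callable signatures are hypothetical names for illustration, not the system's actual types.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

@dataclass
class Candidate:
    # hypothetical candidate record; real market fields are omitted
    ticker: str
    features: dict = field(default_factory=dict)
    score: float = 0.0

Source = Callable[[], Iterable[Candidate]]
Filter = Callable[[Candidate], bool]
Scorer = Callable[[Candidate], None]

def run_pipeline(sources: List[Source], filters: List[Filter],
                 scorers: List[Scorer], k: int = 5) -> List[Candidate]:
    # sources: gather candidates from every source
    candidates = [c for source in sources for c in source()]
    # filters: keep only candidates that pass every filter
    candidates = [c for c in candidates if all(f(c) for f in filters)]
    # scorers: each scorer writes features/score onto the candidate in place
    for c in candidates:
        for scorer in scorers:
            scorer(c)
    # selector: top-k by score, highest first
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:k]
```

keeping each stage a list of callables is what lets individual filters and scorers be swapped in or out, as the configuration tables below do.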

current configuration

sources

source	config
HistoricalMarketSource	lookback: 24 hours

filters

filter	config	purpose
LiquidityFilter	min_volume_24h: 100	reject illiquid markets
TimeToCloseFilter	min: 2h, max: 720h	avoid expiring/distant markets
AlreadyPositionedFilter	max_position: 100	prevent over-concentration
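a sketch of the three filters as predicates using the thresholds from the table. `MarketSnapshot` and its fields are hypothetical, and whether each bound is inclusive (and whether max_position means "reject at 100 contracts held") is an assumption.

```python
from dataclasses import dataclass

@dataclass
class MarketSnapshot:
    # hypothetical per-market view; field names are illustrative only
    ticker: str
    volume_24h: float
    hours_to_close: float
    current_position: int  # contracts already held in this market

def liquidity_filter(m: MarketSnapshot, min_volume_24h: float = 100) -> bool:
    return m.volume_24h >= min_volume_24h

def time_to_close_filter(m: MarketSnapshot, min_hours: float = 2.0,
                         max_hours: float = 720.0) -> bool:
    return min_hours <= m.hours_to_close <= max_hours

def already_positioned_filter(m: MarketSnapshot, max_position: int = 100) -> bool:
    # assumption: a market is rejected once the held position reaches the cap
    return m.current_position < max_position

FILTERS = (liquidity_filter, time_to_close_filter, already_positioned_filter)

def passes_all(m: MarketSnapshot) -> bool:
    # filters run in table order; all() stops at the first rejection
    return all(f(m) for f in FILTERS)
```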

typical filter stats (per interval):

~17,000 candidates retrieved
~10,000 pass liquidity filter (~58%)
~7,200 pass time filter (~72% of remaining)
~7,150 pass position filter (~99% of remaining)

scorers

the pipeline runs 8 scorers: the first seven each contribute their own features, and CategoryWeightedScorer combines them into `final_score`:

scorer	features	lookback	description
MomentumScorer	`momentum`	6h	price change over lookback window
MultiTimeframeMomentumScorer	`mtf_momentum`, `mtf_divergence`, `mtf_alignment`	1h, 4h, 12h, 24h	multi-window momentum with divergence detection
MeanReversionScorer	`mean_reversion`	24h	deviation from historical mean
BollingerMeanReversionScorer	`bollinger_reversion`, `bollinger_position`	24h, 2.0 std	statistical band analysis
VolumeScorer	`volume`	6h	log ratio of recent vs avg hourly volume
OrderFlowScorer	`order_flow`	-	buy/sell imbalance from taker_side
TimeDecayScorer	`time_decay`	-	time value decay factor
CategoryWeightedScorer	`final_score`	-	category-specific weighted ensemble

category-specific weights

the CategoryWeightedScorer applies different weight profiles based on market category:

default weights:

momentum:        0.20
mean_reversion:  0.20
volume:          0.15
time_decay:      0.10
order_flow:      0.15
bollinger:       0.10
mtf_momentum:    0.10

politics:

momentum:        0.35  (trend-following works)
mean_reversion:  0.10
order_flow:      0.15
mtf_momentum:    0.15

weather:

mean_reversion:  0.35  (converges to forecasts)
bollinger:       0.15
time_decay:      0.15

sports:

order_flow:      0.30  (sharp money matters)
momentum:        0.20
volume:          0.15

economics/financial:

momentum:        0.25
mean_reversion:  0.20
volume:          0.15
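a sketch of the weighted ensemble: look up the category's weight profile, fall back to the default weights, and take a weighted sum of the features. it assumes the feature dict uses the same keys as the weight lists (so "bollinger" stands in for the bollinger feature) and that a missing feature counts as 0; `category_weighted_score` and the `profiles` argument are hypothetical.

```python
from typing import Dict

DEFAULT_WEIGHTS: Dict[str, float] = {
    "momentum": 0.20,
    "mean_reversion": 0.20,
    "volume": 0.15,
    "time_decay": 0.10,
    "order_flow": 0.15,
    "bollinger": 0.10,
    "mtf_momentum": 0.10,
}

def category_weighted_score(features: Dict[str, float], category: str,
                            profiles: Dict[str, Dict[str, float]]) -> float:
    # categories without their own profile use the default weights
    weights = profiles.get(category, DEFAULT_WEIGHTS)
    # assumption: features the candidate lacks contribute 0
    return sum(w * features.get(name, 0.0) for name, w in weights.items())
```

the default profile sums to 1.0, so a candidate whose features all equal 1 scores exactly 1 under it.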

selector

selector	config
TopKSelector	k=5 (max_positions)

execution logic

position sizing

position sizing uses a fractional kelly criterion:

kelly_fraction = 0.25          // use 25% of kelly optimal
max_position_pct = 0.25        // max 25% of portfolio per trade
min_position_size = 10         // minimum 10 contracts
max_position_size = 100        // maximum 100 contracts

edge to probability mapping:

win_prob = (1 + tanh(edge)) / 2

this maps any scoring edge smoothly into (0, 1): an edge of 0 gives a 50% win probability, and large positive or negative edges approach 100% or 0%.

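kelly formula:

kelly = (odds * win_prob - (1 - win_prob)) / odds
position_value = bankroll * min(kelly * kelly_fraction, max_position_pct)

here odds is the net payout per dollar risked, as in the standard kelly formula; kelly_fraction scales the full-kelly stake down and max_position_pct caps it.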
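a sketch that chains the tanh mapping, the kelly formula, and the sizing limits above. it assumes a contract settles at $1, so buying at `price` gives net odds of (1 - price) / price; returning 0 when kelly is non-positive, or when the size falls below min_position_size, is also an assumption, since the text does not say how those cases are handled. `contracts_to_buy` is a hypothetical name.

```python
import math

KELLY_FRACTION = 0.25     # use 25% of kelly optimal
MAX_POSITION_PCT = 0.25   # max 25% of portfolio per trade
MIN_POSITION_SIZE = 10    # minimum contracts
MAX_POSITION_SIZE = 100   # maximum contracts

def win_prob_from_edge(edge: float) -> float:
    return (1.0 + math.tanh(edge)) / 2.0

def contracts_to_buy(edge: float, price: float, bankroll: float) -> int:
    """price: dollars per contract, 0 < price < 1."""
    p = win_prob_from_edge(edge)
    odds = (1.0 - price) / price  # assumption: $1 settlement per contract
    kelly = (odds * p - (1.0 - p)) / odds
    if kelly <= 0:
        return 0  # assumption: no trade without positive expected growth
    position_value = bankroll * min(kelly * KELLY_FRACTION, MAX_POSITION_PCT)
    contracts = int(position_value // price)
    if contracts < MIN_POSITION_SIZE:
        return 0  # assumption: skip rather than round up to the minimum
    return min(contracts, MAX_POSITION_SIZE)
```

because full kelly here never exceeds win_prob (which is below 1), kelly * 0.25 always stays below 0.25, so with these settings the max_position_pct cap never binds; the contract limits do the capping.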

side selection

the executor picks the cheaper side based on signal direction:

positive score (bullish) + yes_price < 0.5 -> buy YES
positive score (bullish) + yes_price >= 0.5 -> buy NO
negative score (bearish) + yes_price > 0.5 -> buy NO
negative score (bearish) + yes_price <= 0.5 -> buy YES

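rationale: buying the cheaper side gives a better risk/reward ratio. as written, these rules choose the side by price alone (YES when yes_price < 0.5, NO when yes_price > 0.5) regardless of score sign; the sign only matters when yes_price is exactly 0.5.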
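the four side-selection rules written as a function. `choose_side` is a hypothetical name, and returning None for a score of exactly 0 is an assumption, since the rules only cover positive and negative scores.

```python
from typing import Optional

def choose_side(score: float, yes_price: float) -> Optional[str]:
    if score > 0:
        # bullish: YES below 0.5, otherwise NO
        return "YES" if yes_price < 0.5 else "NO"
    if score < 0:
        # bearish: NO above 0.5, otherwise YES
        return "NO" if yes_price > 0.5 else "YES"
    return None  # assumption: a zero score carries no signal
```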

exit conditions

positions are closed when any of these trigger:

condition	threshold	description
take_profit	+20%	lock in gains
stop_loss	-15%	limit downside
time_stop	72 hours	prevent stale positions
score_reversal	< -0.3	signal flipped against us
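a sketch of the exit check using the thresholds from the table. it assumes returns are measured on the price of the side actually held, that `score` is expressed relative to that side (negative means against the position), and that conditions are checked in table order; `exit_reason` is a hypothetical name.

```python
from typing import Optional

def exit_reason(entry_price: float, current_price: float,
                hours_held: float, score: float) -> Optional[str]:
    ret = (current_price - entry_price) / entry_price
    if ret >= 0.20:
        return "take_profit"
    if ret <= -0.15:
        return "stop_loss"
    if hours_held >= 72:
        return "time_stop"
    if score < -0.3:
        return "score_reversal"
    return None  # keep holding
```

since the backtest steps in 1-hour intervals, a check like this only runs once per bar, so realized exits can land well past the +20% / -15% thresholds.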

slippage model

10 bps slippage is applied to all fills
limit orders are rejected if the fill price exceeds the limit by more than 2x slippage (20 bps)
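a sketch of the slippage model, assuming bps are proportional to price (10 bps of a $0.50 quote is $0.0005) and that for a sell "exceeds the limit" means filling below it; `fill_price` and `limit_order_accepted` are hypothetical names.

```python
SLIPPAGE_BPS = 10

def fill_price(quote: float, is_buy: bool, slippage_bps: float = SLIPPAGE_BPS) -> float:
    slip = quote * slippage_bps / 10_000
    # buys fill higher, sells fill lower
    return quote + slip if is_buy else quote - slip

def limit_order_accepted(fill: float, limit: float, is_buy: bool,
                         slippage_bps: float = SLIPPAGE_BPS) -> bool:
    tolerance = limit * 2 * slippage_bps / 10_000  # 2x slippage
    if is_buy:
        return fill <= limit + tolerance
    return fill >= limit - tolerance
```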

data characteristics

current dataset: /mnt/work/kalshi-data/

file	size	description
markets.csv	6.6 GB	market metadata, results, prices
trades.csv	66 MB	individual trade records with taker_side

trade record schema:

timestamp, ticker, price, volume, taker_side

market record schema:

ticker, title, category, open_time, close_time, result, status,
yes_bid, yes_ask, volume, open_interest
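a streaming reader for the trade schema above. it assumes trades.csv has a header row with exactly these column names, ISO-8601 timestamps, and integer volumes; `Trade` and `read_trades` are hypothetical names. a generator keeps memory flat, which matters more for the 6.6 GB markets.csv than for trades.csv.

```python
import csv
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator

@dataclass
class Trade:
    timestamp: datetime
    ticker: str
    price: float
    volume: int
    taker_side: str

def read_trades(lines: Iterable[str]) -> Iterator[Trade]:
    # csv.DictReader accepts any iterable of text lines, including an open file
    for row in csv.DictReader(lines):
        yield Trade(
            timestamp=datetime.fromisoformat(row["timestamp"]),  # assumed ISO-8601
            ticker=row["ticker"],
            price=float(row["price"]),
            volume=int(row["volume"]),
            taker_side=row["taker_side"],
        )
```

usage would look like `read_trades(open(path, newline=""))`, iterating lazily over the file.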

known issues / future work

issues

empty categories - return_by_category reports an empty-string category; category parsing from the market data needs to be verified
no trading on jan 20 - equity curve shows no activity until jan 21 04:00, likely due to insufficient trade history in lookback window
dead code warnings - several scorers and filters are unused (CorrelationScorer, MLEnsembleScorer, etc.) and need cleanup

planned improvements

category parsing fix
correlation scorer integration (granger causality between related markets)
ML model integration (ONNX runtime ready, needs trained models)
multi-day backtests with larger date ranges
parameter optimization / grid search
transaction cost analysis
position-level attribution

appendix: scorer formulas

momentum

momentum = price(t) - price(t - lookback_hours)
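the momentum formula over an hourly price series, using the 6h default from the scorer table. it assumes one price per hour with the newest last, and returning 0 when there is not enough history is an assumption; `momentum` as a python function name is illustrative.

```python
from typing import Sequence

def momentum(prices: Sequence[float], lookback_hours: int = 6) -> float:
    """prices: hourly prices, oldest first; prices[-1] is price(t)."""
    if len(prices) <= lookback_hours:
        return 0.0  # assumption: too little history gives no signal
    return prices[-1] - prices[-1 - lookback_hours]
```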

mean reversion

mean = avg(prices over lookback_hours)
deviation = current_price - mean
mean_reversion = -deviation
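the same three lines in python, assuming `prices` holds the lookback window (24h by default) with the current price last.

```python
from statistics import fmean
from typing import Sequence

def mean_reversion(prices: Sequence[float]) -> float:
    mean = fmean(prices)
    deviation = prices[-1] - mean
    # above the mean -> negative score (expect a fall), below -> positive
    return -deviation
```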

bollinger bands

mean = avg(prices)
std = stddev(prices)
upper_band = mean + 2.0 * std
lower_band = mean - 2.0 * std
position = (price - lower_band) / (upper_band - lower_band)  # bollinger_position: 0 at lower band, 1 at upper band

if price >= upper_band:
    score = -(price - upper_band) / std
elif price <= lower_band:
    score = (lower_band - price) / std
else:
    score = -0.5 * (position - 0.5)  # weak mean reversion inside bands
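a runnable version of the band logic. it assumes population standard deviation (`statistics.pstdev`), a window whose last element is the current price, and `position` defined as the price's fractional location between the bands; the zero-std guard is an addition, since every branch divides by std.

```python
from statistics import fmean, pstdev
from typing import Sequence

def bollinger_score(prices: Sequence[float], num_std: float = 2.0) -> float:
    mean = fmean(prices)
    std = pstdev(prices)
    if std == 0:
        return 0.0  # assumption: a flat window gives no signal
    upper = mean + num_std * std
    lower = mean - num_std * std
    price = prices[-1]
    if price >= upper:
        return -(price - upper) / std   # above the band: expect a fall
    if price <= lower:
        return (lower - price) / std    # below the band: expect a rise
    position = (price - lower) / (upper - lower)  # 0..1 inside the bands
    return -0.5 * (position - 0.5)      # weak mean reversion inside bands
```

with a 10-point window of nine 0.50 prints followed by 0.90, the mean is 0.54 and std is 0.12, so the upper band is 0.78 and the score is -(0.90 - 0.78) / 0.12 = -1.0.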

volume

avg_hourly_volume = total_volume / hours_since_open
recent_hourly_volume = recent_volume / lookback_hours
volume_score = ln(recent_hourly_volume / avg_hourly_volume)
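the volume formula with guards for the cases where the log or the division is undefined; returning 0 there is an assumption, and the 6h default comes from the scorer table.

```python
import math

def volume_score(total_volume: float, hours_since_open: float,
                 recent_volume: float, lookback_hours: float = 6) -> float:
    if hours_since_open <= 0:
        return 0.0  # assumption: no history yet
    avg_hourly_volume = total_volume / hours_since_open
    recent_hourly_volume = recent_volume / lookback_hours
    if avg_hourly_volume <= 0 or recent_hourly_volume <= 0:
        return 0.0  # assumption: ln is undefined at 0
    # > 0 when recent trading is busier than the market's average
    return math.log(recent_hourly_volume / avg_hourly_volume)
```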

order flow

order_flow = (buy_volume - sell_volume) / (buy_volume + sell_volume)
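a sketch that derives buy and sell volume from `taker_side`. it assumes taker_side is "yes" when the taker bought YES (counted as buying) and "no" otherwise (counted as selling); the mapping is not stated in the text. `order_flow` takes hypothetical (volume, taker_side) pairs.

```python
from typing import Iterable, Tuple

def order_flow(trades: Iterable[Tuple[float, str]]) -> float:
    buy_volume = 0.0
    sell_volume = 0.0
    for volume, taker_side in trades:
        if taker_side == "yes":    # assumption: yes-taker = buy pressure
            buy_volume += volume
        elif taker_side == "no":   # assumption: no-taker = sell pressure
            sell_volume += volume
    total = buy_volume + sell_volume
    if total == 0:
        return 0.0  # no trades, no imbalance
    return (buy_volume - sell_volume) / total  # in [-1, 1]
```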

time decay

hours_remaining = time_to_close
time_decay = 1 - 1 / (hours_remaining / 24 + 1)

ranges from 0 (about to close) toward 1 (distant expiry), reaching 0.5 at 24 hours remaining.
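the formula as a one-line function, with `hours_remaining` in hours as above.

```python
def time_decay(hours_remaining: float) -> float:
    # 0 at close, 0.5 at 24h, approaching 1 for distant expiries
    return 1.0 - 1.0 / (hours_remaining / 24.0 + 1.0)
```

with the TimeToCloseFilter bounds of 2h and 720h, scored markets fall between 1/13 (about 0.077) and 30/31 (about 0.968).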
