kalshi-backtest
===

quant-level backtesting framework for kalshi prediction markets, using a candidate pipeline architecture.

features
---

- **multi-timeframe momentum** - detects divergence between short- and long-term trends
- **bollinger bands mean reversion** - signals when price touches statistical extremes
- **order flow analysis** - tracks buying vs selling pressure via taker_side
- **kelly criterion position sizing** - dynamic sizing based on edge and win probability
- **exit signals** - take profit, stop loss, time stops, and score reversal triggers
- **category-aware weighting** - different strategies for politics, weather, sports, etc.
- **ensemble scoring** - combines multiple models with dynamic weighting
- **cross-market correlations** - lead-lag relationships between related markets
- **ML ensemble (optional)** - LSTM + MLP models via ONNX runtime

architecture
---

```
Historical Data (CSV)
          |
          v
+--------------------+
|   Backtest Loop    |  <- simulates time progression
+--------------------+
          |
          v
+--------------------+
| Candidate Pipeline |
+--------------------+
    |           |
    v           v
 Sources    Filters -> Scorers -> Selector
          |
          v
+--------------------+
|   Trade Executor   |  <- kelly sizing, exit signals
+--------------------+
          |
          v
+--------------------+
|    P&L Tracker     |  <- tracks positions, returns
+--------------------+
          |
          v
  Performance Metrics
```

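
a minimal, synchronous sketch of how the stages in the diagram could fit together: filter the source candidates, attach scores, then select the top ones. every name here (`Candidate`, `SimpleFilter`, `SimpleScorer`, `MinVolume`, `VolumeRank`, `run_pipeline`) is hypothetical and only for illustration; the real `Scorer` trait shown under extending is async and works on `MarketCandidate`. calling `run_pipeline(source, &filters, &scorers, 1)` with a `MinVolume(100)` filter would drop thin markets and return the single highest-scored candidate.

```rust
use std::collections::HashMap;

/// hypothetical, stripped-down stand-in for a market candidate
#[derive(Clone, Debug)]
pub struct Candidate {
    pub ticker: String,
    pub volume: u64,
    pub scores: HashMap<String, f64>,
}

impl Candidate {
    pub fn new(ticker: &str, volume: u64) -> Self {
        Candidate { ticker: ticker.to_string(), volume, scores: HashMap::new() }
    }

    /// sum of all scores attached so far
    pub fn total_score(&self) -> f64 {
        self.scores.values().sum()
    }
}

/// hypothetical filter stage: decides whether a candidate stays in the pipeline
pub trait SimpleFilter {
    fn keep(&self, candidate: &Candidate) -> bool;
}

/// hypothetical scorer stage: produces one named score per candidate
pub trait SimpleScorer {
    fn name(&self) -> &'static str;
    fn score(&self, candidate: &Candidate) -> f64;
}

/// keeps candidates with at least the given volume
pub struct MinVolume(pub u64);
impl SimpleFilter for MinVolume {
    fn keep(&self, candidate: &Candidate) -> bool {
        candidate.volume >= self.0
    }
}

/// scores by log volume (an arbitrary choice for the sketch)
pub struct VolumeRank;
impl SimpleScorer for VolumeRank {
    fn name(&self) -> &'static str {
        "volume"
    }
    fn score(&self, candidate: &Candidate) -> f64 {
        (candidate.volume as f64).ln_1p()
    }
}

/// sources -> filters -> scorers -> selector (top_n by summed score)
pub fn run_pipeline(
    source: Vec<Candidate>,
    filters: &[Box<dyn SimpleFilter>],
    scorers: &[Box<dyn SimpleScorer>],
    top_n: usize,
) -> Vec<Candidate> {
    // filters: a candidate must pass every filter
    let mut kept: Vec<Candidate> = source
        .into_iter()
        .filter(|c| filters.iter().all(|f| f.keep(c)))
        .collect();

    // scorers: each scorer writes its score under its own name
    for candidate in kept.iter_mut() {
        for scorer in scorers {
            let value = scorer.score(candidate);
            candidate.scores.insert(scorer.name().to_string(), value);
        }
    }

    // selector: highest total score first, then keep top_n
    kept.sort_by(|a, b| b.total_score().total_cmp(&a.total_score()));
    kept.truncate(top_n);
    kept
}
```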
data format
---

fetch data from the kalshi API using the included script:

```bash
python scripts/fetch_kalshi_data.py
```

or download from https://www.deltabase.tech/

**markets.csv**:
```csv
ticker,title,category,open_time,close_time,result,status,yes_bid,yes_ask,volume,open_interest
PRES-2024-DEM,Will Democrats win?,politics,2024-01-01 00:00:00,2024-11-06 00:00:00,no,finalized,45,47,10000,5000
```

**trades.csv**:
```csv
timestamp,ticker,price,volume,taker_side
2024-01-05 12:00:00,PRES-2024-DEM,45,100,yes
2024-01-05 13:00:00,PRES-2024-DEM,46,50,no
```

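
a rough sketch of reading one `trades.csv` row with only the standard library. `TradeRow`, `TakerSide`, and `parse_trade_row` are hypothetical names, not the loader this repo uses, and treating `price` as whole cents is an assumption based on the sample rows. the naive comma split is only safe here because trades have no free-text column; `markets.csv` has a `title` field that may contain commas and would need a real csv parser.

```rust
#[derive(Debug, PartialEq)]
pub enum TakerSide {
    Yes,
    No,
}

/// hypothetical in-memory form of one trades.csv row
#[derive(Debug, PartialEq)]
pub struct TradeRow {
    pub timestamp: String, // kept as text, e.g. "2024-01-05 12:00:00"
    pub ticker: String,
    pub price: u32,  // assumed to be cents, as in the sample data
    pub volume: u64, // contracts traded
    pub taker_side: TakerSide,
}

/// parses "timestamp,ticker,price,volume,taker_side" (no header, no quoting)
pub fn parse_trade_row(line: &str) -> Result<TradeRow, String> {
    let fields: Vec<&str> = line.trim().split(',').collect();
    if fields.len() != 5 {
        return Err(format!("expected 5 fields, got {}", fields.len()));
    }

    let price = fields[2]
        .parse::<u32>()
        .map_err(|e| format!("bad price '{}': {}", fields[2], e))?;
    let volume = fields[3]
        .parse::<u64>()
        .map_err(|e| format!("bad volume '{}': {}", fields[3], e))?;
    let taker_side = match fields[4] {
        "yes" => TakerSide::Yes,
        "no" => TakerSide::No,
        other => return Err(format!("bad taker_side '{}'", other)),
    };

    Ok(TradeRow {
        timestamp: fields[0].to_string(),
        ticker: fields[1].to_string(),
        price,
        volume,
        taker_side,
    })
}
```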
usage
---

```bash
# build
cargo build --release

# run backtest with quant features
cargo run --release -- run \
  --data-dir data \
  --start 2024-01-01 \
  --end 2024-06-01 \
  --capital 10000 \
  --max-position 500 \
  --max-positions 10 \
  --kelly-fraction 0.25 \
  --max-position-pct 0.25 \
  --take-profit 0.20 \
  --stop-loss 0.15 \
  --max-hold-hours 72 \
  --compare-random

# view results
cargo run --release -- summary --results-file results/backtest_result.json
```

cli options
---

| option | default | description |
|--------|---------|-------------|
| --data-dir | data | directory with markets.csv and trades.csv |
| --start | required | backtest start date (e.g. 2024-01-01) |
| --end | required | backtest end date (e.g. 2024-06-01) |
| --capital | 10000 | initial capital |
| --max-position | 100 | max shares per position |
| --max-positions | 5 | max concurrent positions |
| --kelly-fraction | 0.25 | fraction of kelly criterion (0.1=conservative, 1.0=full) |
| --max-position-pct | 0.25 | max fraction of capital per position (0.25 = 25%) |
| --take-profit | 0.20 | take profit threshold (20% gain) |
| --stop-loss | 0.15 | stop loss threshold (15% loss) |
| --max-hold-hours | 72 | time stop in hours |
| --compare-random | false | compare vs random baseline |

scorers
---

**basic scorers**:
- `MomentumScorer` - price change over lookback period
- `MeanReversionScorer` - deviation from historical mean
- `VolumeScorer` - unusual volume detection
- `TimeDecayScorer` - prefers markets with more time to close

**quant scorers**:
- `MultiTimeframeMomentumScorer` - analyzes 1h, 4h, 12h, 24h windows, detects divergence
- `BollingerMeanReversionScorer` - triggers at upper/lower band touches (2 standard deviations)
- `OrderFlowScorer` - buy/sell imbalance from taker_side
- `CategoryWeightedScorer` - different weights per category
- `EnsembleScorer` - combines models with dynamic weights
- `CorrelationScorer` - cross-market lead-lag signals

**ml scorers** (requires `ml` feature):
- `MLEnsembleScorer` - LSTM + MLP via ONNX

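
to make two of the quant signals concrete, here is a hedged sketch of an order flow imbalance and a bollinger band touch check. `Trade`, `order_flow_imbalance`, and `bollinger_signal` are hypothetical helpers, not the code behind `OrderFlowScorer` or `BollingerMeanReversionScorer`. the sign convention (positive means more yes-side taker volume; a touch of the upper band returns -1 because mean reversion expects a move back down) and the use of a population standard deviation over the whole window are assumptions. with the two sample rows from `trades.csv` (100 yes, 50 no) the imbalance comes out at one third.

```rust
/// hypothetical trade record: traded volume and whether the taker bought yes
pub struct Trade {
    pub volume: f64,
    pub taker_yes: bool,
}

/// (yes volume - no volume) / total volume, in [-1, 1]; 0.0 when there is no volume
pub fn order_flow_imbalance(trades: &[Trade]) -> f64 {
    let (yes, no) = trades.iter().fold((0.0_f64, 0.0_f64), |(y, n), t| {
        if t.taker_yes {
            (y + t.volume, n)
        } else {
            (y, n + t.volume)
        }
    });
    let total = yes + no;
    if total == 0.0 {
        0.0
    } else {
        (yes - no) / total
    }
}

/// bollinger band touch on the latest price in the window:
/// Some(-1) at or above the upper band, Some(1) at or below the lower band,
/// Some(0) inside the bands, None when the bands are undefined.
pub fn bollinger_signal(prices: &[f64], num_std: f64) -> Option<i8> {
    let (&last, _) = prices.split_last()?;
    if prices.len() < 2 {
        return None;
    }

    let n = prices.len() as f64;
    let mean = prices.iter().sum::<f64>() / n;
    // population variance over the window, latest price included
    let variance = prices.iter().map(|p| (p - mean).powi(2)).sum::<f64>() / n;
    let std = variance.sqrt();
    if std == 0.0 {
        return None; // flat prices: bands collapse onto the mean
    }

    let upper = mean + num_std * std;
    let lower = mean - num_std * std;
    Some(if last >= upper {
        -1
    } else if last <= lower {
        1
    } else {
        0
    })
}
```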
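position sizing
---

uses the kelly criterion with a safety multiplier:

```
kelly = (odds * win_prob - (1 - win_prob)) / odds
safe_kelly = kelly * kelly_fraction
position = min(bankroll * safe_kelly, max_position_pct * bankroll)
```

`odds` is the net payout per unit staked and `win_prob` is the estimated probability of winning. `kelly_fraction` and `max_position_pct` come from the `--kelly-fraction` and `--max-position-pct` options.
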
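
the formula above can be sketched as a single function. `kelly_position` is a hypothetical name, and deriving `odds` from the contract price as `(1 - price) / price` (a yes contract bought at `price` dollars pays 1 dollar on a win) is an assumption, since the formula does not say where `odds` comes from. returning 0 when kelly is not positive is also an assumption. with a 10000 bankroll, a 0.45 price, a 0.55 win probability, and the default 0.25 kelly fraction, full kelly is about 18.2% and the sized position is about 454.55.

```rust
/// dollar size for one position: fractional kelly, capped at a share of bankroll.
/// `price` is the contract price in dollars, strictly between 0 and 1.
pub fn kelly_position(
    bankroll: f64,
    win_prob: f64,
    price: f64,
    kelly_fraction: f64,
    max_position_pct: f64,
) -> f64 {
    if price <= 0.0 || price >= 1.0 {
        return 0.0; // odds are undefined outside (0, 1)
    }

    // assumption: net odds implied by the price of a binary contract
    let odds = (1.0 - price) / price;

    let kelly = (odds * win_prob - (1.0 - win_prob)) / odds;
    if kelly <= 0.0 {
        return 0.0; // no edge, no position
    }

    let safe_kelly = kelly * kelly_fraction;
    (bankroll * safe_kelly).min(max_position_pct * bankroll)
}
```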
exit signals
---

positions can exit via:
1. **resolution** - market resolves yes/no
2. **take profit** - pnl exceeds the `--take-profit` threshold
3. **stop loss** - pnl falls below the `--stop-loss` threshold
4. **time stop** - held longer than `--max-hold-hours` (frees capital for rotation)
5. **score reversal** - strategy flips bearish

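
one way the five exits could be checked for a single position, as a sketch only. `ExitReason`, `ExitConfig`, `PositionState`, and `check_exit` are hypothetical names. checking in the order listed above and treating a negative current score as a "score reversal" are both assumptions; the actual executor may order and define these differently. with the default thresholds, a position up 25% returns `TakeProfit` and one down 20% returns `StopLoss`.

```rust
#[derive(Debug, PartialEq)]
pub enum ExitReason {
    Resolution,
    TakeProfit,
    StopLoss,
    TimeStop,
    ScoreReversal,
}

/// thresholds mirroring --take-profit, --stop-loss, --max-hold-hours
pub struct ExitConfig {
    pub take_profit: f64,    // e.g. 0.20 for a 20% gain
    pub stop_loss: f64,      // e.g. 0.15 for a 15% loss
    pub max_hold_hours: f64, // e.g. 72.0
}

/// hypothetical snapshot of an open position at the current backtest step
pub struct PositionState {
    pub resolved: bool,  // market has resolved yes/no
    pub pnl_pct: f64,    // unrealized pnl as a fraction of entry cost
    pub hours_held: f64, // time since entry
    pub score: f64,      // latest strategy score for this market
}

/// returns the first exit that applies, or None to keep holding
pub fn check_exit(cfg: &ExitConfig, pos: &PositionState) -> Option<ExitReason> {
    if pos.resolved {
        return Some(ExitReason::Resolution);
    }
    if pos.pnl_pct >= cfg.take_profit {
        return Some(ExitReason::TakeProfit);
    }
    if pos.pnl_pct <= -cfg.stop_loss {
        return Some(ExitReason::StopLoss);
    }
    if pos.hours_held >= cfg.max_hold_hours {
        return Some(ExitReason::TimeStop);
    }
    // assumption: a negative score means the strategy has flipped bearish
    if pos.score < 0.0 {
        return Some(ExitReason::ScoreReversal);
    }
    None
}
```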
ml training (optional)
---

train ML models using pytorch, then export to ONNX:

```bash
# install dependencies
pip install torch pandas numpy

# train models
python scripts/train_ml_models.py \
  --data data/trades.csv \
  --markets data/markets.csv \
  --output models/ \
  --epochs 50

# enable ml feature
cargo build --release --features ml
```

metrics
---

- total return ($ and %)
- sharpe ratio (annualized)
- max drawdown
- win rate
- average trade P&L
- average hold time
- trades per day
- return by category

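
two of these metrics are easy to get subtly wrong, so here is a sketch of how they are commonly computed. `max_drawdown` and `sharpe_ratio` are hypothetical helpers, not the functions in this repo. the sharpe version assumes a zero risk-free rate, a sample standard deviation, and annualization by the square root of `periods_per_year` (e.g. 365 for daily returns); the framework may use different conventions.

```rust
/// largest peak-to-trough decline of an equity curve, as a fraction of the peak.
/// returns 0.0 for an empty or never-declining curve.
pub fn max_drawdown(equity: &[f64]) -> f64 {
    let mut peak = f64::MIN;
    let mut worst = 0.0_f64;
    for &value in equity {
        if value > peak {
            peak = value;
        }
        if peak > 0.0 {
            worst = worst.max((peak - value) / peak);
        }
    }
    worst
}

/// annualized sharpe ratio from per-period returns (risk-free rate assumed 0).
/// returns None with fewer than two returns or when returns do not vary.
pub fn sharpe_ratio(returns: &[f64], periods_per_year: f64) -> Option<f64> {
    let n = returns.len();
    if n < 2 {
        return None;
    }

    let mean = returns.iter().sum::<f64>() / n as f64;
    // sample variance (divides by n - 1)
    let variance = returns
        .iter()
        .map(|r| (r - mean).powi(2))
        .sum::<f64>()
        / (n as f64 - 1.0);
    let std = variance.sqrt();
    if std <= f64::EPSILON {
        return None; // sharpe is undefined without variation
    }

    Some(mean / std * periods_per_year.sqrt())
}
```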
extending
---

add custom scorers by implementing the `Scorer` trait:

```rust
use async_trait::async_trait;
// Scorer, TradingContext, and MarketCandidate come from this crate

pub struct MyScorer;

#[async_trait]
impl Scorer for MyScorer {
    fn name(&self) -> &'static str {
        "MyScorer"
    }

    async fn score(
        &self,
        context: &TradingContext,
        candidates: &[MarketCandidate],
    ) -> Result<Vec<MarketCandidate>, String> {
        // compute scores... (placeholder so the example compiles)
        todo!()
    }

    // copy this scorer's result from the scored copy back onto the candidate
    fn update(&self, candidate: &mut MarketCandidate, scored: MarketCandidate) {
        if let Some(score) = scored.scores.get("my_score") {
            candidate.scores.insert("my_score".to_string(), *score);
        }
    }
}
```

then add to the pipeline in `backtest.rs`.