kalshi backtest progress
===

this document tracks the development progress, algorithm details, and backtest results for the kalshi prediction market trading system.

last updated: 2026-01-22


backtest run #1
---

**date:** 2026-01-22
**period:** 2026-01-20 to 2026-01-22 (2 days)
**initial capital:** $10,000
**interval:** 1 hour

### results summary

| metric | strategy | random baseline | delta |
|--------|----------|-----------------|-------|
| total return | +$993.61 (+9.94%) | -$51.00 (-0.51%) | +$1,044.61 |
| sharpe ratio | 5.448 | -2.436 | +7.884 |
| max drawdown | 1.26% | 0.51% | +0.75% |
| win rate | 58.7% | 0.0% | +58.7% |
| total trades | 46 | 0 | +46 |
| avg trade pnl | $4.59 | $0.00 | +$4.59 |
| avg hold time | 5.4 hrs | 0.0 hrs | +5.4 hrs |

### notable trades

| ticker | entry | exit | side | pnl | hold time |
|--------|-------|------|------|-----|-----------|
| KXKHLGAME-26JAN21SEVHCS-SEV | $0.17 | $0.99 | Yes | +$81.98 | 1h |
| KXKHLGAME-26JAN21SEVHCS-HCS | $0.20 | $0.93 | Yes | +$72.98 | 1h |
| KXFIRSTSUPERBOWLSONG-26FEB09-DTM | $0.11 | $0.63 | Yes | +$51.99 | 2h |
| KXUCLBTTS-26JAN21OMLFC | $0.01 | $0.50 | Yes | +$49.00 | 1h |
| KXNCAAWBGAME-26JAN21BULAF-LAF | $0.43 | $0.80 | No | +$36.96 | 3h |

### worst trades

| ticker | entry | exit | side | pnl | hold time |
|--------|-------|------|------|-----|-----------|
| KXUCLBTTS-26JAN21GALATM | $0.40 | $0.01 | No | -$39.04 | 1h |
| KXUCLBTTS-26JAN21QARSGE | $0.35 | $0.01 | No | -$34.04 | 7h |
| KXFIRSTSUPERBOWLSONG-26FEB09-CAL | $0.35 | $0.07 | Yes | -$28.03 | 12h |
| KXUCLGAME-26JAN21ATAATH-ATA | $0.46 | $0.19 | No | -$27.05 | 14h |


algorithm architecture
===

pipeline overview
---

the system uses a modular pipeline architecture with four stages:

```
sources -> filters -> scorers -> selector
```

1. **sources** - retrieve market candidates from historical data
2. **filters** - remove unsuitable markets
3. **scorers** - compute feature scores for each candidate
4. **selector** - pick top-k candidates for trading
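
the four stages can be sketched as plain functions over a candidate list. this is a minimal illustration, not the project's code: `Candidate`, `min_volume`, `log_volume`, and `run_pipeline` are hypothetical names, and summing scorer outputs stands in for the weighted ensemble described later.

```rust
/// Hypothetical candidate record; the real type is not shown in this document.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub ticker: String,
    pub volume_24h: u64,
    pub score: f64,
}

/// Example filter: keep markets with at least 100 contracts of 24h volume.
pub fn min_volume(c: &Candidate) -> bool {
    c.volume_24h >= 100
}

/// Example scorer (illustrative only): natural log of 24h volume.
pub fn log_volume(c: &Candidate) -> f64 {
    (c.volume_24h as f64).ln()
}

/// filters -> scorers -> selector, applied to candidates a source already produced.
pub fn run_pipeline(
    sourced: Vec<Candidate>,
    filters: &[fn(&Candidate) -> bool],
    scorers: &[fn(&Candidate) -> f64],
    k: usize,
) -> Vec<Candidate> {
    // filters: a candidate must pass every filter to survive.
    let mut kept: Vec<Candidate> = sourced
        .into_iter()
        .filter(|c| filters.iter().all(|f| f(c)))
        .collect();
    // scorers: outputs are summed here for brevity (assumption).
    for c in kept.iter_mut() {
        let total: f64 = scorers.iter().map(|s| s(&*c)).sum();
        c.score = total;
    }
    // selector: highest score first, then keep the top k.
    kept.sort_by(|a, b| b.score.total_cmp(&a.score));
    kept.truncate(k);
    kept
}
```

keeping each stage as a list of small functions makes it cheap to add or remove one filter or scorer at a time and measure the effect.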

current configuration
---

### sources

| source | config |
|--------|--------|
| HistoricalMarketSource | lookback: 24 hours |

### filters

| filter | config | purpose |
|--------|--------|---------|
| LiquidityFilter | min_volume_24h: 100 | reject illiquid markets |
| TimeToCloseFilter | min: 2h, max: 720h | avoid expiring/distant markets |
| AlreadyPositionedFilter | max_position: 100 | prevent over-concentration |

typical filter stats (per interval):
- ~17,000 candidates retrieved
- ~10,000 pass liquidity filter (~59%)
- ~7,200 pass time filter (~72% of remaining)
- ~7,150 pass position filter (~99% of remaining)
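
the three filters amount to simple threshold checks. a rough sketch using the thresholds from the table; `MarketSnapshot` and its field names are hypothetical, and treating the bounds as inclusive (and the position cap as exclusive) is an assumption.

```rust
/// Hypothetical snapshot of the fields the three filters need.
pub struct MarketSnapshot {
    pub volume_24h: u64,
    pub hours_to_close: f64,
    /// Contracts already held in this market.
    pub current_position: u32,
}

pub const MIN_VOLUME_24H: u64 = 100;
pub const MIN_HOURS_TO_CLOSE: f64 = 2.0;
pub const MAX_HOURS_TO_CLOSE: f64 = 720.0;
pub const MAX_POSITION: u32 = 100;

/// Returns true when a market passes all three filters.
pub fn passes_filters(m: &MarketSnapshot) -> bool {
    // LiquidityFilter: reject illiquid markets.
    let liquid = m.volume_24h >= MIN_VOLUME_24H;
    // TimeToCloseFilter: avoid markets that expire too soon or too far out.
    let in_window =
        m.hours_to_close >= MIN_HOURS_TO_CLOSE && m.hours_to_close <= MAX_HOURS_TO_CLOSE;
    // AlreadyPositionedFilter: assumed to reject once the cap is reached.
    let has_room = m.current_position < MAX_POSITION;
    liquid && in_window && has_room
}
```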

### scorers

the pipeline runs 8 scorers: seven compute features independently, and CategoryWeightedScorer combines them into `final_score`:

| scorer | features | lookback | description |
|--------|----------|----------|-------------|
| MomentumScorer | `momentum` | 6h | price change over lookback window |
| MultiTimeframeMomentumScorer | `mtf_momentum`, `mtf_divergence`, `mtf_alignment` | 1h, 4h, 12h, 24h | multi-window momentum with divergence detection |
| MeanReversionScorer | `mean_reversion` | 24h | deviation from historical mean |
| BollingerMeanReversionScorer | `bollinger_reversion`, `bollinger_position` | 24h, 2.0 std | statistical band analysis |
| VolumeScorer | `volume` | 6h | log ratio of recent vs avg hourly volume |
| OrderFlowScorer | `order_flow` | - | buy/sell imbalance from taker_side |
| TimeDecayScorer | `time_decay` | - | time value decay factor |
| CategoryWeightedScorer | `final_score` | - | category-specific weighted ensemble |

### category-specific weights

the CategoryWeightedScorer applies different weight profiles based on market category:

**default weights:**
```
momentum:        0.20
mean_reversion:  0.20
volume:          0.15
time_decay:      0.10
order_flow:      0.15
bollinger:       0.10
mtf_momentum:    0.10
```

**politics:**
```
momentum:        0.35  (trend-following works)
mean_reversion:  0.10
order_flow:      0.15
mtf_momentum:    0.15
```

**weather:**
```
mean_reversion:  0.35  (converges to forecasts)
bollinger:       0.15
time_decay:      0.15
```

**sports:**
```
order_flow:      0.30  (sharp money matters)
momentum:        0.20
volume:          0.15
```

**economics/financial:**
```
momentum:        0.25
mean_reversion:  0.20
volume:          0.15
```
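
a weighted ensemble of this kind reduces to a dot product between a weight profile and the feature values. a minimal sketch using the default profile; `weighted_score` is a hypothetical helper, and treating a missing feature as 0 is an assumption. the category profiles above list only some weights, and how the remaining features are weighted is not stated here, so they are left out.

```rust
/// Default weight profile, copied from the list above.
pub const DEFAULT_WEIGHTS: [(&str, f64); 7] = [
    ("momentum", 0.20),
    ("mean_reversion", 0.20),
    ("volume", 0.15),
    ("time_decay", 0.10),
    ("order_flow", 0.15),
    ("bollinger", 0.10),
    ("mtf_momentum", 0.10),
];

/// Weighted sum of named features; a feature absent from `features` counts as 0.
pub fn weighted_score(weights: &[(&str, f64)], features: &[(&str, f64)]) -> f64 {
    weights
        .iter()
        .map(|(name, w)| {
            let value = features
                .iter()
                .find(|(f, _)| f == name)
                .map(|(_, v)| *v)
                .unwrap_or(0.0);
            w * value
        })
        .sum()
}
```

for example, `momentum = 1.0` and `order_flow = -1.0` with all other features at 0 gives 0.20 - 0.15 = 0.05 under the default profile.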

### selector

| selector | config |
|----------|--------|
| TopKSelector | k=5 (max_positions) |


execution logic
===

position sizing
---

uses fractional kelly criterion for position sizing:

```rust
kelly_fraction = 0.25          // use 25% of kelly optimal
max_position_pct = 0.25        // max 25% of portfolio per trade
min_position_size = 10         // minimum 10 contracts
max_position_size = 100        // maximum 100 contracts
```

**edge to probability mapping:**
```
win_prob = (1 + tanh(edge)) / 2
```

this maps any real-valued edge smoothly into (0, 1): edge = 0 gives win_prob = 0.5, and large positive or negative edges approach 1 or 0.

**kelly formula:**
```
kelly = (odds * win_prob - (1 - win_prob)) / odds
position_value = bankroll * min(kelly * kelly_fraction, max_position_pct)
```
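
putting the two formulas together, a sizing function might look like the sketch below. the document does not define `odds`; the sketch assumes the usual net odds for a binary contract that pays $1, `(1 - price) / price`. clamping a negative kelly value to zero (no trade) is also an assumption, and the contract-count limits (10 to 100) are not applied here.

```rust
pub const KELLY_FRACTION: f64 = 0.25;
pub const MAX_POSITION_PCT: f64 = 0.25;

/// Map a scoring edge to an estimated win probability in (0, 1).
pub fn win_prob(edge: f64) -> f64 {
    (1.0 + edge.tanh()) / 2.0
}

/// Net odds for a contract bought at `price` that pays $1 if correct (assumed definition).
pub fn net_odds(price: f64) -> f64 {
    (1.0 - price) / price
}

/// Dollar value to allocate to one trade, following the kelly formula above.
pub fn position_value(bankroll: f64, edge: f64, price: f64) -> f64 {
    let p = win_prob(edge);
    let b = net_odds(price);
    let kelly = (b * p - (1.0 - p)) / b;
    // Assumption: a non-positive kelly value means no position.
    let fraction = (kelly * KELLY_FRACTION).min(MAX_POSITION_PCT).max(0.0);
    bankroll * fraction
}
```

two things fall out of the arithmetic. the kelly term is at most 1, so with `kelly_fraction = 0.25` the 25% cap never binds. and because `win_prob` does not depend on price, an edge of 0 at a price of 0.20 still sizes to about 9.4% of bankroll (`position_value(10_000.0, 0.0, 0.2)` is roughly $937.50), which may be worth checking against the intended behavior.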

side selection
---

the executor picks the cheaper side based on signal direction:

- positive score (bullish) + yes_price < 0.5 -> buy YES
- positive score (bullish) + yes_price >= 0.5 -> buy NO
- negative score (bearish) + yes_price > 0.5 -> buy NO
- negative score (bearish) + yes_price <= 0.5 -> buy YES

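rationale: buying the cheaper side gives a better risk/reward ratio. note that, as written, all four rules select whichever side is priced below 0.5; signal direction only decides the tie at yes_price = 0.5.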
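
the four rules translate directly into a small function. a sketch with hypothetical names (`Side`, `select_side`); a score of exactly zero is not covered by the rules, so it returns `None` here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Side {
    Yes,
    No,
}

/// Pick a side from the signal sign and the current yes price.
pub fn select_side(score: f64, yes_price: f64) -> Option<Side> {
    if score > 0.0 {
        // bullish: YES below 0.5, otherwise NO
        Some(if yes_price < 0.5 { Side::Yes } else { Side::No })
    } else if score < 0.0 {
        // bearish: NO above 0.5, otherwise YES
        Some(if yes_price > 0.5 { Side::No } else { Side::Yes })
    } else {
        // zero (or NaN) score: not covered by the rules above
        None
    }
}
```

away from `yes_price = 0.5`, a bullish and a bearish score return the same side, so the sign of the score only matters at the boundary.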

exit conditions
---

positions are closed when any of these trigger:

| condition | threshold | description |
|-----------|-----------|-------------|
| take_profit | +20% | lock in gains |
| stop_loss | -15% | limit downside |
| time_stop | 72 hours | prevent stale positions |
| score_reversal | < -0.3 | signal flipped against us |
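
a sketch of how these checks could be evaluated at each interval. the order in which conditions are tested, the use of the held side's price for pnl, and the names `ExitReason` and `check_exit` are assumptions, not taken from the codebase.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExitReason {
    TakeProfit,
    StopLoss,
    TimeStop,
    ScoreReversal,
}

/// Returns the first exit condition that triggers, if any.
/// `entry_price` and `current_price` refer to the side actually held.
pub fn check_exit(
    entry_price: f64,
    current_price: f64,
    hours_held: f64,
    score: f64,
) -> Option<ExitReason> {
    let pnl_pct = (current_price - entry_price) / entry_price;
    if pnl_pct >= 0.20 {
        Some(ExitReason::TakeProfit)
    } else if pnl_pct <= -0.15 {
        Some(ExitReason::StopLoss)
    } else if hours_held >= 72.0 {
        Some(ExitReason::TimeStop)
    } else if score < -0.3 {
        Some(ExitReason::ScoreReversal)
    } else {
        None
    }
}
```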

slippage model
---

- 10 bps slippage applied to all fills
- limit orders rejected if fill price exceeds limit by 2x slippage
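
one possible reading of the slippage model, for buy orders only. the multiplicative form and the interpretation of "exceeds limit by 2x slippage" as a 20 bps tolerance above the limit are assumptions; `buy_fill_price` and `try_fill_buy` are hypothetical names.

```rust
pub const SLIPPAGE_BPS: f64 = 10.0;

/// Price paid on a buy after applying slippage to the quoted price.
pub fn buy_fill_price(quoted: f64) -> f64 {
    quoted * (1.0 + SLIPPAGE_BPS / 10_000.0)
}

/// Returns the fill price, or None if it exceeds the limit by more than 2x slippage.
pub fn try_fill_buy(quoted: f64, limit: f64) -> Option<f64> {
    let fill = buy_fill_price(quoted);
    let tolerance = 2.0 * SLIPPAGE_BPS / 10_000.0;
    if fill > limit * (1.0 + tolerance) {
        None
    } else {
        Some(fill)
    }
}
```

at prices near $0.50, 10 bps is about $0.0005 per contract, well under one cent.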


data characteristics
===

current dataset: `/mnt/work/kalshi-data/`

| file | size | description |
|------|------|-------------|
| markets.csv | 6.6 GB | market metadata, results, prices |
| trades.csv | 66 MB | individual trade records with taker_side |

trade record schema:
```
timestamp, ticker, price, volume, taker_side
```

market record schema:
```
ticker, title, category, open_time, close_time, result, status,
yes_bid, yes_ask, volume, open_interest
```


known issues / future work
===

### issues

1. **empty categories** - return_by_category shows empty string, need to verify category parsing from market data

2. **no trading on jan 20** - equity curve shows no activity until jan 21 04:00, likely due to insufficient trade history in lookback window

3. **dead code warnings** - several unused scorers and filters (CorrelationScorer, MLEnsembleScorer, etc.) - cleanup needed

### planned improvements

- [ ] category parsing fix
- [ ] correlation scorer integration (granger causality between related markets)
- [ ] ML model integration (ONNX runtime ready, needs trained models)
- [ ] multi-day backtests with larger date ranges
- [ ] parameter optimization / grid search
- [ ] transaction cost analysis
- [ ] position-level attribution


appendix: scorer formulas
===

### momentum

```
momentum = price(t) - price(t - lookback_hours)
```

### mean reversion

```
mean = avg(prices over lookback_hours)
deviation = current_price - mean
mean_reversion = -deviation
```

### bollinger bands

```
mean = avg(prices)
std = stddev(prices)
upper_band = mean + 2.0 * std
lower_band = mean - 2.0 * std

if price >= upper_band:
    score = -(price - upper_band) / std
elif price <= lower_band:
    score = (lower_band - price) / std
else:
    score = -0.5 * (position - 0.5)  // weak mean reversion inside bands
```
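
the pseudocode leaves `position` undefined. the sketch below assumes it is the price's relative position between the bands (0 at the lower band, 1 at the upper), which would match the `bollinger_position` feature name; it also assumes population standard deviation and returns `None` for a flat series to avoid dividing by zero.

```rust
/// Bollinger-style mean reversion score for the last price in `prices`.
/// `k` is the band width in standard deviations (2.0 in the current config).
pub fn bollinger_score(prices: &[f64], k: f64) -> Option<f64> {
    let n = prices.len();
    if n < 2 {
        return None;
    }
    let price = *prices.last()?;
    let mean = prices.iter().sum::<f64>() / n as f64;
    // Population variance (assumption; the source does not say which form is used).
    let var = prices.iter().map(|p| (p - mean).powi(2)).sum::<f64>() / n as f64;
    let std = var.sqrt();
    if std == 0.0 {
        return None; // flat series: bands collapse, score undefined
    }
    let upper = mean + k * std;
    let lower = mean - k * std;
    if price >= upper {
        Some(-(price - upper) / std)
    } else if price <= lower {
        Some((lower - price) / std)
    } else {
        // Assumed definition: 0.0 at the lower band, 1.0 at the upper band.
        let position = (price - lower) / (upper - lower);
        Some(-0.5 * (position - 0.5))
    }
}
```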

### volume

```
avg_hourly_volume = total_volume / hours_since_open
recent_hourly_volume = recent_volume / lookback_hours
volume_score = ln(recent_hourly_volume / avg_hourly_volume)
```
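
the log ratio is undefined when either rate is zero, which can happen for a market with no recent trades. a sketch that returns `None` in those cases (an assumption; the real scorer may substitute a default instead):

```rust
/// Log ratio of recent hourly volume to lifetime average hourly volume.
pub fn volume_score(
    total_volume: f64,
    hours_since_open: f64,
    recent_volume: f64,
    lookback_hours: f64,
) -> Option<f64> {
    if hours_since_open <= 0.0 || lookback_hours <= 0.0 {
        return None;
    }
    let avg_hourly = total_volume / hours_since_open;
    let recent_hourly = recent_volume / lookback_hours;
    if avg_hourly <= 0.0 || recent_hourly <= 0.0 {
        return None; // ln is undefined at or below zero
    }
    Some((recent_hourly / avg_hourly).ln())
}
```

a market averaging 100 contracts per hour that traded 200 per hour over the last 6 hours scores ln(2), about 0.69.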

### order flow

```
order_flow = (buy_volume - sell_volume) / (buy_volume + sell_volume)
```
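
a direct translation, with a guard for the no-volume case. returning 0.0 (neutral) when there is no volume is an assumption.

```rust
/// Signed imbalance in [-1, 1]: +1 is all buys, -1 is all sells.
pub fn order_flow(buy_volume: f64, sell_volume: f64) -> f64 {
    let total = buy_volume + sell_volume;
    if total == 0.0 {
        0.0 // no trades in the window: treated as neutral (assumption)
    } else {
        (buy_volume - sell_volume) / total
    }
}
```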

### time decay

```
hours_remaining = time_to_close
time_decay = 1 - 1 / (hours_remaining / 24 + 1)
```

equivalent to hours_remaining / (hours_remaining + 24). ranges from 0 (about to close) toward 1 (distant expiry), reaching 0.5 at 24 hours.
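
as a function (clamping negative inputs to zero is an assumption):

```rust
/// Time decay factor: 0 at close, approaching 1 for distant expiries.
pub fn time_decay(hours_remaining: f64) -> f64 {
    let h = hours_remaining.max(0.0);
    1.0 - 1.0 / (h / 24.0 + 1.0)
}
```

with the TimeToCloseFilter bounds of 2h and 720h, the values that actually reach the ensemble lie between about 0.077 and 0.968.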

backtest run #2
---

**date:** 2026-01-22
**period:** 2026-01-21 04:00 to 2026-01-21 06:00 (2 hours)
**initial capital:** $10,000
**interval:** 1 hour

### results summary

| metric | strategy | random baseline | delta |
|--------|----------|-----------------|-------|
| total return | +$502.81 (+5.03%) | $0.00 (0.00%) | +$502.81 |
| sharpe ratio | 68.845 | 0.000 | +68.845 |
| max drawdown | 0.00% | 0.00% | +0.00% |
| win rate | 100.0% | 0.0% | +100.0% |
| total trades | 1 (closed) | 0 | +1 |
| positions | 9 (open) | 0 | +9 |

*note: short duration used to validate regime detection logic.*

### architectural updates

1. **momentum acceleration scorer**
   - implemented second-order momentum (acceleration)
   - detects market turning points using fast/slow momentum divergence
   - derived from "momentum turning points" academic research

2. **regime adaptive scorer**
   - dynamic weight allocation based on market state
   - **bull:** favors trend following (momentum: 0.4)
   - **bear:** favors mean reversion (mean_reversion: 0.4)
   - **transition:** defensive positioning (time_decay: 0.3, volume: 0.2)
   - replaced static `CategoryWeightedScorer`

3. **data handling**
   - identified data gap before jan 21 03:00
   - adjusted backtest start time to align with available trade data

backtest run #3 (iteration 1)
---

**date:** 2026-01-22
**period:** 2026-01-20 00:00 to 2026-01-22 00:00 (2 days)
**initial capital:** $10,000
**interval:** 1 hour

### results summary

| metric | value |
|--------|-------|
| total return | +$412.85 (+4.13%) |
| sharpe ratio | 4.579 |
| max drawdown | 0.25% |
| win rate | 83.3% |
| total trades | 6 (closed) |
| positions | 49 (open) |
| avg trade pnl | $8.81 |
| avg hold time | 4.7 hours |

### comparison with previous runs

| metric | run #1 (2 days) | run #2 (2 hrs) | run #3 (2 days) | trend |
|--------|-----------------|----------------|-----------------|-------|
| total return | +9.94% | +5.03% | +4.13% | ↓ |
| sharpe ratio | 5.448 | 68.845* | 4.579 | ↓ |
| max drawdown | 1.26% | 0.00% | 0.25% | ↓ better |
| win rate | 58.7% | 100.0% | 83.3% | ↑ |

*run #2 sharpe inflated due to very short period

### architectural updates

1. **kalman price filter**
   - implements recursive kalman filtering for price estimation
   - outputs: filtered_price, innovation (deviation from prediction), uncertainty
   - filters noisy price observations to get better "true price" estimates
   - adapts to changing volatility automatically via adaptive gain

2. **VPIN scorer (volume-synchronized probability of informed trading)**
   - based on easley, lopez de prado, and o'hara (2012) research
   - measures flow toxicity using volume-bucketed order imbalance
   - outputs: vpin, flow_toxicity, informed_direction
   - high VPIN indicates presence of informed traders

3. **adaptive confidence scorer**
   - replaces RegimeAdaptiveScorer with confidence-weighted approach
   - uses kalman uncertainty, VPIN, and entropy to calculate confidence
   - scales all feature weights by confidence factor
   - dynamic weight profiles based on:
     - high VPIN + informed direction -> follow smart money (order_flow: 0.4)
     - turning point detected -> defensive (time_decay: 0.25)
     - bull regime -> trend following (momentum: 0.35)
     - bear regime -> mean reversion (mean_reversion: 0.35)
     - neutral -> balanced weights
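
for reference, a scalar kalman filter with a random-walk price model produces the three outputs listed for the kalman price filter. this is a textbook sketch, not the project's implementation: `Kalman1D` is a hypothetical name, the initial variance of 1.0 is arbitrary, and `process_noise` and `measurement_noise` are fixed here, so any volatility adaptation in the real filter is not modeled.

```rust
/// Scalar kalman filter assuming the true price follows a random walk.
pub struct Kalman1D {
    pub estimate: f64,
    /// Variance of the estimate (the "uncertainty" output).
    pub variance: f64,
    pub process_noise: f64,
    pub measurement_noise: f64,
}

impl Kalman1D {
    pub fn new(initial: f64, process_noise: f64, measurement_noise: f64) -> Self {
        Self { estimate: initial, variance: 1.0, process_noise, measurement_noise }
    }

    /// Feed one observed price; returns (filtered_price, innovation, uncertainty).
    pub fn update(&mut self, observed: f64) -> (f64, f64, f64) {
        // Predict: the estimate carries over and its variance grows.
        let predicted_var = self.variance + self.process_noise;
        // Innovation: how far the observation is from the prediction.
        let innovation = observed - self.estimate;
        // Gain: how much of the innovation to trust.
        let gain = predicted_var / (predicted_var + self.measurement_noise);
        self.estimate += gain * innovation;
        self.variance = (1.0 - gain) * predicted_var;
        (self.estimate, innovation, self.variance)
    }
}
```

a larger `measurement_noise` relative to `process_noise` lowers the gain and smooths more heavily, which is the trade-off behind the parameter tuning mentioned under next iteration considerations.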

### analysis

**why return decreased from run #1:**
1. the new AdaptiveConfidenceScorer is more conservative, scaling down weights when confidence is low
2. fewer closed trades (6 vs 46 in run #1), although 49 positions were still open at the end of the run
3. tighter risk management - max drawdown improved from 1.26% to 0.25%

**positive improvements:**
- win rate increased from 58.7% to 83.3%
- avg trade pnl increased from $4.59 to $8.81
- max drawdown decreased from 1.26% to 0.25%
- sharpe ratio fell from 5.448 to 4.579 but remains strongly positive

**next iteration considerations:**
1. the confidence scaling may be too aggressive - consider relaxing the uncertainty multiplier
2. need to tune the VPIN thresholds for detecting informed trading
3. kalman filter process_noise and measurement_noise parameters could be optimized
4. should add cross-validation with different market regimes

### scorer pipeline (run #3)

```
MomentumScorer (6h) -> momentum
MultiTimeframeMomentumScorer (1h,4h,12h,24h) -> mtf_momentum, mtf_divergence, mtf_alignment
MeanReversionScorer (24h) -> mean_reversion
BollingerMeanReversionScorer (24h, 2.0 std) -> bollinger_reversion, bollinger_position
VolumeScorer (6h) -> volume
OrderFlowScorer -> order_flow
TimeDecayScorer -> time_decay
VolatilityScorer (24h) -> volatility
EntropyScorer (24h) -> entropy
RegimeDetector (24h) -> regime
MomentumAccelerationScorer (3h fast, 12h slow) -> momentum_acceleration, momentum_regime, turning_point
CorrelationScorer (24h, lag 6) -> correlation
KalmanPriceFilter (24h) -> kalman_price, kalman_innovation, kalman_uncertainty
VPINScorer (bucket 50, 20 buckets) -> vpin, flow_toxicity, informed_direction
AdaptiveConfidenceScorer -> final_score, confidence
```
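
the VPINScorer parameters (bucket 50, 20 buckets) correspond to the standard volume-bucketing procedure: fill fixed-size volume buckets in trade order, record the absolute buy/sell imbalance of each, and average over the most recent buckets. a sketch under two assumptions: trades are classified directly from `taker_side` (reduced to a hypothetical `is_buy` flag) rather than by bulk volume classification, and volumes are whole contracts.

```rust
/// One trade; `is_buy` would be derived from `taker_side` (mapping assumed).
pub struct Trade {
    pub volume: u64,
    pub is_buy: bool,
}

/// VPIN over the last `n_buckets` complete buckets of `bucket_size` contracts.
/// Returns None until enough buckets have been filled.
pub fn vpin(trades: &[Trade], bucket_size: u64, n_buckets: usize) -> Option<f64> {
    if bucket_size == 0 || n_buckets == 0 {
        return None;
    }
    let mut imbalances: Vec<u64> = Vec::new();
    let (mut buy, mut sell) = (0u64, 0u64);
    for t in trades {
        let mut remaining = t.volume;
        // A large trade may span several buckets, so split it as needed.
        while remaining > 0 {
            let room = bucket_size - (buy + sell);
            let take = remaining.min(room);
            if t.is_buy {
                buy += take;
            } else {
                sell += take;
            }
            remaining -= take;
            if buy + sell == bucket_size {
                imbalances.push(buy.abs_diff(sell));
                buy = 0;
                sell = 0;
            }
        }
    }
    if imbalances.len() < n_buckets {
        return None;
    }
    let recent = &imbalances[imbalances.len() - n_buckets..];
    let total_imbalance: u64 = recent.iter().sum();
    Some(total_imbalance as f64 / (n_buckets as u64 * bucket_size) as f64)
}
```

the result lies in [0, 1]: 0 when every bucket is evenly split between buys and sells, 1 when every bucket is one-sided.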

### research sources

- kalman filtering: https://questdb.com/glossary/kalman-filter-for-time-series-forecasting/
- VPIN/flow toxicity: https://www.stern.nyu.edu/sites/default/files/assets/documents/con_035928.pdf
- kelly criterion for prediction markets: https://arxiv.org/html/2412.14144v1
- order flow imbalance: https://www.emergentmind.com/topics/order-flow-imbalance

### thoughts for next iteration

the lower return is concerning but the improved win rate and reduced drawdown suggest the model is making better quality trades, just fewer of them. the confidence mechanism might be too conservative.

potential improvements:
1. reduce uncertainty_factor multiplier from 5.0 to 2.0-3.0
2. add a minimum confidence threshold before suppressing trades entirely
3. explore bayesian updating of the kalman filter parameters based on prediction accuracy
4. add cross-market correlation features (currently CorrelationScorer only does autocorrelation)

backtest run #4 (iteration 2)
---

**date:** 2026-01-22
**period:** 2026-01-20 00:00 to 2026-01-22 00:00 (2 days)
**initial capital:** $10,000
**interval:** 1 hour

### results summary

| metric | original config | with kalman/VPIN |
|--------|-----------------|------------------|
| total return | +$403.69 (4.04%) | +$356.82 (3.57%) |
| sharpe ratio | 3.540 | 4.052 |
| max drawdown | 1.50% | 0.85% |
| win rate | 40.9% | 60.0% |
| total trades | 22 | 5 |
| avg trade pnl | -$7.57 | $9.17 |

### iteration 2 analysis - what went wrong

**root cause identified:** the original run #1 used `CategoryWeightedScorer` with a much simpler pipeline:
- MomentumScorer
- MultiTimeframeMomentumScorer
- MeanReversionScorer
- BollingerMeanReversionScorer
- VolumeScorer
- OrderFlowScorer
- TimeDecayScorer
- CategoryWeightedScorer

subsequent iterations added:
- VolatilityScorer
- EntropyScorer
- RegimeDetector
- MomentumAccelerationScorer
- CorrelationScorer
- KalmanPriceFilter
- VPINScorer
- AdaptiveConfidenceScorer / RegimeAdaptiveScorer

**key findings:**

1. **AdaptiveConfidenceScorer caused massive trade reduction**
   - original confidence formula: `1/(1 + uncertainty*5)` with 0.1 floor
   - at uncertainty=0.5, confidence=0.29, scaling ALL weights down by 70%
   - this suppressed nearly all trading signals
   - closed-trade count dropped from 46 (run #1) to 6 (iter 1) and 5 (iter 2)

2. **adding more scorers != better predictions**
   - the additional scorers (RegimeDetector, Entropy, Correlation) added noise
   - each scorer contributes features that may conflict or dilute strong signals
   - "forecast combination puzzle" - simple equal weights often beat sophisticated methods

3. **kalman filter and VPIN didn't help**
   - removing them had no measurable impact on returns
   - they may be useful features but weren't being utilized effectively

**attempted fixes in iteration 2:**
- reduced uncertainty multiplier from 5.0 to 2.0
- raised confidence floor from 0.1 to 0.4
- added signal_strength bonus for strong raw signals
- lowered VPIN thresholds from 0.6 to 0.4
- changed confidence to post-multiplier instead of weight-scaling

**none of these fixes restored original performance**
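
the confidence formula from finding 1 is small enough to check by hand. a sketch with the multiplier and floor as parameters (`confidence` is a hypothetical function name):

```rust
/// confidence = max(floor, 1 / (1 + uncertainty * multiplier))
pub fn confidence(uncertainty: f64, multiplier: f64, floor: f64) -> f64 {
    (1.0 / (1.0 + uncertainty * multiplier)).max(floor)
}
```

at uncertainty = 0.5 the original settings (multiplier 5.0, floor 0.1) give about 0.29, while the iteration 2 settings (multiplier 2.0, floor 0.4) give 0.50, so even the relaxed version still halves every signal at that uncertainty level.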

### lessons learned

1. **simplicity wins** - the original 8-scorer pipeline with CategoryWeightedScorer worked best
2. **confidence scaling is dangerous** - multiplying weights by confidence suppresses signals too aggressively
3. **test incrementally** - should have added one scorer at a time and measured impact
4. **beware over-engineering** - the research on kalman filters and VPIN is academically interesting but added complexity without improving results
5. **preserve baseline** - should have kept the original working config in a separate branch

### next iteration direction

rather than adding more complexity, focus on:
1. restoring original simple pipeline
2. tuning existing weights based on category performance
3. improving exit logic rather than entry signals
4. maybe add ONE new feature at a time with A/B testing

backtest run #5 (iteration 3)
---

**date:** 2026-01-22
**period:** 2026-01-20 00:00 to 2026-01-22 00:00 (2 days)
**initial capital:** $10,000
**interval:** 1 hour

### results summary

| metric | strategy | random baseline | delta |
|--------|----------|-----------------|-------|
| total return | +$936.61 (+9.37%) | -$8.00 (-0.08%) | +$944.61 |
| sharpe ratio | 6.491 | -2.291 | +8.782 |
| max drawdown | 0.33% | 0.08% | +0.25% |
| win rate | 100.0% | 0.0% | +100.0% |
| total trades | 9 | 0 | +9 |
| positions (open) | 46 | 0 | +46 |
| avg trade pnl | $25.32 | $0.00 | +$25.32 |

### comparison with previous runs

| metric | run #4 (iter 2) | run #5 (iter 3) | change |
|--------|-----------------|-----------------|--------|
| total return | +4.04% | +9.37% | **+132%** |
| sharpe ratio | 3.540 | 6.491 | **+83%** |
| max drawdown | 1.50% | 0.33% | **-78%** |
| win rate | 40.9% | 100.0% | **+59.1 pts (+144%)** |
| total trades | 22 | 9 | -59% |
| avg trade pnl | -$7.57 | +$25.32 | **+$32.89** |

### key discovery: stop losses hurt prediction market returns

**root cause analysis:**

during iteration 3, we discovered that the original trades.csv data was overwritten after run #1, making it impossible to reproduce those results. this led us to investigate why the "restored" pipeline (iter 2) performed poorly.

analysis of trade logs revealed:
1. **stop losses triggered at -67% to -97%**, not at the configured -15%
2. exits only checked at hourly intervals - prices gapped through stops
3. prediction market prices can move discontinuously (binary outcomes, news)

example failed stop losses from run #4:
- KXSPACEXCOUNT: stop triggered at **-67.4%** (configured -15%)
- KXUCLBTTS: stop triggered at **-97.5%** (configured -15%)
- KXNCAAWBGAME: stop triggered at **-95.0%** (configured -15%)
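
the mechanism is easy to reproduce: when the stop is only evaluated against sampled prices, the realized loss is whatever the first sample at or below the threshold happens to be. a sketch with illustrative prices (not taken from the trade logs); `stop_exit_return` is a hypothetical helper.

```rust
/// Realized return when a stop loss is checked only at sampled prices.
/// Returns None if no sample breaches the stop.
pub fn stop_exit_return(entry: f64, sampled_prices: &[f64], stop_loss_pct: f64) -> Option<f64> {
    sampled_prices
        .iter()
        .map(|p| (p - entry) / entry)
        .find(|r| *r <= -stop_loss_pct)
}
```

with an entry at $0.40 and hourly samples of $0.39 then $0.01, a 15% stop never sees a price between $0.34 and $0.01, so `stop_exit_return(0.40, &[0.39, 0.01], 0.15)` exits at roughly -97.5%.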

### exit strategy optimization

we tested 5 exit configurations:

| config | return | sharpe | drawdown | win rate |
|--------|--------|--------|----------|----------|
| baseline (20% TP, 15% SL) | +4.04% | 3.540 | 1.50% | 40.9% |
| 100% TP, no SL | +9.44% | 6.458 | 0.55% | 100% |
| resolution only | +7.16% | 4.388 | 2.12% | n/a |
| **50% TP, no SL** | **+9.37%** | **6.491** | **0.33%** | **100%** |
| 75% TP, no SL | +9.28% | 6.381 | 0.45% | 100% |

**winner: 50% take profit, no stop loss**
- highest sharpe ratio (6.491)
- lowest max drawdown (0.33%)
- good capital recycling (9 closed trades vs 4)

### implementation changes

**new default exit config (src/types.rs):**
```rust
take_profit_pct: 0.50,   // exit at +50% (was 0.20)
stop_loss_pct: 0.99,     // disabled (was 0.15)
max_hold_hours: 48,      // shorter (was 72)
score_reversal_threshold: -0.5,
```

**rationale:**
1. **stop losses don't work** for prediction markets
   - prices gap through hourly checks
   - binary outcomes mean temporary price drops don't invalidate bets
   - position sizing limits max loss instead

2. **50% take profit** balances two goals:
   - locks in gains before potential reversal
   - lets winners run further than 20% (which cut gains short)

3. **shorter hold time (48h)** for 2-day backtests
   - ensures positions resolve or exit within test period

### lessons learned

1. **prediction markets ≠ traditional trading**
   - traditional stop losses assume continuous price paths
   - binary outcomes can cause discontinuous jumps
   - holding to resolution is often optimal

2. **exit strategy matters as much as entry**
   - iteration 3 used the SAME entry signals as iteration 2
   - only changed exit parameters
   - return increased 132% (4.04% → 9.37%)

3. **test before theorizing**
   - academic research on stop losses assumes continuous markets
   - empirical testing revealed the opposite for prediction markets

### research sources

- optimal trailing stop (Leung & Zhang 2021): https://medium.com/quantitative-investing/optimal-trading-with-a-trailing-stop-796964fc892a
- forecast combination: https://www.sciencedirect.com/science/article/abs/pii/S0169207021000650
- exit strategies empirical: https://www.quantifiedstrategies.com/trading-exit-strategies/

### thoughts for next iteration

the exit strategy optimization was a major win. next iteration should consider:

1. **position sizing optimization**
   - current kelly fraction is 0.25, may be too conservative
   - with 100% win rate, could increase bet sizing

2. **entry signal filtering**
   - 46 positions still open at end of backtest
   - could add filters to reduce position count for capital efficiency

3. **category-specific exit tuning**
   - sports markets may need different exits than politics
   - crypto markets have different volatility profiles

4. **longer backtest period**
   - current data covers only 2 days
   - need to test across different market conditions

backtest run #6 (iteration 4)
---

**date:** 2026-01-22
**period:** 2026-01-20 00:00 to 2026-01-22 00:00 (2 days)
**initial capital:** $10,000
**interval:** 1 hour

### results summary

| metric | strategy | random baseline | delta |
|--------|----------|-----------------|-------|
| total return | +$1,898.45 (+18.98%) | $0.00 (0.00%) | +$1,898.45 |
| sharpe ratio | 2.814 | 0.000 | +2.814 |
| max drawdown | 0.79% | 0.00% | +0.79% |
| win rate | 100.0% | 0.0% | +100.0% |
| total trades | 10 | 0 | +10 |
| positions (open) | 100 | 0 | +100 |

### comparison with previous runs

| metric | iter 3 | iter 4 | change |
|--------|--------|--------|--------|
| total return | +9.37% | **+18.98%** | **+102%** |
| sharpe ratio | 6.491 | 2.814 | -57% |
| max drawdown | 0.33% | 0.79% | +139% |
| win rate | 100.0% | 100.0% | 0% |
| total trades | 9 | 10 | +11% |
| positions | 46 | 100 | +117% |

### key discovery: diversification beats concentration in prediction markets

**surprising finding:** in this 2-day backtest, concentration hurt returns: raising max_positions from 5 to 1000 lifted the return from 0.24% to 105.55%.

this contradicts conventional wisdom ("best ideas outperform") but makes sense for binary outcomes:

| max_positions | return | sharpe | win rate | trades |
|---------------|--------|--------|----------|--------|
| 5 | 0.24% | 0.986 | 100% | 1 |
| 10 | 0.47% | 1.902 | 100% | 2 |
| 30 | 3.12% | 3.109 | 100% | 3 |
| 50 | 7.97% | 2.593 | 100% | 5 |
| 100 | 18.98% | 2.814 | 100% | 10 |
| 200 | 38.88% | 2.995 | 97.5% | 40 |
| 500 | 96.10% | 3.295 | 95.4% | 87 |
| 1000 | **105.55%** | **3.495** | 95.7% | 94 |

**why diversification wins for prediction markets:**

1. **binary payouts** - each position has positive expected value
   - more positions = more chances to capture binary wins
   - unlike stocks, losers go to 0 quickly (can't average down)

2. **model has positive edge**
   - if scoring model has +EV on average, more bets = more profit
   - law of large numbers favors diversification

3. **capital utilization**
   - concentrated portfolios leave cash idle
   - diversified approach deploys all capital
   - with 1000 positions, cash went to $0.00

4. **different from stock picking**
   - "best ideas" research assumes winners can compound
   - prediction markets resolve quickly (days/weeks)
   - can't hold winners long-term

### bug fix: max_positions enforcement

discovered that max_positions wasn't being enforced - positions accumulated each hour without limit. added check in backtest loop:

```rust
for signal in signals {
    // enforce max_positions limit
    if context.portfolio.positions.len() >= self.config.max_positions {
        break;
    }
    // ...
}
```

### implementation changes

**new defaults:**
```rust
// src/main.rs CLI defaults
max_positions: 100      // was 5
kelly_fraction: 0.40    // was 0.25
max_position_pct: 0.30  // was 0.25

// src/execution.rs PositionSizingConfig
kelly_fraction: 0.40
max_position_pct: 0.30
```

### note on sharpe ratio decrease

sharpe dropped from 6.491 (iter 3) to 2.814 (iter 4) despite 2x higher returns because:
- more positions = more variance in equity curve
- sharpe measures risk-adjusted returns
- still a strong positive sharpe (>1.0 is generally good)

the trade-off is worth it: double the returns in exchange for a lower risk-adjusted ratio.

### research sources

- kelly criterion for prediction markets: https://arxiv.org/html/2412.14144
- concentrated portfolios: https://www.bbh.com/us/en/insights/capital-partners-insights/the-benefits-of-concentrated-portfolios.html
- position sizing research: https://thescienceofhitting.com/p/position-sizing

### thoughts for next iteration

iteration 4 was a paradigm shift. next iteration should consider:

1. **push diversification further**
   - 1000 positions gave 105% return (2x capital!)
   - limited by cash, not max_positions
   - could explore leverage or smaller position sizes

2. **validate with longer backtest**
   - 2-day window is very short
   - need to test if diversification holds across market regimes

3. **position sizing optimization**
   - current kelly approach may not be optimal
   - with many positions, equal weighting might work better

4. **transaction costs**
   - many positions = many transactions
   - need to model realistic slippage and fees

5. **examine edge by category**
   - sports vs politics vs crypto
   - may find some categories have stronger edge