autonomous trader agent | sarthak biswas

autonomous trading system for indian equity markets using cross-sectional reversal scoring on 96 nifty stocks. backtested at 8.6% cagr with a 60% win rate over 5.4 years.

/ the strategy

cross-sectional reversal — ranks 96 nifty stocks by magnitude of decline over a 5-21 day lookback, buys the most oversold, holds for 5 trading days
the edge is behavioral: panic selling pushes stocks below fair value, creating a mean-reversion opportunity that algorithms can't easily arbitrage away
information coefficient: +0.020 (large-cap), +0.025 (midcap) — a small but consistent edge compounded over thousands of trades
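
a minimal sketch of the ranking step described above, not the project's actual code. the intermediate 10-day window, the simple averaging across windows, and the names `reversal_scores` and `pick_most_oversold` are assumptions made for illustration; the source only states a 5-21 day lookback and that the most oversold stocks are bought.

```python
from typing import Dict, List, Sequence


def reversal_scores(
    closes: Dict[str, Sequence[float]],
    lookbacks: Sequence[int] = (5, 10, 21),  # assumed windows inside the 5-21 day range
) -> Dict[str, float]:
    """Score each stock by its average fractional decline over the lookbacks.

    A positive score means the stock has fallen; larger means more oversold.
    """
    scores: Dict[str, float] = {}
    for symbol, prices in closes.items():
        declines = []
        for n in lookbacks:
            if len(prices) > n:
                past, last = prices[-1 - n], prices[-1]
                if past > 0:
                    declines.append((past - last) / past)
        if declines:
            scores[symbol] = sum(declines) / len(declines)
    return scores


def pick_most_oversold(scores: Dict[str, float], top_n: int = 5) -> List[str]:
    """Return up to top_n symbols with the largest positive decline score."""
    ranked = sorted(scores, key=lambda s: scores[s], reverse=True)
    return [s for s in ranked[:top_n] if scores[s] > 0]


# toy data: one flat stock, one that dropped 10%, one that rose 10%
closes = {
    "AAA": [100.0] * 22,
    "BBB": [100.0] * 21 + [90.0],
    "CCC": [100.0] * 21 + [110.0],
}
picks = pick_most_oversold(reversal_scores(closes), top_n=2)
```

in this toy example only `BBB` is selected, because flat and rising stocks have no positive decline score. the 5-day holding period would be handled by whatever schedules the exits, which is outside this sketch.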

/ research journey

tested 6 strategies systematically before finding the edge
5 failed: intraday ml prediction, breakout detection, 5-min mean reversion, 30-min trend following, cross-sectional ml — indian large-cap stocks are too efficient at intraday resolution
daily reversal was the only signal that survived — driven by human psychology, not technical patterns
evolved through 4 versions of allocation logic, each improving capital efficiency — the underlying signal never changed

/ how it works

3-state regime classifier (bull/neutral/weak) using nifty vs 50-dma, momentum, and market breadth with a 2-day persistence filter
adaptive confidence scoring: continuous 0-1 score combining ic, rolling win rate, momentum, and breadth for smooth capital allocation
risk controls: regime-based exposure gates, soft drawdown dampening, recovery boost, kill switches on declining win rates or negative ic, panic filters
a/b pipeline testing with independent scan intervals, capital pools, and paper broker instances for isolated comparison

/ results

backtested over 5.4 years (oct 2020 – jan 2025): 8.6% cagr, 42% total return, 60% win rate
survived the 2025-26 bear market with 6.5% cagr and 9-16% max drawdown
large-cap returns: +38% | midcap returns: +108% (2.8x as high)
~52% average capital deployment — the rest held as a protective cash buffer

/ what's next

this agent is the target policy model for rl training; the stock-trader-env project provides the verifiable reward environment
goal: use grpo to train the agent's decision-making on thousands of simulated rollouts, optimizing for sharpe ratio and risk discipline
replacing rule-based scoring with a learned policy that adapts to market conditions
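
for context, the core of grpo is a group-relative advantage: each rollout's reward is compared with the mean and standard deviation of its own group of rollouts. the sketch below illustrates that step with a per-rollout sharpe ratio as the reward. the function names, the unannualized sharpe, and the use of the population standard deviation are assumptions for illustration, not details of the planned training setup.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def rollout_sharpe(returns: Sequence[float]) -> float:
    """Unannualized Sharpe ratio of one rollout's per-step returns."""
    sigma = pstdev(returns)
    return mean(returns) / sigma if sigma > 0 else 0.0


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its group's mean and standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# one group of simulated rollouts for the same starting market state
group_returns = [
    [0.01, -0.005, 0.02],
    [0.00, 0.00, 0.00],
    [-0.01, -0.02, 0.005],
]
rewards = [rollout_sharpe(r) for r in group_returns]
advantages = group_relative_advantages(rewards)
```

rollouts that beat their group's average get a positive advantage and are reinforced; the rest are pushed down. no separate value network is needed for this step.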

/ how it works

01 regime classifier evaluates market conditions (bull/neutral/weak)

02 confidence scorer computes allocation weight from ic, win rate, momentum, breadth

03 reversal scanner ranks stocks by decline magnitude across lookback windows

04 risk guardian validates exposure limits, drawdown gates, and kill switches

05 trade executor places orders via zerodha kite connect (cnc for swing holding)

/ features

cross-sectional reversal scoring

ranks 96 nifty stocks by decline magnitude. information coefficient: +0.020 (large-cap), +0.025 (midcap). exploits behavioral overreaction — a structural edge driven by psychology, not patterns algorithms can arbitrage away.

3-state regime classifier

classifies market as bull (65-85% exposure), neutral (50-75%), or weak (8-40%) using nifty vs 50-dma, momentum, and breadth. 2-day persistence filter prevents whipsawing.
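
a minimal sketch of how a 3-state classifier with a 2-day persistence filter could look. the exposure bands come from the text above; the voting rule, the thresholds (momentum above 0, breadth above 0.5), and the names `raw_regime` and `RegimeFilter` are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# exposure bands (min, max) per regime, as stated in the text
EXPOSURE_BANDS = {"bull": (0.65, 0.85), "neutral": (0.50, 0.75), "weak": (0.08, 0.40)}


def raw_regime(nifty_close: float, dma_50: float, momentum: float, breadth: float) -> str:
    """Hypothetical voting rule: all three inputs must agree for bull or weak."""
    votes = 0
    votes += 1 if nifty_close > dma_50 else -1
    votes += 1 if momentum > 0 else -1
    votes += 1 if breadth > 0.5 else -1  # breadth = fraction of advancing stocks
    if votes == 3:
        return "bull"
    if votes == -3:
        return "weak"
    return "neutral"


@dataclass
class RegimeFilter:
    """Switch regime only after the new reading persists for N consecutive days."""
    persistence_days: int = 2
    current: str = "neutral"
    _candidate: Optional[str] = None
    _streak: int = 0

    def update(self, raw: str) -> str:
        if raw == self.current:
            self._candidate, self._streak = None, 0
        elif raw == self._candidate:
            self._streak += 1
        else:
            self._candidate, self._streak = raw, 1
        if self._candidate is not None and self._streak >= self.persistence_days:
            self.current = self._candidate
            self._candidate, self._streak = None, 0
        return self.current
```

with this filter a single-day flip in the raw reading leaves the regime, and therefore the exposure band, unchanged.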

adaptive confidence scoring

continuous 0-1 scoring combining information coefficient, rolling win rate, momentum, and market breadth. replaces hard thresholds for smoother capital allocation.
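
one way such a continuous score could be built is a weighted average of the four inputs after each is scaled to the 0-1 range. everything numeric below (weights, scaling ranges) and the mapping from score to exposure inside a regime band are assumptions for illustration, not the project's calibrated values.

```python
from typing import Sequence, Tuple


def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def confidence_score(
    ic: float,
    win_rate: float,
    momentum: float,
    breadth: float,
    weights: Sequence[float] = (0.3, 0.3, 0.2, 0.2),  # hypothetical weights
) -> float:
    """Blend four signals into one 0-1 score (all scalings are assumed)."""
    parts = (
        _clamp01(ic / 0.05),                # IC of 0.05 or more counts as full marks
        _clamp01((win_rate - 0.4) / 0.3),   # 40% win rate -> 0, 70% -> 1
        _clamp01(0.5 + momentum / 0.10),    # momentum of -5%..+5% maps to 0..1
        _clamp01(breadth),                  # breadth already a 0-1 fraction
    )
    return sum(w * p for w, p in zip(weights, parts)) / sum(weights)


def target_exposure(score: float, band: Tuple[float, float]) -> float:
    """Interpolate inside a regime's (min, max) exposure band."""
    low, high = band
    return low + _clamp01(score) * (high - low)


# a mid-confidence day in a bull regime (65-85% band from the text)
exposure = target_exposure(0.5, (0.65, 0.85))
```

because the score moves continuously, a small change in win rate or ic shifts exposure by a small amount instead of flipping it across a hard threshold.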

a/b pipeline testing

two independent pipelines with separate scan intervals and capital pools. each pipeline runs its own paper broker instance for isolated comparison.
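
the isolation described above can be sketched with one paper broker object per pipeline, so that fills in one pipeline never touch the other's cash. the class names, scan intervals, capital amounts, and the symbol are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PaperBroker:
    """Simulated broker holding its own cash and positions."""
    cash: float
    positions: Dict[str, int] = field(default_factory=dict)

    def buy(self, symbol: str, qty: int, price: float) -> bool:
        cost = qty * price
        if qty <= 0 or cost > self.cash:
            return False  # reject orders the capital pool cannot fund
        self.cash -= cost
        self.positions[symbol] = self.positions.get(symbol, 0) + qty
        return True

    def sell(self, symbol: str, qty: int, price: float) -> bool:
        held = self.positions.get(symbol, 0)
        if qty <= 0 or qty > held:
            return False  # no short selling in this sketch
        self.cash += qty * price
        if held == qty:
            del self.positions[symbol]
        else:
            self.positions[symbol] = held - qty
        return True

    def equity(self, prices: Dict[str, float]) -> float:
        return self.cash + sum(q * prices[s] for s, q in self.positions.items())


@dataclass
class Pipeline:
    name: str
    scan_interval_minutes: int
    broker: PaperBroker


# two pipelines, each with its own capital pool and broker instance
pipeline_a = Pipeline("A", 15, PaperBroker(cash=500_000.0))
pipeline_b = Pipeline("B", 60, PaperBroker(cash=500_000.0))
pipeline_a.broker.buy("INFY", 10, 1500.0)
```

after the buy, only pipeline a's cash changes, so the two equity curves can be compared directly.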

risk management layers

regime-based exposure gates, soft drawdown dampening (gentle in bull, aggressive in weak), recovery boost when signal improves during drawdown recovery, and kill switches that pause trading on declining win rates or negative ic.
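
a rough sketch of two of these layers: drawdown dampening that is gentler in bull regimes than in weak ones, and a kill switch. the linear dampening, the slope values, and the fixed win-rate floor (a simplification of "declining win rates") are assumptions; recovery boost and panic filters are left out.

```python
# hypothetical slopes: how fast exposure shrinks per unit of drawdown
DAMPENING_SLOPE = {"bull": 1.0, "neutral": 2.0, "weak": 4.0}


def dampened_exposure(base_exposure: float, drawdown: float, regime: str) -> float:
    """Scale exposure down linearly as drawdown deepens, never below zero.

    drawdown is a positive fraction, e.g. 0.10 for a 10% drop from the peak.
    """
    factor = max(0.0, 1.0 - DAMPENING_SLOPE[regime] * drawdown)
    return base_exposure * factor


def kill_switch_tripped(
    rolling_win_rate: float,
    rolling_ic: float,
    min_win_rate: float = 0.45,  # assumed floor standing in for "declining"
) -> bool:
    """Pause trading when the signal's recent quality has broken down."""
    return rolling_ic < 0.0 or rolling_win_rate < min_win_rate


# the same 10% drawdown cuts exposure far more in a weak regime
bull_exposure = dampened_exposure(0.80, 0.10, "bull")
weak_exposure = dampened_exposure(0.80, 0.10, "weak")
```

the dampening is "soft" because it reduces position size gradually, while the kill switch is a hard stop that applies regardless of regime.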

research-driven development

tested 6 strategies systematically before finding the edge. 5 failed (ml prediction, breakouts, intraday mean reversion, trend following, cross-sectional ml). every version improvement came from better capital allocation — the signal never changed.