I Built an AI Trading System Where Agents Argue Before Spending My Money

CSE Grad (AI & ML), MSRIT ’25 | Love building practical tech, contributing to open-source, and sharing my learnings. Always exploring new ideas to make tech more useful.

Here's the idea: what if you didn't let one AI model make trading decisions? What if you made two of them fight about it first?

I built sudo-trade — a multi-agent system that researches Indian stocks, screens for opportunities, then forces a bull agent and a bear agent to argue the case before a neutral judge renders a verdict. All running autonomously during NSE market hours. Paper trading for now. Real money when I trust it enough.

The whole thing is ~7,000 lines of Python, no frameworks. Just asyncio, an event bus, and a bunch of LLMs that disagree with each other for a living.

Why

I wanted to know if one developer with Python and access to Claude could build something that actually trades intelligently — not just a wrapper around an API that fires market orders when RSI crosses 30.

Most AI trading projects I've seen are either toy demos ("here's a ChatGPT prompt that says buy RELIANCE") or enterprise platforms that cost six figures. There's nothing in between. I wanted the in-between.

The rule was simple: every component has to be swappable, every decision has to be explainable, and AI models have to earn their keep — not just rubber-stamp signals.

The Debate Mechanism

This is the part I'm most proud of. When the screener picks a stock, it doesn't go straight to execution. Instead, two separate agents get spun up:

BULL_SYSTEM = """You are a senior buy-side analyst who is BULLISH on the stock.
Build the strongest possible case for BUYING this stock.
Be specific — cite numbers, patterns, and catalysts. No vague optimism."""

BEAR_SYSTEM = """You are a senior risk analyst who is BEARISH on the stock.
Build the strongest possible case AGAINST buying this stock.
Be specific — cite numbers, patterns, and warnings. No vague pessimism."""

They argue for two rounds. Bull makes the case, bear tears it apart, bull rebuts, bear rebuts. Then a third LLM — a neutral "portfolio manager" — reads both sides and renders a verdict: strong_buy, buy, hold, sell, or strong_sell.
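The engine is private, so the following is only a minimal sketch of how a two-round debate with a judge could be wired up, not the actual sudo-trade code. The `LLM` callable type, `run_debate`, and the judge prompt are all hypothetical names introduced for illustration; the only details taken from the description above are the bull/bear turn order, the two rounds, and the five possible verdicts.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical LLM interface: (system_prompt, user_prompt) -> reply text.
LLM = Callable[[str, str], Awaitable[str]]

VERDICTS = ("strong_buy", "buy", "hold", "sell", "strong_sell")


async def run_debate(symbol: str, bull: LLM, bear: LLM, judge: LLM,
                     bull_system: str, bear_system: str,
                     rounds: int = 2) -> tuple[str, list[tuple[str, str]]]:
    """Alternate bull and bear turns, then ask the judge for a verdict."""
    transcript: list[tuple[str, str]] = []
    for _ in range(rounds):
        for speaker, llm, system in (("bull", bull, bull_system),
                                     ("bear", bear, bear_system)):
            # Each agent sees the full transcript so far and must respond to it.
            history = "\n".join(f"{who}: {text}" for who, text in transcript)
            prompt = f"Stock: {symbol}\nDebate so far:\n{history or '(none)'}"
            transcript.append((speaker, await llm(system, prompt)))

    history = "\n".join(f"{who}: {text}" for who, text in transcript)
    raw = await judge(
        "You are a neutral portfolio manager. Reply with one of: "
        + ", ".join(VERDICTS),
        f"Stock: {symbol}\n{history}",
    )
    verdict = raw.strip().lower()
    # Fall back to "hold" if the judge replies with anything unexpected.
    return (verdict if verdict in VERDICTS else "hold"), transcript


if __name__ == "__main__":
    # Stub models so the sketch runs without any API key.
    async def stub_bull(system: str, prompt: str) -> str:
        return "Order book is growing."

    async def stub_bear(system: str, prompt: str) -> str:
        return "Margins are shrinking."

    async def stub_judge(system: str, prompt: str) -> str:
        return "hold"

    print(asyncio.run(run_debate("INFY", stub_bull, stub_bear, stub_judge,
                                 "bull prompt", "bear prompt")))
```

Passing the models in as plain callables keeps the loop independent of any particular provider, which matches the swappable-component rule stated earlier.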

The key insight: the system prompts tell the agents "no vague optimism" and "no vague pessimism." Force specificity and the arguments actually get good. The bull agent can't just say "strong fundamentals"; it has to cite the actual USFDA clearance or the pipeline of ANDAs. The bear has to name the actual warning letters.

On March 19th, the system ran 98 debates across stocks like POLYCAB, TATAELXSI, COLPAL, INFY, and BOSCHLTD. Total LLM cost for the day: $0.87.

Everything Is a Plugin

I didn't want a monolith. Every layer is a plugin that registers with the engine:

engine.add("broker", GrowwBroker(role=BrokerRole.DATA))
engine.add("broker", KiteBroker(role=BrokerRole.EXECUTION))
engine.add("agent", master_agent)
engine.add("executor", PaperExecutor(initial_capital=500_000))

Brokers, data providers, analyzers, LLM clients, strategies, executors, interfaces — all implement a Protocol with name, start(), stop(). The engine doesn't know what they do. It just starts them in order and stops them in reverse.
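To make the lifecycle concrete, here is a rough sketch of a plugin Protocol and an engine that starts plugins in registration order and stops them in reverse. It is an illustration under assumptions, not the private engine: `Plugin`, `Engine`, and `Recorder` are hypothetical names, and treating `start()` and `stop()` as coroutines is a guess based on the asyncio design.

```python
import asyncio
from typing import Protocol


class Plugin(Protocol):
    """Anything with a name and an async start/stop pair qualifies."""
    name: str

    async def start(self) -> None: ...
    async def stop(self) -> None: ...


class Engine:
    def __init__(self) -> None:
        self._plugins: list[tuple[str, Plugin]] = []

    def add(self, kind: str, plugin: Plugin) -> None:
        # The engine stores the plugin without inspecting what it does.
        self._plugins.append((kind, plugin))

    async def start(self) -> None:
        for _, plugin in self._plugins:
            await plugin.start()

    async def stop(self) -> None:
        # Reverse order, so later plugins shut down before their dependencies.
        for _, plugin in reversed(self._plugins):
            await plugin.stop()


class Recorder:
    """Toy plugin that logs its lifecycle calls to a shared list."""

    def __init__(self, name: str, log: list[str]) -> None:
        self.name = name
        self._log = log

    async def start(self) -> None:
        self._log.append(f"start:{self.name}")

    async def stop(self) -> None:
        self._log.append(f"stop:{self.name}")
```

Because `Protocol` uses structural typing, `Recorder` never has to inherit from `Plugin`; having the right attributes is enough.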

Components never import each other. They talk through an async EventBus:

# Screener doesn't know MasterAgent exists
await self._events.emit("agent:screened", symbols=["POLYCAB", "INFY", "COLPAL"])

# Master listens, doesn't know who screened
self._events.on("agent:screened", self._on_screened)

This means I can swap Groww for Zerodha for data, swap Claude for GPT for the debaters, swap paper execution for live execution — without touching anything else. I tested this by running the entire pipeline with a $0.001/call model for screening and Claude Opus for debates. Different models, different API keys, different endpoints per agent. Zero code changes.
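For readers who have not built one, an async event bus with the `on`/`emit` shape used above can be very small. This is a sketch under assumptions (handlers are coroutines, payloads are keyword arguments, handlers run one after another); the real EventBus may behave differently.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable

Handler = Callable[..., Awaitable[None]]


class EventBus:
    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        """Register a coroutine to be called whenever `event` is emitted."""
        self._handlers[event].append(handler)

    async def emit(self, event: str, **payload: Any) -> None:
        # Handlers run sequentially in registration order.
        # An event with no listeners is simply a no-op.
        for handler in list(self._handlers.get(event, ())):
            await handler(**payload)


if __name__ == "__main__":
    async def on_screened(symbols: list[str]) -> None:
        print("debating", symbols)

    bus = EventBus()
    bus.on("agent:screened", on_screened)
    asyncio.run(bus.emit("agent:screened", symbols=["POLYCAB", "INFY"]))
```

The emitter and the listener share only the event name and the payload keys, which is what makes swapping either side possible.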

The Pipeline

Every market day, the system follows NSE hours:

9:00 AM — ResearchAgent scans RSS feeds from MoneyControl and Economic Times. Points Claude at raw headlines and says "figure out what matters." It extracts symbols, impact direction, significance, and time horizon.

9:15 AM — ScreenerAgent fetches live quotes for 120 stocks, sorts by momentum, then asks an LLM to rank the top 5 intraday candidates. Two-stage filter: quantitative first (free), then qualitative LLM ranking (costs tokens).

9:15–12:00 — Debates run. Bull and bear argue. Consensus judge scores them. If confidence > 60%, AnalysisAgent runs sentiment analysis. MasterAgent makes the final call with full context: debate verdict, analysis signals, current portfolio, and available capital.

2:00 PM — Closing phase. No new positions.

3:30 PM — Daily report. Cost breakdown per agent, decisions made, P&L summary.
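The two-stage filter in the 9:15 AM step could look roughly like this. It is a sketch, not the ScreenerAgent itself: the quote fields (`last`, `prev_close`), the percent-change momentum measure, the `shortlist_size` parameter, and the `rank_with_llm` callable are all assumptions made for illustration.

```python
from typing import Callable


def momentum(quote: dict) -> float:
    """Percent change from previous close; a stand-in momentum measure."""
    return (quote["last"] - quote["prev_close"]) / quote["prev_close"] * 100


def screen(quotes: list[dict],
           rank_with_llm: Callable[[list[str]], list[str]],
           shortlist_size: int = 20, top_n: int = 5) -> list[str]:
    # Stage 1: quantitative sort over every quote, no tokens spent.
    by_momentum = sorted(quotes, key=momentum, reverse=True)
    shortlist = [q["symbol"] for q in by_momentum[:shortlist_size]]

    # Stage 2: qualitative ranking by the LLM, only over the shortlist.
    ranked = rank_with_llm(shortlist)

    # Ignore any symbol the LLM returns that was not in the shortlist.
    allowed = set(shortlist)
    return [s for s in ranked if s in allowed][:top_n]
```

Running the cheap numeric pass first means the token-costing step only ever sees a small fraction of the 120 quotes.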

The scheduler knows IST, knows NSE holidays (Republic Day, Holi, Eid — all hardcoded), and skips weekends. There's a force_active mode for testing outside market hours.
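A scheduler like this mostly boils down to one function that maps a timestamp to a phase. The sketch below is a guess at that shape, not the real scheduler: `market_phase` and the phase names are hypothetical, the holiday set holds a single placeholder entry rather than the full NSE calendar, the 9:15 AM to 2:00 PM window is collapsed into one "trading" phase, and returning "trading" outside hours under `force_active` is an assumption.

```python
from datetime import date, datetime, time, timedelta, timezone

# India does not observe DST, so a fixed UTC+05:30 offset is sufficient.
IST = timezone(timedelta(hours=5, minutes=30))

# Placeholder: Republic Day only, not the full NSE holiday list.
HOLIDAYS = frozenset({date(2026, 1, 26)})


def market_phase(now: datetime, holidays: frozenset = HOLIDAYS,
                 force_active: bool = False) -> str:
    """Map a timezone-aware timestamp to a pipeline phase."""
    local = now.astimezone(IST)
    if not force_active and (local.weekday() >= 5 or local.date() in holidays):
        return "closed"  # weekend or exchange holiday

    t = local.time()
    if time(9, 0) <= t < time(9, 15):
        return "research"
    if time(9, 15) <= t < time(14, 0):
        return "trading"
    if time(14, 0) <= t < time(15, 30):
        return "closing"  # no new positions
    # Outside market hours: only active when explicitly forced for testing.
    return "trading" if force_active else "closed"
```

Converting to IST inside the function means callers can pass timestamps in any timezone, as long as they are timezone-aware.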

What It Actually Costs

Real numbers from a full trading day (March 19, 2026):

| Agent | Calls | Tokens | Cost |
|---|---|---|---|
| Debaters (bull + bear) | 127 | 213K | $0.65 |
| Master (decisions) | 34 | 83K | $0.19 |
| Researcher | 10 | 46K | $0.03 |
| Executor | 15 | 14K | $0.006 |
| Screener | 14 | 15K | $0.001 |
| Total | 200 | 370K | $0.87 |

Under a dollar for a full day of autonomous research, screening, 98 debates, and trade decisions across 15+ stocks. I have a $50/day budget configured and I've never hit even $2. Per-agent budgets auto-gate: if the debaters blow their budget, they stop arguing and the master falls back to simple heuristics.
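The auto-gating idea can be sketched in a few lines. Everything named here (`Budget`, `decide`, the per-call cost estimate) is hypothetical and only illustrates the fallback pattern described above; it is not the engine's budget code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Budget:
    limit_usd: float
    spent_usd: float = 0.0

    def allows(self, estimated_cost_usd: float) -> bool:
        """True if spending the estimate would stay within the limit."""
        return self.spent_usd + estimated_cost_usd <= self.limit_usd

    def record(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd


def decide(symbol: str, debater_budget: Budget,
           debate: Callable[[str], tuple[str, float]],
           heuristic: Callable[[str], str],
           est_cost_usd: float = 0.01) -> tuple[str, str]:
    """Debate if the debaters can afford it, otherwise use the fallback."""
    if debater_budget.allows(est_cost_usd):
        verdict, cost = debate(symbol)  # debate returns (verdict, actual cost)
        debater_budget.record(cost)
        return verdict, "debate"
    # Budget exhausted: skip the LLM entirely and use the cheap heuristic.
    return heuristic(symbol), "heuristic"
```

Checking the budget before the call, rather than after, is what keeps an expensive agent from overshooting its limit by more than one call's worth of tokens.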

The Dashboard

The engine runs headless — just a Python process with an HTTP API. The dashboard is a separate React app that connects via WebSocket and shows everything in real-time: debates unfolding argument by argument, agents changing state, trades executing, P&L ticking.

I kept the engine repo private and made the dashboard public. The engine is the brain; the dashboard is the face.

During the India-Pakistan tension in March 2026, the system ran through 10 days of high-volatility sessions. Starting capital ₹10,00,000, current value ₹23,66,000+. The debate mechanism was especially useful here: it forced the system to argue both sides during panic selling instead of blindly following momentum.

What I'd Do Differently

The consensus engine was too conservative out of the box. For the first few days, every verdict came back as "hold" with 65% confidence, until I tuned the system prompts to bias toward action. An AI that never trades is useless.

I also learned that LLM temperature matters more than the model. Debaters at 0.7 (creative) produce genuinely different arguments round to round. The consensus judge at 0.3 (conservative) stays sober. The master at 0.3 doesn't get spooked. Getting this wrong meant either boring debates or erratic decisions.

What's Next

Live execution via Kite Connect — the protocol is already there, just need to wire up the real place_order(). Technical analysis as a second analyzer alongside sentiment. And a proper backtest of the debate mechanism against a simple momentum strategy to see if arguing actually helps.

The whole thing runs on my MacBook right now. It could run on a ₹500/month VPS. That's the point — this isn't infrastructure-heavy. It's just Python, some API keys, and a few models that argue about stocks for less than a dollar a day.


The sudo-trade dashboard is open source at github.com/myselfshravan/sudo-trade-dashboard. The engine is private.