Building a Portfolio of Trading Models: Why One Good Backtest Is Not Enough
CuteMarkets Team
Research

Repository reference: cutebacktests
Abstract
One good backtest is not a portfolio. It is at most a candidate input to a portfolio. That distinction became one of the central conclusions of this repository over the last two months, and it changed the research objective itself.
The transition is explicit in Episode 5 and PAPER_BOTS.md. The project moved away from optimizing a broad, standalone ORB search and toward building a small, diversified paper-bot portfolio. That is a much harder problem because a strategy now has to be credible not only on its own but also alongside the other sleeves.
Question
The practical question is not whether one model can make money. The useful question is whether a set of models can coexist with low enough overlap and high enough credibility to justify a live portfolio.
That is why portfolio thinking changes the gate. A strategy that looks excellent in isolation may still be a weak portfolio addition if it overlaps too much with the current anchor, fails offset tests, or is too sparse to pull its weight.
Method: Why a Portfolio of Trading Models Needs More Than Raw PnL
This repository's roadmap now frames the task directly. In paper_bot_portfolio_r1/roadmap.json, the goal is to build a small diversified paper-bot portfolio rather than to keep searching broadly for a single ORB winner.
That shift changes the evaluation object. Instead of asking "which branch has the nicest backtest," the repo now asks questions such as:
- does the branch survive under stress scenarios?
- does it trade often enough to matter?
- does it overlap too much with current leaders?
- does it offset drawdowns in the right regimes?
- is the option path clean enough to operate?
This is why c4's gate included orb_overlap_days, c66_overlap_days, and offset_ratio_on_orb_down_days. Those are portfolio questions, not isolated backtest questions.
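To make those portfolio questions concrete, here is a minimal sketch of how overlap and offset metrics of this kind could be computed. The definitions are assumptions for illustration, not the repository's actual formulas: overlap is taken as the number of days on which both sleeves traded, and the offset ratio as the share of the anchor's down-day losses that the candidate earned back on those same days. The function names, the sample data, and the gate thresholds are all hypothetical.

```python
from datetime import date


def overlap_days(candidate_days: set[date], anchor_days: set[date]) -> int:
    """Count the days on which both sleeves traded (assumed definition)."""
    return len(candidate_days & anchor_days)


def offset_ratio_on_down_days(
    candidate_pnl: dict[date, float], anchor_pnl: dict[date, float]
) -> float:
    """Share of the anchor's down-day losses recovered by the candidate.

    Assumed definition: candidate PnL summed over the anchor's losing days,
    divided by the anchor's total loss on those days.
    """
    down_days = [d for d, pnl in anchor_pnl.items() if pnl < 0]
    anchor_loss = -sum(anchor_pnl[d] for d in down_days)
    if anchor_loss == 0:
        return 0.0  # no anchor losses to offset
    candidate_gain = sum(candidate_pnl.get(d, 0.0) for d in down_days)
    return candidate_gain / anchor_loss


def passes_gate(
    overlap: int, offset: float, max_overlap_days: int, min_offset_ratio: float
) -> bool:
    """Hypothetical gate: limited overlap and a minimum offset on down days."""
    return overlap <= max_overlap_days and offset >= min_offset_ratio


# Made-up daily PnL for an anchor sleeve and a candidate sleeve.
anchor_pnl = {
    date(2024, 1, 2): -100.0,
    date(2024, 1, 3): 50.0,
    date(2024, 1, 4): -200.0,
    date(2024, 1, 5): 30.0,
}
candidate_pnl = {
    date(2024, 1, 2): 60.0,
    date(2024, 1, 4): 30.0,
    date(2024, 1, 8): 40.0,
}

overlap = overlap_days(set(candidate_pnl), set(anchor_pnl))
offset = offset_ratio_on_down_days(candidate_pnl, anchor_pnl)
print(overlap, round(offset, 2))
print(passes_gate(overlap, offset, max_overlap_days=3, min_offset_ratio=0.25))
```

Neither number can be computed from a candidate's backtest alone: both need the anchor's trade calendar and daily PnL as inputs, which is what makes them portfolio-level checks.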
Evidence / Results
The current practical order from PAPER_BOTS.md is:
1. c66_strict_parity_paper_bot_r1
2. c4_open_paper_candidate_r1
3. c36_open_paper_candidate_r1
That order already implies portfolio thinking. c66 is first because it has the strongest current deployable evidence, including a base out-of-sample return of 19.18%, stress-medium return of 16.70%, stress-harsh return of 15.56%, and 76 out-of-sample trades across all scenarios. c36 stays below it because the quality branch is too sparse. c4 remained interesting but was still parked because the overlap and feasibility bar remained too demanding.
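One way to read the c66 stress figures is as the fraction of the base return that survives each scenario. The short sketch below does that arithmetic with the three numbers quoted above. The "retention" ratio is an illustrative framing introduced here, not a metric the repository is known to report.

```python
# Out-of-sample returns for c66 as quoted in the text, in percent.
base_oos_return = 19.18
stress_returns = {"stress-medium": 16.70, "stress-harsh": 15.56}


def retention(stressed: float, base: float) -> float:
    """Fraction of the base return that remains under a stress scenario."""
    return stressed / base


retained = {
    name: retention(value, base_oos_return)
    for name, value in stress_returns.items()
}

for name, ratio in retained.items():
    print(f"{name}: {ratio:.1%} of base return retained")
```

Under this reading, the medium scenario keeps roughly 87% of the base return and the harsh scenario roughly 81%, so the edge shrinks under stress but does not disappear.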
The QQQ dispersion sleeve then sits behind the formal ladder as research_only, even after positive-looking results such as qqq_single_base with 9 trades and +44537.92. That is a useful reminder that strong numbers on thin samples are not enough to claim portfolio membership.
What Worked
What worked was the change in objective function. Once the repo stopped treating the problem as "find the best ORB" and started treating it as "build a low-overlap set of believable sleeves," the research became more coherent. Strategies could now be classified by role: lead paper bot, backup candidate, parked near-miss, research-only sleeve.
This is one reason the current state of the repo is more interesting than the earlier broader search. The list of survivors is smaller, but the roles are clearer.
What Failed
What failed was the earlier hope that one family would dominate and scale. The ORB audit, the later roadmap, and the c4 gate all point away from that conclusion. A strong isolated backtest did not solve the real problem. The real problem was assembling a group of sleeves that could coexist under realism, parity, overlap, and deployability constraints.
That is a valuable negative result because many public research threads end at the first green chart. Portfolio construction begins exactly where that kind of content usually stops.
Takeaway
A portfolio of trading models needs more than one attractive backtest. It needs a set of branches that survive individually and make sense together. This repo's recent work is valuable because it now evaluates models under that higher standard.
If you want the state-of-the-journey summary, "The One Piece of Sharpe: What Months of Intraday Options Backtesting Actually Taught Us" is the capstone. If you want the methodology behind public reporting, "Algorithmic Trading Research Log: How to Build in Public Without Hiding Failed Results" explains the publishing philosophy. Join the research log to get the next backtest and failure report.