Episode 3: The Simulator Audit
CuteMarkets Team
Research

Scope
This episode is anchored in backtesting_framework_issue_summary_20260308.md.
Unlike the first two episodes, the evidence here is explicit and direct. This was the week when the repository named the places where the framework was overstating confidence and then patched them.
Result Snapshot
Five patched issues changed the scientific meaning of the repo's results:
| Issue | Why it mattered |
|---|---|
| contract selection cache ignored the relevant underlying price bucket | wrong strike could be silently reused |
| stop_touch used same-bar information | same-bar lookahead in momentum/event paths |
| overnight MR used entry-bar full state | another same-bar leakage path |
| combined Sharpe and Sortino flattened per-symbol returns | portfolio risk was overstated |
| top-level PBO (probability of backtest overfitting) and DSR (deflated Sharpe ratio) used the wrong fold granularity | robustness selection was misaligned |
This was not cosmetic work. These are exactly the kinds of mistakes that can make a strategy look stable when it is merely benefiting from information leakage or bad aggregation.
Each issue also distorts a different layer of inference. Wrong contract-cache reuse changes the instrument being tested. Same-bar lookahead changes the information set that the signal is allowed to use. Flattened per-symbol daily returns distort the portfolio estimator itself. Misaligned PBO and DSR usage contaminate the selection procedure that determines which profile is allowed to look "robust." In other words, these were not all the same category of bug. They attacked the validity of the conclusions from multiple angles at once.
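To make the first issue concrete, here is a minimal sketch of a contract-selection cache whose key includes a discretized underlying-price bucket, so a move into a different bucket forces a fresh strike selection instead of reusing a stale one. The `ContractSelectionCache` class, the `price_bucket` helper, the bucket width, and the nearest-strike rule are all hypothetical illustrations; the audit summary does not describe the repo's actual cache key or selection logic.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Contract:
    symbol: str
    expiry: date
    right: str  # "C" for call, "P" for put
    strike: float


def price_bucket(underlying_price: float, bucket_width: float) -> int:
    """Map a continuous underlying price to a discrete bucket index."""
    return math.floor(underlying_price / bucket_width)


class ContractSelectionCache:
    """Hypothetical cache of strike selections per (symbol, expiry, right, bucket)."""

    def __init__(self, bucket_width: float = 1.0) -> None:
        self.bucket_width = bucket_width
        self._cache: dict = {}

    def select(self, symbol: str, expiry: date, right: str,
               underlying_price: float, strikes: list) -> Contract:
        # The price bucket is part of the key. Leaving it out is the bug class
        # the audit describes: one cached strike reused at any underlying price.
        key = (symbol, expiry, right,
               price_bucket(underlying_price, self.bucket_width))
        if key not in self._cache:
            # Hypothetical selection rule: nearest listed strike to spot.
            strike = min(strikes, key=lambda k: abs(k - underlying_price))
            self._cache[key] = Contract(symbol, expiry, right, strike)
        return self._cache[key]
```

With a bucket width of 1.0 and strikes 95, 100 and 105, a call to `select` at spot 100.2 picks the 100 strike, and a later call at 104.6 lands in a new bucket and picks 105; a key without the bucket would have returned the cached 100 strike. The bucket width trades cache hits against the risk of reusing a strike across a boundary, so it should stay small relative to the strike spacing.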
The Hard Truth
The repo did something many research codebases avoid: it made the simulator less flattering on purpose.
Behavior changes recorded in the audit included:
- stop_touch now means signal on bar t, enter on bar t+1
- overnight MR only uses prior completed bars
- combined Sharpe and Sortino come from real aggregated daily PnL
- PBO and DSR diagnostics are split correctly between dashboard and selection scenarios
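The first two behavior changes come down to which bars a decision is allowed to read. A minimal sketch in plain Python, assuming OHLC bars in time order, an upside stop level, and a trailing mean of closes standing in for the overnight MR state; `Bar`, `stop_touch_entries`, and `prior_bar_mean_close` are hypothetical names, not the repo's API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Bar:
    open: float
    high: float
    low: float
    close: float


def stop_touch_entries(bars: List[Bar], stop_level: float) -> List[Tuple[int, int, float]]:
    """Return (signal_bar, entry_bar, entry_price) for each upside stop touch.

    A touch inside bar t is only known once bar t has completed, so the
    entry is taken at the open of bar t + 1, not at a price inside bar t.
    """
    entries = []
    for t in range(len(bars) - 1):  # the final bar has no next bar to enter on
        if bars[t].high >= stop_level:
            entries.append((t, t + 1, bars[t + 1].open))
    return entries


def prior_bar_mean_close(bars: List[Bar], lookback: int) -> List[Optional[float]]:
    """Feature for the decision at bar t, built only from completed bars before t."""
    features: List[Optional[float]] = []
    for t in range(len(bars)):
        window = bars[max(0, t - lookback):t]  # slice excludes bar t itself
        features.append(sum(b.close for b in window) / len(window) if window else None)
    return features
```

For three bars with highs 10.5, 11.1 and 11.4 and a stop at 11.0, the touch in bar 1 produces an entry at bar 2's open, and the touch in bar 2 produces nothing because no later bar exists. The same-bar version would have acted inside bar 1 on information that only exists once bar 1 is complete.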
That means some old excitement had to be discounted. The repo implicitly accepted that cost.
What Worked
What worked was not a specific model. What worked was the willingness to treat metric integrity as a production issue.
The test coverage added in the audit matters for that reason. The repo did not just patch the behavior. It also wrote regression tests around:
- cached contract universes
- next-bar stop-touch entry semantics
- prior-bar overnight MR semantics
- combined-day risk aggregation
- combined-fold PBO and DSR usage
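The combined-day aggregation item can be pinned down with a regression test of roughly this shape. A minimal sketch, assuming per-symbol PnL keyed by sortable trading-day labels, a zero risk-free rate and zero Sortino target, 252 trading days per year, and Sharpe and Sortino computed on PnL rather than returns; the helper and test names are hypothetical. In the toy data the two symbols partly offset each other, which the aggregated daily series captures and the flattened per-symbol sample does not.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def combined_daily_pnl(pnl_by_symbol: dict) -> list:
    """{symbol: {day: pnl}} -> portfolio PnL per day, sorted by day label."""
    totals = defaultdict(float)
    for per_day in pnl_by_symbol.values():
        for day, pnl in per_day.items():
            totals[day] += pnl  # symbols active on the same day are summed
    return [totals[day] for day in sorted(totals)]


def sharpe(daily: list, periods_per_year: int = 252) -> float:
    return mean(daily) / stdev(daily) * math.sqrt(periods_per_year)


def sortino(daily: list, periods_per_year: int = 252) -> float:
    # Downside deviation against a zero target, averaged over all observations.
    downside_dev = math.sqrt(sum(min(0.0, x) ** 2 for x in daily) / len(daily))
    if downside_dev == 0.0:
        return math.inf  # no losing days: ratio is unbounded (a convention here)
    return mean(daily) / downside_dev * math.sqrt(periods_per_year)


def test_combined_metrics_use_aggregated_days():
    pnl = {
        "AAA": {"d1": 1.0, "d2": -1.0, "d3": 2.0},
        "BBB": {"d1": -1.0, "d2": 0.5, "d3": -0.5},
    }
    daily = combined_daily_pnl(pnl)
    flattened = [x for per_day in pnl.values() for x in per_day.values()]
    assert daily == [0.0, -0.5, 1.5]  # one observation per day, not per symbol-day
    assert len(flattened) == 6
    # The flattened sample ignores the offsetting days and reports more risk.
    assert stdev(flattened) > stdev(daily)
    assert sharpe(daily) > sharpe(flattened)
```

Whether flattening overstates or understates risk depends on how the symbols co-move; the test only fixes the rule that both metrics are computed on one aggregated PnL value per day.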
If you want to build in public credibly, this is how you do it. You show not just the performance chart, but the list of assumptions you found unsafe and the tests you added so they do not quietly come back.
What Did Not Work
The negative result is unavoidable: some previously reported strength, especially in intraday options paths, must be treated as lower-confidence once these fixes are in place.
That is not a failure of the audit. That is the success condition of the audit.
The repo also left one item intentionally unresolved: the default fill-model mismatch between orb_confluence and orb_conviction. That restraint is scientifically useful. It distinguishes between:
- bugs that should be fixed immediately
- defaults that need an explicit product-level decision
That distinction is part of the style this project should keep publicly. A scientific writeup does not need to present the codebase as fully settled. It needs to separate known implementation defects from open design choices. The first category invalidates evidence if left unresolved. The second category changes the interpretation of evidence and therefore has to be documented, not silently normalized.
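One way to keep an open default like this from being silently normalized is to make it a required, visible part of every result. A minimal sketch, assuming two hypothetical fill models (the audit summary does not name the actual defaults for orb_confluence and orb_conviction); `FillModel`, `BacktestResult`, and `compare_sharpe` are illustrative names, and the comparison refuses to run across mismatched fill models instead of quietly treating them as equivalent.

```python
from dataclasses import dataclass
from enum import Enum


class FillModel(Enum):
    # Hypothetical fill assumptions; the repo's real defaults are not named here.
    MIDPOINT = "midpoint"
    CROSS_SPREAD = "cross_spread"


@dataclass(frozen=True)
class BacktestResult:
    strategy: str
    fill_model: FillModel  # required field with no default, so it is always explicit
    sharpe: float


def compare_sharpe(a: BacktestResult, b: BacktestResult) -> float:
    """Sharpe difference a - b, refusing comparisons across fill models."""
    if a.fill_model is not b.fill_model:
        raise ValueError(
            f"{a.strategy} uses {a.fill_model.value}, {b.strategy} uses "
            f"{b.fill_model.value}; choose a fill model explicitly before comparing"
        )
    return a.sharpe - b.sharpe
```

The design choice is to fail loudly rather than pick a winner: the mismatch stays visible until someone makes the product-level decision.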
Why This Week Matters
This is the week the project stopped being only a strategy playground and became a measurement system with standards.
If we keep the One Piece analogy mild, this is the episode where the crew checks whether the compass itself is broken. You do not hunt treasure with a lying compass.
Public Build Takeaway
This episode should be published with no defensiveness. It is one of the strongest credibility signals in the whole repo.
The public lesson is:
- the fastest path to fake alpha is sloppy measurement
- bug-fix posts are not side content; they are core research content
- if the audit makes your earlier results weaker, that is progress
Any audience worth building will respect this episode more than a polished chart with hidden leakage.
Product links
Build the workflow with CuteMarkets
This article is part of the broader CuteMarkets product and research stack. Use the landing pages below to move from the blog into the specific API workflow you want to evaluate.
Options Data API
See the canonical product page for real-time and historical options data.
Historical Options Data API
Inspect the historical contracts, quotes, trades, and aggregates workflow.
Options Chain API
Go straight to chain snapshots, expirations, and strike discovery.
Pricing
Review plans before you move from free evaluation into production usage.