Research Series · March 8, 2026 · 6 min read

Episode 3: The Simulator Audit

Daniel Ratke

Research & Engineering

Term map

Backtesting vocabulary for this article

Treat signal timestamp, point-in-time universe, quote-aware fill, reject reason, replay artifact, walk-forward test, and cache key as first-class terms. They separate reproducible research from a backtest that only preserves the final performance table.

Scope

This episode is anchored in backtesting_framework_issue_summary_20260308.md.

Unlike the first two episodes, the evidence here is explicit and direct. This was the week when the repository named the places where the framework was overstating confidence and then patched them.

Result Snapshot

Five patched issues changed the scientific meaning of the repo:

Issue	Why it mattered
contract selection cache ignored the relevant underlying price bucket	wrong strike could be silently reused
`stop_touch` used same-bar information	same-bar lookahead in momentum/event paths
overnight MR used entry-bar full state	another same-bar leakage path
combined Sharpe and Sortino flattened per-symbol returns	portfolio risk was overstated
top-level PBO and DSR used the wrong fold granularity	robustness selection was misaligned

This was not cosmetic work. These are exactly the kinds of mistakes that can make a strategy look stable when it is merely benefiting from information leakage or bad aggregation.

Each issue also distorts a different layer of inference. Wrong contract-cache reuse changes the instrument being tested. Same-bar lookahead changes the information set that the signal is allowed to use. Flattened per-symbol daily returns distort the portfolio estimator itself. Misaligned PBO and DSR usage contaminates the selection procedure that determines which profile is allowed to look "robust." These were not all the same category of bug. They attacked the validity of the conclusions from multiple angles at once.
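To make the first issue concrete, here is a minimal sketch of a contract-selection cache key that includes the underlying price bucket. The names `ContractCacheKey`, `price_bucket`, and `make_key`, the other key fields, and the one-dollar bucket width are all hypothetical; the repo's real key layout is not shown in the audit summary.

```python
from typing import NamedTuple


class ContractCacheKey(NamedTuple):
    """Hypothetical cache key for a contract-selection result."""
    underlying: str
    as_of_date: str    # research date, ISO format
    expiration: str
    right: str         # "C" or "P"
    price_bucket: int  # bucketed underlying price; the dimension the audit says was ignored


def price_bucket(underlying_price: float, bucket_width: float = 1.0) -> int:
    """Map an underlying price to an integer bucket index (assumed bucketing rule)."""
    return int(underlying_price // bucket_width)


def make_key(underlying: str, as_of_date: str, expiration: str, right: str,
             underlying_price: float, bucket_width: float = 1.0) -> ContractCacheKey:
    """Build a key that changes whenever the underlying moves to a new bucket."""
    return ContractCacheKey(underlying, as_of_date, expiration, right,
                            price_bucket(underlying_price, bucket_width))


morning = make_key("SPY", "2026-03-06", "2026-03-20", "C", 100.2)
afternoon = make_key("SPY", "2026-03-06", "2026-03-20", "C", 104.7)
same_bucket = make_key("SPY", "2026-03-06", "2026-03-20", "C", 100.9)
```

With the bucket in the key, `morning` and `afternoon` map to different cache entries, so a strike chosen near 100 is not reused after the underlying has moved to 104. Prices inside the same bucket still share an entry.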

The Hard Truth

The repo did something many research codebases avoid: it made the simulator less flattering on purpose.

Behavior changes recorded in the audit included:

stop_touch now means signal on bar t, enter on bar t+1
overnight MR only uses prior completed bars
combined Sharpe and Sortino come from real aggregated daily PnL
PBO and DSR diagnostics are split correctly between dashboard and selection scenarios
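The first behavior change can be sketched as follows. This is an illustration of "signal on bar t, enter on bar t+1", not the repo's `stop_touch` code: the `Bar` type, the `next_bar_entry` helper, the use of the bar high as the touch test, and filling at the next bar's open are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Bar:
    open: float
    high: float
    low: float
    close: float


def next_bar_entry(bars: List[Bar], trigger: float) -> Optional[Tuple[int, int, float]]:
    """Return (signal_index, entry_index, entry_price) for the first touch of `trigger`.

    The touch is detected on bar t, but the entry uses the open of bar t+1,
    so the fill never depends on information from inside the signal bar.
    """
    # The last bar cannot produce a trade because there is no bar t+1 to enter on.
    for t, bar in enumerate(bars[:-1]):
        if bar.high >= trigger:
            return t, t + 1, bars[t + 1].open
    return None


bars = [
    Bar(100.0, 101.0, 99.0, 100.5),
    Bar(100.5, 103.0, 100.0, 102.5),  # touches 102.0 here (signal bar)
    Bar(102.8, 104.0, 102.0, 103.0),  # entry happens at this bar's open
]
result = next_bar_entry(bars, trigger=102.0)
```

A same-bar version would fill at the trigger price inside bar 1. The next-bar version pays whatever bar 2 opens at, which is usually less flattering and closer to what a live system could do.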

That means some old excitement had to be discounted. The repo implicitly accepted that cost.

What Worked

What worked was not a specific model. What worked was the willingness to treat metric integrity as a production issue.

The test coverage added in the audit matters for that reason. The repo did more than patch the behavior. It also wrote regressions around:

cached contract universes
next-bar stop-touch entry semantics
prior-bar overnight MR semantics
combined-day risk aggregation
combined-fold PBO and DSR usage
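A regression around combined-day risk aggregation might look like the sketch below. It assumes per-symbol daily PnL is stored as `{symbol: {date: pnl}}` and uses a zero risk-free rate; `combined_daily_pnl`, `flattened_pnl`, and `sharpe` are hypothetical helpers, not functions from the repo.

```python
import math
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List


def sharpe(returns: List[float], periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio with a zero risk-free rate (assumption)."""
    sd = stdev(returns)
    if sd == 0:
        return math.nan
    return mean(returns) / sd * math.sqrt(periods_per_year)


def combined_daily_pnl(per_symbol: Dict[str, Dict[str, float]]) -> List[float]:
    """Sum PnL across symbols for each date: one observation per trading day."""
    totals = defaultdict(float)
    for series in per_symbol.values():
        for day, pnl in series.items():
            totals[day] += pnl
    return [totals[day] for day in sorted(totals)]


def flattened_pnl(per_symbol: Dict[str, Dict[str, float]]) -> List[float]:
    """The pre-audit shape: every symbol-day treated as its own observation."""
    return [pnl for series in per_symbol.values() for pnl in series.values()]


per_symbol = {
    "AAA": {"2026-03-02": 10.0, "2026-03-03": -6.0, "2026-03-04": 8.0},
    "BBB": {"2026-03-02": -4.0, "2026-03-03": 9.0, "2026-03-04": -2.0},
}
combined = combined_daily_pnl(per_symbol)
flat = flattened_pnl(per_symbol)
```

The combined series has one value per day, while the flattened series has one value per symbol-day. The two series give different Sharpe ratios for the same trades, which is why the estimator has to be pinned by a test.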

If you want to build in public credibly, this is how you do it. You show not just the performance chart but also the list of assumptions you found unsafe and the tests you added so they do not quietly come back.

What Did Not Work

The negative result is unavoidable: some previously reported strength, especially in intraday options paths, must be treated as lower-confidence once these fixes are in place.

That is not a failure of the audit. That is the success condition of the audit.

The repo also left one item intentionally unresolved: the default fill-model mismatch between orb_confluence and orb_conviction. That restraint is scientifically useful. It distinguishes between:

bugs that should be fixed immediately
defaults that need an explicit product-level decision

That distinction is part of the style this project should keep publicly. A scientific writeup does not need to present the codebase as fully settled. It needs to separate known implementation defects from open design choices. The first category invalidates evidence if left unresolved. The second category changes the interpretation of evidence and therefore has to be documented, not silently normalized.

Why This Week Matters

This is the week the project stopped being only a strategy playground and became a measurement system with standards.

If we keep the One Piece analogy mild, this is the episode where the crew checks whether the compass itself is broken. You do not hunt treasure with a lying compass.

Public Build Takeaway

This episode should be published with no defensiveness. It is one of the strongest credibility signals in the whole repo.

The public lesson is:

the fastest path to fake alpha is sloppy measurement
bug-fix posts are not side content; they are core research content
if the audit makes your earlier results weaker, that is progress

Any audience worth building will respect this episode more than a polished chart with hidden leakage.

How the terminology applies

For Episode 3: The Simulator Audit, the backtesting workflow should treat Point-in-time contracts, Quote-aware fills, Reject reasons, Replay artifact, Cache key, and Signal timestamp as operational state rather than glossary decoration. That framing keeps the research claim causal: the strategy can only select instruments, prices, and labels that existed at the decision time.
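As a rough illustration of that causal constraint, the sketch below filters a contract list down to what existed at the decision date. The `Contract` fields and the `point_in_time_universe` helper are hypothetical, and real listing metadata may be shaped differently.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Contract:
    occ_symbol: str
    first_listed: date  # assumed field: first day the contract was listed
    expiration: date


def point_in_time_universe(contracts: List[Contract], as_of: date) -> List[Contract]:
    """Keep only contracts that were already listed and not yet expired on `as_of`."""
    return [c for c in contracts if c.first_listed <= as_of <= c.expiration]


contracts = [
    Contract("SPY260320C00500000", date(2026, 1, 5), date(2026, 3, 20)),
    Contract("SPY260417C00500000", date(2026, 3, 10), date(2026, 4, 17)),  # listed later
    Contract("SPY260220C00500000", date(2025, 12, 1), date(2026, 2, 20)),  # already expired
]
visible = point_in_time_universe(contracts, as_of=date(2026, 3, 6))
```

Only the first contract is visible on 2026-03-06. A backtest that selected the second one on that date would be using a future listing.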

A developer implementing this research idea should persist Look-ahead leakage, Walk-forward test, Slippage model, Same-bar fill, Promotion gate, and Options data API beside the result, instead of leaving those words in a term card. Doing so turns attractive performance into an auditable record where fills, skips, thresholds, and replay inputs can be challenged independently.

The review artifact for Episode 3: The Simulator Audit becomes more useful when OPRA-originating data, OCC option symbol, Bid/ask spread, Midpoint, Quote/trade condition, and Quote vs trade semantics appear in the same body of evidence as the selected rows. When a result is promoted, these fields should appear in the run manifest, not only in a prose summary or final equity curve.

In production notes for this backtesting workflow, REST snapshot, WebSocket stream, Entitlement gate, Quote freshness, Timestamp semantics, and Pagination cursor define the checks that decide whether the workflow is reproducible. The result is a backtest that can be rerun, compared across threshold families, and rejected when the evidence is not strong enough.

For Episode 3: The Simulator Audit, the practical acceptance test is simple: another developer should be able to read the body, identify the exact inputs, reproduce the request sequence, and explain the accepted and rejected rows without relying on the bottom terminology grid. If a phrase appears in the page vocabulary, it should correspond to a stored field, a validation check, a replay step, or an implementation decision in the backtesting workflow.

This is also the reason the article should not measure success only by the final chart, table, or headline metric. The better standard is whether the data path, timing model, entitlement state, and evidence trail survive review. When those pieces are written directly into the body, the terminology becomes part of the workflow readers can implement.

The audit changed the data shape

The simulator audit did more than fix a few rules. It changed what a valid row had to contain. A candidate trade now needed point-in-time contract discovery, a selected OCC option symbol, quote-aware fills, reject reasons, and a replay artifact. For options, the audit forced bid, ask, spread percent, quote freshness, quote condition, trade condition, and no-bid exit handling into the result itself.
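One way to picture the new row shape is the dataclass below. It is a sketch only: the field names, the `CandidateRow` type, and the midpoint-based spread percent are assumptions drawn from the list above, not the repo's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CandidateRow:
    """Hypothetical shape of one audited candidate trade."""
    signal_timestamp: str             # when the strategy decided
    discovery_as_of: str              # date used for point-in-time contract discovery
    occ_symbol: str                   # selected OCC option symbol
    bid: Optional[float]
    ask: Optional[float]
    quote_age_seconds: Optional[float]
    quote_condition: Optional[str]
    trade_condition: Optional[str]
    reject_reason: Optional[str]      # None when the row was accepted
    replay_artifact_id: str           # pointer to the saved request/selection/fill record

    @property
    def spread_pct(self) -> Optional[float]:
        """Spread as a fraction of the midpoint, or None when it cannot be computed."""
        if self.bid is None or self.ask is None:
            return None
        mid = (self.bid + self.ask) / 2
        if mid <= 0:
            return None
        return (self.ask - self.bid) / mid


row = CandidateRow(
    signal_timestamp="2026-03-06T14:35:00Z",
    discovery_as_of="2026-03-06",
    occ_symbol="SPY260320C00500000",
    bid=1.00,
    ask=1.10,
    quote_age_seconds=0.8,
    quote_condition=None,
    trade_condition=None,
    reject_reason=None,
    replay_artifact_id="run-0001/row-0042",
)
```

Keeping the quote fields on the row itself means a reviewer can recompute the spread and challenge the fill without re-querying the data source.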

That matters because the same headline strategy can look different under different schemas. A bar-based simulation can enter at a clean close. A quote-aware simulation has to ask whether the top-of-book market existed, whether the ask was reachable, whether the bid could support an exit, and whether the data source was realtime, delayed, or repaired from a backfill. Those checks shrink the opportunity set, but they make the remaining evidence cleaner.
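The questions in that paragraph can be sketched as a single entry check that returns either a fill price or a reject reason. The thresholds, the reason strings, and the `try_buy_fill` helper are illustrative assumptions. Filling a buy at the ask is one conservative policy among several.

```python
from typing import Optional, Tuple


def try_buy_fill(
    bid: Optional[float],
    ask: Optional[float],
    quote_age_s: Optional[float],
    max_quote_age_s: float = 5.0,   # assumed freshness threshold
    max_spread_pct: float = 0.15,   # assumed spread threshold, fraction of midpoint
) -> Tuple[Optional[float], Optional[str]]:
    """Return (fill_price, reject_reason); exactly one of the two is None."""
    if ask is None or quote_age_s is None:
        return None, "missing_quote"
    if bid is None or bid <= 0:
        # Without a bid there is no evidence the position could be exited.
        return None, "no_bid"
    if quote_age_s > max_quote_age_s:
        return None, "stale_quote"
    mid = (bid + ask) / 2
    if (ask - bid) / mid > max_spread_pct:
        return None, "wide_spread"
    # Conservative assumption: a buy order pays the full ask.
    return ask, None


accepted = try_buy_fill(bid=1.00, ask=1.10, quote_age_s=1.0)
stale = try_buy_fill(bid=1.00, ask=1.10, quote_age_s=30.0)
wide = try_buy_fill(bid=1.00, ask=1.50, quote_age_s=1.0)
```

Every rejected candidate carries a reason, so the shrinkage of the opportunity set is itself part of the evidence.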

The audit also made failures easier to classify. A branch could fail for look-ahead leakage, missing pagination, stale NBBO, wide spread, low open interest, or weak out-of-sample behavior. That taxonomy is what later episodes needed before portfolio decisions could mean anything.
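A taxonomy like this is easiest to keep honest when it is a closed set of values. The sketch below assumes an enum built from the categories named above; the `FailureReason` type and the `summarize_failures` helper are hypothetical.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class FailureReason(Enum):
    LOOK_AHEAD_LEAKAGE = "look_ahead_leakage"
    MISSING_PAGINATION = "missing_pagination"
    STALE_NBBO = "stale_nbbo"
    WIDE_SPREAD = "wide_spread"
    LOW_OPEN_INTEREST = "low_open_interest"
    WEAK_OUT_OF_SAMPLE = "weak_out_of_sample"


def summarize_failures(reasons: Iterable[FailureReason]) -> Dict[str, int]:
    """Count failures per category, omitting categories that never occurred."""
    counts = Counter(reasons)
    return {reason.value: counts[reason] for reason in FailureReason if counts[reason]}


summary = summarize_failures([
    FailureReason.STALE_NBBO,
    FailureReason.WIDE_SPREAD,
    FailureReason.STALE_NBBO,
])
```

A per-branch summary of this kind separates branches that failed on data quality from branches that failed on out-of-sample behavior.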

Terminology

Market-data terms used in this article

These terms keep the article connected to the CuteMarkets knowledge base and to the exact API workflow behind the research.

Point-in-time contracts

Contract discovery anchored to the research date so a backtest does not use future listings.

Quote-aware fills

Entry and exit assumptions based on bid/ask quotes, quote age, spread width, and side-specific fill rules.

Reject reasons

Logged explanations for skipped contracts or fills, including stale quote, wide spread, no bid, or missing data.

Replay artifact

The saved request, selection, fill, reject, and metric record that lets another developer audit the backtest.

Cache key

The structured identifier that keeps provider, endpoint, ticker, timestamp, plan, and schema state from being mixed.

Signal timestamp

The exact time a strategy made a decision, used to reconstruct the visible universe and quote window causally.

Look-ahead leakage

A research error where a fill, contract, indicator, or label uses information unavailable at decision time.

Walk-forward test

A validation method that repeatedly trains and evaluates across separated time windows instead of trusting one optimized sample.

Slippage model

A fill-cost assumption based on bid/ask side, midpoint, spread percent, quote age, and liquidity policy.

Same-bar fill

An intraday backtest assumption that can become invalid when signal, entry, stop, and target ordering is ambiguous.

Promotion gate

The written threshold that decides whether a research candidate can move into paper trading or production monitoring.

Options data API

The product surface for chains, contracts, quotes, trades, aggregates, Greeks, IV, open interest, and expirations.

OPRA-originating data

The U.S. listed-options source context behind quotes, trades, exchange participation, and consolidated option-market records.

OCC option symbol

The exact option contract identifier that preserves root, expiration, call or put side, and strike.

Bid/ask spread

The execution interval between bid and ask that determines whether a contract is realistically tradable.

Midpoint

The computed center between bid and ask, useful as a reference price but not proof that an order would fill.

Quote/trade condition

The condition-code, exchange, correction, sequence, and timestamp context that explains how a quote or trade row can be used.

Quote vs trade semantics

The distinction between executable bid/ask markets, printed transactions, and bar-level summaries.

REST snapshot

A reproducible request for current or historical market state, used for initialization, backfills, and audit logs.

WebSocket stream

A persistent live connection that needs subscription topics, reconnect tracking, freshness labels, and REST repair paths.

Entitlement gate

The product, plan, quote, live, delayed, historical, or commercial-use boundary checked before data is shown.

Quote freshness

The age, timestamp, and live or delayed state of a bid/ask record before it is used in a scanner, backtest, or UI.

Timestamp semantics

The exchange, provider, ingestion, session, and application time context attached to a market-data record.

Pagination cursor

The continuation token or next URL that keeps large chains, trades, quotes, and historical windows complete.

Written by

Daniel Ratke

Research & Engineering

Daniel covers the deeper research notes: options backtesting, execution realism, robustness testing, data engineering, and strategy validation.
