What survives serious intraday options backtesting?

Usually only narrow setups survive after timestamp causality, realistic fills, liquidity filters, robustness checks, and portfolio overlap constraints are applied.

Why publish failed backtesting results?

Failed results document what was tested, which assumptions broke, and which branches should not be retested casually, making future research faster and more honest.

Research Log · April 20, 2026 · 6 min read

The One Piece of Sharpe: What Months of Intraday Options Backtesting Actually Taught Us

Daniel Ratke

Research & Engineering

Quick answer

Months of intraday options backtesting taught us that most attractive ideas fail once causal execution, quote-aware fills, robustness tests, and portfolio gates are applied. The useful output was a narrower map of surviving sleeves and an explicit record of negative results.

Term map

Backtesting vocabulary for this article

Treat signal timestamp, point-in-time universe, quote-aware fill, reject reason, replay artifact, walk-forward test, and cache key as first-class terms. They separate reproducible research from a backtest that only preserves the final performance table.

Definitions for Point-in-time contracts, Quote-aware fills, Reject reasons, Replay artifact, Cache key, Signal timestamp, Look-ahead leakage, Walk-forward test, Slippage model, Same-bar fill, Promotion gate, and Options data API appear in the Terminology section at the end of this article.
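Treating the signal timestamp as a first-class term means it should be enforceable, not just logged. A minimal sketch, assuming every input row carries the time it became visible to the strategy; the `Observation` type, `LookAheadError`, and `assert_causal` are hypothetical names for illustration, not part of cutebacktests.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """One input row the strategy consulted (hypothetical record type)."""
    source: str             # e.g. "quote", "bar", "contract_listing"
    available_at: datetime  # when the row became visible to the strategy


class LookAheadError(ValueError):
    """Raised when an input was not yet visible at the signal timestamp."""


def assert_causal(signal_ts: datetime, inputs: list[Observation]) -> None:
    """Fail loudly if any input became available after the decision time."""
    late = [obs for obs in inputs if obs.available_at > signal_ts]
    if late:
        detail = ", ".join(f"{o.source}@{o.available_at.isoformat()}" for o in late)
        raise LookAheadError(f"inputs visible after {signal_ts.isoformat()}: {detail}")


signal = datetime(2026, 3, 2, 10, 15)
assert_causal(signal, [Observation("quote", datetime(2026, 3, 2, 10, 14, 59))])  # passes
```

Raising instead of silently dropping late rows is deliberate: a leaked input usually means the data path is wrong, and a hard failure keeps that visible in the run.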

Repository reference: cutebacktests

Abstract

The last two months of intraday options backtesting in this repository did not produce one universal winner. They produced something more useful: a smaller, more believable map of what survived once the simulator became more causal, the metrics became harsher, and the portfolio bar became more explicit.

The clearest summary still comes from Toward The One Piece Of Sharpe, and the current picture is straightforward. c66 is the lead_paper_bot. c36 is a sparse but real backup candidate. c4 was repaired into a near-miss but remains parked. The QQQ-only dispersion sleeve is economically interesting but still research_only. Broad ORB did not survive as the main expansion path.

This research summary sits alongside One Piece of Sharpe Episode 1, Backtesting Test Plan, and Quote-Aware Options Backtests. Across all of them, keep contract selection, DTE bucket, entry timestamp, exit timestamp, option quote, bid/ask spread, reject reason, Sharpe, DSR, and PBO visible for every branch.
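The DTE bucket is one of the simplest fields to keep visible, and one of the easiest to compute inconsistently. A minimal sketch of a bucket filter such as the 5-7DTE pocket mentioned in the ORB audit, assuming calendar-day DTE measured from the signal date; the repo may count trading days or account for time of day, and `calendar_dte` and `in_dte_bucket` are hypothetical helpers.

```python
from datetime import date


def calendar_dte(signal_day: date, expiration: date) -> int:
    """Calendar days from the signal date to expiration (0 on expiration day)."""
    return (expiration - signal_day).days


def in_dte_bucket(signal_day: date, expiration: date, lo: int = 5, hi: int = 7) -> bool:
    """True when the contract's DTE falls inside the inclusive [lo, hi] bucket."""
    return lo <= calendar_dte(signal_day, expiration) <= hi


# Illustrative expirations only; 4, 7, and 11 calendar days from the signal date.
expirations = [date(2026, 3, 6), date(2026, 3, 9), date(2026, 3, 13)]
eligible = [e for e in expirations if in_dte_bucket(date(2026, 3, 2), e)]
```

Whatever convention is chosen, storing it next to the result matters more than the convention itself, because a calendar-day and a trading-day bucket select different contracts.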

Question

The practical question is not "what is the best model?" That question is too small. The more useful question is what months of intraday options backtesting actually taught the repo about which sleeves survive realism and which do not.

That is why the One Piece analogy has to stay restrained. The treasure here is not one perfect strategy. It is a working portfolio of low-overlap models with believable Sharpe. The repo has not reached that destination yet, but it now has a much better map.

Method: How the Repo Learned From Months of Intraday Options Backtesting

The learning process had several stages.

First, the repo improved the measurement system. The March 8 audit repaired contract-selection reuse, same-bar stop_touch logic, same-bar overnight mean-reversion leakage, combined Sharpe and Sortino aggregation, and top-level fold diagnostics.

Second, it audited major families honestly. The ORB audit concluded that broad ORB search mostly did not survive, while a narrow directional 5-7DTE pocket did.

Third, it widened the idea space and harvested failures quickly. c23, c26, c29, c30, c32, c37, and the LFCM catalyst lane were all closed or killed with explicit reasons.

Fourth, it converged toward portfolio roles. c66 became the lead paper bot. c36 remained a sparse but real backup. c4 improved after repair but remained parked. The QQQ dispersion sleeve became the strongest research-only branch.
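The same-bar stop_touch repair from the March 8 audit is worth making concrete. A minimal sketch, assuming only OHLC bars are available for the option: when a bar's range covers both the stop and the target, the intrabar order is unknowable, so the conservative choice is to book the stop. The `Bar` type, `resolve_long_exit`, and the reason labels are hypothetical, not the repo's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bar:
    open: float
    high: float
    low: float
    close: float


def resolve_long_exit(bar: Bar, stop: float, target: float) -> Optional[tuple[str, float]]:
    """Return (reason, price) for a long position, or None if neither level is touched."""
    stop_touched = bar.low <= stop
    target_touched = bar.high >= target
    # A gap open below the stop fills at the open, not at the stop level.
    stop_price = min(stop, bar.open)
    if stop_touched and target_touched:
        # Both levels inside one bar: order unknown, so assume the worse outcome.
        return ("ambiguous_stop_first", stop_price)
    if stop_touched:
        return ("stop_touch", stop_price)
    if target_touched:
        return ("target", target)
    return None


print(resolve_long_exit(Bar(100, 105, 95, 101), stop=96, target=104))
```

A generous simulator would book the target in the ambiguous case, which quietly inflates win rate on volatile bars; the conservative rule trades some realism for a bias that can only hurt the backtest.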

Evidence / Results

The current result table from Toward The One Piece Of Sharpe is still the best compact summary:

c66 slow-DTE compression: lead_paper_bot, base 19.18%, stress-medium 16.70%, stress-harsh 15.56%, 76 out-of-sample trades
c36 VWAP mean reversion: profitable quality branch, +16004 PnL, 15 trades, DSR 0.6400, but too sparse
c4 dispersion breakout: repaired branch restored 79 and 85 trade rows, still failed the harsh promotion gate
QQQ-only dispersion sleeve: qqq_single_base 9 trades and +44537.92, still research-only because the sample is thin
broad ORB search: mostly weak or too sparse under realistic deployment standards
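The c36 and QQQ rows show why a raw Sharpe or headline PnL is not enough: 15 or 9 trades leave a lot of estimation error. The repo's exact DSR computation is not shown here, so as a hedged illustration, this is the probabilistic Sharpe ratio of Bailey and López de Prado, which the deflated Sharpe ratio builds on by raising the benchmark to account for the number of trials. Treating each trade return as one period is an assumption of this sketch.

```python
import math
from statistics import NormalDist, fmean


def probabilistic_sharpe(returns: list[float], sr_benchmark: float = 0.0) -> float:
    """Probability that the true per-period Sharpe exceeds sr_benchmark.

    Probabilistic Sharpe ratio (Bailey and López de Prado): the estimate is
    penalized for short samples, negative skew, and fat tails.
    """
    n = len(returns)
    if n < 3:
        raise ValueError("need at least 3 returns")
    mu = fmean(returns)
    dev = [r - mu for r in returns]
    var = sum(d * d for d in dev) / n
    if var == 0:
        raise ValueError("zero variance")
    sd = math.sqrt(var)
    skew = sum(d ** 3 for d in dev) / n / sd ** 3
    kurt = sum(d ** 4 for d in dev) / n / sd ** 4  # non-excess kurtosis (normal = 3)
    sr = mu / sd                                    # per-period, not annualized
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr * sr)
    z = (sr - sr_benchmark) * math.sqrt(n - 1) / denom
    return NormalDist().cdf(z)


trade_returns = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04]  # illustrative, not repo data
sparse = probabilistic_sharpe(trade_returns)
denser = probabilistic_sharpe(trade_returns * 3)
```

Repeating the same six returns three times leaves the per-trade mean, volatility, skew, and kurtosis unchanged but raises the probability, which is the sparse-sample penalty in miniature.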

This mix of results is exactly why the repo's story became more compelling as it became less grandiose. The winners got narrower. The negative results got cleaner. The remaining uncertainty became more explicit.

What Worked

What worked was realism. Once the measurement system became more honest, the repo found a small number of sleeves that still looked serious. c66 is the clearest example because it combined positive out-of-sample returns, stability under stress, and enough operational credibility to lead the paper-bot ladder.

What also worked was selective refusal. The repo kept c36 alive without pretending it was ready. It repaired c4 without pretending repair meant admission. It kept QQQ dispersion in research-only status despite strong-looking headline numbers because the sample was still too thin. That is the behavior of a portfolio researcher rather than a chart collector.
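Selective refusal is easier to keep honest when the promotion gate returns explicit reasons instead of a bare pass or fail. A minimal sketch; the `Candidate` and `Gate` types and every threshold below are hypothetical illustrations, not the repo's written gate values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    oos_trades: int
    stress_harsh_return: float  # fraction, e.g. 0.1556 for 15.56%
    dsr: float


@dataclass(frozen=True)
class Gate:
    # Hypothetical thresholds for illustration only.
    min_oos_trades: int = 50
    min_stress_harsh_return: float = 0.0
    min_dsr: float = 0.95


def promotion_decision(c: Candidate, gate: Gate = Gate()) -> tuple[str, list[str]]:
    """Return ("promote" | "research_only", reasons) so refusals stay explicit."""
    reasons = []
    if c.oos_trades < gate.min_oos_trades:
        reasons.append(f"too_sparse: {c.oos_trades} < {gate.min_oos_trades} trades")
    if c.stress_harsh_return <= gate.min_stress_harsh_return:
        reasons.append(f"fails_harsh_stress: {c.stress_harsh_return:.2%}")
    if c.dsr < gate.min_dsr:
        reasons.append(f"dsr_below_gate: {c.dsr:.4f} < {gate.min_dsr}")
    return ("promote" if not reasons else "research_only", reasons)


status, reasons = promotion_decision(Candidate("example_sparse", 15, 0.10, 0.64))
```

Keeping the reasons as data means a parked branch can be re-evaluated against the same gate later without re-arguing why it was parked.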

What Failed

What failed was broad optimism. Broad ORB search did not survive. Several intuitive complement ideas died quickly and correctly. The LFCM lane still found 0 valid catalyst days after the data path was repaired. Many adjacent branches did not clear the same bar as the strongest survivor.

That is a good outcome because it means the repo is becoming more selective for the right reasons. The opportunity set is smaller, but the remaining claims are easier to defend.

Takeaway

Months of intraday options backtesting taught this repository that the real goal is not one heroic backtest. The real goal is a small set of low-overlap sleeves that still make sense after realism fixes, parity checks, stress scenarios, and portfolio gates.

If you want the portfolio-building version of that conclusion, Building a Portfolio of Trading Models: Why One Good Backtest Is Not Enough is the direct companion. If you want the process and publishing philosophy behind it, Algorithmic Trading Research Log: How to Build in Public Without Hiding Failed Results explains why the repo keeps reporting negative results. Join the research log to get the next backtest and failure report.

How the terminology applies

For this summary, the backtesting workflow should treat Point-in-time contracts, Quote-aware fills, Reject reasons, Replay artifact, Cache key, and Signal timestamp as operational state rather than glossary decoration. That framing keeps the research claim causal: the strategy can only select instruments, prices, and labels that existed at decision time.
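Point-in-time contract discovery is the most direct way to keep instrument selection causal. A minimal sketch, assuming the historical contracts data gives each contract a first-listed date; the `Contract` type, its field names, `visible_contracts`, and the example symbols and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Contract:
    occ_symbol: str
    listed_on: date   # first date the contract appeared in the chain
    expiration: date


def visible_contracts(universe: list[Contract], as_of: date) -> list[Contract]:
    """Contracts that were already listed and not yet expired on the research date."""
    return [c for c in universe if c.listed_on <= as_of <= c.expiration]


universe = [
    Contract("SPY260306C00580000", date(2026, 1, 2), date(2026, 3, 6)),
    Contract("SPY260320C00580000", date(2026, 3, 5), date(2026, 3, 20)),  # listed later
    Contract("SPY260227C00580000", date(2026, 1, 2), date(2026, 2, 27)),  # already expired
]
visible = visible_contracts(universe, date(2026, 3, 2))
```

Filtering a present-day chain by expiration alone would keep the second contract, which did not exist yet on the research date; that is exactly the future-listing leak the term is meant to prevent.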

A developer implementing this Research Log idea should persist the look-ahead leakage checks, walk-forward test windows, slippage model, same-bar fill policy, promotion gate, and Options data API requests beside the result, instead of leaving those words in a term card. Doing so turns attractive performance into an auditable record where fills, skips, thresholds, and replay inputs can be challenged independently.
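One way to persist those choices beside the result is a small run manifest serialized deterministically. A minimal sketch; the `RunManifest` schema, its field names, and the example values are hypothetical, not the cutebacktests manifest format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RunManifest:
    # Hypothetical schema: one record per backtest run, stored next to the metrics.
    strategy: str
    manifest_version: str
    walk_forward_windows: list[tuple[str, str]]  # (start, end) per evaluation fold
    slippage_model: dict
    same_bar_policy: str
    promotion_gate: dict
    data_requests: list[dict]
    lookahead_checks_passed: bool


def manifest_json(m: RunManifest) -> str:
    """Deterministic JSON (sorted keys) so two runs can be diffed line by line."""
    return json.dumps(asdict(m), sort_keys=True, indent=2)


manifest = RunManifest(
    strategy="example_sleeve",
    manifest_version="v1",
    walk_forward_windows=[("2025-01-02", "2025-06-30"), ("2025-07-01", "2025-12-31")],
    slippage_model={"side": "cross_spread", "max_spread_pct": 0.10, "max_quote_age_s": 5},
    same_bar_policy="stop_first_when_ambiguous",
    promotion_gate={"min_oos_trades": 50, "min_dsr": 0.95},
    data_requests=[{"endpoint": "quotes", "ticker": "SPY260306C00580000"}],
    lookahead_checks_passed=True,
)
```

Sorted keys matter more than they look: two runs that differ only in dictionary insertion order should produce identical files, so any diff points at a real change.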

The review artifact for this summary becomes more useful when OPRA-originating data, OCC option symbol, Bid/ask spread, Midpoint, Quote/trade condition, and Quote vs trade semantics appear in the same body of evidence as the selected rows. When a result is promoted, these fields should appear in the run manifest, not only in a prose summary or final equity curve.
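Storing the OCC option symbol is only useful if the root, expiration, side, and strike can be recovered from it exactly. A minimal parser sketch following the standard OCC layout (root, YYMMDD expiration, C or P, strike times 1000 as eight digits); tolerating an `O:` prefix and space-padded roots is an assumption about provider formatting, and `parse_occ` and `OccSymbol` are hypothetical names.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal

# Root, then YYMMDD expiration, C/P side, and strike x 1000 as 8 digits.
# The optional "O:" prefix and space padding are assumptions about provider format.
OCC_PATTERN = re.compile(r"^(?:O:)?([A-Z0-9]{1,6})\s*(\d{6})([CP])(\d{8})$")


@dataclass(frozen=True)
class OccSymbol:
    root: str
    expiration: date
    right: str       # "C" for call, "P" for put
    strike: Decimal


def parse_occ(symbol: str) -> OccSymbol:
    m = OCC_PATTERN.match(symbol.strip())
    if m is None:
        raise ValueError(f"not an OCC option symbol: {symbol!r}")
    root, yymmdd, right, strike_digits = m.groups()
    expiration = datetime.strptime(yymmdd, "%y%m%d").date()
    # Decimal avoids float rounding on strikes such as 582.5.
    return OccSymbol(root, expiration, right, Decimal(strike_digits) / 1000)


parsed = parse_occ("O:SPY260306C00580000")
```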

In production notes for this backtesting workflow, REST snapshot, WebSocket stream, Entitlement gate, Quote freshness, Timestamp semantics, and Pagination cursor define the checks that decide whether the workflow is reproducible. The result is a backtest that can be rerun, compared across threshold families, and rejected when the evidence is not strong enough.
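The pagination cursor check is the easiest of these to state as code. A minimal sketch of draining a paginated endpoint, assuming a hypothetical `fetch_page(cursor)` callable that returns `(rows, next_cursor)` with `None` marking the last page; this is not the CuteMarkets client API, only an illustration of the completeness checks.

```python
from typing import Callable, Optional

Page = tuple[list[dict], Optional[str]]


def fetch_all(fetch_page: Callable[[Optional[str]], Page], max_pages: int = 10_000) -> list[dict]:
    """Follow continuation cursors until exhausted, failing rather than truncating."""
    rows: list[dict] = []
    cursor: Optional[str] = None
    seen: set[str] = set()
    for _ in range(max_pages):
        page_rows, next_cursor = fetch_page(cursor)
        rows.extend(page_rows)
        if next_cursor is None:
            return rows  # last page reached: the window is complete
        if next_cursor in seen:
            raise RuntimeError(f"pagination cursor repeated: {next_cursor}")
        seen.add(next_cursor)
        cursor = next_cursor
    raise RuntimeError("max_pages exceeded; result would be incomplete")


pages = {None: ([{"i": 1}, {"i": 2}], "c1"), "c1": ([{"i": 3}], None)}
rows = fetch_all(lambda c: pages[c])
```

Raising on a page cap instead of returning what was collected is the point: a silently truncated quote window looks like a legitimate "no quote" reject in the backtest.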

For this summary, the practical acceptance test is simple: another developer should be able to read the body, identify the exact inputs, reproduce the request sequence, and explain the accepted and rejected rows without relying on the terminology list at the end. If a phrase appears in the page vocabulary, it should correspond to a stored field, a validation check, a replay step, or an implementation decision in the backtesting workflow.
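Reproducing the request sequence depends on cache keys that keep provider, endpoint, request parameters (ticker, timestamp), plan, and schema state from being mixed. A minimal sketch, assuming a hypothetical `cache_key` helper and illustrative provider, plan, and schema strings; the key hashes a canonical JSON encoding so parameter order cannot change it.

```python
import hashlib
import json


def cache_key(provider: str, endpoint: str, params: dict, plan: str, schema_version: str) -> str:
    """Stable key: identical requests map to one entry; any state change gets a new one."""
    payload = {
        "provider": provider,
        "endpoint": endpoint,
        "params": params,         # e.g. ticker and timestamp window
        "plan": plan,             # entitlement state can change what a response contains
        "schema": schema_version,
    }
    # sort_keys also sorts nested dicts, so parameter order does not matter.
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]
    return f"{provider}:{endpoint}:{digest}"


k1 = cache_key("cutemarkets", "quotes",
               {"ticker": "SPY260306C00580000", "ts": "2026-03-02T10:00:00Z"}, "pro", "v3")
k2 = cache_key("cutemarkets", "quotes",
               {"ts": "2026-03-02T10:00:00Z", "ticker": "SPY260306C00580000"}, "pro", "v3")
```

Including the plan in the key is the detail most often skipped: a delayed or lower-tier response cached under the same key as a full-entitlement one makes replays disagree for reasons no one can see.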

This is also the reason the article should not measure success only by the final chart, table, or headline metric. The better standard is whether the data path, timing model, entitlement state, and evidence trail survive review. When those pieces are written directly into the body, the terminology becomes part of the workflow readers can implement.

The summary needs the same audit trail

The full One Piece of Sharpe summary should keep the same market-data vocabulary as the individual episodes. Each branch needs signal timestamp, contract discovery rule, selected OCC option symbol, quote window, trade window, spread percent, quote freshness, reject reasons, and replay manifest version. Without those fields, the summary can make weak branches look comparable to audited branches.
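Reject reasons are only comparable across branches if every branch produces them the same way. A minimal sketch of a quote-aware buy fill that either returns an ask-side price or a named reject reason; the `Quote` and `FillResult` types, the reason labels, and the 5-second and 10% thresholds are hypothetical defaults, not the repo's values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Quote:
    bid: Optional[float]
    ask: Optional[float]
    ts: datetime


@dataclass(frozen=True)
class FillResult:
    price: Optional[float]
    reject_reason: Optional[str]


def quote_aware_buy(quote: Optional[Quote], signal_ts: datetime,
                    max_age: timedelta = timedelta(seconds=5),
                    max_spread_pct: float = 0.10) -> FillResult:
    """Buy at the ask if the quote is causal, fresh, two-sided, and tight enough."""
    if quote is None:
        return FillResult(None, "missing_data")
    if quote.ts > signal_ts:
        return FillResult(None, "quote_after_signal")  # would be look-ahead
    if signal_ts - quote.ts > max_age:
        return FillResult(None, "stale_quote")
    if quote.bid is None or quote.bid <= 0:
        return FillResult(None, "no_bid")
    if quote.ask is None or quote.ask < quote.bid:
        return FillResult(None, "crossed_or_missing_ask")
    mid = (quote.bid + quote.ask) / 2
    if (quote.ask - quote.bid) / mid > max_spread_pct:
        return FillResult(None, "wide_spread")
    # Pay the ask: the midpoint is a reference price, not proof of a fill.
    return FillResult(quote.ask, None)


signal = datetime(2026, 3, 2, 10, 0, 5)
fill = quote_aware_buy(Quote(1.00, 1.05, datetime(2026, 3, 2, 10, 0, 3)), signal)
```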

The surviving lesson is that strategy research became more useful as the schema became stricter. NBBO evidence, quote conditions, trade conditions, pagination cursors, entitlement state, and rate-limit assumptions made the claims narrower. That was the point. The result is not a universal recipe; it is a record of which ideas still had evidence after the simulator stopped being generous.

Terminology

Market-data terms used in this article

These terms keep the article connected to the CuteMarkets knowledge base and to the exact API workflow behind the research.

Point-in-time contracts

Contract discovery anchored to the research date so a backtest does not use future listings.

Quote-aware fills

Entry and exit assumptions based on bid/ask quotes, quote age, spread width, and side-specific fill rules.

Reject reasons

Logged explanations for skipped contracts or fills, including stale quote, wide spread, no bid, or missing data.

Replay artifact

The saved request, selection, fill, reject, and metric record that lets another developer audit the backtest.

Cache key

The structured identifier that keeps provider, endpoint, ticker, timestamp, plan, and schema state from being mixed.

Signal timestamp

The exact time a strategy made a decision, used to reconstruct the visible universe and quote window causally.

Look-ahead leakage

A research error where a fill, contract, indicator, or label uses information unavailable at decision time.

Walk-forward test

A validation method that repeatedly trains and evaluates across separated time windows instead of trusting one optimized sample.

Slippage model

A fill-cost assumption based on bid/ask side, midpoint, spread percent, quote age, and liquidity policy.

Same-bar fill

An intraday backtest assumption that can become invalid when signal, entry, stop, and target ordering is ambiguous.

Promotion gate

The written threshold that decides whether a research candidate can move into paper trading or production monitoring.

Options data API

The product surface for chains, contracts, quotes, trades, aggregates, Greeks, IV, open interest, and expirations.

OPRA-originating data

The U.S. listed-options source context behind quotes, trades, exchange participation, and consolidated option-market records.

OCC option symbol

The exact option contract identifier that preserves root, expiration, call or put side, and strike.

Bid/ask spread

The execution interval between bid and ask that determines whether a contract is realistically tradable.

Midpoint

The computed center between bid and ask, useful as a reference price but not proof that an order would fill.

Quote/trade condition

The condition-code, exchange, correction, sequence, and timestamp context that explains how a quote or trade row can be used.

Quote vs trade semantics

The distinction between executable bid/ask markets, printed transactions, and bar-level summaries.

REST snapshot

A reproducible request for current or historical market state, used for initialization, backfills, and audit logs.

WebSocket stream

A persistent live connection that needs subscription topics, reconnect tracking, freshness labels, and REST repair paths.

Entitlement gate

The product, plan, quote, live, delayed, historical, or commercial-use boundary checked before data is shown.

Quote freshness

The age, timestamp, and live or delayed state of a bid/ask record before it is used in a scanner, backtest, or UI.

Timestamp semantics

The exchange, provider, ingestion, session, and application time context attached to a market-data record.

Pagination cursor

The continuation token or next URL that keeps large chains, trades, quotes, and historical windows complete.


Daniel covers the deeper research notes: options backtesting, execution realism, robustness testing, data engineering, and strategy validation.

Product links

Build the workflow with CuteMarkets

This article is part of the broader CuteMarkets product and research stack. Use the landing pages below to move from the blog into the specific API workflow you want to evaluate.

Beginner options path

Send newcomers to the beginner path for calls, puts, chains, Greeks, IV, and risk.

Options Data API

See the main options overview for real-time and historical options data.

Historical Options Data API

Inspect the historical contracts, quotes, trades, and aggregates workflow.

Options Chain API

Go straight to chain snapshots, expirations, and strike discovery.

Pricing

Review plans before you move from free evaluation into production usage.
