Research Series · April 8, 2026 · 9 min read

Episode 7: Failure Week Was Productive

Daniel Ratke

Research & Engineering

Term map

Backtesting vocabulary for this article

Treat signal timestamp, point-in-time universe, quote-aware fill, reject reason, replay artifact, walk-forward test, and cache key as first-class terms. They separate reproducible research from a backtest that only preserves the final performance table.

Follow the linked definitions for Point-in-time contracts, Quote-aware fills, Reject reasons, Replay artifact, Cache key, Signal timestamp, Look-ahead leakage, Walk-forward test, Slippage model, Same-bar fill, Promotion gate, and Options data API.

Read this article with Options Backtesting API, Backtesting Framework, Backtesting Data Quality Checklist, Backtesting Execution Realism, Quote-Aware Options Backtests, and Backtest to Paper Trading Parity Checklist.

Scope

This episode focuses on the cluster of negative results logged around 2026-04-06 to 2026-04-08 in RUNS.md.

It is one of the most important episodes in the series because it saved time. Many ideas were tested, failed, and properly closed instead of being allowed to drift around the backlog forever.

Result Snapshot

Lane	Outcome	Main blocker
`c23` wave-failure reclaim	`no_feasible_profile`	too few trades, low trades/week, correlation issue
`c26` gap reclaim continuation	`no_feasible_profile`	failed DSR, Sharpe, Sortino, PBO; sparse sample
`c29` open-drive pullback	`0` trades	no effective sample
`c30` ORB retest higher-low	`no_feasible_profile`	weak quality despite some activity
`c32` gap-failure fade	`no_feasible_profile`	failed DSR, Sharpe, Sortino, trades/week
`c37` debit-spread companion	`0` trades on `SPY`	structure too sparse
`lfcm_catalyst_momentum`	closed	`0` valid catalyst days even after data-path repair

This is what a useful cemetery looks like.

Strategy Context: What These Models Were Actually Trying To Do

c23 was a failed-break reclaim model. In code terms, it looked for an early downside sweep through the opening structure, required the market to reclaim back above VWAP or the opening-range midpoint, and then entered long on follow-through. The quality version tightened the reclaim window, required relative volume, and demanded a stronger reclaim close. The logic is intuitively attractive because it tries to monetize failed downside auctioning. The repo result says that attractiveness was not enough: the setup remained too sparse and too correlated with other existing sleeves to earn a place.
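The c23 entry sequence described above (sweep, reclaim, follow-through) can be sketched in a few lines. This is only an illustration of the shape of the rule, not the repo's implementation: the `Bar` type, the opening-range length `or_len`, and the `max_bars` time budget are hypothetical, and session VWAP is assumed to be precomputed on each bar. The relative-volume and reclaim-strength filters of the quality version are left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Bar:
    high: float
    low: float
    close: float
    vwap: float  # session VWAP as of this bar's close (assumed precomputed)


def sweep_reclaim_entry(bars: Sequence[Bar], or_len: int = 3,
                        max_bars: int = 12) -> Optional[int]:
    """Return the index of the bar whose close triggers a long entry, or None.

    Stages, each on a later bar than the previous one:
      1. sweep: a bar trades below the opening-range low
      2. reclaim: a bar closes above VWAP or the opening-range midpoint
      3. follow-through: a bar closes above the reclaim bar's high
    The whole pattern must complete within the first `max_bars` bars.
    """
    if len(bars) <= or_len:
        return None
    or_high = max(b.high for b in bars[:or_len])
    or_low = min(b.low for b in bars[:or_len])
    or_mid = (or_high + or_low) / 2.0

    swept = False
    reclaim_high: Optional[float] = None
    for i in range(or_len, min(len(bars), max_bars)):
        bar = bars[i]
        if not swept:
            swept = bar.low < or_low
        elif reclaim_high is None:
            if bar.close > bar.vwap or bar.close > or_mid:
                reclaim_high = bar.high
        elif bar.close > reclaim_high:
            return i
    return None
```

Even in this stripped-down form, three sequential conditions inside a fixed time budget leave few qualifying sessions, which is consistent with the sparsity blocker recorded for the lane.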

c26 was a gap-reclaim continuation model built on the event-drive variant. It required a meaningful gap up, then asked whether the session could hold support and continue after the reclaim. The quality version increased the minimum gap size, required stronger relative volume, and demanded a larger breakout fraction versus the opening range. This is a classic event-momentum hypothesis: a large pre-open dislocation plus early acceptance should continue. The repo found that the path did not generalize well enough. Even when the trade existed, quality metrics and overfitting diagnostics remained too weak.

c29 and c30 were both long continuation families, but with different structural emphasis. c29 required a strong opening drive, then a shallow pullback that stayed above VWAP or the opening-range high, and finally a resumption of the original drive. c30 waited for an opening-range breakout, then required the retest to hold as a higher low above VWAP before taking the continuation break of the retest high. Both ideas are familiar to discretionary traders. The repo result is valuable because it shows how quickly these intuitive narratives become statistically fragile once you insist on explicit drive magnitude, retracement bounds, relative volume, and time-budget rules. c29 became so constrained that it produced zero trades in the tested lane. c30 produced some trades, but not enough quality.

c32 was the mirror-image failure-fade idea. It looked for a gap-up session that failed to reclaim VWAP and then shorted the continuation of that failed bounce. The quality version made the gap threshold larger and shortened the deadline. This is a plausible opening-reversal archetype: strong overnight enthusiasm that cannot be maintained after the open. In the repo, however, the pattern did not survive the feasibility bar. It was able to tell a compelling market story more easily than it could produce robust out-of-sample evidence.

c37 was not a new underlying signal at all. It took the long-only VWAP mean-reversion logic from the c18 family and tried to express it through 2-5 DTE vertical debit spreads with quote-aware spread execution, rather than through the 0-2 DTE single-leg expression used by c36. It inherited the mean-reversion assumptions of c18 plus additional structural requirements around short-leg bids, debit-to-width ratio, and spread quality. The important negative result here is structural: changing the monetization layer alone can be enough to extinguish a strategy's usable sample.
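To make the structural point concrete, here is a minimal sketch of a quote-aware check for a vertical debit spread. The threshold values, the reason strings, and the `Quote` type are assumptions made for illustration; the article does not publish the c37 parameters. The debit is priced conservatively by paying the long leg's ask and receiving the short leg's bid.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Quote:
    bid: float
    ask: float


def debit_spread_check(long_leg: Quote, short_leg: Quote, width: float,
                       min_short_bid: float = 0.05,
                       max_debit_to_width: float = 0.6,
                       max_leg_spread_pct: float = 0.25,
                       ) -> Tuple[Optional[float], Optional[str]]:
    """Return (debit, None) if the spread passes, else (None, reject_reason).

    All thresholds are hypothetical defaults, not values from the repo.
    """
    # The short leg must have a real bid, or the credit side is fictional.
    if short_leg.bid < min_short_bid:
        return None, "short_leg_no_bid"
    # Each leg's own bid/ask spread must be tight relative to its midpoint.
    for leg in (long_leg, short_leg):
        mid = (leg.bid + leg.ask) / 2.0
        if mid <= 0 or (leg.ask - leg.bid) / mid > max_leg_spread_pct:
            return None, "wide_leg_spread"
    # Conservative debit: buy at the ask, sell at the bid.
    debit = long_leg.ask - short_leg.bid
    if debit <= 0 or debit / width > max_debit_to_width:
        return None, "debit_to_width"
    return debit, None
```

Every signal from the underlying logic has to pass all of these checks on both legs at the signal timestamp, so a sample that was adequate for a single-leg expression can shrink to nothing.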

The LFCM catalyst lane failed for a different reason. It was never primarily an intraday price-pattern strategy. It depended on the existence of historically valid catalyst headlines plus premarket activity. By April 8, the repo had already repaired the premarket data path and allowed Alpaca as a secondary provider. The lane still produced zero valid catalyst days. That makes it one of the cleanest closures in the repo because the data excuse was removed before the idea was killed.

Why These Failures Matter

Each of these failures answers a different question.

c29 and c37 tell us there are ideas that do not even clear the sample-creation threshold. That is an early and clean rejection.

c23, c26, c30, and c32 tell us there are ideas that can create some trades but still fail the combination of robustness, return quality, and frequency needed for promotion.

The LFCM lane tells us something even stronger. After the repo fixed the audit path and added Alpaca as the allowed secondary provider, the lane still had:

22529 ticker-days with premarket bars
0 valid catalyst headline days
0 candidate ticker-days

That is not a data excuse anymore. That is a strategy-universe result.
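A funnel count like the one above is cheap to produce and worth storing with the run. The sketch below assumes each ticker-day is a mapping and each stage is a named predicate; both the row fields and the stage names are hypothetical. A row only counts toward a stage if it passed every earlier stage, so the output shows exactly where the sample disappears.

```python
from typing import Callable, Dict, Iterable, List, Mapping, Tuple

Stage = Tuple[str, Callable[[Mapping], bool]]


def funnel_counts(rows: Iterable[Mapping], stages: List[Stage]) -> Dict[str, int]:
    """Count the rows that survive each stage, applying stages in order."""
    counts = {name: 0 for name, _ in stages}
    for row in rows:
        for name, passes in stages:
            if not passes(row):
                break  # later stages never see a row that failed earlier
            counts[name] += 1
    return counts


# Hypothetical stage definitions mirroring the three counts reported above.
stages: List[Stage] = [
    ("premarket_bars", lambda r: r["has_premarket_bars"]),
    ("valid_catalyst", lambda r: r["has_valid_catalyst"]),
    ("candidate", lambda r: r["passes_entry_filters"]),
]
```

When the first stage is large and the second is zero, as it was here, the blocker is the catalyst definition meeting the universe, not the data path.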

What Worked

What worked was the decision process itself.

The repo did not do the usual thing where failed branches are left in a vague "interesting, revisit later" state. It named the blockers. In most cases those blockers were exactly the ones that matter for a live portfolio:

sample too sparse
quality metrics too weak
overlap or correlation too high
opportunity not strong enough after realistic filtering

This is one of the strongest credibility signals in the entire project. A public series that only reports survivors looks like marketing. A public series that reports why a lane was killed looks like research.
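Naming blockers is easier when the gate returns them instead of a bare pass or fail. The following is a minimal sketch under stated assumptions: the threshold values and metric keys are invented for illustration, and the real gate also covers checks such as DSR and Sortino that are omitted here for brevity.

```python
from typing import Dict, List

# Hypothetical thresholds; the article does not publish the repo's values.
GATE: Dict[str, float] = {
    "min_trades": 30,
    "min_trades_per_week": 1.0,
    "min_sharpe": 1.0,
    "max_pbo": 0.5,
    "max_correlation": 0.7,
}


def gate_blockers(metrics: Dict[str, float],
                  gate: Dict[str, float] = GATE) -> List[str]:
    """Return the named reasons a lane fails the gate; an empty list means pass."""
    # A lane with no trades has no metrics worth evaluating.
    if metrics["trades"] == 0:
        return ["zero_trades"]
    checks = [
        ("too_few_trades", metrics["trades"] < gate["min_trades"]),
        ("low_trades_per_week",
         metrics["trades_per_week"] < gate["min_trades_per_week"]),
        ("sharpe_below_floor", metrics["sharpe"] < gate["min_sharpe"]),
        ("pbo_too_high", metrics["pbo"] > gate["max_pbo"]),
        ("correlation_too_high",
         metrics["max_correlation"] > gate["max_correlation"]),
    ]
    return [name for name, failed in checks if failed]
```

Storing the returned list next to the lane ID is what lets a closure read as "too few trades, low trades/week, correlation issue" rather than "did not work".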

What Did Not Work

The obvious answer is "those models did not work." But there is a more general negative result here.

What did not work was the temptation to rescue every interesting intuition with one more parameter pass.

The repo could have easily spent another week on:

looser thresholds for c29
different spreads for c37
more permissive catalyst heuristics for LFCM

Instead, the evidence said stop. That is especially important for the wave-style branches because discretionary intuition can keep those ideas alive far longer than the statistics warrant. A reclaim, a higher-low retest, or a failed gap often looks convincing on a chart after the fact. The repo's value here is that it translated those chart narratives into explicit entry windows, retracement bounds, RVOL floors, and regime filters, then showed that the resulting objects still did not clear the bar.

Why This Week Matters

This is the episode that teaches the audience what a serious kill decision looks like.

In mild One Piece language, not every island is hiding treasure. Some are just empty. The project got better because it stopped camping on empty islands.

Public Build Takeaway

Episode 7 should be one of the most shared posts in the series, because it saves other researchers from checking the same dead ends without context.

The lesson is:

publish the graveyard
explain the blocker, not just the death certificate
treat negative results as reusable information

That is how a public research journey becomes useful to other people rather than merely entertaining.

To continue from this episode, read Options Backtesting API, Backtesting Framework, Backtesting Execution Realism, Backtesting Data Quality Checklist, Quote-Aware Options Backtests, and Backtest to Paper Trading Parity Checklist.

How the terminology applies

In this episode, the backtesting workflow should treat Point-in-time contracts, Quote-aware fills, Reject reasons, Replay artifact, Cache key, and Signal timestamp as operational state rather than glossary decoration. That framing keeps the research claim causal: the strategy can only select instruments, prices, and labels that existed at the decision time.
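As a small illustration of the causal constraint in the paragraph above, the sketch below rebuilds the visible contract universe from a signal timestamp. The `Contract` fields are hypothetical, and the signal timestamp is assumed to already be expressed in exchange-local time so that its calendar date is the trading date.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable, List


@dataclass(frozen=True)
class Contract:
    symbol: str
    listed_on: date   # first date the contract existed (hypothetical field)
    expires_on: date


def visible_universe(contracts: Iterable[Contract],
                     signal_ts: datetime) -> List[Contract]:
    """Keep only contracts that were listed and unexpired on the signal date."""
    as_of = signal_ts.date()
    return [c for c in contracts if c.listed_on <= as_of <= c.expires_on]
```

A backtest that selects from today's contract list instead of this filtered view can pick strikes that had not been listed yet, which is one form of look-ahead leakage.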

A developer implementing this research idea should persist the evidence behind Look-ahead leakage, Walk-forward test, Slippage model, Same-bar fill, Promotion gate, and Options data API beside the result (the leakage checks, test windows, fill assumptions, gate thresholds, and API requests actually used), instead of leaving those words in a term card. Doing so turns attractive performance into an auditable record where fills, skips, thresholds, and replay inputs can be challenged independently.
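One of those items, the walk-forward test, is simple enough to show directly. The generator below is a generic rolling-window sketch over sample indices, not the repo's splitter; the window sizes are parameters the caller chooses, and each test window starts exactly where its training window ends.

```python
from typing import Iterator, Tuple


def walk_forward_windows(n: int, train: int,
                         test: int) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) pairs, rolling forward by `test`.

    Every test window lies strictly after its training window, so nothing
    fitted on the training slice can see the data it is evaluated on.
    """
    if train <= 0 or test <= 0:
        raise ValueError("train and test must be positive")
    start = 0
    while start + train + test <= n:
        split = start + train
        yield range(start, split), range(split, split + test)
        start += test
```

Persisting the yielded boundaries with the result lets a reviewer confirm which observations each reported metric was computed on.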

The review artifact for this episode becomes more useful when OPRA-originating data, OCC option symbol, Bid/ask spread, Midpoint, Quote/trade condition, and Quote vs trade semantics appear in the same body of evidence as the selected rows. When a result is promoted, these fields should appear in the run manifest, not only in a prose summary or final equity curve.
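Storing the OCC option symbol is most useful when the manifest can also decode it. The sketch below parses the standard OCC layout (root, six-digit YYMMDD expiration, C or P, eight-digit strike in thousandths of a dollar). It accepts both the space-padded 21-character form and the compact form without padding; any provider-specific prefix is assumed to have been stripped already, and the symbols in the usage are illustrative only.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class OccContract:
    root: str
    expiration: date
    side: str      # "C" for call, "P" for put
    strike: float


def parse_occ_symbol(symbol: str) -> OccContract:
    """Decode root, expiration, call/put side, and strike from an OCC symbol."""
    body = symbol.strip()
    if len(body) < 16:
        raise ValueError(f"too short for an OCC symbol: {symbol!r}")
    # The last 15 characters are fixed width: YYMMDD + C/P + 8-digit strike.
    root, tail = body[:-15].strip(), body[-15:]
    expiration = datetime.strptime(tail[:6], "%y%m%d").date()
    side = tail[6]
    if side not in ("C", "P"):
        raise ValueError(f"unexpected call/put flag in {symbol!r}")
    strike = int(tail[7:]) / 1000.0  # strike is stored in thousandths
    return OccContract(root, expiration, side, strike)
```

Decoding at write time also catches malformed identifiers before they reach the replay artifact.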

In production notes for this backtesting workflow, REST snapshot, WebSocket stream, Entitlement gate, Quote freshness, Timestamp semantics, and Pagination cursor define the checks that decide whether the workflow is reproducible. The result is a backtest that can be rerun, compared across threshold families, and rejected when the evidence is not strong enough.
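Quote freshness and timestamp semantics reduce to one comparison that is easy to get wrong. The sketch below classifies a quote relative to the decision time; the five-second default and the state labels are assumptions, and both timestamps are required to be timezone-aware so that exchange, provider, and application clocks cannot be mixed silently.

```python
from datetime import datetime, timedelta


def quote_state(quote_ts: datetime, decision_ts: datetime,
                max_age: timedelta = timedelta(seconds=5)) -> str:
    """Classify a quote as "fresh", "stale", or "future" at decision time."""
    if quote_ts.tzinfo is None or decision_ts.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    age = decision_ts - quote_ts
    if age < timedelta(0):
        # The quote was published after the decision: using it is look-ahead.
        return "future"
    return "fresh" if age <= max_age else "stale"
```

Logging the returned state as a reject reason, rather than silently dropping the row, keeps stale-quote skips visible in the replay.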

For this episode, the practical acceptance test is simple: another developer should be able to read the body, identify the exact inputs, reproduce the request sequence, and explain the accepted and rejected rows without relying on the bottom terminology grid. If a phrase appears in the page vocabulary, it should correspond to a stored field, a validation check, a replay step, or an implementation decision in the backtesting workflow.
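One way to make that acceptance test checkable is to give every cached request a structured key and every run manifest a stable digest. The field order, the `|` separator, and the manifest shape below are assumptions for illustration, not a documented CuteMarkets format.

```python
import hashlib
import json
from typing import Any, Mapping


def cache_key(provider: str, endpoint: str, ticker: str,
              as_of: str, plan: str, schema_version: str) -> str:
    """Join the fields that must never be mixed into one explicit key."""
    return "|".join([provider, endpoint, ticker, as_of, plan, schema_version])


def manifest_digest(manifest: Mapping[str, Any]) -> str:
    """Hash a JSON-serializable manifest independently of key order."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two reruns that claim the same inputs should produce the same digest; when they do not, the diff between the two manifests is the first thing to review.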

This is also the reason the article should not measure success only by the final chart, table, or headline metric. The better standard is whether the data path, timing model, entitlement state, and evidence trail survive review. When those pieces are written directly into the body, the terminology becomes part of the workflow readers can implement.

Terminology

Market-data terms used in this article

These terms keep the article connected to the CuteMarkets knowledge base and to the exact API workflow behind the research.

Point-in-time contracts

Contract discovery anchored to the research date so a backtest does not use future listings.

Quote-aware fills

Entry and exit assumptions based on bid/ask quotes, quote age, spread width, and side-specific fill rules.

Reject reasons

Logged explanations for skipped contracts or fills, including stale quote, wide spread, no bid, or missing data.

Replay artifact

The saved request, selection, fill, reject, and metric record that lets another developer audit the backtest.

Cache key

The structured identifier that keeps provider, endpoint, ticker, timestamp, plan, and schema state from being mixed.

Signal timestamp

The exact time a strategy made a decision, used to reconstruct the visible universe and quote window causally.

Look-ahead leakage

A research error where a fill, contract, indicator, or label uses information unavailable at decision time.

Walk-forward test

A validation method that repeatedly trains and evaluates across separated time windows instead of trusting one optimized sample.

Slippage model

A fill-cost assumption based on bid/ask side, midpoint, spread percent, quote age, and liquidity policy.

Same-bar fill

An intraday backtest assumption that can become invalid when signal, entry, stop, and target ordering is ambiguous.

Promotion gate

The written threshold that decides whether a research candidate can move into paper trading or production monitoring.

Options data API

The product surface for chains, contracts, quotes, trades, aggregates, Greeks, IV, open interest, and expirations.

OPRA-originating data

The U.S. listed-options source context behind quotes, trades, exchange participation, and consolidated option-market records.

OCC option symbol

The exact option contract identifier that preserves root, expiration, call or put side, and strike.

Bid/ask spread

The execution interval between bid and ask that determines whether a contract is realistically tradable.

Midpoint

The computed center between bid and ask, useful as a reference price but not proof that an order would fill.

Quote/trade condition

The condition-code, exchange, correction, sequence, and timestamp context that explains how a quote or trade row can be used.

Quote vs trade semantics

The distinction between executable bid/ask markets, printed transactions, and bar-level summaries.

REST snapshot

A reproducible request for current or historical market state, used for initialization, backfills, and audit logs.

WebSocket stream

A persistent live connection that needs subscription topics, reconnect tracking, freshness labels, and REST repair paths.

Entitlement gate

The product, plan, quote, live, delayed, historical, or commercial-use boundary checked before data is shown.

Quote freshness

The age, timestamp, and live or delayed state of a bid/ask record before it is used in a scanner, backtest, or UI.

Timestamp semantics

The exchange, provider, ingestion, session, and application time context attached to a market-data record.

Pagination cursor

The continuation token or next URL that keeps large chains, trades, quotes, and historical windows complete.

Written by

Daniel Ratke

Research & Engineering

Daniel covers the deeper research notes: options backtesting, execution realism, robustness testing, data engineering, and strategy validation.
