VWAP Mean Reversion Backtest: The Logic, the Edge, and the Failure Modes

Daniel Ratke
Research & Engineering

Term map
Backtesting vocabulary for this article
Treat signal timestamp, point-in-time universe, quote-aware fill, reject reason, replay artifact, walk-forward test, and cache key as first-class terms. They separate reproducible research from a backtest that only preserves the final performance table.
Follow the linked definitions for Point-in-time contracts, Quote-aware fills, Reject reasons, Replay artifact, Cache key, Signal timestamp, Look-ahead leakage, Walk-forward test, Slippage model, Same-bar fill, Promotion gate, and Options data API.
Repository reference: cutebacktests
Abstract
VWAP mean reversion is one of the most common intraday ideas because it has a clean market intuition. Short-horizon dislocations away from a central intraday benchmark may snap back once the move becomes stretched. The difficulty is not the intuition. The difficulty is turning that intuition into a strategy that preserves both quality and enough sample size to matter.
In this repository, the best example is c36, the option-native descendant of the c18 VWAP mean-reversion family. In Episode 8, the quality version, c36_vwap_mr_option_native_quality_v1, produced +16004 PnL on 15 trades with DSR 0.6400, while failing only trades_per_week_ok. The opportunity version reached 85 trades and +2987 PnL, but the quality shape decayed. That is a scientifically useful result because it shows a real edge and a real bottleneck at the same time.
For the VWAP family, read this with VWAP Mean Reversion Signal Quality and Density, Stock Aggregates and Indicators, and Backtesting Execution Realism.
Question
The practical question is not whether VWAP mean reversion makes sense. It is whether the edge survives once the setup is defined tightly enough to be causal and monetized honestly through options.
That is the question the c36 branch answers well. It is not a loose discretionary fade. It is a constrained intraday mean-reversion model with explicit VWAP residual z-scores, bounded VWAP slope, sigma controls, relative-volume filtering, and short holding periods. The research value of the branch is that it makes the selectivity versus density tradeoff visible.
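One way to picture the "VWAP residual z-score" is the sketch below. It is a minimal illustration, not the repository's implementation: the function names, the 20-bar default lookback, and the choice of a population standard deviation over prior residuals are all assumptions made for this example. Its main point is causality: the current bar's residual is scored only against residuals that were already known.

```python
from statistics import mean, pstdev

def vwap_series(prices, volumes):
    """Cumulative intraday VWAP, one value per bar (session assumed to start at bar 0)."""
    out, pv_sum, vol_sum = [], 0.0, 0.0
    for price, volume in zip(prices, volumes):
        pv_sum += price * volume
        vol_sum += volume
        # Fall back to the bar price if no volume has traded yet.
        out.append(pv_sum / vol_sum if vol_sum > 0 else price)
    return out

def residual_zscore(prices, volumes, lookback=20):
    """Z-score of the latest (price - VWAP) residual versus the prior `lookback` residuals.

    Returns None when there is not enough history or the residuals have no dispersion.
    The lookback value is a placeholder, not a parameter taken from c36.
    """
    vwap = vwap_series(prices, volumes)
    residuals = [p - w for p, w in zip(prices, vwap)]
    if len(residuals) < lookback + 1:
        return None
    window = residuals[-lookback - 1:-1]  # strictly before the current bar
    sigma = pstdev(window)
    if sigma == 0:
        return None
    return (residuals[-1] - mean(window)) / sigma

# A quiet series followed by a sharp move away from VWAP produces a large positive z-score.
prices = [100, 101, 99, 100, 101, 99, 105]
volumes = [1] * len(prices)
z = residual_zscore(prices, volumes, lookback=5)
```

A mean-reversion entry would fade a large positive or negative value of `z`, subject to the slope, sigma, and relative-volume filters described above.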
Method: how the VWAP mean reversion backtest was structured
As described in Episode 8, c36 keeps the same core signal family while varying the degree of selectivity.
The quality version raises the entry threshold, requires stronger relative volume, narrows acceptable sigma and slope conditions, and cuts the time-in-trade budget. In plain language, it asks for cleaner dislocations and exits them quickly. The opportunity version loosens those requirements so more trades can form, even if the average setup is less extreme.
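The difference between the two variants can be expressed as two parameter profiles applied to the same entry check. The sketch below is illustrative only: the field names and every numeric value are hypothetical placeholders, because the article does not publish the actual c36 thresholds.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class EntryProfile:
    name: str
    z_entry: float                     # minimum |z-score| of the VWAP residual
    min_rel_volume: float              # minimum relative volume
    max_abs_slope: float               # bound on the absolute VWAP slope
    sigma_range: Tuple[float, float]   # acceptable sigma band
    max_hold_minutes: int              # time-in-trade budget

# Hypothetical values: the quality profile is stricter on every axis.
QUALITY = EntryProfile("quality", 2.5, 1.5, 0.02, (0.10, 0.40), 20)
OPPORTUNITY = EntryProfile("opportunity", 1.8, 1.0, 0.05, (0.05, 0.60), 45)

def passes_entry(profile, z, rel_volume, vwap_slope, sigma):
    """Return True when a candidate setup clears every filter in the profile."""
    lo, hi = profile.sigma_range
    return (
        abs(z) >= profile.z_entry
        and rel_volume >= profile.min_rel_volume
        and abs(vwap_slope) <= profile.max_abs_slope
        and lo <= sigma <= hi
    )

# A moderately stretched setup: admitted by the looser profile, rejected by the stricter one.
setup = dict(z=-2.0, rel_volume=1.2, vwap_slope=0.03, sigma=0.30)
in_opportunity = passes_entry(OPPORTUNITY, **setup)
in_quality = passes_entry(QUALITY, **setup)
```

Keeping both profiles behind one `passes_entry` function means the only thing that varies between runs is the parameter set, which matches the experimental design described above.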
The branch is then monetized through quote-aware single-leg option execution in the 0-2DTE window. This is an important detail because it keeps the comparison focused. The underlying mean-reversion family stays conceptually stable, and the main experimental question becomes whether widening the opportunity set preserves enough quality to justify the extra trades.
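A quote-aware fill can be sketched as a function that either returns a side-specific price or a logged reject reason. This is an assumption-laden illustration: the `Quote` type, the age and spread limits, and the reason strings are hypothetical, although the four reasons mirror the ones listed in the terminology section (stale quote, wide spread, no bid, missing data).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Quote:
    bid: float
    ask: float
    age_seconds: float  # time between the quote timestamp and the signal timestamp

def quote_aware_fill(
    quote: Optional[Quote],
    side: str,
    max_age_seconds: float = 5.0,   # placeholder limit
    max_spread_pct: float = 0.15,   # placeholder limit, spread as a fraction of midpoint
) -> Tuple[Optional[float], Optional[str]]:
    """Return (fill_price, reject_reason); exactly one of the two is None."""
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    if quote is None:
        return None, "missing_data"
    if quote.age_seconds > max_age_seconds:
        return None, "stale_quote"
    if quote.bid <= 0:
        return None, "no_bid"
    midpoint = (quote.bid + quote.ask) / 2
    if (quote.ask - quote.bid) / midpoint > max_spread_pct:
        return None, "wide_spread"
    # Conservative rule: buys pay the ask, sells receive the bid.
    return (quote.ask if side == "buy" else quote.bid), None

entry_price, entry_reject = quote_aware_fill(Quote(1.00, 1.10, 1.0), "buy")
_, wide_reject = quote_aware_fill(Quote(1.00, 1.50, 1.0), "buy")
```

Filling at the ask on entry and the bid on exit is the pessimistic choice; a midpoint-based rule would be more generous and would need separate justification.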
Evidence and results
The repository's summary now gives a clean comparison:
c36_vwap_mr_option_native_quality_v1: +16004 PnL, 15 trades, DSR 0.6400; failed only trades_per_week_ok
c36_vwap_mr_option_native_opportunity_v1: 85 trades and +2987 PnL; the denser branch lost enough quality that it did not replace the higher-quality profile
This is one of the most useful negative-positive pairs in the repo. The quality branch says the edge is not imaginary. The opportunity branch says density cannot be purchased for free. The repo ended with a strategy that was interesting enough to keep and not strong enough to promote.
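The quality decay is easiest to see on a per-trade basis. The short calculation below uses only the PnL and trade counts reported above; the dictionary layout and function name are just for illustration.

```python
results = {
    "c36_vwap_mr_option_native_quality_v1": {"pnl": 16004, "trades": 15},
    "c36_vwap_mr_option_native_opportunity_v1": {"pnl": 2987, "trades": 85},
}

def pnl_per_trade(pnl, trades):
    """Average PnL per trade, in the same units as the reported PnL."""
    return pnl / trades

per_trade = {
    name: round(pnl_per_trade(r["pnl"], r["trades"]), 2)
    for name, r in results.items()
}

for name, value in per_trade.items():
    print(f"{name}: {value:.2f} PnL per trade")
# c36_vwap_mr_option_native_quality_v1: 1066.93 PnL per trade
# c36_vwap_mr_option_native_opportunity_v1: 35.14 PnL per trade
```

The opportunity version took more than five times as many trades but earned roughly one thirtieth as much per trade, which is why its total PnL is lower despite the higher count.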
What worked
What worked was the signal logic itself. The repo did not find a dead branch here. It found a branch with a real positive profile that survived stricter evaluation than many other ideas in the same period. That is why c36 remains backup_candidate or open_paper_only in the portfolio map rather than being closed.
This also makes c36 a strong public case study. Many strategy writeups only show complete failures or obvious winners. This one shows something much closer to real research: a credible signal with a real operational weakness.
What failed
What failed was density. The best-quality version simply did not trade often enough to satisfy the repo's portfolio admission bar. The exact failed condition was trades_per_week_ok. That is not a cosmetic failure. It means the branch could make money and still fail the job it needed to do as a component of a diversified portfolio.
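A density check like trades_per_week_ok can be sketched as follows. The evaluation window and the minimum rate are not given in the article, so the 10-week window and the 2.0 trades-per-week floor below are placeholders chosen only to make the example concrete.

```python
from datetime import date

def trades_per_week(n_trades, start, end):
    """Average trades per calendar week over an inclusive [start, end] window."""
    days = (end - start).days + 1
    if days <= 0:
        raise ValueError("end must not be before start")
    return n_trades / (days / 7.0)

def trades_per_week_ok(n_trades, start, end, min_per_week):
    """Density gate: True when the branch trades at least `min_per_week` on average."""
    return trades_per_week(n_trades, start, end) >= min_per_week

# Hypothetical window and threshold, not the repo's actual admission bar.
start, end = date(2024, 1, 1), date(2024, 3, 10)   # 70 days, i.e. 10 weeks
sparse_rate = trades_per_week(15, start, end)
sparse_ok = trades_per_week_ok(15, start, end, min_per_week=2.0)
dense_ok = trades_per_week_ok(85, start, end, min_per_week=2.0)
```

Under these placeholder numbers a 15-trade branch fails the gate and an 85-trade branch passes it, which is the shape of the tradeoff described here: the gate measures frequency, not profitability.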
The opportunity version then showed why loosening the filters was not an easy repair. Trade count rose sharply, but the quality of the branch did not remain strong enough. The branch could be selective and sparse or denser and weaker, but it did not yet find the middle ground that would justify promotion.
Takeaway
The c36 result is one of the best examples in this repo of how a good backtest can still stop short of deployment. The strategy had real signal and real profits in its quality form. It also had a real density problem. That combination is precisely why VWAP mean reversion remains a live research question here rather than a closed one.
If you want the options-expression angle of this tradeoff, Intraday Mean Reversion Options: Why Signal Quality Drops When You Chase Density goes one step further. If you want the c36 decision itself, VWAP Z-Score Strategy: How We Evaluated c36 and Why It Still Was Not Promoted focuses on the admission bar.
How the terminology applies
In this backtest, the workflow should treat Point-in-time contracts, Quote-aware fills, Reject reasons, Replay artifact, Cache key, and Signal timestamp as operational state rather than glossary decoration. That framing keeps the research claim causal: the strategy can only select instruments, prices, and labels that existed at the decision time.
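A point-in-time universe can be sketched as a filter keyed on the signal timestamp. The `Contract` type, its field names, and the contract labels below are hypothetical; the 0 to 2 day window follows the 0-2DTE range mentioned earlier, measured here in calendar days as a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import date, datetime

@dataclass(frozen=True)
class Contract:
    symbol: str        # illustrative label, not a real OCC symbol
    listed_on: date    # first date the contract existed
    expiration: date

def visible_contracts(contracts, signal_ts, max_dte=2):
    """Contracts that already existed at the signal time and expire within max_dte days."""
    as_of = signal_ts.date()
    return [
        c for c in contracts
        if c.listed_on <= as_of and 0 <= (c.expiration - as_of).days <= max_dte
    ]

signal_ts = datetime(2024, 1, 3, 10, 30)
universe = [
    Contract("A_0DTE", date(2023, 12, 1), date(2024, 1, 3)),    # visible, 0 DTE
    Contract("B_2DTE", date(2023, 12, 1), date(2024, 1, 5)),    # visible, 2 DTE
    Contract("C_LATE", date(2024, 1, 4), date(2024, 1, 5)),     # listed after the signal
    Contract("D_FAR", date(2023, 12, 1), date(2024, 1, 19)),    # outside the DTE window
    Contract("E_EXPIRED", date(2023, 12, 1), date(2024, 1, 2)), # already expired
]
selected = [c.symbol for c in visible_contracts(universe, signal_ts)]
```

Filtering on `listed_on` is what prevents a backtest from picking a contract that was only listed later; filtering on expiration alone would not catch that case.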
A developer implementing this case study should record how the run handled Look-ahead leakage, Walk-forward test, Slippage model, Same-bar fill, Promotion gate, and Options data API beside the result, instead of leaving those words in a term card. Doing so turns attractive performance into an auditable record where fills, skips, thresholds, and replay inputs can be challenged independently.
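Of these, the walk-forward test is the easiest to show in code. The sketch below generates rolling train and test windows over an indexed sequence of periods; the function name and the window sizes in the example are illustrative assumptions, and the article does not state how c36's validation windows were chosen.

```python
def walk_forward_windows(n_periods, train_size, test_size):
    """Rolling (train, test) index ranges; each test window starts where its train window ends."""
    if train_size <= 0 or test_size <= 0:
        raise ValueError("window sizes must be positive")
    windows = []
    start = 0
    while start + train_size + test_size <= n_periods:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        windows.append((train, test))
        start += test_size  # advance by one test window so test sets never overlap
    return windows

# Ten periods, four for fitting and two for evaluation in each fold.
folds = walk_forward_windows(n_periods=10, train_size=4, test_size=2)
```

Because every test window sits strictly after its training window, parameters tuned on the training periods are always evaluated on data they have not seen.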
The review artifact for this backtest becomes more useful when OPRA-originating data, OCC option symbol, Bid/ask spread, Midpoint, Quote/trade condition, and Quote vs trade semantics appear in the same body of evidence as the selected rows. When a result is promoted, these fields should appear in the run manifest rather than only in a prose summary or final equity curve.
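A run manifest and its cache keys might look like the sketch below. The key fields follow the Cache key definition in the terminology section (provider, endpoint, ticker, timestamp, plan, schema state); everything else, including the function names, the manifest layout, and the example values, is a hypothetical illustration rather than the CuteMarkets or cutebacktests format.

```python
import hashlib
import json

def cache_key(provider, endpoint, ticker, timestamp, plan, schema_version):
    """Deterministic key so responses from different plans or schemas are never mixed."""
    payload = {
        "provider": provider,
        "endpoint": endpoint,
        "ticker": ticker,
        "timestamp": timestamp,
        "plan": plan,
        "schema_version": schema_version,
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def build_manifest(run_id, signal_timestamp, request_keys, fills, rejects):
    """Replay artifact: the inputs and decisions another developer needs to audit a run."""
    return {
        "run_id": run_id,
        "signal_timestamp": signal_timestamp,
        "request_keys": list(request_keys),
        "fills": list(fills),      # e.g. symbol, side, bid, ask, midpoint, fill price
        "rejects": list(rejects),  # e.g. symbol and reject reason
    }

# Example values are placeholders.
key_a = cache_key("example_provider", "quotes", "XYZ", "2024-01-03T10:30:00Z", "delayed", "v1")
key_b = cache_key("example_provider", "quotes", "XYZ", "2024-01-03T10:30:00Z", "live", "v1")
manifest = build_manifest(
    "run-001",
    "2024-01-03T10:30:00Z",
    [key_a],
    fills=[{"symbol": "A_0DTE", "side": "buy", "bid": 1.00, "ask": 1.10, "price": 1.10}],
    rejects=[{"symbol": "B_2DTE", "reason": "wide_spread"}],
)
```

Serializing the key fields with sorted keys makes the hash independent of argument order in the payload, and including the plan in the key keeps delayed and live data from sharing a cache entry.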
In production notes for this backtesting workflow, REST snapshot, WebSocket stream, Entitlement gate, Quote freshness, Timestamp semantics, and Pagination cursor define the checks that decide whether the workflow is reproducible. The result is a backtest that can be rerun, compared across threshold families, and rejected when the evidence is not strong enough.
For this backtest, the practical acceptance test is simple: another developer should be able to read the body, identify the exact inputs, reproduce the request sequence, and explain the accepted and rejected rows without relying on the terminology grid at the bottom of the page. If a phrase appears in the page vocabulary, it should correspond to a stored field, a validation check, a replay step, or an implementation decision in the backtesting workflow.
This is also the reason the article should not measure success only by the final chart, table, or headline metric. The better standard is whether the data path, timing model, entitlement state, and evidence trail survive review. When those pieces are written directly into the body, the terminology becomes part of the workflow readers can implement.
Terminology
Market-data terms used in this article
These terms keep the article connected to the CuteMarkets knowledge base and to the exact API workflow behind the research.
Point-in-time contracts
Contract discovery anchored to the research date so a backtest does not use future listings.
Quote-aware fills
Entry and exit assumptions based on bid/ask quotes, quote age, spread width, and side-specific fill rules.
Reject reasons
Logged explanations for skipped contracts or fills, including stale quote, wide spread, no bid, or missing data.
Replay artifact
The saved request, selection, fill, reject, and metric record that lets another developer audit the backtest.
Cache key
The structured identifier that keeps provider, endpoint, ticker, timestamp, plan, and schema state from being mixed.
Signal timestamp
The exact time a strategy made a decision, used to reconstruct the visible universe and quote window causally.
Look-ahead leakage
A research error where a fill, contract, indicator, or label uses information unavailable at decision time.
Walk-forward test
A validation method that repeatedly trains and evaluates across separated time windows instead of trusting one optimized sample.
Slippage model
A fill-cost assumption based on bid/ask side, midpoint, spread percent, quote age, and liquidity policy.
Same-bar fill
An intraday backtest assumption that can become invalid when signal, entry, stop, and target ordering is ambiguous.
Promotion gate
The written threshold that decides whether a research candidate can move into paper trading or production monitoring.
Options data API
The product surface for chains, contracts, quotes, trades, aggregates, Greeks, IV, open interest, and expirations.
OPRA-originating data
The U.S. listed-options source context behind quotes, trades, exchange participation, and consolidated option-market records.
OCC option symbol
The exact option contract identifier that preserves root, expiration, call or put side, and strike.
Bid/ask spread
The execution interval between bid and ask that determines whether a contract is realistically tradable.
Midpoint
The computed center between bid and ask, useful as a reference price but not proof that an order would fill.
Quote/trade condition
The condition-code, exchange, correction, sequence, and timestamp context that explains how a quote or trade row can be used.
Quote vs trade semantics
The distinction between executable bid/ask markets, printed transactions, and bar-level summaries.
REST snapshot
A reproducible request for current or historical market state, used for initialization, backfills, and audit logs.
WebSocket stream
A persistent live connection that needs subscription topics, reconnect tracking, freshness labels, and REST repair paths.
Entitlement gate
The product, plan, quote, live, delayed, historical, or commercial-use boundary checked before data is shown.
Quote freshness
The age, timestamp, and live or delayed state of a bid/ask record before it is used in a scanner, backtest, or UI.
Timestamp semantics
The exchange, provider, ingestion, session, and application time context attached to a market-data record.
Pagination cursor
The continuation token or next URL that keeps large chains, trades, quotes, and historical windows complete.

Written by
Daniel Ratke
Research & Engineering
Daniel covers the deeper research notes: options backtesting, execution realism, robustness testing, data engineering, and strategy validation.