Validation · April 17, 2026 · 6 min read

VWAP Z-Score Strategy: How We Evaluated c36 and Why It Still Was Not Promoted

Daniel Ratke

Research & Engineering

Term map

Backtesting vocabulary for this article

Treat signal timestamp, point-in-time universe, quote-aware fill, reject reason, replay artifact, walk-forward test, and cache key as first-class terms. They separate reproducible research from a backtest that only preserves the final performance table.

Definitions for Point-in-time contracts, Quote-aware fills, Reject reasons, Replay artifact, Cache key, Signal timestamp, Look-ahead leakage, Walk-forward test, Slippage model, Same-bar fill, Promotion gate, and Options data API appear in the Terminology section at the end of this article.

Repository reference: cutebacktests

Abstract

A VWAP z-score strategy can be profitable and still fail the admission test that matters. That is one of the clearest lessons from the c36 branch in this repository. The high-quality version of the strategy produced +16004 PnL on 15 trades with DSR 0.6400, yet the repo still refused to promote it, because it failed exactly one gate: trades_per_week_ok.

This shows how a portfolio-minded research process differs from a marketing-minded one. A marketing process would stop at the profit figure and the positive DSR. A portfolio process asks whether the strategy is active enough, clean enough, and additive enough to justify a slot. In c36's case, the answer was still no.

Read this c36 evaluation alongside VWAP Mean Reversion Signal Quality and Density, Backtesting Engine Loop, and Backtesting Robustness. Throughout, keep VWAP deviation, z-score threshold, setup density, next-bar entry, stop/target logic, Sharpe, Sortino, DSR, PBO, and drawdown explicit rather than folding them into a single profit figure.
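DSR is easy to over-read on a 15-trade sample. In the Bailey and López de Prado formulation, the deflated Sharpe ratio is the probability that the true Sharpe exceeds the Sharpe you would expect from the best of N unskilled trials, after adjusting for skewness, kurtosis, and sample length. The sketch below implements that formula from per-trade PnL. The repo's own DSR code, its trial count, and its trial-variance estimate are not described in this article, so `trial_sr_variance` and `n_trials` are placeholder inputs and every function name here is hypothetical.

```python
import math
from statistics import NormalDist, mean, pstdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def sample_moments(per_trade_pnl):
    """Non-annualized Sharpe, skewness, and (non-excess) kurtosis of a PnL series."""
    mu = mean(per_trade_pnl)
    sigma = pstdev(per_trade_pnl)
    skew = mean(((x - mu) / sigma) ** 3 for x in per_trade_pnl)
    kurt = mean(((x - mu) / sigma) ** 4 for x in per_trade_pnl)
    return mu / sigma, skew, kurt


def probabilistic_sharpe(sr, benchmark_sr, n_obs, skew, kurt):
    """Probability that the true Sharpe exceeds benchmark_sr, given the observed sr."""
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    z = (sr - benchmark_sr) * math.sqrt(n_obs - 1) / denom
    return NormalDist().cdf(z)


def expected_max_sharpe(trial_sr_variance, n_trials):
    """Expected best Sharpe among n_trials (> 1) unskilled trials."""
    nd = NormalDist()
    return math.sqrt(trial_sr_variance) * (
        (1 - EULER_GAMMA) * nd.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * nd.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(per_trade_pnl, trial_sr_variance, n_trials):
    """PSR evaluated against the expected maximum Sharpe of n_trials trials."""
    sr, skew, kurt = sample_moments(per_trade_pnl)
    sr0 = expected_max_sharpe(trial_sr_variance, n_trials)
    return probabilistic_sharpe(sr, sr0, len(per_trade_pnl), skew, kurt)
```

With only 15 observations, the `sqrt(n_obs - 1)` factor is under 4, so the benchmark implied by the trial count can move the result noticeably. That is one reason a single DSR figure should travel with the number of trials behind it.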

Question

The useful question is not whether c36 made money. It did. The useful question is why the repo still kept it below c66 in the promotion ladder.

Many traders use "profitable" and "deployable" as if they meant the same thing. They do not. A profitable branch can still be too sparse, too unstable, or too narrow to serve the role a live portfolio needs it to serve.

Method: How the VWAP Z-Score Strategy Was Evaluated

As described in Episode 8, c36 is an option-native descendant of the c18 VWAP mean-reversion family. It uses VWAP residual z-scores, bounded VWAP slope, sigma constraints, relative-volume requirements, and short holding windows. The high-quality version requires stronger excursions and cleaner conditions. The opportunity version relaxes those conditions to gain trade count.
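The ingredients named above can be sketched as a single gate function. This is a minimal illustration, assuming a cumulative session VWAP on typical price, a residual z-score measured against the prior `lookback` bars, VWAP slope measured as relative drift over the same window, sigma expressed as residual standard deviation over VWAP, and relative volume as current bar volume over the lookback mean. The `Thresholds` values, `lookback=20`, and all names are hypothetical; the repo's c36 code and parameters are not shown in this article.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Bar:
    high: float
    low: float
    close: float
    volume: float


@dataclass(frozen=True)
class Thresholds:
    entry_z: float         # minimum |z| excursion from VWAP
    max_vwap_slope: float  # bound on relative VWAP drift over the lookback
    min_sigma: float       # residual std band, as a fraction of VWAP
    max_sigma: float
    min_rvol: float        # current bar volume / mean lookback volume


# Hypothetical values for illustration; c36's real thresholds are not given here.
QUALITY = Thresholds(2.5, 0.002, 0.0005, 0.004, 1.5)
OPPORTUNITY = Thresholds(1.8, 0.004, 0.0003, 0.006, 1.1)


def session_vwap(bars):
    """Cumulative session VWAP on the typical price (high + low + close) / 3."""
    pv = vol = 0.0
    out = []
    for b in bars:
        pv += (b.high + b.low + b.close) / 3 * b.volume
        vol += b.volume
        out.append(pv / vol if vol else b.close)
    return out


def vwap_signal(bars, th, lookback=20):
    """Return 'long', 'short', or None for the last completed bar."""
    if len(bars) < lookback + 1:
        return None
    vwap = session_vwap(bars)
    resid = [b.close - v for b, v in zip(bars, vwap)]
    window = resid[-lookback - 1:-1]  # prior bars only: the current bar is not its own baseline
    sigma = pstdev(window)
    avg_vol = mean(b.volume for b in bars[-lookback - 1:-1])
    if sigma == 0 or avg_vol == 0:
        return None
    z = (resid[-1] - mean(window)) / sigma
    slope = (vwap[-1] - vwap[-1 - lookback]) / vwap[-1 - lookback]
    if abs(slope) > th.max_vwap_slope:
        return None
    if not th.min_sigma <= sigma / vwap[-1] <= th.max_sigma:
        return None
    if bars[-1].volume / avg_vol < th.min_rvol:
        return None
    if z <= -th.entry_z:
        return "long"   # stretched below VWAP: bet on reversion up
    if z >= th.entry_z:
        return "short"  # stretched above VWAP: bet on reversion down
    return None
```

Loosening every field at once, as `OPPORTUNITY` does, is the density trade the article describes: more bars pass, but nothing in the gate guarantees the extra setups carry the same edge.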

The evaluation does not stop at raw PnL; the branch is judged in portfolio context. Is the quality branch active enough? Does the opportunity branch preserve the same quality as density increases? Does the strategy deserve promotion, open-paper status, or a lower rung?
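The question of which rung a branch lands on can be made mechanical by evaluating every gate and reporting failures by name. A minimal sketch: only the gate name `trades_per_week_ok` and the rung label `open_paper_only` come from the repo; `pnl_positive_ok`, `dsr_ok`, `research_only`, both thresholds, and the rung mapping itself are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    pnl: float
    trades: int
    weeks: float  # length of the evaluation window in weeks
    dsr: float


# Hypothetical thresholds; only the gate name trades_per_week_ok comes from the repo.
MIN_TRADES_PER_WEEK = 1.0
MIN_DSR = 0.5


def evaluate_gates(run):
    """Every gate is evaluated, so a report can name each failure."""
    return {
        "pnl_positive_ok": run.pnl > 0,
        "dsr_ok": run.dsr >= MIN_DSR,
        "trades_per_week_ok": run.trades / run.weeks >= MIN_TRADES_PER_WEEK,
    }


def ladder_rung(gates):
    """Map gate results to a rung; this mapping is an illustrative assumption."""
    failed = sorted(name for name, ok in gates.items() if not ok)
    if not failed:
        return "promote", failed
    if failed == ["trades_per_week_ok"]:
        return "open_paper_only", failed  # quality holds, density does not
    return "research_only", failed


print(ladder_rung(evaluate_gates(RunSummary(pnl=16004, trades=15, weeks=52.0, dsr=0.64))))
```

The `weeks=52.0` in the example is a placeholder, since the article does not state c36's evaluation window. Under that assumption, 15 trades works out to about 0.29 trades per week, and the only failure reported is `trades_per_week_ok`.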

Evidence / Results

The c36 results are now well defined in the repo:

  • c36_vwap_mr_option_native_quality_v1: +16004 PnL, 15 trades, DSR 0.6400
  • failed only trades_per_week_ok
  • c36_vwap_mr_option_native_opportunity_v1: 85 trades, +2987 PnL
  • denser version did not preserve the same quality profile
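Dividing the reported totals by trade count makes the density tradeoff concrete. This is plain arithmetic on the two figures above; it ignores variance and drawdown, and the article does not report a DSR for the opportunity run, so it is a rough comparison rather than a risk-adjusted one.

```python
def pnl_per_trade(pnl, trades):
    """Average PnL per trade from a reported total."""
    return pnl / trades


quality = pnl_per_trade(16004, 15)      # c36_vwap_mr_option_native_quality_v1
opportunity = pnl_per_trade(2987, 85)   # c36_vwap_mr_option_native_opportunity_v1
print(f"quality {quality:.1f}, opportunity {opportunity:.1f}, ratio {quality / opportunity:.1f}x")
# prints: quality 1066.9, opportunity 35.1, ratio 30.4x
```

Adding 70 trades cut total PnL from +16004 to +2987, and average PnL per trade fell by a factor of about 30.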

The portfolio map in Toward The One Piece Of Sharpe and PAPER_BOTS.md then places c36 below c66. It remains a backup_candidate or open_paper_only, not the lead paper bot.

What Worked

What worked was the signal itself. c36 is not a fake branch. It is one of the repo's few strategies that produced a clean positive quality result under a relatively strict research process. That is why it remains in the current crew at all.

The branch also worked as a diagnostic case. It showed that the repo was willing to separate "interesting enough to keep studying" from "strong enough to promote." That distinction is one of the healthiest things about the recent research process.

What Failed

What failed was the role fit. The strategy was too sparse in its best form, and the denser version was not good enough to justify replacing the selective one. That is a portfolio failure, not a conceptual failure.

The c36 result is valuable precisely because it is not dramatic. The branch did not die in a blow-up. It stopped below the promotion line because one important constraint stayed unresolved. This is how many real strategies remain stuck. They are good enough to keep alive and not good enough to scale.

Takeaway

The c36 VWAP z-score strategy made money, but it still was not promoted because the portfolio bar is higher than simple profitability. The exact failed gate, trades_per_week_ok, tells the whole story. The edge was real. The role fit was not.

If you want the wider signal-level view, VWAP Mean Reversion Backtest: The Logic, the Edge, and the Failure Modes covers the branch in more detail. If you want the density tradeoff, Intraday Mean Reversion Options: Why Signal Quality Drops When You Chase Density is the natural companion. Join the research log to get the next backtest and failure report.

How the terminology applies

For VWAP Z-Score Strategy: How We Evaluated c36 and Why It Still Was Not Promoted, the backtesting workflow should treat Point-in-time contracts, Quote-aware fills, Reject reasons, Replay artifact, Cache key, and Signal timestamp as operational state rather than glossary decoration. That framing keeps the research claim causal: the strategy can only select instruments, prices, and labels that existed at the decision time.
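Causal instrument selection reduces to two filters keyed on the signal timestamp. A minimal sketch, assuming each contract carries a hypothetical `first_seen` date from a point-in-time reference snapshot and quotes arrive as `(timestamp, bid, ask)` tuples sorted by time; the OCC-style symbols in the example are illustrative only.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Contract:
    occ_symbol: str
    first_seen: date  # hypothetical: first date the contract appeared in a reference snapshot
    expiration: date


def visible_universe(contracts, signal_ts):
    """Contracts already listed and not yet expired on the signal date.

    Date granularity is a simplification; an intraday listing time would be stricter.
    """
    d = signal_ts.date()
    return [c for c in contracts if c.first_seen <= d <= c.expiration]


def last_quote_at_or_before(quotes, signal_ts):
    """Latest (timestamp, bid, ask) with timestamp <= signal_ts, or None.

    `quotes` must be sorted by timestamp.
    """
    idx = bisect_right([q[0] for q in quotes], signal_ts)
    return quotes[idx - 1] if idx else None
```

A quote stamped one second after the signal is simply invisible to `last_quote_at_or_before`, which is the behavior a causal replay needs.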

A developer implementing this validation should persist the look-ahead leakage checks, walk-forward windows, slippage model, same-bar fill policy, promotion gate, and Options data API requests beside the result, instead of leaving those words in a term card. Doing so turns attractive performance into an auditable record where fills, skips, thresholds, and replay inputs can be challenged independently.

The review artifact for VWAP Z-Score Strategy: How We Evaluated c36 and Why It Still Was Not Promoted becomes more useful when OPRA-originating data, OCC option symbol, Bid/ask spread, Midpoint, Quote/trade condition, and Quote vs trade semantics appear in the same body of evidence as the selected rows. When a result is promoted, these fields should appear in the run manifest, not only in a prose summary or final equity curve.
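One way to keep these fields beside each selected row is to write them into a manifest entry rather than a summary. This sketch assumes a hypothetical `SelectedRow` schema, stores the quote condition verbatim as an opaque string, and uses placeholder labels for the slippage model, same-bar policy, and promotion gate; none of these names come from the repo.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SelectedRow:
    occ_symbol: str
    bid: float
    ask: float
    quote_condition: str       # provider condition code, stored verbatim
    price_source: str          # "quote" (executable bid/ask) or "trade" (printed transaction)
    data_origin: str = "OPRA"  # source context for the record

    def __post_init__(self):
        if self.price_source not in {"quote", "trade"}:
            raise ValueError(f"unknown price_source: {self.price_source}")

    @property
    def midpoint(self):
        return (self.bid + self.ask) / 2  # reference price, not proof of a fill

    @property
    def spread_pct(self):
        return (self.ask - self.bid) / self.midpoint


def manifest_entry(row, slippage_model, same_bar_policy, promotion_gate):
    """Row fields plus derived prices and the policies that judged the row."""
    entry = asdict(row)
    entry.update(
        midpoint=row.midpoint,
        spread_pct=round(row.spread_pct, 6),
        slippage_model=slippage_model,
        same_bar_policy=same_bar_policy,
        promotion_gate=promotion_gate,
    )
    return entry


row = SelectedRow("SPY260417C00500000", 1.10, 1.20, "cond_placeholder", "quote")
print(json.dumps(manifest_entry(row, "cross_spread", "stop_first", "trades_per_week_ok"), sort_keys=True))
```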

In production notes for this backtesting workflow, REST snapshot, WebSocket stream, Entitlement gate, Quote freshness, Timestamp semantics, and Pagination cursor define the checks that decide whether the workflow is reproducible. The result is a backtest that can be rerun, compared across threshold families, and rejected when the evidence is not strong enough.
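Two of these checks are simple enough to show directly: exhausting a pagination cursor and rejecting quotes that are stale or stamped after the signal. The sketch assumes a hypothetical page shape with `results` and `next_url` keys and injects the fetch function so the loop can be tested without a network call; the 5-second freshness window is a placeholder.

```python
from datetime import datetime, timedelta
from typing import Callable


def fetch_all(first_url: str, fetch: Callable[[str], dict], max_pages: int = 1000):
    """Follow next_url cursors until exhausted; fail loudly instead of truncating."""
    rows, url, pages = [], first_url, 0
    while url:
        if pages >= max_pages:
            raise RuntimeError(f"pagination did not finish after {max_pages} pages")
        page = fetch(url)
        rows.extend(page.get("results", []))
        url = page.get("next_url")
        pages += 1
    return rows, pages


def is_fresh(quote_ts: datetime, signal_ts: datetime, max_age=timedelta(seconds=5)) -> bool:
    """Quote must be at or before the signal and no older than max_age."""
    age = signal_ts - quote_ts
    return timedelta(0) <= age <= max_age
```

Raising on an unfinished cursor, rather than returning what was fetched so far, keeps a truncated chain from silently shrinking the candidate universe.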

For VWAP Z-Score Strategy: How We Evaluated c36 and Why It Still Was Not Promoted, the practical acceptance test is simple: another developer should be able to read the body, identify the exact inputs, reproduce the request sequence, and explain the accepted and rejected rows without relying on the terminology section at the end. If a phrase appears in the page vocabulary, it should correspond to a stored field, a validation check, a replay step, or an implementation decision in the backtesting workflow.

This is also the reason the article should not measure success only by the final chart, table, or headline metric. The better standard is whether the data path, timing model, entitlement state, and evidence trail survive review. When those pieces are written directly into the body, the terminology becomes part of the workflow readers can implement.

The z-score is not the whole row

The c36 evaluation should keep VWAP z-score beside the option market that expressed it. The signal row needs underlying OHLCV aggregate fields, VWAP, z-score, signal timestamp, entry cutoff, and market session. The option row needs point-in-time contract discovery, selected OCC option symbol, DTE bucket, moneyness band, bid, ask, spread percent, quote freshness, implied volatility, Greeks, and open interest.
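Keeping the two rows as separate records makes each gate attributable. A sketch with hypothetical field names drawn from the list above, plus a timing check that flags option data stamped after the signal and any signal past its entry cutoff.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SignalRow:
    signal_ts: datetime
    entry_cutoff: datetime
    session: str  # e.g. "regular"
    open: float
    high: float
    low: float
    close: float
    volume: float
    vwap: float
    zscore: float


@dataclass(frozen=True)
class OptionRow:
    occ_symbol: str
    discovered_as_of: datetime  # when point-in-time discovery first saw the contract
    quote_ts: datetime          # timestamp of the bid/ask used for this row
    dte_bucket: str
    moneyness_band: str
    bid: float
    ask: float
    implied_vol: float
    delta: float
    gamma: float
    theta: float
    vega: float
    open_interest: int

    @property
    def spread_pct(self):
        mid = (self.bid + self.ask) / 2
        return (self.ask - self.bid) / mid if mid > 0 else float("inf")


def timing_problems(sig, opt):
    """Named causality violations between a signal row and its option row."""
    problems = []
    if opt.discovered_as_of > sig.signal_ts:
        problems.append("contract_discovered_after_signal")
    if opt.quote_ts > sig.signal_ts:
        problems.append("quote_after_signal")
    if sig.signal_ts > sig.entry_cutoff:
        problems.append("signal_after_entry_cutoff")
    return problems
```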

That split explains why a high-quality branch can still miss promotion. A clean z-score can appear on a contract with thin top-of-book size, a stale NBBO, or a quote condition that blocks the fill model. A contract can also pass execution checks while the setup density remains too low for portfolio use. Those are separate gates, and the result should show which one failed.
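The execution gates can be written as an ordered list of reject reasons, kept separate from run-level density checks such as trades_per_week_ok. A minimal sketch: the thresholds, the `Quote` fields, and the blocked condition labels are hypothetical placeholders, not real OPRA condition codes or the repo's settings.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quote:
    bid: Optional[float]
    ask: Optional[float]
    bid_size: int
    ask_size: int
    age_seconds: float
    condition: str


# Hypothetical thresholds and condition labels.
MAX_AGE_S = 5.0
MAX_SPREAD_PCT = 0.10
MIN_SIZE = 10
BLOCKED_CONDITIONS = frozenset({"halted", "non_firm"})


def execution_reject_reason(q):
    """First reason this quote cannot support a fill, or None if it passes."""
    if q.bid is None or q.ask is None:
        return "missing_data"
    if q.bid <= 0:
        return "no_bid"
    if q.age_seconds > MAX_AGE_S:
        return "stale_quote"
    mid = (q.bid + q.ask) / 2
    if (q.ask - q.bid) / mid > MAX_SPREAD_PCT:
        return "wide_spread"
    if min(q.bid_size, q.ask_size) < MIN_SIZE:
        return "thin_top_of_book"
    if q.condition in BLOCKED_CONDITIONS:
        return "blocked_condition"
    return None


def reject_summary(quotes):
    """Counts per reject reason, with passing quotes counted as 'accepted'."""
    return dict(Counter(execution_reject_reason(q) or "accepted" for q in quotes))
```

Because density is not checked here, a run can show zero execution rejects and still fail `trades_per_week_ok`, and the report can say so explicitly.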

For future retests, store the cache key and replay manifest with the same care as the PnL summary. If the run changes because of a different quote window, schema version, or pagination policy, the reviewer should see that before comparing Sharpe, DSR, or trades per week.
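A cache key can be a hash over the fields that define a request, and a manifest diff can run before any metric comparison. A minimal sketch using SHA-256 over canonical JSON; the field set follows the Cache key definition in this article, while the function names, manifest keys, and example values are hypothetical.

```python
import hashlib
import json


def cache_key(provider, endpoint, ticker, timestamp, plan, schema_version):
    """SHA-256 over canonical JSON, so argument order never changes the key."""
    parts = {
        "provider": provider,
        "endpoint": endpoint,
        "ticker": ticker,
        "timestamp": timestamp,
        "plan": plan,
        "schema": schema_version,
    }
    blob = json.dumps(parts, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def manifest_diff(a, b):
    """Sorted keys whose values differ between two replay manifests."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


def comparable(a, b, allowed=frozenset()):
    """True only if the manifests differ in nothing outside `allowed`."""
    return set(manifest_diff(a, b)) <= set(allowed)
```

A reviewer would call `comparable` before looking at Sharpe, DSR, or trades per week; a changed quote window then surfaces as a named difference instead of an unexplained metric shift.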

Terminology

Market-data terms used in this article

These terms keep the article connected to the CuteMarkets knowledge base and to the exact API workflow behind the research.

Point-in-time contracts

Contract discovery anchored to the research date so a backtest does not use future listings.

Quote-aware fills

Entry and exit assumptions based on bid/ask quotes, quote age, spread width, and side-specific fill rules.

Reject reasons

Logged explanations for skipped contracts or fills, including stale quote, wide spread, no bid, or missing data.

Replay artifact

The saved request, selection, fill, reject, and metric record that lets another developer audit the backtest.

Cache key

The structured identifier that keeps provider, endpoint, ticker, timestamp, plan, and schema state from being mixed.

Signal timestamp

The exact time a strategy made a decision, used to reconstruct the visible universe and quote window causally.

Look-ahead leakage

A research error where a fill, contract, indicator, or label uses information unavailable at decision time.

Walk-forward test

A validation method that repeatedly trains and evaluates across separated time windows instead of trusting one optimized sample.

Slippage model

A fill-cost assumption based on bid/ask side, midpoint, spread percent, quote age, and liquidity policy.

Same-bar fill

An intraday backtest assumption that can become invalid when signal, entry, stop, and target ordering is ambiguous.

Promotion gate

The written threshold that decides whether a research candidate can move into paper trading or production monitoring.

Options data API

The product surface for chains, contracts, quotes, trades, aggregates, Greeks, IV, open interest, and expirations.

OPRA-originating data

The U.S. listed-options source context behind quotes, trades, exchange participation, and consolidated option-market records.

OCC option symbol

The exact option contract identifier that preserves root, expiration, call or put side, and strike.

Bid/ask spread

The execution interval between bid and ask that determines whether a contract is realistically tradable.

Midpoint

The computed center between bid and ask, useful as a reference price but not proof that an order would fill.

Quote/trade condition

The condition-code, exchange, correction, sequence, and timestamp context that explains how a quote or trade row can be used.

Quote vs trade semantics

The distinction between executable bid/ask markets, printed transactions, and bar-level summaries.

REST snapshot

A reproducible request for current or historical market state, used for initialization, backfills, and audit logs.

WebSocket stream

A persistent live connection that needs subscription topics, reconnect tracking, freshness labels, and REST repair paths.

Entitlement gate

The product, plan, quote, live, delayed, historical, or commercial-use boundary checked before data is shown.

Quote freshness

The age, timestamp, and live or delayed state of a bid/ask record before it is used in a scanner, backtest, or UI.

Timestamp semantics

The exchange, provider, ingestion, session, and application time context attached to a market-data record.

Pagination cursor

The continuation token or next URL that keeps large chains, trades, quotes, and historical windows complete.

Written by

Daniel Ratke

Research & Engineering

Daniel covers the deeper research notes: options backtesting, execution realism, robustness testing, data engineering, and strategy validation.