UOA Exact-Contract Backtests: Strong PnL Was Not Enough

Daniel Ratke
Research & Engineering
The May 8 UOA pass produced strong local exact-contract results, including 20 quote-closed trades and 17,310 quote PnL for two high-volume profiles, but base PBO was 0.881 and the remote holdout found zero executable candidates.

Term map
Backtesting vocabulary for this article
Treat signal timestamp, point-in-time universe, quote-aware fill, reject reason, replay artifact, walk-forward test, and cache key as first-class terms. They separate reproducible research from a backtest that only preserves the final performance table.
Repository reference: cutebacktests
Abstract
The May 8 research pass looked exciting at first. The unusual-options-activity branch finally moved away from stock proxies and into exact option contracts, which is the right direction for a strategy that claims to trade option flow.
The best quote-validated shortlist rows were green. The hiVol_hiPrem_early_both_x1 and hiVol_hiPrem_10am_both_x1 profiles each selected 22 bar trades, closed 20 quote-priced trades, made 17310.0 total quote PnL, posted a 55% win rate, and showed daily Sharpe 4.5526. The broader midVol_hiPrem_early_both_x1 profile closed 25 quote-priced trades and made 17907.0.
That was the good news. The bad news was more important: the robustness checks and holdout evidence did not support promotion.
Question
The research question on May 8 was simple: did the new contract-exact UOA path produce a promotable options model family, or only a strong-looking local artifact?
The distinction matters. A UOA strategy that only works when contract selection is loose, quote coverage is partial, or the best period dominates the result is not ready for paper trading. The model has to survive exact contracts, quote-aware fills, PBO/DSR checks, and a separate holdout without quietly turning into a different object.
Method
The May 8 pass focused on exact contracts whose intraday option volume and premium were unusually large. The selected profiles used early-session windows, high cumulative option volume, high estimated premium, short DTE, and quote validation on the option itself.
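The filter described above can be sketched roughly as follows. This is an illustration only: the field names, the `Profile` thresholds, the example contract, and the session cutoff are hypothetical placeholders, because the article does not publish the real values. Premium is estimated as price times cumulative volume times the standard 100-share contract multiplier.

```python
from dataclasses import dataclass
from datetime import date, datetime, time


@dataclass(frozen=True)
class ContractBar:
    occ_symbol: str      # exact option contract identifier
    as_of: datetime      # bar timestamp in exchange-local time
    expiration: date
    cum_volume: int      # contracts traded so far in the session
    last_price: float    # option price per share


@dataclass(frozen=True)
class Profile:
    # All thresholds are hypothetical; the real profile values are not given.
    min_cum_volume: int
    min_premium: float
    max_dte: int
    window_end: time     # end of the early-session window


def estimated_premium(bar: ContractBar) -> float:
    """Dollar premium estimate: price x volume x 100-share multiplier."""
    return bar.last_price * bar.cum_volume * 100


def passes(bar: ContractBar, profile: Profile) -> bool:
    """True when the bar meets every profile condition at its own timestamp."""
    dte = (bar.expiration - bar.as_of.date()).days
    return (
        bar.as_of.time() <= profile.window_end
        and bar.cum_volume >= profile.min_cum_volume
        and estimated_premium(bar) >= profile.min_premium
        and 0 <= dte <= profile.max_dte
    )


# Made-up contract and thresholds, for illustration only.
profile = Profile(min_cum_volume=1000, min_premium=250_000.0,
                  max_dte=7, window_end=time(10, 0))
bar = ContractBar("QQQ240510C00440000", datetime(2024, 5, 8, 9, 45),
                  date(2024, 5, 10), cum_volume=5000, last_price=1.5)
print(passes(bar, profile))
```

Keeping the thresholds in one immutable object makes it easier to log exactly which profile produced a given row.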
The shortlist then went through two stricter checks:
- A quote-validated local summary.
- A separate holdout pass.
The bar-family PBO search also tested threshold-grid families so the result was not judged only by the best-looking profile.
Evidence
The headline local numbers were strong:
| Profile | Selected bar trades | Closed quote trades | Quote PnL | Win rate | Daily Sharpe |
|---|---|---|---|---|---|
| hiVol_hiPrem_early_both_x1 | 22 | 20 | 17310.0 | 0.55 | 4.5526 |
| hiVol_hiPrem_10am_both_x1 | 22 | 20 | 17310.0 | 0.55 | 4.5526 |
| midVol_hiPrem_early_both_x1 | 27 | 25 | 17907.0 | 0.56 | 3.6154 |
| hiVol_midPrem_full_calls_x1 | 25 | 19 | 24556.0 | 0.5263 | 3.9697 |
| midVol_midPrem_full_calls_x1 | 36 | 27 | 16854.0 | 0.4444 | 2.8866 |
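But the same summary reported a base PBO of 0.8810 across 5 strategies and 11 periods. That is not a small warning. PBO (probability of backtest overfitting) estimates how often the configuration that ranks best in-sample falls below the median out-of-sample, so a value near 0.88 says the selection process was highly fragile.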
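For readers who have not computed PBO before, here is a minimal sketch of the combinatorially symmetric cross-validation (CSCV) idea behind it. It is not the repository's implementation. It assumes one block per period, an even number of periods, and summed PnL as the performance metric; the article's 11-period run must have handled blocks differently, in a way the article does not describe. The sample matrix is invented.

```python
from itertools import combinations
from math import log


def pbo_cscv(pnl_by_period):
    """Estimate PBO from pnl_by_period[t][s] = PnL of strategy s in period t.

    Every way of choosing half the periods as in-sample is tried; the
    remaining periods form the out-of-sample half.
    """
    n_periods = len(pnl_by_period)
    n_strats = len(pnl_by_period[0])
    if n_periods < 2 or n_periods % 2 or n_strats < 2:
        raise ValueError("need an even number of periods and >= 2 strategies")

    every_period = set(range(n_periods))
    logits = []
    for in_sample in combinations(range(n_periods), n_periods // 2):
        out_sample = every_period - set(in_sample)
        is_perf = [sum(pnl_by_period[t][s] for t in in_sample)
                   for s in range(n_strats)]
        oos_perf = [sum(pnl_by_period[t][s] for t in out_sample)
                    for s in range(n_strats)]
        # Strategy that would have been selected on in-sample evidence.
        best = max(range(n_strats), key=lambda s: is_perf[s])
        # Out-of-sample rank of that strategy: 1 = worst, n_strats = best.
        rank = sum(1 for v in oos_perf if v <= oos_perf[best])
        omega = rank / (n_strats + 1)
        logits.append(log(omega / (1 - omega)))

    # PBO is the share of splits where the in-sample winner lands in the
    # lower half out-of-sample (logit <= 0).
    return sum(1 for x in logits if x <= 0) / len(logits)


# Invented 4-period, 3-strategy PnL matrix, for illustration only.
sample = [[120, 80, -40], [-60, 30, 90], [200, -10, 15], [-150, 45, 60]]
print(pbo_cscv(sample))
```

A strategy that wins in every period produces a PBO of 0 under this sketch, while strategies that take turns winning push it toward 1.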
The family-level scan looked better in places, but not enough. The best early-call threshold families showed PBO around 0.2143 to 0.2381, yet the best selected DSR was only 0.3680, with 14 trades and 11401.0 PnL. That is promising as a research lead, not a promotion case.
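A low DSR means the observed Sharpe is not clearly above what the best of many trials would show by luck. The sketch below follows the published deflated Sharpe ratio formula from Bailey and López de Prado, as best understood here; it is not the code behind the 0.3680 figure. The Sharpe values are per observation (not annualized), and every input in the example is hypothetical.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials: int, sharpe_variance: float) -> float:
    """Expected best Sharpe among n_trials (>= 2) unskilled strategies."""
    return sqrt(sharpe_variance) * (
        (1 - EULER_GAMMA) * STD_NORMAL.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * STD_NORMAL.inv_cdf(1 - 1 / (n_trials * e))
    )


def deflated_sharpe(sharpe: float, n_obs: int, n_trials: int,
                    sharpe_variance: float,
                    skew: float = 0.0, kurt: float = 3.0) -> float:
    """Probability-like score that the Sharpe beats the multiple-testing bar.

    sharpe_variance is the variance of Sharpe ratios across the trials;
    skew and kurt describe the return distribution (3.0 = normal kurtosis).
    """
    benchmark = expected_max_sharpe(n_trials, sharpe_variance)
    denom = sqrt(1 - skew * sharpe + (kurt - 1) / 4 * sharpe ** 2)
    return STD_NORMAL.cdf((sharpe - benchmark) * sqrt(n_obs - 1) / denom)


# Hypothetical inputs: the same Sharpe looks weaker as the trial count grows.
for trials in (2, 10, 100):
    print(trials, round(deflated_sharpe(0.3, n_obs=50, n_trials=trials,
                                        sharpe_variance=0.01), 4))
```

The practical point is that a threshold grid adds trials, and every added trial raises the bar the best row has to clear.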
Then the separate holdout was blunt. One high-volume QQQ holdout found 0 bar candidates. A quote-priced calls holdout selected 0 bar trades, closed 0 quote trades, and produced 0 total quote PnL.
What Worked
What worked was the shift in measurement. This was no longer a stock-proxy UOA fantasy. The strategy was evaluated on exact option contracts, quote-priced entries and exits, and explicit coverage reasons such as missing_exit_quote and exit_fetch_error.
That is a real improvement. It forced the research object closer to what a paper bot would actually trade.
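One way to read the coverage side of the table is to compare selected bar trades with closed quote trades for each profile. The short sketch below does that arithmetic with the counts copied from the table above; the per-reason breakdown (for example how many gaps were missing_exit_quote versus exit_fetch_error) is not published, so it is not reproduced here.

```python
# (selected bar trades, closed quote trades) copied from the evidence table.
ROWS = {
    "hiVol_hiPrem_early_both_x1": (22, 20),
    "hiVol_hiPrem_10am_both_x1": (22, 20),
    "midVol_hiPrem_early_both_x1": (27, 25),
    "hiVol_midPrem_full_calls_x1": (25, 19),
    "midVol_midPrem_full_calls_x1": (36, 27),
}


def quote_coverage(rows):
    """Fraction of selected bar trades that closed with quote prices."""
    return {name: closed / selected
            for name, (selected, closed) in rows.items()}


for name, coverage in quote_coverage(ROWS).items():
    selected, closed = ROWS[name]
    print(f"{name}: {closed}/{selected} closed "
          f"({coverage:.1%}), {selected - closed} unpriced")
```

The row with the highest quote PnL, hiVol_midPrem_full_calls_x1, closed only 19 of its 25 selected trades (76%), so its headline number rests on the thinnest quote coverage among the top four rows.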
What Failed
The May 8 result failed because the strongest local PnL did not survive the promotion logic. The PBO result was too high, the selected DSR was too low in the cleaner family scan, and the separate holdout produced no executable evidence.
This is the kind of result that can fool a researcher if they stop after the PnL table. The quote-validated rows were green, but the robustness and holdout checks said the family was not ready.
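The promotion logic can be written down as a small gate that returns explicit reject reasons instead of a single pass/fail flag. The sketch below is an assumption about how such a gate might look: the threshold values and reason names are hypothetical, since the article does not state the actual cutoffs. The inputs are the May 8 figures, keeping in mind that the 0.8810 PBO comes from the base summary and the 0.3680 DSR from the family scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateThresholds:
    # Hypothetical cutoffs; the article does not publish the real ones.
    max_pbo: float = 0.5
    min_dsr: float = 0.95
    min_holdout_closed_trades: int = 1


def promotion_rejects(pbo, dsr, holdout_closed_trades,
                      thresholds=GateThresholds()):
    """Return every reason the candidate fails; an empty list means pass."""
    reasons = []
    if pbo > thresholds.max_pbo:
        reasons.append("pbo_too_high")
    if dsr < thresholds.min_dsr:
        reasons.append("dsr_too_low")
    if holdout_closed_trades < thresholds.min_holdout_closed_trades:
        reasons.append("holdout_empty")
    return reasons


# May 8 figures from the article: base PBO, best family-scan DSR,
# and closed quote trades in the separate holdout.
may8 = promotion_rejects(pbo=0.8810, dsr=0.3680, holdout_closed_trades=0)
print(may8)
```

Note that PnL is not an input at all. Under a gate like this, an attractive PnL table cannot override a failed robustness or holdout check.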
On Paper Trading
This kind of result illustrates problems we often see other systematic traders run into when they move from research to paper trading. A paper module should not simply replay the highest-PnL research row and hope the live path matches it.
Several issues commonly appear between research and paper:
- a signal can be valid, but the selected contract may not have a fresh executable quote at the decision time;
- a holdout can go empty even when the local sample looked strong;
- missing exit quotes can make a result look cleaner than the live process would be;
- PBO and DSR can reject a strategy that still has attractive headline PnL.
Those are the kinds of mechanics we want to explain deeply and build directly into the integrated paper trading module we are working on: timestamped signal records, quote freshness gates, holdout-aware promotion, explicit reject reasons, and paper/backtest parity checks before a strategy is allowed to look successful.
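As a rough illustration of a quote freshness gate with explicit reject reasons, consider the sketch below. It is not the paper trading module itself: the `Quote` type, the reason strings, and the default limits (5 seconds of quote age, 10% spread relative to midpoint) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Quote:
    bid: float
    ask: float
    ts: datetime  # time the quote was observed


def entry_reject_reason(quote: Optional[Quote],
                        signal_ts: datetime,
                        max_age: timedelta = timedelta(seconds=5),
                        max_spread_pct: float = 0.10) -> Optional[str]:
    """Return a reject reason, or None when the quote is usable at signal_ts."""
    if quote is None:
        return "missing_quote"
    if quote.ts > signal_ts:
        # A quote stamped after the decision would be look-ahead leakage.
        return "quote_after_signal"
    if signal_ts - quote.ts > max_age:
        return "stale_quote"
    if quote.bid <= 0:
        return "no_bid"
    mid = (quote.bid + quote.ask) / 2
    if (quote.ask - quote.bid) / mid > max_spread_pct:
        return "wide_spread"
    return None


signal = datetime(2024, 5, 8, 9, 45, 0)  # made-up signal timestamp
fresh = Quote(bid=1.48, ask=1.52, ts=signal - timedelta(seconds=1))
stale = Quote(bid=1.48, ask=1.52, ts=signal - timedelta(seconds=30))
print(entry_reject_reason(fresh, signal), entry_reject_reason(stale, signal))
```

Running the same function in both the backtest and the paper path is one simple form of parity check: a contract that is rejected in one should be rejected for the same reason in the other.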
Takeaway
May 8 was a useful false start. It showed that contract-exact unusual-options research could produce real-looking option PnL, but it also showed why exact fills are only the first gate.
The next step was not to paper this exact branch. The next step was to ask which mechanics could keep exact-contract cleanliness while reducing selection fragility and holdout collapse. That question shaped the following week's broader model-family search.
How the terminology applies
In this workflow, point-in-time contracts, quote-aware fills, reject reasons, the replay artifact, the cache key, and the signal timestamp should be treated as operational state rather than glossary decoration. That framing keeps the research claim causal: the strategy can only select instruments, prices, and labels that existed at the decision time.
A developer implementing this idea should persist the look-ahead leakage checks, walk-forward windows, slippage model, same-bar fill policy, promotion gate thresholds, and options data API requests beside the result, instead of leaving those words in a term card. Doing so turns attractive performance into an auditable record where fills, skips, thresholds, and replay inputs can be challenged independently.
The review artifact becomes more useful when the OPRA-originating data context, OCC option symbol, bid/ask spread, midpoint, quote/trade condition, and quote-versus-trade semantics appear in the same body of evidence as the selected rows. When a result is promoted, these fields should appear in the run manifest, not only in a prose summary or final equity curve.
In production notes for this backtesting workflow, the REST snapshot, WebSocket stream, entitlement gate, quote freshness, timestamp semantics, and pagination cursor define the checks that decide whether the workflow is reproducible. The result is a backtest that can be rerun, compared across threshold families, and rejected when the evidence is not strong enough.
The practical acceptance test is simple: another developer should be able to read the body, identify the exact inputs, reproduce the request sequence, and explain the accepted and rejected rows without relying on the terminology grid at the bottom of the page. If a phrase appears in the page vocabulary, it should correspond to a stored field, a validation check, a replay step, or an implementation decision in the backtesting workflow.
This is also the reason the article should not measure success only by the final chart, table, or headline metric. The better standard is whether the data path, timing model, entitlement state, and evidence trail survive review. When those pieces are written directly into the body, the terminology becomes part of the workflow readers can implement.
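To make "stored field" concrete, here is one possible shape for a run manifest, sketched with standard-library dataclasses and JSON. The schema, the field names, and the example values are hypothetical and are not taken from the cutebacktests repository.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class TradeRecord:
    occ_symbol: str
    signal_ts: str                 # ISO-8601 decision time
    entry_bid: Optional[float]
    entry_ask: Optional[float]
    entry_quote_ts: Optional[str]
    reject_reason: Optional[str]   # None when the trade closed on quotes


@dataclass
class RunManifest:
    profile: str
    thresholds: Dict[str, float]   # promotion gate and filter cutoffs
    slippage_model: str
    trades: List[TradeRecord] = field(default_factory=list)

    def to_json(self) -> str:
        """Stable serialization so two runs can be diffed line by line."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)


# Made-up example values, for illustration only.
manifest = RunManifest(
    profile="hiVol_hiPrem_early_both_x1",
    thresholds={"max_pbo": 0.5, "min_dsr": 0.95},
    slippage_model="cross_spread",
    trades=[
        TradeRecord("QQQ240510C00440000", "2024-05-08T09:45:00",
                    1.48, 1.52, "2024-05-08T09:44:59", None),
        TradeRecord("QQQ240510P00435000", "2024-05-08T09:50:00",
                    None, None, None, "missing_exit_quote"),
    ],
)
print(manifest.to_json())
```

Rejected rows are stored next to accepted ones, so a reviewer can challenge a skip the same way they would challenge a fill.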
Terminology
Market-data terms used in this article
These terms keep the article connected to the CuteMarkets knowledge base and to the exact API workflow behind the research.
Point-in-time contracts
Contract discovery anchored to the research date so a backtest does not use future listings.
Quote-aware fills
Entry and exit assumptions based on bid/ask quotes, quote age, spread width, and side-specific fill rules.
Reject reasons
Logged explanations for skipped contracts or fills, including stale quote, wide spread, no bid, or missing data.
Replay artifact
The saved request, selection, fill, reject, and metric record that lets another developer audit the backtest.
Cache key
The structured identifier that keeps provider, endpoint, ticker, timestamp, plan, and schema state from being mixed.
Signal timestamp
The exact time a strategy made a decision, used to reconstruct the visible universe and quote window causally.
Look-ahead leakage
A research error where a fill, contract, indicator, or label uses information unavailable at decision time.
Walk-forward test
A validation method that repeatedly trains and evaluates across separated time windows instead of trusting one optimized sample.
Slippage model
A fill-cost assumption based on bid/ask side, midpoint, spread percent, quote age, and liquidity policy.
Same-bar fill
An intraday backtest assumption that can become invalid when signal, entry, stop, and target ordering is ambiguous.
Promotion gate
The written threshold that decides whether a research candidate can move into paper trading or production monitoring.
Options data API
The product surface for chains, contracts, quotes, trades, aggregates, Greeks, IV, open interest, and expirations.
OPRA-originating data
The U.S. listed-options source context behind quotes, trades, exchange participation, and consolidated option-market records.
OCC option symbol
The exact option contract identifier that preserves root, expiration, call or put side, and strike.
Bid/ask spread
The execution interval between bid and ask that determines whether a contract is realistically tradable.
Midpoint
The computed center between bid and ask, useful as a reference price but not proof that an order would fill.
Quote/trade condition
The condition-code, exchange, correction, sequence, and timestamp context that explains how a quote or trade row can be used.
Quote vs trade semantics
The distinction between executable bid/ask markets, printed transactions, and bar-level summaries.
REST snapshot
A reproducible request for current or historical market state, used for initialization, backfills, and audit logs.
WebSocket stream
A persistent live connection that needs subscription topics, reconnect tracking, freshness labels, and REST repair paths.
Entitlement gate
The product, plan, quote, live, delayed, historical, or commercial-use boundary checked before data is shown.
Quote freshness
The age, timestamp, and live or delayed state of a bid/ask record before it is used in a scanner, backtest, or UI.
Timestamp semantics
The exchange, provider, ingestion, session, and application time context attached to a market-data record.
Pagination cursor
The continuation token or next URL that keeps large chains, trades, quotes, and historical windows complete.
FAQ
Related questions
Why was the May 8 UOA result not promoted?
The local PnL was attractive, but robustness and holdout evidence failed: PBO was too high, selected DSR was weak in the cleaner family scan, and the remote holdout produced no executable trades.

Written by
Daniel Ratke
Research & Engineering
Daniel covers the deeper research notes: options backtesting, execution realism, robustness testing, data engineering, and strategy validation.