What To Log Before Optimizing a Backtest

Viktoria Chapov
Product & Education
Before optimizing, log the complete decision stream: signal time, selected instrument, quote evidence, rejects, parameters, selected trades, daily PnL, and summary diagnostics.

Term map
Backtesting vocabulary for this article
Treat signal timestamp, point-in-time universe, quote-aware fill, reject reason, replay artifact, walk-forward test, and cache key as first-class terms. They separate reproducible research from a backtest that only preserves the final performance table.
Follow the linked definitions for Point-in-time contracts, Quote-aware fills, Reject reasons, Replay artifact, Cache key, Signal timestamp, Look-ahead leakage, Walk-forward test, Slippage model, Same-bar fill, Promotion gate, and Options data API.
Read this article with Options Backtesting API, Backtesting Framework, Backtesting Data Quality Checklist, Backtesting Execution Realism, Quote-Aware Options Backtests, and Backtest to Paper Trading Parity Checklist.
Abstract
Optimization is cheap once a backtesting framework exists. Trust is not. Before sweeping parameters, a developer should decide which artifacts prove the backtest is replaying the right market state rather than merely producing attractive summary numbers.
The practical rule is simple: every promoted result should leave enough evidence for another process to rebuild the same decision stream. That means more than a final Sharpe ratio.
Start With Decision Logs
A decision log should record the signal timestamp, the market data timestamp, the selected instrument, and the reason the trade was accepted or rejected. For options strategies, it should also record DTE, strike, contract type, spread width, quote age, and the pricing side used for entry and exit.
This is the difference between a research notebook and a system. A notebook can say "the strategy bought calls." A system should say which calls, why those calls were visible, which quote priced the order, and why the nearby alternatives were rejected.
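As a minimal sketch of the fields listed above, a decision record could be a small immutable structure written as one JSON line per decision. The class name, field names, and sample values are illustrative assumptions, not a CuteMarkets schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DecisionRecord:
    """One accepted or rejected decision. All field names are hypothetical."""
    signal_ts: str        # when the strategy decided (ISO 8601, UTC)
    market_data_ts: str   # timestamp of the quote used as evidence
    instrument: str       # option contract identifier
    accepted: bool
    reason: str           # why the trade was accepted or rejected
    dte: int              # days to expiration
    strike: float
    contract_type: str    # "call" or "put"
    spread_width: float   # ask minus bid at decision time
    quote_age_s: float    # seconds between the quote and the signal
    entry_side: str       # pricing side assumed for entry, e.g. "ask"
    exit_side: Optional[str] = None  # pricing side for exit, once known


def to_jsonl_line(record: DecisionRecord) -> str:
    """Serialize one record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


# Made-up sample values, only to show the shape of a rejected decision.
example = DecisionRecord(
    signal_ts="2024-03-01T14:35:00Z",
    market_data_ts="2024-03-01T14:34:58Z",
    instrument="SPY240315C00510000",
    accepted=False,
    reason="spread_too_wide",
    dte=14,
    strike=510.0,
    contract_type="call",
    spread_width=0.35,
    quote_age_s=2.0,
    entry_side="ask",
)
line = to_jsonl_line(example)
```

Accepted and rejected decisions share one shape here, so the rejects table and the selected-trades table can be filtered from the same stream.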
Make The Run Reconstructable
The standard for a useful log is reconstructability. Another process should be able to read the artifact and rebuild the same decision stream without guessing which assumptions were active. That means logging the strategy profile, parameter values, data cut, session calendar, symbol universe, option selection policy, fill policy, and software version or manifest hash.
This sounds heavy until the first promising result has to be debugged. A chart can show that a branch made money, but it cannot tell you whether the branch used stale quotes, a newer contract calendar, or a different stop rule than the run beside it. A reconstructable artifact turns the result from a screenshot into a testable object.
For developers, the practical implementation is usually a small manifest file plus structured tables. The manifest describes the experiment. The selected-trades table records accepted trades. The rejects table records opportunities that did not become trades. The daily PnL table turns the trade stream into a time series. The summary table is then a derivative, not the source of truth.
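One way to sketch the manifest is a plain dictionary hashed in canonical JSON form, so two runs with identical assumptions get the same identifier. The keys and values below are hypothetical placeholders for the items named above, and SHA-256 over sorted JSON is an assumed choice rather than a documented CuteMarkets convention.

```python
import hashlib
import json


def manifest_hash(manifest: dict) -> str:
    """Hash the canonical JSON form so identical settings give identical hashes."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical manifest; every value here is a placeholder.
manifest = {
    "strategy_profile": "example_profile",
    "parameters": {"dte_min": 7, "dte_max": 21, "stop_pct": 0.25},
    "data_cut": "2024-06-30",
    "session_calendar": "example_calendar",
    "symbol_universe": ["SPY", "QQQ"],
    "option_selection_policy": "example_selection_policy",
    "fill_policy": "buy_at_ask_sell_at_bid",
    "software_version": "0.0.0-example",
}
run_id = manifest_hash(manifest)
```

Because keys are sorted before hashing, key order does not change the identifier, while any change to a parameter or policy does.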
Keep Summary Metrics Separate
Summary metrics still matter. You need return, drawdown, trade count, win rate, and risk-adjusted statistics. But those metrics should be downstream of the decision stream. If the decision stream is wrong, the summary is just a formatted mistake.
In the CuteMarkets research workflow, the useful summaries are the ones that can point back to selected trades, daily PnL, fold results, and diagnostics. A top-level number without those links is hard to debug and easy to overfit.
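To illustrate "downstream of the decision stream", the sketch below derives a few summary numbers from the daily PnL and per-trade PnL rows instead of storing them independently. The function name and output keys are assumptions, and drawdown is measured in currency units from a starting equity of zero.

```python
def summarize(daily_pnl: list[float], trade_pnl: list[float]) -> dict:
    """Derive summary metrics from the logged PnL rows."""
    equity = 0.0
    peak = 0.0
    max_drawdown = 0.0
    for pnl in daily_pnl:
        equity += pnl
        peak = max(peak, equity)
        max_drawdown = max(max_drawdown, peak - equity)
    wins = sum(1 for p in trade_pnl if p > 0)
    return {
        "total_pnl": equity,
        "max_drawdown": max_drawdown,
        "trade_count": len(trade_pnl),
        # Win rate is undefined for a zero-trade branch.
        "win_rate": wins / len(trade_pnl) if trade_pnl else None,
    }


# Made-up rows, only to show the derivation.
summary = summarize(
    daily_pnl=[100.0, -50.0, -30.0, 120.0],
    trade_pnl=[100.0, -80.0, 120.0],
)
```

If the summary table is always regenerated this way, a disagreement between a headline number and the trade rows becomes a detectable bug rather than a silent drift.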
Log The Search, Not Only The Winner
Optimization creates selection pressure. If a run tests many thresholds, DTE windows, stop sizes, and time filters, the final winner is affected by the number of attempts. The log should therefore describe the full search space alongside the selected configuration. A result found after testing five variations is different from a result found after testing five thousand.
The minimum useful record is the grid definition and the number of evaluated branches. A stronger record includes fold-level results, out-of-sample slices, rejected branches, and the reason each branch was closed. This makes later robustness checks more meaningful. Deflated Sharpe, PBO, and walk-forward diagnostics all depend on knowing that the strategy was selected from a population rather than discovered in isolation.
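The minimum record described above, the grid definition plus the number of evaluated branches, can be sketched as follows. The parameter names and values are hypothetical, and a full Cartesian grid is assumed; a random or adaptive search would need to log its sampled branches instead.

```python
import itertools


def enumerate_branches(grid: dict) -> list[dict]:
    """Expand a parameter grid into one dict per branch, in a stable order."""
    keys = sorted(grid)
    combos = itertools.product(*(grid[k] for k in keys))
    return [dict(zip(keys, combo)) for combo in combos]


# Hypothetical grid; the names and values are placeholders.
grid = {
    "signal_threshold": [0.5, 1.0, 1.5],
    "dte_window": [(7, 14), (14, 30)],
    "stop_pct": [0.2, 0.3],
}
branches = enumerate_branches(grid)

search_log = {
    "grid": grid,
    "evaluated_branches": len(branches),
    "selected": None,  # filled in only after all branches are evaluated
}
```

Writing `search_log` before any branch runs means the branch count cannot be quietly revised after a winner appears.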
Developers should also log near misses. Neighboring parameters that behave similarly make a result more credible. A lone winner surrounded by weak variants is more likely to be a selection artifact, even when its final metric looks clean.
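A rough way to quantify near misses along a single parameter axis is to compare the winner's score with the mean score of its immediate neighbors. This ratio is an illustrative heuristic, not an established statistic, and the scores below are made up.

```python
def neighbor_support(scores: list[float], winner: int, radius: int = 1) -> float:
    """Mean score of adjacent parameter settings divided by the winner's score.

    Values near 1.0 suggest a plateau; values near or below 0 suggest a lone spike.
    """
    lo = max(0, winner - radius)
    hi = min(len(scores), winner + radius + 1)
    neighbors = [s for i, s in enumerate(scores[lo:hi], start=lo) if i != winner]
    if not neighbors or scores[winner] <= 0:
        return 0.0
    return (sum(neighbors) / len(neighbors)) / scores[winner]


# Made-up metric values across five adjacent settings of one parameter.
plateau = [0.8, 1.0, 1.1, 1.0, 0.9]
spike = [0.1, 0.0, 1.1, -0.1, 0.1]

plateau_support = neighbor_support(plateau, winner=2)
spike_support = neighbor_support(spike, winner=2)
```

Both series have the same best score, but only the plateau has neighbors that back it up, which is the distinction the log should preserve.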
Log Negative Evidence
The most valuable optimization runs often produce no-go artifacts. Zero-trade branches, sparse branches, high-concentration winners, and strategies that fail quote checks should be kept. They prevent the same weak idea from being rediscovered later with a new name.
Developers tend to delete failed runs because they are noisy. Trading research improves when failed runs become searchable. A future strategy sweep should know that a DTE bucket failed because it had no listed expiry, not because the signal was inherently weak.
Separate Data Problems From Strategy Problems
A no-go result is most useful when it classifies the failure. Some failures are market-data problems: missing bars, unavailable expirations, stale quotes, or incomplete reference data. Some are execution problems: spreads too wide, no tradable side, or excessive quote age. Some are strategy problems: poor timing, weak asymmetry, concentration, or unstable behavior across folds.
Putting all of those into one "bad backtest" bucket wastes information. The next researcher will not know whether to improve data coverage, change the expression, or abandon the signal. A simple failure taxonomy prevents that ambiguity.
One practical pattern is to log both reject_reason and research_status. The reject reason describes a single event. The research status describes the branch: data_blocked, execution_blocked, signal_weak, too_sparse, overfit_risk, or promoted_for_replay. That gives the team a compact, queryable map of what happened instead of a report that exists only as prose.
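A minimal sketch of that two-level pattern maps event-level reject reasons to a branch-level blocking status. The reason strings, the `min_trades` threshold of 30, and the "most common reason wins" rule are all assumptions for illustration; statuses such as signal_weak and overfit_risk need fold-level evidence and are outside this sketch.

```python
from collections import Counter
from typing import Optional

# Hypothetical reason codes, grouped by the failure classes described above.
DATA_REASONS = {
    "missing_bars", "unavailable_expiration", "stale_quote", "incomplete_reference_data",
}
EXECUTION_REASONS = {"spread_too_wide", "no_tradable_side", "quote_too_old"}


def blocking_status(
    reject_reasons: list[str], trade_count: int, min_trades: int = 30
) -> Optional[str]:
    """Return a blocking research_status, or None if the branch has enough trades."""
    if trade_count >= min_trades:
        return None  # not blocked; later checks decide the final status
    if not reject_reasons:
        return "too_sparse"
    top_reason, _ = Counter(reject_reasons).most_common(1)[0]
    if top_reason in DATA_REASONS:
        return "data_blocked"
    if top_reason in EXECUTION_REASONS:
        return "execution_blocked"
    return "too_sparse"


status = blocking_status(
    ["unavailable_expiration"] * 5 + ["spread_too_wide"] * 2, trade_count=3
)
```

Here `status` is `"data_blocked"`, which tells the next researcher to look at data coverage before touching the signal.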
The Minimum Artifact Set
A serious backtest should write a profile manifest, selected trade rows, daily PnL, rejected-trade counts, top failure reasons, and the exact parameters used for the run. For strategy families, add fold summaries and out-of-sample diagnostics.
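A run can check its own artifact set before it is considered finished. The sketch below assumes one directory per run and hypothetical file names; the real layout is whatever the framework writes.

```python
import tempfile
from pathlib import Path

# Hypothetical file names for the minimum artifact set.
REQUIRED_ARTIFACTS = [
    "manifest.json",
    "selected_trades.csv",
    "daily_pnl.csv",
    "rejects_summary.csv",
]


def missing_artifacts(run_dir: Path, required: list[str] = REQUIRED_ARTIFACTS) -> list[str]:
    """Return the required artifact names that are not present in run_dir."""
    return [name for name in required if not (run_dir / name).is_file()]


# Demonstration on a throwaway directory that contains only the manifest.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "manifest.json").write_text("{}")
    demo_missing = missing_artifacts(run_dir)
```

A run that reports a non-empty list can be marked incomplete instead of being compared against fully logged runs.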
Add one more artifact when the branch looks promising: a short launch note. It should state the selected profile, the intended paper-trading constraints, the known weaknesses, and the checks that must match during replay. This note is not a marketing document. It is a contract between research and operation.
Takeaway
Do not optimize first. Instrument first. A backtest that logs decisions, rejects, and summaries gives you a system you can improve. A backtest that only logs the winner gives you a story you cannot audit.
How the terminology applies
For What To Log Before Optimizing a Backtest, the backtesting workflow should treat Point-in-time contracts, Quote-aware fills, Reject reasons, Replay artifact, Cache key, and Signal timestamp as operational state rather than glossary decoration. That framing keeps the research claim causal: the strategy can only select instruments, prices, and labels that existed at the decision time.
A developer implementing this guide should persist the evidence behind Look-ahead leakage checks, Walk-forward test folds, the Slippage model, Same-bar fill assumptions, the Promotion gate, and Options data API requests beside the result, instead of leaving those words in a term card. Doing so turns attractive performance into an auditable record where fills, skips, thresholds, and replay inputs can be challenged independently.
The review artifact for What To Log Before Optimizing a Backtest becomes more useful when OPRA-originating data, OCC option symbol, Bid/ask spread, Midpoint, Quote/trade condition, and Quote vs trade semantics appear in the same body of evidence as the selected rows. When a result is promoted, these fields should appear in the run manifest, not only in a prose summary or final equity curve.
In production notes for this backtesting workflow, REST snapshot, WebSocket stream, Entitlement gate, Quote freshness, Timestamp semantics, and Pagination cursor define the checks that decide whether the workflow is reproducible. The result is a backtest that can be rerun, compared across threshold families, and rejected when the evidence is not strong enough.
For What To Log Before Optimizing a Backtest, the practical acceptance test is simple: another developer should be able to read the body, identify the exact inputs, reproduce the request sequence, and explain the accepted and rejected rows without relying on the bottom terminology grid. If a phrase appears in the page vocabulary, it should correspond to a stored field, a validation check, a replay step, or an implementation decision in the backtesting workflow.
This is also the reason the article should not measure success only by the final chart, table, or headline metric. The better standard is whether the data path, timing model, entitlement state, and evidence trail survive review. When those pieces are written directly into the body, the terminology becomes part of the workflow readers can implement.
Terminology
Market-data terms used in this article
These terms keep the article connected to the CuteMarkets knowledge base and to the exact API workflow behind the research.
Point-in-time contracts
Contract discovery anchored to the research date so a backtest does not use future listings.
Quote-aware fills
Entry and exit assumptions based on bid/ask quotes, quote age, spread width, and side-specific fill rules.
Reject reasons
Logged explanations for skipped contracts or fills, including stale quote, wide spread, no bid, or missing data.
Replay artifact
The saved request, selection, fill, reject, and metric record that lets another developer audit the backtest.
Cache key
The structured identifier that keeps provider, endpoint, ticker, timestamp, plan, and schema state from being mixed.
Signal timestamp
The exact time a strategy made a decision, used to reconstruct the visible universe and quote window causally.
Look-ahead leakage
A research error where a fill, contract, indicator, or label uses information unavailable at decision time.
Walk-forward test
A validation method that repeatedly trains and evaluates across separated time windows instead of trusting one optimized sample.
Slippage model
A fill-cost assumption based on bid/ask side, midpoint, spread percent, quote age, and liquidity policy.
Same-bar fill
An intraday backtest assumption that can become invalid when signal, entry, stop, and target ordering is ambiguous.
Promotion gate
The written threshold that decides whether a research candidate can move into paper trading or production monitoring.
Options data API
The product surface for chains, contracts, quotes, trades, aggregates, Greeks, IV, open interest, and expirations.
OPRA-originating data
The U.S. listed-options source context behind quotes, trades, exchange participation, and consolidated option-market records.
OCC option symbol
The exact option contract identifier that preserves root, expiration, call or put side, and strike.
Bid/ask spread
The execution interval between bid and ask that determines whether a contract is realistically tradable.
Midpoint
The computed center between bid and ask, useful as a reference price but not proof that an order would fill.
Quote/trade condition
The condition-code, exchange, correction, sequence, and timestamp context that explains how a quote or trade row can be used.
Quote vs trade semantics
The distinction between executable bid/ask markets, printed transactions, and bar-level summaries.
REST snapshot
A reproducible request for current or historical market state, used for initialization, backfills, and audit logs.
WebSocket stream
A persistent live connection that needs subscription topics, reconnect tracking, freshness labels, and REST repair paths.
Entitlement gate
The product, plan, quote, live, delayed, historical, or commercial-use boundary checked before data is shown.
Quote freshness
The age, timestamp, and live or delayed state of a bid/ask record before it is used in a scanner, backtest, or UI.
Timestamp semantics
The exchange, provider, ingestion, session, and application time context attached to a market-data record.
Pagination cursor
The continuation token or next URL that keeps large chains, trades, quotes, and historical windows complete.
FAQ
Related questions
Why should failed backtest runs be logged?
Failed runs prevent repeated work by documenting whether a branch died from weak signal logic, missing data, execution rejects, concentration, or poor robustness.

Written by
Viktoria Chapov
Product & Education
Viktoria writes the approachable side of CuteMarkets: product updates, practical tutorials, market context, and beginner-friendly API workflows.