Why should failed backtest runs be logged?

Logging failed runs prevents repeated work: the record shows whether a branch died from weak signal logic, missing data, execution rejects, concentration, or poor robustness.

Developer Guide · May 8, 2026 · 8 min read

What To Log Before Optimizing a Backtest

CuteMarkets Team

Research

Quick answer

Before optimizing, log the complete decision stream: signal time, selected instrument, quote evidence, rejects, parameters, selected trades, daily PnL, and summary diagnostics.

Abstract

Optimization is cheap once a backtesting framework exists. Trust is not. Before sweeping parameters, a developer should decide which artifacts prove the backtest is replaying the right market state and not merely producing attractive summary numbers.

The practical rule is simple: every promoted result should leave enough evidence for another process to rebuild the same decision stream. That means more than a final Sharpe ratio.

Start With Decision Logs

A decision log should record the signal timestamp, the market data timestamp, the selected instrument, and the reason the trade was accepted or rejected. For options strategies, it should also record days to expiration (DTE), strike, contract type, spread width, quote age, and the pricing side used for entry and exit.
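As a minimal sketch, one decision can be stored as a single structured row. The field names, the contract identifier format, and the JSON Lines serialization below are illustrative assumptions, not a CuteMarkets schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DecisionRecord:
    # All field names are hypothetical; adapt them to your own schema.
    signal_ts: str        # when the signal fired (ISO 8601, UTC)
    quote_ts: str         # timestamp of the quote used for pricing
    underlying: str
    contract: str         # identifier of the selected option contract
    contract_type: str    # "call" or "put"
    dte: int              # days to expiration at signal time
    strike: float
    bid: float
    ask: float
    quote_age_s: float    # seconds between the quote and the signal
    pricing_side: str     # e.g. "ask" for a buy entry, "bid" for a sell exit
    accepted: bool
    reject_reason: Optional[str] = None  # None when the trade was accepted

    @property
    def spread_width(self) -> float:
        return self.ask - self.bid


def to_log_line(rec: DecisionRecord) -> str:
    """Serialize one decision as a single JSON line."""
    row = asdict(rec)
    row["spread_width"] = round(rec.spread_width, 4)
    return json.dumps(row, sort_keys=True)


# A rejected candidate is logged with the same fields as an accepted trade.
example = DecisionRecord(
    signal_ts="2026-01-05T15:30:00Z",
    quote_ts="2026-01-05T15:29:58Z",
    underlying="XYZ",
    contract="XYZ-2026-01-16-C-100",
    contract_type="call",
    dte=11,
    strike=100.0,
    bid=2.10,
    ask=2.25,
    quote_age_s=2.0,
    pricing_side="ask",
    accepted=False,
    reject_reason="spread_too_wide",
)
line = to_log_line(example)
```

Writing accepted and rejected candidates with the same row shape keeps the "nearby alternatives" queryable next to the trade that was actually taken.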

This is the difference between a research notebook and a system. A notebook can say "the strategy bought calls." A system should say which calls, why those calls were visible, which quote priced the order, and why the nearby alternatives were rejected.

Make The Run Reconstructable

The standard for a useful log is reconstructability. Another process should be able to read the artifact and rebuild the same decision stream without guessing which assumptions were active. That means logging the strategy profile, parameter values, data cut, session calendar, symbol universe, option selection policy, fill policy, and software version or manifest hash.
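One way to make "which assumptions were active" checkable is to hash a canonical form of the manifest. The sketch below assumes the manifest is a JSON-serializable dictionary; every key and value in it is a placeholder.

```python
import hashlib
import json


def manifest_hash(manifest: dict) -> str:
    """Hash a manifest deterministically so two runs can be compared."""
    # Sorted keys and fixed separators make the text independent of key order.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical manifest covering the items listed above.
manifest = {
    "strategy_profile": "example_profile",
    "parameters": {"dte_min": 7, "dte_max": 21, "stop_pct": 0.35},
    "data_cut": "2026-04-30",
    "session_calendar": "example_calendar",
    "symbol_universe": ["AAA", "BBB"],
    "option_selection_policy": "example_selection_policy",
    "fill_policy": "example_fill_policy",
    "software_version": "0.0.0-example",
}
run_hash = manifest_hash(manifest)

# The same manifest with a different stop rule produces a different hash.
changed = dict(manifest, parameters={"dte_min": 7, "dte_max": 21, "stop_pct": 0.50})
changed_hash = manifest_hash(changed)
```

Two runs whose hashes differ were not run under the same assumptions, which is the question a chart cannot answer.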

This sounds heavy until the first promising result has to be debugged. A chart can show that a branch made money, but it cannot tell you whether the branch used stale quotes, a newer contract calendar, or a different stop rule than the run beside it. A reconstructable artifact turns the result from a screenshot into a testable object.

For developers, the practical implementation is usually a small manifest file plus structured tables. The manifest describes the experiment. The selected-trades table records accepted trades. The rejects table records opportunities that did not become trades. The daily PnL table turns the trade stream into a time series. The summary table is then a derivative, not the source of truth.
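The relationship between the tables can be sketched in a few lines: daily PnL is computed from the selected-trades rows rather than stored independently. The row fields are hypothetical, and the sketch assumes realized PnL is booked on the exit date, which is only one possible convention.

```python
from collections import defaultdict

# Hypothetical selected-trades rows; a real run would read them from its table.
selected_trades = [
    {"trade_id": 1, "exit_date": "2026-01-05", "pnl": 120.0},
    {"trade_id": 2, "exit_date": "2026-01-05", "pnl": -45.0},
    {"trade_id": 3, "exit_date": "2026-01-07", "pnl": 30.0},
]


def daily_pnl(trades):
    """Aggregate realized PnL by exit date, returning rows sorted by date."""
    totals = defaultdict(float)
    for trade in trades:
        totals[trade["exit_date"]] += trade["pnl"]
    return [{"date": day, "pnl": totals[day]} for day in sorted(totals)]


daily = daily_pnl(selected_trades)
```

Because the daily series is recomputed from trades, any disagreement between the two tables points to a logging bug rather than a strategy result.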

Keep Summary Metrics Separate

Summary metrics still matter. You need return, drawdown, trade count, win rate, and risk-adjusted statistics. But those metrics should be downstream of the decision stream. If the decision stream is wrong, the summary is just a formatted mistake.
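A sketch of what "downstream" can mean in code: the summary function takes trade-level and daily PnL as inputs and refuses to report if the two do not reconcile. The function name, the reconciliation tolerance, and the choice to measure drawdown on cumulative PnL in currency units are assumptions.

```python
import math


def summarize(trade_pnls, daily_pnls):
    """Compute summary statistics from trade-level and daily PnL lists."""
    # The summary is only valid if both inputs describe the same trade stream.
    if not math.isclose(sum(trade_pnls), sum(daily_pnls), abs_tol=1e-9):
        raise ValueError("daily PnL does not reconcile with selected trades")

    equity, peak, max_drawdown = 0.0, 0.0, 0.0
    for pnl in daily_pnls:
        equity += pnl
        peak = max(peak, equity)
        max_drawdown = max(max_drawdown, peak - equity)

    wins = sum(1 for pnl in trade_pnls if pnl > 0)
    return {
        "total_pnl": sum(trade_pnls),
        "trade_count": len(trade_pnls),
        "win_rate": wins / len(trade_pnls) if trade_pnls else None,
        "max_drawdown": max_drawdown,
    }


# Four trades closed over three sessions (illustrative numbers).
summary = summarize([120.0, -45.0, 30.0, -80.0], [75.0, 30.0, -80.0])
```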

In the CuteMarkets research workflow, the useful summaries are the ones that can point back to selected trades, daily PnL, fold results, and diagnostics. A top-level number without those links is hard to debug and easy to overfit.

Log The Search, Not Only The Winner

Optimization creates selection pressure. If a run tests many thresholds, DTE windows, stop sizes, and time filters, the winner's apparent performance is partly a product of how many attempts were made. The log should therefore describe the search space, not only the selected configuration. A result found after testing five variations is different from a result found after testing five thousand.

The minimum useful record is the grid definition and the number of evaluated branches. A stronger record includes fold-level results, out-of-sample slices, rejected branches, and the reason each branch was closed. This makes later robustness checks more meaningful. The deflated Sharpe ratio, the probability of backtest overfitting (PBO), and walk-forward diagnostics all depend on knowing that the strategy was selected from a population rather than discovered in isolation.
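The minimum record can be produced by the same code that enumerates the sweep. The parameter names and values below are placeholders, and the sketch assumes every enumerated branch is actually evaluated; if some are skipped, log the evaluated count separately.

```python
import itertools
import json

# Hypothetical search space; names and values are placeholders.
grid = {
    "signal_threshold": [0.5, 1.0, 1.5],
    "dte_window": [[7, 14], [14, 30]],
    "stop_pct": [0.25, 0.35, 0.50],
    "entry_after": ["09:45", "10:30"],
}


def enumerate_branches(grid):
    """Expand a grid definition into one parameter dict per branch."""
    keys = sorted(grid)
    combos = itertools.product(*(grid[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


branches = enumerate_branches(grid)

# Store the grid and the branch count next to the results, not only the winner.
search_record = {"grid": grid, "n_branches_evaluated": len(branches)}
search_json = json.dumps(search_record, sort_keys=True)
```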

Developers should also log near misses. Neighboring parameters that behave similarly make a result more credible. A lone winner surrounded by weak variants is more likely to be a selection artifact, even when its final metric looks clean.
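A rough way to quantify this is to compare the winner with its immediate neighbors on the parameter grid. The sketch below assumes a two-parameter grid indexed by integer positions and a metric where higher is better; the ratio it returns is a heuristic chosen for illustration, not a standard statistic.

```python
def neighbor_support(scores, winner):
    """Return the mean neighbor score divided by the winner's score.

    `scores` maps (i, j) grid indices to a metric where higher is better.
    Neighbors are the up-to-8 adjacent cells that were actually evaluated.
    Returns None when there are no neighbors or the winner's score is not positive.
    """
    wi, wj = winner
    neighbors = [
        scores[(wi + di, wj + dj)]
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0) and (wi + di, wj + dj) in scores
    ]
    if not neighbors or scores[winner] <= 0:
        return None
    return sum(neighbors) / len(neighbors) / scores[winner]


# A plateau: the winner is only slightly better than its neighbors.
plateau = {(i, j): 1.0 for i in range(3) for j in range(3)}
plateau[(1, 1)] = 1.25

# A spike: the winner stands alone among flat variants.
spike = {(i, j): 0.0 for i in range(3) for j in range(3)}
spike[(1, 1)] = 2.0

plateau_support = neighbor_support(plateau, (1, 1))
spike_support = neighbor_support(spike, (1, 1))
```

A value near 1 suggests the winner sits on a plateau; a value near 0 suggests a lone spike that deserves extra scrutiny before promotion.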

Log Negative Evidence

The most valuable optimization runs often produce no-go artifacts. Zero-trade branches, sparse branches, high-concentration winners, and strategies that fail quote checks should be kept. They prevent the same weak idea from being rediscovered later with a new name.

Developers tend to delete failed runs because they are noisy. Trading research improves when failed runs become searchable. A future strategy sweep should know that a DTE bucket failed because it had no listed expiry, not because the signal was inherently weak.

Separate Data Problems From Strategy Problems

A no-go result is most useful when it classifies the failure. Some failures are market-data problems: missing bars, unavailable expirations, stale quotes, or incomplete reference data. Some are execution problems: spreads too wide, no tradable side, or excessive quote age. Some are strategy problems: poor timing, weak asymmetry, concentration, or unstable behavior across folds.

Putting all of those into one "bad backtest" bucket wastes information. The next researcher will not know whether to improve data coverage, change the expression, or abandon the signal. A simple failure taxonomy prevents that ambiguity.

One practical pattern is to log both reject_reason and research_status. The reject reason describes a single event. The research status describes the branch: data_blocked, execution_blocked, signal_weak, too_sparse, overfit_risk, or promoted_for_replay. That gives the team a compact map of what happened without reducing the report to prose alone.
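The two levels can be connected with a small classifier. The status names come from the list above; the reject-reason strings, the trade-count threshold, the boolean inputs, and the precedence of the rules are all assumptions made for this sketch.

```python
from collections import Counter

# Hypothetical event-level reject reasons, grouped by failure class.
DATA_REASONS = {"missing_bars", "no_listed_expiry", "stale_quote", "incomplete_reference_data"}
EXECUTION_REASONS = {"spread_too_wide", "no_tradable_side", "excessive_quote_age"}


def research_status(reject_reasons, n_trades, edge_positive, stable_across_folds,
                    min_trades=30):
    """Assign a branch-level status from event-level rejects and run results."""
    counts = Counter(reject_reasons)
    data_rejects = sum(n for reason, n in counts.items() if reason in DATA_REASONS)
    exec_rejects = sum(n for reason, n in counts.items() if reason in EXECUTION_REASONS)

    if n_trades < min_trades:
        # Attribute a sparse branch to its dominant blocker when there is one.
        if data_rejects > 0 and data_rejects >= exec_rejects:
            return "data_blocked"
        if exec_rejects > 0:
            return "execution_blocked"
        return "too_sparse"
    if not edge_positive:
        return "signal_weak"
    if not stable_across_folds:
        return "overfit_risk"
    return "promoted_for_replay"


# A DTE bucket with no listed expiry is a data problem, not a weak signal.
status = research_status(["no_listed_expiry"] * 5, n_trades=0,
                         edge_positive=False, stable_across_folds=False)
```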

The Minimum Artifact Set

A serious backtest should write a profile manifest, selected trade rows, daily PnL, rejected-trade counts, top failure reasons, and the exact parameters used for the run. For strategy families, add fold summaries and out-of-sample diagnostics.
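A completeness check at the end of each run keeps this list from eroding over time. The filenames below are hypothetical, since the artifact set is described here by content rather than by file layout.

```python
import tempfile
from pathlib import Path

# Hypothetical filenames for the artifacts named above.
REQUIRED_ARTIFACTS = (
    "manifest.json",
    "parameters.json",
    "selected_trades.csv",
    "daily_pnl.csv",
    "reject_counts.csv",
    "top_failure_reasons.csv",
)


def missing_artifacts(run_dir, required=REQUIRED_ARTIFACTS):
    """Return the required artifacts that are absent or empty in run_dir."""
    root = Path(run_dir)
    missing = []
    for name in required:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


# Usage: a run directory that wrote only two of the six artifacts.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "manifest.json").write_text("{}")
    (Path(tmp) / "daily_pnl.csv").write_text("date,pnl\n")
    gaps = missing_artifacts(tmp)
```

A run with a non-empty `gaps` list can be marked incomplete before anyone looks at its summary numbers.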

Add one more artifact when the branch looks promising: a short launch note. It should state the selected profile, the intended paper-trading constraints, the known weaknesses, and the checks that must match during replay. This note is not a marketing document. It is a contract between research and operation.

Takeaway

Do not optimize first. Instrument first. A backtest that logs decisions, rejects, and summaries gives you a system you can improve. A backtest that only logs the winner gives you a story you cannot audit.
