CuteMarkets Docs

Backtesting Framework

Framework guides for engineers building realistic options backtests with causal data, quote-aware fills, and robust validation.


Backtesting Test Plan

A backtesting framework needs tests for scientific behavior, not only for syntax. The test suite should prove that the simulator does not leak future information, select stale contracts, overstate fills, or compute portfolio metrics from the wrong unit of observation.

Core test groups

| Group | What to prove |
| --- | --- |
| Causality | Signals only use completed and available data. |
| Contract selection | Historical universes, DTE rules, and cache keys produce stable, explainable choices. |
| Execution realism | Quotes, spreads, stops, targets, and fallbacks behave as configured. |
| Portfolio math | Combined daily PnL drives risk metrics when multiple symbols trade. |
| Robustness | Folds, holdouts, PBO, and deflated Sharpe use the intended rows. |
| Public examples | Docs and CLI examples import public names and run against sample or mocked data. |

Causality tests

Write tests that make leakage obvious. For an intraday setup, create a session where the signal bar closes beyond a threshold and that same bar's open is a price the strategy could not have traded, because the signal did not exist until the bar closed. The expected behavior is a signal on bar t and an entry on the next observable bar or quote after t.
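A minimal pytest-style sketch of the next-bar rule follows. `Bar`, `entry_bar`, and the 100.0 threshold are hypothetical stand-ins for whatever your simulator exposes. The useful part is the fixture shape: the signal bar's open is a deliberately attractive price that a leaking simulator would fill at.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bar:
    ts: int       # bar index, increasing within the session
    open: float
    close: float


def entry_bar(bars: List[Bar], threshold: float) -> Optional[Bar]:
    """Hypothetical rule: signal when a bar closes above `threshold`,
    enter on the NEXT bar, never on the signal bar itself."""
    for i, bar in enumerate(bars):
        if bar.close > threshold:
            # Bar i's close is only known after bar i completes,
            # so the earliest causal fill is bar i + 1.
            return bars[i + 1] if i + 1 < len(bars) else None
    return None


def test_signal_bar_open_is_never_the_fill():
    bars = [
        Bar(ts=0, open=99.0, close=99.5),
        # Signal bar: closes above 100, but its 98.0 open printed
        # before the signal existed.
        Bar(ts=1, open=98.0, close=101.0),
        Bar(ts=2, open=101.2, close=101.4),
    ]
    entry = entry_bar(bars, threshold=100.0)
    assert entry is not None
    assert entry.ts == 2
    assert entry.open == 101.2


def test_signal_on_last_bar_produces_no_entry():
    bars = [Bar(ts=0, open=99.0, close=99.5), Bar(ts=1, open=99.8, close=100.5)]
    assert entry_bar(bars, threshold=100.0) is None
```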

Also test:

  • opening-range values come only from the opening-range window
  • prior-day filters do not include the current session close
  • premarket filters do not use regular-session bars
  • exit timestamps cannot precede entry timestamps
  • daily forecast paths only use data available before the forecast date
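The opening-range and prior-day checks can be written the same way. This sketch assumes a 09:30 to 09:45 opening window and bars stored as hypothetical `(timestamp, high, low)` tuples. `opening_range` and `prior_day_close` are illustrative helpers, not a framework API.

```python
from datetime import date, datetime, time


def opening_range(bars, start=time(9, 30), end=time(9, 45)):
    """Hypothetical helper: (high, low) over bars stamped in [start, end).
    Each bar is a (timestamp, high, low) tuple."""
    window = [b for b in bars if start <= b[0].time() < end]
    if not window:
        raise ValueError("no bars in the opening-range window")
    return max(b[1] for b in window), min(b[2] for b in window)


def prior_day_close(daily_closes, session):
    """Hypothetical helper: most recent close strictly before `session`."""
    earlier = [d for d in daily_closes if d < session]
    if not earlier:
        raise ValueError("no prior session available")
    return daily_closes[max(earlier)]


def test_opening_range_ignores_later_bars():
    day = date(2024, 3, 4)
    bars = [
        (datetime.combine(day, time(9, 30)), 101.0, 100.0),
        (datetime.combine(day, time(9, 40)), 101.5, 99.8),
        # A 10:00 spike must not widen the 09:30-09:45 range.
        (datetime.combine(day, time(10, 0)), 105.0, 95.0),
    ]
    assert opening_range(bars) == (101.5, 99.8)


def test_prior_day_filter_excludes_current_close():
    closes = {date(2024, 3, 1): 100.0, date(2024, 3, 4): 110.0}
    # On 2024-03-04 the filter must see the prior session's close, not today's.
    assert prior_day_close(closes, date(2024, 3, 4)) == 100.0
```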

Contract-selection tests

Use tiny deterministic contract universes. Include similar strikes and expirations so ranking errors are visible.

Required cases:

  • a DTE window with no contracts returns a rejection
  • spread filters reject the wider contract
  • volume or open-interest filters reject inactive contracts
  • changing the entry underlying price can change the selected strike
  • changing the selection timestamp can change quote-aware ranking
  • vertical structures reject missing or invalid paired legs
  • persistent cache entries are not reused for a different selection context
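A tiny deterministic universe makes ranking errors visible. In this sketch, `Contract`, `Rejection`, and `select_call` are hypothetical, and the selection rule (nearest strike to the underlying inside the DTE window, after a spread filter, with a symbol tie-break) is an assumed example rather than a prescribed method.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Contract:
    symbol: str
    expiration: date
    strike: float
    bid: float
    ask: float


@dataclass(frozen=True)
class Rejection:
    reason: str


def select_call(universe, as_of, underlying, min_dte, max_dte, max_spread):
    """Hypothetical selector: nearest-to-the-money contract inside the DTE
    window whose bid/ask spread is at most `max_spread`."""
    in_window = [c for c in universe
                 if min_dte <= (c.expiration - as_of).days <= max_dte]
    if not in_window:
        return Rejection("no contracts in DTE window")
    liquid = [c for c in in_window if c.ask - c.bid <= max_spread]
    if not liquid:
        return Rejection("all contracts fail spread filter")
    # Tie-break on symbol so equal distances resolve deterministically.
    return min(liquid, key=lambda c: (abs(c.strike - underlying), c.symbol))


AS_OF = date(2024, 3, 4)
UNIVERSE = [
    Contract("C100_W", date(2024, 3, 8), 100.0, 1.00, 1.05),  # tight spread
    Contract("C101_W", date(2024, 3, 8), 101.0, 0.60, 0.95),  # wide spread
    Contract("C102_W", date(2024, 3, 8), 102.0, 0.30, 0.34),
]


def test_empty_dte_window_is_rejected():
    result = select_call(UNIVERSE, AS_OF, 100.8, 30, 45, max_spread=0.10)
    assert isinstance(result, Rejection)


def test_spread_filter_rejects_wider_contract():
    result = select_call(UNIVERSE, AS_OF, 100.8, 0, 7, max_spread=0.10)
    assert result.symbol == "C100_W"


def test_underlying_price_changes_strike():
    result = select_call(UNIVERSE, AS_OF, 101.9, 0, 7, max_spread=0.10)
    assert result.symbol == "C102_W"
```

The 101 strike is closest to 100.8, so a selector that ranks before filtering would pick it and fail the second test.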

Execution tests

Build quote windows by hand. Tests should cover valid quotes, crossed quotes, missing entry quotes, missing exit quotes, wide spreads, and bar fallback settings.
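Hand-built quote windows can drive a fill rule directly. The validity checks below (positive two-sided quote, not crossed, spread at most 20% of mid) and the `bar_fallback` parameter are assumptions for illustration; substitute your framework's configured rules.

```python
def is_valid_quote(bid, ask, max_spread_pct):
    """Hypothetical validity rule for a single two-sided quote."""
    if bid is None or ask is None or bid <= 0 or ask <= 0:
        return False                              # missing or non-positive side
    if ask < bid:
        return False                              # crossed quote
    mid = (bid + ask) / 2
    return (ask - bid) / mid <= max_spread_pct    # wide-spread filter


def fill_price(quotes, side, max_spread_pct=0.20, bar_fallback=None):
    """Fill at the first valid quote after the signal: buys pay the ask,
    sells receive the bid. Returns `bar_fallback` (None unless the caller
    opts in) when no quote in the window is usable."""
    for bid, ask in quotes:
        if is_valid_quote(bid, ask, max_spread_pct):
            return ask if side == "buy" else bid
    return bar_fallback


def test_quote_windows():
    assert fill_price([(1.00, 1.10)], "buy") == 1.10
    assert fill_price([(1.00, 1.10)], "sell") == 1.00
    # A crossed quote is skipped; the next valid quote fills.
    assert fill_price([(1.20, 1.10), (1.00, 1.10)], "buy") == 1.10
    # A missing quote and a wide spread both leave the order unfilled.
    assert fill_price([(None, None)], "buy") is None
    assert fill_price([(0.50, 1.00)], "buy") is None
    # The bar fallback applies only when explicitly supplied.
    assert fill_price([(0.50, 1.00)], "buy", bar_fallback=0.80) == 0.80
```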

For stops and targets, create one quote sequence where the stop is touched before the target and another where the target is touched first. The framework should record the correct exit reason and use the configured fill side.
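The two fixtures can differ only in which level the quote path reaches first. This sketch assumes a long option position closed at the bid and scanned quote by quote; `simulate_exit` and its return tuple are hypothetical.

```python
def simulate_exit(quotes, stop, target):
    """Hypothetical exit rule for a long option: scan (ts, bid, ask) quotes
    in time order and exit on whichever level the bid reaches first.
    Closing a long sells at the bid, so the fill is the bid, not the level."""
    for ts, bid, _ask in quotes:
        if bid <= stop:
            return ("stop", ts, bid)
        if bid >= target:
            return ("target", ts, bid)
    return ("no_exit", None, None)


def test_stop_touched_first():
    quotes = [(1, 1.00, 1.05), (2, 0.78, 0.82), (3, 1.30, 1.35)]
    assert simulate_exit(quotes, stop=0.80, target=1.25) == ("stop", 2, 0.78)


def test_target_touched_first():
    quotes = [(1, 1.00, 1.05), (2, 1.26, 1.30), (3, 0.70, 0.75)]
    assert simulate_exit(quotes, stop=0.80, target=1.25) == ("target", 2, 1.26)
```

Asserting the fill price as well as the reason catches simulators that report the right reason but fill at the stop level instead of the quote actually available.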

Portfolio and robustness tests

Portfolio tests should use at least two symbols trading on the same calendar day. The expected Sharpe and Sortino inputs should come from one combined daily PnL row for that day, not two separate pseudo-days.
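The portfolio case reduces to a grouping check: two symbols trade on 2024-03-04, so the daily series should contain one row for that date. The helpers below are hypothetical and assume a zero risk-free rate and 252 periods per year; a Sortino calculation would consume the same combined rows.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def combined_daily_pnl(trades):
    """Sum trade PnL per calendar day across all symbols (one row per day)."""
    totals = defaultdict(float)
    for day, _symbol, pnl in trades:
        totals[day] += pnl
    return [totals[day] for day in sorted(totals)]


def annualized_sharpe(daily_pnl, periods_per_year=252):
    """Sharpe ratio of daily PnL, assuming a zero risk-free rate."""
    return mean(daily_pnl) / stdev(daily_pnl) * math.sqrt(periods_per_year)


def test_same_day_trades_share_one_row():
    trades = [
        ("2024-03-04", "SPY", 100.0),
        ("2024-03-04", "QQQ", -40.0),   # same calendar day as the SPY trade
        ("2024-03-05", "SPY", 20.0),
        ("2024-03-06", "QQQ", 50.0),
    ]
    daily = combined_daily_pnl(trades)
    assert daily == [60.0, 20.0, 50.0]
    # Treating each trade as its own "day" yields a different ratio.
    per_trade = [pnl for _, _, pnl in trades]
    assert annualized_sharpe(daily) != annualized_sharpe(per_trade)
```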

Robustness tests should verify:

  • train and test windows do not overlap unless intentionally configured
  • selected-fold rows feed selection diagnostics
  • combined-fold rows feed portfolio diagnostics
  • sparse profiles fail minimum trade gates
  • rejected profiles still appear in diagnostic summaries
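The split and gate checks can be tested on synthetic calendars. `walk_forward_folds` and `gate_profiles` are hypothetical helpers that assume calendar-day windows and a simple trade-count threshold; a real framework may split on trading sessions instead.

```python
from datetime import date, timedelta


def walk_forward_folds(start, end, train_days, test_days):
    """Hypothetical walk-forward splitter over calendar days. Each test
    window begins the day after its train window ends, and consecutive
    test windows are adjacent and non-overlapping."""
    folds = []
    train_start = start
    while True:
        train_end = train_start + timedelta(days=train_days - 1)
        test_start = train_end + timedelta(days=1)
        test_end = test_start + timedelta(days=test_days - 1)
        if test_end > end:
            break
        folds.append(((train_start, train_end), (test_start, test_end)))
        train_start += timedelta(days=test_days)  # roll forward one test window
    return folds


def gate_profiles(trade_counts, min_trades):
    """Apply a minimum-trade gate but keep every profile in the summary."""
    return {
        name: "accepted" if n >= min_trades else "rejected: too few trades"
        for name, n in trade_counts.items()
    }


def test_folds_do_not_overlap():
    folds = walk_forward_folds(date(2024, 1, 1), date(2024, 3, 31), 30, 10)
    assert len(folds) == 6
    for (_, train_end), (test_start, _) in folds:
        assert train_end < test_start
    for (_, (_, prev_end)), (_, (next_start, _)) in zip(folds, folds[1:]):
        assert prev_end < next_start


def test_sparse_profile_is_rejected_but_reported():
    summary = gate_profiles({"orb_30m": 42, "orb_5m": 3}, min_trades=20)
    assert summary["orb_30m"] == "accepted"
    assert summary["orb_5m"] == "rejected: too few trades"
```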

CI commands

For the public site, run:

```bash
npm run lint
npm run build
npm run audit:seo
```

For a Python framework package, keep a focused public-surface test command:

```bash
PYTHONPATH=src python -m pytest tests/test_public_surface.py -q
```

The exact file names can change. The standard should not: every framework change that affects causality, selection, fills, or metrics needs a regression test.

Read next: Backtesting Framework and cutebacktests.
