Case Study · April 14, 2026 · 9 min read

Historical Options Backtesting: Data, Fills, and Slippage That Actually Matter

Daniel Ratke

Research & Engineering

Term map

Execution-realism vocabulary for this article

Keep bid, ask, midpoint, quote age, spread percent, last-price risk, no-bid exit, and rejected fill separate. Options execution language is where many attractive research results become fragile.

Follow the linked definitions for Quote window, Last price risk, Spread percent, No-bid exit, Side-specific fill, Stale quote, Liquidity filter, Trade print validation, Condition-aware fill, Options data API, OPRA-originating data, and OCC option symbol.

Repository reference: cutebacktests

Abstract

Historical options backtesting fails most often at the data layer, not at the signal layer. Traders often think they have done "historical options research" when they have downloaded a chain snapshot, picked the nearest strike, and priced the trade with a clean midpoint. That workflow is fast, but it is not enough for serious inference. A useful historical options backtest usually needs historically correct contracts, timestamped quotes, trade prints, and rules for what happens when liquidity is thin.

This repository reached that conclusion from two directions at once. The first was negative: the March 8 audit showed that even contract-selection cache logic could silently distort the tested instrument if the relevant underlying-price bucket was ignored, as described in Backtesting Framework Issue Summary. The second was constructive: the CuteMarkets earnings research notes in Earnings Options Plays Around Earnings, Condensed lay out the practical historical data stack, including contracts with as_of, historical quotes, trades, aggregates, and expirations.

For the full options-data workflow, read this with Historical Options Replay Runbook, Option Quote and Trade Conditions, Backtesting Execution Realism, and Options Data Provider Evaluation. A causal replay needs point-in-time contract selection, OCC symbol identity, DTE, bid, ask, NBBO context, quote age, spread width, trade condition, implied volatility, and open interest in the actual fill logic.

Question

The right question is not "where do I get old option chains?" The right question is: which data objects are required before a historical options backtest deserves to be called causal?

For serious work, the answer is broader than many traders expect. You need contract discovery that respects historical availability. You need quote or trade data near the actual decision window. You need a pricing rule that says what happens when the midpoint is missing or the spread is wide. You also need to know when the option data API stops and the event data begins. CuteMarkets, for example, can supply contracts, chain snapshots, quotes, trades, and aggregates, but an earnings calendar still has to come from outside the options feed.

Method: What Historical Options Backtesting Actually Needs

I think of historical options backtesting as a sequence of four questions.

First, which contracts existed on the day you claim to be studying? This matters more than it sounds. If your backtest accidentally uses a contract that did not exist on the pre-event date, the whole event study is contaminated. The CuteMarkets examples in the repo solve this with the contracts endpoint and historical as_of, which lets the researcher discover historically correct contracts instead of looking backward from today's chain.

Second, what was the execution surface near the entry time? A chain snapshot can tell you the broad menu of strikes and expiries, but it often does not answer the tradeability question. For that, you need quotes and trades. The CuteMarkets teaser is direct on this point: quotes answer whether the spread is narrow enough to trust, and trades answer whether the market is actually printing near the side of the spread you think you can access.
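A minimal sketch of that tradeability check, assuming quotes have already been fetched and parsed into dictionaries with a timezone-aware `timestamp` plus `bid_price` and `ask_price`. The function names, the `timestamp` field, and the staleness threshold are illustrative assumptions, not part of the CuteMarkets client.

```python
from datetime import datetime, timedelta, timezone


def latest_quote_at(quotes, decision_time, max_age):
    """Return the newest quote at or before decision_time, or None.

    None is returned when no quote precedes the decision or when the
    newest one is older than max_age (a stale quote).
    """
    best = None
    for q in quotes:
        ts = q["timestamp"]
        # Quotes after the decision time would be look-ahead, so skip them.
        if ts <= decision_time and (best is None or ts > best["timestamp"]):
            best = q
    if best is None or decision_time - best["timestamp"] > max_age:
        return None
    return best


def trade_near_side(trade_price, bid, ask):
    """Label a trade print as nearer the bid, nearer the ask, or at the mid."""
    mid = (bid + ask) / 2
    if trade_price > mid:
        return "ask"
    if trade_price < mid:
        return "bid"
    return "mid"
```

If `latest_quote_at` returns None, the modeled entry has no displayed market to fill against, which is a reason to skip the trade rather than fall back to a last price.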

Third, how are you estimating slippage? The historical data layer should not let you hide this choice. If you assume the midpoint without checking spread width, you are sneaking a fill model into the study without admitting it. In this repo, the broader research process keeps rediscovering that the monetization layer, where a stock-level signal is turned into an option position and a fill price, is where many strategies become fragile. The same logic applies here: a good stock-level idea can be destroyed by an unrealistic option fill assumption.
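One way to make the fill model explicit is to write it as a small function that either returns a price or a rejection reason. This is a sketch under stated assumptions: the 15% spread cap and the rule of crossing half of the half-spread are placeholder parameters chosen for illustration, not values taken from the repo.

```python
def modeled_fill(side, bid, ask, max_spread_pct=0.15, cross_fraction=0.5):
    """Return (price, reason); price is None when the fill is rejected.

    side: "buy" or "sell".
    cross_fraction: 0.0 fills at the midpoint, 1.0 fills at the far touch
    (ask for buys, bid for sells).
    """
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    if bid is None or ask is None or ask <= 0 or bid < 0 or ask < bid:
        return None, "invalid_quote"
    if side == "sell" and bid <= 0:
        # No-bid exit: there is nobody displayed to sell to.
        return None, "no_bid"
    mid = (bid + ask) / 2
    spread_pct = (ask - bid) / mid
    if spread_pct > max_spread_pct:
        return None, "spread_too_wide"
    half_spread = (ask - bid) / 2
    if side == "buy":
        return mid + cross_fraction * half_spread, "filled"
    return mid - cross_fraction * half_spread, "filled"
```

With a 1.00 bid and 1.10 ask, a buy is modeled at 1.075 and a sell at 1.025 under the defaults. Setting `cross_fraction=0.0` recovers the plain midpoint assumption, so the optimism becomes a visible parameter rather than a hidden default.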

Fourth, are you reconstructing the event window correctly? For earnings-style studies, the data stack has to combine three things that live in different places: an external earnings calendar, the post-event expiry surface, and the historically correct contract and quote path around that event. That is why historical options backtesting is usually a data-integration problem before it becomes a strategy problem.
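A small sketch of the alignment step, assuming the event date comes from an external earnings calendar and the expiration dates have already been parsed from the expirations endpoint. The function name and the `include_event_day` flag are hypothetical; the flag exists because an expiry on the event day only captures the reaction when the announcement lands before that session trades.

```python
from datetime import date


def first_expiration_after(event_date, expirations, include_event_day=False):
    """Return the earliest expiration that still covers the event, or None.

    event_date: date of the announcement, sourced outside the options feed.
    expirations: iterable of datetime.date objects.
    include_event_day: set True only if the announcement precedes the
    trading session on event_date (an assumption the caller must verify).
    """
    if include_event_day:
        candidates = [d for d in expirations if d >= event_date]
    else:
        candidates = [d for d in expirations if d > event_date]
    return min(candidates) if candidates else None
```

The chosen expiration can then be passed, together with a pre-event `as_of` date, to historical contract discovery so that the contract, the event, and the quote path all refer to the same point in time.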

Evidence / Results

The repo already contains a concise API map for this workflow in Earnings Options Plays Around Earnings, Condensed. The key CuteMarkets endpoints used there are:

| Task | Endpoint |
| --- | --- |
| expirations | GET /v1/tickers/expirations/{ticker} |
| current chain snapshot | GET /v1/options/chain/{ticker} |
| historical contracts | GET /v1/options/contracts?as_of=... |
| historical trades | GET /v1/options/trades/{options_ticker} |
| historical quotes | GET /v1/options/quotes/{options_ticker} |
| aggregates | GET /v1/options/aggs/{ticker}/... |
| open/close | GET /v1/options/open-close/{ticker}/{date} |

That endpoint map corresponds to different research questions. Expirations and chain snapshots tell you which post-event structures are even possible. Contracts with as_of tell you whether the instrument existed at the time. Quotes and trades tell you whether the structure was tradeable. Aggregates and open/close data support event-study PnL reconstruction.

The strongest negative evidence in the repo is the March audit's contract-selection repair. The audit recorded that "different entries on the same day could silently reuse the wrong strike" because the contract selection cache ignored the underlying-price bucket used for moneyness ranking. This is exactly the kind of bug that chain-only research will miss. The signal may be the same, the date may be the same, and the instrument under test can still be wrong.
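The shape of that bug can be illustrated with a toy cache. This is not the repo's actual code or its actual fix; the class, the key layout, and the bucket width are assumptions made only to show why the underlying-price bucket has to be part of the cache key when selection depends on moneyness.

```python
import math


def price_bucket(underlying_price, bucket_width=1.0):
    """Map an underlying price to an integer bucket index."""
    return math.floor(underlying_price / bucket_width)


class ContractSelectionCache:
    """Caches contract selections keyed by everything the selection depends on."""

    def __init__(self, bucket_width=1.0):
        # bucket_width should be small relative to the strike spacing,
        # otherwise two prices in one bucket could deserve different strikes.
        self.bucket_width = bucket_width
        self._store = {}

    def get_or_select(self, underlying, trade_date, expiration, right,
                      underlying_price, selector):
        # The buggy variant would omit the bucket, so a second entry on the
        # same day would silently reuse the first entry's strike.
        key = (underlying, trade_date, expiration, right,
               price_bucket(underlying_price, self.bucket_width))
        if key not in self._store:
            self._store[key] = selector(underlying_price)
        return self._store[key]
```

Two entries on the same day at underlying prices of 100.2 and 104.7 now produce separate cache keys, so a nearest-strike selector is called twice and can return different strikes.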

Example Code

The CuteMarkets teaser already includes a larger client, but the minimum useful example for historical options backtesting is the combination of historical contract discovery and a simple quote-quality check:

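from datetime import date, datetime, time, timezone
from urllib.parse import quote
from zoneinfo import ZoneInfo

import requests

BASE_URL = "https://api.cutemarkets.com/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}
NEW_YORK = ZoneInfo("America/New_York")


def get(path: str, params: dict | None = None) -> dict:
    """GET a JSON payload from the API, raising on HTTP errors."""
    response = requests.get(
        f"{BASE_URL}{path}",
        headers=HEADERS,
        params=params,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def list_contracts_as_of(underlying: str, as_of: str, expiration_date: str) -> list[dict]:
    """List contracts that existed on as_of (YYYY-MM-DD), not today's chain."""
    payload = get(
        "/options/contracts/",
        params={
            "underlying_ticker": underlying,
            "as_of": as_of,
            "expiration_date": expiration_date,
            "limit": 1000,
        },
    )
    # Only the first page is read; larger result sets need pagination.
    return payload["results"]


def session_bounds_utc(day: str) -> tuple[str, str]:
    """Regular US session (09:30-16:00 New York time) as UTC ISO-8601 strings.

    A literal "09:30:00Z" would mean 09:30 UTC, which is hours before the open.
    """
    d = date.fromisoformat(day)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    open_utc = datetime.combine(d, time(9, 30), tzinfo=NEW_YORK).astimezone(timezone.utc)
    close_utc = datetime.combine(d, time(16, 0), tzinfo=NEW_YORK).astimezone(timezone.utc)
    return open_utc.strftime(fmt), close_utc.strftime(fmt)


def get_quotes(option_ticker: str, day: str) -> list[dict]:
    """Fetch quotes for one contract during the regular session of day."""
    encoded = quote(option_ticker, safe="")
    start, end = session_bounds_utc(day)
    payload = get(
        f"/options/quotes/{encoded}/",
        params={"timestamp.gte": start, "timestamp.lt": end},
    )
    return payload["results"]


def mean_relative_spread(quotes: list[dict]) -> float:
    """Average (ask - bid) / midpoint over usable quotes; NaN if none are usable."""
    spreads = []
    for q in quotes:
        bid = q.get("bid_price")
        ask = q.get("ask_price")
        # Skip missing, non-positive, or crossed quotes instead of averaging them in.
        if bid is None or ask is None or ask <= 0 or bid < 0 or bid > ask:
            continue
        mid = (bid + ask) / 2.0
        if mid > 0:
            spreads.append((ask - bid) / mid)
    return sum(spreads) / len(spreads) if spreads else float("nan")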

This is not a complete backtester. It is enough to make one important point. Historical options backtesting means finding the contract that existed then and deciding whether the quotes and spreads make the trade believable.
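One gap worth closing before relying on the helpers above: they read only the first page of results. A hedged sketch of page collection follows. The `next_url` field name is an assumption about the response envelope, and `fetch_page` is a hypothetical callable (for example, a thin wrapper around `requests.get(...).json()`) injected so the loop can be tested without network access.

```python
def collect_pages(fetch_page, first_url, max_pages=1000):
    """Follow continuation links until exhausted and return all results.

    fetch_page: callable taking a URL and returning a dict payload.
    The payload is assumed to hold rows under "results" and the
    continuation link under "next_url" (None or absent on the last page).
    """
    results = []
    url = first_url
    pages = 0
    while url is not None:
        if pages >= max_pages:
            # Fail loudly rather than return a silently truncated window.
            raise RuntimeError("pagination did not terminate within max_pages")
        payload = fetch_page(url)
        results.extend(payload.get("results", []))
        url = payload.get("next_url")
        pages += 1
    return results
```

A truncated quote window is a quiet source of bias, because the missing rows are usually the later ones in the session.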

What Worked

What worked in this repo was the growing insistence on data-layer causality. The CuteMarkets examples make that constructive path visible. They show how to estimate an implied move from the post-earnings chain, inspect quotes and trades for execution quality, and reconstruct historical structures with as_of contracts. That is a serious workflow because it tries to match the data object to the research question instead of recycling one snapshot endpoint for everything.

The broader framework audit supports the same conclusion from the opposite side. Once the contract-selection path was corrected and the portfolio metrics were computed properly, the repo could distinguish between strategies that still had life and strategies that had mostly been flattered by the machinery. That is what you want from historical options backtesting. You want the data layer to reduce false positives, even if that means a smaller opportunity set.

What Failed

What failed was the belief that a historical chain snapshot is enough. It is not enough for entry timing, because it does not tell you what the tradeable spread looked like in the actual decision window. It is not enough for contract validity, because it does not tell you what existed on the historical date unless your provider explicitly supports time-correct discovery. It is not enough for slippage, because a midpoint assumption can hide an enormous amount of execution optimism.
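To put a number on that optimism, consider a hypothetical quote with a 1.00 bid and a 1.10 ask, and assume the same quote at entry and exit. The figures are illustrative, not taken from the repo's data.

```latex
\begin{aligned}
m &= \frac{a + b}{2} = \frac{1.10 + 1.00}{2} = 1.05 \\
s &= a - b = 1.10 - 1.00 = 0.10 \\
\text{round-trip cost at midpoint} &= 0 \\
\text{round-trip cost crossing the spread} &= s = 0.10 \\
\frac{s}{m} &= \frac{0.10}{1.05} \approx 9.5\%
\end{aligned}
```

On a standard 100-share contract that is 10 dollars per round trip, or roughly 9.5% of the premium, which the midpoint model books as zero.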

Another recurring failure mode is to treat event studies as if the options feed contains the whole event definition. The CuteMarkets note is explicit that the options API provides the market-data surface, not the earnings timestamp itself. If a researcher forgets that boundary, the study can be internally neat and externally wrong. The contract reconstruction may be correct, while the event alignment is not.

The same problem appears in intraday research when the data model is too thin. The repo's March audit is a useful reminder that even a correctly timestamped signal can become misleading if the contract selected is wrong or if combined risk is estimated from the wrong object. Historical options backtesting has to be causal all the way down. Partial realism is still a weak foundation.

Takeaway

Historical options backtesting requires more than old chains. It requires historically correct contracts, credible quote and trade context, an explicit slippage rule, and event alignment that comes from the right source. The CuteMarkets examples in this repo provide a good reference implementation for the data stack, while the March framework audit provides the negative proof of what breaks when the data layer is too casual.

If you want the bigger framework question, What Is Realistic Options Backtesting? A Practical Guide for Serious Traders explains why realism has to start at the simulator level. If you want to see what happens after the backtest leaves research mode, Backtest vs Paper Trading: Why Good Trading Results Break in Live Markets covers the next failure surface.

How the terminology applies

In this article's execution-realism workflow, Quote window, Last price risk, Spread percent, No-bid exit, Side-specific fill, and Stale quote should be treated as operational state rather than glossary decoration. That framing keeps the fill model honest because options execution is controlled by displayed markets, timing, liquidity, and side-specific assumptions.

A developer implementing this case study should persist Liquidity filter, Trade print validation, Condition-aware fill, Options data API, OPRA-originating data, and OCC option symbol beside each result, instead of leaving those words in a term card. Persisting them also prevents the backtest from treating last price, midpoint, or a bar close as interchangeable evidence for a fill.

The review artifact for this article becomes more useful when Bid/ask spread, Midpoint, Quote/trade condition, Quote vs trade semantics, REST snapshot, and WebSocket stream appear in the same body of evidence as the selected rows. When a modeled order is accepted, these fields should explain why the fill was plausible; when it is skipped, they should explain why it was rejected.
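A sketch of what storing that evidence beside each modeled order could look like. The record fields, reason labels, and thresholds are illustrative assumptions rather than the repo's schema; the quote is assumed to be a dictionary with a timezone-aware `timestamp`, `bid_price`, and `ask_price`.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class FillRecord:
    occ_symbol: str
    side: str  # "buy" or "sell"
    decision_time: datetime
    quote_time: Optional[datetime]
    bid: Optional[float]
    ask: Optional[float]
    quote_age_s: Optional[float]
    spread_pct: Optional[float]
    accepted: bool
    reason: str


def evaluate_fill(occ_symbol, side, decision_time, quote,
                  max_age_s=60.0, max_spread_pct=0.15):
    """Build a FillRecord that explains why an order was accepted or rejected."""
    if quote is None:
        return FillRecord(occ_symbol, side, decision_time,
                          None, None, None, None, None, False, "no_quote")
    bid, ask, ts = quote["bid_price"], quote["ask_price"], quote["timestamp"]
    age = (decision_time - ts).total_seconds()
    spread_pct = None
    if bid is not None and ask is not None and ask > 0 and 0 <= bid <= ask:
        spread_pct = (ask - bid) / ((bid + ask) / 2)
    if age < 0:
        reason = "quote_after_decision"  # would be look-ahead
    elif age > max_age_s:
        reason = "stale_quote"
    elif spread_pct is None:
        reason = "invalid_quote"
    elif side == "sell" and bid <= 0:
        reason = "no_bid"
    elif spread_pct > max_spread_pct:
        reason = "spread_too_wide"
    else:
        reason = "ok"
    return FillRecord(occ_symbol, side, decision_time, ts, bid, ask,
                      age, spread_pct, reason == "ok", reason)
```

Calling `asdict(record)` yields a flat dictionary that can be written next to the trade row, so accepted and rejected orders carry the same fields into review.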

In production notes for this execution-realism workflow, Entitlement gate, Quote freshness, Timestamp semantics, Pagination cursor, Response envelope, and Rate-limit budget define the checks that decide whether the workflow is reproducible. The result is an execution model that can be tightened without rewriting the strategy narrative.

The practical acceptance test is simple: another developer should be able to read the body, identify the exact inputs, reproduce the request sequence, and explain the accepted and rejected rows without relying on the bottom terminology grid. If a phrase appears in the page vocabulary, it should correspond to a stored field, a validation check, a replay step, or an implementation decision in the execution-realism workflow.

This is also the reason the article should not measure success only by the final chart, table, or headline metric. The better standard is whether the data path, timing model, entitlement state, and evidence trail survive review. When those pieces are written directly into the body, the terminology becomes part of the workflow readers can implement.

Terminology

Market-data terms used in this article

These terms keep the article connected to the CuteMarkets knowledge base and to the exact API workflow behind the research.

Quote window

A bounded timestamp range used to inspect executable bid/ask markets around a modeled decision.

Last price risk

The danger of filling a strategy at a stale or isolated trade print rather than the available market.

Spread percent

Bid/ask width divided by midpoint, used as a liquidity and execution-quality filter.

No-bid exit

A contract state where the exit side has no usable bid and the backtest should reject or heavily penalize the fill.

Side-specific fill

A policy that treats buys, sells, entries, exits, stops, and profit targets differently instead of using one price rule.

Stale quote

A bid/ask record too old for the modeled decision time, especially around halts, events, reconnects, or illiquid contracts.

Liquidity filter

A pre-trade rule based on bid, ask, spread percent, volume, OI, quote age, and minimum premium.

Trade print validation

The check that a last sale supports context without replacing the executable bid/ask market.

Condition-aware fill

A fill rule that preserves quote and trade conditions before accepting, rejecting, or labeling a market-data row.

Options data API

The product surface for chains, contracts, quotes, trades, aggregates, Greeks, IV, open interest, and expirations.

OPRA-originating data

The U.S. listed-options source context behind quotes, trades, exchange participation, and consolidated option-market records.

OCC option symbol

The exact option contract identifier that preserves root, expiration, call or put side, and strike.

Bid/ask spread

The execution interval between bid and ask that determines whether a contract is realistically tradable.

Midpoint

The computed center between bid and ask, useful as a reference price but not proof that an order would fill.

Quote/trade condition

The condition-code, exchange, correction, sequence, and timestamp context that explains how a quote or trade row can be used.

Quote vs trade semantics

The distinction between executable bid/ask markets, printed transactions, and bar-level summaries.

REST snapshot

A reproducible request for current or historical market state, used for initialization, backfills, and audit logs.

WebSocket stream

A persistent live connection that needs subscription topics, reconnect tracking, freshness labels, and REST repair paths.

Entitlement gate

The product, plan, quote, live, delayed, historical, or commercial-use boundary checked before data is shown.

Quote freshness

The age, timestamp, and live or delayed state of a bid/ask record before it is used in a scanner, backtest, or UI.

Timestamp semantics

The exchange, provider, ingestion, session, and application time context attached to a market-data record.

Pagination cursor

The continuation token or next URL that keeps large chains, trades, quotes, and historical windows complete.

Response envelope

The shared status, request id, results, pagination, and error shape that keeps API wrappers and logs consistent.

Rate-limit budget

The request capacity that shapes polling cadence, scanner breadth, retries, backfills, and degraded-mode behavior.

Written by

Daniel Ratke

Research & Engineering

Daniel covers the deeper research notes: options backtesting, execution realism, robustness testing, data engineering, and strategy validation.