# Market Data Ingestion and Caching

Market-data ingestion is the process that turns API responses, stream events, retries, pagination, backfills, and entitlement labels into durable application state. Caching is the storage policy that keeps those objects reusable without hiding freshness, plan, or data-quality assumptions.

This guide is for developers building scanners, dashboards, research notebooks, backtesting pipelines, paper-trading bots, and provider evaluation harnesses with CuteMarkets. It connects [Market Data Access Methods](/docs/market-data-access-methods), [Rate Limits](/docs/rate-limits), [OpenAPI](/docs/openapi), [Historical Options Replay Runbook](/docs/historical-options-replay-runbook), and [Stock and Options Data Join Workflow](/docs/stock-options-data-join-workflow).

## Ingestion vocabulary

| Term | Definition | Why it matters |
| --- | --- | --- |
| Source request | The endpoint, parameters, headers, product scope, and timestamp behind a response | Lets another developer reproduce a result |
| Response envelope | Shared API response shape containing status, `request_id`, results, and pagination state | Keeps wrappers predictable |
| Cursor | The continuation reference for the next page of results | Prevents partial-chain and partial-window errors |
| Backfill window | A historical interval requested after a stream gap, deploy, retry, or cache miss | Repairs live systems and paper bots |
| Cache key | Stable identifier for the stored result | Prevents mixing tickers, plans, dates, products, or adjusted states |
| Freshness label | Live, delayed, stale, historical, cached, backfilled, or unavailable | Keeps user-facing claims honest |
| Replay manifest | A saved record of requests, selected contracts, quotes, fills, rejects, and metrics | Makes research auditable |
| Reject reason | The reason a candidate was skipped or a fill was blocked | Separates signal failure from data or execution failure |

## Start with a workflow graph

Do not ingest "market data" as one undifferentiated table. Model the workflow.

An options scanner graph might look like this:

1. Load underlying watchlist from [Stocks Data API](/stocks-data-api) or app config.
2. Resolve ticker reference and active status with [Stock Reference](/docs/stock-reference).
3. Load listed expirations with [Expirations](/docs/expirations).
4. Fetch chain pages with [Option Chain](/docs/option-chain).
5. Filter by DTE, moneyness, delta, IV, volume, open interest, spread percent, and quote age.
6. Store selected OCC option symbols.
7. Fetch contract snapshots, quotes, trades, or aggregate bars.
8. Write scanner artifacts with score inputs and reject reasons.

A backtest graph is different:

1. Read signal timestamp from stock bars or strategy logic.
2. Discover historical contracts with `as_of`.
3. Select expiration, strike, side, DTE, and moneyness.
4. Request a quote window around entry and exit.
5. Apply quote-aware fill rules from [Backtesting Execution Realism](/docs/backtesting-execution-realism).
6. Store fill, reject, quote, trade, and aggregate artifacts.
7. Roll metrics through [Backtesting Robustness](/docs/backtesting-robustness).

The same API can support both workflows, but the ingestion graph and cache policy are different.
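
Steps 5 and 8 of the scanner graph (filter, then record reject reasons) can be sketched as below. This is a minimal illustration, not the CuteMarkets schema: the `ChainRow` fields, the threshold defaults, and the reject-reason strings are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChainRow:
    # Hypothetical, simplified chain row; not the actual API response shape.
    symbol: str
    dte: int
    bid: float
    ask: float
    volume: int
    open_interest: int
    quote_age_s: float


def reject_reason(row: ChainRow, *, max_dte: int = 45, min_volume: int = 100,
                  min_open_interest: int = 500, max_spread_pct: float = 0.10,
                  max_quote_age_s: float = 15.0) -> Optional[str]:
    """Return the first failed check as a reject reason, or None if the row passes."""
    if row.dte > max_dte:
        return "dte_out_of_range"
    if row.bid <= 0:
        return "no_bid"
    mid = (row.bid + row.ask) / 2
    if (row.ask - row.bid) / mid > max_spread_pct:
        return "wide_spread"
    if row.quote_age_s > max_quote_age_s:
        return "stale_quote"
    if row.volume < min_volume:
        return "low_volume"
    if row.open_interest < min_open_interest:
        return "low_open_interest"
    return None


def scan(rows):
    """Split rows into selected symbols and a symbol -> reject reason map."""
    selected, rejected = [], {}
    for row in rows:
        reason = reject_reason(row)
        if reason is None:
            selected.append(row.symbol)
        else:
            rejected[row.symbol] = reason
    return selected, rejected
```

Keeping the reject map beside the selected list means a later review can tell whether a contract was skipped for a signal reason or for a data-quality reason.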

## Cache key design

Cache keys need enough detail to prevent accidental reuse across incompatible contexts. A weak cache key like `SPY-chain` can corrupt scanner and backtest behavior because it ignores expiration, date, plan, pagination, and schema shape.

Use structured cache keys:

```json
{
  "provider": "cutemarkets",
  "product": "options",
  "endpoint": "option_chain",
  "underlying": "SPY",
  "expiration_date": "2026-06-19",
  "as_of": null,
  "plan": "expert",
  "freshness": "live",
  "params": {
    "limit": 100,
    "cursor": "page_2"
  },
  "schema_version": "2026-06-04"
}
```

For stock aggregates, add ticker, adjusted flag, timespan, multiplier, start date, end date, indicator window, and page cursor. For quotes, add exact contract or stock ticker, timestamp bounds, side-specific fill policy, and quote-age threshold. For paper trading, add account id, strategy profile id, and run id so [Paper Trading Bot Operations](/docs/paper-trading-bot-operations) can compare paper decisions with the backtest artifact.
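
One way to turn a structured key like the JSON above into a storage identifier is to serialize it canonically and hash it. The sketch below is an assumption about how an application might do this, not a CuteMarkets API; the `cache_key` function and the `provider:endpoint:digest` layout are hypothetical.

```python
import hashlib
import json


def cache_key(fields: dict) -> str:
    """Serialize key fields canonically and hash them into a stable identifier.

    sort_keys makes the result independent of dict insertion order,
    including inside nested objects such as "params".
    """
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # A readable prefix helps debugging; the digest carries the full detail.
    return f"{fields['provider']}:{fields['endpoint']}:{digest[:16]}"


key_fields = {
    "provider": "cutemarkets",
    "product": "options",
    "endpoint": "option_chain",
    "underlying": "SPY",
    "expiration_date": "2026-06-19",
    "as_of": None,
    "plan": "expert",
    "freshness": "live",
    "params": {"limit": 100, "cursor": "page_2"},
    "schema_version": "2026-06-04",
}
print(cache_key(key_fields))
```

Because every field participates in the hash, changing the plan, cursor, or schema version yields a different key, which is what prevents reuse across incompatible contexts.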

## Pagination and completeness

Large chains, dense quote windows, and multi-day trade requests often need pagination. Treat incomplete pagination as a data-quality state, not a small implementation detail.

Store these fields:

- first request URL
- page count
- `next_url` or cursor sequence
- total rows collected where available
- stop reason
- rate-limit headers
- retry count
- final completeness state

If a scanner ranks only the first page of a chain, its top rows can be wrong because the rest of the expiration was never loaded and better candidates were never scored. If a historical quote window stops early, a backtest can accept a fill without seeing the full bid/ask context. The [Options Chain Scanner Architecture](/docs/options-chain-scanner-architecture), [Options Flow False Positives](/docs/options-flow-false-positives), and [Backtesting Data Quality Checklist](/docs/backtesting-data-quality-checklist) pages use the same completeness language.
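
A pagination loop that records the fields listed above might look like the following sketch. It assumes each page is a dict with `results` and `next_url` keys and takes the HTTP call as an injected `fetch_page` callable; both the callable and the `PaginationRecord` type are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PaginationRecord:
    first_url: str
    page_count: int = 0
    urls: list = field(default_factory=list)   # cursor sequence actually followed
    rows: list = field(default_factory=list)
    stop_reason: str = ""
    complete: bool = False


def collect_pages(first_url: str, fetch_page: Callable[[str], dict],
                  max_pages: int = 50) -> PaginationRecord:
    """Follow next_url links and record why collection stopped."""
    record = PaginationRecord(first_url=first_url)
    url: Optional[str] = first_url
    while url is not None:
        if record.page_count >= max_pages:
            record.stop_reason = "max_pages_reached"
            return record  # complete stays False
        try:
            page = fetch_page(url)
        except Exception as exc:
            record.stop_reason = f"error:{type(exc).__name__}"
            return record  # complete stays False
        record.urls.append(url)
        record.page_count += 1
        record.rows.extend(page.get("results", []))
        url = page.get("next_url") or None  # missing or empty means no more pages
    record.stop_reason = "exhausted"
    record.complete = True
    return record
```

Downstream code can then refuse to rank or fill from any record where `complete` is false, instead of silently treating a partial result as the whole chain.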

## Backfills after live gaps

Streams and browsers disconnect. Servers deploy. Laptops sleep. A production ingestion layer needs a backfill policy before the first live alert goes out.

A simple policy:

1. Store last event timestamp per topic.
2. On reconnect, compute the missing interval.
3. Request historical REST data for quotes, trades, or bars where supported.
4. Mark the repaired interval as backfilled.
5. Mark unrecoverable gaps explicitly.
6. Suppress alerts that depend on unrecoverable windows.

For options, a backfill can use [Quotes](/docs/quotes), [Trades](/docs/trades), and [Aggregates](/docs/aggregates) around selected OCC symbols. For stocks, use [Stock Trades and Quotes](/docs/stock-trades-quotes), [Stock Aggregates and Indicators](/docs/stock-aggregates-indicators), and [Real-Time Stock Data API](/real-time-stock-data-api). For system design, pair this with [Real-Time Options System Design](/docs/real-time-options-system-design) and [REST vs WebSocket Market Data API Guide](/blog/rest-vs-websocket-market-data-api-guide).
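
Steps 1, 2, 5, and 6 of the policy can be sketched as a small gap calculation. The `tolerance` and `max_backfill` thresholds are assumptions standing in for whatever the application decides is recoverable; in practice recoverability depends on which historical REST data is available for the topic. Topic names such as `quotes:SPY` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Gap:
    topic: str
    start: datetime
    end: datetime
    recoverable: bool


def missing_interval(topic: str, last_event_at: datetime, reconnected_at: datetime,
                     *, tolerance: timedelta = timedelta(seconds=5),
                     max_backfill: timedelta = timedelta(hours=6)) -> Optional[Gap]:
    """Return the gap to repair after a reconnect, or None if nothing was missed."""
    elapsed = reconnected_at - last_event_at
    if elapsed <= tolerance:
        return None
    # Assumed policy: gaps longer than max_backfill are marked unrecoverable.
    return Gap(topic, last_event_at, reconnected_at, recoverable=elapsed <= max_backfill)


def alert_allowed(gaps, needed_from: datetime, needed_to: datetime) -> bool:
    """Suppress alerts whose lookback window overlaps an unrecoverable gap."""
    return not any(
        (not g.recoverable) and g.start < needed_to and needed_from < g.end
        for g in gaps
    )


last_seen = datetime(2026, 6, 4, 14, 30, tzinfo=timezone.utc)
gap = missing_interval("quotes:SPY", last_seen, last_seen + timedelta(minutes=10))
```

A recoverable gap becomes the time bounds of a historical request; once that request succeeds, the stored rows carry the `backfilled` freshness label.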

## Request budget and rate limits

Rate limits are part of the ingestion model. A scanner with 500 underlyings and 12 expirations per underlying needs at least 6,000 chain requests per run (500 × 12) before full chain pagination, quote drill-downs, and retries are counted. Estimate the request budget before launching.

Track:

- requests per workflow run
- requests per ticker
- requests per chain page
- quote-window requests per selected contract
- retry and backoff count
- WebSocket reconnect count
- rate-limit headers
- degraded-mode behavior

The [Options Data API Cost Calculator](/options-data-api-cost-calculator) is useful for budget planning, while [Rate Limits](/docs/rate-limits) explains request-budget terminology. The ingestion log can link to [Pricing](/pricing) because plan, quote access, and product scope affect what the workflow can request.
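
A rough per-run estimate can be computed before launch. The sketch below is back-of-the-envelope arithmetic under stated assumptions: one expirations request per underlying, one chain request per page per expiration, and a flat retry percentage. The contract counts, page limit, and retry rate are illustrative inputs, not measured values.

```python
import math


def estimate_requests(underlyings: int, expirations_per_underlying: int,
                      contracts_per_expiration: int, page_limit: int,
                      selected_contracts: int, quote_requests_per_contract: int,
                      retry_pct: int) -> dict:
    """Estimate requests for one scanner run; every input is an assumption."""
    pages_per_chain = math.ceil(contracts_per_expiration / page_limit)
    expiration_requests = underlyings
    chain_requests = underlyings * expirations_per_underlying * pages_per_chain
    quote_requests = selected_contracts * quote_requests_per_contract
    base = expiration_requests + chain_requests + quote_requests
    retries = (base * retry_pct + 99) // 100  # integer ceiling of base * pct / 100
    return {
        "expiration_requests": expiration_requests,
        "chain_requests": chain_requests,
        "quote_requests": quote_requests,
        "retries": retries,
        "total": base + retries,
    }


budget = estimate_requests(
    underlyings=500, expirations_per_underlying=12,
    contracts_per_expiration=250, page_limit=100,
    selected_contracts=200, quote_requests_per_contract=2,
    retry_pct=2,
)
print(budget)
```

With these inputs, each chain needs 3 pages, so chain pagination alone accounts for 18,000 of the 19,278 estimated requests. That is usually the first place to look when a run exceeds its budget.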

## Artifact storage

Backtests and scanners need artifacts that can be reviewed later. A useful artifact includes:

| Artifact field | Examples |
| --- | --- |
| Identity | provider, product, endpoint, request id, run id |
| Instrument | stock ticker, OCC option symbol, expiration, strike, side, root |
| Timing | decision timestamp, request timestamp, quote window, trade window, bar timestamp |
| Market state | bid, ask, midpoint, spread percent, quote age, last trade, volume, open interest, IV, Greeks |
| Access state | live, delayed, historical, cached, backfilled, unavailable |
| Data quality | pagination complete, missing rows, stale quote, no bid, wide spread, plan gate |
| Decision | selected, rejected, filled, skipped, alerted, suppressed |
| Links | docs path, product page, source request, related blog or runbook |

For unusual activity, store the score inputs described in [Unusual Options Activity Scanner Model](/docs/unusual-options-activity-scanner-model), [Options Volume and Open Interest](/docs/options-volume-open-interest), and [Options Flow False Positives](/docs/options-flow-false-positives). For event replay, use [Historical Options Replay Runbook](/docs/historical-options-replay-runbook). For stock-plus-options strategies, use [Stock and Options Data Joins for Strategy Research](/blog/stock-options-data-join-workflow-strategies).
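
As a sketch of how the artifact table could map to a record type, the dataclass below covers a subset of the fields and serializes to one JSON line per decision. The class name, field names, and validation rules are assumptions for illustration; only the access-state and decision vocabularies come from the table above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

ACCESS_STATES = {"live", "delayed", "historical", "cached", "backfilled", "unavailable"}
DECISIONS = {"selected", "rejected", "filled", "skipped", "alerted", "suppressed"}


@dataclass(frozen=True)
class ScannerArtifact:
    # Identity
    provider: str
    endpoint: str
    request_id: str
    run_id: str
    # Instrument and timing
    symbol: str
    decision_ts: str  # ISO 8601 timestamp
    # Market state (None when the quote was missing)
    bid: Optional[float]
    ask: Optional[float]
    quote_age_s: Optional[float]
    # Access state, data quality, decision
    access_state: str
    pagination_complete: bool
    decision: str
    reject_reason: Optional[str] = None

    def __post_init__(self):
        if self.access_state not in ACCESS_STATES:
            raise ValueError(f"unknown access state: {self.access_state}")
        if self.decision not in DECISIONS:
            raise ValueError(f"unknown decision: {self.decision}")
        if self.decision == "rejected" and not self.reject_reason:
            raise ValueError("rejected artifacts need a reject reason")

    def to_json_line(self) -> str:
        """Serialize to a single line suitable for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)
```

Validating at construction time keeps artifacts with an unknown state or a missing reject reason out of the log in the first place.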

## Freshness and expiration policy

Cache expiration depends on the object:

| Object | Cache behavior |
| --- | --- |
| Ticker reference | Cache longer, but refresh active status and corporate-action-sensitive fields |
| Listed expirations | Refresh at session start and before expiration-sensitive workflows |
| Current chain | Short TTL; mark live, delayed, or stale |
| Contract snapshot | Short TTL for live tools; archive for selected alert artifacts |
| Historical quote window | Immutable for most research, but preserve provider corrections policy |
| Historical aggregates | Cache by adjusted state, timespan, and date range |
| Open interest | Treat as session or date context rather than tick-by-tick state |
| WebSocket event | Store as append-only stream evidence or compressed state update |

Never let a cache hide a plan gate. If a quote was cached under Expert access, a user interface serving a Developer-plan viewer must not present it as a live quote that the viewer is entitled to. Store the plan beside the cached value and check it on every read.
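
A cache read that applies both the expiration policy and the plan gate could look like the sketch below. The TTL values are placeholders, and the `PLAN_RANK` ordering is a simplifying assumption: real entitlements may be per field or per product rather than a single ranking.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

# Placeholder TTLs in seconds; None means immutable for this sketch.
TTL_SECONDS = {
    "ticker_reference": 86_400,
    "listed_expirations": 21_600,
    "current_chain": 15,
    "contract_snapshot": 5,
    "historical_quote_window": None,
}

# Assumed ordering of plans, for illustration only.
PLAN_RANK = {"developer": 1, "expert": 2}


@dataclass(frozen=True)
class CacheEntry:
    object_type: str
    stored_at: float        # epoch seconds
    cached_under_plan: str
    freshness: str          # label recorded when the value was stored
    value: Any


def read(entry: CacheEntry, now: float, viewer_plan: str) -> Tuple[Optional[Any], str]:
    """Return (value, freshness label), checking the plan gate before the TTL."""
    if PLAN_RANK[viewer_plan] < PLAN_RANK[entry.cached_under_plan]:
        return None, "unavailable"   # the cache must not bypass entitlement
    ttl = TTL_SECONDS[entry.object_type]
    if ttl is not None and now - entry.stored_at > ttl:
        return entry.value, "stale"
    return entry.value, entry.freshness
```

Checking the plan before the TTL means a fresh value cached under a higher plan is still withheld from a viewer who is not entitled to it.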

## Provider migration harness

A strong provider evaluation uses an ingestion harness rather than a spreadsheet alone. Pick a small but demanding set of test cases:

- one liquid ETF such as SPY or QQQ
- one single-name equity with weekly expirations
- one thin contract
- one event day
- one historical replay window
- one live or delayed chain scanner run
- one stock-plus-options join

Run the same workflow through [Options Data Provider Evaluation](/docs/options-data-provider-evaluation), [Stock Data Provider Evaluation](/docs/stock-data-provider-evaluation), [Market Data Licensing and Commercial Use](/docs/market-data-licensing-commercial-use), and [Best Options Data APIs](/best-options-data-apis). Score source clarity, object coverage, pagination, missing data, access method fit, entitlement transparency, caching behavior, and support path.

## Implementation checklist

- Define the workflow before the cache schema.
- Keep quotes, trades, aggregates, snapshots, and reference data in separate tables or clearly separated objects.
- Store source requests and response metadata.
- Treat pagination completeness as a required state.
- Use REST backfills after stream gaps.
- Store freshness labels beside cached values.
- Tie cache keys to product scope and plan state.
- Preserve reject reasons for missing quotes, stale quotes, no bid, wide spreads, incomplete chains, and plan gates.
- Link artifacts to the relevant docs, such as [Quotes](/docs/quotes), [Trades](/docs/trades), [Stock Trades and Quotes](/docs/stock-trades-quotes), and [Backtesting Data Model](/docs/backtesting-data-model).

The next step is to compare this ingestion model with your intended license and access pattern in [Market Data Licensing and Commercial Use](/docs/market-data-licensing-commercial-use), then test a live workflow through [Market Data API Due Diligence Checklist](/blog/market-data-api-due-diligence-checklist).
