3 changes: 2 additions & 1 deletion docs/docs.json
@@ -200,7 +200,8 @@
       "router/security",
       "router/security/tls",
       "router/security/config-validation-and-signing",
-      "router/security/hardening-guide"
+      "router/security/hardening-guide",
+      "router/security/cost-control"
     ]
   },
   {
302 changes: 302 additions & 0 deletions docs/router/security/cost-control.mdx
@@ -0,0 +1,302 @@
---
title: "Cost Analysis"
description: "Protect the API from expensive operations by estimating query complexity before execution using @cost and @listSize directives"
icon: scale-balanced
---

## Overview

Cost analysis prevents a single GraphQL request from consuming a disproportionate share of system resources and
slowing down other requests. A query requesting deeply nested lists can generate
thousands of resolver calls and overwhelm subgraphs, while a simple field lookup costs almost nothing.
Cost analysis assigns weights to fields and estimates query complexity *before* execution begins.

When cost analysis is enabled, the router calculates an estimated cost for each incoming operation based
on the fields requested, their configured weights and expected list sizes.
Operations exceeding the configured limit are rejected immediately. No subgraph requests are made,
protecting the infrastructure from resource exhaustion.

The Cosmo Router implements the [IBM GraphQL Cost Directive Specification](https://ibm.github.io/graphql-specs/cost-spec.html),
adapted for GraphQL Federation.

### Key differences from the IBM specification

Weights are plain integers, whereas the IBM spec defines them as stringified floats.

The IBM specification does not account for federation. The router calculates static cost from the query plan,
which resolves an operation through a chain of fetches across subgraphs. This chain is invisible at the supergraph
level and may include entity calls, `@requires` fetches, and similar steps. The router counts them because they make
certain queries more expensive.

When a field returns a list of objects with a type weight, the IBM spec does not multiply the type's own weight
by the list size — only the children's costs are multiplied. This implementation multiplies both the type weight
and children's costs by the list size, resulting in higher cost estimates for list fields with weighted types.
This is done because federation may trigger entity fetches between subgraphs.
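
The difference between the two list-cost rules can be sketched in a few lines of Go. The function names and the example weights are hypothetical and chosen only for illustration; the two formulas follow the description above, not the router's source code.

```go
package main

import "fmt"

// ibmStyleListCost counts the type weight once and multiplies only the
// children's cost by the list size.
func ibmStyleListCost(typeWeight, childrenCost, listSize int) int {
	return typeWeight + listSize*childrenCost
}

// routerStyleListCost multiplies both the type weight and the children's
// cost by the list size, as described above for this implementation.
func routerStyleListCost(typeWeight, childrenCost, listSize int) int {
	return listSize * (typeWeight + childrenCost)
}

func main() {
	// Hypothetical list field: 10 items, type weight 2, children cost 3.
	fmt.Println(ibmStyleListCost(2, 3, 10))    // 2 + 10*3
	fmt.Println(routerStyleListCost(2, 3, 10)) // 10 * (2 + 3)
}
```

The gap between the two results grows with the type weight, so heavily weighted types in list position are where the estimates diverge most.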

Currently, only static (estimated) costs are supported.
Dynamic (runtime) cost, calculated from actual data returned by subgraphs, is in active development.

## How Cost is Calculated

The router walks through the query and sums the cost of each field.

By default, object types (including interfaces and unions) cost `1`, while scalar and enum fields cost `0`.
For example, this query has a cost of `4`:

```graphql
query {
  book(id: 1) {        # Book object: 1
    title              # String scalar: 0
    author {           # Author object: 1
      name             # String scalar: 0
    }
    publisher {        # Publisher object: 1
      address {        # Address object: 1
        zipCode        # Int scalar: 0
      }
    }
  }
}
```
The router also accounts for weights assigned to the same field coordinate in different subgraphs,
allowing specific resolvers to be weighted differently per subgraph.

### List Fields Multiply Cost

When a field returns a list, the cost of that field and all its children is multiplied by the expected list size.
Since the router cannot determine actual list sizes during planning, it uses estimates.

```graphql
query {
  employees {          # List of Employee
    id
    department {
      name
    }
  }
}
```

With a default list size of 10, this query costs: `10 × (1 Employee + 1 Department) = 20`
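
The arithmetic of both examples can be reproduced with a small recursive walk. This is a minimal sketch under simplifying assumptions: `Field` is a hypothetical stand-in for a selected field, only default weights are used, and federation-specific fetches are ignored. It is not the router's actual data model.

```go
package main

import "fmt"

// Field is a hypothetical, simplified view of one selected field.
type Field struct {
	Name     string
	IsObject bool // object, interface, or union: default weight 1
	IsList   bool // the field returns a list
	Children []Field
}

// EstimateCost sums default weights and multiplies a list field, together
// with its children, by the assumed list size.
func EstimateCost(f Field, listSize int) int {
	cost := 0
	if f.IsObject {
		cost = 1
	}
	for _, child := range f.Children {
		cost += EstimateCost(child, listSize)
	}
	if f.IsList {
		cost *= listSize
	}
	return cost
}

func main() {
	employees := Field{Name: "employees", IsObject: true, IsList: true, Children: []Field{
		{Name: "id"},
		{Name: "department", IsObject: true, Children: []Field{{Name: "name"}}},
	}}
	fmt.Println(EstimateCost(employees, 10)) // 10 * (1 + 1)
}
```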

## Configuration

Cost analysis is enabled in the router configuration:

```yaml
security:
  cost_analysis:
    enabled: true
    mode: enforce
    estimated_limit: 1000
    estimated_list_size: 10
```

| Option | Environment Variable | Default | Description |
|-----------------------|----------------------------------------------|-----------|-----------------------------------------------------------------------------------------------|
| `enabled` | `SECURITY_COST_ANALYSIS_ENABLED` | `false` | When true, the router calculates costs for every operation. |
| `mode` | `SECURITY_COST_ANALYSIS_MODE` | `measure` | `measure` calculates costs only; `enforce` rejects operations exceeding the estimated limit. |
| `estimated_limit` | `SECURITY_COST_ANALYSIS_ESTIMATED_LIMIT` | `0` | Maximum allowed estimated cost. Only enforced when mode is `enforce`. 0 disables enforcement. |
| `estimated_list_size` | `SECURITY_COST_ANALYSIS_ESTIMATED_LIST_SIZE` | `10` | Default assumed size for list fields when no `@listSize` directive is specified. |
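
How `enabled`, `mode`, and `estimated_limit` interact can be summarized as a single predicate. The sketch below is illustrative only: `CostConfig` and `ShouldReject` are hypothetical names, not the router's internal types, and the logic simply restates the table above.

```go
package main

import "fmt"

// CostConfig mirrors the options in the table above. It is a hypothetical
// stand-in, not the router's own configuration type.
type CostConfig struct {
	Enabled        bool
	Mode           string // "measure" or "enforce"
	EstimatedLimit int    // 0 disables enforcement
}

// ShouldReject reports whether an operation with the given estimated cost
// would be rejected under cfg.
func ShouldReject(cfg CostConfig, estimated int) bool {
	if !cfg.Enabled || cfg.Mode != "enforce" || cfg.EstimatedLimit == 0 {
		return false
	}
	return estimated > cfg.EstimatedLimit
}

func main() {
	enforce := CostConfig{Enabled: true, Mode: "enforce", EstimatedLimit: 1000}
	measure := CostConfig{Enabled: true, Mode: "measure", EstimatedLimit: 1000}
	fmt.Println(ShouldReject(enforce, 1540)) // over the limit in enforce mode
	fmt.Println(ShouldReject(measure, 1540)) // measure mode never rejects
}
```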

## Customizing Cost with Directives

Default weights work for many APIs, but the `@cost` and `@listSize` directives allow fine-tuning.

### The @cost Directive

`@cost` assigns custom weights to types, fields or arguments that are more expensive than average.

When specified **on a type**, all fields returning this type inherit the weight:

```graphql
type Address @cost(weight: 5) {
  street: String
  city: String
  zipCode: String
}
```

When specified **on a field**, only that field carries the weight:

```graphql
type Query {
  search(term: String!): [Result] @cost(weight: 10)
}
```

When specified **on an argument**, it adds cost for expensive argument processing:

```graphql
type Query {
  users(filter: UserFilter @cost(weight: 3)): [User]
}
```
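
One way to picture how the three placements combine for a single field is sketched below. The helper `fieldCost` is hypothetical, and the sketch assumes that a field-level weight takes precedence over the weight inherited from the return type; this document does not state the precedence, so treat it as an assumption rather than confirmed router behavior. List multiplication is left out.

```go
package main

import "fmt"

// fieldCost combines the weights that apply to one field, per item.
// A nil fieldWeight means the field has no @cost of its own, so the
// weight of its return type applies.
// ASSUMPTION: a field-level weight overrides the type-level weight.
func fieldCost(typeWeight int, fieldWeight *int, argWeights ...int) int {
	cost := typeWeight
	if fieldWeight != nil {
		cost = *fieldWeight
	}
	// Weighted arguments add to the field's cost.
	for _, w := range argWeights {
		cost += w
	}
	return cost
}

func main() {
	ten := 10
	fmt.Println(fieldCost(5, nil))    // field returning Address: inherits 5
	fmt.Println(fieldCost(1, &ten))   // search: field weight 10
	fmt.Println(fieldCost(1, nil, 3)) // users: default 1 plus argument weight 3
}
```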

### The @listSize Directive

`@listSize` provides better list size estimates than the global default.

**Static size with `assumedSize`** — when a field consistently returns a predictable number of items:

```graphql
type Query {
  topProducts: [Product] @listSize(assumedSize: 5)
  featuredCategories: [Category] @listSize(assumedSize: 3)
}
```

**Dynamic size with `slicingArguments`** — when the list size is controlled by a pagination argument:

```graphql
type Query {
  products(first: Int, after: String): ProductConnection
    @listSize(slicingArguments: ["first"])

  searchResults(limit: Int!, offset: Int): [Result]
    @listSize(slicingArguments: ["limit"])
}
```

The router reads the argument value from the query to determine the multiplier:

```graphql
query {
  products(first: 20) {  # Multiplier is 20
    edges { node { name } }
  }
}
```

When multiple slicing arguments are provided, the router uses the maximum value among them.
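
The selection of a list-size multiplier can be sketched as follows. `ListSizeSpec` and `ResolveListSize` are hypothetical names. The maximum-of-slicing-arguments rule comes from the text above; the fallback order (slicing arguments, then `assumedSize`, then the global default) is an assumption made for illustration.

```go
package main

import "fmt"

// ListSizeSpec is a hypothetical representation of a field's @listSize settings.
type ListSizeSpec struct {
	AssumedSize      int // 0 means not set
	SlicingArguments []string
}

// ResolveListSize picks the multiplier for a list field.
// ASSUMPTION: slicing arguments win over assumedSize, which wins over the default.
func ResolveListSize(spec ListSizeSpec, args map[string]int, defaultSize int) int {
	best, found := 0, false
	for _, name := range spec.SlicingArguments {
		// Use the maximum value among the slicing arguments present in the query.
		if v, ok := args[name]; ok && (!found || v > best) {
			best, found = v, true
		}
	}
	if found {
		return best
	}
	if spec.AssumedSize > 0 {
		return spec.AssumedSize
	}
	return defaultSize
}

func main() {
	products := ListSizeSpec{SlicingArguments: []string{"first"}}
	fmt.Println(ResolveListSize(products, map[string]int{"first": 20}, 10))

	topProducts := ListSizeSpec{AssumedSize: 5}
	fmt.Println(ResolveListSize(topProducts, nil, 10))
}
```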

## Accessing Cost in Custom Modules

In custom modules, the calculated cost is accessible through the operation context:

```go
func (m *MyModule) Middleware(ctx core.RequestContext, next http.Handler) {
	// Get cost after planning
	cost, err := ctx.Operation().Cost()
	if err != nil {
		// Cost analysis not enabled or plan not ready: skip cost-based logic
		next.ServeHTTP(ctx.ResponseWriter(), ctx.Request())
		return
	}

	// Use estimated cost for custom logic (rate limiting, logging, etc.)
	if cost.Estimated > m.warningThreshold {
		m.logger.Warn("High cost operation", "cost", cost.Estimated)
	}

	next.ServeHTTP(ctx.ResponseWriter(), ctx.Request())
}
```

Use cases:

- Apply rate limiting based on query cost
- Log expensive operations for analysis
- Support billing based on query complexity
- Throttle requests during high-load periods
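
As an illustration of the first use case, the sketch below keeps a per-client budget of estimated cost. `CostBudget` is a hypothetical helper that does not depend on any router API; a real module would feed it the estimated cost from the operation context and would also reset budgets per time window, which is omitted here.

```go
package main

import (
	"fmt"
	"sync"
)

// CostBudget tracks how much estimated cost each client has consumed.
type CostBudget struct {
	mu    sync.Mutex
	limit int
	spent map[string]int
}

// NewCostBudget creates a budget that allows each client up to limit cost units.
func NewCostBudget(limit int) *CostBudget {
	return &CostBudget{limit: limit, spent: make(map[string]int)}
}

// Allow charges cost to clientID if it still fits the budget and reports
// whether the request may proceed.
func (b *CostBudget) Allow(clientID string, cost int) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.spent[clientID]+cost > b.limit {
		return false
	}
	b.spent[clientID] += cost
	return true
}

func main() {
	budget := NewCostBudget(100)
	fmt.Println(budget.Allow("client-a", 60)) // fits
	fmt.Println(budget.Allow("client-a", 50)) // would exceed 100
}
```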

## Error Responses

When a query exceeds the estimated cost limit (in enforce mode), the router returns a 400 status:

```json
{
  "errors": [
    {
      "message": "The estimated query cost 1540 exceeds the maximum allowed estimated cost 1500"
    }
  ]
}
```
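
A client can decode this body with the standard library. The sketch below is a minimal example that extracts the error messages; the type and function names are hypothetical, and it only assumes the response shape shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gqlError and gqlResponse model only the parts of the body shown above.
type gqlError struct {
	Message string `json:"message"`
}

type gqlResponse struct {
	Errors []gqlError `json:"errors"`
}

// ErrorMessages extracts the error messages from a GraphQL response body.
func ErrorMessages(body []byte) ([]string, error) {
	var resp gqlResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	msgs := make([]string, 0, len(resp.Errors))
	for _, e := range resp.Errors {
		msgs = append(msgs, e.Message)
	}
	return msgs, nil
}

func main() {
	body := []byte(`{"errors":[{"message":"The estimated query cost 1540 exceeds the maximum allowed estimated cost 1500"}]}`)
	msgs, err := ErrorMessages(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, m := range msgs {
		fmt.Println(m)
	}
}
```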

## Best Practices

### Start with Measure Mode

Start with `mode: measure` to calculate costs without rejecting queries.
This reveals traffic patterns before enabling enforcement.

```yaml
security:
  cost_analysis:
    enabled: true
    mode: measure
    estimated_list_size: 10
```
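
Once costs have been collected in measure mode, for example by logging them from a custom module, a percentile of the observed values is one reasonable starting point for `estimated_limit`. The sketch below uses the nearest-rank method; `Percentile` and the sample data are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Percentile returns the p-th percentile (1 to 100) of the observed costs
// using the nearest-rank method. It returns 0 for an empty input.
func Percentile(costs []int, p int) int {
	if len(costs) == 0 {
		return 0
	}
	sorted := append([]int(nil), costs...) // copy so the input is not reordered
	sort.Ints(sorted)
	idx := (p*len(sorted)+99)/100 - 1 // ceil(p/100 * n) - 1
	if idx < 0 {
		idx = 0
	}
	return sorted[idx]
}

func main() {
	// Hypothetical estimated costs observed while running in measure mode.
	observed := []int{12, 40, 7, 950, 120, 33, 15, 60, 48, 22}
	fmt.Println("p90:", Percentile(observed, 90))
	fmt.Println("max:", Percentile(observed, 100))
}
```

A limit set somewhat above a high percentile lets normal traffic through while still rejecting outliers such as the single very expensive operation in the sample.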

### Pick Parameters for @listSize Carefully

The default list size significantly impacts cost calculations.
If lists typically return 5–20 items, a default of 10 is reasonable.
For fields that return large lists (100+ items), consider:

1. Using `@listSize(assumedSize: N)` on those specific fields
2. Requiring pagination with `slicingArguments`

### Annotate Expensive Resolvers

Apply `@cost` to fields that trigger expensive operations:

- External API calls
- Complex computations
- Large database queries
- Fields that fan out to multiple subgraphs

```graphql
type Query {
  # Calls external payment API
  paymentHistory(userId: ID!): [Payment] @cost(weight: 5)

  # Requires joining multiple tables
  analyticsReport(dateRange: DateRange!): Report @cost(weight: 10)
}
```

### Design for Pagination

Pagination improves cost accuracy. `slicingArguments` provides precise multipliers based on client requests:

```graphql
type Query {
  # Good: cost scales with requested page size
  users(first: Int!, after: String): UserConnection
    @listSize(slicingArguments: ["first"])

  # Less ideal: cost uses the default estimated_list_size
  allUsers: [User]
}
```

### Nested Lists

Nested list fields require special attention, as costs multiply:

```graphql
query {
  departments {        # 10 departments
    employees {        # × 10 employees each = 100
      projects {       # × 10 projects each = 1000
        tasks {        # × 10 tasks each = 10000
          name
        }
      }
    }
  }
}
```

With a default list size of 10, this query costs 11,110 (10 + 100 + 1,000 + 10,000).
Use `@listSize` to provide realistic estimates for deeply nested structures.
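
The growth of nested list costs can be checked with a short calculation. `NestedListCost` is a hypothetical helper that assumes every level is a list of objects with the default weight of `1` and that leaf scalars cost `0`; the second set of sizes is invented to show the effect of more realistic `@listSize` estimates.

```go
package main

import "fmt"

// NestedListCost returns the estimated cost of a chain of nested list
// fields, outermost first, where each level holds objects of weight 1.
func NestedListCost(sizes []int) int {
	cost := 0
	// Work from the innermost list outwards: each level multiplies its own
	// weight plus the cost of everything nested inside it.
	for i := len(sizes) - 1; i >= 0; i-- {
		cost = sizes[i] * (1 + cost)
	}
	return cost
}

func main() {
	fmt.Println(NestedListCost([]int{10, 10, 10, 10})) // default size at every level
	fmt.Println(NestedListCost([]int{5, 20, 3, 4}))    // hypothetical per-field estimates
}
```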

## Features Not Yet Implemented

- **Runtime (dynamic) cost measurement** based on actual list sizes returned by subgraphs
- **Cost telemetry** — OTEL metrics and span attributes for cost values (`cost.estimated`, `cost.actual`, `cost.delta`)
- `sizedFields` in `@listSize` for applying list size estimates to specific child fields rather than all children
- `requireOneSlicingArgument` for query validation
- **Weights on input object fields** used as field arguments
- **Weights on directive arguments**