diff --git a/docs/docs.json b/docs/docs.json
index 199ef4f5..d109df14 100644
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -200,7 +200,8 @@
               "router/security",
               "router/security/tls",
               "router/security/config-validation-and-signing",
-              "router/security/hardening-guide"
+              "router/security/hardening-guide",
+              "router/security/cost-control"
             ]
           },
           {
diff --git a/docs/router/security/cost-control.mdx b/docs/router/security/cost-control.mdx
new file mode 100644
index 00000000..c7fb7056
--- /dev/null
+++ b/docs/router/security/cost-control.mdx
@@ -0,0 +1,302 @@
+---
+title: "Cost Analysis"
+description: "Protect the API from expensive operations by estimating query complexity before execution using @cost and @listSize directives"
+icon: scale-balanced
+---
+
+## Overview
+
+Cost analysis prevents a single GraphQL request from using too many system resources and
+slowing everything else down. A query requesting deeply nested lists can generate
+thousands of resolver calls and overwhelm subgraphs, while a simple field lookup costs almost nothing.
+With cost analysis, fields can be assigned weights so that query complexity is estimated *before* execution begins.
+
+When cost analysis is enabled, the router calculates an estimated cost for each incoming operation based
+on the fields requested, their configured weights and expected list sizes.
+In `enforce` mode, operations exceeding the configured limit are rejected before any subgraph request is made,
+protecting the infrastructure from resource exhaustion.
+
+The Cosmo Router implements the [IBM GraphQL Cost Directive Specification](https://ibm.github.io/graphql-specs/cost-spec.html),
+adapted for GraphQL Federation.
+
+### Key differences from the IBM specification
+
+Weights are plain integers rather than the stringified floats defined in the IBM spec.
+
+The IBM specification does not account for federation. The router calculates static cost from the query plan,
+which resolves an operation through a chain of fetches across subgraphs.
+This chain is invisible at the supergraph
+level and may include entity fetches, `@requires` fetches and similar calls. The router accounts for them because they make
+certain queries more expensive.
+
+When a field returns a list of objects with a type weight, the IBM spec does not multiply the type's own weight
+by the list size; only the children's costs are multiplied. This implementation multiplies both the type weight
+and the children's costs by the list size, resulting in higher cost estimates for list fields with weighted types.
+This compensates for the entity fetches between subgraphs that federation may trigger.
+
+Currently, only static (estimated) costs are supported.
+Dynamic (runtime) cost, calculated from actual data returned by subgraphs, is in active development.
+
+## How Cost is Calculated
+
+The router walks through the query and sums the cost of each field.
+
+By default, fields returning object types (including interfaces and unions) cost `1`, while scalar and enum fields cost `0`.
+For example, this query has a cost of `4`:
+
+```graphql
+query {
+  book(id: 1) {        # Book object: 1
+    title              # String scalar: 0
+    author {           # Author object: 1
+      name             # String scalar: 0
+    }
+    publisher {        # Publisher object: 1
+      address {        # Address object: 1
+        zipCode        # Int scalar: 0
+      }
+    }
+  }
+}
+```
+The router also accounts for weights assigned to the same field coordinate in different subgraphs,
+allowing specific resolvers to be weighted differently per subgraph.
+
+### List Fields Multiply Cost
+
+When a field returns a list, the cost of that field and all its children is multiplied by the expected list size.
+Since the router cannot determine actual list sizes during planning, it uses estimates.
+
+```graphql
+query {
+  employees {      # List of Employee
+    id
+    department {
+      name
+    }
+  }
+}
+```
+
+With a default list size of 10, this query costs: `10 × (1 Employee + 1 Department) = 20`
+
+## Configuration
+
+Cost analysis is enabled in the router configuration:
+
+```yaml
+security:
+  cost_analysis:
+    enabled: true
+    mode: enforce
+    estimated_limit: 1000
+    estimated_list_size: 10
+```
+
+| Option                | Environment Variable                         | Default   | Description                                                                                   |
+|-----------------------|----------------------------------------------|-----------|-----------------------------------------------------------------------------------------------|
+| `enabled`             | `SECURITY_COST_ANALYSIS_ENABLED`             | `false`   | When true, the router calculates costs for every operation.                                   |
+| `mode`                | `SECURITY_COST_ANALYSIS_MODE`                | `measure` | `measure` calculates costs only; `enforce` rejects operations exceeding the estimated limit.  |
+| `estimated_limit`     | `SECURITY_COST_ANALYSIS_ESTIMATED_LIMIT`     | `0`       | Maximum allowed estimated cost. Only enforced when mode is `enforce`. 0 disables enforcement. |
+| `estimated_list_size` | `SECURITY_COST_ANALYSIS_ESTIMATED_LIST_SIZE` | `10`      | Default assumed size for list fields when no `@listSize` directive is specified.              |
+
+## Customizing Cost with Directives
+
+Default weights work for many APIs, but the `@cost` and `@listSize` directives allow fine-tuning.
+
+### The @cost Directive
+
+`@cost` assigns custom weights to types, fields or arguments that are more expensive than average.
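+
+As a rough mental model of how these weights combine with the defaults and list sizes described on this page,
+the following Go sketch computes a static estimate. It is a minimal, hypothetical model rather than the router's
+implementation: `Field`, `estimate`, `listSize`, `scalar` and `object` are illustrative names, argument weights
+are left out, a `@cost` weight is assumed to replace the default object weight, and the precedence of slicing
+arguments over `assumedSize` over `estimated_list_size` is an assumption.
+
+```go
+package main
+
+import "fmt"
+
+// Field is a hypothetical, simplified model of a selected field; it is not
+// the router's internal representation.
+type Field struct {
+    IsObject    bool    // object, interface or union: default weight 1
+    IsList      bool    // the field returns a list
+    Weight      *int    // assumed: @cost weight on the field or its type, if any
+    AssumedSize int     // @listSize(assumedSize: N); 0 when absent
+    SliceArgs   []int   // slicing argument values supplied in the query
+    Children    []Field // selected sub-fields
+}
+
+// estimate returns the static cost of f. Objects default to 1, scalars and
+// enums to 0, and for list fields both the field's own weight and its
+// children's cost are multiplied by the estimated list size.
+func estimate(f Field, defaultListSize int) int {
+    weight := 0
+    if f.IsObject {
+        weight = 1
+    }
+    if f.Weight != nil {
+        weight = *f.Weight
+    }
+    cost := weight
+    for _, c := range f.Children {
+        cost += estimate(c, defaultListSize)
+    }
+    if f.IsList {
+        cost *= listSize(f, defaultListSize)
+    }
+    return cost
+}
+
+// listSize uses the largest slicing argument, else assumedSize, else the
+// configured default (estimated_list_size). This precedence is an assumption.
+func listSize(f Field, defaultListSize int) int {
+    if len(f.SliceArgs) > 0 {
+        largest := f.SliceArgs[0]
+        for _, v := range f.SliceArgs[1:] {
+            if v > largest {
+                largest = v
+            }
+        }
+        return largest
+    }
+    if f.AssumedSize > 0 {
+        return f.AssumedSize
+    }
+    return defaultListSize
+}
+
+// scalar and object are small helpers for building example selections.
+func scalar() Field { return Field{} }
+
+func object(list bool, children ...Field) Field {
+    return Field{IsObject: true, IsList: list, Children: children}
+}
+
+func main() {
+    const defaultListSize = 10 // estimated_list_size
+
+    // book { title author { name } publisher { address { zipCode } } }
+    book := object(false,
+        scalar(),
+        object(false, scalar()),
+        object(false, object(false, scalar())),
+    )
+
+    // employees { id department { name } }
+    employees := object(true, scalar(), object(false, scalar()))
+
+    // departments { employees { projects { tasks { name } } } }
+    nested := object(true, object(true, object(true, object(true, scalar()))))
+
+    // searchResults(limit: 25) { title }, with @listSize(slicingArguments: ["limit"])
+    search := object(true, scalar())
+    search.SliceArgs = []int{25}
+
+    fmt.Println(estimate(book, defaultListSize))      // 4
+    fmt.Println(estimate(employees, defaultListSize)) // 20
+    fmt.Println(estimate(nested, defaultListSize))    // 11110
+    fmt.Println(estimate(search, defaultListSize))    // 25
+}
+```
+
+For the `book` and `employees` queries shown earlier, the sketch reproduces the stated costs of `4` and `20`.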
+
+When specified **on a type**, all fields returning this type inherit the weight:
+
+```graphql
+type Address @cost(weight: 5) {
+  street: String
+  city: String
+  zipCode: String
+}
+```
+
+When specified **on a field**, only that field carries the weight:
+
+```graphql
+type Query {
+  search(term: String!): [Result] @cost(weight: 10)
+}
+```
+
+When specified **on an argument**, it adds cost for expensive argument processing:
+
+```graphql
+type Query {
+  users(filter: UserFilter @cost(weight: 3)): [User]
+}
+```
+
+### The @listSize Directive
+
+`@listSize` provides more accurate per-field list size estimates than the global default.
+
+**Static size with `assumedSize`**: for fields that consistently return a predictable number of items:
+
+```graphql
+type Query {
+  topProducts: [Product] @listSize(assumedSize: 5)
+  featuredCategories: [Category] @listSize(assumedSize: 3)
+}
+```
+
+**Dynamic size with `slicingArguments`**: for fields whose list size is controlled by a pagination argument:
+
+```graphql
+type Query {
+  products(first: Int, after: String): ProductConnection
+    @listSize(slicingArguments: ["first"])
+
+  searchResults(limit: Int!, offset: Int): [Result]
+    @listSize(slicingArguments: ["limit"])
+}
+```
+
+The router reads the argument value from the query to determine the multiplier:
+
+```graphql
+query {
+  products(first: 20) {   # Multiplier is 20
+    edges { node { name } }
+  }
+}
+```
+
+When multiple slicing arguments are provided, the router uses the maximum value among them.
+
+## Accessing Cost in Custom Modules
+
+In custom modules, the calculated cost is accessible through the operation context:
+
+```go
+func (m *MyModule) Middleware(ctx core.RequestContext, next http.Handler) {
+    // Get cost after planning
+    cost, err := ctx.Operation().Cost()
+    if err != nil { // Cost analysis not enabled or plan not ready: pass the request through
+        next.ServeHTTP(ctx.ResponseWriter(), ctx.Request())
+        return
+    }
+
+    // Use estimated cost for custom logic (rate limiting, logging, etc.)
+    if cost.Estimated > m.warningThreshold {
+        m.logger.Warn("High cost operation", "cost", cost.Estimated)
+    }
+
+    next.ServeHTTP(ctx.ResponseWriter(), ctx.Request())
+}
+```
+
+Use cases:
+
+- Apply rate limiting based on query cost
+- Log expensive operations for analysis
+- Support billing based on query complexity
+- Throttle requests during high-load periods
+
+## Error Responses
+
+When a query exceeds the estimated cost limit in `enforce` mode, the router rejects it with HTTP status `400` and an error such as:
+
+```json
+{
+  "errors": [
+    {
+      "message": "The estimated query cost 1540 exceeds the maximum allowed estimated cost 1500"
+    }
+  ]
+}
+```
+
+## Best Practices
+
+### Start with Measure Mode
+
+Start with `mode: measure` to calculate costs without rejecting queries.
+This reveals traffic patterns before enabling enforcement.
+
+```yaml
+security:
+  cost_analysis:
+    enabled: true
+    mode: measure
+    estimated_list_size: 10
+```
+
+### Choose List Size Estimates Carefully
+
+The default list size (`estimated_list_size`) significantly affects cost calculations.
+If lists typically return 5–20 items, a default of 10 is reasonable.
+For fields that return large lists (100+ items), consider:
+
+1. Using `@listSize(assumedSize: N)` on those specific fields
+2. Requiring pagination with `slicingArguments`
+
+### Annotate Expensive Resolvers
+
+Apply `@cost` to fields that trigger expensive operations:
+
+- External API calls
+- Complex computations
+- Large database queries
+- Fields that fan out to multiple subgraphs
+
+```graphql
+type Query {
+  # Calls external payment API
+  paymentHistory(userId: ID!): [Payment] @cost(weight: 5)
+
+  # Requires joining multiple tables
+  analyticsReport(dateRange: DateRange!): Report @cost(weight: 10)
+}
+```
+
+### Design for Pagination
+
+Pagination improves cost accuracy.
+`slicingArguments` derive the multiplier from the page size the client requests:
+
+```graphql
+type Query {
+  # Good: cost scales with requested page size
+  users(first: Int!, after: String): UserConnection
+    @listSize(slicingArguments: ["first"])
+
+  # Less ideal: cost uses the default estimated_list_size
+  allUsers: [User]
+}
+```
+
+### Nested Lists
+
+Nested list fields require special attention, as costs multiply:
+
+```graphql
+query {
+  departments {          # 10 departments
+    employees {          # × 10 employees each = 100
+      projects {         # × 10 projects each = 1000
+        tasks {          # × 10 tasks each = 10000
+          name
+        }
+      }
+    }
+  }
+}
+```
+
+With a default list size of 10, this query costs 11,110 (10 + 100 + 1,000 + 10,000).
+Use `@listSize` to provide realistic estimates for deeply nested structures.
+
+## Features Not Yet Implemented
+
+- **Runtime (dynamic) cost measurement** based on actual list sizes returned by subgraphs
+- **Cost telemetry**: OTEL metrics and span attributes for cost values (`cost.estimated`, `cost.actual`, `cost.delta`)
+- **`sizedFields`** in `@listSize`, for applying list size estimates to specific child fields rather than all children
+- **`requireOneSlicingArgument`** in `@listSize`, for validating slicing arguments in queries
+- **Input object fields**: weights on fields of input objects passed as field arguments are not counted
+- **Directive arguments**: weights on arguments of directives are not counted