IPIP-327: Reframe over HTTP version=2 (DAG-CBOR and better cache controls) #327
Changes from all commits
242c476
6e7ca3b
e1448c2
58d0937
0c57718
72739d6
0256c2b
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,5 @@ | ||
| { | ||
| "fenced-code-language": false, | ||
| "single-h1": false, | ||
| "no-bare-urls": false, | ||
| "no-emphasis-as-header": false, | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,85 @@ | ||
| # IPIP 0000: Add DAG-CBOR support to Reframe over HTTP | ||
|
|
||
| <!-- IPIP number will be assigned by an editor. When opening a pull request to | ||
| submit your IPIP, please use number 0000 and an abbreviated title in the filename, | ||
| `0000-draft-title-abbrev.md`. --> | ||
|
|
||
| - Start Date: 2022-09-29 | ||
| - Related Issues: | ||
| - https://github.com/ipld/edelweiss/issues/16#issuecomment-1074161577 | ||
| - https://github.com/ipfs/kubo/issues/8823 | ||
|
|
||
| ## Summary | ||
|
|
||
| <!--One paragraph explanation of the IPIP.--> | ||
| This IPIP adds DAG-CBOR support to Reframe over HTTP. | ||
|
|
||
| ## Motivation | ||
|
|
||
| We've been using Reframe in Kubo for a while and it is clear that Reframe | ||
| messages are not designed to be created or read by humans. | ||
|
|
||
| The plaintext DAG-JSON representation of messages does not really bring | ||
| anything to the table (because both CIDs and Multiaddrs are in a format that | ||
| needs manual encoding/decoding anyway), and the utility is limited to debugging | ||
| and use in examples. | ||
|
|
||
| We've also identified HTTP caching and scaling issues caused by all methods | ||
| sharing the same URL path, and by the way the `Etag` header is generated, | ||
| which made streaming responses impossible. | ||
|
|
||
| ## Detailed design | ||
|
|
||
| We already support DAG-JSON, with its own content type. | ||
| The change here is to add support for requests and responses sent as DAG-CBOR, | ||
| with its own content type: `application/vnd.ipfs.rpc+dag-cbor`. | ||
|
|
||
| We change the URL to include method name on the path. This allows deployments | ||
| to scale better: set different HTTP cache control policies, or route different | ||
| methods to different backend services. | ||
|
|
||
| For details, see changes made to `reframe/REFRAME_HTTP_TRANSPORT.md`. | ||
|
|
||
| ## Test fixtures | ||
|
|
||
| TODO: add CIDs of sample DAG-CBOR messages after https://github.com/ipfs/go-delegated-routing implements it and has its own tests. | ||
|
|
||
| ## Design rationale | ||
|
|
||
| The IPFS stack aims to support both DAG-CBOR and DAG-JSON. Users can store JSON as | ||
| CBOR and vice versa. Having consistent support for both in Reframe not only | ||
| aligns with user expectations, but also allows us to save some bytes | ||
| (bandwidth, response caching requirements) by using binary CBOR as the | ||
| production format. | ||
|
|
||
| ### User benefit | ||
|
|
||
| Users will be able to choose between binary and human-readable representations, | ||
| just like they do in other parts of the IPFS/IPLD stack. | ||
|
|
||
| DAG-JSON is the implicit default, improving ergonomics when debugging with `curl` | ||
| on the CLI or fetching a response via a regular web browser. | ||
|
|
||
| ### Compatibility | ||
|
|
||
| The IPFS / IPLD stack already includes both DAG-CBOR and DAG-JSON libraries. | ||
| The `version` parameter of the HTTP wire protocol is bumped to `2`. | ||
|
|
||
| Reframe endpoints that care about backward-compatibility with Kubo 0.16 | ||
| can keep support for requests sent with `version=1`. | ||
|
|
||
| ### Security | ||
|
|
||
| N/A, we will use the same DAG-CBOR encoder/decoder as the rest of the stack. | ||
|
|
||
| ### Alternatives | ||
|
|
||
| The alternative is to do nothing and end up with: | ||
|
|
||
| - inconsistent user experience | ||
| - wasted bandwidth and cache storage | ||
| - difficult deployment and scaling (all methods under same endpoint) | ||
|
|
||
| ### Copyright | ||
|
|
||
| Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,38 +16,81 @@ The Reframe over HTTP protocol is defining the transport and message | |
| serialization mechanisms for sending Reframe messages over HTTP `POST` and | ||
| `GET`, and provides guidance for implementers around HTTP caching. | ||
|
|
||
| # Organization of this document | ||
|
|
||
| - [HTTP Transport Design](#http-transport-design) | ||
| - [HTTP Caching Considerations](#http-caching-considerations) | ||
| - [POST vs GET](#post-vs-get) | ||
| - [Avoiding sending the same response messages twice](#avoiding-sending-the-same-response-messages-twice) | ||
| - [Client controls for time-based caching](#client-controls-for-time-based-caching) | ||
| - [Rate-limiting non-cachable POST requests](#rate-limiting-non-cachable-post-requests) | ||
| ## Organization of this document | ||
|
|
||
| - [HTTP Endpoint](#http-endpoint) | ||
| - [Content type](#content-type) | ||
| - [HTTP methods](#http-methods) | ||
| - [Other notes](#other-notes) | ||
| - [HTTP Caching Considerations](#http-caching-considerations) | ||
| - [POST vs GET](#post-vs-get) | ||
| - [Etag](#etag) | ||
| - [Last-Modified](#last-modified) | ||
| - [Cache-Control](#cache-control) | ||
| - [Rate-limiting non-cachable POST requests](#rate-limiting-non-cachable-post-requests) | ||
| - [Implementations](#implementations) | ||
|
|
||
| # HTTP Transport Design | ||
| ## HTTP Endpoint | ||
|
|
||
| All messages sent in HTTP body MUST be encoded as DAG-JSON and use explicit content type `application/vnd.ipfs.rpc+dag-json; version=1` | ||
| ``` | ||
| https://rpc-service.example.net/reframe | ||
| ``` | ||
|
|
||
| The URL of a Reframe endpoint must end with the `/reframe` path. | ||
|
|
||
| ### Content type | ||
|
|
||
| Requests SHOULD be sent with explicit `Accept` and `Content-Type` HTTP headers specifying the body format. | ||
|
|
||
| All messages sent in HTTP body MUST be encoded as either: | ||
|
|
||
| - [DAG-CBOR](https://ipld.io/specs/codecs/dag-cbor/spec/), and use explicit content type `application/vnd.ipfs.rpc+dag-cbor; version=2` | ||
| - **This is a CBOR (binary) format for use in production.** | ||
| - CBOR request MUST include HTTP header: `Accept: application/vnd.ipfs.rpc+dag-cbor; version=2` | ||
| - CBOR request AND response MUST include header: `Content-Type: application/vnd.ipfs.rpc+dag-cbor; version=2` | ||
| - [DAG-JSON](https://ipld.io/specs/codecs/dag-json/spec/), and use explicit content type `application/vnd.ipfs.rpc+dag-json; version=2` | ||
| - **This is a human-readable plain text format for use in testing and debugging.** | ||
| - JSON request MUST include header: `Accept: application/vnd.ipfs.rpc+dag-json; version=2` | ||
| - JSON request AND response MUST include header: `Content-Type: application/vnd.ipfs.rpc+dag-json; version=2` | ||
|
|
||
| Implementations SHOULD error when an explicit content type is missing, but MAY decide to implement some defaults instead. | ||
| The rules around implicit content type are as follows: | ||
|
|
||
| - Requests without a matching `Content-Type` header MAY be interpreted as DAG-JSON. | ||
| - Requests without a matching `Accept` header MAY produce a DAG-JSON response. | ||
| - Responses without a matching `Content-Type` header MAY be interpreted as DAG-JSON. | ||
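The content type rules above can be sketched as a small negotiation helper. This is a minimal illustration, not the reference implementation: the function name `pick_codec` and the `strict` flag are hypothetical, and the sketch handles a single media type per header (no comma-separated `Accept` lists or `q` weights).

```python
from email.message import Message

DAG_JSON = "application/vnd.ipfs.rpc+dag-json"
DAG_CBOR = "application/vnd.ipfs.rpc+dag-cbor"
SUPPORTED = {DAG_JSON: "dag-json", DAG_CBOR: "dag-cbor"}


def pick_codec(header_value, strict=False):
    """Map a Content-Type or Accept header value to 'dag-json' or 'dag-cbor'.

    Hypothetical helper: with strict=True a missing or non-matching header
    is an error (the SHOULD in the text); otherwise DAG-JSON is the
    implicit default (the MAY in the text).
    """
    if header_value:
        # Reuse the stdlib MIME parser to split media type and parameters.
        msg = Message()
        msg["Content-Type"] = header_value
        media_type = msg.get_content_type()
        version = msg.get_param("version")
        if media_type in SUPPORTED and version == "2":
            return SUPPORTED[media_type]
    if strict:
        raise ValueError("missing or unsupported content type: %r" % (header_value,))
    return "dag-json"
```

A server that wants the stricter behavior would call `pick_codec(value, strict=True)` and translate the `ValueError` into an HTTP error response of its choosing.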
|
|
||
| ### HTTP methods | ||
|
|
||
| Requests MUST be sent as either: | ||
|
|
||
| - `GET /reframe?q={percent-encoded-dag-json}` | ||
| - DAG-JSON is supported via a `?q` query parameter, and the value MUST be [percent-encoded](https://en.wikipedia.org/wiki/Percent-encoding) | ||
| - `GET /reframe/{method}/{request-as-mbase64url-dag-cbor}` | ||
| - Cachable HTTP `GET` requests with message passed as DAG-CBOR in HTTP path segment, encoded as URL-safe [`base64url` multibase](https://docs.ipfs.io/concepts/glossary/#base64url) string | ||
| - Cachable `method` name is placed on the URL path, allowing for different caching strategies per `method`, and custom routing/scaling per `method`, if needed. | ||
| - DAG-CBOR in multibase `base64url` is used (even when request body is DAG-JSON) because JSON may include characters that are not safe to be used in URLs, and percent-encoding or base-encoding a big JSON query may take too much space. | ||
| - Suitable for sharing links, sending bigger messages, and when a query result MUST benefit from HTTP caching (see _HTTP Caching Considerations_ below). | ||
| - DAG-CBOR response is the implicit default, unless explicit `Accept` header is passed | ||
| - `GET /reframe/{method}?json={percent-encoded-request-as-dag-json}` | ||
| - DAG-JSON is supported via a `?json` query parameter, and the value MUST be [percent-encoded](https://en.wikipedia.org/wiki/Percent-encoding) | ||
| - Suitable for sharing links, sending smaller messages, testing and debugging. | ||
| - `POST /reframe` | ||
| - Ephemeral HTTP `POST` request with message passed as DAG-JSON in HTTP request body | ||
| - DAG-JSON response is the implicit default, unless explicit `Accept` header is passed | ||
| - `POST /reframe/{method}` | ||
| - Ephemeral HTTP `POST` request with DAG-JSON or DAG-CBOR message passed in HTTP request body and a mandatory `Content-Type` header informing endpoint how to parse the body | ||
| - Suitable for bigger messages, and when HTTP caching should be skipped for the most fresh results | ||
| - Response type is the same as `Content-Type` of the request, unless explicit `Accept` header is passed | ||
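As a rough sketch of the two `GET` forms listed above, the helpers below build request paths from an already-encoded message. Assumptions: the function names are hypothetical, the method name `FindProviders` in the usage note is only illustrative, and the DAG-CBOR bytes are expected to come from a real DAG-CBOR encoder (not shown here). The multibase `base64url` form is the prefix `u` followed by unpadded URL-safe base64.

```python
import base64
from urllib.parse import quote


def multibase_base64url(data: bytes) -> str:
    """Encode bytes as a multibase base64url string ('u' prefix, no padding)."""
    return "u" + base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def cbor_get_path(method: str, dag_cbor_request: bytes) -> str:
    """Build GET /reframe/{method}/{request-as-mbase64url-dag-cbor}."""
    return "/reframe/%s/%s" % (quote(method, safe=""), multibase_base64url(dag_cbor_request))


def json_get_path(method: str, dag_json_request: str) -> str:
    """Build GET /reframe/{method}?json={percent-encoded-request-as-dag-json}."""
    return "/reframe/%s?json=%s" % (quote(method, safe=""), quote(dag_json_request, safe=""))
```

For example, `cbor_get_path("FindProviders", b"\xa0")` yields `/reframe/FindProviders/uoA`, where `0xa0` is the CBOR encoding of an empty map used purely as a placeholder payload.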
|
|
||
| Servers MUST support `GET` for methods marked as cachable and MUST support `POST` for all methods (both cachable and not-cachable). This allows servers to rate-limit `POST` when cachable `GET` could be used instead, and enables clients to use `POST` as a fallback in case there is a technical problem with bigger Reframe messages not fitting in a `GET` URL. See "Caching Considerations" section. | ||
|
|
||
| ### Other notes | ||
|
|
||
| If a server supports HTTP/1.1, then it MAY send chunked-encoded messages. Clients supporting HTTP/1.1 MUST accept chunked-encoded responses. | ||
|
|
||
| Requests and Responses MUST occur over a single HTTP call instead of the server being allowed to dial back the client with a response at a later time. The response status code MUST be 200 if the RPC transaction succeeds, even when there's an error at the application layer, and a non-200 status code if the RPC transaction fails. | ||
|
|
||
| If a server chooses to respond to a single request message with a group of messages in the response it should do so as a set of `\n` delimited DAG-JSON messages (i.e. `{Response1}\n{Response2}...`). | ||
| If a server chooses to respond to a single request message with a group of DAG-JSON messages in the response it should do so as a set of `\n` delimited DAG-JSON messages (i.e. `{Response1}\n{Response2}...`). | ||
| DAG-CBOR responses require no special handling, as they are already self-delimiting due to the nature of the CBOR encoding. | ||
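For the DAG-JSON case, reading a `\n` delimited group of response messages could look like the sketch below. It is a simplified illustration: the function name is hypothetical, it uses the standard `json` module (so DAG-JSON links and bytes stay as plain `{"/": ...}` dictionaries), and it assumes each message is serialized on a single line. It does not cover DAG-CBOR, where splitting has to happen at whole-value boundaries.

```python
import json


def split_dag_json_messages(body: bytes):
    """Split a newline-delimited response body into decoded JSON messages."""
    messages = []
    for line in body.split(b"\n"):
        # Skip empty segments, e.g. the one after a trailing newline.
        if line.strip():
            messages.append(json.loads(line))
    return messages
```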
|
Member
Author
💭 This should work in theory, but irl we may have constraints beyond regular CBOR.
Member
Author
https://www.rfc-editor.org/rfc/rfc8742.html states:
Member
Oh, be very careful with this assumption .. we expect the delimiting to be done before the codec, i.e. you give the codec the exact bytes which match the CID and no more; otherwise we have a validation gap. The only special-casing we've done that's related to this is that Edelweiss has a special flag that it uses for dag-json that lets it be sloppy with ending characters (spaces, EOL): https://github.com/ipld/go-ipld-prime/blob/7548eb883bda4712355797547a0628a0ad1c00cb/codec/dagjson/unmarshal.go#L36-L41 but not for additional data that might delimit the block. dag-cbor is very strict about not wanting extraneous bytes; in Go and JS.
Member
Author
I mean.. yeah, what other options do we have here?
Member
OK, well if you want to delimit by whole-value boundaries then we could work that into the dag-cbor decoder but it'll take a little bit of work and have to be a special option for it. If it's needed, then let's get an issue filed in go-ipld-prime and let me know what sort of priority it is and I'll start having a look. |
||
|
|
||
| Requests and responses MUST come with `version=1` as a _Required Parameter_ in the `Accept` and `Content-Type` HTTP headers. | ||
| Requests and responses MUST come with `version=2` as a _Required Parameter_ in the `Accept` and `Content-Type` HTTP headers. | ||
|
|
||
| Note: This version header is what allows the transport to more easily evolve over time (e.g. if it was desired to change the transport to support other encodings than DAG-JSON, utilize headers differently, move the request data from the body, etc.). Not including the version number may lead to incompatibility with future versions of the transport. | ||
|
|
||
|
|
@@ -65,23 +108,42 @@ Use of `GET` endpoint is not mandatory, but suggested if a Reframe deployment | |
| expects to handle the same message query multiple times, and want to leverage | ||
| existing HTTP tooling to maximize HTTP cache hits. | ||
|
|
||
| ### Avoiding sending the same response messages twice | ||
| ### Etag | ||
|
|
||
| For small responses. | ||
|
|
||
| Implementations MUST always return strong | ||
| Implementations MAY return | ||
| [`Etag`](https://httpwg.org/specs/rfc7232.html#header.etag) HTTP header based | ||
| on digest of DAG-JSON response messages. This allows clients to send | ||
| inexpensive conditional requests with | ||
| on a digest of response messages ONLY when `Etag` generation does not require | ||
| buffering a bigger response in memory before sending it to the client. | ||
|
|
||
| In other words, do not use `Etag` if it will block a big, streaming response. | ||
| Streaming responses should use `Last-Modified` instead. | ||
|
|
||
| `Etag` allows clients to send inexpensive conditional requests with | ||
| [`If-None-Match`](https://httpwg.org/specs/rfc7232.html#header.if-none-match) | ||
| header, which will skip sending the response body when the response message did not change. | ||
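A minimal sketch of the `Etag` and `If-None-Match` exchange for small, fully buffered responses follows. The choice of SHA-256 as the digest and the function names are assumptions made for illustration; the text only asks for an `Etag` based on a digest of the response messages.

```python
import hashlib


def strong_etag(response_body: bytes) -> str:
    """Derive a quoted strong Etag from a digest of the response body."""
    return '"' + hashlib.sha256(response_body).hexdigest() + '"'


def _normalize(tag: str) -> str:
    # If-None-Match uses weak comparison, so a W/ prefix is ignored.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_response(response_body: bytes, if_none_match=None):
    """Return (status, headers, body), answering 304 when the Etag matches."""
    etag = strong_etag(response_body)
    if if_none_match is not None:
        candidates = [_normalize(t) for t in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, {"Etag": etag}, b""
    return 200, {"Etag": etag}, response_body
```

Because the digest needs the whole body, this approach only fits responses that are small enough to buffer, which is the constraint stated above.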
|
|
||
| ### Client controls for time-based caching | ||
| ### Last-Modified | ||
|
|
||
| For streaming responses. | ||
|
|
||
| Implementations can also return (optional) | ||
| Implementations SHOULD return | ||
| [`Last-Modified`](https://httpwg.org/specs/rfc7232.html#header.last-modified) | ||
| HTTP header, allowing clients to send conditional requests with | ||
| HTTP header with bigger, streaming responses. | ||
|
|
||
| This allows clients to send conditional requests with | ||
| [`If-Modified-Since`](https://httpwg.org/specs/rfc7232.html#header.if-modified-since) | ||
| header to specify their acceptance for stale (cached) responses. | ||
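The `Last-Modified` and `If-Modified-Since` pair can be sketched as below. The function names are hypothetical, and the sketch assumes the server tracks a timezone-aware modification time for the data behind a response; HTTP dates have one-second resolution, so microseconds are dropped before comparing.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_header(modified_at: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date for Last-Modified."""
    return format_datetime(modified_at.astimezone(timezone.utc), usegmt=True)


def is_not_modified(modified_at: datetime, if_modified_since) -> bool:
    """True when a 304 can be sent instead of streaming the full response."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Unparseable dates are ignored and the full response is sent.
        return False
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    return modified_at.replace(microsecond=0) <= since
```

Unlike a digest-based `Etag`, this check needs no access to the response body, so it can run before a streaming response starts.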
|
|
||
| ### Cache-Control | ||
|
|
||
| Implementations MAY return custom `Cache-Control` per Reframe method, | ||
| when a specific cache window makes sense in the context of specific method. | ||
|
|
||
| It is also acceptable to leave it out and let reverse HTTP proxies / CDNs | ||
| set it. The value will depend on the use case and expected load. | ||
|
|
||
| ### Rate-limiting non-cachable POST requests | ||
|
|
||
| HTTP endpoint can return status code | ||
|
|
@@ -99,6 +161,6 @@ Retry-After: 3600 | |
| too many POST requests: consider switching to cachable GET or try again later (see Retry-After header) | ||
| ``` | ||
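On the client side, honoring the `Retry-After` header from a rate-limited response might look like the following sketch. The function name is hypothetical, and how the client reacts (waiting, or switching to a cachable `GET`) is left to the caller. `Retry-After` may carry either a number of seconds or an HTTP date, so both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after, now=None):
    """Return how many seconds to wait, or None if the header is unusable."""
    if retry_after is None:
        return None
    value = retry_after.strip()
    if value.isdigit():
        # Delay given directly in seconds, e.g. "3600".
        return int(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0, int((when - now).total_seconds()))
```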
|
|
||
| # Implementations | ||
| ## Implementations | ||
|
|
||
| https://github.com/ipfs/go-delegated-routing | ||
Overall I think dag-cbor support would be net positive, but I think the perf argument for this is rather weak, there are lower hanging fruit like enabling response compression and we have quite voluminous headers in practice. I'm easy to convince here, it would just take some measurements with real delegated routing payloads using dag-json and dag-cbor encodings. I get the theoretical arguments and the evidence from that paper, but for perf it's still weak without direct measurements of the real workload.
I made a quick test with a single `FindProvidersResponse` and we get better gains than the paper suggests:
`{"FindProvidersResponse":{"Providers":[{"Node":{"peer":{"ID":{"/":{"bytes":"EiAngCqwSSL46hQ5+DWaJsZ1SPV2RwrqwID/OEuj5Rdgqw"}},"Multiaddresses":[{"/":{"bytes":"NhFlbGFzdGljLmRhZy5ob3VzZQYBu94D"}}]}},"Proto":[{"2304":{}}]}]}}`
- `baguqeerafsujii5p52wlc3fxrxhmbx6nu7uudyeznqjnvr7ogkjkndkuyukq` DAG-JSON block is 223 bytes
- `bafyreignkd26ejp6jteqf4cggzbhp3twuvidqsa2lshknk6kjh7j2pc2h4` DAG-CBOR block is 143 bytes
The diff is ~35%.
I think it makes sense, since DAG-JSON eats additional space by base64-encoding binary fields (all CIDs and Multiaddrs), which creates consistent ~33% overhead.
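The arithmetic behind these two figures can be checked with a short sketch. The helper names are hypothetical; the byte counts are the ones quoted in the comment, and the base64 overhead is measured on an arbitrary 33-byte input with standard padded base64.

```python
import base64


def relative_savings(json_size: int, cbor_size: int) -> float:
    """Fraction of bytes saved by the smaller encoding."""
    return (json_size - cbor_size) / json_size


def base64_overhead(raw_size: int) -> float:
    """Relative growth when raw bytes are base64-encoded (4 chars per 3 bytes)."""
    encoded = base64.b64encode(bytes(raw_size))
    return (len(encoded) - raw_size) / raw_size


# 223-byte DAG-JSON block vs 143-byte DAG-CBOR block from the comment.
print(round(relative_savings(223, 143) * 100, 1))  # 35.9
# A 33-byte binary field becomes 44 base64 characters.
print(round(base64_overhead(33) * 100, 1))  # 33.3
```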
@lidel: these numbers are uncompressed, right? Shouldn't we compare the numbers with compression employed, since that is how the bytes will be sent on the wire (and is that how CDNs cache as well? I don't know).
@BigLep depends on the cache: afaik Nginx does not cache gzipped payload, but supports hosting of pre-compressed files if it finds a static pre-compressed `.gz` variant, which does not help our use case.
Also, we do not have compression in libp2p.
Anyway, assuming we care only about HTTP, and go with default gzip for the samples above (iiuc cbor grows due to gzip envelope):
- (`ipfs block get baguqeerafsujii5p52wlc3fxrxhmbx6nu7uudyeznqjnvr7ogkjkndkuyukq | gzip -c | wc -c`)
- (`ipfs block get bafyreignkd26ejp6jteqf4cggzbhp3twuvidqsa2lshknk6kjh7j2pc2h4 | gzip -c | wc -c`)
That seems to still give.. ~20% savings with CBOR.