Skip to content

Add a new ChunkedBody extractor#541

Closed
sunshowers wants to merge 2 commits intomainfrom
sunshowers/spr/add-a-new-chunkedbody-extractor
Closed

Add a new ChunkedBody extractor#541
sunshowers wants to merge 2 commits intomainfrom
sunshowers/spr/add-a-new-chunkedbody-extractor

Conversation

@sunshowers
Copy link
Contributor

@sunshowers sunshowers commented Jan 5, 2023

The current UntypedBody extractor writes data into a single Vec<u8>.
Consider what happens if the body is large (e.g. 100MB, which can happen
if uploading an artifact over HTTP). As each chunk (typically 10-100KB)
comes in, we'll have to both copy data from the incoming Bytes, and
reallocate the Vec over and over.

To avoid this issue, Eliza Weisman and I wrote buf-list, which
represents a list (really a queue) of chunks that can be operated on
using standard Tokio and other abstractions:
https://crates.io/crates/buf-list.

Add a ChunkedBody extractor that uses a BufList inside, avoiding
this issue. Also add ChunkedBody to usage suggestions.

One other change I did is to remove the nonexistent type parameter J
from UntypedBody<J> suggestions -- that didn't look right.

Questions

  • One consideration here is that BufList needs to be exposed as a public API. It's currently at 0.1 -- I could release a 1.0 if that would be helpful as far as exposing in the API goes. What do you think?

Created using spr 1.3.4
Created using spr 1.3.4
@ahl
Copy link
Collaborator

ahl commented Jan 5, 2023

I haven't yet looked at the code, but I had a couple of questions about (my understanding of) the approach:

Do we need to expose a separate interface for large requests in addition to small ones? This seems like an additional burden to end-users we should be reluctant to impose.

Would it be feasible to change UntypedBody to expose Stream interface? Effectively by encapsulating hyper::HttpBody? Users could then read the contents into a Bytes or a BufList or whatever else they like.

@sunshowers
Copy link
Contributor Author

Do we need to expose a separate interface for large requests in addition to small ones? This seems like an additional burden to end-users we should be reluctant to impose.

We definitely don't need to — the BufList interface would also work for smaller bodies. But we'd have to break API compatibility for that which was my main concern.

Would it be feasible to change UntypedBody to expose Stream interface? Effectively by encapsulating hyper::HttpBody? Users could then read the contents into a Bytes or a BufList or whatever else they like.

Again, that's definitely possible to do if we're willing to break API compatibility. We could have UntypedBody represent a body that hasn't been fully extracted yet, then allow collecting it into various data structures or just processing it as a stream.

I mostly went for this approach for API compat reasons, but there's a lot of available design space here.

@ahl
Copy link
Collaborator

ahl commented Jan 5, 2023

First, I don't think we need to be beholden to API compatibility.

Second, I think we could encapsulate an hyper::HttpBody in such a way that would retain compatibility. In particular--and this is without any experimentation, so I may be very wrong--by changing UntypedBody::as_bytes to effectively do the work currently done in http_read_body.

@sunshowers
Copy link
Contributor Author

Hmm, doing that work would be an async function and as_bytes/as_str are synchronous currently.

@sunshowers
Copy link
Contributor Author

I'll put up an alternative version of this PR which breaks BC.

@sunshowers
Copy link
Contributor Author

Put up #542 -- converting this to a draft since I agree that that's definitely a better, more orthogonal API.

@sunshowers sunshowers marked this pull request as draft January 5, 2023 20:28
@sunshowers
Copy link
Contributor Author

Abandoned in favor of #556.

@sunshowers sunshowers closed this Jan 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants