Move to dedicated Worker gRPC specification instead of Arrow Flight#375
Move to dedicated Worker gRPC specification instead of Arrow Flight#375
Conversation
gene-bordegaray
left a comment
There was a problem hiding this comment.
Had a confusion with the relationship between Worker and WorkerService that could get some help from a comment or two and to document API changes. But this implementation seems much more fitting for the use cases compared to using Flight and has some nice clean ups 😄
|
A follow up question. I wouldn't expect there to be much / any changes to bench results since this is not affecting the data streaming. Any insight as to why we see much faster / slower result in cases? |
For TPCH_SF1, the queries are very small computationally speaking, so the network noisy becomes very relevant. A better indicator for those queries is the For the TPCH_SF10, I see them pretty much within the noisy threshold, anything that is within the +-20% range can perfectly be just noise. All other engines so pretty much this same noise, and I imagine it can be attributed to:
|
| let task_ctx = Arc::new(task_ctx_with_extension( | ||
| &task_ctx, | ||
| DistributedTaskContext { | ||
| task_index: doget.target_task_index as usize, | ||
| task_count: doget.target_task_count as usize, | ||
| }, | ||
| )); |
There was a problem hiding this comment.
After reworking a bit the protocol, this no longer needs to be set multiple times per execute_task call, it can be just set once at the instantiation of the task in set_plan
| let mut cfg = | ||
| SessionConfig::default().with_extension(Arc::new(DistributedTaskContext { | ||
| task_index: key.task_number as usize, | ||
| task_count: body.task_count as usize, | ||
| })); |
There was a problem hiding this comment.
All the information for building a DistributedTaskContext exists at the moment of calling set_plan so it can be set once here, rather than multiple times per execute_task call
# Conflicts: # benchmarks/cdk/bin/worker.rs # console/examples/cluster.rs # console/examples/console_worker.rs # src/lib.rs # src/observability/mod.rs # src/observability/service.rs # src/worker/worker_service.rs
Closes #373
This PR moves away from using the Arrow Flight protobuf specification as application protocol for communicating workers in favor of using a dedicated protobuf specification. The rationale behind that can be seen in #373.
This is a breaking API change that will prevent old workers work with new ones, so it should be carefully rolled out.
Part of the reason of moving to a dedicated protobuf specification is about commitment to API stability: we want to reduce as much as possible future breaking API changes, and we expect that maintaining a proper gRPC protobuf specification will help with that.
The migration should be simple:
into_flight_server->into_worker_server). Just following the Rust compiler here is enoughBesides just moving to a dedicated protocol, this PR does two additional things:
StageKeytoTaskKey, as it was uniquely identifying tasks and not stages 882bec2If we take into account that ~710 LOC are now automatically generated out of a .proto file, we are looking at a
+724-924net change, implying a total net reduction of lines of ~200, which sounds pretty good and expected.Remote benchmarks against
main.TL;DR: the same