
Conversation

@jaypipes

This commit lays the API groundwork for the serverless workers feature. In order for something other than the client/SDK to "manage" a Worker using a serverless compute platform like AWS Lambda, we need somewhere to store configuration settings such as the Lambda Function ARN and the ARN of the IAM Role that needs to be assumed when invoking the Lambda Function.

The PR adds a new field compute_provider to WorkerDeploymentInfo that contains a temporal.api.deployment.v1.ComputeProvider message. This message is a simple wrapper around a blob of bytes that contains configuration settings for a worker lifecycle controller.

The configuration settings for the worker lifecycle controller are specific to different compute providers. The first compute provider we are modeling is AWS Lambda, and the ComputeProviderAWSLambda message houses the ARN of the AWS Lambda Function to be invoked and an optional ARN of the IAM Role that should be assumed by a worker control plane controller when invoking the function.
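For orientation, here is a rough proto3 sketch of the shape described above. It is an illustration only: provider_details and invoke appear in the diff further down, but the assume_role_arn field name is an assumption, not taken from the PR.

message ComputeProvider {
  // Opaque, provider-specific configuration consumed by a worker lifecycle controller.
  temporal.api.common.v1.Payload provider_details = 1;
}

message ComputeProviderAWSLambda {
  // Qualified or unqualified ARN of the AWS Lambda Function to invoke.
  string invoke = 1;
  // Optional ARN of the IAM Role to assume when invoking the function
  // (assumed field name, for illustration only).
  string assume_role_arn = 2;
}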

@jaypipes jaypipes requested review from a team as code owners January 28, 2026 15:30
@CLAassistant

CLAassistant commented Jan 28, 2026

CLA assistant check
All committers have signed the CLA.

@jaypipes jaypipes force-pushed the jaypipes/compute-provider branch 2 times, most recently from 988feaa to 2efdaa8 Compare January 28, 2026 15:49
Adds a new field `compute_provider` to `WorkerDeploymentInfo` that
contains a `temporal.api.deployment.v1.ComputeProvider` message. This
message is a simple wrapper around a blob of bytes that contains
configuration settings for a worker lifecycle controller.

The configuration settings for the worker lifecycle controller are
specific to different compute providers. The first compute provider we
are modeling is AWS Lambda, and the `ComputeProviderAWSLambda` message
houses the ARN of the AWS Lambda Function to be invoked and an optional
ARN of the IAM Role that should be assumed by a worker control plane
controller when invoking the function.

Signed-off-by: Jay Pipes <jay.pipes@temporal.io>
@jaypipes jaypipes force-pushed the jaypipes/compute-provider branch from 2efdaa8 to 5a5db55 Compare January 28, 2026 17:10
// on a TaskQueue that has no active pollers in the last five minutes, a
// serverless worker lifecycle controller might need to invoke an AWS Lambda
// Function that itself ends up calling the SDK's worker.New() function.
message ComputeProvider {


can this have a field like provider_type or similar to identify who should interpret the details?

Member

Mentioned in adjacent comment at https://github.com/temporalio/api/pull/704/changes#r2737966940, but I think the config about which control plane this goes to is missing atm

Contributor

google.protobuf.Any is made for this sort of situation, but it doesn't handle the encryption and other payload metadata. Maybe we should say the Payload contains an Any, which contains the underlying config plus a type url?

Member
@cretz cretz Jan 29, 2026

Payloads for proto messages actually do put their type in the metadata just like Any, so they work for this use case, but I am going to be making a comment down there right now concerning issues with using proto here.
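For context on this point: a Payload produced from a proto message typically carries the fully qualified message name in its metadata, similar to Any's type_url. An illustrative example (not taken from the PR) of what such a Payload could look like for the AWS Lambda details:

metadata["encoding"]    = "binary/protobuf"
metadata["messageType"] = "temporal.api.deployment.v1.ComputeProviderAWSLambda"
data                    = <serialized ComputeProviderAWSLambda bytes>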


is this change intentional?

Author

no... it happened when I ran make buf-lint locally :(

Member
@cretz cretz left a comment

Looking good (and simple which is awesome). This cannot really be merged to main anytime soon until the implementation using it has basically been completely built due to concerns of this being mutated as we work on the impl.

// Function via its ARN. This will start the Worker that is housed within that
// AWS Lambda Function and the Worker will begin listening to Tasks on the
// WorkerDeployment's configured TaskQueue.
message ComputeProviderAWSLambda {
Member

Suggested change
message ComputeProviderAWSLambda {
message ComputeProviderDetailsAWSLambda {

Because this is not a "compute provider" as we've now defined it: a "compute provider" as we've now defined it is "compute provider details" + "scaling config" + "endpoint config".

// on a TaskQueue that has no active pollers in the last five minutes, a
// serverless worker lifecycle controller might need to invoke an AWS Lambda
// Function that itself ends up calling the SDK's worker.New() function.
message ComputeProvider {
Member

I think even as placeholders, we should go ahead and add fields that represent scaling algo config and control plane Nexus endpoint config (even if for many they will use a "system" one that we run)
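A sketch of what those placeholders could look like; the field and message names here (scaling, WorkerScalingConfig, control_plane_endpoint) are illustrative assumptions, not part of the PR:

message ComputeProvider {
  // Opaque, provider-specific details (e.g. the AWS Lambda configuration).
  temporal.api.common.v1.Payload provider_details = 1;
  // Placeholder: configuration for the scaling algorithm the worker
  // lifecycle controller should apply (hypothetical message type).
  WorkerScalingConfig scaling = 2;
  // Placeholder: Nexus endpoint of the control plane that interprets
  // provider_details; a "system" endpoint could apply when unset.
  string control_plane_endpoint = 3;
}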


// Contains the new worker compute provider configuration for the
// WorkerDeployment.
temporal.api.deployment.v1.ComputeProvider compute_provider = 20;
Member
@cretz cretz Jan 28, 2026

I assume the absence/null of this represents removing compute provider?

Comment on lines +922 to +923
// Creates or replaces the compute provider configuration for a Worker
// Deployment.
Member

Needs to be pretty clear/obvious here or somewhere that this completely changes how task queues operate and completely changes the worker paradigm.

I fear the triviality of the API/CLI names and such doesn't convey the significance of this "switch flip" that completely changes how these task queues will work.

Contributor

Does it? Can these task queues not be polled like normal? (as a "base load" below burstable compute) I know we're not going to focus on that case at first but will we disable it?

Member
@cretz cretz Jan 29, 2026

I think for the proper server-side worker lifecycle management and clearer understanding by users, a task queue needs to use server-owned workers or user-owned workers. Having a hybrid can cause unpredictability and shift expected numbers, known workers, etc. If a base load is needed, why can't the server control the lifetimes of those workers too (or at least know about them)? If that means that you just tell the server that these may exist, that can make sense maybe. But to have the server not even know about workers on a task queue where it expects to control workers is going to cause confusion and unpredictability IMO.

I'd go as far as saying from a server POV a worker should not be allowed to poll on a task queue unless it was a worker the server configured. This way you don't have silent processing workers out there the server can't account for in its scaling/versioning logic.


message ComputeProviderAWSLambda {
// The Qualified or Unqualified ARN of the AWS Lambda Function to be
// invoked.
string invoke = 1;
Contributor

maybe:

Suggested change
string invoke = 1;
string invoke_arn = 1;

without arn in the name I would have assumed this is an http url
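For reference, standard AWS Lambda ARN formats (not specific to this PR): an unqualified ARN names only the function, while a qualified ARN appends a version or alias suffix.

arn:aws:lambda:us-east-1:123456789012:function:my-worker         (unqualified)
arn:aws:lambda:us-east-1:123456789012:function:my-worker:42      (qualified, version)
arn:aws:lambda:us-east-1:123456789012:function:my-worker:PROD    (qualified, alias)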


// An individual ComputeProviderAWSLambda,
// ComputeProviderGoogleCloudRun, etc will be encrypted and
// serialized into compute_provider_config.
temporal.api.common.v1.Payload provider_details = 1;
Member
@cretz cretz Jan 29, 2026

I wanted to add a note here based on internal discussions. This can only really be a payload for custom providers, because people have "encrypt every payload" proxies and cloud side needing to read these prevents people from encrypting this.

But there is a problem using proto at all here IMO, see comment below at https://github.com/temporalio/api/pull/704/changes#r2741776135.

So we should have provider type w/ a oneof I think:

Suggested change
temporal.api.common.v1.Payload provider_details = 1;
// Passed to provider Nexus endpoint to identify the contents of the provider details
string provider_type = 1;
oneof provider_details {
  // Non-encoded provider details, used in situations where decoding cannot occur
  bytes provider_details_open = 2;
  // Encoded provider details
  temporal.api.common.v1.Payload provider_details_payload = 3;
}

Note how provider_details_open is bytes and not google.protobuf.Any here
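An illustrative value under that suggestion (field contents hypothetical): a built-in provider uses the open form so a cloud-side controller can read the details without a payload codec or proto registry, while custom providers can still use the Payload form.

provider_type: "aws-lambda"
provider_details_open: '{"invoke_arn": "arn:aws:lambda:us-east-1:123456789012:function:my-worker", "role_arn": "arn:aws:iam::123456789012:role/temporal-invoker"}'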

// Function via its ARN. This will start the Worker that is housed within that
// AWS Lambda Function and the Worker will begin listening to Tasks on the
// WorkerDeployment's configured TaskQueue.
message ComputeProviderAWSLambda {
Member

IMO these should not be protos, these should be JSON schema defined as part of a Nexus IDL (https://github.com/nexus-rpc/nexus-rpc-gen). People should not have to use proto to implement a compute provider accepting this (including us).
