Provide environment information
System:
- OS: Linux 6.12 Debian GNU/Linux 11 (bullseye) 11 (bullseye)
- CPU: (4) x64 AMD EPYC 7R13 Processor
- Memory: 4.57 GB / 7.55 GB
- Container: Yes
- Shell: 5.1.4 - /bin/bash
Binaries:
- Node: 20.11.1 - /usr/local/bin/node
- npm: 10.2.4 - /usr/local/bin/npm
- pnpm: 8.15.5 - /usr/local/bin/pnpm
Deployment:
- Trigger.dev: v4.0.0-beta.23
- Helm Chart: v4.0.0-beta.18
- Registry: AWS ECR
- Authentication: EKS IRSA (IAM Roles for Service Accounts)
Describe the bug
When running npx trigger.dev@v4-beta deploy against a self-hosted instance on EKS using IRSA for ECR authentication, deployments fail if DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN is not set.
The CLI returns the following error:
Failed to start deployment: Failed to get deployment image ref
And the webapp logs show a ValidationError from the AWS SDK:
{
"assumeRole":{},
"sessionName":"TriggerWebappECRAccess_1753172908266_70oxc8",
"error":"1 validation error detected: Value null at 'roleArn' failed to satisfy constraint: Member must not be null",
"http":{"requestId":"P4kJ62bmgEtC8hVTFmwrk","path":"/api/v1/deployments"},
"level":"error",
"message":"Failed to assume role"
}
{
"cause":"1 validation error detected: Value null at 'roleArn' failed to satisfy constraint: Member must not be null",
"level":"error",
"message":"Failed to get deployment image ref"
}
This seems related to the code always attempting an AssumeRole operation, even when the pod already has the necessary ECR permissions via IRSA's default credential chain. This requires users to configure workarounds that add complexity and may differ from typical practices for IRSA.
Root Cause
The implementation in initializeDeployment.server.ts unconditionally passes an assumeRole object to getDeploymentImageRef, even when the corresponding environment variables are undefined:
// This always creates an object, never undefined
assumeRole: {
  roleArn: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN, // undefined in my case
  externalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID, // also undefined
}
This prevents the code from using the default credential chain and causes the AWS SDK to throw a ValidationError because roleArn is null.
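To illustrate why the object form is a problem, here is a minimal standalone sketch. The `env` object below is a hypothetical stand-in for the webapp's env, and I am assuming the callee decides whether to assume a role based on whether `assumeRole` is present; I have not confirmed how `getDeploymentImageRef` makes that decision.

```typescript
// Hypothetical stand-in for the webapp's env object; both variables are unset.
const env: {
  DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN?: string;
  DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID?: string;
} = {};

// Same shape as the object built in initializeDeployment.server.ts.
const assumeRole = {
  roleArn: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN,
  externalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID,
};

// An object literal is always truthy, so a presence check such as
// `if (assumeRole)` passes even though no role ARN was configured.
const objectIsTruthy = Boolean(assumeRole);

// JSON.stringify omits properties whose value is undefined, which is
// consistent with the "assumeRole":{} entry in the webapp log above.
const serialized = JSON.stringify(assumeRole);

console.log(objectIsTruthy, serialized); // prints: true {}
```

If the callee does gate on the object rather than on `roleArn`, this would explain both the empty `assumeRole` in the log and the null `roleArn` reaching STS.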
Use Case Comparison
It seems there might be two different use cases for ECR integration:
- Trigger.dev Cloud (Cross-Account): AssumeRole is necessary for the central webapp to access a user's ECR in another account. This works as expected.
- Self-Hosted on EKS (Same Account): The webapp and ECR are in the same account, and the pod already has direct permissions via IRSA. In this common setup, an AssumeRole operation is typically not needed.
The current implementation appears to be designed for the first use case.
Reproduction repo
Not applicable - this is a deployment configuration issue.
To reproduce
- Create IRSA for Trigger.dev on EKS:
  eksctl create iamserviceaccount \
    --cluster=my-cluster \
    --namespace=trigger-dev \
    --name=trigger-dev-webapp \
    --attach-policy-arn=arn:aws:iam::<ACCOUNT_ID>:policy/TriggerDevECRAccess \
    --approve
- Deploy Trigger.dev with Helm and Kustomize:
  values.yaml:
    registry:
      deploy: false
      repositoryNamespace: "trigger"
      external:
        host: "<ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com"
        auth:
          enabled: false
  kustomization.yaml:
    patches:
      - path: sa-patch.yaml
        target:
          kind: ServiceAccount
          name: trigger-dev-webapp
  sa-patch.yaml:
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: trigger-dev-webapp
      annotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::<ACCOUNT_ID>:role/eksctl-my-cluster-addon-iamserviceaccount-trigger-dev-webapp"
- Attempt to deploy a project:
  npx trigger.dev@v4-beta deploy
- Observe the ValidationError in the webapp logs.
Expected Behavior
When DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN is not set, npx trigger.dev@v4-beta deploy should succeed by using the pod's default AWS credential chain (provided by IRSA), without attempting an AssumeRole operation.
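As a rough sketch of that behavior on the consuming side, the credential decision could key off `roleArn` itself rather than the presence of the `assumeRole` object. The type and function names below (`AssumeRoleConfig`, `CredentialStrategy`, `chooseCredentialStrategy`) are hypothetical and only for illustration; they are not taken from the Trigger.dev codebase.

```typescript
// Hypothetical config shape; the real webapp types may differ.
type AssumeRoleConfig = { roleArn?: string; externalId?: string };

// Hypothetical result type describing which credentials to use for ECR.
type CredentialStrategy =
  | { kind: "default-chain" } // e.g. IRSA via the pod's service account
  | { kind: "assume-role"; roleArn: string; externalId?: string };

function chooseCredentialStrategy(assumeRole?: AssumeRoleConfig): CredentialStrategy {
  // Treat a missing object, a missing ARN, and a blank ARN the same way.
  const roleArn = assumeRole?.roleArn?.trim();
  if (!roleArn) {
    return { kind: "default-chain" };
  }

  // Only include externalId when it was actually configured.
  const externalId = assumeRole?.externalId;
  return externalId
    ? { kind: "assume-role", roleArn, externalId }
    : { kind: "assume-role", roleArn };
}

// With no ARN configured, the default credential chain is selected.
console.log(chooseCredentialStrategy({ roleArn: undefined }).kind);
```

A guard like this would keep the cross-account path unchanged while letting same-account IRSA setups skip the STS call, even if a caller still passes an object with undefined fields.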
Additional information
Suggested Fix
A possible solution could be to only construct the assumeRole object when the DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN environment variable is explicitly set. This would allow the default credential chain to be used when the variable is absent.
// In apps/webapp/app/v3/services/initializeDeployment.server.ts
+ const assumeRole = env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN
+   ? {
+       roleArn: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN,
+       externalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID,
+     }
+   : undefined;

  const [imageRefError, imageRefResult] = await tryCatch(
    getDeploymentImageRef({
      // ... other params
-     assumeRole: {
-       roleArn: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN,
-       externalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID,
-     },
+     assumeRole,
    })
  );
Discussion on Workarounds
Without this change, users in a same-account IRSA setup need to implement workarounds that have notable drawbacks:
- Setting the ARN to the pod's own role: This requires configuring the role's trust policy to allow self-assumption, which is an uncommon pattern and adds extra STS API calls.
- Creating a dedicated intermediate role: This adds the complexity of creating and maintaining an additional IAM role.
Both approaches seem to add unnecessary complexity for a standard self-hosted EKS setup. It would be great if Trigger.dev could natively support the direct use of IRSA credentials, which aligns with the "optional STS assume role support" mentioned in PR #2224.
@nicktrn