Add new env var for sqsMsgVisibilityTimeoutSec #1220

tiationg-kho · 2025-12-30T01:24:06Z

Issue #, if available:
#1205

Description of changes:
Add a new env var, sqsMsgVisibilityTimeoutSec, for SQS mode so that the visibility timeout sqs-monitor uses when receiving messages is configurable.

A quick recap of how NTH handles an ASG launch event:

ASG launch event arrives → NTH starts processing it → converts it into an internal interruption event
- if the node is ready → send CONTINUE to the lifecycle hook → delete the SQS message
- if the node is not ready → cancel the internal event → wait for the SQS visibility timeout → the same ASG launch event is received again → (once the message reaches the queue's maxReceiveCount, it moves to the DLQ)
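The retry loop above can be modelled with a small Go sketch. It is an idealized model, not NTH code: it assumes each receive happens exactly one visibility timeout after the previous one (ignoring polling and processing latency), uses a hypothetical node-ready delay `readyAfterSec`, and treats `maxReceiveCount <= 0` as "no DLQ configured". The function name `simulateLaunchEvent` is illustrative only.

```go
package main

import "fmt"

// simulateLaunchEvent is a hypothetical model of the launch-event retry loop.
// Assumptions: receives are spaced exactly visibilityTimeoutSec apart, latency
// is ignored, and maxReceiveCount <= 0 means no DLQ is configured.
// It returns how many times the message was processed and whether it ended in the DLQ.
func simulateLaunchEvent(visibilityTimeoutSec, readyAfterSec, maxReceiveCount int) (receives int, toDLQ bool) {
	if visibilityTimeoutSec <= 0 {
		panic("this model requires a positive visibility timeout")
	}
	for attempt := 1; ; attempt++ {
		// SQS redrives the message instead of delivering it once the
		// receive count would exceed maxReceiveCount.
		if maxReceiveCount > 0 && attempt > maxReceiveCount {
			return attempt - 1, true
		}
		receivedAt := (attempt - 1) * visibilityTimeoutSec
		if receivedAt >= readyAfterSec {
			// Node is ready: CONTINUE is sent and the message is deleted.
			return attempt, false
		}
		// Node not ready: the internal event is cancelled and the message
		// becomes visible again after the visibility timeout.
	}
}

func main() {
	// Hypothetical node-ready delay of 35 seconds.
	fmt.Println(simulateLaunchEvent(20, 35, 0))
	fmt.Println(simulateLaunchEvent(10, 35, 3))
	fmt.Println(simulateLaunchEvent(10, 35, 10))
}
```

With a hypothetical 35-second node-ready delay, this model produces the same attempt counts as Test1 to Test3 below (3 tries and success; 3 tries and DLQ; 5 tries and success). It is meant only to show how the visibility timeout and maxReceiveCount together bound the retry window.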

So it is better to make the SQS visibility timeout configurable, which can mitigate issue #1205.

Solution:

We recommend setting a higher sqsMsgVisibilityTimeoutSec value, such as 40 seconds, to override the 20-second default in our sqs-monitor.
If applicable, we also recommend increasing the SQS maxReceiveCount for the DLQ, e.g. to 5.
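The two knobs trade off against each other. Under the same idealized assumption that receives are spaced exactly one visibility timeout apart, the last processing attempt before redrive happens at (maxReceiveCount − 1) × visibilityTimeout, so 40 s with maxReceiveCount 5 gives the node roughly 160 s to become ready. The Go sketch below computes that window and, for a hypothetical expected node-ready delay, the smallest visibility timeout that covers it; the function names are illustrative, and 43200 s is the SQS maximum visibility timeout.

```go
package main

import (
	"errors"
	"fmt"
)

// SQS rejects visibility timeouts above 12 hours.
const maxVisibilityTimeoutSec = 43200

// lastAttemptSec returns when the final receive happens before the message is
// redriven to the DLQ, assuming receives are spaced exactly one timeout apart.
func lastAttemptSec(visibilityTimeoutSec, maxReceiveCount int) int {
	return (maxReceiveCount - 1) * visibilityTimeoutSec
}

// minVisibilityTimeoutSec returns the smallest timeout for which the last
// receive happens no earlier than readyAfterSec (a hypothetical node-ready delay).
func minVisibilityTimeoutSec(readyAfterSec, maxReceiveCount int) (int, error) {
	if readyAfterSec <= 0 {
		return 0, nil // node is ready on the first receive
	}
	if maxReceiveCount < 2 {
		return 0, errors.New("maxReceiveCount must be at least 2 to allow a retry")
	}
	retries := maxReceiveCount - 1
	v := (readyAfterSec + retries - 1) / retries // ceiling division
	if v > maxVisibilityTimeoutSec {
		return 0, fmt.Errorf("required timeout %ds exceeds the SQS maximum of %ds", v, maxVisibilityTimeoutSec)
	}
	return v, nil
}

func main() {
	fmt.Println(lastAttemptSec(40, 5))
	fmt.Println(minVisibilityTimeoutSec(90, 5))
}
```

In practice polling and processing latency stretch the real window somewhat, so the computed value is a lower bound rather than an exact setting.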

How you tested your changes:
Environment (Linux / Windows):
Kubernetes Version:

Test1:

Default sqsMsgVisibilityTimeoutSec (20 sec), no DLQ configured for the SQS queue -> succeeded after 3 tries

Test2:

sqsMsgVisibilityTimeoutSec set to 10 sec and SQS maxReceiveCount for the DLQ set to 3 -> failed after 3 tries

Test3:

sqsMsgVisibilityTimeoutSec set to 10 sec and SQS maxReceiveCount for the DLQ set to 10 -> succeeded after 5 tries

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

tiationg-kho · 2025-12-30T01:24:56Z

Hi @LikithaVemulapalli @stevegocoding, could you help me take a look?

LikithaVemulapalli · 2025-12-30T23:38:19Z

pkg/monitor/sqsevent/sqs-monitor.go

 func (m SQSMonitor) receiveQueueMessages(qURL string) ([]*sqs.Message, error) {
+	visibilityTimeout := m.SqsMsgVisibilityTimeoutSec
+	if visibilityTimeout <= 0 {
+		visibilityTimeout = 20


Can we import the default constant from config instead of hardcoding 20 here, so that there is a single source of truth?
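A minimal sketch of what the suggestion could look like, collapsed into one file for illustration. In the PR the default would live in pkg/config and be exported so sqs-monitor can import it; the exported name `SqsMsgVisibilityTimeoutSecDefault` and the helper `effectiveVisibilityTimeout` are hypothetical. The `int64` return type assumes the value ends up in an SDK field typed as `int64`.

```go
package main

import "fmt"

// Hypothetical exported default; in the PR this would be defined once in
// pkg/config and imported by sqs-monitor instead of repeating the literal 20.
const SqsMsgVisibilityTimeoutSecDefault = 20

// effectiveVisibilityTimeout mirrors the fallback in receiveQueueMessages,
// but refers to the shared default rather than a hardcoded value.
func effectiveVisibilityTimeout(configured int) int64 {
	if configured <= 0 {
		return SqsMsgVisibilityTimeoutSecDefault
	}
	return int64(configured)
}

func main() {
	fmt.Println(effectiveVisibilityTimeout(0), effectiveVisibilityTimeout(40))
}
```

Like the quoted diff, this fallback treats an explicit 0 as "unset". If 0 is accepted as a valid configured value once range validation is added, the check would need a way to tell "unset" apart from 0.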

LikithaVemulapalli · 2025-12-30T23:43:59Z

pkg/config/config.go

 	flag.BoolVar(&config.UseAPIServerCacheToListPods, "use-apiserver-cache", getBoolEnv(useAPIServerCache, false), "If true, leverage the k8s apiserver's index on pod's spec.nodeName to list pods on a node, instead of doing an etcd quorum read.")
 	flag.IntVar(&config.HeartbeatInterval, "heartbeat-interval", getIntEnv(heartbeatIntervalKey, -1), "The time period in seconds between consecutive heartbeat signals. Valid range: 30-3600 seconds (30 seconds to 1 hour).")
 	flag.IntVar(&config.HeartbeatUntil, "heartbeat-until", getIntEnv(heartbeatUntilKey, -1), "The duration in seconds over which heartbeat signals are sent. Valid range: 60-172800 seconds (1 minute to 48 hours).")
+	flag.IntVar(&config.SqsMsgVisibilityTimeoutSec, "sqs-msg-visibility-timeout-sec", getIntEnv(sqsMsgVisibilityTimeoutSecConfigKey, sqsMsgVisibilityTimeoutSecDefault), "Duration in seconds that a message is hidden from other consumers after being retrieved from the SQS queue by sqs-monitor.")


I believe we should add validation for this SQS visibility timeout field. The valid range is 0 to 43200 seconds (12 hours); SQS returns an error for anything above 12 hours. Could you include the range in all the comments for this field, and also add validation that throws an error if config.SqsMsgVisibilityTimeoutSec < 0 || config.SqsMsgVisibilityTimeoutSec > 43200? Also, the default per this doc is 30 seconds.
For reference: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html
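A minimal Go sketch of the requested range check, assuming it runs wherever the other config values are validated after flag parsing. The function and constant names are hypothetical; the bounds (0 to 43200 seconds) come from the comment above.

```go
package main

import "fmt"

const (
	minSqsMsgVisibilityTimeoutSec = 0     // SQS minimum visibility timeout
	maxSqsMsgVisibilityTimeoutSec = 43200 // SQS maximum visibility timeout (12 hours)
)

// validateSqsMsgVisibilityTimeout returns an error for values SQS would reject.
func validateSqsMsgVisibilityTimeout(v int) error {
	if v < minSqsMsgVisibilityTimeoutSec || v > maxSqsMsgVisibilityTimeoutSec {
		return fmt.Errorf("invalid sqs-msg-visibility-timeout-sec %d: valid range is %d-%d seconds (0 seconds to 12 hours)",
			v, minSqsMsgVisibilityTimeoutSec, maxSqsMsgVisibilityTimeoutSec)
	}
	return nil
}

func main() {
	for _, v := range []int{-1, 0, 40, 43200, 43201} {
		fmt.Println(v, validateSqsMsgVisibilityTimeout(v))
	}
}
```

Failing fast at startup surfaces a misconfiguration immediately, instead of as a ReceiveMessage error later in sqs-monitor.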

LikithaVemulapalli · 2025-12-30T23:46:36Z

config/helm/aws-node-termination-handler/README.md

 | `topologySpreadConstraints`  | [Topology Spread Constraints](https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/) for pod scheduling. Useful with a highly available deployment to reduce the risk of running multiple replicas on the same Node      | `[]`                                   |
 | `heartbeatInterval`  | The time period in seconds between consecutive heartbeat signals. Valid range: 30-3600 seconds (30 seconds to 1 hour). | `-1`                                   |
 | `heartbeatUntil`  | The duration in seconds over which heartbeat signals are sent. Valid range: 60-172800 seconds (1 minute to 48 hours). | `-1`                                   |
+| `sqsMsgVisibilityTimeoutSec`  | Duration in seconds that a message is hidden from other consumers after being retrieved from the SQS queue by sqs-monitor. | `20`                                   |


Please update the description here as well once the range is included.

Add new env var for sqsMsgVisibilityTimeoutSec

53a73ce

tiationg-kho requested a review from a team as a code owner December 30, 2025 01:24

Fix visibilityTimeout check

f7417cc

tiationg-kho assigned tiationg-kho and unassigned tiationg-kho Dec 30, 2025

tiationg-kho requested a review from LikithaVemulapalli December 30, 2025 19:06

LikithaVemulapalli requested changes Dec 30, 2025


Add valid range check for SqsMsgVisibilityTimeoutSec

30ebd0b
