docs: document EXPERIMENT_AUTO_UNLOCK_STATE_LOCK #3083

Open
ZachGoldberg wants to merge 2 commits into main from docs/auto-unlock-state-lock-experiment

Conversation

ZachGoldberg (Contributor) commented Apr 14, 2026

Summary

  • Adds documentation for the new Pipelines experiment flag PIPELINES_FEATURE_EXPERIMENT_AUTO_UNLOCK_STATE_LOCK.
  • The flag enables automatic force-unlock of stale Terraform/OpenTofu state locks when an AWS session timeout causes the lock-release step to fail mid-apply.
  • Single-unit only (run --all not supported yet). Default OFF.

Summary by CodeRabbit

  • Documentation
    • Added documentation for a new feature flag that enables automatic recovery of stale Terraform/OpenTofu state locks when pipeline apply operations fail due to expired AWS session credentials. The system detects the specific error pattern, retries with fresh credentials to identify and clear stale locks, and still surfaces the original apply error. Disabled by default, must be explicitly enabled, and only applies to single-unit commands.

Documents the new Pipelines experiment that automatically force-unlocks a stale
Terraform/OpenTofu state lock when an AWS session timeout causes the lock-release
step to fail mid-apply.  Currently single-unit only (no run --all support).
Default OFF.

vercel Bot commented Apr 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | Apr 14, 2026 5:38pm |


coderabbitai Bot commented Apr 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 86dfe743-5ce2-4b25-9241-f9da293320fc

📥 Commits

Reviewing files that changed from the base of the PR and between 46dbb30 and 035a5e7.

📒 Files selected for processing (1)
  • docs/2.0/reference/pipelines/feature-flags.md
✅ Files skipped from review due to trivial changes (1)
  • docs/2.0/reference/pipelines/feature-flags.md

Walkthrough

Documents a new feature flag, PIPELINES_FEATURE_EXPERIMENT_AUTO_UNLOCK_STATE_LOCK, that detects AWS ExpiredTokenException on failed state lock release and runs a recovery sequence using fresh credentials to detect and force-unlock the stale lock while still returning the original apply error.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Feature Flag Documentation: docs/2.0/reference/pipelines/feature-flags.md | Added documentation for PIPELINES_FEATURE_EXPERIMENT_AUTO_UNLOCK_STATE_LOCK: detection when apply output contains both `Error releasing the state lock` and `ExpiredTokenException`; recovery steps using `terragrunt plan -- -lock-timeout=0s` with fresh credentials, then `terragrunt force-unlock`; notes Terragrunt 1.0+ dependency, disabled by default, must be "true", and only supported for single-unit commands. |
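The detection rule summarized above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the Pipelines implementation: the function names are hypothetical, and treating the flag as an exact, case-sensitive match on "true" is an assumption drawn from the summary's wording.

```python
import os

FLAG_NAME = "PIPELINES_FEATURE_EXPERIMENT_AUTO_UNLOCK_STATE_LOCK"
LOCK_RELEASE_MARKER = "Error releasing the state lock"
EXPIRED_TOKEN_MARKER = "ExpiredTokenException"


def flag_enabled(env=None):
    """Return True only when the flag is set to the literal string "true".

    Assumption: the comparison is exact and case-sensitive.
    """
    env = os.environ if env is None else env
    return env.get(FLAG_NAME) == "true"


def should_attempt_unlock(apply_output, env=None):
    """Hypothetical helper: the flag must be on and BOTH markers must appear."""
    return (
        flag_enabled(env)
        and LOCK_RELEASE_MARKER in apply_output
        and EXPIRED_TOKEN_MARKER in apply_output
    )
```

Requiring both markers keeps the check narrow: a lock-release failure with any other cause, or an expired token that did not affect the lock, would not trigger recovery in this sketch.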

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Pipelines
    participant Terragrunt
    participant AWS_StateBackend
    Pipelines->>Terragrunt: run apply (original credentials)
    Terragrunt->>AWS_StateBackend: attempt release state lock
    AWS_StateBackend-->>Terragrunt: Error releasing the state lock + ExpiredTokenException
    Terragrunt-->>Pipelines: return apply error (original)
    Pipelines->>Terragrunt: run terragrunt plan -- -lock-timeout=0s with fresh creds
    Terragrunt->>AWS_StateBackend: detect stale lock ID
    Terragrunt->>AWS_StateBackend: terragrunt force-unlock (clear lock)
    Terragrunt-->>Pipelines: recovery attempted (apply error still returned)
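As a rough sketch of the sequence in the diagram, the recovery can be written as a function that takes its side effects as parameters. Everything here is an assumption for illustration: `run` stands in for whatever executes Terragrunt with fresh credentials, `extract_lock_id` stands in for the lock-ID parser, and passing the lock ID as a positional argument to `terragrunt force-unlock` is not confirmed by this PR.

```python
def recover_stale_lock(apply_error, run, extract_lock_id):
    """Attempt the unlock, then hand back the original apply error unchanged.

    run(args) -> str: hypothetical callable that runs a command with fresh
        credentials and returns its combined output.
    extract_lock_id(output) -> str or None: hypothetical parser for the
        stale lock ID.
    """
    # A zero lock timeout makes the plan fail fast if the lock is still held.
    plan_output = run(["terragrunt", "plan", "--", "-lock-timeout=0s"])
    lock_id = extract_lock_id(plan_output)
    if lock_id is not None:
        # Assumption: the lock ID is passed as a positional argument.
        run(["terragrunt", "force-unlock", lock_id])
    # Recovery never masks the failure: the caller still sees the apply error.
    return apply_error
```

Injecting `run` and `extract_lock_id` keeps the sketch testable without invoking Terragrunt, and makes the last step of the diagram explicit: the apply error is returned whether or not a lock was found.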

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🔓 Tokens lapse, the lock holds tight,
Fresh creds step in to make it right,
A plan detects the stubborn key,
A gentle force-unlock sets state free—
Doc'd and ready, tidy as light. ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding documentation for the EXPERIMENT_AUTO_UNLOCK_STATE_LOCK feature flag, which matches the raw summary and PR objectives. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai Bot left a comment

🧹 Nitpick comments (3)
docs/2.0/reference/pipelines/feature-flags.md (3)

129-133: Consider adding a safety warning about force-unlock.

Force-unlocking state locks is a powerful operation that, if misused, can lead to state corruption if multiple operations run concurrently. While the feature is designed to handle a specific safe scenario (expired AWS session), users should understand the risks.

Suggested note to add: "Note: This feature only unlocks when it's safe to do so (when the session that held the lock has expired). Manual force-unlock operations should be used with caution."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/2.0/reference/pipelines/feature-flags.md` around lines 129 - 133, Add a
concise safety warning after the paragraph that describes automatic force-unlock
(the list item mentioning "terragrunt plan -- -lock-timeout=0s" and "terragrunt
force-unlock") clarifying that force-unlocking is powerful and can corrupt state
if used during concurrent operations; include the suggested sentence: "Note:
This feature only unlocks when it's safe to do so (when the session that held
the lock has expired). Manual force-unlock operations should be used with
caution." and ensure the note also reiterates that the feature only applies to
single-unit commands and is not supported with "run --all".

129-130: Consider restructuring for better readability.

The main description is a single complex sentence covering detection logic, workflow steps, and behavior. Breaking this into a bulleted list or shorter sentences would improve clarity for users.

Example structure:

  • When it activates: Detects both Error releasing the state lock and ExpiredTokenException in apply output
  • What it does: Automatically runs terragrunt plan -- -lock-timeout=0s with fresh credentials to identify the lock ID, then runs terragrunt force-unlock to clear it
  • Important: The original apply error is still returned; the unlock only clears the lock for the next run
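The "identify the lock ID" step in the structure above could look something like the following. This is a hedged sketch only: it assumes the failed plan prints a `Lock Info:` block containing an `ID:` line, as Terraform and OpenTofu typically do when a lock cannot be acquired. The PR does not show how Pipelines parses this output, and the sample text and lock ID below are made up.

```python
import re


def extract_lock_id(plan_output):
    """Hypothetical parser: return the lock ID after "Lock Info:", or None."""
    _, found, after = plan_output.partition("Lock Info:")
    if not found:
        return None
    # Search only after the "Lock Info:" header, so log prefixes on each line
    # (an assumption about wrapped output) do not prevent a match.
    match = re.search(r"\bID:\s+(\S+)", after)
    return match.group(1) if match else None


# Made-up sample output; real output may differ in layout and content.
SAMPLE = """Error: Error acquiring the state lock

Lock Info:
  ID:        3f2a1c9e-0000-4000-8000-example
  Path:      example-bucket/example.tfstate
  Operation: OperationTypeApply
"""
```

Returning `None` when no lock block is present lets a caller skip the force-unlock entirely, which matches the idea that the unlock runs only when a stale lock is actually detected.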
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/2.0/reference/pipelines/feature-flags.md` around lines 129 - 130, Split
the long single-sentence description into a short, clear bulleted or
multi-sentence structure: create a "When it activates" line referencing the
detection of both "Error releasing the state lock" and "ExpiredTokenException"
in apply output, a "What it does" line describing that Pipelines runs
`terragrunt plan -- -lock-timeout=0s` with fresh credentials to discover the
stale lock ID and then runs `terragrunt force-unlock` to release it, and an
"Important" note stating that the original apply error is still returned and the
unlock only clears the lock for the next run; keep the existing terms
("terragrunt plan -- -lock-timeout=0s", "terragrunt force-unlock", "Error
releasing the state lock", "ExpiredTokenException") so readers can easily match
the behavior to the implementation.

130-130: Clarify that this detection is AWS-specific.

The feature detects ExpiredTokenException, which is specific to AWS session timeouts. Consider noting this explicitly so users with other cloud providers (GCP, Azure) or local backends understand the scope.

Suggested addition: "This detection is specific to AWS session timeouts; similar lock issues with other providers will not trigger automatic unlock."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/2.0/reference/pipelines/feature-flags.md` at line 130, Update the
feature description in docs/2.0/reference/pipelines/feature-flags.md to
explicitly state the detection is AWS-specific: add a clarifying sentence after
the paragraph that mentions ExpiredTokenException (e.g., "This detection is
specific to AWS session timeouts; similar lock issues with other providers will
not trigger automatic unlock.") so readers using GCP, Azure, or local backends
understand the limitation; ensure the new sentence references the existing
mention of ExpiredTokenException and the terragrunt force-unlock flow.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: def7d86e-a78c-49e0-bfd5-f857b879a278

📥 Commits

Reviewing files that changed from the base of the PR and between 001f82f and 46dbb30.

📒 Files selected for processing (1)
  • docs/2.0/reference/pipelines/feature-flags.md
