fix(install): ADO Entra ID auth path + --update pre-flight abort (#1015)#1031
fix(install): ADO Entra ID auth path + --update pre-flight abort (#1015)#1031danielmeppiel merged 11 commits intomainfrom
Conversation
…-> v0.71.1 Replace the on.steps: 'exit 1' label-name guards in pr-review-panel and triage-panel with top-level frontmatter 'if:' fields. gh-aw propagates top-level 'if:' to BOTH the pre_activation and activation jobs, so unmatched label events now render as a clean gray Skipped status instead of red Failed (which was polluting the CI dashboard on every PR labeled with anything other than 'panel-review', and on every issue labeled with anything other than 'status/needs-triage'). Workaround for the lack of native label-name filtering on pull_request_target / issues 'labeled' triggers. Both .md files now carry a TODO marker pointing at github/gh-aw ADR-28737, which adds a first-class 'on.labels:' filter (committed 2026-04-27, post-v0.71.1, not yet released). Once released, both gates can collapse to 'on.labels: [<name>]'. Also bump gh-aw v0.68.3 -> v0.71.1 (latest released) and recompile all workflows. Other lock.yml files and agentics-maintenance.yml change only because of the setup-action SHA bump and the regenerated maintenance-workflow template; no behavioural change there. Repro of the original noise: https://github.com/microsoft/apm/actions/runs/25089778042 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolved conflicts in generated files by taking main's versions, then re-running 'gh aw compile' to re-apply the panel/triage label-gate 'if:' fields and re-bump setup-cli SHA to v0.71.1. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Typed exception for remote-host auth failures (PAT rejected, bearer rejected, no credentials available). Carries a pre-rendered diagnostic_context from build_error_context so the renderer can display actionable guidance on the default output path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three defects fixed: 1. Pass auth_scheme from _dep_ctx to _build_repo_url so bearer tokens use extraheader injection instead of embedding JWT in URL userinfo. 2. Merge _dep_ctx.git_env (GIT_CONFIG_COUNT/KEY_0/VALUE_0 overrides) into the subprocess env so git ls-remote sends the Authorization header for AAD bearer tokens. 3. Detect auth-related failures (401/403/Authentication failed) from git ls-remote stderr and raise AuthenticationError with build_error_context diagnostics instead of returning bare False. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
#1015) - Add AuthenticationError to the re-raise chain (alongside PolicyViolationError, DirectDependencyError, PathTraversalError) so auth failures are not wrapped as "Failed to resolve APM dependencies: ...". - Add _preflight_auth_check(): when update_refs is set, run one git ls-remote per distinct (host, org) cluster BEFORE any write phase. Aborts with AuthenticationError carrying "No files were modified" + build_error_context diagnostics. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add except AuthenticationError handlers at both install entry points in commands/install.py. Renders the diagnostic_context (from build_error_context) on the DEFAULT path -- not --verbose-gated. Catches AuthenticationError BEFORE the generic Exception handler so the "Failed to install APM dependencies:" double-wrap is avoided. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…1015) - test_errors.py: AuthenticationError attribute roundtrip, isinstance - test_validation_ado_bearer.py: bearer scheme passed to _build_repo_url, git_env merged into subprocess, 401/403 raise AuthenticationError, DNS failure returns False, PAT regression (basic scheme embeds token) - test_pipeline_auth_preflight.py: --update pre-flight rejects bad auth with "No files were modified", passes good auth, skips github.com, deduplicates (host, org) clusters - test_install_cmd_auth_rendering.py: AuthenticationError not caught by PolicyViolationError, bypasses generic RuntimeError wrap - Fix: let AuthenticationError propagate through outer try/except in _validate_package_exists (the catch-all was for parse failures only) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- bearer-only install (no PAT) succeeds via az cli - bogus PAT + no az -> actionable diagnostic, no legacy wording - --update pre-flight abort: "No files were modified", files untouched - explicit PAT regression after bearer fix Note: integration tests require live ADO access + az login; skipped in CI unless APM_TEST_ADO_BEARER=1 and tenant context is configured. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ance - Reorder ADO quick-start to recommend `az login` first, PAT second - Add tenant-discovery guidance (org settings URL + az account show) - Document auth-failure diagnostics and --update pre-flight behavior - Mirror changes to skill resource (authentication.md sync) Closes #1015 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes Azure DevOps (ADO) Entra ID (AAD) bearer authentication plumbing during install/validation, introduces a --update auth preflight to abort before destructive writes on auth failure, and updates docs/workflows/tests to cover the regression and new behavior.
Changes:
- Plumbs
auth_schemeand per-dependencygit_envinto ADO/GHES validation and raises a typedAuthenticationErrorfor auth-specific failures. - Adds an
--updatepreflight auth probe in the install pipeline and rendersAuthenticationErrordiagnostics on the default CLI output path. - Adds unit/integration coverage plus docs + changelog updates; updates gh-aw workflow gating to “skip not fail” for non-matching labels.
Show a summary per file
| File | Description |
|---|---|
src/apm_cli/install/validation.py |
Passes auth_scheme, merges git_env, and raises AuthenticationError on auth failures during repo validation. |
src/apm_cli/install/pipeline.py |
Adds _preflight_auth_check() and re-raises AuthenticationError to avoid generic wrapping. |
src/apm_cli/install/errors.py |
Introduces AuthenticationError carrying diagnostic_context. |
src/apm_cli/commands/install.py |
Renders AuthenticationError diagnostics without requiring --verbose. |
tests/unit/install/test_validation_ado_bearer.py |
Unit tests for bearer scheme plumbing and env merging in validation. |
tests/unit/install/test_pipeline_auth_preflight.py |
Unit tests for preflight probe behavior and deduping. |
tests/unit/install/test_install_cmd_auth_rendering.py |
Tests exception hierarchy expectations for clean rendering (no double wrap). |
tests/unit/install/test_errors.py |
Tests AuthenticationError fields and defaults. |
tests/integration/test_ado_bearer_e2e.py |
Adds #1015 regression coverage for bearer-only install + diagnostic output + update abort behavior. |
packages/apm-guide/.apm/skills/apm-usage/authentication.md |
Updates ADO auth guidance and documents preflight/diagnostics. |
docs/src/content/docs/getting-started/authentication.md |
Updates public docs with recommended az login flow and diagnostics. |
CHANGELOG.md |
Adds an Unreleased entry describing the #1015 fix. |
.github/workflows/*.md / .lock.yml / .github/aw/actions-lock.json |
Updates gh-aw gating approach and refreshes generated lock artifacts. |
Copilot's findings
- Files reviewed: 21/21 changed files
- Comments generated: 1
… test asserts - pipeline.py:301 — pass ctx.auth_resolver to _preflight_auth_check instead of the local arg, which can be None even after resolve phase populates ctx.auth_resolver (resolve.py:91-92). Prevents an AttributeError on update installs that don't pass an AuthResolver explicitly. Flagged by Copilot reviewer. - 3 test files — replace 'dev.azure.com' in str(...) with bounded full-phrase equality assertion against 'Authentication failed for dev.azure.com'. Satisfies CodeQL #82/#83/#84 (incomplete URL substring sanitization) and our own tests.instructions.md rule banning bare URL/host substring checks. Tests: 20/20 green in tests/unit/install/. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
3d071f8 to
36ec4a7
Compare
- pipeline.py:301 -- pass ctx.auth_resolver to _preflight_auth_check instead of the local 'auth_resolver' arg, which can be None even after resolve phase populates ctx.auth_resolver (resolve.py:91-92). Prevents AttributeError on update installs that don't pass an AuthResolver explicitly. Flagged by Copilot reviewer. - 3 test files -- replace 'dev.azure.com' in str(...) with bounded full-phrase equality assertion against 'Authentication failed for dev.azure.com'. Satisfies CodeQL alerts #82/#83/#84 (incomplete URL substring sanitization) and our own tests.instructions.md rule banning bare URL/host substring checks. Verified: 20/20 unit tests in tests/unit/install/test_*.py green. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review fixes pushed (
|
| Thread | File:Line | Fix |
|---|---|---|
| @copilot-pull-request-reviewer | src/apm_cli/install/pipeline.py:301 |
_preflight_auth_check(ctx, auth_resolver, …) → _preflight_auth_check(ctx, ctx.auth_resolver, …). Resolve phase guarantees ctx.auth_resolver is populated (resolve.py:91-92); the local arg can still be None. Inline comment added. |
| CodeQL #82 | tests/unit/install/test_install_cmd_auth_rendering.py:32 |
assert "dev.azure.com" in str(e) → assert str(e) == "Authentication failed for dev.azure.com" (bounded full-phrase). |
| CodeQL #83 | tests/unit/install/test_pipeline_auth_preflight.py:79 |
Same pattern. |
| CodeQL #84 | tests/unit/install/test_validation_ado_bearer.py:159 |
Same pattern. |
The bounded-equality fix also satisfies our own .github/instructions/tests.instructions.md rule banning bare URL/host substring checks, so it eliminates two classes of issue at once. Targeted suite: 20/20 green (uv run --extra dev pytest tests/unit/install/test_install_cmd_auth_rendering.py tests/unit/install/test_pipeline_auth_preflight.py tests/unit/install/test_validation_ado_bearer.py tests/unit/install/test_errors.py -x).
Live verification with real az login
Local environment, real az, real ADO repo dev.azure.com/dmeppiel-org/market-js-app/_git/compliance-rules:
$ az account show --query "{user: user.name, tenant: tenantId}" -o json
{
"tenant": "<redacted-tenant-id>",
"user": "<redacted-user>"
}
Proof 1 — az login + NO ADO_APM_PAT → install succeeds
This is the headline bug from #1015. Before this PR: silent failure. After this PR:
$ env -u ADO_APM_PAT apm install --only apm --verbose \
"dev.azure.com/dmeppiel-org/market-js-app/_git/compliance-rules"
[*] Validating 1 package...
Trying git ls-remote for dev.azure.com (1 attempt)
git ls-remote (https) rc=0 for dev.azure.com/dmeppiel-org/market-js-app/_git/compliance-rules
[+] dev.azure.com/dmeppiel-org/market-js-app/compliance-rules
Added dev.azure.com/dmeppiel-org/market-js-app/compliance-rules to apm.yml
[*] Updated apm.yml with 1 new package(s)
[>] Installing 1 new package...
Parsed apm.yml: 1 APM deps, 0 MCP deps
[i] dev.azure.com -- using bearer from az cli (source: AAD_BEARER_AZ_CLI)
Resolved 1 direct dependencies (no transitive)
[+] dmeppiel-org/market-js-app/compliance-rules (cached)
|-- 1 instruction(s) integrated -> .github/instructions/
|-- 3 prompts integrated -> .github/prompts/
|-- 1 agents integrated -> .github/agents/
[i] Added apm_modules/ to .gitignore
Generated apm.lock.yaml with 1 dependencies
[*] Installed 1 APM dependency.
$ echo $?
0
Note [i] dev.azure.com -- using bearer from az cli (source: AAD_BEARER_AZ_CLI) — the bearer fix surfaces the real auth source. Compare against main where the JWT was being embedded in the URL and ADO rejected it.
Proof 2 — NO PAT + az NOT signed in → diagnostic on the default path, zero mutation
Simulated logged-out az via AZURE_CONFIG_DIR=<empty-dir>:
$ env -u ADO_APM_PAT AZURE_CONFIG_DIR=/tmp/empty-az-config apm install --only apm \
"dev.azure.com/dmeppiel-org/market-js-app/_git/compliance-rules"
[*] Validating 1 package...
[i] apm.yml restored to its previous state.
Authentication failed for dev.azure.com
Azure DevOps requires authentication. You have two options:
1. Sign in with Azure CLI (recommended for Entra ID users):
az login
apm install # retry -- no env var needed
2. Use a Personal Access Token:
export ADO_APM_PAT=your_token
Docs: https://microsoft.github.io/apm/getting-started/authentication/#azure-devops
$ echo $?
1
Key observations:
- No
--verboseflag — diagnostic renders on the default output path, exactly as the issue comment promised. - The legacy
not accessible or doesn't existstring is gone. az loginis listed first as recommended for Entra users — matches the docs reorder in this PR.apm.yml,apm.lock.yaml,apm_modules/— none mutated (apm.lock.yamlandapm_modules/not even created).
Proof 3 — apm install --update with broken auth → abort + byte-identical state
Started from a successfully installed project (Proof 1's tree), then ran --update with az logged out:
$ # Pre-snapshot
$ shasum -a 256 apm.yml apm.lock.yaml
0782b11fb5cfb02768168c4fb3119a7abb56c201772b2e57bc1cbf12e7243586 apm.yml
314774b230df0a1f562b806ea092eac96437736a097b642c9c1254495eb0c538 apm.lock.yaml
$ # apm_modules/ aggregate hash (find -exec sha256sum | sort | sha256sum):
931c76abb90736348c0f02672c7e1802fcfc5476342436a67c0556d7318aa583 -
$ env -u ADO_APM_PAT AZURE_CONFIG_DIR=/tmp/empty-az-config apm install --update --only apm
[>] Installing dependencies from apm.yml...
Authentication failed for dev.azure.com
Azure DevOps requires authentication. You have two options:
1. Sign in with Azure CLI (recommended for Entra ID users):
az login
apm install # retry -- no env var needed
2. Use a Personal Access Token:
export ADO_APM_PAT=your_token
Docs: https://microsoft.github.io/apm/getting-started/authentication/#azure-devops
No files were modified.
apm.yml, apm.lock.yaml, and apm_modules/ are unchanged.
$ echo $?
1
$ # Post-snapshot — must match pre exactly
$ shasum -a 256 apm.yml apm.lock.yaml
0782b11fb5cfb02768168c4fb3119a7abb56c201772b2e57bc1cbf12e7243586 apm.yml
314774b230df0a1f562b806ea092eac96437736a097b642c9c1254495eb0c538 apm.lock.yaml
931c76abb90736348c0f02672c7e1802fcfc5476342436a67c0556d7318aa583 -
Match? YES — zero mutation.
Pre/post SHA256 match exactly for all three at-risk artifacts. The pre-flight probe aborted before any write phase fired, the abort message contains the literal "No files were modified" guarantee, and the three artifacts are listed by name. This is the data-loss trap from #1015 closed.
Re-requesting review.
Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com
TL;DR
Fixes the ADO Entra ID (AAD) bearer auth path that silently failed since #852 — the bearer token's
auth_schemeandgit_envwere never threaded through_validate_package_exists, causinggit ls-remoteto run unauthenticated and fall through to a misleading "not accessible or doesn't exist" error. Additionally,apm install --updatesilently overwrote local files even when authentication failed. This PR plumbs the bearer correctly, raises a typedAuthenticationErrorwith actionable diagnostics, and adds a pre-flight auth check that aborts--updatebefore modifying any files.Note
Closes #1015. Integration tests require live ADO access (
APM_TEST_ADO_BEARER=1+az login); they are skipped in default CI.Problem (WHY)
auth_schemedropped._validate_package_existshardcodedauth_scheme="basic", so bearer tokens were embedded as:<token>@in the clone URL instead of being passed viaGIT_ASKPASS— causing a 401 on every ADO bearer-only install.git_envnever merged. The subprocess environment forgit ls-remotein validation did not includedep_ctx.git_env, soGIT_ASKPASSandGIT_TERMINAL_PROMPT=0were missing —gitwould either prompt interactively or use no auth at all.except Exceptionat line 489 ofvalidation.pycaughtAuthenticationError(RuntimeError), silently returningFalseand producing the ambiguous "not accessible or doesn't exist" error instead of an actionable diagnostic.--updateoverwrites on auth failure. No pre-flight check existed — the pipeline proceeded to modifyapm.yml,apm.lock.yaml, andapm_modules/even when every remote operation failed due to missing credentials.AuthenticationErrorwas caught by the genericexcept Exceptionininstall_pipeline(), re-wrapped as"Failed to install APM dependencies: Failed to resolve…", hiding the original diagnostic.Approach (WHAT)
dep_ctx.auth_schemeand pass it to_build_repo_urlso bearer usesGIT_ASKPASS, not URL embeddingdep_ctx.git_envinto the subprocess environment forgit ls-remotein validationgit ls-remotestderr and raiseAuthenticationErrorwithbuild_error_contextdiagnosticsexcept AuthenticationError: raisebefore the broadexcept Exceptionin bothvalidation.pyandpipeline.py_preflight_auth_check()topipeline.py— clusters deps by(host, org), runs onegit ls-remoteper cluster, aborts with "No files were modified" on failureexcept AuthenticationErrorhandlers at both install entry points incommands/install.py— render diagnostic on default output pathImplementation (HOW)
src/apm_cli/install/errors.py— AddedAuthenticationError(RuntimeError)with adiagnostic_context: strattribute for carrying the pre-rendered 4-case ADO diagnostic frombuild_error_context.src/apm_cli/install/validation.py— Three surgical fixes: (1) capture_auth_schemefromdep_ctxand pass to_build_repo_url, (2) mergedep_ctx.git_envinto subprocess env, (3) detect auth-failure patterns in stderr and raiseAuthenticationError. Also addedexcept AuthenticationError: raisebefore the outerexcept Exceptionthat was swallowing it.src/apm_cli/install/pipeline.py— Added_preflight_auth_check()that clusters--updatedeps by(host, org), skipsgithub.com, and runs onegit ls-remoteper cluster. On failure raisesAuthenticationErrorwith "No files were modified" + file list. Also addedexcept AuthenticationError: raisebefore the generic handler.src/apm_cli/commands/install.py— Addedexcept AuthenticationErrorat both entry points (beforeexcept Exception). Renders_rich_error(str(e))+_rich_echo(e.diagnostic_context)on the default output path, thensys.exit(1).docs/…/authentication.md+packages/…/authentication.md— Reordered ADO section to lead withaz login, added tenant-discovery guidance, documented auth-failure diagnostics and--updatepre-flight behavior. Skill resource mirrored.CHANGELOG.md— Unreleased entry documenting the fix.Diagrams
Legend: Data flow from
AuthResolverthrough validation, pipeline, and rendering — dashed nodes are new in this PR.flowchart LR subgraph Resolve["AuthResolver"] A1["resolve_for_dep()"] end subgraph Validate["validation.py"] V1["_validate_package_exists()"] V2["auth_scheme passthrough"]:::new V3["git_env merge"]:::new V4["401/403 detection"]:::new end subgraph Pipeline["pipeline.py"] P1["_preflight_auth_check()"]:::new P2["install_pipeline()"] P3["AuthenticationError re-raise"]:::new end subgraph Render["commands/install.py"] R1["except AuthenticationError"]:::new R2["_rich_error + diagnostic"]:::new end A1 --> V1 V2 --> V1 V3 --> V1 V1 -->|"auth failure"| V4 V4 -->|"raises"| P3 P1 -->|"--update pre-flight"| V1 P3 -->|"propagates"| R1 R1 --> R2 classDef new stroke-dasharray: 5 5; class V2,V3,V4,P1,P3,R1,R2 new;Trade-offs
AuthenticationErrorcarrying adiagnostic_contextstring; rejected per-host subclasses (AdoAuthError,GheAuthError) because the 4-case diagnostic frombuild_error_contextalready encodes host-specific guidance — a class hierarchy would add indirection without new signal.git ls-remotevstry_with_fallbackin validation. Rejected refactoring_validate_package_existsto usetry_with_fallbackbecause the blast radius would touch every host type, not just ADO. The pre-flight probe is additive and ADO-scoped.azvsazure-identitySDK. Kept the existingazure_cli.pysubprocess contract (az account get-access-token); addingazure-identitywould pull a heavy dependency and change the auth surface for a bug fix.except AuthenticationError: raisepattern (2 sites). Surgical re-raise before broadexcept Exceptionrather than narrowing the outer catch — minimal diff, zero risk to non-auth paths.Benefits
az login && apm installwithout hitting silent failures.--updateis safe. Pre-flight probe rejects bad auth before modifyingapm.yml,apm.lock.yaml, orapm_modules/— users see "No files were modified".AuthenticationErrorroundtrip, validation bearer plumbing, pipeline preflight, and install command rendering.Validation
uv run --extra dev pytest tests/unit tests/test_console.py -x:Full pytest output (6695 tests)
New test files only:
How to test
uv run --extra dev pytest tests/unit/install/test_errors.py tests/unit/install/test_validation_ado_bearer.py tests/unit/install/test_pipeline_auth_preflight.py tests/unit/install/test_install_cmd_auth_rendering.py -x— all 20 pass.uv run --extra dev pytest tests/unit tests/test_console.py -x— full suite (6695) passes.unset ADO_APM_PAT && az login --tenant <tenant> && apm install dev.azure.com/<org>/<project>/<repo>— succeeds via bearer.unset ADO_APM_PAT && AZURE_CONFIG_DIR=/nonexistent apm install dev.azure.com/<org>/<project>/<repo>— prints 4-case diagnostic, exits non-zero.--update— prints "No files were modified", local files untouched.Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com