Merged
1 change: 1 addition & 0 deletions README.md
@@ -473,6 +473,7 @@ claude/
| `Error occurred when subscribing to Marketplace: Marketplace Subscription purchase eligibility check failed` | Your subscription cannot purchase the Anthropic offer (no entitlement, sandbox sub, paid-offer policy denial, etc.). Either use a subscription with Claude-on-Foundry entitlement, or pre-accept the agreement explicitly with `az term accept --publisher anthropic --product anthropic-<model>-offer --plan anthropic-<model>-plan-new`. |
| Opaque `400 715-123420 "An error occurred. Please reach out to support for additional assistance."` on the Terraform deployment step (RG / Foundry account / project all succeed) | **Insufficient quota.** Terraform's `azapi_resource` bypasses ARM preflight validation and the Cognitive Services RP returns this generic code instead of `InsufficientQuota`. **Fix:** check `az cognitiveservices usage list -l <region> --query "[?contains(name.value,'<model>')]"` — if `currentValue + requestedCapacity > limit`, lower `CLAUDE_SONNET_CAPACITY` / `CLAUDE_HAIKU_CAPACITY` / `CLAUDE_OPUS_CAPACITY` via `azd env set`, delete unused deployments to free capacity, or request a quota increase in the Foundry portal. **Also check for soft-deleted accounts** still holding quota — see [Free quota held by soft-deleted accounts](#free-quota-held-by-soft-deleted-accounts). To confirm it really is quota, re-run on the Bicep variant which surfaces the clearer `InsufficientQuota` error. |
| Bicep: `InsufficientQuota: This operation require N new capacity in quota Tokens Per Minute (thousands) - Claude <model>, which is bigger than the current available capacity X. The current quota usage is U and the quota limit is L.` | Same root cause as `715-123420` above, just with a clear message because Bicep goes through ARM preflight. Lower the capacity env var(s) or free up quota. |
| `GatewayTimeout: The gateway did not receive a response from 'Microsoft.CognitiveServices' within the specified time period.` during the model deployment step, often with the deployment stuck in `Creating` | **ARM-layer poll timeout on a slow long-running operation, not a real failure.** The Cognitive Services RP keeps working after ARM gives up; the model deployment can still reach `Succeeded` minutes later. First-time Claude provisioning on a fresh resource is the slowest combination, and times vary by region and family. **Do not re-run `azd up` blindly &mdash; it can collide with the in-flight LRO.** Check the server-side state first: run `pwsh -File scripts/verify-claude-code.ps1 -WaitForDeployment` (POSIX: `bash scripts/verify-claude-code.sh --wait-for-deployment`), which polls `az cognitiveservices account deployment list` and waits while any deployment is still `Creating`. Or check directly: `az cognitiveservices account deployment list -g <rg> -n <foundry-account>`. If state is already `Succeeded`, run `azd env refresh` to repopulate outputs and you're done. |
| Preflight: `Marketplace offer ... not found` | `CLAUDE_MODEL_NAME` is misspelled, the model isn't in the Anthropic-on-Foundry catalog yet, or Anthropic changed the plan-name convention. |
| Preflight: `Quota insufficient` (exit 6) | Requested `CLAUDE_*_CAPACITY` plus existing usage exceeds the per-region quota limit. Lower the requested capacity, free up quota by deleting unused deployments, or [purge soft-deleted accounts](#free-quota-held-by-soft-deleted-accounts) that may still be holding TPM. |
| Quota looks full but you have no live deployments (`az cognitiveservices usage list` shows `currentValue > 0`, deployment still fails with `715-123420` / `InsufficientQuota`) | **Soft-deleted Cognitive Services accounts still reserve quota for 48 h.** A previous `azd down` (or any RG / account delete) puts the AIServices account in a recoverable state that keeps holding TPM. **Fix:** list and purge them: `az cognitiveservices account list-deleted -o table` then `az cognitiveservices account purge --name <name> --location <region> --resource-group <rg>` for each. See [Free quota held by soft-deleted accounts](#free-quota-held-by-soft-deleted-accounts). |
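The quota condition in the `715-123420` row (`currentValue + requestedCapacity > limit`) is plain integer arithmetic. A minimal sketch, assuming whole-number TPM values; `quota_fits` and the sample numbers are hypothetical and not part of this repository. Bash arithmetic is integer-only, so fractional values reported by the CLI would need rounding first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): succeeds when the requested
# capacity still fits under the regional limit.
quota_fits() {
  local current="$1" requested="$2" limit="$3"
  (( current + requested <= limit ))
}

# Made-up numbers for illustration: 40 in use, 20 requested, limit 50.
if quota_fits 40 20 50; then
  echo "fits"
else
  echo "insufficient: lower CLAUDE_*_CAPACITY, free quota, or request an increase"
fi
```

Here 40 + 20 exceeds 50, so the script takes the `else` branch.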
105 changes: 100 additions & 5 deletions scripts/verify-claude-code.ps1
@@ -12,12 +12,19 @@
schema the Claude Code VS Code extension reads.
3. `az` is logged in and the current token tenant matches the tenant
that owns the Foundry resource (a mismatch is the #1 cause of 401s).
- 4. The Claude Code CLI is on PATH. If not, the script prints the install
+ 4. Each Claude model deployment on the Foundry account has reached a
+ terminal `provisioningState`. If `azd up` returned a `GatewayTimeout`
+ from `Microsoft.CognitiveServices`, that is an ARM-layer poll timeout,
+ not a deployment failure -- the RP often keeps going for many
+ more minutes. Re-running this verifier (optionally with
+ `-WaitForDeployment`) is the safe way to confirm the actual outcome
+ without colliding with the in-flight long-running operation.
+ 5. The Claude Code CLI is on PATH. If not, the script prints the install
hint (or runs the official installer when `-AutoInstall` is set, the
same gate as `CLAUDE_CODE_AUTO_INSTALL` in the postprovision hook).
- 5. (Default) A non-interactive `claude -p` round trip against each
+ 6. (Default) A non-interactive `claude -p` round trip against each
deployed family. Skips this step with `-SkipClaudeCall`.
- 6. (Opt-in) A `python src/hello_claude.py` round trip exercising the
+ 7. (Opt-in) A `python src/hello_claude.py` round trip exercising the
Anthropic SDK + Entra ID code path. Enable with `-RunPythonSample`.

.PARAMETER RepoRoot
@@ -35,6 +42,18 @@
Requires `.env.local` populated via `azd env get-values` and a venv with
`pip install -r requirements.txt`.

.PARAMETER WaitForDeployment
If any Claude model deployment is still in a non-terminal state (e.g.
`Creating`), poll the Cognitive Services RP until every deployment
reaches `Succeeded` / `Failed` / `Canceled` or `-WaitTimeoutSeconds`
elapses. Use this after a `GatewayTimeout` from `azd up` to confirm
whether the deployment actually finished server-side.

.PARAMETER WaitTimeoutSeconds
Maximum seconds to wait for non-terminal deployments when
`-WaitForDeployment` is set. Default: 1800 (30 min). Set to 0 for a
single status check with no polling.

.EXAMPLE
pwsh -File scripts/verify-claude-code.ps1
# All checks + live claude -p round trip per deployed family.
@@ -46,13 +65,20 @@
.EXAMPLE
pwsh -File scripts/verify-claude-code.ps1 -RunPythonSample
# Adds a Python Entra ID round trip on top of the standard checks.

.EXAMPLE
pwsh -File scripts/verify-claude-code.ps1 -WaitForDeployment
# Use this after a GatewayTimeout from `azd up` to wait for the RP to
# finish provisioning the model deployment(s).
#>
[CmdletBinding()]
param(
[string] $RepoRoot,
[switch] $AutoInstall,
[switch] $SkipClaudeCall,
- [switch] $RunPythonSample
+ [switch] $RunPythonSample,
+ [switch] $WaitForDeployment,
+ [int] $WaitTimeoutSeconds = 1800
)

$ErrorActionPreference = 'Stop'
@@ -172,7 +198,8 @@ if (-not $azCmd) {
$accountsJson = & az cognitiveservices account list -o json 2>$null
if ($accountsJson) {
$accounts = $accountsJson | ConvertFrom-Json
- $found = $accounts | Where-Object { $_.name -eq $foundryResource } | Select-Object -First 1
+ $script:foundryAccount = $accounts | Where-Object { $_.name -eq $foundryResource } | Select-Object -First 1
+ $found = $script:foundryAccount
}
} catch { }
if ($found) {
@@ -187,6 +214,74 @@ if (-not $azCmd) {
}
}

# ---------------------------------------------------------------------------
# 4b. Model deployment provisioning state.
#
# A `GatewayTimeout` from `Microsoft.CognitiveServices` during `azd up`
# is an ARM-layer poll timeout, not a deployment failure -- the RP
# often keeps provisioning for many more minutes. This check asks the
# RP directly so we can confirm the actual outcome without re-running
# `azd up` (which can collide with the in-flight LRO).
# ---------------------------------------------------------------------------
if ($azCmd -and $script:foundryAccount -and $deployedFamilies.Count -gt 0) {
$rgName = $script:foundryAccount.resourceGroup
$expectedNames = @($deployedFamilies | ForEach-Object { $_.Deployment })
$terminalStates = @('Succeeded', 'Failed', 'Canceled')
$deadline = (Get-Date).AddSeconds([math]::Max(0, $WaitTimeoutSeconds))
$pollIntervalSec = 30
$firstPass = $true

while ($true) {
$deployments = @()
try {
$depsJson = & az cognitiveservices account deployment list -g $rgName -n $foundryResource -o json 2>$null
if ($depsJson) { $deployments = @($depsJson | ConvertFrom-Json) }
} catch { }

$statuses = @()
$stillCreating = @()
foreach ($name in $expectedNames) {
$d = $deployments | Where-Object { $_.name -eq $name } | Select-Object -First 1
if (-not $d) {
$statuses += [pscustomobject]@{ Name = $name; State = '<missing>' }
continue
}
$state = $d.properties.provisioningState
$statuses += [pscustomobject]@{ Name = $name; State = $state }
if ($state -and $terminalStates -notcontains $state) {
$stillCreating += $name
}
}

# Record results only on the final pass so each deployment is reported once;
# also reporting on the first pass would add duplicate rows while polling.
$isFinalPass = (-not $WaitForDeployment) -or ($stillCreating.Count -eq 0) -or ((Get-Date) -ge $deadline)
if ($isFinalPass) {
foreach ($s in $statuses) {
$checkName = "Deployment '$($s.Name)'"
switch ($s.State) {
'Succeeded' { Add-Result $checkName 'PASS' 'provisioningState=Succeeded' }
'Failed' { Add-Result $checkName 'FAIL' 'provisioningState=Failed' }
'Canceled' { Add-Result $checkName 'FAIL' 'provisioningState=Canceled' }
'<missing>' { Add-Result $checkName 'WARN' 'not found on Foundry account - may still be creating, or activator is stale' }
default {
$hint = if ($WaitForDeployment) { "still $($s.State) after waiting $WaitTimeoutSeconds s" } else { "provisioningState=$($s.State); rerun with -WaitForDeployment to poll" }
Add-Result $checkName 'WARN' $hint
}
}
}
break
}

$remaining = [int]($deadline - (Get-Date)).TotalSeconds
Write-Host (" ... {0} deployment(s) still provisioning ({1}); polling again in {2}s (timeout in {3}s)" -f $stillCreating.Count, ($stillCreating -join ', '), $pollIntervalSec, $remaining) -ForegroundColor DarkGray
Start-Sleep -Seconds $pollIntervalSec
$firstPass = $false
}
} elseif ($WaitForDeployment) {
Add-Result 'Model deployment state' 'WARN' 'cannot poll - az not available, Foundry resource not visible, or no families set'
}

# ---------------------------------------------------------------------------
# 5. Claude Code CLI on PATH (optional auto-install).
# ---------------------------------------------------------------------------
71 changes: 65 additions & 6 deletions scripts/verify-claude-code.sh
@@ -7,6 +7,9 @@
# bash scripts/verify-claude-code.sh --skip-claude-call # config checks only, no token cost
# bash scripts/verify-claude-code.sh --auto-install # install claude CLI if missing
# bash scripts/verify-claude-code.sh --run-python-sample # also run python src/hello_claude.py
# bash scripts/verify-claude-code.sh --wait-for-deployment # poll RP while any deployment is still Creating
# (use after a GatewayTimeout from `azd up`)
# bash scripts/verify-claude-code.sh --wait-timeout 1800 # cap on --wait-for-deployment (default 1800s)
#
# Exit codes:
# 0 all checks passed (warnings allowed)
@@ -17,15 +20,19 @@ repo_root=""
auto_install=0
skip_claude=0
run_python=0
wait_deployment=0
wait_timeout=1800

while [[ $# -gt 0 ]]; do
case "$1" in
- --repo-root) repo_root="$2"; shift 2 ;;
- --auto-install) auto_install=1; shift ;;
- --skip-claude-call) skip_claude=1; shift ;;
- --run-python-sample) run_python=1; shift ;;
- -h|--help) sed -n '2,15p' "$0"; exit 0 ;;
- *) echo "Unknown flag: $1" >&2; exit 2 ;;
+ --repo-root) repo_root="$2"; shift 2 ;;
+ --auto-install) auto_install=1; shift ;;
+ --skip-claude-call) skip_claude=1; shift ;;
+ --run-python-sample) run_python=1; shift ;;
+ --wait-for-deployment) wait_deployment=1; shift ;;
+ --wait-timeout) wait_timeout="$2"; shift 2 ;;
+ -h|--help) sed -n '2,15p' "$0"; exit 0 ;;
+ *) echo "Unknown flag: $1" >&2; exit 2 ;;
esac
done

@@ -142,13 +149,65 @@ else
loc=$(az cognitiveservices account list -o tsv --query "[?name=='$foundry_resource'].location | [0]" 2>/dev/null || echo '')
if [[ -n "$rg" ]]; then
add_result PASS "Foundry resource reachable" "$foundry_resource (rg: $rg, location: $loc)"
foundry_rg="$rg"
else
add_result WARN "Foundry resource reachable" "$foundry_resource not visible to current az login - wrong tenant/subscription?"
fi
fi
fi
fi

# 4b. Model deployment provisioning state.
#
# A `GatewayTimeout` from `Microsoft.CognitiveServices` during `azd up`
# is an ARM-layer poll timeout, not a deployment failure -- the RP
# often keeps provisioning for many more minutes. Ask the RP directly
# so we can confirm the actual outcome without re-running `azd up`.
foundry_rg="${foundry_rg:-}"
if command -v az >/dev/null 2>&1 && [[ -n "$foundry_rg" && ${#deployed_families[@]} -gt 0 ]]; then
poll_interval=30
deadline=$(( $(date +%s) + (wait_timeout > 0 ? wait_timeout : 0) ))
first_pass=1
while :; do
deps_json=$(az cognitiveservices account deployment list -g "$foundry_rg" -n "$foundry_resource" -o json 2>/dev/null || echo '[]')
still_creating=()
states=()
# First classify every deployment, remembering each name|state pair.
for entry in "${deployed_families[@]}"; do
name="${entry##*|}"
state=$(echo "$deps_json" | python -c "import json,sys; data=json.load(sys.stdin); m=[d for d in data if d.get('name')==sys.argv[1]]; print(m[0]['properties']['provisioningState'] if m else '<missing>')" "$name" 2>/dev/null || echo '<unknown>')
states+=("$name|$state")
case "$state" in
Succeeded|Failed|Canceled|'<missing>'|'<unknown>') : ;;
*) still_creating+=("$name") ;;
esac
done

# Report only on the final pass, after all deployments have been classified,
# so each deployment is recorded exactly once.
if [[ $wait_deployment -eq 0 || ${#still_creating[@]} -eq 0 || $(date +%s) -ge $deadline ]]; then
for item in "${states[@]}"; do
name="${item%%|*}"
state="${item#*|}"
case "$state" in
Succeeded) add_result PASS "Deployment '$name'" "provisioningState=Succeeded" ;;
Failed) add_result FAIL "Deployment '$name'" "provisioningState=Failed" ;;
Canceled) add_result FAIL "Deployment '$name'" "provisioningState=Canceled" ;;
'<missing>') add_result WARN "Deployment '$name'" "not found on Foundry account - may still be creating, or activator is stale" ;;
'<unknown>') add_result WARN "Deployment '$name'" "could not parse deployment list (jq/python missing?)" ;;
*)
if [[ $wait_deployment -eq 1 ]]; then
add_result WARN "Deployment '$name'" "still $state after waiting ${wait_timeout}s"
else
add_result WARN "Deployment '$name'" "provisioningState=$state; rerun with --wait-for-deployment to poll"
fi
;;
esac
done
break
fi
remaining=$(( deadline - $(date +%s) ))
printf " ${C_DIM}... %d deployment(s) still provisioning (%s); polling again in %ds (timeout in %ds)${C_RST}\n" "${#still_creating[@]}" "$(IFS=,; echo "${still_creating[*]}")" "$poll_interval" "$remaining"
sleep "$poll_interval"
first_pass=0
done
elif [[ $wait_deployment -eq 1 ]]; then
add_result WARN "Model deployment state" "cannot poll - az not available, Foundry resource not visible, or no families set"
fi

# 5. Claude Code CLI on PATH.
auto_install_env="${CLAUDE_CODE_AUTO_INSTALL:-}"
auto_install_env_on=0
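The polling pattern in step 4b (query the state, stop on a terminal state or at the deadline, otherwise sleep and retry) can be reduced to a standalone sketch. This is an illustration under stated assumptions, not the script's actual code: `get_state` is a hypothetical stub standing in for the `az cognitiveservices account deployment list` call, and `is_terminal` and `wait_for_terminal` are made-up names.

```shell
#!/usr/bin/env bash
# Hypothetical stub for the RP query: reports "Creating" twice, then "Succeeded".
# It sets a global instead of echoing so the call counter survives between calls.
calls=0
get_state() {
  calls=$(( calls + 1 ))
  if (( calls < 3 )); then state="Creating"; else state="Succeeded"; fi
}

# The three terminal provisioning states used by the verifier.
is_terminal() {
  case "$1" in
    Succeeded|Failed|Canceled) return 0 ;;
    *) return 1 ;;
  esac
}

# wait_for_terminal TIMEOUT_SECONDS INTERVAL_SECONDS
# Prints the last observed state once, when polling stops.
wait_for_terminal() {
  local timeout="$1" interval="$2"
  local deadline=$(( $(date +%s) + timeout ))
  while :; do
    get_state
    # Decide once per pass, so reporting and leaving the loop always agree.
    if is_terminal "$state" || (( $(date +%s) >= deadline )); then
      echo "$state"
      return 0
    fi
    sleep "$interval"
  done
}

wait_for_terminal 60 0
```

With the stub above, the third query returns `Succeeded` and the loop stops. A timeout of 0 makes the deadline already due, so the function performs a single status check, which matches the documented behavior of `-WaitTimeoutSeconds 0`.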
1 change: 1 addition & 0 deletions skills/claude-on-foundry/SKILL.md
@@ -116,6 +116,7 @@ Match the customer's exact error string to a row. Verify the diagnostic command
| `Marketplace offer ... not found` (from preflight, exit 4) | `CLAUDE_*_MODEL` value is misspelled or that SKU isn't in the catalog. | `./Get-ClaudeCatalog.ps1` and grep the family. | Set `CLAUDE_<FAMILY>_MODEL` to a name from the catalog. |
| `Quota insufficient` (from preflight, exit 6) | Requested capacity + existing usage > per-region limit. | `az cognitiveservices usage list -l <region> --query "[?contains(name.value,'claude-')]"` | Lower `CLAUDE_<FAMILY>_CAPACITY`, free quota (see soft-delete row), or request a quota bump in the Foundry portal. |
| Bicep: `InsufficientQuota: This operation require N new capacity in quota Tokens Per Minute (thousands) - Claude <model>` | Same as above; Bicep gets the clear message because it goes through ARM preflight. | Same diagnostic. | Same fix. |
| `GatewayTimeout: The gateway did not receive a response from 'Microsoft.CognitiveServices' within the specified time period.` &mdash; deployment stuck in `Creating` | ARM-layer poll timeout on a slow LRO, **not** a real failure. The RP keeps provisioning after ARM gives up; deployment usually reaches `Succeeded` minutes later. More likely on first-time deploys; varies by region and family. | `az cognitiveservices account deployment list -g <rg> -n <foundry-account> -o table` &mdash; check `provisioningState`. | If `Succeeded`: run `azd env refresh` and proceed. If still `Creating`: wait it out with `pwsh -File scripts/verify-claude-code.ps1 -WaitForDeployment` (POSIX: `--wait-for-deployment`), which polls until terminal state. **Do not re-run `azd up`** &mdash; it can collide with the in-flight LRO. |
| Terraform: opaque `400 715-123420 "An error occurred. Please reach out to support for additional assistance."` | **Almost always insufficient quota.** Terraform's `azapi_resource` skips ARM preflight so the RP returns this generic code. | `az cognitiveservices usage list -l <region> --query "[?contains(name.value,'<model>')].{quota:name.value, used:currentValue, limit:limit}" -o table` | If `used + requested > limit`: lower capacity OR purge soft-deleted accounts (next row). Re-run on Bicep variant if you need a clearer error. |
| Quota looks full but no live deployments exist | Soft-deleted Cognitive Services accounts hold quota for up to 48 h. | `az cognitiveservices account list-deleted -o table` | **Confirm with user first**, then for each: `az cognitiveservices account purge --name <n> --location <loc> --resource-group <rg>`. The original RG name is in the deleted-account id field 9. |
| `Marketplace Subscription purchase eligibility check failed` | Subscription can't purchase the Anthropic offer (no entitlement / sandbox / paid-offer policy). | Confirm sub type (see [PLAN](#plan--before-azd-up)). | Either use a Claude-eligible sub, or pre-accept explicitly: `az term accept --publisher anthropic --product anthropic-<model>-offer --plan anthropic-<model>-plan-new`. |
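The `GatewayTimeout` row amounts to a small decision on `provisioningState`. A hedged sketch of that triage follows; `next_step` is a hypothetical helper, and the wording for `Failed` / `Canceled` is an assumption, since the table only spells out the `Succeeded` and `Creating` cases.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): maps a provisioningState to the
# next step suggested by the GatewayTimeout troubleshooting row.
next_step() {
  case "$1" in
    Succeeded)       echo "azd env refresh" ;;
    # Assumption: the table does not say what to do for terminal failures.
    Failed|Canceled) echo "terminal failure: inspect the deployment before retrying" ;;
    # Anything else is treated as still in flight; wait rather than re-run azd up.
    *)               echo "bash scripts/verify-claude-code.sh --wait-for-deployment" ;;
  esac
}

next_step Creating
```

Treating every non-terminal state as "still in flight" mirrors how the verifier scripts classify states, so an unfamiliar intermediate state leads to waiting rather than to a colliding `azd up`.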