Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 17 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -225,25 +225,30 @@ msg = client.messages.create(
<details id="advanced-long-running-processes-auto-refreshing-the-entra-id-token">
<summary><strong>Long-running processes: auto-refreshing the Entra ID token</strong></summary>

The plain `anthropic.Anthropic` client only accepts `auth_token: str | None`, so a captured token will start failing with `401 Unauthorized` after ~1 hour.
A captured Entra ID token starts failing with `401 Unauthorized` after ~1 hour, so services, daemons, long batch jobs, and notebooks left open need a refresh path.

For services, daemons, long batch jobs, or notebooks left open, use [src/hello_claude_token_refresh.py](./src/hello_claude_token_refresh.py). It defines a tiny `AnthropicIdentity(Anthropic)` subclass that overrides the `auth_token` property to call `azure.identity.get_bearer_token_provider(...)` per request, giving free per-request token refresh:
Anthropic SDK v0.98+ added a `credentials=` constructor parameter that takes an `AccessTokenProvider` callable. The SDK wraps it in a `TokenCache` that calls the provider lazily, caches the token until expiry, and on a 401 invalidates the cache and retries the request once with a fresh token &mdash; exactly what we want.

[src/hello_claude_token_refresh.py](./src/hello_claude_token_refresh.py) wires `azure.identity.DefaultAzureCredential` into that hook:

```python
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
# AnthropicIdentity is defined in hello_claude_token_refresh.py
from hello_claude_token_refresh import AnthropicIdentity
from anthropic import Anthropic
from anthropic.lib.credentials import AccessToken
from azure.identity import DefaultAzureCredential

token_provider = get_bearer_token_provider(
DefaultAzureCredential(), "https://ai.azure.com/.default"
)
client = AnthropicIdentity(
azure_ad_token_provider=token_provider,
credential = DefaultAzureCredential()

def entra_token_provider(*, force_refresh: bool = False) -> AccessToken:
token = credential.get_token("https://ai.azure.com/.default")
return AccessToken(token=token.token, expires_at=token.expires_on)

client = Anthropic(
credentials=entra_token_provider,
base_url="https://<resource>.services.ai.azure.com/anthropic",
)
```

If the Anthropic SDK ever accepts a callable for `auth_token`, this shim becomes unnecessary.
Requires `anthropic>=0.109.1` (pinned in [requirements.txt](./requirements.txt)).

</details>

Expand Down Expand Up @@ -733,7 +738,7 @@ The Terraform variant uses `azapi_resource` for both the Foundry account and the
| `Project can only be created under AIServices Kind account with allowProjectManagement set to true` | Account property missing. Both variants here set it; check you didn't downgrade the API version. |
| `404 Not Found` on inference | Base URL must end in `/anthropic` &mdash; `https://<resource>.services.ai.azure.com/anthropic`. |
| `401 Unauthorized` | Token scope must be `https://ai.azure.com/.default`. Re-run `az login`. |
| `401 Unauthorized` after ~1 hour of running | The Entra ID token captured at startup has expired. The plain `Anthropic` client doesn't auto-refresh &mdash; see the [long-running token refresh shim](#advanced-long-running-processes-auto-refreshing-the-entra-id-token) for [src/hello_claude_token_refresh.py](./src/hello_claude_token_refresh.py), which uses an `AnthropicIdentity` shim to refresh per request. |
| `401 Unauthorized` after ~1 hour of running | The Entra ID token captured at startup has expired. Pass `credentials=` to `Anthropic(...)` instead of `auth_token=` &mdash; see [long-running processes](#advanced-long-running-processes-auto-refreshing-the-entra-id-token) and [src/hello_claude_token_refresh.py](./src/hello_claude_token_refresh.py). The SDK's `TokenCache` refreshes the token on 401 and retries once. |
| `403 Forbidden` | Missing a data-plane role on the Foundry account. Grant `Cognitive Services User`, `Foundry User` (formerly `Azure AI User`), or `Azure AI Developer` (see [Required permissions](#required-permissions)). |
| `Region not available` | Deploy to `eastus2` or `swedencentral` (or `westus2` for opus-only). |
| Subscription can't deploy Claude | Confirm subscription eligibility per the [official docs](https://learn.microsoft.com/azure/ai-foundry/foundry-models/how-to/use-foundry-models-claude#prerequisites). The [preprovision preflight](#preprovision-preflight-marketplace-catalog--quota) warns about this before `azd up` calls the RP. |
Expand Down
2 changes: 1 addition & 1 deletion requirements.txt
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
anthropic>=0.104.1
anthropic>=0.109.1
azure-identity>=1.19.0
python-dotenv>=1.2.2
4 changes: 2 additions & 2 deletions skills/claude-on-foundry/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -128,7 +128,7 @@ Match the customer's exact error string to a row. Verify the diagnostic command
|---|---|---|---|
| `404 Not Found` on first SDK call | `base_url` is missing the `/anthropic` suffix. | Print the `base_url` the script is using. | Append `/anthropic` so it's `https://<resource>.services.ai.azure.com/anthropic`. |
| `401 Unauthorized` on first call | Token scope wrong, or no `az login`. | `az account get-access-token --resource https://ai.azure.com/.default --query expiresOn` | `az login` (add `--tenant <id>` if Foundry is in a different tenant). Scope must be `https://ai.azure.com/.default`. |
| `401 Unauthorized` after ~1 hour of running | Captured token expired; plain `Anthropic` client doesn't auto-refresh. | Check how long the process has been alive. | Switch to [`src/hello_claude_token_refresh.py`](../../src/hello_claude_token_refresh.py) which uses `AnthropicIdentity` + `get_bearer_token_provider` for per-request refresh. |
| `401 Unauthorized` after ~1 hour of running | Captured static `auth_token` expired; the SDK does not refresh it. | Check how long the process has been alive. | Switch to [`src/hello_claude_token_refresh.py`](../../src/hello_claude_token_refresh.py) which uses `Anthropic(credentials=...)` &mdash; the SDK's `TokenCache` refreshes on 401 and retries once. Requires `anthropic>=0.109.1`. |
| `401 PermissionDenied: Principal does not have access to API/Operation` — intermittently, passes seconds later | Data-plane RBAC propagation lag right after a role grant. | `az role assignment list --assignee <oid> --scope <foundry-account-id> -o table` | Wait 1-3 minutes and retry. Do NOT suggest disabling retries. |
| `403 Forbidden` consistently | Caller has no data-plane role on the Foundry account. | Same `az role assignment list` query. | Grant `Cognitive Services User` (minimum), `Foundry User`, or `Azure AI Developer`. See the one-liner in [README → Granting data-plane roles after `azd up`](../../README.md#granting-data-plane-roles-after-azd-up). |
| `claude -p` says: `The model claude-<family>-... is not available on your foundry deployment` | User-global `~/.claude/settings.json` pins a family this workspace didn't deploy, overriding the workspace pin. | `cat .claude/settings.json` and `cat ~/.claude/settings.json`. | Re-run `pwsh -File scripts/configure-claude-code.ps1`, OR pass `--model <sonnet\|opus\|haiku>` explicitly, OR (with user OK) edit the user-global file to remove the `"model"` line. |
Expand Down Expand Up @@ -172,7 +172,7 @@ claude # interactive REPL; try /status and /mode
| **Switch variants (Bicep ↔ Terraform)** | They produce equivalent infra but with different `azd` env state. Create a new env in the other folder: `cd infra-terraform && azd env new <name> && ...`. |
| **Refresh Claude Code wiring** | `pwsh -File scripts/configure-claude-code.ps1` (or the `.sh` variant). Idempotent — runs without re-deploying. |
| **Wire up the Claude Code VS Code extension** | `azd env set CLAUDE_WRITE_VSCODE_SETTINGS 1` (then re-run `azd provision` or the configure script). Opt-in because the activator + `.claude/settings.json` are enough for the CLI and SDK; only the [Anthropic Claude Code VS Code extension](https://marketplace.visualstudio.com/items?itemName=anthropic.claude-code) needs `claudeCode.*` keys in workspace settings. |
| **Convert to long-running auth** | Replace `Anthropic(auth_token=...)` with `AnthropicIdentity(azure_ad_token_provider=...)` from [`src/hello_claude_token_refresh.py`](../../src/hello_claude_token_refresh.py). |
| **Convert to long-running auth** | Replace `Anthropic(auth_token=token)` with `Anthropic(credentials=provider)` where `provider` is a zero-arg callable returning `anthropic.lib.credentials.AccessToken(token, expires_at)`. See [`src/hello_claude_token_refresh.py`](../../src/hello_claude_token_refresh.py). Requires `anthropic>=0.109.1`. |

---

Expand Down
64 changes: 29 additions & 35 deletions src/hello_claude_token_refresh.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,51 +7,49 @@

## Why this exists

The plain `anthropic.Anthropic` client only accepts `auth_token: str | None`,
so a captured Entra ID token will start failing with `401 Unauthorized` after
The plain `anthropic.Anthropic` client's `auth_token` is a static `str`, so a
captured Entra ID token would start failing with `401 Unauthorized` after
roughly an hour.

The Anthropic SDK reads `self.auth_token` via a property on every request, so
we subclass `Anthropic` and turn it into a property that calls the Entra
token provider, giving free per-request token refresh.
Anthropic SDK v0.98+ added a public `credentials=` constructor argument that
takes an `AccessTokenProvider` callable. The SDK wraps it in a `TokenCache`
that calls the provider lazily, caches the result until expiry, and on a 401
invalidates the cache and retries the request once with a fresh token. That
matches exactly what we need to bridge `azure.identity` into the Anthropic
client without subclassing or shimming `auth_token`.
"""

from __future__ import annotations

import os
import sys
from typing import Callable

from anthropic import Anthropic
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from anthropic.lib.credentials import AccessToken
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv


class AnthropicIdentity(Anthropic):
"""Plain Anthropic client that pulls a fresh Entra ID token per request."""
def _entra_credentials_provider(scope: str = "https://ai.azure.com/.default"):
"""Build an Anthropic `AccessTokenProvider` backed by `DefaultAzureCredential`.

def __init__(
self,
*,
azure_ad_token_provider: Callable[[], str],
base_url: str,
**kwargs,
) -> None:
self._azure_ad_token_provider = azure_ad_token_provider
# `auth_token` must be non-None so the parent's auth-header builder
# emits the `Authorization` header. The actual token comes from our
# property override below.
super().__init__(auth_token="placeholder", base_url=base_url, **kwargs)
The provider is called by the SDK's `TokenCache` only when there is no
cached token, when the cached token has expired, or when a 401 forced an
invalidation. `azure.identity` itself also caches and refreshes tokens
internally, so this stays cheap on the hot path.
"""
credential = DefaultAzureCredential()

@property
def auth_token(self) -> str: # type: ignore[override]
return self._azure_ad_token_provider()
def _provider(*, force_refresh: bool = False) -> AccessToken:
# `force_refresh` is set by TokenCache.invalidate() after a 401.
# DefaultAzureCredential does not expose a force-refresh knob, but
# re-calling get_token() is enough: it will mint a new token if the
# cached one is close to expiry, which is the common 401 cause.
token = credential.get_token(scope)
# `expires_on` is unix seconds — the format Anthropic's TokenCache expects.
return AccessToken(token=token.token, expires_at=token.expires_on)

@auth_token.setter
def auth_token(self, _value: str | None) -> None:
# Silently ignore the parent's `self.auth_token = ...` assignment;
# the provider is the source of truth.
pass
return _provider


def main() -> int:
Expand All @@ -64,12 +62,8 @@ def main() -> int:
print("Set CLAUDE_BASE_URL and CLAUDE_DEPLOYMENT_NAME.", file=sys.stderr)
return 1

token_provider = get_bearer_token_provider(
DefaultAzureCredential(), "https://ai.azure.com/.default"
)

client = AnthropicIdentity(
azure_ad_token_provider=token_provider,
client = Anthropic(
credentials=_entra_credentials_provider(),
base_url=base_url,
)

Expand Down
Loading