Feat/api version default routing#30
Open
pelenz wants to merge 4 commits into
- Model: gpt-35-turbo/0125 (retired 2025-11-14) → gpt-4.1-mini/2025-04-14
- Region: AOAI eastus2 + APIM koreacentral → both westeurope
- AOAI API version: 2024-02-01 → 2024-10-21
- AOAI deployment capacity: 2 → 50 TPM (was tripping 429s with default)
- README: prepend "Fork notes" section with recon deployment runbook, azd env get-value extraction, recon invocation, LAW KQL cross-check, teardown. Upstream sample content preserved below the divider.

This fork now serves as the second recon test target: CS integration pattern (a), AOAI built-in RAI filter only, no APIM llm-content-safety policy, no middleware. Complements ailz-dev (pattern d) for cross-pattern diff testing in recon.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The upstream OpenAPI declares every operation with `?api-version={api-version}`,
which makes the query param part of APIM operation matching, so requests without
it 404 before any policy runs. To fix:
- Add an all-APIs (service-level) policy that sets `api-version=2024-10-21`
when the caller doesn't pass one. It runs after route matching, so it only
defaults the value forwarded to the AOAI backend.
- Override URL templates for the seven `/deployments/{deployment-id}/...`
passthrough operations (chat, completions, embeddings, image gen, speech,
transcriptions, translations) to drop the `?api-version` query template,
so route matching no longer requires it from callers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
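
For reviewers, the defaulting behaviour described above can be sketched in a few lines of Python. This is only an illustration of the intended semantics (add `api-version` when the caller omitted it, leave an explicit value untouched), not the policy XML in this PR. The function name and the example host are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Default named in the commit message above.
DEFAULT_API_VERSION = "2024-10-21"


def with_default_api_version(url: str, default: str = DEFAULT_API_VERSION) -> str:
    """Return url with api-version appended only if the caller omitted it.

    Hypothetical helper: models what the service-level policy is described
    as doing to the request forwarded to the backend.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if not any(name == "api-version" for name, _ in query):
        query.append(("api-version", default))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # example.invalid is a placeholder host, not a real gateway.
    base = "https://example.invalid/openai/deployments/chat/chat/completions"
    print(with_default_api_version(base))
    print(with_default_api_version(base + "?api-version=2024-02-01"))
```

A caller-supplied version always wins in this sketch, which matches the "when the caller doesn't pass one" wording of the commit.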
5000 TPM per subscription was tripping during recon test bursts. Bump to 20000 so the rate-limit policy doesn't fire during normal probing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
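
To make the effect of the bump concrete, here is a simplified fixed-window model of a per-subscription tokens-per-minute budget. It is a sketch under stated assumptions: the gateway's real limiter may use a different windowing algorithm, and the burst sizes below are hypothetical, not measured recon traffic.

```python
from dataclasses import dataclass


@dataclass
class TokenWindow:
    """Fixed one-minute window counting tokens for a single subscription."""

    limit_tpm: int
    used: int = 0

    def try_consume(self, tokens: int) -> bool:
        """Return True if the request fits the budget; False models a 429."""
        if self.used + tokens > self.limit_tpm:
            return False
        self.used += tokens
        return True


def rejected_in_burst(limit_tpm: int, request_tokens: list[int]) -> int:
    """Count requests in one burst (within a single window) that get rejected."""
    window = TokenWindow(limit_tpm)
    return sum(not window.try_consume(t) for t in request_tokens)


if __name__ == "__main__":
    burst = [800] * 10  # hypothetical burst: ten requests of 800 tokens each
    print(rejected_in_burst(5000, burst))   # old limit from the commit message
    print(rejected_in_burst(20000, burst))  # new limit from the commit message
```

With this hypothetical burst, the old limit rejects the tail of the burst while the new limit admits all of it.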
Author: Update for dev purpose
Author: @microsoft-github-policy-service agree
…get docs

Infra:
- New `deployTwoBackends` param in `infra/main.bicep` (default true): gates the second AOAI account, its role assignment, and the second backend in the APIM load-balancer pool. Lets the target run as a single-backend recon test target without editing Bicep.
- New `apim-to-log-analytics` diagnostic setting in `infra/core/gateway/apim.bicep`, scoped to the APIM service and routed to the existing Log Analytics workspace in resource-specific (`Dedicated`) mode. Emits `GatewayLogs`, `WebSocketConnectionLogs`, and `AllMetrics`. Complements the existing App Insights logger pipeline; the two land in different tables and don't overlap.

Docs:
- `CLAUDE.md`: fork-authoritative guidance: deployment shape (pattern a, westeurope, gpt-4.1-mini, StandardV2), commands, soft-delete recovery, request flow, key files, gotchas.
- `costs_analysis.md`: Azure Monitor / KQL recipes for traffic & cost attribution, gpt-4.1-mini pricing reference.
- `apim_diagnostic.md`: the two APIM telemetry pipelines, what each writes to (`AppRequests` vs. `AzureDiagnostics`/`ApiManagementGatewayLogs`), Bicep patch description, when pipeline 2 is worth enabling.
- `arm_what_if_limitations.md`: what `azd provision --preview` misses (extension resources, APIM sub-resources, policy XML), the persistent noise it reports, and a repo-specific "trust this / don't trust that" guide with `az` verification commands.
- README: adds matching fork notes pointer.

Hygiene:
- `.gitignore`: add `.DS_Store` and `.claude/`.
- Add sample-app `package.json` + `package-lock.json` (already referenced by `src/` scripts; previously untracked).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No description provided.