feat: Add new spec for A2A protocol conformance tests #1882
darrelmiller wants to merge 8 commits into
Introduces a language-neutral YAML format for declaring conformance tests that A2A SDKs must pass.

Key design decisions:
- Abstract operations instead of wire methods for transport independence
- Three conformance levels (must/should/may) per RFC 2119
- Client golden-response tests for interop bug coverage
- SUT behavior contract via message-prefix convention
- Inline CDDL type definitions throughout the spec

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Spec additions:
- §13 Report Format: standardized JSON output schema with CDDL for dashboards to consume results from any runner implementation
- Appendix A: consolidated report CDDL added
- Sections 14-17 renumbered accordingly

Test suites (tests/acts/):
- suite.acts.yaml: top-level manifest including all suites
- discovery.acts.yaml: 6 tests (CARD-DISC-*)
- core-operations.acts.yaml: 7 tests (CORE-SEND/GET/CANCEL-*)
- multi-turn.acts.yaml: 3 tests (CORE-MULTI-*)
- streaming.acts.yaml: 6 tests (STREAM-SSE/SUB-*)
- polling.acts.yaml: 2 tests (CORE-EXEC-*)
- error-handling.acts.yaml: 7 tests (CORE-ERR-*, JSONRPC-ERR-*)
- wire-format.acts.yaml: 3 tests (DM-FMT-*)
- data-types.acts.yaml: 4 tests (DM-ART-*)
- push-notifications.acts.yaml: 4 tests (PUSH-CFG-*)
- client-parsing.acts.yaml: 6 tests (CLIENT-PARSE-*)

Tests synthesized from a2a-tck, a2a-itk, agntcy/csit, and agentbin to validate that the ACTS format covers real-world scenarios.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
New test files:
- history.acts.yaml (6 tests: history length, ordering, content)
- version-negotiation.acts.yaml (2 tests: version errors, defaults)
- transport-bindings.acts.yaml (8 tests: JSON-RPC, REST, gRPC)

Additions to existing files:
- core-operations: +3 (failure, content-type error, list tasks)
- streaming: +5 (first event, message-only, concurrent, resubscribe)
- error-handling: +4 (malformed request, capability errors, error data)
- discovery: +3 (caching, schema validation, extended card)
- data-types: +3 (timestamps, schema validation, tolerance)
- push-notifications: +6 (list, errors, idempotent delete, delivery)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cross-referenced every MUST and SHOULD in the A2A specification against existing ACTS tests. Found 23 uncovered testable requirements and created tests for all of them.

New test files:
- auth-security.acts.yaml (12 tests: auth rejection, extended card access, in-task AUTH_REQUIRED state, push webhook auth)

Additions to existing files:
- core-operations: +2 (ListTasks includeArtifacts, nextPageToken)
- multi-turn: +2 (mismatched contextId/taskId, rejected client contextId)
- error-handling: +3 (error @type field, ErrorInfo, missing-vs-unauthorized)
- discovery: +1 (protocol declaration in Agent Card)
- transport-bindings: +1 (REST application/a2a+json content type)
- client-parsing: +2 (extended card caching, capability checking)

Coverage: 76 testable spec requirements now have matching tests. 99 additional requirements are process/documentation/deployment concerns not testable via conformance tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reconcile 4 critical naming mismatches between spec and tests:
- action/request -> operation/params (JSON-RPC alignment)
- raw_request/raw_expect -> raw/expect (flat block, reuse expect)
- golden_response -> client_response (clearer naming)
- rawBody -> body_raw (snake_case consistency)

Add 3 structural improvements identified during format review:
- status field in expect-block (tests already use it)
- runner_requirements enum for runner-special tests
- named-assertion / assertions block (used by 9 tests)

Updated both inline CDDL and Appendix A consolidated grammar.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
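For test authors holding files written against the earlier field names, the one-to-one renames in this commit can be applied mechanically. The following is a minimal sketch, not part of the PR: the helper `rename_keys`, the test id `EXAMPLE-001`, the operation name `send_message`, and the nesting of the sample document are all hypothetical. It covers only the plain key renames; the `raw_request`/`raw_expect` change flattens structure and is left out.

```python
# Hypothetical migration helper: applies only the one-to-one key renames
# listed in the commit message. The raw_request/raw_expect -> raw/expect
# change is structural and is NOT handled here.
RENAMES = {
    "action": "operation",
    "request": "params",
    "golden_response": "client_response",
    "rawBody": "body_raw",
}


def rename_keys(node):
    """Recursively rename legacy keys in an already-parsed test document."""
    if isinstance(node, dict):
        return {RENAMES.get(key, key): rename_keys(value) for key, value in node.items()}
    if isinstance(node, list):
        return [rename_keys(item) for item in node]
    return node  # scalars are returned unchanged


# Made-up legacy test; the shape is an assumption for illustration only.
legacy_test = {
    "id": "EXAMPLE-001",
    "steps": [
        {"action": "send_message", "request": {"message": {"rawBody": "{}"}}},
    ],
}

migrated = rename_keys(legacy_test)
print(migrated)
```

Because the walk renames every matching key at any depth, it would also rewrite a payload field that happens to be called `request` or `action`. A real migration would need to restrict the renames to the positions the ACTS grammar defines.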
§12.6 stated the report format was 'not prescribed' (MAY), but §13 mandates a JSON report format (MUST). Updated §12.6 to reference §13 and align the normative language.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Interactive viewer with search, filter by level/tag
- Syntax-highlighted YAML rendering
- Expand/collapse all tests
- Stats dashboard showing 111 tests (MUST/SHOULD/MAY breakdown)
- Standalone HTML file, zero dependencies
Code Review
This pull request introduces the A2A Conformance Test Specification (ACTS) along with a comprehensive suite of YAML-based conformance tests covering core operations, streaming, discovery, error handling, and security. The review feedback identifies several schema and specification violations across the test files, including incorrect abstract operation names, invalid keys in error expectations, mismatched field names for file content, and incorrect JSON-RPC error codes. Additionally, the gRPC-to-HTTP error mapping table in the specification document should be updated to prioritize canonical transcoding mappings.
- id: set-config
  operation: set_push_notification_config

expect_error:
  code: UnsupportedOperationError

- file:
    name: "report.pdf"
    mediaType: "application/pdf"
    bytes: "JVBERi0xLjQ="

expect:
  status: error

error:
  code: -32009

| Error | JSON-RPC code | gRPC status | HTTP status |
| --- | --- | --- | --- |
| `TaskNotCancelableError` | -32002 | `FAILED_PRECONDITION` | 409 |
| `UnsupportedOperationError` | -32004 | `UNIMPLEMENTED` | 405 |
| `ContentTypeNotSupportedError` | -32005 | `INVALID_ARGUMENT` | 415 |
| `InvalidParamsError` | -32602 | `INVALID_ARGUMENT` | 400 |
| `VersionNotSupportedError` | -32006 | `UNIMPLEMENTED` | 406 |
The mapping of gRPC statuses to HTTP statuses in this table does not align with the general rule to prioritize compatibility with gRPC transcoding systems by using the canonical mapping. Specifically:
- `TaskNotCancelableError` (gRPC `FAILED_PRECONDITION`) should map to HTTP `400` (canonical mapping for `FAILED_PRECONDITION`), not `409`.
- `UnsupportedOperationError` (gRPC `UNIMPLEMENTED`) should map to HTTP `501` (canonical mapping for `UNIMPLEMENTED`), not `405`.
- `VersionNotSupportedError` (gRPC `UNIMPLEMENTED`) should map to HTTP `501`, not `406`.
- `ContentTypeNotSupportedError` (gRPC `INVALID_ARGUMENT`) should map to HTTP `400`, not `415`.
References
- When mapping gRPC statuses to HTTP statuses, prioritize compatibility with gRPC transcoding systems by using the canonical mapping, even if a more specific HTTP status code is available.
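The review point can be checked mechanically. The sketch below is not part of the PR: it encodes the canonical gRPC-to-HTTP mapping documented for `google.rpc.Code` and compares it with the five rows quoted from the spec table. The names `CANONICAL_HTTP`, `SPEC_ROWS`, and `find_mismatches` are illustrative.

```python
# Canonical HTTP status for each gRPC status code, as documented in
# google/rpc/code.proto and used by gRPC transcoding gateways.
CANONICAL_HTTP = {
    "OK": 200, "CANCELLED": 499, "UNKNOWN": 500, "INVALID_ARGUMENT": 400,
    "DEADLINE_EXCEEDED": 504, "NOT_FOUND": 404, "ALREADY_EXISTS": 409,
    "PERMISSION_DENIED": 403, "UNAUTHENTICATED": 401, "RESOURCE_EXHAUSTED": 429,
    "FAILED_PRECONDITION": 400, "ABORTED": 409, "OUT_OF_RANGE": 400,
    "UNIMPLEMENTED": 501, "INTERNAL": 500, "UNAVAILABLE": 503, "DATA_LOSS": 500,
}

# (error name, gRPC status, HTTP status) as quoted in the reviewed table.
SPEC_ROWS = [
    ("TaskNotCancelableError", "FAILED_PRECONDITION", 409),
    ("UnsupportedOperationError", "UNIMPLEMENTED", 405),
    ("ContentTypeNotSupportedError", "INVALID_ARGUMENT", 415),
    ("InvalidParamsError", "INVALID_ARGUMENT", 400),
    ("VersionNotSupportedError", "UNIMPLEMENTED", 406),
]


def find_mismatches(rows):
    """Return (name, table_http, canonical_http) for rows that deviate."""
    return [
        (name, http, CANONICAL_HTTP[grpc])
        for name, grpc, http in rows
        if http != CANONICAL_HTTP[grpc]
    ]


for name, table_http, canonical_http in find_mismatches(SPEC_ROWS):
    print(f"{name}: table says {table_http}, canonical mapping is {canonical_http}")
```

Only `InvalidParamsError` already matches the canonical mapping. The other four rows are the ones flagged in the review comment.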
expect:
  error:
    exists: true
ACTS scopes protocol conformance cleanly. One gap worth flagging: two implementations can pass all 111 tests and still produce different cryptographic outputs from the same input, because JCS canonicalization, hash derivation, and signature binding sit below the transport layer ACTS validates. We hit this recently when our own JCS serializer diverged on Unicode handling that six other implementations got right; the shared test vectors caught it, not protocol-level tests. A derivation conformance layer would compose well as a separate test class or companion suite.
test (edit later)
We have an adversarial bench live -- 138 profiles across 30 categories, tested against our own A2A agent at

On @andysalvo's point about JCS and cryptographic outputs: that layer is below protocol-observable behaviour and cannot be reached by protocol assertions -- it needs shared test vectors that implementations run locally and compare byte-for-byte. A derivation conformance suite would compose alongside ACTS as a separate test class rather than an extension of it.

Profile schema is public if it is useful for expressing an adversarial class in the ACTS format.

AlgoVoi (chopmob-cloud) -- Acquisition enquiries: https://docs.algovoi.co.uk/acquisition
The gap is real and sits below what ACTS validates. Two implementations can pass all 111 ACTS tests and still diverge on JCS canonical output for the same input, particularly on Unicode normalisation, number representation, and key ordering edge cases. ACTS validates the transport and protocol layer; it does not validate the cryptographic substrate underneath.

The 8-implementation cross-validation corpus at `chopmob-cloud/algovoi-jcs-conformance-vectors` covers exactly this layer: Python (rfc8785), TypeScript (canonicalize), Go (gowebpki/jcs), Rust (serde_json), Java (cyberphone/json-canonicalization), PHP, .NET, Ruby, all producing byte-identical output across the same vector set, including the non-ASCII UTF-8 cases that diverge most frequently in practice.

A complete A2A conformance spec needs both layers: ACTS for protocol, a JCS byte-match suite for the cryptographic substrate. The two are complementary rather than overlapping.

AlgoVoi (chopmob-cloud) -- Acquisition enquiries: https://docs.algovoi.co.uk/acquisition
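To make the byte-match idea in the comments above concrete, here is a minimal sketch of how two serializers can agree on ASCII input and diverge on non-ASCII input. It is an illustration only: `approx_canonical` is a rough stand-in and not a full RFC 8785 implementation (it does not reproduce JCS number formatting or UTF-16 code-unit key ordering), and the function names and vectors are made up rather than taken from any corpus mentioned here.

```python
import json


def approx_canonical(value) -> bytes:
    """Rough JCS-like output: sorted keys, no whitespace, raw UTF-8 strings.

    NOT RFC 8785 compliant: number formatting and UTF-16 key ordering
    are not handled. Used here only to illustrate byte comparison.
    """
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def escaping_serializer(value) -> bytes:
    """Same settings, but non-ASCII characters are emitted as \\uXXXX escapes."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


# Made-up vectors: one pure ASCII, one containing a non-ASCII character.
VECTORS = [
    {"b": 1, "a": "plain"},
    {"name": "café"},
]


def diverging_vectors(impl_a, impl_b, vectors):
    """Return the vectors on which the two implementations differ byte-for-byte."""
    return [vector for vector in vectors if impl_a(vector) != impl_b(vector)]


diverging = diverging_vectors(approx_canonical, escaping_serializer, VECTORS)
print(diverging)
```

Both serializers would produce identical JSON values when parsed back, which is why a protocol-level assertion cannot tell them apart. Only a comparison of the raw bytes, which is what a hash or signature is computed over, exposes the difference.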
Summary
This PR introduces ACTS (A2A Conformance Test Specification), a unified YAML format for A2A protocol conformance tests, plus 111 tests covering all testable MUST/SHOULD requirements from the A2A v1.0 spec.
Problem
The A2A ecosystem has 4 fragmented conformance testing efforts (a2a-tck, a2a-itk, agntcy/csit, agentbin), each with different test definitions. This makes it impossible to verify that SDKs passing "conformance tests" can actually interoperate.
Solution
One canonical test format that all SDKs test against. Language-agnostic YAML declarations + freedom to implement runners in any language = guaranteed interoperability.
What's Included
📋 Specification (docs/acts-specification.md)
✅ Test Suite (111 tests across 14 files)
🎨 HTML Viewer (tests/acts/test-viewer.html)
Coverage Validation
✅ Spec audit: All 76 testable MUST/SHOULD requirements have tests
✅ Gap analysis: All scenarios from 4 existing repos covered
✅ CDDL validation: All files validate against RFC 8610 grammar