
feat: Add new spec for A2A protocol conformance tests #1882

Open
darrelmiller wants to merge 8 commits into main from conformance-spec

@darrelmiller

Contributor

Summary

This PR introduces ACTS (A2A Conformance Test Specification), a unified YAML format for A2A protocol conformance tests, plus 111 tests covering all testable MUST/SHOULD requirements from the A2A v1.0 spec.

Problem

The A2A ecosystem has 4 fragmented conformance testing efforts (a2a-tck, a2a-itk, agntcy/csit, agentbin), each with different test definitions. This makes it impossible to verify that SDKs passing "conformance tests" can actually interoperate.

Solution

One canonical test format that all SDKs test against. Language-agnostic YAML declarations + freedom to implement runners in any language = guaranteed interoperability.

What's Included

📋 Specification (docs/acts-specification.md)

  • ~2000 lines, 17 sections, inline CDDL grammar (RFC 8610 conformant)
  • Abstract operations (transport-agnostic: JSON-RPC, gRPC, REST)
  • Rich assertion DSL, streaming support, state machine validation
  • Standard JSON report format for dashboards
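
As a rough illustration of what "abstract operations" could mean for a runner, the sketch below maps one operation name to transport-specific calls. Only the operation name `create_push_config` appears in this PR's review thread; the wire-level method name, the REST path, and the `WireCall` and `bind` helpers are hypothetical placeholders, not taken from the A2A or ACTS specifications.

```python
from dataclasses import dataclass

# Hypothetical binding table: every wire-level name below is a placeholder
# chosen for illustration, not taken from the A2A or ACTS specifications.
BINDINGS = {
    "create_push_config": {
        "jsonrpc": "example/createPushConfig",
        "rest": ("POST", "/example/tasks/{task_id}/pushConfigs"),
    },
}


@dataclass
class WireCall:
    transport: str
    target: str
    payload: dict


def bind(operation: str, params: dict, transport: str) -> WireCall:
    """Translate one abstract operation into a transport-specific call."""
    try:
        binding = BINDINGS[operation][transport]
    except KeyError:
        raise ValueError(f"no {transport} binding for operation {operation!r}")
    if transport == "jsonrpc":
        envelope = {"jsonrpc": "2.0", "id": 1, "method": binding, "params": params}
        return WireCall(transport, binding, envelope)
    verb, path = binding
    return WireCall(transport, f"{verb} {path.format(**params)}", params)
```

Under these assumptions the same test declaration can drive either transport, and an operation name the table does not know is rejected before anything is sent.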

✅ Test Suite (111 tests across 14 files)

  • 72 MUST, 35 SHOULD, 4 MAY tests
  • Core ops, discovery, streaming, errors, multi-turn, auth, push notifications, history, polling, wire format, data types, version negotiation, transport bindings, client parsing
  • Full coverage: All 76 testable spec requirements mapped
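
One way a runner might roll results up by conformance level is sketched below. The level names come from this PR; the `summarize` helper and the policy that only a failed MUST test breaks conformance are assumptions for illustration, not something the PR states.

```python
from collections import Counter


def summarize(results):
    """Summarize (level, passed) pairs, where level is 'must', 'should' or 'may'."""
    results = list(results)
    total = Counter(level for level, _ in results)
    failed = Counter(level for level, passed in results if not passed)
    return {
        "total": dict(total),
        "failed": dict(failed),
        # Assumption: only a failed MUST test breaks conformance.
        "conformant": failed["must"] == 0,
    }
```

With the 72/35/4 split listed above, a single failing SHOULD test would be reported but would not change the `conformant` flag under this policy.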

🎨 HTML Viewer (tests/acts/test-viewer.html)

  • Interactive browser with search, filter, syntax highlighting
  • 290 KB standalone file, zero dependencies

Coverage Validation

✅ Spec audit: All 76 testable MUST/SHOULD requirements have tests
✅ Gap analysis: All scenarios from 4 existing repos covered
✅ CDDL validation: All files validate against RFC 8610 grammar

darrelmiller and others added 7 commits May 25, 2026 17:58
Introduces a language-neutral YAML format for declaring conformance
tests that A2A SDKs must pass. Key design decisions:

- Abstract operations instead of wire methods for transport independence
- Three conformance levels (must/should/may) per RFC 2119
- Client golden-response tests for interop bug coverage
- SUT behavior contract via message-prefix convention
- Inline CDDL type definitions throughout the spec

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Spec additions:
- §13 Report Format: standardized JSON output schema with CDDL for
  dashboards to consume results from any runner implementation
- Appendix A: consolidated report CDDL added
- Sections 14-17 renumbered accordingly

Test suites (tests/acts/):
- suite.acts.yaml: top-level manifest including all suites
- discovery.acts.yaml: 6 tests (CARD-DISC-*)
- core-operations.acts.yaml: 7 tests (CORE-SEND/GET/CANCEL-*)
- multi-turn.acts.yaml: 3 tests (CORE-MULTI-*)
- streaming.acts.yaml: 6 tests (STREAM-SSE/SUB-*)
- polling.acts.yaml: 2 tests (CORE-EXEC-*)
- error-handling.acts.yaml: 7 tests (CORE-ERR-*, JSONRPC-ERR-*)
- wire-format.acts.yaml: 3 tests (DM-FMT-*)
- data-types.acts.yaml: 4 tests (DM-ART-*)
- push-notifications.acts.yaml: 4 tests (PUSH-CFG-*)
- client-parsing.acts.yaml: 6 tests (CLIENT-PARSE-*)

Tests synthesized from a2a-tck, a2a-itk, agntcy/csit, and agentbin
to validate the ACTS format covers real-world scenarios.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

New test files:
- history.acts.yaml (6 tests: history length, ordering, content)
- version-negotiation.acts.yaml (2 tests: version errors, defaults)
- transport-bindings.acts.yaml (8 tests: JSON-RPC, REST, gRPC)

Additions to existing files:
- core-operations: +3 (failure, content-type error, list tasks)
- streaming: +5 (first event, message-only, concurrent, resubscribe)
- error-handling: +4 (malformed request, capability errors, error data)
- discovery: +3 (caching, schema validation, extended card)
- data-types: +3 (timestamps, schema validation, tolerance)
- push-notifications: +6 (list, errors, idempotent delete, delivery)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Cross-referenced every MUST and SHOULD in the A2A specification against
existing ACTS tests. Found 23 uncovered testable requirements and created
tests for all of them.

New test files:
- auth-security.acts.yaml (12 tests: auth rejection, extended card access,
  in-task AUTH_REQUIRED state, push webhook auth)

Additions to existing files:
- core-operations: +2 (ListTasks includeArtifacts, nextPageToken)
- multi-turn: +2 (mismatched contextId/taskId, rejected client contextId)
- error-handling: +3 (error @type field, ErrorInfo, missing-vs-unauthorized)
- discovery: +1 (protocol declaration in Agent Card)
- transport-bindings: +1 (REST application/a2a+json content type)
- client-parsing: +2 (extended card caching, capability checking)

Coverage: 76 testable spec requirements now have matching tests.
99 additional requirements are process/documentation/deployment concerns
not testable via conformance tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Reconcile 4 critical naming mismatches between spec and tests:
- action/request -> operation/params (JSON-RPC alignment)
- raw_request/raw_expect -> raw/expect (flat block, reuse expect)
- golden_response -> client_response (clearer naming)
- rawBody -> body_raw (snake_case consistency)

Add 3 structural improvements identified during format review:
- status field in expect-block (tests already use it)
- runner_requirements enum for runner-special tests
- named-assertion / assertions block (used by 9 tests)

Updated both inline CDDL and Appendix A consolidated grammar.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

§12.6 stated the report format was 'not prescribed' (MAY), but §13
mandates a JSON report format (MUST). Updated §12.6 to reference §13
and align the normative language.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Interactive viewer with search, filter by level/tag
- Syntax-highlighted YAML rendering
- Expand/collapse all tests
- Stats dashboard showing 111 tests (MUST/SHOULD/MAY breakdown)
- Standalone HTML file, zero dependencies

darrelmiller requested a review from a team as a code owner May 26, 2026 16:03

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request introduces the A2A Conformance Test Specification (ACTS) along with a comprehensive suite of YAML-based conformance tests covering core operations, streaming, discovery, error handling, and security. The review feedback identifies several schema and specification violations across the test files, including incorrect abstract operation names, invalid keys in error expectations, mismatched field names for file content, and incorrect JSON-RPC error codes. Additionally, the gRPC-to-HTTP error mapping table in the specification document should be updated to prioritize canonical transcoding mappings.

Comment on lines +201 to +202
- id: set-config
  operation: set_push_notification_config


high

The abstract operation set_push_notification_config is used here, but it is not defined in the ACTS specification. According to the specification, the correct abstract operation name is create_push_config.

          - id: set-config
            operation: create_push_config

Comment on lines +331 to +332
expect_error:
  code: UnsupportedOperationError


high

The expect_error block uses the key code to specify the expected error type. However, the ACTS specification defines this field as error_type. Using code will cause schema validation failures in compliant test runners.

            expect_error:
              error_type: UnsupportedOperationError
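
A check like this could be automated before tests reach a runner. The sketch below is a minimal lint over an already-parsed test step (a plain dict). The `lint_expect_error` helper is hypothetical, and the only schema facts it relies on are the ones stated in this comment: the block is named `expect_error` and its field is `error_type`, not `code`.

```python
def lint_expect_error(step: dict) -> list[str]:
    """Return problems found in a step's expect_error block (parsed YAML as a dict)."""
    block = step.get("expect_error")
    if block is None:
        return []
    problems = []
    if "code" in block:
        problems.append("expect_error uses 'code'; the review says the field is 'error_type'")
    if "error_type" not in block:
        problems.append("expect_error is missing 'error_type'")
    return problems
```

Running it over every step in a suite file would surface this class of mistake in one pass.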

Comment on lines +213 to +216
- file:
    name: "report.pdf"
    mediaType: "application/pdf"
    bytes: "JVBERi0xLjQ="


high

The field name bytes is used under the file object to represent the base64-encoded file content. However, both the ACTS specification and the underlying Part protobuf definition use the field name raw for this purpose.

                        - file:
                            name: "report.pdf"
                            mediaType: "application/pdf"
                            raw: "JVBERi0xLjQ="
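
For reference, the base64 value in this fixture decodes to a PDF header, which can be confirmed with the standard library. This is only a sanity check on the payload; it assumes nothing about how a runner reads the field.

```python
import base64

raw = "JVBERi0xLjQ="  # value copied from the test fixture above
decoded = base64.b64decode(raw, validate=True)
print(decoded)  # b'%PDF-1.4'
```

The payload is the same under either field name, so the rename from `bytes` to `raw` does not require changing the encoded content.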

Comment on lines +153 to +154
expect:
  status: error


high

The expect block specifies status: error. According to the ACTS specification, status must be a valid HTTP status code (integer) or a numeric assertion. If the request is expected to fail, you should use the expect_error block instead of expect.

            expect_error:
              error_type: InvalidParamsError
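
The rule described here can be sketched as a small type check on the parsed `expect` block. The `lint_status` helper is hypothetical, and treating any mapping as a "numeric assertion" is an assumption, since the assertion syntax is not shown in this thread.

```python
def lint_status(expect: dict) -> list[str]:
    """Return problems with the 'status' field of a parsed expect block."""
    status = expect.get("status")
    if status is None:
        return []
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(status, int) and not isinstance(status, bool):
        if 100 <= status <= 599:
            return []
        return [f"status {status} is outside the HTTP status code range"]
    if isinstance(status, dict):
        # Assumption: a mapping is a numeric assertion; its shape is not checked here.
        return []
    return [f"status {status!r} is not an integer; use expect_error for expected failures"]
```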

Comment on lines +40 to +41
error:
  code: -32009


high

The expected JSON-RPC error code is set to -32009. However, the ACTS specification defines the JSON-RPC code for VersionNotSupportedError as -32006. This mismatch will cause conformant implementations to fail this test.

                error:
                  code: -32006
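
Mismatches like this are easy to catch mechanically by keeping one lookup table and checking every hard-coded code in the tests against it. The sketch below uses only the error names and JSON-RPC codes quoted in this review thread; the `check_error_code` helper is hypothetical.

```python
# Error names and JSON-RPC codes as quoted in this review thread.
JSONRPC_CODES = {
    "TaskNotCancelableError": -32002,
    "UnsupportedOperationError": -32004,
    "ContentTypeNotSupportedError": -32005,
    "VersionNotSupportedError": -32006,
    "InvalidParamsError": -32602,
}


def check_error_code(error_type: str, expected_code: int) -> bool:
    """True when a test's hard-coded code matches the table for that error type."""
    return JSONRPC_CODES[error_type] == expected_code


print(check_error_code("VersionNotSupportedError", -32009))  # False
```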

Comment on lines +626 to +630
| `TaskNotCancelableError` | -32002 | `FAILED_PRECONDITION` | 409 |
| `UnsupportedOperationError` | -32004 | `UNIMPLEMENTED` | 405 |
| `ContentTypeNotSupportedError` | -32005 | `INVALID_ARGUMENT` | 415 |
| `InvalidParamsError` | -32602 | `INVALID_ARGUMENT` | 400 |
| `VersionNotSupportedError` | -32006 | `UNIMPLEMENTED` | 406 |


medium

The mapping of gRPC statuses to HTTP statuses in this table does not align with the general rule to prioritize compatibility with gRPC transcoding systems by using the canonical mapping. Specifically:

  • TaskNotCancelableError (gRPC FAILED_PRECONDITION) should map to HTTP 400 (canonical mapping for FAILED_PRECONDITION), not 409.
  • UnsupportedOperationError (gRPC UNIMPLEMENTED) should map to HTTP 501 (canonical mapping for UNIMPLEMENTED), not 405.
  • VersionNotSupportedError (gRPC UNIMPLEMENTED) should map to HTTP 501, not 406.
  • ContentTypeNotSupportedError (gRPC INVALID_ARGUMENT) should map to HTTP 400, not 415.
References
  1. When mapping gRPC statuses to HTTP statuses, prioritize compatibility with gRPC transcoding systems by using the canonical mapping, even if a more specific HTTP status code is available.
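
The comparison in this comment can be reproduced with a short script. The table rows are copied from the quoted diff, and the canonical HTTP values are the ones named in the bullets above (400 for `FAILED_PRECONDITION` and `INVALID_ARGUMENT`, 501 for `UNIMPLEMENTED`); the `non_canonical` helper is hypothetical.

```python
# Canonical gRPC-to-HTTP values as stated in the review comment.
CANONICAL_HTTP = {
    "FAILED_PRECONDITION": 400,
    "UNIMPLEMENTED": 501,
    "INVALID_ARGUMENT": 400,
}

# (error type, gRPC status, HTTP status) rows from the quoted spec table.
SPEC_ROWS = [
    ("TaskNotCancelableError", "FAILED_PRECONDITION", 409),
    ("UnsupportedOperationError", "UNIMPLEMENTED", 405),
    ("ContentTypeNotSupportedError", "INVALID_ARGUMENT", 415),
    ("InvalidParamsError", "INVALID_ARGUMENT", 400),
    ("VersionNotSupportedError", "UNIMPLEMENTED", 406),
]


def non_canonical(rows):
    """Return (error type, HTTP in table, canonical HTTP) for rows that differ."""
    return [
        (name, http, CANONICAL_HTTP[grpc])
        for name, grpc, http in rows
        if CANONICAL_HTTP[grpc] != http
    ]


for name, http, canonical in non_canonical(SPEC_ROWS):
    print(f"{name}: table says {http}, canonical is {canonical}")
```

Only the `InvalidParamsError` row already matches; the other four are the ones listed above.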

Comment on lines +173 to +175
expect:
  error:
    exists: true


medium

For asserting that an operation fails, the expect_error block should be used instead of expect with an error body check. This ensures consistency across the test suite and aligns with the schema defined in the specification.

            expect_error:
              error_type: UnsupportedOperationError

@andysalvo


ACTS scopes protocol conformance cleanly. One gap worth flagging: two implementations can pass all 111 tests and still produce different cryptographic outputs from the same input — JCS canonicalization, hash derivation, and signature binding sit below the transport layer ACTS validates. We hit this recently when our own JCS serializer diverged on Unicode handling that six other implementations got right; the shared test vectors caught it, not protocol-level tests. A derivation conformance layer would compose well as a separate test class or companion suite.

@MoltyCel


test (edit later)

muscariello requested review from a team as code owners May 29, 2026 07:34

@chopmob-cloud


We have an adversarial bench live -- 138 profiles across 30 categories, tested against our own A2A agent at api.algovoi.co.uk. It sits above what ACTS validates: ACTS checks protocol compliance against MUST/SHOULD requirements, the bench tests agent behaviour under adversarial conditions (prompt injection, coercive payment patterns, wallet draining attempts, malicious tool call sequences). The two layers complement without overlapping.

On @andysalvo's point about JCS and cryptographic outputs: that layer is below protocol-observable behaviour and cannot be reached by protocol assertions -- it needs shared test vectors that implementations run locally and compare byte-for-byte. A derivation conformance suite would compose alongside ACTS as a separate test class rather than an extension of it.

Profile schema is public if it is useful for expressing an adversarial class in the ACTS format.


AlgoVoi (chopmob-cloud) -- Acquisition enquiries: https://docs.algovoi.co.uk/acquisition

@chopmob-cloud


The gap is real and sits below what ACTS validates. Two implementations can pass all 111 ACTS tests and still diverge on JCS canonical output for the same input — particularly on unicode normalisation, number representation, and key ordering edge cases. ACTS validates the transport and protocol layer; it does not validate the cryptographic substrate underneath.

The 8-implementation cross-validation corpus at `chopmob-cloud/algovoi-jcs-conformance-vectors` covers exactly this layer: Python (rfc8785), TypeScript (canonicalize), Go (gowebpki/jcs), Rust (serde_json), Java (cyberphone/json-canonicalization), PHP, .NET, Ruby, all producing byte-identical output across the same vector set, including the non-ASCII UTF-8 cases that diverge most frequently in practice.

A complete A2A conformance spec needs both layers: ACTS for protocol, a JCS byte-match suite for the cryptographic substrate. The two are complementary rather than overlapping.
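
To make the divergence concrete, here is a minimal sketch using only Python's standard library. `approx_canonical` is a hypothetical helper that approximates RFC 8785 output for simple values (strings, small integers, booleans, null, lists, and dicts whose keys stay within the Basic Multilingual Plane); it is not a full JCS implementation and is not taken from any of the libraries named above.

```python
import hashlib
import json


def approx_canonical(value) -> bytes:
    """Approximate RFC 8785 output for simple values; not a full JCS implementation."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def escaped_variant(value) -> bytes:
    """Same settings, but non-ASCII is escaped as \\uXXXX (json.dumps default)."""
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


doc = {"b": 1, "a": "é"}
a = approx_canonical(doc)
b = escaped_variant(doc)
print(hashlib.sha256(a).hexdigest() == hashlib.sha256(b).hexdigest())  # False
```

Both serializers emit valid JSON for the same input, so a protocol-level assertion would accept either one; only a byte-for-byte comparison against shared vectors exposes the difference.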


msampathkumar changed the title from "Conformance spec" to "feat: Add new spec for A2A protocol conformance tests" Jun 4, 2026

5 participants