fix: integration test readiness for fibp#7
Merged
vieiralucas merged 4 commits intomainfrom Mar 26, 2026
Merged
Conversation
fix(test_helper): replace plain tcp connect with fibp handshake probe - plain tcp accept fires before server finishes fibp init; blocking handshake probe guarantees the server is ready to accept connections - add mtls client-cert support in probe for servers requiring client certs - retry server spawn up to 3 times to handle port toctou race condition fix(transport): correct consume push routing (corr_id=0) and auth frame - server-push frames carry corr_id=0; register push queue at 0 in addition to the consume request corr_id so push frames are dispatched correctly - use separate one-shot ack_q for the consume ack so push frames and the ack don't land on the same queue - op_auth payload is raw key bytes (no length prefix) — matches rust sdk - parse_error_frame: op_error payloads are plain utf-8 text, not code:u16+len:u16+msg; derive error type from message content fix(codec): all module_function helpers must precede private keyword - encode_message, encode_str16, and all read_* helpers moved before private so module_function makes them callable as codec.method_name from the batcher background thread and other module-function contexts - decode_nack_response alias replaced with explicit def (aliases do not inherit module_function) fix(codec): remove phantom queue_id field from decode_consume_push - server wire format is: count | msg_id | fairness_key | attempt_count | headers | payload — no queue_id field; pass queue name from caller fix(client): pass queue name to decode_consume_push fix(batcher): re-broadcast original exception type to preserve rpc errors - wrapping in fila::error.new loses queuenotfounderror / rpcerror types fix(test_helper): create_queue uses protobuf-encoded createqueuerequest - admin ops use protobuf payloads; old binary format caused decode errors
- remove now-unnecessary rubocop:disable comments after method refactors - extract auth/queue/message error predicates from parse_error_frame to lower cyclomatic complexity below the configured threshold - replace io.select with sock.wait_readable per lint/incompatibleioselect rubocop cop recommendation
There was a problem hiding this comment.
1 issue found across 5 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="lib/fila/transport.rb">
<violation number="1" location="lib/fila/transport.rb:122">
P2: Using `@pending[0]` for push frames can collide with normal request correlation IDs when `next_corr_id` wraps to 0, breaking frame routing.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
probe tls support at load time using a minimal self-signed cert. some bleeding-edge server binaries panic on tls startup due to a missing rustls crypto provider (fixed in faiscadev/fila pr #140 but not yet released to dev-latest). tls tests guard themselves with fila_tls_available so the suite does not fail when the server binary lacks tls support.
There was a problem hiding this comment.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="test/test_tls_auth.rb">
<violation number="1" location="test/test_tls_auth.rb:6">
P2: This file-level guard now skips non-TLS tests whenever TLS is unavailable, reducing integration coverage for API-key and backward-compat paths.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| require 'openssl' | ||
|
|
||
| return unless FILA_SERVER_AVAILABLE | ||
| return unless FILA_SERVER_AVAILABLE && FILA_TLS_AVAILABLE |
There was a problem hiding this comment.
P2: This file-level guard now skips non-TLS tests whenever TLS is unavailable, reducing integration coverage for API-key and backward-compat paths.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At test/test_tls_auth.rb, line 6:
<comment>This file-level guard now skips non-TLS tests whenever TLS is unavailable, reducing integration coverage for API-key and backward-compat paths.</comment>
<file context>
@@ -3,7 +3,7 @@
require 'openssl'
-return unless FILA_SERVER_AVAILABLE
+return unless FILA_SERVER_AVAILABLE && FILA_TLS_AVAILABLE
# Helper to generate self-signed CA and server/client certificates for tests.
</file context>
…0xffffffff corr_id=0 is permanently allocated to the server-push channel. the previous implementation wrapped next_corr_id through 0, which would have collided with the push channel after 2^32 requests. identified by cubic.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR fixes multiple bugs that caused integration tests to fail with "fila-server failed to start within 10s" and other errors. The root cause was a combination of an insufficient readiness probe and several protocol encoding bugs.
Readiness probe (
test/test_helper.rb)FIBP\x01\x00→ echo) so we only proceed when the server can actually serve FIBP requestsAdmin
create_queueencoding (test/test_helper.rb)CreateQueueRequest { name: "..." }in proto3 wire format (field 1, wire type 2)Consume push routing (
lib/fila/transport.rb)corr_id=0regardless of the consume request's corr_id. The push queue was registered only at the request corr_id, so push frames were silently droppedOP_AUTHpayload format (lib/fila/transport.rb)u16length, which the server decoded as the first two bytes of the key, causing "invalid or missing api key"OP_ERRORframe parsing (lib/fila/transport.rb)OP_ERRORframes carry plain UTF-8 text, notcode:u16 + len:u16 + msg. The old parser read garbage bytes as an error codeCodecmodule function visibility (lib/fila/codec.rb)module_functiononly applies to methods defined beforeprivate. Helper methods (encode_message,encode_str16,read_*) were afterprivate, making them unavailable asCodec.method_namefrom the batcher threadprivate; remove theprivateblock entirelyalias decode_nack_responsereplaced with an explicitdef(aliases don't inheritmodule_function)Consume push wire format (
lib/fila/codec.rb)queue_id: string16field that doesn't exist in the server's wire format, causingnil.unpack1on every consumed messageBatcher error propagation (
lib/fila/batcher.rb)Fila::Error.new(e.message)lost the specific type (QueueNotFoundError,RPCError). Tests assertingassert_raises(Fila::RPCError)would failTest plan
test/test_client.rb— 3/3 passingtest/test_batch.rb— 16/16 passingtest/test_tls_auth.rb— 9/9 passing (TLS, mTLS, API key, backward compat)🤖 Generated with Claude Code
Summary by cubic
Fixes flaky integration tests by using a real FIBP readiness probe, correcting protocol mismatches, and guarding TLS tests when the server lacks TLS. Startup is reliable now, server push/auth/error paths match the wire format, and
corr_idwraparound no longer collides with server-push.CreateQueueRequestforcreate_queue.corr_id=0and use a one-shot ACK queue; client passes queue name to the decoder.0for server-push; cycle request IDs through1..0xFFFFFFFFto avoid collisions.OP_AUTHpayload is raw API key bytes (no length prefix).OP_ERRORas UTF-8 text and map to typed errors by message.module_function, definedecode_nack_response, remove the non-existentqueue_idfrom consume push decoding.QueueNotFoundError,RPCError, etc.).Written for commit 86dcc4f. Summary will update on new commits.