Skip to content

feat: replace grpc with fibp binary protocol#9

Open
vieiralucas wants to merge 20 commits intomainfrom
feat/21.4-binary-protocol
Open

feat: replace grpc with fibp binary protocol#9
vieiralucas wants to merge 20 commits intomainfrom
feat/21.4-binary-protocol

Conversation

@vieiralucas
Copy link
Copy Markdown
Member

@vieiralucas vieiralucas commented Apr 4, 2026

Summary

  • Migrate Ruby SDK from gRPC/protobuf to the FIBP binary protocol over TCP
  • Add lib/fila/fibp/ module with codec (encoding primitives), opcodes/error codes, and TCP connection manager
  • Rewrite Fila::Client and Fila::Batcher to use FIBP instead of gRPC stubs
  • Add admin operations: create_queue, delete_queue, get_stats, list_queues, set_config, get_config, list_config, redrive
  • Add auth operations: create_api_key, revoke_api_key, list_api_keys, set_acl, get_acl
  • Add new error types: AuthenticationError, ForbiddenError, NotLeaderError, QueueAlreadyExistsError, ApiKeyNotFoundError
  • TLS/mTLS via Ruby OpenSSL, API key sent in FIBP handshake (not gRPC metadata)
  • Leader hint redirect on NotLeaderError for consume operations
  • Remove grpc and google-protobuf gem dependencies, delete all proto files
  • Bump version to 0.5.0
  • Net: +1214 / -854 lines

Test plan

  • CI lint (rubocop) passes
  • Integration tests pass against fila-server (enqueue, consume, ack, nack, batch modes)
  • TLS/mTLS/API key auth tests pass
  • Admin operations verified via create_queue in test helper

Summary by cubic

Switch the Ruby SDK from gRPC/protobuf to the FIBP binary protocol over TCP for faster, lighter I/O. Also adds transparent leader-redirect in consume, delivery/enqueue batching, and tighter error mapping.

  • New Features

    • FIBP codec and TCP connection with TLS/mTLS; API key sent in handshake.
    • Background enqueue batching: :auto (default), :linger, :disabled; adds enqueue_many; close drains pending work.
    • Delivery batching in consume; automatic leader-hint reconnect on NotLeaderError.
    • Admin APIs: create/delete queue, stats, list queues, runtime config, redrive.
    • Auth APIs: create/revoke/list API keys, set/get ACL.
    • New errors: AuthenticationError, ForbiddenError, NotLeaderError, QueueAlreadyExistsError, ApiKeyNotFoundError.
    • Protocol alignment: corrected opcode constants; split stats decoding for clarity.
    • More robust single-message ack/nack result handling.
  • Migration

    • Remove grpc and google-protobuf from your Gemfile; delete proto files and generation steps.
    • Initialize via Fila::Client.new(addr, ca_cert:, client_cert:, client_key:, api_key:, batch_mode:); API key is sent in the FIBP handshake.
    • Default batching is batch_mode: :auto; for strict per-call behavior (e.g., tests), use :disabled or enqueue_many.
    • Update error handling to the new error classes and messages.

Written for commit 2da74e7. Summary will update on new commits.

feat: transparent leader hint reconnect on consume
- Add batch_enqueue() for explicit multi-message BatchEnqueue RPC
- Add background batcher with three modes: :auto (default), :linger, :disabled
- Auto mode: opportunistic batching via Queue drain (zero latency at low load)
- Linger mode: timer-based batching with configurable linger_ms and batch_size
- Single-item optimization: 1 message uses Enqueue RPC (preserves error types)
- Delivery batching: consume unpacks repeated messages field transparently
- close() drains pending messages before disconnecting
- Update proto with BatchEnqueue RPC and ConsumeResponse.messages field
- Bump version to 0.3.0
- copy new service.proto (BatchEnqueue RPC removed, EnqueueRequest now
  takes repeated EnqueueMessage, AckRequest/NackRequest take repeated
  messages, ConsumeResponse only has repeated messages field)
- regenerate ruby proto code from new service.proto
- replace batch_enqueue method with enqueue_many (no "batch" prefix)
- enqueue wraps single message in EnqueueMessage + EnqueueRequest
- ack/nack wrap in repeated AckMessage/NackMessage, parse first result
- consume uses only repeated messages field (singular field removed)
- rename BatchEnqueueResult to EnqueueResult
- batcher uses unified Enqueue RPC for all batch sizes
- update all tests to match new api surface
- bump version to 0.4.0
- ack/nack: raise RPCError on nil result instead of silently returning
  (identified by cubic — nil means zero results, which is an error for
  single-message operations)
- batcher: extract result_to_outcome and broadcast_error from flush_batch
  to satisfy AbcSize/CyclomaticComplexity/MethodLength/PerceivedComplexity
- batcher: remove redundant begin block in pop_with_timeout
- batcher: add rubocop:disable for Metrics/ClassLength (126 lines, thread
  management + two modes is inherently complex)
- test_batch: fix array indentation and prefer single-quoted strings
migrate the ruby sdk from grpc to the fila binary protocol (fibp).
adds tcp connection with tls/mtls support, fibp codec with all opcodes,
admin operations, auth operations, and leader hint redirect. removes
grpc and protobuf dependencies. bumps version to 0.5.0.
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 issues found across 24 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="test/test_tls_auth.rb">

<violation number="1" location="test/test_tls_auth.rb:98">
P3: Close the temporary client in this rejection test; it currently leaks a connection when the expected exception is raised.</violation>

<violation number="2" location="test/test_tls_auth.rb:245">
P3: Close this temporary TLS client in an `ensure` block to avoid leaking connections during test execution.</violation>
</file>

<file name="test/test_helper.rb">

<violation number="1" location="test/test_helper.rb:116">
P2: Ensure the client is closed even when `create_queue` raises; otherwise failed admin calls can leak open connections.</violation>

<violation number="2" location="test/test_helper.rb:122">
P1: Close the probe client in an `ensure` block so retry failures do not leak sockets during server startup.</violation>
</file>

<file name="lib/fila/fibp/connection.rb">

<violation number="1" location="lib/fila/fibp/connection.rb:139">
P0: **TLS peer verification disabled when custom CA cert is provided.** `ctx.verify_mode` is never set in the `@ca_cert` branch, so it defaults to `VERIFY_NONE`. The custom CA cert store is populated but never actually used for verification — any server certificate will be accepted, defeating the purpose of TLS and enabling MITM attacks.

Add `verify_mode` and `verify_hostname` after setting the cert store.</violation>
</file>

<file name="lib/fila/batcher.rb">

<violation number="1" location="lib/fila/batcher.rb:157">
P2: String concatenation with `+=` in a loop is O(n²) — each iteration copies the entire accumulated buffer. Use `<<` to append in-place.</violation>
</file>

<file name="lib/fila/client.rb">

<violation number="1" location="lib/fila/client.rb:224">
P1: `decode_stats_result` is called but never defined — `get_stats` will raise `NoMethodError` at runtime. The method implementation needs to be added to the `private` section of the `Client` class.</violation>

<violation number="2" location="lib/fila/client.rb:235">
P1: `decode_list_queues_result` is called but never defined — `list_queues` will raise `NoMethodError` at runtime. The method implementation needs to be added to the `private` section of the `Client` class.</violation>

<violation number="3" location="lib/fila/client.rb:495">
P2: After a leader redirect, the `ensure` block cancels `rid` on the **new** connection instead of the old one. Cancel the subscription and nil out `rid` before reconnecting, so the ensure block doesn't send a stale cancel to the wrong server.</violation>

<violation number="4" location="lib/fila/client.rb:539">
P2: `reconnect_to` replaces `@conn` but the `Batcher` still holds a reference to the old (closed) connection. After a leader redirect during consume, subsequent enqueue calls through the batcher will fail. Consider giving the Batcher a way to update its connection reference, or having it access the client's connection indirectly.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread lib/fila/fibp/connection.rb
Comment thread test/test_helper.rb Outdated
Comment thread lib/fila/client.rb
Comment thread lib/fila/client.rb
Comment thread test/test_helper.rb Outdated
Comment thread lib/fila/batcher.rb Outdated
Comment thread lib/fila/client.rb
Comment thread lib/fila/client.rb Outdated
Comment thread test/test_tls_auth.rb
Comment thread test/test_tls_auth.rb
@vieiralucas vieiralucas force-pushed the feat/21.4-binary-protocol branch from c4a7c6e to 301ff1c Compare April 4, 2026 13:21
extract handle_not_leader_redirect helper to bring cyclomatic
complexity under threshold, removing rubocop:disable comment.
use anonymous block forwarding per rubocop style rules.
@vieiralucas vieiralucas force-pushed the feat/21.4-binary-protocol branch from 5647575 to 2da74e7 Compare April 4, 2026 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant