
feat(connectors): add compression support to S3 sink connector (gzip, zstd) #3104

@atharvalade

Description

The S3 sink connector (iggy_connector_s3_sink) added in #2976 currently writes uncompressed files to S3. For production workloads with high message throughput, compression is essential to reduce storage costs and upload times.

Proposed Changes

Add a compression config option to the S3 sink connector supporting:

  • none (default, current behavior)
  • gzip — widely supported, good compatibility with downstream tools (Athena, Spark, etc.)
  • zstd — better compression ratio and speed, growing ecosystem support

Config Example

[plugin_config]
compression = "gzip"  # none | gzip | zstd
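A minimal sketch of how the connector might parse this option, assuming the value arrives as a plain string. The `Compression` enum and its `FromStr` impl are hypothetical names for illustration, not existing connector code, and the real connector may deserialize its config differently.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the proposed `compression` option.
/// `None` is the default so existing configs keep their current behavior.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Compression {
    #[default]
    None,
    Gzip,
    Zstd,
}

impl FromStr for Compression {
    type Err = String;

    /// Accepts the three values from the config example, case-insensitively.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "none" => Ok(Self::None),
            "gzip" => Ok(Self::Gzip),
            "zstd" => Ok(Self::Zstd),
            other => Err(format!(
                "unsupported compression `{other}`, expected none | gzip | zstd"
            )),
        }
    }
}
```

Rejecting unknown values at config load time, rather than silently falling back to `none`, would surface typos before any file is uploaded.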

Implementation Notes

  • Compress the finalized buffer bytes before uploading to S3 (after finalize_buffer, before upload_with_retry)
  • Append the appropriate file extension (.jsonl.gz, .jsonl.zst, .json.gz, etc.). The path module already derives extensions from OutputFormat, so it needs to be extended to account for compression as well
  • Set the correct Content-Encoding header on the S3 put_object call
  • Use the flate2 crate for gzip and the zstd crate for zstd. Check whether they are already in the workspace; if not, add them
  • Add unit tests for round-trip compress/decompress verification
  • Update the connector README and config.toml example
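
The extension and header notes above could be sketched as follows. This is an illustration under assumptions: the `OutputFormat` variants, the `Compression` enum, and the function names are hypothetical, and the actual path module may be structured differently. The compression step itself (flate2 / zstd) is omitted so the sketch stays standard-library only.

```rust
/// Hypothetical stand-in for the connector's `OutputFormat`; variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputFormat {
    Json,
    JsonLines,
}

/// Hypothetical enum for the proposed `compression` option.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    None,
    Gzip,
    Zstd,
}

impl Compression {
    /// Suffix appended after the format extension; empty when uncompressed.
    pub fn extension_suffix(self) -> &'static str {
        match self {
            Compression::None => "",
            Compression::Gzip => ".gz",
            Compression::Zstd => ".zst",
        }
    }

    /// Value for the Content-Encoding header; `None` means the header is not set.
    pub fn content_encoding(self) -> Option<&'static str> {
        match self {
            Compression::None => None,
            Compression::Gzip => Some("gzip"),
            Compression::Zstd => Some("zstd"),
        }
    }
}

/// Combines the format extension with the compression suffix, e.g. "jsonl.gz".
pub fn file_extension(format: OutputFormat, compression: Compression) -> String {
    let base = match format {
        OutputFormat::Json => "json",
        OutputFormat::JsonLines => "jsonl",
    };
    format!("{base}{}", compression.extension_suffix())
}
```

Keeping both mappings on the same enum means the file extension and the header cannot drift apart when a new codec is added.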
