Description
The S3 sink connector (iggy_connector_s3_sink) added in #2976 currently writes uncompressed files to S3. For production workloads with high message throughput, compression is essential to reduce storage costs and upload times.
Proposed Changes
Add a compression config option to the S3 sink connector supporting:
- none (default, current behavior)
- gzip: widely supported, with good compatibility across downstream tools (Athena, Spark, etc.)
- zstd: typically a better compression ratio and faster than gzip, with growing ecosystem support
Config Example
[plugin_config]
compression = "gzip" # none | gzip | zstd
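One way the connector could turn this setting into a typed value is sketched below. The `Compression` enum and its `FromStr` impl are hypothetical names for illustration, not existing connector code, and the real connector may well deserialize the option through its config layer instead. The sketch assumes strict lowercase matching of the three values listed above.

```rust
use std::str::FromStr;

/// Hypothetical enum for the proposed `compression` option.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Compression {
    /// Current behavior: upload the buffer as-is.
    #[default]
    None,
    Gzip,
    Zstd,
}

impl FromStr for Compression {
    type Err = String;

    /// Accepts exactly "none", "gzip", or "zstd" (assumed: case-sensitive).
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "none" => Ok(Compression::None),
            "gzip" => Ok(Compression::Gzip),
            "zstd" => Ok(Compression::Zstd),
            other => Err(format!(
                "unsupported compression \"{other}\" (expected none, gzip, or zstd)"
            )),
        }
    }
}
```

Rejecting unknown values at config load, rather than silently falling back to `none`, would surface typos before any files are written.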
Implementation Notes
- Compress the finalized buffer bytes before uploading to S3 (after finalize_buffer, before upload_with_retry)
- Append the appropriate file extension (.jsonl.gz, .jsonl.zst, .json.gz, etc.). The path module already derives extensions from OutputFormat, so it needs to be extended to account for compression
- Set the correct Content-Encoding header on the S3 put_object call
- Use the flate2 crate for gzip and the zstd crate for zstd; check whether they are already in the workspace and add them if not
- Add unit tests for round-trip compress/decompress verification
- Update the connector README and config.toml example
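The extension and Content-Encoding notes above could be handled by one small mapping, sketched here. `OutputFormat` is a stand-in for the connector's existing type with assumed variants, and `Compression`, `extension_suffix`, `content_encoding`, and `object_extension` are hypothetical names; the actual path module may be structured differently.

```rust
/// Stand-in for the connector's existing `OutputFormat`; these variants are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputFormat {
    JsonLines,
    Json,
}

impl OutputFormat {
    /// Base extension without a leading dot.
    pub fn extension(self) -> &'static str {
        match self {
            OutputFormat::JsonLines => "jsonl",
            OutputFormat::Json => "json",
        }
    }
}

/// Hypothetical enum for the proposed `compression` option.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    None,
    Gzip,
    Zstd,
}

impl Compression {
    /// Suffix appended after the format extension, if any.
    pub fn extension_suffix(self) -> Option<&'static str> {
        match self {
            Compression::None => None,
            Compression::Gzip => Some("gz"),
            Compression::Zstd => Some("zst"),
        }
    }

    /// Value for the Content-Encoding header; `None` means the header is not set.
    pub fn content_encoding(self) -> Option<&'static str> {
        match self {
            Compression::None => None,
            Compression::Gzip => Some("gzip"),
            Compression::Zstd => Some("zstd"),
        }
    }
}

/// Combines the format extension with the compression suffix, e.g. "jsonl.gz".
pub fn object_extension(format: OutputFormat, compression: Compression) -> String {
    match compression.extension_suffix() {
        Some(suffix) => format!("{}.{}", format.extension(), suffix),
        None => format.extension().to_string(),
    }
}
```

Keeping both mappings on the same enum means the object key and the header cannot drift apart when a new codec is added.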