base64 encode/decode is ~6x slower than jrsonnet on large payloads #779

@He-Pin

I was benchmarking base64 performance on large payloads and noticed sjsonnet is significantly slower than jrsonnet — about 6x on a ~4.5MB string with a couple of encode/decode roundtrips.

Dug into it a bit. The bottleneck isn't the base64 codec itself — it's the UTF-16 ↔ UTF-8 conversion that happens on every call. Since Java/Scala strings are UTF-16 internally, every std.base64(str) has to do str.getBytes("UTF-8") to get bytes for the encoder, and every std.base64Decode has to do new String(bytes, "UTF-8") to produce the result. That's two full copies of the data per operation, going through the charset encoder/decoder.
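To make the conversion path concrete, here is a minimal Java sketch of what a UTF-16-string runtime has to do around the codec. This is not sjsonnet's actual code; the class name `Utf16Base64Sketch` and its methods are hypothetical, and it just uses the JDK's `java.util.Base64` to show where the charset copies happen.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration only; not taken from sjsonnet.
public class Utf16Base64Sketch {
    // Encode: the UTF-16 string is first copied into a UTF-8 byte[]
    // by the charset encoder, then the codec output becomes a new String.
    public static String encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Decode: the codec yields raw bytes, which the charset decoder
    // then copies into a fresh String.
    public static String decode(String b64) {
        byte[] raw = Base64.getDecoder().decode(b64);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same payload shape as the repro: 45 chars x 100000 = 4,500,000 chars.
        String s = "The quick brown fox jumps over the lazy dog. ".repeat(100000);
        String e = encode(s);
        String d = decode(e);
        // prints "4500000 6000000 true"
        System.out.println(s.length() + " " + e.length() + " " + d.equals(s));
    }
}
```

Every call walks the whole payload through the charset layer in addition to the base64 pass itself, which is the cost described above.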

jrsonnet doesn't have this problem because its strings are UTF-8 natively (custom IStr type backed by [u8]), so base64 can work directly on the string bytes with zero conversion.
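For contrast, a rough Java analogy of the byte-native approach (jrsonnet itself is Rust, and this is not its code): if the string value is already held as UTF-8 bytes, the codec can go bytes to bytes with no charset step. `ByteNativeBase64Sketch` is a hypothetical name; the `byte[]` here stands in for a UTF-8-backed string type.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical analogy only; jrsonnet's real implementation is in Rust.
public class ByteNativeBase64Sketch {
    // The input is already UTF-8 bytes, so no String <-> byte[] conversion is needed.
    public static byte[] encode(byte[] utf8) {
        return Base64.getEncoder().encode(utf8);
    }

    // The output stays as bytes; it can be used as a UTF-8 string value directly.
    public static byte[] decode(byte[] b64) {
        return Base64.getDecoder().decode(b64);
    }

    public static void main(String[] args) {
        // getBytes is used here only to build test input, not on the hot path.
        byte[] input = "The quick brown fox".getBytes(StandardCharsets.UTF_8);
        byte[] roundtrip = decode(encode(input));
        System.out.println(Arrays.equals(input, roundtrip)); // prints "true"
    }
}
```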

For small payloads (a few KB) this doesn't really matter — interpreter overhead dominates. But once you get into the hundreds-of-KB or MB range, the conversion cost adds up fast.

Repro (requires hyperfine + both tools installed):

// base64_ultra.jsonnet
local s1 = std.repeat("The quick brown fox jumps over the lazy dog. ", 100000);
local e1 = std.base64(s1);
local d1 = std.base64Decode(e1);
local e2 = std.base64(d1);
local d2 = std.base64Decode(e2);
{
  input_len: std.length(s1),
  encoded_len: std.length(e1),
  roundtrip_ok: d2 == s1,
}

hyperfine --warmup 2 \
  'sjsonnet base64_ultra.jsonnet' \
  'jrsonnet base64_ultra.jsonnet'

On my M4 Max (Scala Native build):

  • sjsonnet: ~88ms
  • jrsonnet: ~14ms
