Closed
148 changes: 148 additions & 0 deletions bench/reports/sjsonnet-vs-jrsonnet-gaps.md

Large diffs are not rendered by default.

60 changes: 60 additions & 0 deletions bench/reports/sync-points.md
@@ -0,0 +1,60 @@
# Performance sync points

This file tracks current performance migration and exploration work so the same
idea is not repeated without new evidence.

## Active baselines

| Area | Ref | Notes |
|---|---|---|
| upstream/master | `cedc083b4676be43e01bdd6f6cb5d7f4432d0d32` | Clean base used for current local rechecks. |
| jrsonnet | `5e8cbcdbc860a616dbd193428f8933dd7532f537` | Source-built with `cargo build --release -p jrsonnet`. |

## Current confirmed gaps

| workload | status | report |
|---|---|---|
| `large_string_template` | improved by simple-format ASCII-safe propagation; jrsonnet still `~1.34x` faster | `bench/reports/sjsonnet-vs-jrsonnet-gaps.md` |
| kube-prometheus realworld | improved by strict JSON byte import parsing; jrsonnet still `1.55x` faster | `bench/reports/sjsonnet-vs-jrsonnet-gaps.md` |

## Accepted ideas

| idea | status | evidence |
|---|---|---|
| Strict JSON byte import parsing | implemented locally; not committed | `Importer.parseJsonImport` uses `ujson.ByteArrayParser`; `CachedResolvedFile` caches small files as bytes and lazily decodes text. Kube Native A/B: candidate `132.7/132.1 ms` vs clean base `139.4/140.3 ms`. |
| Hybrid sort for inline object materialization | implemented locally; pending PR | `Materializer.computeSortedInlineOrder` keeps insertion sort for ≤16 visible fields and uses in-place quicksort for larger inline objects. Native kube A/B on top of strict JSON bytes improved forward `145.3 -> 140.0 ms` and reverse `151.6 -> 148.9 ms`; output equality and full `__.test` passed. |
| Simple named format ASCII-safe propagation | implemented locally; pending PR | `Format.PartialApplyFmt` returns `Val.Str.asciiSafe` when all static format literals and simple named dynamic values are JSON-string ASCII-safe. Native `large_string_template` improved in both command orders (`8.64 -> 8.01 ms`, `8.65 -> 8.17 ms`); JVM JMH stayed neutral-positive (`0.683 -> 0.677 ms/op`). |
| ByteRenderer medium/long string render cache | implemented locally; pending PR | `BaseByteRenderer` caches fully rendered quoted bytes for repeated strings with 128..4096 chars and a bounded 2048-entry/16KiB-per-entry cache. Kube Native A/B improved in both command orders (`132.89 -> 132.28 ms`, `132.26 -> 130.72 ms`); `large_string_template` stayed neutral-to-positive because the huge unique string is above the cache cap. |
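
The hybrid-sort row above describes insertion sort for small inputs and in-place quicksort for larger ones. The following is a minimal standalone sketch of that idea, not the code in `Materializer.computeSortedInlineOrder`: the object name `HybridSortSketch`, the method `sortOrder`, the index-array representation, and the use of plain `String.compareTo` for ordering are all assumptions made for illustration. Only the cutoff of 16 is taken from the table.

```scala
object HybridSortSketch {
  // Cutoff from the table row above; everything else here is hypothetical.
  val InsertionSortMax = 16

  /** Reorders `order` (indices into `keys`) in place so keys(order(i)) is ascending. */
  def sortOrder(keys: Array[String], order: Array[Int]): Unit =
    if (order.length <= InsertionSortMax) insertionSort(keys, order, 0, order.length - 1)
    else quickSort(keys, order, 0, order.length - 1)

  private def insertionSort(keys: Array[String], order: Array[Int], lo: Int, hi: Int): Unit = {
    var i = lo + 1
    while (i <= hi) {
      val cur = order(i)
      val curKey = keys(cur)
      var j = i - 1
      // Shift larger entries one slot to the right, then drop `cur` into the gap.
      while (j >= lo && keys(order(j)).compareTo(curKey) > 0) {
        order(j + 1) = order(j)
        j -= 1
      }
      order(j + 1) = cur
      i += 1
    }
  }

  private def quickSort(keys: Array[String], order: Array[Int], lo0: Int, hi0: Int): Unit = {
    var lo = lo0
    var hi = hi0
    while (hi - lo >= InsertionSortMax) {
      // Hoare-style partition around the middle element's key.
      val pivot = keys(order(lo + (hi - lo) / 2))
      var i = lo
      var j = hi
      while (i <= j) {
        while (keys(order(i)).compareTo(pivot) < 0) i += 1
        while (keys(order(j)).compareTo(pivot) > 0) j -= 1
        if (i <= j) {
          val t = order(i); order(i) = order(j); order(j) = t
          i += 1; j -= 1
        }
      }
      // Recurse into the smaller half and loop on the larger to bound stack depth.
      if (j - lo < hi - i) { quickSort(keys, order, lo, j); lo = i }
      else { quickSort(keys, order, i, hi); hi = j }
    }
    // Ranges at or below the cutoff are finished with insertion sort.
    insertionSort(keys, order, lo, hi)
  }
}
```

Sorting an index array rather than the keys themselves keeps the sketch allocation-free after setup, which is presumably the property that matters on the materialization hot path; whether the real implementation sorts indices or field references is not stated in this report.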

## Rejected ideas

| idea | reason |
|---|---|
| Nested byte-buffer flush threshold 16/32/64 KiB | Not stable positive under same-run forward/reverse Native A/B. |
| Single-part parsed string fast path | Not stable positive under same-run forward/reverse Native A/B. |
| 4-slot object value cache | Reduced overflow count but produced only neutral Native wall-clock results. |
| Lazy small overflow cache before HashMap | Reduced overflow count further but regressed Native wall-clock. |
| Strict JSON object cycle-check skip marker | Debug stats improved, but same-run Native A/B was not stable enough to keep. |
| visitLongString char/range-copy path | Stable JVM JMH regression on `large_string_template` (`~0.82 ms` baseline to `~1.21 ms` candidate); rejected before Native A/B. |
| Lazy simple-named format byte rendering | Three structural variants improved/held JVM JMH but were neutral-to-negative on Scala Native whole-process `large_string_template`; code reverted. |
| Strict JSON integer parse via `ParseUtils.parseIntegralNum` | Tried both an explicit integral scan and the parser-provided `decIndex/expIndex` fast path. Output stayed identical, but kube Native A/B was not stable-positive; reverse median/min favored the existing `toString.toDouble` path. |
| ByteRenderer ASCII-safe object key precheck | Replaced direct key rendering with `Platform.isAsciiJsonSafe` + low-byte copy for safe keys. Output stayed identical, but kube Native reverse A/B favored the existing short-string renderer across mean/median/min. |
| Direct `String.charAt` scan in `visitShortString` | Avoided the reusable `getChars` temp-buffer copy. Output stayed identical and kube Native improved weakly, but `large_string_template` regressed or was noise-level negative in both command orders, so the existing reusable-buffer renderer path was restored. |
| Long strict-JSON imported string values marked ASCII-safe during parse | Mirrored the large Jsonnet string literal optimization for `.json` imports. Output stayed identical, but kube Native reverse A/B favored baseline, so the parse-time scan was removed. |
| Lower parsed Jsonnet string ASCII-safe threshold to `>=128` | Tried to align parser marking with ByteRenderer's long-string cutoff. Output stayed identical, but the parse-time scan regressed kube Native in both command orders. |
| Lazy materialization-time cache for inline-object sorted order | Stored `computeSortedInlineOrder` results back on `Val.Obj` when absent. Output stayed identical, but real kube Native single-run A/B was neutral-to-negative, so the lazy write was removed. |
| Native CLI path-only parse cache | Avoided `ResolvedFile.contentHash()` for the Native CLI to bypass SHA-256/OpenSSL provider work. It linked and preserved output, but Native wall-clock was neutral on `null` and negative/noisy on kube, so the default content-hash cache was restored. |
| Native GC switch to Commix | Attempted to set `nativeGC` to Commix in Mill. Build script compilation failed because the GC API was not exposed on the current Mill build classpath, so the config experiment was reverted. |
| Parser `_asciiSafe` hint for static format safety | Reused the parser's large-string ASCII-safe marker to avoid re-scanning static format literals. Debug stats improved, but Native whole-process `large_string_template` regressed in both command orders, so the hint path was removed. |
| Native manual ASCII-safe string-to-byte copy | Replaced `String.getBytes(0, len, dst, dstPos)` with a manual `charAt` loop for known ASCII-safe strings. Native `large_string_template` regressed heavily in both command orders, so the platform copy stays on `getBytes`. |
| Single-character append in simple format loop | Branched the single-label simple format path to call `StringBuilder.append(Char)` when the dynamic value length is one. Native `large_string_template` regressed in both command orders, so the existing `append(String)` loop remains. |
| ByteRenderer minified object comma path | Specialized direct/generic object rendering to manage comma/empty state locally for minified JSON. Output stayed identical and kube improved weakly, but `large_string_template` regressed or was noise-level negative in both command orders, so the generic renderer path was restored. |
| Native-only long ASCII escaped string renderer | Gated a direct `charAt` long-string renderer to Scala Native to avoid UTF-8 byte-array allocation for escaped ASCII strings. Output stayed identical, but `large_string_template` regressed in both command orders, so the UTF-8 encode plus SWAR scan remains the best path. |
| Inline small-stack cycle tracking | Replaced eager `IdentityHashMap` cycle tracking with four inline identity slots plus overflow map while preserving recursive error behavior. Kube was noise-level and `large_string_template` regressed in both command orders, so eager `IdentityHashMap` tracking was restored. |
| ByteRenderer quoted key cache | Cached quoted object-key bytes per renderer using HashMap, direct-mapped, and capped variants. Output stayed identical, but kube reverse A/B was not stable-positive and some variants regressed, so direct key rendering was restored. |

## Policy

Before opening a performance PR, rerun focused JMH and Scala Native hyperfine
against the current base and source-built jrsonnet. Keep a change only when the
target benchmark is stable-positive and guard benchmarks do not regress.
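
One possible reading of "stable-positive", based on how the tables above report forward and reverse command orders, is that the candidate must beat the baseline in both orders. The sketch below encodes that reading only; the names `AbPolicySketch`, `Run`, and `stablePositive`, and the optional `minSpeedup` margin, are hypothetical and not part of the repository.

```scala
object AbPolicySketch {
  /** Wall-clock times in milliseconds for one command order of an A/B run. */
  final case class Run(baselineMs: Double, candidateMs: Double) {
    def speedup: Double = baselineMs / candidateMs
  }

  /**
   * Assumed rule: keep a change only if the candidate is faster in BOTH
   * command orders. `minSpeedup` is a hypothetical noise margin (1.0 = none).
   */
  def stablePositive(forward: Run, reverse: Run, minSpeedup: Double = 1.0): Boolean =
    forward.speedup > minSpeedup && reverse.speedup > minSpeedup
}
```

With the `large_string_template` numbers recorded for the ASCII-safe propagation change (`8.64 -> 8.01 ms` and `8.65 -> 8.17 ms`), `stablePositive(Run(8.64, 8.01), Run(8.65, 8.17))` holds under this reading; a change that wins in one order and loses in the other does not.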
62 changes: 43 additions & 19 deletions sjsonnet/src-jvm-native/sjsonnet/CachedResolvedFile.scala
@@ -5,6 +5,7 @@ import fastparse.ParserInput
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import java.security.MessageDigest

/**
* A class that encapsulates a resolved import. This is used to cache the result of resolving an
@@ -37,63 +38,86 @@ class CachedResolvedFile(
s"Resolved import path $resolvedImportPath is too large: ${jFile.length()} bytes > $memoryLimitBytes bytes"
)

private val resolvedImportContent: ResolvedFile = {
// TODO: Support caching binary data
if (jFile.length() > cacheThresholdBytes) {
// If the file is too large, then we will just read it from disk
null
} else if (binaryData) {
StaticBinaryResolvedFile(readRawBytes(jFile))
} else {
StaticResolvedFile(readString(jFile))
}
}
private val cachedBytes: Array[Byte] =
if (jFile.length() > cacheThresholdBytes) null
else readRawBytes(jFile)

private val cachedBinaryContent: ResolvedFile =
if (cachedBytes != null && binaryData) StaticBinaryResolvedFile(cachedBytes)
else null

private def readString(jFile: File): String = {
new String(Files.readAllBytes(jFile.toPath), StandardCharsets.UTF_8)
}

private def readRawBytes(jFile: File): Array[Byte] = Files.readAllBytes(jFile.toPath)

private lazy val resolvedTextContent: ResolvedFile =
StaticResolvedFile(new String(cachedBytes, StandardCharsets.UTF_8))

private lazy val cachedBytesHash: String =
cachedBytes.length.toString + ":" + bytesToHex(
MessageDigest.getInstance("SHA-256").digest(cachedBytes)
)

private def bytesToHex(bytes: Array[Byte]): String = {
val hexChars = "0123456789abcdef"
val out = new Array[Char](bytes.length * 2)
var i = 0
var j = 0
while (i < bytes.length) {
val b = bytes(i) & 0xff
out(j) = hexChars.charAt(b >>> 4)
out(j + 1) = hexChars.charAt(b & 0x0f)
i += 1
j += 2
}
new String(out)
}

/**
* A method that will return a reader for the resolved import. If the import is too large, then
* this will return a reader that will read the file from disk. Otherwise, it will return a reader
* that reads from memory.
*/
def getParserInput(): ParserInput = {
if (resolvedImportContent == null) {
if (cachedBytes == null) {
FileParserInput(jFile)
} else if (binaryData) {
cachedBinaryContent.getParserInput()
} else {
resolvedImportContent.getParserInput()
resolvedTextContent.getParserInput()
}
}

override def readString(): String = {
if (resolvedImportContent == null) {
if (cachedBytes == null) {
// If the file is too large, then we will just read it from disk
readString(jFile)
} else if (binaryData) {
cachedBinaryContent.readString()
} else {
// Otherwise, we will read it from memory
resolvedImportContent.readString()
resolvedTextContent.readString()
}
}

override def contentHash(): String = {
if (resolvedImportContent == null) {
if (cachedBytes == null) {
// If the file is too large, then we will just read it from disk
Platform.hashFile(jFile)
} else {
resolvedImportContent.contentHash()
cachedBytesHash
}
}

override def readRawBytes(): Array[Byte] = {
if (resolvedImportContent == null) {
if (cachedBytes == null) {
// If the file is too large, then we will just read it from disk
readRawBytes(jFile)
} else {
// Otherwise, we will read it from memory
resolvedImportContent.readRawBytes()
cachedBytes
}
}
}
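
The cached-bytes hash in the diff above has the shape `<byte length>:<lowercase hex SHA-256>`. The following standalone sketch reproduces that shape outside the class so it can be checked in isolation; the names `ContentHashSketch` and `lengthPrefixedSha256` are hypothetical, and the sketch makes no claim about what `Platform.hashFile` returns for files above the cache threshold.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object ContentHashSketch {
  private val HexChars = "0123456789abcdef"

  /** Lowercase hex encoding, two characters per byte. */
  def bytesToHex(bytes: Array[Byte]): String = {
    val out = new Array[Char](bytes.length * 2)
    var i = 0
    while (i < bytes.length) {
      val b = bytes(i) & 0xff // widen to an unsigned 0..255 value
      out(2 * i) = HexChars.charAt(b >>> 4)
      out(2 * i + 1) = HexChars.charAt(b & 0x0f)
      i += 1
    }
    new String(out)
  }

  /** Hypothetical helper mirroring the "<length>:<sha256 hex>" format in the diff. */
  def lengthPrefixedSha256(bytes: Array[Byte]): String =
    bytes.length.toString + ":" +
      bytesToHex(MessageDigest.getInstance("SHA-256").digest(bytes))

  def main(args: Array[String]): Unit =
    println(lengthPrefixedSha256("abc".getBytes(StandardCharsets.UTF_8)))
}
```

Hashing the raw bytes means the text never has to be decoded just to compute a cache key, which fits the lazy `resolvedTextContent` in the diff.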
41 changes: 41 additions & 0 deletions sjsonnet/src/sjsonnet/BaseByteRenderer.scala
@@ -28,6 +28,7 @@ class BaseByteRenderer[T <: java.io.OutputStream](

protected val elemBuilder = new upickle.core.ByteBuilder
private val unicodeCharBuilder = new upickle.core.CharBuilder
private var cachedLongStrings: java.util.HashMap[String, Array[Byte]] = null

def flushByteBuilder(): Unit = {
elemBuilder.writeOutToIfLongerThan(out, if (depth == 0) 0 else 8192)
@@ -307,6 +308,42 @@ class BaseByteRenderer[T <: java.io.OutputStream](
* chunks and escapes only the bytes that require it.
*/
private def visitLongString(str: String): Unit = {
val charLen = str.length
if (charLen >= 128 && charLen <= BaseByteRenderer.LONG_STRING_CACHE_MAX_CHARS) {
val cache = cachedLongStrings
if (cache != null) {
val cached = cache.get(str)
if (cached != null) {
elemBuilder.ensureLength(cached.length)
System.arraycopy(cached, 0, elemBuilder.arr, elemBuilder.length, cached.length)
elemBuilder.length += cached.length
return
}
}

val start = elemBuilder.length
renderLongStringUncached(str)
val renderedLen = elemBuilder.length - start
if (renderedLen <= BaseByteRenderer.LONG_STRING_CACHE_MAX_BYTES) {
val c =
if (cache != null) cache
else {
val newCache = new java.util.HashMap[String, Array[Byte]]()
cachedLongStrings = newCache
newCache
}
if (c.size() < BaseByteRenderer.LONG_STRING_CACHE_MAX_ENTRIES) {
val rendered = new Array[Byte](renderedLen)
System.arraycopy(elemBuilder.arr, start, rendered, 0, renderedLen)
c.put(str, rendered)
}
}
} else {
renderLongStringUncached(str)
}
}

private def renderLongStringUncached(str: String): Unit = {
val bytes = str.getBytes(java.nio.charset.StandardCharsets.UTF_8)
val bLen = bytes.length
val firstEscape = CharSWAR.findFirstEscapeChar(bytes, 0, bLen)
@@ -446,6 +483,10 @@ class BaseByteRenderer[T <: java.io.OutputStream](

object BaseByteRenderer {

private final val LONG_STRING_CACHE_MAX_CHARS = 4096
private final val LONG_STRING_CACHE_MAX_BYTES = 16384
private final val LONG_STRING_CACHE_MAX_ENTRIES = 2048

/** Pre-allocated spaces buffer for bulk indentation. */
private[sjsonnet] val SPACES: Array[Byte] = {
val a = new Array[Byte](64)
49 changes: 40 additions & 9 deletions sjsonnet/src/sjsonnet/Format.scala
@@ -41,6 +41,8 @@ object Format {
val literalEnds: Array[Int],
/** Non-null when all simple named specs use the same label. */
val singleNamedLabel: String,
/** True when all literal text copied to the output is already JSON-string ASCII-safe. */
val staticAsciiSafe: Boolean,
/**
* True when ALL specs are simple `%(key)s` with a named label and no formatting flags. In
* this case we can use a fast path that caches the object key lookup and avoids widenRaw
@@ -483,6 +485,7 @@ object Format {
litStarts,
litEnds,
singleNamedLabel,
Platform.isAsciiJsonSafe(s),
allSimpleNamed
)
}
@@ -497,6 +500,7 @@ object Format {
val emptyStarts = new Array[Int](size)
val emptyEnds = new Array[Int](size)
var staticChars = leading.length
var staticAsciiSafe = Platform.isAsciiJsonSafe(leading)
var hasAnyStar = false
var allSimpleNamed = true
var idx = 0
@@ -508,6 +512,7 @@ object Format {
specs(idx) = formatted.bits
literals(idx) = literal
staticChars += literal.length
staticAsciiSafe &&= Platform.isAsciiJsonSafe(literal)
hasAnyStar ||= formatted.widthStar || formatted.precisionStar
allSimpleNamed = false
idx += 1
@@ -526,6 +531,7 @@ object Format {
emptyStarts,
emptyEnds,
null,
staticAsciiSafe,
allSimpleNamed
)
}
@@ -556,7 +562,7 @@ object Format {
// Super-fast path: all specs are simple %(key)s with an object value.
// Avoids per-spec pattern matching, widenRaw, and uses offset-based literal appends.
if (parsed.allSimpleNamedString && values0.isInstanceOf[Val.Obj]) {
return formatSimpleNamedString(parsed, values0.asInstanceOf[Val.Obj], pos)
return formatSimpleNamedStringValue(parsed, values0.asInstanceOf[Val.Obj], pos).str
}

val values = values0 match {
@@ -751,34 +757,47 @@ object Format {
if (singleSpecNoStatic) singleFormatted else output.toString()
}

private[sjsonnet] def formatValue(parsed: RuntimeFormat, values0: Val, pos: Position)(implicit
evaluator: EvalScope): Val.Str =
if (parsed.allSimpleNamedString && values0.isInstanceOf[Val.Obj]) {
formatSimpleNamedStringValue(parsed, values0.asInstanceOf[Val.Obj], pos)
} else {
Val.Str(pos, format(parsed, values0, pos))
}

/**
* Super-fast path for format strings where ALL specs are simple `%(key)s` with a `Val.Obj`. This
* avoids per-spec pattern matching, widenRaw overhead, and caches repeated key lookups. For the
* large_string_template benchmark (605KB, 256 `%(x)s` interpolations), this eliminates 256
* redundant object lookups and the generic dispatch overhead.
*/
private def formatSimpleNamedString(parsed: RuntimeFormat, obj: Val.Obj, pos: Position)(implicit
evaluator: EvalScope): String = {
private def formatSimpleNamedStringValue(parsed: RuntimeFormat, obj: Val.Obj, pos: Position)(
implicit evaluator: EvalScope): Val.Str = {
val output = new java.lang.StringBuilder(parsed.staticChars + parsed.specBits.length * 16)
var asciiSafe = parsed.staticAsciiSafe

// Append leading literal using offsets if source is available, else use string
appendLeading(output, parsed)

val singleLabel = parsed.singleNamedLabel
if (singleLabel != null) {
val str = simpleStringValue(obj.value(singleLabel, pos)(evaluator).value)
val rawVal = obj.value(singleLabel, pos)(evaluator).value
val str = simpleStringValue(rawVal)
asciiSafe &&= simpleStringValueAsciiSafe(rawVal)
var idx = 0
while (idx < parsed.specBits.length) {
output.append(str)
appendLiteral(output, parsed, idx)
idx += 1
}
return output.toString
val result = output.toString
return if (asciiSafe) Val.Str.asciiSafe(pos, result) else Val.Str(pos, result)
}

// Cache for repeated key lookups: most format strings reuse the same key many times
var cachedKey: String = null
var cachedStr: String = null
var cachedAsciiSafe = false

var idx = 0
while (idx < parsed.specBits.length) {
@@ -787,12 +806,16 @@ object Format {
// Look up and cache the string value for this key
// String.equals already does identity check (eq) internally
val str =
if (key == cachedKey) cachedStr
else {
if (key == cachedKey) {
asciiSafe &&= cachedAsciiSafe
cachedStr
} else {
val rawVal = obj.value(key, pos)(evaluator).value
val s = simpleStringValue(rawVal)
cachedKey = key
cachedStr = s
cachedAsciiSafe = simpleStringValueAsciiSafe(rawVal)
asciiSafe &&= cachedAsciiSafe
s
}

@@ -803,7 +826,8 @@ object Format {

idx += 1
}
output.toString
val result = output.toString
if (asciiSafe) Val.Str.asciiSafe(pos, result) else Val.Str(pos, result)
}

private def simpleStringValue(rawVal: Val)(implicit evaluator: EvalScope): String =
@@ -826,6 +850,13 @@ object Format {
value.toString
}

private def simpleStringValueAsciiSafe(rawVal: Val): Boolean =
rawVal match {
case vs: Val.Str => vs._asciiSafe
case _: Val.Num | _: Val.True | _: Val.False | _: Val.Null => true
case _ => false
}

private def formatInteger(formatted: FormatSpec, s: Double): String = {
// Fast path: if the value fits in a Long (and isn't Long.MinValue where
// negation overflows), avoid BigInt allocation entirely
@@ -1013,6 +1044,6 @@ object Format {
// Each PartialApplyFmt instance caches its own parsed format, so no external cache needed.
private val parsed = scanFormat(fmt)
def evalRhs(values0: Eval, ev: EvalScope, pos: Position): Val =
Val.Str(pos, format(parsed, values0.value, pos)(ev))
formatValue(parsed, values0.value, pos)(ev)
}
}
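
The `Format.scala` changes above carry an ASCII-safe flag from the static literals and the dynamic values through to the formatted result. The sketch below shows that propagation in isolation. It is not the sjsonnet implementation: `AsciiSafeSketch`, `Str`, and `interpolate` are hypothetical names, and the body of `isAsciiJsonSafe` is an assumed definition (printable ASCII with no `"` or `\`), since the diff does not show how `Platform.isAsciiJsonSafe` is defined.

```scala
object AsciiSafeSketch {
  /** Assumed definition: every char is printable ASCII and needs no JSON escaping. */
  def isAsciiJsonSafe(s: String): Boolean = {
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (c < 0x20 || c > 0x7e || c == '"' || c == '\\') return false
      i += 1
    }
    true
  }

  /** A string plus a hint that a renderer may copy it to the output without scanning. */
  final case class Str(value: String, asciiSafe: Boolean)

  /**
   * Interleaves literals and values: lit(0) + v(0) + lit(1) + ... + lit(n).
   * The result is flagged safe only if every literal scans as safe and every
   * value already carries the flag, so no second pass over the output is needed.
   */
  def interpolate(literals: Array[String], values: Array[Str]): Str = {
    require(literals.length == values.length + 1, "need one more literal than values")
    val out = new java.lang.StringBuilder
    var safe = true
    var i = 0
    while (i < values.length) {
      out.append(literals(i)).append(values(i).value)
      safe &&= isAsciiJsonSafe(literals(i)) && values(i).asciiSafe
      i += 1
    }
    out.append(literals(i))
    safe &&= isAsciiJsonSafe(literals(i))
    Str(out.toString, safe)
  }
}
```

The flag is deliberately conservative: a value whose flag is `false` makes the result unflagged even if its text happens to be safe, which costs only a later scan and never produces incorrect output. In the diff, the literal scan happens once when the format is parsed (`staticAsciiSafe`) rather than on every call as in this sketch.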