Skip to content

Harden TTS WAV validation and upstream enum pinning#270

Merged
bernardladenthin merged 1 commit into
mainfrom
claude/laughing-albattani-bl5h9h
Jun 25, 2026
Merged

Harden TTS WAV validation and upstream enum pinning#270
bernardladenthin merged 1 commit into
mainfrom
claude/laughing-albattani-bl5h9h

Conversation

@bernardladenthin

Copy link
Copy Markdown
Owner

Summary

  • TTS WAV validation: Enhanced TtsIntegrationTest.synthesizesWellFormedWav() to verify the complete WAV structure (RIFF/WAVE headers, fmt/data chunks, 24 kHz mono 16-bit PCM format, chunk size consistency) and confirm the output contains non-zero audio samples rather than silence. This catches misconfigurations or model loading failures that would previously pass with only a header-size check.

  • Upstream enum pinning: Added CMake assertions in generate-tts-upstream.cmake to verify the outetts_version enum enumerators and order remain token-identical to the hand-written copy in tts_upstream.h, preventing silent miscompiles if a llama.cpp bump changes the enum definition.

  • Upstream function pinning: Added CMake assertions to verify both prompt_add overloads that tts_upstream.h declares are present in the generated TU, catching link-time failures early at configure time.

  • Header optimization: Changed tts_upstream.h to forward-declare nlohmann::ordered_json instead of including the full header, reducing compile-time overhead for all translation units that include the shared interface. Updated tts_engine.cpp to include the full json header where needed (for constructing the empty-object speaker argument).

  • Documentation: Updated CLAUDE.md to document the new enum and function pinning assertions as part of the fail-loud drift-detection contract.

Test plan

  • TtsIntegrationTest.synthesizesWellFormedWav() now validates WAV structure comprehensively and confirms non-silent output
  • CMake configure step aborts with clear error messages if upstream enum or function signatures drift
  • Existing integration tests pass locally
  • CI is green on this branch

Related issues / PRs

None

Checklist

  • I have read CONTRIBUTING.md and CODE_OF_CONDUCT.md
  • My commits follow Conventional Commits
  • No security-sensitive changes

https://claude.ai/code/session_01QTQ8mBM9tyKkHbXVBpGwET

Re-checked all six findings raised after PR #268 against the pinned upstream
tts.cpp @ b9739. Two were false alarms; four were genuine.

Verified upstream-faithful (NOT divergences) — added provenance comments only:
- #1 llama_model_n_embd_out: upstream tts.cpp:1042 uses the exact same call.
  Comment now notes it reads the vocoder OUTPUT embedding width, matching upstream.
- #2 0.25 s silence lead-in: upstream tts.cpp:1077-1080 zeroes the first 0.25 s
  identically. Comment notes it mirrors upstream and that our `i < audio.size()`
  bound is an added safety guard over upstream's fixed 24000/4.

Genuine findings fixed:
- #3 heavy include in shared header: tts_upstream.h now includes
  <nlohmann/json_fwd.hpp> (forward-declares ordered_json) instead of the full
  <nlohmann/json.hpp>, and drops the json default argument. The single caller,
  tts_engine.cpp, includes the full json and passes an explicit empty object.
  Future includers of the shared interface no longer pull in ~25k lines of json.
- #4 unasserted duplicate enum: generate-tts-upstream.cmake now captures the
  upstream `outetts_version` enum body and pins its enumerators + order against
  the hand-written copy in tts_upstream.h, so a reorder/rename fails the configure
  instead of silently assigning different integer values across the two TUs.
- #5 prompt_add overload coverage: the bare `void prompt_add(` prefix de-statics
  all three upstream overloads but only proved >=1 existed. The generator now
  pins (whitespace-tolerant) both overloads the header declares, turning a future
  cryptic link error into a clear configure-time failure.
- #6 weak WAV assertion: TtsIntegrationTest now parses the RIFF/WAVE header
  (PCM format, mono, 24 kHz, 16-bit), checks chunk-size self-consistency, and
  scans the PCM payload for non-zero samples — so a near-empty or all-silent
  result no longer passes the way `length > 44` did.

Generator regexes validated against the real b9739 source via `cmake -P`
(including a negative drift control); Java test compiles; clang-format 22.1.5
and Spotless both clean. Native build not run here (sandbox proxy blocks the
dependency FetchContent clones) — exercised in CI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QTQ8mBM9tyKkHbXVBpGwET
@bernardladenthin bernardladenthin merged commit 7633baf into main Jun 25, 2026
10 of 12 checks passed
@bernardladenthin bernardladenthin deleted the claude/laughing-albattani-bl5h9h branch June 25, 2026 07:53
@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants