Harden TTS WAV validation and upstream enum pinning#270
Merged
Conversation
Re-checked all six findings raised after PR #268 against the pinned upstream tts.cpp @ b9739. Two were false alarms; four were genuine. Verified upstream-faithful (NOT divergences) — added provenance comments only: - #1 llama_model_n_embd_out: upstream tts.cpp:1042 uses the exact same call. Comment now notes it reads the vocoder OUTPUT embedding width, matching upstream. - #2 0.25 s silence lead-in: upstream tts.cpp:1077-1080 zeroes the first 0.25 s identically. Comment notes it mirrors upstream and that our `i < audio.size()` bound is an added safety guard over upstream's fixed 24000/4. Genuine findings fixed: - #3 heavy include in shared header: tts_upstream.h now includes <nlohmann/json_fwd.hpp> (forward-declares ordered_json) instead of the full <nlohmann/json.hpp>, and drops the json default argument. The single caller, tts_engine.cpp, includes the full json and passes an explicit empty object. Future includers of the shared interface no longer pull in ~25k lines of json. - #4 unasserted duplicate enum: generate-tts-upstream.cmake now captures the upstream `outetts_version` enum body and pins its enumerators + order against the hand-written copy in tts_upstream.h, so a reorder/rename fails the configure instead of silently assigning different integer values across the two TUs. - #5 prompt_add overload coverage: the bare `void prompt_add(` prefix de-statics all three upstream overloads but only proved >=1 existed. The generator now pins (whitespace-tolerant) both overloads the header declares, turning a future cryptic link error into a clear configure-time failure. - #6 weak WAV assertion: TtsIntegrationTest now parses the RIFF/WAVE header (PCM format, mono, 24 kHz, 16-bit), checks chunk-size self-consistency, and scans the PCM payload for non-zero samples — so a near-empty or all-silent result no longer passes the way `length > 44` did. Generator regexes validated against the real b9739 source via `cmake -P` (including a negative drift control); Java test compiles; clang-format 22.1.5 and Spotless both clean. Native build not run here (sandbox proxy blocks the dependency FetchContent clones) — exercised in CI. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01QTQ8mBM9tyKkHbXVBpGwET
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



Summary
TTS WAV validation: Enhanced
TtsIntegrationTest.synthesizesWellFormedWav()to verify the complete WAV structure (RIFF/WAVE headers, fmt/data chunks, 24 kHz mono 16-bit PCM format, chunk size consistency) and confirm the output contains non-zero audio samples rather than silence. This catches misconfigurations or model loading failures that would previously pass with only a header-size check.Upstream enum pinning: Added CMake assertions in
generate-tts-upstream.cmaketo verify theoutetts_versionenum enumerators and order remain token-identical to the hand-written copy intts_upstream.h, preventing silent miscompiles if a llama.cpp bump changes the enum definition.Upstream function pinning: Added CMake assertions to verify both
prompt_addoverloads thattts_upstream.hdeclares are present in the generated TU, catching link-time failures early at configure time.Header optimization: Changed
tts_upstream.hto forward-declarenlohmann::ordered_jsoninstead of including the full header, reducing compile-time overhead for all translation units that include the shared interface. Updatedtts_engine.cppto include the full json header where needed (for constructing the empty-object speaker argument).Documentation: Updated
CLAUDE.mdto document the new enum and function pinning assertions as part of the fail-loud drift-detection contract.Test plan
TtsIntegrationTest.synthesizesWellFormedWav()now validates WAV structure comprehensively and confirms non-silent outputRelated issues / PRs
None
Checklist
CONTRIBUTING.mdandCODE_OF_CONDUCT.mdhttps://claude.ai/code/session_01QTQ8mBM9tyKkHbXVBpGwET