Add compilation corpus for fuzzing + improve CLI robustness of contrib/compile by Vaibhav701161 · Pull Request #693 · sourcemeta/blaze

Vaibhav701161 · 2026-04-08T14:22:13Z

Summary

This PR introduces a structured seed corpus for exercising the contrib/compile binary and improves its robustness when handling malformed inputs.

The corpus is designed to support fuzzing workflows (e.g., AFL++) and systematically test the schema compilation pipeline against invalid, edge-case, and stress inputs.

Additionally, this PR adds a guard at the CLI boundary to prevent assertion failures when non-schema JSON inputs are provided.

Motivation

Following the introduction of the compile contrib binary in #410, the next step toward effective fuzz testing is having a high-quality input corpus.

As discussed in #99, the compiler should:

Never crash on invalid JSON, unknown dialects, or invalid schemas
Fail gracefully with an error and non-zero exit status

This PR addresses that gap by:

Providing a curated corpus that targets real compiler paths (not random JSON)
Enabling reproducible testing of compilation behavior
Identifying cases where the compiler currently aborts instead of failing gracefully

Changes

1. Compilation Corpus (`test/corpus/compile/`)

A structured set of inputs organized by intent:

valid/ - valid schemas across supported drafts and OpenAPI dialects
invalid_json/ - malformed JSON rejected by the parser
invalid_schema/ - spec-invalid schemas (valid JSON, invalid semantics)
unknown_dialect/ - unsupported or malformed $schema values
edge_cases/ - unusual boundary conditions and recursive structures
stress/ - large schemas to exercise performance and limits

Each file is intentionally small and targets a specific compiler behavior.

2. CLI Robustness Fix (`contrib/compile.cc`)

Added an is_schema() guard before compilation:

Prevents assertion failures when the input is not a valid JSON Schema (object/boolean)
Ensures consistent error handling at the CLI boundary
Aligns behavior with the expectations outlined in #99 (Add compilation testing binary)
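
As a rough illustration of the guard described above, here is a minimal sketch. It is not the code in contrib/compile.cc: `JsonKind`, `looks_like_schema`, and `guard_exit_code` are hypothetical stand-ins for the real JSON type and the is_schema() check, and the sketch assumes only what this description states (a schema root must be a JSON object or a boolean).

```cpp
#include <cstdlib>
#include <iostream>

// Hypothetical stand-in for the kind of a parsed JSON value.
enum class JsonKind { Null, Boolean, Number, String, Array, Object };

// Assumption taken from the PR text: a JSON Schema is an object or a boolean.
bool looks_like_schema(JsonKind kind) {
  return kind == JsonKind::Object || kind == JsonKind::Boolean;
}

// Returns the exit status the CLI would report before calling the compiler.
// Rejecting non-schema input here keeps it away from internal assertions.
int guard_exit_code(JsonKind root) {
  if (!looks_like_schema(root)) {
    std::cerr << "error: input is not a JSON Schema (expected object or boolean)\n";
    return EXIT_FAILURE;
  }
  return EXIT_SUCCESS;
}
```

With this shape, an array or string at the root yields a non-zero exit status and an error message instead of reaching the compiler.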

3. Crash Discovery

Running the corpus revealed several inputs that currently trigger assertion failures (SIGABRT), including:

Invalid keyword types (e.g., "items": "invalid")
Invalid numeric constraints (e.g., a negative maxLength or multipleOf)
Empty applicator arrays (allOf, anyOf, oneOf)

These cases are documented in the corpus README and retained as regression inputs.

Expected behavior: graceful failure
Current behavior: assert() abort
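
To make the difference between the two behaviors concrete, here is a minimal sketch using the negative maxLength case. The names `SchemaError`, `checked_max_length`, and `compile_or_report` are hypothetical and do not come from the Blaze codebase; the sketch only shows the general pattern of replacing an assertion with a reportable error.

```cpp
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <stdexcept>

// Hypothetical error type for schemas that are valid JSON but spec-invalid.
struct SchemaError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Instead of assert(value >= 0), which aborts the process, throw an
// exception that the caller can turn into an error message.
std::size_t checked_max_length(long long value) {
  if (value < 0) {
    throw SchemaError("maxLength must be a non-negative integer");
  }
  return static_cast<std::size_t>(value);
}

// Hypothetical CLI boundary: report the error and return a non-zero status.
int compile_or_report(long long max_length) {
  try {
    checked_max_length(max_length);
    return EXIT_SUCCESS;
  } catch (const SchemaError &error) {
    std::cerr << "error: " << error.what() << "\n";
    return EXIT_FAILURE;
  }
}
```

The process then exits normally with a failure status rather than being terminated by SIGABRT.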

4. Helper Script (`contrib/run_corpus.sh`)

A minimal utility to run the corpus locally and detect crashes.

Not part of CI
Intended for manual testing and fuzzing workflows
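
The contents of the script are not shown in this description. As a sketch of how crash detection of this kind can work, the function below classifies a shell-style exit status, assuming the usual shell convention that a status greater than 128 means the process was terminated by a signal (134 corresponds to SIGABRT, signal 6). `Outcome` and `classify_exit_status` are hypothetical names.

```cpp
// Hypothetical outcome categories for one corpus input.
enum class Outcome { Success, GracefulFailure, Crash };

// Classifies an exit status as a shell would report it in $?.
// Assumption: statuses above 128 encode "terminated by signal (status - 128)".
Outcome classify_exit_status(int status) {
  if (status == 0) {
    return Outcome::Success;
  }
  if (status > 128) {
    return Outcome::Crash;
  }
  // Any other non-zero status is treated as an ordinary, reported failure.
  return Outcome::GracefulFailure;
}
```

Under this convention an assertion failure would surface as status 134, while a rejected schema would surface as a small non-zero status such as 1.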

5. Documentation

Added test/corpus/compile/README.md:

Explains corpus structure and intent
Documents known crash-inducing inputs
Provides example AFL++ usage

Design Considerations

The corpus is not a conformance test suite; it is focused on robustness and fuzzing
Inputs are curated for signal over volume (duplicates removed)
The CLI guard avoids modifying compiler internals while improving usability and stability

Future Work

Integrate corpus with AFL++ or similar fuzzers
Address identified assertion failures in the compiler
Potentially extend coverage for additional dialects and features

Impact

Improves reliability of the compile binary under malformed inputs
Provides a foundation for systematic fuzz testing
Surfaces real robustness issues in the compilation pipeline

…nd CI The is_schema() guard was checking the root document before --path extraction. When the JS test suite passes test-suite JSON files (arrays at root) with --path to navigate to the actual schema, the guard incorrectly rejected the input. Move the guard to only apply to direct (non-path) invocations, and keep the post-extraction guard for --path usage. Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>
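
The ordering fix in this commit message can be sketched as follows. This is an illustration under assumptions, not the actual change: `JsonKind`, `looks_like_schema`, and `accept_input` are hypothetical names, and the optional argument models whether --path was given and what kind of value it resolved to.

```cpp
#include <optional>

// Hypothetical stand-in for the kind of a parsed JSON value.
enum class JsonKind { Null, Boolean, Number, String, Array, Object };

// Assumption: a JSON Schema is an object or a boolean.
bool looks_like_schema(JsonKind kind) {
  return kind == JsonKind::Object || kind == JsonKind::Boolean;
}

// root:      kind of the document root
// extracted: kind of the value selected by --path, or empty if no --path
bool accept_input(JsonKind root, std::optional<JsonKind> extracted) {
  if (extracted.has_value()) {
    // With --path, the root may legitimately be a non-schema value
    // (for example a test-suite array), so only the extracted value is checked.
    return looks_like_schema(*extracted);
  }
  // Without --path, the root itself must be a schema.
  return looks_like_schema(root);
}
```

A test-suite file with an array at the root is accepted when --path selects an object inside it, and rejected when passed directly.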

jviotti · 2026-04-09T19:40:04Z

Interesting. However, instead of committing the fuzzing corpus and JSON files, which of those inputs actually resulted in a crash in Blaze? Did you find any? If so, can you propose the ones that caused a crash as proper unit tests in test/evaluator? They can show what breaks at the moment, and we can debug from there.


Vaibhav701161 added 6 commits April 8, 2026 14:09

contrib: guard compile CLI with is_schema to prevent abort on invalid…

5d0eb5e

… input Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>

test: add initial valid schema corpus for compile binary

3160161

Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>

test: add invalid and edge-case schema inputs for robustness testing

fa15be4

Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>

test: add stress schemas to exercise compilation limits

9230476

Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>

contrib: add helper script to run compile corpus locally

7839427

Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>

docs: document compile corpus structure and known compiler issues

fbbf8d7

Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>

cubic-dev-ai bot reviewed Apr 8, 2026


Vaibhav701161 added 4 commits April 10, 2026 15:02

test: add failing tests for compiler crashes discovered via compile c…

bd166cf

…orpus Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>

test: narrow compile crash PR scope

8ffd6e4

Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>

test: align evaluator invalid schema tests

98b76be

Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>

test: drop non-schema root evaluator case

f96a632

Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>

Vaibhav701161 wants to merge 11 commits into sourcemeta:main from Vaibhav701161:test/compile-corpus-fuzzing
