Add compilation corpus for fuzzing + improve CLI robustness of contrib/compile #693

Open
Vaibhav701161 wants to merge 11 commits into sourcemeta:main from Vaibhav701161:test/compile-corpus-fuzzing

@Vaibhav701161

Summary

This PR introduces a structured seed corpus for exercising the contrib/compile binary and improves its robustness when handling malformed inputs.

The corpus is designed to support fuzzing workflows (e.g., AFL++) and systematically test the schema compilation pipeline against invalid, edge-case, and stress inputs.

Additionally, this PR adds a guard at the CLI boundary to prevent assertion failures when non-schema JSON inputs are provided.


Motivation

Following the introduction of the compile contrib binary in #410, the next step toward effective fuzz testing is having a high-quality input corpus.

As discussed in #99, the compiler should:

  • Never crash on invalid JSON, unknown dialects, or invalid schemas
  • Fail gracefully with an error message and a non-zero exit status

This PR addresses that gap by:

  1. Providing a curated corpus that targets real compiler paths (not random JSON)
  2. Enabling reproducible testing of compilation behavior
  3. Identifying cases where the compiler currently aborts instead of failing gracefully

Changes

1. Compilation Corpus (test/corpus/compile/)

A structured set of inputs organized by intent:

  • valid/ - valid schemas across supported drafts and OpenAPI dialects
  • invalid_json/ - malformed JSON rejected by the parser
  • invalid_schema/ - spec-invalid schemas (valid JSON, invalid semantics)
  • unknown_dialect/ - unsupported or malformed $schema values
  • edge_cases/ - unusual boundary conditions and recursive structures
  • stress/ - large schemas to exercise performance and limits

Each file is intentionally small and targets a specific compiler behavior.
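
To make the layout concrete, here is a minimal Python sketch that writes one illustrative seed per category. The directory names match the list above; the file names and the exact JSON contents are assumptions made for illustration and are not the files shipped in this PR.

```python
import json
from pathlib import Path

DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"

# Category directories come from the PR description.
# File names and contents below are hypothetical examples.
SEEDS = {
    "valid/string.json": json.dumps(
        {"$schema": DRAFT_2020_12, "type": "string"}),
    # Truncated document: rejected by the JSON parser itself.
    "invalid_json/truncated.json": '{"type": ',
    # Valid JSON, but "items" must be a schema, not a string.
    "invalid_schema/items_string.json": json.dumps(
        {"$schema": DRAFT_2020_12, "items": "invalid"}),
    "unknown_dialect/unknown.json": json.dumps(
        {"$schema": "https://example.com/unknown-dialect"}),
    # A schema that refers to itself at the root.
    "edge_cases/self_ref.json": json.dumps(
        {"$schema": DRAFT_2020_12, "$ref": "#"}),
    # Many sibling properties to exercise size-related limits.
    "stress/many_properties.json": json.dumps({
        "$schema": DRAFT_2020_12,
        "properties": {f"p{i}": {"type": "integer"} for i in range(1000)},
    }),
}


def write_corpus(root):
    """Write every seed under `root` and return the created paths."""
    written = []
    for relative, text in SEEDS.items():
        target = Path(root) / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

Calling `write_corpus("some/output/dir")` produces the six category directories with one file each; only the `invalid_json/` seed fails to parse as JSON.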


2. CLI Robustness Fix (contrib/compile.cc)

Added an is_schema() guard before compilation:

  • Prevents assertion failures when the input is valid JSON but not a JSON Schema (i.e., neither an object nor a boolean)
  • Ensures consistent error handling at the CLI boundary
  • Aligns behavior with the expectations outlined in #99 (Add compilation testing binary)
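
The logic of the guard can be sketched in Python as follows. This only mirrors the idea described above; the actual change is in C++ in contrib/compile.cc, and the function name `check_input`, the error messages, and the exit code `1` used here are assumptions for illustration.

```python
import json


def is_schema(document):
    """A JSON Schema is either an object or a boolean."""
    return isinstance(document, (dict, bool))


def check_input(text):
    """Return (exit_code, message) instead of letting a later stage abort."""
    try:
        document = json.loads(text)
    except json.JSONDecodeError as error:
        return 1, f"error: invalid JSON: {error.msg}"
    if not is_schema(document):
        # Arrays, numbers, strings and null are valid JSON but not schemas.
        return 1, "error: the input is not a JSON Schema"
    return 0, "ok"
```

For example, `check_input("[1, 2]")` and `check_input("42")` both report an error with a non-zero code, while `check_input("true")` and `check_input("{}")` pass the guard.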

3. Crash Discovery

Running the corpus revealed several inputs that currently trigger assertion failures (SIGABRT), including:

  • Invalid keyword types (e.g., "items": "invalid")
  • Invalid numeric constraints (e.g., a negative maxLength or multipleOf)
  • Empty applicator arrays (allOf, anyOf, oneOf)

These cases are documented in the corpus README and retained as regression inputs.

Expected behavior: graceful failure
Current behavior: assert() abort


4. Helper Script (contrib/run_corpus.sh)

A minimal utility to run the corpus locally and detect crashes.

  • Not part of CI
  • Intended for manual testing and fuzzing workflows
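
As a rough illustration of what such a runner does, the following Python sketch walks a corpus directory, runs a command on each file, and separates graceful failures from crashes. It is not the contents of contrib/run_corpus.sh. It assumes a POSIX system (where a negative return code means the process was killed by a signal) and assumes the binary accepts the input file as its last argument.

```python
import signal
import subprocess
from pathlib import Path


def classify(returncode):
    """Map a process return code to a coarse outcome."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"crash ({name})"
    return "graceful failure"


def run_corpus(command, corpus_root, timeout=10.0):
    """Run `command <file>` for every .json file under `corpus_root`."""
    results = {}
    for path in sorted(Path(corpus_root).rglob("*.json")):
        try:
            completed = subprocess.run(
                [*command, str(path)],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            results[path] = classify(completed.returncode)
        except subprocess.TimeoutExpired:
            results[path] = "timeout"
    return results
```

A call such as `run_corpus(["path/to/compile"], "test/corpus/compile")` (the binary path is a placeholder) returns a mapping from each input file to its outcome; any entry starting with `crash` corresponds to the assert() aborts described above.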

5. Documentation

Added test/corpus/compile/README.md:

  • Explains corpus structure and intent
  • Documents known crash-inducing inputs
  • Provides example AFL++ usage

Design Considerations

  • The corpus is not a conformance test suite; it is focused on robustness and fuzzing
  • Inputs are curated for signal over volume (duplicates removed)
  • The CLI guard avoids modifying compiler internals while improving usability and stability

Future Work

  • Integrate corpus with AFL++ or similar fuzzers
  • Address identified assertion failures in the compiler
  • Potentially extend coverage for additional dialects and features

Impact

  • Improves reliability of the compile binary under malformed inputs
  • Provides a foundation for systematic fuzz testing
  • Surfaces real robustness issues in the compilation pipeline

Related

… input

Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 64 files

@Vaibhav701161
Author

@jviotti, any comments?

…nd CI

The is_schema() guard was checking the root document before --path
extraction. When the JS test suite passes test-suite JSON files (arrays
at root) with --path to navigate to the actual schema, the guard
incorrectly rejected the input. Move the guard to only apply to
direct (non-path) invocations, and keep the post-extraction guard
for --path usage.
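
The ordering issue can be illustrated with a small Python sketch: the schema check has to run on the value selected by the path, not on the root document. This is a model of the behavior described in the commit message, not the C++ code. It assumes that --path takes a JSON Pointer (RFC 6901); the helper names are hypothetical.

```python
def is_schema(document):
    """A JSON Schema is either an object or a boolean."""
    return isinstance(document, (dict, bool))


def resolve_pointer(document, pointer):
    """Resolve a JSON Pointer such as "/0/schema" against `document`."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid pointer: {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # RFC 6901 unescaping: "~1" first, then "~0".
        token = token.replace("~1", "/").replace("~0", "~")
        try:
            if isinstance(current, list):
                current = current[int(token)]
            elif isinstance(current, dict):
                current = current[token]
            else:
                raise KeyError(token)
        except (KeyError, IndexError, ValueError):
            raise ValueError(f"path does not resolve: {pointer!r}") from None
    return current


def select_schema(document, path=None):
    """Apply the guard to the root, or to the extracted value if a path is given."""
    candidate = document if path is None else resolve_pointer(document, path)
    if not is_schema(candidate):
        raise ValueError("the selected value is not a JSON Schema")
    return candidate
```

With a test-suite style document such as `[{"schema": {"type": "string"}, "tests": []}]`, `select_schema(doc, "/0/schema")` succeeds even though the root is an array, while `select_schema(doc)` without a path is still rejected.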

Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>
@jviotti
Member

jviotti commented Apr 9, 2026

Interesting. However, instead of committing the fuzzing corpus and JSON files: which of the inputs there actually resulted in a crash in Blaze? Did you find any? If so, could you propose the ones that caused a crash as proper unit tests in test/evaluator? They can show what breaks at the moment, and we can debug from there.

…orpus

Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>
2 participants