Add compilation corpus for fuzzing + improve CLI robustness of contrib/compile #693
Open
Vaibhav701161 wants to merge 11 commits into sourcemeta:main from …
… input Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>
Author
@jviotti, any comments?
…nd CI The is_schema() guard was checking the root document before --path extraction. When the JS test suite passes test-suite JSON files (arrays at root) with --path to navigate to the actual schema, the guard incorrectly rejected the input. Move the guard to only apply to direct (non-path) invocations, and keep the post-extraction guard for --path usage. Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>
Member
Interesting. However, instead of committing the fuzzing and JSON files, which ones of the ones there did result in a crash in Blaze? Did you find any? If so, for the ones that caused a crash, propose them as proper unit tests in |
…orpus Signed-off-by: Vaibhav mittal <vaibhavmittal929@gmail.com>
Summary
This PR introduces a structured seed corpus for exercising the `contrib/compile` binary and improves its robustness when handling malformed inputs.
The corpus is designed to support fuzzing workflows (e.g., AFL++) and to systematically test the schema compilation pipeline against invalid, edge-case, and stress inputs.
Additionally, this PR adds a guard at the CLI boundary to prevent assertion failures when non-schema JSON inputs are provided.
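To make the guard placement concrete, here is a minimal, self-contained sketch of the decision it has to make, including the `--path` case described in the commit message above. It does not use the real Blaze or Sourcemeta APIs: `Kind`, `is_schema_like()`, and `should_compile()` are hypothetical names, and the check assumes that a JSON Schema is either an object or a boolean.

```cpp
#include <optional>

// Hypothetical stand-in for the kind of a parsed JSON value
enum class Kind { Object, Boolean, Array, String, Number, Null };

// Assumption: a JSON Schema is either an object or a boolean
bool is_schema_like(Kind kind) {
  return kind == Kind::Object || kind == Kind::Boolean;
}

// Decide whether compilation should proceed.
// `root` is the kind of the parsed document. `extracted` is the kind of the
// value reached through --path, or std::nullopt when --path was not given.
bool should_compile(Kind root, std::optional<Kind> extracted) {
  if (!extracted.has_value()) {
    // Direct invocation: the root document itself must be a schema
    return is_schema_like(root);
  }

  // --path invocation: the root may be anything (for example, a test-suite
  // array), so only the extracted value is checked
  return is_schema_like(extracted.value());
}
```

With this ordering, a test-suite file whose root is an array is accepted as long as `--path` lands on an object or boolean, while a bare array passed without `--path` is rejected before it reaches the compiler.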
Motivation
Following the introduction of the `compile` contrib binary in #410, the next step toward effective fuzz testing is having a high-quality input corpus.
As discussed in #99, the compiler should:
This PR addresses that gap by:
Changes
1. Compilation Corpus (`test/corpus/compile/`)
A structured set of inputs organized by intent:
- `valid/` - valid schemas across supported drafts and OpenAPI dialects
- `invalid_json/` - malformed JSON rejected by the parser
- `invalid_schema/` - spec-invalid schemas (valid JSON, invalid semantics)
- `unknown_dialect/` - unsupported or malformed `$schema` values
- `edge_cases/` - unusual boundary conditions and recursive structures
- `stress/` - large schemas to exercise performance and limits
Each file is intentionally small and targets a specific compiler behavior.
2. CLI Robustness Fix (`contrib/compile.cc`)
Added an `is_schema()` guard before compilation.
3. Crash Discovery
Running the corpus revealed several inputs that currently trigger assertion failures (SIGABRT), including:
- … (`"items": "invalid"`)
- … (`maxLength`, `multipleOf`)
- … (`allOf`, `anyOf`, `oneOf`)
These cases are documented in the corpus README and retained as regression inputs.
Expected behavior: graceful failure
Current behavior: `assert()` abort
4. Helper Script (`contrib/run_corpus.sh`)
A minimal utility to run the corpus locally and detect crashes.
5. Documentation
Added `test/corpus/compile/README.md`.
Design Considerations
Future Work
Impact
- … `compile` binary under malformed inputs
Related
- `compile` contrib program for fuzzing purposes #410 - Introduce `compile` contrib program