This repository was archived by the owner on Sep 17, 2025. It is now read-only.

daniel-lukacs/confgen


WARNING: This repository is not under active development. All code is experimental and was shared for educational purposes. Use at your own risk!

Quick start

This is a small C++ property testing tool for automatically generating arbitrary JSON documents (e.g. configurations) that satisfy all properties in a user-provided JSON Schema.

Benefits:

  • Automatic testing over a high number of input combinations.
  • Declarative description of input type and restrictions (e.g. boundary limits, size limits, etc.)
  • "Dangerous" input combinations are prioritized.

Here, we explain demo1.cpp, a small demonstration of how the tool works.

To build and run:

$ xmake                                  # builds the code
$ ./build/linux/x86_64/release/demo1     # or: xmake run demo1

The demo contains the (faulty) implementation of a simple function that calculates the mean of two numbers (truncated toward zero) stored in a JSON document. Calling mean with a JSON configuration like { "a" : 1, "b" : 5 } should result in (1+5)/2 = 3.

int mean(const nlohmann::json & conf){
  int16_t a = conf.at("a").get<int16_t>();
  int16_t b = conf.at("b").get<int16_t>();

  return (a+b)/2;
}


bool meanTest(const nlohmann::json & conf){
  int a = conf.at("a").get<int>();
  int b = conf.at("b").get<int>();

  int expected = (a+b)/2;
  int actual = mean(conf);

  if(expected != actual){
    throw TestFailed(std::format("Expected result: {}. Actual result: {}. Configuration: {}", expected, actual, nlohmann::to_string(conf)));
  }

  return true;
}

The demo runs two types of tests, testing whether this expectation is satisfied for all generated configurations:

  • Test 1 checks mean for configurations based on the following schema (fields a and b are both integers between 0 and 1000):
{ "type" : "object"
, "required" : ["a", "b"]
, "properties" : 
    { "a" : {"type" : "integer", "minimum" : 0, "maximum" : 1000 }
    , "b" : {"type" : "integer", "minimum" : 0, "maximum" : 1000}
    }
, "additionalProperties" : false 
}
  • Test 2 checks mean for configurations based on the following schema (fields a and b are both integers with no further restrictions):
{ "type" : "object"
, "required" : ["a", "b"]
, "properties" : 
    { "a" : {"type" : "integer"}
    , "b" : {"type" : "integer"}
    }
, "additionalProperties" : false 
}

Running the demo will produce the following output (with high probability):

Using configuration: seed=16412145747125965940

- Test mean()
OK, passed 100 tests
Test 1 successfully passed.

- Test mean()
Test failed: Expected result: -44199. Actual result: 21337. Configuration: {"a":-37621,"b":-50777}

As we see, Test 1 was successful: 100 different JSON configurations were generated based on the schema, and mean passed all of them.

On the other hand, Test 2 failed! Indeed, the output shows that a JSON configuration was generated for which mean produced seemingly nonsensical results: ((-37621) + (-50777)) / 2 = -44199, yet the returned value is not even close. Looking at the implementation, we may notice that it reads the fields as int16_t, which is (on most modern platforms) narrower than the generated int. Values outside the range -32768 to 32767 are therefore changed by the conversion: here -37621 becomes 27915 and -50777 becomes 14759, and (27915 + 14759) / 2 = 21337, which is exactly the reported result. (The addition itself is performed in int after integer promotion, so it is the conversion, not the addition, that loses information here.)

As we see, the tool managed to automatically find a JSON configuration that triggered faulty behavior in the implementation.
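The failure can also be reproduced without the tool. The following is a minimal sketch, not the tool's implementation: narrow_mean, mean_matches and find_counterexample are hypothetical names, the sampling is plain uniform (the real tool prioritizes "dangerous" inputs, which this sketch does not), and the arithmetic assumes a platform where int is wider than 16 bits.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <random>
#include <utility>

// Mirrors the demo's mean(): both inputs are narrowed to int16_t first.
// Out-of-range values wrap modulo 2^16 (guaranteed since C++20; earlier
// standards leave the result implementation-defined).
int narrow_mean(int a, int b) {
    const auto a16 = static_cast<std::int16_t>(a);
    const auto b16 = static_cast<std::int16_t>(b);
    return (a16 + b16) / 2;  // operands are promoted to int before adding
}

// The property under test: the narrowed mean equals the mean computed in int.
// Callers must keep a + b within the range of int.
bool mean_matches(int a, int b) {
    return narrow_mean(a, b) == (a + b) / 2;
}

// Hypothetical helper: evaluates `property` on `runs` random pairs drawn
// uniformly from [lo, hi] and returns the first counterexample, if any.
std::optional<std::pair<int, int>> find_counterexample(
    const std::function<bool(int, int)>& property,
    int lo, int hi, int runs, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(lo, hi);
    for (int i = 0; i < runs; ++i) {
        const int a = dist(rng);
        const int b = dist(rng);
        if (!property(a, b)) {
            return std::make_pair(a, b);
        }
    }
    return std::nullopt;
}
```

With bounds 0 and 1000 (as in Test 1), every value fits into int16_t, so find_counterexample(mean_matches, 0, 1000, 100, seed) cannot find a counterexample. With wider bounds (as in Test 2), pairs such as (-37621, -50777) violate the property: narrow_mean returns 21337 for them.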

Limitations

  • This is a property testing tool, which comes with a number of consequences:

    • As usual in testing, it should not signal false positives, but it may signal false negatives. Paraphrasing E. Dijkstra, tests can only show the presence of bugs, not their absence. (Property testing lessens the probability of these false negatives, compared to manual testing.)
    • In addition, property tests may signal a third, indecisive result (also known as: "giving up"). This happens when the property tester cannot find (in a given number of attempts) a value that satisfies the given constraints. An example of an unsatisfiable schema can be found in tests/negative/06-array-too-many-unique-schema.json (generating 5 unique items of a two-valued type is impossible).
    • Property testing is, by its nature, non-deterministic: test values are generated randomly (although usually not with a uniform probability distribution). This means that subsequent runs of tests may result in different outcomes. Ideally, this should be mitigated by increasing the sample size rather than by fixing the random seed.
  • The current implementation does not cover all of JSON Schema. Unsupported features should have examples in the tests/negative folder. Only the types in this specification are implemented, and even among these, the following are not yet implemented:

    • Regex constraints on strings.
    • Regex constraints on object fields.
    • Dependency constraints on objects.
    • Non-integer numbers. (RapidCheck does not support inRange for double or float. See this issue.)
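
The "giving up" outcome described above can be illustrated with a small sketch. It is not how RapidCheck is implemented: unique_values is a hypothetical helper that tries to collect a requested number of distinct values from a small domain and stops after a fixed number of attempts.

```cpp
#include <cstddef>
#include <optional>
#include <random>
#include <set>

// Hypothetical sketch: try to collect `count` distinct values from the
// inclusive range [lo, hi], giving up after `max_attempts` draws.
std::optional<std::set<int>> unique_values(
    std::size_t count, int lo, int hi, int max_attempts, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(lo, hi);
    std::set<int> seen;
    for (int attempt = 0;
         attempt < max_attempts && seen.size() < count;
         ++attempt) {
        seen.insert(dist(rng));  // duplicates are silently ignored by the set
    }
    if (seen.size() < count) {
        return std::nullopt;  // indecisive: neither a pass nor a failure
    }
    return seen;
}
```

Requesting 5 distinct values from the two-valued range [0, 1] always ends in std::nullopt, no matter how many attempts are allowed. This is the same situation as in the unsatisfiable schema mentioned above.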

Self-tests

Self-tests consist of:

  1. Generating a large number of values based on selected JSON schemas, and then
  2. checking whether all values satisfy the schema.

We defined two kinds of test cases for testing the tester:

  • In tests/positive, there are schemas for which the tool should be able to generate values.
  • In tests/negative, there are schemas which the tool should reject, either because their support is not implemented yet, or because they cannot be satisfied at all.
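
The generate-then-validate idea behind the self-tests can be sketched without any JSON library. Everything below is a hypothetical, heavily reduced stand-in: ArraySchema models only an integer array with size and value bounds, and generate, validate and self_test are illustrative names, not part of the tool.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical, heavily reduced stand-in for a JSON Schema that describes
// an array of integers with bounded size and bounded items.
struct ArraySchema {
    std::size_t min_items;
    std::size_t max_items;
    int minimum;
    int maximum;
};

// Step 1: generate a value that is intended to satisfy the schema.
std::vector<int> generate(const ArraySchema& schema, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> size_dist(schema.min_items,
                                                         schema.max_items);
    std::uniform_int_distribution<int> item_dist(schema.minimum, schema.maximum);
    std::vector<int> items(size_dist(rng));
    for (int& item : items) {
        item = item_dist(rng);
    }
    return items;
}

// Step 2: an independent check that a value satisfies the schema.
bool validate(const ArraySchema& schema, const std::vector<int>& items) {
    if (items.size() < schema.min_items || items.size() > schema.max_items) {
        return false;
    }
    for (int item : items) {
        if (item < schema.minimum || item > schema.maximum) {
            return false;
        }
    }
    return true;
}

// Runs both steps `runs` times; returns false on the first invalid value.
bool self_test(const ArraySchema& schema, int runs, unsigned seed) {
    std::mt19937 rng(seed);
    for (int i = 0; i < runs; ++i) {
        if (!validate(schema, generate(schema, rng))) {
            return false;
        }
    }
    return true;
}
```

The point of the design is that the generator and the validator are written independently, so a bug in one is likely to be exposed by the other.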

To run:

$ xmake                                       # builds the code
$ ./build/linux/x86_64/release/self-tests     # or: xmake run self-tests

In the outcome, successful tests in the positive category should appear like this:

- Testing schema 'Array: homogenous, lower and upper bounded size' (5/11 in tests/positive/05-array-schema.json).
OK, passed 100 tests

Successful tests in the negative category should appear like this:

- Testing schema '(F) String: Regular expression (not implemented)' (1/1 in tests/negative/01-string-regex-schema.json).
Generation failed. Reason: Schema '{"$comment":"(F) String: Regular expression (not implemented)","pattern":"^[A-Za-z]*@gmail.com$","type":"string"}' has 'pattern' property, which not yet implemented.

- Testing schema '(F) Array: not enough values to generate unique array of size' (1/1 in tests/negative/06-array-too-many-unique-schema.json).
Generation failed. Reason: rc::GenerationFailure("Gave up trying to generate 5 values for container"

Here, the first test failed (as expected) with an exception thrown by the generator, because we do not yet support generating strings based on regular expressions. The second test also failed (as expected) with an exception, because RapidCheck gave up trying to find a value that satisfies the unsatisfiable schema.

API

The API consists of two parts. Everything is under the jg namespace in include/JsonGenerator.h.

The first part consists of check functions that let you run a tester function (of type std::function<bool(const nlohmann::json&)>) over generated JSON documents. Internally, these functions call rc::check of RapidCheck.

  • bool jg::check(const std::string & description, const nlohmann::json & schema, const std::function<bool(const nlohmann::json&)>& testable)

    • This function instance expects an arbitrary description (e.g. the name of the test), the JSON schema that will be used for generation, and the tester function (utilizing a generated JSON document).
  • bool jg::check(const std::string & description, const nlohmann::json & schema, const std::function<bool(const nlohmann::json&)>& testable, jg::LogSet & log, const int depth)

    • This overload expects a logger that will be used as a target for printing messages. As RapidCheck will run the tester 100 times, we use a special logger that stores messages in a set, in order to avoid repetitive output.
    • The depth argument is used when generating recursive types. For example, JSON schema { "type" : "object" } allows the generation of objects whose fields are also objects, whose fields are also objects, and so on. The depth ensures that the recursion stops deterministically.
  • bool jg::check(const std::string & description, const std::function<bool(const T&)>& testable)

    • This overload can be used without JSON schemas for any configuration class T with a custom rc::Arbitrary<T> specialization. (See demo2.cpp.)

The second part of the API consists of internal functions that can be used to generate JSON documents. IMPORTANT: These can only be invoked inside the tester function passed to jg::check or rc::check.

  • const nlohmann::json generateJson(const nlohmann::json & schema, LogSet & log, const int depth)
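
To show why a depth argument makes recursive generation terminate, here is a minimal sketch that builds nested object text instead of real JSON values. generate_value and nesting are hypothetical helpers. The sketch does not reproduce generateJson, only the idea of decrementing a depth budget on every nested object.

```cpp
#include <random>
#include <string>

// Hypothetical sketch: produces either a leaf (a single digit) or an object
// with one nested field. At depth 0 only leaves are allowed, so the
// recursion is guaranteed to stop after at most `depth` nested objects.
std::string generate_value(std::mt19937& rng, int depth) {
    std::uniform_int_distribution<int> coin(0, 1);
    std::uniform_int_distribution<int> leaf(0, 9);
    if (depth <= 0 || coin(rng) == 0) {
        return std::to_string(leaf(rng));
    }
    return "{\"child\":" + generate_value(rng, depth - 1) + "}";
}

// Counts how many objects are nested in the generated text.
int nesting(const std::string& text) {
    int count = 0;
    for (char c : text) {
        if (c == '{') {
            ++count;
        }
    }
    return count;
}
```

Without the depth check, termination would depend only on the random coin, which stops with probability 1 but has no fixed upper bound. With it, nesting(generate_value(rng, 3)) can never exceed 3.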

Demo 2: Compile-time constraints (without JSON schema)

In demo2.cpp, we test the same scenario as in demo1.cpp, but we show how to describe a configuration and constraints in C++ code, instead of JSON Schema.

Usage and outcome should be the same; the differences are in the code:

$ xmake                                  # builds the code
$ ./build/linux/x86_64/release/demo2     # or: xmake run demo2

Description of configuration class and constraints in demo2.cpp:

class Config {
  public:
    int a;
    int b;
};
[...]

namespace rc {
  template<>
  struct Arbitrary<Config> {
    static Gen<Config> arbitrary() {
      return gen::build<Config>(
          gen::set(&Config::a, rc::gen::inRange(0,1000)),
          gen::set(&Config::b, rc::gen::inRange(0,1000))
      );
    }
  };
}

The specialization of rc::Arbitrary<Config> describes the constraints that will be satisfied by all generated instances of Config. Here, both a and b fields will get values between 0 and 999 (the upper bound of rc::gen::inRange is exclusive). Refer to the RapidCheck documentation on how to define constraints for your custom class.

A call like jg::check<Config>("Test mean()", [](const Config & conf){return false;}) will automatically generate instances of Config satisfying these constraints, and pass them to the tester function.

Implementation notes

We use a patched version of RapidCheck:

  • This version does not absorb exceptions thrown during testing; it lets them through. If needed, the tester function can catch exceptions to prevent interrupting the testing loop. Exceptions that cannot be caught there (e.g. RapidCheck giving up the generation) are not recoverable anyway, so it's OK to let them through.
  • If you use the packaged xmake.lua, the patching is done automatically (unless you already have installed rapidcheck in your ~/.xmake).

In demo2.cpp, we use a macro for automatically generating serialization/deserialization functions for Nlohmann JSON, found here.

In self-tests.cpp, we rely on JSON Schema Validator to check that generated values in fact satisfy the schema.

About

C++ property testing tool for automatically generating arbitrary JSON documents based on JSON schema constraints.
