build/python-build/preprocess creates a local packaging mirror by copying the repository src, include, src_interfaces, and include_interfaces trees into build/python-build/.
Those copied directories are generated build inputs, not source of truth. They are ignored by git and may be deleted at any time. The preprocess target removes the generated mirror before copying fresh files so stale local headers or source files are not mixed with current repository code.
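The remove-then-copy behavior can be sketched as follows. This is only an illustration of the idea in Python, not the actual preprocess target; the function name `refresh_mirror` and its arguments are hypothetical.

```python
import shutil
from pathlib import Path

# Directory names taken from the description above.
MIRRORED_TREES = ("src", "include", "src_interfaces", "include_interfaces")


def refresh_mirror(repo_root, mirror_root):
    """Delete each generated tree, then copy a fresh one from the repository."""
    for name in MIRRORED_TREES:
        target = Path(mirror_root) / name
        if target.exists():
            # Remove the stale mirror first so old files cannot survive the copy.
            shutil.rmtree(target)
        source = Path(repo_root) / name
        if source.is_dir():
            shutil.copytree(source, target)
```

Deleting before copying is what guarantees that a file removed from the repository also disappears from the mirror.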
When reviewing, debugging, or changing G2S code, use the repository-level source and header trees. Do not treat ignored files under build/python-build/src, build/python-build/include, build/python-build/src_interfaces, build/python-build/include_interfaces, or the local build/python-build/jsoncpp clone as canonical project source.
Native ds is implemented by src/ds.cpp and include/directSampling.hpp. It intentionally reuses the generic simulation() / simulationFull() orchestration and only opts into additional SamplingModule hooks for raw neighbor values, strict informed-neighbor filtering, safe TI-id resolution, per-node sample context, and kernel flat-index mapping. Legacy ds-l remains implemented by src/ds-l.cpp and should not be changed when working on native DS behavior.
Native DS uses the first configured kernel as its default mismatch kernel and switches to a per-node kernel only when a valid -kii map selects one. -ii maps are interpreted per simulation cell in both vector and full simulation; multi-channel -ii maps may provide per-variable TI selection for full simulation. Candidate patterns that cannot support the full observed data event inside the selected TI are rejected instead of being scored on a partial neighborhood. After simulation, native DS applies a conservative categorical singleton cleanup that preserves conditioning values and only flips one-cell islands fully surrounded by another categorical value.
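The singleton cleanup rule can be illustrated with a small 2D sketch. It assumes a 4-connected neighborhood and skips border cells, neither of which is stated above; the function name and the boolean conditioning mask are hypothetical and do not mirror the code in src/ds.cpp.

```python
def cleanup_singletons(grid, conditioning):
    """Flip one-cell categorical islands, leaving conditioning cells untouched.

    grid:         2D list of categorical values.
    conditioning: 2D list of booleans, True where the value is conditioning data.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if conditioning[r][c]:
                continue  # conditioning values are always preserved
            neighbors = [
                grid[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            if len(neighbors) < 4:
                continue  # assumed conservative choice: border cells are skipped
            first = neighbors[0]
            # Flip only when every neighbor agrees on one other category.
            if first != grid[r][c] and all(n == first for n in neighbors):
                out[r][c] = first
    return out
```

Neighbors are read from the input grid rather than the output, so the result does not depend on the order in which cells are visited.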
Native DS visits TI candidates through a deterministic pseudo-random permutation over the allowed flattened TI candidates. The permutation is array-free, bounded by -f / -mer, and keyed by the global seed, local per-node seed, simulation path order, and variable. Per-node sample context is stored per thread so parallel path workers do not overwrite each other's candidate-order context.
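One way to obtain an array-free, keyed permutation is a small Feistel network combined with cycle walking. The sketch below shows that general technique only; the text above does not say which construction native DS uses. The names `make_key`, `permuted_index`, `visit_order`, and `max_visits` (standing in for the bound derived from -f / -mer) are hypothetical.

```python
MASK64 = (1 << 64) - 1


def mix64(x):
    """SplitMix64 finalizer: a cheap deterministic 64-bit integer mixer."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def make_key(global_seed, node_seed, path_index, variable):
    """Fold the four keying inputs into one 64-bit key."""
    key = 0
    for part in (global_seed, node_seed, path_index, variable):
        key = mix64(key ^ (part & MASK64))
    return key


def permuted_index(i, n, key, rounds=4):
    """Map i in [0, n) to a unique index in [0, n) without storing an array."""
    half_bits = max(1, ((n - 1).bit_length() + 1) // 2)
    half_mask = (1 << half_bits) - 1
    x = i
    while True:
        left, right = x >> half_bits, x & half_mask
        for r in range(rounds):
            # Balanced Feistel round: a bijection on 2 * half_bits bits.
            left, right = right, left ^ (mix64(right ^ key ^ (r << 56)) & half_mask)
        x = (left << half_bits) | right
        if x < n:
            return x  # cycle walking: re-encrypt until the value is in range


def visit_order(n, key, max_visits):
    """First max_visits candidates of the keyed permutation of range(n)."""
    return [permuted_index(i, n, key) for i in range(min(n, max_visits))]
```

Because each index is computed on demand, a bounded scan touches only the candidates it visits, and two runs with the same key see the same order.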
Explicit -sp path ordering and -wPO dependency ordering use deterministic tie-breakers. This is required for repeated same-seed DS runs because equal path priorities or dependency depths otherwise leave the visit order unspecified, especially when path optimization uses a parallel sort implementation.
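A deterministic tie-breaker can be as simple as appending the flat cell index to the sort key. The sketch below assumes lower priority values are visited first; the function name is hypothetical.

```python
def deterministic_path(priorities):
    """Return cell indices ordered by priority, with ties broken by cell index."""
    # The (priority, index) key is a total order, so the result does not depend
    # on whether the underlying sort is stable, sequential, or parallel.
    return sorted(range(len(priorities)), key=lambda i: (priorities[i], i))
```

With priorities `[0.5, 0.1, 0.5, 0.1]`, this yields `[1, 3, 0, 2]` on every run.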
When a sampler requests strict informed neighbors, the parallel simulation loop waits until earlier-path neighbor values are written before adding them to the data event. Native DS depends on this behavior for reproducibility under -j; otherwise thread timing can decide whether a still-pending neighbor is skipped.
Continuous DS mismatch keeps the generic pow path for custom -cn / -cnorm values, but shortcuts the common 1 and 2 powers with direct absolute-difference and multiplication/square-root operations in the candidate scoring loop.
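The shortcut can be pictured as follows. This sketch assumes an unweighted p-norm with a final root, which the text above does not confirm; kernel weights are omitted, and both function names are hypothetical.

```python
import math


def generic_mismatch(a, b, power):
    """Generic path: pow on every term, then the matching root."""
    total = sum(math.pow(abs(x - y), power) for x, y in zip(a, b))
    return math.pow(total, 1.0 / power)


def continuous_mismatch(a, b, power):
    """Same value as generic_mismatch, with cheap paths for powers 1 and 2."""
    if power == 1:
        # Direct absolute difference, no pow call.
        return sum(abs(x - y) for x, y in zip(a, b))
    if power == 2:
        # Multiplication per term and one square root at the end.
        return math.sqrt(sum((x - y) * (x - y) for x, y in zip(a, b)))
    return generic_mismatch(a, b, power)
```

The shortcuts matter because this code runs once per candidate and per neighbor, so avoiding `pow` in the two common cases removes most of the per-term cost.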
The Python and MATLAB native DS examples are intentionally fully unconditional: their destination images are all NaN, use the same spatial and variable shape as the loaded training image, and pass -j 1.00001 to exercise path-level parallel execution. Do not add sparse demonstration conditioning points to those examples unless the example is explicitly renamed and documented as conditional.
The transform helper in include/qsTransformUtils.hpp is shared by QS and native DS. Rotation and scale tolerance maps are inert unless provided by a caller; existing QS deterministic transform behavior should remain unchanged.
G2S clients and the server communicate over ZeroMQ using one binary request frame. Every request starts with infoContainer, defined in include/protocol.hpp:
```cpp
struct infoContainer {
    int version;
    taskType task;
};
```

The remaining payload depends on `taskType`. Current task payload shapes are:
| Task | Payload |
|---|---|
| `EXIST` | 64-byte content identifier |
| `UPLOAD` | 64-byte content identifier followed by serialized .bgrid payload |
| `DOWNLOAD` | 64-byte content identifier |
| `JOB` | job JSON bytes |
| `PROGESSION` | one `jobIdType` |
| `DURATION` | one `jobIdType` |
| `KILL` | one `jobIdType` |
| `UPLOAD_JSON` | 64-byte content identifier followed by JSON bytes |
| `DOWNLOAD_JSON` | 64-byte content identifier |
| `SHUTDOWN` | no payload |
| `SERVER_STATUS` | no payload |
| `JOB_STATUS` | one `jobIdType` |
| `DOWNLOAD_TEXT` | 64-byte content identifier |
Validation must happen before a request payload is cast, copied, deserialized, used as a filename, or passed to task-specific handlers. At minimum, request handling must check:
- the frame is at least `sizeof(infoContainer)`;
- `version` is positive and understood by the receiver;
- `task` is one of the supported enum values;
- fixed-size payloads match exactly;
- variable-size payloads are bounded by explicit limits;
- 64-byte content identifiers contain only the expected safe hash or object-name format for that task;
- job JSON, data payloads, and text payloads are rejected when malformed or oversized;
- invalid requests produce a deterministic error reply.
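The size and enum checks in this list can be sketched as a single parse step that runs before any payload is used. Everything concrete here is an assumption made for illustration: a little-endian header of two 4-byte fields, task values numbered in table order from 0, a 4-byte `jobIdType`, the supported version set, and the size limit. Identifier-format and JSON checks are left out for brevity.

```python
import struct
from enum import IntEnum


class TaskType(IntEnum):
    # Numeric values are ASSUMED to follow the table order, starting at 0.
    EXIST = 0
    UPLOAD = 1
    DOWNLOAD = 2
    JOB = 3
    PROGESSION = 4
    DURATION = 5
    KILL = 6
    UPLOAD_JSON = 7
    DOWNLOAD_JSON = 8
    SHUTDOWN = 9
    SERVER_STATUS = 10
    JOB_STATUS = 11
    DOWNLOAD_TEXT = 12


HEADER = struct.Struct("<iI")      # assumed layout of infoContainer
ID_SIZE = 64                       # content identifier size from the table
JOB_ID_SIZE = 4                    # assumed width of jobIdType
MAX_VARIABLE_BYTES = 64 * 1024 * 1024  # hypothetical explicit limit
SUPPORTED_VERSIONS = {1}           # hypothetical; every entry is positive

T = TaskType
# task -> (fixed bytes required, whether variable-size bytes must follow)
LAYOUT = {
    T.EXIST: (ID_SIZE, False), T.UPLOAD: (ID_SIZE, True),
    T.DOWNLOAD: (ID_SIZE, False), T.JOB: (0, True),
    T.PROGESSION: (JOB_ID_SIZE, False), T.DURATION: (JOB_ID_SIZE, False),
    T.KILL: (JOB_ID_SIZE, False), T.UPLOAD_JSON: (ID_SIZE, True),
    T.DOWNLOAD_JSON: (ID_SIZE, False), T.SHUTDOWN: (0, False),
    T.SERVER_STATUS: (0, False), T.JOB_STATUS: (JOB_ID_SIZE, False),
    T.DOWNLOAD_TEXT: (ID_SIZE, False),
}


def parse_request(frame):
    """Return (task, payload) or raise ValueError before the payload is used."""
    if len(frame) < HEADER.size:
        raise ValueError("frame shorter than infoContainer")
    version, raw_task = HEADER.unpack_from(frame)
    if version not in SUPPORTED_VERSIONS:
        raise ValueError("unsupported version")
    try:
        task = TaskType(raw_task)
    except ValueError:
        raise ValueError("unknown task id") from None
    fixed, variable = LAYOUT[task]
    size = len(frame) - HEADER.size
    if variable:
        # Variable payloads must be non-empty and bounded.
        if not fixed < size <= fixed + MAX_VARIABLE_BYTES:
            raise ValueError("variable-size payload out of bounds")
    elif size != fixed:
        raise ValueError("fixed-size payload mismatch")
    return task, frame[HEADER.size:]


def is_valid(frame):
    """Convenience wrapper: True when parse_request accepts the frame."""
    try:
        parse_request(frame)
        return True
    except ValueError:
        return False
```

Keeping the per-task sizes in one table means a new task type adds a row rather than another hand-written size check.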
The server currently performs part of this validation in src/server.cpp before dispatch and in data storage helpers. Future protocol changes should move toward one parser/serializer layer that returns typed validated requests, so new task cases do not repeat manual size checks and byte casts.
New or changed task types should include parser tests, including malformed and boundary-size frames. Fuzzing the request parser is the preferred way to cover truncated frames, oversized frames, unknown task ids, invalid content identifiers, and mismatched serialized payload sizes.
The server remains stateless with respect to client delivery state. Interfaces are responsible for remembering what they have already displayed.
Structured per-job reporting now uses sidecar files under /tmp/G2S/:
- `/tmp/G2S/logs/<job>.log`: chronological human-readable log
- `/tmp/G2S/warnings/<job>.txt`: warning event stream
- `/tmp/G2S/errors/<job>.txt`: fatal error payload
- `/tmp/G2S/progress/<job>.kv`: current machine-readable progress snapshot
- `/tmp/G2S/meta/<job>.kv`: final key/value summary
The existing DOWNLOAD_TEXT task is reused for these artifacts through conventional names:
- `log_<job>`
- `warning_<job>`
- `error_<job>`
- `progress_<job>`
- `meta_<job>`
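The mapping from a conventional name to its sidecar file can be sketched as below. The directories and extensions come from the list of sidecar files above; the assumption that `<job>` is a purely numeric identifier, and the function name `artifact_path`, are hypothetical.

```python
import re

# kind -> (directory under the root, file extension)
ARTIFACTS = {
    "log": ("logs", ".log"),
    "warning": ("warnings", ".txt"),
    "error": ("errors", ".txt"),
    "progress": ("progress", ".kv"),
    "meta": ("meta", ".kv"),
}

# ASSUMPTION: job ids are decimal digits only, which also rules out "../".
NAME_PATTERN = re.compile(r"(log|warning|error|progress|meta)_([0-9]+)")


def artifact_path(name, root="/tmp/G2S"):
    """Resolve a conventional artifact name to a file path, or None if invalid."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    kind, job = match.groups()
    directory, extension = ARTIFACTS[kind]
    return f"{root}/{directory}/{job}{extension}"
```

Rejecting anything that does not match the full pattern keeps the requested name from being used as an arbitrary filename.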
This keeps the server stateless:
- the server only returns the current contents of the requested artifact;
- the interface keeps local cursors such as `log_offset` and `warning_offset`;
- structured progress is polled as current state, not tailed as an append-only stream.
progress_<job> is a line-based key/value file. Typical keys include:
- `status`
- `progress_percent`
- `stage`
- `stage_detail`
- `current_step`
- `total_steps`
- `last_update_unix_ms`
meta_<job> is also a line-based key/value file and is intended to be read at the end of the run. Typical keys include:
- `job_id`
- `algorithm`
- `status`
- `start_time_unix_ms`
- `end_time_unix_ms`
- `duration_ms`
- `warning_count`
- algorithm-specific timing keys such as `tree_creation_ms` or `simulation_ms`
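Both files can be read with the same small parser. The sketch below assumes one `key=value` pair per line; the separator is not stated above, so it is a parameter, and the function name `parse_kv` is hypothetical.

```python
def parse_kv(text, separator="="):
    """Parse line-based key/value text into a dict of strings.

    ASSUMPTION: each line looks like "key=value". Blank lines and lines
    without the separator are ignored rather than treated as errors.
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or separator not in line:
            continue
        key, value = line.split(separator, 1)
        result[key.strip()] = value.strip()
    return result
```

Values stay as strings here; a caller that needs numbers converts known keys such as `progress_percent` or `duration_ms` itself.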
Interfaces should prefer the structured progress and meta files for progress and duration. Plain logs should be treated as human-facing traces, not as the canonical machine-readable status source.
The reporting path is designed to keep the server stateless. The server does not remember what was already sent to any client or interface. Instead:
- the server exposes the current contents of `log_<job>`, `warning_<job>`, `error_<job>`, `progress_<job>`, and `meta_<job>`
- the interface keeps local per-job cursors for append-only streams such as `log_offset` and `warning_offset`
- each poll requests the current progress snapshot plus any newly appended log/warning bytes
- final metadata is read from `meta_<job>` once the run finishes
This split matters operationally:
- `progress_<job>` is current state, so interfaces overwrite their previous view on each poll
- `log_<job>` and `warning_<job>` are append-only streams, so interfaces display only the new suffix they have not shown yet
- `error_<job>` is a terminal payload, so interfaces can fetch it when job status changes to failure
This keeps delivery-side state out of the server while still allowing live -showLogs output in MATLAB, Python, and other bindings.
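The interface-side bookkeeping can be sketched as follows. `fetch_text` is a hypothetical callable standing in for a DOWNLOAD_TEXT round trip that returns the current bytes of a named artifact; the class and function names are illustrative and do not come from the bindings.

```python
class StreamCursor:
    """Remembers how many bytes of an append-only artifact were already shown."""

    def __init__(self):
        self.offset = 0

    def take_new(self, current):
        """Return only the bytes appended since the previous call."""
        suffix = current[self.offset:]
        self.offset = len(current)
        return suffix


def poll_job(fetch_text, job, cursors, view):
    """Run one poll: overwrite the progress view, return new log/warning bytes."""
    # Progress is current state, so the previous snapshot is simply replaced.
    view["progress"] = fetch_text(f"progress_{job}")
    fresh = {}
    for kind in ("log", "warning"):
        # Logs and warnings are append-only, so only the unseen suffix is kept.
        fresh[kind] = cursors[kind].take_new(fetch_text(f"{kind}_{job}"))
    return fresh
```

All delivery state lives in `cursors` and `view` on the client side, so the server can answer every request from the files alone.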
The human log should still be structured enough to follow setup and outputs without reading raw source. The current convention is:
- `INPUT`: successful data/image loads with resolved shape and encoding
- `PARAM`: effective parameter values after parsing, defaulting, and mode selection
- `OUTPUT`: emitted result artifacts with resolved shape and encoding
These log lines are for operators and debugging only. They should not be parsed as the authoritative machine-readable state channel.
Bindings now separate transport/state from display:
- `-showLogs` enables live display of newly appended `log_<job>` and `warning_<job>` text while the job runs
- `-returnMeta` returns the final parsed `meta_<job>` key/value payload to the caller
- Python displays warnings in orange/yellow-ish ANSI text and errors in red before warning/exception propagation
- MATLAB warnings are non-fatal again, while fatal errors still abort the call
The repository includes a small reporting-only utility algorithm, report_probe, for exercising this stack without running a full simulation. It is implemented by src/errorTest.cpp, registered in build/algosName.config, and supports:
- `-mode log`: progress plus plain log lines, then success
- `-mode warning`: progress, plain log lines, one warning event, then success
- `-mode error`: progress, plain log lines, one warning event, a fatal error payload, then nonzero exit
The companion examples example/python/reporting_probe.py and example/matlab/reporting_probe.m are the intended smoke tests for interface-side warning/error/log rendering. Their error-path probe intentionally does not catch the fatal interface exception, so the default behavior matches ordinary caller expectations in Python and MATLAB. Users who want to recover from that failure path should add their own try/except or try/catch around the probe call.
The human-readable log is now expected to show both setup and effective behavior, not just raw argv. In practice this means:
- successful image/grid loads should produce `INPUT` lines with the resolved source name, shape, variable count, encoding, and variable-type summary
- resolved execution settings should produce `PARAM` lines after argument parsing and defaulting
- emitted outputs should produce `OUTPUT` lines with the resolved artifact id and resulting shape
For -wPO, the current conventions are:
- `qs` and native `ds`: log `path_optimization=true|false` because the flag is effective in vector simulation
- `snesim`: logs `path_optimization_requested=true|false` so the request is visible in the operator log, even though the current scaffold does not expose the same effective mode as QS
Native QS accepts deterministic CPU-only local search-pattern transforms:
- `-rmi` supplies rotation per simulated node;
- `-smi` supplies isotropic scale per simulated node.
Transforms map original QS template offsets into simulation-space lookup offsets before candidate matching. The training image is not transformed. QS reads already simulated values at transformed offsets, then passes those values to the existing matcher with the original TI/kernel offsets so kernel weights keep their original flat-index mapping. This preserves vector/full simulation behavior while allowing constant rotations or scales to turn or resize TI structures in the simulation.
Supported geometries are 2D and 3D only. 2D rotation maps use one channel containing radians in the XY plane. 3D rotation maps use four channels containing quaternion values in (qx, qy, qz, qw) order; invalid or near-zero node quaternions fall back to identity rotation. Scale maps use one channel; invalid node scale values fall back to identity scale.
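The per-node fallbacks can be sketched as below. Several details are assumptions: positive angles are treated as counter-clockwise, "invalid" is read as non-finite (and, for scale, non-positive), and the near-zero threshold is arbitrary. The helper names are hypothetical and are not taken from include/qsTransformUtils.hpp.

```python
import math

IDENTITY_QUATERNION = (0.0, 0.0, 0.0, 1.0)  # (qx, qy, qz, qw) order


def rotate_offset_2d(dx, dy, angle):
    """Rotate a template offset in the XY plane by angle (radians)."""
    # ASSUMPTION: positive angles rotate counter-clockwise.
    c, s = math.cos(angle), math.sin(angle)
    return (c * dx - s * dy, s * dx + c * dy)


def safe_quaternion(qx, qy, qz, qw, eps=1e-12):
    """Normalize a node quaternion, falling back to identity when unusable."""
    norm = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    if not math.isfinite(norm) or norm < eps:
        return IDENTITY_QUATERNION
    return (qx / norm, qy / norm, qz / norm, qw / norm)


def safe_scale(scale):
    """Return the node scale, falling back to identity scale when unusable."""
    # ASSUMPTION: non-finite or non-positive values count as invalid.
    if not math.isfinite(scale) or scale <= 0.0:
        return 1.0
    return scale
```

Falling back to identity per node means one bad map value disables the transform only at that node instead of failing the whole run.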
The Python example example/python/qs_rotation_equivalence_2d.py checks the constant-rotation sign convention by comparing -rmi +pi/2 with clockwise and counter-clockwise rotated training images.