smite-ir: Introduce affine state variables for strict state enforcement#97
smite-ir: Introduce affine state variables for strict state enforcement#97Chand-ra wants to merge 8 commits into
Conversation
531ca61 to
798752a
Compare
| let candidate_to_resolve = chosen_candidate | ||
| .unwrap_or_else(|| candidates.last().unwrap()) | ||
| .clone(); | ||
| return self.resolve_candidate(candidate_to_resolve); | ||
| } |
There was a problem hiding this comment.
As discussed here, this is an anti-pattern and we should modify the function's signature to return an Option<usize> instead of intentionally generating an invalid instruction.
There was a problem hiding this comment.
Decided to panic instead, because that's in-line with the current behavior. SpliceMutator will require this behavior to change as a whole, so I'll modify it if and when we need it instead of front-loading the complexity in this PR.
34528b0 to
18143e2
Compare
|
Rebased on top of the latest master to resolve merge conflicts. |
| actual_type, expected_type, | ||
| "{operation:?} input {i}: expected {expected_type:?}, got {actual_type:?}", | ||
| ); | ||
| if actual_type.is_affine() { |
There was a problem hiding this comment.
There's some trickiness here -- should is_affine() return true or false for a compound variable containing an affine field?
If it returns true, we need to ensure below that we only remove the candidate when the affine field is actually extracted -- not when a non-affine field is extracted.
If it returns false, this code will never run for compound variables, and we would fail to detect multiple uses of affine fields.
Or if we are going to ban affine types from being extractable fields, we should add an assertion somewhere to panic on such cases.
There was a problem hiding this comment.
Compound variables containing an affine type should definitely return false on is_affine(), to allow multiple extractions from them.
To access an affine variable contained in a compound one, Generators should first manually Extract* the affine variable, registering it as an available candidate. We can then detect over-extractions by re-introducing the OverExtract logic we just got rid of, and over-use of the extracted affine will be naturally handled by the existing code.
Or, we can introduce dummy operations that accept such compound variables and output a single affine type.
In any case, I don't think we need to deal with this for now. The specifics can be ironed out if and when we encounter such a scenario (I personally have a feeling we won't).
There was a problem hiding this comment.
Then let's ban it and add an assertion in the operation.extractable_fields() loop below that extractable fields are never affine.
639bf6a to
825f42b
Compare
Add `SentOpenChannel` to the `Variable` and `VariableType` enums. Unlike standard data variables which can be referenced infinitely, this new type is designed to be "affine" (single-use). Affine variables represent strict topological locks in the LN state machine (e.g., `SentOpenChannel` ensures a channel is actively pending before allowing a `RecvAcceptChannel`).
Update `pick_variable` with a dedicated branch for affine types. Unlike data variables which use a probabilistic 75/15/10 strategy, affine types use deterministic tip-tracking.
Update `Program::validate()` to enforce the single-use nature of affine state variables, preventing mutators from generating impossible state-machine sequences. Add a validation rule: An affine variable cannot be consumed as an input more than once. This prevents mutators from reusing old state variables to bypass protocol ordering.
Affine variables carry no data and act purely as state tokens, so swapping them yields no practical fuzzing value.
Introduce a dedicated `OpenChannelMessage` variable, separating it from the generic `Message` type. This strongly types the output of `BuildOpenChannel`, and is used as an input to `SendOpenChannel` in a subsequent commit. This prevents `InputSwapMutator` from swapping it with a different message type, possibly causing a timeout.
Introduce the strict topological link for channel initialization. - Add `SendOpenChannel` operation: Consumes a composed `Message` and outputs a `SentOpenChannel` affine variable. The executor verifies the outgoing message is type `open_channel`. - Modify `RecvAcceptChannel`: Now requires `SentOpenChannel` as an input, enforcing that the fuzzer cannot wait for an `accept_channel` unless it actually opened a channel first. - Update unit tests to accommodate the new input requirements for `RecvAcceptChannel`.
Update `OpenChannelGenerator` to utilize the newly introduced strict topological operations: 1. Replace `SendMessage` with `SendOpenChannel`, which consumes the `Message` payload and outputs a `SentOpenChannel`. 2. Update `RecvAcceptChannel` to consume the `SentOpenChannel`. 3. Update tests to accommodate the non-void output of `SendOpenChannel`.
Add tests to verify the data-flow rules introduced for state tracking.
825f42b to
982d207
Compare
|
Had to rebase again to solve a merge conflict. |
| let chosen_candidate = candidates | ||
| .pop() | ||
| .unwrap_or_else(|| panic!("no candidates for {var_type:?}")); | ||
| return self.resolve_candidate(chosen_candidate); | ||
| } |
There was a problem hiding this comment.
Typical usage of ProgramBuilder is like this:
let v = builder.pick_variable(SentOpenChannel, rng);
builder.append(RecvAcceptChannel, &[v]);So we can't remove from self.candidates in both pick_variable and append or we'll double-consume the affine variable.
I'm pretty sure we want append to be the only place where we actually mark the variable as consumed.
| @@ -105,10 +124,17 @@ impl ProgramBuilder { | |||
| /// variable selection (75% most recent, 15% any existing, 10% fresh). | |||
| #[allow(clippy::missing_panics_doc)] // candidates is always non-empty | |||
There was a problem hiding this comment.
candidates actually can be empty now since we delete affine types from the entry's vector in append but never fully remove the entry itself.
We should either ensure that empty entries get fully deleted or we should fix the assumption here.
| actual_type, expected_type, | ||
| "{operation:?} input {i}: expected {expected_type:?}, got {actual_type:?}", | ||
| ); | ||
| if actual_type.is_affine() { |
There was a problem hiding this comment.
Then let's ban it and add an assertion in the operation.extractable_fields() loop below that extractable fields are never affine.
| .flat_map(|(i, instr)| { | ||
| (0..instr.inputs.len()) | ||
| // Affine variables carry no data, so swapping one affine variable with | ||
| // another has no practical effect. | ||
| .filter(move |&j| !instr.operation.input_types()[j].is_affine()) | ||
| .map(move |j| (i, j)) | ||
| }) |
There was a problem hiding this comment.
Nit: it's a bit awkward/inefficient to call input_types once for every input instead of once per instruction
| .flat_map(|(i, instr)| { | |
| (0..instr.inputs.len()) | |
| // Affine variables carry no data, so swapping one affine variable with | |
| // another has no practical effect. | |
| .filter(move |&j| !instr.operation.input_types()[j].is_affine()) | |
| .map(move |j| (i, j)) | |
| }) | |
| .flat_map(|(i, instr)| { | |
| instr | |
| .operation | |
| .input_types() | |
| .into_iter() | |
| .enumerate() | |
| .filter_map(move |(j, ty)| (!ty.is_affine()).then_some((i, j))) | |
| }) |
| } | ||
|
|
||
| Operation::SendOpenChannel => { | ||
| let bytes = resolve_message(&variables, instr.inputs[0])?; |
There was a problem hiding this comment.
This should be resolving an OpenChannelMessage now instead of Message, which also allows us to remove the msg_type check below.
We should also test the case where a different message type is passed as input.
| let sent_open_channel = resolve_sent_open_channel(&variables, instr.inputs[0])?; | ||
| // Consume the affine `SentOpenChannel`. | ||
| variables[sent_open_channel] = None; |
There was a problem hiding this comment.
I think it would make sense to rename resolve_sent_open_channel to consume_sent_open_channel and have it update variables instead, so the resolve+consume operations happen in the same place.
| index: usize, | ||
| ) -> Result<usize, ExecuteError> { | ||
| let Ok(var) = resolve(variables, index) else { | ||
| return Err(ExecuteError::AffineOverUse { index }); |
There was a problem hiding this comment.
This mislabels index-out-of-bounds errors as AffineOverUse.
| std::time::Instant::now(), | ||
| ) | ||
| .unwrap_err(); | ||
| dbg!(&err); |
| return Err(ExecuteError::AffineOverUse { index }); | ||
| }; | ||
| match var { | ||
| Variable::SentOpenChannel => Ok(index), |
There was a problem hiding this comment.
Returning the same index that was passed in seems off. I think we should just return Ok(()) instead.
If we do change this function to also consume the affine variable, the return value will be unused anyway,
This PR introduces affine (single-use) types to the IR to strictly enforce the Lightning Network state machine during fuzzing, in order to prevent timeout hangs caused by semantically invalid mutation sequences.
Solves issue #75, see it for more details.