THROTL_GRP_QUANTUM
TL;DR: The second while loop (https://elixir.bootlin.com/linux/v6.14/source/block/blk-throttle.c#L960) is compiled with loop-unrolling optimizations applied, so small changes to its bound constant can reshape the generated code.
Experiments: directly mutating VALUE (written as the placeholder FINDME in the template file) in the condition below:
if (nr_writes >= VALUE)
        break;
for value in `seq 1 10`; do
    sed "s|if (nr_writes >= FINDME)|if (nr_writes >= $value)|" \
        < block/blk-throttle.template.c \
        > block/blk-throttle.c
    make block/blk-throttle.o
    objdump -d block/blk-throttle.o > disas/origin.$value.txt
done
Running x=N; diff origin.$x.txt origin.$((x+1)).txt
Only the 4→5 pair produces a reasonably small change; all other adjacent pairs differ drastically.
4083c4083
< 3648: 41 83 fd 04 cmp $0x4,%r13d
---
> 3648: 41 83 fd 05 cmp $0x5,%r13d
Adding __attribute__((optimize("O0"))) to the throtl_dispatch_tg function
1 vs. 2:
73,74c73,74
< c3: 83 7d e4 00 cmpl $0x0,-0x1c(%rbp)
< c7: 75 39 jne 102 <throtl_dispatch_tg+0xf2>
---
> c3: 83 7d e4 01 cmpl $0x1,-0x1c(%rbp)
> c7: 77 39 ja 102 <throtl_dispatch_tg+0xf2>
All other adjacent pairs look like:
73c73
< c3: 83 7d e4 01 cmpl $0x1,-0x1c(%rbp)
---
> c3: 83 7d e4 02 cmpl $0x2,-0x1c(%rbp)
With -O0, the code structure is retained across values; only the immediate operand changes.
Adding __attribute__((optimize("O2","-fno-unroll-loops","-fno-peel-loops","-fno-unswitch-loops")))
$ x=4 ; diff no-unroll.$x.txt no-unroll.$((x+1)).txt
4011c4011
< 350e: 41 83 fc 04 cmp $0x4,%r12d
---
> 350e: 41 83 fc 05 cmp $0x5,%r12d
$ x=5 ; diff no-unroll.$x.txt no-unroll.$((x+1)).txt
4011c4011
< 350e: 41 83 fc 05 cmp $0x5,%r12d
---
> 350e: 41 83 fc 06 cmp $0x6,%r12d
$ x=7 ; diff no-unroll.$x.txt no-unroll.$((x+1)).txt
4011c4011
< 3512: 41 83 fc 07 cmp $0x7,%r12d
---
> 3512: 41 83 fc 08 cmp $0x8,%r12d
$ x=8 ; diff no-unroll.$x.txt no-unroll.$((x+1)).txt
4011c4011
< 3512: 41 83 fc 08 cmp $0x8,%r12d
---
> 3512: 41 83 fc 09 cmp $0x9,%r12d
$ x=9 ; diff no-unroll.$x.txt no-unroll.$((x+1)).txt
4011c4011
< 3512: 41 83 fc 09 cmp $0x9,%r12d
---
> 3512: 41 83 fc 0a cmp $0xa,%r12d
All other adjacent pairs exhibit huge differences, e.g. from 6 to 7 the cmp moves to a different address (350e → 3512).
$ for x in `seq 1 9`; do echo "x=$x"; diff no-unroll.$x.txt no-unroll.$((x+1)).txt | wc -l; done
x=1
3349
x=2
3322
x=3
89
x=4
4
x=5
4
x=6
99
x=7
4
x=8
4
x=9
4
Disabling more optimizations makes things better, but still not as stable as -O0:
__attribute__((optimize(
"O2",
"-fno-unroll-loops",
"-fno-peel-loops",
"-fno-unswitch-loops",
"-fno-reorder-blocks",
"-fno-guess-branch-probability",
"-fno-reorder-functions"
)))
$ for x in `seq 1 9`; do echo "x=$x"; diff no-reorder.$x.txt no-reorder.$((x+1)).txt | wc -l; done
x=1
3304
x=2
3344
x=3
4
x=4
4
x=5
4
x=6
4
x=7
4
x=8
4
x=9
4
Dumping the optimization record:
gcc -Wp,-MMD,block/.blk-throttle.o.d -nostdinc -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -I./ubuntu/include -include ./include/linux/compiler_types.h -D__KERNEL__ -std=gnu11 -fshort-wchar -funsigned-char -fno-common -fno-PIE -fno-strict-aliasing -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fcf-protection=none -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -Wno-sign-compare -fno-asynchronous-unwind-tables -mindirect-branch=thunk-extern -mindirect-branch-register -mindirect-branch-cs-prefix -mfunction-return=thunk-extern -fno-jump-tables -mharden-sls=all -fpatchable-function-entry=16,16 -fno-delete-null-pointer-checks -O2 -fno-allow-store-data-races -fstack-protector-strong -fno-omit-frame-pointer -fno-optimize-sibling-calls -ftrivial-auto-var-init=zero -fno-stack-clash-protection -fzero-call-used-regs=used-gpr -pg -mrecord-mcount -mfentry -DCC_USING_FENTRY -fmin-function-alignment=16 -fstrict-flex-arrays=3 -fno-strict-overflow -fno-stack-check -fconserve-stack -Wall -Wundef -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Werror=strict-prototypes -Wno-format-security -Wno-trigraphs -Wno-frame-address -Wno-address-of-packed-member -Wmissing-declarations -Wmissing-prototypes -Wframe-larger-than=1024 -Wno-main -Wno-dangling-pointer -Wvla -Wno-pointer-sign -Wcast-function-type -Wno-stringop-overflow -Wno-array-bounds -Wno-alloc-size-larger-than -Wimplicit-fallthrough=5 -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -Wenum-conversion -Wextra -Wunused -Wno-unused-but-set-variable -Wno-unused-const-variable -Wno-packed-not-aligned -Wno-format-overflow -Wno-format-truncation 
-Wno-stringop-truncation -Wno-override-init -Wno-missing-field-initializers -Wno-type-limits -Wno-shift-negative-value -Wno-maybe-uninitialized -Wno-sign-compare -Wno-unused-parameter -g -gdwarf-5 -fsanitize=bounds-strict -fsanitize=shift -fsanitize=bool -fsanitize=enum -fsanitize=signed-integer-overflow -DKBUILD_MODFILE='"block/blk-throttle"' -DKBUILD_BASENAME='"blk_throttle"' -DKBUILD_MODNAME='"blk_throttle"' -D__KBUILD_MODNAME=kmod_blk_throttle \
-fopt-info-all-optall \
-fsave-optimization-record \
-c -o block/blk-throttle.o block/blk-throttle.c
block/blk-throttle.c:954:56: note: considering unrolling loop 4 at BB 39
considering unrolling loop with constant number of iterations
considering unrolling loop with runtime-computable number of iterations
which confirms the theory again.
Feasibility of #16
See chat history https://chatgpt.com/share/68b53b44-6f98-8013-9258-dff728a75001
Especially the last question --
- Forget about compilers for now. If I want to achieve something semantically equivalent to "changing a constant in source from A to B", is it always possible to do so by only modifying assembly operands (i.e. not changing assembly code structure or instruction boundaries), even via heroic manual binary rewrite? If not, give counter examples
ChatGPT answer
3) Can you *always* emulate “A→B” by **only** changing existing operands (no instruction-length/control-flow/layout changes)?
No. Many real cases require changing instruction forms, duplicating/removing blocks, or editing data sections. Counterexamples:
- Immediate size overflows
Original fits imm8, new needs imm32:
cmp $127,%eax (3 bytes) → cmp $300,%eax needs the imm32 form (6 bytes). You can’t keep instruction boundaries unchanged.
- Algorithm selection changes
Division/mod by constant:
x/8 → sar $3,%eax (1 instruction).
x/3 → sequence with a 64-bit multiply by magic + shifts/adds.
Going to or from a power-of-two divisor crosses between one instruction and a sequence. No operand-only patch can do that.
- Loop unrolling/peeling/if-conversion driven by the constant
A bound moving from 2→3 may flip heuristic thresholds: the compiler might (un)roll or (un)peel. The number of replicated basic blocks changes; you would have to add or remove blocks, not just immediates.
- Switch lowering & jump-table sizing
Changing case 0..A to case 0..B generally requires resizing the jump table, adjusting the range check and bias, and possibly emitting more case bodies. Simply bumping a compare immediate will index past the last table entry — wrong semantics.
- Compile-time only constants
Anything in #if / static_assert / template parameters / if constexpr affects what code exists. The “other side” of the branch was never generated; you can’t conjure it by operand edits.
- Layout-affecting constants
Array/struct sizes or alignment constants change field offsets and stack frame layout throughout a function (and sometimes across TUs). Achieving semantic equivalence would require editing dozens of displacements, and may still miss ABI effects. Some changes also alter section sizes → cascading address shifts; you said “no structure change.”
- Call vs inline cutoff
A size threshold might flip an inline into a library call (or vice versa). Operand-only edits can’t insert/remove a call sequence.
- Architectural encoding limits
Some instructions simply lack an immediate form for the needed width (e.g., cmp r/m64, imm sign-extends imm32; if you needed a distinct 64-bit constant outside the sign-extendable range, there is no encoding to keep length identical).
- Constant folded away
If A makes a branch statically dead (DCE removes the whole block) and B would keep it alive, the “missing” code isn’t present to re-enable with an operand tweak.