[nvidia_unstable-10.1] Handle KVM_CAP_ARM_RME ABI change between kernel versions by ianm-nv · Pull Request #6 · NVIDIA/QEMU

ianm-nv · 2025-12-05T16:13:05Z

The KVM_CAP_ARM_RME capability number changed between Linux kernel versions 6.16 and 6.17 (from 243 to 244). This change adds runtime detection of the host kernel version to ensure QEMU remains compatible with both kernel versions during the transition period.

The implementation:

Defaults to KVM_CAP_ARM_RME (244) from the updated headers
Detects kernel 6.16 at runtime and uses the previous value (243)
Applies the correct capability number to all RME-related ioctls

This allows the same QEMU binary to work correctly on hosts running either kernel version.

Testing was successful on CCA hardware with both the 6.16 and 6.17 kernel.

The KVM_CAP_ARM_RME capability number changed between Linux kernel versions 6.16 and 6.17 (from 243 to 244). This change adds runtime detection of the host kernel version to ensure QEMU remains compatible with both kernel versions during the transition period. The implementation: - Defaults to KVM_CAP_ARM_RME (244) from the updated headers - Detects kernel 6.16 at runtime and uses the previous value (243) - Applies the correct capability number to all RME-related ioctls This allows the same QEMU binary to work correctly on hosts running either kernel version. Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Ian May <ianm@nvidia.com>

nvmochs · 2025-12-05T17:18:10Z

I confirmed all the kvm_vm_enable_cap using KVM_CAP_ARM_RME have been converted to use the variable instead. The logic for kernel detection looks good.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

NathanChenNVIDIA

Acked-by: Nathan Chen nathanc@nvidia.com

nvmochs · 2025-12-05T18:11:16Z

Merged, closing PR.

Commit e27194e087ae ("virtio-gpu-virgl: correct parent for blob memory region") made the name member of MemoryRegion unset, causing a NULL pointer dereference[1]: > Thread 2 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. > (gdb) bt > #0 0x00007ffff56565e2 in __strcmp_evex () at /lib64/libc.so.6 > #1 0x0000555555841bdb in find_fd (head=0x5555572337d0 <cpr_state>, > name=0x0, id=0) at ../migration/cpr.c:68 > #2 cpr_delete_fd (name=name@entry=0x0, id=id@entry=0) at > ../migration/cpr.c:77 > #3 0x000055555582290a in qemu_ram_free (block=0x7ff7e93aa7f0) at > ../system/physmem.c:2615 > #4 0x000055555581ae02 in memory_region_finalize (obj=<optimized out>) > at ../system/memory.c:1816 > #5 0x0000555555a70ab9 in object_deinit (obj=<optimized out>, > type=<optimized out>) at ../qom/object.c:715 > #6 object_finalize (data=0x7ff7e936eff0) at ../qom/object.c:729 > #7 object_unref (objptr=0x7ff7e936eff0) at ../qom/object.c:1232 > #8 0x0000555555814fae in memory_region_unref (mr=<optimized out>) at > ../system/memory.c:1848 > #9 flatview_destroy (view=0x555559ed6c40) at ../system/memory.c:301 > #10 0x0000555555bfc122 in call_rcu_thread (opaque=<optimized out>) at > ../util/rcu.c:324 > #11 0x0000555555bf17a7 in qemu_thread_start (args=0x555557b99520) at > ../util/qemu-thread-posix.c:393 > #12 0x00007ffff556f464 in start_thread () at /lib64/libc.so.6 > #13 0x00007ffff55f25ac in __clone3 () at /lib64/libc.so.6 The intention of the aforementioned commit is to prevent a MemoryRegion from parenting itself while its references is counted indendependently of the device. To achieve the same goal, add a type of QOM objects that count references and parent MemoryRegions. [1] https://lore.kernel.org/qemu-devel/4eb93d7a-1fa9-4b3c-8ad7-a2eb64f025a0@collabora.com/ Cc: qemu-stable@nongnu.org Fixes: e27194e087ae ("virtio-gpu-virgl: correct parent for blob memory region") Fixes: be88ad4 ("virtio-gpu-virgl: correct parent for blob memory region") for 10.2.x Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Tested-by: Joelle van Dyne <j@getutm.app> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20260214-region-v1-1-229f00ae1f38@rsg.ci.i.u-tokyo.ac.jp> (cherry picked from commit b2a279094c3b86667969cc645f7fb1087e08dd19) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

The test case in the ppe42 functional test triggers a TCG debug assertion, which causes the test to fail in an --enable-debug build or when the sanitizers are enabled: #6 0x00007ffff4a3b517 in __assert_fail (assertion=0x5555562e7589 "!temp_readonly(ots)", file=0x5555562e5b23 "../../tcg/tcg.c", line=4928, function=0x5555562e8900 <__PRETTY_FUNCTION__.23> "tcg_reg_alloc_mov") at ./assert/assert.c:105 #7 0x0000555555cc2189 in tcg_reg_alloc_mov (s=0x7fff60000b70, op=0x7fff600126f8) at ../../tcg/tcg.c:4928 #8 0x0000555555cc74e0 in tcg_gen_code (s=0x7fff60000b70, tb=0x7fffa802f540, pc_start=4294446080) at ../../tcg/tcg.c:6667 #9 0x0000555555d02abe in setjmp_gen_code (env=0x555556cbe610, tb=0x7fffa802f540, pc=4294446080, host_pc=0x7fffeea00c00, max_insns=0x7fffee9f9d74, ti=0x7fffee9f9d90) at ../../accel/tcg/translate-all.c:257 #10 0x0000555555d02d75 in tb_gen_code (cpu=0x555556cba590, s=...) at ../../accel/tcg/translate-all.c:325 #11 0x0000555555cf5922 in cpu_exec_loop (cpu=0x555556cba590, sc=0x7fffee9f9ee0) at ../../accel/tcg/cpu-exec.c:970 #12 0x0000555555cf5aae in cpu_exec_setjmp (cpu=0x555556cba590, sc=0x7fffee9f9ee0) at ../../accel/tcg/cpu-exec.c:1016 #13 0x0000555555cf5b4b in cpu_exec (cpu=0x555556cba590) at ../../accel/tcg/cpu-exec.c:1042 #14 0x0000555555d1e7ab in tcg_cpu_exec (cpu=0x555556cba590) at ../../accel/tcg/tcg-accel-ops.c:82 #15 0x0000555555d1ff97 in rr_cpu_thread_fn (arg=0x555556cba590) at ../../accel/tcg/tcg-accel-ops-rr.c:285 #16 0x00005555561586c9 in qemu_thread_start (args=0x555556ee3c90) at ../../util/qemu-thread-posix.c:393 #17 0x00007ffff4a9caa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447 #18 0x00007ffff4b29c6c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 This can be reproduced "by hand": ./build/clang/qemu-system-ppc -display none -vga none \ -machine ppe42_machine -serial stdio \ -device loader,file=$HOME/.cache/qemu/download/03c1ac0fb7f6c025102a02776a93b35101dae7c14b75e4eab36a337e39042ea8 \ -device loader,addr=0xfff80040,cpu-num=0 (assuming you have the image file from the functional test in your local cache). This happens for this input: IN: 0xfff80c00: 07436004 .byte 0x07, 0x43, 0x60, 0x04 which generates (among other things): not_i32 $0x80000,$0x80000 which the TCG optimization pass turns into: mov_i32 $0x80000,$0xfff7ffff dead: 1 pref=0xffff and where we then assert because we tried to write to a constant. This happens for the CLRBWIBC instruction which ends up in do_mask_branch() with rb_is_gpr false and invert true. In this case we will generate code that sets mask to a tcg_constant_tl() but then uses it as the LHS in tcg_gen_not_tl(). Fix the assertion by doing the invert in the translate time C code for the "mask is constant" case. Cc: qemu-stable@nongnu.org Fixes: f7ec91c ("target/ppc: Add IBM PPE42 special instructions") Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Glenn Miles <milesg@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/qemu-devel/20260212150753.1749448-1-peter.maydell@linaro.org Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com> (cherry picked from commit 78c6b6010ce7cfa54874dda514e694640b76f1e4) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

Details: https://gitlab.com/qemu-project/qemu/-/issues/3144 The function schedule_next_request is called with tg->lock held and it may call throttle_group_co_restart_queue, which takes tgm->throttled_reqs_lock, qemu_co_mutex_lock may leave current coroutine if other iothread has taken the lock. If the next coroutine will call throttle_group_co_io_limits_intercept - it will try to take the mutex tg->lock which will never be released. Here is the backtrace of the iothread: Thread 30 (Thread 0x7f8aad1fd6c0 (LWP 24240) "IO iothread2"): #0 futex_wait (futex_word=0x5611adb7d828, expected=2, private=0) at ../sysdeps/nptl/futex-internal.h:146 #1 __GI___lll_lock_wait (futex=futex@entry=0x5611adb7d828, private=0) at lowlevellock.c:49 #2 0x00007f8ab5a97501 in lll_mutex_lock_optimized (mutex=0x5611adb7d828) at pthread_mutex_lock.c:48 #3 ___pthread_mutex_lock (mutex=0x5611adb7d828) at pthread_mutex_lock.c:93 #4 0x00005611823f5482 in qemu_mutex_lock_impl (mutex=0x5611adb7d828, file=0x56118289daca "../block/throttle-groups.c", line=372) at ../util/qemu-thread-posix.c:94 #5 0x00005611822b0b39 in throttle_group_co_io_limits_intercept (tgm=0x5611af1bb4d8, bytes=4096, direction=THROTTLE_READ) at ../block/throttle-groups.c:372 #6 0x00005611822473b1 in blk_co_do_preadv_part (blk=0x5611af1bb490, offset=15972311040, bytes=4096, qiov=0x7f8aa4000f98, qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at ../block/block-backend.c:1354 #7 0x0000561182247fa0 in blk_aio_read_entry (opaque=0x7f8aa4005910) at ../block/block-backend.c:1619 #8 0x000056118241952e in coroutine_trampoline (i0=-1543497424, i1=32650) at ../util/coroutine-ucontext.c:175 #9 0x00007f8ab5a56f70 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:66 from target:/lib64/libc.so.6 #10 0x00007f8aad1ef190 in ?? () #11 0x0000000000000000 in ?? () The lock is taken in line 386: (gdb) p tg.lock $1 = {lock = {__data = {__lock = 2, __count = 0, __owner = 24240, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\260^\000\000\001", '\000' <repeats 26 times>, __align = 2}, file = 0x56118289daca "../block/throttle-groups.c", line = 386, initialized = true} The solution is to use tg->lock to protect both ThreadGroup fields and ThrottleGroupMember.throttled_reqs. It doesn't seem to be possible to use separate locks because we need to first manipulate ThrottleGroup fields, then schedule next coroutine using throttled_reqs and after than update token field from ThrottleGroup depending on the throttled_reqs state. Signed-off-by: Dmitry Guryanov <dmitry.guryanov@gmail.com> Message-ID: <20251208085528.890098-1-dmitry.guryanov@gmail.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> (cherry picked from commit d4816177654d59e26ce212c436513f01842eb410) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

The current implementation of qdev_get_printable_name() sometimes returns a string that must not be freed (vdev->id or the fixed fallback string "<unknown device>" and sometimes returns a string that must be freed (the return value of qdev_get_dev_path()). This forces callers to leak the string in the "must be freed" case. Make the function consistent that it always returns a string that the caller must free, and make the three callsites free it. This fixes leaks like this that show up when running "make check" with the address sanitizer enabled: Direct leak of 13 byte(s) in 1 object(s) allocated from: #0 0x5561de21f293 in malloc (/home/pm215/qemu/build/san/qemu-system-i386+0x1a2d293) (BuildId: 6d6fad7130fd5c8dbbc03401df554f68b8034936) #1 0x767ad7a82ac9 in g_malloc (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x62ac9) (BuildId: 116e142b9b52c8a4dfd403e759e71ab8f95d8bb3) #2 0x5561deaf34f2 in pcibus_get_dev_path /home/pm215/qemu/build/san/../../hw/pci/pci.c:2792:12 #3 0x5561df9d8830 in qdev_get_printable_name /home/pm215/qemu/build/san/../../hw/core/qdev.c:431:24 #4 0x5561deebdca2 in virtio_init_region_cache /home/pm215/qemu/build/san/../../hw/virtio/virtio.c:298:17 #5 0x5561df05f842 in memory_region_write_accessor /home/pm215/qemu/build/san/../../system/memory.c:491:5 #6 0x5561df05ed1b in access_with_adjusted_size /home/pm215/qemu/build/san/../../system/memory.c:567:18 #7 0x5561df05e3fa in memory_region_dispatch_write /home/pm215/qemu/build/san/../../system/memory.c #8 0x5561df0aa805 in address_space_stm_internal /home/pm215/qemu/build/san/../../system/memory_ldst.c.inc:85:13 #9 0x5561df0bcad3 in qtest_process_command /home/pm215/qemu/build/san/../../system/qtest.c:480:13 Cc: qemu-stable@nongnu.org Fixes: e209d4d ("virtio: improve virtqueue mapping error messages") Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20260307155046.3940197-3-peter.maydell@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> (cherry picked from commit 1e3e1d51e20e8b38efa089bf54b5ee2cbbcca221) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

NathanChenNVIDIA approved these changes Dec 5, 2025

View reviewed changes

nvmochs closed this Dec 5, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[nvidia_unstable-10.1] Handle KVM_CAP_ARM_RME ABI change between kernel versions#6

[nvidia_unstable-10.1] Handle KVM_CAP_ARM_RME ABI change between kernel versions#6
ianm-nv wants to merge 1 commit intoNVIDIA:nvidia_unstable-10.1from
ianm-nv:nvidia_unstable-10.1+abi-compat

ianm-nv commented Dec 5, 2025

Uh oh!

nvmochs commented Dec 5, 2025

Uh oh!

NathanChenNVIDIA left a comment

Uh oh!

nvmochs commented Dec 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ianm-nv commented Dec 5, 2025

Uh oh!

nvmochs commented Dec 5, 2025

Uh oh!

NathanChenNVIDIA left a comment

Choose a reason for hiding this comment

Uh oh!

nvmochs commented Dec 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants