fast-get: fix partition leak for multi-thread usage#10606
fast-get: fix partition leak for multi-thread usage#10606tmleman wants to merge 1 commit intothesofproject:mainfrom
Conversation
Each thread that calls fast_get() on a large buffer gets a partition added to its memory domain. Previously, fast_put() only removed the partition from the allocating thread's domain (entry->thread), leaking partitions for any additional threads that were granted access. Fix by having each thread remove its OWN partition on fast_put() using k_current_get() instead of entry->thread. The partition removal now happens unconditionally (not just on last reference), ensuring each thread cleans up after itself. Order of operations: 1. Free buffer if last reference (while partition still grants access) 2. Remove current thread's partition (prevents leaks) 3. Clear entry if last reference Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
There was a problem hiding this comment.
Pull request overview
This PR aims to fix a userspace memory-domain partition leak when multiple threads call fast_get() on large buffers by ensuring partitions are removed on fast_put() for all participating threads/domains.
Changes:
- Adjust
fast_put()to free the SRAM buffer when the last reference is released. - Change partition removal logic to remove a large-buffer partition from the current thread’s memory domain (instead of
entry->thread), and do so outside the “last ref” block.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #if CONFIG_USERSPACE | ||
| if (entry->size > FAST_GET_MAX_COPY_SIZE) { | ||
| struct k_mem_partition part = { | ||
| .start = (uintptr_t)entry->sram_ptr, | ||
| .size = ALIGN_UP(entry->size, CONFIG_MM_DRV_PAGE_SIZE), | ||
| .attr = K_MEM_PARTITION_P_RO_U_RO | XTENSA_MMU_CACHED_WB, | ||
| }; | ||
| struct k_mem_domain *domain = k_current_get()->mem_domain_info.mem_domain; | ||
|
|
||
| LOG_DBG("removing partition %p size %#zx from thread %p", | ||
| (void *)part.start, part.size, k_current_get()); | ||
| int err = k_mem_domain_remove_partition(domain, &part); | ||
|
|
||
| if (err) | ||
| LOG_WRN("partition removal failed: %d", err); | ||
| } |
There was a problem hiding this comment.
Partition removal is now executed on every fast_put() for large buffers, even when entry->refcount remains > 0. This breaks fast_get()/fast_put() reference counting: a thread can call fast_get() multiple times and still expect access after only some puts, but the first put would revoke access by removing the partition while the buffer is still in use. Partition removal needs to be tied to the last reference for the relevant owner (e.g., per-thread/per-domain refcount), not performed unconditionally on each put.
| }; | ||
| struct k_mem_domain *domain = k_current_get()->mem_domain_info.mem_domain; | ||
|
|
||
| LOG_DBG("removing partition %p size %#zx from thread %p", | ||
| (void *)part.start, part.size, k_current_get()); | ||
| int err = k_mem_domain_remove_partition(domain, &part); | ||
|
|
There was a problem hiding this comment.
k_mem_domain_remove_partition() operates on a memory domain, which can be shared by multiple threads. Removing the partition from k_current_get()->mem_domain_info.mem_domain can therefore revoke access for other threads in the same domain that still hold references (entry->refcount > 0). To avoid use-after-revoke faults, this code should track granted partitions per-domain (and only remove when the last user in that domain releases), or otherwise enforce/document that each fast_get() user has a unique mem_domain.
| #if CONFIG_USERSPACE | ||
| if (entry->size > FAST_GET_MAX_COPY_SIZE) { | ||
| struct k_mem_partition part = { | ||
| .start = (uintptr_t)entry->sram_ptr, | ||
| .size = ALIGN_UP(entry->size, CONFIG_MM_DRV_PAGE_SIZE), | ||
| .attr = K_MEM_PARTITION_P_RO_U_RO | XTENSA_MMU_CACHED_WB, | ||
| }; | ||
| struct k_mem_domain *domain = k_current_get()->mem_domain_info.mem_domain; | ||
|
|
||
| LOG_DBG("removing partition %p size %#zx from thread %p", | ||
| (void *)part.start, part.size, k_current_get()); | ||
| int err = k_mem_domain_remove_partition(domain, &part); | ||
|
|
||
| if (err) | ||
| LOG_WRN("partition removal failed: %d", err); | ||
| } |
There was a problem hiding this comment.
This change targets CONFIG_USERSPACE behavior (partition add/remove), but existing fast_get/fast_put unit tests don’t appear to exercise userspace + large-buffer paths (e.g., multi-thread, partition add/remove, and the intended leak prevention). Consider adding a Zephyr ztest that enables CONFIG_USERSPACE and validates that: (1) multiple gets/puts from the same thread keep access until the final put, and (2) multiple threads (potentially in different domains) don’t leak partitions after puts.
Each thread that calls fast_get() on a large buffer gets a partition added to its memory domain. Previously, fast_put() only removed the partition from the allocating thread's domain (entry->thread), leaking partitions for any additional threads that were granted access.
Fix by having each thread remove its OWN partition on fast_put() using k_current_get() instead of entry->thread. The partition removal now happens unconditionally (not just on last reference), ensuring each thread cleans up after itself.
Order of operations: