
Introduce an IPC semaphore mechanism which is robust to process deaths #26463

@mlugg

This is a proposal to add a new API to Serenity's libc implementation. I spoke about this with @linusg, who said it would be worth opening an issue here to discuss the use case.

Background

I'm part of the Zig project, and have been working on a protocol intended to improve upon the GNU Jobserver protocol. For anyone unfamiliar, the idea of a job server is to limit the amount of CPU-intensive work being done simultaneously by a process tree (e.g. a bunch of compilers invoked by make), which prevents excessive context switching and heavy scheduler load; it's basically a thread pool, but across different processes and programs. GNU designed a protocol for this, but it has a problem: if a process dies while holding a "token" (permission to do work), that token effectively leaks. In the worst case this causes a deadlock: once every token has been leaked by a process that died, no token will ever be returned to the pool. I'll call the guarantee that prevents this "robustness" (the term is loosely borrowed from POSIX; see pthread_mutexattr_setrobust).
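To make the leak concrete, here is a minimal sketch of a simplified pipe-based jobserver in the GNU style, assuming each token is a single byte in a pipe shared by the process tree (acquire = read one byte, release = write it back). The helper names `pool_fill`, `pool_count` and `tokens_after_crash` are hypothetical and exist only for this illustration. A child acquires a token and exits without returning it, and the pool is permanently one token short.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Put `n` one-byte tokens into the pool. Returns 0 on success. */
static int pool_fill(int wfd, int n) {
    for (int i = 0; i < n; i++)
        if (write(wfd, "+", 1) != 1)
            return -1;
    return 0;
}

/* Count the tokens currently in the pool without blocking,
 * then write them all back so the pool is left unchanged. */
static int pool_count(int rfd, int wfd) {
    char c;
    int n = 0;
    int flags = fcntl(rfd, F_GETFL);
    fcntl(rfd, F_SETFL, flags | O_NONBLOCK);
    while (read(rfd, &c, 1) == 1) /* stops with EAGAIN once the pipe is empty */
        n++;
    fcntl(rfd, F_SETFL, flags);
    pool_fill(wfd, n);
    return n;
}

/* Start with `total` tokens; a child acquires one and "crashes" by exiting
 * without writing it back. Returns the number of tokens left afterwards. */
static int tokens_after_crash(int total) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pool_fill(fds[1], total);

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        char tok;
        if (read(fds[0], &tok, 1) == 1) /* acquire a token... */
            _exit(1);                   /* ...and die while holding it */
        _exit(2);
    }
    waitpid(pid, NULL, 0);

    int left = pool_count(fds[0], fds[1]);
    close(fds[0]);
    close(fds[1]);
    return left; /* total - 1: nothing will ever return the dead child's token */
}
```

Nothing in the pipe records who holds which token, so the kernel has no way to give the byte back when the child dies.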

The protocol I've been working on (here) solves this problem by using primitives that the OS cleans up on process death. On most POSIX systems, the current implementation is based on System V semaphores: an old primitive that is largely considered deprecated, but that provides the "robustness" guarantee we need through the SEM_UNDO flag, which instructs the kernel to revert an operation on the semaphore once the process that performed it dies.
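For comparison, a minimal sketch of the System V approach, assuming a system with SysV IPC available (e.g. Linux). A child takes a token with `semop` and exits without releasing it; with `SEM_UNDO` the kernel reverts the decrement, without it the token leaks just as in the pipe model. The function name `value_after_crash` and the union name `semun_arg` are illustrative; POSIX requires callers to define the `semctl` argument union themselves.

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Caller-defined argument union for semctl(SETVAL), as POSIX requires. */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Returns the semaphore value after a child takes one token and exits
 * without releasing it, with or without SEM_UNDO. */
static int value_after_crash(int initial, int use_undo) {
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    assert(semid != -1);

    union semun_arg arg = { .val = initial };
    assert(semctl(semid, 0, SETVAL, arg) == 0);

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        /* Acquire a token: decrement by 1, optionally asking the kernel
         * to undo the decrement when this process exits. */
        struct sembuf op = { .sem_num = 0, .sem_op = -1,
                             .sem_flg = use_undo ? SEM_UNDO : 0 };
        _exit(semop(semid, &op, 1) == 0 ? 0 : 1); /* "crash" holding it */
    }
    int status;
    waitpid(pid, &status, 0);
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);

    int value = semctl(semid, 0, GETVAL);
    semctl(semid, 0, IPC_RMID); /* remove the semaphore set */
    return value;
}
```

With `use_undo` set, the value comes back to `initial`; without it, it stays one lower.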

SerenityOS doesn't support this primitive. To be clear, that's reasonable: System V IPC is seldom used nowadays, and it's a pretty weird API. As such, I'm not suggesting implementing it. However, it would be good if Serenity had an answer to this use case.

Proposed API

I think the most obvious way of doing this would be to extend the POSIX semaphore API with something akin to the System V SEM_UNDO flag I mentioned: a way to increment/decrement a semaphore such that the operation is undone by the system when the process exits. One could imagine these new functions:

#include <semaphore.h>
#include <time.h>

/* Like `sem_post`, but the operation is reverted by the system when this process exits. */
int sem_post_robust(sem_t *);
/* Like `sem_wait`, but the operation is reverted by the system when this process exits. */
int sem_wait_robust(sem_t *);
/* Like `sem_trywait` and `sem_timedwait`, with the same revert-on-exit behavior. */
int sem_trywait_robust(sem_t *);
int sem_timedwait_robust(sem_t *, const struct timespec *abstime);

The implementation could mirror how other systems handle SEM_UNDO: each of these functions makes a system call that adjusts both the semaphore value and a per-semaphore, per-process "adjustment" integer. When the process exits, each of its adjustment integers is added back to the corresponding semaphore value. This would satisfy the use case.
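A toy, single-threaded model of that bookkeeping may help. This is a sketch under stated assumptions, not Serenity kernel code: `struct ksem`, `struct proc`, `sys_trywait_robust`, `sys_post_robust` and `proc_exit` are all hypothetical names standing in for kernel state and syscalls. Each robust operation updates the semaphore and the caller's adjustment entry together, and exit adds the adjustments back.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UNDO 8

struct ksem { int value; };                 /* a (hypothetical) kernel semaphore */
struct undo { struct ksem *sem; int adj; }; /* per-semaphore adjustment */
struct proc { struct undo undo[MAX_UNDO]; size_t n_undo; };

/* Find, or create, this process's adjustment entry for `sem`. */
static struct undo *undo_entry(struct proc *p, struct ksem *sem) {
    for (size_t i = 0; i < p->n_undo; i++)
        if (p->undo[i].sem == sem)
            return &p->undo[i];
    assert(p->n_undo < MAX_UNDO);
    p->undo[p->n_undo] = (struct undo){ .sem = sem, .adj = 0 };
    return &p->undo[p->n_undo++];
}

/* Model of sem_trywait_robust: take a token, remember to add it back on exit. */
static int sys_trywait_robust(struct proc *p, struct ksem *sem) {
    if (sem->value == 0)
        return -1; /* would block; a real call would fail with EAGAIN */
    sem->value -= 1;
    undo_entry(p, sem)->adj += 1;
    return 0;
}

/* Model of sem_post_robust: return a token, remember to take it back on exit. */
static int sys_post_robust(struct proc *p, struct ksem *sem) {
    sem->value += 1;
    undo_entry(p, sem)->adj -= 1;
    return 0;
}

/* On exit, apply each adjustment to its semaphore.
 * This sketch clamps at zero so a value can never go negative. */
static void proc_exit(struct proc *p) {
    for (size_t i = 0; i < p->n_undo; i++) {
        int v = p->undo[i].sem->value + p->undo[i].adj;
        p->undo[i].sem->value = v < 0 ? 0 : v;
    }
    p->n_undo = 0;
}
```

A process that waits and then posts ends with a net adjustment of zero, so a clean exit changes nothing; only a process that dies between the two leaves a nonzero adjustment for the kernel to apply.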

If you want to play on hard mode, you could implement a variant that does not perform a system call in the uncontended case. In short, the process tells the kernel the address in memory of an "adjustment" integer and the semaphore it applies to; from then on, userland updates that integer directly, and the kernel applies the adjustment from its value on process exit. There's a complication, though: the process could receive a signal between updating the actual semaphore and updating its adjustment integer. This race is what makes it kind of difficult to do without a system call: you'd need either to prevent the process from being interrupted between those two operations, or to detect that condition in the kernel (e.g. use inline assembly in userland, and have the kernel check whether the program counter lies at a particular instruction within it) and account for it. To be clear, though, this isn't necessary: every other system currently requires a syscall to acquire and release "job tokens" in the (not-yet-finalized) protocol, so Serenity wouldn't be abnormally inefficient for doing the same!
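The race can be shown with a small simulation, assuming a hypothetical setup where the kernel already knows the addresses of a shared semaphore word and a per-process adjustment integer (the registration step is not shown, and `fast_trywait_robust`, `kernel_on_exit` and `value_after_exit` are illustrative names). The `die_in_window` flag stands in for a fatal signal arriving between the two updates; this sketch does not attempt to close the window.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Shared semaphore word, and this process's adjustment integer that the
 * kernel (hypothetically) reads on exit. */
static atomic_int sem_value;
static int adjustment;

/* Userland fast path with no syscall. */
static bool fast_trywait_robust(bool die_in_window) {
    int v = atomic_load(&sem_value);
    while (v > 0) {
        /* Step 1: take a token from the shared semaphore. */
        if (atomic_compare_exchange_weak(&sem_value, &v, v - 1)) {
            if (die_in_window)
                return true; /* killed here: step 2 never runs */
            /* Step 2: record that exit must give the token back. */
            adjustment += 1;
            return true;
        }
        /* CAS failed: `v` now holds the current value, so retry. */
    }
    return false; /* no token; a real version would block via a syscall */
}

/* What the kernel would do when the process exits. */
static void kernel_on_exit(void) {
    atomic_fetch_add(&sem_value, adjustment);
    adjustment = 0;
}

/* Acquire one token, "die", and return the semaphore value after cleanup. */
static int value_after_exit(int initial, bool die_in_window) {
    atomic_store(&sem_value, initial);
    adjustment = 0;
    bool got = fast_trywait_robust(die_in_window);
    assert(got);
    kernel_on_exit();
    return atomic_load(&sem_value);
}
```

If the process dies after step 2, cleanup restores the token; if it dies between steps 1 and 2, the adjustment is still zero and the token leaks, which is exactly the case the kernel would need to detect.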

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
