[core] Queue::drop is acquiring the snatch guard as well
#8691
base: trunk
Conversation
```rust
    ) = queue_result;
    // Queue::drop is acquiring the snatch lock as well
    drop(snatch_guard);
} else {
    drop(snatch_guard);
```
question: It seems like the potential for `Queue::drop` getting called here is when:

- The user calls `Device::maintain` on this thread (let's call it thread 1) while holding a last strong ref. to the queue on another thread (let's call that thread 2).
- Execution on this thread (1) progresses until we have a strong ref. to `queue`, and follows the branch to call `queue.maintain(…)`.
- Thread (2) drops its strong ref., making the strong ref. here in thread 1 the last one.
Does that sound right? Are you using wgpu from multiple threads?
Correct - thread 1 is calling `Instance::poll_all(true)` in a loop, and other threads (2+) create/drop queues.
Could we add a comment (like Eric's) to say:
- Why this is valid
- Why holding it for longer would be a problem
This kind of thing is really hard to reason about and the effects are deeply non-local so retaining as much of the knowledge as we can about the decisions that were made is super important.
ErichDonGubler
left a comment
LGTM, minus some questions I'd like to fully understand before narrowing lock regions.
```rust
// Don't hold the locks while calling release_gpu_resources.
drop(fence);
drop(snatch_guard);
```
question: I think it's fine—but I'm not positive—to no longer be locking the snatch lock when we take the device lost closure. Tagging in @cwfitzgerald to double-check my conclusion here. Does that sound right, Connor?
I don't see any reason why this would be problematic. The snatch lock doesn't guard the device lost closure in any way.
Description
Fix a deadlock in `Device::maintain`.
Testing
Manually
Checklist
- Run `cargo fmt`.
- Run `taplo format`.
- Run `cargo clippy --tests`. If applicable, add: `--target wasm32-unknown-unknown`
- Run `cargo xtask test` to run tests.
- If this contains user-facing changes, add a `CHANGELOG.md` entry.