Rewrote parTraverseN and parTraverseN_ for better performance #4451

Open

djspiewak wants to merge 20 commits into typelevel:series/3.6.x from djspiewak:bug/parTraverseNPerf

Conversation

@djspiewak
Member

This shifts to a fully bespoke implementation of parTraverseN and such. There are a few things left to clean up, such as a few more tests and running some comparative benchmarks, but early results are very promising. In particular, the failure case from #4434 appears to be around two to three orders of magnitude faster with this implementation (which makes sense, since it handles early abort correctly). Kudos to @SystemFw for the core idea which makes this possible.

One of the things I'm doing here is giving up entirely on universal fairness and merely focusing on in-batch fairness. A simpler way of saying this is that we are hardened against head of line blocking, both for actions and cancelation.
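To give a rough sense of the core idea, here is a minimal sketch (hypothetical names, layered on top of plain parTraverse rather than the bespoke machinery in this PR): each element races against a shared "preempt" signal, so a failure anywhere in the batch aborts elements that are still queued behind the semaphore instead of letting them run to completion.

import cats.effect.{Deferred, IO}
import cats.effect.std.Semaphore
import cats.syntax.all._

// Minimal sketch only (hypothetical names, not this PR's implementation):
// each element waits for a permit, but also races against a shared
// "preempt" signal, so a failure anywhere in the batch aborts fibers
// that are still queued behind the semaphore.
def parTraverseNSketch[A, B](n: Int)(as: List[A])(f: A => IO[B]): IO[List[B]] =
  for {
    sem     <- Semaphore[IO](n.toLong)
    preempt <- Deferred[IO, Throwable]
    results <- as.parTraverse { a =>
                 val task = sem.permit
                   .surround(f(a))
                   .onError { case e => preempt.complete(e).void }

                 IO.race(preempt.get, task).flatMap {
                   case Left(e)  => IO.raiseError(e) // some other element failed: abort early
                   case Right(b) => IO.pure(b)
                 }
               }
  } yield results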

Fixes #4434

@djspiewak
Member Author

There are pros and cons on performance, though I think it's possible to do better here. It's a little bit slower than the previous implementation in the happy path, but several orders of magnitude faster in the error path, so I'll call that a win.

Before

[info] Benchmark                             (cpuTokens)  (size)   Mode  Cnt    Score    Error  Units
[info] ParallelBenchmark.parTraverse               10000    1000  thrpt   10  292.924 ±  1.496  ops/s
[info] ParallelBenchmark.parTraverseN              10000    1000  thrpt   10  277.978 ±  1.280  ops/s
[info] ParallelBenchmark.parTraverseNCancel        10000    1000  thrpt   10    0.006 ±  0.001  ops/s
[info] ParallelBenchmark.traverse                  10000    1000  thrpt   10   48.015 ±  0.016  ops/s

After

[info] Benchmark                             (cpuTokens)  (size)   Mode  Cnt    Score   Error  Units
[info] ParallelBenchmark.parTraverse               10000    1000  thrpt   10  293.834 ± 1.152  ops/s
[info] ParallelBenchmark.parTraverseN              10000    1000  thrpt   10  233.868 ± 0.309  ops/s
[info] ParallelBenchmark.parTraverseNCancel        10000    1000  thrpt   10    7.859 ± 0.014  ops/s
[info] ParallelBenchmark.traverse                  10000    1000  thrpt   10   48.059 ± 0.014  ops/s
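(For anyone wanting to reproduce these numbers locally: they come from the ParallelBenchmark JMH suite. The module name and package in the command below are my assumption about the build layout, so adjust as needed.)

sbt "benchmarks/Jmh/run -f 1 -wi 10 -i 10 cats.effect.benchmarks.ParallelBenchmark"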

@djspiewak
Member Author

So I haven't golfed the failure down yet, but it really looks like we're hitting a bug in Scala.js, probably stemming from the "null safe" test. @durban you may be amused.

I think we could just remove the null safe test now since we're not using an ArrayBuffer internally, but it's kind of a neat surprise.

@durban
Contributor

durban commented Jul 23, 2025

Well, "amused" is one word for it :-) So it's not a bug in Scala.js, as in, it behaves as documented: dereferencing null is undefined behavior in Scala.js (LOL, what? Seriously.), so literally any behavior is "behaving as documented". Apparently scalaJSLinkerConfig could be configured to behave properly for nulls. But removing that very specific test is also fine I think.

@djspiewak
Member Author

Well that's fun. I actually thought we had some special checking for when the cur0 action became null in the runloop, but apparently not.

@durban
Contributor

durban commented Jul 23, 2025

@djspiewak
Member Author

Ahhhhhhh that makes sense. Okay, by that token, I think it's fair to say that a lot of our combinators just aren't null-safe and that's how it's going to be. :P

@durban
Contributor

durban commented Aug 9, 2025

It's annoying, because in Scala they are null-safe. (The test passed before; it just failed on JS.) We'd have to do something like this (everywhere) to make it work on JS:

def combinator(fa: F[A], ...) = {
  if (fa eq null) throw new NullPointerException
  // ... actual combinator logic follows
}

Which is (1) annoying, (2) very redundant, except on Scala.js, and (3) apparently has performance problems in Scala.js (or maybe that's only the linker setting?).

I don't propose we do this. There is a Scala.js linker setting which fixes the problem. In Scala and Scala Native it works by default.

@durban
Contributor

durban commented Aug 9, 2025

(Just some context about the null test you've removed: I added that on my old branch because I'd previously had a bug in my implementation where it didn't correctly handle f(a) being null. I don't remember exactly, but I think it just ignored it. So I added the test to make sure we see the NPE. As I've said, I think it's fine to remove it here.)

@mr-git
Contributor

mr-git commented Sep 29, 2025

Could this fix hit 3.6.4?

@djspiewak
Member Author

Could this fix hit 3.6.4?

There are a couple of failing tests related to early termination that I'm still trying to track down. I'm trying to find the spare time needed to push on it. Help definitely welcome! Otherwise I'll probably get to it within the next few weeks. Sorry :(

@domaspoliakas
Contributor

I see that the last CI run is green; do you mean that you want to reintroduce the tests removed in this commit? 599b790

@djspiewak djspiewak force-pushed the bug/parTraverseNPerf branch from fc113cb to 584ce3b on March 8, 2026 at 15:51
@djspiewak djspiewak marked this pull request as ready for review March 8, 2026 18:03
@djspiewak djspiewak added this to the v3.6.next milestone Mar 8, 2026
@djspiewak
Member Author

@durban Would you mind taking another look here? I think I got this all sorted out.

@durban
Contributor

durban commented Mar 9, 2026

Yes, I'll take another look (hopefully sometime this week).

Contributor

@durban durban left a comment

@djspiewak I've pushed 2 failing tests. The idea is the same for both: let every fiber start; then one of them self-cancels/fails; let awaitAll observe this through preempt. The problem is that awaitAll succeeds in this case, and thus the fibers will not be cancelled.
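Roughly the shape of the scenario (a hypothetical sketch, not the actual pushed tests): the limit equals the batch size so every fiber starts, the first fiber self-cancels once the second is running, and the question under test is whether parTraverseN then cancels the still-running second fiber.

import cats.effect.{Deferred, IO}
import cats.effect.implicits._

// Hypothetical scenario sketch (not the pushed tests).
val scenario: IO[Unit] =
  Deferred[IO, Unit].flatMap { running =>
    val selfCanceling = running.get *> IO.canceled        // self-cancels once the other fiber is running
    val neverending   = running.complete(()) *> IO.never.void // only terminates if parTraverseN cancels it
    List(selfCanceling, neverending).parTraverseN(2)(identity).void
  }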

(I've also added another comment.)

case Outcome.Errored(_) | Outcome.Canceled() => preempt.complete(None) *> cancelAll
}

work *> resurface
Contributor

Minor thing, but this means we can get cancelled when we're already done. If work completes successfully, i.e. we're literally done, there is no reason, really, to observe a cancellation. (It feels a little bit like the timeout changes.)

Member Author

This is actually also a correctness issue, because we could lose data. I'll correct it.

@durban
Contributor

durban commented Mar 11, 2026

Also: I thought the idea was that there is no need to release the semaphore in case of an error/cancel, because we're shutting down all the things anyway... This latest version seems to always release the semaphore.

(Also: sorry for pushing to your branch, I just didn't feel like going through the ceremony of opening a PR-for-the-PR. You can obviously roll back if I've messed up something.)

@djspiewak
Member Author

Also: I thought the idea was that there is no need to release the semaphore in case of an error/cancel, because we're shutting down all the things anyway... This latest version seems to always release the semaphore.

It was initially easier to structure it that way, but this is a good point. I should move this back to the happy path exclusively.
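Something along these lines is what moving it back to the happy path could look like (a hypothetical sketch with made-up names, not this PR's code):

import cats.effect.IO
import cats.effect.std.Semaphore

// Hypothetical sketch (made-up names, not this PR's code): hand the permit
// back only on the success path. If `task` errors or is cancelled, the whole
// batch is being torn down anyway, so the outstanding permit is never needed.
def gateHappyPathOnly[A](sem: Semaphore[IO])(task: IO[A]): IO[A] =
  sem.acquire *> task.flatMap(a => sem.release.as(a))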

(Also: sorry for pushing to your branch, I just didn't feel like going through the ceremony of opening a PR-for-the-PR. You can obviously roll back if I've messed up something.)

No worries at all! I think that's totally reasonable
