Rewrote parTraverseN and parTraverseN_ for better performance #4451

Open

djspiewak wants to merge 20 commits into typelevel:series/3.6.x from djspiewak:bug/parTraverseNPerf

Conversation

@djspiewak
Member

This shifts to a fully bespoke implementation of parTraverseN and such. There are a few things left to clean up, such as a few more tests and running some comparative benchmarks, but early results are very promising. In particular, the failure case from #4434 appears to be around two to three orders of magnitude faster with this implementation (which makes sense, since it handles early abort correctly). Kudos to @SystemFw for the core idea which makes this possible.

One of the things I'm doing here is giving up entirely on universal fairness and merely focusing on in-batch fairness. A simpler way of saying this is that we are hardened against head of line blocking, both for actions and cancelation.
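To give a rough sense of the core idea, here is a minimal sketch (hypothetical names, layered on top of plain parTraverse rather than the bespoke machinery in this PR): each element races against a shared "preempt" signal, so a failure anywhere in the batch aborts elements that are still queued behind the semaphore instead of letting them run to completion.

import cats.effect.{Deferred, IO}
import cats.effect.std.Semaphore
import cats.syntax.all._

// Minimal sketch only (hypothetical names, not this PR's implementation):
// each element waits for a permit, but also races against a shared
// "preempt" signal, so a failure anywhere in the batch aborts fibers
// that are still queued behind the semaphore.
def parTraverseNSketch[A, B](n: Int)(as: List[A])(f: A => IO[B]): IO[List[B]] =
  for {
    sem     <- Semaphore[IO](n.toLong)
    preempt <- Deferred[IO, Throwable]
    results <- as.parTraverse { a =>
                 val task = sem.permit
                   .surround(f(a))
                   .onError { case e => preempt.complete(e).void }

                 IO.race(preempt.get, task).flatMap {
                   case Left(e)  => IO.raiseError(e) // some other element failed: abort early
                   case Right(b) => IO.pure(b)
                 }
               }
  } yield results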

Fixes #4434

@djspiewak
Member Author

There are pros and cons on performance, though I think it's possible to do better here. It's a little bit slower than the previous implementation in the happy path, but several orders of magnitude faster in the error path, so I'll call that a win.

Before

[info] Benchmark                             (cpuTokens)  (size)   Mode  Cnt    Score    Error  Units
[info] ParallelBenchmark.parTraverse               10000    1000  thrpt   10  292.924 ±  1.496  ops/s
[info] ParallelBenchmark.parTraverseN              10000    1000  thrpt   10  277.978 ±  1.280  ops/s
[info] ParallelBenchmark.parTraverseNCancel        10000    1000  thrpt   10    0.006 ±  0.001  ops/s
[info] ParallelBenchmark.traverse                  10000    1000  thrpt   10   48.015 ±  0.016  ops/s

After

[info] Benchmark                             (cpuTokens)  (size)   Mode  Cnt    Score   Error  Units
[info] ParallelBenchmark.parTraverse               10000    1000  thrpt   10  293.834 ± 1.152  ops/s
[info] ParallelBenchmark.parTraverseN              10000    1000  thrpt   10  233.868 ± 0.309  ops/s
[info] ParallelBenchmark.parTraverseNCancel        10000    1000  thrpt   10    7.859 ± 0.014  ops/s
[info] ParallelBenchmark.traverse                  10000    1000  thrpt   10   48.059 ± 0.014  ops/s
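(For anyone wanting to reproduce these numbers locally: they come from the ParallelBenchmark JMH suite. The module name and package in the command below are my assumption about the build layout, so adjust as needed.)

sbt "benchmarks/Jmh/run -f 1 -wi 10 -i 10 cats.effect.benchmarks.ParallelBenchmark"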

@djspiewak
Member Author

So I haven't golfed the failure down yet, but it really looks like we're hitting a bug in Scala.js, probably stemming from the "null safe" test. @durban you may be amused.

I think we could just remove the null safe test now since we're not using an ArrayBuffer internally, but it's kind of a neat surprise.

@durban
Contributor

durban commented Jul 23, 2025

Well, "amused" is one word for it :-) So it's not a bug in Scala.js, as in, it behaves as documented: dereferencing null is undefined behavior in Scala.js (LOL, what? Seriously.), so literally any behavior is "behaving as documented". Apparently scalaJSLinkerConfig could be configured to behave properly for nulls. But removing that very specific test is also fine I think.

@djspiewak
Member Author

Well that's fun. I actually thought we had some special checking for when the cur0 action became null in the runloop, but apparently not.

@durban
Contributor

durban commented Jul 23, 2025

@djspiewak
Member Author

Ahhhhhhh that makes sense. Okay, by that token, I think it's fair to say that a lot of our combinators just aren't null-safe and that's how it's going to be. :P

@durban
Contributor

durban commented Aug 9, 2025

It's annoying, because in Scala they are null-safe. (The test passed before; it just failed on JS.) We'd have to do something like this (everywhere) to make it work on JS:

def combinator(fa: F[A], ...) = {
  if (fa eq null) throw new NullPointerException
  // ... actual combinator logic follows
}

Which is (1) annoying, (2) very redundant, except on Scala.js, and (3) apparently has performance problems in Scala.js (or maybe that's only the linker setting?).

I don't propose we do this. There is a Scala.js linker setting which fixes the problem. In Scala and Scala Native it works by default.

@durban
Contributor

durban commented Aug 9, 2025

(Just some context about the null test you've removed: I added that on my old branch because I'd previously had a bug in my implementation where it didn't correctly handle f(a) being null. I don't remember exactly, but I think it just ignored it. So I added the test to make sure we see the NPE. As I've said, I think it's fine to remove it here.)

@mr-git
Contributor

mr-git commented Sep 29, 2025

Could this fix hit 3.6.4?

@djspiewak
Member Author

Could this fix hit 3.6.4?

There are a couple of failing tests related to early termination that I'm still trying to track down. I'm trying to find the spare time needed to push on it. Help definitely welcome! Otherwise I'll probably get to it within the next few weeks. Sorry :(

@domaspoliakas
Contributor

I see that the last CI run is green; do you mean that you want to reintroduce the tests removed in this commit? 599b790

@djspiewak djspiewak force-pushed the bug/parTraverseNPerf branch from fc113cb to 584ce3b on March 8, 2026 at 15:51
@djspiewak djspiewak marked this pull request as ready for review March 8, 2026 18:03
@djspiewak djspiewak added this to the v3.6.next milestone Mar 8, 2026
@djspiewak
Member Author

@durban Would you mind taking another look here? I think I got this all sorted out.

@durban
Contributor

durban commented Mar 9, 2026

Yes, I'll take another look (hopefully sometime this week).

Contributor

@durban durban left a comment

@djspiewak I've pushed 2 failing tests. The idea is the same for both: let every fiber start; then one of them self-cancels/fails; let awaitAll observe this through preempt. The problem is that awaitAll succeeds in this case, and thus the fibers will not be cancelled.
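Roughly the shape of the scenario (a hypothetical sketch, not the actual pushed tests): the limit equals the batch size so every fiber starts, the first fiber self-cancels once the second is running, and the question under test is whether parTraverseN then cancels the still-running second fiber.

import cats.effect.{Deferred, IO}
import cats.effect.implicits._

// Hypothetical scenario sketch (not the pushed tests).
val scenario: IO[Unit] =
  Deferred[IO, Unit].flatMap { running =>
    val selfCanceling = running.get *> IO.canceled        // self-cancels once the other fiber is running
    val neverending   = running.complete(()) *> IO.never.void // only terminates if parTraverseN cancels it
    List(selfCanceling, neverending).parTraverseN(2)(identity).void
  }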

(I've also added another comment.)

case Outcome.Errored(_) | Outcome.Canceled() => preempt.complete(None) *> cancelAll
}

work *> resurface
Contributor

Minor thing, but this means we can get cancelled when we're already done. If work completes successfully, i.e. we're literally done, there is no reason, really, to observe a cancellation. (It feels a little bit like the timeout changes.)

Member Author

This is actually also a correctness issue, because we could lose data. I'll correct it.

@durban
Contributor

durban commented Mar 11, 2026

Also: I thought the idea was that there is no need to release the semaphore in case of an error/cancel, because we're shutting down all the things anyway... This latest version seems to always release the semaphore.

(Also: sorry for pushing to your branch, I just didn't feel like going through the ceremony of opening a PR-for-the-PR. You can obviously roll back if I've messed up something.)

@djspiewak
Member Author

Also: I thought the idea was that there is no need to release the semaphore in case of an error/cancel, because we're shutting down all the things anyway... This latest version seems to always release the semaphore.

It was initially easier to structure it that way, but this is a good point. I should move this back to the happy path exclusively.
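Something along these lines is what moving it back to the happy path could look like (a hypothetical sketch with made-up names, not this PR's code):

import cats.effect.IO
import cats.effect.std.Semaphore

// Hypothetical sketch (made-up names, not this PR's code): hand the permit
// back only on the success path. If `task` errors or is cancelled, the whole
// batch is being torn down anyway, so the outstanding permit is never needed.
def gateHappyPathOnly[A](sem: Semaphore[IO])(task: IO[A]): IO[A] =
  sem.acquire *> task.flatMap(a => sem.release.as(a))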

(Also: sorry for pushing to your branch, I just didn't feel like going through the ceremony of opening a PR-for-the-PR. You can obviously roll back if I've messed up something.)

No worries at all! I think that's totally reasonable
