@peter-reinholdt
Contributor

Some fixes for threading:

Previously, the chunk boundary was computed as:

    ceil(sqrt(thread_idx / num_threads) * sideLength);

Because thread_idx / num_threads is an integer division, the quotient, and with it the whole expression, always evaluated to 0 for thread_idx < num_threads.
I also removed the sqrt: as far as I can see, end_chunk_idx is only used where the work is "linear", not triangular (quadratically growing), so the square-root scaling is not needed there.
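
To make the integer-division problem concrete, here is a minimal sketch that contrasts the old expression with one possible linear boundary. The function names, the size_t types, and the ceiling-division formula are assumptions chosen for illustration; they are not taken from the pyci source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Old form (as quoted above): thread_idx / num_threads is evaluated in
// integer arithmetic, so the quotient is 0 whenever thread_idx < num_threads.
inline std::size_t old_chunk_boundary(std::size_t thread_idx,
                                      std::size_t num_threads,
                                      std::size_t side_length) {
    assert(num_threads > 0);
    const double ratio = static_cast<double>(thread_idx / num_threads);  // integer division first
    return static_cast<std::size_t>(std::ceil(std::sqrt(ratio) * side_length));
}

// Hypothetical linear replacement: ceil(thread_idx * side_length / num_threads),
// computed entirely in integers so no truncation to 0 can occur.
inline std::size_t linear_chunk_boundary(std::size_t thread_idx,
                                         std::size_t num_threads,
                                         std::size_t side_length) {
    assert(num_threads > 0);
    return (thread_idx * side_length + num_threads - 1) / num_threads;
}
```

With 4 threads and sideLength = 100, the old form gives 0 for thread indices 1 to 3, while the linear form gives 25, 50, and 75.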

Additionally, I encountered segfaults on large problem sizes (many determinants), which I eventually traced to a heap-use-after-free at pyci/src/hci.cpp:277.
I am not entirely sure why this happens; possibly the order in which the threads finish is not guaranteed to match the order of v_wfns. The proposed change first joins all threads and only then adds the determinants:

    for (auto &thread : v_threads) thread.join();
    for (auto &wf : v_wfns) wfn.add_dets_from_wfn(wf);
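
The join-everything-then-merge pattern can be sketched in isolation as follows. This is a simplified stand-in, not the pyci code: fill_chunk, run_and_merge, and the use of std::vector<int> in place of wavefunction objects are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-thread work: each worker writes only to its own buffer.
inline void fill_chunk(std::vector<int> &out, int begin, int end) {
    for (int i = begin; i < end; ++i) out.push_back(i * i);
}

inline std::vector<int> run_and_merge(int n, int num_threads) {
    assert(num_threads > 0);
    // Sized up front, so no buffer is moved or freed while workers run.
    std::vector<std::vector<int>> results(static_cast<std::size_t>(num_threads));
    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(num_threads));
    for (int t = 0; t < num_threads; ++t) {
        const int begin = static_cast<int>(static_cast<long long>(n) * t / num_threads);
        const int end = static_cast<int>(static_cast<long long>(n) * (t + 1) / num_threads);
        threads.emplace_back(fill_chunk, std::ref(results[static_cast<std::size_t>(t)]), begin, end);
    }
    // Step 1: wait for every worker before touching any result.
    for (auto &th : threads) th.join();
    // Step 2: merge on the calling thread, in a fixed order.
    std::vector<int> merged;
    for (const auto &r : results) merged.insert(merged.end(), r.begin(), r.end());
    return merged;
}
```

Because no result is read until every thread has been joined, the merge step never overlaps with a worker that is still running, whatever order the workers finish in.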

@msricher
Collaborator

Thank you!! I'll merge this soon.

@msricher msricher merged commit bb546bf into theochem:master Aug 18, 2025
1 check passed