Fixes for LoongArch LSX + fast math by mr-c · Pull Request #1391 · simd-everywhere/simde

mr-c · 2026-02-22T09:06:43Z

A subset of #1369 , so we can merge the major fixes

__lsx_vftintrz_w_d accepts two __m128d arguments, so it should be called with zero_f64, which is declared. This fixes the following compilation error that I get when compiling current simde master for loongarch64-linux-gnu with gcc 14.3.1 and `-Ofast -mlsx -mlasx` in CFLAGS:

    ../test/x86/avx512/../../../simde/x86/sse2.h: In function ‘simde__m128i simde_mm_cvttpd_epi32(simde__m128d)’:
    ../test/x86/avx512/../../../simde/x86/sse2.h:3736:39: error: ‘zero_i64’ was not declared in this scope; did you mean ‘zero_f64’?
     3736 |       r_.lsx_i64 = __lsx_vftintrz_w_d(zero_i64, simde__m128d_to_private(a).lsx_f64);
          |                                       ^~~~~~~~
          |                                       zero_f64

Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>
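As a rough illustration of why the first argument has to be a double vector, here is a portable scalar model of a two-input truncating narrow. The names `vec2d`, `vec4i`, `narrow_trunc_f64x2` and `cvttpd_epi32_model` are hypothetical, and the lane order (second argument fills the low lanes) is an assumption inferred from the call shown in the error message, not taken from the LSX documentation.

```c
#include <stdint.h>

/* Hypothetical scalar stand-ins for the 128-bit vector types. */
typedef struct { double  f64[2]; } vec2d;  /* models __m128d */
typedef struct { int32_t i32[4]; } vec4i;  /* models __m128i */

/* Two 2 x double inputs narrow into one 4 x int32 result.
 * Assumed lane order: `lo` fills lanes 0-1, `hi` fills lanes 2-3.
 * Inputs are assumed finite and within int32_t range. */
static vec4i narrow_trunc_f64x2(vec2d hi, vec2d lo) {
  vec4i r;
  for (int i = 0; i < 2; i++) {
    r.i32[i]     = (int32_t) lo.f64[i];  /* C casts truncate toward zero */
    r.i32[i + 2] = (int32_t) hi.f64[i];
  }
  return r;
}

/* Model of the fixed call: the upper input is a zero *double* vector,
 * so the two upper result lanes come out as 0. */
static vec4i cvttpd_epi32_model(vec2d a) {
  const vec2d zero_f64 = { { 0.0, 0.0 } };
  return narrow_trunc_f64x2(zero_f64, a);
}
```

Both inputs share one type in this model, which mirrors why an integer zero vector does not fit the call.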

Similarly to what other architectures do, the result of __lsx_vftintrz_w_s should be used when both SIMDE_FAST_CONVERSION_RANGE and SIMDE_FAST_NANS are defined, not just stored to a temporary and lost.

Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>
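The intent can be sketched in scalar C as follows. This is only a model under stated assumptions: `native_trunc` and `cvtt_f32_to_i32` are hypothetical names, and the careful path mimics the x86 behaviour of returning INT32_MIN for NaN and out-of-range inputs.

```c
#include <stdint.h>

/* Hypothetical stand-in for the native truncating conversion. */
static int32_t native_trunc(float v) { return (int32_t) v; }

static int32_t cvtt_f32_to_i32(float v) {
#if defined(SIMDE_FAST_CONVERSION_RANGE) && defined(SIMDE_FAST_NANS)
  /* The caller promised in-range, non-NaN inputs:
   * return the native result directly instead of discarding it. */
  return native_trunc(v);
#else
  /* Careful path: NaN (v != v) and out-of-range inputs give INT32_MIN. */
  if (v != v || v >= 2147483648.0f || v < -2147483648.0f) {
    return INT32_MIN;
  }
  return native_trunc(v);
#endif
}
```

The bug class described above corresponds to computing the fast-path value and then falling through to the careful path anyway.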

__lsx_vftintrne_w_s actually returns a vector of 4 ints, but lsxintrin.h from gcc 14 and 15 declares it as returning a vector of 2 longs. We use HEDLEY_REINTERPRET_CAST to work around this. See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123759

__lsx_vfcmp_cun_s actually returns a vector of 4 ints, but lsxintrin.h from GCC 14 and 15 declares it as returning a vector of 2 longs. Use HEDLEY_REINTERPRET_CAST to work around this and assign the correct member of simde__m128_private. See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123759
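A minimal sketch of the cast, using GCC/Clang vector extensions so it builds on any target. The types `v2i64` and `v4i32` and the function `misdeclared_intrinsic` are hypothetical stand-ins; in C, HEDLEY_REINTERPRET_CAST expands to a plain cast like the one below.

```c
#include <stdint.h>
#include <string.h>

/* GCC/Clang vector extensions; the type names are hypothetical. */
typedef long long v2i64 __attribute__((vector_size(16)));
typedef int32_t   v4i32 __attribute__((vector_size(16)));

/* Stand-in for an intrinsic whose header declares the wrong return type:
 * it really produces four 32-bit lanes but is typed as two 64-bit lanes. */
static v2i64 misdeclared_intrinsic(const int32_t src[4]) {
  v2i64 r;
  memcpy(&r, src, sizeof(r));
  return r;
}

static v4i32 call_with_cast(const int32_t src[4]) {
  /* A cast between same-sized vector types reinterprets the bits;
   * no lane values are converted. */
  return (v4i32) misdeclared_intrinsic(src);
}
```

After the cast, the result can be assigned to the 32-bit-lane member of the private union rather than the 64-bit one.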

This works around two similar instances of ICE of GCC 14:

    test/x86/avx512/range.cpp: In function ‘int test_simde_mm256_maskz_range_ps()’:
    test/x86/avx512/range.cpp:702:1: error: unrecognizable insn:
      702 | }
          | ^
    (insn 191 190 192 2 (set (reg:V8SF 446 [ r_$f32_514 ])
            (vec_merge:V8SF (vec_duplicate:V8SF (const_double:SF 0.0 [0x0.0p+0]))
                (reg:V8SF 446 [ r_$f32_514 ])
                (const_int 1 [0x1]))) "../test/x86/avx512/../../../simde/x86/avx.h":1041:17 -1
         (nil))
    [...]

The similar workaround is already present in simde_mm256_set_ps.

Link: https://gcc.gnu.org/pipermail/gcc-patches/2026-January/706166.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117575

…arch64 Avoid some usages of __lsx_vst and __lasx_xvst, as they may trigger maybe-uninitialized warnings: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123766 The optimizing compiler still generates optimal vectorized code for fixed-size __builtin_memcpy, so no performance loss is expected.
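The replacement pattern looks roughly like this. It is a sketch, not the code from the commit: `vec128` and `store_128` are hypothetical names, and standard `memcpy` is used here in place of `__builtin_memcpy` to keep the example portable.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit value; stands in for a SIMDe private vector type. */
typedef struct { int32_t i32[4]; } vec128;

static void store_128(void *dst, vec128 v) {
  /* sizeof(v) is a compile-time constant (16 bytes), so optimizing
   * compilers typically lower this call to a single vector store. */
  memcpy(dst, &v, sizeof(v));
}
```

Because the copy size is fixed, the compiler can see the whole object being written, which is the property the commit relies on.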

iv-m and others added 14 commits February 22, 2026 09:59

x86/sse2: Fix typo about SIMDE_LOONGARCH_LSX_NATIVE

688c2c7

Change from SIMD_LOONGARCH_LSX_NATIVE to SIMDE_LOONGARCH_LSX_NATIVE.

gh-actions gcc-qemu: only add extra repository for gcc-15 on Ubuntu 2…

6b04a41

…4.04 To avoid binutils mismatch

arm neon ext: small adjustment to reduce risk of -Werror=maybe-uninit…

99a92d8

…ialized

arm/neon: Disable (maybe) uninitialized variable warnings on loongarch64

0e68472

... in the same way it's already done for RISC-V GCC. Co-authored-by: Michael R. Crusoe <crusoe@debian.org>

gcc loong64: work around the vec_perm_const bug in the LoongArch backend

3a31938

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121064 has been fixed in the gcc-14 branch, and already released in gcc 15.2 Co-authored-by: Michael R. Crusoe <crusoe@debian.org>

diagnostics: fix typo

4089949

x86/avx512/fixupimm: work around GCC LoongArch bug 121064

cf65c2b

Use simde_memcpy instead of direct assignment to prevent GCC from generating incorrect vshuf.w instructions on LoongArch with -Ofast. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121064

test x86 avx512 roundscale{,_round}: skip infinity under SIMDE_FAST_MATH

9329b7d

With -ffinite-math-only (implied by -ffast-math/-Ofast), Clang optimizes away comparisons involving infinity, causing test assertions to fail. Skip the affected test cases when SIMDE_FAST_MATH is defined.
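One way to skip such cases is to drop them from the test table at compile time, since a run-time infinity check could itself be optimized away under -ffinite-math-only. The sketch below is illustrative only: `SKIP_NON_FINITE_CASES`, `case_t`, `twice` and `run_cases` are hypothetical names, and the function under test is a placeholder rather than roundscale.

```c
#include <math.h>
#include <stddef.h>

/* __FINITE_MATH_ONLY__ is predefined by GCC and Clang;
 * SIMDE_FAST_MATH is the macro named in the commit message. */
#if defined(SIMDE_FAST_MATH) || (defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__)
  #define SKIP_NON_FINITE_CASES 1
#else
  #define SKIP_NON_FINITE_CASES 0
#endif

typedef struct { float in; float expected; } case_t;

/* Placeholder for the function under test. */
static float twice(float x) { return x * 2.0f; }

static const case_t cases[] = {
  {  1.5f,   3.0f },
  { -0.25f, -0.5f },
#if !SKIP_NON_FINITE_CASES
  /* Only compiled in when infinities are guaranteed to behave. */
  { INFINITY, INFINITY },
#endif
};

/* Returns the number of cases that passed; *total receives the case count. */
static size_t run_cases(size_t *total) {
  size_t passed = 0;
  *total = sizeof(cases) / sizeof(cases[0]);
  for (size_t i = 0; i < *total; i++) {
    if (twice(cases[i].in) == cases[i].expected) passed++;
  }
  return passed;
}
```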

mr-c enabled auto-merge (rebase) February 22, 2026 09:07

mr-c mentioned this pull request Feb 22, 2026

Fixes for LoongArch LSX + fast math #1369

Open

gh-actions gcc-16: armel seems to be no longer available

f8268c0

mr-c merged commit c86a433 into simd-everywhere:master Feb 22, 2026
132 checks passed

mr-c deleted the fixes-for-loongarch-lsx-fast-math2 branch February 22, 2026 16:36

mr-c merged 15 commits into simd-everywhere:master from mr-c:fixes-for-loongarch-lsx-fast-math2

3 participants
