Fixes for LoongArch LSX + fast math#1391

Merged
mr-c merged 15 commits into simd-everywhere:master from
mr-c:fixes-for-loongarch-lsx-fast-math2
Feb 22, 2026

Conversation

@mr-c
Collaborator

@mr-c mr-c commented Feb 22, 2026

A subset of #1369, so we can merge the major fixes.

iv-m and others added 14 commits February 22, 2026 09:59
__lsx_vftintrz_w_d accepts two __m128d arguments, so it
should be called with zero_f64, which is declared.

This fixes the following compilation error that I get when
compiling current simde master for loongarch64-linux-gnu
with gcc 14.3.1 and `-Ofast -mlsx -mlasx` in CFLAGS:

../test/x86/avx512/../../../simde/x86/sse2.h: In function ‘simde__m128i simde_mm_cvttpd_epi32(simde__m128d)’:
../test/x86/avx512/../../../simde/x86/sse2.h:3736:39: error: ‘zero_i64’ was not declared in this scope; did you mean ‘zero_f64’?
 3736 |       r_.lsx_i64 = __lsx_vftintrz_w_d(zero_i64, simde__m128d_to_private(a).lsx_f64);
      |                                       ^~~~~~~~
      |                                       zero_f64

Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>
As on other architectures, the result of __lsx_vftintrz_w_s
should be used directly when both SIMDE_FAST_CONVERSION_RANGE
and SIMDE_FAST_NANS are defined, not just stored to a temporary
and lost.

Signed-off-by: Ivan A. Melnikov <iv@altlinux.org>
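
A minimal scalar sketch of the pattern this commit describes, not the actual SIMDe code. The `SKETCH_*` macros and `sketch_cvtt_f32_to_i32` are hypothetical stand-ins for `SIMDE_FAST_CONVERSION_RANGE`, `SIMDE_FAST_NANS` and the vector conversion. The `INT32_MIN` result for NaN and out-of-range inputs is assumed from the x86 truncating-conversion behaviour.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical stand-ins for SIMDE_FAST_CONVERSION_RANGE and
   SIMDE_FAST_NANS: define both to select the fast path. */
#if defined(SKETCH_FAST_CONVERSION_RANGE) && defined(SKETCH_FAST_NANS)
#  define SKETCH_USE_FAST_PATH 1
#else
#  define SKETCH_USE_FAST_PATH 0
#endif

/* Truncating float -> int32 conversion for a single lane. */
static int32_t sketch_cvtt_f32_to_i32(float v) {
#if SKETCH_USE_FAST_PATH
  /* Fast path: the caller promises v is in range and not NaN, so the
     raw conversion is the final result and is returned directly
     (not parked in a temporary that is never read). */
  return (int32_t) v;
#else
  /* Careful path: map NaN and out-of-range inputs to INT32_MIN
     before converting (assumed x86-style behaviour). */
  if (isnan(v) || v >= 2147483648.0f || v < -2147483648.0f) {
    return INT32_MIN;
  }
  return (int32_t) v;
#endif
}
```

The point is only that the fast branch must produce the function's result itself; a fast path that computes the value and then falls through to the careful path gains nothing.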
__lsx_vftintrne_w_s actually returns a vector of 4 ints,
but lsxintrin.h from GCC 14 and 15 declares it as returning
a vector of 2 longs. We use HEDLEY_REINTERPRET_CAST to
work around this.

See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123759
__lsx_vfcmp_cun_s actually returns a vector of 4 ints, but
lsxintrin.h from GCC 14 and 15 declares it as returning two longs.
Use HEDLEY_REINTERPRET_CAST to work around this and assign
the correct member of simde__m128_private.

See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123759
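
The two commits above share one idea, which can be sketched portably with GCC/Clang vector extensions. Everything prefixed `sketch_` or `SKETCH_` is hypothetical: `sketch_misdeclared_cmp` merely imitates an intrinsic whose header declares two 64-bit lanes although the result is four 32-bit lanes, and `SKETCH_REINTERPRET_CAST` mirrors what a reinterpret-cast macro reduces to in C. This is an illustration under those assumptions, not the SIMDe change itself.

```c
#include <stdint.h>
#include <string.h>

/* Generic 128-bit vector types (GCC/Clang vector extension). */
typedef int32_t sketch_v4i32 __attribute__((vector_size(16)));
typedef int64_t sketch_v2i64 __attribute__((vector_size(16)));

/* In C, a reinterpreting cast between same-sized vector types is a
   plain cast: the bits are kept, only the lane view changes. */
#define SKETCH_REINTERPRET_CAST(T, expr) ((T) (expr))

/* Hypothetical stand-in for a misdeclared intrinsic: it computes four
   32-bit mask lanes (-1 where equal, 0 otherwise) but its declared
   return type is a vector of two 64-bit lanes. */
static sketch_v2i64 sketch_misdeclared_cmp(sketch_v4i32 a, sketch_v4i32 b) {
  sketch_v4i32 mask = (sketch_v4i32) (a == b);
  return (sketch_v2i64) mask;
}

/* Caller-side workaround: cast the result back to the lane layout it
   really has before storing it into the matching 32-bit member. */
static void sketch_cmp_lanes(const int32_t a[4], const int32_t b[4],
                             int32_t out[4]) {
  sketch_v4i32 va, vb, r;
  memcpy(&va, a, sizeof(va));
  memcpy(&vb, b, sizeof(vb));
  r = SKETCH_REINTERPRET_CAST(sketch_v4i32, sketch_misdeclared_cmp(va, vb));
  memcpy(out, &r, sizeof(r));
}
```

Because the cast only changes how the same 128 bits are viewed, it should remain harmless once the header declaration is corrected.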
Change from SIMD_LOONGARCH_LSX_NATIVE to SIMDE_LOONGARCH_LSX_NATIVE.

This works around two similar internal compiler errors (ICEs) in GCC 14:

  test/x86/avx512/range.cpp: In function ‘int test_simde_mm256_maskz_range_ps()’:
  test/x86/avx512/range.cpp:702:1: error: unrecognizable insn:
    702 | }
        | ^
  (insn 191 190 192 2 (set (reg:V8SF 446 [ r_$f32_514 ])
          (vec_merge:V8SF (vec_duplicate:V8SF (const_double:SF 0.0 [0x0.0p+0]))
              (reg:V8SF 446 [ r_$f32_514 ])
              (const_int 1 [0x1]))) "../test/x86/avx512/../../../simde/x86/avx.h":1041:17 -1
       (nil))
  [...]

A similar workaround is already present in simde_mm256_set_ps.

Link: https://gcc.gnu.org/pipermail/gcc-patches/2026-January/706166.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117575
…arch64

Avoid some usages of __lsx_vst and __lasx_xvst, as they may
cause maybe-uninitialized warnings to be triggered:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123766

The optimizing compiler still generates optimal vectorized
code for fixed-size __builtin_memcpy, so no performance
loss is expected.
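
A rough, portable illustration of the replacement described above, assuming GCC/Clang vector extensions. `sketch_v4i32` and `sketch_store_i32x4` are hypothetical names, and standard `memcpy` is used here where the commit mentions `__builtin_memcpy`.

```c
#include <stdint.h>
#include <string.h>

/* Generic 128-bit vector of four 32-bit lanes (GCC/Clang extension). */
typedef int32_t sketch_v4i32 __attribute__((vector_size(16)));

/* Store a vector to memory with a fixed-size copy instead of a
   dedicated vector-store intrinsic. The size is a compile-time
   constant, so an optimizing compiler can lower the copy to a single
   vector store; dest has no alignment requirement. */
static void sketch_store_i32x4(int32_t dest[4], sketch_v4i32 v) {
  memcpy(dest, &v, sizeof(v));
}
```

The compiler can see that the copy writes all 16 bytes of `dest`, which is plausibly why this form avoids the maybe-uninitialized diagnostics mentioned in the commit message.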
... in the same way it's already done for RISC-V GCC.

Co-authored-by: Michael R. Crusoe <crusoe@debian.org>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121064 has been fixed on the
gcc-14 branch, and the fix is already released in GCC 15.2

Co-authored-by: Michael R. Crusoe <crusoe@debian.org>
Use simde_memcpy instead of direct assignment to prevent GCC from
generating incorrect vshuf.w instructions on LoongArch with -Ofast.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121064
With -ffinite-math-only (implied by -ffast-math/-Ofast), Clang optimizes away
comparisons involving infinity, causing test assertions to fail. Skip the
affected test cases when SIMDE_FAST_MATH is defined.
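
One way to picture the skip, as a sketch rather than the actual test code: `SKETCH_FAST_MATH` is a hypothetical stand-in for `SIMDE_FAST_MATH`, derived here from the compiler-provided `__FINITE_MATH_ONLY__` macro (an assumption about how such a flag could be detected), and the test table is invented for illustration.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical stand-in for SIMDE_FAST_MATH: GCC and Clang set
   __FINITE_MATH_ONLY__ to 1 under -ffinite-math-only. */
#if defined(__FINITE_MATH_ONLY__) && (__FINITE_MATH_ONLY__ != 0)
#  define SKETCH_FAST_MATH 1
#endif

struct sketch_case {
  float a;
  float b;
  int expect_a_less; /* expected result of a < b */
};

/* Returns the number of failing cases. */
static int sketch_run_cases(void) {
  static const struct sketch_case cases[] = {
    { 1.0f, 2.0f, 1 },
    { 3.0f, -3.0f, 0 },
#if !defined(SKETCH_FAST_MATH)
    /* Infinity cases are only meaningful when the compiler is not
       allowed to assume finite inputs, so they are compiled out
       under fast math. */
    { -INFINITY, 0.0f, 1 },
    { INFINITY, 1.0f, 0 },
#endif
  };
  int failures = 0;
  for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
    if ((cases[i].a < cases[i].b) != cases[i].expect_a_less) {
      failures++;
    }
  }
  return failures;
}
```

Compiling the cases out, rather than loosening the assertions, leaves the finite cases checked in every build.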
@mr-c mr-c enabled auto-merge (rebase) February 22, 2026 09:07
@mr-c mr-c merged commit c86a433 into simd-everywhere:master Feb 22, 2026
132 checks passed
@mr-c mr-c deleted the fixes-for-loongarch-lsx-fast-math2 branch February 22, 2026 16:36
3 participants