Owner
|
That |
Author
True, I was not optimizing for performance. I actually ran into a compile issue with an ancient compiler, looked into this file, fixed it, and did the optimization as a favor. That said, it could save roughly 10 bytes of binary code :)
x86[-64] has no scalar integer saturating arithmetic instructions, so saturating operations are slow unless vectorized. Since all x86-64 CPUs support SSE2, we can use SSE2 as the baseline implementation.
This implementation is taken from clang's optimized output; gcc and msvc can't optimize it this way. See here for a comparison on godbolt.
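To illustrate the idea, here is a minimal sketch of a scalar saturating add routed through an SSE2 packed instruction. It is not the code from this PR: the helper name `sat_add_i16` is hypothetical, and it assumes an x86-64 target (or `-msse2` on 32-bit x86).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (not from this PR): saturating add of two int16_t
 * values using the packed SSE2 instruction paddsw on lane 0. */
static inline int16_t sat_add_i16(int16_t a, int16_t b)
{
    __m128i va = _mm_cvtsi32_si128(a);      /* a lands in the low 16-bit lane */
    __m128i vb = _mm_cvtsi32_si128(b);
    __m128i vr = _mm_adds_epi16(va, vb);    /* per-lane signed saturating add */
    return (int16_t)_mm_extract_epi16(vr, 0); /* read lane 0 back */
}
```

Only lane 0 carries a meaningful result; the other lanes are ignored, which is the price of using a vector instruction for a scalar operation.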
It also contains a minor fix to the minimum GCC version required to compile (without globally enabling SSE4.1/AVX2, but using the `target` GCC extension). I think the old value GCC 4.7.1 was there because AVX2 support was added in GCC 4.7, but starting from GCC 4.9, it is possible to call x86 intrinsics from select functions in a file that are tagged with the corresponding target attribute without having to compile the entire file with the `-mxxx` option.

Technically, GCC 4.7 and 4.8 don't have the `target` feature in the x86 intrinsic headers and don't allow including per-instruction-extension-set headers directly; code like below in `<?mmintrin.h>` is only available since GCC 4.9.
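Separately from the header excerpt referred to above, the following is a rough sketch of what the per-function `target` attribute allows on GCC 4.9 or later (and on clang), assuming the file is compiled without `-msse4.1`. The function names are hypothetical and not taken from this PR; the runtime check uses the `__builtin_cpu_supports` builtin.

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics, included without -msse4.1 */
#include <stdint.h>

/* Hypothetical helper: only this function is compiled with SSE4.1 enabled. */
static __attribute__((target("sse4.1")))
uint16_t sat_u16_from_i32_sse41(int32_t x)
{
    __m128i v = _mm_cvtsi32_si128(x);
    __m128i r = _mm_packus_epi32(v, v);   /* signed 32-bit to unsigned 16-bit, saturating */
    return (uint16_t)_mm_extract_epi16(r, 0);
}

/* Hypothetical dispatcher: compiled for the baseline target, so it must
 * check CPU support before calling the SSE4.1 variant. */
static uint16_t sat_u16_from_i32(int32_t x)
{
    if (__builtin_cpu_supports("sse4.1"))
        return sat_u16_from_i32_sse41(x);
    /* Portable fallback with the same clamping behavior. */
    return (uint16_t)(x < 0 ? 0 : (x > 65535 ? 65535 : x));
}
```

The attribute only changes code generation for the tagged function; callers still have to make sure the CPU supports the instruction set before calling it.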