- User Since
- Jul 30 2013, 7:58 PM (251 w, 4 d)
Fri, May 25
Mention in infinite loop in comments. Update C code in comments in ARM test. Change ashr to lshr in tests when its directly after the abs pattern. InstCombine is able to convert those. The ashr inside the loop dont' get changed because value tracking doesn't understand the recurrence I guess.
Thu, May 24
The original motivating code is something like the loop in the htest_one_block function here https://github.com/mozilla/mozjpeg/blob/master/jchuff.c
Add the code changes.
Can you say more about the bug and what the fix was so I know where to focus?
LGTM, if you fix the ordering in cpuid.h.
It's not really a question of whether it will be lowered correctly. Target independent DAG combines call getBooleanContents and expect that the condition for a target independent ISD:;SELECT node obey that.
Wed, May 23
Hi @aemerson, I'm not opposed to adding it back. But the clang policy for vector builtins has always been that we won't support all the builtins that gcc does and to encourage the use of the _mm_* wrappers which are guaranteed to work in both compilers. It possible to change your source code to use the portable intrinsic name?
Add back popcntintrin.h
Rebase to remvoe the InstCombine changes that were split off to another patch and have now been committed
Tue, May 22
Leave the message still saying x86intrin.h. Change the error checks to look for either x86intrin.h or immintrin.h to have been included. Really only the immintrin.h check is necessary since that's the header that does the include, but I put both so the error message saying x86intrin.h would be less confusing.
Implemented @DavidKreitzer's suggestion in r333033
That should have said abs(x) >= 0 is always true.
Add a simple test case to show (abs(x)) > 0 is always true.
Add a combine to negate ABS/NABS to recover the broken tests.
It looks like neither icc or MSVC doe a perfect job of ensuring zero extension here. For example: https://godbolt.org/g/UqnXh8
It is odd, but they really are split in the icc include files. So they got split a while back in clang to match the Intel Intrinsic Guide documentation.
This seems right to me. GCC believes believes that __bultin_abs always returns a positive number.
First there was mmintrin.h which covered MMX instructions. Then xmmintrin.h came along to support SSE1 and implicitly included mmintrin.h. The emmintrin.h to support SSE2 and implicitly included xmmintrin.h. This repeated for each new version of SSE. With each header file including the previous header file. I think most of the later SSE headers start with the first letter of the code name of the CPU that added that SSE level including cancelled projects. So nmmintrin.h refers to Nehalem. tmmintrin.h came from the cancelled Tejas CPU. wmmintrin.h referes to Westmere. pmmintrin.h refers to Penryn. Eventually this was determined to not be very scalable to remember which header file contained what intrinsics and you have to change it with each generation to get the latest.. So immintrin.h was created to just include everything. I assume the 'i' here just stands for Intel. The 'mm' is just historic legacy due to the earlier file names. x86intrin.h was created by gcc to hold the intrinsics not specified by Intel. I think icc has an x86intrin.h that just includes immintrin.h.
Mon, May 21
Add vector tests too.
Sorry @rsmith, this patch was created only to go to Chandler for a preliminary review. I didn't expect it to get a wider audience.
My main focus was on preventing 512 bit instructions on Skylake X. Do you have measurements for that?
@atdt How can I access that document?
Just commit the test files to trunk with the current codegen. No need for a phabricator review. Then rebase this patch to show the diff with the new codegen
So I think we've covered the whether this is ok to do questions. If someone can double check signed/unsigned and vector element sizes are all correct and approve this that would be great.
Add fast-isel tests
I think you can pass StringRef(F).substr(1). That won't create a temporary string. It will just create a StringRef pointing into the middle of an existing std::string stored in the parsed attributes.
I don't know that the fast-isel tests would show anything more than the intrinsics tests do since I ran those test cases through the autoupgrade code to get IR with selects. Should I move the intrinsic+select to fast-isel and leave just intrinsic by itself in the *intrinsic.ll test?
IRBuilder constant folding can be disable by declaring it as IRBuilder<NoFolder>
Because the builtins take one of the arguments as an immediate, they must be implemented as macros. This was the frontend can verify that it's an imediate or a constant expression.
Sun, May 20
Update comment. Fix dead code.
Sat, May 19
Fri, May 18
Remove the InstCombine change.
I fear that LICM is going to separate this pattern if its used in a loop and then isel won't be able to see the whole pattern.
Why are we going through shift intrinsics to do this? Why can't we just emit shl and lshr instructions directly?
Thu, May 17
Is it possible to add tests for any patterns that are new? It's not good to add untested code.
Wed, May 16
You can just remove the comment.
What about _mm512_cvtepi64_epi8?
I haven't tried but you might be able to do
What about this. This is the most obvious representation for this.
Can you precommit the test cases?
LGTM if you remove that one FIXME that's still in there.
Tue, May 15
Do we need to do something for the FIXMEs?