- User Since
- Jul 30 2013, 7:58 PM (268 w, 2 d)
Tue, Sep 18
Sun, Sep 16
Can we DAG combine to X86ISD::BEXTR for these cases? Then we can properly check one use of the inner parts of the pattern. The other option might be to use PatFrags that check for one use like the existing srl_su fragment. My concern is that if the inner nodes of the patterns here have more uses, the emitted pattern won't completely remove the need for the inner node. So you'll end up emitting multiple instructions for this pattern and additional instructions for the remaining use of the inner node.
Fri, Sep 14
Rebase to use the code from r342292.
Punt on vXi8 types for now. Add FIXMEs.
Not sure how D38128 would help. Isn't that about register allocation?
Or I failed to get the file in the diff at all.
Thu, Sep 13
I'm going to assume Reid meant to hit approve instead of changes required.
Wed, Sep 12
Fix the typo in the comment. I missed it earlier in my haste to board a plane.
This whole patch does fix the infinite loop from PR38915, but it requires the whole patch and not just the change in InstCombineSelect.cpp
I don’t know if it fixes that PR. This transform causes an infinite loop on the tests where both max/min operands are invertible but not constants. Test50 and test51
More test cases and fix an infinite loop.
Tue, Sep 11
Use 0x67 address size prefix for x32. Don't disable movsq with x32.
I believe we have the same bug for memset as well, but I'd prefer to tackle that as a follow up since I don't have a test case yet.
Mon, Sep 10
Add the test case from the description.
Adding new constrained instrinsics and adding the pass should be separate patches I think. Changing the syntax of frem should be another patch.
Sun, Sep 9
Sat, Sep 8
Rebase after the other ADC changes that have gone in recently. Add a non-ADX command line to the stack folding test.
Fri, Sep 7
This is kind similar, but here the issue isn't the number of scalar operations, it's the size of the scalar operations. This probably also applies to v2i8 and v2i16 and any other v2iX type where X is 32 or less.
For the failure I posted. It looks like the Live Variable Analysis Pass is removing a register operand for a clobber without removing the immediate operand preceeding that contains the inline assembly flags that indicate it was a clobber. So later we see the clobber flag, but the register operand after it is missing.
We're seeing a failure from this on this test case. Looks like somehow the operand after an InlineAsm::Kind_Clobber has become an immediate rather than a register. So the call to getReg fails.
Thu, Sep 6
Pass a parameter to the lambda to indicate the LHS/RHS are swapped. Check this with an assert.
Do you have commit access, or do you need someone to commit this for you?
Mon, Sep 3
So does this look ok?
Sat, Sep 1
Thu, Aug 30
LGTM. Can you update lib/Target/X86/X86.td in LLVM repo as well?
Disable the new custom scalarizing if we are legalizing v2i32 via widening instead or promoting. The generic type legalizer knows to scalarize v2i32 div/rem in that case since it can't widen a trapping operation.
@davidxl, this transform doesn't really know if its changing the compare predicate because the min/max matching considers (select (icmp slt X, ~C), ~X, C) to be equivalent to (select (icmp sgt ~X, C), ~X, C). And it only returns the select operands to the calling code. So this transform rewrites the icmp, but doesn't know which form the original compare took.
Now with lamdba
LHS/RHS are set based on the select operands not the compare operands aren't they?
Address review comments. Rebase after pre-committing new tests.
Address review comments.
Wed, Aug 29
Add test cases for KNL and SKX. The KNL code is pretty terrible on anything but v16i1, but that's to be expected since we don't have the v32i1 or v64i1 types on KNL or an 8-bit kshift. The intrinsics that will be added for clang will only use 8, 32, and 64 on cpus with avx512dq/avx512bw so this isn't an issue that needs to be addressed immediately.
Forgot to drop an InstCombine patch from my tree before uploading that last patch.
Updated cpus.ll test to also check for this new error. Had to add a function body to get it to create the X86Subtarget correctly to make the check and had to fix some of the old RUN lines to not pass a 64-bit triple with a 32-bit cpu when they expect no error.
I did briefly look at always hoisting it, but it potentially changes the critical path so I'm playing it safe.
Add comment. Use existing createMinMax helper to shorten some code.
Looking into the more general fold. Trying to figure out the right magic to avoid an infinite loop.
Tue, Aug 28
I think I accidentally merged by commits on my tree and picked up some extra files.
Take Eli's suggestion and only widen the index. This fixes the scatter/gather index only widening cases to use the better instruction. The default 'promote' case is still promoting to a v2i64 index type since that's what type legalization will naturally do.
Rebase after fixing tets
Mon, Aug 27
@jonpa, it looks like changing the urem to this will trigger the original bug and avoid the optimization Simon is adding here.
Switch to m_Constant
Fix 32-bit and 64-bit check line swap
Doh! I got my check-prefixes reversed. Let me fix that.
Sun, Aug 26
Missed a test case
Sat, Aug 25
Address a review comment.
Rebase after committing test cases