- User Since
- May 22 2014, 1:24 PM (148 w, 5 d)
This seems unnecessarily complicated. Why can't we just match any shuffle of a splat before we get here and simplify it?
Mon, Mar 27
Sat, Mar 25
LGTM. I'd make the VSRAI signbits part a follow-up commit for the sake of minimalism and bug bisection (if needed), but if you think it works better as one piece, that's ok too.
Fri, Mar 24
- Added 32-bit target testing in rL298744
- Don't use hasSSE2() in the x86 override - that won't work if we're in soft-float mode (nice catch!).
- Add TODO comment to handle 64-bit type on x86 32-bit target.
On second thought, that EVT/MVT argument makes no sense. The type returned from the hook is always going to be an MVT because it has to be a supported type in order to be fast. Using MVT makes the code a bit cleaner since we don't have to pass a context around for those.
Have the TLI hook return the preferred operand (load) type for a given bitwidth, so we don't have to cycle through all of those when transforming the memcmp().
Check all of the 16-byte simple value types before giving up.
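
For reference, the hook idea can be modeled outside of LLVM roughly like this. It is only a sketch: `preferredLoadBytes` and `equalBytes` are hypothetical names, the hook returns a byte count instead of an MVT, and the supported widths are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the target hook: given a compare width in bits,
// return the size in bytes of the preferred load type, or 0 if the target has
// no fast equality compare for that width. The widths here are assumptions.
unsigned preferredLoadBytes(unsigned NumBits) {
  switch (NumBits) {
  case 32:
    return 4;
  case 64:
    return 8;
  default:
    return 0;
  }
}

// Model of the memcmp(A, B, N) == 0 transform: ask the hook once for the
// operand type instead of cycling through every candidate type.
bool equalBytes(const void *A, const void *B, unsigned NumBytes) {
  assert(A && B && "expected non-null buffers");
  switch (preferredLoadBytes(NumBytes * 8)) {
  case 4: {
    uint32_t X, Y;
    std::memcpy(&X, A, sizeof(X)); // one 32-bit load per operand
    std::memcpy(&Y, B, sizeof(Y));
    return X == Y;
  }
  case 8: {
    uint64_t X, Y;
    std::memcpy(&X, A, sizeof(X)); // one 64-bit load per operand
    std::memcpy(&Y, B, sizeof(Y));
    return X == Y;
  }
  default:
    // No fast type for this width: keep the library call.
    return std::memcmp(A, B, NumBytes) == 0;
  }
}
```

The point of the single query is that the caller never needs to know which types the target considers fast; it just gets a type back or falls through to the libcall.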
Thu, Mar 23
Wed, Mar 22
Tue, Mar 21
I had to use Alive to convince myself those tests are right. :)
Patch updated as suggested by Eli:
Canonicalize an insert of a constant before an insert of a variable. There was already a test for this (but it did nothing before), so I added a comment to explain.
Before getting to any details about the patch, we need to address the question raised in PR27036: why are we doing this transform in InstCombine at all? The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?
Mon, Mar 20
Fri, Mar 17
Can we do better than repeating this in every emit* function? Is there ever a case where we don't want to infer the attrs?
Thu, Mar 16
Wed, Mar 15
Tue, Mar 14
Mon, Mar 13
Hal's earlier comment said we don't need to hold this up for clang, but for reference that fix is proposed now:
- The code duplication was already bugging me, and we know it's only going to grow over time, so I'm proposing a single place to house the undef checking. Fixing the duplication between FoldConstantArithmetic and FoldConstantVectorArithmetic is another step.
- Fixed to check for zero or undef elements in the build vector.
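
As a rough illustration of the kind of check meant here, the following standalone sketch scans a constant vector for zero or undef elements. `Elt` and `hasZeroOrUndefElt` are hypothetical names rather than the actual DAG API, and treating the vector as a divisor operand is an assumption.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of one build-vector operand: either undef or a constant.
struct Elt {
  bool IsUndef;
  int64_t Value; // ignored when IsUndef is true
};

// Returns true if any element is undef or the constant zero. Keeping this in
// one helper means each fold can share the check instead of repeating it.
bool hasZeroOrUndefElt(const std::vector<Elt> &Vec) {
  for (const Elt &E : Vec)
    if (E.IsUndef || E.Value == 0)
      return true;
  return false;
}
```

A caller folding a division would, under this assumption, give up on normal constant folding as soon as the helper returns true for the divisor.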
Sun, Mar 12
LGTM. I don't know if the force-to-zero is important either, but since we're generally happy to add xors to avoid partial reg dependencies, that seems fine to me.
Fri, Mar 10
Thu, Mar 9
Patch updated (no code changes, but...):
Wed, Mar 8
I didn't notice that an extra x86 test (for https://bugs.llvm.org/show_bug.cgi?id=30693) was failing with this change because it has div-by-0.
I can't tell if that test is still useful or not (cc'ing @zvi).
Changing the definition of isConstantOrConstantVector() to include undef may have unintended consequences (there's a test in test/CodeGen/ARM/select.ll that will trigger the assert in foldBinOpIntoSelect()), so just assert that an undef result from DAG.getNode() is ok for now.