- User Since
- May 22 2014, 1:24 PM (256 w, 6 d)
Add a TODO comment about optimizing for size. That's an existing concern even for scalar code. But there may not be a clear answer for optsize because, for example, we may be able to hoist fdiv out of a loop by doing this transform.
Remove reordering of reciprocal estimate and repeated divisor transforms. We still see a diff for the SSE target with an illegal type because we don't try to generate the reciprocal estimate sequence until the types are legal.
Note that there is a dedicated folder for CGP IR tests at 'llvm/test/Transforms/CodeGenPrepare' with target subfolders under there, but there's also precedence for doing it this way, so we see IR and asm in the same file.
In the generic case, we're creating a masked load of a scalar (although as noted inline, there's apparently no support or test for that pattern). But I don't know what the codegen for a <1 x X> masked load would be (do we cmp/br around the load since it's not speculatable?). Someone else with masked load lowering knowledge (@craig.topper @efriedma ?) should also review this.
Mon, Apr 22
LGTM. This canonicalization is reverse of the typical (shorter code is preferred), but the reasoning in the summary makes sense: we have lots of IR optimizer logic for selects, and this should be reversible in the backend.
Sorry, I didn't notice the patch stack before commenting in D60846.
Can you push the baseline tests to trunk/head, so this diff is against an updated public baseline?
There's also a TODO comment/test about this enhancement in test/Transforms/LICM/hoist-mustexec.ll -- that should be updated as part of this patch.
Sun, Apr 21
This looks like a straightforward extension of the existing logic, so LGTM. Up to you if you want to wait for another review.
Fri, Apr 19
These are 2 independent transforms, right? It would be better to split that into 2 smaller reviews/commits, so we are not missing anything (and easier to diagnose if there's trouble post-commit). The tests should show minimal IR patterns to exercise those 2 independent transforms.
Thu, Apr 18
This looks like the expected improvement to me, but given the 'blend' comment/test, I'd like someone else to confirm that I'm not overlooking some weird corner case.
I added some baseline tests here:
Add inc or dec param to helper function to reduce code duplication.
Wed, Apr 17
Made ARM test checks more exact, so we can see there actually is an extra instruction here (vext.32). That seems like a regression, but I'm not sure if it's important and/or another known shortcoming for folding shuffles.
I haven't hand-written FileCheck lines in a long time, and I messed that up...another update coming up.
- Added TODO comment about adding a more effificient constant query.
- Updated ARM test assertions to show more complete diff.
I haven't looked at LVI much, but judging from the git log, nobody else has done that recently either.
Ping * 2.
We clearly need to fix the bug (crash). @reames or someone else with more GEP experience should have a look though. I've just pointed out some minor bits to clean up.
- Added check to make sure that IndexReg has not already been set.
- Fixed node insertion/deletion/rauw to match existing code patterns in foldMask*() helpers. Dead node removal should be handled by PostprocessISelDAG(), so there's no need to do that explicitly.
Tue, Apr 16
- Try to make code comment clearer.
- Set scale/index parts of the AddressMode directly rather than recursing.
- Remove dead node and explicitly RAUW.
- Added some shift constant variation to the tests and a negative test.
Redo equivalency matching to reduce code.
Mon, Apr 15
- Improve hashing as suggested (I don't know how to expose that in a test, so no extra tests for that).
- Add a "double-negation" matcher to recognize selects with same true/false ops but different conditions (positive and negative tests added).
Name: Inverted predicate
%cond = icmp sgt i8 %x, 42 %invcond = icmp sle i8 %x, 42 %m1 = select i1 %cond, i32 %t, i32 %f %m2 = select i1 %invcond, i32 %f, i32 %t %r = xor i32 %m1, %m2 => %r = i32 0