Summary
In special cases select instructions can be eliminated by
replacing them with a cheaper bitwise operation even when the
select result is used outside its home block. The instances implemented
here are %x=icmp.eq; %y=select %x,%r, null; %z=icmp.eq|neq %y, null;
br %z,true, false ==> %x=icmp.ne; %y=icmp.eq %r,null; %z=or %x,%y; br
%z,true,false. The optimization is integrated into the instruction
combiner and performed only when all uses of the select result can
be replaced by the select operand proper. For this dominator information
is used. Integration into the combiner means that only the select-icmp is
transformed at first. Then the combiner iteratively works out the simpler
pattern. Note that the instruction combiner already handles the local pattern
when there is no use of the select result outside its home block.
Details
Diff Detail
Event Timeline
A few nits in line, one question in line, and one more thought: Can't this replace a few of our other queries for instructions with dominator tree lookups? Is that a follow on patch or?
Thanks!
-eric
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
---|---|---|
2429 | s/or/of | |
2433 | Local variable naming style. | |
2946 | Local variable naming style. | |
2976 | Nit: Sentences begin with capital letters. | |
2978 | Isn't this just a boolean then? | |
lib/Transforms/InstCombine/InstructionCombining.cpp | ||
100 | Probably don't need to have it as requires, but optional? i.e. if it doesn't exist then just don't do the transform in the way you've got? |
Thanks, Eric! I think I have to rely on dominance optionally to get keep a few of the MCJIT tests running.
What specifically is on your mind wrt to other queries?
Cheers
Gerolf
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
---|---|---|
2452–2454 | Why not perform a dyn_cast instead? | |
2457–2460 | This will return true if IC is null, did is this intensional? | |
2956 | s/int/unsigned | |
2973–2974 | This is just llvm::ICmpInst::isEquality. | |
2978 | I'd make this an unsigned, getSuccessor is asking which successor to take. It just so happens that we are only interested in cases where there are two successors. |
Hi Gerolf,
One more comment in line.
As far as the dominance tree, you don't ever check to find out if you have dominance trees, you just require it. The leading question was: if it's not optional, then you can probably use it in more places in the pass. If it is optional then you're not actually checking for it before using it.
Thanks!
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
---|---|---|
2978 | Yes. Makes sense that way. Might be better off just grabbing the correct successor rather than mucking with the integer. |
(most of the other reviewers are doing a great job on this patch, so I've only made a couple of specific comments... will leave the rest of the review to them)
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
---|---|---|
2431 | Agreed. I Think the name is good, but it should be made a very generic property of an instruction. I actually wonder whether this should be a method on DominatorTree itself. | |
lib/Transforms/InstCombine/InstructionCombining.cpp | ||
100 | I'm actually OK changing InstCombine so that it always requires dominator trees, but I think that should be a separate change. I would add it as optional here, and then I would submit a follow-up patch to switch it over so we don't hold up the logic here on the debate about how often to require the analysis. Then the patch that makes it required can discuss the relevant concern by citing compile-item benchmarks showing it's not too expensive. =] |
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
---|---|---|
2429 | Replaced by crisper comment. | |
2431 | I generalized the signature to use instruction *. | |
2433 | Fixed. | |
2452–2454 | Right, thanks. | |
2457–2460 | Excellent catch, thanks! | |
2946 | Fixed. | |
2956 | ok, unsigned. | |
2958 | nice suggestion, thanks. | |
2966 | dito | |
2973–2974 | yes, thanks! | |
2976 | Yop, fixed. | |
2978 | Addressed as suggested. | |
lib/Transforms/InstCombine/InstructionCombining.cpp | ||
100 | We go for required. Open question is one or two commits. | |
test/Transforms/InstCombine/select-cmp-br.ll | ||
2 | Done. | |
10 | done. | |
11 | dito for this and the rest below. |
Updates:
- Follow up on all your great reviews
- Note: patch still assumes that it is OK to enable DT by default
- New change in pr12338: when DT is default all arithmetic instructions in
that test get removed. This is fine. The purpose of the test is to protect
against an inifinite loop in the instruction combiner.
lib/Transforms/InstCombine/InstCombine.h | ||
---|---|---|
247 | It's unusual to capitalize all the letters in DEF and USE, it's more common to see Def and Use. I think all arguments and the method could be const. | |
248 | SI could be const as should the method itself. | |
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
2437–2438 | Could be a little simpler with for (const Use &U : Def->uses()) { | |
2442–2443 | You could make this a little easier to follow if you made this return false; In fact, you could probably remove the continue; as well. | |
2445 | This could just be return true; | |
2459–2461 | You could make this return IC && IC->getOperand(0) == SI; Are we interested in doing anything if the other operand to the ICmpInst is the SelectInst? |
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
---|---|---|
2459–2461 | Where do we check this? |
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
---|---|---|
2459–2461 | Thinking about this, isChainSelectCmpBranch is very specific to this transform; I think this makes more sense as a static function instead of a member function. |
Changes:
- a few more improvements to code readability
- isChainSelectCmpBranch() is now static
- const parameters and variables
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
---|---|---|
2438 | DT is invariant of this loop, you could hoist it out of the loop (return false). |
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
---|---|---|
2957–2958 | I think this reads easier as: !CI->isZero() Speaking of which, why is it bad if the constant is zero? |
I still think that we need to confirm there is not a compile time regression from requiring dom trees here...
Also, some code comments below. I think this code needs to be refactored a bit here. =/ It's already gotten pretty gnarly and this pushes it further.
Thanks for working on all of this though. I really like the result.
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
---|---|---|
2428–2431 | Please follow either the specific coding standards for doxygen comments or match the surrounding code. The function above has a pretty good example. | |
2436–2437 | If all you need is the User, iterate over UI->users()? | |
2946 | This whole large new block is pretty indicative that this needs to be refactored. I think you should extract a method for the entire transform guarded by this boolean (in a separate patch, should be super simple refactoring). The existence of the boolean is a really clear sign that this will help -- you can use early exit rather than a boolean to guard falling through to the transform. It will reduce indentation and make everything much more readable. |
lib/Transforms/InstCombine/InstCombine.h | ||
---|---|---|
247 | 80 cols. | |
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
2428 | This is not quite right, the function actually asserts if the definition and the provided use are in different blocks (it does not check this condition). | |
2458 | Can this really be called on un-inserted instructions? | |
2461 | Or on instructions in incomplete blocks? | |
2970 | Make this an else if? | |
2983 | As we only actually perform the transform if I.isEquality(), can you host that part of the check up to be with the isChainSelectCmpBranch(SI) check? | |
lib/Transforms/InstCombine/InstructionCombining.cpp | ||
100 | Are you going to separate this into a follow-up patch, or are you planning to provide compile-time measurements to before committing this one? |
hfinkel: Are you going to separate this into a follow-up patch, or are you planning to provide compile-time measurements to before committing this one?
I guess I had said that compile-time is not impacted, but never showed any data. And then I had said many more things about compile-time measurements, our test suite and dominators, so the noise numbers should come as no surprise (all with 0.1-0.2 secs on x86):
Performance Regressions - Compile Time Δ Previous Current σ Δ (B) σ (B)
SingleSource/UnitTests/ObjC++/property-reference http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.202=2 4.37% 0.4192 0.4375 0.0003 0.00% 0.0003
SingleSource/Benchmarks/Shootout-C++/fibo http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.255=2 3.98% 0.2714 0.2822 0.0005 0.00% 0.0005
MultiSource/Applications/lemon/lemon http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.521=2 3.35% 0.5018 0.5186 0.0045 0.00% 0.0045
SingleSource/UnitTests/ObjC++/property-reference-object http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.62=2 3.05% 0.3375 0.3478 0.0002 0.00% 0.0002
MultiSource/Applications/sqlite3/sqlite3 http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.513=2 2.81% 7.1620 7.3631 0.0438 0.00% 0.0438
External/SPEC/CFP2006/444_namd/444_namd http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.163=2 2.15% 3.1007 3.1675 0.0021 0.00% 0.0021
MultiSource/Applications/oggenc/oggenc http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.537=2 1.73% 2.0891 2.1252 0.0065 0.00% 0.0065
External/SPEC/CINT2000/197_parser/197_parser http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.447=2 1.63% 1.1629 1.1819 0.0000 0.00% 0.0000
External/SPEC/CINT2000/176_gcc/176_gcc http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.333=2 1.60% 12.4641 12.6631 0.0030 0.00% 0.0030
External/SPEC/CINT95/099_go/099_go http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.528=2 1.51% 1.7386 1.7649 0.0020 0.00% 0.0020
External/SPEC/CINT2006/403_gcc/403_gcc http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.59=2 1.46% 30.0626 30.5019 0.0195 0.00% 0.0195
External/SPEC/CINT95/134_perl/134_perl http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.319=2 1.34% 2.5077 2.5413 0.0010 0.00% 0.0010
External/SPEC/CINT2006/400_perlbench/400_perlbench http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.280=2 1.31% 10.8686 11.0114 0.0005 0.00% 0.0005
External/SPEC/CINT2006/445_gobmk/445_gobmk http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.272=2 1.18% 7.7186 7.8093 0.0010 0.00% 0.0010
External/SPEC/CINT2000/253_perlbmk/253_perlbmk http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.241=2 1.11% 5.5141 5.5752 0.0047 0.00% 0.0047
External/SPEC/CINT2000/254_gap/254_gap http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.99=2 1.03% 4.5858 4.6332 0.0058 0.00% 0.0058
The most notable change is a minor refactoring introducing
replacedSelectWithOperand() that replaces a select with one
of its operands when possible. Refactoring includes the follow-up on good
suggestions like Use isZero() (David), employ users() iterator (Chandler)
and hoist equality check (Hal).
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
---|---|---|
2448 | If you mean for this to rule-out any other uses in the same block as the definition (I don't see anything else that might enforce that there is only one use in the parent block), should this be a call to properlyDominates? |
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
---|---|---|
2448 | Very much agree. Done. |
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
---|---|---|
2448 | Ups, I have to take it back. DB can be a successor block of the DI parent and it may contain a use. So dominates() must be used. |
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
---|---|---|
2448 | Alright, please make sure the function description comment is accurate and that this is noted. |
lib/Transforms/InstCombine/InstCombine.h | ||
---|---|---|
102 | This change looks superfluous. |
lib/Transforms/InstCombine/InstCombine.h | ||
---|---|---|
102 | Good catch. Somehow I flipped TLI and DT. Done. | |
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
2448 | Ok. I sharpened the comment and explicitly mention that there could a use of \p DI in \p DB. |
Thanks, Hal + all co-reviewers. I never thought that landing this optimization could be that exciting! :-) Very much appreciate all your feedback!
Committed revision 218721
I need to re-open this for another round of reviews. Although the select-icmp
optmization had received pretty good reviews its commit got hit by
a self-host failure (pr21210) that exposed two issues. In addition to
the fixes there is a new function and some small refactoring.
Fixes:
f1) do not replace select by operand in the block that contains the select
f2) do not perform the optimization when a use of the select
could be reached on both paths out of the conditional block that
contains the select
Major changes:
- dominatesAllUses() fixes f1)
- replacedSelectWithOperand has a more natural interface and fixes f2)
- replaceUsesOutsideBlock() in lib/IR/Value.cpp
- test/Transforms/InstCombine/pr21210.ll
include/llvm/IR/Value.h | ||
---|---|---|
264 ↗ | (On Diff #15905) | Add period after block and remove empty comment line. The other differences from RAUW (as noted above the implementation) should also be noted here. There is no need to name the BB argument ExceptionBB, BB is fine. |
lib/IR/Value.cpp | ||
360 ↗ | (On Diff #15905) | We don't need to use the name ExceptionBB, or refer to this as the "exception block" in the assert below. |
lib/Transforms/InstCombine/InstCombineCompares.cpp | ||
2536 | I assume that Succ->getUniquePredecessor() is cheaper to evaluate than InstCombiner::dominatesAllUses(SI, Icmp, Succ), so I'd check that first. Also, you can drop the InstCombiner:: |
This change looks superfluous.