Page MenuHomePhabricator

Select Elimination in InstCombine
ClosedPublic

Authored by Gerolf on Sep 8 2014, 9:39 PM.

Details

Summary

Summary
In special cases select instructions can be eliminated by
replacing them with a cheaper bitwise operation even when the
select result is used outside its home block. The instances implemented
here are %x=icmp.eq; %y=select %x,%r, null; %z=icmp.eq|neq %y, null;
br %z,true, false ==> %x=icmp.ne; %y=icmp.eq %r,null; %z=or %x,%y; br
%z,true,false. The optimization is integrated into the instruction
combiner and performed only when all uses of the select result can
be replaced by the select operand proper. For this dominator information
is used. Integration into the combiner means that only the select-icmp is
transformed at first. Then the combiner iteratively works out the simpler
pattern. Note that the instruction combiner already handles the local pattern
when there is no use of the select result outside its home block.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
mcrosier added inline comments.Sep 9 2014, 5:24 AM
test/Transforms/InstCombine/select-cmp-br.ll
10

@test1

11

CHECK-LABEL: @test1

37

insert newline

38

@test2

39

CHECK-LABEL: @test2

65

newline

66

@test3

chandlerc edited edge metadata.Sep 9 2014, 3:00 PM

(most of the other reviewers are doing a great job on this patch, so I've only made a couple of specific comments... will leave the rest of the review to them)

lib/Transforms/InstCombine/InstCombineCompares.cpp
2461

Agreed. I Think the name is good, but it should be made a very generic property of an instruction.

I actually wonder whether this should be a method on DominatorTree itself.

lib/Transforms/InstCombine/InstructionCombining.cpp
98

I'm actually OK changing InstCombine so that it always requires dominator trees, but I think that should be a separate change.

I would add it as optional here, and then I would submit a follow-up patch to switch it over so we don't hold up the logic here on the debate about how often to require the analysis. Then the patch that makes it required can discuss the relevant concern by citing compile-item benchmarks showing it's not too expensive. =]

Gerolf added inline comments.Sep 9 2014, 11:35 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
2459

Replaced by crisper comment.

2461

I generalized the signature to use instruction *.

2463

Fixed.

2482–2484

Right, thanks.

2487–2490

Excellent catch, thanks!

3067

Fixed.

3077

ok, unsigned.

3079

nice suggestion, thanks.

3087

dito

3094–3095

yes, thanks!

3097

Yop, fixed.

3099

Addressed as suggested.

lib/Transforms/InstCombine/InstructionCombining.cpp
98

We go for required. Open question is one or two commits.

test/Transforms/InstCombine/select-cmp-br.ll
2

Done.

10

done.

11

dito for this and the rest below.

Gerolf updated this revision to Diff 13561.Sep 10 2014, 2:28 PM
Gerolf edited edge metadata.

Updates:

  • Follow up on all your great reviews
  • Note: patch still assumes that it is OK to enable DT by default
  • New change in pr12338: when DT is default all arithmetic instructions in

that test get removed. This is fine. The purpose of the test is to protect
against an inifinite loop in the instruction combiner.

majnemer added inline comments.Sep 10 2014, 5:58 PM
lib/Transforms/InstCombine/InstCombine.h
249

It's unusual to capitalize all the letters in DEF and USE, it's more common to see Def and Use.

I think all arguments and the method could be const.

250

SI could be const as should the method itself.

lib/Transforms/InstCombine/InstCombineCompares.cpp
2467–2468

Could be a little simpler with for (const Use &U : Def->uses()) {

2472–2473

You could make this a little easier to follow if you made this return false;

In fact, you could probably remove the continue; as well.

2475

This could just be return true;

2489–2491

You could make this return IC && IC->getOperand(0) == SI;

Are we interested in doing anything if the other operand to the ICmpInst is the SelectInst?

Gerolf added inline comments.Sep 10 2014, 8:04 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
2467–2468

Done

2472–2473

Done

2475

Done

2489–2491

No, rhs is expected to be a constant.

majnemer added inline comments.Sep 10 2014, 8:34 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
2489–2491

Where do we check this?

majnemer added inline comments.Sep 10 2014, 8:35 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
2489–2491

Thinking about this, isChainSelectCmpBranch is very specific to this transform; I think this makes more sense as a static function instead of a member function.

Please see below.

Gerolf updated this revision to Diff 13593.Sep 11 2014, 12:23 PM

Changes:

  • a few more improvements to code readability
  • isChainSelectCmpBranch() is now static
  • const parameters and variables
majnemer added inline comments.Sep 11 2014, 12:53 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
2468

DT is invariant of this loop, you could hoist it out of the loop (return false).

Sure. Anything else?

Cheers
Gerolf

majnemer added inline comments.Sep 11 2014, 3:46 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
3078–3079

I think this reads easier as: !CI->isZero()

Speaking of which, why is it bad if the constant is zero?

I still think that we need to confirm there is not a compile time regression from requiring dom trees here...

Also, some code comments below. I think this code needs to be refactored a bit here. =/ It's already gotten pretty gnarly and this pushes it further.

Thanks for working on all of this though. I really like the result.

lib/Transforms/InstCombine/InstCombineCompares.cpp
2458–2461

Please follow either the specific coding standards for doxygen comments or match the surrounding code. The function above has a pretty good example.

2466–2467

If all you need is the User, iterate over UI->users()?

3067

This whole large new block is pretty indicative that this needs to be refactored. I think you should extract a method for the entire transform guarded by this boolean (in a separate patch, should be super simple refactoring).

The existence of the boolean is a really clear sign that this will help -- you can use early exit rather than a boolean to guard falling through to the transform. It will reduce indentation and make everything much more readable.

Please see below.

Gerolf updated this revision to Diff 13660.Sep 12 2014, 2:34 PM

Changes:

  • added comments and increased smoothyness
hfinkel added inline comments.
lib/Transforms/InstCombine/InstCombine.h
249

80 cols.

lib/Transforms/InstCombine/InstCombineCompares.cpp
2458

This is not quite right, the function actually asserts if the definition and the provided use are in different blocks (it does not check this condition).

2488

Can this really be called on un-inserted instructions?

2491

Or on instructions in incomplete blocks?

3091

Make this an else if?

3104

As we only actually perform the transform if I.isEquality(), can you host that part of the check up to be with the isChainSelectCmpBranch(SI) check?

lib/Transforms/InstCombine/InstructionCombining.cpp
98

Are you going to separate this into a follow-up patch, or are you planning to provide compile-time measurements to before committing this one?

Gerolf added inline comments.Sep 29 2014, 3:24 PM
lib/Transforms/InstCombine/InstCombine.h
249

ok

lib/Transforms/InstCombine/InstCombineCompares.cpp
2458

the assertion actually checks that the use + def are in the *same* block

2488

Yes, conservatively the optimization will not run.

2491

dito.

3091

Part of refactoring.

3104

Yes, thanks.

hfinkel: Are you going to separate this into a follow-up patch, or are you planning to provide compile-time measurements to before committing this one?

I guess I had said that compile-time is not impacted, but never showed any data. And then I had said many more things about compile-time measurements, our test suite and dominators, so the noise numbers should come as no surprise (all with 0.1-0.2 secs on x86):

Performance Regressions - Compile Time Δ Previous Current σ Δ (B) σ (B)
SingleSource/UnitTests/ObjC++/property-reference http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.202=2 4.37% 0.4192 0.4375 0.0003 0.00% 0.0003
SingleSource/Benchmarks/Shootout-C++/fibo http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.255=2 3.98% 0.2714 0.2822 0.0005 0.00% 0.0005
MultiSource/Applications/lemon/lemon http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.521=2 3.35% 0.5018 0.5186 0.0045 0.00% 0.0045
SingleSource/UnitTests/ObjC++/property-reference-object http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.62=2 3.05% 0.3375 0.3478 0.0002 0.00% 0.0002
MultiSource/Applications/sqlite3/sqlite3 http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.513=2 2.81% 7.1620 7.3631 0.0438 0.00% 0.0438
External/SPEC/CFP2006/444_namd/444_namd http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.163=2 2.15% 3.1007 3.1675 0.0021 0.00% 0.0021
MultiSource/Applications/oggenc/oggenc http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.537=2 1.73% 2.0891 2.1252 0.0065 0.00% 0.0065
External/SPEC/CINT2000/197_parser/197_parser http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.447=2 1.63% 1.1629 1.1819 0.0000 0.00% 0.0000
External/SPEC/CINT2000/176_gcc/176_gcc http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.333=2 1.60% 12.4641 12.6631 0.0030 0.00% 0.0030
External/SPEC/CINT95/099_go/099_go http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.528=2 1.51% 1.7386 1.7649 0.0020 0.00% 0.0020
External/SPEC/CINT2006/403_gcc/403_gcc http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.59=2 1.46% 30.0626 30.5019 0.0195 0.00% 0.0195
External/SPEC/CINT95/134_perl/134_perl http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.319=2 1.34% 2.5077 2.5413 0.0010 0.00% 0.0010
External/SPEC/CINT2006/400_perlbench/400_perlbench http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.280=2 1.31% 10.8686 11.0114 0.0005 0.00% 0.0005
External/SPEC/CINT2006/445_gobmk/445_gobmk http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.272=2 1.18% 7.7186 7.8093 0.0010 0.00% 0.0010
External/SPEC/CINT2000/253_perlbmk/253_perlbmk http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.241=2 1.11% 5.5141 5.5752 0.0047 0.00% 0.0047
External/SPEC/CINT2000/254_gap/254_gap http://ghoflehner.apple.com/perf/v4/nts/2/graph?test.99=2 1.03% 4.5858 4.6332 0.0058 0.00% 0.0058

Gerolf updated this revision to Diff 14196.Sep 29 2014, 6:19 PM

The most notable change is a minor refactoring introducing
replacedSelectWithOperand() that replaces a select with one
of its operands when possible. Refactoring includes the follow-up on good
suggestions like Use isZero() (David), employ users() iterator (Chandler)
and hoist equality check (Hal).

hfinkel added inline comments.Sep 30 2014, 3:08 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
2478

If you mean for this to rule-out any other uses in the same block as the definition (I don't see anything else that might enforce that there is only one use in the parent block), should this be a call to properlyDominates?

Gerolf added inline comments.Sep 30 2014, 3:54 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
2478

Very much agree. Done.

hfinkel accepted this revision.Sep 30 2014, 3:57 PM
hfinkel added a reviewer: hfinkel.

LGTM.

This revision is now accepted and ready to land.Sep 30 2014, 3:57 PM
Gerolf added inline comments.Sep 30 2014, 4:40 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
2478

Ups, I have to take it back. DB can be a successor block of the DI parent and it may contain a use. So dominates() must be used.

hfinkel added inline comments.Sep 30 2014, 4:46 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
2478

Alright, please make sure the function description comment is accurate and that this is noted.

majnemer added inline comments.Sep 30 2014, 4:48 PM
lib/Transforms/InstCombine/InstCombine.h
103

This change looks superfluous.

Gerolf added inline comments.Sep 30 2014, 5:09 PM
lib/Transforms/InstCombine/InstCombine.h
103

Good catch. Somehow I flipped TLI and DT. Done.
Also, the not // not required comment on DT is wrong now, so I removed it.

lib/Transforms/InstCombine/InstCombineCompares.cpp
2478

Ok. I sharpened the comment and explicitly mention that there could a use of \p DI in \p DB.

Thanks, Hal + all co-reviewers. I never thought that landing this optimization could be that exciting! :-) Very much appreciate all your feedback!

Committed revision 218721

Gerolf updated this revision to Diff 15905.Nov 6 2014, 6:05 PM

I need to re-open this for another round of reviews. Although the select-icmp
optmization had received pretty good reviews its commit got hit by
a self-host failure (pr21210) that exposed two issues. In addition to
the fixes there is a new function and some small refactoring.

Fixes:

f1) do not replace select by operand in the block that contains the select
f2) do not perform the optimization when a use of the select
could be reached on both paths out of the conditional block that
contains the select

Major changes:

  • dominatesAllUses() fixes f1)
  • replacedSelectWithOperand has a more natural interface and fixes f2)
  • replaceUsesOutsideBlock() in lib/IR/Value.cpp
  • test/Transforms/InstCombine/pr21210.ll
hfinkel added inline comments.Nov 10 2014, 11:28 PM
include/llvm/IR/Value.h
264

Add period after block and remove empty comment line.

The other differences from RAUW (as noted above the implementation) should also be noted here.

There is no need to name the BB argument ExceptionBB, BB is fine.

lib/IR/Value.cpp
360

We don't need to use the name ExceptionBB, or refer to this as the "exception block" in the assert below.

lib/Transforms/InstCombine/InstCombineCompares.cpp
2566

I assume that Succ->getUniquePredecessor() is cheaper to evaluate than InstCombiner::dominatesAllUses(SI, Icmp, Succ), so I'd check that first. Also, you can drop the InstCombiner::

Gerolf added inline comments.Nov 11 2014, 8:37 PM
include/llvm/IR/Value.h
264

ok.

lib/IR/Value.cpp
360

ok.

lib/Transforms/InstCombine/InstCombineCompares.cpp
2566

Good point. At least your assumption should be right :-)

Gerolf updated this revision to Diff 16075.Nov 11 2014, 8:39 PM

Update with minor changes to comments etc suggested by Hal.

I think this is generally good (again). LGTM.

lib/IR/Value.cpp
357

We don't need this description here, we have a better one in the header.

lib/Transforms/InstCombine/InstCombineCompares.cpp
34

"Number of select opts" -> "Number of selects replaced by an operand"

Thanks, Hal!

Committed revision 222590.