Page MenuHomePhabricator

[X86] Improvement in CodeGen instruction selection for LEAs.
ClosedPublic

Authored by jbhateja on Jul 5 2017, 8:40 AM.

Details

Summary

1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to

 accommodate similar operand appearing in the DAG  e.g.
             T1 = A + B
             T2 = T1 + 10
             T3 = T2 + A
For above DAG rooted at T3, X86AddressMode will now look like
            Base = B , Index = A , Scale = 2 , Disp = 10

2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity

then complex LEAs (having 3 operands) could be factored out  e.g.
            leal 1(%rax,%rcx,1), %rdx
            leal 1(%rax,%rcx,2), %rcx
will be factored as following
            leal 1(%rax,%rcx,1), %rdx
            leal (%rdx,%rcx)   , %edx

3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop.

4/ Simplify LEA converts (lea (BASE,1,INDEX,0) --> add (BASE, INDEX) which offers better through put.

PR32755 will be taken care of by this pathc.

Previous patch revisions : r313343 , r314886

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@lsaba, @reviewers , waiting for your LGTM or any remaining comments on this.
Thanks

lsaba accepted this revision.Sep 13 2017, 7:13 AM
This revision is now accepted and ready to land.Sep 13 2017, 7:13 AM
jbhateja updated this revision to Diff 115359.Sep 14 2017, 10:18 PM
  • Few synthetic changes.
This revision was automatically updated to reflect the committed changes.
RKSimon reopened this revision.Sep 16 2017, 2:39 AM

Reverted in rL313376 due to PR34629 and PR34634

This revision is now accepted and ready to land.Sep 16 2017, 2:39 AM
RKSimon requested changes to this revision.Sep 16 2017, 2:42 AM

PR34629 and PR34634 need to be addressed

This revision now requires changes to proceed.Sep 16 2017, 2:42 AM
jbhateja updated this revision to Diff 115561.Sep 17 2017, 12:24 AM
jbhateja edited edge metadata.
  • Undefining result operand of factored statement to preserve SSA nature of Machine IR.
  • This fixes reperted PR 34634 and PR 34629 and build-bot failures reported.
jbhateja added a comment.EditedSep 17 2017, 12:38 AM

@RKSimon, @Reviewers, revision was in accepted state earlier and fix to counter reported issues post commit to trunk has been fixed. Please do let me know if another acceptance is needed to land this again.

jbhateja updated this revision to Diff 115583.Sep 17 2017, 11:36 AM
  • Updating tests for reported PRs for initial patch.
jmolloy requested changes to this revision.Sep 18 2017, 5:04 AM
jmolloy added a subscriber: jmolloy.
jmolloy added inline comments.
lib/Target/X86/X86OptimizeLEAs.cpp
1029 ↗(On Diff #115583)

This can cause recursion deep enough to cause stack overflows. Please could you refactor this to not use direct recursion? The domtree may be hundreds of nodes deep in degenerate cases.

This revision now requires changes to proceed.Sep 18 2017, 5:04 AM
jbhateja updated this revision to Diff 116144.Sep 21 2017, 12:15 AM
jbhateja edited edge metadata.

@reviewers, required revision change are through, let me know if this can land back.

jbhateja added a comment.EditedSep 27 2017, 4:38 AM

@jmolloy , @RKSimon , this patch has been reviewd and due to regression was opened again for review, required changes have
been made, can this land now in trunk if there are no more observations from any reviewers.

Thanks

jbhateja accepted this revision.EditedOct 3 2017, 1:46 AM

@reviewers, if no more comment I shall be landing this into trunk since required revision changes post acceptance are through.

jbhateja updated this revision to Diff 118948.Oct 13 2017, 11:37 AM
  • Operands of factored LEA must belong to same register class as per Intel's Architecture Manual.
  • Some code reorganization + rebase.
jbhateja updated this revision to Diff 119010.Oct 14 2017, 1:06 AM
  • D35014 : Review comments resolution
  • Removing 2 tests, pulled their latest renamed versions from trunk.
  • [X86] : Factorize LEA, handling for patterns involing SUBREG_TO_REG as LEA operands.
  • Few more changes for LEA factorization.
  • Updating test lea-opt-cse3.ll
  • Formatting changes.
  • Formatting changes
  • Changes to avoid creating costly LEAs in loops, strength reduction for simple LEAs with unit scale
  • Updating test.
  • [X86] Limiting the scope of DAG operands folding while AM based instruction selection to LEAs.
  • Merge from trunk.
  • Extending aggressive AM based folding for LEAs to cover more cases.
  • Updating test post rebase.
  • Formatting changes + fine tuning pattern matching condition.
  • Adding a check for subtarget feature Slow3OpLEA in pattern matching.
  • Few synthetic changes.
  • Undefining result operand of factored statement to preserve SSA nature of Machine IR.
  • Merge branch 'master' of https://github.com/llvm-mirror/llvm
  • Merge branch 'master' of https://github.com/llvm-mirror/llvm
  • Updating tests for reported PRs for initial patch.
  • Merge branch 'master' of https://github.com/llvm-mirror/llvm
  • Pull from trunk.
  • Operands of LEAs must be of same register class.
  • Revert "Operands of LEAs must be of same register class."
jbhateja updated this revision to Diff 119011.Oct 14 2017, 1:17 AM
  • Operands of LEA must be of same register class, this constraint is as per Intel's architecture manual.
  • Remove map entry from LEAs map if value list becomes empty.
  • Rebase.

Patch has been regressed through chrome test sweet.
No issues reported. Thanks to Hans Wennborg (hans@chromium.org) for validating it.

RKSimon added inline comments.Oct 29 2017, 6:12 AM
lib/Target/X86/X86ISelDAGToDAG.cpp
1178 ↗(On Diff #113356)

My comment still stands - try to avoid hard coded values embedded in the source - add a getMaxOperandFoldingDepth() helper.

105 ↗(On Diff #119011)
bool isLegalScale() const {
199 ↗(On Diff #119011)

Is a default argument for a setter a good idea? Especially one that is the inverse of what the setter says it is.

1537 ↗(On Diff #119011)

These AM.Scale increments are scary - better to set it with AM.Scale = 2?

lib/Target/X86/X86OptimizeLEAs.cpp
327 ↗(On Diff #119011)

Do you mean:

bool isInstrErased = !(Opr.isReg() && Opr.getParent()->getParent());
1013 ↗(On Diff #119011)

Really don't like this - write a helper instead like you did in X86ISelDAGToDAG.cpp

auto IsLegalScale = [](int S) {
  return S == 1 || S == 2 || S == 4 || S == 8;
};
1019 ↗(On Diff #119011)
return Arg1->getOperand(2).getImm() >= Arg2->getOperand(2).getImm();
1074 ↗(On Diff #119011)

DL is only used here - just use LI1.getDebugLoc() directly?

29 ↗(On Diff #106796)

Include ordering still broken

839 ↗(On Diff #106796)

Has @lsaba test been added to the patch? I couldn't see it.

test/CodeGen/X86/lea-opt-cse2.ll
3 ↗(On Diff #119011)

Why have you changed these tests?

test/CodeGen/X86/lea-opt-cse4.ll
3 ↗(On Diff #119011)

Why have you changed these tests?

jbhateja retitled this revision from [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs. to [X86] Improvement in CodeGen instruction selection for LEAs..Oct 29 2017, 8:36 AM
jbhateja edited the summary of this revision. (Show Details)
jbhateja added inline comments.Oct 29 2017, 9:55 AM
lib/Target/X86/X86ISelDAGToDAG.cpp
1178 ↗(On Diff #113356)

Helper added.

105 ↗(On Diff #119011)

Fixed.

199 ↗(On Diff #119011)

Fixed. We do not need a default argument here, both the calls to this routines is passing an explicit argument.

1537 ↗(On Diff #119011)

Increments are triggered only in aggressive folding mode and can fold upto 8 operands (which is a max legal scale). This was intentionally done, initial change was only working for AM.scale = 2 and was very restrictive. Aggressive operand folding is done only for LEAs currently and is enabled instantiating and RAII object of X86AggressiveOperandFolding class.

lib/Target/X86/X86OptimizeLEAs.cpp
327 ↗(On Diff #119011)

fixed.

1013 ↗(On Diff #119011)

Fixed

1019 ↗(On Diff #119011)

Fixed

839 ↗(On Diff #106796)

We have a similare test case for loop lea-opt-cse2.ll. We are not doing any factorization inside loops, only simplifyLEA can kick in.

839 ↗(On Diff #106796)

We have a test case for loops lea-opt-cse2.ll, so not added this.
We are not doing any factorization inside loops, only simplifyLEA can kick in.

test/CodeGen/X86/lea-opt-cse4.ll
3 ↗(On Diff #119011)

FixupLEAPass down the pipeline transforms some complex LEA ptterns to simple with add. Optimization, with changes in the patch we will have following

leal 1(%rax,%rcx,4), %eax

which after FixupLEAPass will get converted to

leal (%rax,%rcx,4), %eax
addl $1, %eax

jbhateja updated this revision to Diff 120753.Oct 29 2017, 10:12 AM
  • Rebasing
  • Review comments resolution.

@RKSimon, requested revision changes have been made as per your comments. Can you please validate.

jbhateja updated this revision to Diff 121450.Nov 3 2017, 3:11 AM

1/ Making the factorization alog iterative. This was earlier commited with

Diff : https://reviews.llvm.org/D35014?id=116144
but some how got removed in successive commits.

2/ Rebasing again. All comments are resolved.

@RKSimon, @lsaba , @jmolly , all your comments have been addressed. Kindly verify so that I can land this into trunk.

A few minor comments @lsaba @craig.topper any final comments?

lib/Target/X86/X86ISelDAGToDAG.cpp
1178 ↗(On Diff #113356)

I meant make this a class method, but if you don't want to you can leave it here as lambda

203 ↗(On Diff #121450)

Make this a const method

lib/Target/X86/X86OptimizeLEAs.cpp
108 ↗(On Diff #121450)

CopyLike

189 ↗(On Diff #106796)

Again, can we avoid the static?

@RKSimon No more comments from my side

craig.topper edited edge metadata.Nov 26 2017, 12:05 AM

I haven't been following this much so I have no comments either.

RKSimon accepted this revision.Nov 26 2017, 1:21 AM

LGTM

LGTM - with those final few minors I mentioned

jbhateja updated this revision to Diff 124698.Nov 28 2017, 11:02 PM
  • Reivew comment resolution.
  • Rebasing patch.
jbhateja updated this revision to Diff 124699.Nov 28 2017, 11:22 PM
  • Rebasing to resolve incorrect overrideing of register names in kill statements.
jbhateja added inline comments.Nov 28 2017, 11:25 PM
lib/Target/X86/X86OptimizeLEAs.cpp
189 ↗(On Diff #106796)

Its used bacause we want MemOpKey for LEA factorization to be indipendent of Scale, keeping it as static avoids recreation of dummy scale.

jbhateja updated this revision to Diff 124757.Nov 29 2017, 8:17 AM
  • Register name case changes due to rebase.
This revision was automatically updated to reflect the committed changes.
RKSimon reopened this revision.Dec 7 2017, 7:19 AM

rL319543 was reverted at rL319591 due to asan bot breakage

Please rebase. Thanks.

This revision was not accepted when it landed; it landed in state Needs Review.Mon, Oct 7, 5:06 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMon, Oct 7, 5:06 AM
Herald added a subscriber: hiraditya. · View Herald Transcript