This is an archive of the discontinued LLVM Phabricator instance.

Experiment with keeping GEPs near calls
ClosedPublic

Authored by djasper on Jan 29 2015, 10:27 AM.

Details

Summary

This is not meant as an actual code review to get this patch submitted but to have a basis for further discussion.

It is meant to experiment around a solution for http://llvm.org/PR22230

With this patch, LLVM, generates pretty nice code for the example from the bug report (see below). Obviously it is far from complete or correct.

.section TEXT,text,regular,pure_instructions
.macosx_version_min 10, 10
.globl Z1fPhP1A
.align 4, 0x90
Z1fPhP1A: ## @_Z1fPhP1A
.cfi_startproc

BB#0: ## %entry

pushq %rbp
Ltmp0:
.cfi_def_cfa_offset 16
Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp2:
.cfi_def_cfa_register %rbp
pushq %r15
pushq %r14
pushq %rbx
pushq %rax
Ltmp3:
.cfi_offset %rbx, -40
Ltmp4:
.cfi_offset %r14, -32
Ltmp5:
.cfi_offset %r15, -24
movq %rsi, %r14
movq %rdi, %rbx
incq %rbx
leaq LJTI0_0(%rip), %r15
jmp LBB0_1
.align 4, 0x90
LBB0_6: ## %for.cond.backedge

  1. in Loop: Header=BB0_1 Depth=1

incq %rbx
LBB0_1: ## %for.cond

  1. =>This Inner Loop Header: Depth=1

movzbl (%rbx), %eax
cmpq $3, %rax
ja LBB0_6

  1. BB#2: ## %for.cond
  2. in Loop: Header=BB0_1 Depth=1

movslq (%r15,%rax,4), %rax
addq %r15, %rax
jmpq *%rax
LBB0_3: ## %if.then

  1. in Loop: Header=BB0_1 Depth=1

movq %r14, %rdi
jmp LBB0_5
LBB0_4: ## %if.then4

  1. in Loop: Header=BB0_1 Depth=1

leaq 4(%r14), %rdi
jmp LBB0_5
LBB0_7: ## %if.then8

  1. in Loop: Header=BB0_1 Depth=1

leaq 8(%r14), %rdi
jmp LBB0_5
LBB0_8: ## %if.then12

  1. in Loop: Header=BB0_1 Depth=1

leaq 12(%r14), %rdi
LBB0_5: ## %for.cond.backedge

  1. in Loop: Header=BB0_1 Depth=1

callq __Z6assignPj
jmp LBB0_6
.cfi_endproc
.align 2, 0x90
L0_0_set_3 = LBB0_3-LJTI0_0
L0_0_set_4 = LBB0_4-LJTI0_0
L0_0_set_7 = LBB0_7-LJTI0_0
L0_0_set_8 = LBB0_8-LJTI0_0
LJTI0_0:
.long L0_0_set_3
.long L0_0_set_4
.long L0_0_set_7
.long L0_0_set_8

.subsections_via_symbols

Diff Detail

Event Timeline

djasper updated this revision to Diff 18977.Jan 29 2015, 10:27 AM
djasper retitled this revision from to Experiment with keeping GEPs near calls.
djasper updated this object.
djasper edited the test plan for this revision. (Show Details)
djasper added reviewers: chandlerc, rnk, qcolombet.
djasper added a subscriber: Unknown Object (MLST).
qcolombet edited edge metadata.Feb 23 2015, 11:45 AM

Hi Daniel,

I do not know if you still expect feedback from this post, as we already discuss the matter on IRC.
I add a couple of comment here for the record.

Note: Please move the commit list to llvm instead of CFE :).

Thanks,
Q.

lib/CodeGen/RegisterCoalescer.cpp
847 ↗(On Diff #18977)

This test is used to avoid rematerializing expensive operation.

849 ↗(On Diff #18977)

This check is used to ensure that we won’t introduce correctness issues.
The following code only supports trivial rematerialization. For more general rematerialization, you have to do more interfere checks.

And for the rest of us... Am I right in gleaning from this that we need to do a better job of taking advantage of addressing modes used to compute the targets of indirect calls?

djasper edited edge metadata.Feb 23 2015, 1:21 PM
djasper edited subscribers, added: Unknown Object (MLST); removed: Unknown Object (MLST).
djasper updated this revision to Diff 20538.Feb 23 2015, 1:28 PM

Current work-in-progress.

To summarize what was discussed through other channels (IRC mostly):

  • We eventually want to have a lazy code motion pass which sinks GEPs into loops if they can be folded into address mode instructions (well, if sinking does not significantly increase the cost of the operation inside the loop) and dependent on register pressure.
  • For the first attempts, it seems easier to re-use MachineLICM. Conceptually, this move loop-invariant code, so it isn't the "wrong" place and it already calculates much of the required information.

Still incomplete, but feedback is appreciated.

djasper updated this revision to Diff 20540.Feb 23 2015, 1:29 PM

Hi Daniel,

Could you upload a patch with the full context?

Thanks,
-Quentin

ab added a subscriber: ab.Mar 5 2015, 3:07 PM
djasper updated this revision to Diff 21320.Mar 5 2015, 3:24 PM

Slightly updated patch with full context.

Uploaded patch with full context. We probably want to hide this behavior behind a flag, too, and do some benchmarks/experiments.

Hi Daniel,

Thanks for your patience.

First of all, yes this new logic needs to be hidden behind a flag until it is properly tuned.
In particular, we miss a profitability model, i.e., reduce exceeded register pressure, before doing the sinking.

Now, regarding the code itself, the sinking logic, modulo the missing profitability model, looks right to me, but it does not seem correct to have it in HoistOutOfLoop :).

Cheers,
-Quentin

lib/CodeGen/MachineLICM.cpp
776

A comment here would be welcome.
I suspect you are just interested in the fact that the instruction shouldn't use or define physical registers as well as memory operand (which is part of what this function does).
Without a comment, at first glance, this is confusing because instructions in the preheader should be loop invariant!

782

The idiom is usually !MO.getReg()

789

I believe this check shouldn't be needed for the general approach.

djasper added inline comments.Mar 13 2015, 2:26 AM
lib/CodeGen/MachineLICM.cpp
776

Added comment and removed HasLoopPHIUse, which I am no longer sure is necessary.

782

Done.

789

I generally agree, but we need some sort of cost model. And for now, assuming that we'd always want to sink copies seems like a reasonable first one. Added a comment to this effect.

djasper updated this revision to Diff 21906.Mar 13 2015, 2:27 AM
  • Addressed review comments
  • Added test.
qcolombet accepted this revision.Mar 13 2015, 10:14 AM
qcolombet edited edge metadata.

Hi Daniel,

LGTM with minor nitpicks.

Thanks,
-Quentin

lib/CodeGen/MachineLICM.cpp
797

-> it must *not* have...

798

The test for HasLoopPHIUse is not strictly necessary, but it acts as an optimization for the next check I believe.
Indeed, if the candidate is used within a phi for the loop, it won’t be sunken.

This revision is now accepted and ready to land.Mar 13 2015, 10:14 AM
djasper closed this revision.Mar 15 2015, 6:59 AM

Addressed comments and submitted as r232262.

lib/CodeGen/MachineLICM.cpp
797

Fixed.

798

Re-instated.