This is an archive of the discontinued LLVM Phabricator instance.

[X86] Improvement in CodeGen instruction selection for LEAs.
ClosedPublic

Authored by jbhateja on Jul 5 2017, 8:40 AM.

Details

Reviewers

lsaba
RKSimon
craig.topper
qcolombet
jmolloy
jbhateja

Commits

rG328199ec2643: [X86] Improvement in CodeGen instruction selection for LEAs.
rG908c8b37c2be: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs.
rL319543: [X86] Improvement in CodeGen instruction selection for LEAs.
rL313343: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs.

Summary

1/ Operand folding during complex pattern matching for LEAs has been extended so that it promotes Scale to accommodate a similar operand appearing in the DAG, e.g.
             T1 = A + B
             T2 = T1 + 10
             T3 = T2 + A
For the above DAG rooted at T3, X86AddressMode will now look like
            Base = B , Index = A , Scale = 2 , Disp = 10

2/ During OptimizeLEAPass, further down the pipeline, factorization is now performed over LEAs so that, where there is an opportunity, complex LEAs (having 3 operands) can be factored out, e.g.
            leal 1(%rax,%rcx,1), %rdx
            leal 1(%rax,%rcx,2), %rcx
will be factored as follows
            leal 1(%rax,%rcx,1), %rdx
            leal (%rdx,%rcx)   , %edx

3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop.

4/ Simplify LEA converts lea (BASE,1,INDEX,0) --> add (BASE, INDEX), which offers better throughput.

PR32755 will be taken care of by this patch.

Previous patch revisions : r313343 , r314886
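
The Scale promotion described in item 1/ of the summary can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyAddressMode`, `Term`, `reg`, `imm` and `foldSum` are hypothetical names, registers are plain strings, and the sum is assumed to be already flattened into a list of terms. It is not the matcher in X86ISelDAGToDAG.cpp.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for X86AddressMode; registers are names.
struct ToyAddressMode {
  std::string Base;  // empty when unused
  std::string Index; // empty when unused
  int Scale;
  long Disp;
  ToyAddressMode() : Scale(1), Disp(0) {}
};

// One term of a flattened sum: a register name, or a constant when Reg is empty.
struct Term {
  std::string Reg;
  long Imm;
};

Term reg(const std::string &Name) { Term T; T.Reg = Name; T.Imm = 0; return T; }
Term imm(long Value) { Term T; T.Imm = Value; return T; }

static bool isLegalScale(int S) { return S == 1 || S == 2 || S == 4 || S == 8; }

// Folds Terms into Base + Index*Scale + Disp. Returns false when the sum
// does not fit a single address (too many registers or an illegal scale).
bool foldSum(const std::vector<Term> &Terms, ToyAddressMode &AM) {
  AM = ToyAddressMode();
  std::map<std::string, int> Uses; // how often each register appears
  for (const Term &T : Terms) {
    if (T.Reg.empty())
      AM.Disp += T.Imm;
    else
      ++Uses[T.Reg];
  }
  for (const auto &Use : Uses) {
    if (Use.second == 1 && AM.Base.empty()) {
      AM.Base = Use.first; // a register seen once can serve as the base
    } else if (AM.Index.empty() && isLegalScale(Use.second)) {
      AM.Index = Use.first; // a repeated register is promoted into Scale
      AM.Scale = Use.second;
    } else {
      return false;
    }
  }
  assert(isLegalScale(AM.Scale));
  return true;
}
```

For the DAG in item 1/, the flattened terms are A, B, 10 and A, and the model yields Base = B, Index = A, Scale = 2, Disp = 10.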

Diff Detail

Build Status

Buildable 10016
Build 10016: arc lint + arc unit

Event Timeline

Please add test cases for the optimization added in OptimizeLEAPass

lsaba added inline comments.Jul 6 2017, 2:06 AM

lib/Target/X86/X86OptimizeLEAs.cpp
114	This comment is only valid for the else statement, please change it to explain the different cases between Identical and Similar Disp
495	Please add a comment explaining what this function does

Changes for review comments for patch.

Reviewers,

I have posted an RFC for a wider fix at the link below.

Kindly review it and add your comments.

https://groups.google.com/forum/#!topic/llvm-dev/x2LDXpON500

Thanks.

Harbormaster completed remote builds in B8077: Diff 105755.Jul 9 2017, 6:15 AM

I missed one change in the submission. A build is running; I will upload the change once the lit tests finish. Thanks

spatel added a subscriber: spatel.Jul 9 2017, 8:21 AM

Changes for review comments, continued.

lsaba added inline comments.Jul 10 2017, 6:20 AM

lib/Target/X86/X86OptimizeLEAs.cpp
943	should it also be erased from the LEAs list?

jbhateja added inline comments.Jul 10 2017, 8:24 AM

lib/Target/X86/X86OptimizeLEAs.cpp
943	Why do you think so? LEAs is a map where Key = F(BASE, INDEX, DISP, SEGMENT) and Value = a vector of MIs (LEA instructions). This map is populated on a per-BasicBlock basis. The outer loop traverses the map entries and sorts each vector in decreasing order of Scale. The inner loop traverses the sorted vector of LEAs for a given key; the LI1 instruction will be traversed only once. The map will be deleted once we leave this function. Machine CSE, which is value-number based, has already run before this pass, so if there are multiple identical LEAs (i.e. same BASE/INDEX/SCALE/DISP/SEGMENT) in a BasicBlock, they will have been factored out before we get here.
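
The bucketing described in the reply above can be sketched as follows. This is a minimal illustration under assumptions: `BlockLEA`, `BucketKey` and `bucketByKey` are hypothetical names, registers are strings, and the SEGMENT component of the key is left out for brevity. The real pass works on MachineInstr and MemOpKey.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified LEA record for one basic block.
struct BlockLEA {
  std::string Base, Index, Dest;
  int Scale;
  long Disp;
};

// The key deliberately omits Scale, so LEAs that differ only in Scale
// land in the same bucket. (SEGMENT is omitted in this sketch.)
typedef std::tuple<std::string, std::string, long> BucketKey;

std::map<BucketKey, std::vector<BlockLEA> >
bucketByKey(const std::vector<BlockLEA> &LEAsInBlock) {
  std::map<BucketKey, std::vector<BlockLEA> > Buckets;
  for (const BlockLEA &L : LEAsInBlock) {
    assert(L.Scale == 1 || L.Scale == 2 || L.Scale == 4 || L.Scale == 8);
    Buckets[BucketKey(L.Base, L.Index, L.Disp)].push_back(L);
  }
  // Sort every bucket in decreasing order of Scale. A strict comparison is
  // used so that the comparator is a valid strict weak ordering.
  for (auto &Entry : Buckets)
    std::stable_sort(Entry.second.begin(), Entry.second.end(),
                     [](const BlockLEA &A, const BlockLEA &B) {
                       return A.Scale > B.Scale;
                     });
  return Buckets;
}
```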

lsaba added inline comments.Jul 11 2017, 5:02 AM

lib/Target/X86/X86OptimizeLEAs.cpp
943	just making sure:) by the way, can't this algorithm work cross a function's basic blocks?

Performing LEA factorization/CSE across basic blocks.

jbhateja marked an inline comment as done.Jul 17 2017, 8:05 AM

jbhateja marked 2 inline comments as done and an inline comment as not done.Jul 17 2017, 8:07 AM

ping @reviewers

lsaba added inline comments.Jul 19 2017, 8:03 AM

lib/Target/X86/X86OptimizeLEAs.cpp
418	it is unclear what this function does, can you explain?

jbhateja added inline comments.Jul 19 2017, 9:54 AM

lib/Target/X86/X86OptimizeLEAs.cpp
418	In a nutshell, we are implementing a scoped hash map, which is LEAs. Every time we enter a new scope and encounter an LEA, we first record the length of the list of MIs corresponding to the MemOpKey of the new LEA. After that we insert the new LEA at the beginning of the list, which is the value field of the hash map. When we leave a scope, we remove that scope's LEA instructions from the LEAs hash map: since we recorded the original length of the list of MIs when we entered the scope, at exit we keep removing elements from the beginning of the list until its size is the same as what was recorded at entry.
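
The scoped hash map described above can be sketched as follows. The class name `ScopedLEAMap` and the use of strings for both keys and values are assumptions made for illustration; the patch itself keys on MemOpKey and stores MachineInstr pointers. This sketch records the list length on every insert, which restores the same state at scope exit.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of a scoped map from a key to a list of LEAs.
class ScopedLEAMap {
  std::map<std::string, std::deque<std::string> > Map;
  // One frame per open scope: (key, list length just before each insert).
  std::vector<std::vector<std::pair<std::string, std::size_t> > > Scopes;

public:
  void enterScope() { Scopes.emplace_back(); }

  void insert(const std::string &Key, const std::string &LEA) {
    assert(!Scopes.empty() && "insert outside of any scope");
    std::deque<std::string> &List = Map[Key];
    Scopes.back().push_back(std::make_pair(Key, List.size()));
    List.push_front(LEA); // newest LEA goes to the front of the list
  }

  void exitScope() {
    assert(!Scopes.empty() && "unbalanced exitScope");
    // Undo this scope's inserts, newest first, by popping from the front
    // until each list is back to its recorded length.
    for (auto It = Scopes.back().rbegin(); It != Scopes.back().rend(); ++It) {
      std::deque<std::string> &List = Map[It->first];
      while (List.size() > It->second)
        List.pop_front();
      if (List.empty())
        Map.erase(It->first); // drop map entries whose list became empty
    }
    Scopes.pop_back();
  }

  // Returns the list for Key, or nullptr when no LEA is visible.
  const std::deque<std::string> *lookup(const std::string &Key) const {
    auto It = Map.find(Key);
    return It == Map.end() ? nullptr : &It->second;
  }
};
```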

jbhateja marked an inline comment as done.Jul 19 2017, 9:55 AM

lsaba added inline comments.Jul 24 2017, 5:52 AM

lib/Target/X86/X86OptimizeLEAs.cpp
419	already initialized at the beginning of the function
test/CodeGen/X86/lea-opt-csebb.ll
1 ↗	(On Diff #106796)	can you please add a test case that covers scale >1 cases
test/CodeGen/X86/umul-with-overflow.ll
8	why did this test change?

jbhateja added inline comments.Jul 24 2017, 8:18 AM

test/CodeGen/X86/lea-opt-csebb.ll
1 ↗	(On Diff #106796)	This commit, if you see, has two parts: 1/ pattern matching based on addressing mode (which is currently limited), and 2/ factoring of LEAs, which is generic. Checking in incremental changes should be fine, I guess. Generic pattern will need to be brought out of addressing mode based selection, as I described at the following link: https://groups.google.com/forum/#!topic/llvm-dev/x2LDXpON500 Please comment in the thread.
test/CodeGen/X86/umul-with-overflow.ll
8	Because I generated its output with the script utils/update_llc_test_checks.py, which adds an assertion for each instruction. I think it should be fine.

jbhateja added inline comments.Jul 24 2017, 8:20 AM

lib/Target/X86/X86OptimizeLEAs.cpp
419	Yes.

Ping @ reviewers. Can we do an incremental check-in for this?

Thanks

lsaba added inline comments.Jul 26 2017, 4:21 AM

test/CodeGen/X86/lea-opt-csebb.ll
1 ↗	(On Diff #106796)	I am not sure I understand what you mean by "Generic pattern will need to be brought out of addressing mode". As far as I understand, for the following C code:

int foo(int a, int b) {
  int x = a + 2*b + 4;
  int y = a + 4*b + 4;
  int c = x*y;
  return c;
}

the currently generated IR:

define i32 @foo(i32 %a, i32 %b) local_unnamed_addr #0 {
entry:
  %mul = shl i32 %b, 1
  %add = add i32 %a, 4
  %add1 = add i32 %add, %mul
  %mul2 = shl i32 %b, 2
  %add4 = add i32 %add, %mul2
  %mul5 = mul nsw i32 %add1, %add4
  ret i32 %mul5
}

the currently generated asm:

  leal 4(%rdi,%rsi,2), %ecx
  leal 4(%rdi,%rsi,4), %eax
  imull %ecx, %eax
  retq

this will be refactored by this optimization in this current commit (not a future commit) to:

  leal 4(%rdi,%rsi,2), %ecx
  leal (%ecx,%rsi,2), %eax
  imull %ecx, %eax
  retq

Please correct me if I'm wrong.
test/CodeGen/X86/lea-opt-cst.ll
3 ↗	(On Diff #105275)	please generate the test with the original checks before your changes and commit it first in a separate commit
test/CodeGen/X86/umul-with-overflow.ll
8	This needs to be in a separate pre commit. please commit and rebase
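
The factorization in the foo example above (two LEAs with the same base, index and displacement but different scales) comes down to simple arithmetic: the second address equals the first result plus the index times the difference in scales. A minimal sketch, assuming hypothetical names `FactorLEA`, `factorize` and `leaValue`, registers as strings, and SSA-style destinations (the first destination is never redefined in between):

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified LEA: Dest = Base + Index*Scale + Disp.
struct FactorLEA {
  std::string Base, Index;
  int Scale;
  long Disp;
  std::string Dest;
};

static bool isLegalScale(int S) { return S == 1 || S == 2 || S == 4 || S == 8; }

// Rewrites Second in terms of First.Dest when both share Base, Index and
// Disp and the difference of their scales is itself a legal scale.
// Assumes SSA form, i.e. First.Dest still holds its value at Second.
bool factorize(const FactorLEA &First, FactorLEA &Second) {
  assert(isLegalScale(First.Scale) && isLegalScale(Second.Scale));
  if (First.Base != Second.Base || First.Index != Second.Index ||
      First.Disp != Second.Disp)
    return false;
  int Delta = Second.Scale - First.Scale;
  if (!isLegalScale(Delta))
    return false;
  Second.Base = First.Dest; // reuse the already computed address
  Second.Scale = Delta;     // only the remaining multiple of Index is added
  Second.Disp = 0;          // the displacement is already part of First.Dest
  return true;
}

// Evaluates an LEA numerically; handy for checking a rewrite.
long leaValue(long Base, long Index, int Scale, long Disp) {
  return Base + Index * Scale + Disp;
}
```

With scales 2 and 4 the difference is 2, which gives the `leal (%ecx,%rsi,2), %eax` form from the example; with scales 2 and 8 the difference is 6, which is not encodable, so the sketch leaves the second LEA alone.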

RKSimon added inline comments.Jul 26 2017, 4:24 AM

lib/Target/X86/X86OptimizeLEAs.cpp
239	Can we avoid the static?
317	Comment describing the purpose of the class
340	(style) cleanup the positions of the * - check what clang-format does
340	comment
355	comment
375	(style) remove braces
test/CodeGen/X86/lea-opt-csebb.ll
1 ↗	(On Diff #106796)	Please can you commit this test file to trunk with current codegen and update the patch to show the diff
test/CodeGen/X86/lea-opt-cst.ll
3 ↗	(On Diff #105275)	Please can you commit this test file to trunk with current codegen and update the patch to show the diff
test/CodeGen/X86/umul-with-overflow.ll
8	I regenerated this recently - please rebase
utils/TableGen/DAGISelMatcherGen.cpp
308 ↗	(On Diff #105275)	This is still here

jbhateja added inline comments.Jul 26 2017, 8:33 AM

test/CodeGen/X86/lea-opt-csebb.ll
1 ↗	(On Diff #106796)	Hi Lama, by generic pattern handling I meant LEA folding into complex LEAs, which is currently restrictive. Consider the following case:

%struct.SA = type { i32 , i32 , i32 , i32 , i32};
define void @foo(%struct.SA* nocapture %ctx, i32 %n) local_unnamed_addr #0 {
entry:
  %h0 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 0
  %0 = load i32, i32* %h0, align 8
  %h3 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 3
  %h4 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 4
  %1 = load i32, i32* %h4, align 8
  %add = add i32 %0, 1
  %add4 = add i32 %add, %1
  %add5 = add i32 %add4, %1
  store i32 %add5, i32* %h3, align 4
  %add10 = add i32 %add5, %1
  %add29 = add i32 %add10, %1
  store i32 %add29, i32* %h4, align 8
  ret void
}

ASM:

foo:                                    # @foo
  .cfi_startproc
BB#0:                                   # %entry
  movl (%rdi), %eax
  movl 16(%rdi), %ecx
  leal (%rax,%rcx,2), %edx
  leal 1(%rax,%rcx,2), %eax
  movl %eax, 12(%rdi)
  leal 1(%rdx,%rcx,2), %eax
  movl %eax, 16(%rdi)

It could be further optimized to the following:

  movl (%rdi), %eax
  movl 16(%rdi), %ecx
  leal 1(%rax,%rcx,2), %edx
  movl %eax, 12(%rdi)
  leal (%rdx,%rcx,2), %eax
  movl %eax, 16(%rdi)

Folding is currently done as part of the addressing mode matcher. I feel that efficient folding can only be done as a separate MI pass; that is what I explained in the proposal (http://lists.llvm.org/pipermail/llvm-dev/2017-July/115182.html). Thanks for your example. I will add it to the test cases; it demonstrates the genericness of factorization.

lsaba added inline comments.Jul 27 2017, 12:48 AM

test/CodeGen/X86/lea-opt-csebb.ll
1 ↗	(On Diff #106796)	Hi, Thanks, I understand the need for a more generic pattern matching and I agree. This is unrelated to my comment which refers solely to the Factorize LEA optimization which needs more testing, for example covering different Scale values (like the example i provided) and testing factorizing LEAs cross Basic Blocks.

RKSimon mentioned this in rL309262: [X86] Adding test cases for LEA factorization (PR32755 / D35014).Jul 27 2017, 3:37 AM

lsaba added inline comments.Jul 31 2017, 4:50 AM

lib/Target/X86/X86OptimizeLEAs.cpp
930	This could end up in an assertion failure if LI1 is at the beginning of the BB; it needs to be handled separately, for example in this reproducer:

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-2ef2e5d.bc"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: norecurse nounwind readnone uwtable
define i32 @foo(i32 %a, i32 %b, i32 %d, i32 %y, i32 %x) local_unnamed_addr #0 {
entry:
  %mul1 = shl i32 %b, 1
  %add2 = add i32 %a, 4
  %add3 = add i32 %add2, %mul1
  %mul4 = shl i32 %b, 2
  %add6 = add i32 %add2, %mul4
  br label %for.body

for.cond.cleanup:                                 ; preds = %for.body
  ret i32 %add

for.body:                                         ; preds = %for.body, %entry
  %x.addr.015 = phi i32 [ %x, %entry ], [ %add3, %for.body ]
  %y.addr.014 = phi i32 [ %y, %entry ], [ %add6, %for.body ]
  %mul = mul nsw i32 %x.addr.015, %y.addr.014
  %add = add nsw i32 0, %mul
  %exitcond = icmp eq i32 undef, %d
  br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !1
}

attributes #0 = { norecurse nounwind readnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.ident = !{!0}

!0 = !{!"clang version 6.0.0 (cfe/trunk 309511)"}
!1 = distinct !{!1, !2}
!2 = !{!"llvm.loop.unroll.disable"}

RKSimon added inline comments.Jul 31 2017, 5:58 AM

lib/Target/X86/X86OptimizeLEAs.cpp
30	(style) Insert the include in alphabetical order (so before MachineFunctionPass.h)

1/ Changes to cover review comments.
2/ Handling for patterns involving SUBREG_TO_REG as LEA operands.
3/ Formatting changes.

RKSimon added inline comments.Aug 1 2017, 9:14 AM

lib/Target/X86/X86OptimizeLEAs.cpp
159	(style) for (unsigned i = 1, e = MI->getNumOperands(); i <e ; i++)
265	(style) for (unsigned i = 1, e = MI1->getNumOperands(); i < e; ++i)
test/CodeGen/X86/umul-with-overflow.ll
8	Still needs to be rebased - you've lost the x86_64 tests

Merge branch 'master' of https://github.com/llvm-mirror/llvm
Formatting changes

Pinging reviewers. Kindly pour your comments.
Thanks

In D35014#831745, @jbhateja wrote:

Pinging reviewers. Kindly pour your comments.
Thanks

Please address my last comment (Line 920)

In D35014#833192, @lsaba wrote:

In D35014#831745, @jbhateja wrote:

Pinging reviewers. Kindly pour your comments.
Thanks

Please address my last comment (Line 920)

The test case you provided is giving correct results with currently checked in changes.

lsaba added inline comments.Aug 7 2017, 5:17 AM

lib/Target/X86/X86OptimizeLEAs.cpp
258	need to check MO2.isReg()

jbhateja added inline comments.Aug 8 2017, 7:03 AM

lib/Target/X86/X86OptimizeLEAs.cpp
258	Yes, I shall take care of this. Kindly let me know if there are any other comments apart from this. It shall save iterations.

@ reviewers , kindly let me know if there are any more comments apart from last comment from lsaba.
Thanks.

In D35014#835240, @jbhateja wrote:

@ reviewers , kindly let me know if there are any more comments apart from last comment from lsaba.
Thanks.

Hi,
I ran the patch on several benchmarks to check performance, overall the changes look good, but there is a regression in one of the benchmarks (EEMBC/coremark-pro) caused by creating an undesired lea instruction instead of the previously created add instruction, I am working on creating a simple reproducer for the problem and would appreciate your patience.

Thanks

In D35014#835498, @lsaba wrote:

In D35014#835240, @jbhateja wrote:

@ reviewers , kindly let me know if there are any more comments apart from last comment from lsaba.
Thanks.

Hi,
I ran the patch on several benchmarks to check performance, overall the changes look good, but there is a regression in one of the benchmarks (EEMBC/coremark-pro) caused by creating an undesired lea instruction instead of the previously created add instruction, I am working on creating a simple reproducer for the problem and would appreciate your patience.

Thanks

The change in X86DAGToDAGISel::matchAddressBase is good when it allows us to get rid of extra lea/add instructions, or to replace a slow lea with a fast lea. In some cases, however, it only replaces an add instruction with a lea instruction, and since the throughput of the add instruction is higher, we would prefer to keep the add instruction. For example, for the following IR:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: norecurse nounwind uwtable
define void @foo() local_unnamed_addr #0 {
entry:
  br i1 undef, label %BB2, label %BB1
BB1:                                  ; preds = %entry
  %rem.us.1 = srem i32 undef, 65536
  br label %BB2
BB2:      ; preds = %BB1, %entry
  %s = phi i32 [ undef, %entry ], [ %rem.us.1, %BB1 ]
  %a = phi i32 [ 1, %entry ], [ 0, %BB1 ]
  %mul1 = mul nsw i32 %s, %a
  %rem1 = srem i32 %mul1, 65536
  %add1 = add nsw i32 %rem1, %a
  %conv1 = trunc i32 %add1 to i16
  store i16 %conv1, i16* undef, align 2, !tbaa !1
  %add2 = add i32 %add1, %a
  %0 = trunc i32 %add2 to i16
  %conv2 = and i16 %0, 255
  store i16 %conv2, i16* undef, align 2, !tbaa !1
  ret void
}
attributes #0 = { norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="true" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = !{!"clang version 6.0.0 (cfe/trunk 310239)"}
!1 = !{!2, !2, i64 0}
!2 = !{!"short", !3, i64 0}
!3 = !{!"omnipotent char", !4, i64 0}
!4 = !{!"Simple C/C++ TBAA"}

the originally generated code was:

.LBB0_2:                                # %for.cond11.for.inc35_crit_edge.us.unr-lcssa
   movl	%eax, %ecx
   imull	%eax, %ecx
   movl	%ecx, %edx
   sarl	$31, %edx
   shrl	$16, %edx
   addl	%ecx, %edx
   andl	$-65536, %edx           # imm = 0xFFFF0000
   subl	%edx, %ecx
   addl	%eax, %ecx
   movw	%cx, (%rax)
   addl	%eax, %ecx
   movzbl	%cl, %eax
   movw	%ax, (%rax)
   retq

while the generated code now is

movl	%eax, %ecx
imull	%eax, %ecx
movl	%ecx, %edx
sarl	$31, %edx
shrl	$16, %edx
addl	%ecx, %edx
andl	$-65536, %edx           # imm = 0xFFFF0000
subl	%edx, %ecx
leal	(%rcx,%rax), %edx
movw	%dx, (%rax)
leal	(%rcx,%rax,2), %eax
movzbl	%al, %eax
movw	%ax, (%rax)
retq

This optimization needs to be refined further to avoid such cases, since the impact can be substantial if, for example, the code is in a hot loop.

qcolombet added inline comments.Aug 9 2017, 10:58 AM

include/llvm/CodeGen/MachineInstr.h
1295	Genuine question. MRI is usually accessible via other more efficient means. Do we really need to rely on this one?

Changes to avoid creating costly complex LEAs having scale less than or equal to 2 in loops.
Strength reduction for simple LEAs with unit scale for better throughput.
Pattern matching for DAG folding has been improved to make it generic.
Incorporating other review comments.

jbhateja added inline comments.Aug 14 2017, 11:25 AM

include/llvm/CodeGen/MachineInstr.h
1295	It seems making it public will be useful as one can directly use the function which internally calls getParent() twice to get to MachineFunction which contains Reg Info.

ping @ Reviewers, I guess I have addressed all comments.

It seems like there are still correctness issues in the patch, I ran the llvm-test-suite and got a couple of runfails :
multisource_applications_alac_encode_alacconvert_encode
multisource_applications_jm_lencod_lencod

please debug those failures.

In general, please consider running execution tests on the patches to discover runtime failures.

Limiting the scope of DAG operands folding while AM based instruction selection to LEAs.
Formatting changes , rebase and lnt failure fix.

Harbormaster completed remote builds in B9546: Diff 112328.Aug 23 2017, 3:40 AM

Ping @ reviewers. I think all the comments have been resolved.
Do let me know if any other comments.

Extending aggressive AM based folding for LEAs to cover more cases.
Merge branch 'master' of https://github.com/llvm-mirror/llvm

Harbormaster completed remote builds in B9799: Diff 113356.Aug 30 2017, 10:52 PM

@lamas, @reviewers, comments have been taken care. Let me know if anything else.

In D35014#859838, @jbhateja wrote:

@lamas, @reviewers, comments have been taken care. Let me know if anything else.

There are still functionality issues with the pass, please allow some time to create a reproducer

In D35014#859838, @jbhateja wrote:

@lamas, @reviewers, comments have been taken care. Let me know if anything else.

The following ll code fails in CodeGen selection, please debug the issue:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
%struct.S1 = type { i32, i32 }
; Function Attrs: nounwind uwtable
define fastcc void @func(i32 %end) unnamed_addr #0 {
entry:
 br label %while.body
while.body:                                       ; preds = %if.end, %entry
 %a = phi i32 [ %end, %entry ], [ undef, %if.end ]
 br i1 undef, label %if.then, label %if.else
if.then:                                          ; preds = %while.body
 %dec = add nsw i32 %a, -1
 %idx1 = sext i32 %dec to i64
 %idx2 = getelementptr inbounds %struct.S1, %struct.S1* null, i64 %idx1
 %0 = bitcast %struct.S1* %idx2 to i64*
 store i64 0, i64* %0, align 4
 %1 = load [3 x float]*, [3 x float]** undef, align 8 
 %idx3 = getelementptr inbounds [3 x float], [3 x float]* %1, i64 %idx1, i64 0
 %2 = bitcast float* %idx3 to i32*
 %3 = load i32, i32* %2, align 4
 store i32 %3, i32* undef, align 4
 br label %if.end
if.else:                                          ; preds = %while.body
 br label %if.end
if.end:                                           ; preds = %if.else, %if.then
 br label %while.body
}
attributes #0 = { nounwind uwtable "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" }
!llvm.module.flags = !{!0}
!0 = !{i32 1, !"wchar_size", i32 4}

Some style comments, but @lsaba 's comments need to be dealt with first.

include/llvm/CodeGen/MachineInstr.h
1296	(style) newline before private
include/llvm/CodeGen/SelectionDAG.h
306	(style) newline before private
lib/Target/X86/X86ISelDAGToDAG.cpp
177	(style) newline
1174	clang-format? If so, commit it as an NFC change
1182	Two hard coded depths like this is weird - better to have a getMaxOperandFoldingDepth() helper?
lib/Target/X86/X86OptimizeLEAs.cpp
224	NFC change - just commit it if you want, but don't pollute a patch with it
528	NFC change
562	clang-format? If so, commit it as an NFC change
856	clang-format? If so, commit it as an NFC change
970	(style) Remove braces

Fine tuning pattern matching condition.
Formatting changes.

Harbormaster completed remote builds in B9890: Diff 113802.Sep 4 2017, 9:55 PM

3-Ops LEA are costly starting target SandyBridge , is there a limitation in the code for the targets this transformation works on? If not I think there should be.
you can check the Slow3OpsLEA feature for the full list of targets.

In D35014#861853, @lsaba wrote:

3-Ops LEA are costly starting target SandyBridge , is there a limitation in the code for the targets this transformation works on? If not I think there should be.
you can check the Slow3OpsLEA feature for the full list of targets.

Yes, this check could be added in pattern matching, where I'm avoiding creation of complex LEAs with scale less than or equal to 2.

Please let me know if anything else you see to save iteration.
Thanks

Rebasing again.
Adding a check for subtarget feature Slow3OpLEA in pattern matching.

ping @ reviewers.

@lsaba, @reviewers , waiting for your LGTM or any remaining comments on this.
Thanks

lsaba accepted this revision.Sep 13 2017, 7:13 AM

This revision is now accepted and ready to land.Sep 13 2017, 7:13 AM

Few synthetic changes.

Closed by commit rL313343: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs. (authored by jbhateja). Sep 14 2017, 10:32 PM

This revision was automatically updated to reflect the committed changes.

Reverted in rL313376 due to PR34629 and PR34634

This revision is now accepted and ready to land.Sep 16 2017, 2:39 AM

PR34629 and PR34634 need to be addressed

This revision now requires changes to proceed.Sep 16 2017, 2:42 AM

Undefining result operand of factored statement to preserve SSA nature of Machine IR.
This fixes the reported PR 34634 and PR 34629 and the reported build-bot failures.

@RKSimon, @Reviewers, the revision was in the accepted state earlier, and the issues reported after the commit to trunk have been fixed. Please do let me know if another acceptance is needed to land this again.

Updating tests for reported PRs for initial patch.

jmolloy requested changes to this revision.Sep 18 2017, 5:04 AM

jmolloy added a subscriber: jmolloy.

jmolloy added inline comments.

lib/Target/X86/X86OptimizeLEAs.cpp
1029	This can cause recursion deep enough to cause stack overflows. Please could you refactor this to not use direct recursion? The domtree may be hundreds of nodes deep in degenerate cases.
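
The refactoring requested above, replacing direct recursion over the dominator tree with an explicit worklist, can be sketched as follows. `ToyDomNode` and `preorder` are hypothetical names and the tree is a flat array of nodes with child indices; the actual patch walks LLVM's MachineDominatorTree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dominator-tree node: children are indices into a flat array.
struct ToyDomNode {
  std::vector<std::size_t> Children;
};

// Visits nodes in preorder (a node before the nodes it dominates) using an
// explicit stack, so the depth of the tree does not consume call stack.
std::vector<std::size_t> preorder(const std::vector<ToyDomNode> &Tree,
                                  std::size_t Root) {
  assert(Root < Tree.size());
  std::vector<std::size_t> Order;
  std::vector<std::size_t> Worklist;
  Worklist.push_back(Root);
  while (!Worklist.empty()) {
    std::size_t N = Worklist.back();
    Worklist.pop_back();
    Order.push_back(N);
    // Push children in reverse so the first child is visited first.
    for (auto It = Tree[N].Children.rbegin(); It != Tree[N].Children.rend();
         ++It)
      Worklist.push_back(*It);
  }
  return Order;
}
```

The heap-allocated worklist grows with the tree instead of the call stack, which is what avoids the stack overflow in degenerate, very deep trees.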

This revision now requires changes to proceed.Sep 18 2017, 5:04 AM

Merge branch 'master' of https://github.com/llvm-mirror/llvm
Making Factorization algorithm iterative.

@reviewers, required revision change are through, let me know if this can land back.

@jmolloy , @RKSimon , this patch has been reviewed and, due to a regression, was opened again for review. The required changes have been made; can this land in trunk now if there are no more observations from any reviewers?

Thanks

@reviewers, if no more comment I shall be landing this into trunk since required revision changes post acceptance are through.

Operands of factored LEA must belong to same register class as per Intel's Architecture Manual.
Some code reorganization + rebase.

D35014 : Review comments resolution
Removing 2 tests, pulled their latest renamed versions from trunk.
[X86] : Factorize LEA, handling for patterns involing SUBREG_TO_REG as LEA operands.
Few more changes for LEA factorization.
Updating test lea-opt-cse3.ll
Formatting changes.
Formatting changes
Changes to avoid creating costly LEAs in loops, strength reduction for simple LEAs with unit scale
Updating test.
[X86] Limiting the scope of DAG operands folding while AM based instruction selection to LEAs.
Merge from trunk.
Extending aggressive AM based folding for LEAs to cover more cases.
Updating test post rebase.
Formatting changes + fine tuning pattern matching condition.
Adding a check for subtarget feature Slow3OpLEA in pattern matching.
Few synthetic changes.
Undefining result operand of factored statement to preserve SSA nature of Machine IR.
Merge branch 'master' of https://github.com/llvm-mirror/llvm
Merge branch 'master' of https://github.com/llvm-mirror/llvm
Updating tests for reported PRs for initial patch.
Merge branch 'master' of https://github.com/llvm-mirror/llvm
Pull from trunk.
Operands of LEAs must be of same register class.
Revert "Operands of LEAs must be of same register class."

Operands of LEA must be of same register class, this constraint is as per Intel's architecture manual.
Remove map entry from LEAs map if value list becomes empty.
Rebase.

The patch has been regression-tested through the Chrome test suite.
No issues reported. Thanks to Hans Wennborg (hans@chromium.org) for validating it.

RKSimon added inline comments.Oct 29 2017, 6:12 AM

lib/Target/X86/X86ISelDAGToDAG.cpp
105	bool isLegalScale() const {
199	Is a default argument for a setter a good idea? Especially one that is the inverse of what the setter says it is.
1182	My comment still stands - try to avoid hard coded values embedded in the source - add a getMaxOperandFoldingDepth() helper.
1484	These AM.Scale increments are scary - better to set it with AM.Scale = 2?
lib/Target/X86/X86OptimizeLEAs.cpp
30	Include ordering still broken
308	Do you mean: bool isInstrErased = !(Opr.isReg() && Opr.getParent()->getParent());
930	Has @lsaba test been added to the patch? I couldn't see it.
976	Really don't like this - write a helper instead like you did in X86ISelDAGToDAG.cpp: auto IsLegalScale = [](int S) { return S == 1 || S == 2 || S == 4 || S == 8; };
982	return Arg1->getOperand(2).getImm() >= Arg2->getOperand(2).getImm();
1037	DL is only used here - just use LI1.getDebugLoc() directly?
test/CodeGen/X86/lea-opt-cse2.ll
3	Why have you changed these tests?
test/CodeGen/X86/lea-opt-cse4.ll
3	Why have you changed these tests?

jbhateja retitled this revision from [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs. to [X86] Improvement in CodeGen instruction selection for LEAs..Oct 29 2017, 8:36 AM

jbhateja edited the summary of this revision.

jbhateja added inline comments.Oct 29 2017, 9:55 AM

lib/Target/X86/X86ISelDAGToDAG.cpp
105	Fixed.
199	Fixed. We do not need a default argument here, both the calls to this routines is passing an explicit argument.
1182	Helper added.
1484	Increments are triggered only in aggressive folding mode and can fold up to 8 operands (8 being the maximum legal scale). This was done intentionally: the initial change only worked for AM.Scale = 2 and was very restrictive. Aggressive operand folding is currently done only for LEAs and is enabled by instantiating an RAII object of the X86AggressiveOperandFolding class.
lib/Target/X86/X86OptimizeLEAs.cpp
308	fixed.
930	We have a similar test case for loops, lea-opt-cse2.ll. We are not doing any factorization inside loops; only simplifyLEA can kick in.
930	We have a test case for loops lea-opt-cse2.ll, so not added this. We are not doing any factorization inside loops, only simplifyLEA can kick in.
976	Fixed
982	Fixed
test/CodeGen/X86/lea-opt-cse4.ll
3	FixupLEAPass, further down the pipeline, transforms some complex LEA patterns into a simple LEA plus an add. With the changes in this patch we will have the following: leal 1(%rax,%rcx,4), %eax, which after FixupLEAPass will get converted to leal (%rax,%rcx,4), %eax followed by addl $1, %eax
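
The reply on line 1484 above mentions enabling aggressive folding through an RAII object. The general shape of such a guard can be sketched as follows; `ToySelector`, `AggressiveFoldingGuard` and `matchLEAWithGuard` are hypothetical names, and the sketch does not reproduce the X86AggressiveOperandFolding class from the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the instruction selector state.
struct ToySelector {
  bool AggressiveFolding;
  ToySelector() : AggressiveFolding(false) {}
};

// RAII guard: turns aggressive folding on for its lifetime and restores
// the previous setting when it goes out of scope.
class AggressiveFoldingGuard {
  ToySelector &Sel;
  bool Saved;

public:
  explicit AggressiveFoldingGuard(ToySelector &S)
      : Sel(S), Saved(S.AggressiveFolding) {
    Sel.AggressiveFolding = true;
  }
  ~AggressiveFoldingGuard() { Sel.AggressiveFolding = Saved; }

  AggressiveFoldingGuard(const AggressiveFoldingGuard &) = delete;
  AggressiveFoldingGuard &operator=(const AggressiveFoldingGuard &) = delete;
};

// Models an LEA matching routine: folding is aggressive only inside it.
bool matchLEAWithGuard(ToySelector &Sel) {
  AggressiveFoldingGuard Guard(Sel);
  assert(Sel.AggressiveFolding && "guard should have enabled folding");
  return Sel.AggressiveFolding; // matching work would happen here
}
```

Because the destructor restores the saved value, every exit path from the matching routine leaves the selector in its previous mode.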

Rebasing
Review comments resolution.

@RKSimon, requested revision changes have been made as per your comments. Can you please validate.

1/ Making the factorization algorithm iterative. This was committed earlier with

Diff : https://reviews.llvm.org/D35014?id=116144
but somehow got removed in successive commits.

2/ Rebasing again. All comments are resolved.

@RKSimon, @lsaba , @jmolloy , all your comments have been addressed. Kindly verify so that I can land this into trunk.

A few minor comments @lsaba @craig.topper any final comments?

lib/Target/X86/X86ISelDAGToDAG.cpp
203	Make this a const method
1182	I meant make this a class method, but if you don't want to you can leave it here as lambda
lib/Target/X86/X86OptimizeLEAs.cpp
93	CopyLike
239	Again, can we avoid the static?

@RKSimon No more comments from my side

I haven't been following this much so I have no comments either.

LGTM

LGTM - with those final few minors I mentioned

Review comment resolution.
Rebasing patch.

Rebasing to resolve incorrect overriding of register names in kill statements.

jbhateja added inline comments.Nov 28 2017, 11:25 PM

lib/Target/X86/X86OptimizeLEAs.cpp
239	It's used because we want the MemOpKey for LEA factorization to be independent of Scale; keeping it static avoids recreating the dummy scale.

Closed by commit rL319543: [X86] Improvement in CodeGen instruction selection for LEAs. (authored by jbhateja). Dec 1 2017, 6:08 AM

This revision was automatically updated to reflect the committed changes.

rL319543 was reverted at rL319591 due to asan bot breakage

Please rebase. Thanks.

This revision was not accepted when it landed; it landed in state Needs Review.Oct 7 2019, 5:06 AM

Closed by commit rG328199ec2643: [X86] Improvement in CodeGen instruction selection for LEAs. (authored by jbhateja).

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Oct 7 2019, 5:06 AM

Herald added a subscriber: hiraditya.

Revision Contents

Path	Size
include/llvm/CodeGen/MachineInstr.h	3 lines
include/llvm/CodeGen/SelectionDAG.h	3 lines
lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp	11 lines
lib/Target/X86/X86ISelDAGToDAG.cpp	85 lines
lib/Target/X86/X86OptimizeLEAs.cpp	404 lines
test/CodeGen/X86/GlobalISel/ (file names missing from the archive)	2 lines, 34 lines, 2 lines, 14 lines, 42 lines, 40 lines, 76 lines, 12 lines, 15 lines, 9 lines
test/CodeGen/X86/mul-constant-result.ll	14 lines
test/CodeGen/X86/umul-with-overflow.ll	16 lines
test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll	6 lines

Commit	Tree	Parents	Author	Summary	Date
c34762454f98	35f40c8f47b5	c9b297466515	Jatin Bhateja	Adding a check for subtarget feature Slow3OpLEA in pattern matching.	Sep 7 2017, 3:42 PM
c9b297466515	0c8f6647ff43	391227b6858b	Jatin Bhateja	Formatting changes + fine tuning pattern matching condition.	Sep 4 2017, 9:50 PM
391227b6858b	65ee67753f6b	1c8f0ee986a6	Jatin Bhateja	Updating test post rebase.	Sep 2 2017, 7:06 AM
1c8f0ee986a6	3a7bed9043a4	8d993e3c3216	Jatin Bhateja	Extending aggressive AM based folding for LEAs to cover more cases.	Aug 30 2017, 10:46 PM
8d993e3c3216	99b92448bb7e	13477a59cb3b	Jatin Bhateja	Merge from trunk.	Aug 28 2017, 3:43 AM
13477a59cb3b	5047f682e847	e9d2700f9f41	Jatin Bhateja	[X86] Limiting the scope of DAG operands folding while AM based instruction… (Show More…)	Aug 22 2017, 8:07 PM
e9d2700f9f41	4162b0823d03	81de1d9cd427	Jatin Bhateja	Updating test.	Aug 14 2017, 11:05 AM
81de1d9cd427	393f459cdc41	e8366431704b	Jatin Bhateja	Changes to avoid creating costly LEAs in loops, strength reduction for simple… (Show More…)	Aug 14 2017, 10:50 AM
e8366431704b	958c4793227c	1c7301883945	Jatin Bhateja	Formatting changes	Aug 1 2017, 10:00 AM
1c7301883945	4588884d4be9	ff95474b94a2	Jatin Bhateja	Formatting changes.	Aug 1 2017, 7:35 AM
ff95474b94a2	1e4aec08e877	9fac37d47267	Jatin Bhateja	Updating test lea-opt-cse3.ll	Jul 31 2017, 8:04 PM
9fac37d47267	501bf8fd9f91	0980f6e0c80c	Jatin Bhateja	Few more changes for LEA factorization.	Jul 31 2017, 7:55 PM
0980f6e0c80c	94ea59a55c19	814bae033495	Jatin Bhateja	[X86] : Factorize LEA, handling for patterns involving SUBREG_TO_REG as LEA… (Show More…)	Jul 31 2017, 8:01 AM
814bae033495	6d54239de489	c798335767d8	Jatin Bhateja	Removing 2 tests, pulled their latest renamed versions from trunk.	Jul 27 2017, 10:07 PM
c798335767d8	624ca34fcdfc	fdb37c007a66	Jatin Bhateja	D35014 : Review comments resolution	Jul 27 2017, 10:00 PM
fdb37c007a66	bbb738419f00	87b0093fce1f	Jatin Bhateja	[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs. (Show More…)	Jul 5 2017, 3:28 AM
87b0093fce1f	c5e8e0eb6886	9f206fb20b8f	Jatin Bhateja	[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs 1/… (Show More…)	Jul 5 2017, 3:22 AM

Diff 114326

include/llvm/CodeGen/MachineInstr.h

Show First 20 Lines • Show All 1,283 Lines • ▼ Show 20 Lines	if (MO.isReg() && MO.isTied()) {
getOperand(findTiedOperandIdx(OpIdx)).TiedTo = 0;		getOperand(findTiedOperandIdx(OpIdx)).TiedTo = 0;
MO.TiedTo = 0;		MO.TiedTo = 0;
}		}
}		}

/// Add all implicit def and use operands to this instruction.		/// Add all implicit def and use operands to this instruction.
void addImplicitDefUseOperands(MachineFunction &MF);		void addImplicitDefUseOperands(MachineFunction &MF);

private:
/// If this instruction is embedded into a MachineFunction, return the		/// If this instruction is embedded into a MachineFunction, return the
/// MachineRegisterInfo object for the current function, otherwise		/// MachineRegisterInfo object for the current function, otherwise
/// return null.		/// return null.
MachineRegisterInfo *getRegInfo();		MachineRegisterInfo *getRegInfo();
		qcolombet: Genuine question. MRI is usually accessible via other more efficient means. Do we really need to rely on this one?
		jbhateja (author): It seems making it public will be useful, as one can directly use the function, which internally calls getParent() twice to get to the MachineFunction that contains the RegInfo.

		RKSimon: (style) newline before private
		private:

/// Unlink all of the register operands in this instruction from their		/// Unlink all of the register operands in this instruction from their
/// respective use lists. This requires that the operands already be on their		/// respective use lists. This requires that the operands already be on their
/// use lists.		/// use lists.
void RemoveRegOperandsFromUseLists(MachineRegisterInfo&);		void RemoveRegOperandsFromUseLists(MachineRegisterInfo&);

/// Add all of the register operands in this instruction from their		/// Add all of the register operands in this instruction from their
/// respective use lists. This requires that the operands not be on their		/// respective use lists. This requires that the operands not be on their
/// use lists yet.		/// use lists yet.
▲ Show 20 Lines • Show All 48 Lines • Show Last 20 Lines

include/llvm/CodeGen/SelectionDAG.h

Show First 20 Lines • Show All 294 Lines • ▼ Show 20 Lines	public:

/// When true, additional steps are taken to		/// When true, additional steps are taken to
/// ensure that getConstant() and similar functions return DAG nodes that		/// ensure that getConstant() and similar functions return DAG nodes that
/// have legal types. This is important after type legalization since		/// have legal types. This is important after type legalization since
/// any illegally typed nodes generated after this point will not experience		/// any illegally typed nodes generated after this point will not experience
/// type legalization.		/// type legalization.
bool NewNodesMustHaveLegalTypes = false;		bool NewNodesMustHaveLegalTypes = false;

		/// Set to true for DAG of BasicBlock contained inside a loop.
		bool IsDAGPartOfLoop = false;

private:		private:
		RKSimon: (style) newline before private
/// DAGUpdateListener is a friend so it can manipulate the listener stack.		/// DAGUpdateListener is a friend so it can manipulate the listener stack.
friend struct DAGUpdateListener;		friend struct DAGUpdateListener;

/// Linked list of registered DAGUpdateListener instances.		/// Linked list of registered DAGUpdateListener instances.
/// This stack is maintained by DAGUpdateListener RAII.		/// This stack is maintained by DAGUpdateListener RAII.
DAGUpdateListener *UpdateListeners = nullptr;		DAGUpdateListener *UpdateListeners = nullptr;

/// Implementation of setSubgraphColor.		/// Implementation of setSubgraphColor.
▲ Show 20 Lines • Show All 1,255 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp

Show All 20 Lines
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/BranchProbabilityInfo.h"		#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/Analysis/CFG.h"		#include "llvm/Analysis/CFG.h"
		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/OptimizationDiagnosticInfo.h"		#include "llvm/Analysis/OptimizationDiagnosticInfo.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/CodeGen/FastISel.h"		#include "llvm/CodeGen/FastISel.h"
#include "llvm/CodeGen/FunctionLoweringInfo.h"		#include "llvm/CodeGen/FunctionLoweringInfo.h"
#include "llvm/CodeGen/GCMetadata.h"		#include "llvm/CodeGen/GCMetadata.h"
#include "llvm/CodeGen/ISDOpcodes.h"		#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFrameInfo.h"		#include "llvm/CodeGen/MachineFrameInfo.h"
▲ Show 20 Lines • Show All 283 Lines • ▼ Show 20 Lines	SelectionDAGISel::~SelectionDAGISel() {
delete CurDAG;		delete CurDAG;
delete FuncInfo;		delete FuncInfo;
}		}

void SelectionDAGISel::getAnalysisUsage(AnalysisUsage &AU) const {		void SelectionDAGISel::getAnalysisUsage(AnalysisUsage &AU) const {
if (OptLevel != CodeGenOpt::None)		if (OptLevel != CodeGenOpt::None)
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
AU.addRequired<GCModuleInfo>();		AU.addRequired<GCModuleInfo>();
		if (OptLevel != CodeGenOpt::None)
		AU.addRequired<LoopInfoWrapperPass>();
AU.addRequired<StackProtector>();		AU.addRequired<StackProtector>();
AU.addPreserved<StackProtector>();		AU.addPreserved<StackProtector>();
AU.addPreserved<GCModuleInfo>();		AU.addPreserved<GCModuleInfo>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
if (UseMBPI && OptLevel != CodeGenOpt::None)		if (UseMBPI && OptLevel != CodeGenOpt::None)
AU.addRequired<BranchProbabilityInfoWrapperPass>();		AU.addRequired<BranchProbabilityInfoWrapperPass>();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}
▲ Show 20 Lines • Show All 1,078 Lines • ▼ Show 20 Lines	else
FastIS->setLastLocalValue(nullptr);		FastIS->setLastLocalValue(nullptr);
}		}
createSwiftErrorEntriesInEntryBlock(FuncInfo, FastIS, TLI, TII, SDB);		createSwiftErrorEntriesInEntryBlock(FuncInfo, FastIS, TLI, TII, SDB);

processDbgDeclares(FuncInfo);		processDbgDeclares(FuncInfo);

// Iterate over all basic blocks in the function.		// Iterate over all basic blocks in the function.
for (const BasicBlock *LLVMBB : RPOT) {		for (const BasicBlock *LLVMBB : RPOT) {
		CurDAG->IsDAGPartOfLoop = false;
if (OptLevel != CodeGenOpt::None) {		if (OptLevel != CodeGenOpt::None) {
bool AllPredsVisited = true;		bool AllPredsVisited = true;
for (const_pred_iterator PI = pred_begin(LLVMBB), PE = pred_end(LLVMBB);		for (const_pred_iterator PI = pred_begin(LLVMBB), PE = pred_end(LLVMBB);
PI != PE; ++PI) {		PI != PE; ++PI) {
if (!FuncInfo->VisitedBBs.count(*PI)) {		if (!FuncInfo->VisitedBBs.count(*PI)) {
AllPredsVisited = false;		AllPredsVisited = false;
break;		break;
}		}
▲ Show 20 Lines • Show All 161 Lines • ▼ Show 20 Lines	for (const BasicBlock *LLVMBB : RPOT) {

if (getAnalysis<StackProtector>().shouldEmitSDCheck(*LLVMBB)) {		if (getAnalysis<StackProtector>().shouldEmitSDCheck(*LLVMBB)) {
bool FunctionBasedInstrumentation =		bool FunctionBasedInstrumentation =
TLI->getSSPStackGuardCheck(*Fn.getParent());		TLI->getSSPStackGuardCheck(*Fn.getParent());
SDB->SPDescriptor.initialize(LLVMBB, FuncInfo->MBBMap[LLVMBB],		SDB->SPDescriptor.initialize(LLVMBB, FuncInfo->MBBMap[LLVMBB],
FunctionBasedInstrumentation);		FunctionBasedInstrumentation);
}		}

		if (OptLevel != CodeGenOpt::None) {
		auto &LIWP = getAnalysis<LoopInfoWrapperPass>();
		LoopInfo &LI = LIWP.getLoopInfo();
		if (LI.getLoopFor(LLVMBB))
		CurDAG->IsDAGPartOfLoop = true;
		}

if (Begin != BI)		if (Begin != BI)
++NumDAGBlocks;		++NumDAGBlocks;
else		else
++NumFastIselBlocks;		++NumFastIselBlocks;

if (Begin != BI) {		if (Begin != BI) {
// Run SelectionDAG instruction selection on the remainder of the block		// Run SelectionDAG instruction selection on the remainder of the block
// not handled by FastISel. If FastISel is not run, this is the entire		// not handled by FastISel. If FastISel is not run, this is the entire
▲ Show 20 Lines • Show All 2,152 Lines • Show Last 20 Lines

lib/Target/X86/X86ISelDAGToDAG.cpp

Show First 20 Lines • Show All 82 Lines • ▼ Show 20 Lines	bool hasSymbolicDisplacement() const {
MCSym != nullptr || JT != -1 || BlockAddr != nullptr;		MCSym != nullptr || JT != -1 || BlockAddr != nullptr;
}		}

bool hasBaseOrIndexReg() const {		bool hasBaseOrIndexReg() const {
return BaseType == FrameIndexBase ||		return BaseType == FrameIndexBase ||
IndexReg.getNode() != nullptr || Base_Reg.getNode() != nullptr;		IndexReg.getNode() != nullptr || Base_Reg.getNode() != nullptr;
}		}

		bool hasComplexAddressingMode() const {
		return Disp && IndexReg.getNode() != nullptr &&
		Base_Reg.getNode() != nullptr;
		}

/// Return true if this addressing mode is already RIP-relative.		/// Return true if this addressing mode is already RIP-relative.
bool isRIPRelative() const {		bool isRIPRelative() const {
if (BaseType != RegBase) return false;		if (BaseType != RegBase) return false;
if (RegisterSDNode *RegNode =		if (RegisterSDNode *RegNode =
dyn_cast_or_null<RegisterSDNode>(Base_Reg.getNode()))		dyn_cast_or_null<RegisterSDNode>(Base_Reg.getNode()))
return RegNode->getReg() == X86::RIP;		return RegNode->getReg() == X86::RIP;
return false;		return false;
}		}

		bool isLegalScale() {
		RKSimon: bool isLegalScale() const {
		jbhateja (author): Fixed.
		return (Scale == 1 || Scale == 2 || Scale == 4 || Scale == 8);
		}

void setBaseReg(SDValue Reg) {		void setBaseReg(SDValue Reg) {
BaseType = RegBase;		BaseType = RegBase;
Base_Reg = Reg;		Base_Reg = Reg;
}		}

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)		#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
void dump() {		void dump() {
dbgs() << "X86ISelAddressMode " << this << '\n';		dbgs() << "X86ISelAddressMode " << this << '\n';
▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	class X86DAGToDAGISel final : public SelectionDAGISel {

/// If true, selector should try to optimize for code size instead of		/// If true, selector should try to optimize for code size instead of
/// performance.		/// performance.
bool OptForSize;		bool OptForSize;

/// If true, selector should try to optimize for minimum code size.		/// If true, selector should try to optimize for minimum code size.
bool OptForMinSize;		bool OptForMinSize;

		/// If true, selector should try to aggressively fold operands into AM.
		bool OptForAggressingFolding;

public:		public:
		RKSimon: (style) newline
explicit X86DAGToDAGISel(X86TargetMachine &tm, CodeGenOpt::Level OptLevel)		explicit X86DAGToDAGISel(X86TargetMachine &tm, CodeGenOpt::Level OptLevel)
: SelectionDAGISel(tm, OptLevel), OptForSize(false),		: SelectionDAGISel(tm, OptLevel), OptForSize(false),
OptForMinSize(false) {}		OptForMinSize(false), OptForAggressingFolding(false) {}

StringRef getPassName() const override {		StringRef getPassName() const override {
return "X86 DAG->DAG Instruction Selection";		return "X86 DAG->DAG Instruction Selection";
}		}

bool runOnMachineFunction(MachineFunction &MF) override {		bool runOnMachineFunction(MachineFunction &MF) override {
// Reset the subtarget each time through.		// Reset the subtarget each time through.
Subtarget = &MF.getSubtarget<X86Subtarget>();		Subtarget = &MF.getSubtarget<X86Subtarget>();
SelectionDAGISel::runOnMachineFunction(MF);		SelectionDAGISel::runOnMachineFunction(MF);
return true;		return true;
}		}

void EmitFunctionEntryCode() override;		void EmitFunctionEntryCode() override;

bool IsProfitableToFold(SDValue N, SDNode *U, SDNode *Root) const override;		bool IsProfitableToFold(SDValue N, SDNode *U, SDNode *Root) const override;

void PreprocessISelDAG() override;		void PreprocessISelDAG() override;

		void setAggressiveOperandFolding(bool val = false) {
		RKSimon: Is a default argument for a setter a good idea? Especially one that is the inverse of what the setter says it is.
		jbhateja (author): Fixed. We do not need a default argument here; both calls to this routine pass an explicit argument.
		OptForAggressingFolding = val;
		}

		bool getAggressiveOperandFolding() { return OptForAggressingFolding; }
		RKSimon: Make this a const method

// Include the pieces autogenerated from the target description.		// Include the pieces autogenerated from the target description.
#include "X86GenDAGISel.inc"		#include "X86GenDAGISel.inc"

private:		private:
void Select(SDNode *N) override;		void Select(SDNode *N) override;

bool foldOffsetIntoAddress(uint64_t Offset, X86ISelAddressMode &AM);		bool foldOffsetIntoAddress(uint64_t Offset, X86ISelAddressMode &AM);
bool matchLoadInAddress(LoadSDNode *N, X86ISelAddressMode &AM);		bool matchLoadInAddress(LoadSDNode *N, X86ISelAddressMode &AM);
bool matchWrapper(SDValue N, X86ISelAddressMode &AM);		bool matchWrapper(SDValue N, X86ISelAddressMode &AM);
bool matchAddress(SDValue N, X86ISelAddressMode &AM);		bool matchAddress(SDValue N, X86ISelAddressMode &AM);
bool matchAdd(SDValue N, X86ISelAddressMode &AM, unsigned Depth);		bool matchAdd(SDValue N, X86ISelAddressMode &AM, unsigned Depth);
bool matchAddressRecursively(SDValue N, X86ISelAddressMode &AM,		bool matchAddressRecursively(SDValue N, X86ISelAddressMode &AM,
unsigned Depth);		unsigned Depth);
		bool matchAddressLEA(SDValue N, X86ISelAddressMode &AM);
bool matchAddressBase(SDValue N, X86ISelAddressMode &AM);		bool matchAddressBase(SDValue N, X86ISelAddressMode &AM);
bool selectAddr(SDNode *Parent, SDValue N, SDValue &Base,		bool selectAddr(SDNode *Parent, SDValue N, SDValue &Base,
SDValue &Scale, SDValue &Index, SDValue &Disp,		SDValue &Scale, SDValue &Index, SDValue &Disp,
SDValue &Segment);		SDValue &Segment);
bool selectVectorAddr(SDNode *Parent, SDValue N, SDValue &Base,		bool selectVectorAddr(SDNode *Parent, SDValue N, SDValue &Base,
SDValue &Scale, SDValue &Index, SDValue &Disp,		SDValue &Scale, SDValue &Index, SDValue &Disp,
SDValue &Segment);		SDValue &Segment);
template <class GatherScatterSDNode>		template <class GatherScatterSDNode>
▲ Show 20 Lines • Show All 210 Lines • ▼ Show 20 Lines	bool useNonTemporalLoad(LoadSDNode *N) const {
return Subtarget->hasAVX2();		return Subtarget->hasAVX2();
case 64:		case 64:
return Subtarget->hasAVX512();		return Subtarget->hasAVX512();
}		}
}		}

bool foldLoadStoreIntoMemOperand(SDNode *Node);		bool foldLoadStoreIntoMemOperand(SDNode *Node);
};		};

		class X86AggressiveOperandFolding {
		public:
		explicit X86AggressiveOperandFolding(X86DAGToDAGISel &ISel, bool val)
		: Selector(&ISel) {
		Selector->setAggressiveOperandFolding(val);
		}
		~X86AggressiveOperandFolding() {
		Selector->setAggressiveOperandFolding(false);
		}

		private:
		X86DAGToDAGISel *Selector;
		};
}		}
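The X86AggressiveOperandFolding class above is a scoped (RAII) guard: it sets the selector's flag in its constructor and clears it in its destructor, so the flag cannot leak past the matching call. A minimal standalone sketch of the same pattern follows. `Selector`, `AggressiveFoldingScope`, and `maxDepthDuringMatch` are hypothetical stand-ins written for illustration, not LLVM classes; only the 8-versus-5 depth limit is taken from the diff.

```cpp
#include <cassert>

// Hypothetical stand-in for X86DAGToDAGISel, reduced to the single flag.
class Selector {
  bool AggressiveFolding = false;

public:
  void setAggressiveOperandFolding(bool Val) { AggressiveFolding = Val; }
  bool getAggressiveOperandFolding() const { return AggressiveFolding; }
};

// Scoped guard: sets the flag on construction, clears it on destruction.
class AggressiveFoldingScope {
  Selector &Sel;

public:
  AggressiveFoldingScope(Selector &S, bool Val) : Sel(S) {
    Sel.setAggressiveOperandFolding(Val);
  }
  ~AggressiveFoldingScope() { Sel.setAggressiveOperandFolding(false); }

  // Copying a guard would clear the flag twice, so forbid it.
  AggressiveFoldingScope(const AggressiveFoldingScope &) = delete;
  AggressiveFoldingScope &operator=(const AggressiveFoldingScope &) = delete;
};

// Returns the recursion limit that is in effect while the guard is alive.
// Folding is only enabled outside loops, as in matchAddressLEA in the diff.
unsigned maxDepthDuringMatch(Selector &Sel, bool InLoop) {
  AggressiveFoldingScope Enable(Sel, !InLoop);
  return Sel.getAggressiveOperandFolding() ? 8 : 5;
}
```

Deleting the copy operations is a choice made in this sketch; the class in the diff does not do so.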


bool		bool
X86DAGToDAGISel::IsProfitableToFold(SDValue N, SDNode *U, SDNode *Root) const {		X86DAGToDAGISel::IsProfitableToFold(SDValue N, SDNode *U, SDNode *Root) const {
if (OptLevel == CodeGenOpt::None) return false;		if (OptLevel == CodeGenOpt::None) return false;

if (!N.hasOneUse())		if (!N.hasOneUse())
▲ Show 20 Lines • Show All 697 Lines • ▼ Show 20 Lines	static bool foldMaskAndShiftToScale(SelectionDAG &DAG, SDValue N,
insertDAGNode(DAG, N, NewSHLAmt);		insertDAGNode(DAG, N, NewSHLAmt);
insertDAGNode(DAG, N, NewSHL);		insertDAGNode(DAG, N, NewSHL);
DAG.ReplaceAllUsesWith(N, NewSHL);		DAG.ReplaceAllUsesWith(N, NewSHL);

AM.Scale = 1 << AMShiftAmt;		AM.Scale = 1 << AMShiftAmt;
AM.IndexReg = NewSRL;		AM.IndexReg = NewSRL;
return false;		return false;
}		}

bool X86DAGToDAGISel::matchAddressRecursively(SDValue N, X86ISelAddressMode &AM,		bool X86DAGToDAGISel::matchAddressRecursively(SDValue N, X86ISelAddressMode &AM,
unsigned Depth) {		unsigned Depth) {
		RKSimon: clang-format? If so, commit it as an NFC change
SDLoc dl(N);		SDLoc dl(N);
DEBUG({		DEBUG({
dbgs() << "MatchAddress: ";		dbgs() << "MatchAddress: ";
AM.dump();		AM.dump();
});		});
// Limit recursion.
if (Depth > 5)		// Limit recursion. For aggressive operand folding recurse
		// till depth 8 which is the maximum legal scale value.
		RKSimon: Two hard coded depths like this is weird - better to have a getMaxOperandFoldingDepth() helper?
		RKSimon: My comment still stands - try to avoid hard coded values embedded in the source - add a getMaxOperandFoldingDepth() helper.
		jbhateja (author): Helper added.
		RKSimon: I meant make this a class method, but if you don't want to you can leave it here as a lambda
		unsigned MaxDepth = getAggressiveOperandFolding() ? 8 : 5;
		if (Depth > MaxDepth)
return matchAddressBase(N, AM);		return matchAddressBase(N, AM);

// If this is already a %rip relative address, we can only merge immediates		// If this is already a %rip relative address, we can only merge immediates
// into it. Instead of handling this in every case, we handle it here.		// into it. Instead of handling this in every case, we handle it here.
// RIP relative addressing: %rip + 32-bit displacement!		// RIP relative addressing: %rip + 32-bit displacement!
if (AM.isRIPRelative()) {		if (AM.isRIPRelative()) {
// FIXME: JumpTable and ExternalSymbol address currently don't like		// FIXME: JumpTable and ExternalSymbol address currently don't like
// displacements. It isn't very important, but this should be fixed for		// displacements. It isn't very important, but this should be fixed for
▲ Show 20 Lines • Show All 274 Lines • ▼ Show 20 Lines	bool X86DAGToDAGISel::matchAddressBase(SDValue N, X86ISelAddressMode &AM) {
if (AM.BaseType != X86ISelAddressMode::RegBase || AM.Base_Reg.getNode()) {		if (AM.BaseType != X86ISelAddressMode::RegBase || AM.Base_Reg.getNode()) {
// If so, check to see if the scale index register is set.		// If so, check to see if the scale index register is set.
if (!AM.IndexReg.getNode()) {		if (!AM.IndexReg.getNode()) {
AM.IndexReg = N;		AM.IndexReg = N;
AM.Scale = 1;		AM.Scale = 1;
return false;		return false;
}		}

		if (OptLevel != CodeGenOpt::None && getAggressiveOperandFolding() &&
		AM.BaseType == X86ISelAddressMode::RegBase) {
		if (AM.Base_Reg == N) {
		SDValue Base_Reg = AM.Base_Reg;
		AM.Base_Reg = AM.IndexReg;
		AM.IndexReg = Base_Reg;
		craig.topper: Is Scale limited to 1 before this or could it be 2 in which case this creates an illegal scale of 3?
		RKSimon: There is a check for AM.Scale == 1. But I agree it'd be clearer with "AM.Scale = 2" instead of incrementing.
		AM.Scale++;
		return false;
		} else if (AM.IndexReg == N) {
		RKSimon: AM.Scale = 2;
		AM.Scale++;
		RKSimon: These AM.Scale increments are scary - better to set it with AM.Scale = 2?
		jbhateja (author): Increments are triggered only in aggressive folding mode and can fold up to 8 operands (which is the maximum legal scale). This was done intentionally; the initial change only worked for AM.Scale = 2 and was very restrictive. Aggressive operand folding is currently done only for LEAs and is enabled by instantiating an RAII object of the X86AggressiveOperandFolding class.
		return false;
		}
		}

// Otherwise, we cannot select it.		// Otherwise, we cannot select it.
return true;		return true;
}		}

// Default, generate it as a register.		// Default, generate it as a register.
AM.BaseType = X86ISelAddressMode::RegBase;		AM.BaseType = X86ISelAddressMode::RegBase;
AM.Base_Reg = N;		AM.Base_Reg = N;
return false;		return false;
▲ Show 20 Lines • Show All 214 Lines • ▼ Show 20 Lines	bool X86DAGToDAGISel::selectMOV64Imm32(SDValue N, SDValue &Imm) {
return CR->getUnsignedMax().ult(1ull << 32);		return CR->getUnsignedMax().ult(1ull << 32);
}		}

bool X86DAGToDAGISel::selectLEA64_32Addr(SDValue N, SDValue &Base,		bool X86DAGToDAGISel::selectLEA64_32Addr(SDValue N, SDValue &Base,
SDValue &Scale, SDValue &Index,		SDValue &Scale, SDValue &Index,
SDValue &Disp, SDValue &Segment) {		SDValue &Disp, SDValue &Segment) {
// Save the debug loc before calling selectLEAAddr, in case it invalidates N.		// Save the debug loc before calling selectLEAAddr, in case it invalidates N.
SDLoc DL(N);		SDLoc DL(N);

if (!selectLEAAddr(N, Base, Scale, Index, Disp, Segment))		if (!selectLEAAddr(N, Base, Scale, Index, Disp, Segment))
return false;		return false;

RegisterSDNode *RN = dyn_cast<RegisterSDNode>(Base);		RegisterSDNode *RN = dyn_cast<RegisterSDNode>(Base);
if (RN && RN->getReg() == 0)		if (RN && RN->getReg() == 0)
Base = CurDAG->getRegister(0, MVT::i64);		Base = CurDAG->getRegister(0, MVT::i64);
else if (Base.getValueType() == MVT::i32 && !dyn_cast<FrameIndexSDNode>(Base)) {		else if (Base.getValueType() == MVT::i32 && !dyn_cast<FrameIndexSDNode>(Base)) {
// Base could already be %rip, particularly in the x32 ABI.		// Base could already be %rip, particularly in the x32 ABI.
Show All 18 Lines	Index = SDValue(CurDAG->getMachineNode(
CurDAG->getTargetConstant(X86::sub_32bit, DL,		CurDAG->getTargetConstant(X86::sub_32bit, DL,
MVT::i32)),		MVT::i32)),
0);		0);
}		}

return true;		return true;
}		}

		bool X86DAGToDAGISel::matchAddressLEA(SDValue N, X86ISelAddressMode &AM) {
		// Avoid enabling aggressive operand folding when node N is a part of loop.
		X86AggressiveOperandFolding Enable(*this, !CurDAG->IsDAGPartOfLoop);

		bool matchRes = matchAddress(N, AM);

		// Check for legality of scale when recursion unwinds back to the top.
		if (OptLevel != CodeGenOpt::None && !matchRes) {
		if (!AM.isLegalScale())
		return true;

		// Avoid creating costly complex LEAs having scale less than 2
		// within loop.
		if(CurDAG->IsDAGPartOfLoop && Subtarget->slow3OpsLEA() &&
		AM.Scale <= 2 && AM.hasComplexAddressingMode() &&
		(!AM.hasSymbolicDisplacement() && N.getOpcode() < ISD::BUILTIN_OP_END))
		return true;
		}

		return matchRes;
		}


/// Calls SelectAddr and determines if the maximal addressing		/// Calls SelectAddr and determines if the maximal addressing
/// mode it matches can be cost effectively emitted as an LEA instruction.		/// mode it matches can be cost effectively emitted as an LEA instruction.
bool X86DAGToDAGISel::selectLEAAddr(SDValue N,		bool X86DAGToDAGISel::selectLEAAddr(SDValue N,
SDValue &Base, SDValue &Scale,		SDValue &Base, SDValue &Scale,
SDValue &Index, SDValue &Disp,		SDValue &Index, SDValue &Disp,
SDValue &Segment) {		SDValue &Segment) {
X86ISelAddressMode AM;		X86ISelAddressMode AM;

// Save the DL and VT before calling matchAddress, it can invalidate N.		// Save the DL and VT before calling matchAddress, it can invalidate N.
SDLoc DL(N);		SDLoc DL(N);
MVT VT = N.getSimpleValueType();		MVT VT = N.getSimpleValueType();

// Set AM.Segment to prevent MatchAddress from using one. LEA doesn't support		// Set AM.Segment to prevent MatchAddress from using one. LEA doesn't support
// segments.		// segments.
SDValue Copy = AM.Segment;		SDValue Copy = AM.Segment;
SDValue T = CurDAG->getRegister(0, MVT::i32);		SDValue T = CurDAG->getRegister(0, MVT::i32);
AM.Segment = T;		AM.Segment = T;
if (matchAddress(N, AM))		if (matchAddressLEA(N, AM))
return false;		return false;
assert (T == AM.Segment);		assert (T == AM.Segment);
AM.Segment = Copy;		AM.Segment = Copy;

unsigned Complexity = 0;		unsigned Complexity = 0;
if (AM.BaseType == X86ISelAddressMode::RegBase)		if (AM.BaseType == X86ISelAddressMode::RegBase)
if (AM.Base_Reg.getNode())		if (AM.Base_Reg.getNode())
Complexity = 1;		Complexity = 1;
▲ Show 20 Lines • Show All 1,093 Lines • Show Last 20 Lines
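The matchAddressBase change above folds a repeated operand into the index register by bumping Scale, and matchAddressLEA then rejects the match if the final scale is not 1, 2, 4, or 8. A minimal standalone sketch of that idea follows. `AddrMode`, `foldOperand`, and the string operand names are hypothetical stand-ins for X86ISelAddressMode and SDValue, not LLVM APIs. The `Scale == 1` guard on the base swap is an assumption based on the review comment that such a check exists; it is not visible in the hunk shown here.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for X86ISelAddressMode; operands are plain names.
struct AddrMode {
  std::string Base, Index;
  unsigned Scale = 1;
  long Disp = 0;

  bool isLegalScale() const {
    return Scale == 1 || Scale == 2 || Scale == 4 || Scale == 8;
  }
};

// Folds one register operand N into AM. Returns true on failure, mirroring
// the "true means cannot match" convention used by the matchers in the diff.
bool foldOperand(AddrMode &AM, const std::string &N) {
  if (AM.Base.empty()) {
    AM.Base = N;
    return false;
  }
  if (AM.Index.empty()) {
    AM.Index = N;
    AM.Scale = 1;
    return false;
  }
  // Repeated base: move it into the index slot so it can carry a scale.
  // Only valid while the current index is unscaled (assumed guard).
  if (AM.Base == N && AM.Scale == 1) {
    std::swap(AM.Base, AM.Index);
    AM.Scale = 2;
    return false;
  }
  // Repeated index: one more copy of the same register.
  if (AM.Index == N) {
    ++AM.Scale;
    return false;
  }
  return true; // Third distinct register: cannot be encoded.
}
```

Feeding the operands of the summary's example (A, B, then A again, with a displacement of 10) yields Base = B, Index = A, Scale = 2, Disp = 10. A further A would give Scale = 3, which is why legality is checked only once folding has finished.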

lib/Target/X86/X86OptimizeLEAs.cpp

Show All 16 Lines
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "X86.h"		#include "X86.h"
#include "X86InstrInfo.h"		#include "X86InstrInfo.h"
#include "X86Subtarget.h"		#include "X86Subtarget.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/LiveVariables.h"		#include "llvm/CodeGen/LiveVariables.h"
		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineOperand.h"		#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
		RKSimon: (style) Insert the include in alphabetical order (so before MachineFunctionPass.h)
		RKSimon: Include ordering still broken
#include "llvm/IR/DIBuilder.h"		#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/DebugInfoMetadata.h"		#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetInstrInfo.h"		#include "llvm/Target/TargetInstrInfo.h"

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "x86-optimize-LEAs"		#define DEBUG_TYPE "x86-optimize-LEAs"

static cl::opt<bool>		static cl::opt<bool>
DisableX86LEAOpt("disable-x86-lea-opt", cl::Hidden,		DisableX86LEAOpt("disable-x86-lea-opt", cl::Hidden,
cl::desc("X86: Disable LEA optimizations."),		cl::desc("X86: Disable LEA optimizations."),
cl::init(false));		cl::init(false));

STATISTIC(NumSubstLEAs, "Number of LEA instruction substitutions");		STATISTIC(NumSubstLEAs, "Number of LEA instruction substitutions");
		STATISTIC(NumFactoredLEAs, "Number of LEAs factorized");
STATISTIC(NumRedundantLEAs, "Number of redundant LEA instructions removed");		STATISTIC(NumRedundantLEAs, "Number of redundant LEA instructions removed");

/// \brief Returns true if two machine operands are identical and they are not		/// \brief Returns true if two machine operands are identical and they are not
/// physical registers.		/// physical registers.
static inline bool isIdenticalOp(const MachineOperand &MO1,		static inline bool isIdenticalOp(const MachineOperand &MO1,
const MachineOperand &MO2);		const MachineOperand &MO2);

		/// \brief Returns true if two machine instructions have identical operands.
		static bool isIdenticalMI(MachineRegisterInfo *MRI, const MachineOperand &MO1,
		const MachineOperand &MO2);

/// \brief Returns true if two address displacement operands are of the same		/// \brief Returns true if two address displacement operands are of the same
/// type and use the same symbol/index/address regardless of the offset.		/// type and use the same symbol/index/address regardless of the offset.
static bool isSimilarDispOp(const MachineOperand &MO1,		static bool isSimilarDispOp(const MachineOperand &MO1,
const MachineOperand &MO2);		const MachineOperand &MO2);

/// \brief Returns true if the instruction is LEA.		/// \brief Returns true if the instruction is LEA.
static inline bool isLEA(const MachineInstr &MI);		static inline bool isLEA(const MachineInstr &MI);

		/// \brief Returns true if Definition of Operand is a copylike instruction.
		static bool isDefCopyLike(MachineRegisterInfo *MRI, const MachineOperand &Opr);

namespace {		namespace {
/// A key based on instruction's memory operands.		/// A key based on instruction's memory operands.
class MemOpKey {		class MemOpKey {
public:		public:
MemOpKey(const MachineOperand *Base, const MachineOperand *Scale,		MemOpKey(const MachineOperand *Base, const MachineOperand *Scale,
const MachineOperand *Index, const MachineOperand *Segment,		const MachineOperand *Index, const MachineOperand *Segment,
const MachineOperand *Disp)		const MachineOperand *Disp, bool DispCheck = false)
: Disp(Disp) {		: Disp(Disp), DeepCheck(DispCheck) {
		craig.topper (Done): Need a space before HardDispCheck.
Operands[0] = Base;		Operands[0] = Base;
Operands[1] = Scale;		Operands[1] = Scale;
Operands[2] = Index;		Operands[2] = Index;
Operands[3] = Segment;		Operands[3] = Segment;
}		}

		/// Checks operands of MemOpKey are identical, if Base or Index
		/// operand definitions are of kind SUBREG_TO_REG then compare
		/// operands of defining MI.
		bool performDeepCheck(const MemOpKey &Other) const {
		MachineInstr *MI = const_cast<MachineInstr *>(Operands[0]->getParent());
		MachineRegisterInfo *MRI = MI->getRegInfo();

		for (int i = 0; i < 4; i++) {
		bool copyLike = isDefCopyLike(MRI, *Operands[i]);
		RKSimon (Not Done): CopyLike
		if (copyLike && !isIdenticalMI(MRI, *Operands[i], *Other.Operands[i]))
		return false;
		else if (!copyLike && !isIdenticalOp(*Operands[i], *Other.Operands[i]))
		return false;
		}
		return isIdenticalOp(*Disp, *Other.Disp);
		}

bool operator==(const MemOpKey &Other) const {		bool operator==(const MemOpKey &Other) const {
		if (DeepCheck)
		return performDeepCheck(Other);

// Addresses' bases, scales, indices and segments must be identical.		// Addresses' bases, scales, indices and segments must be identical.
for (int i = 0; i < 4; ++i)		for (int i = 0; i < 4; ++i)
if (!isIdenticalOp(*Operands[i], *Other.Operands[i]))		if (!isIdenticalOp(*Operands[i], *Other.Operands[i]))
return false;		return false;

// Addresses' displacements don't have to be exactly the same. It only		// Addresses' displacements don't have to be exactly the same. It only
// matters that they use the same symbol/index/address. Immediates' or		// matters that they use the same symbol/index/address. Immediates' or
// offsets' differences will be taken care of during instruction		// offsets' differences will be taken care of during instruction
// substitution.		// substitution.
		lsaba (Done): This comment is only valid for the else statement; please change it to explain the different cases between Identical and Similar Disp.
return isSimilarDispOp(*Disp, *Other.Disp);		return isSimilarDispOp(*Disp, *Other.Disp);
}		}

		craig.topper (Done): No need for 'else' after an if that returns.
// Address' base, scale, index and segment operands.		// Address' base, scale, index and segment operands.
const MachineOperand *Operands[4];		const MachineOperand *Operands[4];

// Address' displacement operand.		// Address' displacement operand.
const MachineOperand *Disp;		const MachineOperand *Disp;

		// If true checks Address' base, index, segment and
		// displacement are identical, in additions if base/index
		// are defined by copylike instruction then futher
		// compare the operands of the defining instruction.
		bool DeepCheck;
};		};
} // end anonymous namespace		} // end anonymous namespace

/// Provide DenseMapInfo for MemOpKey.		/// Provide DenseMapInfo for MemOpKey.
namespace llvm {		namespace llvm {
template <> struct DenseMapInfo<MemOpKey> {		template <> struct DenseMapInfo<MemOpKey> {
typedef DenseMapInfo<const MachineOperand *> PtrInfo;		typedef DenseMapInfo<const MachineOperand *> PtrInfo;

static inline MemOpKey getEmptyKey() {		static inline MemOpKey getEmptyKey() {
return MemOpKey(PtrInfo::getEmptyKey(), PtrInfo::getEmptyKey(),		return MemOpKey(PtrInfo::getEmptyKey(), PtrInfo::getEmptyKey(),
PtrInfo::getEmptyKey(), PtrInfo::getEmptyKey(),		PtrInfo::getEmptyKey(), PtrInfo::getEmptyKey(),
PtrInfo::getEmptyKey());		PtrInfo::getEmptyKey());
}		}

static inline MemOpKey getTombstoneKey() {		static inline MemOpKey getTombstoneKey() {
return MemOpKey(PtrInfo::getTombstoneKey(), PtrInfo::getTombstoneKey(),		return MemOpKey(PtrInfo::getTombstoneKey(), PtrInfo::getTombstoneKey(),
PtrInfo::getTombstoneKey(), PtrInfo::getTombstoneKey(),		PtrInfo::getTombstoneKey(), PtrInfo::getTombstoneKey(),
PtrInfo::getTombstoneKey());		PtrInfo::getTombstoneKey());
}		}

static unsigned getHashValue(const MemOpKey &Val) {		static unsigned getHashValue(const MemOpKey &Val) {
// Checking any field of MemOpKey is enough to determine if the key is		// Checking any field of MemOpKey is enough to determine if the key is
// empty or tombstone.		// empty or tombstone.
		hash_code Hash(0);
assert(Val.Disp != PtrInfo::getEmptyKey() && "Cannot hash the empty key");		assert(Val.Disp != PtrInfo::getEmptyKey() && "Cannot hash the empty key");
assert(Val.Disp != PtrInfo::getTombstoneKey() &&		assert(Val.Disp != PtrInfo::getTombstoneKey() &&
"Cannot hash the tombstone key");		"Cannot hash the tombstone key");

hash_code Hash = hash_combine(*Val.Operands[0], *Val.Operands[1],		auto getMIHash = [](MachineInstr *MI) -> hash_code {
*Val.Operands[2], *Val.Operands[3]);		hash_code h(0);
		for (unsigned i = 1, e = MI->getNumOperands(); i < e; i++)
		RKSimon (Not Done): (style) ``` for (unsigned i = 1, e = MI->getNumOperands(); i < e; i++) ```
		h = hash_combine(h, MI->getOperand(i));
		return h;
		};

		const MachineOperand &Base = *Val.Operands[0];
		const MachineOperand &Index = *Val.Operands[2];
		MachineInstr *MI = const_cast<MachineInstr *>(Base.getParent());
		MachineRegisterInfo *MRI = MI->getRegInfo();

		if (isDefCopyLike(MRI, Base))
		Hash = getMIHash(MRI->getVRegDef(Base.getReg()));
		else
		Hash = hash_combine(Hash, Base);

		if (isDefCopyLike(MRI, Index))
		Hash = getMIHash(MRI->getVRegDef(Index.getReg()));
		else
		Hash = hash_combine(Hash, Index);

		Hash = hash_combine(Hash, *Val.Operands[1], *Val.Operands[3]);

// If the address displacement is an immediate, it should not affect the		// If the address displacement is an immediate, it should not affect the
// hash so that memory operands which differ only be immediate displacement		// hash so that memory operands which differ only be immediate displacement
// would have the same hash. If the address displacement is something else,		// would have the same hash. If the address displacement is something else,
// we should reflect symbol/index/address in the hash.		// we should reflect symbol/index/address in the hash.
switch (Val.Disp->getType()) {		switch (Val.Disp->getType()) {
case MachineOperand::MO_Immediate:		case MachineOperand::MO_Immediate:
break;		break;
... (28 lines hidden) ...	static bool isEqual(const MemOpKey &LHS, const MemOpKey &RHS) {
// empty or tombstone.		// empty or tombstone.
if (RHS.Disp == PtrInfo::getEmptyKey())		if (RHS.Disp == PtrInfo::getEmptyKey())
return LHS.Disp == PtrInfo::getEmptyKey();		return LHS.Disp == PtrInfo::getEmptyKey();
if (RHS.Disp == PtrInfo::getTombstoneKey())		if (RHS.Disp == PtrInfo::getTombstoneKey())
return LHS.Disp == PtrInfo::getTombstoneKey();		return LHS.Disp == PtrInfo::getTombstoneKey();
return LHS == RHS;		return LHS == RHS;
}		}
};		};
}		}
		RKSimon (Not Done): NFC change - just commit it if you want, but don't pollute a patch with it.

/// \brief Returns a hash table key based on memory operands of \p MI. The		/// \brief Returns a hash table key based on memory operands of \p MI. The
/// number of the first memory operand of \p MI is specified through \p N.		/// number of the first memory operand of \p MI is specified through \p N.
static inline MemOpKey getMemOpKey(const MachineInstr &MI, unsigned N) {		static inline MemOpKey getMemOpKey(const MachineInstr &MI, unsigned N) {
assert((isLEA(MI) || MI.mayLoadOrStore()) &&		assert((isLEA(MI) || MI.mayLoadOrStore()) &&
"The instruction must be a LEA, a load or a store");		"The instruction must be a LEA, a load or a store");
return MemOpKey(&MI.getOperand(N + X86::AddrBaseReg),		return MemOpKey(&MI.getOperand(N + X86::AddrBaseReg),
&MI.getOperand(N + X86::AddrScaleAmt),		&MI.getOperand(N + X86::AddrScaleAmt),
&MI.getOperand(N + X86::AddrIndexReg),		&MI.getOperand(N + X86::AddrIndexReg),
&MI.getOperand(N + X86::AddrSegmentReg),		&MI.getOperand(N + X86::AddrSegmentReg),
&MI.getOperand(N + X86::AddrDisp));		&MI.getOperand(N + X86::AddrDisp));
}		}

		static inline MemOpKey getMemOpCSEKey(const MachineInstr &MI, unsigned N) {
		static MachineOperand DummyScale = MachineOperand::CreateImm(1);
		RKSimon (Not Done): Can we avoid the static?
		RKSimon (Not Done): Again, can we avoid the static?
		jbhateja (Author, Not Done): It's used because we want the MemOpKey for LEA factorization to be independent of Scale; keeping it static avoids recreating the dummy scale.
		assert((isLEA(MI) || MI.mayLoadOrStore()) &&
		"The instruction must be a LEA, a load or a store");
		return MemOpKey(&MI.getOperand(N + X86::AddrBaseReg), &DummyScale,
		&MI.getOperand(N + X86::AddrIndexReg),
		&MI.getOperand(N + X86::AddrSegmentReg),
		&MI.getOperand(N + X86::AddrDisp), true);
		}
		craig.topper (Done): Add space before 'true'

static inline bool isIdenticalOp(const MachineOperand &MO1,		static inline bool isIdenticalOp(const MachineOperand &MO1,
const MachineOperand &MO2) {		const MachineOperand &MO2) {
return MO1.isIdenticalTo(MO2) &&		return MO1.isIdenticalTo(MO2) &&
(!MO1.isReg() ||		(!MO1.isReg() ||
!TargetRegisterInfo::isPhysicalRegister(MO1.getReg()));		!TargetRegisterInfo::isPhysicalRegister(MO1.getReg()));
}		}

		static bool isIdenticalMI(MachineRegisterInfo *MRI, const MachineOperand &MO1,
		const MachineOperand &MO2) {
		MachineInstr *MI1 = nullptr;
		MachineInstr *MI2 = nullptr;
		lsaba (Not Done): Need to check MO2.isReg().
		jbhateja (Author, Not Done): Yes, I shall take care of this. Kindly let me know if there are any other comments apart from this; that will save iterations.
		if (!MO1.isReg() || !MO2.isReg())
		return false;

		MI1 = MRI->getVRegDef(MO1.getReg());
		MI2 = MRI->getVRegDef(MO2.getReg());
		if (!MI1 || !MI2)
		return false;
		RKSimon (Not Done): (style) ``` for (unsigned i = 1, e = MI1->getNumOperands(); i < e; ++i) ```
		if (MI1->getOpcode() != MI2->getOpcode())
		return false;
		if (MI1->getNumOperands() != MI2->getNumOperands())
		return false;
		for (unsigned i = 1, e = MI1->getNumOperands(); i < e; ++i)
		if (!isIdenticalOp(MI1->getOperand(i), MI2->getOperand(i)))
		return false;
		return true;
		}

#ifndef NDEBUG		#ifndef NDEBUG
static bool isValidDispOp(const MachineOperand &MO) {		static bool isValidDispOp(const MachineOperand &MO) {
return MO.isImm() || MO.isCPI() || MO.isJTI() || MO.isSymbol() ||		return MO.isImm() || MO.isCPI() || MO.isJTI() || MO.isSymbol() ||
MO.isGlobal() || MO.isBlockAddress() || MO.isMCSymbol() || MO.isMBB();		MO.isGlobal() || MO.isBlockAddress() || MO.isMCSymbol() || MO.isMBB();
}		}
#endif		#endif

static bool isSimilarDispOp(const MachineOperand &MO1,		static bool isSimilarDispOp(const MachineOperand &MO1,
... (15 lines hidden) ...
}		}

static inline bool isLEA(const MachineInstr &MI) {		static inline bool isLEA(const MachineInstr &MI) {
unsigned Opcode = MI.getOpcode();		unsigned Opcode = MI.getOpcode();
return Opcode == X86::LEA16r || Opcode == X86::LEA32r ||		return Opcode == X86::LEA16r || Opcode == X86::LEA32r ||
Opcode == X86::LEA64r || Opcode == X86::LEA64_32r;		Opcode == X86::LEA64r || Opcode == X86::LEA64_32r;
}		}

		static bool isDefCopyLike(MachineRegisterInfo *MRI, const MachineOperand &Opr) {
		if (!Opr.isReg() || TargetRegisterInfo::isPhysicalRegister(Opr.getReg()))
		RKSimon (Not Done): Do you mean: ``` bool isInstrErased = !(Opr.isReg() && Opr.getParent()->getParent()); ```
		jbhateja (Author, Not Done): Fixed.
		return false;
		MachineInstr *MI = MRI->getVRegDef(Opr.getReg());
		return MI && MI->isCopyLike();
		}

namespace {		namespace {

		/// This class captures the functions and attributes
		/// needed to factorize LEA within and across basic
		RKSimon (Not Done): Comment describing the purpose of the class
		/// blocks.LEA instruction with same BASE,OFFSET and
		/// INDEX are the candidates for factorization.
		class FactorizeLEAOpt {
		public:
		using LEAListT = std::list<MachineInstr *>;
		using LEAMapT = DenseMap<MemOpKey, LEAListT>;
		using ValueT = DenseMap<MemOpKey, unsigned>;
		using ScopeEntryT = std::pair<MachineBasicBlock *, ValueT>;
		using ScopeStackT = std::vector<ScopeEntryT>;

		FactorizeLEAOpt() = default;
		FactorizeLEAOpt(const FactorizeLEAOpt &) = delete;
		FactorizeLEAOpt &operator=(const FactorizeLEAOpt &) = delete;

		void performCleanup() {
		for (auto LEA : removedLEAs)
		LEA->eraseFromParent();
		LEAs.clear();
		Stack.clear();
		removedLEAs.clear();
		}

		LEAMapT &getLEAMap() { return LEAs; }
		RKSimon (Not Done): (style) Clean up the positions of the * - check what clang-format does.
		RKSimon (Not Done): comment
		ScopeEntryT *getTopScope() { return &Stack.back(); }

		void addForLazyRemoval(MachineInstr *Instr) { removedLEAs.insert(Instr); }

		bool checkIfScheduledForRemoval(MachineInstr *Instr) {
		return removedLEAs.find(Instr) != removedLEAs.end();
		}

		/// Push the ScopeEntry for the BasicBlock over Stack.
		/// Also traverses over list of instruction and update
		/// LEAs Map and ScopeEntry for each LEA instruction
		/// found using insertLEA().
		void pushScope(MachineBasicBlock *MBB);

		/// Stores the size of MachineInstr list corrosponding
		RKSimon (Not Done): comment
		/// to key K from LEAs MAP into the ScopeEntry of
		/// the basic block, then insert the LEA at the beginning
		/// of the list.
		void insertLEA(MachineInstr *MI);

		/// Pops out ScopeEntry of top most BasicBlock from the stack
		/// and remove the LEA instructions contained in the scope
		/// from the LEAs Map.
		void popScope();

		/// If LEA contains Physical Registers then its not a candidate
		/// for factorizations since physical registers may violate SSA
		/// semantics of MI.
		bool constainsPhyReg(MachineInstr *MI, unsigned RecLevel);

		private:
		ScopeStackT Stack;
		LEAMapT LEAs;
		std::set<MachineInstr *> removedLEAs;
		};
		RKSimon (Not Done): (style) remove braces

		void FactorizeLEAOpt::pushScope(MachineBasicBlock *MBB) {
		ValueT EmptyMap;
		ScopeEntryT SE = std::make_pair(MBB, EmptyMap);
		Stack.push_back(SE);
		for (auto &MI : *MBB) {
		if (isLEA(MI))
		insertLEA(&MI);
		}
		}

		void FactorizeLEAOpt::popScope() {
		ScopeEntryT &SE = Stack.back();
		for (auto MapEntry : SE.second) {
		LEAMapT::iterator Itr = LEAs.find(MapEntry.first);
		assert((Itr != LEAs.end()) &&
		"LEAs map must have a node corresponding to ScopeEntry's Key.");

		while (((*Itr).second.size() > MapEntry.second))
		(*Itr).second.pop_front();
		// If list goes empty remove entry from LEAs Map.
		if ((*Itr).second.empty())
		LEAs.erase(Itr);
		}
		Stack.pop_back();
		}

		bool FactorizeLEAOpt::constainsPhyReg(MachineInstr *MI, unsigned RecLevel) {
		if (!MI || !RecLevel)
		return false;

		MachineRegisterInfo *MRI = MI->getRegInfo();
		for (auto Operand : MI->operands()) {
		if (!Operand.isReg())
		continue;
		if (TargetRegisterInfo::isPhysicalRegister(Operand.getReg()))
		return true;
		MachineInstr *OperDefMI = MRI->getVRegDef(Operand.getReg());
		if (OperDefMI && (MI != OperDefMI) && OperDefMI->isCopyLike() &&
		constainsPhyReg(OperDefMI, RecLevel - 1))
		return true;
		}
		return false;
		lsaba (Done): It is unclear what this function does, can you explain?
		jbhateja (Author, Not Done): In a nutshell, we are implementing a scoped hash map, which is LEAs. Every time we enter a new scope and encounter an LEA, we first record the length of the list of MIs corresponding to the MemOpKey of the new LEA. We then insert the new LEA at the beginning of that list, which is the value field of the hash map. When we leave a scope, we remove that scope's LEA instructions from the LEAs hash map: since we recorded the original list length on entry, on exit we keep removing elements from the beginning of the list until its size matches what was recorded.
		}
		lsaba (Not Done): Already initialized at the beginning of the function.
		jbhateja (Author, Not Done): Yes.
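The scoped hash map described in the reply above can be sketched in isolation roughly as follows. This is only an illustration under simplifying assumptions, not the patch's code: `ScopedLEAMap` is a hypothetical name, a `std::string` stands in for `MemOpKey`, and an `int` stands in for `MachineInstr *`. Each scope remembers the list length seen when it first touched a key, and popping the scope trims the list back to that length.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the LEAs map plus scope stack in the patch.
class ScopedLEAMap {
public:
  using ListT = std::list<int>;

  // Enter a new scope (one per basic block in the dominator-tree walk).
  void pushScope() { Stack.emplace_back(); }

  // Record the list length the first time the current scope touches Key,
  // then insert the new entry at the front of the list.
  // Assumes pushScope() has been called at least once.
  void insert(const std::string &Key, int LEA) {
    ListT &L = Map[Key];
    auto &Saved = Stack.back();
    if (Saved.find(Key) == Saved.end())
      Saved[Key] = L.size();
    L.push_front(LEA);
  }

  // Leave the current scope: pop from the front of every list this scope
  // touched until it is back to the recorded length.
  void popScope() {
    for (const auto &Entry : Stack.back()) {
      auto It = Map.find(Entry.first);
      assert(It != Map.end() && "scope entry without a map entry");
      while (It->second.size() > Entry.second)
        It->second.pop_front();
      if (It->second.empty())
        Map.erase(It);
    }
    Stack.pop_back();
  }

  // Number of entries currently visible for Key.
  std::size_t count(const std::string &Key) const {
    auto It = Map.find(Key);
    return It == Map.end() ? 0 : It->second.size();
  }

private:
  std::unordered_map<std::string, ListT> Map;
  std::vector<std::unordered_map<std::string, std::size_t>> Stack;
};
```

Because new entries always go to the front, the entries added by the innermost scope are exactly the ones removed first, which is what makes recording a single length per key sufficient.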

		void FactorizeLEAOpt::insertLEA(MachineInstr *MI) {
		unsigned lsize;
		if (constainsPhyReg(MI, 2))
		return;

		MemOpKey Key = getMemOpCSEKey(*MI, 1);
		ScopeEntryT *TopScope = getTopScope();

		LEAMapT::iterator Itr = LEAs.find(Key);
		if (Itr == LEAs.end()) {
		lsize = 0;
		LEAs[Key].push_front(MI);
		} else {
		lsize = (*Itr).second.size();
		(*Itr).second.push_front(MI);
		}
		if (TopScope->second.find(Key) == TopScope->second.end())
		TopScope->second[Key] = lsize;
		}

class OptimizeLEAPass : public MachineFunctionPass {		class OptimizeLEAPass : public MachineFunctionPass {
public:		public:
OptimizeLEAPass() : MachineFunctionPass(ID) {}		OptimizeLEAPass() : MachineFunctionPass(ID) {}

StringRef getPassName() const override { return "X86 LEA Optimize"; }		StringRef getPassName() const override { return "X86 LEA Optimize"; }

/// \brief Loop over all of the basic blocks, replacing address		/// \brief Loop over all of the basic blocks, replacing address
/// calculations in load and store instructions, if it's already		/// calculations in load and store instructions, if it's already
/// been calculated by LEA. Also, remove redundant LEAs.		/// been calculated by LEA. Also, remove redundant LEAs.
bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;

		void getAnalysisUsage(AnalysisUsage &AU) const override {
		AU.setPreservesCFG();
		MachineFunctionPass::getAnalysisUsage(AU);
		AU.addRequired<MachineDominatorTree>();
		}

private:		private:
typedef DenseMap<MemOpKey, SmallVector<MachineInstr *, 16>> MemOpMap;		typedef DenseMap<MemOpKey, SmallVector<MachineInstr *, 16>> MemOpMap;

/// \brief Returns a distance between two instructions inside one basic block.		/// \brief Returns a distance between two instructions inside one basic block.
/// Negative result means, that instructions occur in reverse order.		/// Negative result means, that instructions occur in reverse order.
int calcInstrDist(const MachineInstr &First, const MachineInstr &Last);		int calcInstrDist(const MachineInstr &First, const MachineInstr &Last);

/// \brief Choose the best \p LEA instruction from the \p List to replace		/// \brief Choose the best \p LEA instruction from the \p List to replace
... (21 lines hidden) ...	private:
/// numbers to all instructions in the basic block to speed up calculation of		/// numbers to all instructions in the basic block to speed up calculation of
/// distance between them.		/// distance between them.
void findLEAs(const MachineBasicBlock &MBB, MemOpMap &LEAs);		void findLEAs(const MachineBasicBlock &MBB, MemOpMap &LEAs);

/// \brief Removes redundant address calculations.		/// \brief Removes redundant address calculations.
bool removeRedundantAddrCalc(MemOpMap &LEAs);		bool removeRedundantAddrCalc(MemOpMap &LEAs);

/// Replace debug value MI with a new debug value instruction using register		/// Replace debug value MI with a new debug value instruction using register
/// VReg with an appropriate offset and DIExpression to incorporate the		/// VReg with an appropriate offset and DIExpression to incorporate the
		lsaba (Done): Please add a comment explaining what this function does.
/// address displacement AddrDispShift. Return new debug value instruction.		/// address displacement AddrDispShift. Return new debug value instruction.
MachineInstr *replaceDebugValue(MachineInstr &MI, unsigned VReg,		MachineInstr *replaceDebugValue(MachineInstr &MI, unsigned VReg,
int64_t AddrDispShift);		int64_t AddrDispShift);

/// \brief Removes LEAs which calculate similar addresses.		/// \brief Removes LEAs which calculate similar addresses.
bool removeRedundantLEAs(MemOpMap &LEAs);		bool removeRedundantLEAs(MemOpMap &LEAs);

		/// \brief Visit over basic blocks, collect LEAs in a scoped
		/// hash map (FactorizeLEAOpt::LEAs) and try to factor them out.
		bool FactorizeLEAsAllBasicBlocks(MachineFunction &MF);

		bool FactorizeLEAsBasicBlock(MachineDomTreeNode *DN);

		/// \brief Factor out LEAs which share Base,Index,Offset and Segment.
		bool processBasicBlock(const MachineBasicBlock &MBB);

		/// \brief Try to replace LEA with a lower strength instruction
		/// to improves latency and throughput.
		bool strengthReduceLEAs(MemOpMap &LEAs, const MachineBasicBlock &MBB);

DenseMap<const MachineInstr *, unsigned> InstrPos;		DenseMap<const MachineInstr *, unsigned> InstrPos;

		FactorizeLEAOpt FactorOpt;

		MachineDominatorTree *DT;
MachineRegisterInfo *MRI;		MachineRegisterInfo *MRI;
const X86InstrInfo *TII;		const X86InstrInfo *TII;
const X86RegisterInfo *TRI;		const X86RegisterInfo *TRI;

static char ID;		static char ID;
};		};
char OptimizeLEAPass::ID = 0;		char OptimizeLEAPass::ID = 0;
}		}
		RKSimon (Not Done): NFC change

FunctionPass *llvm::createX86OptimizeLEAs() { return new OptimizeLEAPass(); }		FunctionPass *llvm::createX86OptimizeLEAs() { return new OptimizeLEAPass(); }

int OptimizeLEAPass::calcInstrDist(const MachineInstr &First,		int OptimizeLEAPass::calcInstrDist(const MachineInstr &First,
const MachineInstr &Last) {		const MachineInstr &Last) {
// Both instructions must be in the same basic block and they must be		// Both instructions must be in the same basic block and they must be
// presented in InstrPos.		// presented in InstrPos.
assert(Last.getParent() == First.getParent() &&		assert(Last.getParent() == First.getParent() &&
... (17 lines hidden) ...
bool OptimizeLEAPass::chooseBestLEA(const SmallVectorImpl<MachineInstr *> &List,		bool OptimizeLEAPass::chooseBestLEA(const SmallVectorImpl<MachineInstr *> &List,
const MachineInstr &MI,		const MachineInstr &MI,
MachineInstr *&BestLEA,		MachineInstr *&BestLEA,
int64_t &AddrDispShift, int &Dist) {		int64_t &AddrDispShift, int &Dist) {
const MachineFunction *MF = MI.getParent()->getParent();		const MachineFunction *MF = MI.getParent()->getParent();
const MCInstrDesc &Desc = MI.getDesc();		const MCInstrDesc &Desc = MI.getDesc();
int MemOpNo = X86II::getMemoryOperandNo(Desc.TSFlags) +		int MemOpNo = X86II::getMemoryOperandNo(Desc.TSFlags) +
X86II::getOperandBias(Desc);		X86II::getOperandBias(Desc);

		RKSimon (Not Done): clang-format? If so, commit it as an NFC change.
BestLEA = nullptr;		BestLEA = nullptr;

// Loop over all LEA instructions.		// Loop over all LEA instructions.
for (auto DefMI : List) {		for (auto DefMI : List) {
// Get new address displacement.		// Get new address displacement.
int64_t AddrDispShiftTemp = getAddrDispShift(MI, MemOpNo, *DefMI, 1);		int64_t AddrDispShiftTemp = getAddrDispShift(MI, MemOpNo, *DefMI, 1);

// Make sure address displacement fits 4 bytes.		// Make sure address displacement fits 4 bytes.
... (277 lines hidden) ...	while (I1 != List.end()) {
replaceDebugValue(MI, FirstVReg, AddrDispShift);		replaceDebugValue(MI, FirstVReg, AddrDispShift);
continue;		continue;
}		}

// Get the number of the first memory operand.		// Get the number of the first memory operand.
const MCInstrDesc &Desc = MI.getDesc();		const MCInstrDesc &Desc = MI.getDesc();
int MemOpNo =		int MemOpNo =
X86II::getMemoryOperandNo(Desc.TSFlags) +		X86II::getMemoryOperandNo(Desc.TSFlags) +
X86II::getOperandBias(Desc);		X86II::getOperandBias(Desc);
		RKSimon (Not Done): clang-format? If so, commit it as an NFC change.

// Update address base.		// Update address base.
MO.setReg(FirstVReg);		MO.setReg(FirstVReg);

// Update address disp.		// Update address disp.
MachineOperand &Op = MI.getOperand(MemOpNo + X86::AddrDisp);		MachineOperand &Op = MI.getOperand(MemOpNo + X86::AddrDisp);
if (Op.isImm())		if (Op.isImm())
Op.setImm(Op.getImm() + AddrDispShift);		Op.setImm(Op.getImm() + AddrDispShift);
... (20 lines hidden) ...	while (I1 != List.end()) {
}		}
++I1;		++I1;
}		}
}		}

return Changed;		return Changed;
}		}

		static inline int getADDrrFromLEA(int LEAOpcode) {
		switch (LEAOpcode) {
		default:
		llvm_unreachable("Unexpected LEA instruction");
		case X86::LEA16r:
		return X86::ADD16rr;
		case X86::LEA32r:
		return X86::ADD32rr;
		case X86::LEA64_32r:
		case X86::LEA64r:
		return X86::ADD64rr;
		}
		}

		bool OptimizeLEAPass::strengthReduceLEAs(MemOpMap &LEAs,
		const MachineBasicBlock &BB) {
		bool Changed = false;

		// Loop over all entries in the table.
		for (auto &E : LEAs) {
		auto &List = E.second;

		// Loop over all LEA pairs.
		for (auto I1 = List.begin(); I1 != List.end(); I1++) {
		MachineInstrBuilder NewMI;
		MachineInstr &First = **I1;
		MachineOperand &Res = First.getOperand(0);
		MachineOperand &Base = First.getOperand(1);
		MachineOperand &Scale = First.getOperand(2);
		MachineOperand &Index = First.getOperand(3);
		MachineOperand &Offset = First.getOperand(4);

		const MCInstrDesc &ADDrr = TII->get(getADDrrFromLEA(First.getOpcode()));
		const DebugLoc DL = First.getDebugLoc();

		if (!Base.isReg() || !Index.isReg())
		continue;
		if (TargetRegisterInfo::isPhysicalRegister(Res.getReg()) ||
		lsaba (Not Done): This could end up in an assertion failure if LI1 is at the beginning of the BB; it needs to be handled separately, for example in this reproducer:
		    ; ModuleID = 'bugpoint-reduced-simplified.bc'
		    source_filename = "bugpoint-output-2ef2e5d.bc"
		    target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
		    target triple = "x86_64-unknown-linux-gnu"

		    ; Function Attrs: norecurse nounwind readnone uwtable
		    define i32 @foo(i32 %a, i32 %b, i32 %d, i32 %y, i32 %x) local_unnamed_addr #0 {
		    entry:
		      %mul1 = shl i32 %b, 1
		      %add2 = add i32 %a, 4
		      %add3 = add i32 %add2, %mul1
		      %mul4 = shl i32 %b, 2
		      %add6 = add i32 %add2, %mul4
		      br label %for.body

		    for.cond.cleanup: ; preds = %for.body
		      ret i32 %add

		    for.body: ; preds = %for.body, %entry
		      %x.addr.015 = phi i32 [ %x, %entry ], [ %add3, %for.body ]
		      %y.addr.014 = phi i32 [ %y, %entry ], [ %add6, %for.body ]
		      %mul = mul nsw i32 %x.addr.015, %y.addr.014
		      %add = add nsw i32 0, %mul
		      %exitcond = icmp eq i32 undef, %d
		      br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !1
		    }

		    attributes #0 = { norecurse nounwind readnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

		    !llvm.ident = !{!0}

		    !0 = !{!"clang version 6.0.0 (cfe/trunk 309511)"}
		    !1 = distinct !{!1, !2}
		    !2 = !{!"llvm.loop.unroll.disable"}
		RKSimon (Not Done): Has @lsaba's test been added to the patch? I couldn't see it.
		jbhateja (Author, Not Done): We have a similar test case for loops, lea-opt-cse2.ll. We are not doing any factorization inside loops; only simplifyLEA can kick in.
		jbhateja (Author, Not Done): We have a test case for loops, lea-opt-cse2.ll, so this one was not added. We are not doing any factorization inside loops; only simplifyLEA can kick in.
		TargetRegisterInfo::isPhysicalRegister(Base.getReg()) ||
		TargetRegisterInfo::isPhysicalRegister(Index.getReg()))
		continue;
		lsaba (Done): Could we end up with an illegal scale here? (e.g. scale1 = 4, scale2 = 1)

		MachineBasicBlock &MBB = *(const_cast<MachineBasicBlock *>(&BB));
		if (Scale.isImm() && Scale.getImm() == 1) {
		// R = B + I
		if (Offset.isImm() && !Offset.getImm()) {
		NewMI = BuildMI(MBB, &First, DL, ADDrr)
		.addDef(Res.getReg())
		.addUse(Base.getReg())
		.addUse(Index.getReg());
		Changed = NewMI.getInstr() != nullptr;
		lsaba (Done): Should it also be erased from the LEAs list?
		jbhateja (Author, Done): Why do you think so? LEAs is a map where Key = F(BASE, INDEX, DISP, SEGMENT) and Value = vector of MI (LEA instructions). This map is populated on a per-basic-block basis.
		    Outer loop: traverses the map entries and sorts each vector in decreasing order of Scale.
		    Inner loop: traverses the sorted vector of LEAs for a given key, so the LI1 instruction is visited only once.
		The map is deleted once we leave this function. Machine CSE, which is value-number based, has already run before this pass, so if there are multiple identical LEAs (i.e. same BASE/INDEX/SCALE/DISP/SEGMENT) in a basic block, they will be factored out before we get here.
		lsaba (Not Done): Just making sure :) By the way, can't this algorithm work across a function's basic blocks?
		First.eraseFromParent();
		RKSimon (Done): Please can you run this through clang-format?
		}
		}
		}
		}
		return Changed;
		}
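The factorization discussed in the comments above (LEAs sharing a key, visited in decreasing order of Scale) comes down to a small piece of arithmetic, sketched here with plain integers. The names `isLegalScale`, `factorScale` and `evalLEA` are hypothetical helpers for illustration and are not taken from the patch; the sketch assumes both LEAs share the same base, index and displacement.

```cpp
#include <cassert>

// x86 address scales that can be encoded.
static bool isLegalScale(int S) { return S == 1 || S == 2 || S == 4 || S == 8; }

// Value computed by "lea Disp(Base, Index, Scale)".
static long evalLEA(long Base, long Index, int Scale, long Disp) {
  return Base + Scale * Index + Disp;
}

// For two LEAs that differ only in scale, the one with the larger scale can
// be rewritten as
//   Hi = lea 0(LoResult, Index, ScaleHi - ScaleLo)
// provided the difference is itself a legal scale. Returns that difference,
// or 0 when no legal rewrite exists (e.g. ScaleHi = 4, ScaleLo = 1).
static int factorScale(int ScaleHi, int ScaleLo) {
  int Factor = ScaleHi - ScaleLo;
  return (Factor > 0 && isLegalScale(Factor)) ? Factor : 0;
}
```

For instance, with scales 2 and 1 the second LEA becomes the first LEA's result plus one more copy of the index, whereas scales 4 and 1 give a difference of 3, which cannot be encoded, so that pair is left alone.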

		bool OptimizeLEAPass::processBasicBlock(const MachineBasicBlock &MBB) {
		bool cseDone = false;

		// Legal scale value (1,2,4 & 8) vector.
		int LegalScale[9] = {0, 1, 1, 0, 1, 0, 0, 0, 1};

		auto CompareFn = [](const MachineInstr *Arg1,
		const MachineInstr *Arg2) -> bool {
		if (Arg1->getOperand(2).getImm() < Arg2->getOperand(2).getImm())
		return false;
		return true;
		};

		// Loop over all entries in the table.
		for (auto &E : FactorOpt.getLEAMap()) {
		auto &List = E.second;
		if (List.size() > 1)
		List.sort(CompareFn);

		RKSimon (Not Done): (style) Remove braces
		// Loop over all LEA pairs.
		for (auto Iter1 = List.begin(); Iter1 != List.end(); Iter1++) {
		for (auto Iter2 = std::next(Iter1); Iter2 != List.end(); Iter2++) {
		MachineInstr &LI1 = **Iter1;
		MachineInstr &LI2 = **Iter2;

		RKSimon (Not Done): Really don't like this - write a helper instead like you did in X86ISelDAGToDAG.cpp: auto IsLegalScale = [](int S) { return S == 1 || S == 2 || S == 4 || S == 8; };
		jbhateja (author, Not Done): Fixed
		if (!DT->dominates(&LI2, &LI1))
		continue;

		int Scale1 = LI1.getOperand(2).getImm();
		int Scale2 = LI2.getOperand(2).getImm();
		assert(LI2.getOperand(0).isReg() && "Result is a VirtualReg");
		RKSimon (Not Done): return Arg1->getOperand(2).getImm() >= Arg2->getOperand(2).getImm();
		jbhateja (author, Not Done): Fixed
		DebugLoc DL = LI1.getDebugLoc();

		if (FactorOpt.checkIfScheduledForRemoval(&LI1))
		continue;

		int Factor = Scale1 - Scale2;
		if (Factor > 0 && LegalScale[Factor]) {
		DEBUG(dbgs() << "CSE LEAs: Candidate to replace: "; LI1.dump(););
		MachineInstrBuilder NewMI =
		BuildMI(*(const_cast<MachineBasicBlock *>(&MBB)), &LI1, DL,
		TII->get(LI1.getOpcode()))
		.addDef(LI1.getOperand(0).getReg()) // Dst = Dst of LI1.
		.addUse(LI2.getOperand(0).getReg()) // Base = Dst of LI2.
		.addImm(Factor) // Scale = Diff b/w scales.
		.addUse(LI1.getOperand(3).getReg()) // Index = Index of LI1.
		.addImm(0) // Disp = 0
		.addUse(
		LI1.getOperand(5).getReg()); // Segment = Segment of LI1.

		cseDone = NewMI.getInstr() != nullptr;

		/// Lazy removal shall ensure that replaced LEA remains
		/// till we finish processing all the basic block. This shall
		/// provide opportunity for further factorization based on
		/// the replaced LEA which will be legal since it has same
		/// destination as newly formed LEA.
		FactorOpt.addForLazyRemoval(&LI1);

		NumFactoredLEAs++;
		DEBUG(dbgs() << "CSE LEAs: Replaced by: "; NewMI->dump(););
		}
		}
		}
		}
		return cseDone;
		}
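
The arithmetic behind the factorization in processBasicBlock can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `LeaAddr`, `evalLea`, `isLegalScale` and `factorLea` are hypothetical names, register contents are modeled as plain integers, and none of this is LLVM API. The idea is that two LEAs sharing base, index and displacement differ only by `Index * (Scale1 - Scale2)`, so the larger one can be rebuilt on top of the smaller one's result whenever that difference is itself an encodable scale.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an LEA's address operands; register contents
// are modeled as plain integers.
struct LeaAddr {
  int64_t Base;
  int64_t Index;
  int Scale;
  int64_t Disp;
};

// Value an LEA with these operands would compute.
int64_t evalLea(const LeaAddr &A) { return A.Base + A.Index * A.Scale + A.Disp; }

// x86 addressing only encodes scales 1, 2, 4 and 8.
bool isLegalScale(int S) { return S == 1 || S == 2 || S == 4 || S == 8; }

// If Big and Small share Base/Index/Disp and the scale gap is encodable,
// write to Out an address that reuses Small's result as the new base and
// return true. Otherwise leave Out untouched and return false.
bool factorLea(const LeaAddr &Big, const LeaAddr &Small, LeaAddr &Out) {
  assert(isLegalScale(Big.Scale) && isLegalScale(Small.Scale) &&
         "inputs must be encodable LEAs");
  if (Big.Base != Small.Base || Big.Index != Small.Index ||
      Big.Disp != Small.Disp)
    return false;
  int Factor = Big.Scale - Small.Scale;
  if (Factor <= 0 || !isLegalScale(Factor))
    return false;
  // Base = result of Small, Scale = difference of scales, Disp = 0.
  Out = LeaAddr{evalLea(Small), Big.Index, Factor, 0};
  return true;
}
```

For the pair `leal 4(%rdi,%rsi,2)` and `leal 4(%rdi,%rsi,4)` from lea-opt-cse3.ll the factor is 2, which gives the `leal (%ecx,%rsi,2)` form seen in the updated checks. A pair with scales 2 and 8 has a gap of 6 and is left alone.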

		bool OptimizeLEAPass::FactorizeLEAsBasicBlock(MachineDomTreeNode *DN) {
		bool Changed = false;
		MachineBasicBlock *MBB = DN->getBlock();
		FactorOpt.pushScope(MBB);

		Changed |= processBasicBlock(*MBB);
		for (auto Child : DN->getChildren())
		FactorizeLEAsBasicBlock(Child);

		FactorOpt.popScope();
		jmolloy (Not Done): This can cause recursion deep enough to cause stack overflows. Please could you refactor this to not use direct recursion? The domtree may be hundreds of nodes deep in degenerate cases.
		return Changed;
		}
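
One common way to address the recursion-depth concern on FactorizeLEAsBasicBlock is to drive the walk with an explicit stack, so depth is bounded by heap memory rather than the call stack. The following is a minimal sketch, not the patch's eventual fix: `Node` and `walkDomTree` are hypothetical stand-ins for a dominator-tree node and the traversal, and the `Enter`/`Leave` callbacks are assumed to play the roles of pushScope plus processBasicBlock, and popScope, respectively.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal tree node standing in for a dominator-tree node.
struct Node {
  int Id;
  std::vector<Node *> Children;
};

// Visits every node without recursion. Enter(N) runs before any of N's
// children are visited and Leave(N) runs after all of them, mirroring the
// pushScope / popScope bracketing of the recursive version.
template <typename EnterFn, typename LeaveFn>
void walkDomTree(Node *Root, EnterFn Enter, LeaveFn Leave) {
  if (!Root)
    return;
  // Each entry pairs a node with the index of its next unvisited child.
  std::vector<std::pair<Node *, std::size_t>> Stack;
  Enter(*Root);
  Stack.emplace_back(Root, std::size_t{0});
  while (!Stack.empty()) {
    std::pair<Node *, std::size_t> &Top = Stack.back();
    if (Top.second < Top.first->Children.size()) {
      Node *Child = Top.first->Children[Top.second++];
      assert(Child && "tree nodes must be non-null");
      Enter(*Child);
      // Top may dangle after this call; it is not used again this iteration.
      Stack.emplace_back(Child, std::size_t{0});
    } else {
      Leave(*Top.first);
      Stack.pop_back();
    }
  }
}
```

Keeping a per-node child index on the stack is what lets `Leave` fire only after a node's whole subtree has been processed, which a plain worklist of nodes would not guarantee.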

		bool OptimizeLEAPass::FactorizeLEAsAllBasicBlocks(MachineFunction &MF) {
		bool Changed = FactorizeLEAsBasicBlock(DT->getRootNode());
		FactorOpt.performCleanup();
		return Changed;
		}
		RKSimon (Not Done): DL is only used here - just use LI1.getDebugLoc() directly?

bool OptimizeLEAPass::runOnMachineFunction(MachineFunction &MF) {		bool OptimizeLEAPass::runOnMachineFunction(MachineFunction &MF) {
bool Changed = false;		bool Changed = false;

if (DisableX86LEAOpt || skipFunction(*MF.getFunction()))		if (DisableX86LEAOpt || skipFunction(*MF.getFunction()))
return false;		return false;

MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();
TII = MF.getSubtarget<X86Subtarget>().getInstrInfo();		TII = MF.getSubtarget<X86Subtarget>().getInstrInfo();
TRI = MF.getSubtarget<X86Subtarget>().getRegisterInfo();		TRI = MF.getSubtarget<X86Subtarget>().getRegisterInfo();
		DT = &getAnalysis<MachineDominatorTree>();

		// Attempt factorizing LEAs.
		Changed |= FactorizeLEAsAllBasicBlocks(MF);

// Process all basic blocks.		// Process all basic blocks.
for (auto &MBB : MF) {		for (auto &MBB : MF) {
MemOpMap LEAs;		MemOpMap LEAs;
InstrPos.clear();		InstrPos.clear();

// Find all LEA instructions in basic block.		// Find all LEA instructions in basic block.
findLEAs(MBB, LEAs);		findLEAs(MBB, LEAs);

// If current basic block has no LEAs, move on to the next one.		// If current basic block has no LEAs, move on to the next one.
if (LEAs.empty())		if (LEAs.empty())
continue;		continue;

// Remove redundant LEA instructions.		// Remove redundant LEA instructions.
Changed |= removeRedundantLEAs(LEAs);		Changed |= removeRedundantLEAs(LEAs);

		// Strength reduce LEA instructions.
		Changed |= strengthReduceLEAs(LEAs, MBB);

// Remove redundant address calculations. Do it only for -Os/-Oz since only		// Remove redundant address calculations. Do it only for -Os/-Oz since only
// a code size gain is expected from this part of the pass.		// a code size gain is expected from this part of the pass.
if (MF.getFunction()->optForSize())		if (MF.getFunction()->optForSize())
Changed |= removeRedundantAddrCalc(LEAs);		Changed |= removeRedundantAddrCalc(LEAs);
}		}

return Changed;		return Changed;
}		}

test/CodeGen/X86/GlobalISel/callingconv.ll

	[earlier lines not shown]
	; X32-NEXT: .cfi_def_cfa_offset 16			; X32-NEXT: .cfi_def_cfa_offset 16
	; X32-NEXT: movl 16(%esp), %eax			; X32-NEXT: movl 16(%esp), %eax
	; X32-NEXT: movl 20(%esp), %ecx			; X32-NEXT: movl 20(%esp), %ecx
	; X32-NEXT: movl (%eax), %eax			; X32-NEXT: movl (%eax), %eax
	; X32-NEXT: movl (%ecx), %edx			; X32-NEXT: movl (%ecx), %edx
	; X32-NEXT: movl 4(%ecx), %ecx			; X32-NEXT: movl 4(%ecx), %ecx
	; X32-NEXT: movl %eax, (%esp)			; X32-NEXT: movl %eax, (%esp)
	; X32-NEXT: movl $4, %eax			; X32-NEXT: movl $4, %eax
	; X32-NEXT: leal (%esp,%eax), %eax			; X32-NEXT: addl %esp, %eax
	; X32-NEXT: movl %edx, 4(%esp)			; X32-NEXT: movl %edx, 4(%esp)
	; X32-NEXT: movl %ecx, 4(%eax)			; X32-NEXT: movl %ecx, 4(%eax)
	; X32-NEXT: calll variadic_callee			; X32-NEXT: calll variadic_callee
	; X32-NEXT: addl $12, %esp			; X32-NEXT: addl $12, %esp
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: test_variadic_call_2:			; X64-LABEL: test_variadic_call_2:
	; X64: # BB#0:			; X64: # BB#0:
	[16 lines not shown]

test/CodeGen/X86/GlobalISel/gep.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=x86_64-linux-gnu -global-isel -verify-machineinstrs < %s -o - | FileCheck %s --check-prefix=ALL --check-prefix=X64_GISEL			; RUN: llc -mtriple=x86_64-linux-gnu -global-isel -verify-machineinstrs < %s -o - | FileCheck %s --check-prefix=ALL --check-prefix=X64_GISEL
	; RUN: llc -mtriple=x86_64-linux-gnu -verify-machineinstrs < %s -o - | FileCheck %s --check-prefix=ALL --check-prefix=X64			; RUN: llc -mtriple=x86_64-linux-gnu -verify-machineinstrs < %s -o - | FileCheck %s --check-prefix=ALL --check-prefix=X64

	define i32* @test_gep_i8(i32 *%arr, i8 %ind) {			define i32* @test_gep_i8(i32 *%arr, i8 %ind) {
	; X64_GISEL-LABEL: test_gep_i8:			; X64_GISEL-LABEL: test_gep_i8:
	; X64_GISEL: # BB#0:			; X64_GISEL: # BB#0:
	; X64_GISEL-NEXT: movq $4, %rax			; X64_GISEL-NEXT: movq $4, %rcx
	; X64_GISEL-NEXT: movsbq %sil, %rcx			; X64_GISEL-NEXT: movsbq %sil, %rax
	; X64_GISEL-NEXT: imulq %rax, %rcx			; X64_GISEL-NEXT: imulq %rcx, %rax
	; X64_GISEL-NEXT: leaq (%rdi,%rcx), %rax			; X64_GISEL-NEXT: addq %rdi, %rax
	; X64_GISEL-NEXT: retq			; X64_GISEL-NEXT: retq
	;			;
	; X64-LABEL: test_gep_i8:			; X64-LABEL: test_gep_i8:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: # kill: %ESI<def> %ESI<kill> %RSI<def>			; X64-NEXT: # kill: %ESI<def> %ESI<kill> %RSI<def>
	; X64-NEXT: movsbq %sil, %rax			; X64-NEXT: movsbq %sil, %rax
	; X64-NEXT: leaq (%rdi,%rax,4), %rax			; X64-NEXT: leaq (%rdi,%rax,4), %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	%arrayidx = getelementptr i32, i32* %arr, i8 %ind			%arrayidx = getelementptr i32, i32* %arr, i8 %ind
	ret i32* %arrayidx			ret i32* %arrayidx
	}			}

	define i32* @test_gep_i8_const(i32 *%arr) {			define i32* @test_gep_i8_const(i32 *%arr) {
	; X64_GISEL-LABEL: test_gep_i8_const:			; X64_GISEL-LABEL: test_gep_i8_const:
	; X64_GISEL: # BB#0:			; X64_GISEL: # BB#0:
	; X64_GISEL-NEXT: movq $80, %rax			; X64_GISEL-NEXT: movq $80, %rax
	; X64_GISEL-NEXT: leaq (%rdi,%rax), %rax			; X64_GISEL-NEXT: addq %rdi, %rax
	; X64_GISEL-NEXT: retq			; X64_GISEL-NEXT: retq
	;			;
	; X64-LABEL: test_gep_i8_const:			; X64-LABEL: test_gep_i8_const:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: leaq 80(%rdi), %rax			; X64-NEXT: leaq 80(%rdi), %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	%arrayidx = getelementptr i32, i32* %arr, i8 20			%arrayidx = getelementptr i32, i32* %arr, i8 20
	ret i32* %arrayidx			ret i32* %arrayidx
	}			}

	define i32* @test_gep_i16(i32 *%arr, i16 %ind) {			define i32* @test_gep_i16(i32 *%arr, i16 %ind) {
	; X64_GISEL-LABEL: test_gep_i16:			; X64_GISEL-LABEL: test_gep_i16:
	; X64_GISEL: # BB#0:			; X64_GISEL: # BB#0:
	; X64_GISEL-NEXT: movq $4, %rax			; X64_GISEL-NEXT: movq $4, %rcx
	; X64_GISEL-NEXT: movswq %si, %rcx			; X64_GISEL-NEXT: movswq %si, %rax
	; X64_GISEL-NEXT: imulq %rax, %rcx			; X64_GISEL-NEXT: imulq %rcx, %rax
	; X64_GISEL-NEXT: leaq (%rdi,%rcx), %rax			; X64_GISEL-NEXT: addq %rdi, %rax
	; X64_GISEL-NEXT: retq			; X64_GISEL-NEXT: retq
	;			;
	; X64-LABEL: test_gep_i16:			; X64-LABEL: test_gep_i16:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: # kill: %ESI<def> %ESI<kill> %RSI<def>			; X64-NEXT: # kill: %ESI<def> %ESI<kill> %RSI<def>
	; X64-NEXT: movswq %si, %rax			; X64-NEXT: movswq %si, %rax
	; X64-NEXT: leaq (%rdi,%rax,4), %rax			; X64-NEXT: leaq (%rdi,%rax,4), %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	%arrayidx = getelementptr i32, i32* %arr, i16 %ind			%arrayidx = getelementptr i32, i32* %arr, i16 %ind
	ret i32* %arrayidx			ret i32* %arrayidx
	}			}

	define i32* @test_gep_i16_const(i32 *%arr) {			define i32* @test_gep_i16_const(i32 *%arr) {
	; X64_GISEL-LABEL: test_gep_i16_const:			; X64_GISEL-LABEL: test_gep_i16_const:
	; X64_GISEL: # BB#0:			; X64_GISEL: # BB#0:
	; X64_GISEL-NEXT: movq $80, %rax			; X64_GISEL-NEXT: movq $80, %rax
	; X64_GISEL-NEXT: leaq (%rdi,%rax), %rax			; X64_GISEL-NEXT: addq %rdi, %rax
	; X64_GISEL-NEXT: retq			; X64_GISEL-NEXT: retq
	;			;
	; X64-LABEL: test_gep_i16_const:			; X64-LABEL: test_gep_i16_const:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: leaq 80(%rdi), %rax			; X64-NEXT: leaq 80(%rdi), %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	%arrayidx = getelementptr i32, i32* %arr, i16 20			%arrayidx = getelementptr i32, i32* %arr, i16 20
	ret i32* %arrayidx			ret i32* %arrayidx
	}			}

	define i32* @test_gep_i32(i32 *%arr, i32 %ind) {			define i32* @test_gep_i32(i32 *%arr, i32 %ind) {
	; X64_GISEL-LABEL: test_gep_i32:			; X64_GISEL-LABEL: test_gep_i32:
	; X64_GISEL: # BB#0:			; X64_GISEL: # BB#0:
	; X64_GISEL-NEXT: movq $4, %rax			; X64_GISEL-NEXT: movq $4, %rcx
	; X64_GISEL-NEXT: movslq %esi, %rcx			; X64_GISEL-NEXT: movslq %esi, %rax
	; X64_GISEL-NEXT: imulq %rax, %rcx			; X64_GISEL-NEXT: imulq %rcx, %rax
	; X64_GISEL-NEXT: leaq (%rdi,%rcx), %rax			; X64_GISEL-NEXT: addq %rdi, %rax
	; X64_GISEL-NEXT: retq			; X64_GISEL-NEXT: retq
	;			;
	; X64-LABEL: test_gep_i32:			; X64-LABEL: test_gep_i32:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: movslq %esi, %rax			; X64-NEXT: movslq %esi, %rax
	; X64-NEXT: leaq (%rdi,%rax,4), %rax			; X64-NEXT: leaq (%rdi,%rax,4), %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	%arrayidx = getelementptr i32, i32* %arr, i32 %ind			%arrayidx = getelementptr i32, i32* %arr, i32 %ind
	ret i32* %arrayidx			ret i32* %arrayidx
	}			}

	define i32* @test_gep_i32_const(i32 *%arr) {			define i32* @test_gep_i32_const(i32 *%arr) {
	; X64_GISEL-LABEL: test_gep_i32_const:			; X64_GISEL-LABEL: test_gep_i32_const:
	; X64_GISEL: # BB#0:			; X64_GISEL: # BB#0:
	; X64_GISEL-NEXT: movq $20, %rax			; X64_GISEL-NEXT: movq $20, %rax
	; X64_GISEL-NEXT: leaq (%rdi,%rax), %rax			; X64_GISEL-NEXT: addq %rdi, %rax
	; X64_GISEL-NEXT: retq			; X64_GISEL-NEXT: retq
	;			;
	; X64-LABEL: test_gep_i32_const:			; X64-LABEL: test_gep_i32_const:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: leaq 20(%rdi), %rax			; X64-NEXT: leaq 20(%rdi), %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	%arrayidx = getelementptr i32, i32* %arr, i32 5			%arrayidx = getelementptr i32, i32* %arr, i32 5
	ret i32* %arrayidx			ret i32* %arrayidx
	}			}

	define i32* @test_gep_i64(i32 *%arr, i64 %ind) {			define i32* @test_gep_i64(i32 *%arr, i64 %ind) {
	; X64_GISEL-LABEL: test_gep_i64:			; X64_GISEL-LABEL: test_gep_i64:
	; X64_GISEL: # BB#0:			; X64_GISEL: # BB#0:
	; X64_GISEL-NEXT: movq $4, %rax			; X64_GISEL-NEXT: movq $4, %rax
	; X64_GISEL-NEXT: imulq %rsi, %rax			; X64_GISEL-NEXT: imulq %rsi, %rax
	; X64_GISEL-NEXT: leaq (%rdi,%rax), %rax			; X64_GISEL-NEXT: addq %rdi, %rax
	; X64_GISEL-NEXT: retq			; X64_GISEL-NEXT: retq
	;			;
	; X64-LABEL: test_gep_i64:			; X64-LABEL: test_gep_i64:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: leaq (%rdi,%rsi,4), %rax			; X64-NEXT: leaq (%rdi,%rsi,4), %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	%arrayidx = getelementptr i32, i32* %arr, i64 %ind			%arrayidx = getelementptr i32, i32* %arr, i64 %ind
	ret i32* %arrayidx			ret i32* %arrayidx
	}			}

	define i32* @test_gep_i64_const(i32 *%arr) {			define i32* @test_gep_i64_const(i32 *%arr) {
	; X64_GISEL-LABEL: test_gep_i64_const:			; X64_GISEL-LABEL: test_gep_i64_const:
	; X64_GISEL: # BB#0:			; X64_GISEL: # BB#0:
	; X64_GISEL-NEXT: movq $20, %rax			; X64_GISEL-NEXT: movq $20, %rax
	; X64_GISEL-NEXT: leaq (%rdi,%rax), %rax			; X64_GISEL-NEXT: addq %rdi, %rax
	; X64_GISEL-NEXT: retq			; X64_GISEL-NEXT: retq
	;			;
	; X64-LABEL: test_gep_i64_const:			; X64-LABEL: test_gep_i64_const:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: leaq 20(%rdi), %rax			; X64-NEXT: leaq 20(%rdi), %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	%arrayidx = getelementptr i32, i32* %arr, i64 5			%arrayidx = getelementptr i32, i32* %arr, i64 5
	ret i32* %arrayidx			ret i32* %arrayidx
	}			}

test/CodeGen/X86/GlobalISel/memop-scalar.ll

[earlier lines not shown]
; ALL-NEXT: retq
ret i32 %r		ret i32 %r
}		}

; check that gep index doesn't folded into memory operand		; check that gep index doesn't folded into memory operand
define i32 @test_gep_folding_largeGepIndex(i32* %arr, i32 %val) {		define i32 @test_gep_folding_largeGepIndex(i32* %arr, i32 %val) {
; ALL-LABEL: test_gep_folding_largeGepIndex:		; ALL-LABEL: test_gep_folding_largeGepIndex:
; ALL: # BB#0:		; ALL: # BB#0:
; ALL-NEXT: movabsq $228719476720, %rax # imm = 0x3540BE3FF0		; ALL-NEXT: movabsq $228719476720, %rax # imm = 0x3540BE3FF0
; ALL-NEXT: leaq (%rdi,%rax), %rax		; ALL-NEXT: addq %rdi, %rax
; ALL-NEXT: movl %esi, (%rax)		; ALL-NEXT: movl %esi, (%rax)
; ALL-NEXT: movl (%rax), %eax		; ALL-NEXT: movl (%rax), %eax
; ALL-NEXT: retq		; ALL-NEXT: retq
%arrayidx = getelementptr i32, i32* %arr, i64 57179869180		%arrayidx = getelementptr i32, i32* %arr, i64 57179869180
store i32 %val, i32* %arrayidx		store i32 %val, i32* %arrayidx
%r = load i32, i32* %arrayidx		%r = load i32, i32* %arrayidx
ret i32 %r		ret i32 %r
}		}

test/CodeGen/X86/lea-opt-cse1.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown | FileCheck %s -check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-unknown | FileCheck %s -check-prefix=X64
	; RUN: llc < %s -mtriple=i686-unknown | FileCheck %s -check-prefix=X86			; RUN: llc < %s -mtriple=i686-unknown | FileCheck %s -check-prefix=X86

	%struct.SA = type { i32 , i32 , i32 , i32 ,i32 }			%struct.SA = type { i32 , i32 , i32 , i32 ,i32 }

	define void @test_func(%struct.SA* nocapture %ctx, i32 %n) local_unnamed_addr {			define void @test_func(%struct.SA* nocapture %ctx, i32 %n) local_unnamed_addr {
	; X64-LABEL: test_func:			; X64-LABEL: test_func:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: movl (%rdi), %eax			; X64-NEXT: movl (%rdi), %eax
	; X64-NEXT: movl 16(%rdi), %ecx			; X64-NEXT: movl 16(%rdi), %ecx
	; X64-NEXT: leal (%rax,%rcx), %edx
	; X64-NEXT: leal 1(%rax,%rcx), %eax			; X64-NEXT: leal 1(%rax,%rcx), %eax
	; X64-NEXT: movl %eax, 12(%rdi)			; X64-NEXT: movl %eax, 12(%rdi)
	; X64-NEXT: leal 1(%rcx,%rdx), %eax			; X64-NEXT: addq %rcx, %eax
	; X64-NEXT: movl %eax, 16(%rdi)			; X64-NEXT: movl %eax, 16(%rdi)
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: test_func:			; X86-LABEL: test_func:
	; X86: # BB#0: # %entry			; X86: # BB#0: # %entry
	; X86-NEXT: pushl %esi
	; X86-NEXT: .Lcfi0:
	; X86-NEXT: .cfi_def_cfa_offset 8
	; X86-NEXT: .Lcfi1:
	; X86-NEXT: .cfi_offset %esi, -8
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: movl (%eax), %ecx			; X86-NEXT: movl (%eax), %ecx
	; X86-NEXT: movl 16(%eax), %edx			; X86-NEXT: movl 16(%eax), %edx
	; X86-NEXT: leal 1(%ecx,%edx), %esi			; X86-NEXT: leal 1(%ecx,%edx), %ecx
				; X86-NEXT: movl %ecx, 12(%eax)
	; X86-NEXT: addl %edx, %ecx			; X86-NEXT: addl %edx, %ecx
	; X86-NEXT: movl %esi, 12(%eax)
	; X86-NEXT: leal 1(%edx,%ecx), %ecx
	; X86-NEXT: movl %ecx, 16(%eax)			; X86-NEXT: movl %ecx, 16(%eax)
	; X86-NEXT: popl %esi
	; X86-NEXT: retl			; X86-NEXT: retl
	entry:			entry:
	%h0 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 0			%h0 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 0
	%0 = load i32, i32* %h0, align 8			%0 = load i32, i32* %h0, align 8
	%h3 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 3			%h3 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 3
	%h4 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 4			%h4 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 4
	%1 = load i32, i32* %h4, align 8			%1 = load i32, i32* %h4, align 8
	%add = add i32 %0, 1			%add = add i32 %0, 1
	%add4 = add i32 %add, %1			%add4 = add i32 %add, %1
	store i32 %add4, i32* %h3, align 4			store i32 %add4, i32* %h3, align 4
	%add29 = add i32 %add4 , %1			%add29 = add i32 %add4 , %1
	store i32 %add29, i32* %h4, align 8			store i32 %add29, i32* %h4, align 8
	ret void			ret void
	}			}

test/CodeGen/X86/lea-opt-cse2.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown | FileCheck %s -check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+slow-3ops-lea | FileCheck %s -check-prefix=X64
	; RUN: llc < %s -mtriple=i686-unknown | FileCheck %s -check-prefix=X86			; RUN: llc < %s -mtriple=i686-unknown -mattr=+slow-3ops-lea | FileCheck %s -check-prefix=X86
				RKSimon (Not Done): Why have you changed these tests?

	%struct.SA = type { i32 , i32 , i32 , i32 , i32};			%struct.SA = type { i32 , i32 , i32 , i32 , i32};

	define void @foo(%struct.SA* nocapture %ctx, i32 %n) local_unnamed_addr #0 {			define void @foo(%struct.SA* nocapture %ctx, i32 %n) local_unnamed_addr #0 {
	; X64-LABEL: foo:			; X64-LABEL: foo:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: .p2align 4, 0x90			; X64-NEXT: .p2align 4, 0x90
	; X64-NEXT: .LBB0_1: # %loop			; X64-NEXT: .LBB0_1: # %loop
	; X64-NEXT: # =>This Inner Loop Header: Depth=1			; X64-NEXT: # =>This Inner Loop Header: Depth=1
	; X64-NEXT: movl (%rdi), %eax			; X64-NEXT: movl 16(%rdi), %eax
	; X64-NEXT: movl 16(%rdi), %ecx			; X64-NEXT: movl (%rdi), %ecx
	; X64-NEXT: leal 1(%rax,%rcx), %edx			; X64-NEXT: addl %eax, %ecx
	; X64-NEXT: movl %edx, 12(%rdi)			; X64-NEXT: incl %ecx
				; X64-NEXT: movl %ecx, 12(%rdi)
	; X64-NEXT: decl %esi			; X64-NEXT: decl %esi
	; X64-NEXT: jne .LBB0_1			; X64-NEXT: jne .LBB0_1
	; X64-NEXT: # BB#2: # %exit			; X64-NEXT: # BB#2: # %exit
	; X64-NEXT: addl %ecx, %eax			; X64-NEXT: addl %eax, %ecx
	; X64-NEXT: leal 1(%rcx,%rax), %eax			; X64-NEXT: movl %ecx, 16(%rdi)
	; X64-NEXT: movl %eax, 16(%rdi)
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: foo:			; X86-LABEL: foo:
	; X86: # BB#0: # %entry			; X86: # BB#0: # %entry
	; X86-NEXT: pushl %edi			; X86-NEXT: pushl %esi
	; X86-NEXT: .Lcfi0:			; X86-NEXT: .Lcfi0:
	; X86-NEXT: .cfi_def_cfa_offset 8			; X86-NEXT: .cfi_def_cfa_offset 8
	; X86-NEXT: pushl %esi
	; X86-NEXT: .Lcfi1:			; X86-NEXT: .Lcfi1:
	; X86-NEXT: .cfi_def_cfa_offset 12			; X86-NEXT: .cfi_offset %esi, -8
	; X86-NEXT: .Lcfi2:
	; X86-NEXT: .cfi_offset %esi, -12
	; X86-NEXT: .Lcfi3:
	; X86-NEXT: .cfi_offset %edi, -8
	; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: .p2align 4, 0x90			; X86-NEXT: .p2align 4, 0x90
	; X86-NEXT: .LBB0_1: # %loop			; X86-NEXT: .LBB0_1: # %loop
	; X86-NEXT: # =>This Inner Loop Header: Depth=1			; X86-NEXT: # =>This Inner Loop Header: Depth=1
	; X86-NEXT: movl (%eax), %edx			; X86-NEXT: movl 16(%eax), %edx
	; X86-NEXT: movl 16(%eax), %esi			; X86-NEXT: movl (%eax), %esi
	; X86-NEXT: leal 1(%edx,%esi), %edi			; X86-NEXT: addl %edx, %esi
	; X86-NEXT: movl %edi, 12(%eax)			; X86-NEXT: incl %esi
				; X86-NEXT: movl %esi, 12(%eax)
	; X86-NEXT: decl %ecx			; X86-NEXT: decl %ecx
	; X86-NEXT: jne .LBB0_1			; X86-NEXT: jne .LBB0_1
	; X86-NEXT: # BB#2: # %exit			; X86-NEXT: # BB#2: # %exit
	; X86-NEXT: addl %esi, %edx			; X86-NEXT: addl %edx, %esi
	; X86-NEXT: leal 1(%esi,%edx), %ecx			; X86-NEXT: movl %esi, 16(%eax)
	; X86-NEXT: movl %ecx, 16(%eax)
	; X86-NEXT: popl %esi			; X86-NEXT: popl %esi
	; X86-NEXT: popl %edi
	; X86-NEXT: retl			; X86-NEXT: retl
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%iter = phi i32 [%n ,%entry ] ,[ %iter.ctr ,%loop]			%iter = phi i32 [%n ,%entry ] ,[ %iter.ctr ,%loop]
	%h0 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 0			%h0 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 0
	%0 = load i32, i32* %h0, align 8			%0 = load i32, i32* %h0, align 8
	[15 lines not shown]

test/CodeGen/X86/lea-opt-cse3.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown | FileCheck %s -check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-unknown | FileCheck %s -check-prefix=X64
	; RUN: llc < %s -mtriple=i686-unknown | FileCheck %s -check-prefix=X86			; RUN: llc < %s -mtriple=i686-unknown | FileCheck %s -check-prefix=X86

	define i32 @foo(i32 %a, i32 %b) local_unnamed_addr #0 {			define i32 @foo(i32 %a, i32 %b) local_unnamed_addr #0 {
	; X64-LABEL: foo:			; X64-LABEL: foo:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: # kill: %ESI<def> %ESI<kill> %RSI<def>			; X64-NEXT: # kill: %ESI<def> %ESI<kill> %RSI<def>
	; X64-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>			; X64-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>
	; X64-NEXT: leal 4(%rdi,%rsi,2), %ecx			; X64-NEXT: leal 4(%rdi,%rsi,2), %ecx
	; X64-NEXT: leal 4(%rdi,%rsi,4), %eax			; X64-NEXT: leal (%ecx,%rsi,2), %eax
	; X64-NEXT: imull %ecx, %eax			; X64-NEXT: imull %ecx, %eax
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: foo:			; X86-LABEL: foo:
	; X86: # BB#0: # %entry			; X86: # BB#0: # %entry
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: leal 4(%ecx,%eax,2), %edx			; X86-NEXT: leal 4(%ecx,%eax,2), %ecx
	; X86-NEXT: leal 4(%ecx,%eax,4), %eax			; X86-NEXT: leal (%ecx,%eax,2), %eax
	; X86-NEXT: imull %edx, %eax			; X86-NEXT: imull %ecx, %eax
	; X86-NEXT: retl			; X86-NEXT: retl
	entry:			entry:
	%mul = shl i32 %b, 1			%mul = shl i32 %b, 1
	%add = add i32 %a, 4			%add = add i32 %a, 4
	%add1 = add i32 %add, %mul			%add1 = add i32 %add, %mul
	%mul2 = shl i32 %b, 2			%mul2 = shl i32 %b, 2
	%add4 = add i32 %add, %mul2			%add4 = add i32 %add, %mul2
	%mul5 = mul nsw i32 %add1, %add4			%mul5 = mul nsw i32 %add1, %add4
	ret i32 %mul5			ret i32 %mul5
	}			}

	define i32 @foo1(i32 %a, i32 %b) local_unnamed_addr #0 {			define i32 @foo1(i32 %a, i32 %b) local_unnamed_addr #0 {
	; X64-LABEL: foo1:			; X64-LABEL: foo1:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: # kill: %ESI<def> %ESI<kill> %RSI<def>			; X64-NEXT: # kill: %ESI<def> %ESI<kill> %RSI<def>
	; X64-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>			; X64-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>
	; X64-NEXT: leal 4(%rdi,%rsi,4), %ecx			; X64-NEXT: leal 4(%rdi,%rsi,4), %ecx
	; X64-NEXT: leal 4(%rdi,%rsi,8), %eax			; X64-NEXT: leal (%ecx,%rsi,4), %eax
	; X64-NEXT: imull %ecx, %eax			; X64-NEXT: imull %ecx, %eax
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: foo1:			; X86-LABEL: foo1:
	; X86: # BB#0: # %entry			; X86: # BB#0: # %entry
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: leal 4(%ecx,%eax,4), %edx			; X86-NEXT: leal 4(%ecx,%eax,4), %ecx
	; X86-NEXT: leal 4(%ecx,%eax,8), %eax			; X86-NEXT: leal (%ecx,%eax,4), %eax
	; X86-NEXT: imull %edx, %eax			; X86-NEXT: imull %ecx, %eax
	; X86-NEXT: retl			; X86-NEXT: retl
	entry:			entry:
	%mul = shl i32 %b, 2			%mul = shl i32 %b, 2
	%add = add i32 %a, 4			%add = add i32 %a, 4
	%add1 = add i32 %add, %mul			%add1 = add i32 %add, %mul
	%mul2 = shl i32 %b, 3			%mul2 = shl i32 %b, 3
	%add4 = add i32 %add, %mul2			%add4 = add i32 %add, %mul2
	%mul5 = mul nsw i32 %add1, %add4			%mul5 = mul nsw i32 %add1, %add4
	ret i32 %mul5			ret i32 %mul5
	}			}

	define i32 @foo1_mult_basic_blocks(i32 %a, i32 %b) local_unnamed_addr #0 {			define i32 @foo1_mult_basic_blocks(i32 %a, i32 %b) local_unnamed_addr #0 {
	; X64-LABEL: foo1_mult_basic_blocks:			; X64-LABEL: foo1_mult_basic_blocks:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: # kill: %ESI<def> %ESI<kill> %RSI<def>			; X64-NEXT: # kill: %ESI<def> %ESI<kill> %RSI<def>
	; X64-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>			; X64-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>
	; X64-NEXT: leal 4(%rdi,%rsi,4), %ecx			; X64-NEXT: leal 4(%rdi,%rsi,4), %ecx
	; X64-NEXT: xorl %eax, %eax			; X64-NEXT: xorl %eax, %eax
	; X64-NEXT: cmpl $10, %ecx			; X64-NEXT: cmpl $10, %ecx
	; X64-NEXT: je .LBB2_2			; X64-NEXT: je .LBB2_2
	; X64-NEXT: # BB#1: # %mid			; X64-NEXT: # BB#1: # %mid
	; X64-NEXT: leal 4(%rdi,%rsi,8), %eax			; X64-NEXT: leal (%ecx,%rsi,4), %eax
	; X64-NEXT: imull %eax, %ecx			; X64-NEXT: imull %ecx, %eax
	; X64-NEXT: movl %ecx, %eax
	; X64-NEXT: .LBB2_2: # %exit			; X64-NEXT: .LBB2_2: # %exit
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: foo1_mult_basic_blocks:			; X86-LABEL: foo1_mult_basic_blocks:
	; X86: # BB#0: # %entry			; X86: # BB#0: # %entry
	; X86-NEXT: pushl %esi
	; X86-NEXT: .Lcfi0:
	; X86-NEXT: .cfi_def_cfa_offset 8
	; X86-NEXT: .Lcfi1:
	; X86-NEXT: .cfi_offset %esi, -8
	; X86-NEXT: movl {{[0-9]+}}(%esp), %edx			; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-NEXT: movl {{[0-9]+}}(%esp), %esi			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: leal 4(%esi,%edx,4), %ecx			; X86-NEXT: leal 4(%eax,%edx,4), %ecx
	; X86-NEXT: xorl %eax, %eax			; X86-NEXT: xorl %eax, %eax
	; X86-NEXT: cmpl $10, %ecx			; X86-NEXT: cmpl $10, %ecx
	; X86-NEXT: je .LBB2_2			; X86-NEXT: je .LBB2_2
	; X86-NEXT: # BB#1: # %mid			; X86-NEXT: # BB#1: # %mid
	; X86-NEXT: leal 4(%esi,%edx,8), %eax			; X86-NEXT: leal (%ecx,%edx,4), %eax
	; X86-NEXT: imull %eax, %ecx			; X86-NEXT: imull %ecx, %eax
	; X86-NEXT: movl %ecx, %eax
	; X86-NEXT: .LBB2_2: # %exit			; X86-NEXT: .LBB2_2: # %exit
	; X86-NEXT: popl %esi
	; X86-NEXT: retl			; X86-NEXT: retl
	entry:			entry:
	%mul = shl i32 %b, 2			%mul = shl i32 %b, 2
	%add = add i32 %a, 4			%add = add i32 %a, 4
	%add1 = add i32 %add, %mul			%add1 = add i32 %add, %mul
	%cmp = icmp ne i32 %add1 , 10			%cmp = icmp ne i32 %add1 , 10
	br i1 %cmp , label %mid , label %exit			br i1 %cmp , label %mid , label %exit
	mid:			mid:
	[22 lines not shown]
	; X64-NEXT: imull %eax, %ecx			; X64-NEXT: imull %eax, %ecx
	; X64-NEXT: movl %ecx, %eax			; X64-NEXT: movl %ecx, %eax
	; X64-NEXT: .LBB3_2: # %exit			; X64-NEXT: .LBB3_2: # %exit
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: foo1_mult_basic_blocks_illegal_scale:			; X86-LABEL: foo1_mult_basic_blocks_illegal_scale:
	; X86: # BB#0: # %entry			; X86: # BB#0: # %entry
	; X86-NEXT: pushl %esi			; X86-NEXT: pushl %esi
	; X86-NEXT: .Lcfi2:			; X86-NEXT: .Lcfi0:
	; X86-NEXT: .cfi_def_cfa_offset 8			; X86-NEXT: .cfi_def_cfa_offset 8
	; X86-NEXT: .Lcfi3:			; X86-NEXT: .Lcfi1:
	; X86-NEXT: .cfi_offset %esi, -8			; X86-NEXT: .cfi_offset %esi, -8
	; X86-NEXT: movl {{[0-9]+}}(%esp), %edx			; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-NEXT: movl {{[0-9]+}}(%esp), %esi			; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
	; X86-NEXT: leal 4(%esi,%edx,2), %ecx			; X86-NEXT: leal 4(%esi,%edx,2), %ecx
	; X86-NEXT: xorl %eax, %eax			; X86-NEXT: xorl %eax, %eax
	; X86-NEXT: cmpl $10, %ecx			; X86-NEXT: cmpl $10, %ecx
	; X86-NEXT: je .LBB3_2			; X86-NEXT: je .LBB3_2
	; X86-NEXT: # BB#1: # %mid			; X86-NEXT: # BB#1: # %mid
	[22 lines not shown]

test/CodeGen/X86/lea-opt-cse4.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown | FileCheck %s -check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+slow-3ops-lea | FileCheck %s -check-prefix=X64
	; RUN: llc < %s -mtriple=i686-unknown | FileCheck %s -check-prefix=X86			; RUN: llc < %s -mtriple=i686-unknown -mattr=+slow-3ops-lea | FileCheck %s -check-prefix=X86
				RKSimon (Not Done): Why have you changed these tests?
				jbhateja (author, Not Done): FixupLEAPass, further down the pipeline, transforms some complex LEA patterns into a simpler LEA followed by an add. With the changes in this patch we will have the following: leal 1(%rax,%rcx,4), %eax, which after FixupLEAPass will get converted to: leal (%rax,%rcx,4), %eax; addl $1, %eax

	%struct.SA = type { i32 , i32 , i32 , i32 , i32};			%struct.SA = type { i32 , i32 , i32 , i32 , i32};

	define void @foo(%struct.SA* nocapture %ctx, i32 %n) local_unnamed_addr #0 {			define void @foo(%struct.SA* nocapture %ctx, i32 %n) local_unnamed_addr #0 {
	; X64-LABEL: foo:			; X64-LABEL: foo:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: movl 16(%rdi), %eax			; X64-NEXT: movl (%rdi), %eax
	; X64-NEXT: movl (%rdi), %ecx			; X64-NEXT: movl 16(%rdi), %ecx
	; X64-NEXT: addl %eax, %ecx			; X64-NEXT: leal (%rax,%rcx,4), %eax
	; X64-NEXT: addl %eax, %ecx			; X64-NEXT: addl $1, %eax
	; X64-NEXT: addl %eax, %ecx			; X64-NEXT: movl %eax, 12(%rdi)
	; X64-NEXT: leal (%rcx,%rax), %edx			; X64-NEXT: addl %ecx, %eax
	; X64-NEXT: leal 1(%rax,%rcx), %ecx
	; X64-NEXT: movl %ecx, 12(%rdi)
	; X64-NEXT: leal 1(%rax,%rdx), %eax
	; X64-NEXT: movl %eax, 16(%rdi)			; X64-NEXT: movl %eax, 16(%rdi)
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: foo:			; X86-LABEL: foo:
	; X86: # BB#0: # %entry			; X86: # BB#0: # %entry
	; X86-NEXT: pushl %esi
	; X86-NEXT: .Lcfi0:
	; X86-NEXT: .cfi_def_cfa_offset 8
	; X86-NEXT: .Lcfi1:
	; X86-NEXT: .cfi_offset %esi, -8
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: movl 16(%eax), %ecx			; X86-NEXT: movl (%eax), %ecx
	; X86-NEXT: movl (%eax), %edx			; X86-NEXT: movl 16(%eax), %edx
	; X86-NEXT: addl %ecx, %edx			; X86-NEXT: leal (%ecx,%edx,4), %ecx
	; X86-NEXT: addl %ecx, %edx			; X86-NEXT: addl $1, %ecx
	; X86-NEXT: addl %ecx, %edx			; X86-NEXT: movl %ecx, 12(%eax)
	; X86-NEXT: leal 1(%ecx,%edx), %esi			; X86-NEXT: addl %edx, %ecx
	; X86-NEXT: addl %ecx, %edx
	; X86-NEXT: movl %esi, 12(%eax)
	; X86-NEXT: leal 1(%ecx,%edx), %ecx
	; X86-NEXT: movl %ecx, 16(%eax)			; X86-NEXT: movl %ecx, 16(%eax)
	; X86-NEXT: popl %esi
	; X86-NEXT: retl			; X86-NEXT: retl
	entry:			entry:
	%h0 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 0			%h0 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 0
	%0 = load i32, i32* %h0, align 8			%0 = load i32, i32* %h0, align 8
	%h3 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 3			%h3 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 3
	%h4 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 4			%h4 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 4
	%1 = load i32, i32* %h4, align 8			%1 = load i32, i32* %h4, align 8
	%add = add i32 %0 , 1			%add = add i32 %0 , 1
	Show All 10 Lines


	define void @foo_loop(%struct.SA* nocapture %ctx, i32 %n) local_unnamed_addr #0 {			define void @foo_loop(%struct.SA* nocapture %ctx, i32 %n) local_unnamed_addr #0 {
	; X64-LABEL: foo_loop:			; X64-LABEL: foo_loop:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: .p2align 4, 0x90			; X64-NEXT: .p2align 4, 0x90
	; X64-NEXT: .LBB1_1: # %loop			; X64-NEXT: .LBB1_1: # %loop
	; X64-NEXT: # =>This Inner Loop Header: Depth=1			; X64-NEXT: # =>This Inner Loop Header: Depth=1
	; X64-NEXT: movl (%rdi), %ecx
	; X64-NEXT: movl 16(%rdi), %eax			; X64-NEXT: movl 16(%rdi), %eax
	; X64-NEXT: leal 1(%rcx,%rax), %edx			; X64-NEXT: movl (%rdi), %ecx
	; X64-NEXT: movl %edx, 12(%rdi)			; X64-NEXT: addl %eax, %ecx
				; X64-NEXT: incl %ecx
				; X64-NEXT: movl %ecx, 12(%rdi)
	; X64-NEXT: decl %esi			; X64-NEXT: decl %esi
	; X64-NEXT: jne .LBB1_1			; X64-NEXT: jne .LBB1_1
	; X64-NEXT: # BB#2: # %exit			; X64-NEXT: # BB#2: # %exit
	; X64-NEXT: addl %eax, %ecx			; X64-NEXT: addl %eax, %ecx
	; X64-NEXT: leal 1(%rax,%rcx), %ecx
	; X64-NEXT: addl %eax, %ecx			; X64-NEXT: addl %eax, %ecx
	; X64-NEXT: addl %eax, %ecx			; X64-NEXT: addl %eax, %ecx
	; X64-NEXT: addl %eax, %ecx			; X64-NEXT: addl %eax, %ecx
	; X64-NEXT: addl %eax, %ecx			; X64-NEXT: addl %eax, %ecx
	; X64-NEXT: addl %eax, %ecx			; X64-NEXT: addl %eax, %ecx
	; X64-NEXT: addl %eax, %ecx			; X64-NEXT: addl %eax, %ecx
	; X64-NEXT: movl %ecx, 16(%rdi)			; X64-NEXT: movl %ecx, 16(%rdi)
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: foo_loop:			; X86-LABEL: foo_loop:
	; X86: # BB#0: # %entry			; X86: # BB#0: # %entry
	; X86-NEXT: pushl %edi
	; X86-NEXT: .Lcfi2:
	; X86-NEXT: .cfi_def_cfa_offset 8
	; X86-NEXT: pushl %esi			; X86-NEXT: pushl %esi
	; X86-NEXT: .Lcfi3:			; X86-NEXT: .Lcfi0:
	; X86-NEXT: .cfi_def_cfa_offset 12			; X86-NEXT: .cfi_def_cfa_offset 8
	; X86-NEXT: .Lcfi4:			; X86-NEXT: .Lcfi1:
	; X86-NEXT: .cfi_offset %esi, -12			; X86-NEXT: .cfi_offset %esi, -8
	; X86-NEXT: .Lcfi5:			; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
	; X86-NEXT: .cfi_offset %edi, -8
	; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: .p2align 4, 0x90			; X86-NEXT: .p2align 4, 0x90
	; X86-NEXT: .LBB1_1: # %loop			; X86-NEXT: .LBB1_1: # %loop
	; X86-NEXT: # =>This Inner Loop Header: Depth=1			; X86-NEXT: # =>This Inner Loop Header: Depth=1
	; X86-NEXT: movl (%eax), %esi
	; X86-NEXT: movl 16(%eax), %ecx			; X86-NEXT: movl 16(%eax), %ecx
	; X86-NEXT: leal 1(%esi,%ecx), %edi			; X86-NEXT: movl (%eax), %edx
	; X86-NEXT: movl %edi, 12(%eax)			; X86-NEXT: addl %ecx, %edx
	; X86-NEXT: decl %edx			; X86-NEXT: incl %edx
				; X86-NEXT: movl %edx, 12(%eax)
				; X86-NEXT: decl %esi
	; X86-NEXT: jne .LBB1_1			; X86-NEXT: jne .LBB1_1
	; X86-NEXT: # BB#2: # %exit			; X86-NEXT: # BB#2: # %exit
	; X86-NEXT: addl %ecx, %esi			; X86-NEXT: addl %ecx, %edx
	; X86-NEXT: leal 1(%ecx,%esi), %edx
	; X86-NEXT: addl %ecx, %edx			; X86-NEXT: addl %ecx, %edx
	; X86-NEXT: addl %ecx, %edx			; X86-NEXT: addl %ecx, %edx
	; X86-NEXT: addl %ecx, %edx			; X86-NEXT: addl %ecx, %edx
	; X86-NEXT: addl %ecx, %edx			; X86-NEXT: addl %ecx, %edx
	; X86-NEXT: addl %ecx, %edx			; X86-NEXT: addl %ecx, %edx
	; X86-NEXT: addl %ecx, %edx			; X86-NEXT: addl %ecx, %edx
	; X86-NEXT: movl %edx, 16(%eax)			; X86-NEXT: movl %edx, 16(%eax)
	; X86-NEXT: popl %esi			; X86-NEXT: popl %esi
	; X86-NEXT: popl %edi
	; X86-NEXT: retl			; X86-NEXT: retl
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%iter = phi i32 [%n ,%entry ] ,[ %iter.ctr ,%loop]			%iter = phi i32 [%n ,%entry ] ,[ %iter.ctr ,%loop]
	%h0 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 0			%h0 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 0
	%0 = load i32, i32* %h0, align 8			%0 = load i32, i32* %h0, align 8
	Show All 21 Lines

test/CodeGen/X86/mul-constant-i16.ll

	Show First 20 Lines • Show All 552 Lines • ▼ Show 20 Lines
	; X64-NEXT: retq			; X64-NEXT: retq
	%mul = mul nsw i16 %x, 28			%mul = mul nsw i16 %x, 28
	ret i16 %mul			ret i16 %mul
	}			}

	define i16 @test_mul_by_29(i16 %x) {			define i16 @test_mul_by_29(i16 %x) {
	; X86-LABEL: test_mul_by_29:			; X86-LABEL: test_mul_by_29:
	; X86: # BB#0:			; X86: # BB#0:
	; X86-NEXT: movzwl {{[0-9]+}}(%esp), %ecx			; X86-NEXT: movzwl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: leal (%ecx,%ecx,8), %eax			; X86-NEXT: leal (%eax,%eax,8), %ecx
	; X86-NEXT: leal (%eax,%eax,2), %eax			; X86-NEXT: leal (%ecx,%ecx,2), %ecx
	; X86-NEXT: addl %ecx, %eax			; X86-NEXT: leal (%ecx,%eax,2), %eax
	; X86-NEXT: addl %ecx, %eax
	; X86-NEXT: # kill: %AX<def> %AX<kill> %EAX<kill>			; X86-NEXT: # kill: %AX<def> %AX<kill> %EAX<kill>
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: test_mul_by_29:			; X64-LABEL: test_mul_by_29:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>			; X64-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>
	; X64-NEXT: leal (%rdi,%rdi,8), %eax			; X64-NEXT: leal (%rdi,%rdi,8), %eax
	; X64-NEXT: leal (%rax,%rax,2), %eax			; X64-NEXT: leal (%rax,%rax,2), %eax
	; X64-NEXT: addl %edi, %eax			; X64-NEXT: leal (%rax,%rdi,2), %eax
	; X64-NEXT: addl %edi, %eax
	; X64-NEXT: # kill: %AX<def> %AX<kill> %EAX<kill>			; X64-NEXT: # kill: %AX<def> %AX<kill> %EAX<kill>
	; X64-NEXT: retq			; X64-NEXT: retq
	%mul = mul nsw i16 %x, 29			%mul = mul nsw i16 %x, 29
	ret i16 %mul			ret i16 %mul
	}			}
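	The multiply-by-29 checks in this file and in the i32/i64 variants change in the same way: the two trailing adds are folded into one scaled-index LEA. The sketch below models only the arithmetic of both sequences in 32-bit unsigned integers (the i16 test truncates the result afterwards). The names `mulBy29Old` and `mulBy29New` are hypothetical, and C++14 or later is assumed.

```cpp
#include <cstdint>

// Old sequence: two LEAs followed by two adds of the original value.
constexpr std::uint32_t mulBy29Old(std::uint32_t x) {
  std::uint32_t t = x + x * 8; // leal (%rdi,%rdi,8), %eax  -> 9x
  t = t + t * 2;               // leal (%rax,%rax,2), %eax  -> 27x
  t += x;                      // addl %edi, %eax           -> 28x
  t += x;                      // addl %edi, %eax           -> 29x
  return t;
}

// New sequence: the two adds become a single LEA with Scale = 2.
constexpr std::uint32_t mulBy29New(std::uint32_t x) {
  std::uint32_t t = x + x * 8; // leal (%rdi,%rdi,8), %eax  -> 9x
  t = t + t * 2;               // leal (%rax,%rax,2), %eax  -> 27x
  t = t + x * 2;               // leal (%rax,%rdi,2), %eax  -> 29x
  return t;
}

// Both sequences compute 29 * x.
static_assert(mulBy29Old(3) == 87, "old sequence computes 29 * x");
static_assert(mulBy29New(3) == 87, "new sequence computes 29 * x");
static_assert(mulBy29Old(123456789u) == mulBy29New(123456789u),
              "sequences agree under 32-bit wraparound");
```

	This is the Scale promotion from item 1 of the summary: the operand that appeared twice as an addend becomes an index with Scale = 2.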

	define i16 @test_mul_by_30(i16 %x) {			define i16 @test_mul_by_30(i16 %x) {
	; X86-LABEL: test_mul_by_30:			; X86-LABEL: test_mul_by_30:
	▲ Show 20 Lines • Show All 85 Lines • Show Last 20 Lines

test/CodeGen/X86/mul-constant-i32.ll

	Show First 20 Lines • Show All 1,451 Lines • ▼ Show 20 Lines
	; SLM-NOOPT-NEXT: retq # sched: [4:1.00]			; SLM-NOOPT-NEXT: retq # sched: [4:1.00]
	%mul = mul nsw i32 %x, 28			%mul = mul nsw i32 %x, 28
	ret i32 %mul			ret i32 %mul
	}			}

	define i32 @test_mul_by_29(i32 %x) {			define i32 @test_mul_by_29(i32 %x) {
	; X86-LABEL: test_mul_by_29:			; X86-LABEL: test_mul_by_29:
	; X86: # BB#0:			; X86: # BB#0:
	; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: leal (%ecx,%ecx,8), %eax			; X86-NEXT: leal (%eax,%eax,8), %ecx
	; X86-NEXT: leal (%eax,%eax,2), %eax			; X86-NEXT: leal (%ecx,%ecx,2), %ecx
	; X86-NEXT: addl %ecx, %eax			; X86-NEXT: leal (%ecx,%eax,2), %eax
	; X86-NEXT: addl %ecx, %eax
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-HSW-LABEL: test_mul_by_29:			; X64-HSW-LABEL: test_mul_by_29:
	; X64-HSW: # BB#0:			; X64-HSW: # BB#0:
	; X64-HSW-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>			; X64-HSW-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>
	; X64-HSW-NEXT: leal (%rdi,%rdi,8), %eax # sched: [1:0.50]			; X64-HSW-NEXT: leal (%rdi,%rdi,8), %eax # sched: [1:0.50]
	; X64-HSW-NEXT: leal (%rax,%rax,2), %eax # sched: [1:0.50]			; X64-HSW-NEXT: leal (%rax,%rax,2), %eax # sched: [1:0.50]
	; X64-HSW-NEXT: addl %edi, %eax # sched: [1:0.25]			; X64-HSW-NEXT: leal (%rax,%rdi,2), %eax # sched: [1:0.50]
	; X64-HSW-NEXT: addl %edi, %eax # sched: [1:0.25]
	; X64-HSW-NEXT: retq # sched: [2:1.00]			; X64-HSW-NEXT: retq # sched: [2:1.00]
	;			;
	; X64-JAG-LABEL: test_mul_by_29:			; X64-JAG-LABEL: test_mul_by_29:
	; X64-JAG: # BB#0:			; X64-JAG: # BB#0:
	; X64-JAG-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>			; X64-JAG-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>
	; X64-JAG-NEXT: leal (%rdi,%rdi,8), %eax # sched: [1:0.50]			; X64-JAG-NEXT: leal (%rdi,%rdi,8), %eax # sched: [1:0.50]
	; X64-JAG-NEXT: leal (%rax,%rax,2), %eax # sched: [1:0.50]			; X64-JAG-NEXT: leal (%rax,%rax,2), %eax # sched: [1:0.50]
	; X64-JAG-NEXT: addl %edi, %eax # sched: [1:0.50]			; X64-JAG-NEXT: leal (%rax,%rdi,2), %eax # sched: [1:0.50]
	; X64-JAG-NEXT: addl %edi, %eax # sched: [1:0.50]
	; X64-JAG-NEXT: retq # sched: [4:1.00]			; X64-JAG-NEXT: retq # sched: [4:1.00]
	;			;
	; X86-NOOPT-LABEL: test_mul_by_29:			; X86-NOOPT-LABEL: test_mul_by_29:
	; X86-NOOPT: # BB#0:			; X86-NOOPT: # BB#0:
	; X86-NOOPT-NEXT: imull $29, {{[0-9]+}}(%esp), %eax			; X86-NOOPT-NEXT: imull $29, {{[0-9]+}}(%esp), %eax
	; X86-NOOPT-NEXT: retl			; X86-NOOPT-NEXT: retl
	;			;
	; HSW-NOOPT-LABEL: test_mul_by_29:			; HSW-NOOPT-LABEL: test_mul_by_29:
	▲ Show 20 Lines • Show All 257 Lines • Show Last 20 Lines

test/CodeGen/X86/mul-constant-i64.ll

	Show First 20 Lines • Show All 1,517 Lines • ▼ Show 20 Lines
	}			}

	define i64 @test_mul_by_29(i64 %x) {			define i64 @test_mul_by_29(i64 %x) {
	; X86-LABEL: test_mul_by_29:			; X86-LABEL: test_mul_by_29:
	; X86: # BB#0:			; X86: # BB#0:
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: leal (%eax,%eax,8), %ecx			; X86-NEXT: leal (%eax,%eax,8), %ecx
	; X86-NEXT: leal (%ecx,%ecx,2), %ecx			; X86-NEXT: leal (%ecx,%ecx,2), %ecx
	; X86-NEXT: addl %eax, %ecx			; X86-NEXT: leal (%ecx,%eax,2), %ecx
	; X86-NEXT: addl %eax, %ecx
	; X86-NEXT: movl $29, %eax			; X86-NEXT: movl $29, %eax
	; X86-NEXT: mull {{[0-9]+}}(%esp)			; X86-NEXT: mull {{[0-9]+}}(%esp)
	; X86-NEXT: addl %ecx, %edx			; X86-NEXT: addl %ecx, %edx
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-HSW-LABEL: test_mul_by_29:			; X64-HSW-LABEL: test_mul_by_29:
	; X64-HSW: # BB#0:			; X64-HSW: # BB#0:
	; X64-HSW-NEXT: leaq (%rdi,%rdi,8), %rax # sched: [1:0.50]			; X64-HSW-NEXT: leaq (%rdi,%rdi,8), %rax # sched: [1:0.50]
	; X64-HSW-NEXT: leaq (%rax,%rax,2), %rax # sched: [1:0.50]			; X64-HSW-NEXT: leaq (%rax,%rax,2), %rax # sched: [1:0.50]
	; X64-HSW-NEXT: addq %rdi, %rax # sched: [1:0.25]			; X64-HSW-NEXT: leaq (%rax,%rdi,2), %rax # sched: [1:0.50]
	; X64-HSW-NEXT: addq %rdi, %rax # sched: [1:0.25]
	; X64-HSW-NEXT: retq # sched: [2:1.00]			; X64-HSW-NEXT: retq # sched: [2:1.00]
	;			;
	; X64-JAG-LABEL: test_mul_by_29:			; X64-JAG-LABEL: test_mul_by_29:
	; X64-JAG: # BB#0:			; X64-JAG: # BB#0:
	; X64-JAG-NEXT: leaq (%rdi,%rdi,8), %rax # sched: [1:0.50]			; X64-JAG-NEXT: leaq (%rdi,%rdi,8), %rax # sched: [1:0.50]
	; X64-JAG-NEXT: leaq (%rax,%rax,2), %rax # sched: [1:0.50]			; X64-JAG-NEXT: leaq (%rax,%rax,2), %rax # sched: [1:0.50]
	; X64-JAG-NEXT: addq %rdi, %rax # sched: [1:0.50]			; X64-JAG-NEXT: leaq (%rax,%rdi,2), %rax # sched: [1:0.50]
	; X64-JAG-NEXT: addq %rdi, %rax # sched: [1:0.50]
	; X64-JAG-NEXT: retq # sched: [4:1.00]			; X64-JAG-NEXT: retq # sched: [4:1.00]
	;			;
	; X86-NOOPT-LABEL: test_mul_by_29:			; X86-NOOPT-LABEL: test_mul_by_29:
	; X86-NOOPT: # BB#0:			; X86-NOOPT: # BB#0:
	; X86-NOOPT-NEXT: movl $29, %eax			; X86-NOOPT-NEXT: movl $29, %eax
	; X86-NOOPT-NEXT: mull {{[0-9]+}}(%esp)			; X86-NOOPT-NEXT: mull {{[0-9]+}}(%esp)
	; X86-NOOPT-NEXT: imull $29, {{[0-9]+}}(%esp), %ecx			; X86-NOOPT-NEXT: imull $29, {{[0-9]+}}(%esp), %ecx
	; X86-NOOPT-NEXT: addl %ecx, %edx			; X86-NOOPT-NEXT: addl %ecx, %edx
	▲ Show 20 Lines • Show All 318 Lines • Show Last 20 Lines

test/CodeGen/X86/mul-constant-result.ll

	Show First 20 Lines • Show All 157 Lines • ▼ Show 20 Lines
	; X86-NEXT: leal (%eax,%eax,8), %ecx			; X86-NEXT: leal (%eax,%eax,8), %ecx
	; X86-NEXT: leal (%ecx,%ecx,2), %ecx			; X86-NEXT: leal (%ecx,%ecx,2), %ecx
	; X86-NEXT: addl %ecx, %eax			; X86-NEXT: addl %ecx, %eax
	; X86-NEXT: popl %esi			; X86-NEXT: popl %esi
	; X86-NEXT: retl			; X86-NEXT: retl
	; X86-NEXT: .LBB0_35:			; X86-NEXT: .LBB0_35:
	; X86-NEXT: leal (%eax,%eax,8), %ecx			; X86-NEXT: leal (%eax,%eax,8), %ecx
	; X86-NEXT: leal (%ecx,%ecx,2), %ecx			; X86-NEXT: leal (%ecx,%ecx,2), %ecx
	; X86-NEXT: addl %eax, %ecx			; X86-NEXT: leal (%ecx,%eax,2), %eax
	; X86-NEXT: addl %ecx, %eax
	; X86-NEXT: popl %esi			; X86-NEXT: popl %esi
	; X86-NEXT: retl			; X86-NEXT: retl
	; X86-NEXT: .LBB0_36:			; X86-NEXT: .LBB0_36:
	; X86-NEXT: movl %eax, %ecx			; X86-NEXT: movl %eax, %ecx
	; X86-NEXT: shll $5, %ecx			; X86-NEXT: shll $5, %ecx
	; X86-NEXT: subl %eax, %ecx			; X86-NEXT: subl %eax, %ecx
	; X86-NEXT: jmp .LBB0_12			; X86-NEXT: jmp .LBB0_12
	; X86-NEXT: .LBB0_37:			; X86-NEXT: .LBB0_37:
	▲ Show 20 Lines • Show All 141 Lines • ▼ Show 20 Lines
	; X64-HSW-NEXT: .LBB0_30:			; X64-HSW-NEXT: .LBB0_30:
	; X64-HSW-NEXT: leal (%rax,%rax,8), %eax			; X64-HSW-NEXT: leal (%rax,%rax,8), %eax
	; X64-HSW-NEXT: leal (%rax,%rax,2), %eax			; X64-HSW-NEXT: leal (%rax,%rax,2), %eax
	; X64-HSW-NEXT: # kill: %EAX<def> %EAX<kill> %RAX<kill>			; X64-HSW-NEXT: # kill: %EAX<def> %EAX<kill> %RAX<kill>
	; X64-HSW-NEXT: retq			; X64-HSW-NEXT: retq
	; X64-HSW-NEXT: .LBB0_31:			; X64-HSW-NEXT: .LBB0_31:
	; X64-HSW-NEXT: leal (%rax,%rax,8), %ecx			; X64-HSW-NEXT: leal (%rax,%rax,8), %ecx
	; X64-HSW-NEXT: leal (%rcx,%rcx,2), %ecx			; X64-HSW-NEXT: leal (%rcx,%rcx,2), %ecx
	; X64-HSW-NEXT: jmp .LBB0_17
	; X64-HSW-NEXT: .LBB0_32:
	; X64-HSW-NEXT: leal (%rax,%rax,8), %ecx
	; X64-HSW-NEXT: leal (%rcx,%rcx,2), %ecx
	; X64-HSW-NEXT: addl %eax, %ecx
	; X64-HSW-NEXT: .LBB0_17:			; X64-HSW-NEXT: .LBB0_17:
	; X64-HSW-NEXT: addl %eax, %ecx			; X64-HSW-NEXT: addl %eax, %ecx
	; X64-HSW-NEXT: movl %ecx, %eax			; X64-HSW-NEXT: movl %ecx, %eax
	; X64-HSW-NEXT: # kill: %EAX<def> %EAX<kill> %RAX<kill>			; X64-HSW-NEXT: # kill: %EAX<def> %EAX<kill> %RAX<kill>
	; X64-HSW-NEXT: retq			; X64-HSW-NEXT: retq
				; X64-HSW-NEXT: .LBB0_32:
				; X64-HSW-NEXT: leal (%rax,%rax,8), %ecx
				; X64-HSW-NEXT: leal (%rcx,%rcx,2), %ecx
				; X64-HSW-NEXT: leal (%rcx,%rax,2), %eax
				; X64-HSW-NEXT: # kill: %EAX<def> %EAX<kill> %RAX<kill>
				; X64-HSW-NEXT: retq
	; X64-HSW-NEXT: .LBB0_33:			; X64-HSW-NEXT: .LBB0_33:
	; X64-HSW-NEXT: movl %eax, %ecx			; X64-HSW-NEXT: movl %eax, %ecx
	; X64-HSW-NEXT: shll $5, %ecx			; X64-HSW-NEXT: shll $5, %ecx
	; X64-HSW-NEXT: subl %eax, %ecx			; X64-HSW-NEXT: subl %eax, %ecx
	; X64-HSW-NEXT: jmp .LBB0_8			; X64-HSW-NEXT: jmp .LBB0_8
	; X64-HSW-NEXT: .LBB0_34:			; X64-HSW-NEXT: .LBB0_34:
	; X64-HSW-NEXT: movl %eax, %ecx			; X64-HSW-NEXT: movl %eax, %ecx
	; X64-HSW-NEXT: shll $5, %ecx			; X64-HSW-NEXT: shll $5, %ecx
	▲ Show 20 Lines • Show All 947 Lines • Show Last 20 Lines

test/CodeGen/X86/umul-with-overflow.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-unknown-linux-gnu \| FileCheck %s --check-prefix=X86			; RUN: llc < %s -mtriple=i686-unknown-linux-gnu \| FileCheck %s --check-prefix=X86
	; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu \| FileCheck %s --check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu \| FileCheck %s --check-prefix=X64

	declare {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b)			declare {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b)

	define zeroext i1 @a(i32 %x) nounwind {			define zeroext i1 @a(i32 %x) nounwind {
	; X86-LABEL: a:			; X86-LABEL: a:
				lsaba: Why did this test change?
				jbhateja (author): Because I generated its output with the script utils/update_llc_test_checks.py, which adds an assertion for each instruction. I think it should be fine.
				lsaba: This needs to be in a separate pre-commit. Please commit and rebase.
				RKSimon: I regenerated this recently - please rebase.
				RKSimon: Still needs to be rebased - you've lost the x86_64 tests.
	; X86: # BB#0:			; X86: # BB#0:
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: movl $3, %ecx			; X86-NEXT: movl $3, %ecx
	; X86-NEXT: mull %ecx			; X86-NEXT: mull %ecx
	; X86-NEXT: seto %al			; X86-NEXT: seto %al
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: a:			; X64-LABEL: a:
	Show All 18 Lines
	;			;
	; X64-LABEL: test2:			; X64-LABEL: test2:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>			; X64-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>
	; X64-NEXT: addl %esi, %edi			; X64-NEXT: addl %esi, %edi
	; X64-NEXT: leal (%rdi,%rdi), %eax			; X64-NEXT: leal (%rdi,%rdi), %eax
	; X64-NEXT: retq			; X64-NEXT: retq
	entry:			entry:
	%tmp0 = add i32 %b, %a			%tmp0 = add i32 %b, %a
	%tmp1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %tmp0, i32 2)			%tmp1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %tmp0, i32 2)
	%tmp2 = extractvalue { i32, i1 } %tmp1, 0			%tmp2 = extractvalue { i32, i1 } %tmp1, 0
	ret i32 %tmp2			ret i32 %tmp2
	}			}

	define i32 @test3(i32 %a, i32 %b) nounwind readnone {			define i32 @test3(i32 %a, i32 %b) nounwind readnone {
	; X86-LABEL: test3:			; X86-LABEL: test3:
	; X86: # BB#0: # %entry			; X86: # BB#0: # %entry
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: addl {{[0-9]+}}(%esp), %eax			; X86-NEXT: addl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: movl $4, %ecx			; X86-NEXT: movl $4, %ecx
	; X86-NEXT: mull %ecx			; X86-NEXT: mull %ecx
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: test3:			; X64-LABEL: test3:
	; X64: # BB#0: # %entry			; X64: # BB#0: # %entry
	; X64-NEXT: # kill: %ESI<def> %ESI<kill> %RSI<def>			; X64-NEXT: # kill: %ESI<def> %ESI<kill> %RSI<def>
	; X64-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>			; X64-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>
	; X64-NEXT: leal (%rdi,%rsi), %eax			; X64-NEXT: leal (%rdi,%rsi), %eax
	; X64-NEXT: movl $4, %ecx			; X64-NEXT: movl $4, %ecx
	; X64-NEXT: mull %ecx			; X64-NEXT: mull %ecx
	; X64-NEXT: retq			; X64-NEXT: retq
	entry:			entry:
	%tmp0 = add i32 %b, %a			%tmp0 = add i32 %b, %a
	%tmp1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %tmp0, i32 4)			%tmp1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %tmp0, i32 4)
	%tmp2 = extractvalue { i32, i1 } %tmp1, 0			%tmp2 = extractvalue { i32, i1 } %tmp1, 0
	ret i32 %tmp2			ret i32 %tmp2
	}			}

test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll

	; RUN: llc < %s -O3 -march=x86-64 -mcpu=core2 \| FileCheck %s -check-prefix=X64			; RUN: llc < %s -O3 -march=x86-64 -mcpu=core2 \| FileCheck %s -check-prefix=X64
	; RUN: llc < %s -O3 -march=x86 -mcpu=core2 \| FileCheck %s -check-prefix=X32			; RUN: llc < %s -O3 -march=x86 -mcpu=core2 \| FileCheck %s -check-prefix=X32

	; @simple is the most basic chain of address induction variables. Chaining			; @simple is the most basic chain of address induction variables. Chaining
	; saves at least one register and avoids complex addressing and setup			; saves at least one register and avoids complex addressing and setup
	; code.			; code.
	;			;
	; X64: @simple			; X64: @simple
	; %x * 4			; %x * 4
	; X64: shlq $2			; X64: shlq $2
	; no other address computation in the preheader			; no other address computation in the preheader
	; X64-NEXT: xorl			; X64-NEXT: xorl
	; X64-NEXT: .p2align			; X64-NEXT: .p2align
	; X64: %loop			; X64: %loop
	; no complex address modes			; no complex address modes
	; X64-NOT: (%{{[^)]+}},%{{[^)]+}},			; X64-NOT: [1-9]+(%{{[^)]+}},%{{[^)]+}},
	;			;
	; X32: @simple			; X32: @simple
	; no expensive address computation in the preheader			; no expensive address computation in the preheader
	; X32-NOT: imul			; X32-NOT: imul
	; X32: %loop			; X32: %loop
	; no complex address modes			; no complex address modes
	; X32-NOT: (%{{[^)]+}},%{{[^)]+}},			; X32-NOT: [1-9]+(%{{[^)]+}},%{{[^)]+}},
	define i32 @simple(i32* %a, i32* %b, i32 %x) nounwind {			define i32 @simple(i32* %a, i32* %b, i32 %x) nounwind {
	entry:			entry:
	br label %loop			br label %loop
	loop:			loop:
	%iv = phi i32* [ %a, %entry ], [ %iv4, %loop ]			%iv = phi i32* [ %a, %entry ], [ %iv4, %loop ]
	%s = phi i32 [ 0, %entry ], [ %s4, %loop ]			%s = phi i32 [ 0, %entry ], [ %s4, %loop ]
	%v = load i32, i32* %iv			%v = load i32, i32* %iv
	%iv1 = getelementptr inbounds i32, i32* %iv, i32 %x			%iv1 = getelementptr inbounds i32, i32* %iv, i32 %x
	▲ Show 20 Lines • Show All 66 Lines • ▼ Show 20 Lines
	; strange increment expressions like this:			; strange increment expressions like this:
	; IV + ((sext i32 (2 * %s) to i64) + (-1 * (sext i32 %s to i64)))			; IV + ((sext i32 (2 * %s) to i64) + (-1 * (sext i32 %s to i64)))
	;			;
	; X32: extrastride:			; X32: extrastride:
	; no spills in the preheader			; no spills in the preheader
	; X32-NOT: mov{{.*}}(%esp){{$}}			; X32-NOT: mov{{.*}}(%esp){{$}}
	; X32: %for.body{{$}}			; X32: %for.body{{$}}
	; no complex address modes			; no complex address modes
	; X32-NOT: (%{{[^)]+}},%{{[^)]+}},			; X32-NOT: [1-9]+(%{{[^)]+}},%{{[^)]+}},
	; no reloads			; no reloads
	; X32-NOT: (%esp)			; X32-NOT: (%esp)
	define void @extrastride(i8* nocapture %main, i32 %main_stride, i32* nocapture %res, i32 %x, i32 %y, i32 %z) nounwind {			define void @extrastride(i8* nocapture %main, i32 %main_stride, i32* nocapture %res, i32 %x, i32 %y, i32 %z) nounwind {
	entry:			entry:
	%cmp8 = icmp eq i32 %z, 0			%cmp8 = icmp eq i32 %z, 0
	br i1 %cmp8, label %for.end, label %for.body.lr.ph			br i1 %cmp8, label %for.end, label %for.body.lr.ph

	for.body.lr.ph: ; preds = %entry			for.body.lr.ph: ; preds = %entry
	▲ Show 20 Lines • Show All 188 Lines • Show Last 20 Lines


Revision Contents

Diff 114326

include/llvm/CodeGen/MachineInstr.h

include/llvm/CodeGen/SelectionDAG.h

lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp

lib/Target/X86/X86ISelDAGToDAG.cpp

lib/Target/X86/X86OptimizeLEAs.cpp

test/CodeGen/X86/GlobalISel/callingconv.ll

test/CodeGen/X86/GlobalISel/gep.ll

test/CodeGen/X86/GlobalISel/memop-scalar.ll

test/CodeGen/X86/lea-opt-cse1.ll

test/CodeGen/X86/lea-opt-cse2.ll

test/CodeGen/X86/lea-opt-cse3.ll

test/CodeGen/X86/lea-opt-cse4.ll

test/CodeGen/X86/mul-constant-i16.ll

test/CodeGen/X86/mul-constant-i32.ll

test/CodeGen/X86/mul-constant-i64.ll

test/CodeGen/X86/mul-constant-result.ll

test/CodeGen/X86/umul-with-overflow.ll

test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll
