This is an archive of the discontinued LLVM Phabricator instance.

[X86] Improvement in CodeGen instruction selection for LEAs.
ClosedPublic

Authored by jbhateja on Jul 5 2017, 8:40 AM.

Details

Reviewers

lsaba
RKSimon
craig.topper
qcolombet
jmolloy
jbhateja

Commits

rG328199ec2643: [X86] Improvement in CodeGen instruction selection for LEAs.
rG908c8b37c2be: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs.
rL319543: [X86] Improvement in CodeGen instruction selection for LEAs.
rL313343: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs.

Summary

1/ Operand folding during complex pattern matching for LEAs has been extended so that it promotes Scale to accommodate a similar operand appearing in the DAG, e.g.
             T1 = A + B
             T2 = T1 + 10
             T3 = T2 + A
For the above DAG rooted at T3, X86AddressMode will now look like
            Base = B , Index = A , Scale = 2 , Disp = 10

2/ During OptimizeLEAPass, further down the pipeline, factorization is now performed over LEAs so that, where there is an opportunity, complex LEAs (having 3 operands) can be factored out, e.g.
            leal 1(%rax,%rcx,1), %rdx
            leal 1(%rax,%rcx,2), %rcx
will be factored as follows
            leal 1(%rax,%rcx,1), %rdx
            leal (%rdx,%rcx)   , %edx

3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop.

4/ Simplify LEA converts lea (BASE,1,INDEX,0) --> add (BASE, INDEX), which offers better throughput.

PR32755 will be taken care of by this patch.

Previous patch revisions : r313343 , r314886
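
The scale promotion described in point 1 of the summary can be sketched as follows. This is only an illustration under simplifying assumptions: operands are plain names, the add-chain is already flattened, and `AddrMode`, `Term` and `foldOperands` are hypothetical names, not the X86ISelAddressMode code from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an x86 address mode:
// Base + Index * Scale + Disp.
struct AddrMode {
  std::string Base;   // empty means "no base register"
  std::string Index;  // empty means "no index register"
  int Scale = 1;
  int64_t Disp = 0;
};

// One term of a flattened add-chain: a named value, or a constant when
// Name is empty.
struct Term {
  std::string Name;
  int64_t Imm;
};

static bool isLegalScale(int S) { return S == 1 || S == 2 || S == 4 || S == 8; }

// Folds a flattened sum into a single address mode. A value that occurs N
// times becomes the index with Scale = N, provided N is a legal x86 scale.
// Returns false when the sum does not fit one address mode.
bool foldOperands(const std::vector<Term> &Terms, AddrMode &Out) {
  std::map<std::string, int> Count;
  AddrMode AM;
  for (const Term &T : Terms) {
    if (T.Name.empty())
      AM.Disp += T.Imm;      // constants accumulate into the displacement
    else
      ++Count[T.Name];       // count repeated occurrences of each value
  }
  if (Count.size() > 2)
    return false;            // only one base and one index are available
  for (const auto &KV : Count) {
    const int N = KV.second;
    if (N == 1 && AM.Base.empty()) {
      AM.Base = KV.first;
    } else if (AM.Index.empty() && isLegalScale(N)) {
      AM.Index = KV.first;
      AM.Scale = N;
    } else {
      return false;
    }
  }
  Out = AM;
  return true;
}
```

Feeding it the terms of T3 (A, B, 10, A) yields Base = B, Index = A, Scale = 2, Disp = 10, matching the address mode quoted in the summary.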

Diff Detail

Build Status

Buildable 8084
Build 8084: arc lint + arc unit

Event Timeline

Please add test cases for the optimization added in OptimizeLEAPass

lsaba added inline comments.Jul 6 2017, 2:06 AM

lib/Target/X86/X86OptimizeLEAs.cpp
85–86	This comment is only valid for the else statement, please change it to explain the different cases between Identical and Similar Disp
291	Please add a comment explaining what this function does

Changes for review comments for patch.

Reviewers,

Have posted an RFC for wider fix at following link

Kindly review the same and add your comments

https://groups.google.com/forum/#!topic/llvm-dev/x2LDXpON500

Thanks.

Harbormaster completed remote builds in B8077: Diff 105755.Jul 9 2017, 6:15 AM

Missed one change to be submitted. The build is running; I will upload the change after the lit run. Thanks

spatel added a subscriber: spatel.Jul 9 2017, 8:21 AM

Review comments changes cont..

lsaba added inline comments.Jul 10 2017, 6:20 AM

lib/Target/X86/X86OptimizeLEAs.cpp
737	should it also be erased from the LEAs list?

jbhateja added inline comments.Jul 10 2017, 8:24 AM

lib/Target/X86/X86OptimizeLEAs.cpp
737	Why do you think so? LEAs is a map where Key = F(BASE, INDEX, DISP, SEGMENT) and Value = vector of MI (LEA instructions). This map is populated on a per-BasicBlock basis. The outer loop traverses the map entries and sorts each vector in decreasing order of Scale. The inner loop traverses the sorted vector of LEAs for a given key, so the LI1 instruction will be traversed only once. The map will be deleted once we leave this function. Machine CSE, which is value-number based, has already run before this pass, so if there are multiple identical LEAs (i.e. same BASE/INDEX/SCALE/DISP/SEGMENT) in a BasicBlock they will be factored out before we get here.
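
The bucketing described in this reply (a key that ignores Scale, each bucket sorted by decreasing Scale) can be sketched as below. It is a minimal model under stated assumptions: `Lea`, `Key`, `bucketLeas` and `factor` are hypothetical names, the segment operand is left out, and the dominance and register-class checks that a real pass needs are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical model of "lea Disp(Base,Index,Scale), Dest".
struct Lea {
  std::string Base, Index;
  int Scale;
  long Disp;
  std::string Dest;
};

// The key deliberately ignores Scale, so LEAs that differ only in scale
// land in the same bucket.
using Key = std::tuple<std::string, std::string, long>;

static bool isLegalScale(int S) { return S == 1 || S == 2 || S == 4 || S == 8; }

// Buckets LEAs by (Base, Index, Disp) and sorts each bucket by decreasing Scale.
std::map<Key, std::vector<Lea>> bucketLeas(const std::vector<Lea> &Leas) {
  std::map<Key, std::vector<Lea>> Buckets;
  for (const Lea &L : Leas)
    Buckets[Key(L.Base, L.Index, L.Disp)].push_back(L);
  for (auto &KV : Buckets)
    std::stable_sort(KV.second.begin(), KV.second.end(),
                     [](const Lea &A, const Lea &B) { return A.Scale > B.Scale; });
  return Buckets;
}

// Rewrites Big in terms of Small's result when both share a key and the
// scale difference is itself a legal scale:
//   Disp + Base + Index*S2 == Small.Dest + Index*(S2 - S1).
bool factor(const Lea &Small, const Lea &Big, Lea &Out) {
  if (Small.Base != Big.Base || Small.Index != Big.Index ||
      Small.Disp != Big.Disp)
    return false;
  const int Delta = Big.Scale - Small.Scale;
  if (!isLegalScale(Delta))
    return false;
  Out = Lea{Small.Dest, Big.Index, Delta, 0, Big.Dest};
  return true;
}
```

In this model, a bucket holding scale-4 and scale-2 LEAs over the same base, index and displacement lets the scale-4 one be rewritten as the scale-2 result plus Index*2.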

lsaba added inline comments.Jul 11 2017, 5:02 AM

lib/Target/X86/X86OptimizeLEAs.cpp
737	just making sure:) by the way, can't this algorithm work across a function's basic blocks?

Performing LEA factorization/CSE across basic blocks.

jbhateja marked an inline comment as done.Jul 17 2017, 8:05 AM

jbhateja marked 2 inline comments as done and an inline comment as not done.Jul 17 2017, 8:07 AM

ping @reviewers

lsaba added inline comments.Jul 19 2017, 8:03 AM

lib/Target/X86/X86OptimizeLEAs.cpp
340	it is unclear what this function does, can you explain?

jbhateja added inline comments.Jul 19 2017, 9:54 AM

lib/Target/X86/X86OptimizeLEAs.cpp
340	In a nutshell, we are implementing a scoped hash map, which is LEAs. Every time we enter a new scope and encounter an LEA, we first record the length of the list of MIs corresponding to the MemOpKey of the new LEA. After that, we insert the new LEA at the beginning of the list, which is the value field of the hash map. When we leave a scope, we remove its LEA instructions from the LEAs hash map: since we recorded the original length of the list of MIs when we entered the scope, at exit we keep removing elements from the beginning of the list until its size is the same as what was recorded at entry.
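
The scoped hash map described above can be sketched roughly like this. It is an illustration only, assuming string keys and integer values in place of MemOpKey and MachineInstr pointers; `ScopedLeaMap` and its methods are hypothetical names, not the code in X86OptimizeLEAs.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scoped multimap: each key maps to a list of values, newest
// first. Leaving a scope removes everything that scope inserted.
class ScopedLeaMap {
  std::map<std::string, std::deque<int>> Table;
  // One frame per open scope; each entry is (key, list length recorded the
  // first time the key was touched in that scope).
  std::vector<std::vector<std::pair<std::string, std::size_t>>> Frames;

public:
  void enterScope() { Frames.emplace_back(); }

  // Records the current list length on first touch in this scope, then
  // inserts the value at the front of the list.
  void insert(const std::string &Key, int Value) {
    assert(!Frames.empty() && "insert outside of any scope");
    std::deque<int> &List = Table[Key];
    bool Seen = false;
    for (const auto &Entry : Frames.back())
      Seen |= Entry.first == Key;
    if (!Seen)
      Frames.back().emplace_back(Key, List.size());
    List.push_front(Value);
  }

  // Pops from the front of every touched list until it is back to the
  // recorded length, and erases map entries whose list becomes empty.
  void exitScope() {
    assert(!Frames.empty() && "exitScope without enterScope");
    for (const auto &Entry : Frames.back()) {
      std::deque<int> &List = Table[Entry.first];
      while (List.size() > Entry.second)
        List.pop_front();
      if (List.empty())
        Table.erase(Entry.first);
    }
    Frames.pop_back();
  }

  std::size_t count(const std::string &Key) const {
    auto It = Table.find(Key);
    return It == Table.end() ? 0 : It->second.size();
  }
};
```

Inserting at the front and popping from the front means a scope only ever undoes its own insertions, which is why recording a single length per key is enough.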

jbhateja marked an inline comment as done.Jul 19 2017, 9:55 AM

lsaba added inline comments.Jul 24 2017, 5:52 AM

lib/Target/X86/X86OptimizeLEAs.cpp
341	already initialized at the beginning of the function
test/CodeGen/X86/lea-opt-csebb.ll
1 ↗	(On Diff #106796)	can you please add a test case that covers scale >1 cases
test/CodeGen/X86/umul-with-overflow.ll
6	why did this test change?

jbhateja added inline comments.Jul 24 2017, 8:18 AM

test/CodeGen/X86/lea-opt-csebb.ll
1 ↗	(On Diff #106796)	This commit, if you see, has two parts: 1/ pattern matching based on addressing mode (which is limited currently); 2/ factoring of LEAs, which is generic. Checking in incremental changes should be fine, I guess. Generic pattern will need to be brought out of addressing mode based selection, as I described in the following link: https://groups.google.com/forum/#!topic/llvm-dev/x2LDXpON500 Please comment in the thread.
test/CodeGen/X86/umul-with-overflow.ll
6	Because I generated its output with the script utils/update_llc_test_checks.py, which adds an assertion for each instruction. I think it should be fine.

jbhateja added inline comments.Jul 24 2017, 8:20 AM

lib/Target/X86/X86OptimizeLEAs.cpp
341	Yes.

ping @ reviewers. Can we do an incremental check-in for this?

Thanks

lsaba added inline comments.Jul 26 2017, 4:21 AM

test/CodeGen/X86/lea-opt-csebb.ll
1 ↗	(On Diff #106796)	I am not sure I understand what you mean by "Generic pattern will need to be brought out of addressing mode". As far as I understand, for the following C code:

int foo(int a, int b) {
  int x = a + 2 * b + 4;
  int y = a + 4 * b + 4;
  int c = x * y;
  return c;
}

the currently generated IR:

define i32 @foo(i32 %a, i32 %b) local_unnamed_addr #0 {
entry:
  %mul = shl i32 %b, 1
  %add = add i32 %a, 4
  %add1 = add i32 %add, %mul
  %mul2 = shl i32 %b, 2
  %add4 = add i32 %add, %mul2
  %mul5 = mul nsw i32 %add1, %add4
  ret i32 %mul5
}

the currently generated asm:

  leal 4(%rdi,%rsi,2), %ecx
  leal 4(%rdi,%rsi,4), %eax
  imull %ecx, %eax
  retq

this will be refactored by this optimization in this current commit (not a future commit) to:

  leal 4(%rdi,%rsi,2), %ecx
  leal (%ecx,%rsi,2), %eax
  imull %ecx, %eax
  retq

Please correct me if I'm wrong.
test/CodeGen/X86/lea-opt-cst.ll
4	please generate the test with the original checks before your changes and commit it first in a separate commit
test/CodeGen/X86/umul-with-overflow.ll
6	This needs to be in a separate pre commit. please commit and rebase

RKSimon added inline comments.Jul 26 2017, 4:24 AM

lib/Target/X86/X86OptimizeLEAs.cpp
188	Can we avoid the static?
239	Comment describing the purpose of the class
262	(style) cleanup the positions of the * - check what clang-format does
262	comment
277	comment
297	(style) remove braces
test/CodeGen/X86/lea-opt-csebb.ll
1 ↗	(On Diff #106796)	Please can you commit this test file to trunk with current codegen and update the patch to show the diff
test/CodeGen/X86/lea-opt-cst.ll
4	Please can you commit this test file to trunk with current codegen and update the patch to show the diff
test/CodeGen/X86/umul-with-overflow.ll
6	I regenerated this recently - please rebase
utils/TableGen/DAGISelMatcherGen.cpp
308	This is still here

jbhateja added inline comments.Jul 26 2017, 8:33 AM

test/CodeGen/X86/lea-opt-csebb.ll
1 ↗	(On Diff #106796)	Hi Lama, by generic pattern handling I meant LEA folding into complex LEAs, which is currently restrictive. Consider the following case:

%struct.SA = type { i32 , i32 , i32 , i32 , i32};
define void @foo(%struct.SA* nocapture %ctx, i32 %n) local_unnamed_addr #0 {
entry:
  %h0 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 0
  %0 = load i32, i32* %h0, align 8
  %h3 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 3
  %h4 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 4
  %1 = load i32, i32* %h4, align 8
  %add = add i32 %0, 1
  %add4 = add i32 %add, %1
  %add5 = add i32 %add4, %1
  store i32 %add5, i32* %h3, align 4
  %add10 = add i32 %add5, %1
  %add29 = add i32 %add10, %1
  store i32 %add29, i32* %h4, align 8
  ret void
}

ASM:

foo: # @foo
  .cfi_startproc
BB#0: # %entry
  movl (%rdi), %eax
  movl 16(%rdi), %ecx
  leal (%rax,%rcx,2), %edx
  leal 1(%rax,%rcx,2), %eax
  movl %eax, 12(%rdi)
  leal 1(%rdx,%rcx,2), %eax
  movl %eax, 16(%rdi)

It could be further optimized to the following:

  movl (%rdi), %eax
  movl 16(%rdi), %ecx
  leal 1(%rax,%rcx,2), %edx
  movl %eax, 12(%rdi)
  leal (%rdx,%rcx,2), %eax
  movl %eax, 16(%rdi)

Folding is currently being done as a part of the addressing mode matcher. I feel that efficient folding can only be done as a separate MI pass; that is what I explained in the proposal (http://lists.llvm.org/pipermail/llvm-dev/2017-July/115182.html). Thanks for your example; I will add it to the test cases, as it demonstrates the genericness of factorization.

lsaba added inline comments.Jul 27 2017, 12:48 AM

test/CodeGen/X86/lea-opt-csebb.ll
1 ↗	(On Diff #106796)	Hi, Thanks, I understand the need for a more generic pattern matching and I agree. This is unrelated to my comment, which refers solely to the Factorize LEA optimization, which needs more testing, for example covering different Scale values (like the example I provided) and testing factorizing LEAs across basic blocks.

RKSimon mentioned this in rL309262: [X86] Adding test cases for LEA factorization (PR32755 / D35014).Jul 27 2017, 3:37 AM

lsaba added inline comments.Jul 31 2017, 4:50 AM

lib/Target/X86/X86OptimizeLEAs.cpp
724	This could end up in an assertion failure if LI1 is at the beginning of the BB, need to handle it separately, for example in this reproducer:

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-2ef2e5d.bc"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: norecurse nounwind readnone uwtable
define i32 @foo(i32 %a, i32 %b, i32 %d, i32 %y, i32 %x) local_unnamed_addr #0 {
entry:
  %mul1 = shl i32 %b, 1
  %add2 = add i32 %a, 4
  %add3 = add i32 %add2, %mul1
  %mul4 = shl i32 %b, 2
  %add6 = add i32 %add2, %mul4
  br label %for.body
for.cond.cleanup:                                 ; preds = %for.body
  ret i32 %add
for.body:                                         ; preds = %for.body, %entry
  %x.addr.015 = phi i32 [ %x, %entry ], [ %add3, %for.body ]
  %y.addr.014 = phi i32 [ %y, %entry ], [ %add6, %for.body ]
  %mul = mul nsw i32 %x.addr.015, %y.addr.014
  %add = add nsw i32 0, %mul
  %exitcond = icmp eq i32 undef, %d
  br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !1
}
attributes #0 = { norecurse nounwind readnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = !{!"clang version 6.0.0 (cfe/trunk 309511)"}
!1 = distinct !{!1, !2}
!2 = !{!"llvm.loop.unroll.disable"}

RKSimon added inline comments.Jul 31 2017, 5:58 AM

lib/Target/X86/X86OptimizeLEAs.cpp
29	(style) Insert the include in alphabetical order (so before MachineFunctionPass.h)

1/ Changes to cover review comments.
2/ Handling for patterns involving SUBREG_TO_REG as LEA operands.
3/ Formatting changes.

RKSimon added inline comments.Aug 1 2017, 9:14 AM

lib/Target/X86/X86OptimizeLEAs.cpp
130	(style) for (unsigned i = 1, e = MI->getNumOperands(); i <e ; i++)
215	(style) for (unsigned i = 1, e = MI1->getNumOperands(); i < e; ++i)
test/CodeGen/X86/umul-with-overflow.ll
6	Still needs to be rebased - you've lost the x86_64 tests

Merge branch 'master' of https://github.com/llvm-mirror/llvm
Formatting changes

Pinging reviewers. Kindly pour your comments.
Thanks

In D35014#831745, @jbhateja wrote:

Pinging reviewers. Kindly pour your comments.
Thanks

Please address my last comment (Line 920)

In D35014#833192, @lsaba wrote:

In D35014#831745, @jbhateja wrote:

Pinging reviewers. Kindly pour your comments.
Thanks

Please address my last comment (Line 920)

The test case you provided is giving correct results with currently checked in changes.

lsaba added inline comments.Aug 7 2017, 5:17 AM

lib/Target/X86/X86OptimizeLEAs.cpp
208	need to check MO2.isReg()

jbhateja added inline comments.Aug 8 2017, 7:03 AM

lib/Target/X86/X86OptimizeLEAs.cpp
208	Yes, I shall take care of this. Kindly let me know if there are any other comments apart from this. It shall save iterations.

@ reviewers , kindly let me know if there are any more comments apart from last comment from lsaba.
Thanks.

In D35014#835240, @jbhateja wrote:

@ reviewers , kindly let me know if there are any more comments apart from last comment from lsaba.
Thanks.

Hi,
I ran the patch on several benchmarks to check performance, overall the changes look good, but there is a regression in one of the benchmarks (EEMBC/coremark-pro) caused by creating an undesired lea instruction instead of the previously created add instruction, I am working on creating a simple reproducer for the problem and would appreciate your patience.

Thanks

In D35014#835498, @lsaba wrote:

In D35014#835240, @jbhateja wrote:

@ reviewers , kindly let me know if there are any more comments apart from last comment from lsaba.
Thanks.

Hi,
I ran the patch on several benchmarks to check performance, overall the changes look good, but there is a regression in one of the benchmarks (EEMBC/coremark-pro) caused by creating an undesired lea instruction instead of the previously created add instruction, I am working on creating a simple reproducer for the problem and would appreciate your patience.

Thanks

The change in X86DAGToDAGISel::matchAddressBase is good when it allows us to get rid of extra lea/add instructions, or replace a slow lea with a fast lea, but in some cases it only replaces an add instruction with a lea instruction, and since the throughput of the add instruction is higher, we would prefer to keep the add instruction. For example, for the following IR:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: norecurse nounwind uwtable
define void @foo() local_unnamed_addr #0 {
entry:
  br i1 undef, label %BB2, label %BB1
BB1:                                  ; preds = %entry
  %rem.us.1 = srem i32 undef, 65536
  br label %BB2
BB2:      ; preds = %BB1, %entry
  %s = phi i32 [ undef, %entry ], [ %rem.us.1, %BB1 ]
  %a = phi i32 [ 1, %entry ], [ 0, %BB1 ]
  %mul1 = mul nsw i32 %s, %a
  %rem1 = srem i32 %mul1, 65536
  %add1 = add nsw i32 %rem1, %a
  %conv1 = trunc i32 %add1 to i16
  store i16 %conv1, i16* undef, align 2, !tbaa !1
  %add2 = add i32 %add1, %a
  %0 = trunc i32 %add2 to i16
  %conv2 = and i16 %0, 255
  store i16 %conv2, i16* undef, align 2, !tbaa !1
  ret void
}
attributes #0 = { norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="core-avx2" "target-features"="+aes,+avx,+avx2,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+rdrnd,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="true" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = !{!"clang version 6.0.0 (cfe/trunk 310239)"}
!1 = !{!2, !2, i64 0}
!2 = !{!"short", !3, i64 0}
!3 = !{!"omnipotent char", !4, i64 0}
!4 = !{!"Simple C/C++ TBAA"}

the originally generated code was:

.LBB0_2:                                # %for.cond11.for.inc35_crit_edge.us.unr-lcssa
   movl	%eax, %ecx
   imull	%eax, %ecx
   movl	%ecx, %edx
   sarl	$31, %edx
   shrl	$16, %edx
   addl	%ecx, %edx
   andl	$-65536, %edx           # imm = 0xFFFF0000
   subl	%edx, %ecx
   addl	%eax, %ecx
   movw	%cx, (%rax)
   addl	%eax, %ecx
   movzbl	%cl, %eax
   movw	%ax, (%rax)
   retq

while the generated code now is

movl	%eax, %ecx
imull	%eax, %ecx
movl	%ecx, %edx
sarl	$31, %edx
shrl	$16, %edx
addl	%ecx, %edx
andl	$-65536, %edx           # imm = 0xFFFF0000
subl	%edx, %ecx
leal	(%rcx,%rax), %edx
movw	%dx, (%rax)
leal	(%rcx,%rax,2), %eax
movzbl	%al, %eax
movw	%ax, (%rax)
retq

This optimization needs to be refined further to avoid such cases, since the impact can be substantial if the code is in a hot loop, for example.

qcolombet added inline comments.Aug 9 2017, 10:58 AM

include/llvm/CodeGen/MachineInstr.h
1291 ↗	(On Diff #109147)	Genuine question. MRI is usually accessible via other more efficient means. Do we really need to rely on this one?

Changes to avoid creating costly complex LEAs having scale less than or equal to 2 in loops.
Strength reduction for simple LEAs with unit scale for better throughput.
Pattern matching for DAG folding has been improved to make it generic.
Incorporating other review comments.

jbhateja added inline comments.Aug 14 2017, 11:25 AM

include/llvm/CodeGen/MachineInstr.h
1291 ↗	(On Diff #109147)	It seems making it public will be useful, as one can directly use the function, which internally calls getParent() twice to get to the MachineFunction that contains the Reg Info.

ping @ Reviewers, I guess I have addressed all comments.

It seems like there are still correctness issues in the patch, I ran the llvm-test-suite and got a couple of runfails :
multisource_applications_alac_encode_alacconvert_encode
multisource_applications_jm_lencod_lencod

please debug those failures.

In general, please consider running execution tests on the patches to discover runtime failures.

Limiting the scope of DAG operands folding while AM based instruction selection to LEAs.
Formatting changes , rebase and lnt failure fix.

Harbormaster completed remote builds in B9546: Diff 112328.Aug 23 2017, 3:40 AM

Ping @ reviewers. I think all the comments have been resolved.
Do let me know if any other comments.

Extending aggressive AM based folding for LEAs to cover more cases.
Merge branch 'master' of https://github.com/llvm-mirror/llvm

Harbormaster completed remote builds in B9799: Diff 113356.Aug 30 2017, 10:52 PM

@lamas, @reviewers, comments have been taken care. Let me know if anything else.

In D35014#859838, @jbhateja wrote:

@lamas, @reviewers, comments have been taken care. Let me know if anything else.

There are still functionality issues with the pass, please allow some time to create a reproducer

In D35014#859838, @jbhateja wrote:

@lamas, @reviewers, comments have been taken care. Let me know if anything else.

The following ll code fails in CodeGen selection, please debug the issue:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
%struct.S1 = type { i32, i32 }
; Function Attrs: nounwind uwtable
define fastcc void @func(i32 %end) unnamed_addr #0 {
entry:
 br label %while.body
while.body:                                       ; preds = %if.end, %entry
 %a = phi i32 [ %end, %entry ], [ undef, %if.end ]
 br i1 undef, label %if.then, label %if.else
if.then:                                          ; preds = %while.body
 %dec = add nsw i32 %a, -1
 %idx1 = sext i32 %dec to i64
 %idx2 = getelementptr inbounds %struct.S1, %struct.S1* null, i64 %idx1
 %0 = bitcast %struct.S1* %idx2 to i64*
 store i64 0, i64* %0, align 4
 %1 = load [3 x float]*, [3 x float]** undef, align 8 
 %idx3 = getelementptr inbounds [3 x float], [3 x float]* %1, i64 %idx1, i64 0
 %2 = bitcast float* %idx3 to i32*
 %3 = load i32, i32* %2, align 4
 store i32 %3, i32* undef, align 4
 br label %if.end
if.else:                                          ; preds = %while.body
 br label %if.end
if.end:                                           ; preds = %if.else, %if.then
 br label %while.body
}
attributes #0 = { nounwind uwtable "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" }
!llvm.module.flags = !{!0}
!0 = !{i32 1, !"wchar_size", i32 4}

Some style comments, but @lsaba 's comments need to be dealt with first.

include/llvm/CodeGen/MachineInstr.h
1296 ↗	(On Diff #113356)	(style) newline before private
include/llvm/CodeGen/SelectionDAG.h
305 ↗	(On Diff #113356)	(style) newline before private
lib/Target/X86/X86ISelDAGToDAG.cpp
165	(style) newline
1115	clang-format? If so, commit it as an NFC change
1124	Two hard coded depths like this is weird - better to have a getMaxOperandFoldingDepth() helper?
lib/Target/X86/X86OptimizeLEAs.cpp
173	NFC change - just commit it if you want, but don't pollute a patch with it
316	NFC change
357	clang-format? If so, commit it as an NFC change
650	clang-format? If so, commit it as an NFC change
764	(style) Remove braces

Fine tuning pattern matching condition.
Formatting changes.

Harbormaster completed remote builds in B9890: Diff 113802.Sep 4 2017, 9:55 PM

3-Ops LEA are costly starting target SandyBridge , is there a limitation in the code for the targets this transformation works on? If not I think there should be.
you can check the Slow3OpsLEA feature for the full list of targets.

In D35014#861853, @lsaba wrote:

3-Ops LEA are costly starting target SandyBridge , is there a limitation in the code for the targets this transformation works on? If not I think there should be.
you can check the Slow3OpsLEA feature for the full list of targets.

Yes, this check could be added in pattern matching where I'm avoiding creation of complex LEAs with scale less than or equal to 2.

Please let me know if anything else you see to save iteration.
Thanks

Rebasing again.
Adding a check for subtarget feature Slow3OpLEA in pattern matching.

ping @ reviewers.

@lsaba, @reviewers , waiting for your LGTM or any remaining comments on this.
Thanks

lsaba accepted this revision.Sep 13 2017, 7:13 AM

This revision is now accepted and ready to land.Sep 13 2017, 7:13 AM

Few synthetic changes.

Closed by commit rL313343: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs. (authored by jbhateja). Sep 14 2017, 10:32 PM

This revision was automatically updated to reflect the committed changes.

Reverted in rL313376 due to PR34629 and PR34634

This revision is now accepted and ready to land.Sep 16 2017, 2:39 AM

PR34629 and PR34634 need to be addressed

This revision now requires changes to proceed.Sep 16 2017, 2:42 AM

Undefining result operand of factored statement to preserve SSA nature of Machine IR.
This fixes the reported PR 34634 and PR 34629 and the reported build-bot failures.

@RKSimon, @Reviewers, the revision was in accepted state earlier, and the issues reported after the commit to trunk have been fixed. Please do let me know if another acceptance is needed to land this again.

Updating tests for reported PRs for initial patch.

jmolloy requested changes to this revision.Sep 18 2017, 5:04 AM

jmolloy added a subscriber: jmolloy.

jmolloy added inline comments.

lib/Target/X86/X86OptimizeLEAs.cpp
823	This can cause recursion deep enough to cause stack overflows. Please could you refactor this to not use direct recursion? The domtree may be hundreds of nodes deep in degenerate cases.

This revision now requires changes to proceed.Sep 18 2017, 5:04 AM

Merge branch 'master' of https://github.com/llvm-mirror/llvm
Making Factorization algorithm iterative.
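
A recursive tree walk can be made iterative with an explicit stack, which is the general technique the request above points to. The sketch below is an assumption-laden illustration, not the patch's code: `Node` and `walk` are hypothetical, and the tree is a plain index-based structure rather than LLVM's dominator tree. It emits enter/exit events so that per-scope bookkeeping can still be done on the way back up.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical tree node; Children holds indices into the node array.
struct Node {
  std::vector<int> Children;
};

// Depth-first walk using an explicit stack instead of recursion, so the
// depth is limited by heap memory rather than by the call stack.
// Returns (node, true) on entering a node and (node, false) on leaving it.
std::vector<std::pair<int, bool>> walk(const std::vector<Node> &Tree, int Root) {
  std::vector<std::pair<int, bool>> Events;
  // Each stack entry is (node, index of the next child to visit).
  std::vector<std::pair<int, std::size_t>> Stack;
  Stack.emplace_back(Root, 0);
  Events.emplace_back(Root, true);
  while (!Stack.empty()) {
    const int N = Stack.back().first;
    const std::size_t Next = Stack.back().second;
    if (Next < Tree[N].Children.size()) {
      // Remember where to resume in N, then descend into the child.
      Stack.back().second = Next + 1;
      const int Child = Tree[N].Children[Next];
      Events.emplace_back(Child, true);
      Stack.emplace_back(Child, 0);
    } else {
      // All children done: leave N, which is where a scope would be closed.
      Events.emplace_back(N, false);
      Stack.pop_back();
    }
  }
  return Events;
}
```

With this shape, a degenerate chain of many thousands of nodes is processed without deep call nesting.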

@reviewers, required revision change are through, let me know if this can land back.

@jmolloy , @RKSimon , this patch has been reviewed and was reopened for review due to a regression. The required changes have been made; can this land in trunk now if there are no more observations from any reviewers?

Thanks

@reviewers, if no more comment I shall be landing this into trunk since required revision changes post acceptance are through.

Operands of factored LEA must belong to same register class as per Intel's Architecture Manual.
Some code reorganization + rebase.

D35014 : Review comments resolution
Removing 2 tests, pulled their latest renamed versions from trunk.
[X86] : Factorize LEA, handling for patterns involving SUBREG_TO_REG as LEA operands.
Few more changes for LEA factorization.
Updating test lea-opt-cse3.ll
Formatting changes.
Formatting changes
Changes to avoid creating costly LEAs in loops, strength reduction for simple LEAs with unit scale
Updating test.
[X86] Limiting the scope of DAG operands folding while AM based instruction selection to LEAs.
Merge from trunk.
Extending aggressive AM based folding for LEAs to cover more cases.
Updating test post rebase.
Formatting changes + fine tuning pattern matching condition.
Adding a check for subtarget feature Slow3OpLEA in pattern matching.
Few synthetic changes.
Undefining result operand of factored statement to preserve SSA nature of Machine IR.
Merge branch 'master' of https://github.com/llvm-mirror/llvm
Merge branch 'master' of https://github.com/llvm-mirror/llvm
Updating tests for reported PRs for initial patch.
Merge branch 'master' of https://github.com/llvm-mirror/llvm
Pull from trunk.
Operands of LEAs must be of same register class.
Revert "Operands of LEAs must be of same register class."

Operands of LEA must be of same register class, this constraint is as per Intel's architecture manual.
Remove map entry from LEAs map if value list becomes empty.
Rebase.

Patch has been regression-tested through the Chrome test suite.
No issues reported. Thanks to Hans Wennborg (hans@chromium.org) for validating it.

RKSimon added inline comments.Oct 29 2017, 6:12 AM

lib/Target/X86/X86ISelDAGToDAG.cpp
100	bool isLegalScale() const {
187	Is a default argument for a setter a good idea? Especially one that is the inverse of what the setter says it is.
1124	My comment still stands - try to avoid hard coded values embedded in the source - add a getMaxOperandFoldingDepth() helper.
1422	These AM.Scale increments are scary - better to set it with AM.Scale = 2?
lib/Target/X86/X86OptimizeLEAs.cpp
29	Include ordering still broken
237	Do you mean: bool isInstrErased = !(Opr.isReg() && Opr.getParent()->getParent());
724	Has @lsaba test been added to the patch? I couldn't see it.
770	Really don't like this - write a helper instead like you did in X86ISelDAGToDAG.cpp: auto IsLegalScale = [](int S) { return S == 1 || S == 2 || S == 4 || S == 8; };
776	return Arg1->getOperand(2).getImm() >= Arg2->getOperand(2).getImm();
831	DL is only used here - just use LI1.getDebugLoc() directly?
test/CodeGen/X86/lea-opt-cse2.ll
3 ↗	(On Diff #119011)	Why have you changed these tests?
test/CodeGen/X86/lea-opt-cse4.ll
3 ↗	(On Diff #119011)	Why have you changed these tests?

jbhateja retitled this revision from [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs. to [X86] Improvement in CodeGen instruction selection for LEAs..Oct 29 2017, 8:36 AM

jbhateja edited the summary of this revision. (Show Details)

jbhateja added inline comments.Oct 29 2017, 9:55 AM

lib/Target/X86/X86ISelDAGToDAG.cpp
100	Fixed.
187	Fixed. We do not need a default argument here; both calls to this routine pass an explicit argument.
1124	Helper added.
1422	Increments are triggered only in aggressive folding mode and can fold up to 8 operands (8 being the maximum legal scale). This was done intentionally; the initial change only worked for AM.Scale = 2 and was very restrictive. Aggressive operand folding is currently done only for LEAs and is enabled by instantiating an RAII object of the X86AggressiveOperandFolding class.
lib/Target/X86/X86OptimizeLEAs.cpp
237	fixed.
724	We have a similar test case for loops, lea-opt-cse2.ll. We are not doing any factorization inside loops; only simplifyLEA can kick in.
724	We have a test case for loops lea-opt-cse2.ll, so not added this. We are not doing any factorization inside loops, only simplifyLEA can kick in.
770	Fixed
776	Fixed
test/CodeGen/X86/lea-opt-cse4.ll
3 ↗	(On Diff #119011)	FixupLEAPass, further down the pipeline, transforms some complex LEA patterns into a simple LEA followed by an add, as an optimization. With the changes in the patch we will have the following:
  leal 1(%rax,%rcx,4), %eax
which after FixupLEAPass will get converted to
  leal (%rax,%rcx,4), %eax
  addl $1, %eax
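
The RAII-controlled aggressive folding mentioned in the reply on line 1422 might look roughly like the sketch below. Everything here is hypothetical: `Selector`, `AggressiveFoldingScope` and `bumpScale` are illustrative names and do not reproduce the X86AggressiveOperandFolding class from the patch. It also assumes that intermediate scales such as 3 are validated separately before an LEA is formed, since only 1, 2, 4 and 8 are encodable.

```cpp
#include <cassert>

// Hypothetical selector state holding the folding mode flag.
struct Selector {
  bool AggressiveFolding = false;
};

// RAII guard: turns aggressive folding on for its lifetime and restores the
// previous setting when it goes out of scope, even on early returns.
class AggressiveFoldingScope {
  Selector &S;
  bool Saved;

public:
  explicit AggressiveFoldingScope(Selector &Sel)
      : S(Sel), Saved(Sel.AggressiveFolding) {
    S.AggressiveFolding = true;
  }
  ~AggressiveFoldingScope() { S.AggressiveFolding = Saved; }
  AggressiveFoldingScope(const AggressiveFoldingScope &) = delete;
  AggressiveFoldingScope &operator=(const AggressiveFoldingScope &) = delete;
};

// Folds one more occurrence of the index operand by incrementing Scale.
// Only permitted in aggressive mode and never beyond 8.
bool bumpScale(const Selector &S, int &Scale) {
  if (!S.AggressiveFolding || Scale >= 8)
    return false;
  ++Scale;
  return true;
}
```

Tying the mode to an object's lifetime keeps the flag from leaking into the selection of other instructions once LEA matching returns.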

Rebasing
Review comments resolution.

@RKSimon, requested revision changes have been made as per your comments. Can you please validate.

1/ Making the factorization algorithm iterative. This was committed earlier with Diff : https://reviews.llvm.org/D35014?id=116144 but somehow got removed in successive commits.

2/ Rebasing again. All comments are resolved.

@RKSimon, @lsaba , @jmolloy , all your comments have been addressed. Kindly verify so that I can land this into trunk.

A few minor comments @lsaba @craig.topper any final comments?

lib/Target/X86/X86ISelDAGToDAG.cpp
191	Make this a const method
1124	I meant make this a class method, but if you don't want to you can leave it here as lambda
lib/Target/X86/X86OptimizeLEAs.cpp
85	CopyLike
188	Again, can we avoid the static?

@RKSimon No more comments from my side

I haven't been following this much so I have no comments either.

LGTM

LGTM - with those final few minors I mentioned

Review comment resolution.
Rebasing patch.

Rebasing to resolve incorrect overriding of register names in kill statements.

jbhateja added inline comments.Nov 28 2017, 11:25 PM

lib/Target/X86/X86OptimizeLEAs.cpp
188	It's used because we want the MemOpKey for LEA factorization to be independent of Scale; keeping it static avoids recreating the dummy scale.

Closed by commit rL319543: [X86] Improvement in CodeGen instruction selection for LEAs. (authored by jbhateja). Dec 1 2017, 6:08 AM

This revision was automatically updated to reflect the committed changes.

rL319543 was reverted at rL319591 due to asan bot breakage

Please rebase. Thanks.

This revision was not accepted when it landed; it landed in state Needs Review.Oct 7 2019, 5:06 AM

Closed by commit rG328199ec2643: [X86] Improvement in CodeGen instruction selection for LEAs. (authored by jbhateja).

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Oct 7 2019, 5:06 AM

Herald added a subscriber: hiraditya.

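Revision Contents

Path	Size
lib/Target/X86/X86ISelDAGToDAG.cpp	13 lines
lib/Target/X86/X86OptimizeLEAs.cpp	110 lines
test/CodeGen/X86/(file name missing)	40 lines
test/CodeGen/X86/(file name missing)	12 lines
test/CodeGen/X86/(file name missing)	15 lines
test/CodeGen/X86/(file name missing)	9 lines
test/CodeGen/X86/mul-constant-result.ll	14 lines
test/CodeGen/X86/umul-with-overflow.ll	35 lines
test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll	6 lines
utils/TableGen/DAGISelMatcherGen.cpp	2 lines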

Diff 105790

lib/Target/X86/X86ISelDAGToDAG.cpp

Show First 20 Lines • Show All 91 Lines • ▼ Show 20 Lines	struct X86ISelAddressMode {
bool isRIPRelative() const {		bool isRIPRelative() const {
if (BaseType != RegBase) return false;		if (BaseType != RegBase) return false;
if (RegisterSDNode *RegNode =		if (RegisterSDNode *RegNode =
dyn_cast_or_null<RegisterSDNode>(Base_Reg.getNode()))		dyn_cast_or_null<RegisterSDNode>(Base_Reg.getNode()))
return RegNode->getReg() == X86::RIP;		return RegNode->getReg() == X86::RIP;
return false;		return false;
}		}

void setBaseReg(SDValue Reg) {		void setBaseReg(SDValue Reg) {
		RKSimon: bool isLegalScale() const {
		jbhateja: Fixed.
BaseType = RegBase;		BaseType = RegBase;
Base_Reg = Reg;		Base_Reg = Reg;
}		}

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)		#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
void dump() {		void dump() {
dbgs() << "X86ISelAddressMode " << this << '\n';		dbgs() << "X86ISelAddressMode " << this << '\n';
dbgs() << "Base_Reg ";		dbgs() << "Base_Reg ";
▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	class X86DAGToDAGISel final : public SelectionDAGISel {

/// If true, selector should try to optimize for code size instead of		/// If true, selector should try to optimize for code size instead of
/// performance.		/// performance.
bool OptForSize;		bool OptForSize;

/// If true, selector should try to optimize for minimum code size.		/// If true, selector should try to optimize for minimum code size.
bool OptForMinSize;		bool OptForMinSize;

public:		public:
		RKSimon: (style) newline
explicit X86DAGToDAGISel(X86TargetMachine &tm, CodeGenOpt::Level OptLevel)		explicit X86DAGToDAGISel(X86TargetMachine &tm, CodeGenOpt::Level OptLevel)
: SelectionDAGISel(tm, OptLevel), OptForSize(false),		: SelectionDAGISel(tm, OptLevel), OptForSize(false),
OptForMinSize(false) {}		OptForMinSize(false) {}

StringRef getPassName() const override {		StringRef getPassName() const override {
return "X86 DAG->DAG Instruction Selection";		return "X86 DAG->DAG Instruction Selection";
}		}

bool runOnMachineFunction(MachineFunction &MF) override {		bool runOnMachineFunction(MachineFunction &MF) override {
// Reset the subtarget each time through.		// Reset the subtarget each time through.
Subtarget = &MF.getSubtarget<X86Subtarget>();		Subtarget = &MF.getSubtarget<X86Subtarget>();
SelectionDAGISel::runOnMachineFunction(MF);		SelectionDAGISel::runOnMachineFunction(MF);
return true;		return true;
}		}

void EmitFunctionEntryCode() override;		void EmitFunctionEntryCode() override;

bool IsProfitableToFold(SDValue N, SDNode *U, SDNode *Root) const override;		bool IsProfitableToFold(SDValue N, SDNode *U, SDNode *Root) const override;

void PreprocessISelDAG() override;		void PreprocessISelDAG() override;

// Include the pieces autogenerated from the target description.		// Include the pieces autogenerated from the target description.
		RKSimon: Is a default argument for a setter a good idea? Especially one that is the inverse of what the setter says it is.
		jbhateja: Fixed. We do not need a default argument here; both calls to this routine pass an explicit argument.
#include "X86GenDAGISel.inc"		#include "X86GenDAGISel.inc"

private:		private:
void Select(SDNode *N) override;		void Select(SDNode *N) override;
		RKSimon: Make this a const method

bool foldOffsetIntoAddress(uint64_t Offset, X86ISelAddressMode &AM);		bool foldOffsetIntoAddress(uint64_t Offset, X86ISelAddressMode &AM);
bool matchLoadInAddress(LoadSDNode *N, X86ISelAddressMode &AM);		bool matchLoadInAddress(LoadSDNode *N, X86ISelAddressMode &AM);
bool matchWrapper(SDValue N, X86ISelAddressMode &AM);		bool matchWrapper(SDValue N, X86ISelAddressMode &AM);
bool matchAddress(SDValue N, X86ISelAddressMode &AM);		bool matchAddress(SDValue N, X86ISelAddressMode &AM);
bool matchAdd(SDValue N, X86ISelAddressMode &AM, unsigned Depth);		bool matchAdd(SDValue N, X86ISelAddressMode &AM, unsigned Depth);
bool matchAddressRecursively(SDValue N, X86ISelAddressMode &AM,		bool matchAddressRecursively(SDValue N, X86ISelAddressMode &AM,
unsigned Depth);		unsigned Depth);
▲ Show 20 Lines • Show All 907 Lines • ▼ Show 20 Lines	static bool foldMaskAndShiftToScale(SelectionDAG &DAG, SDValue N,
DAG.ReplaceAllUsesWith(N, NewSHL);		DAG.ReplaceAllUsesWith(N, NewSHL);

AM.Scale = 1 << AMShiftAmt;		AM.Scale = 1 << AMShiftAmt;
AM.IndexReg = NewSRL;		AM.IndexReg = NewSRL;
return false;		return false;
}		}

bool X86DAGToDAGISel::matchAddressRecursively(SDValue N, X86ISelAddressMode &AM,		bool X86DAGToDAGISel::matchAddressRecursively(SDValue N, X86ISelAddressMode &AM,
unsigned Depth) {		unsigned Depth) {
		RKSimon: clang-format? If so, commit it as an NFC change
SDLoc dl(N);		SDLoc dl(N);
DEBUG({		DEBUG({
dbgs() << "MatchAddress: ";		dbgs() << "MatchAddress: ";
AM.dump();		AM.dump();
});		});
// Limit recursion.		// Limit recursion.
if (Depth > 5)		if (Depth > 5)
return matchAddressBase(N, AM);		return matchAddressBase(N, AM);

		RKSimon: Two hard-coded depths like this is weird - better to have a getMaxOperandFoldingDepth() helper?
		RKSimon: My comment still stands - try to avoid hard-coded values embedded in the source - add a getMaxOperandFoldingDepth() helper.
		jbhateja: Helper added.
		RKSimon: I meant make this a class method, but if you don't want to, you can leave it here as a lambda
// If this is already a %rip relative address, we can only merge immediates		// If this is already a %rip relative address, we can only merge immediates
// into it. Instead of handling this in every case, we handle it here.		// into it. Instead of handling this in every case, we handle it here.
// RIP relative addressing: %rip + 32-bit displacement!		// RIP relative addressing: %rip + 32-bit displacement!
if (AM.isRIPRelative()) {		if (AM.isRIPRelative()) {
// FIXME: JumpTable and ExternalSymbol address currently don't like		// FIXME: JumpTable and ExternalSymbol address currently don't like
// displacements. It isn't very important, but this should be fixed for		// displacements. It isn't very important, but this should be fixed for
// consistency.		// consistency.
if (!(AM.ES || AM.MCSym) && AM.JT != -1)		if (!(AM.ES || AM.MCSym) && AM.JT != -1)
▲ Show 20 Lines • Show All 272 Lines • ▼ Show 20 Lines	bool X86DAGToDAGISel::matchAddressBase(SDValue N, X86ISelAddressMode &AM) {
if (AM.BaseType != X86ISelAddressMode::RegBase || AM.Base_Reg.getNode()) {		if (AM.BaseType != X86ISelAddressMode::RegBase || AM.Base_Reg.getNode()) {
// If so, check to see if the scale index register is set.		// If so, check to see if the scale index register is set.
if (!AM.IndexReg.getNode()) {		if (!AM.IndexReg.getNode()) {
AM.IndexReg = N;		AM.IndexReg = N;
AM.Scale = 1;		AM.Scale = 1;
return false;		return false;
}		}

		if (AM.BaseType == X86ISelAddressMode::RegBase && AM.Scale == 1) {
		if (AM.Base_Reg == N) {
		SDValue Base_Reg = AM.Base_Reg;
		AM.Base_Reg = AM.IndexReg;
		AM.IndexReg = Base_Reg;
		AM.Scale = 2;
		craig.topper: Is Scale limited to 1 before this, or could it be 2, in which case this creates an illegal scale of 3?
		RKSimon: There is a check for AM.Scale == 1. But I agree it'd be clearer with "AM.Scale = 2" instead of incrementing.
		return false;
		} else if (AM.IndexReg == N) {
		AM.Scale = 2;
		RKSimon: AM.Scale = 2;
		return false;
		RKSimon: These AM.Scale increments are scary - better to set it with AM.Scale = 2?
		jbhateja: Increments are triggered only in aggressive folding mode and can fold up to 8 operands (8 being the maximum legal scale). This was done intentionally; the initial change only worked for AM.Scale = 2 and was very restrictive. Aggressive operand folding is currently done only for LEAs and is enabled by instantiating an RAII object of the X86AggressiveOperandFolding class.
		}
		}

// Otherwise, we cannot select it.		// Otherwise, we cannot select it.
return true;		return true;
}		}

// Default, generate it as a register.		// Default, generate it as a register.
AM.BaseType = X86ISelAddressMode::RegBase;		AM.BaseType = X86ISelAddressMode::RegBase;
AM.Base_Reg = N;		AM.Base_Reg = N;
return false;		return false;
▲ Show 20 Lines • Show All 1,325 Lines • Show Last 20 Lines
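
As a rough illustration of the matchAddressBase change above (a repeated operand is folded by promoting Scale to 2 instead of failing the match), here is a minimal self-contained sketch. It is not the patch's code: `ToyAddressMode`, `NoReg` and `foldOperand` are hypothetical names, registers are plain integers, and only the Scale == 1 case shown in this diff is modelled.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical marker for "no register assigned yet".
constexpr int NoReg = -1;

// Hypothetical stand-in for X86ISelAddressMode.
struct ToyAddressMode {
  int Base = NoReg;
  int Index = NoReg;
  unsigned Scale = 1;
  std::int64_t Disp = 0;
};

// Tries to fold register Reg into AM. Returns false on success and true on
// failure, following the convention of the matcher functions in the diff.
bool foldOperand(ToyAddressMode &AM, int Reg) {
  if (AM.Base == NoReg) {
    AM.Base = Reg;
    return false;
  }
  if (AM.Index == NoReg) {
    AM.Index = Reg;
    AM.Scale = 1;
    return false;
  }
  if (AM.Scale == 1) {
    if (AM.Base == Reg) {
      // The base repeats: move it into the index slot and count it twice.
      std::swap(AM.Base, AM.Index);
      AM.Scale = 2;
      return false;
    }
    if (AM.Index == Reg) {
      // The index repeats: count it twice.
      AM.Scale = 2;
      return false;
    }
  }
  return true; // Cannot fold this operand.
}
```

Folding A, then B, adding 10 to Disp, then folding A again follows the summary's example (T3 = ((A + B) + 10) + A) and ends with Base = B, Index = A, Scale = 2, Disp = 10 in this sketch.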

lib/Target/X86/X86OptimizeLEAs.cpp

Show All 20 Lines
#include "X86InstrInfo.h"		#include "X86InstrInfo.h"
#include "X86Subtarget.h"		#include "X86Subtarget.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/LiveVariables.h"		#include "llvm/CodeGen/LiveVariables.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineOperand.h"		#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
		#include "llvm/CodeGen/MachineDominators.h"
		RKSimon: (style) Insert the include in alphabetical order (so before MachineFunctionPass.h)
		RKSimon: Include ordering still broken
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/IR/DIBuilder.h"		#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/DebugInfoMetadata.h"		#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetInstrInfo.h"		#include "llvm/Target/TargetInstrInfo.h"

Show All 23 Lines
static inline bool isLEA(const MachineInstr &MI);		static inline bool isLEA(const MachineInstr &MI);

namespace {		namespace {
/// A key based on instruction's memory operands.		/// A key based on instruction's memory operands.
class MemOpKey {		class MemOpKey {
public:		public:
MemOpKey(const MachineOperand *Base, const MachineOperand *Scale,		MemOpKey(const MachineOperand *Base, const MachineOperand *Scale,
const MachineOperand *Index, const MachineOperand *Segment,		const MachineOperand *Index, const MachineOperand *Segment,
const MachineOperand *Disp)		const MachineOperand *Disp, bool DispCheck = false)
: Disp(Disp) {		: Disp(Disp), HardDispCheck(DispCheck) {
		craig.topper: Need a space before HardDispCheck.
Operands[0] = Base;		Operands[0] = Base;
Operands[1] = Scale;		Operands[1] = Scale;
Operands[2] = Index;		Operands[2] = Index;
Operands[3] = Segment;		Operands[3] = Segment;
}		}

bool operator==(const MemOpKey &Other) const {		bool operator==(const MemOpKey &Other) const {
// Addresses' bases, scales, indices and segments must be identical.		// Addresses' bases, scales, indices and segments must be identical.
for (int i = 0; i < 4; ++i)		for (int i = 0; i < 4; ++i)
if (!isIdenticalOp(*Operands[i], *Other.Operands[i]))		if (!isIdenticalOp(*Operands[i], *Other.Operands[i]))
return false;		return false;

// Addresses' displacements don't have to be exactly the same. It only		// Addresses' displacements don't have to be exactly the same. It only
// matters that they use the same symbol/index/address. Immediates' or		// matters that they use the same symbol/index/address. Immediates' or
// offsets' differences will be taken care of during instruction		// offsets' differences will be taken care of during instruction
		RKSimon: CopyLike
// substitution.		// substitution. If HardDispCheck is true then Disp must be identical.
		lsaba: This comment is only valid for the else statement; please change it to explain the different cases between Identical and Similar Disp
		if (!HardDispCheck)
return isSimilarDispOp(*Disp, *Other.Disp);		return isSimilarDispOp(*Disp, *Other.Disp);
		craig.topper: No need for 'else' after an if that returns.
		return isIdenticalOp(*Disp,*Other.Disp);
}		}

// Address' base, scale, index and segment operands.		// Address' base, scale, index and segment operands.
const MachineOperand *Operands[4];		const MachineOperand *Operands[4];

// Address' displacement operand.		// Address' displacement operand.
const MachineOperand *Disp;		const MachineOperand *Disp;

		// Forces absolute displacement check.
		bool HardDispCheck;
};		};
} // end anonymous namespace		} // end anonymous namespace

/// Provide DenseMapInfo for MemOpKey.		/// Provide DenseMapInfo for MemOpKey.
namespace llvm {		namespace llvm {
template <> struct DenseMapInfo<MemOpKey> {		template <> struct DenseMapInfo<MemOpKey> {
typedef DenseMapInfo<const MachineOperand *> PtrInfo;		typedef DenseMapInfo<const MachineOperand *> PtrInfo;

Show All 14 Lines	static unsigned getHashValue(const MemOpKey &Val) {
// empty or tombstone.		// empty or tombstone.
assert(Val.Disp != PtrInfo::getEmptyKey() && "Cannot hash the empty key");		assert(Val.Disp != PtrInfo::getEmptyKey() && "Cannot hash the empty key");
assert(Val.Disp != PtrInfo::getTombstoneKey() &&		assert(Val.Disp != PtrInfo::getTombstoneKey() &&
"Cannot hash the tombstone key");		"Cannot hash the tombstone key");

hash_code Hash = hash_combine(*Val.Operands[0], *Val.Operands[1],		hash_code Hash = hash_combine(*Val.Operands[0], *Val.Operands[1],
*Val.Operands[2], *Val.Operands[3]);		*Val.Operands[2], *Val.Operands[3]);

// If the address displacement is an immediate, it should not affect the		// If the address displacement is an immediate, it should not affect the
		RKSimon: (style) for (unsigned i = 1, e = MI->getNumOperands(); i <e ; i++)
// hash so that memory operands which differ only be immediate displacement		// hash so that memory operands which differ only be immediate displacement
// would have the same hash. If the address displacement is something else,		// would have the same hash. If the address displacement is something else,
// we should reflect symbol/index/address in the hash.		// we should reflect symbol/index/address in the hash.
switch (Val.Disp->getType()) {		switch (Val.Disp->getType()) {
case MachineOperand::MO_Immediate:		case MachineOperand::MO_Immediate:
break;		break;
case MachineOperand::MO_ConstantPoolIndex:		case MachineOperand::MO_ConstantPoolIndex:
case MachineOperand::MO_JumpTableIndex:		case MachineOperand::MO_JumpTableIndex:
Show All 26 Lines	static bool isEqual(const MemOpKey &LHS, const MemOpKey &RHS) {
// empty or tombstone.		// empty or tombstone.
if (RHS.Disp == PtrInfo::getEmptyKey())		if (RHS.Disp == PtrInfo::getEmptyKey())
return LHS.Disp == PtrInfo::getEmptyKey();		return LHS.Disp == PtrInfo::getEmptyKey();
if (RHS.Disp == PtrInfo::getTombstoneKey())		if (RHS.Disp == PtrInfo::getTombstoneKey())
return LHS.Disp == PtrInfo::getTombstoneKey();		return LHS.Disp == PtrInfo::getTombstoneKey();
return LHS == RHS;		return LHS == RHS;
}		}
};		};
}		}
		RKSimon: NFC change - just commit it if you want, but don't pollute a patch with it

/// \brief Returns a hash table key based on memory operands of \p MI. The		/// \brief Returns a hash table key based on memory operands of \p MI. The
/// number of the first memory operand of \p MI is specified through \p N.		/// number of the first memory operand of \p MI is specified through \p N.
static inline MemOpKey getMemOpKey(const MachineInstr &MI, unsigned N) {		static inline MemOpKey getMemOpKey(const MachineInstr &MI, unsigned N) {
assert((isLEA(MI) || MI.mayLoadOrStore()) &&		assert((isLEA(MI) || MI.mayLoadOrStore()) &&
"The instruction must be a LEA, a load or a store");		"The instruction must be a LEA, a load or a store");
return MemOpKey(&MI.getOperand(N + X86::AddrBaseReg),		return MemOpKey(&MI.getOperand(N + X86::AddrBaseReg),
&MI.getOperand(N + X86::AddrScaleAmt),		&MI.getOperand(N + X86::AddrScaleAmt),
&MI.getOperand(N + X86::AddrIndexReg),		&MI.getOperand(N + X86::AddrIndexReg),
&MI.getOperand(N + X86::AddrSegmentReg),		&MI.getOperand(N + X86::AddrSegmentReg),
&MI.getOperand(N + X86::AddrDisp));		&MI.getOperand(N + X86::AddrDisp));
}		}

		static inline MemOpKey getMemOpCSEKey(const MachineInstr &MI, unsigned N) {
		static MachineOperand DummyScale = MachineOperand::CreateImm(1);
		RKSimon: Can we avoid the static?
		RKSimon: Again, can we avoid the static?
		jbhateja: It's used because we want the MemOpKey for LEA factorization to be independent of Scale; keeping it static avoids recreating the dummy scale.
		assert((isLEA(MI) || MI.mayLoadOrStore()) &&
		"The instruction must be a LEA, a load or a store");
		return MemOpKey(&MI.getOperand(N + X86::AddrBaseReg),
		&DummyScale,
		&MI.getOperand(N + X86::AddrIndexReg),
		&MI.getOperand(N + X86::AddrSegmentReg),
		&MI.getOperand(N + X86::AddrDisp), true);
		craig.topper: Add space before 'true'
		}

static inline bool isIdenticalOp(const MachineOperand &MO1,		static inline bool isIdenticalOp(const MachineOperand &MO1,
const MachineOperand &MO2) {		const MachineOperand &MO2) {
return MO1.isIdenticalTo(MO2) &&		return MO1.isIdenticalTo(MO2) &&
(!MO1.isReg() ||		(!MO1.isReg() ||
!TargetRegisterInfo::isPhysicalRegister(MO1.getReg()));		!TargetRegisterInfo::isPhysicalRegister(MO1.getReg()));
}		}

#ifndef NDEBUG		#ifndef NDEBUG
static bool isValidDispOp(const MachineOperand &MO) {		static bool isValidDispOp(const MachineOperand &MO) {
return MO.isImm() || MO.isCPI() || MO.isJTI() || MO.isSymbol() ||		return MO.isImm() || MO.isCPI() || MO.isJTI() || MO.isSymbol() ||
MO.isGlobal() || MO.isBlockAddress() || MO.isMCSymbol() || MO.isMBB();		MO.isGlobal() || MO.isBlockAddress() || MO.isMCSymbol() || MO.isMBB();
		lsaba: need to check MO2.isReg()
		jbhateja: Yes, I shall take care of this. Kindly let me know if there are any other comments apart from this; it will save iterations.
}		}
#endif		#endif

static bool isSimilarDispOp(const MachineOperand &MO1,		static bool isSimilarDispOp(const MachineOperand &MO1,
const MachineOperand &MO2) {		const MachineOperand &MO2) {
assert(isValidDispOp(MO1) && isValidDispOp(MO2) &&		assert(isValidDispOp(MO1) && isValidDispOp(MO2) &&
"Address displacement operand is not valid");		"Address displacement operand is not valid");
		RKSimon: (style) for (unsigned i = 1, e = MI1->getNumOperands(); i < e; ++i)
return (MO1.isImm() && MO2.isImm()) ||		return (MO1.isImm() && MO2.isImm()) ||
(MO1.isCPI() && MO2.isCPI() && MO1.getIndex() == MO2.getIndex()) ||		(MO1.isCPI() && MO2.isCPI() && MO1.getIndex() == MO2.getIndex()) ||
(MO1.isJTI() && MO2.isJTI() && MO1.getIndex() == MO2.getIndex()) ||		(MO1.isJTI() && MO2.isJTI() && MO1.getIndex() == MO2.getIndex()) ||
(MO1.isSymbol() && MO2.isSymbol() &&		(MO1.isSymbol() && MO2.isSymbol() &&
MO1.getSymbolName() == MO2.getSymbolName()) ||		MO1.getSymbolName() == MO2.getSymbolName()) ||
(MO1.isGlobal() && MO2.isGlobal() &&		(MO1.isGlobal() && MO2.isGlobal() &&
MO1.getGlobal() == MO2.getGlobal()) ||		MO1.getGlobal() == MO2.getGlobal()) ||
(MO1.isBlockAddress() && MO2.isBlockAddress() &&		(MO1.isBlockAddress() && MO2.isBlockAddress() &&
MO1.getBlockAddress() == MO2.getBlockAddress()) ||		MO1.getBlockAddress() == MO2.getBlockAddress()) ||
(MO1.isMCSymbol() && MO2.isMCSymbol() &&		(MO1.isMCSymbol() && MO2.isMCSymbol() &&
MO1.getMCSymbol() == MO2.getMCSymbol()) ||		MO1.getMCSymbol() == MO2.getMCSymbol()) ||
(MO1.isMBB() && MO2.isMBB() && MO1.getMBB() == MO2.getMBB());		(MO1.isMBB() && MO2.isMBB() && MO1.getMBB() == MO2.getMBB());
}		}

static inline bool isLEA(const MachineInstr &MI) {		static inline bool isLEA(const MachineInstr &MI) {
unsigned Opcode = MI.getOpcode();		unsigned Opcode = MI.getOpcode();
return Opcode == X86::LEA16r || Opcode == X86::LEA32r ||		return Opcode == X86::LEA16r || Opcode == X86::LEA32r ||
Opcode == X86::LEA64r || Opcode == X86::LEA64_32r;		Opcode == X86::LEA64r || Opcode == X86::LEA64_32r;
}		}

namespace {		namespace {
class OptimizeLEAPass : public MachineFunctionPass {		class OptimizeLEAPass : public MachineFunctionPass {
		RKSimon: Do you mean: bool isInstrErased = !(Opr.isReg() && Opr.getParent()->getParent());
		jbhateja: Fixed.
public:		public:
OptimizeLEAPass() : MachineFunctionPass(ID) {}		OptimizeLEAPass() : MachineFunctionPass(ID) {}
		RKSimon: Comment describing the purpose of the class

StringRef getPassName() const override { return "X86 LEA Optimize"; }		StringRef getPassName() const override { return "X86 LEA Optimize"; }

/// \brief Loop over all of the basic blocks, replacing address		/// \brief Loop over all of the basic blocks, replacing address
/// calculations in load and store instructions, if it's already		/// calculations in load and store instructions, if it's already
/// been calculated by LEA. Also, remove redundant LEAs.		/// been calculated by LEA. Also, remove redundant LEAs.
bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;

		void getAnalysisUsage(AnalysisUsage &AU) const override {
		AU.setPreservesCFG();
		MachineFunctionPass::getAnalysisUsage(AU);
		AU.addRequired<MachineDominatorTree>();
		}

private:		private:
typedef DenseMap<MemOpKey, SmallVector<MachineInstr *, 16>> MemOpMap;		typedef DenseMap<MemOpKey, SmallVector<MachineInstr *, 16>> MemOpMap;

/// \brief Returns a distance between two instructions inside one basic block.		/// \brief Returns a distance between two instructions inside one basic block.
/// Negative result means, that instructions occur in reverse order.		/// Negative result means, that instructions occur in reverse order.
int calcInstrDist(const MachineInstr &First, const MachineInstr &Last);		int calcInstrDist(const MachineInstr &First, const MachineInstr &Last);

/// \brief Choose the best \p LEA instruction from the \p List to replace		/// \brief Choose the best \p LEA instruction from the \p List to replace
/// address calculation in \p MI instruction. Return the address displacement		/// address calculation in \p MI instruction. Return the address displacement
		RKSimon: (style) clean up the positions of the * - check what clang-format does
		RKSimon: comment
/// and the distance between \p MI and the chosen \p BestLEA in		/// and the distance between \p MI and the chosen \p BestLEA in
/// \p AddrDispShift and \p Dist.		/// \p AddrDispShift and \p Dist.
bool chooseBestLEA(const SmallVectorImpl<MachineInstr *> &List,		bool chooseBestLEA(const SmallVectorImpl<MachineInstr *> &List,
const MachineInstr &MI, MachineInstr *&BestLEA,		const MachineInstr &MI, MachineInstr *&BestLEA,
int64_t &AddrDispShift, int &Dist);		int64_t &AddrDispShift, int &Dist);

/// \brief Returns the difference between addresses' displacements of \p MI1		/// \brief Returns the difference between addresses' displacements of \p MI1
/// and \p MI2. The numbers of the first memory operands for the instructions		/// and \p MI2. The numbers of the first memory operands for the instructions
/// are specified through \p N1 and \p N2.		/// are specified through \p N1 and \p N2.
int64_t getAddrDispShift(const MachineInstr &MI1, unsigned N1,		int64_t getAddrDispShift(const MachineInstr &MI1, unsigned N1,
const MachineInstr &MI2, unsigned N2) const;		const MachineInstr &MI2, unsigned N2) const;

/// \brief Returns true if the \p Last LEA instruction can be replaced by the		/// \brief Returns true if the \p Last LEA instruction can be replaced by the
/// \p First. The difference between displacements of the addresses calculated		/// \p First. The difference between displacements of the addresses calculated
/// by these LEAs is returned in \p AddrDispShift. It'll be used for proper		/// by these LEAs is returned in \p AddrDispShift. It'll be used for proper
		RKSimon: comment
/// replacement of the \p Last LEA's uses with the \p First's def register.		/// replacement of the \p Last LEA's uses with the \p First's def register.
bool isReplaceable(const MachineInstr &First, const MachineInstr &Last,		bool isReplaceable(const MachineInstr &First, const MachineInstr &Last,
int64_t &AddrDispShift) const;		int64_t &AddrDispShift) const;

/// \brief Find all LEA instructions in the basic block. Also, assign position		/// \brief Find all LEA instructions in the basic block. Also, assign position
/// numbers to all instructions in the basic block to speed up calculation of		/// numbers to all instructions in the basic block to speed up calculation of
/// distance between them.		/// distance between them.
void findLEAs(const MachineBasicBlock &MBB, MemOpMap &LEAs);		void findLEAs(const MachineBasicBlock &MBB, MemOpMap &LEAs);

		/// \brief Find all LEA instructions in the basic block that have same
		/// Base, Index, Disp and Segment.
		void populateCSEMap(const MachineBasicBlock &MBB, MemOpMap &LEAs);

		/// \brief Factor out LEAs which share Base,Index,Offset and Segment.
		lsaba: Please add a comment explaining what this function does
		bool cseLEAs(const MachineBasicBlock &MBB);

/// \brief Removes redundant address calculations.		/// \brief Removes redundant address calculations.
bool removeRedundantAddrCalc(MemOpMap &LEAs);		bool removeRedundantAddrCalc(MemOpMap &LEAs);

/// Replace debug value MI with a new debug value instruction using register		/// Replace debug value MI with a new debug value instruction using register
		RKSimon: (style) remove braces
/// VReg with an appropriate offset and DIExpression to incorporate the		/// VReg with an appropriate offset and DIExpression to incorporate the
/// address displacement AddrDispShift. Return new debug value instruction.		/// address displacement AddrDispShift. Return new debug value instruction.
MachineInstr *replaceDebugValue(MachineInstr &MI, unsigned VReg,		MachineInstr *replaceDebugValue(MachineInstr &MI, unsigned VReg,
int64_t AddrDispShift);		int64_t AddrDispShift);

/// \brief Removes LEAs which calculate similar addresses.		/// \brief Removes LEAs which calculate similar addresses.
bool removeRedundantLEAs(MemOpMap &LEAs);		bool removeRedundantLEAs(MemOpMap &LEAs);

DenseMap<const MachineInstr *, unsigned> InstrPos;		DenseMap<const MachineInstr *, unsigned> InstrPos;

		MachineDominatorTree *DT;
MachineRegisterInfo *MRI;		MachineRegisterInfo *MRI;
const X86InstrInfo *TII;		const X86InstrInfo *TII;
const X86RegisterInfo *TRI;		const X86RegisterInfo *TRI;

static char ID;		static char ID;
};		};
char OptimizeLEAPass::ID = 0;		char OptimizeLEAPass::ID = 0;
}		}
		RKSimon: NFC change

FunctionPass *llvm::createX86OptimizeLEAs() { return new OptimizeLEAPass(); }		FunctionPass *llvm::createX86OptimizeLEAs() { return new OptimizeLEAPass(); }

		void OptimizeLEAPass::populateCSEMap(const MachineBasicBlock &MBB, MemOpMap &LEAs) {
		for (auto &MI : MBB) {
		if (isLEA(MI))
		LEAs[getMemOpCSEKey(MI, 1)].push_back(const_cast<MachineInstr *>(&MI));
		}
		}

int OptimizeLEAPass::calcInstrDist(const MachineInstr &First,		int OptimizeLEAPass::calcInstrDist(const MachineInstr &First,
const MachineInstr &Last) {		const MachineInstr &Last) {
// Both instructions must be in the same basic block and they must be		// Both instructions must be in the same basic block and they must be
// presented in InstrPos.		// presented in InstrPos.
assert(Last.getParent() == First.getParent() &&		assert(Last.getParent() == First.getParent() &&
"Instructions are in different basic blocks");		"Instructions are in different basic blocks");
assert(InstrPos.find(&First) != InstrPos.end() &&		assert(InstrPos.find(&First) != InstrPos.end() &&
InstrPos.find(&Last) != InstrPos.end() &&		InstrPos.find(&Last) != InstrPos.end() &&
"Instructions' positions are undefined");		"Instructions' positions are undefined");

return InstrPos[&Last] - InstrPos[&First];		return InstrPos[&Last] - InstrPos[&First];
}		}
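
jbhateja's inline reply a few lines below describes the LEAs map as a scoped hash map: record the length of a key's list before inserting at its front, then on scope exit pop from the front until the recorded length is restored. The following is a minimal sketch of that bookkeeping only, assuming string keys and integer ids in place of MemOpKey and MachineInstr pointers; `ScopedLEAMap` and its members are hypothetical names, not the patch's API.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical scoped map: each key owns a list whose front holds the
// entries added by the innermost open scope.
class ScopedLEAMap {
  std::unordered_map<std::string, std::deque<int>> Map;
  // One frame per open scope; each record is (key, list length before insert).
  std::vector<std::vector<std::pair<std::string, std::size_t>>> Frames;

public:
  void enterScope() { Frames.emplace_back(); }

  void insert(const std::string &Key, int LEA) {
    assert(!Frames.empty() && "insert requires an open scope");
    std::deque<int> &List = Map[Key];
    Frames.back().emplace_back(Key, List.size()); // Remember the old length.
    List.push_front(LEA);                         // Newest entry goes first.
  }

  void exitScope() {
    assert(!Frames.empty() && "no scope to exit");
    // Undo this scope's insertions, newest first.
    for (auto It = Frames.back().rbegin(); It != Frames.back().rend(); ++It) {
      std::deque<int> &List = Map[It->first];
      while (List.size() > It->second)
        List.pop_front();
    }
    Frames.pop_back();
  }

  const std::deque<int> &lookup(const std::string &Key) { return Map[Key]; }
};
```

In this sketch, entries inserted in an inner scope are visible until that scope exits, after which the list for each key returns to the state it had on entry.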

// Find the best LEA instruction in the List to replace address recalculation in		// Find the best LEA instruction in the List to replace address recalculation in
		lsaba: It is unclear what this function does; can you explain?
		jbhateja: In a nutshell, we are implementing a scoped hash map, which is LEAs. Every time we enter a new scope and encounter an LEA, we first record the length of the list of MIs corresponding to the MemOpKey of the new LEA. We then insert the new LEA at the beginning of that list, which is the value field of the hash map. When we leave a scope, we remove its LEA instructions from the LEAs hash map: since we recorded the original length of the list when we entered the scope, at exit we keep removing elements from the beginning of the list until its size matches what was recorded at entry.
// MI. Such LEA must meet these requirements:		// MI. Such LEA must meet these requirements:
		lsaba: already initialized at the beginning of the function
		jbhateja: Yes.
// 1) The address calculated by the LEA differs only by the displacement from		// 1) The address calculated by the LEA differs only by the displacement from
// the address used in MI.		// the address used in MI.
// 2) The register class of the definition of the LEA is compatible with the		// 2) The register class of the definition of the LEA is compatible with the
// register class of the address base register of MI.		// register class of the address base register of MI.
// 3) Displacement of the new memory operand should fit in 1 byte if possible.		// 3) Displacement of the new memory operand should fit in 1 byte if possible.
// 4) The LEA should be as close to MI as possible, and prior to it if		// 4) The LEA should be as close to MI as possible, and prior to it if
// possible.		// possible.
bool OptimizeLEAPass::chooseBestLEA(const SmallVectorImpl<MachineInstr *> &List,		bool OptimizeLEAPass::chooseBestLEA(const SmallVectorImpl<MachineInstr *> &List,
const MachineInstr &MI,		const MachineInstr &MI,
MachineInstr *&BestLEA,		MachineInstr *&BestLEA,
int64_t &AddrDispShift, int &Dist) {		int64_t &AddrDispShift, int &Dist) {
const MachineFunction *MF = MI.getParent()->getParent();		const MachineFunction *MF = MI.getParent()->getParent();
const MCInstrDesc &Desc = MI.getDesc();		const MCInstrDesc &Desc = MI.getDesc();
int MemOpNo = X86II::getMemoryOperandNo(Desc.TSFlags) +		int MemOpNo = X86II::getMemoryOperandNo(Desc.TSFlags) +
X86II::getOperandBias(Desc);		X86II::getOperandBias(Desc);

		RKSimon: clang-format? If so, commit it as an NFC change
BestLEA = nullptr;		BestLEA = nullptr;

// Loop over all LEA instructions.		// Loop over all LEA instructions.
for (auto DefMI : List) {		for (auto DefMI : List) {
// Get new address displacement.		// Get new address displacement.
int64_t AddrDispShiftTemp = getAddrDispShift(MI, MemOpNo, *DefMI, 1);		int64_t AddrDispShiftTemp = getAddrDispShift(MI, MemOpNo, *DefMI, 1);

// Make sure address displacement fits 4 bytes.		// Make sure address displacement fits 4 bytes.
▲ Show 20 Lines • Show All 276 Lines • ▼ Show 20 Lines	while (I1 != List.end()) {
replaceDebugValue(MI, FirstVReg, AddrDispShift);		replaceDebugValue(MI, FirstVReg, AddrDispShift);
continue;		continue;
}		}

// Get the number of the first memory operand.		// Get the number of the first memory operand.
const MCInstrDesc &Desc = MI.getDesc();		const MCInstrDesc &Desc = MI.getDesc();
int MemOpNo =		int MemOpNo =
X86II::getMemoryOperandNo(Desc.TSFlags) +		X86II::getMemoryOperandNo(Desc.TSFlags) +
X86II::getOperandBias(Desc);		X86II::getOperandBias(Desc);
		RKSimon: clang-format? If so, commit it as an NFC change

// Update address base.		// Update address base.
MO.setReg(FirstVReg);		MO.setReg(FirstVReg);

// Update address disp.		// Update address disp.
MachineOperand &Op = MI.getOperand(MemOpNo + X86::AddrDisp);		MachineOperand &Op = MI.getOperand(MemOpNo + X86::AddrDisp);
if (Op.isImm())		if (Op.isImm())
Op.setImm(Op.getImm() + AddrDispShift);		Op.setImm(Op.getImm() + AddrDispShift);
Show All 20 Lines	while (I1 != List.end()) {
}		}
++I1;		++I1;
}		}
}		}

return Changed;		return Changed;
}		}

		bool OptimizeLEAPass::cseLEAs(const MachineBasicBlock &MBB) {
		MemOpMap LEAs;
		bool cseDone = false;

		// Legal scale value (1,2,4 & 8) vector.
		int LegalScale[9] = {0,1,1,0,1,0,0,0,1};

		populateCSEMap(MBB,LEAs);

		auto CompareFn =
		[](const MachineInstr *Arg1, const MachineInstr *Arg2) -> bool {
		if(Arg1->getOperand(2).getImm() < Arg2->getOperand(2).getImm())
		return false;
		return true;
		};

		// Loop over all entries in the table.
		for (auto &E : LEAs) {
		auto &List = E.second;
		if(List.size() > 1) {
		std::sort(List.begin(),List.end(),CompareFn);
		}
		// Loop over all LEA pairs.
		for(auto LII = List.begin(); LII != List.end(); LII++) {
		MachineInstr &LI1 = **LII;
		auto LINext = std::next(LII);
		if(LINext == List.end())
		break;
		MachineInstr &LI2 = **LINext;
		if (!DT->dominates(&LI2,&LI1))
		continue;

		int Scale1 = LI1.getOperand(2).getImm();
		int Scale2 = LI2.getOperand(2).getImm();
		assert(LI2.getOperand(0).isReg() && "Result is a VirtualReg");
		DebugLoc DL = LI1.getDebugLoc();

		int Factor = Scale1 - Scale2;
		lsaba: This could end up in an assertion failure if LI1 is at the beginning of the BB, need to handle it separately, for example in this reproducer:
		    ; ModuleID = 'bugpoint-reduced-simplified.bc'
		    source_filename = "bugpoint-output-2ef2e5d.bc"
		    target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
		    target triple = "x86_64-unknown-linux-gnu"
		    ; Function Attrs: norecurse nounwind readnone uwtable
		    define i32 @foo(i32 %a, i32 %b, i32 %d, i32 %y, i32 %x) local_unnamed_addr #0 {
		    entry:
		      %mul1 = shl i32 %b, 1
		      %add2 = add i32 %a, 4
		      %add3 = add i32 %add2, %mul1
		      %mul4 = shl i32 %b, 2
		      %add6 = add i32 %add2, %mul4
		      br label %for.body
		    for.cond.cleanup: ; preds = %for.body
		      ret i32 %add
		    for.body: ; preds = %for.body, %entry
		      %x.addr.015 = phi i32 [ %x, %entry ], [ %add3, %for.body ]
		      %y.addr.014 = phi i32 [ %y, %entry ], [ %add6, %for.body ]
		      %mul = mul nsw i32 %x.addr.015, %y.addr.014
		      %add = add nsw i32 0, %mul
		      %exitcond = icmp eq i32 undef, %d
		      br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !1
		    }
		    attributes #0 = { norecurse nounwind readnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
		    !llvm.ident = !{!0}
		    !0 = !{!"clang version 6.0.0 (cfe/trunk 309511)"}
		    !1 = distinct !{!1, !2}
		    !2 = !{!"llvm.loop.unroll.disable"}
		RKSimon: Has @lsaba's test been added to the patch? I couldn't see it.
		jbhateja: We have a similar test case for loops, lea-opt-cse2.ll. We are not doing any factorization inside loops; only simplifyLEA can kick in.
		jbhateja: We have a test case for loops, lea-opt-cse2.ll, so this one was not added. We are not doing any factorization inside loops; only simplifyLEA can kick in.
		if (Factor > 0 && LegalScale[Factor]) {
		DEBUG(dbgs() << "CSE LEAs: Candidate to replace: "; LI1.dump(););
		MachineInstr *NewMI =
		lsaba: could we end up with an illegal scale here? (e.g. scale1 = 4, scale2 = 1)
		BuildMI(*(const_cast<MachineBasicBlock *>(&MBB)),
		&LI1,DL,TII->get(LI1.getOpcode()))
		.addDef(LI1.getOperand(0).getReg()) // Dst = Dst of LI1.
		.addUse(LI2.getOperand(0).getReg()) // Base = Dst of LI2.
		.addImm(Factor) // Scale = Diff b/w scales.
		.addUse(LI1.getOperand(3).getReg()) // Index = Index of LI1.
		.addImm(0) // Disp = 0
		.addUse(LI1.getOperand(5).getReg()); // Segment = Segment of LI1.

		LI1.eraseFromParent();
		lsaba: should it also be erased from the LEAs list?
		jbhateja: Why do you think so? LEAs is a map where Key = F(BASE, INDEX, DISP, SEGMENT) and Value = vector of MIs (LEA instructions). This map is populated on a per-basic-block basis. The outer loop traverses the map entries and sorts each vector in decreasing order of Scale. The inner loop traverses the sorted vector of LEAs for a given key, so the LI1 instruction will be traversed only once. The map will be deleted once we leave this function. Machine CSE, which is value-number based, has already run before this pass, so if there are multiple identical LEAs (i.e. same BASE/INDEX/SCALE/DISP/SEGMENT) in a basic block, they will be factored out before we get here.
		lsaba: just making sure :) By the way, can't this algorithm work across a function's basic blocks?
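The bucketing that the reply above describes can be sketched on plain data. This is a simplified model under stated assumptions rather than the pass itself: `Lea`, `LeaKey` and `groupByKey` are hypothetical names, registers are modelled as integers, and the segment operand is left out of the key for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical plain-data model of an LEA: Dst = Base + Index * Scale + Disp.
struct Lea {
  int Dst, Base, Index, Scale, Disp;
};

// The key ignores Scale, so LEAs that differ only in Scale share a bucket.
using LeaKey = std::tuple<int, int, int>; // (Base, Index, Disp)

// Group the LEAs of one basic block by key, then sort each bucket in
// decreasing order of Scale.
std::map<LeaKey, std::vector<Lea>> groupByKey(const std::vector<Lea> &Block) {
  std::map<LeaKey, std::vector<Lea>> Buckets;
  for (const Lea &L : Block)
    Buckets[std::make_tuple(L.Base, L.Index, L.Disp)].push_back(L);
  for (auto &Entry : Buckets)
    std::sort(Entry.second.begin(), Entry.second.end(),
              [](const Lea &A, const Lea &B) { return A.Scale > B.Scale; });
  return Buckets;
}
```

After grouping, adjacent entries of a bucket are the candidate pairs: the entry with the larger scale may be rewritten in terms of the result of the entry with the smaller scale.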
		cseDone = NewMI != nullptr;
		RKSimon: Please can you run this through clang-format?
		DEBUG(dbgs() << "CSE LEAs: Replaced by: "; NewMI->dump(););
		}
		}
		}
		return cseDone;
		}


bool OptimizeLEAPass::runOnMachineFunction(MachineFunction &MF) {		bool OptimizeLEAPass::runOnMachineFunction(MachineFunction &MF) {
bool Changed = false;		bool Changed = false;

if (DisableX86LEAOpt || skipFunction(*MF.getFunction()))		if (DisableX86LEAOpt || skipFunction(*MF.getFunction()))
return false;		return false;

MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();
TII = MF.getSubtarget<X86Subtarget>().getInstrInfo();		TII = MF.getSubtarget<X86Subtarget>().getInstrInfo();
TRI = MF.getSubtarget<X86Subtarget>().getRegisterInfo();		TRI = MF.getSubtarget<X86Subtarget>().getRegisterInfo();
		DT = &getAnalysis<MachineDominatorTree>();

// Process all basic blocks.		// Process all basic blocks.
for (auto &MBB : MF) {		for (auto &MBB : MF) {
MemOpMap LEAs;		MemOpMap LEAs;
InstrPos.clear();		InstrPos.clear();

		// Attempt CSE over LEAs.
		Changed |= cseLEAs(MBB);
		RKSimon: (style) Remove braces

// Find all LEA instructions in basic block.		// Find all LEA instructions in basic block.
findLEAs(MBB, LEAs);		findLEAs(MBB, LEAs);

// If current basic block has no LEAs, move on to the next one.		// If current basic block has no LEAs, move on to the next one.
if (LEAs.empty())		if (LEAs.empty())
		RKSimon: Really don't like this - write a helper instead like you did in X86ISelDAGToDAG.cpp: auto IsLegalScale = [](int S) { return S == 1 || S == 2 || S == 4 || S == 8; };
		jbhateja: Fixed
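As a worked illustration of the legality check discussed in this thread, the sketch below pairs the suggested predicate with the scale difference that the factorization computes. `isLegalScale` mirrors the lambda quoted above; `factorFor` is a hypothetical helper introduced only for this example and does not appear in the patch.

```cpp
#include <cassert>

// Same predicate as the lambda suggested in the review, as a free function.
constexpr bool isLegalScale(int S) {
  return S == 1 || S == 2 || S == 4 || S == 8;
}

// Hypothetical helper: the scale to use when an LEA with Scale1 is rewritten
// on top of the result of an LEA with Scale2, or 0 if the difference cannot
// be encoded as an x86 scale.
constexpr int factorFor(int Scale1, int Scale2) {
  return isLegalScale(Scale1 - Scale2) ? Scale1 - Scale2 : 0;
}
```

With these definitions `factorFor(2, 1)` is 1 and `factorFor(8, 4)` is 4, while `factorFor(4, 1)` is 0 because the difference 3 is not an encodable scale, so that pair would be skipped instead of producing an illegal LEA.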
continue;		continue;

// Remove redundant LEA instructions.		// Remove redundant LEA instructions.
Changed |= removeRedundantLEAs(LEAs);		Changed |= removeRedundantLEAs(LEAs);

// Remove redundant address calculations. Do it only for -Os/-Oz since only		// Remove redundant address calculations. Do it only for -Os/-Oz since only
		RKSimon: return Arg1->getOperand(2).getImm() >= Arg2->getOperand(2).getImm();
		jbhateja: Fixed
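One point worth keeping in mind with the comparator above: `std::sort` expects a comparator that is a strict weak ordering, which means it must return false when both arguments compare equal. A `>=` comparison returns true in that case. The sketch below contrasts the two on plain integers; the function names are hypothetical and the integers stand in for the scale immediates.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Non-strict comparison: true for equal arguments, so it is not a strict
// weak ordering and should not be handed to std::sort.
inline bool scaleGreaterEqual(int A, int B) { return A >= B; }

// Strict comparison: false for equal arguments, safe to pass to std::sort.
inline bool scaleGreater(int A, int B) { return A > B; }

// Sort scale values in decreasing order using the strict comparator.
inline void sortDescending(std::vector<int> &Scales) {
  std::sort(Scales.begin(), Scales.end(), scaleGreater);
}
```

Both comparators give the same decreasing order when all scales are distinct, but only the strict one stays within the documented requirements of `std::sort`.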
// a code size gain is expected from this part of the pass.		// a code size gain is expected from this part of the pass.
if (MF.getFunction()->optForSize())		if (MF.getFunction()->optForSize())
Changed |= removeRedundantAddrCalc(LEAs);		Changed |= removeRedundantAddrCalc(LEAs);
}		}

return Changed;		return Changed;
}		}
		jmolloy: This can cause recursion deep enough to cause stack overflows. Please could you refactor this to not use direct recursion? The domtree may be hundreds of nodes deep in degenerate cases.
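A common way to address the recursion concern above is to walk the tree with an explicit worklist, so that depth is bounded by heap memory and not by the call stack. The following is a generic sketch, not the refactoring that was eventually committed: `DomNode` and `preorder` are hypothetical, and nodes are identified by their index in a vector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dominator-tree node: only the indices of its children.
struct DomNode {
  std::vector<int> Children;
};

// Visit every node reachable from Root in pre-order without recursion.
// Tree[i] describes node i; the return value is the visit order.
std::vector<int> preorder(const std::vector<DomNode> &Tree, int Root) {
  std::vector<int> Order;
  std::vector<int> Worklist{Root};
  while (!Worklist.empty()) {
    int N = Worklist.back();
    Worklist.pop_back();
    Order.push_back(N);
    // Push children in reverse so the leftmost child is visited first.
    const std::vector<int> &C = Tree[N].Children;
    for (auto It = C.rbegin(); It != C.rend(); ++It)
      Worklist.push_back(*It);
  }
  return Order;
}
```

A pass that needs scope entry and exit events, as the scoped hash map does, would additionally have to record when all children of a node have been processed, for example by pushing a marker entry onto the worklist.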
		RKSimon: DL is only used here - just use LI1.getDebugLoc() directly?

test/CodeGen/X86/lea-opt-cst.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=x86_64-unknown | FileCheck %s -check-prefix=X64
				; RUN: llc < %s -mtriple=i686-unknown | FileCheck %s -check-prefix=X86

				RKSimon: Drop the triple/datalayouts and just use the -mtriple=x86_64-unknown-linux-gnu instead. Add a i686-unknown-linux-gnu test as well. By convention we tend to use X86 as the prefix for i686 triples and X64 for x86_64 triples. You should be able to use utils/update_llc_test_checks.py to generate the codegen.
				lsaba: please generate the test with the original checks before your changes and commit it first in a separate commit
				RKSimon: Please can you commit this test file to trunk with current codegen and update the patch to show the diff
				%struct.SA = type { i32 , i32 , i32 , i32 ,i32 }

				define void @test_func(%struct.SA* nocapture %ctx, i32 %n) local_unnamed_addr {
				; X64-LABEL: test_func:
				; X64: # BB#0: # %entry
				; X64-NEXT: movl (%rdi), %eax
				; X64-NEXT: movl 16(%rdi), %ecx
				; X64-NEXT: leal 1(%rax,%rcx), %eax
				; X64-NEXT: movl %eax, 12(%rdi)
				; X64-NEXT: leal (%eax,%rcx), %eax
				; X64-NEXT: movl %eax, 16(%rdi)
				; X64-NEXT: retq
				;
				; X86-LABEL: test_func:
				; X86: # BB#0: # %entry
				; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-NEXT: movl (%eax), %ecx
				; X86-NEXT: movl 16(%eax), %edx
				; X86-NEXT: leal 1(%ecx,%edx), %ecx
				; X86-NEXT: movl %ecx, 12(%eax)
				; X86-NEXT: leal (%ecx,%edx), %ecx
				; X86-NEXT: movl %ecx, 16(%eax)
				; X86-NEXT: retl
				entry:
				%h0 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 0
				%0 = load i32, i32* %h0, align 8
				%h3 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 3
				%h4 = getelementptr inbounds %struct.SA, %struct.SA* %ctx, i64 0, i32 4
				%1 = load i32, i32* %h4, align 8
				%add = add i32 %0, 1
				%add4 = add i32 %add, %1
				store i32 %add4, i32* %h3, align 4
				%add29 = add i32 %add4 , %1
				store i32 %add29, i32* %h4, align 8
				ret void
				}

test/CodeGen/X86/mul-constant-i16.ll

	; X64-NEXT: retq			; X64-NEXT: retq
	%mul = mul nsw i16 %x, 28			%mul = mul nsw i16 %x, 28
	ret i16 %mul			ret i16 %mul
	}			}

	define i16 @test_mul_by_29(i16 %x) {			define i16 @test_mul_by_29(i16 %x) {
	; X86-LABEL: test_mul_by_29:			; X86-LABEL: test_mul_by_29:
	; X86: # BB#0:			; X86: # BB#0:
	; X86-NEXT: movzwl {{[0-9]+}}(%esp), %ecx			; X86-NEXT: movzwl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: leal (%ecx,%ecx,8), %eax			; X86-NEXT: leal (%eax,%eax,8), %ecx
	; X86-NEXT: leal (%eax,%eax,2), %eax			; X86-NEXT: leal (%ecx,%ecx,2), %ecx
	; X86-NEXT: addl %ecx, %eax			; X86-NEXT: leal (%ecx,%eax,2), %eax
	; X86-NEXT: addl %ecx, %eax
	; X86-NEXT: # kill: %AX<def> %AX<kill> %EAX<kill>			; X86-NEXT: # kill: %AX<def> %AX<kill> %EAX<kill>
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: test_mul_by_29:			; X64-LABEL: test_mul_by_29:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>			; X64-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>
	; X64-NEXT: leal (%rdi,%rdi,8), %eax			; X64-NEXT: leal (%rdi,%rdi,8), %eax
	; X64-NEXT: leal (%rax,%rax,2), %eax			; X64-NEXT: leal (%rax,%rax,2), %eax
	; X64-NEXT: addl %edi, %eax			; X64-NEXT: leal (%rax,%rdi,2), %eax
	; X64-NEXT: addl %edi, %eax
	; X64-NEXT: # kill: %AX<def> %AX<kill> %EAX<kill>			; X64-NEXT: # kill: %AX<def> %AX<kill> %EAX<kill>
	; X64-NEXT: retq			; X64-NEXT: retq
	%mul = mul nsw i16 %x, 29			%mul = mul nsw i16 %x, 29
	ret i16 %mul			ret i16 %mul
	}			}

	define i16 @test_mul_by_30(i16 %x) {			define i16 @test_mul_by_30(i16 %x) {
	; X86-LABEL: test_mul_by_30:			; X86-LABEL: test_mul_by_30:

test/CodeGen/X86/mul-constant-i32.ll

	; SLM-NOOPT-NEXT: retq # sched: [4:1.00]			; SLM-NOOPT-NEXT: retq # sched: [4:1.00]
	%mul = mul nsw i32 %x, 28			%mul = mul nsw i32 %x, 28
	ret i32 %mul			ret i32 %mul
	}			}

	define i32 @test_mul_by_29(i32 %x) {			define i32 @test_mul_by_29(i32 %x) {
	; X86-LABEL: test_mul_by_29:			; X86-LABEL: test_mul_by_29:
	; X86: # BB#0:			; X86: # BB#0:
	; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: leal (%ecx,%ecx,8), %eax			; X86-NEXT: leal (%eax,%eax,8), %ecx
	; X86-NEXT: leal (%eax,%eax,2), %eax			; X86-NEXT: leal (%ecx,%ecx,2), %ecx
	; X86-NEXT: addl %ecx, %eax			; X86-NEXT: leal (%ecx,%eax,2), %eax
	; X86-NEXT: addl %ecx, %eax
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-HSW-LABEL: test_mul_by_29:			; X64-HSW-LABEL: test_mul_by_29:
	; X64-HSW: # BB#0:			; X64-HSW: # BB#0:
	; X64-HSW-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>			; X64-HSW-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>
	; X64-HSW-NEXT: leal (%rdi,%rdi,8), %eax # sched: [1:0.50]			; X64-HSW-NEXT: leal (%rdi,%rdi,8), %eax # sched: [1:0.50]
	; X64-HSW-NEXT: leal (%rax,%rax,2), %eax # sched: [1:0.50]			; X64-HSW-NEXT: leal (%rax,%rax,2), %eax # sched: [1:0.50]
	; X64-HSW-NEXT: addl %edi, %eax # sched: [1:0.25]			; X64-HSW-NEXT: leal (%rax,%rdi,2), %eax # sched: [1:0.50]
	; X64-HSW-NEXT: addl %edi, %eax # sched: [1:0.25]
	; X64-HSW-NEXT: retq # sched: [1:1.00]			; X64-HSW-NEXT: retq # sched: [1:1.00]
	;			;
	; X64-JAG-LABEL: test_mul_by_29:			; X64-JAG-LABEL: test_mul_by_29:
	; X64-JAG: # BB#0:			; X64-JAG: # BB#0:
	; X64-JAG-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>			; X64-JAG-NEXT: # kill: %EDI<def> %EDI<kill> %RDI<def>
	; X64-JAG-NEXT: leal (%rdi,%rdi,8), %eax # sched: [1:0.50]			; X64-JAG-NEXT: leal (%rdi,%rdi,8), %eax # sched: [1:0.50]
	; X64-JAG-NEXT: leal (%rax,%rax,2), %eax # sched: [1:0.50]			; X64-JAG-NEXT: leal (%rax,%rax,2), %eax # sched: [1:0.50]
	; X64-JAG-NEXT: addl %edi, %eax # sched: [1:0.50]			; X64-JAG-NEXT: leal (%rax,%rdi,2), %eax # sched: [1:0.50]
	; X64-JAG-NEXT: addl %edi, %eax # sched: [1:0.50]
	; X64-JAG-NEXT: retq # sched: [4:1.00]			; X64-JAG-NEXT: retq # sched: [4:1.00]
	;			;
	; X86-NOOPT-LABEL: test_mul_by_29:			; X86-NOOPT-LABEL: test_mul_by_29:
	; X86-NOOPT: # BB#0:			; X86-NOOPT: # BB#0:
	; X86-NOOPT-NEXT: imull $29, {{[0-9]+}}(%esp), %eax			; X86-NOOPT-NEXT: imull $29, {{[0-9]+}}(%esp), %eax
	; X86-NOOPT-NEXT: retl			; X86-NOOPT-NEXT: retl
	;			;
	; HSW-NOOPT-LABEL: test_mul_by_29:			; HSW-NOOPT-LABEL: test_mul_by_29:

test/CodeGen/X86/mul-constant-i64.ll

	}			}

	define i64 @test_mul_by_29(i64 %x) {			define i64 @test_mul_by_29(i64 %x) {
	; X86-LABEL: test_mul_by_29:			; X86-LABEL: test_mul_by_29:
	; X86: # BB#0:			; X86: # BB#0:
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: leal (%eax,%eax,8), %ecx			; X86-NEXT: leal (%eax,%eax,8), %ecx
	; X86-NEXT: leal (%ecx,%ecx,2), %ecx			; X86-NEXT: leal (%ecx,%ecx,2), %ecx
	; X86-NEXT: addl %eax, %ecx			; X86-NEXT: leal (%ecx,%eax,2), %ecx
	; X86-NEXT: addl %eax, %ecx
	; X86-NEXT: movl $29, %eax			; X86-NEXT: movl $29, %eax
	; X86-NEXT: mull {{[0-9]+}}(%esp)			; X86-NEXT: mull {{[0-9]+}}(%esp)
	; X86-NEXT: addl %ecx, %edx			; X86-NEXT: addl %ecx, %edx
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-HSW-LABEL: test_mul_by_29:			; X64-HSW-LABEL: test_mul_by_29:
	; X64-HSW: # BB#0:			; X64-HSW: # BB#0:
	; X64-HSW-NEXT: leaq (%rdi,%rdi,8), %rax # sched: [1:0.50]			; X64-HSW-NEXT: leaq (%rdi,%rdi,8), %rax # sched: [1:0.50]
	; X64-HSW-NEXT: leaq (%rax,%rax,2), %rax # sched: [1:0.50]			; X64-HSW-NEXT: leaq (%rax,%rax,2), %rax # sched: [1:0.50]
	; X64-HSW-NEXT: addq %rdi, %rax # sched: [1:0.25]			; X64-HSW-NEXT: leaq (%rax,%rdi,2), %rax # sched: [1:0.50]
	; X64-HSW-NEXT: addq %rdi, %rax # sched: [1:0.25]
	; X64-HSW-NEXT: retq # sched: [1:1.00]			; X64-HSW-NEXT: retq # sched: [1:1.00]
	;			;
	; X64-JAG-LABEL: test_mul_by_29:			; X64-JAG-LABEL: test_mul_by_29:
	; X64-JAG: # BB#0:			; X64-JAG: # BB#0:
	; X64-JAG-NEXT: leaq (%rdi,%rdi,8), %rax # sched: [1:0.50]			; X64-JAG-NEXT: leaq (%rdi,%rdi,8), %rax # sched: [1:0.50]
	; X64-JAG-NEXT: leaq (%rax,%rax,2), %rax # sched: [1:0.50]			; X64-JAG-NEXT: leaq (%rax,%rax,2), %rax # sched: [1:0.50]
	; X64-JAG-NEXT: addq %rdi, %rax # sched: [1:0.50]			; X64-JAG-NEXT: leaq (%rax,%rdi,2), %rax # sched: [1:0.50]
	; X64-JAG-NEXT: addq %rdi, %rax # sched: [1:0.50]
	; X64-JAG-NEXT: retq # sched: [4:1.00]			; X64-JAG-NEXT: retq # sched: [4:1.00]
	;			;
	; X86-NOOPT-LABEL: test_mul_by_29:			; X86-NOOPT-LABEL: test_mul_by_29:
	; X86-NOOPT: # BB#0:			; X86-NOOPT: # BB#0:
	; X86-NOOPT-NEXT: movl $29, %eax			; X86-NOOPT-NEXT: movl $29, %eax
	; X86-NOOPT-NEXT: mull {{[0-9]+}}(%esp)			; X86-NOOPT-NEXT: mull {{[0-9]+}}(%esp)
	; X86-NOOPT-NEXT: imull $29, {{[0-9]+}}(%esp), %ecx			; X86-NOOPT-NEXT: imull $29, {{[0-9]+}}(%esp), %ecx
	; X86-NOOPT-NEXT: addl %ecx, %edx			; X86-NOOPT-NEXT: addl %ecx, %edx

test/CodeGen/X86/mul-constant-result.ll

	; X86-NEXT: leal (%eax,%eax,8), %ecx			; X86-NEXT: leal (%eax,%eax,8), %ecx
	; X86-NEXT: leal (%ecx,%ecx,2), %ecx			; X86-NEXT: leal (%ecx,%ecx,2), %ecx
	; X86-NEXT: addl %ecx, %eax			; X86-NEXT: addl %ecx, %eax
	; X86-NEXT: popl %esi			; X86-NEXT: popl %esi
	; X86-NEXT: retl			; X86-NEXT: retl
	; X86-NEXT: .LBB0_35:			; X86-NEXT: .LBB0_35:
	; X86-NEXT: leal (%eax,%eax,8), %ecx			; X86-NEXT: leal (%eax,%eax,8), %ecx
	; X86-NEXT: leal (%ecx,%ecx,2), %ecx			; X86-NEXT: leal (%ecx,%ecx,2), %ecx
	; X86-NEXT: addl %eax, %ecx			; X86-NEXT: leal (%ecx,%eax,2), %eax
	; X86-NEXT: addl %ecx, %eax
	; X86-NEXT: popl %esi			; X86-NEXT: popl %esi
	; X86-NEXT: retl			; X86-NEXT: retl
	; X86-NEXT: .LBB0_36:			; X86-NEXT: .LBB0_36:
	; X86-NEXT: movl %eax, %ecx			; X86-NEXT: movl %eax, %ecx
	; X86-NEXT: shll $5, %ecx			; X86-NEXT: shll $5, %ecx
	; X86-NEXT: subl %eax, %ecx			; X86-NEXT: subl %eax, %ecx
	; X86-NEXT: jmp .LBB0_12			; X86-NEXT: jmp .LBB0_12
	; X86-NEXT: .LBB0_37:			; X86-NEXT: .LBB0_37:
	; X64-HSW-NEXT: .LBB0_30:			; X64-HSW-NEXT: .LBB0_30:
	; X64-HSW-NEXT: leal (%rax,%rax,8), %eax			; X64-HSW-NEXT: leal (%rax,%rax,8), %eax
	; X64-HSW-NEXT: leal (%rax,%rax,2), %eax			; X64-HSW-NEXT: leal (%rax,%rax,2), %eax
	; X64-HSW-NEXT: # kill: %EAX<def> %EAX<kill> %RAX<kill>			; X64-HSW-NEXT: # kill: %EAX<def> %EAX<kill> %RAX<kill>
	; X64-HSW-NEXT: retq			; X64-HSW-NEXT: retq
	; X64-HSW-NEXT: .LBB0_31:			; X64-HSW-NEXT: .LBB0_31:
	; X64-HSW-NEXT: leal (%rax,%rax,8), %ecx			; X64-HSW-NEXT: leal (%rax,%rax,8), %ecx
	; X64-HSW-NEXT: leal (%rcx,%rcx,2), %ecx			; X64-HSW-NEXT: leal (%rcx,%rcx,2), %ecx
	; X64-HSW-NEXT: jmp .LBB0_17
	; X64-HSW-NEXT: .LBB0_32:
	; X64-HSW-NEXT: leal (%rax,%rax,8), %ecx
	; X64-HSW-NEXT: leal (%rcx,%rcx,2), %ecx
	; X64-HSW-NEXT: addl %eax, %ecx
	; X64-HSW-NEXT: .LBB0_17:			; X64-HSW-NEXT: .LBB0_17:
	; X64-HSW-NEXT: addl %eax, %ecx			; X64-HSW-NEXT: addl %eax, %ecx
	; X64-HSW-NEXT: movl %ecx, %eax			; X64-HSW-NEXT: movl %ecx, %eax
	; X64-HSW-NEXT: # kill: %EAX<def> %EAX<kill> %RAX<kill>			; X64-HSW-NEXT: # kill: %EAX<def> %EAX<kill> %RAX<kill>
	; X64-HSW-NEXT: retq			; X64-HSW-NEXT: retq
				; X64-HSW-NEXT: .LBB0_32:
				; X64-HSW-NEXT: leal (%rax,%rax,8), %ecx
				; X64-HSW-NEXT: leal (%rcx,%rcx,2), %ecx
				; X64-HSW-NEXT: leal (%rcx,%rax,2), %eax
				; X64-HSW-NEXT: # kill: %EAX<def> %EAX<kill> %RAX<kill>
				; X64-HSW-NEXT: retq
	; X64-HSW-NEXT: .LBB0_33:			; X64-HSW-NEXT: .LBB0_33:
	; X64-HSW-NEXT: movl %eax, %ecx			; X64-HSW-NEXT: movl %eax, %ecx
	; X64-HSW-NEXT: shll $5, %ecx			; X64-HSW-NEXT: shll $5, %ecx
	; X64-HSW-NEXT: subl %eax, %ecx			; X64-HSW-NEXT: subl %eax, %ecx
	; X64-HSW-NEXT: jmp .LBB0_8			; X64-HSW-NEXT: jmp .LBB0_8
	; X64-HSW-NEXT: .LBB0_34:			; X64-HSW-NEXT: .LBB0_34:
	; X64-HSW-NEXT: movl %eax, %ecx			; X64-HSW-NEXT: movl %eax, %ecx
	; X64-HSW-NEXT: shll $5, %ecx			; X64-HSW-NEXT: shll $5, %ecx

test/CodeGen/X86/umul-with-overflow.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-unknown-linux-gnu | FileCheck %s			; RUN: llc < %s -mtriple=i686-unknown-linux-gnu | FileCheck %s

	declare {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b)			declare {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b)
	define zeroext i1 @a(i32 %x) nounwind {			define zeroext i1 @a(i32 %x) nounwind {
				; CHECK-LABEL: a:
				lsaba: why did this test change?
				jbhateja: Because I generated its output with the script utils/update_llc_test_checks.py, which adds an assertion for each instruction. I think it should be fine.
				lsaba: This needs to be in a separate pre-commit. Please commit and rebase.
				RKSimon: I regenerated this recently - please rebase
				RKSimon: Still needs to be rebased - you've lost the x86_64 tests
				; CHECK: # BB#0:
				; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
				; CHECK-NEXT: movl $3, %ecx
				; CHECK-NEXT: mull %ecx
				; CHECK-NEXT: seto %al
				; CHECK-NEXT: retl
	%res = call {i32, i1} @llvm.umul.with.overflow.i32(i32 %x, i32 3)			%res = call {i32, i1} @llvm.umul.with.overflow.i32(i32 %x, i32 3)
	%obil = extractvalue {i32, i1} %res, 1			%obil = extractvalue {i32, i1} %res, 1
	ret i1 %obil			ret i1 %obil

	; CHECK-LABEL: a:
	; CHECK: mull
	; CHECK: seto %al
	; CHECK: ret
	}			}

	define i32 @test2(i32 %a, i32 %b) nounwind readnone {			define i32 @test2(i32 %a, i32 %b) nounwind readnone {
				; CHECK-LABEL: test2:
				; CHECK: # BB#0: # %entry
				; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
				; CHECK-NEXT: addl {{[0-9]+}}(%esp), %eax
				; CHECK-NEXT: addl %eax, %eax
				; CHECK-NEXT: retl
	entry:			entry:
	%tmp0 = add i32 %b, %a			%tmp0 = add i32 %b, %a
	%tmp1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %tmp0, i32 2)			%tmp1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %tmp0, i32 2)
	%tmp2 = extractvalue { i32, i1 } %tmp1, 0			%tmp2 = extractvalue { i32, i1 } %tmp1, 0
	ret i32 %tmp2			ret i32 %tmp2
	; CHECK-LABEL: test2:
	; CHECK: addl
	; CHECK-NEXT: addl
	; CHECK-NEXT: ret
	}			}

	define i32 @test3(i32 %a, i32 %b) nounwind readnone {			define i32 @test3(i32 %a, i32 %b) nounwind readnone {
				; CHECK-LABEL: test3:
				; CHECK: # BB#0: # %entry
				; CHECK-NEXT: movl {{[0-9]+}}(%esp), %eax
				; CHECK-NEXT: addl {{[0-9]+}}(%esp), %eax
				; CHECK-NEXT: movl $4, %ecx
				; CHECK-NEXT: mull %ecx
				; CHECK-NEXT: retl
	entry:			entry:
	%tmp0 = add i32 %b, %a			%tmp0 = add i32 %b, %a
	%tmp1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %tmp0, i32 4)			%tmp1 = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %tmp0, i32 4)
	%tmp2 = extractvalue { i32, i1 } %tmp1, 0			%tmp2 = extractvalue { i32, i1 } %tmp1, 0
	ret i32 %tmp2			ret i32 %tmp2
	; CHECK-LABEL: test3:
	; CHECK: addl
	; CHECK: mull
	; CHECK-NEXT: ret
	}			}

test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll

	; RUN: llc < %s -O3 -march=x86-64 -mcpu=core2 | FileCheck %s -check-prefix=X64			; RUN: llc < %s -O3 -march=x86-64 -mcpu=core2 | FileCheck %s -check-prefix=X64
	; RUN: llc < %s -O3 -march=x86 -mcpu=core2 | FileCheck %s -check-prefix=X32			; RUN: llc < %s -O3 -march=x86 -mcpu=core2 | FileCheck %s -check-prefix=X32

	; @simple is the most basic chain of address induction variables. Chaining			; @simple is the most basic chain of address induction variables. Chaining
	; saves at least one register and avoids complex addressing and setup			; saves at least one register and avoids complex addressing and setup
	; code.			; code.
	;			;
	; X64: @simple			; X64: @simple
	; %x * 4			; %x * 4
	; X64: shlq $2			; X64: shlq $2
	; no other address computation in the preheader			; no other address computation in the preheader
	; X64-NEXT: xorl			; X64-NEXT: xorl
	; X64-NEXT: .p2align			; X64-NEXT: .p2align
	; X64: %loop			; X64: %loop
	; no complex address modes			; no complex address modes
	; X64-NOT: (%{{[^)]+}},%{{[^)]+}},			; X64-NOT: [1-9]+(%{{[^)]+}},%{{[^)]+}},
	;			;
	; X32: @simple			; X32: @simple
	; no expensive address computation in the preheader			; no expensive address computation in the preheader
	; X32-NOT: imul			; X32-NOT: imul
	; X32: %loop			; X32: %loop
	; no complex address modes			; no complex address modes
	; X32-NOT: (%{{[^)]+}},%{{[^)]+}},			; X32-NOT: [1-9]+(%{{[^)]+}},%{{[^)]+}},
	define i32 @simple(i32* %a, i32* %b, i32 %x) nounwind {			define i32 @simple(i32* %a, i32* %b, i32 %x) nounwind {
	entry:			entry:
	br label %loop			br label %loop
	loop:			loop:
	%iv = phi i32* [ %a, %entry ], [ %iv4, %loop ]			%iv = phi i32* [ %a, %entry ], [ %iv4, %loop ]
	%s = phi i32 [ 0, %entry ], [ %s4, %loop ]			%s = phi i32 [ 0, %entry ], [ %s4, %loop ]
	%v = load i32, i32* %iv			%v = load i32, i32* %iv
	%iv1 = getelementptr inbounds i32, i32* %iv, i32 %x			%iv1 = getelementptr inbounds i32, i32* %iv, i32 %x
	; strange increment expressions like this:			; strange increment expressions like this:
	; IV + ((sext i32 (2 * %s) to i64) + (-1 * (sext i32 %s to i64)))			; IV + ((sext i32 (2 * %s) to i64) + (-1 * (sext i32 %s to i64)))
	;			;
	; X32: extrastride:			; X32: extrastride:
	; no spills in the preheader			; no spills in the preheader
	; X32-NOT: mov{{.*}}(%esp){{$}}			; X32-NOT: mov{{.*}}(%esp){{$}}
	; X32: %for.body{{$}}			; X32: %for.body{{$}}
	; no complex address modes			; no complex address modes
	; X32-NOT: (%{{[^)]+}},%{{[^)]+}},			; X32-NOT: [1-9]+(%{{[^)]+}},%{{[^)]+}},
	; no reloads			; no reloads
	; X32-NOT: (%esp)			; X32-NOT: (%esp)
	define void @extrastride(i8* nocapture %main, i32 %main_stride, i32* nocapture %res, i32 %x, i32 %y, i32 %z) nounwind {			define void @extrastride(i8* nocapture %main, i32 %main_stride, i32* nocapture %res, i32 %x, i32 %y, i32 %z) nounwind {
	entry:			entry:
	%cmp8 = icmp eq i32 %z, 0			%cmp8 = icmp eq i32 %z, 0
	br i1 %cmp8, label %for.end, label %for.body.lr.ph			br i1 %cmp8, label %for.end, label %for.body.lr.ph

	for.body.lr.ph: ; preds = %entry			for.body.lr.ph: ; preds = %entry

utils/TableGen/DAGISelMatcherGen.cpp

Show First 20 Lines • Show All 299 Lines • ▼ Show 20 Lines	if (N->getOperator()->isSubClassOf("ComplexPattern")) {
}		}

return;		return;
}		}

const SDNodeInfo &CInfo = CGP.getSDNodeInfo(N->getOperator());		const SDNodeInfo &CInfo = CGP.getSDNodeInfo(N->getOperator());

// If this is an 'and R, 1234' where the operation is AND/OR and the RHS is		// If this is an 'and R, 1234' where the operation is AND/OR and the RHS is
// a constant without a predicate fn that has more that one bit set, handle		// a constant without a predicate fn that has more than one bit set, handle
		craig.topper: This should be a NFC pre-commit.
		RKSimon: This is still here
// this as a special case. This is usually for targets that have special		// this as a special case. This is usually for targets that have special
// handling of certain large constants (e.g. alpha with it's 8/16/32-bit		// handling of certain large constants (e.g. alpha with it's 8/16/32-bit
// handling stuff). Using these instructions is often far more efficient		// handling stuff). Using these instructions is often far more efficient
// than materializing the constant. Unfortunately, both the instcombiner		// than materializing the constant. Unfortunately, both the instcombiner
// and the dag combiner can often infer that bits are dead, and thus drop		// and the dag combiner can often infer that bits are dead, and thus drop
// them from the mask in the dag. For example, it might turn 'AND X, 255'		// them from the mask in the dag. For example, it might turn 'AND X, 255'
// into 'AND X, 254' if it knows the low bit is set. Emit code that checks		// into 'AND X, 254' if it knows the low bit is set. Emit code that checks
// to handle this.		// to handle this.
[...]
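
The source comment in DAGISelMatcherGen.cpp above describes a mask check that still matches after earlier combines have dropped dead bits from the constant. A minimal sketch of that idea for the AND case follows, assuming a dropped bit is acceptable only when it is known to be zero in the other operand. The name `maskMatchesModuloDeadBits` and its parameters are hypothetical and chosen for illustration; this is not LLVM's API.

```cpp
#include <cstdint>

// Hypothetical helper (not LLVM's API): decides whether the mask actually
// present in the DAG (Actual) may stand in for the mask the pattern asked
// for (Desired), given the bits of the other AND operand known to be zero.
constexpr bool maskMatchesModuloDeadBits(uint64_t Desired, uint64_t Actual,
                                         uint64_t KnownZero) {
  // Identical masks always match.
  if (Actual == Desired)
    return true;
  // The actual mask must not keep any bit the pattern wanted cleared.
  if ((Actual & ~Desired) != 0)
    return false;
  // Bits that were dropped from the mask must already be zero in the
  // other operand, so clearing them changes nothing.
  const uint64_t Dropped = Desired & ~Actual;
  return (Dropped & ~KnownZero) == 0;
}

// 'AND X, 254' is accepted for a pattern wanting 'AND X, 255' only when
// bit 0 of X is known to be zero.
static_assert(maskMatchesModuloDeadBits(255, 254, 1), "dead low bit");
static_assert(!maskMatchesModuloDeadBits(255, 254, 0), "live low bit");
```

The sketch models only the AND direction; an OR pattern would need the dropped bits to be known set rather than known zero.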

Revision Contents

Diff 105790

lib/Target/X86/X86ISelDAGToDAG.cpp

lib/Target/X86/X86OptimizeLEAs.cpp

test/CodeGen/X86/lea-opt-cst.ll

test/CodeGen/X86/mul-constant-i16.ll

test/CodeGen/X86/mul-constant-i32.ll

test/CodeGen/X86/mul-constant-i64.ll

test/CodeGen/X86/mul-constant-result.ll

test/CodeGen/X86/umul-with-overflow.ll

test/Transforms/LoopStrengthReduce/X86/ivchain-X86.ll

utils/TableGen/DAGISelMatcherGen.cpp
