This is an archive of the discontinued LLVM Phabricator instance.

Differential D128582

Move SeparateConstOffsetFromGEPPass() before LSR() and enable EnableGEPOpt by default.
ClosedPublic

Authored by gsocshubham on Jun 25 2022, 5:27 AM.

Details

Reviewers

momchil.velikov
KyrBoh
fhahn
craig.topper
dmgreen
nikic

Commits

rGf55dbfbd9d8c: [AArch64] Move SeparateConstOffsetFromGEPPass before LSR and enable…

Summary

GEPs across basic blocks were not being split because EnableGEPOpt was turned off by default. As a result, EarlyCSE missed the opportunity to eliminate the common part of those GEPs. This can be fixed simply by turning the GEP pass on.

This patch moves SeparateConstOffsetFromGEPPass() to just before LSR() and enables EnableGEPOpt by default.

Resolves https://github.com/llvm/llvm-project/issues/50528

Added a unit test.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

gsocshubham created this revision.Jun 25 2022, 5:27 AM

Herald added a project: Restricted Project. · View Herald TranscriptJun 25 2022, 5:27 AM

Herald added subscribers: StephenFan, hiraditya, arichardson. · View Herald Transcript

gsocshubham requested review of this revision.Jun 25 2022, 5:27 AM

Herald added a project: Restricted Project. · View Herald TranscriptJun 25 2022, 5:27 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

gsocshubham edited the summary of this revision. (Show Details)Jun 25 2022, 5:28 AM

gsocshubham edited the summary of this revision. (Show Details)

Below are the SPEC intrate benchmark results obtained on commit f6c79c6ae49f3a642bebe32a2346186c38bb83d7

Peak tuning(-O3)

Benchmark          %change

500.perlbench_r    2.173
505.mcf_r          -0.472
520.omnetpp_r      0.055
523.xalancbmk_r    -0.939
525.x264_r         -4.048
531.deepsjeng_r    4.944
541.leela_r        0.604
557.xz_r           0.209

Base tuning(-O2)

Benchmark          %change

500.perlbench_r    1.341
505.mcf_r          -0.526
520.omnetpp_r      0.961
523.xalancbmk_r    1.257
525.x264_r         -6.739
531.deepsjeng_r    0
541.leela_r        -0.462
557.xz_r           0.522

The benchmark with the largest regression is 525.x264_r. Most of the benchmarks show improved results.

gsocshubham added inline comments.Jun 25 2022, 5:40 AM

llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
1050 ↗	(On Diff #439975)	Let me know the best course of action on `lowerToSingleIndexGEPs()`. I have temporarily removed it because, when I ran the SPEC benchmarks for AArch64, GEPs were being split using `lowerToSingleIndexGEPs()` when invoked through clang, whereas `lowerToArithmetics()` gets called when invoked through opt. I want to have `lowerToArithmetics()` as the default. Also, this pass was no longer used.

Please review and suggest changes.

Harbormaster completed remote builds in B172017: Diff 439975.Jun 25 2022, 5:59 AM

Missing context
Missing test
The x264_r regression is a big one.
Many tests fail in precommit CI - please update them

Hello. Unfortunately, I doubt that people will be in favour of this approach, especially if it is introducing ptr2int and int2ptr's so early in the pass pipeline. It looks like a pass that needs to be run as part of the backend.

There is a run of the pass in the AArch64 backend already, but it is disabled by default: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp#L581
Enabling it didn't seem to help with the original case in #50528. Would it be possible to teach it what it needs to for that case, and from there enable the EnableGEPOpt flag?

In D128582#3624787, @dmgreen wrote:

Hello. Unfortunately, I doubt that people will be in favour of this approach, especially if it is introducing ptr2int and int2ptr's so early in the pass pipeline. It looks like a pass that needs to be run as part of the backend.

There is a run of the pass in the AArch64 backend already, but it is disabled by default: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp#L581

Hello David,

Passing -aarch64-enable-gep-opt=true -O3 would do the job - https://clang.godbolt.org/z/hcjsh1vex what this patch was proposing. WDYT?

Enabling it didn't seem to help with the original case in #50528. Would it be possible to teach it what it needs to for that case, and from there enable the EnableGEPOpt flag?

Sure. For #50528, enabling EnableGEPOpt reduces GEP instructions by half but is done by splitting GEP into ptr2int and int2ptr.

In D128582#3624787, @dmgreen wrote:

Hello. Unfortunately, I doubt that people will be in favour of this approach, especially if it is introducing ptr2int and int2ptr's so early in the pass pipeline. It looks like a pass that needs to be run as part of the backend.

Indeed. This pass, especially in lowerToArithmetics() mode, is only usable in the backend.

This revision now requires changes to proceed.Jul 4 2022, 12:14 PM

Passing -aarch64-enable-gep-opt=true -O3 would do the job - https://clang.godbolt.org/z/hcjsh1vex what this patch was proposing. WDYT?

..

Sure. For #50528, enabling EnableGEPOpt reduces GEP instructions by half but is done by splitting GEP into ptr2int and int2ptr.

Oh yeah I see it does. I must have missed the -O3 off the time I tried it. That's good. I wonder why it was never enabled in the past.

From the look of it, it runs after LSR, which I think it would need to run before. Otherwise it is likely to mess up what LSR has tried to do. That would be before the call to TargetPassConfig::addIRPasses(). I'm not sure if the LICM run is necessary either, but I see it is used in other backends. We would need to gather some benchmark to see how it behaves. Like do the issues in x264 still occur, and what happens across more benchmark cases.
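A minimal sketch of why the flag alone has no effect without -O3, assuming the guard shown in this revision's AArch64TargetMachine.cpp diff (`TM->getOptLevel() == CodeGenOpt::Aggressive && EnableGEPOpt`). The `OptLevel` enum and the `gepOptRuns` helper are hypothetical stand-ins for illustration, not LLVM APIs:

```cpp
#include <cassert>

// Hypothetical stand-in for LLVM's codegen optimization levels
// (this is not the real CodeGenOpt enum).
enum class OptLevel { None, Less, Default, Aggressive };

// Models the guard in AArch64PassConfig::addIRPasses(): the GEP passes are
// only added when the codegen level is Aggressive (-O3) AND the
// aarch64-enable-gep-opt flag is set.
constexpr bool gepOptRuns(OptLevel Level, bool EnableGEPOpt) {
  return Level == OptLevel::Aggressive && EnableGEPOpt;
}

// Compile-time checks of the two cases discussed in the thread.
static_assert(gepOptRuns(OptLevel::Aggressive, true),
              "flag plus -O3 enables the passes");
static_assert(!gepOptRuns(OptLevel::Default, true),
              "the flag alone is not enough below -O3");
```

Under this model, passing -aarch64-enable-gep-opt=true at a lower optimization level leaves the pipeline unchanged, which would explain a test run without -O3 showing no difference.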

In D128582#3628937, @dmgreen wrote:

Passing -aarch64-enable-gep-opt=true -O3 would do the job - https://clang.godbolt.org/z/hcjsh1vex what this patch was proposing. WDYT?

..

Sure. For #50528, enabling EnableGEPOpt reduces GEP instructions by half but is done by splitting GEP into ptr2int and int2ptr.

Oh yeah I see it does. I must have missed the -O3 off the time I tried it. That's good. I wonder why it was never enabled in the past.

From the look of it, it runs after LSR, which I think it would need to run before. Otherwise it is likely to mess up what LSR has tried to do. That would be before the call to TargetPassConfig::addIRPasses(). I'm not sure if the LICM run is necessary either, but I see it is used in other backends. We would need to gather some benchmark to see how it behaves. Like do the issues in x264 still occur, and what happens across more benchmark cases.

Will running it before LSR make any difference compared to current location?

The issue in x264 was occurring because the pass was registered far too early, at the IR level. That is irrelevant now, since the pass is already available in the AArch64 backend when the relevant flags are passed. As we have been given the suggestion to move the GEP pass to just before SelectionDAG, I have moved the GEP pass from addIRPasses() to the end of AArch64PassConfig::addCodeGenPrepare(), and there is no regression in the x264 benchmark result compared to master.

In D128582#3637881, @gsocshubham wrote:

Will running it before LSR make any difference compared to current location?

The issue in x264 was occurring because the pass was registered far too early, at the IR level. That is irrelevant now, since the pass is already available in the AArch64 backend when the relevant flags are passed. As we have been given the suggestion to move the GEP pass to just before SelectionDAG, I have moved the GEP pass from addIRPasses() to the end of AArch64PassConfig::addCodeGenPrepare(), and there is no regression in the x264 benchmark result compared to master.

Everything in AArch64PassConfig::addIRPasses counts as the "backend" from the point of view of LLVM. They are still LLVM IR passes, but they run as part of the backend prior to ISel lowering to perform optimizations, and to help with lowering, that are target-specific but easier to do on IR than on MIR. I think the pass really needs to run before the call to LSR, as LSR will be very opinionated about the GEPs in loops and we don't want to mess that up. LSR is added by TargetPassConfig::addIRPasses(), so moving the SeparateConstOffsetFromGEP passes anywhere before that call in AArch64PassConfig::addIRPasses should be OK.
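The ordering argument can be sketched with a toy pipeline. This is only an illustration under stated assumptions: `Pipeline`, `addGenericIRPasses`, `addGEPOptPasses`, `buildOldOrder`, `buildNewOrder` and `indexOf` are hypothetical names, and the real TargetPassConfig::addIRPasses() adds many more passes than the single LSR entry modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy pipeline: each "pass" is just a name, stored in run order.
using Pipeline = std::vector<std::string>;

// Stand-in for TargetPassConfig::addIRPasses(), which per the discussion is
// where LSR gets added. Only LSR is modeled here.
void addGenericIRPasses(Pipeline &P) { P.push_back("LoopStrengthReduce"); }

// Stand-in for the block guarded by EnableGEPOpt in the diff.
void addGEPOptPasses(Pipeline &P) {
  P.push_back("SeparateConstOffsetFromGEP");
  P.push_back("EarlyCSE");
  P.push_back("LICM");
}

// Order before the patch: generic IR passes (including LSR) come first.
Pipeline buildOldOrder() {
  Pipeline P;
  addGenericIRPasses(P);
  addGEPOptPasses(P);
  return P;
}

// Order after the patch: the GEP passes are added before the generic call.
Pipeline buildNewOrder() {
  Pipeline P;
  addGEPOptPasses(P);
  addGenericIRPasses(P);
  return P;
}

// Returns the position of Name in P, or -1 if it is not present.
int indexOf(const Pipeline &P, const std::string &Name) {
  for (std::size_t I = 0; I < P.size(); ++I)
    if (P[I] == Name)
      return static_cast<int>(I);
  return -1;
}
```

Comparing `indexOf` for "SeparateConstOffsetFromGEP" and "LoopStrengthReduce" in the two pipelines shows the GEP splitting moving from after LSR to before it, which is the property requested above.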

Move GEP pass before LSR pass.

In D128582#3640458, @dmgreen wrote:

Everything in AArch64PassConfig::addIRPasses counts as the "backend" from the point of view of LLVM. They are still LLVM IR passes, but they run as part of the backend prior to ISel lowering to perform optimizations, and to help with lowering, that are target-specific but easier to do on IR than on MIR. I think the pass really needs to run before the call to LSR, as LSR will be very opinionated about the GEPs in loops and we don't want to mess that up. LSR is added by TargetPassConfig::addIRPasses(), so moving the SeparateConstOffsetFromGEP passes anywhere before that call in AArch64PassConfig::addIRPasses should be OK.

I have moved the GEP pass to just before the LSR pass, and below are the benchmark results at peak (-O3):

Benchmark              %change w.r.t master
500.perlbench_r ->     2.085
505.mcf_r       ->     0.466
520.omnetpp_r   ->     0.607
523.xalancbmk_r ->     -0.515
531.deepsjeng_r ->     0.326
541.leela_r     ->     -0.980
557.xz_r        ->     -0.0571

In D128582#3624787, @dmgreen wrote:

Hello. Unfortunately, I doubt that people will be in favour of this approach, especially if it is introducing ptr2int and int2ptr's so early in the pass pipeline. It looks like a pass that needs to be run as part of the backend.

There is a run of the pass in the AArch64 backend already, but it is disabled by default: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp#L581
Enabling it didn't seem to help with the original case in #50528. Would it be possible to teach it what it needs to for that case, and from there enable the EnableGEPOpt flag?

Regarding teaching SeparateConstOffsetFromGEPPass to handle constant GEPs:

In #50528, there is only one case of a GEP with all constant indices, shown below:

store i32 %inc, i32* getelementptr inbounds (%struct.state_t, %struct.state_t* @s, i64 0, i32 2)

At first, this was not considered by the GEP pass because it is part of a store instruction, so I split it into two instructions in the test case, as below:

%temp = getelementptr inbounds %struct.state_t, %struct.state_t* @s, i64 0, i32 2
  store i32 %inc, i32* %temp

%temp is not being considered by the pass since it has all constant indices -

// The backend can already nicely handle the case where all indices are
// constant.
if (GEP->hasAllConstantIndices())
  return false;

I tried forcing the split by bypassing the checks, but it does not seem right to split a GEP with all constant indices at the IR level. According to the comment, backend passes should take care of it, but I still see repeated instructions:

madd    x8, x8, x10, x9
madd    x0, x11, x12, x8

I see 2 occurrences of the above pair of instructions in the assembly. Ideally it should occur only once.

Reference - https://clang.godbolt.org/z/KfrfT97hn
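The early exit quoted above can be illustrated with a self-contained sketch. `GEPIndex`, `ToyGEP` and `isCandidateForSplitting` are hypothetical names that only mimic the `hasAllConstantIndices()` check; the real pass performs further checks that are not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one GEP index: either a compile-time constant or a
// runtime value (such as %sidx), in which case Value is ignored.
struct GEPIndex {
  bool IsConstant;
  std::int64_t Value;
};

// Hypothetical model of a GEP as a plain list of indices.
struct ToyGEP {
  std::vector<GEPIndex> Indices;

  // True when every index is a constant, mirroring the meaning of
  // GEP->hasAllConstantIndices() in the quoted snippet.
  bool hasAllConstantIndices() const {
    return std::all_of(Indices.begin(), Indices.end(),
                       [](const GEPIndex &I) { return I.IsConstant; });
  }
};

// Mirrors only the quoted bail-out: a GEP whose indices are all constant is
// skipped, on the assumption that the backend folds it on its own.
bool isCandidateForSplitting(const ToyGEP &GEP) {
  return !GEP.hasAllConstantIndices();
}
```

In this model, the `@s, i64 0, i32 2` GEP from the issue has indices `{0, 2}`, all constant, so it is skipped, while a GEP such as `%ptr, i64 %sidx, i64 1` has one runtime index and remains a candidate.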

It sounds like this would be a good first step, and we can look into the other geps in the issue in a separate patch if needed. Can you:

Change the EnableGEPOpt from false to true
Add a test case from the bug, run through llc to show the codegen improvements.
Update the patch with full context

(Patch no longer modifies the middle-end pipeline in unacceptable ways)

Harbormaster completed remote builds in B175474: Diff 444754.Jul 14 2022, 3:22 PM

a. Updated patch with context.
b. Updated title and summary.
c. Moved SeparateConstOffsetFromGEPPass() before LSR()
d. Added a unit test.

Harbormaster completed remote builds in B175603: Diff 444933.Jul 15 2022, 4:41 AM

This patch does not enable splitting of GEP with all constant indices - SeparateConstOffsetFromGEP.cpp:969

// The backend can already nicely handle the case where all indices are
// constant.
if (GEP->hasAllConstantIndices())
  return false;

Thanks - Please reread this and see what "with full context" means: https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface. The patch should be created with -U9999999.

I was thinking of the original longer test from the issue, but it is a bit long. This one might be OK too.

llvm/test/CodeGen/AArch64/cond-br-tuning.ll
33	Change the store here to store 10 not 0. That should keep this testing what it did previously.
llvm/test/Transforms/SeparateConstOffsetFromGEP/split-gep.ll
2 ↗	(On Diff #444933)	Remove -aarch64-enable-gep-opt - it is on by default now.
30 ↗	(On Diff #444933)	Remove extra line.

Updated patch using -U9999999 and fixed review comments.

llvm/test/CodeGen/AArch64/cond-br-tuning.ll
33	Done.

In D128582#3657201, @dmgreen wrote:

Thanks - Please reread this and see what "with full context" means: https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface. The patch should be created with -U9999999.

Understood. Thanks. I have updated patch using -U9999999. Is it fine now?

I was thinking of the original longer test from the issue, but it is a bit long. This one might be OK too.

gsocshubham edited the summary of this revision. (Show Details)Jul 20 2022, 1:16 AM

Thanks. This looks good, but did the test go missing from the previous revision?

llvm/test/CodeGen/AArch64/O3-pipeline.ll
12	This should be removed, as the file doesn't use the script.

Harbormaster completed remote builds in B176444: Diff 446067.Jul 20 2022, 2:04 AM

Fixed review comments.

llvm/test/Transforms/SeparateConstOffsetFromGEP/split-gep.ll
2 ↗	(On Diff #444933)	Done.
30 ↗	(On Diff #444933)	Done.

In D128582#3665002, @dmgreen wrote:

Thanks. This looks good, but did the test go missing from the previous revision?

Yes. It got removed. I have added it.

Harbormaster completed remote builds in B176471: Diff 446108.Jul 20 2022, 5:10 AM

On my latest revision, the build failed. Is it due to pre-merge test failures?

Build Status
    Buildable 176471	
    Build 265103: pre-merge checks	x64 windows failed · x64 debian failed

Which script is used for building? Can someone please point me to it? Locally, the build passes. I want to replicate the current build locally with the same config options.

Thanks. The buildbot failure seems unrelated.

This LGTM if you can alter the test slightly and things are passing locally.

llvm/test/CodeGen/AArch64/cond-br-tuning.ll
36	%d = icmp ne i32 %c, 10
39	store i32 10, i32* %ptr, align 4

This revision is now accepted and ready to land.Jul 20 2022, 11:31 PM

gsocshubham added inline comments.Jul 21 2022, 5:19 AM

llvm/test/CodeGen/AArch64/cond-br-tuning.ll
36	Do you mean to change it to `%d = icmp ne i32 %c, 0`? It is already `%d = icmp ne i32 %c, 10`

dmgreen added inline comments.Jul 21 2022, 5:27 AM

llvm/test/CodeGen/AArch64/cond-br-tuning.ll
36	Yeah sorry, that is what I meant. The store should be storing a value that isn't 0, so the csel isn't optimized away. The cmp should still not be present (it does get optimized, as it can re-use the adds flags).

Fixed review comments.

There is only one LIT test which is failing now - CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll.

gsocshubham added inline comments.Jul 21 2022, 8:28 AM

llvm/test/CodeGen/AArch64/cond-br-tuning.ll
36	Done.
39	Done.

In D128582#3668948, @gsocshubham wrote:

Fixed review comments.

There is only one LIT test which is failing now - CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll.

Regarding CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll, it is failing due to a change in GEP representation. The test has both -O0 and -O3 checks, and only the -O3 checks are failing. I tried to update the test case using ../../../../utils/update_mir_test_checks.py --llc-binary=../../../../../install/bin/llc arm64-irtranslator.ll, which updated the checks, but now the -O0 checks are failing. The test case contains 6k lines and is difficult to update manually.

I used this option to update only the O3 checks, but that does not help:
../../../../utils/update_mir_test_checks.py --llc-binary=../../../../../install/bin/llc arm64-irtranslator.ll --filter O3

How do I update only the O3 checks for the test case CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll? I have updated CodeGen/AArch64/GlobalISel/arm64-irtranslator-gep.ll using utils/update_mir_test_checks.py successfully.

Harbormaster completed remote builds in B176777: Diff 446511.Jul 21 2022, 9:03 AM

In D128582#3668948, @gsocshubham wrote:

There is only one LIT test which is failing now - CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll.

It looks like it is removing an unused alloca. Try this:

index ef559652380e..3fb33ecbb2c7 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
@@ -1458,10 +1458,12 @@ define void @test_lifetime_intrin() {
 ; O3-LABEL: name: test_lifetime_intrin
 ; O3: {{%[0-9]+}}:_(p0) = G_FRAME_INDEX %stack.0.slot
 ; O3-NEXT: LIFETIME_START %stack.0.slot
+; O3-NEXT: G_STORE
 ; O3-NEXT: LIFETIME_END %stack.0.slot
 ; O3-NEXT: RET_ReallyLR
   %slot = alloca i8, i32 4
   call void @llvm.lifetime.start.p0i8(i64 0, i8* %slot)
+  store volatile i8 10, i8* %slot
   call void @llvm.lifetime.end.p0i8(i64 0, i8* %slot)
   ret void
 }

llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-gep.ll
55	Make this %v1, %v2, I think is better. One of the loads is otherwise unused.
llvm/test/Transforms/SeparateConstOffsetFromGEP/split-gep.ll
1 ↗	(On Diff #446511)	Oh - if this test is using the AArch64 target then it needs to be moved into an AArch64 subdirectory, which is only run when AArch64 is a registered target. If it doesn't exist already, make sure there is a lit.local.cfg.

Fix review comments and lit test failures - AArch64/GlobalISel/arm64-irtranslator.ll and AArch64/GlobalISel/arm64-irtranslator-gep.ll

llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-gep.ll
55	Done.

In D128582#3669192, @dmgreen wrote:

It looks like it is removing an unused alloca. Try this:

index ef559652380e..3fb33ecbb2c7 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
@@ -1458,10 +1458,12 @@ define void @test_lifetime_intrin() {
 ; O3-LABEL: name: test_lifetime_intrin
 ; O3: {{%[0-9]+}}:_(p0) = G_FRAME_INDEX %stack.0.slot
 ; O3-NEXT: LIFETIME_START %stack.0.slot
+; O3-NEXT: G_STORE
 ; O3-NEXT: LIFETIME_END %stack.0.slot
 ; O3-NEXT: RET_ReallyLR
   %slot = alloca i8, i32 4
   call void @llvm.lifetime.start.p0i8(i64 0, i8* %slot)
+  store volatile i8 10, i8* %slot
   call void @llvm.lifetime.end.p0i8(i64 0, i8* %slot)
   ret void
 }

Thanks a lot for this.

Are there any other suggestions on this patch?

Harbormaster completed remote builds in B176832: Diff 446593.Jul 21 2022, 12:50 PM

Yeah - the patch LGTM. Thanks.

Do you have commit access? If not, I can commit it for you, I just need a "name <name@email.com>" to attribute it to.

In D128582#3671454, @dmgreen wrote:

Yeah - the patch LGTM. Thanks.

Do you have commit access? If not, I can commit it for you, I just need a "name <name@email.com>" to attribute it to.

I do not have commit access. Please commit for me.

Here are the details -

Name - Shubham Narlawar
Email - shubham.narlawar@rrlogic.co.in

"Shubham Narlawar <shubham.narlawar@rrlogic.co.in>"

This revision was landed with ongoing or failed builds.Jul 22 2022, 7:21 AM

Closed by commit rGf55dbfbd9d8c: [AArch64] Move SeparateConstOffsetFromGEPPass before LSR and enable… (authored by gsocshubham, committed by dmgreen). · Explain Why

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rGf55dbfbd9d8c: [AArch64] Move SeparateConstOffsetFromGEPPass before LSR and enable….

dmgreen mentioned this in rG201b7858f695: [AArch64] Disable aarch64-enable-gep-opt.Nov 19 2022, 1:25 PM

Revision Contents

Path                                                                     Size
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp                         24 lines
llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-gep.ll           72 lines
llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll               2 lines
llvm/test/CodeGen/AArch64/O3-pipeline.ll                                 13 lines
llvm/test/CodeGen/AArch64/cond-br-tuning.ll                              7 lines
llvm/test/Transforms/SeparateConstOffsetFromGEP/AArch64/lit.local.cfg    2 lines
llvm/test/Transforms/SeparateConstOffsetFromGEP/AArch64/split-gep.ll     32 lines

Diff 446828

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp

@@ ... @@
 static cl::opt<bool>
 EnableCondOpt("aarch64-enable-condopt",
               cl::desc("Enable the condition optimizer pass"),
               cl::init(true), cl::Hidden);
 
 static cl::opt<bool>
     EnableGEPOpt("aarch64-enable-gep-opt", cl::Hidden,
                  cl::desc("Enable optimizations on complex GEPs"),
-                 cl::init(false));
+                 cl::init(true));
 
 static cl::opt<bool>
     BranchRelaxation("aarch64-enable-branch-relax", cl::Hidden, cl::init(true),
                      cl::desc("Relax out of range conditional branches"));
 
 static cl::opt<bool> EnableCompressJumpTables(
     "aarch64-enable-compress-jump-tables", cl::Hidden, cl::init(true),
     cl::desc("Use smallest entry possible for jump tables"));
@@ ... @@ void AArch64PassConfig::addIRPasses() {
   // pointer values N iterations ahead.
   if (TM->getOptLevel() != CodeGenOpt::None) {
     if (EnableLoopDataPrefetch)
       addPass(createLoopDataPrefetchPass());
     if (EnableFalkorHWPFFix)
       addPass(createFalkorMarkStridedAccessesPass());
   }
 
-  TargetPassConfig::addIRPasses();
-
-  addPass(createAArch64StackTaggingPass(
-      /*IsOptNone=*/TM->getOptLevel() == CodeGenOpt::None));
-
-  // Match interleaved memory accesses to ldN/stN intrinsics.
-  if (TM->getOptLevel() != CodeGenOpt::None) {
-    addPass(createInterleavedLoadCombinePass());
-    addPass(createInterleavedAccessPass());
-  }
-
   if (TM->getOptLevel() == CodeGenOpt::Aggressive && EnableGEPOpt) {
     // Call SeparateConstOffsetFromGEP pass to extract constants within indices
     // and lower a GEP with multiple indices to either arithmetic operations or
     // multiple GEPs with single index.
     addPass(createSeparateConstOffsetFromGEPPass(true));
     // Call EarlyCSE pass to find and remove subexpressions in the lowered
     // result.
     addPass(createEarlyCSEPass());
     // Do loop invariant code motion in case part of the lowered result is
     // invariant.
     addPass(createLICMPass());
   }
 
+  TargetPassConfig::addIRPasses();
+
+  addPass(createAArch64StackTaggingPass(
+      /*IsOptNone=*/TM->getOptLevel() == CodeGenOpt::None));
+
+  // Match interleaved memory accesses to ldN/stN intrinsics.
+  if (TM->getOptLevel() != CodeGenOpt::None) {
+    addPass(createInterleavedLoadCombinePass());
+    addPass(createInterleavedAccessPass());
+  }
+
   // Add Control Flow Guard checks.
   if (TM->getTargetTriple().isOSWindows())
     addPass(createCFGuardCheckPass());
 
   if (TM->Options.JMCInstrument)
     addPass(createJMCInstrumenterPass());
 }

llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-gep.ll

	; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	; RUN: llc -O0 -stop-after=irtranslator -global-isel -verify-machineinstrs %s -o - 2>&1 \| FileCheck %s --check-prefix=O0			; RUN: llc -O0 -stop-after=irtranslator -global-isel -verify-machineinstrs %s -o - 2>&1 \| FileCheck %s --check-prefix=O0
	; RUN: llc -O3 -stop-after=irtranslator -global-isel -verify-machineinstrs %s -o - 2>&1 \| FileCheck %s --check-prefix=O3			; RUN: llc -O3 -stop-after=irtranslator -global-isel -verify-machineinstrs %s -o - 2>&1 \| FileCheck %s --check-prefix=O3
	target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--"			target triple = "aarch64--"

	define i32 @cse_gep([4 x i32]* %ptr, i32 %idx) {			define i32 @cse_gep([4 x i32]* %ptr, i32 %idx) {
	; O0-LABEL: name: cse_gep			; O0-LABEL: name: cse_gep
	; O0: bb.1 (%ir-block.0):			; O0: bb.1 (%ir-block.0):
	; O0: liveins: $w1, $x0			; O0-NEXT: liveins: $w1, $x0
	; O0: [[COPY:%[0-9]+]]:_(p0) = COPY $x0			; O0-NEXT: {{ $}}
	; O0: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1			; O0-NEXT: [[COPY:%[0-9]+]]:_(p0) = COPY $x0
	; O0: [[SEXT:%[0-9]+]]:_(s64) = G_SEXT [[COPY1]](s32)			; O0-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
	; O0: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 16			; O0-NEXT: [[SEXT:%[0-9]+]]:_(s64) = G_SEXT [[COPY1]](s32)
	; O0: [[MUL:%[0-9]+]]:_(s64) = G_MUL [[SEXT]], [[C]]			; O0-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 16
	; O0: [[PTR_ADD:%[0-9]+]]:_(p0) = G_PTR_ADD [[COPY]], [[MUL]](s64)			; O0-NEXT: [[MUL:%[0-9]+]]:_(s64) = G_MUL [[SEXT]], [[C]]
	; O0: [[COPY2:%[0-9]+]]:_(p0) = COPY [[PTR_ADD]](p0)			; O0-NEXT: [[PTR_ADD:%[0-9]+]]:_(p0) = G_PTR_ADD [[COPY]], [[MUL]](s64)
	; O0: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[COPY2]](p0) :: (load (s32) from %ir.gep1)			; O0-NEXT: [[COPY2:%[0-9]+]]:_(p0) = COPY [[PTR_ADD]](p0)
	; O0: [[MUL1:%[0-9]+]]:_(s64) = G_MUL [[SEXT]], [[C]]			; O0-NEXT: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[COPY2]](p0) :: (load (s32) from %ir.gep1)
	; O0: [[PTR_ADD1:%[0-9]+]]:_(p0) = G_PTR_ADD [[COPY]], [[MUL1]](s64)			; O0-NEXT: [[MUL1:%[0-9]+]]:_(s64) = G_MUL [[SEXT]], [[C]]
	; O0: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 4			; O0-NEXT: [[PTR_ADD1:%[0-9]+]]:_(p0) = G_PTR_ADD [[COPY]], [[MUL1]](s64)
	; O0: [[PTR_ADD2:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD1]], [[C1]](s64)			; O0-NEXT: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 4
	; O0: [[LOAD1:%[0-9]+]]:_(s32) = G_LOAD [[PTR_ADD2]](p0) :: (load (s32) from %ir.gep2)			; O0-NEXT: [[PTR_ADD2:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD1]], [[C1]](s64)
	; O0: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[LOAD1]], [[LOAD1]]			; O0-NEXT: [[LOAD1:%[0-9]+]]:_(s32) = G_LOAD [[PTR_ADD2]](p0) :: (load (s32) from %ir.gep2)
	; O0: $w0 = COPY [[ADD]](s32)			; O0-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[LOAD]], [[LOAD1]]
	; O0: RET_ReallyLR implicit $w0			; O0-NEXT: $w0 = COPY [[ADD]](s32)
				; O0-NEXT: RET_ReallyLR implicit $w0
	; O3-LABEL: name: cse_gep			; O3-LABEL: name: cse_gep
	; O3: bb.1 (%ir-block.0):			; O3: bb.1 (%ir-block.0):
	; O3: liveins: $w1, $x0			; O3-NEXT: liveins: $w1, $x0
	; O3: [[COPY:%[0-9]+]]:_(p0) = COPY $x0			; O3-NEXT: {{ $}}
	; O3: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1			; O3-NEXT: [[COPY:%[0-9]+]]:_(p0) = COPY $x0
	; O3: [[SEXT:%[0-9]+]]:_(s64) = G_SEXT [[COPY1]](s32)			; O3-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $w1
	; O3: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 16			; O3-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 4
	; O3: [[MUL:%[0-9]+]]:_(s64) = G_MUL [[SEXT]], [[C]]			; O3-NEXT: [[SEXT:%[0-9]+]]:_(s64) = G_SEXT [[COPY1]](s32)
	; O3: [[PTR_ADD:%[0-9]+]]:_(p0) = G_PTR_ADD [[COPY]], [[MUL]](s64)			; O3-NEXT: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 16
	; O3: [[COPY2:%[0-9]+]]:_(p0) = COPY [[PTR_ADD]](p0)			; O3-NEXT: [[MUL:%[0-9]+]]:_(s64) = G_MUL [[SEXT]], [[C1]]
	; O3: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[COPY2]](p0) :: (load (s32) from %ir.gep1)			; O3-NEXT: [[PTR_ADD:%[0-9]+]]:_(p0) = G_PTR_ADD [[COPY]], [[MUL]](s64)
	; O3: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 4			; O3-NEXT: [[COPY2:%[0-9]+]]:_(p0) = COPY [[PTR_ADD]](p0)
	; O3: [[PTR_ADD1:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD]], [[C1]](s64)			; O3-NEXT: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[COPY2]](p0) :: (load (s32) from %ir.gep1)
	; O3: [[LOAD1:%[0-9]+]]:_(s32) = G_LOAD [[PTR_ADD1]](p0) :: (load (s32) from %ir.gep2)			; O3-NEXT: [[SHL:%[0-9]+]]:_(s64) = G_SHL [[SEXT]], [[C]](s64)
	; O3: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[LOAD1]], [[LOAD1]]			; O3-NEXT: [[PTR_ADD1:%[0-9]+]]:_(p0) = G_PTR_ADD [[COPY]], [[SHL]](s64)
	; O3: $w0 = COPY [[ADD]](s32)			; O3-NEXT: [[COPY3:%[0-9]+]]:_(p0) = COPY [[PTR_ADD1]](p0)
	; O3: RET_ReallyLR implicit $w0			; O3-NEXT: [[C2:%[0-9]+]]:_(s64) = G_CONSTANT i64 4
				; O3-NEXT: [[PTR_ADD2:%[0-9]+]]:_(p0) = G_PTR_ADD [[COPY3]], [[C2]](s64)
				; O3-NEXT: [[LOAD1:%[0-9]+]]:_(s32) = G_LOAD [[PTR_ADD2]](p0) :: (load (s32) from %ir.3)
				; O3-NEXT: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[LOAD]], [[LOAD1]]
				; O3-NEXT: $w0 = COPY [[ADD]](s32)
				; O3-NEXT: RET_ReallyLR implicit $w0
	%sidx = sext i32 %idx to i64			%sidx = sext i32 %idx to i64
	%gep1 = getelementptr inbounds [4 x i32], [4 x i32]* %ptr, i64 %sidx, i64 0			%gep1 = getelementptr inbounds [4 x i32], [4 x i32]* %ptr, i64 %sidx, i64 0
	%v1 = load i32, i32* %gep1			%v1 = load i32, i32* %gep1
	%gep2 = getelementptr inbounds [4 x i32], [4 x i32]* %ptr, i64 %sidx, i64 1			%gep2 = getelementptr inbounds [4 x i32], [4 x i32]* %ptr, i64 %sidx, i64 1
	%v2 = load i32, i32* %gep2			%v2 = load i32, i32* %gep2
	%res = add i32 %v2, %v2			%res = add i32 %v1, %v2
				dmgreen: Make this %v1, %v2, I think is better. One of the loads is otherwise unused.
				gsocshubham (author): Done.
	ret i32 %res			ret i32 %res
	}			}

llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll

	declare void @llvm.lifetime.start.p0i8(i64, i8*)			declare void @llvm.lifetime.start.p0i8(i64, i8*)
	declare void @llvm.lifetime.end.p0i8(i64, i8*)			declare void @llvm.lifetime.end.p0i8(i64, i8*)
	define void @test_lifetime_intrin() {			define void @test_lifetime_intrin() {
	; CHECK-LABEL: name: test_lifetime_intrin			; CHECK-LABEL: name: test_lifetime_intrin
	; CHECK: RET_ReallyLR			; CHECK: RET_ReallyLR
	; O3-LABEL: name: test_lifetime_intrin			; O3-LABEL: name: test_lifetime_intrin
	; O3: {{%[0-9]+}}:_(p0) = G_FRAME_INDEX %stack.0.slot			; O3: {{%[0-9]+}}:_(p0) = G_FRAME_INDEX %stack.0.slot
	; O3-NEXT: LIFETIME_START %stack.0.slot			; O3-NEXT: LIFETIME_START %stack.0.slot
				; O3-NEXT: G_STORE
	; O3-NEXT: LIFETIME_END %stack.0.slot			; O3-NEXT: LIFETIME_END %stack.0.slot
	; O3-NEXT: RET_ReallyLR			; O3-NEXT: RET_ReallyLR
	%slot = alloca i8, i32 4			%slot = alloca i8, i32 4
	call void @llvm.lifetime.start.p0i8(i64 0, i8* %slot)			call void @llvm.lifetime.start.p0i8(i64 0, i8* %slot)
				store volatile i8 10, i8* %slot
	call void @llvm.lifetime.end.p0i8(i64 0, i8* %slot)			call void @llvm.lifetime.end.p0i8(i64 0, i8* %slot)
	ret void			ret void
	}			}

	define void @test_load_store_atomics(i8* %addr) {			define void @test_load_store_atomics(i8* %addr) {
	; CHECK-LABEL: name: test_load_store_atomics			; CHECK-LABEL: name: test_load_store_atomics
	; CHECK: [[ADDR:%[0-9]+]]:_(p0) = COPY $x0			; CHECK: [[ADDR:%[0-9]+]]:_(p0) = COPY $x0
	; CHECK: [[V0:%[0-9]+]]:_(s8) = G_LOAD [[ADDR]](p0) :: (load unordered (s8) from %ir.addr)			; CHECK: [[V0:%[0-9]+]]:_(s8) = G_LOAD [[ADDR]](p0) :: (load unordered (s8) from %ir.addr)

llvm/test/CodeGen/AArch64/O3-pipeline.ll

	; RUN: llc --debugify-and-strip-all-safe=0 -mtriple=arm64-- -O3 -debug-pass=Structure < %s -o /dev/null 2>&1 | \			; RUN: llc --debugify-and-strip-all-safe=0 -mtriple=arm64-- -O3 -debug-pass=Structure < %s -o /dev/null 2>&1 | \
	; RUN: grep -v "Verify generated machine code" | FileCheck %s			; RUN: grep -v "Verify generated machine code" | FileCheck %s

	; REQUIRES: asserts			; REQUIRES: asserts

	; CHECK-LABEL: Pass Arguments:			; CHECK-LABEL: Pass Arguments:
	; CHECK-NEXT: Target Library Information			; CHECK-NEXT: Target Library Information
	; CHECK-NEXT: Target Pass Configuration			; CHECK-NEXT: Target Pass Configuration
	; CHECK-NEXT: Machine Module Information			; CHECK-NEXT: Machine Module Information
	; CHECK-NEXT: Target Transform Information			; CHECK-NEXT: Target Transform Information
	; CHECK-NEXT: Assumption Cache Tracker			; CHECK-NEXT: Assumption Cache Tracker
	; CHECK-NEXT: Profile summary info			; CHECK-NEXT: Profile summary info
				dmgreen: This should be removed, as the file doesn't use the script.
	; CHECK-NEXT: Type-Based Alias Analysis			; CHECK-NEXT: Type-Based Alias Analysis
	; CHECK-NEXT: Scoped NoAlias Alias Analysis			; CHECK-NEXT: Scoped NoAlias Alias Analysis
	; CHECK-NEXT: Create Garbage Collector Module Metadata			; CHECK-NEXT: Create Garbage Collector Module Metadata
	; CHECK-NEXT: Machine Branch Probability Analysis			; CHECK-NEXT: Machine Branch Probability Analysis
	; CHECK-NEXT: Default Regalloc Eviction Advisor			; CHECK-NEXT: Default Regalloc Eviction Advisor
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: SVE intrinsics optimizations			; CHECK-NEXT: SVE intrinsics optimizations
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Data Prefetch			; CHECK-NEXT: Loop Data Prefetch
	; CHECK-NEXT: Falkor HW Prefetch Fix			; CHECK-NEXT: Falkor HW Prefetch Fix
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Split GEPs to a variadic base and a constant offset for better CSE
				; CHECK-NEXT: Early CSE
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
				; CHECK-NEXT: Function Alias Analysis Results
				; CHECK-NEXT: Memory SSA
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
				; CHECK-NEXT: LCSSA Verifier
				; CHECK-NEXT: Loop-Closed SSA Form Pass
				; CHECK-NEXT: Scalar Evolution Analysis
				; CHECK-NEXT: Lazy Branch Probability Analysis
				; CHECK-NEXT: Lazy Block Frequency Analysis
				; CHECK-NEXT: Loop Pass Manager
				; CHECK-NEXT: Loop Invariant Code Motion
				; CHECK-NEXT: Module Verifier
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Canonicalize Freeze Instructions in Loops			; CHECK-NEXT: Canonicalize Freeze Instructions in Loops
	; CHECK-NEXT: Induction Variable Users			; CHECK-NEXT: Induction Variable Users
	; CHECK-NEXT: Loop Strength Reduction			; CHECK-NEXT: Loop Strength Reduction
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Merge contiguous icmps into a memcmp			; CHECK-NEXT: Merge contiguous icmps into a memcmp
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information

llvm/test/CodeGen/AArch64/cond-br-tuning.ll

	L2:			L2:
	store i32 1, i32* %ptr, align 4			store i32 1, i32* %ptr, align 4
	ret void			ret void
	}			}

	define void @test_add_cbz_multiple_use(i32 %a, i32 %b, i32* %ptr) {			define void @test_add_cbz_multiple_use(i32 %a, i32 %b, i32* %ptr) {
	; CHECK-LABEL: test_add_cbz_multiple_use:			; CHECK-LABEL: test_add_cbz_multiple_use:
	; CHECK: // %bb.0: // %common.ret			; CHECK: // %bb.0: // %common.ret
	; CHECK-NEXT: adds w8, w0, w1			; CHECK-NEXT: mov w8, #10
	; CHECK-NEXT: csel w8, wzr, w8, ne			; CHECK-NEXT: adds w9, w0, w1
				; CHECK-NEXT: csel w8, w8, w9, ne
	; CHECK-NEXT: str w8, [x2]			; CHECK-NEXT: str w8, [x2]
				dmgreen: Change the store here to store 10 not 0. That should keep this testing what it did previously.
				gsocshubham (author): Done.
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%c = add nsw i32 %a, %b			%c = add nsw i32 %a, %b
	%d = icmp ne i32 %c, 0			%d = icmp ne i32 %c, 0
				dmgreen: %d = icmp ne i32 %c, 10
				gsocshubham (author): Do you mean it to change it to `%d = icmp ne i32 %c, 0`? It is already `%d = icmp ne i32 %c, 10`
				dmgreen: Yeah sorry, that is what I meant. The store should be storing a value that isn't 0, so the csel isn't optimized away. The cmp should still not be present (it does get optimized, as it can re-use the adds flags).
				gsocshubham (author): Done.
	br i1 %d, label %L1, label %L2			br i1 %d, label %L1, label %L2
	L1:			L1:
	store i32 0, i32* %ptr, align 4			store i32 10, i32* %ptr, align 4
				dmgreen: store i32 10, i32* %ptr, align 4
				gsocshubham (author): Done.
	ret void			ret void
	L2:			L2:
	store i32 %c, i32* %ptr, align 4			store i32 %c, i32* %ptr, align 4
	ret void			ret void
	}			}

	define void @test_add_cbz_64(i64 %a, i64 %b, i64* %ptr) {			define void @test_add_cbz_64(i64 %a, i64 %b, i64* %ptr) {
	; CHECK-LABEL: test_add_cbz_64:			; CHECK-LABEL: test_add_cbz_64:

llvm/test/Transforms/SeparateConstOffsetFromGEP/AArch64/lit.local.cfg

This file was added.

				if not 'AArch64' in config.root.targets:
				    config.unsupported = True

llvm/test/Transforms/SeparateConstOffsetFromGEP/AArch64/split-gep.ll

This file was added.

				; RUN: llc < %s -O3 -mtriple=aarch64-linux-gnu | FileCheck %s

				%struct = type { i32, i32, i32 }

				define i32 @test1(%struct* %ptr, i64 %idx) {
				; CHECK-LABEL: test1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #12
				; CHECK-NEXT: madd x8, x1, x8, x0
				; CHECK-NEXT: ldr w9, [x8, #4]
				; CHECK-NEXT: tbnz w9, #31, .LBB0_2
				; CHECK-NEXT: // %bb.1:
				; CHECK-NEXT: mov w0, wzr
				; CHECK-NEXT: ret
				; CHECK-NEXT: .LBB0_2: // %then
				; CHECK-NEXT: ldr w8, [x8, #8]
				; CHECK-NEXT: add w0, w9, w8
				; CHECK-NEXT: ret
				%gep.1 = getelementptr %struct, %struct* %ptr, i64 %idx, i32 1
				%lv.1 = load i32, i32* %gep.1
				%c = icmp slt i32 %lv.1, 0
				br i1 %c, label %then, label %else

				then:
				%gep.2 = getelementptr %struct, %struct* %ptr, i64 %idx, i32 2
				%lv.2 = load i32, i32* %gep.2
				%res = add i32 %lv.1, %lv.2
				ret i32 %res

				else:
				ret i32 0
				}
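
The address arithmetic that the test1 check lines rely on can be sketched in Python. This is only an illustrative model, not the pass's implementation: the helper names `split_gep` and `gep_address` are hypothetical, and the 12-byte struct size and 4-byte field size are assumed from `%struct = type { i32, i32, i32 }`. It shows that both GEPs share the same variable part (`idx * 12`, matching the single `madd`) and differ only in the constant offset (`#4` and `#8` in the two `ldr` instructions).

```python
# Illustrative model only: byte-address arithmetic behind a GEP into
# %struct = type { i32, i32, i32 }. Names and sizes here are assumptions
# made for this sketch, not part of the LLVM pass.
STRUCT_SIZE = 12  # three i32 fields, 4 bytes each
FIELD_SIZE = 4


def split_gep(idx, field):
    """Return (variable_part, constant_offset) in bytes, relative to ptr."""
    return idx * STRUCT_SIZE, field * FIELD_SIZE


def gep_address(ptr, idx, field):
    """Full address of field `field` in element `idx` of the array at ptr."""
    variable_part, constant_offset = split_gep(idx, field)
    return ptr + variable_part + constant_offset


# %gep.1 and %gep.2 use the same %idx, so the variable part is identical
# and only the constant offsets differ.
base1, off1 = split_gep(5, 1)
base2, off2 = split_gep(5, 2)
assert base1 == base2
assert (off1, off2) == (4, 8)
```

Once the constant offset is separated out, the shared `ptr + idx * 12` computation is the part that Early CSE can reuse in both blocks.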

Revision Contents

Diff 446828

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp

llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator-gep.ll

llvm/test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll

llvm/test/CodeGen/AArch64/O3-pipeline.ll

llvm/test/CodeGen/AArch64/cond-br-tuning.ll

llvm/test/Transforms/SeparateConstOffsetFromGEP/AArch64/lit.local.cfg

llvm/test/Transforms/SeparateConstOffsetFromGEP/AArch64/split-gep.ll
