This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.
ClosedPublic

Authored by • HaoLiu on Oct 19 2014, 11:39 PM.

Download Raw Diff

Details

Reviewers

ssijaric
t.p.northover
jingyue
mcrosier

Summary

Hi Tim, Jingyue and other reviewers,

This patch is based on http://reviews.llvm.org/D5863, which fixes some problems in the SeparateConstOffsetFromGEP pass. So please apply that patch first if you want to have a try.

We find LLVM cannot handle CSE well in GEPs (getelementptrs). The GEP can be very complex, it can has many mixed constant indices and variable indices. The variable indices may also contain other constants. And such complex GEPs are usually kept in CodeGen. But as CodeGen can only see in one basic block. For the GEPs across basic blocks (e.g, two GEPs used in two different basic blocks or one GEP used in two basic blocks), it may have CSE opportunities and will be missed.

Currently there is a pass called SeparateConstOffsetFromGEP, which can separate constant within variable indices and split a complex GEP into two GEPs. But this is not enough for the problem that GEPs across basic blocks I mentioned above.

So I improve this pass. It will separate constant within indices for both sequential and struct types. And most important is that it will also transform complex GEPs into a "ptrtoint + arithmetic + inttoptr" form, so that it is able to find CSE opportunities across basic blocks. Also it can benefit address sinking logic in CodeGenPrepare for those complex GEPs which can not be matched by addressing mode, as part of the address calculation can still be sunk. The address sinking logic can result in better addressing mode. EarlyCSE pass is called after this pass to do CSE. LICM pass is also called to do loop invariant code motion in case any of the address calculations are invariant.

If we don't find CSE opportunities after such arithmetic transformation, it still has no obvious regression on performance, as it will always do such transformation in CodeGen. We just move such transformation several passes ahead of CodeGen.

I tested the performance for A57 of AArch64 target on SPEC CPU2000 and SPEC CPU2006. It has no obvious regressions and has improvements on following benchmarks:
spec2006.473.astar 4.7%
spec2006.444.namd 3.0%
spec2006.445.gobmk 2.5%

For the benchmarks don't have obvious improvement, we can also see the address calculation and addressing mode are better from the assembly code.

For other targets like NVPTX, I can not test this patch. I think this patch can also benefit the performance, at least it has no regression.

Review please.

Thanks,
-Hao

Diff Detail

Event Timeline

• HaoLiu updated this revision to Diff 15133.Oct 19 2014, 11:39 PM

• HaoLiu retitled this revision from to [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend..

• HaoLiu updated this object.

• HaoLiu edited the test plan for this revision. (Show Details)

• HaoLiu added reviewers: t.p.northover, jingyue.

• HaoLiu added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. · View Herald TranscriptOct 19 2014, 11:39 PM

And most important is that it will also transform complex GEPs into a "ptrtoint + arithmetic + inttoptr" form, so that it is able to find CSE opportunities across basic blocks.

You'll need to disable this transformation when AA is enabled in CodeGen. You can do this just like CGP does (essentially by calling TM->getSubtarget<TargetSubtargetInfo>().useAA()). CGP splits the GEPs into simpler GEPs when AA is in use, and uses inttoptr otherwise. This all relates to the way that we use BasicAA to override TBAA and catch cases of basic type punning -- but BasicAA does not analyze inttoptr, and so can no longer fulfill this need after new inttoptrs are introduced. You could try using simpler GEPs like CGP does when AA is in use, or just disable this part of the transformation for now. Either is fine with me.

• HaoLiu updated this revision to Diff 15234.Oct 21 2014, 11:45 PM

• HaoLiu updated this revision to Diff 15235.Oct 21 2014, 11:57 PM

In D5864#5, @hfinkel wrote:

And most important is that it will also transform complex GEPs into a "ptrtoint + arithmetic + inttoptr" form, so that it is able to find CSE opportunities across basic blocks.

You'll need to disable this transformation when AA is enabled in CodeGen. You can do this just like CGP does (essentially by calling TM->getSubtarget<TargetSubtargetInfo>().useAA()). CGP splits the GEPs into simpler GEPs when AA is in use, and uses inttoptr otherwise. This all relates to the way that we use BasicAA to override TBAA and catch cases of basic type punning -- but BasicAA does not analyze inttoptr, and so can no longer fulfill this need after new inttoptrs are introduced. You could try using simpler GEPs like CGP does when AA is in use, or just disable this part of the transformation for now. Either is fine with me.

Hi Hal,

Thanks for the explanation on useAA(). We do transformation because using simpler GEPs to replace a complex GEP like CGP does has small impact. Such transformation is not in CGP, because we want to make use of the constant extraction in SeparateConstOffsetFromGEP pass, and we also want to call other passes like EarlyCSE to remove common sub-expressions after such transformation. For the performance concern, I prefer to enable such transformation in AArch64 backend.
But unfortunately, AArch64 backend enables AA for both cortex-a53 and cortex-a57:

bool useAA() const override { return isCortexA53() || isCortexA57(); }

This was introduced recently in http://reviews.llvm.org/D5103. I find most of the backend including X86,ARM doesn't enable AA. AArch64 backend also disables AA when CPU is cyclone. As Sanjin also says enabling AA for CortexA57 doesn't have performance changes in D5103, I think we can just remove CortexA57 from useAA(), so that we can get benefit from such transformation and has no regression.

For CortexA53, I don't test how much benefit from enabling AA in code gen. Maybe we can test following situations and find the best solution:

(1) disable AA, enable Transformation.
(2) enable AA, disable Transformation.
(3) enable AA and Transformation.

Anyway, such test need a lot of time. So I think we'd better only consider about A57, so I've attached a new patch (diff ID 15234) following your suggestion and remove A57 form useAA().

Review please.

Thanks,
-Hao

• HaoLiu added reviewers: ssijaric, mcrosier.Oct 22 2014, 12:09 AM

Hi James,

I think there are two main reasons this approach is outside of CGP:

(1) I want to make use of the constant extraction logic in SeparateConstOffsetFromGEP pass. It can find more constants

(2) CGP only sinks GEPs, it doesn’t do CSE. So I think it’s better to do such logic separately in SeparateConstOffsetFromGEP pass before CGP. This won’t affect the address sinking logic in CGP.

Thanks,

-Hao

From: mankeyrabbit@gmail.com [mailto:mankeyrabbit@gmail.com] On Behalf Of James Molloy
Sent: Wednesday, October 22, 2014 4:01 AM
To: reviews+D5864+public+2de5fd757ea4c578@reviews.llvm.org
Cc: Hao Liu; t.p.northover@gmail.com; jingyue@google.com; llvm-commits@cs.uiuc.edu
Subject: Re: [PATCH] [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.

Hi Hao,

It's great to finally see this go upstream!

When you started working on this, the approach was to modify the GEP lowering logic in CGP. Hal's comment has reminded me that your approach has moved away from that, so could you please explain why you're doing this outside of CGP, and why your solution does not duplicate the CGP lowering code?

How do they interact?

Cheers,

James

Original Message -----

From: "Hao Liu" <Hao.Liu@arm.com>
To: "Hao Liu" <Hao.Liu@arm.com>, "t p northover" <t.p.northover@gmail.com>, jingyue@google.com
Cc: hfinkel@anl.gov, "amara emerson" <amara.emerson@arm.com>, llvm-commits@cs.uiuc.edu
Sent: Wednesday, October 22, 2014 2:07:12 AM
Subject: Re: [PATCH] [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.

In D5864#5, @hfinkel wrote:

And most important is that it will also transform complex GEPs
into a "ptrtoint + arithmetic + inttoptr" form, so that it is
able to find CSE opportunities across basic blocks.

You'll need to disable this transformation when AA is enabled in
CodeGen. You can do this just like CGP does (essentially by
calling TM->getSubtarget<TargetSubtargetInfo>().useAA()). CGP
splits the GEPs into simpler GEPs when AA is in use, and uses
inttoptr otherwise. This all relates to the way that we use
BasicAA to override TBAA and catch cases of basic type punning --
but BasicAA does not analyze inttoptr, and so can no longer
fulfill this need after new inttoptrs are introduced. You could
try using simpler GEPs like CGP does when AA is in use, or just
disable this part of the transformation for now. Either is fine
with me.

Hi Hal,

Thanks for the explanation on useAA(). We do transformation because
using simpler GEPs to replace a complex GEP like CGP does has small
impact. Such transformation is not in CGP, because we want to make
use of the constant extraction in SeparateConstOffsetFromGEP pass,
and we also want to call other passes like EarlyCSE to remove common
sub-expressions after such transformation. For the performance
concern, I prefer to enable such transformation in AArch64 backend.
But unfortunately, AArch64 backend enables AA for both cortex-a53 and
cortex-a57:
bool useAA() const override { return isCortexA53() ||
isCortexA57(); }
This was introduced recently in http://reviews.llvm.org/D5103. I find
most of the backend including X86,ARM doesn't enable AA. AArch64
backend also disables AA when CPU is cyclone. As Sanjin also says
enabling AA for CortexA57 doesn't have performance changes in D5103,
I think we can just remove CortexA57 from useAA(), so that we can
get benefit from such transformation and has no regression.

For CortexA53, I don't test how much benefit from enabling AA in code
gen. Maybe we can test following situations and find the best
solution:
(1) disable AA, enable Transformation.
(2) enable AA, disable Transformation.
(3) enable AA and Transformation.
Anyway, such test need a lot of time. So I think we'd better only
consider about A57, so I've attached a new patch (diff ID 15234)
following your suggestion and remove A57 form useAA().

Hi Hao,

This is not quite what I had in mind... when I said that we should disable the transformation, I did not mean that you should disable the entire pass, but rather, that you should only disable the part of SeparateConstOffsetFromGEP that breaks apart GEPs into ptrtoint/inttoptr when useAA is true like CGP does. Does SeparateConstOffsetFromGEP not convey benefits without this part of the transformation? Can you try implementing the same thing by breaking the GEPs into simpler ones like CGP does when useAA is true?

As a general note, while I understand why CGP breaks GEPs into ptrtoint/inttoptr, I'd like to move the generic CodeGen passes away from doing that. If you look at CGP, implementing this in terms of GEPs is pretty straightforward. When testing on x86 using -addr-sink-using-gep (even with useAA false), performance generally improves over using the ptrtoint/inttoptr (although there are a couple of regressions I need to investigate when I have some time). As a quasi-separate issue, I'm curious to know what using -addr-sink-using-gep does on ARM/AArch64. Regardless, I'm excited to see this work because I'd like to make SeparateConstOffsetFromGEP available to more backends -- including PowerPC that uses AA more aggressively -- and so I don't want the pass itself to introduce ptrtoint/inttoptr (at least not when useAA is true).

Thanks again,
Hal

Review please.

Thanks,
-Hao

http://reviews.llvm.org/D5864

Sorry, a wrong patch is attached, it should check not useAA()... So add another patch to fix this problem.

This is not quite what I had in mind... when I said that we should disable the transformation, I did not mean that you should disable the entire pass, but rather, that you should only disable the part of SeparateConstOffsetFromGEP that breaks apart GEPs into ptrtoint/inttoptr when useAA is true like CGP does. Does SeparateConstOffsetFromGEP not convey benefits without this part of the transformation? Can you try implementing the same thing by breaking the GEPs into simpler ones like CGP does when useAA is true?

Hi Hal,

I think I understand your point now. I've attached a new patch. It will do arithmetic transformation only when the useAA() returns false, or it will transform a complex GEP into several simpler GEPs like CGP does.
The correctness has been tested. The performance test has not finished yet. But the assembly code looks similar to the old patch for cortex-a57. If the result is better than the old patch, I'll remove the logic about transformation to ptrtoint/inttoptr.

Review please.

Thanks,
-Hao

This generally looks good to me. It would be good to have someone expert in this pass to also look it over.

the performance test has not finished yet. But the assembly code looks similar to the old patch for cortex-a57. If the result is better than the old patch, I'll remove the logic about transformation to ptrtoint/inttoptr.

I am curious how this turns out; please let us know.

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
86	Even we can extract constant from GEPs -> Even if we can extract constants from GEPs
88	algorithm
984	I think that if the original GEP has the "inbounds" property, then the shl, mul here (and the same below plus the add in the ptrtoint/inttoptr code) can have the nsw/nuw flags.

• HaoLiu updated this revision to Diff 15380.Oct 24 2014, 1:15 AM

Hi Hal,

A new patch has been attached according to your comments.
BTW. For the inbounds issue, the CGP just ignores it when creating new GEP/MUL/ADD, so I think maybe it is a minor problem in CGP.

In D5864#17, @hfinkel wrote:

This generally looks good to me. It would be good to have someone expert in this pass to also look it over.

the performance test has not finished yet. But the assembly code looks similar to the old patch for cortex-a57. If the result is better than the old patch, I'll remove the logic about transformation to ptrtoint/inttoptr.

I am curious how this turns out; please let us know.

I think this is because ptrtoint/inttoptr is similar to GEP. According to http://llvm.org/docs/GetElementPtr.html#how-is-gep-different-from-ptrtoint-arithmetic-and-inttoptr, the most difference is that GEP carries additional pointer aliasing rules. Also as the AA used in CodeGen doesn't change to much thing, so the assembly code is similar. It's just my guess. Two version still has much difference in instruction order.

Thanks,
-Hao

Original Message -----

From: "Hao Liu" <Hao.Liu@arm.com>
To: "Hao Liu" <Hao.Liu@arm.com>, "t p northover" <t.p.northover@gmail.com>, jingyue@google.com,
ssijaric@codeaurora.org, mcrosier@codeaurora.org
Cc: hfinkel@anl.gov, "amara emerson" <amara.emerson@arm.com>, llvm-commits@cs.uiuc.edu
Sent: Friday, October 24, 2014 3:25:14 AM
Subject: Re: [PATCH] [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.

Hi Hal,

A new patch has been attached according to your comments.
BTW. For the inbounds issue, the CGP just ignores it when creating
new GEP/MUL/ADD, so I think maybe it is a minor problem in CGP.

FWIW, we only recently started propagating NSW/NUW into the SelectionDAG layer. We should add a FIXME (or just fix this) in CGP too.

In D5864#17, @hfinkel wrote:

This generally looks good to me. It would be good to have someone
expert in this pass to also look it over.

the performance test has not finished yet. But the assembly code
looks similar to the old patch for cortex-a57. If the result is
better than the old patch, I'll remove the logic about
transformation to ptrtoint/inttoptr.

I am curious how this turns out; please let us know.

I think this is because ptrtoint/inttoptr is similar to GEP.
According to
http://llvm.org/docs/GetElementPtr.html#how-is-gep-different-from-ptrtoint-arithmetic-and-inttoptr,
the most difference is that GEP carries additional pointer aliasing
rules. Also as the AA used in CodeGen doesn't change to much thing,
so the assembly code is similar. It's just my guess. Two version
still has much difference in instruction order.

Yep; I recommend testing the difference independently of enabling AA. (In CGP, as I imagine you noticed, I also have a separate flag for this purpose).

-Hal

Thanks,
-Hao

http://reviews.llvm.org/D5864

I just noticed I was put on this. Sorry for the delay, and I'll review it today.

Jingyue

The performance result shows disable useAA for CortexA57 can have more improvement. The performance improvement result for latest trunk with this patch on CortexA57 are as follows:
Version 1 enable useAA() for CortexA57:

spec.cpu2006.ref.444_namd	-4.56%
spec.cpu2006.ref.473_astar	-3.79%
spec.cpu2006.ref.445_gobmk	-2.91%

Version 2 disable useAA() for CortexA57:

spec.cpu2006.ref.473_astar	-4.84%
spec.cpu2006.ref.444_namd	-4.39%
spec.cpu2006.ref.445_gobmk	-2.80%

Both versions have obvious benefit. But the result on astar of version 2 is much better than version 1. So I've attached a patch removing CortexA57 from useAA(). useAA() is beneficial for in-order target, but it has no obvious improvement on out of order cpu. Currently cyclone, x86 also doesn't useAA(). So I think we don't need useAA for CortexA57.

Hi James,

From: mankeyrabbit@gmail.com [mailto:mankeyrabbit@gmail.com] On Behalf Of James Molloy
Sent: Monday, October 27, 2014 11:38 PM
To: reviews+D5864+public+2de5fd757ea4c578@reviews.llvm.org
Cc: Hao Liu; Tim Northover; ssijaric@codeaurora.org; Chad Rosier; Jingyue Wu; LLVM Commits
Subject: Re: [PATCH] [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.

Hi Hao,

I'm not sure about this. Firstly I have concerns about the statistical significance of the difference in results - we've seen swings of +/- 1% on A57, and here we're talking about a 0.5% difference. Are you satisfied that the difference is genuine?

[Hao Liu] I’m also curious about the result. I suspect both versions should have not big difference. But I think the difference is genuine. The benchmark astar is quite stable and should never have variance about +/- 1%.

Secondly, I don't see a good reason why useAA should be worse than !useAA. To my mind, if there is a regression when useAA is enabled, it should be one of two things:

A legitimate regression, in which case let's fix useAA!
Random code permutation outside of the compiler's model. If this is the case we shouldn't be adjusting the compiler to compensate.

useAA has the potential to improve codegen. We're not really using it at the moment, but it's good that it's available to use in the future. If we rip it out, we should have a good reason for it, I think!

Can I suggest that you make the initial commit, which improves performance, then have a look at why !useAA is better and fix that incrementally?

[Hao Liu] OK, I’d like to do that. Thanks.

Cheers,

James

A new patch adds several lines reviewed in D5863 and does some code reformat work to make it more readable.

Thanks,
-Hao

My general concern is the part extracting constant struct field index seems wrong. Unlike arrays where every elements share the same type, structure fields can have different types. Therefore, changing a field index may change the intermediate type in a pointer arithmetic, causing incorrect IR.

For example, consider

struct S {
  int a[4]; // sizeof(int) = 4
  double b[8]; // sizeof(double) = 8
};

%s = alloca struct %S
%p = getelementptr %s, 0, 1, i ; &s.b[i] = s + 4*sizeof(int) + i*sizeof(double)

If you extract the 1, the gep becomes:

%p = getelementptr %s, 0, 0, i; &s.a[i] = s + i*sizeof(int)

and the constant offset is 4 * sizeof(int) = 16.

As you probably see already,

s + 4*sizeof(int) + i*sizeof(double) != s + 4*sizeof(int) + i*sizeof(int)

Does this make sense to you? What is a good way to fix this?

I also have a bunch of nit picks inlined.

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
88	Nice explanation! However, in the final version, you may want to merge these improvements into the old comments. You can mention the improvements in the summary of the revision.
185	//, or null if we cannot extract a constant offset.
319	typo: the the given => the given a constant offsets => a constant offset wheter => whether depends on => depending on use => uses several simpler GEPs or arithmetic operations
331	a better name would be transformToArithmetics because you are using not only adds but also muls
364	Add a comment on what SimplifyGEP means.
642	If (ConstantOffset == 0) return NewIdx; // ... return Extractor.rebuild...
878	getZExtValue() returns uint64_t.
1019	The comment here is confusing. It doesn't seem to explain what the following code does.
1021	Why name it OtherConst?

Hi Jingyue,

My general concern is the part extracting constant struct field index seems wrong. Unlike arrays where every elements share the same type, structure fields can have different types. Therefore, changing a field index may change the intermediate type in a pointer arithmetic, causing incorrect IR.

For example, consider
struct S {
  int a[4]; // sizeof(int) = 4
  double b[8]; // sizeof(double) = 8
};

%s = alloca struct %S
%p = getelementptr %s, 0, 1, i ; &s.b[i] = s + 4*sizeof(int) + i*sizeof(double)
If you extract the 1, the gep becomes:
%p = getelementptr %s, 0, 0, i; &s.a[i] = s + i*sizeof(int)
and the constant offset is 4 * sizeof(int) = 16.

As you probably see already,
s + 4*sizeof(int) + i*sizeof(double) != s + 4*sizeof(int) + i*sizeof(int)
Does this make sense to you? What is a good way to fix this?

Yes, your concern is reasonable. So to extract a index of structure type, we should not keep the original gep form of 3 indices.
For your case about "%p = getelementptr %s, 0, 1, i", we can transform it to either of two simpler forms:
1st Form: 2 simpler GEPs

%i_offset = mul %i, sizeof(double)
%s_i8 = bitcast %s to i8*
%s_i = getelementptr i8* %s_i8, %i_offset
%s_i_c = getelementptr i8* %s_i, 4*sizeof(int)
%result = bitcast %s_i_c to double*

Bitcast can be ignored, the original GEP with 3 indices are transformed into 1 MUL/SHL and 2 simpler GEPs with 1 index. If we find constant within the index "%i", it can also be added into the last GEP.

2nd Form: ptrtoint+arithmetic+inttoptr

%ptrint = ptrtoint %s to i64
%i_offset = mul %i, sizeof(double)
%s_i = add %ptrint, %i_offset
%s_i_c = add %ptrint, 4*sizeof(int)
%result = inttoptr %s_i_c to double*

This is similar, ptrtoint/inttoptr can be ignored, we transform the original GEP with 3 indices into 1 MUL/SHL and 2 ADDs.

The main benefit is that
(1) We can always extract the indices of struct types.
(2) We can do CSE easily on simpler GEPs and MUL/SHL/ADDs.

Such transformation is similar to the address sinking login in CGP pass, which also transform GEP to one of such forms by checking useAA(). For the correctness and performance, several benchmarks has been tested on AArch64 backend. I cannot test NVPTX, but I think if the test cases have complex address calculation, it can also get benefit.

Thanks,
-Hao

Add a new patch refactored according to Jingyue's comments.
Review please.

Thanks,
-Hao

Regarding the correctness issue with struct field indices, I think the solutions you proposed make sense. Look forward to your fix in the next diff. We should choose which solution to go with depending on useAA.

The logic looks good in general except the field index issue. My inline comments are mostly on refactoring.

lib/Target/AArch64/AArch64TargetMachine.cpp
208	I'd explain why running GEP reassociation can expose more opportunities for LICM. I guess only the split part (splitting a GEP with multiple indices into a set of smaller GEPs) buys you such opportunities. Right?
lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
82	Remove superfluous blank lines.
83	// Another improvement enabled by the SimplifyGEP flag is ...
93	I don't understand reason 2. Can you explain more? Are you saying, given a[i][j1] and a[i][j2], you can reuse the common part &a[i]?
365	What would be a good name? You are not really simplifying but to some extend complicating GEPs :) Does LowerGEP sound good to you? You are lowering a GEP into multiple uglygeps (btw, uglygep is a known concept in LLVM) or arithmetic operations which are closer to machine code. Make sure the comments are consistent after renaming.
857	Similarly, this function should be merged into accumulateByteOffset.
889	Can we merge the two transformToXXX functions? The logics are pretty similar.
988	Looks quite similar to splitGEP. Can we massage this into splitGEP, and guard the part that lowers GEP with if (SimplifyGEP)?
1016	OK, now I see what you're doing here. According to our discussion in D5863, this looks like a temporary work around for the CFGSimplification issue, and will eventually go away. I don't think it's a good idea to hack this pass to work around that issue. I'd rather run instsimplify before SeparateConstOffset in the AArch64 backend, and add a FIXME there saying once the CFGSimplification issue is fixed we needn't run instsimplify any more.
1050	After extractConstantAndSimplifyGEP is merged into splitGEP, the code here can be simplified.

Attach a new patch refactored according to Jingyue's comments. Review please.

Thanks,
-Hao

Regarding the correctness issue with struct field indices, I think the solutions you proposed make sense. Look forward to your fix in the next diff.

Hi Jingyue,

I'm just a bit confused about the "fix" you mean. Did you mean the solution to extract constant in the structure index? The solution is already in the diff.
I'm only not sure about the "a = 1+2" problem, and I've replied to your comment. For other problems, I've fixed in the attached diff.

Thanks,
-Hao

lib/Target/AArch64/AArch64TargetMachine.cpp
208	Yes. The whole GEP may not be invariant, but after splitting, we can move the invariant part and leave the variant part in the loop.
lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
93	Yes, GEPs like "getelementptr %ptr, i, j1" and "getelementptr %ptr, i, j2" can not be eliminated. But transformed into simpler forms, "i" and "j1" "j2" are separated, and we can find the common expressions about "i".
365	That make sence.
857	Make sence. It can be controlled by the LowerGEP.
889	I think we'd better not do merge. They logic look similar, but they do quite different things. If we want to merge them, we need to add checks like "if (TransformToSimplerGEP)" in many places (From the first for loop logic to the function end). That would be a quite hard work for other people to understand the code.
1016	I just don't think this is a temporary work. We cannot assume there is no situation like "a = 1+2". People also can create a test case of "a = 1 + 2" and feed to this pass. I mean people can call this pass without any other passes. So if we don't accumulate all none zero indices in this place, we'll get incorrect result. We want to " Remove the constant offset in each GEP index" and " Splits this GEP index into a variadic part and a constant offset" in this for loop. So the later part about transformation won't consider about the none zero constant indices, it assumes there is all varaible indices or zero indices. See that I only check "if (!isa<ConstantInt>(GEP->getOperand(I)))". So if we leave an index of "constant 1" behand, it will be ignored and we'll generate incorrect result. That's why I want to make sure we have accumulated all the none zero constant indices.

I mean the issue we spotted previously with the following example:

struct S {
  int a[4]; // sizeof(int) = 4
  double b[8]; // sizeof(double) = 8
};

%s = alloca struct %S
%p = getelementptr %s, 0, 1, i ; &s.b[i] = s + 4*sizeof(int) + i*sizeof(double)

I thought we agreed that your diff had a correctness issue with this example, and I didn't see any change to fix that issue in your latest diff. Am I missing something?

• HaoLiu added a comment.Nov 2 2014, 5:20 PM

This comment was removed by • HaoLiu.

In D5864#35, @jingyue wrote:
I mean the issue we spotted previously with the following example:
struct S {
  int a[4]; // sizeof(int) = 4
  double b[8]; // sizeof(double) = 8
};

%s = alloca struct %S
%p = getelementptr %s, 0, 1, i ; &s.b[i] = s + 4*sizeof(int) + i*sizeof(double)
I thought we agreed that your diff had a correctness issue with this example, and I didn't see any change to fix that issue in your latest diff. Am I missing something?

Sorry, my words misled you. I said "we can transform it to either of two simpler forms", I meant this patch had already done like that.
The correctness issue has already been solved in the first patch. See more details in transformToSimplerGEPs(), which does the same as my explanation. Transformation to simpler GEPs is similar to transformation to arithmetic operations. It just replace ADDs by GEPs. So your concern about correctness issue has already been resolved. Also I have two test cases to test structure indices.

Thanks,
-Hao

ping...

I see what's going on: you only extract constant offsets from array indices, and leave struct indices in those transformXXX functions. Thanks for clarification! I think that's worth some comment though, and I will point that out in my inline comments.

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
82	The word "simpler" is overloaded. I'd explicitly say something like: // to lower a GEP with multiple indices into multiple GEPs with a single index or arithmetic operations ... Is this more accurate?
102	The sentence "another simpler form of simpler GEPs is similar" is a little awkward. The word "simple" is very overloaded. What do you mean exactly by "simpler"? Consider using multiple sentences and/or an example to explain what this lowering to GEPs exactly mean.
125	This => Lowering GEPs
126	The word "optimize" is overloaded. Is hoist/sink a better word?
131	Such transformation can split it => Lowering GEPs can split such GEPs
184	You may want to say Find looks for constant offsets in both array indices and field indices. Because the signature of Extract is changed, you need to remove the comment "The meaning of the arguments and the return value ..." and explicitly explain what its arguments and return value mean.
313	I don't like the word "transform" and "simpler". To be more accurate, I'd use lower instead of transform. What do you exactly mean by "simpler"? Are you lowering a GEP with multiple indices to multiple GEPs with a single index? If that's the case, I'd say that explicitly in order to be more accurate.
319	ditto
323	will find => finds will only find => only finds else => ; otherwise,
324	none zero => non-zero
345	/// Whether to transform
713	If you buy that transformXXX should run after setIsInBounds(false), you needn't check inbound in this function.
717	These are essentially byte offsets instead of indices, because they are scaled by element sizes. Do we have a better name? ByteOffsetOfIndex?
717	Actually, is this vector even necessary? Can you build the resultant uglygep along the way?
719	vairable => variable
721	Why checking whether it's a constant? Shouldn't you check whether it is an array index? My understanding is, the offsets of structure indices (which are guaranteed to be constant) and the constant offsets of array indices are already counted in accumulativeByteOffsets, and now you only need to rebuild the variadic part of the array indices.
726	{ at the end
730	} else {
744	The meaning of "simpler" here is unclear.
779	ditto. See Line 721
784	ditto. curly brackets.
797	{
801	}
827	Why do we need to check LowerGEP here? My understanding is, even when LowerGEP is set, +AccumulativeByteOffset will still become part of the lowered GEPs or arithmetics (see how transformToSimplerGEPs create a GEP with offset AccumulativeByteOffset, and how transformToArithmetics create an add with offset AccumulativeByteOffset). Shouldn't we still check whether this offset can be nicely folded into an addressing mode?
836	// ... in each sequential index Emphasizing you only remove constant offsets from sequential indices is important, because the behavior here is slightly different from accumulateByteOffset which looks into both sequential and struct indices. You also need to explain why removing constant offsets in struct field index here is dangerous, and mention the transformXXX functions will take care of that.
863	Run these transformXXX functions before GEP->setIsInBounds(false) seems wrong to me, because transformXXX may treat off-bound GEPs as inbound and add nsw flags incorrectly.
866	"such form" is too obscure
889	The more I look at these two functions, the more strongly I believe we should merge them. Especially, after you get rid of ResultIndices, you probably only need to check LowerGEP once in the merged function. No?
1016	I'm confused. Check my comment on the line if (!isa<ConstantInt>(GEP->getOperand(I))) and if my comment there makes sense I'll come back here.

In D5864#39, @jingyue wrote:

I see what's going on: you only extract constant offsets from array indices, and leave struct indices in those transformXXX functions. Thanks for clarification! I think that's worth some comment though, and I will point that out in my inline comments.

Hi Jingyue,

Thanks for your detailed comments, which are really very helpful. I've refactored the patch and attached.
Review please.

Thanks,
-Hao

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
721	Yes, you are right. I'll add comments to explain this.
827	I think even a pointer plus a constant index can not be matched in an addressing mode, it is worth to lower a GEP with multiple indices. Because we can do optimizations on the variable indices. Also if the constant index is extracted from multiple indices, the result looks better than the original GEP. If I remove such check, some test cases will fail. So it is kept when disable LowerGEP.

jingyue added inline comments.Nov 11 2014, 10:56 AM

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
22	// We cannot eliminate the partially common subexpression getelementptr [10 x %struct]* %ptr, i64 %i
184	I don't see the first part of this comment addressed. I'd like to emphasize "both array indices and field indices".
720	Not critical, but I slightly prefer getZExtValue() because a field index is non-negative and getElementOffset takes unsigned.
721	I wasn't just proposing a comment change. I meant the source code should read something like if (isa<SequentialType>(GTI)) You are building new GEPs/arithmetics for the remainder of the array indices, and checking whether the indexed type is a SequentialType matches your intention much better. I also believe checking for SequentialType instead of ConstantInt solves your previous question on handling "a = 1 + 2". I'll go back to that later.
827	That makes some sense. Out of curiosity, which test would fail?
853	// we always don't => we don't remove struct field indices here "always don't" is not the right grammar. Besides, you do extract field indices later, so I'd weaken the word "never".
854	diable => LowerGEP is disabled
855	enable LowerGEP => LowerGEP is enabled
856	I wrote lowerToXXX in my reviews just for simplicity. However, in the source code, we should still use the full names.
889	I don't see this comment addressed.

jingyue requested changes to this revision.Nov 11 2014, 10:56 AM

jingyue edited edge metadata.

This revision now requires changes to proceed.Nov 11 2014, 10:56 AM

Hi Hao,

could you share your current performance data + options you use? With your patch I see small gains on astar + xalancbmk, but a ~4% loss on libquantum. The data is based on the 11/07 trunk on top of r221549, O3 + LTO, cyclone, ref input.
In addition to cortex, could you run on x86 also?

Thanks
Gerolf

In D5864#45, @Gerolf wrote:

Hi Hao,

could you share your current performance data + options you use? With your patch I see small gains on astar + xalancbmk, but a ~4% loss on libquantum. The data is based on the 11/07 trunk on top of r221549, O3 + LTO, cyclone, ref input.
In addition to cortex, could you run on x86 also?

Thanks
Gerolf

Hi Gerolf,

I use "-O3 -ffast-math -mcpu=cortex-a57" and ref input. My last run was on the 10/24 trunk on top of r220484 (To ignore the impact of trunk changes, I always run and compare to the specific revision). I've ran the performance test for about 4 times on this specific revision. To ignore noise and variances, I remove those benchmark who doesn't exist in every performance runs. Then I find there is no obvious regressions and have stable improvements on following 3 benchmarks (This is just results from one run, other runs have similar results):

SPEC2006.473.astar          -4.84%
SPEC2006.444.namd           -4.39%
SPEC2006.445.gobmk          -2.80%

The percentage number is calculated by: 100% - current_time / reference_time.

It's quite strange that your results have many differences from mine on cyclone. libquantum is a quite stable benchmark on our test infrastructure, and I never see performance changes on it with my patch. So I always think my patch doesn't affect its performance.

I can not test cyclone, so I run on x86 with today's trunk with and without my patch (I enabled the SeparateConstOffsetFromGEP pass on x86 backend similarly to AArch64 backend). I've ran for about 8 times on libquantum. This benchmark is not stable on my x86 machine. The results have +/-1% variances comparing to the average results with or without my patch. But the average results are similar with or without my patch. So I think my patch won't affect the performance of libquantum on x86. I've ran a performance test for all the int/fp benchmarks in SPEC 2006. Just haven't got result yet. As it needs quite a long time for the result. If it is fortunate, I'll get the result tomorrow and show to you.
So I wonder whether libquantum is stable on cyclone. If it is not stable, maybe you can run it several times to ignore the variance. Or if it is real regression, we can then investigate it.

Do you have more details about the results. I'm interested in how many improvements on astar. Also I'm interested in why there is no improvement on namd and gobmk. They are supposed to get some benefits from this patch.

Thanks,
-Hao

Hi Gerolf,

Why do you want to see the results on x86? SeparateConstOffsetFromGEP is
only used by NVPTX and AArch64 so far. I don't think it will affect the
compilation time or the compiled code on x86.

Jingyue

• HaoLiu updated this revision to Diff 16135.Nov 12 2014, 11:57 PM

• HaoLiu edited edge metadata.

Hi Jingyue,

I've attached a new patch according to the review comments.
Review please.

Thanks,
-Hao

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
184	I think "both array indices and field indices" means all indices. We don't need to explain this common behavior. It is quite obvious. I think we need comments if we only find in one kind of "array indices" or "field indices", or do something special. Anyway, the new comments will be like the comments for Extract()
720	Aha, previously I use getZExtValue(). You commented I should change to getSExtValue(). I thought it was not critical, so I modified it. Now, It seems I should roll back... : )
721	If check SequentialType, we also need to check wheter it is ConstantInt. Then the code will be like: if (isa<SequentialType>(GTI)) if (!isa<ConstantInt>(GEP->getOperand(I)) ... else AccumulativeByteOffset += Const This looks more complicate. About the "a = 1 + 2", I think handling it when extracting constant in indices is better. Then we don't need to add code in both lowerToXXX functions. Also if we handle it in lowerToXXX functions, the logic for the original splitGEP (I.E. when lowerGEP is diabled) can still not handle this problem.
827	Oh, interesting. I supposed some of your test cases in test/Transforms/SeparateConstOffsetFromGEP/NVPTX may fail. But I tried removing such check and ran "make check", there is no test failures. So it seems there is no such test cases...
889	I still doubt about your suggestion. The diff 16027 gets ride of ResultIndices. They are still quite different and difficult to combine them. Lowering to arithmetic operations starts from creating ptrtoint, and then creates ADD/MUL/SHL, and ends with creating ADD and inttoptr. Lowering to GEPs starts from creating bitcast, and then creates MUL/SHL/GEPs, and ends with creating GEP and bitcast.
1016	I think this is the right place to do such thing. As the comments say, we want to "// Remove the constant offset in each GEP index." If we should leave some constant offset in index. Also, if we fix such things in the following lowerToXXX functions, the logic for the original splitGEP (I.E. when lowerGEP is diabled) can still not handle this problem.

A general suggestion: I guess you upload your diff using Phab's web interface. If so, I suggest you upload patches with full context (http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface) from now on or use arc which uploads full context by default. It's a little difficult to review a diff with only 3-line contexts and Phab seems to have a little trouble computing the diff between two diffs with limited context (at least the line numbers don't match in many places).

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
184	That makes sense.
720	No, I don't think I suggested you change it to getSExtValue(). When I said "getSExtValue() returns uint64_t", I expected you to write uint64_t Field = ...->getZExtValue(); because I slightly prefer delaying truncating which loses precision. But, anyway, I'm fine with "unsigned".
721	I'm confused. How does AccumulativeByteOffset in your snippet come up here? I think AccumulativeByteOffset is computed before calling transformToSimplerGEPs. Why do you bother changing that again? In case I haven't made this clear, I never expect Find/Extract always extracts all constant offsets. For example, given a = 1 + 2, I'm totally fine if the Find function finds only 2, and in that case the remaining code should treat 1 just as a variable remainder. The GEP after extraction may look like "gep base, i1, 1, i2" which may not be the optimal form, but again I am fine with that because it's sound. If we want to get the optimal form, I am looking for either a more comprehensive way (not just handling tricky cases like adding two constants; your initial diff might serve as a good start but the code was too complicated) or reassociating each index before this pass so that all constant offsets in the index are merged (which I suppose is the approach we want to go with). With that, do we still need the nested "if (!isa<CostantInt>)" check? Do you have an example where simply checking isa<SequentialType> isn't enough? I think a concrete example may help to make things clearer.
827	That was my suspicion. This check on the addressing mode is a very weak check: I believe both NVPTX and AArch64 support reg+<a potentially very large offset>, so I would be surprised that having/not having this check would cause tests to fail. You explained pretty well why it's still worth to lower a GEP even if address mode matching fails. I'd like to see that in the source code comments. Thanks!
889	Anyway, I'll probably do that by myself after this diff is in.

• HaoLiu updated this revision to Diff 16190.Nov 13 2014, 6:58 PM

• HaoLiu edited edge metadata.

Hi Jingyue,

Nice suggestion. It really helps a lot for the review.

I've attached a new patch with full context. Review please.

Thanks,
-Hao

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
720	Oh, I double checked that. I indeed misunderstood your meaning. Last time I used "unsigned Field = xxx->getZExtValue()". not "uint64_t Field = xxx->getZExtValue()". Sorry for the noise. Anyway, diff 16135 has already fix it.
721	Yeah, I understand your meaning. We just let the "1" as a remainder. It's sound because the logic of Find/Extract is to handle only one constant and return. It's reasonable. Previously I focused on making it optimal. So I want to extract all the constant. Anyway, this is just corner case and it may be fixed in the future. I'm fine with that. BTW, there is already a test case @test_const_add checking we generate correct result for "a = 1 + 2".
889	OK, thanks.

LGTM with some minor fixes. You may want to double check with ARM folks on performance. I remember Gerolf said he couldn't get the same performance numbers as you did.

Thanks again for all the hard work!

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
148	Lower => Lowering
150	lower => we lower
154	lower => we lower
155	better => a better
333	single index => a single index
334	The grammar of the sentence "It has been extracted a constant offset" is incorrect. /// Lower a GEP with multiple indices into multiple GEPs with a single index. /// Function splitGEP already split the original GEP into a variadic part and a /// constant offset (i.e., AccumulativeByteOffset). This function lowers the /// variadic part into a set of GEPs with a single index and applies /// AccumulativeByteOffset to it. /// GEP The variadic part of the original GEP /// AccumulativeByteOffset ... I'd even like to change the name of the argument GEP to Variadic, because GEP appears so many times throughout this file and usually refers to the original GEP.
343	ditto
348	we can find => we successfully find My original comment wasn't clear enough, and I think "we successfully find" is better.
370	single index => a single index
850	The grammar of "So we don't such check" is wrong. => Therefore, we don't check for addressing modes in that case.
865	The following xxx and yyy => lowerToSingleIndexGEPs or lowerToArithmetics will later

This revision is now accepted and ready to land.Nov 14 2014, 9:14 AM

In D5864#45, @Gerolf wrote:

Hi Hao,

could you share your current performance data + options you use? With your patch I see small gains on astar + xalancbmk, but a ~4% loss on libquantum. The data is based on the 11/07 trunk on top of r221549, O3 + LTO, cyclone, ref input.
In addition to cortex, could you run on x86 also?

Hi Gerolf,

I ran spec2006 4 times on my x86 machine, but the results are always fluctuating. So I can not see whether it is better or not by enabling this pass. I think it is because I don't have much experience on testing x86.

For the AArch64 backend, recently I ran twice on r222083@trunk and r222017@trunk on Cortex-A57. The results are similar to the previous results. But it has about 1%-2% regression on 471.omnetpp. That benchmark is always fluctuating, so I can't measure it accurately. Anyway, we have much improvements than regressions.

Thanks,
-Hao

Did you try pinning the benchmark on one CPU? Scheduling can significantly
disturb the benchmark results. Also, if your CPU has any boosting feature
(similar to Intel's turbo boost), try disabling that. One of our folks,
Mark, found that can make benchmarking sensitive too.

Jingyue

In D5864#58, @jingyue wrote:

Did you try pinning the benchmark on one CPU? Scheduling can significantly
disturb the benchmark results. Also, if your CPU has any boosting feature
(similar to Intel's turbo boost), try disabling that. One of our folks,
Mark, found that can make benchmarking sensitive too.

Jingyue

I think SPEC2006 benchmarks always run on one CPU, so I didn't pin to one CPU. For the boosting feature, I tested on an old x86 machine, so I'm not sure about the features. Anyway, I'm not familiar with x86 performance tests, so I think it's better to let other experts to have a try.

Thanks,
-Hao

In D5864#54, @jingyue wrote:

LGTM with some minor fixes. You may want to double check with ARM folks on performance. I remember Gerolf said he couldn't get the same performance numbers as you did.

Thanks again for all the hard work!

Hi Jingyue,

Thanks very much for your hard work on review, which really help me a lot.
I've committed the code in http://llvm.org/viewvc/llvm-project?view=revision&revision=222328 and http://llvm.org/viewvc/llvm-project?view=revision&revision=222331.

BTW, for the performance test, I also tested other revision by some slightly modifications such as:
(1) Always lower GEPs if there is more than one index despite wheter we found a constant.
(2) Call SeparateConstOffsetFromGEP pass before calling TargetPassConfig::addIRPasses() in AArch64TargetMachine.cpp.

The performance results are different. Some benchmarks are better, some benchmarks are worse. So I just think maybe we need further tuning in the future for better performance. Anyway, that's the future work.

Thanks,
-Hao

SPEC2006 benchmarks are single-threaded, but this single thread can be
scheduled by the operating system on different CPUs (cores, hyperthreads,
etc.) in different time slices. I used to encounter fluctuation too and
fixed that using the taskset command. Not sure if that can help you too.

Also, I don't think x86 matters because this pass is only enabled in
AArch64 and NVPTX for now. I do feel a little worried on the 1%-2%
regression you observed on AArch64. It would be great if you can get a more
stable result.

Jingyue

Hi Hao,

thanks for all your efforts landing this. We’ll check the performance impact on cyclone. Did lto have any impact on your results?

Cheers
Gerolf

In D5864#61, @jingyue wrote:

SPEC2006 benchmarks are single-threaded, but this single thread can be
scheduled by the operating system on different CPUs (cores, hyperthreads,
etc.) in different time slices. I used to encounter fluctuation too and
fixed that using the taskset command. Not sure if that can help you too.

Also, I don't think x86 matters because this pass is only enabled in
AArch64 and NVPTX for now. I do feel a little worried on the 1%-2%
regression you observed on AArch64. It would be great if you can get a more
stable result.

Jingyue

Hi Jingyue,

When I run benchmarks on AArch64 machine, I use taskset because there are two kinds of cores (A53 and A57). I thought all x86 cores on my machine are the same, so I didn't use taskset. Thanks for your share. It is helpful for our future perf tests.

For the regressions, I'll keep on watching the performance results.

Thanks,
-Hao

In D5864#62, @Gerolf wrote:

Hi Hao,

thanks for all your efforts landing this. We’ll check the performance impact on cyclone. Did lto have any impact on your results?

Cheers
Gerolf

Hi Gerolf

I don't know the LTO impact. Actually I never tested on LTO and don't know how to test it.
If you think LTO will be affected, I can have a try.

Thanks,
-Hao

Alright, we’ll do an lto a/b run then. I thought you might have checked libquantum where saw a regression with an earlier version of the patch, but let’s sync up once we have the latest data.

Cheers
Gerolf

• HaoLiu closed this revision.May 13 2015, 7:52 PM

Revision Contents

Path

Size

include/

llvm/

Transforms/

Scalar.h

4 lines

lib/

Target/

AArch64/

AArch64TargetMachine.cpp

18 lines

Transforms/

Scalar/

SeparateConstOffsetFromGEP.cpp

305 lines

test/

CodeGen/

AArch64/

aarch64-gep-opt.ll

162 lines

arm64-addr-mode-folding.ll

2 lines

arm64-cse.ll

2 lines

Diff 16190

include/llvm/Transforms/Scalar.h

	Show First 20 Lines • Show All 389 Lines • ▼ Show 20 Lines
	//			//
	// AddDiscriminators - Add DWARF path discriminators to the IR.			// AddDiscriminators - Add DWARF path discriminators to the IR.
	FunctionPass *createAddDiscriminatorsPass();			FunctionPass *createAddDiscriminatorsPass();

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// SeparateConstOffsetFromGEP - Split GEPs for better CSE			// SeparateConstOffsetFromGEP - Split GEPs for better CSE
	//			//
	FunctionPass *createSeparateConstOffsetFromGEPPass();			FunctionPass *
				createSeparateConstOffsetFromGEPPass(const TargetMachine *TM = nullptr,
				bool LowerGEP = false);

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// LoadCombine - Combine loads into bigger loads.			// LoadCombine - Combine loads into bigger loads.
	//			//
	BasicBlockPass *createLoadCombinePass();			BasicBlockPass *createLoadCombinePass();

	} // End llvm namespace			} // End llvm namespace

	#endif			#endif

lib/Target/AArch64/AArch64TargetMachine.cpp

Show First 20 Lines • Show All 74 Lines • ▼ Show 20 Lines	EnableCondOpt("aarch64-condopt",
cl::desc("Enable the condition optimizer pass"),		cl::desc("Enable the condition optimizer pass"),
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

static cl::opt<bool>		static cl::opt<bool>
EnableA53Fix835769("aarch64-fix-cortex-a53-835769", cl::Hidden,		EnableA53Fix835769("aarch64-fix-cortex-a53-835769", cl::Hidden,
cl::desc("Work around Cortex-A53 erratum 835769"),		cl::desc("Work around Cortex-A53 erratum 835769"),
cl::init(false));		cl::init(false));

		static cl::opt<bool>
		EnableGEPOpt("aarch64-gep-opt", cl::Hidden,
		cl::desc("Enable optimizations on complex GEPs"),
		cl::init(true));

extern "C" void LLVMInitializeAArch64Target() {		extern "C" void LLVMInitializeAArch64Target() {
// Register the target.		// Register the target.
RegisterTargetMachine<AArch64leTargetMachine> X(TheAArch64leTarget);		RegisterTargetMachine<AArch64leTargetMachine> X(TheAArch64leTarget);
RegisterTargetMachine<AArch64beTargetMachine> Y(TheAArch64beTarget);		RegisterTargetMachine<AArch64beTargetMachine> Y(TheAArch64beTarget);
RegisterTargetMachine<AArch64leTargetMachine> Z(TheARM64Target);		RegisterTargetMachine<AArch64leTargetMachine> Z(TheARM64Target);
}		}

/// TargetMachine ctor - Create an AArch64 architecture model.		/// TargetMachine ctor - Create an AArch64 architecture model.
▲ Show 20 Lines • Show All 97 Lines • ▼ Show 20 Lines	void AArch64PassConfig::addIRPasses() {

// Cmpxchg instructions are often used with a subsequent comparison to		// Cmpxchg instructions are often used with a subsequent comparison to
// determine whether it succeeded. We can exploit existing control-flow in		// determine whether it succeeded. We can exploit existing control-flow in
// ldrex/strex loops to simplify this, but it needs tidying up.		// ldrex/strex loops to simplify this, but it needs tidying up.
if (TM->getOptLevel() != CodeGenOpt::None && EnableAtomicTidy)		if (TM->getOptLevel() != CodeGenOpt::None && EnableAtomicTidy)
addPass(createCFGSimplificationPass());		addPass(createCFGSimplificationPass());

TargetPassConfig::addIRPasses();		TargetPassConfig::addIRPasses();

		if (TM->getOptLevel() == CodeGenOpt::Aggressive && EnableGEPOpt) {
		// Call SeparateConstOffsetFromGEP pass to extract constants within indices
		// and lower a GEP with multiple indices to either arithmetic operations or
		// multiple GEPs with single index.
		addPass(createSeparateConstOffsetFromGEPPass(TM, true));
		// Call EarlyCSE pass to find and remove subexpressions in the lowered
		// result.
		jingyueUnsubmitted Not Done Reply Inline Actions I'd explain why running GEP reassociation can expose more opportunities for LICM. I guess only the split part (splitting a GEP with multiple indices into a set of smaller GEPs) buys you such opportunities. Right? jingyue: I'd explain why running GEP reassociation can expose more opportunities for LICM. I guess only…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions Yes. The whole GEP may not be invariant, but after splitting, we can move the invariant part and leave the variant part in the loop. HaoLiu: Yes. The whole GEP may not be invariant, but after splitting, we can move the invariant part…
		addPass(createEarlyCSEPass());
		// Do loop invariant code motion in case part of the lowered result is
		// invariant.
		addPass(createLICMPass());
		}
}		}

// Pass Pipeline Configuration		// Pass Pipeline Configuration
bool AArch64PassConfig::addPreISel() {		bool AArch64PassConfig::addPreISel() {
// Run promote constant before global merge, so that the promoted constants		// Run promote constant before global merge, so that the promoted constants
// get a chance to be merged		// get a chance to be merged
if (TM->getOptLevel() != CodeGenOpt::None && EnablePromoteConstant)		if (TM->getOptLevel() != CodeGenOpt::None && EnablePromoteConstant)
addPass(createAArch64PromoteConstantPass());		addPass(createAArch64PromoteConstantPass());
▲ Show 20 Lines • Show All 77 Lines • Show Last 20 Lines

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp

Show All 13 Lines
//		//
// for (int i = 0; i < 2; ++i) {		// for (int i = 0; i < 2; ++i) {
// for (int j = 0; j < 2; ++j) {		// for (int j = 0; j < 2; ++j) {
// ...		// ...
// ... = a[x + i][y + j];		// ... = a[x + i][y + j];
// ...		// ...
// }		// }
// }		// }
//		//
		jingyueUnsubmitted Not Done Reply Inline Actions // We cannot eliminate the partially common subexpression getelementptr [10 x %struct]* %ptr, i64 %i jingyue: // We cannot eliminate the partially common subexpression getelementptr [10 x %struct]* %ptr…
// will probably be unrolled to:		// will probably be unrolled to:
//		//
// gep %a, 0, %x, %y; load		// gep %a, 0, %x, %y; load
// gep %a, 0, %x, %y + 1; load		// gep %a, 0, %x, %y + 1; load
// gep %a, 0, %x + 1, %y; load		// gep %a, 0, %x + 1, %y; load
// gep %a, 0, %x + 1, %y + 1; load		// gep %a, 0, %x + 1, %y + 1; load
//		//
// LLVM's GVN does not use partial redundancy elimination yet, and is thus		// LLVM's GVN does not use partial redundancy elimination yet, and is thus
▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines
// add.s64 %rl4, %rl3, %rl2;		// add.s64 %rl4, %rl3, %rl2;
// mul.wide.u32 %rl5, %r2, 4;		// mul.wide.u32 %rl5, %r2, 4;
// add.s64 %rl6, %rl4, %rl5;		// add.s64 %rl6, %rl4, %rl5;
// ld.global.f32 %f1, [%rl6]; // so far the same as unoptimized PTX		// ld.global.f32 %f1, [%rl6]; // so far the same as unoptimized PTX
// ld.global.f32 %f2, [%rl6+4]; // much better		// ld.global.f32 %f2, [%rl6+4]; // much better
// ld.global.f32 %f3, [%rl6+128]; // much better		// ld.global.f32 %f3, [%rl6+128]; // much better
// ld.global.f32 %f4, [%rl6+132]; // much better		// ld.global.f32 %f4, [%rl6+132]; // much better
//		//
		// Another improvement enabled by the LowerGEP flag is to lower a GEP with
		jingyueUnsubmitted Not Done Reply Inline Actions Remove superfluous blank lines. jingyue: Remove superfluous blank lines.
		jingyueUnsubmitted Not Done Reply Inline Actions The word "simpler" is overloaded. I'd explicitly say something like: // to lower a GEP with multiple indices into multiple GEPs with a single index or arithmetic operations ... Is this more accurate? jingyue: The word "simpler" is overloaded. I'd explicitly say something like: // to lower a GEP with…
		// multiple indices to either multiple GEPs with a single index or arithmetic
		jingyueUnsubmitted Not Done Reply Inline Actions // Another improvement enabled by the SimplifyGEP flag is ... jingyue: // Another improvement enabled by the SimplifyGEP flag is ...
		// operations (depending on whether the target uses alias analysis in codegen).
		// Such transformation can have following benefits:
		// (1) It can always extract constants in the indices of structure type.
		hfinkelUnsubmitted Not Done Reply Inline Actions Even we can extract constant from GEPs -> Even if we can extract constants from GEPs hfinkel: Even we can extract constant from GEPs -> Even if we can extract constants from GEPs
		// (2) After such Lowering, there are more optimization opportunities such as
		// CSE, LICM and CGP.
		hfinkelUnsubmitted Not Done Reply Inline Actions algorithm hfinkel: algorithm
		jingyueUnsubmitted Not Done Reply Inline Actions Nice explanation! However, in the final version, you may want to merge these improvements into the old comments. You can mention the improvements in the summary of the revision. jingyue: Nice explanation! However, in the final version, you may want to merge these improvements into…
		//
		// E.g. The following GEPs have multiple indices:
		// BB1:
		// %p = getelementptr [10 x %struct]* %ptr, i64 %i, i64 %j1, i32 3
		// load %p
		jingyueUnsubmitted Not Done Reply Inline Actions I don't understand reason 2. Can you explain more? Are you saying, given a[i][j1] and a[i][j2], you can reuse the common part &a[i]? jingyue: I don't understand reason 2. Can you explain more? Are you saying, given a[i][j1] and a[i][j2]…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions Yes, GEPs like "getelementptr %ptr, i, j1" and "getelementptr %ptr, i, j2" can not be eliminated. But transformed into simpler forms, "i" and "j1" "j2" are separated, and we can find the common expressions about "i". HaoLiu: Yes, GEPs like "getelementptr %ptr, i, j1" and "getelementptr %ptr, i, j2" can not be…
		// ...
		// BB2:
		// %p2 = getelementptr [10 x %struct]* %ptr, i64 %i, i64 %j1, i32 2
		// load %p2
		// ...
		//
		// We can not do CSE for to the common part related to index "i64 %i". Lowering
		// GEPs can achieve such goals.
		// If the target does not use alias analysis in codegen, this pass will
		jingyueUnsubmitted Not Done Reply Inline Actions The sentence "another simpler form of simpler GEPs is similar" is a little awkward. The word "simple" is very overloaded. What do you mean exactly by "simpler"? Consider using multiple sentences and/or an example to explain what this lowering to GEPs exactly mean. jingyue: The sentence "another simpler form of simpler GEPs is similar" is a little awkward. The word…
		// lower a GEP with multiple indices into arithmetic operations:
		// BB1:
		// %1 = ptrtoint [10 x %struct]* %ptr to i64 ; CSE opportunity
		// %2 = mul i64 %i, length_of_10xstruct ; CSE opportunity
		// %3 = add i64 %1, %2 ; CSE opportunity
		// %4 = mul i64 %j1, length_of_struct
		// %5 = add i64 %3, %4
		// %6 = add i64 %3, struct_field_3 ; Constant offset
		// %p = inttoptr i64 %6 to i32*
		// load %p
		// ...
		// BB2:
		// %7 = ptrtoint [10 x %struct]* %ptr to i64 ; CSE opportunity
		// %8 = mul i64 %i, length_of_10xstruct ; CSE opportunity
		// %9 = add i64 %7, %8 ; CSE opportunity
		// %10 = mul i64 %j2, length_of_struct
		// %11 = add i64 %9, %10
		// %12 = add i64 %11, struct_field_2 ; Constant offset
		// %p = inttoptr i64 %12 to i32*
		// load %p2
		// ...
		//
		// If the target uses alias analysis in codegen, this pass will lower a GEP
		jingyueUnsubmitted Not Done Reply Inline Actions This => Lowering GEPs jingyue: This => Lowering GEPs
		// with multiple indices into multiple GEPs with a single index:
		jingyueUnsubmitted Not Done Reply Inline Actions The word "optimize" is overloaded. Is hoist/sink a better word? jingyue: The word "optimize" is overloaded. Is hoist/sink a better word?
		// BB1:
		// %1 = bitcast [10 x %struct]* %ptr to i8* ; CSE opportunity
		// %2 = mul i64 %i, length_of_10xstruct ; CSE opportunity
		// %3 = getelementptr i8* %1, i64 %2 ; CSE opportunity
		// %4 = mul i64 %j1, length_of_struct
		jingyueUnsubmitted Not Done Reply Inline Actions Such transformation can split it => Lowering GEPs can split such GEPs jingyue: Such transformation can split it => Lowering GEPs can split such GEPs
		// %5 = getelementptr i8* %3, i64 %4
		// %6 = getelementptr i8* %5, struct_field_3 ; Constant offset
		// %p = bitcast i8* %6 to i32*
		// load %p
		// ...
		// BB2:
		// %7 = bitcast [10 x %struct]* %ptr to i8* ; CSE opportunity
		// %8 = mul i64 %i, length_of_10xstruct ; CSE opportunity
		// %9 = getelementptr i8* %7, i64 %8 ; CSE opportunity
		// %10 = mul i64 %j2, length_of_struct
		// %11 = getelementptr i8* %9, i64 %10
		// %12 = getelementptr i8* %11, struct_field_2 ; Constant offset
		// %p2 = bitcast i8* %12 to i32*
		// load %p2
		// ...
		//
		// Lower GEPs can also benefit other passes such as LICM and CGP.
		jingyueUnsubmitted Not Done Reply Inline Actions Lower => Lowering jingyue: Lower => Lowering
		// LICM (Loop Invariant Code Motion) can not hoist/sink a GEP of multiple
		// indices if one of the index is variant. If lower such GEP into invariant
		jingyueUnsubmitted Not Done Reply Inline Actions lower => we lower jingyue: lower => we lower
		// parts and variant parts, LICM can hoist/sink those invariant parts.
		// CGP (CodeGen Prepare) tries to sink address calculations that match the
		// target's addressing modes. A GEP with multiple indices may not match and will
		// not be sunk. If lower such GEP into smaller parts, CGP may sink some of them.
		jingyueUnsubmitted Not Done Reply Inline Actions lower => we lower jingyue: lower => we lower
		// So we end up with better addressing mode.
		jingyueUnsubmitted Not Done Reply Inline Actions better => a better jingyue: better => a better
		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
		#include "llvm/Target/TargetMachine.h"
		#include "llvm/Target/TargetSubtargetInfo.h"
		#include "llvm/IR/IRBuilder.h"

using namespace llvm;		using namespace llvm;

static cl::opt<bool> DisableSeparateConstOffsetFromGEP(		static cl::opt<bool> DisableSeparateConstOffsetFromGEP(
"disable-separate-const-offset-from-gep", cl::init(false),		"disable-separate-const-offset-from-gep", cl::init(false),
cl::desc("Do not separate the constant offset from a GEP instruction"),		cl::desc("Do not separate the constant offset from a GEP instruction"),
cl::Hidden);		cl::Hidden);

namespace {		namespace {

/// \brief A helper class for separating a constant offset from a GEP index.		/// \brief A helper class for separating a constant offset from a GEP index.
///		///
		jingyueUnsubmitted Not Done Reply Inline Actions You may want to say Find looks for constant offsets in both array indices and field indices. Because the signature of Extract is changed, you need to remove the comment "The meaning of the arguments and the return value ..." and explicitly explain what its arguments and return value mean. jingyue: You may want to say Find looks for constant offsets in both array indices and field indices.
		jingyueUnsubmitted Not Done Reply Inline Actions I don't see the first part of this comment addressed. I'd like to emphasize "both array indices and field indices". jingyue: I don't see the first part of this comment addressed. I'd like to emphasize "both array indices…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions I think "both array indices and field indices" means all indices. We don't need to explain this common behavior. It is quite obvious. I think we need comments if we only find in one kind of "array indices" or "field indices", or do something special. Anyway, the new comments will be like the comments for Extract() HaoLiu: I think "both array indices and field indices" means all indices. We don't need to explain this…
		jingyueUnsubmitted Not Done Reply Inline Actions That makes sense. jingyue: That makes sense.
/// In real programs, a GEP index may be more complicated than a simple addition		/// In real programs, a GEP index may be more complicated than a simple addition
		jingyueUnsubmitted Not Done Reply Inline Actions //, or null if we cannot extract a constant offset. jingyue: //, or null if we cannot extract a constant offset.
/// of something and a constant integer which can be trivially splitted. For		/// of something and a constant integer which can be trivially splitted. For
/// example, to split ((a << 3) \| 5) + b, we need to search deeper for the		/// example, to split ((a << 3) \| 5) + b, we need to search deeper for the
/// constant offset, so that we can separate the index to (a << 3) + b and 5.		/// constant offset, so that we can separate the index to (a << 3) + b and 5.
///		///
/// Therefore, this class looks into the expression that computes a given GEP		/// Therefore, this class looks into the expression that computes a given GEP
/// index, and tries to find a constant integer that can be hoisted to the		/// index, and tries to find a constant integer that can be hoisted to the
/// outermost level of the expression as an addition. Not every constant in an		/// outermost level of the expression as an addition. Not every constant in an
/// expression can jump out. e.g., we cannot transform (b * (a + 5)) to (b * a +		/// expression can jump out. e.g., we cannot transform (b * (a + 5)) to (b * a +
/// 5); nor can we transform (3 * (a + 5)) to (3 * a + 5), however in this case,		/// 5); nor can we transform (3 * (a + 5)) to (3 * a + 5), however in this case,
/// -instcombine probably already optimized (3 * (a + 5)) to (3 * a + 15).		/// -instcombine probably already optimized (3 * (a + 5)) to (3 * a + 15).
class ConstantOffsetExtractor {		class ConstantOffsetExtractor {
public:		public:
/// Extracts a constant offset from the given GEP index. It outputs the		/// Extracts a constant offset from the given GEP index. It returns the
/// numeric value of the extracted constant offset (0 if failed), and a
/// new index representing the remainder (equal to the original index minus		/// new index representing the remainder (equal to the original index minus
/// the constant offset).		/// the constant offset), or nullptr if we cannot extract a constant offset.
/// \p Idx The given GEP index		/// \p Idx The given GEP index
/// \p NewIdx The new index to replace (output)
/// \p DL The datalayout of the module		/// \p DL The datalayout of the module
/// \p GEP The given GEP		/// \p GEP The given GEP
static int64_t Extract(Value Idx, Value &NewIdx, const DataLayout *DL,		static Value Extract(Value Idx, const DataLayout *DL,
GetElementPtrInst *GEP);		GetElementPtrInst *GEP);
/// Looks for a constant offset without extracting it. The meaning of the		/// Looks for a constant offset from the given GEP index without extracting
/// arguments and the return value are the same as Extract.		/// it. It returns the numeric value of the extracted constant offset (0 if
		/// failed). The meaning of the arguments are the same as Extract.
static int64_t Find(Value Idx, const DataLayout DL, GetElementPtrInst *GEP);		static int64_t Find(Value Idx, const DataLayout DL, GetElementPtrInst *GEP);

private:		private:
ConstantOffsetExtractor(const DataLayout Layout, Instruction InsertionPt)		ConstantOffsetExtractor(const DataLayout Layout, Instruction InsertionPt)
: DL(Layout), IP(InsertionPt) {}		: DL(Layout), IP(InsertionPt) {}
/// Searches the expression that computes V for a non-zero constant C s.t.		/// Searches the expression that computes V for a non-zero constant C s.t.
/// V can be reassociated into the form V' + C. If the searching is		/// V can be reassociated into the form V' + C. If the searching is
/// successful, returns C and update UserChain as a def-use chain from C to V;		/// successful, returns C and update UserChain as a def-use chain from C to V;
▲ Show 20 Lines • Show All 83 Lines • ▼ Show 20 Lines
};		};

/// \brief A pass that tries to split every GEP in the function into a variadic		/// \brief A pass that tries to split every GEP in the function into a variadic
/// base and a constant offset. It is a FunctionPass because searching for the		/// base and a constant offset. It is a FunctionPass because searching for the
/// constant offset may inspect other basic blocks.		/// constant offset may inspect other basic blocks.
class SeparateConstOffsetFromGEP : public FunctionPass {		class SeparateConstOffsetFromGEP : public FunctionPass {
public:		public:
static char ID;		static char ID;
SeparateConstOffsetFromGEP() : FunctionPass(ID) {		SeparateConstOffsetFromGEP(const TargetMachine *TM = nullptr,
		bool LowerGEP = false)
		: FunctionPass(ID), TM(TM), LowerGEP(LowerGEP) {
initializeSeparateConstOffsetFromGEPPass(*PassRegistry::getPassRegistry());		initializeSeparateConstOffsetFromGEPPass(*PassRegistry::getPassRegistry());
}		}

		jingyueUnsubmitted Not Done Reply Inline Actions I don't like the word "transform" and "simpler". To be more accurate, I'd use lower instead of transform. What do you exactly mean by "simpler"? Are you lowering a GEP with multiple indices to multiple GEPs with a single index? If that's the case, I'd say that explicitly in order to be more accurate. jingyue: I don't like the word "transform" and "simpler". To be more accurate, I'd use lower instead of…
void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<DataLayoutPass>();		AU.addRequired<DataLayoutPass>();
AU.addRequired<TargetTransformInfo>();		AU.addRequired<TargetTransformInfo>();
}		}

bool doInitialization(Module &M) override {		bool doInitialization(Module &M) override {
		jingyueUnsubmitted Not Done Reply Inline Actions typo: the the given => the given a constant offsets => a constant offset wheter => whether depends on => depending on use => uses several simpler GEPs or arithmetic operations jingyue: typo: the the given => the given a constant offsets => a constant offset wheter => whether…
		jingyueUnsubmitted Not Done Reply Inline Actions ditto jingyue: ditto
DataLayoutPass *DLP = getAnalysisIfAvailable<DataLayoutPass>();		DataLayoutPass *DLP = getAnalysisIfAvailable<DataLayoutPass>();
if (DLP == nullptr)		if (DLP == nullptr)
report_fatal_error("data layout missing");		report_fatal_error("data layout missing");
DL = &DLP->getDataLayout();		DL = &DLP->getDataLayout();
		jingyueUnsubmitted Not Done Reply Inline Actions will find => finds will only find => only finds else => ; otherwise, jingyue: will find => finds will only find => only finds else => ; otherwise,
return false;		return false;
		jingyueUnsubmitted Not Done Reply Inline Actions none zero => non-zero jingyue: none zero => non-zero
}		}

bool runOnFunction(Function &F) override;		bool runOnFunction(Function &F) override;

private:		private:
/// Tries to split the given GEP into a variadic base and a constant offset,		/// Tries to split the given GEP into a variadic base and a constant offset,
/// and returns true if the splitting succeeds.		/// and returns true if the splitting succeeds.
		jingyueUnsubmitted Not Done Reply Inline Actions a better name would be transformToArithmetics because you are using not only adds but also muls jingyue: a better name would be transformToArithmetics because you are using not only adds but also muls
bool splitGEP(GetElementPtrInst *GEP);		bool splitGEP(GetElementPtrInst *GEP);
/// Finds the constant offset within each index, and accumulates them. This		/// Lower a GEP with multiple indices into multiple GEPs with single index.
		jingyueUnsubmitted Not Done Reply Inline Actions single index => a single index jingyue: single index => a single index
/// function only inspects the GEP without changing it. The output		/// \p GEP The given GEP. It has been extracted a constant
		jingyueUnsubmitted Not Done Reply Inline Actions The grammar of the sentence "It has been extracted a constant offset" is incorrect. /// Lower a GEP with multiple indices into multiple GEPs with a single index. /// Function splitGEP already split the original GEP into a variadic part and a /// constant offset (i.e., AccumulativeByteOffset). This function lowers the /// variadic part into a set of GEPs with a single index and applies /// AccumulativeByteOffset to it. /// GEP The variadic part of the original GEP /// AccumulativeByteOffset ... I'd even like to change the name of the argument GEP to Variadic, because GEP appears so many times throughout this file and usually refers to the original GEP. jingyue: The grammar of the sentence "It has been extracted a constant offset" is incorrect. ``` ///…
/// NeedsExtraction indicates whether we can extract a non-zero constant		/// offset.
/// offset from any index.		/// \p AccumulativeByteOffset The extracted constant offset.
		void lowerToSingleIndexGEPs(GetElementPtrInst *GEP,
		int64_t AccumulativeByteOffset);
		/// Lower a GEP with multiple indices into ptrtoint+arithmetics+inttoptr form.
		/// \p GEP The given GEP. It has been extracted a constant
		/// offset.
		/// \p AccumulativeByteOffset The extracted constant offset.
		void lowerToArithmetics(GetElementPtrInst *GEP,
		jingyueUnsubmitted Not Done Reply Inline Actions ditto jingyue: ditto
		int64_t AccumulativeByteOffset);
		/// Finds the constant offset within each index and accumulates them. If
		jingyueUnsubmitted Not Done Reply Inline Actions /// Whether to transform jingyue: /// Whether to transform
		/// LowerGEP is true, it finds in indices of both sequential and structure
		/// types, otherwise it only finds in sequential indices. The output
		/// NeedsExtraction indicates whether we can find a non-zero constant offset.
		jingyueUnsubmitted Not Done Reply Inline Actions we can find => we successfully find My original comment wasn't clear enough, and I think "we successfully find" is better. jingyue: we can find => we successfully find My original comment wasn't clear enough, and I think "we…
int64_t accumulateByteOffset(GetElementPtrInst *GEP, bool &NeedsExtraction);		int64_t accumulateByteOffset(GetElementPtrInst *GEP, bool &NeedsExtraction);
/// Canonicalize array indices to pointer-size integers. This helps to		/// Canonicalize array indices to pointer-size integers. This helps to
/// simplify the logic of splitting a GEP. For example, if a + b is a		/// simplify the logic of splitting a GEP. For example, if a + b is a
/// pointer-size integer, we have		/// pointer-size integer, we have
/// gep base, a + b = gep (gep base, a), b		/// gep base, a + b = gep (gep base, a), b
/// However, this equality may not hold if the size of a + b is smaller than		/// However, this equality may not hold if the size of a + b is smaller than
/// the pointer size, because LLVM conceptually sign-extends GEP indices to		/// the pointer size, because LLVM conceptually sign-extends GEP indices to
/// pointer size before computing the address		/// pointer size before computing the address
/// (http://llvm.org/docs/LangRef.html#id181).		/// (http://llvm.org/docs/LangRef.html#id181).
///		///
/// This canonicalization is very likely already done in clang and		/// This canonicalization is very likely already done in clang and
/// instcombine. Therefore, the program will probably remain the same.		/// instcombine. Therefore, the program will probably remain the same.
///		///
/// Returns true if the module changes.		/// Returns true if the module changes.
///		///
/// Verified in @i32_add in split-gep.ll		/// Verified in @i32_add in split-gep.ll
		jingyueUnsubmitted Not Done Reply Inline Actions Add a comment on what SimplifyGEP means. jingyue: Add a comment on what SimplifyGEP means.
bool canonicalizeArrayIndicesToPointerSize(GetElementPtrInst *GEP);		bool canonicalizeArrayIndicesToPointerSize(GetElementPtrInst *GEP);
		jingyueUnsubmitted Not Done Reply Inline Actions What would be a good name? You are not really simplifying but to some extend complicating GEPs :) Does LowerGEP sound good to you? You are lowering a GEP into multiple uglygeps (btw, uglygep is a known concept in LLVM) or arithmetic operations which are closer to machine code. Make sure the comments are consistent after renaming. jingyue: What would be a good name? You are not really simplifying but to some extend complicating…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions That make sence. HaoLiu: That make sence.

const DataLayout *DL;		const DataLayout *DL;
		const TargetMachine *TM;
		/// Whether to lower a GEP with multiple indices into arithmetic operations or
		/// multiple GEPs with single index.
		jingyueUnsubmitted Not Done Reply Inline Actions single index => a single index jingyue: single index => a single index
		bool LowerGEP;
};		};
} // anonymous namespace		} // anonymous namespace

char SeparateConstOffsetFromGEP::ID = 0;		char SeparateConstOffsetFromGEP::ID = 0;
INITIALIZE_PASS_BEGIN(		INITIALIZE_PASS_BEGIN(
SeparateConstOffsetFromGEP, "separate-const-offset-from-gep",		SeparateConstOffsetFromGEP, "separate-const-offset-from-gep",
"Split GEPs to a variadic base and a constant offset for better CSE", false,		"Split GEPs to a variadic base and a constant offset for better CSE", false,
false)		false)
INITIALIZE_AG_DEPENDENCY(TargetTransformInfo)		INITIALIZE_AG_DEPENDENCY(TargetTransformInfo)
INITIALIZE_PASS_DEPENDENCY(DataLayoutPass)		INITIALIZE_PASS_DEPENDENCY(DataLayoutPass)
INITIALIZE_PASS_END(		INITIALIZE_PASS_END(
SeparateConstOffsetFromGEP, "separate-const-offset-from-gep",		SeparateConstOffsetFromGEP, "separate-const-offset-from-gep",
"Split GEPs to a variadic base and a constant offset for better CSE", false,		"Split GEPs to a variadic base and a constant offset for better CSE", false,
false)		false)

FunctionPass *llvm::createSeparateConstOffsetFromGEPPass() {		FunctionPass *
return new SeparateConstOffsetFromGEP();		llvm::createSeparateConstOffsetFromGEPPass(const TargetMachine *TM,
		bool LowerGEP) {
		return new SeparateConstOffsetFromGEP(TM, LowerGEP);
}		}

bool ConstantOffsetExtractor::CanTraceInto(bool SignExtended,		bool ConstantOffsetExtractor::CanTraceInto(bool SignExtended,
bool ZeroExtended,		bool ZeroExtended,
BinaryOperator *BO,		BinaryOperator *BO,
bool NonNegative) {		bool NonNegative) {
// We only consider ADD, SUB and OR, because a non-zero constant found in		// We only consider ADD, SUB and OR, because a non-zero constant found in
// expressions composed of these operations can be easily hoisted as a		// expressions composed of these operations can be easily hoisted as a
▲ Show 20 Lines • Show All 235 Lines • ▼ Show 20 Lines	Value *ConstantOffsetExtractor::removeConstOffset(unsigned ChainIndex) {
BO->setOperand(OpNo, NextInChain);		BO->setOperand(OpNo, NextInChain);
BO->setHasNoSignedWrap(false);		BO->setHasNoSignedWrap(false);
BO->setHasNoUnsignedWrap(false);		BO->setHasNoUnsignedWrap(false);
// Make sure it appears after all instructions we've inserted so far.		// Make sure it appears after all instructions we've inserted so far.
BO->moveBefore(IP);		BO->moveBefore(IP);
return BO;		return BO;
}		}

int64_t ConstantOffsetExtractor::Extract(Value Idx, Value &NewIdx,		Value ConstantOffsetExtractor::Extract(Value Idx, const DataLayout *DL,
		jingyueUnsubmitted Not Done Reply Inline Actions If (ConstantOffset == 0) return NewIdx; // ... return Extractor.rebuild... jingyue: If (ConstantOffset == 0) return NewIdx; // ... return Extractor.rebuild...
const DataLayout *DL,
GetElementPtrInst *GEP) {		GetElementPtrInst *GEP) {
ConstantOffsetExtractor Extractor(DL, GEP);		ConstantOffsetExtractor Extractor(DL, GEP);
// Find a non-zero constant offset first.		// Find a non-zero constant offset first.
APInt ConstantOffset =		APInt ConstantOffset =
Extractor.find(Idx, /* SignExtended / false, / ZeroExtended */ false,		Extractor.find(Idx, /* SignExtended / false, / ZeroExtended */ false,
GEP->isInBounds());		GEP->isInBounds());
if (ConstantOffset != 0) {		if (ConstantOffset == 0)
		return nullptr;
// Separates the constant offset from the GEP index.		// Separates the constant offset from the GEP index.
NewIdx = Extractor.rebuildWithoutConstOffset();		return Extractor.rebuildWithoutConstOffset();
}
return ConstantOffset.getSExtValue();
}		}

int64_t ConstantOffsetExtractor::Find(Value Idx, const DataLayout DL,		int64_t ConstantOffsetExtractor::Find(Value Idx, const DataLayout DL,
GetElementPtrInst *GEP) {		GetElementPtrInst *GEP) {
// If Idx is an index of an inbound GEP, Idx is guaranteed to be non-negative.		// If Idx is an index of an inbound GEP, Idx is guaranteed to be non-negative.
return ConstantOffsetExtractor(DL, GEP)		return ConstantOffsetExtractor(DL, GEP)
.find(Idx, /* SignExtended / false, / ZeroExtended */ false,		.find(Idx, /* SignExtended / false, / ZeroExtended */ false,
GEP->isInBounds())		GEP->isInBounds())
▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	SeparateConstOffsetFromGEP::accumulateByteOffset(GetElementPtrInst *GEP,
for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {		for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {
if (isa<SequentialType>(*GTI)) {		if (isa<SequentialType>(*GTI)) {
// Tries to extract a constant offset from this GEP index.		// Tries to extract a constant offset from this GEP index.
int64_t ConstantOffset =		int64_t ConstantOffset =
ConstantOffsetExtractor::Find(GEP->getOperand(I), DL, GEP);		ConstantOffsetExtractor::Find(GEP->getOperand(I), DL, GEP);
if (ConstantOffset != 0) {		if (ConstantOffset != 0) {
NeedsExtraction = true;		NeedsExtraction = true;
// A GEP may have multiple indices. We accumulate the extracted		// A GEP may have multiple indices. We accumulate the extracted
// constant offset to a byte offset, and later offset the remainder of		// constant offset to a byte offset, and later offset the remainder of
		jingyueUnsubmitted Not Done Reply Inline Actions If you buy that transformXXX should run after setIsInBounds(false), you needn't check inbound in this function. jingyue: If you buy that transformXXX should run after setIsInBounds(false), you needn't check inbound…
// the original GEP with this byte offset.		// the original GEP with this byte offset.
AccumulativeByteOffset +=		AccumulativeByteOffset +=
ConstantOffset * DL->getTypeAllocSize(GTI.getIndexedType());		ConstantOffset * DL->getTypeAllocSize(GTI.getIndexedType());
}		}
		jingyueUnsubmitted Not Done Reply Inline Actions These are essentially byte offsets instead of indices, because they are scaled by element sizes. Do we have a better name? ByteOffsetOfIndex? jingyue: These are essentially byte offsets instead of indices, because they are scaled by element sizes.
		jingyueUnsubmitted Not Done Reply Inline Actions Actually, is this vector even necessary? Can you build the resultant uglygep along the way? jingyue: Actually, is this vector even necessary? Can you build the resultant uglygep along the way?
		} else if (LowerGEP) {
		StructType StTy = cast<StructType>(GTI);
		jingyueUnsubmitted Not Done Reply Inline Actions vairable => variable jingyue: vairable => variable
		uint64_t Field = cast<ConstantInt>(GEP->getOperand(I))->getZExtValue();
		jingyueUnsubmitted Not Done Reply Inline Actions Not critical, but I slightly prefer getZExtValue() because a field index is non-negative and getElementOffset takes unsigned. jingyue: Not critical, but I slightly prefer getZExtValue() because a field index is non-negative and…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions Aha, previously I use getZExtValue(). You commented I should change to getSExtValue(). I thought it was not critical, so I modified it. Now, It seems I should roll back... : ) HaoLiu: Aha, previously I use getZExtValue(). You commented I should change to getSExtValue(). I…
		jingyueUnsubmitted Not Done Reply Inline Actions No, I don't think I suggested you change it to getSExtValue(). When I said "getSExtValue() returns uint64_t", I expected you to write uint64_t Field = ...->getZExtValue(); because I slightly prefer delaying truncating which loses precision. But, anyway, I'm fine with "unsigned". jingyue: No, I don't think I suggested you change it to getSExtValue(). When I said "getSExtValue()…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions Oh, I double checked that. I indeed misunderstood your meaning. Last time I used "unsigned Field = xxx->getZExtValue()". not "uint64_t Field = xxx->getZExtValue()". Sorry for the noise. Anyway, diff 16135 has already fix it. HaoLiu: Oh, I double checked that. I indeed misunderstood your meaning. Last time I used "unsigned…
		// Skip field 0 as the offset is always 0.
		jingyueUnsubmitted Not Done Reply Inline Actions Why checking whether it's a constant? Shouldn't you check whether it is an array index? My understanding is, the offsets of structure indices (which are guaranteed to be constant) and the constant offsets of array indices are already counted in accumulativeByteOffsets, and now you only need to rebuild the variadic part of the array indices. jingyue: Why checking whether it's a constant? Shouldn't you check whether it is an array index? My…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions Yes, you are right. I'll add comments to explain this. HaoLiu: Yes, you are right. I'll add comments to explain this.
		jingyueUnsubmitted Not Done Reply Inline Actions I wasn't just proposing a comment change. I meant the source code should read something like if (isa<SequentialType>(GTI)) You are building new GEPs/arithmetics for the remainder of the array indices, and checking whether the indexed type is a SequentialType matches your intention much better. I also believe checking for SequentialType instead of ConstantInt solves your previous question on handling "a = 1 + 2". I'll go back to that later. jingyue: I wasn't just proposing a comment change. I meant the source code should read something like…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions If check SequentialType, we also need to check wheter it is ConstantInt. Then the code will be like: if (isa<SequentialType>(GTI)) if (!isa<ConstantInt>(GEP->getOperand(I)) ... else AccumulativeByteOffset += Const This looks more complicate. About the "a = 1 + 2", I think handling it when extracting constant in indices is better. Then we don't need to add code in both lowerToXXX functions. Also if we handle it in lowerToXXX functions, the logic for the original splitGEP (I.E. when lowerGEP is diabled) can still not handle this problem. HaoLiu: If check SequentialType, we also need to check wheter it is ConstantInt. Then the code will be…
		jingyueUnsubmitted Not Done Reply Inline Actions I'm confused. How does AccumulativeByteOffset in your snippet come up here? I think AccumulativeByteOffset is computed before calling transformToSimplerGEPs. Why do you bother changing that again? In case I haven't made this clear, I never expect Find/Extract always extracts all constant offsets. For example, given a = 1 + 2, I'm totally fine if the Find function finds only 2, and in that case the remaining code should treat 1 just as a variable remainder. The GEP after extraction may look like "gep base, i1, 1, i2" which may not be the optimal form, but again I am fine with that because it's sound. If we want to get the optimal form, I am looking for either a more comprehensive way (not just handling tricky cases like adding two constants; your initial diff might serve as a good start but the code was too complicated) or reassociating each index before this pass so that all constant offsets in the index are merged (which I suppose is the approach we want to go with). With that, do we still need the nested "if (!isa<CostantInt>)" check? Do you have an example where simply checking isa<SequentialType> isn't enough? I think a concrete example may help to make things clearer. jingyue: I'm confused. How does AccumulativeByteOffset in your snippet come up here? I think…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions Yeah, I understand your meaning. We just let the "1" as a remainder. It's sound because the logic of Find/Extract is to handle only one constant and return. It's reasonable. Previously I focused on making it optimal. So I want to extract all the constant. Anyway, this is just corner case and it may be fixed in the future. I'm fine with that. BTW, there is already a test case @test_const_add checking we generate correct result for "a = 1 + 2". HaoLiu: Yeah, I understand your meaning. We just let the "1" as a remainder. It's sound because the…
		if (Field != 0) {
		NeedsExtraction = true;
		AccumulativeByteOffset +=
		DL->getStructLayout(StTy)->getElementOffset(Field);
		}
		jingyueUnsubmitted Not Done Reply Inline Actions { at the end jingyue: { at the end
}		}
}		}
return AccumulativeByteOffset;		return AccumulativeByteOffset;
}		}
		jingyueUnsubmitted Not Done Reply Inline Actions } else { jingyue: } else {

		void SeparateConstOffsetFromGEP::lowerToSingleIndexGEPs(
		GetElementPtrInst *GEP, int64_t AccumulativeByteOffset) {
		IRBuilder<> Builder(GEP);
		Type *IntPtrTy = DL->getIntPtrType(GEP->getType());

		Type *I8PtrTy =
		Builder.getInt8PtrTy(GEP->getType()->getPointerAddressSpace());
		Value *ResultPtr = GEP->getOperand(0);
		if (ResultPtr->getType() != I8PtrTy)
		ResultPtr = Builder.CreateBitCast(ResultPtr, I8PtrTy);

		gep_type_iterator GTI = gep_type_begin(*GEP);
		// Create an ugly GEP for each sequential index. We don't create GEPs for
		jingyueUnsubmitted Not Done Reply Inline Actions The meaning of "simpler" here is unclear. jingyue: The meaning of "simpler" here is unclear.
		// structure indices, as they are accumulated in the constant offset index.
		for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {
		if (isa<SequentialType>(*GTI)) {
		Value *Idx = GEP->getOperand(I);
		// Skip zero indices.
		if (ConstantInt *CI = dyn_cast<ConstantInt>(Idx))
		if (CI->isZero())
		continue;

		APInt ElementSize = APInt(IntPtrTy->getIntegerBitWidth(),
		DL->getTypeAllocSize(GTI.getIndexedType()));
		// Scale the index by element size.
		if (ElementSize != 1) {
		if (ElementSize.isPowerOf2()) {
		Idx = Builder.CreateShl(
		Idx, ConstantInt::get(IntPtrTy, ElementSize.logBase2()));
		} else {
		Idx = Builder.CreateMul(Idx, ConstantInt::get(IntPtrTy, ElementSize));
		}
		}
		// Create an ugly GEP with single index for each index.
		ResultPtr = Builder.CreateGEP(ResultPtr, Idx, "uglygep");
		}
		}

		// Create a GEP with the constant offset index.
		if (AccumulativeByteOffset != 0) {
		Value *Offset = ConstantInt::get(IntPtrTy, AccumulativeByteOffset);
		ResultPtr = Builder.CreateGEP(ResultPtr, Offset, "uglygep");
		}
		if (ResultPtr->getType() != GEP->getType())
		ResultPtr = Builder.CreateBitCast(ResultPtr, GEP->getType());

		GEP->replaceAllUsesWith(ResultPtr);
		GEP->eraseFromParent();
		jingyueUnsubmitted Not Done Reply Inline Actions ditto. See Line 721 jingyue: ditto. See Line 721
		}

		void
		SeparateConstOffsetFromGEP::lowerToArithmetics(GetElementPtrInst *GEP,
		int64_t AccumulativeByteOffset) {
		jingyueUnsubmitted Not Done Reply Inline Actions ditto. curly brackets. jingyue: ditto. curly brackets.
		IRBuilder<> Builder(GEP);
		Type *IntPtrTy = DL->getIntPtrType(GEP->getType());

		Value *ResultPtr = Builder.CreatePtrToInt(GEP->getOperand(0), IntPtrTy);
		gep_type_iterator GTI = gep_type_begin(*GEP);
		// Create ADD/SHL/MUL arithmetic operations for each sequential indices. We
		// don't create arithmetics for structure indices, as they are accumulated
		// in the constant offset index.
		for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {
		if (isa<SequentialType>(*GTI)) {
		Value *Idx = GEP->getOperand(I);
		// Skip zero indices.
		if (ConstantInt *CI = dyn_cast<ConstantInt>(Idx))
		jingyueUnsubmitted Not Done Reply Inline Actions { jingyue: {
		if (CI->isZero())
		continue;

		APInt ElementSize = APInt(IntPtrTy->getIntegerBitWidth(),
		jingyueUnsubmitted Not Done Reply Inline Actions } jingyue: }
		DL->getTypeAllocSize(GTI.getIndexedType()));
		// Scale the index by element size.
		if (ElementSize != 1) {
		if (ElementSize.isPowerOf2()) {
		Idx = Builder.CreateShl(
		Idx, ConstantInt::get(IntPtrTy, ElementSize.logBase2()));
		} else {
		Idx = Builder.CreateMul(Idx, ConstantInt::get(IntPtrTy, ElementSize));
		}
		}
		// Create an ADD for each index.
		ResultPtr = Builder.CreateAdd(ResultPtr, Idx);
		}
		}

		// Create an ADD for the constant offset index.
		if (AccumulativeByteOffset != 0) {
		ResultPtr = Builder.CreateAdd(
		ResultPtr, ConstantInt::get(IntPtrTy, AccumulativeByteOffset));
		}

		ResultPtr = Builder.CreateIntToPtr(ResultPtr, GEP->getType());
		GEP->replaceAllUsesWith(ResultPtr);
		GEP->eraseFromParent();
		}

		jingyueUnsubmitted Not Done Reply Inline Actions Why do we need to check LowerGEP here? My understanding is, even when LowerGEP is set, +AccumulativeByteOffset will still become part of the lowered GEPs or arithmetics (see how transformToSimplerGEPs create a GEP with offset AccumulativeByteOffset, and how transformToArithmetics create an add with offset AccumulativeByteOffset). Shouldn't we still check whether this offset can be nicely folded into an addressing mode? jingyue: Why do we need to check LowerGEP here? My understanding is, even when LowerGEP is set…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions I think even a pointer plus a constant index can not be matched in an addressing mode, it is worth to lower a GEP with multiple indices. Because we can do optimizations on the variable indices. Also if the constant index is extracted from multiple indices, the result looks better than the original GEP. If I remove such check, some test cases will fail. So it is kept when disable LowerGEP. HaoLiu: I think even a pointer plus a constant index can not be matched in an addressing mode, it is…
		jingyueUnsubmitted Not Done Reply Inline Actions That makes some sense. Out of curiosity, which test would fail? jingyue: That makes some sense. Out of curiosity, which test would fail?
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions Oh, interesting. I supposed some of your test cases in test/Transforms/SeparateConstOffsetFromGEP/NVPTX may fail. But I tried removing such check and ran "make check", there is no test failures. So it seems there is no such test cases... HaoLiu: Oh, interesting. I supposed some of your test cases in…
		jingyueUnsubmitted Not Done Reply Inline Actions That was my suspicion. This check on the addressing mode is a very weak check: I believe both NVPTX and AArch64 support reg+<a potentially very large offset>, so I would be surprised that having/not having this check would cause tests to fail. You explained pretty well why it's still worth to lower a GEP even if address mode matching fails. I'd like to see that in the source code comments. Thanks! jingyue: That was my suspicion. This check on the addressing mode is a very weak check: I believe both…
bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {		bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
// Skip vector GEPs.		// Skip vector GEPs.
if (GEP->getType()->isVectorTy())		if (GEP->getType()->isVectorTy())
return false;		return false;

// The backend can already nicely handle the case where all indices are		// The backend can already nicely handle the case where all indices are
// constant.		// constant.
if (GEP->hasAllConstantIndices())		if (GEP->hasAllConstantIndices())
return false;		return false;
		jingyueUnsubmitted Not Done Reply Inline Actions // ... in each sequential index Emphasizing you only remove constant offsets from sequential indices is important, because the behavior here is slightly different from accumulateByteOffset which looks into both sequential and struct indices. You also need to explain why removing constant offsets in struct field index here is dangerous, and mention the transformXXX functions will take care of that. jingyue: // ... in each sequential index Emphasizing you only remove constant offsets from sequential…

bool Changed = canonicalizeArrayIndicesToPointerSize(GEP);		bool Changed = canonicalizeArrayIndicesToPointerSize(GEP);

bool NeedsExtraction;		bool NeedsExtraction;
int64_t AccumulativeByteOffset = accumulateByteOffset(GEP, NeedsExtraction);		int64_t AccumulativeByteOffset = accumulateByteOffset(GEP, NeedsExtraction);

if (!NeedsExtraction)		if (!NeedsExtraction)
return Changed;		return Changed;
// Before really splitting the GEP, check whether the backend supports the		// If LowerGEP is disabled, before really splitting the GEP, check whether the
// addressing mode we are about to produce. If no, this splitting probably		// backend supports the addressing mode we are about to produce. If no, this
// won't be beneficial.		// splitting probably won't be beneficial.
		// If LowerGEP is enabled, even the extracted constant offset can not match
		// the addressing mode, we can still do optimizations to other lowered parts
		// of variable indices. So we don't such check.
		jingyueUnsubmitted Not Done Reply Inline Actions The grammar of "So we don't such check" is wrong. => Therefore, we don't check for addressing modes in that case. jingyue: The grammar of "So we don't such check" is wrong. => Therefore, we don't check for addressing…
		if (!LowerGEP) {
TargetTransformInfo &TTI = getAnalysis<TargetTransformInfo>();		TargetTransformInfo &TTI = getAnalysis<TargetTransformInfo>();
if (!TTI.isLegalAddressingMode(GEP->getType()->getElementType(),		if (!TTI.isLegalAddressingMode(GEP->getType()->getElementType(),
		jingyueUnsubmitted Not Done Reply Inline Actions // we always don't => we don't remove struct field indices here "always don't" is not the right grammar. Besides, you do extract field indices later, so I'd weaken the word "never". jingyue: // we always don't => we don't remove struct field indices here "always don't" is not the…
/BaseGV=/nullptr, AccumulativeByteOffset,		/BaseGV=/nullptr, AccumulativeByteOffset,
		jingyueUnsubmitted Not Done Reply Inline Actions diable => LowerGEP is disabled jingyue: diable => LowerGEP is disabled
/HasBaseReg=/true, /Scale=/0)) {		/HasBaseReg=/true, /Scale=/0)) {
		jingyueUnsubmitted Not Done Reply Inline Actions enable LowerGEP => LowerGEP is enabled jingyue: enable LowerGEP => LowerGEP is enabled
return Changed;		return Changed;
		jingyueUnsubmitted Not Done Reply Inline Actions I wrote lowerToXXX in my reviews just for simplicity. However, in the source code, we should still use the full names. jingyue: I wrote lowerToXXX in my reviews just for simplicity. However, in the source code, we should…
}		}
		jingyueUnsubmitted Not Done Reply Inline Actions Similarly, this function should be merged into accumulateByteOffset. jingyue: Similarly, this function should be merged into accumulateByteOffset.
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions Make sence. It can be controlled by the LowerGEP. HaoLiu: Make sence. It can be controlled by the LowerGEP.
		}

// Remove the constant offset in each GEP index. The resultant GEP computes		// Remove the constant offset in each sequential index. The resultant GEP
// the variadic base.		// computes the variadic base.
		// Notice that we don't remove struct field indices here. If LowerGEP is
		// disabled, a structure index is not accumulated and we still use the old
		jingyueUnsubmitted Not Done Reply Inline Actions Run these transformXXX functions before GEP->setIsInBounds(false) seems wrong to me, because transformXXX may treat off-bound GEPs as inbound and add nsw flags incorrectly. jingyue: Run these transformXXX functions before GEP->setIsInBounds(false) seems wrong to me, because…
		// one. If LowerGEP is enabled, a structure index is accumulated in the
		// constant offset. The following lowerToSingleIndexGEPs and
		jingyueUnsubmitted Not Done Reply Inline Actions The following xxx and yyy => lowerToSingleIndexGEPs or lowerToArithmetics will later jingyue: The following xxx and yyy => lowerToSingleIndexGEPs or lowerToArithmetics will later
		// lowerToArithmetics will handle the constant offset and won't need a new
		jingyueUnsubmitted Not Done Reply Inline Actions "such form" is too obscure jingyue: "such form" is too obscure
		// structure index.
gep_type_iterator GTI = gep_type_begin(*GEP);		gep_type_iterator GTI = gep_type_begin(*GEP);
for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {		for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {
if (isa<SequentialType>(*GTI)) {		if (isa<SequentialType>(*GTI)) {
Value *NewIdx = nullptr;		// Splits this GEP index into a variadic part and a constant offset, and
// Tries to extract a constant offset from this GEP index.		// uses the variadic part as the new index.
int64_t ConstantOffset =		Value *NewIdx =
ConstantOffsetExtractor::Extract(GEP->getOperand(I), NewIdx, DL, GEP);		ConstantOffsetExtractor::Extract(GEP->getOperand(I), DL, GEP);
if (ConstantOffset != 0) {		if (NewIdx != nullptr) {
assert(NewIdx != nullptr &&
"ConstantOffset != 0 implies NewIdx is set");
GEP->setOperand(I, NewIdx);		GEP->setOperand(I, NewIdx);
}		}
}		}
		jingyueUnsubmitted Not Done Reply Inline Actions getZExtValue() returns uint64_t. jingyue: getZExtValue() returns uint64_t.
}		}

// Clear the inbounds attribute because the new index may be off-bound.		// Clear the inbounds attribute because the new index may be off-bound.
// e.g.,		// e.g.,
//		//
// b = add i64 a, 5		// b = add i64 a, 5
// addr = gep inbounds float* p, i64 b		// addr = gep inbounds float* p, i64 b
//		//
// is transformed to:		// is transformed to:
//		//
// addr2 = gep float* p, i64 a		// addr2 = gep float* p, i64 a
		jingyueUnsubmitted Not Done Reply Inline Actions Can we merge the two transformToXXX functions? The logics are pretty similar. jingyue: Can we merge the two transformToXXX functions? The logics are pretty similar.
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions I think we'd better not do merge. They logic look similar, but they do quite different things. If we want to merge them, we need to add checks like "if (TransformToSimplerGEP)" in many places (From the first for loop logic to the function end). That would be a quite hard work for other people to understand the code. HaoLiu: I think we'd better not do merge. They logic look similar, but they do quite different things.
		jingyueUnsubmitted Not Done Reply Inline Actions The more I look at these two functions, the more strongly I believe we should merge them. Especially, after you get rid of ResultIndices, you probably only need to check LowerGEP once in the merged function. No? jingyue: The more I look at these two functions, the more strongly I believe we should merge them.
		jingyueUnsubmitted Not Done Reply Inline Actions I don't see this comment addressed. jingyue: I don't see this comment addressed.
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions I still doubt about your suggestion. The diff 16027 gets ride of ResultIndices. They are still quite different and difficult to combine them. Lowering to arithmetic operations starts from creating ptrtoint, and then creates ADD/MUL/SHL, and ends with creating ADD and inttoptr. Lowering to GEPs starts from creating bitcast, and then creates MUL/SHL/GEPs, and ends with creating GEP and bitcast. HaoLiu: I still doubt about your suggestion. The diff 16027 gets ride of ResultIndices. They are still…
		jingyueUnsubmitted Not Done Reply Inline Actions Anyway, I'll probably do that by myself after this diff is in. jingyue: Anyway, I'll probably do that by myself after this diff is in.
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions OK, thanks. HaoLiu: OK, thanks.
// addr = gep float* addr2, i64 5		// addr = gep float* addr2, i64 5
//		//
// If a is -4, although the old index b is in bounds, the new index a is		// If a is -4, although the old index b is in bounds, the new index a is
// off-bound. http://llvm.org/docs/LangRef.html#id181 says "if the		// off-bound. http://llvm.org/docs/LangRef.html#id181 says "if the
// inbounds keyword is not present, the offsets are added to the base		// inbounds keyword is not present, the offsets are added to the base
// address with silently-wrapping two's complement arithmetic".		// address with silently-wrapping two's complement arithmetic".
// Therefore, the final code will be a semantically equivalent.		// Therefore, the final code will be a semantically equivalent.
//		//
// TODO(jingyue): do some range analysis to keep as many inbounds as		// TODO(jingyue): do some range analysis to keep as many inbounds as
// possible. GEPs with inbounds are more friendly to alias analysis.		// possible. GEPs with inbounds are more friendly to alias analysis.
GEP->setIsInBounds(false);		GEP->setIsInBounds(false);

		// Lowers a GEP to either GEPs with single index or arithmetic operations.
		if (LowerGEP) {
		// As currently BasicAA does not analyze ptrtoint/inttoptr, do not lower to
		// arithmetic operations if the target uses alias analysis in codegen.
		if (TM && TM->getSubtarget<TargetSubtargetInfo>().useAA())
		lowerToSingleIndexGEPs(GEP, AccumulativeByteOffset);
		else
		lowerToArithmetics(GEP, AccumulativeByteOffset);
		return true;
		}

		// No need to create another GEP if the accumulative byte offset is 0.
		if (AccumulativeByteOffset == 0)
		return true;

// Offsets the base with the accumulative byte offset.		// Offsets the base with the accumulative byte offset.
//		//
// %gep ; the base		// %gep ; the base
// ... %gep ...		// ... %gep ...
//		//
// => add the offset		// => add the offset
//		//
// %gep2 ; clone of %gep		// %gep2 ; clone of %gep
▲ Show 20 Lines • Show All 51 Lines • ▼ Show 20 Lines	Type *I8PtrTy = Type::getInt8PtrTy(GEP->getContext(),
GEP->getPointerAddressSpace());		GEP->getPointerAddressSpace());
NewGEP = new BitCastInst(NewGEP, I8PtrTy, "", GEP);		NewGEP = new BitCastInst(NewGEP, I8PtrTy, "", GEP);
NewGEP = GetElementPtrInst::Create(		NewGEP = GetElementPtrInst::Create(
NewGEP, ConstantInt::get(IntPtrTy, AccumulativeByteOffset, true),		NewGEP, ConstantInt::get(IntPtrTy, AccumulativeByteOffset, true),
"uglygep", GEP);		"uglygep", GEP);
if (GEP->getType() != I8PtrTy)		if (GEP->getType() != I8PtrTy)
NewGEP = new BitCastInst(NewGEP, GEP->getType(), GEP->getName(), GEP);		NewGEP = new BitCastInst(NewGEP, GEP->getType(), GEP->getName(), GEP);
}		}

		hfinkelUnsubmitted Not Done Reply Inline Actions I think that if the original GEP has the "inbounds" property, then the shl, mul here (and the same below plus the add in the ptrtoint/inttoptr code) can have the nsw/nuw flags. hfinkel: I think that if the original GEP has the "inbounds" property, then the shl, mul here (and the…
GEP->replaceAllUsesWith(NewGEP);		GEP->replaceAllUsesWith(NewGEP);
GEP->eraseFromParent();		GEP->eraseFromParent();

return true;		return true;
		jingyueUnsubmitted Not Done Reply Inline Actions Looks quite similar to splitGEP. Can we massage this into splitGEP, and guard the part that lowers GEP with if (SimplifyGEP)? jingyue: Looks quite similar to splitGEP. Can we massage this into splitGEP, and guard the part that…
}		}

bool SeparateConstOffsetFromGEP::runOnFunction(Function &F) {		bool SeparateConstOffsetFromGEP::runOnFunction(Function &F) {
if (DisableSeparateConstOffsetFromGEP)		if (DisableSeparateConstOffsetFromGEP)
return false;		return false;

bool Changed = false;		bool Changed = false;
for (Function::iterator B = F.begin(), BE = F.end(); B != BE; ++B) {		for (Function::iterator B = F.begin(), BE = F.end(); B != BE; ++B) {
for (BasicBlock::iterator I = B->begin(), IE = B->end(); I != IE; ) {		for (BasicBlock::iterator I = B->begin(), IE = B->end(); I != IE; ) {
if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I++)) {		if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I++)) {
Changed \|= splitGEP(GEP);		Changed \|= splitGEP(GEP);
}		}
// No need to split GEP ConstantExprs because all its indices are constant		// No need to split GEP ConstantExprs because all its indices are constant
// already.		// already.
}		}
}		}
return Changed;		return Changed;
}		}
		jingyueUnsubmitted Not Done Reply Inline Actions The comment here is confusing. It doesn't seem to explain what the following code does. jingyue: The comment here is confusing. It doesn't seem to explain what the following code does.
		jingyueUnsubmitted Not Done Reply Inline Actions Why name it OtherConst? jingyue: Why name it OtherConst?
		jingyueUnsubmitted Not Done Reply Inline Actions OK, now I see what you're doing here. According to our discussion in D5863, this looks like a temporary work around for the CFGSimplification issue, and will eventually go away. I don't think it's a good idea to hack this pass to work around that issue. I'd rather run instsimplify before SeparateConstOffset in the AArch64 backend, and add a FIXME there saying once the CFGSimplification issue is fixed we needn't run instsimplify any more. jingyue: OK, now I see what you're doing here. According to our discussion in D5863, this looks like a…
		jingyueUnsubmitted Not Done Reply Inline Actions After extractConstantAndSimplifyGEP is merged into splitGEP, the code here can be simplified. jingyue: After extractConstantAndSimplifyGEP is merged into splitGEP, the code here can be simplified.
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions I just don't think this is a temporary work. We cannot assume there is no situation like "a = 1+2". People also can create a test case of "a = 1 + 2" and feed to this pass. I mean people can call this pass without any other passes. So if we don't accumulate all none zero indices in this place, we'll get incorrect result. We want to " Remove the constant offset in each GEP index" and " Splits this GEP index into a variadic part and a constant offset" in this for loop. So the later part about transformation won't consider about the none zero constant indices, it assumes there is all varaible indices or zero indices. See that I only check "if (!isa<ConstantInt>(GEP->getOperand(I)))". So if we leave an index of "constant 1" behand, it will be ignored and we'll generate incorrect result. That's why I want to make sure we have accumulated all the none zero constant indices. HaoLiu: I just don't think this is a temporary work. We cannot assume there is no situation like "a =…
		jingyueUnsubmitted Not Done Reply Inline Actions I'm confused. Check my comment on the line if (!isa<ConstantInt>(GEP->getOperand(I))) and if my comment there makes sense I'll come back here. jingyue: I'm confused. Check my comment on the line ``` if (!isa<ConstantInt>(GEP->getOperand(I))) ```…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions I think this is the right place to do such thing. As the comments say, we want to "// Remove the constant offset in each GEP index." If we should leave some constant offset in index. Also, if we fix such things in the following lowerToXXX functions, the logic for the original splitGEP (I.E. when lowerGEP is diabled) can still not handle this problem. HaoLiu: I think this is the right place to do such thing. As the comments say, we want to "// Remove…

test/CodeGen/AArch64/aarch64-gep-opt.ll

This file was added.

				; RUN: llc -O3 -verify-machineinstrs %s -o - \| FileCheck %s
				; RUN: llc -O3 -print-after=codegenprepare -mcpu=cyclone < %s >%t 2>&1 && FileCheck --check-prefix=CHECK-NoAA <%t %s
				; RUN: llc -O3 -print-after=codegenprepare -mcpu=cortex-a53 < %s >%t 2>&1 && FileCheck --check-prefix=CHECK-UseAA <%t %s
				target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-linux-gnueabi"

				; Following test cases test enabling SeparateConstOffsetFromGEP pass in AArch64
				; backend. If useAA() returns true, it will transform a complex GEP into simpler
				; GEPs, else it will transform a complex GEP into a ptrtoint/inttoptr form.

				%struct = type { i32, i32, i32, i32, [20 x i32] }

				; Check that when two complex GEPs are used in two basic blocks, LLVM can
				; elimilate the common subexpression for the second use.
				define void @test_GEP_CSE([240 x %struct]* %string, i32* %adj, i32 %lib, i64 %idxprom) {
				%liberties = getelementptr [240 x %struct]* %string, i64 1, i64 %idxprom, i32 3
				%1 = load i32* %liberties, align 4
				%cmp = icmp eq i32 %1, %lib
				br i1 %cmp, label %if.then, label %if.end

				if.then: ; preds = %entry
				%origin = getelementptr [240 x %struct]* %string, i64 1, i64 %idxprom, i32 2
				%2 = load i32* %origin, align 4
				store i32 %2, i32* %adj, align 4
				br label %if.end

				if.end: ; preds = %if.then, %entry
				ret void
				}

				; CHECK-LABEL: test_GEP_CSE:
				; CHECK: madd
				; CHECK: ldr
				; CHECK-NOT: madd
				; CHECK:ldr

				; CHECK-NoAA-LABEL: @test_GEP_CSE(
				; CHECK-NoAA: [[PTR0:%[a-zA-Z0-9]+]] = ptrtoint [240 x %struct]* %string to i64
				; CHECK-NoAA: [[PTR1:%[a-zA-Z0-9]+]] = mul i64 %idxprom, 96
				; CHECK-NoAA: [[PTR2:%[a-zA-Z0-9]+]] = add i64 [[PTR0]], [[PTR1]]
				; CHECK-NoAA: add i64 [[PTR2]], 23052
				; CHECK-NoAA: inttoptr
				; CHECK-NoAA: if.then:
				; CHECK-NoAA-NOT: ptrtoint
				; CHECK-NoAA-NOT: mul
				; CHECK-NoAA: add i64 [[PTR2]], 23048
				; CHECK-NoAA: inttoptr

				; CHECK-UseAA-LABEL: @test_GEP_CSE(
				; CHECK-UseAA: [[PTR0:%[a-zA-Z0-9]+]] = bitcast [240 x %struct]* %string to i8*
				; CHECK-UseAA: [[IDX:%[a-zA-Z0-9]+]] = mul i64 %idxprom, 96
				; CHECK-UseAA: [[PTR1:%[a-zA-Z0-9]+]] = getelementptr i8* [[PTR0]], i64 [[IDX]]
				; CHECK-UseAA: getelementptr i8* [[PTR1]], i64 23052
				; CHECK-UseAA: bitcast
				; CHECK-UseAA: if.then:
				; CHECK-UseAA: getelementptr i8* [[PTR1]], i64 23048
				; CHECK-UseAA: bitcast

				%class.my = type { i32, [128 x i32], i32, [256 x %struct.pt]}
				%struct.pt = type { %struct.point*, i32, i32 }
				%struct.point = type { i32, i32 }

				; Check when a GEP is used across two basic block, LLVM can sink the address
				; calculation and code gen can generate a better addressing mode for the second
				; use.
				define void @test_GEP_across_BB(%class.my* %this, i64 %idx) {
				%1 = getelementptr %class.my* %this, i64 0, i32 3, i64 %idx, i32 1
				%2 = load i32* %1, align 4
				%3 = getelementptr %class.my* %this, i64 0, i32 3, i64 %idx, i32 2
				%4 = load i32* %3, align 4
				%5 = icmp eq i32 %2, %4
				br i1 %5, label %if.true, label %exit

				if.true:
				%6 = shl i32 %4, 1
				store i32 %6, i32* %3, align 4
				br label %exit

				exit:
				%7 = add nsw i32 %4, 1
				store i32 %7, i32* %1, align 4
				ret void
				}
				; CHECK-LABEL: test_GEP_across_BB:
				; CHECK: ldr {{w[0-9]+}}, [{{x[0-9]+}}, #528]
				; CHECK: ldr {{w[0-9]+}}, [{{x[0-9]+}}, #532]
				; CHECK-NOT: add
				; CHECK: str {{w[0-9]+}}, [{{x[0-9]+}}, #532]
				; CHECK: str {{w[0-9]+}}, [{{x[0-9]+}}, #528]

				; CHECK-NoAA-LABEL: test_GEP_across_BB(
				; CHECK-NoAA: add i64 [[TMP:%[a-zA-Z0-9]+]], 528
				; CHECK-NoAA: add i64 [[TMP]], 532
				; CHECK-NoAA: if.true:
				; CHECK-NoAA: {{%sunk[a-zA-Z0-9]+}} = add i64 [[TMP]], 532
				; CHECK-NoAA: exit:
				; CHECK-NoAA: {{%sunk[a-zA-Z0-9]+}} = add i64 [[TMP]], 528

				; CHECK-UseAA-LABEL: test_GEP_across_BB(
				; CHECK-UseAA: [[PTR0:%[a-zA-Z0-9]+]] = getelementptr
				; CHECK-UseAA: getelementptr i8* [[PTR0]], i64 528
				; CHECK-UseAA: getelementptr i8* [[PTR0]], i64 532
				; CHECK-UseAA: if.true:
				; CHECK-UseAA: {{%sunk[a-zA-Z0-9]+}} = getelementptr i8* [[PTR0]], i64 532
				; CHECK-UseAA: exit:
				; CHECK-UseAA: {{%sunk[a-zA-Z0-9]+}} = getelementptr i8* [[PTR0]], i64 528

				%struct.S = type { float, double }
				@struct_array = global [1024 x %struct.S] zeroinitializer, align 16

				; The following two test cases check we can extract constant from indices of
				; struct type.
				; The constant offsets are from indices "i64 %idxprom" and "i32 1". As the
				; alloca size of %struct.S is 16, and "i32 1" is the 2rd element whose field
				; offset is 8, the total constant offset is (5 * 16 + 8) = 88.
				define double* @test-struct_1(i32 %i) {
				entry:
				%add = add nsw i32 %i, 5
				%idxprom = sext i32 %add to i64
				%p = getelementptr [1024 x %struct.S]* @struct_array, i64 0, i64 %idxprom, i32 1
				ret double* %p
				}
				; CHECK-NoAA-LABEL: @test-struct_1(
				; CHECK-NoAA-NOT: getelementptr
				; CHECK-NoAA: add i64 %{{[a-zA-Z0-9]+}}, 88

				; CHECK-UseAA-LABEL: @test-struct_1(
				; CHECK-UseAA: getelementptr i8* %{{[a-zA-Z0-9]+}}, i64 88

				%struct3 = type { i64, i32 }
				%struct2 = type { %struct3, i32 }
				%struct1 = type { i64, %struct2 }
				%struct0 = type { i32, i32, i64*, [100 x %struct1] }

				; The constant offsets are from indices "i32 3", "i64 %arrayidx" and "i32 1".
				; "i32 3" is the 4th element whose field offset is 16. The alloca size of
				; %struct1 is 32. "i32 1" is the 2rd element whose field offset is 8. So the
				; total constant offset is 16 + (-2 * 32) + 8 = -40
				define %struct2* @test-struct_2(%struct0* %ptr, i64 %idx) {
				entry:
				%arrayidx = add nsw i64 %idx, -2
				%ptr2 = getelementptr %struct0* %ptr, i64 0, i32 3, i64 %arrayidx, i32 1
				ret %struct2* %ptr2
				}
				; CHECK-NoAA-LABEL: @test-struct_2(
				; CHECK-NoAA-NOT: = getelementptr
				; CHECK-NoAA: add i64 %{{[a-zA-Z0-9]+}}, -40

				; CHECK-UseAA-LABEL: @test-struct_2(
				; CHECK-UseAA: getelementptr i8* %{{[a-zA-Z0-9]+}}, i64 -40

				; Test that when a index is added from two constant, SeparateConstOffsetFromGEP
				; pass does not generate incorrect result.
				define void @test_const_add([3 x i32]* %in) {
				%inc = add nsw i32 2, 1
				%idxprom = sext i32 %inc to i64
				%arrayidx = getelementptr [3 x i32]* %in, i64 %idxprom, i64 2
				store i32 0, i32* %arrayidx, align 4
				ret void
				}
				; CHECK-LABEL: test_const_add:
				; CHECK: str wzr, [x0, #44]

test/CodeGen/AArch64/arm64-addr-mode-folding.ll

	; RUN: llc -O3 -mtriple arm64-apple-ios3 %s -o - \| FileCheck %s			; RUN: llc -O3 -mtriple arm64-apple-ios3 -aarch64-gep-opt=false %s -o - \| FileCheck %s
	; <rdar://problem/13621857>			; <rdar://problem/13621857>

	@block = common global i8* null, align 8			@block = common global i8* null, align 8

	define i32 @fct(i32 %i1, i32 %i2) {			define i32 @fct(i32 %i1, i32 %i2) {
	; CHECK: @fct			; CHECK: @fct
	; Sign extension is used more than once, thus it should not be folded.			; Sign extension is used more than once, thus it should not be folded.
	; CodeGenPrepare is not sharing sext across uses, thus this is folded because			; CodeGenPrepare is not sharing sext across uses, thus this is folded because
	▲ Show 20 Lines • Show All 162 Lines • Show Last 20 Lines

test/CodeGen/AArch64/arm64-cse.ll

	; RUN: llc -O3 < %s -aarch64-atomic-cfg-tidy=0 \| FileCheck %s			; RUN: llc -O3 < %s -aarch64-atomic-cfg-tidy=0 -aarch64-gep-opt=false \| FileCheck %s
	target triple = "arm64-apple-ios"			target triple = "arm64-apple-ios"

	; rdar://12462006			; rdar://12462006
	; CSE between "icmp reg reg" and "sub reg reg".			; CSE between "icmp reg reg" and "sub reg reg".
	; Both can be in the same basic block or in different basic blocks.			; Both can be in the same basic block or in different basic blocks.
	define i8* @t1(i8* %base, i32* nocapture %offset, i32 %size) nounwind {			define i8* @t1(i8* %base, i32* nocapture %offset, i32 %size) nounwind {
	entry:			entry:
	; CHECK-LABEL: t1:			; CHECK-LABEL: t1:
	▲ Show 20 Lines • Show All 50 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 16190

include/llvm/Transforms/Scalar.h

lib/Target/AArch64/AArch64TargetMachine.cpp

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp

test/CodeGen/AArch64/aarch64-gep-opt.ll

test/CodeGen/AArch64/arm64-addr-mode-folding.ll

test/CodeGen/AArch64/arm64-cse.ll

[AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.
ClosedPublic