This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.
ClosedPublic

Authored by • HaoLiu on Oct 19 2014, 11:39 PM.

Download Raw Diff

Details

Reviewers

ssijaric
t.p.northover
jingyue
mcrosier

Summary

Hi Tim, Jingyue and other reviewers,

This patch is based on http://reviews.llvm.org/D5863, which fixes some problems in the SeparateConstOffsetFromGEP pass. So please apply that patch first if you want to have a try.

We find LLVM cannot handle CSE well in GEPs (getelementptrs). The GEP can be very complex, it can has many mixed constant indices and variable indices. The variable indices may also contain other constants. And such complex GEPs are usually kept in CodeGen. But as CodeGen can only see in one basic block. For the GEPs across basic blocks (e.g, two GEPs used in two different basic blocks or one GEP used in two basic blocks), it may have CSE opportunities and will be missed.

Currently there is a pass called SeparateConstOffsetFromGEP, which can separate constant within variable indices and split a complex GEP into two GEPs. But this is not enough for the problem that GEPs across basic blocks I mentioned above.

So I improve this pass. It will separate constant within indices for both sequential and struct types. And most important is that it will also transform complex GEPs into a "ptrtoint + arithmetic + inttoptr" form, so that it is able to find CSE opportunities across basic blocks. Also it can benefit address sinking logic in CodeGenPrepare for those complex GEPs which can not be matched by addressing mode, as part of the address calculation can still be sunk. The address sinking logic can result in better addressing mode. EarlyCSE pass is called after this pass to do CSE. LICM pass is also called to do loop invariant code motion in case any of the address calculations are invariant.

If we don't find CSE opportunities after such arithmetic transformation, it still has no obvious regression on performance, as it will always do such transformation in CodeGen. We just move such transformation several passes ahead of CodeGen.

I tested the performance for A57 of AArch64 target on SPEC CPU2000 and SPEC CPU2006. It has no obvious regressions and has improvements on following benchmarks:
spec2006.473.astar 4.7%
spec2006.444.namd 3.0%
spec2006.445.gobmk 2.5%

For the benchmarks don't have obvious improvement, we can also see the address calculation and addressing mode are better from the assembly code.

For other targets like NVPTX, I can not test this patch. I think this patch can also benefit the performance, at least it has no regression.

Review please.

Thanks,
-Hao

Diff Detail

Event Timeline

• HaoLiu updated this revision to Diff 15133.Oct 19 2014, 11:39 PM

• HaoLiu retitled this revision from to [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend..

• HaoLiu updated this object.

• HaoLiu edited the test plan for this revision. (Show Details)

• HaoLiu added reviewers: t.p.northover, jingyue.

• HaoLiu added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. · View Herald TranscriptOct 19 2014, 11:39 PM

And most important is that it will also transform complex GEPs into a "ptrtoint + arithmetic + inttoptr" form, so that it is able to find CSE opportunities across basic blocks.

You'll need to disable this transformation when AA is enabled in CodeGen. You can do this just like CGP does (essentially by calling TM->getSubtarget<TargetSubtargetInfo>().useAA()). CGP splits the GEPs into simpler GEPs when AA is in use, and uses inttoptr otherwise. This all relates to the way that we use BasicAA to override TBAA and catch cases of basic type punning -- but BasicAA does not analyze inttoptr, and so can no longer fulfill this need after new inttoptrs are introduced. You could try using simpler GEPs like CGP does when AA is in use, or just disable this part of the transformation for now. Either is fine with me.

• HaoLiu updated this revision to Diff 15234.Oct 21 2014, 11:45 PM

• HaoLiu updated this revision to Diff 15235.Oct 21 2014, 11:57 PM

In D5864#5, @hfinkel wrote:

And most important is that it will also transform complex GEPs into a "ptrtoint + arithmetic + inttoptr" form, so that it is able to find CSE opportunities across basic blocks.

You'll need to disable this transformation when AA is enabled in CodeGen. You can do this just like CGP does (essentially by calling TM->getSubtarget<TargetSubtargetInfo>().useAA()). CGP splits the GEPs into simpler GEPs when AA is in use, and uses inttoptr otherwise. This all relates to the way that we use BasicAA to override TBAA and catch cases of basic type punning -- but BasicAA does not analyze inttoptr, and so can no longer fulfill this need after new inttoptrs are introduced. You could try using simpler GEPs like CGP does when AA is in use, or just disable this part of the transformation for now. Either is fine with me.

Hi Hal,

Thanks for the explanation on useAA(). We do transformation because using simpler GEPs to replace a complex GEP like CGP does has small impact. Such transformation is not in CGP, because we want to make use of the constant extraction in SeparateConstOffsetFromGEP pass, and we also want to call other passes like EarlyCSE to remove common sub-expressions after such transformation. For the performance concern, I prefer to enable such transformation in AArch64 backend.
But unfortunately, AArch64 backend enables AA for both cortex-a53 and cortex-a57:

bool useAA() const override { return isCortexA53() || isCortexA57(); }

This was introduced recently in http://reviews.llvm.org/D5103. I find most of the backend including X86,ARM doesn't enable AA. AArch64 backend also disables AA when CPU is cyclone. As Sanjin also says enabling AA for CortexA57 doesn't have performance changes in D5103, I think we can just remove CortexA57 from useAA(), so that we can get benefit from such transformation and has no regression.

For CortexA53, I don't test how much benefit from enabling AA in code gen. Maybe we can test following situations and find the best solution:

(1) disable AA, enable Transformation.
(2) enable AA, disable Transformation.
(3) enable AA and Transformation.

Anyway, such test need a lot of time. So I think we'd better only consider about A57, so I've attached a new patch (diff ID 15234) following your suggestion and remove A57 form useAA().

Review please.

Thanks,
-Hao

• HaoLiu added reviewers: ssijaric, mcrosier.Oct 22 2014, 12:09 AM

Hi James,

I think there are two main reasons this approach is outside of CGP:

(1) I want to make use of the constant extraction logic in SeparateConstOffsetFromGEP pass. It can find more constants

(2) CGP only sinks GEPs, it doesn’t do CSE. So I think it’s better to do such logic separately in SeparateConstOffsetFromGEP pass before CGP. This won’t affect the address sinking logic in CGP.

Thanks,

-Hao

From: mankeyrabbit@gmail.com [mailto:mankeyrabbit@gmail.com] On Behalf Of James Molloy
Sent: Wednesday, October 22, 2014 4:01 AM
To: reviews+D5864+public+2de5fd757ea4c578@reviews.llvm.org
Cc: Hao Liu; t.p.northover@gmail.com; jingyue@google.com; llvm-commits@cs.uiuc.edu
Subject: Re: [PATCH] [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.

Hi Hao,

It's great to finally see this go upstream!

When you started working on this, the approach was to modify the GEP lowering logic in CGP. Hal's comment has reminded me that your approach has moved away from that, so could you please explain why you're doing this outside of CGP, and why your solution does not duplicate the CGP lowering code?

How do they interact?

Cheers,

James

Original Message -----

From: "Hao Liu" <Hao.Liu@arm.com>
To: "Hao Liu" <Hao.Liu@arm.com>, "t p northover" <t.p.northover@gmail.com>, jingyue@google.com
Cc: hfinkel@anl.gov, "amara emerson" <amara.emerson@arm.com>, llvm-commits@cs.uiuc.edu
Sent: Wednesday, October 22, 2014 2:07:12 AM
Subject: Re: [PATCH] [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.

In D5864#5, @hfinkel wrote:

And most important is that it will also transform complex GEPs
into a "ptrtoint + arithmetic + inttoptr" form, so that it is
able to find CSE opportunities across basic blocks.

You'll need to disable this transformation when AA is enabled in
CodeGen. You can do this just like CGP does (essentially by
calling TM->getSubtarget<TargetSubtargetInfo>().useAA()). CGP
splits the GEPs into simpler GEPs when AA is in use, and uses
inttoptr otherwise. This all relates to the way that we use
BasicAA to override TBAA and catch cases of basic type punning --
but BasicAA does not analyze inttoptr, and so can no longer
fulfill this need after new inttoptrs are introduced. You could
try using simpler GEPs like CGP does when AA is in use, or just
disable this part of the transformation for now. Either is fine
with me.

Hi Hal,

Thanks for the explanation on useAA(). We do transformation because
using simpler GEPs to replace a complex GEP like CGP does has small
impact. Such transformation is not in CGP, because we want to make
use of the constant extraction in SeparateConstOffsetFromGEP pass,
and we also want to call other passes like EarlyCSE to remove common
sub-expressions after such transformation. For the performance
concern, I prefer to enable such transformation in AArch64 backend.
But unfortunately, AArch64 backend enables AA for both cortex-a53 and
cortex-a57:
bool useAA() const override { return isCortexA53() ||
isCortexA57(); }
This was introduced recently in http://reviews.llvm.org/D5103. I find
most of the backend including X86,ARM doesn't enable AA. AArch64
backend also disables AA when CPU is cyclone. As Sanjin also says
enabling AA for CortexA57 doesn't have performance changes in D5103,
I think we can just remove CortexA57 from useAA(), so that we can
get benefit from such transformation and has no regression.

For CortexA53, I don't test how much benefit from enabling AA in code
gen. Maybe we can test following situations and find the best
solution:
(1) disable AA, enable Transformation.
(2) enable AA, disable Transformation.
(3) enable AA and Transformation.
Anyway, such test need a lot of time. So I think we'd better only
consider about A57, so I've attached a new patch (diff ID 15234)
following your suggestion and remove A57 form useAA().

Hi Hao,

This is not quite what I had in mind... when I said that we should disable the transformation, I did not mean that you should disable the entire pass, but rather, that you should only disable the part of SeparateConstOffsetFromGEP that breaks apart GEPs into ptrtoint/inttoptr when useAA is true like CGP does. Does SeparateConstOffsetFromGEP not convey benefits without this part of the transformation? Can you try implementing the same thing by breaking the GEPs into simpler ones like CGP does when useAA is true?

As a general note, while I understand why CGP breaks GEPs into ptrtoint/inttoptr, I'd like to move the generic CodeGen passes away from doing that. If you look at CGP, implementing this in terms of GEPs is pretty straightforward. When testing on x86 using -addr-sink-using-gep (even with useAA false), performance generally improves over using the ptrtoint/inttoptr (although there are a couple of regressions I need to investigate when I have some time). As a quasi-separate issue, I'm curious to know what using -addr-sink-using-gep does on ARM/AArch64. Regardless, I'm excited to see this work because I'd like to make SeparateConstOffsetFromGEP available to more backends -- including PowerPC that uses AA more aggressively -- and so I don't want the pass itself to introduce ptrtoint/inttoptr (at least not when useAA is true).

Thanks again,
Hal

Review please.

Thanks,
-Hao

http://reviews.llvm.org/D5864

Sorry, a wrong patch is attached, it should check not useAA()... So add another patch to fix this problem.

This is not quite what I had in mind... when I said that we should disable the transformation, I did not mean that you should disable the entire pass, but rather, that you should only disable the part of SeparateConstOffsetFromGEP that breaks apart GEPs into ptrtoint/inttoptr when useAA is true like CGP does. Does SeparateConstOffsetFromGEP not convey benefits without this part of the transformation? Can you try implementing the same thing by breaking the GEPs into simpler ones like CGP does when useAA is true?

Hi Hal,

I think I understand your point now. I've attached a new patch. It will do arithmetic transformation only when the useAA() returns false, or it will transform a complex GEP into several simpler GEPs like CGP does.
The correctness has been tested. The performance test has not finished yet. But the assembly code looks similar to the old patch for cortex-a57. If the result is better than the old patch, I'll remove the logic about transformation to ptrtoint/inttoptr.

Review please.

Thanks,
-Hao

This generally looks good to me. It would be good to have someone expert in this pass to also look it over.

the performance test has not finished yet. But the assembly code looks similar to the old patch for cortex-a57. If the result is better than the old patch, I'll remove the logic about transformation to ptrtoint/inttoptr.

I am curious how this turns out; please let us know.

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
86	Even we can extract constant from GEPs -> Even if we can extract constants from GEPs
88	algorithm
984	I think that if the original GEP has the "inbounds" property, then the shl, mul here (and the same below plus the add in the ptrtoint/inttoptr code) can have the nsw/nuw flags.

• HaoLiu updated this revision to Diff 15380.Oct 24 2014, 1:15 AM

Hi Hal,

A new patch has been attached according to your comments.
BTW. For the inbounds issue, the CGP just ignores it when creating new GEP/MUL/ADD, so I think maybe it is a minor problem in CGP.

In D5864#17, @hfinkel wrote:

This generally looks good to me. It would be good to have someone expert in this pass to also look it over.

the performance test has not finished yet. But the assembly code looks similar to the old patch for cortex-a57. If the result is better than the old patch, I'll remove the logic about transformation to ptrtoint/inttoptr.

I am curious how this turns out; please let us know.

I think this is because ptrtoint/inttoptr is similar to GEP. According to http://llvm.org/docs/GetElementPtr.html#how-is-gep-different-from-ptrtoint-arithmetic-and-inttoptr, the most difference is that GEP carries additional pointer aliasing rules. Also as the AA used in CodeGen doesn't change to much thing, so the assembly code is similar. It's just my guess. Two version still has much difference in instruction order.

Thanks,
-Hao

Original Message -----

From: "Hao Liu" <Hao.Liu@arm.com>
To: "Hao Liu" <Hao.Liu@arm.com>, "t p northover" <t.p.northover@gmail.com>, jingyue@google.com,
ssijaric@codeaurora.org, mcrosier@codeaurora.org
Cc: hfinkel@anl.gov, "amara emerson" <amara.emerson@arm.com>, llvm-commits@cs.uiuc.edu
Sent: Friday, October 24, 2014 3:25:14 AM
Subject: Re: [PATCH] [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.

Hi Hal,

A new patch has been attached according to your comments.
BTW. For the inbounds issue, the CGP just ignores it when creating
new GEP/MUL/ADD, so I think maybe it is a minor problem in CGP.

FWIW, we only recently started propagating NSW/NUW into the SelectionDAG layer. We should add a FIXME (or just fix this) in CGP too.

In D5864#17, @hfinkel wrote:

This generally looks good to me. It would be good to have someone
expert in this pass to also look it over.

the performance test has not finished yet. But the assembly code
looks similar to the old patch for cortex-a57. If the result is
better than the old patch, I'll remove the logic about
transformation to ptrtoint/inttoptr.

I am curious how this turns out; please let us know.

I think this is because ptrtoint/inttoptr is similar to GEP.
According to
http://llvm.org/docs/GetElementPtr.html#how-is-gep-different-from-ptrtoint-arithmetic-and-inttoptr,
the most difference is that GEP carries additional pointer aliasing
rules. Also as the AA used in CodeGen doesn't change to much thing,
so the assembly code is similar. It's just my guess. Two version
still has much difference in instruction order.

Yep; I recommend testing the difference independently of enabling AA. (In CGP, as I imagine you noticed, I also have a separate flag for this purpose).

-Hal

Thanks,
-Hao

http://reviews.llvm.org/D5864

I just noticed I was put on this. Sorry for the delay, and I'll review it today.

Jingyue

The performance result shows disable useAA for CortexA57 can have more improvement. The performance improvement result for latest trunk with this patch on CortexA57 are as follows:
Version 1 enable useAA() for CortexA57:

spec.cpu2006.ref.444_namd	-4.56%
spec.cpu2006.ref.473_astar	-3.79%
spec.cpu2006.ref.445_gobmk	-2.91%

Version 2 disable useAA() for CortexA57:

spec.cpu2006.ref.473_astar	-4.84%
spec.cpu2006.ref.444_namd	-4.39%
spec.cpu2006.ref.445_gobmk	-2.80%

Both versions have obvious benefit. But the result on astar of version 2 is much better than version 1. So I've attached a patch removing CortexA57 from useAA(). useAA() is beneficial for in-order target, but it has no obvious improvement on out of order cpu. Currently cyclone, x86 also doesn't useAA(). So I think we don't need useAA for CortexA57.

Hi James,

From: mankeyrabbit@gmail.com [mailto:mankeyrabbit@gmail.com] On Behalf Of James Molloy
Sent: Monday, October 27, 2014 11:38 PM
To: reviews+D5864+public+2de5fd757ea4c578@reviews.llvm.org
Cc: Hao Liu; Tim Northover; ssijaric@codeaurora.org; Chad Rosier; Jingyue Wu; LLVM Commits
Subject: Re: [PATCH] [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.

Hi Hao,

I'm not sure about this. Firstly I have concerns about the statistical significance of the difference in results - we've seen swings of +/- 1% on A57, and here we're talking about a 0.5% difference. Are you satisfied that the difference is genuine?

[Hao Liu] I’m also curious about the result. I suspect both versions should have not big difference. But I think the difference is genuine. The benchmark astar is quite stable and should never have variance about +/- 1%.

Secondly, I don't see a good reason why useAA should be worse than !useAA. To my mind, if there is a regression when useAA is enabled, it should be one of two things:

A legitimate regression, in which case let's fix useAA!
Random code permutation outside of the compiler's model. If this is the case we shouldn't be adjusting the compiler to compensate.

useAA has the potential to improve codegen. We're not really using it at the moment, but it's good that it's available to use in the future. If we rip it out, we should have a good reason for it, I think!

Can I suggest that you make the initial commit, which improves performance, then have a look at why !useAA is better and fix that incrementally?

[Hao Liu] OK, I’d like to do that. Thanks.

Cheers,

James

A new patch adds several lines reviewed in D5863 and does some code reformat work to make it more readable.

Thanks,
-Hao

My general concern is the part extracting constant struct field index seems wrong. Unlike arrays where every elements share the same type, structure fields can have different types. Therefore, changing a field index may change the intermediate type in a pointer arithmetic, causing incorrect IR.

For example, consider

struct S {
  int a[4]; // sizeof(int) = 4
  double b[8]; // sizeof(double) = 8
};

%s = alloca struct %S
%p = getelementptr %s, 0, 1, i ; &s.b[i] = s + 4*sizeof(int) + i*sizeof(double)

If you extract the 1, the gep becomes:

%p = getelementptr %s, 0, 0, i; &s.a[i] = s + i*sizeof(int)

and the constant offset is 4 * sizeof(int) = 16.

As you probably see already,

s + 4*sizeof(int) + i*sizeof(double) != s + 4*sizeof(int) + i*sizeof(int)

Does this make sense to you? What is a good way to fix this?

I also have a bunch of nit picks inlined.

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
88	Nice explanation! However, in the final version, you may want to merge these improvements into the old comments. You can mention the improvements in the summary of the revision.
185	//, or null if we cannot extract a constant offset.
319	typo: the the given => the given a constant offsets => a constant offset wheter => whether depends on => depending on use => uses several simpler GEPs or arithmetic operations
331	a better name would be transformToArithmetics because you are using not only adds but also muls
364	Add a comment on what SimplifyGEP means.
642	If (ConstantOffset == 0) return NewIdx; // ... return Extractor.rebuild...
878	getZExtValue() returns uint64_t.
1019	The comment here is confusing. It doesn't seem to explain what the following code does.
1021	Why name it OtherConst?

Hi Jingyue,

My general concern is the part extracting constant struct field index seems wrong. Unlike arrays where every elements share the same type, structure fields can have different types. Therefore, changing a field index may change the intermediate type in a pointer arithmetic, causing incorrect IR.

For example, consider
struct S {
  int a[4]; // sizeof(int) = 4
  double b[8]; // sizeof(double) = 8
};

%s = alloca struct %S
%p = getelementptr %s, 0, 1, i ; &s.b[i] = s + 4*sizeof(int) + i*sizeof(double)
If you extract the 1, the gep becomes:
%p = getelementptr %s, 0, 0, i; &s.a[i] = s + i*sizeof(int)
and the constant offset is 4 * sizeof(int) = 16.

As you probably see already,
s + 4*sizeof(int) + i*sizeof(double) != s + 4*sizeof(int) + i*sizeof(int)
Does this make sense to you? What is a good way to fix this?

Yes, your concern is reasonable. So to extract a index of structure type, we should not keep the original gep form of 3 indices.
For your case about "%p = getelementptr %s, 0, 1, i", we can transform it to either of two simpler forms:
1st Form: 2 simpler GEPs

%i_offset = mul %i, sizeof(double)
%s_i8 = bitcast %s to i8*
%s_i = getelementptr i8* %s_i8, %i_offset
%s_i_c = getelementptr i8* %s_i, 4*sizeof(int)
%result = bitcast %s_i_c to double*

Bitcast can be ignored, the original GEP with 3 indices are transformed into 1 MUL/SHL and 2 simpler GEPs with 1 index. If we find constant within the index "%i", it can also be added into the last GEP.

2nd Form: ptrtoint+arithmetic+inttoptr

%ptrint = ptrtoint %s to i64
%i_offset = mul %i, sizeof(double)
%s_i = add %ptrint, %i_offset
%s_i_c = add %ptrint, 4*sizeof(int)
%result = inttoptr %s_i_c to double*

This is similar, ptrtoint/inttoptr can be ignored, we transform the original GEP with 3 indices into 1 MUL/SHL and 2 ADDs.

The main benefit is that
(1) We can always extract the indices of struct types.
(2) We can do CSE easily on simpler GEPs and MUL/SHL/ADDs.

Such transformation is similar to the address sinking login in CGP pass, which also transform GEP to one of such forms by checking useAA(). For the correctness and performance, several benchmarks has been tested on AArch64 backend. I cannot test NVPTX, but I think if the test cases have complex address calculation, it can also get benefit.

Thanks,
-Hao

Add a new patch refactored according to Jingyue's comments.
Review please.

Thanks,
-Hao

Regarding the correctness issue with struct field indices, I think the solutions you proposed make sense. Look forward to your fix in the next diff. We should choose which solution to go with depending on useAA.

The logic looks good in general except the field index issue. My inline comments are mostly on refactoring.

lib/Target/AArch64/AArch64TargetMachine.cpp
208	I'd explain why running GEP reassociation can expose more opportunities for LICM. I guess only the split part (splitting a GEP with multiple indices into a set of smaller GEPs) buys you such opportunities. Right?
lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
82	Remove superfluous blank lines.
83	// Another improvement enabled by the SimplifyGEP flag is ...
93	I don't understand reason 2. Can you explain more? Are you saying, given a[i][j1] and a[i][j2], you can reuse the common part &a[i]?
365	What would be a good name? You are not really simplifying but to some extend complicating GEPs :) Does LowerGEP sound good to you? You are lowering a GEP into multiple uglygeps (btw, uglygep is a known concept in LLVM) or arithmetic operations which are closer to machine code. Make sure the comments are consistent after renaming.
857	Similarly, this function should be merged into accumulateByteOffset.
889	Can we merge the two transformToXXX functions? The logics are pretty similar.
988	Looks quite similar to splitGEP. Can we massage this into splitGEP, and guard the part that lowers GEP with if (SimplifyGEP)?
1016	OK, now I see what you're doing here. According to our discussion in D5863, this looks like a temporary work around for the CFGSimplification issue, and will eventually go away. I don't think it's a good idea to hack this pass to work around that issue. I'd rather run instsimplify before SeparateConstOffset in the AArch64 backend, and add a FIXME there saying once the CFGSimplification issue is fixed we needn't run instsimplify any more.
1050	After extractConstantAndSimplifyGEP is merged into splitGEP, the code here can be simplified.

Attach a new patch refactored according to Jingyue's comments. Review please.

Thanks,
-Hao

Regarding the correctness issue with struct field indices, I think the solutions you proposed make sense. Look forward to your fix in the next diff.

Hi Jingyue,

I'm just a bit confused about the "fix" you mean. Did you mean the solution to extract constant in the structure index? The solution is already in the diff.
I'm only not sure about the "a = 1+2" problem, and I've replied to your comment. For other problems, I've fixed in the attached diff.

Thanks,
-Hao

lib/Target/AArch64/AArch64TargetMachine.cpp
208	Yes. The whole GEP may not be invariant, but after splitting, we can move the invariant part and leave the variant part in the loop.
lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
93	Yes, GEPs like "getelementptr %ptr, i, j1" and "getelementptr %ptr, i, j2" can not be eliminated. But transformed into simpler forms, "i" and "j1" "j2" are separated, and we can find the common expressions about "i".
365	That make sence.
857	Make sence. It can be controlled by the LowerGEP.
889	I think we'd better not do merge. They logic look similar, but they do quite different things. If we want to merge them, we need to add checks like "if (TransformToSimplerGEP)" in many places (From the first for loop logic to the function end). That would be a quite hard work for other people to understand the code.
1016	I just don't think this is a temporary work. We cannot assume there is no situation like "a = 1+2". People also can create a test case of "a = 1 + 2" and feed to this pass. I mean people can call this pass without any other passes. So if we don't accumulate all none zero indices in this place, we'll get incorrect result. We want to " Remove the constant offset in each GEP index" and " Splits this GEP index into a variadic part and a constant offset" in this for loop. So the later part about transformation won't consider about the none zero constant indices, it assumes there is all varaible indices or zero indices. See that I only check "if (!isa<ConstantInt>(GEP->getOperand(I)))". So if we leave an index of "constant 1" behand, it will be ignored and we'll generate incorrect result. That's why I want to make sure we have accumulated all the none zero constant indices.

I mean the issue we spotted previously with the following example:

struct S {
  int a[4]; // sizeof(int) = 4
  double b[8]; // sizeof(double) = 8
};

%s = alloca struct %S
%p = getelementptr %s, 0, 1, i ; &s.b[i] = s + 4*sizeof(int) + i*sizeof(double)

I thought we agreed that your diff had a correctness issue with this example, and I didn't see any change to fix that issue in your latest diff. Am I missing something?

• HaoLiu added a comment.Nov 2 2014, 5:20 PM

This comment was removed by • HaoLiu.

In D5864#35, @jingyue wrote:
I mean the issue we spotted previously with the following example:
struct S {
  int a[4]; // sizeof(int) = 4
  double b[8]; // sizeof(double) = 8
};

%s = alloca struct %S
%p = getelementptr %s, 0, 1, i ; &s.b[i] = s + 4*sizeof(int) + i*sizeof(double)
I thought we agreed that your diff had a correctness issue with this example, and I didn't see any change to fix that issue in your latest diff. Am I missing something?

Sorry, my words misled you. I said "we can transform it to either of two simpler forms", I meant this patch had already done like that.
The correctness issue has already been solved in the first patch. See more details in transformToSimplerGEPs(), which does the same as my explanation. Transformation to simpler GEPs is similar to transformation to arithmetic operations. It just replace ADDs by GEPs. So your concern about correctness issue has already been resolved. Also I have two test cases to test structure indices.

Thanks,
-Hao

ping...

I see what's going on: you only extract constant offsets from array indices, and leave struct indices in those transformXXX functions. Thanks for clarification! I think that's worth some comment though, and I will point that out in my inline comments.

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
82	The word "simpler" is overloaded. I'd explicitly say something like: // to lower a GEP with multiple indices into multiple GEPs with a single index or arithmetic operations ... Is this more accurate?
102	The sentence "another simpler form of simpler GEPs is similar" is a little awkward. The word "simple" is very overloaded. What do you mean exactly by "simpler"? Consider using multiple sentences and/or an example to explain what this lowering to GEPs exactly mean.
125	This => Lowering GEPs
126	The word "optimize" is overloaded. Is hoist/sink a better word?
131	Such transformation can split it => Lowering GEPs can split such GEPs
184	You may want to say Find looks for constant offsets in both array indices and field indices. Because the signature of Extract is changed, you need to remove the comment "The meaning of the arguments and the return value ..." and explicitly explain what its arguments and return value mean.
313	I don't like the word "transform" and "simpler". To be more accurate, I'd use lower instead of transform. What do you exactly mean by "simpler"? Are you lowering a GEP with multiple indices to multiple GEPs with a single index? If that's the case, I'd say that explicitly in order to be more accurate.
319	ditto
323	will find => finds will only find => only finds else => ; otherwise,
324	none zero => non-zero
345	/// Whether to transform
713	If you buy that transformXXX should run after setIsInBounds(false), you needn't check inbound in this function.
717	These are essentially byte offsets instead of indices, because they are scaled by element sizes. Do we have a better name? ByteOffsetOfIndex?
717	Actually, is this vector even necessary? Can you build the resultant uglygep along the way?
719	vairable => variable
721	Why checking whether it's a constant? Shouldn't you check whether it is an array index? My understanding is, the offsets of structure indices (which are guaranteed to be constant) and the constant offsets of array indices are already counted in accumulativeByteOffsets, and now you only need to rebuild the variadic part of the array indices.
726	{ at the end
730	} else {
744	The meaning of "simpler" here is unclear.
779	ditto. See Line 721
784	ditto. curly brackets.
797	{
801	}
827	Why do we need to check LowerGEP here? My understanding is, even when LowerGEP is set, +AccumulativeByteOffset will still become part of the lowered GEPs or arithmetics (see how transformToSimplerGEPs create a GEP with offset AccumulativeByteOffset, and how transformToArithmetics create an add with offset AccumulativeByteOffset). Shouldn't we still check whether this offset can be nicely folded into an addressing mode?
836	// ... in each sequential index Emphasizing you only remove constant offsets from sequential indices is important, because the behavior here is slightly different from accumulateByteOffset which looks into both sequential and struct indices. You also need to explain why removing constant offsets in struct field index here is dangerous, and mention the transformXXX functions will take care of that.
863	Run these transformXXX functions before GEP->setIsInBounds(false) seems wrong to me, because transformXXX may treat off-bound GEPs as inbound and add nsw flags incorrectly.
866	"such form" is too obscure
889	The more I look at these two functions, the more strongly I believe we should merge them. Especially, after you get rid of ResultIndices, you probably only need to check LowerGEP once in the merged function. No?
1016	I'm confused. Check my comment on the line if (!isa<ConstantInt>(GEP->getOperand(I))) and if my comment there makes sense I'll come back here.

In D5864#39, @jingyue wrote:

I see what's going on: you only extract constant offsets from array indices, and leave struct indices in those transformXXX functions. Thanks for clarification! I think that's worth some comment though, and I will point that out in my inline comments.

Hi Jingyue,

Thanks for your detailed comments, which are really very helpful. I've refactored the patch and attached.
Review please.

Thanks,
-Hao

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
721	Yes, you are right. I'll add comments to explain this.
827	I think even a pointer plus a constant index can not be matched in an addressing mode, it is worth to lower a GEP with multiple indices. Because we can do optimizations on the variable indices. Also if the constant index is extracted from multiple indices, the result looks better than the original GEP. If I remove such check, some test cases will fail. So it is kept when disable LowerGEP.

jingyue added inline comments.Nov 11 2014, 10:56 AM

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
22	// We cannot eliminate the partially common subexpression getelementptr [10 x %struct]* %ptr, i64 %i
184	I don't see the first part of this comment addressed. I'd like to emphasize "both array indices and field indices".
720	Not critical, but I slightly prefer getZExtValue() because a field index is non-negative and getElementOffset takes unsigned.
721	I wasn't just proposing a comment change. I meant the source code should read something like if (isa<SequentialType>(GTI)) You are building new GEPs/arithmetics for the remainder of the array indices, and checking whether the indexed type is a SequentialType matches your intention much better. I also believe checking for SequentialType instead of ConstantInt solves your previous question on handling "a = 1 + 2". I'll go back to that later.
827	That makes some sense. Out of curiosity, which test would fail?
853	// we always don't => we don't remove struct field indices here "always don't" is not the right grammar. Besides, you do extract field indices later, so I'd weaken the word "never".
854	diable => LowerGEP is disabled
855	enable LowerGEP => LowerGEP is enabled
856	I wrote lowerToXXX in my reviews just for simplicity. However, in the source code, we should still use the full names.
889	I don't see this comment addressed.

jingyue requested changes to this revision.Nov 11 2014, 10:56 AM

jingyue edited edge metadata.

This revision now requires changes to proceed.Nov 11 2014, 10:56 AM

Hi Hao,

could you share your current performance data + options you use? With your patch I see small gains on astar + xalancbmk, but a ~4% loss on libquantum. The data is based on the 11/07 trunk on top of r221549, O3 + LTO, cyclone, ref input.
In addition to cortex, could you run on x86 also?

Thanks
Gerolf

In D5864#45, @Gerolf wrote:

Hi Hao,

could you share your current performance data + options you use? With your patch I see small gains on astar + xalancbmk, but a ~4% loss on libquantum. The data is based on the 11/07 trunk on top of r221549, O3 + LTO, cyclone, ref input.
In addition to cortex, could you run on x86 also?

Thanks
Gerolf

Hi Gerolf,

I use "-O3 -ffast-math -mcpu=cortex-a57" and ref input. My last run was on the 10/24 trunk on top of r220484 (To ignore the impact of trunk changes, I always run and compare to the specific revision). I've ran the performance test for about 4 times on this specific revision. To ignore noise and variances, I remove those benchmark who doesn't exist in every performance runs. Then I find there is no obvious regressions and have stable improvements on following 3 benchmarks (This is just results from one run, other runs have similar results):

SPEC2006.473.astar          -4.84%
SPEC2006.444.namd           -4.39%
SPEC2006.445.gobmk          -2.80%

The percentage number is calculated by: 100% - current_time / reference_time.

It's quite strange that your results have many differences from mine on cyclone. libquantum is a quite stable benchmark on our test infrastructure, and I never see performance changes on it with my patch. So I always think my patch doesn't affect its performance.

I can not test cyclone, so I run on x86 with today's trunk with and without my patch (I enabled the SeparateConstOffsetFromGEP pass on x86 backend similarly to AArch64 backend). I've ran for about 8 times on libquantum. This benchmark is not stable on my x86 machine. The results have +/-1% variances comparing to the average results with or without my patch. But the average results are similar with or without my patch. So I think my patch won't affect the performance of libquantum on x86. I've ran a performance test for all the int/fp benchmarks in SPEC 2006. Just haven't got result yet. As it needs quite a long time for the result. If it is fortunate, I'll get the result tomorrow and show to you.
So I wonder whether libquantum is stable on cyclone. If it is not stable, maybe you can run it several times to ignore the variance. Or if it is real regression, we can then investigate it.

Do you have more details about the results. I'm interested in how many improvements on astar. Also I'm interested in why there is no improvement on namd and gobmk. They are supposed to get some benefits from this patch.

Thanks,
-Hao

Hi Gerolf,

Why do you want to see the results on x86? SeparateConstOffsetFromGEP is
only used by NVPTX and AArch64 so far. I don't think it will affect the
compilation time or the compiled code on x86.

Jingyue

• HaoLiu updated this revision to Diff 16135.Nov 12 2014, 11:57 PM

• HaoLiu edited edge metadata.

Hi Jingyue,

I've attached a new patch according to the review comments.
Review please.

Thanks,
-Hao

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
184	I think "both array indices and field indices" means all indices. We don't need to explain this common behavior. It is quite obvious. I think we need comments if we only find in one kind of "array indices" or "field indices", or do something special. Anyway, the new comments will be like the comments for Extract()
720	Aha, previously I use getZExtValue(). You commented I should change to getSExtValue(). I thought it was not critical, so I modified it. Now, It seems I should roll back... : )
721	If check SequentialType, we also need to check wheter it is ConstantInt. Then the code will be like: if (isa<SequentialType>(GTI)) if (!isa<ConstantInt>(GEP->getOperand(I)) ... else AccumulativeByteOffset += Const This looks more complicate. About the "a = 1 + 2", I think handling it when extracting constant in indices is better. Then we don't need to add code in both lowerToXXX functions. Also if we handle it in lowerToXXX functions, the logic for the original splitGEP (I.E. when lowerGEP is diabled) can still not handle this problem.
827	Oh, interesting. I supposed some of your test cases in test/Transforms/SeparateConstOffsetFromGEP/NVPTX may fail. But I tried removing such check and ran "make check", there is no test failures. So it seems there is no such test cases...
889	I still doubt about your suggestion. The diff 16027 gets ride of ResultIndices. They are still quite different and difficult to combine them. Lowering to arithmetic operations starts from creating ptrtoint, and then creates ADD/MUL/SHL, and ends with creating ADD and inttoptr. Lowering to GEPs starts from creating bitcast, and then creates MUL/SHL/GEPs, and ends with creating GEP and bitcast.
1016	I think this is the right place to do such thing. As the comments say, we want to "// Remove the constant offset in each GEP index." If we should leave some constant offset in index. Also, if we fix such things in the following lowerToXXX functions, the logic for the original splitGEP (I.E. when lowerGEP is diabled) can still not handle this problem.

A general suggestion: I guess you upload your diff using Phab's web interface. If so, I suggest you upload patches with full context (http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface) from now on or use arc which uploads full context by default. It's a little difficult to review a diff with only 3-line contexts and Phab seems to have a little trouble computing the diff between two diffs with limited context (at least the line numbers don't match in many places).

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
184	That makes sense.
720	No, I don't think I suggested you change it to getSExtValue(). When I said "getSExtValue() returns uint64_t", I expected you to write uint64_t Field = ...->getZExtValue(); because I slightly prefer delaying truncating which loses precision. But, anyway, I'm fine with "unsigned".
721	I'm confused. How does AccumulativeByteOffset in your snippet come up here? I think AccumulativeByteOffset is computed before calling transformToSimplerGEPs. Why do you bother changing that again? In case I haven't made this clear, I never expect Find/Extract always extracts all constant offsets. For example, given a = 1 + 2, I'm totally fine if the Find function finds only 2, and in that case the remaining code should treat 1 just as a variable remainder. The GEP after extraction may look like "gep base, i1, 1, i2" which may not be the optimal form, but again I am fine with that because it's sound. If we want to get the optimal form, I am looking for either a more comprehensive way (not just handling tricky cases like adding two constants; your initial diff might serve as a good start but the code was too complicated) or reassociating each index before this pass so that all constant offsets in the index are merged (which I suppose is the approach we want to go with). With that, do we still need the nested "if (!isa<CostantInt>)" check? Do you have an example where simply checking isa<SequentialType> isn't enough? I think a concrete example may help to make things clearer.
827	That was my suspicion. This check on the addressing mode is a very weak check: I believe both NVPTX and AArch64 support reg+<a potentially very large offset>, so I would be surprised that having/not having this check would cause tests to fail. You explained pretty well why it's still worth to lower a GEP even if address mode matching fails. I'd like to see that in the source code comments. Thanks!
889	Anyway, I'll probably do that by myself after this diff is in.

• HaoLiu updated this revision to Diff 16190.Nov 13 2014, 6:58 PM

• HaoLiu edited edge metadata.

Hi Jingyue,

Nice suggestion. It really helps a lot for the review.

I've attached a new patch with full context. Review please.

Thanks,
-Hao

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
720	Oh, I double checked that. I indeed misunderstood your meaning. Last time I used "unsigned Field = xxx->getZExtValue()". not "uint64_t Field = xxx->getZExtValue()". Sorry for the noise. Anyway, diff 16135 has already fix it.
721	Yeah, I understand your meaning. We just let the "1" as a remainder. It's sound because the logic of Find/Extract is to handle only one constant and return. It's reasonable. Previously I focused on making it optimal. So I want to extract all the constant. Anyway, this is just corner case and it may be fixed in the future. I'm fine with that. BTW, there is already a test case @test_const_add checking we generate correct result for "a = 1 + 2".
889	OK, thanks.

LGTM with some minor fixes. You may want to double check with ARM folks on performance. I remember Gerolf said he couldn't get the same performance numbers as you did.

Thanks again for all the hard work!

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
148	Lower => Lowering
150	lower => we lower
154	lower => we lower
155	better => a better
333	single index => a single index
334	The grammar of the sentence "It has been extracted a constant offset" is incorrect. /// Lower a GEP with multiple indices into multiple GEPs with a single index. /// Function splitGEP already split the original GEP into a variadic part and a /// constant offset (i.e., AccumulativeByteOffset). This function lowers the /// variadic part into a set of GEPs with a single index and applies /// AccumulativeByteOffset to it. /// GEP The variadic part of the original GEP /// AccumulativeByteOffset ... I'd even like to change the name of the argument GEP to Variadic, because GEP appears so many times throughout this file and usually refers to the original GEP.
343	ditto
348	we can find => we successfully find My original comment wasn't clear enough, and I think "we successfully find" is better.
370	single index => a single index
850	The grammar of "So we don't such check" is wrong. => Therefore, we don't check for addressing modes in that case.
865	The following xxx and yyy => lowerToSingleIndexGEPs or lowerToArithmetics will later

This revision is now accepted and ready to land.Nov 14 2014, 9:14 AM

In D5864#45, @Gerolf wrote:

Hi Hao,

could you share your current performance data + options you use? With your patch I see small gains on astar + xalancbmk, but a ~4% loss on libquantum. The data is based on the 11/07 trunk on top of r221549, O3 + LTO, cyclone, ref input.
In addition to cortex, could you run on x86 also?

Hi Gerolf,

I ran spec2006 4 times on my x86 machine, but the results are always fluctuating. So I can not see whether it is better or not by enabling this pass. I think it is because I don't have much experience on testing x86.

For the AArch64 backend, recently I ran twice on r222083@trunk and r222017@trunk on Cortex-A57. The results are similar to the previous results. But it has about 1%-2% regression on 471.omnetpp. That benchmark is always fluctuating, so I can't measure it accurately. Anyway, we have much improvements than regressions.

Thanks,
-Hao

Did you try pinning the benchmark on one CPU? Scheduling can significantly
disturb the benchmark results. Also, if your CPU has any boosting feature
(similar to Intel's turbo boost), try disabling that. One of our folks,
Mark, found that can make benchmarking sensitive too.

Jingyue

In D5864#58, @jingyue wrote:

Did you try pinning the benchmark on one CPU? Scheduling can significantly
disturb the benchmark results. Also, if your CPU has any boosting feature
(similar to Intel's turbo boost), try disabling that. One of our folks,
Mark, found that can make benchmarking sensitive too.

Jingyue

I think SPEC2006 benchmarks always run on one CPU, so I didn't pin to one CPU. For the boosting feature, I tested on an old x86 machine, so I'm not sure about the features. Anyway, I'm not familiar with x86 performance tests, so I think it's better to let other experts to have a try.

Thanks,
-Hao

In D5864#54, @jingyue wrote:

LGTM with some minor fixes. You may want to double check with ARM folks on performance. I remember Gerolf said he couldn't get the same performance numbers as you did.

Thanks again for all the hard work!

Hi Jingyue,

Thanks very much for your hard work on review, which really help me a lot.
I've committed the code in http://llvm.org/viewvc/llvm-project?view=revision&revision=222328 and http://llvm.org/viewvc/llvm-project?view=revision&revision=222331.

BTW, for the performance test, I also tested other revision by some slightly modifications such as:
(1) Always lower GEPs if there is more than one index despite wheter we found a constant.
(2) Call SeparateConstOffsetFromGEP pass before calling TargetPassConfig::addIRPasses() in AArch64TargetMachine.cpp.

The performance results are different. Some benchmarks are better, some benchmarks are worse. So I just think maybe we need further tuning in the future for better performance. Anyway, that's the future work.

Thanks,
-Hao

SPEC2006 benchmarks are single-threaded, but this single thread can be
scheduled by the operating system on different CPUs (cores, hyperthreads,
etc.) in different time slices. I used to encounter fluctuation too and
fixed that using the taskset command. Not sure if that can help you too.

Also, I don't think x86 matters because this pass is only enabled in
AArch64 and NVPTX for now. I do feel a little worried on the 1%-2%
regression you observed on AArch64. It would be great if you can get a more
stable result.

Jingyue

Hi Hao,

thanks for all your efforts landing this. We’ll check the performance impact on cyclone. Did lto have any impact on your results?

Cheers
Gerolf

In D5864#61, @jingyue wrote:

SPEC2006 benchmarks are single-threaded, but this single thread can be
scheduled by the operating system on different CPUs (cores, hyperthreads,
etc.) in different time slices. I used to encounter fluctuation too and
fixed that using the taskset command. Not sure if that can help you too.

Also, I don't think x86 matters because this pass is only enabled in
AArch64 and NVPTX for now. I do feel a little worried on the 1%-2%
regression you observed on AArch64. It would be great if you can get a more
stable result.

Jingyue

Hi Jingyue,

When I run benchmarks on AArch64 machine, I use taskset because there are two kinds of cores (A53 and A57). I thought all x86 cores on my machine are the same, so I didn't use taskset. Thanks for your share. It is helpful for our future perf tests.

For the regressions, I'll keep on watching the performance results.

Thanks,
-Hao

In D5864#62, @Gerolf wrote:

Hi Hao,

thanks for all your efforts landing this. We’ll check the performance impact on cyclone. Did lto have any impact on your results?

Cheers
Gerolf

Hi Gerolf

I don't know the LTO impact. Actually I never tested on LTO and don't know how to test it.
If you think LTO will be affected, I can have a try.

Thanks,
-Hao

Alright, we’ll do an lto a/b run then. I thought you might have checked libquantum where saw a regression with an earlier version of the patch, but let’s sync up once we have the latest data.

Cheers
Gerolf

• HaoLiu closed this revision.May 13 2015, 7:52 PM

Revision Contents

Path

Size

include/

llvm/

Transforms/

Scalar.h

4 lines

lib/

Target/

AArch64/

AArch64TargetMachine.cpp

16 lines

Transforms/

Scalar/

SeparateConstOffsetFromGEP.cpp

278 lines

test/

CodeGen/

AArch64/

aarch64-gep-opt.ll

150 lines

arm64-addr-mode-folding.ll

2 lines

arm64-cse.ll

2 lines

Diff 15380

include/llvm/Transforms/Scalar.h

Context not available.
	//	//
	// SeparateConstOffsetFromGEP - Split GEPs for better CSE	// SeparateConstOffsetFromGEP - Split GEPs for better CSE
	//	//
	FunctionPass *createSeparateConstOffsetFromGEPPass();	FunctionPass *
		createSeparateConstOffsetFromGEPPass(const TargetMachine *TM = nullptr,
		bool SimplifyGEP = false);

	//===----------------------------------------------------------------------===//	//===----------------------------------------------------------------------===//
	//	//
Context not available.

lib/Target/AArch64/AArch64TargetMachine.cpp

Context not available.
	cl::desc("Work around Cortex-A53 erratum 835769"),	cl::desc("Work around Cortex-A53 erratum 835769"),
	cl::init(false));	cl::init(false));

		static cl::opt<bool>
		EnableGEPOpt("aarch64-gep-opt", cl::Hidden,
		cl::desc("Enable optimizations on GEPs for better address CSE"),
		cl::init(true));

	extern "C" void LLVMInitializeAArch64Target() {	extern "C" void LLVMInitializeAArch64Target() {
	// Register the target.	// Register the target.
	RegisterTargetMachine<AArch64leTargetMachine> X(TheAArch64leTarget);	RegisterTargetMachine<AArch64leTargetMachine> X(TheAArch64leTarget);
		jingyueUnsubmitted Not Done Reply Inline Actions I'd explain why running GEP reassociation can expose more opportunities for LICM. I guess only the split part (splitting a GEP with multiple indices into a set of smaller GEPs) buys you such opportunities. Right? jingyue: I'd explain why running GEP reassociation can expose more opportunities for LICM. I guess only…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions Yes. The whole GEP may not be invariant, but after splitting, we can move the invariant part and leave the variant part in the loop. HaoLiu: Yes. The whole GEP may not be invariant, but after splitting, we can move the invariant part…
Context not available.
	addPass(createCFGSimplificationPass());	addPass(createCFGSimplificationPass());

	TargetPassConfig::addIRPasses();	TargetPassConfig::addIRPasses();

		if (TM->getOptLevel() == CodeGenOpt::Aggressive && EnableGEPOpt) {
		// Call SeparateConstOffsetFromGEP pass to extract constants and simplify
		// complex GEPs.
		addPass(createSeparateConstOffsetFromGEPPass(TM, true));
		// Call EarlyCSE pass to find and remove subexpressions related to the GEPs.
		addPass(createEarlyCSEPass());
		// Do loop invariant code motion in case any of the address calculations are
		// actually loop invariant.
		addPass(createLICMPass());
		}
	}	}

	// Pass Pipeline Configuration	// Pass Pipeline Configuration
Context not available.

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp

Context not available.
	//	//
	//===----------------------------------------------------------------------===//	//===----------------------------------------------------------------------===//
		jingyueUnsubmitted Not Done Reply Inline Actions Remove superfluous blank lines. jingyue: Remove superfluous blank lines.
		jingyueUnsubmitted Not Done Reply Inline Actions The word "simpler" is overloaded. I'd explicitly say something like: // to lower a GEP with multiple indices into multiple GEPs with a single index or arithmetic operations ... Is this more accurate? jingyue: The word "simpler" is overloaded. I'd explicitly say something like: // to lower a GEP with…

		jingyueUnsubmitted Not Done Reply Inline Actions // Another improvement enabled by the SimplifyGEP flag is ... jingyue: // Another improvement enabled by the SimplifyGEP flag is ...
		//===----------------------------------------------------------------------===//
		//
		// Even we can extract constants from GEPs, sometimes it is still unable to do
		hfinkelUnsubmitted Not Done Reply Inline Actions Even we can extract constant from GEPs -> Even if we can extract constants from GEPs hfinkel: Even we can extract constant from GEPs -> Even if we can extract constants from GEPs
		// fully CSE for complex GEPs. There are two reasons:
		// (1) The above algorithm can not extract constants in structure type indices.
		hfinkelUnsubmitted Not Done Reply Inline Actions algorithm hfinkel: algorithm
		jingyueUnsubmitted Not Done Reply Inline Actions Nice explanation! However, in the final version, you may want to merge these improvements into the old comments. You can mention the improvements in the summary of the revision. jingyue: Nice explanation! However, in the final version, you may want to merge these improvements into…
		// Actually the indices of structure types in GEP are always constants which
		// can always be extracted.
		// (2) As GEPs can have variable indices and can often not be matched by the
		// addressing mode matcher. Such GEPs will be kept as-is into CodeGen, which
		// can only see GEPs in the same basic block. If two GEPs are in different
		jingyueUnsubmitted Not Done Reply Inline Actions I don't understand reason 2. Can you explain more? Are you saying, given a[i][j1] and a[i][j2], you can reuse the common part &a[i]? jingyue: I don't understand reason 2. Can you explain more? Are you saying, given a[i][j1] and a[i][j2]…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions Yes, GEPs like "getelementptr %ptr, i, j1" and "getelementptr %ptr, i, j2" can not be eliminated. But transformed into simpler forms, "i" and "j1" "j2" are separated, and we can find the common expressions about "i". HaoLiu: Yes, GEPs like "getelementptr %ptr, i, j1" and "getelementptr %ptr, i, j2" can not be…
		// basic blocks, CodeGen will not be able to tell if there are common
		// subexpressions between them.
		//
		// For the above reasons, if SimplifyGEP is true, this pass will transform
		// complex GEPs into simpler forms of several simpler GEPs or a
		// "ptrtoint + arithmetic + inttoptr" form (depends on whether the target use
		// alias analysis in codegen).
		// E.g. the following two GEPs are used in two basic blocks:
		// BB1:
		jingyueUnsubmitted Not Done Reply Inline Actions The sentence "another simpler form of simpler GEPs is similar" is a little awkward. The word "simple" is very overloaded. What do you mean exactly by "simpler"? Consider using multiple sentences and/or an example to explain what this lowering to GEPs exactly mean. jingyue: The sentence "another simpler form of simpler GEPs is similar" is a little awkward. The word…
		// %p = getelementptr inbounds [240 x %struct]* %in, i64 0, i64 %idx, i32 3
		// load %p
		// ...
		// BB2:
		// %p2 = getelementptr inbounds [240 x %struct]* %in, i64 0, i64 %idx, i32 2
		// load %p2
		// ...
		//
		// This pass will transfer it into arithmetic forms as follows (The simpler
		// GEP forms are similar):
		// BB1:
		// %1 = ptrtoint [240 x %struct]* %in to i64
		// %2 = mul i64 %idx, length_Of_struct
		// %3 = add i64 %1, %2
		// %4 = add i64 %3, 12
		// %p = inttoptr i64 %4 to i32*
		// load %p
		// ...
		// BB2:
		// %5 = ptrtoint [240 x %struct]* %in to i64
		// %6 = mul i64 %idx, length_Of_struct
		// %7 = add i64 %5, %6
		// %8 = add i64 %7, 8
		jingyueUnsubmitted Not Done Reply Inline Actions This => Lowering GEPs jingyue: This => Lowering GEPs
		// %p2 = inttoptr i64 %8 to i32*
		jingyueUnsubmitted Not Done Reply Inline Actions The word "optimize" is overloaded. Is hoist/sink a better word? jingyue: The word "optimize" is overloaded. Is hoist/sink a better word?
		// load %p
		// ...
		//
		// So that it can extract the constant in the last structure type index and it
		// is easy to remove following subexpressions:
		jingyueUnsubmitted Not Done Reply Inline Actions Such transformation can split it => Lowering GEPs can split such GEPs jingyue: Such transformation can split it => Lowering GEPs can split such GEPs
		// %5 = ptrtoint [240 x %struct]* %in to i64
		// %6 = mul i64 %idx, length_Of_struct
		// %7 = add i64 %5, %6
		//
		// This can also improve the address sinking logic in CodeGenPrepare.
		// CodeGenPrepare tries to sink address calculations that match the target's
		// addressing modes. Complex GEPs may not match and will not be sunk in this
		// case. If with such transformation, part of the original GEPs can be matched
		// and should be sunk, so we end up with better address calculations.
		//
		//===----------------------------------------------------------------------===//

	#include "llvm/Analysis/TargetTransformInfo.h"	#include "llvm/Analysis/TargetTransformInfo.h"
	#include "llvm/Analysis/ValueTracking.h"	#include "llvm/Analysis/ValueTracking.h"
	#include "llvm/IR/Constants.h"	#include "llvm/IR/Constants.h"
		jingyueUnsubmitted Not Done Reply Inline Actions Lower => Lowering jingyue: Lower => Lowering
		jingyueUnsubmitted Not Done Reply Inline Actions lower => we lower jingyue: lower => we lower
Context not available.
	#include "llvm/Support/CommandLine.h"	#include "llvm/Support/CommandLine.h"
	#include "llvm/Support/raw_ostream.h"	#include "llvm/Support/raw_ostream.h"
	#include "llvm/Transforms/Scalar.h"	#include "llvm/Transforms/Scalar.h"
		jingyueUnsubmitted Not Done Reply Inline Actions lower => we lower jingyue: lower => we lower
		#include "llvm/Target/TargetMachine.h"
		jingyueUnsubmitted Not Done Reply Inline Actions better => a better jingyue: better => a better
		#include "llvm/Target/TargetSubtargetInfo.h"
		#include "llvm/IR/IRBuilder.h"

	using namespace llvm;	using namespace llvm;

		jingyueUnsubmitted Not Done Reply Inline Actions //, or null if we cannot extract a constant offset. jingyue: //, or null if we cannot extract a constant offset.
		jingyueUnsubmitted Not Done Reply Inline Actions You may want to say Find looks for constant offsets in both array indices and field indices. Because the signature of Extract is changed, you need to remove the comment "The meaning of the arguments and the return value ..." and explicitly explain what its arguments and return value mean. jingyue: You may want to say Find looks for constant offsets in both array indices and field indices.
		jingyueUnsubmitted Not Done Reply Inline Actions I don't see the first part of this comment addressed. I'd like to emphasize "both array indices and field indices". jingyue: I don't see the first part of this comment addressed. I'd like to emphasize "both array indices…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions I think "both array indices and field indices" means all indices. We don't need to explain this common behavior. It is quite obvious. I think we need comments if we only find in one kind of "array indices" or "field indices", or do something special. Anyway, the new comments will be like the comments for Extract() HaoLiu: I think "both array indices and field indices" means all indices. We don't need to explain this…
		jingyueUnsubmitted Not Done Reply Inline Actions That makes sense. jingyue: That makes sense.
Context not available.
	class SeparateConstOffsetFromGEP : public FunctionPass {	class SeparateConstOffsetFromGEP : public FunctionPass {
	public:	public:
	static char ID;	static char ID;
	SeparateConstOffsetFromGEP() : FunctionPass(ID) {	SeparateConstOffsetFromGEP(const TargetMachine *TM = nullptr,
		bool SimplifyGEP = false)
		: FunctionPass(ID), TM(TM), SimplifyGEP(SimplifyGEP) {
	initializeSeparateConstOffsetFromGEPPass(*PassRegistry::getPassRegistry());	initializeSeparateConstOffsetFromGEPPass(*PassRegistry::getPassRegistry());
	}	}

		jingyueUnsubmitted Not Done Reply Inline Actions typo: the the given => the given a constant offsets => a constant offset wheter => whether depends on => depending on use => uses several simpler GEPs or arithmetic operations jingyue: typo: the the given => the given a constant offsets => a constant offset wheter => whether…
		jingyueUnsubmitted Not Done Reply Inline Actions I don't like the word "transform" and "simpler". To be more accurate, I'd use lower instead of transform. What do you exactly mean by "simpler"? Are you lowering a GEP with multiple indices to multiple GEPs with a single index? If that's the case, I'd say that explicitly in order to be more accurate. jingyue: I don't like the word "transform" and "simpler". To be more accurate, I'd use lower instead of…
		jingyueUnsubmitted Not Done Reply Inline Actions ditto jingyue: ditto
		jingyueUnsubmitted Not Done Reply Inline Actions will find => finds will only find => only finds else => ; otherwise, jingyue: will find => finds will only find => only finds else => ; otherwise,
		jingyueUnsubmitted Not Done Reply Inline Actions none zero => non-zero jingyue: none zero => non-zero
Context not available.
	/// Tries to split the given GEP into a variadic base and a constant offset,	/// Tries to split the given GEP into a variadic base and a constant offset,
	/// and returns true if the splitting succeeds.	/// and returns true if the splitting succeeds.
	bool splitGEP(GetElementPtrInst *GEP);	bool splitGEP(GetElementPtrInst *GEP);
		jingyueUnsubmitted Not Done Reply Inline Actions a better name would be transformToArithmetics because you are using not only adds but also muls jingyue: a better name would be transformToArithmetics because you are using not only adds but also muls
	/// Finds the constant offset within each index, and accumulates them. This	/// Tries to extract a constant offsets from the the given GEP and simplify
	/// function only inspects the GEP without changing it. The output	/// it into simpler forms of several simpler GEPs or an arithmetic form
		jingyueUnsubmitted Not Done Reply Inline Actions single index => a single index jingyue: single index => a single index
	/// NeedsExtraction indicates whether we can extract a non-zero constant	/// (depends on wheter the target use alias analysis in codegen).
		jingyueUnsubmitted Not Done Reply Inline Actions The grammar of the sentence "It has been extracted a constant offset" is incorrect. /// Lower a GEP with multiple indices into multiple GEPs with a single index. /// Function splitGEP already split the original GEP into a variadic part and a /// constant offset (i.e., AccumulativeByteOffset). This function lowers the /// variadic part into a set of GEPs with a single index and applies /// AccumulativeByteOffset to it. /// GEP The variadic part of the original GEP /// AccumulativeByteOffset ... I'd even like to change the name of the argument GEP to Variadic, because GEP appears so many times throughout this file and usually refers to the original GEP. jingyue: The grammar of the sentence "It has been extracted a constant offset" is incorrect. ``` ///…
	/// offset from any index.	bool extractConstantAndSimplifyGEP(GetElementPtrInst *GEP);
	int64_t accumulateByteOffset(GetElementPtrInst *GEP, bool &NeedsExtraction);	/// Finds the constant offset within each index for sequential types, and
		/// accumulates them. This function only inspects the GEP without changing
		/// it.
		/// The output NeedsExtraction indicates whether we can extract a constant
		/// offset.
		int64_t accumulateByteOffsetForSeq(GetElementPtrInst *GEP,
		bool &NeedsExtraction);
		/// Finds the constant offset within each index for sequential and struct
		jingyueUnsubmitted Not Done Reply Inline Actions ditto jingyue: ditto
		/// types, and accumulates them. This function only inspects the GEP without
		/// changing it. The output NeedsExtraction indicates whether we can extract
		jingyueUnsubmitted Not Done Reply Inline Actions /// Whether to transform jingyue: /// Whether to transform
		/// a constant offset.
		int64_t accumulateByteOffsetForSeqAndStruct(GetElementPtrInst *GEP,
		bool &NeedsExtraction);
		jingyueUnsubmitted Not Done Reply Inline Actions we can find => we successfully find My original comment wasn't clear enough, and I think "we successfully find" is better. jingyue: we can find => we successfully find My original comment wasn't clear enough, and I think "we…
	/// Canonicalize array indices to pointer-size integers. This helps to	/// Canonicalize array indices to pointer-size integers. This helps to
	/// simplify the logic of splitting a GEP. For example, if a + b is a	/// simplify the logic of splitting a GEP. For example, if a + b is a
	/// pointer-size integer, we have	/// pointer-size integer, we have
Context not available.
	bool canonicalizeArrayIndicesToPointerSize(GetElementPtrInst *GEP);	bool canonicalizeArrayIndicesToPointerSize(GetElementPtrInst *GEP);
		jingyueUnsubmitted Not Done Reply Inline Actions Add a comment on what SimplifyGEP means. jingyue: Add a comment on what SimplifyGEP means.

		jingyueUnsubmitted Not Done Reply Inline Actions What would be a good name? You are not really simplifying but to some extend complicating GEPs :) Does LowerGEP sound good to you? You are lowering a GEP into multiple uglygeps (btw, uglygep is a known concept in LLVM) or arithmetic operations which are closer to machine code. Make sure the comments are consistent after renaming. jingyue: What would be a good name? You are not really simplifying but to some extend complicating…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions That make sence. HaoLiu: That make sence.
	const DataLayout *DL;	const DataLayout *DL;
		const TargetMachine *TM;
		bool SimplifyGEP;
	};	};
	} // anonymous namespace	} // anonymous namespace
		jingyueUnsubmitted Not Done Reply Inline Actions single index => a single index jingyue: single index => a single index

Context not available.
	"Split GEPs to a variadic base and a constant offset for better CSE", false,	"Split GEPs to a variadic base and a constant offset for better CSE", false,
	false)	false)

	FunctionPass *llvm::createSeparateConstOffsetFromGEPPass() {	FunctionPass *
	return new SeparateConstOffsetFromGEP();	llvm::createSeparateConstOffsetFromGEPPass(const TargetMachine *TM,
		bool SimplifyGEP) {
		return new SeparateConstOffsetFromGEP(TM, SimplifyGEP);
	}	}

	bool ConstantOffsetExtractor::CanTraceInto(bool SignExtended,	bool ConstantOffsetExtractor::CanTraceInto(bool SignExtended,
		jingyueUnsubmitted Not Done Reply Inline Actions If (ConstantOffset == 0) return NewIdx; // ... return Extractor.rebuild... jingyue: If (ConstantOffset == 0) return NewIdx; // ... return Extractor.rebuild...
		jingyueUnsubmitted Not Done Reply Inline Actions vairable => variable jingyue: vairable => variable
		jingyueUnsubmitted Not Done Reply Inline Actions These are essentially byte offsets instead of indices, because they are scaled by element sizes. Do we have a better name? ByteOffsetOfIndex? jingyue: These are essentially byte offsets instead of indices, because they are scaled by element sizes.
		jingyueUnsubmitted Not Done Reply Inline Actions { at the end jingyue: { at the end
		jingyueUnsubmitted Not Done Reply Inline Actions } else { jingyue: } else {
		jingyueUnsubmitted Not Done Reply Inline Actions Actually, is this vector even necessary? Can you build the resultant uglygep along the way? jingyue: Actually, is this vector even necessary? Can you build the resultant uglygep along the way?
		jingyueUnsubmitted Not Done Reply Inline Actions If you buy that transformXXX should run after setIsInBounds(false), you needn't check inbound in this function. jingyue: If you buy that transformXXX should run after setIsInBounds(false), you needn't check inbound…
		jingyueUnsubmitted Not Done Reply Inline Actions Why checking whether it's a constant? Shouldn't you check whether it is an array index? My understanding is, the offsets of structure indices (which are guaranteed to be constant) and the constant offsets of array indices are already counted in accumulativeByteOffsets, and now you only need to rebuild the variadic part of the array indices. jingyue: Why checking whether it's a constant? Shouldn't you check whether it is an array index? My…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions Yes, you are right. I'll add comments to explain this. HaoLiu: Yes, you are right. I'll add comments to explain this.
		jingyueUnsubmitted Not Done Reply Inline Actions Not critical, but I slightly prefer getZExtValue() because a field index is non-negative and getElementOffset takes unsigned. jingyue: Not critical, but I slightly prefer getZExtValue() because a field index is non-negative and…
		jingyueUnsubmitted Not Done Reply Inline Actions I wasn't just proposing a comment change. I meant the source code should read something like if (isa<SequentialType>(GTI)) You are building new GEPs/arithmetics for the remainder of the array indices, and checking whether the indexed type is a SequentialType matches your intention much better. I also believe checking for SequentialType instead of ConstantInt solves your previous question on handling "a = 1 + 2". I'll go back to that later. jingyue: I wasn't just proposing a comment change. I meant the source code should read something like…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions Aha, previously I use getZExtValue(). You commented I should change to getSExtValue(). I thought it was not critical, so I modified it. Now, It seems I should roll back... : ) HaoLiu: Aha, previously I use getZExtValue(). You commented I should change to getSExtValue(). I…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions If check SequentialType, we also need to check wheter it is ConstantInt. Then the code will be like: if (isa<SequentialType>(GTI)) if (!isa<ConstantInt>(GEP->getOperand(I)) ... else AccumulativeByteOffset += Const This looks more complicate. About the "a = 1 + 2", I think handling it when extracting constant in indices is better. Then we don't need to add code in both lowerToXXX functions. Also if we handle it in lowerToXXX functions, the logic for the original splitGEP (I.E. when lowerGEP is diabled) can still not handle this problem. HaoLiu: If check SequentialType, we also need to check wheter it is ConstantInt. Then the code will be…
		jingyueUnsubmitted Not Done Reply Inline Actions No, I don't think I suggested you change it to getSExtValue(). When I said "getSExtValue() returns uint64_t", I expected you to write uint64_t Field = ...->getZExtValue(); because I slightly prefer delaying truncating which loses precision. But, anyway, I'm fine with "unsigned". jingyue: No, I don't think I suggested you change it to getSExtValue(). When I said "getSExtValue()…
		jingyueUnsubmitted Not Done Reply Inline Actions I'm confused. How does AccumulativeByteOffset in your snippet come up here? I think AccumulativeByteOffset is computed before calling transformToSimplerGEPs. Why do you bother changing that again? In case I haven't made this clear, I never expect Find/Extract always extracts all constant offsets. For example, given a = 1 + 2, I'm totally fine if the Find function finds only 2, and in that case the remaining code should treat 1 just as a variable remainder. The GEP after extraction may look like "gep base, i1, 1, i2" which may not be the optimal form, but again I am fine with that because it's sound. If we want to get the optimal form, I am looking for either a more comprehensive way (not just handling tricky cases like adding two constants; your initial diff might serve as a good start but the code was too complicated) or reassociating each index before this pass so that all constant offsets in the index are merged (which I suppose is the approach we want to go with). With that, do we still need the nested "if (!isa<CostantInt>)" check? Do you have an example where simply checking isa<SequentialType> isn't enough? I think a concrete example may help to make things clearer. jingyue: I'm confused. How does AccumulativeByteOffset in your snippet come up here? I think…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions Yeah, I understand your meaning. We just let the "1" as a remainder. It's sound because the logic of Find/Extract is to handle only one constant and return. It's reasonable. Previously I focused on making it optimal. So I want to extract all the constant. Anyway, this is just corner case and it may be fixed in the future. I'm fine with that. BTW, there is already a test case @test_const_add checking we generate correct result for "a = 1 + 2". HaoLiu: Yeah, I understand your meaning. We just let the "1" as a remainder. It's sound because the…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions Oh, I double checked that. I indeed misunderstood your meaning. Last time I used "unsigned Field = xxx->getZExtValue()". not "uint64_t Field = xxx->getZExtValue()". Sorry for the noise. Anyway, diff 16135 has already fix it. HaoLiu: Oh, I double checked that. I indeed misunderstood your meaning. Last time I used "unsigned…
Context not available.
	}	}

	int64_t	int64_t
	SeparateConstOffsetFromGEP::accumulateByteOffset(GetElementPtrInst *GEP,	SeparateConstOffsetFromGEP::accumulateByteOffsetForSeq(GetElementPtrInst *GEP,
	bool &NeedsExtraction) {	bool &NeedsExtraction) {
	NeedsExtraction = false;	NeedsExtraction = false;
	int64_t AccumulativeByteOffset = 0;	int64_t AccumulativeByteOffset = 0;
	gep_type_iterator GTI = gep_type_begin(*GEP);	gep_type_iterator GTI = gep_type_begin(*GEP);
		jingyueUnsubmitted Not Done Reply Inline Actions The meaning of "simpler" here is unclear. jingyue: The meaning of "simpler" here is unclear.
Context not available.
	bool Changed = canonicalizeArrayIndicesToPointerSize(GEP);	bool Changed = canonicalizeArrayIndicesToPointerSize(GEP);

	bool NeedsExtraction;	bool NeedsExtraction;
	int64_t AccumulativeByteOffset = accumulateByteOffset(GEP, NeedsExtraction);	int64_t AccumulativeByteOffset =
		accumulateByteOffsetForSeq(GEP, NeedsExtraction);

	if (!NeedsExtraction)	if (!NeedsExtraction)
	return Changed;	return Changed;
		jingyueUnsubmitted Not Done Reply Inline Actions getZExtValue() returns uint64_t. jingyue: getZExtValue() returns uint64_t.
		jingyueUnsubmitted Not Done Reply Inline Actions Can we merge the two transformToXXX functions? The logics are pretty similar. jingyue: Can we merge the two transformToXXX functions? The logics are pretty similar.
		jingyueUnsubmitted Not Done Reply Inline Actions Similarly, this function should be merged into accumulateByteOffset. jingyue: Similarly, this function should be merged into accumulateByteOffset.
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions Make sence. It can be controlled by the LowerGEP. HaoLiu: Make sence. It can be controlled by the LowerGEP.
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions I think we'd better not do merge. They logic look similar, but they do quite different things. If we want to merge them, we need to add checks like "if (TransformToSimplerGEP)" in many places (From the first for loop logic to the function end). That would be a quite hard work for other people to understand the code. HaoLiu: I think we'd better not do merge. They logic look similar, but they do quite different things.
		jingyueUnsubmitted Not Done Reply Inline Actions // ... in each sequential index Emphasizing you only remove constant offsets from sequential indices is important, because the behavior here is slightly different from accumulateByteOffset which looks into both sequential and struct indices. You also need to explain why removing constant offsets in struct field index here is dangerous, and mention the transformXXX functions will take care of that. jingyue: // ... in each sequential index Emphasizing you only remove constant offsets from sequential…
		jingyueUnsubmitted Not Done Reply Inline Actions ditto. curly brackets. jingyue: ditto. curly brackets.
		jingyueUnsubmitted Not Done Reply Inline Actions Why do we need to check LowerGEP here? My understanding is, even when LowerGEP is set, +AccumulativeByteOffset will still become part of the lowered GEPs or arithmetics (see how transformToSimplerGEPs create a GEP with offset AccumulativeByteOffset, and how transformToArithmetics create an add with offset AccumulativeByteOffset). Shouldn't we still check whether this offset can be nicely folded into an addressing mode? jingyue: Why do we need to check LowerGEP here? My understanding is, even when LowerGEP is set…
		jingyueUnsubmitted Not Done Reply Inline Actions { jingyue: {
		jingyueUnsubmitted Not Done Reply Inline Actions } jingyue: }
		jingyueUnsubmitted Not Done Reply Inline Actions The more I look at these two functions, the more strongly I believe we should merge them. Especially, after you get rid of ResultIndices, you probably only need to check LowerGEP once in the merged function. No? jingyue: The more I look at these two functions, the more strongly I believe we should merge them.
		jingyueUnsubmitted Not Done Reply Inline Actions ditto. See Line 721 jingyue: ditto. See Line 721
		jingyueUnsubmitted Not Done Reply Inline Actions "such form" is too obscure jingyue: "such form" is too obscure
		jingyueUnsubmitted Not Done Reply Inline Actions Run these transformXXX functions before GEP->setIsInBounds(false) seems wrong to me, because transformXXX may treat off-bound GEPs as inbound and add nsw flags incorrectly. jingyue: Run these transformXXX functions before GEP->setIsInBounds(false) seems wrong to me, because…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions I think even a pointer plus a constant index can not be matched in an addressing mode, it is worth to lower a GEP with multiple indices. Because we can do optimizations on the variable indices. Also if the constant index is extracted from multiple indices, the result looks better than the original GEP. If I remove such check, some test cases will fail. So it is kept when disable LowerGEP. HaoLiu: I think even a pointer plus a constant index can not be matched in an addressing mode, it is…
		jingyueUnsubmitted Not Done Reply Inline Actions That makes some sense. Out of curiosity, which test would fail? jingyue: That makes some sense. Out of curiosity, which test would fail?
		jingyueUnsubmitted Not Done Reply Inline Actions I don't see this comment addressed. jingyue: I don't see this comment addressed.
		jingyueUnsubmitted Not Done Reply Inline Actions diable => LowerGEP is disabled jingyue: diable => LowerGEP is disabled
		jingyueUnsubmitted Not Done Reply Inline Actions // we always don't => we don't remove struct field indices here "always don't" is not the right grammar. Besides, you do extract field indices later, so I'd weaken the word "never". jingyue: // we always don't => we don't remove struct field indices here "always don't" is not the…
		jingyueUnsubmitted Not Done Reply Inline Actions enable LowerGEP => LowerGEP is enabled jingyue: enable LowerGEP => LowerGEP is enabled
		jingyueUnsubmitted Not Done Reply Inline Actions I wrote lowerToXXX in my reviews just for simplicity. However, in the source code, we should still use the full names. jingyue: I wrote lowerToXXX in my reviews just for simplicity. However, in the source code, we should…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions I still doubt about your suggestion. The diff 16027 gets ride of ResultIndices. They are still quite different and difficult to combine them. Lowering to arithmetic operations starts from creating ptrtoint, and then creates ADD/MUL/SHL, and ends with creating ADD and inttoptr. Lowering to GEPs starts from creating bitcast, and then creates MUL/SHL/GEPs, and ends with creating GEP and bitcast. HaoLiu: I still doubt about your suggestion. The diff 16027 gets ride of ResultIndices. They are still…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions Oh, interesting. I supposed some of your test cases in test/Transforms/SeparateConstOffsetFromGEP/NVPTX may fail. But I tried removing such check and ran "make check", there is no test failures. So it seems there is no such test cases... HaoLiu: Oh, interesting. I supposed some of your test cases in…
		jingyueUnsubmitted Not Done Reply Inline Actions Anyway, I'll probably do that by myself after this diff is in. jingyue: Anyway, I'll probably do that by myself after this diff is in.
		jingyueUnsubmitted Not Done Reply Inline Actions That was my suspicion. This check on the addressing mode is a very weak check: I believe both NVPTX and AArch64 support reg+<a potentially very large offset>, so I would be surprised that having/not having this check would cause tests to fail. You explained pretty well why it's still worth to lower a GEP even if address mode matching fails. I'd like to see that in the source code comments. Thanks! jingyue: That was my suspicion. This check on the addressing mode is a very weak check: I believe both…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions OK, thanks. HaoLiu: OK, thanks.
		jingyueUnsubmitted Not Done Reply Inline Actions The grammar of "So we don't such check" is wrong. => Therefore, we don't check for addressing modes in that case. jingyue: The grammar of "So we don't such check" is wrong. => Therefore, we don't check for addressing…
		jingyueUnsubmitted Not Done Reply Inline Actions The following xxx and yyy => lowerToSingleIndexGEPs or lowerToArithmetics will later jingyue: The following xxx and yyy => lowerToSingleIndexGEPs or lowerToArithmetics will later
Context not available.
	return true;	return true;
	}	}

		int64_t SeparateConstOffsetFromGEP::accumulateByteOffsetForSeqAndStruct(
		GetElementPtrInst *GEP, bool &NeedsExtraction) {
		NeedsExtraction = false;
		int64_t AccumulativeByteOffset = 0;
		gep_type_iterator GTI = gep_type_begin(GEP);
		for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {
		if (isa<SequentialType>(*GTI)) {
		// Tries to extract a constant offset from this GEP index.
		bool FoundConst = false;
		int64_t ConstantOffset = ConstantOffsetExtractor::Find(
		GEP->getOperand(I), DL, GEP, FoundConst);
		if (FoundConst) {
		NeedsExtraction = true;
		// A GEP may have multiple indices. We accumulate the extracted
		// constant offset to a byte offset, and later offset the remainder of
		// the original GEP with this byte offset.
		AccumulativeByteOffset +=
		ConstantOffset * DL->getTypeAllocSize(GTI.getIndexedType());
		}
		} else {
		StructType StTy = cast<StructType>(GTI);
		unsigned Field = cast<ConstantInt>(GEP->getOperand(I))->getZExtValue();
		// Skip field 0 as the offset is always 0.
		if (Field != 0) {
		NeedsExtraction = true;
		AccumulativeByteOffset +=
		DL->getStructLayout(StTy)->getElementOffset(Field);
		}
		}
		}
		return AccumulativeByteOffset;
		}

		bool SeparateConstOffsetFromGEP::extractConstantAndSimplifyGEP(
		GetElementPtrInst *GEP) {
		// Skip vector GEPs.
		if (GEP->getType()->isVectorTy())
		return false;

		// The backend can already nicely handle the case where all indices are
		// constants.
		if (GEP->hasAllConstantIndices())
		return false;

		bool NeedsExtraction;
		int64_t AccumulativeByteOffset =
		accumulateByteOffsetForSeqAndStruct(GEP, NeedsExtraction);

		if (!NeedsExtraction)
		return false;

		canonicalizeArrayIndicesToPointerSize(GEP);
		// Remove the constant offset in each GEP index. The resultant GEP computes
		// the variadic base.
		gep_type_iterator GTI = gep_type_begin(*GEP);
		for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {
		// Skip indices for struct types as they are always constants and have been
		// accumulated.
		if (isa<SequentialType>(*GTI)) {
		// Tries to extract constant offsets and build new index.
		Value *NewIdx =
		ConstantOffsetExtractor::Extract(GEP->getOperand(I), DL, GEP);
		if (NewIdx != nullptr)
		GEP->setOperand(I, NewIdx);
		}
		}

		// The inbounds keyword describes pointer arithmetic overflow conditions. As
		// indices are always treated as signed values, set the HasNSW to InBounds
		// when creating MUL/SHL/ADD.
		bool InBounds = GEP->isInBounds();
		IRBuilder<> Builder(GEP);
		Type *IntPtrTy = DL->getIntPtrType(GEP->getType());
		// As currently BasicAA does not analyze ptrtoint/inttoptr, do not transform
		// to such form if the target use alias analysis in codegen,
		bool TansformToSimplerGEPs =
		TM && TM->getSubtarget<TargetSubtargetInfo>().useAA();
		if (TansformToSimplerGEPs) {
		SmallVector<Value *, 4> ResultIndices;
		GTI = gep_type_begin(*GEP);
		// Create new indices for each vairable indices.
		for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {
		if (!isa<ConstantInt>(GEP->getOperand(I))) {
		Value *Idx = GEP->getOperand(I);
		APInt ElementSize = APInt(IntPtrTy->getIntegerBitWidth(),
		DL->getTypeAllocSize(GTI.getIndexedType()));
		hfinkelUnsubmitted Not Done Reply Inline Actions I think that if the original GEP has the "inbounds" property, then the shl, mul here (and the same below plus the add in the ptrtoint/inttoptr code) can have the nsw/nuw flags. hfinkel: I think that if the original GEP has the "inbounds" property, then the shl, mul here (and the…
		if (ElementSize != 1) {
		if (ElementSize.isPowerOf2())
		Idx = Builder.CreateShl(
		Idx, ConstantInt::get(IntPtrTy, ElementSize.logBase2()), "",
		jingyueUnsubmitted Not Done Reply Inline Actions Looks quite similar to splitGEP. Can we massage this into splitGEP, and guard the part that lowers GEP with if (SimplifyGEP)? jingyue: Looks quite similar to splitGEP. Can we massage this into splitGEP, and guard the part that…
		false, InBounds);
		else
		Idx =
		Builder.CreateMul(Idx, ConstantInt::get(IntPtrTy, ElementSize),
		"", false, InBounds);
		}
		ResultIndices.push_back(Idx);
		}
		}

		Type *I8PtrTy =
		Builder.getInt8PtrTy(GEP->getType()->getPointerAddressSpace());
		Value *ResultPtr = GEP->getOperand(0);
		if (ResultPtr->getType() != I8PtrTy)
		ResultPtr = Builder.CreateBitCast(ResultPtr, I8PtrTy);

		// Create simpler GEPs for each variable indices.
		for (auto I = ResultIndices.begin(), E = ResultIndices.end(); I != E; ++I) {
		if (InBounds)
		ResultPtr = Builder.CreateInBoundsGEP(ResultPtr, *I, "simplegep");
		else
		ResultPtr = Builder.CreateGEP(ResultPtr, *I, "simplegep");
		}
		// Create a simple GEP for the constant offset index.
		if (AccumulativeByteOffset != 0) {
		Value *Offset = ConstantInt::get(IntPtrTy, AccumulativeByteOffset);
		if (InBounds)
		ResultPtr = Builder.CreateInBoundsGEP(ResultPtr, Offset, "simplegep");
		jingyueUnsubmitted Not Done Reply Inline Actions OK, now I see what you're doing here. According to our discussion in D5863, this looks like a temporary work around for the CFGSimplification issue, and will eventually go away. I don't think it's a good idea to hack this pass to work around that issue. I'd rather run instsimplify before SeparateConstOffset in the AArch64 backend, and add a FIXME there saying once the CFGSimplification issue is fixed we needn't run instsimplify any more. jingyue: OK, now I see what you're doing here. According to our discussion in D5863, this looks like a…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions I just don't think this is a temporary work. We cannot assume there is no situation like "a = 1+2". People also can create a test case of "a = 1 + 2" and feed to this pass. I mean people can call this pass without any other passes. So if we don't accumulate all none zero indices in this place, we'll get incorrect result. We want to " Remove the constant offset in each GEP index" and " Splits this GEP index into a variadic part and a constant offset" in this for loop. So the later part about transformation won't consider about the none zero constant indices, it assumes there is all varaible indices or zero indices. See that I only check "if (!isa<ConstantInt>(GEP->getOperand(I)))". So if we leave an index of "constant 1" behand, it will be ignored and we'll generate incorrect result. That's why I want to make sure we have accumulated all the none zero constant indices. HaoLiu: I just don't think this is a temporary work. We cannot assume there is no situation like "a =…
		jingyueUnsubmitted Not Done Reply Inline Actions I'm confused. Check my comment on the line if (!isa<ConstantInt>(GEP->getOperand(I))) and if my comment there makes sense I'll come back here. jingyue: I'm confused. Check my comment on the line ``` if (!isa<ConstantInt>(GEP->getOperand(I))) ```…
		HaoLiuAuthorUnsubmitted Not Done Reply Inline Actions I think this is the right place to do such thing. As the comments say, we want to "// Remove the constant offset in each GEP index." If we should leave some constant offset in index. Also, if we fix such things in the following lowerToXXX functions, the logic for the original splitGEP (I.E. when lowerGEP is diabled) can still not handle this problem. HaoLiu: I think this is the right place to do such thing. As the comments say, we want to "// Remove…
		else
		ResultPtr = Builder.CreateGEP(ResultPtr, Offset, "simplegep");
		}
		jingyueUnsubmitted Not Done Reply Inline Actions The comment here is confusing. It doesn't seem to explain what the following code does. jingyue: The comment here is confusing. It doesn't seem to explain what the following code does.

		if (ResultPtr->getType() != GEP->getType())
		jingyueUnsubmitted Not Done Reply Inline Actions Why name it OtherConst? jingyue: Why name it OtherConst?
		ResultPtr = Builder.CreateBitCast(ResultPtr, GEP->getType());
		GEP->replaceAllUsesWith(ResultPtr);
		GEP->eraseFromParent();
		return true;
		}

		// Else we transform a complex GEP to ptrtoint/inttoptr.
		Value *ResultPtr = Builder.CreatePtrToInt(GEP->getOperand(0), IntPtrTy);
		GTI = gep_type_begin(*GEP);
		// Create ADD/SHL/MUL arithmetic operations for each vairable indices.
		for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {
		if (!isa<ConstantInt>(GEP->getOperand(I))) {
		Value *Idx = GEP->getOperand(I);
		APInt ElementSize = APInt(IntPtrTy->getIntegerBitWidth(),
		DL->getTypeAllocSize(GTI.getIndexedType()));
		if (ElementSize != 1) {
		if (ElementSize.isPowerOf2())
		Idx = Builder.CreateShl(
		Idx, ConstantInt::get(IntPtrTy, ElementSize.logBase2()), "",
		false, InBounds);
		else
		Idx = Builder.CreateMul(Idx, ConstantInt::get(IntPtrTy, ElementSize),
		"", false, InBounds);
		}
		ResultPtr = Builder.CreateAdd(ResultPtr, Idx, "", false, InBounds);
		}
		}

		// Create an ADD for the constant offset.
		jingyueUnsubmitted Not Done Reply Inline Actions After extractConstantAndSimplifyGEP is merged into splitGEP, the code here can be simplified. jingyue: After extractConstantAndSimplifyGEP is merged into splitGEP, the code here can be simplified.
		if (AccumulativeByteOffset != 0)
		ResultPtr = Builder.CreateAdd(
		ResultPtr, ConstantInt::get(IntPtrTy, AccumulativeByteOffset), "",
		false, InBounds);

		ResultPtr = Builder.CreateIntToPtr(ResultPtr, GEP->getType());
		GEP->replaceAllUsesWith(ResultPtr);
		GEP->eraseFromParent();
		return true;
		}

	bool SeparateConstOffsetFromGEP::runOnFunction(Function &F) {	bool SeparateConstOffsetFromGEP::runOnFunction(Function &F) {
	if (DisableSeparateConstOffsetFromGEP)	if (DisableSeparateConstOffsetFromGEP)
	return false;	return false;

	bool Changed = false;	bool Changed = false;
		// Enabling simplifying a complex GEP to several simpler GEPs or a
		// ptrtoint + arithmetic + inttoptr form.
		if (SimplifyGEP) {
		for (auto &B : F) {
		for (BasicBlock::iterator I = B.begin(), IE = B.end(); I != IE;)
		if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I++))
		Changed \|= extractConstantAndSimplifyGEP(GEP);
		}
		return Changed;
		}

	for (Function::iterator B = F.begin(), BE = F.end(); B != BE; ++B) {	for (Function::iterator B = F.begin(), BE = F.end(); B != BE; ++B) {
	for (BasicBlock::iterator I = B->begin(), IE = B->end(); I != IE; ) {	for (BasicBlock::iterator I = B->begin(), IE = B->end(); I != IE; ) {
	if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I++)) {	if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I++)) {
Context not available.

test/CodeGen/AArch64/aarch64-gep-opt.ll

This file was added.

				; RUN: llc -O3 -verify-machineinstrs %s -o - \| FileCheck %s
				; RUN: llc -O3 -print-after=codegenprepare -mcpu=cyclone < %s >%t 2>&1 && FileCheck --check-prefix=CHECK-NoAA <%t %s
				; RUN: llc -O3 -print-after=codegenprepare -mcpu=cortex-a53 < %s >%t 2>&1 && FileCheck --check-prefix=CHECK-UseAA <%t %s
				target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64--linux-gnueabi"

				; Following test cases test enabling SeparateConstOffsetFromGEP pass in AArch64
				; backend. If useAA() returns true, it will transform a complex GEP into simpler
				; GEPs, else it will transform a complex GEP into a ptrtoint/inttoptr form.

				%struct = type { i32, i32, i32, i32, [20 x i32] }

				; Check that when two complex GEPs are used in two basic blocks, LLVM can
				; elimilate the common subexpression for the second use.
				define void @test_GEP_CSE([240 x %struct]* %string, i32* %adj, i32 %lib, i64 %idxprom) {
				%liberties = getelementptr inbounds [240 x %struct]* %string, i64 1, i64 %idxprom, i32 3
				%1 = load i32* %liberties, align 4
				%cmp = icmp eq i32 %1, %lib
				br i1 %cmp, label %if.then, label %if.end

				if.then: ; preds = %entry
				%origin = getelementptr inbounds [240 x %struct]* %string, i64 1, i64 %idxprom, i32 2
				%2 = load i32* %origin, align 4
				store i32 %2, i32* %adj, align 4
				br label %if.end

				if.end: ; preds = %if.then, %entry
				ret void
				}

				; CHECK-LABEL: test_GEP_CSE:
				; CHECK: madd
				; CHECK: ldr
				; CHECK-NOT: madd
				; CHECK:ldr

				; CHECK-NoAA-LABEL: @test_GEP_CSE(
				; CHECK-NoAA: [[PTR0:%[a-zA-Z0-9]+]] = ptrtoint [240 x %struct]* %string to i64
				; CHECK-NoAA: [[PTR1:%[a-zA-Z0-9]+]] = mul nsw i64 %idxprom, 96
				; CHECK-NoAA: [[PTR2:%[a-zA-Z0-9]+]] = add nsw i64 [[PTR0]], [[PTR1]]
				; CHECK-NoAA: add nsw i64 [[PTR2]], 23052
				; CHECK-NoAA: inttoptr
				; CHECK-NoAA: if.then:
				; CHECK-NoAA-NOT: ptrtoint
				; CHECK-NoAA-NOT: mul
				; CHECK-NoAA: add nsw i64 [[PTR2]], 23048
				; CHECK-NoAA: inttoptr

				; CHECK-UseAA-LABEL: @test_GEP_CSE(
				; CHECK-UseAA: [[IDX:%[a-zA-Z0-9]+]] = mul nsw i64 %idxprom, 96
				; CHECK-UseAA: [[PTR0:%[a-zA-Z0-9]+]] = bitcast [240 x %struct]* %string to i8*
				; CHECK-UseAA: [[PTR1:%[a-zA-Z0-9]+]] = getelementptr inbounds i8* [[PTR0]], i64 [[IDX]]
				; CHECK-UseAA: getelementptr inbounds i8* [[PTR1]], i64 23052
				; CHECK-UseAA: bitcast
				; CHECK-UseAA: if.then:
				; CHECK-UseAA: getelementptr inbounds i8* [[PTR1]], i64 23048
				; CHECK-UseAA: bitcast

				%class.my = type { i32, [128 x i32], i32, [256 x %struct.pt]}
				%struct.pt = type { %struct.point*, i32, i32 }
				%struct.point = type { i32, i32 }

				; Check when a GEP is used across two basic block, LLVM can sink the address
				; calculation and code gen can generate a better addressing mode for the second
				; use.
				define void @test_GEP_across_BB(%class.my* %this, i64 %idx) {
				%1 = getelementptr inbounds %class.my* %this, i64 0, i32 3, i64 %idx, i32 1
				%2 = load i32* %1, align 4
				%3 = getelementptr inbounds %class.my* %this, i64 0, i32 3, i64 %idx, i32 2
				%4 = load i32* %3, align 4
				%5 = icmp eq i32 %2, %4
				br i1 %5, label %if.true, label %exit

				if.true:
				%6 = shl i32 %4, 1
				store i32 %6, i32* %3, align 4
				br label %exit

				exit:
				%7 = add nsw i32 %4, 1
				store i32 %7, i32* %1, align 4
				ret void
				}
				; CHECK-LABEL: test_GEP_across_BB:
				; CHECK: ldr {{w[0-9]+}}, [{{x[0-9]+}}, #528]
				; CHECK: ldr {{w[0-9]+}}, [{{x[0-9]+}}, #532]
				; CHECK-NOT: add
				; CHECK: str {{w[0-9]+}}, [{{x[0-9]+}}, #532]
				; CHECK: str {{w[0-9]+}}, [{{x[0-9]+}}, #528]

				; CHECK-NoAA-LABEL: test_GEP_across_BB(
				; CHECK-NoAA: add nsw i64 [[TMP:%[a-zA-Z0-9]+]], 528
				; CHECK-NoAA: add nsw i64 [[TMP]], 532
				; CHECK-NoAA: if.true:
				; CHECK-NoAA: {{%sunk[a-zA-Z0-9]+}} = add i64 [[TMP]], 532
				; CHECK-NoAA: exit:
				; CHECK-NoAA: {{%sunk[a-zA-Z0-9]+}} = add i64 [[TMP]], 528

				; CHECK-UseAA-LABEL: test_GEP_across_BB(
				; CHECK-UseAA: [[PTR0:%[a-zA-Z0-9]+]] = getelementptr
				; CHECK-UseAA: getelementptr inbounds i8* [[PTR0]], i64 528
				; CHECK-UseAA: getelementptr inbounds i8* [[PTR0]], i64 532
				; CHECK-UseAA: if.true:
				; CHECK-UseAA: {{%sunk[a-zA-Z0-9]+}} = getelementptr i8* [[PTR0]], i64 532
				; CHECK-UseAA: exit:
				; CHECK-UseAA: {{%sunk[a-zA-Z0-9]+}} = getelementptr i8* [[PTR0]], i64 528

				%struct.S = type { float, double }
				@struct_array = global [1024 x %struct.S] zeroinitializer, align 16

				; The following two test cases check we can extract constant from indices of
				; struct type.
				; The constant offsets are from indices "i64 %idxprom" and "i32 1". As the
				; alloca size of %struct.S is 16, and "i32 1" is the 2rd element whose field
				; offset is 8, the total constant offset is (5 * 16 + 8) = 88.
				define double* @test-struct_1(i32 %i) {
				entry:
				%add = add nsw i32 %i, 5
				%idxprom = sext i32 %add to i64
				%p = getelementptr inbounds [1024 x %struct.S]* @struct_array, i64 0, i64 %idxprom, i32 1
				ret double* %p
				}
				; CHECK-NoAA-LABEL: @test-struct_1(
				; CHECK-NoAA-NOT: getelementptr
				; CHECK-NoAA: add nsw i64 %{{[a-zA-Z0-9]+}}, 88

				; CHECK-UseAA-LABEL: @test-struct_1(
				; CHECK-UseAA: getelementptr inbounds i8* %{{[a-zA-Z0-9]+}}, i64 88

				%struct3 = type { i64, i32 }
				%struct2 = type { %struct3, i32 }
				%struct1 = type { i64, %struct2 }
				%struct0 = type { i32, i32, i64*, [100 x %struct1] }

				; The constant offsets are from indices "i32 3", "i64 %arrayidx" and "i32 1".
				; "i32 3" is the 4th element whose field offset is 16. The alloca size of
				; %struct1 is 32. "i32 1" is the 2rd element whose field offset is 8. So the
				; total constant offset is 16 + (-2 * 32) + 8 = -40
				define %struct2* @test-struct_2(%struct0* %ptr, i64 %idx) {
				entry:
				%arrayidx = add nsw i64 %idx, -2
				%ptr2 = getelementptr inbounds %struct0* %ptr, i64 0, i32 3, i64 %arrayidx, i32 1
				ret %struct2* %ptr2
				}
				; CHECK-NoAA-LABEL: @test-struct_2(
				; CHECK-NoAA-NOT: = getelementptr
				; CHECK-NoAA: add nsw i64 %{{[a-zA-Z0-9]+}}, -40

				; CHECK-UseAA-LABEL: @test-struct_2(
				; CHECK-UseAA: getelementptr inbounds i8* %{{[a-zA-Z0-9]+}}, i64 -40

test/CodeGen/AArch64/arm64-addr-mode-folding.ll

	; RUN: llc -O3 -mtriple arm64-apple-ios3 %s -o - \| FileCheck %s			; RUN: llc -O3 -mtriple arm64-apple-ios3 -aarch64-gep-opt=false %s -o - \| FileCheck %s
	; <rdar://problem/13621857>			; <rdar://problem/13621857>

	@block = common global i8* null, align 8			@block = common global i8* null, align 8

test/CodeGen/AArch64/arm64-cse.ll

	; RUN: llc -O3 < %s -aarch64-atomic-cfg-tidy=0 \| FileCheck %s			; RUN: llc -O3 < %s -aarch64-atomic-cfg-tidy=0 -aarch64-gep-opt=false \| FileCheck %s
	target triple = "arm64-apple-ios"			target triple = "arm64-apple-ios"

	; rdar://12462006			; rdar://12462006

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 15380

include/llvm/Transforms/Scalar.h

lib/Target/AArch64/AArch64TargetMachine.cpp

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp

test/CodeGen/AArch64/aarch64-gep-opt.ll

test/CodeGen/AArch64/arm64-addr-mode-folding.ll

test/CodeGen/AArch64/arm64-cse.ll

[AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.
ClosedPublic