This is an archive of the discontinued LLVM Phabricator instance.

[CGP] Split large data structres to sink more GEPs
ClosedPublic

Authored by haicheng on Jan 31 2018, 12:28 PM.

Details

Summary

Accessing the members of a large data structures needs a lot of GEPs which usually have large offsets due to the size of the underlying data structure. If the offsets are too large to fit into the r+i addressing mode, these GEPs cannot be sunk to their users' blocks and many extra registers are needed then to carry the values of these GEPs.

This patch tries to split a large data struct starting from %base like the following.

Before:

BB0:
  %base     =

BB1:
  %gep0     = gep %base, off0
  %gep1     = gep %base, off1
  %gep2     = gep %base, off2

BB2:
  %load1    = load %gep0
  %load2    = load %gep1
  %load3    = load %gep2

After:

BB0:
  %base     =
  %new_base = gep %base, off0

BB1:
  %new_gep0 = %new_base
  %new_gep1 = gep %new_base, off1 - off0
  %new_gep2 = gep %new_base, off2 - off0

BB2:
  %load1    = load i32, i32* %new_gep0
  %load2    = load i32, i32* %new_gep1
  %load3    = load i32, i32* %new_gep2

In the above example, the struct is split into two parts. The first part still starts from %base and the second part starts from %new_base. After the splitting, %new_gep1 and %new_gep2 have smaller offsets and then can be sunk to BB2 and folded into their users.

The algorithm to split data structure is simple and very similar to the work of merging SExts. First, it collects GEPs that have large offsets when iterating the blocks. Second, it splits the underlying data structures and updates the collected GEPs to use smaller offsets.

The code size and performance results of spec20xx is listed as below

Code Size (%)Performance (%)
(- is smaller)(+ is faster)
spec2006/bzip2-3.47+2.89
spec2017/imagick-0.01+1.02
spec2000/mesa-0.76+0.71
spec2017/x264-0.91+0.39
spec2017/leela-0.01+0.31
spec2000/ammp-0.04+0.17
spec2017/parest00
spec2017/xalancbmk0-0.13
spec2017/blender0-0.18
spec2006/xalancbmk0-0.51

Diff Detail

Repository
rL LLVM

Event Timeline

haicheng created this revision.Jan 31 2018, 12:28 PM
efriedma added inline comments.
lib/CodeGen/CodeGenPrepare.cpp
3776

I haven't really reviewed the whole patch closely, so I might be missing something, but why does it matter that the element type is a struct, as opposed to something else large like an array?

haicheng added inline comments.Jan 31 2018, 1:05 PM
lib/CodeGen/CodeGenPrepare.cpp
3776

Nothing special, I am just being conservative in the beginning. I find that at this stage, LLVM can bitcast arbitrary pointers to i8* and do random stuffs. I am thinking supporting other large data structures as the next step.

tobiasvk added inline comments.
lib/CodeGen/CodeGenPrepare.cpp
432

Drive-by comment: it's not a good idea to modify a cl::opt here as this will create (benign) races between multiple backend threads in ThinLTO.

haicheng updated this revision to Diff 133782.Feb 10 2018, 6:22 PM
haicheng retitled this revision from [CGP] Split large structs to sink more GEPs to [CGP] Split large data structres to sink more GEPs.
haicheng edited the summary of this revision. (Show Details)

I made two changes to address the comments.

  • Add the support of splitting large arrays.
  • Introduce a target hook to enable/disable the change.

I am also welcome to any alternative implementation suggestions.

A few quick comments. Will follow up with a more complete review later this week.

lib/CodeGen/CodeGenPrepare.cpp
3768

Perhaps something like:

// Record GEPs with a non-zero offsets as candidates for splitting in the event that the offset cannot fit into the r+i addressing mode.
3771

Add isa<GetElementPtrInst>(AddrInst) check?

3780

You could reduce the indent here by putting a 'isa<GetElementPtrInst>(AddrInst) in the outer most if statement and then make this a cast<>. See comment above.

3781

Also, I need some clarification here. The first check

(BaseI && GEP->getParent() != BaseI->getParent())

seems to be inline with the comments below, but I don't completely follow the second check (i.e., checking that the PHI is not in the entry block, if BaseI is null).

efriedma added inline comments.Feb 12 2018, 11:38 AM
lib/Target/AArch64/AArch64ISelLowering.cpp
8218

The base of a GEP is always a pointer.

8220

I'm still not sure what the purpose of this check is; why does it matter how many dimensions an array has?

haicheng updated this revision to Diff 134614.Feb 16 2018, 7:18 AM
haicheng marked 8 inline comments as done.
  • Completely remove the check of base types. So, any large data structure is supported.
  • Clarify some comments.

Thank you for the review.

haicheng updated this revision to Diff 135450.Feb 22 2018, 9:52 AM

Made some changes to improve the readability.

Kindly ping. I am appreciated to take any advice.

Kindly Ping #2

Kindly Ping #3

Just minor nits inlined.

lib/CodeGen/CodeGenPrepare.cpp
3768

a non-zero offsets -> a non-zero offset

3774

It seems that you can move this check to "else" by changing "else" to "else if".

4319

You may want to have a new variable for GetElementPtrInst instead of doing LargeOffsetGEP.first several times.

4321

!NewGEPBases.count(LargeOffsetGEP.first)

4323

I'm not clear about this comment. Can you clarify it little bit?

typo: spllit

lib/Target/AArch64/AArch64ISelLowering.cpp
8217

Should we have BaseType as parameter ?

haicheng updated this revision to Diff 139315.Mar 21 2018, 10:19 AM
haicheng marked 6 inline comments as done.

Address Jun's comments.

Thank you very much for taking a look.

skatkov added inline comments.Mar 26 2018, 10:12 PM
lib/CodeGen/CodeGenPrepare.cpp
3767

Why do you need the check isa<GetElementPtrInst>(AddrInst)?
You are in "case Instruction::GetElementPtr" basing on "switch (Opcode) {" where Opcode is actually AddrInst->getOpcode().
So to me it is redundant...

3786

getParent()->getParent() => getFunction()

4828

Please add a comment for nontrivial comparison function.

4882

I wonder if you sort the an array anyway why not to use set like data-structure to keep them and avoid erase of unique elements?
Is there any hidden sense for that?

4894

How can we get GEP == nullptr here?

4938

getParent()->getParent() => getFunction()

haicheng updated this revision to Diff 139959.Mar 27 2018, 10:59 AM
haicheng marked 4 inline comments as done.

Update to address Serguei's comments. Thank you for taking a look.

lib/CodeGen/CodeGenPrepare.cpp
3767

AddrInst is a user. It can be a GetElementPtrConstantExpr which is not supported by CGP yet.

4882

I need a data structure that can be iterated in a sorted order (ascending by offsets) and all elements (pointers to GEPs) are unique. I think the data in the set are unique but I cannot access them in the order that I want If I use a set.

Kindly Ping

efriedma added inline comments.Apr 9 2018, 5:06 PM
lib/CodeGen/CodeGenPrepare.cpp
2548

AssertingVH<GetElementPtrInst>

4892

This is going to visit GEPs with the same offset in non-deterministic order. I guess it might not be that important, but it'll probably mess with the use-lists, which could affect some later pass. I would rather stay on the safe side and sort GEPs with the same offset based on the order they were added to the map, or some other deterministic ordering.

4899

The result type of the GEP might not be the type of the memory access... but I don't know how you'd get the right type off the top of my head, and it might not be important in practice. Maybe worth noting with a comment, though.

4933

I think you have to be a bit more careful here to avoid inserting a GEP into a block with a catchswitch. (This is Windows-only exception-handling, so maybe difficult to trigger.)

haicheng updated this revision to Diff 142989.Apr 18 2018, 1:34 PM

Address Eli's comments. Thank you very much, Eli.

haicheng marked 2 inline comments as done.Apr 18 2018, 1:36 PM
haicheng added inline comments.
lib/CodeGen/CodeGenPrepare.cpp
4933

Now I check if the blocks contain catchswitch when collecting GEPs.

This is looking good. Just a few more small comments.

lib/CodeGen/CodeGenPrepare.cpp
278

Probably these maps should also use AssertingVH.

3778

I'm pretty sure it's impossible for BaseI to be a BinaryOperator, given it's a pointer.

haicheng updated this revision to Diff 144146.Apr 26 2018, 9:51 AM
haicheng marked an inline comment as done.

Address Eli's comments. Thank you.

haicheng marked 2 inline comments as done.Apr 26 2018, 9:54 AM
efriedma added inline comments.Apr 27 2018, 2:35 PM
lib/CodeGen/CodeGenPrepare.cpp
278

By "these maps", I meant NewGEPBases and LargeOffsetGEPMap too.

haicheng updated this revision to Diff 144711.May 1 2018, 7:14 AM

Add more AssertingVH.

efriedma added inline comments.May 1 2018, 11:16 AM
lib/CodeGen/CodeGenPrepare.cpp
278

LargeOffsetGEPMap still has a GetElementPtrInst * that isn't an AssertingVH.

haicheng updated this revision to Diff 144868.May 2 2018, 6:09 AM
haicheng marked 3 inline comments as done.

Add one more AssertingVH

Kindly Ping

efriedma accepted this revision.May 9 2018, 3:23 PM

LGTM

This revision is now accepted and ready to land.May 9 2018, 3:23 PM
This revision was automatically updated to reflect the committed changes.
jonpa added a subscriber: jonpa.Jun 19 2022, 8:36 AM
jonpa added inline comments.
llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp
3816 ↗(On Diff #146171)

Is there any reason to not use '!=' as oppsed to '>' here (which would match the comment below)?

On SystemZ negative offsets are not supported on vector instructions.

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 19 2022, 8:36 AM
efriedma added inline comments.Jun 20 2022, 3:41 PM
llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp
3816 ↗(On Diff #146171)

It could work. Would probably need a few other changes to make sure we don't do anything silly like grouping together positive and negative offsets.

jonpa added inline comments.Jun 21 2022, 4:09 AM
llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp
3816 ↗(On Diff #146171)

I may be missing something, but as far as I can tell there should not be any problem just because the offset is negative. The type for offsets is already int64_t, and the algorithm of sorting them by offset and inserting a new GEP whenever needed seems to work regardless of a negative offset, since the most negative offset is generated first and then the remainder will be positive, or?

efriedma added inline comments.Jun 21 2022, 10:19 AM
llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp
3816 ↗(On Diff #146171)

I don't remember exactly how the algorithm for grouping offsets together functions; maybe it just works.