This is an archive of the discontinued LLVM Phabricator instance.

[TTI] Use users of GEP to guess access type in getGEPCost
ClosedPublic

Authored by luke on May 4 2023, 12:18 PM.

Details

Summary

Currently getGEPCost uses the target type of the GEP as a heuristic for
the type that will be accessed, to pass onto isLegalAddressingMode.
Targets use this to work out if a GEP can then be folded into the
load/store instruction that uses the GEP.
For example, on RISC-V loads and stores can have an offset added to a
base register folded into a single instruction, so the following GEP is
free:

%p = getelementptr i32, ptr %base, i32 42 ; getInstructionCost = 0
%x = load i32, ptr %p ; getInstructionCost = 1
------------------------------------------------------------------------
lw t0, a0(42)

However vector loads and stores cannot have an offset folded into them,
so the following GEP is costed:

%p = getelementptr <2 x i32>, ptr %base, i32 42 ; getInstructionCost = 1
%x = load <2 x i32>, ptr %p ; getInstructionCost = 1
------------------------------------------------------------------------
addi a0, 42
vle32 v8, (a0)

The issue arises whenever there is a mismatch between the target type of
the GEP and the type that is actually accessed:

%p = getelementptr i32, ptr %base, i32 42 ; getInstructionCost = 0
%x = load <2 x i32>, ptr %p ; getInstructionCost = 1
------------------------------------------------------------------------
addi a0, 42
vle32 v8, (a0)

Even though this GEP will result in an add instruction, because TTI
thinks it's loading an i32, it will think it can be folded and not
charge for it.

The target type can become mismatched with the memory access during
transformations, noticeably during SLP where a scalar base pointer will
be reused to perform a vector load or store.

This patch adds an optional AccessType argument to getGEPCost which
allows the type of memory accessed by users to be passed in as a hint,
so that we can more accurately determine if the GEP can be folded into
its users.

If AccessType is not provided, getGEPCost falls back to the old
behaviour of using the PointeeType to guess the memory access type. This
can be revisited in a later patch.

Also for now, only GEPs with exactly one user use the access type hint.
Whilst we could look through all users and use all access types to
determine if we can fold the GEP, this patch avoids doing so to prevent
O(N) behaviour.

Diff Detail

Event Timeline

luke created this revision.May 4 2023, 12:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2023, 12:18 PM
luke requested review of this revision.May 4 2023, 12:18 PM
luke edited the summary of this revision. (Show Details)May 4 2023, 12:21 PM
luke added inline comments.May 4 2023, 12:27 PM
llvm/include/llvm/Analysis/TargetTransformInfo.h
301

Not overly thrilled with introducing a struct just for this, nor with the constructors. Suggestions are very welcome here

llvm/lib/Analysis/TargetTransformInfo.cpp
229–233

I remember seeing this logic being duplicated in a few places across llvm, earlycse is the first one that comes to mind. It might be worthwhile addressing this in a follow up patch

llvm/test/Transforms/SLPVectorizer/RISCV/gep.ll
18 ↗(On Diff #519600)

This is the main result from all the noise in the test cases.

nikic added a comment.May 4 2023, 12:55 PM

You should be able to avoid the LICM changes by adjusting the code at https://github.com/llvm/llvm-project/blob/bfb7c99f3aeab09236adf1f684f7144f384c6dd7/llvm/lib/Transforms/Scalar/LICM.cpp#L1346-L1361 to only pass the users in the loop (and drop the separate user iteration that code does).

Something I'm concerned about here is that your default implementation is going to scan all users of the instruction, which means that the cost query is now O(n) instead of O(1). This is probably not acceptable for compile-time reasons. Can we get away with only inspecting a single user in that case and assume it is representative?

This change seems to cause some pretty large code size changes on x86 at least (https://llvm-compile-time-tracker.com/compare.php?from=b77265711b390a6905e8901694d75f4b3812c71a&to=113f1989fa7e469407942249b4086e3a28da2bf4&stat=size-text), so it seems like a pretty high impact change. I can't say whether those changes are good or bad though :)

luke updated this revision to Diff 519811.May 5 2023, 5:11 AM

Only account for users in loop in LICM, fix inliner test

luke added a comment.May 5 2023, 5:21 AM

You should be able to avoid the LICM changes by adjusting the code at https://github.com/llvm/llvm-project/blob/bfb7c99f3aeab09236adf1f684f7144f384c6dd7/llvm/lib/Transforms/Scalar/LICM.cpp#L1346-L1361 to only pass the users in the loop (and drop the separate user iteration that code does).

Thanks, just tried it and it seems to work.

Something I'm concerned about here is that your default implementation is going to scan all users of the instruction, which means that the cost query is now O(n) instead of O(1). This is probably not acceptable for compile-time reasons. Can we get away with only inspecting a single user in that case and assume it is representative?

Yeah I think that should still work for our SLP use case, sounds like a good compromise. Will try that and report back.

This change seems to cause some pretty large code size changes on x86 at least (https://llvm-compile-time-tracker.com/compare.php?from=b77265711b390a6905e8901694d75f4b3812c71a&to=113f1989fa7e469407942249b4086e3a28da2bf4&stat=size-text), so it seems like a pretty high impact change. I can't say whether those changes are good or bad though :)

Indeed, from a quick look it looks like it might be affecting the inliner?

luke updated this revision to Diff 522222.May 15 2023, 9:05 AM

Only use the first user to avoid O(N) compile times.
Move the memory access type logic into Instruction.h in a dependent patch.

luke updated this revision to Diff 522223.May 15 2023, 9:08 AM

Remove redundant empty check

luke added inline comments.May 15 2023, 9:10 AM
llvm/lib/Transforms/Scalar/LICM.cpp
1346–1354 ↗(On Diff #522223)

@nikic I kept the O(N) behaviour here because it looks like we're already iterating through all the users below. But this area seems sensitive. Do we want O(1) behaviour in this block too?

asb added inline comments.May 17 2023, 3:49 AM
llvm/include/llvm/Analysis/TargetTransformInfo.h
286–296

Perhaps it's worth expanding the doc comment to explain how AccessTypes may be used? e.g. "Callers may pass in the access type of users of the GEP in order to allow a more accurate cost, for instance on targets where the legal addressing modes are different for different types."

Of course personal tastes on the amount of doc comments to provide can vary....

luke added inline comments.May 17 2023, 5:38 AM
llvm/include/llvm/Analysis/TargetTransformInfo.h
286–296

It definitely needs some documentation, thanks for pointing that out. The usage of AccessTypes can be quite subtle when it comes to users that aren't loads/stores

luke updated this revision to Diff 523014.May 17 2023, 5:39 AM

Add documentation comments

ABataev added inline comments.May 17 2023, 5:47 AM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1018

Shall we use getArithmeticInstrCost(ADD) instead of TCC_Basic?

1080

Same question, cost of int add instead of basic?

luke added inline comments.May 17 2023, 6:37 AM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1080

I agree we should use getArithmeticInstrCost here, but I have some ideas about adding a new target hook to control this and was wondering if I could defer it to that patch?
On RISC-V if Scale != 1 then we will need an extra mul instruction, as opposed to aarch64 which can perform both an add and multiply in one. I think this would make sense to expose the address calculation cost as a hook since it's target dependent, and we already have all the information available here to pass onto TTI.

ABataev added inline comments.May 17 2023, 7:26 AM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1080

Add FIXME/TODOs here then

luke updated this revision to Diff 523338.May 18 2023, 4:27 AM
luke marked 2 inline comments as done.

Add TODO about modelling with arithmetic instr

luke marked 2 inline comments as done.May 18 2023, 4:28 AM
ABataev added inline comments.May 18 2023, 5:51 AM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1018

return BaseGV ? TTI::TCC_Basic : TTI::TCC_Free;

nikic added inline comments.May 28 2023, 12:23 PM
llvm/include/llvm/Analysis/TargetTransformInfo.h
293

to determine

296

stray "which"?

309

ArrayRef should suffice?

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1018

Why is the BaseGV case not free as well?

llvm/lib/Transforms/Scalar/LICM.cpp
1346–1354 ↗(On Diff #522223)

It's okay, but you should drop the extra loop below. It's purpose is to essentially do the check you're doing here, just in a very crude way.

luke updated this revision to Diff 533228.Jun 21 2023, 5:58 AM
luke marked 2 inline comments as done.

Rebase and address review comments

luke marked 2 inline comments as done.Jun 21 2023, 6:06 AM

@nikic @ABataev thanks for the review, and sorry for the long delay here. I've rebased this and have run some benchmarks now that SLP is enabled by default for RISC-V.
I'm still seeing a reduction in the number of small unprofitable VFs on RISC-V with this patch, and it also seems to make a few more VF=8s profitable.
As a rough overview, here's the number vsetivlis on the benchmark before:

vsetivli : 467 total
1: 26
2: 247
3: 2
4: 169
5: 1
6: 0
7: 1
8: 17

And after:

vsetivli : 454 total
1: 24
2: 235
3: 2
4: 168
5: 1
6: 0
7: 1
8: 19
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1018

This is just to match the behaviour of the previous if (Operands.empty()) check. I'll refactor this so that they share this logic

A macro comment on the approach here. I think you're trying to do too much in a single change, and that makes it very hard to review.

I would advise doing the following:

  • Remove the array of users API. Instead, pass a single optional access type.
  • In TTIImpl, get the access type of the *sole user*. If there isn't a sole user, *do not pass one*.
  • As a first step, use the access type in the legal addressing mode if available, and *otherwise the pointee type*. This is slightly wrong, but is wrong in the "it's very close to what we used to do" kinda of way which will reduce test deltas significantly. You can change this behavior in a follow up change if profitable.
  • Add tests *via cost model* which demonstrate the change above. You should be able to (on RISCV) demonstrate the difference between access type of vector vs scalar.
  • All of the other changes in this review - i.e. everything in lib/Transform - should be independent changes structured after the above.
luke updated this revision to Diff 533642.Jun 22 2023, 9:09 AM
  • Make AccessType just a single pointer
  • Isolate changes to just cost model for now: will follow up with SLP and other users later
  • Preserve original behaviour if accesstype isn't passed
luke added a comment.Jun 22 2023, 9:11 AM

A macro comment on the approach here. I think you're trying to do too much in a single change, and that makes it very hard to review.

I would advise doing the following:

  • Remove the array of users API. Instead, pass a single optional access type.
  • In TTIImpl, get the access type of the *sole user*. If there isn't a sole user, *do not pass one*.
  • As a first step, use the access type in the legal addressing mode if available, and *otherwise the pointee type*. This is slightly wrong, but is wrong in the "it's very close to what we used to do" kinda of way which will reduce test deltas significantly. You can change this behavior in a follow up change if profitable.
  • Add tests *via cost model* which demonstrate the change above. You should be able to (on RISCV) demonstrate the difference between access type of vector vs scalar.
  • All of the other changes in this review - i.e. everything in lib/Transform - should be independent changes structured after the above.

Agreed, thanks for the pointers. I've split it up and the diff is much smaller now. Will follow up with other patches to do the lib/Transform changes.

reames accepted this revision.Jun 28 2023, 10:37 AM

LGTM w/required changes.

In addition to the inline code comment, make sure you update your commit message. The current review description is out of sync with the code and you must fix that before landing.

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
4972

Pass nullptr here so that you don't change the x86 behavior.

This revision is now accepted and ready to land.Jun 28 2023, 10:37 AM
luke edited the summary of this revision. (Show Details)Jun 28 2023, 2:58 PM
luke edited the summary of this revision. (Show Details)