This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook
ClosedPublic

Authored by aeubanks on Jan 5 2021, 9:25 PM.

Details

Summary

Having a custom inliner doesn't really fit in with the new PM's
pipeline. It's also extra technical debt.

amdgpu-inline only does a couple of custom things compared to the normal
inliner:

  1. It disables inlining if the number of BBs in a function would exceed some limit
  2. It increases the threshold if there are pointers to private arrays(?)

These can all be handled as TTI inliner hooks.
There already exists a hook for backends to multiply the inlining
threshold.

This way we can remove the custom amdgpu-inline pass.

This caused inline-hint.ll to fail, and after some investigation, it
looks like getInliningThresholdMultiplier() was previously getting
applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it
not applying at all, so some later inliner change must have fixed
something), so I had to change the threshold in the test.

Diff Detail

Event Timeline

aeubanks created this revision.Jan 5 2021, 9:25 PM
aeubanks requested review of this revision.Jan 5 2021, 9:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2021, 9:25 PM
arsenm added inline comments.Jan 6 2021, 8:43 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1195–1199

I'm not convinced we ever really needed this. I believe the standard inline heuristic will always do this anyway

1204

I'm not sure I like having a target hook for a hack like this

aeubanks added inline comments.Jan 6 2021, 9:03 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1195–1199

There was one case where this forced a coldcc call to be inlined where normally it wouldn't be. Of course, why put coldcc on some random function?

Will remove.

1204

Would you rather just remove this altogether?

arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1204

I don't remember the story here. @rampitec @vpykhtin @dfukalov ?

aeubanks added inline comments.Jan 6 2021, 9:54 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1204

It was added in https://reviews.llvm.org/D62917.
Apparently it's a hack to help with compile times. That's a bit surprising, is this an AMDGPU specific issue?

rampitec added inline comments.Jan 6 2021, 10:24 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1195–1199

This was added to unwrap several layers of wrappers we have in device lib. Without this heuristic it was not always properly handled by the standard inliner.

1204

It is AMDGPU specific only in a sense. We tend to inline a lot, much more than other targets. Therefor we can have drastic compile time issues. In particular there are several pretty big codes which compile hours instead of one or two minutes without this (suboptimal) cutoff.

arsenm added inline comments.Jan 6 2021, 10:28 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1204

Is it possible this was only due to a bug in the subtarget feature compatibility handling? I believe the standard inline analysis specifically looks for wrappers as well

rampitec added inline comments.Jan 6 2021, 10:33 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1204

It did not work this way at least back when it was added. I think we should not change the logic with this patch. This patch is infrastructural, if we want to tune heuristics that would be a separate work.

aeubanks added inline comments.Jan 6 2021, 10:41 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1204

That's fair. I'll keep everything for now.

@arsenm Any thoughts on alternatives to target hooks?

arsenm added inline comments.Jan 6 2021, 10:43 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1204

I think we should first try to remove this part and see how it goes

rnk added inline comments.Jan 6 2021, 10:59 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1204

I think it makes sense to avoid building infrastructure (new TTI hooks) if it turns out that it is never needed. It's much harder to remove hooks than it is to add them. However, we want to make sure that Arthur can make progress flipping the default pass manager without doing any AMDGPU tuning work. That's clearly out of scope for him.

We could proceed with a minimal set of inliner hooks, flip the default pass manager, and allow downstream AMD folks to sort out any performance problems later. AMD or any other vendor can configure CMake to use the old pass manager if they aren't ready to triage the performance impacts of the NPM. Would that be OK?

Removing the wrapper check seems to work fine: D94187

aeubanks edited the summary of this revision. (Show Details)Jan 12 2021, 9:08 PM
arsenm added inline comments.Jan 13 2021, 6:47 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1184–1194

I still don't really want to expand this compile time hack to a target hook

foad added a subscriber: foad.Jan 13 2021, 7:05 AM
foad added inline comments.
llvm/include/llvm/Analysis/TargetTransformInfo.h
292

"A value to be added to the inlining threshold"? You can't add something by something.

foad added inline comments.Jan 13 2021, 7:11 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
74

"Maximum number of BBs allowed ..."

aeubanks updated this revision to Diff 316719.Jan 14 2021, 11:37 AM

comment fixups
move max BB check into areInlineCompatible

aeubanks marked 2 inline comments as done.Jan 14 2021, 11:40 AM
aeubanks added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
1184–1194

is stuffing this into areInlineCompatible() better?

aeubanks retitled this revision from [AMDGPU][Inliner] Remove amdgpu-inline and add new TTI inline hooks to [AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook.Jan 20 2021, 11:53 PM

ping

This revision is now accepted and ready to land.Jan 21 2021, 11:47 AM
llvm/lib/Target/AMDGPU/AMDGPU.h