Page MenuHomePhabricator

[AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook

Authored by aeubanks on Jan 5 2021, 9:25 PM.



Having a custom inliner doesn't really fit in with the new PM's
pipeline. It's also extra technical debt.

amdgpu-inline only does a couple of custom things compared to the normal

  1. It disables inlining if the number of BBs in a function would exceed some limit
  2. It increases the threshold if there are pointers to private arrays(?)

These can all be handled as TTI inliner hooks.
There already exists a hook for backends to multiply the inlining

This way we can remove the custom amdgpu-inline pass.

This caused inline-hint.ll to fail, and after some investigation, it
looks like getInliningThresholdMultiplier() was previously getting
applied twice in amdgpu-inline ( fixed it
not applying at all, so some later inliner change must have fixed
something), so I had to change the threshold in the test.

Diff Detail

Event Timeline

aeubanks created this revision.Jan 5 2021, 9:25 PM
aeubanks requested review of this revision.Jan 5 2021, 9:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2021, 9:25 PM
arsenm added inline comments.Jan 6 2021, 8:43 AM

I'm not convinced we ever really needed this. I believe the standard inline heuristic will always do this anyway


I'm not sure I like having a target hook for a hack like this

aeubanks added inline comments.Jan 6 2021, 9:03 AM

There was one case where this forced a coldcc call to be inlined where normally it wouldn't be. Of course, why put coldcc on some random function?

Will remove.


Would you rather just remove this altogether?

arsenm added inline comments.

I don't remember the story here. @rampitec @vpykhtin @dfukalov ?

aeubanks added inline comments.Jan 6 2021, 9:54 AM

It was added in
Apparently it's a hack to help with compile times. That's a bit surprising, is this an AMDGPU specific issue?

rampitec added inline comments.Jan 6 2021, 10:24 AM

This was added to unwrap several layers of wrappers we have in device lib. Without this heuristic it was not always properly handled by the standard inliner.


It is AMDGPU specific only in a sense. We tend to inline a lot, much more than other targets. Therefor we can have drastic compile time issues. In particular there are several pretty big codes which compile hours instead of one or two minutes without this (suboptimal) cutoff.

arsenm added inline comments.Jan 6 2021, 10:28 AM

Is it possible this was only due to a bug in the subtarget feature compatibility handling? I believe the standard inline analysis specifically looks for wrappers as well

rampitec added inline comments.Jan 6 2021, 10:33 AM

It did not work this way at least back when it was added. I think we should not change the logic with this patch. This patch is infrastructural, if we want to tune heuristics that would be a separate work.

aeubanks added inline comments.Jan 6 2021, 10:41 AM

That's fair. I'll keep everything for now.

@arsenm Any thoughts on alternatives to target hooks?

arsenm added inline comments.Jan 6 2021, 10:43 AM

I think we should first try to remove this part and see how it goes

rnk added inline comments.Jan 6 2021, 10:59 AM

I think it makes sense to avoid building infrastructure (new TTI hooks) if it turns out that it is never needed. It's much harder to remove hooks than it is to add them. However, we want to make sure that Arthur can make progress flipping the default pass manager without doing any AMDGPU tuning work. That's clearly out of scope for him.

We could proceed with a minimal set of inliner hooks, flip the default pass manager, and allow downstream AMD folks to sort out any performance problems later. AMD or any other vendor can configure CMake to use the old pass manager if they aren't ready to triage the performance impacts of the NPM. Would that be OK?

Removing the wrapper check seems to work fine: D94187

aeubanks edited the summary of this revision. (Show Details)Jan 12 2021, 9:08 PM
arsenm added inline comments.Jan 13 2021, 6:47 AM

I still don't really want to expand this compile time hack to a target hook

foad added a subscriber: foad.Jan 13 2021, 7:05 AM
foad added inline comments.

"A value to be added to the inlining threshold"? You can't add something by something.

foad added inline comments.Jan 13 2021, 7:11 AM

"Maximum number of BBs allowed ..."

aeubanks updated this revision to Diff 316719.Jan 14 2021, 11:37 AM

comment fixups
move max BB check into areInlineCompatible

aeubanks marked 2 inline comments as done.Jan 14 2021, 11:40 AM
aeubanks added inline comments.

is stuffing this into areInlineCompatible() better?

aeubanks retitled this revision from [AMDGPU][Inliner] Remove amdgpu-inline and add new TTI inline hooks to [AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook.Jan 20 2021, 11:53 PM


This revision is now accepted and ready to land.Jan 21 2021, 11:47 AM