Page MenuHomePhabricator

AMDGPU: Account for usage HIP-style dynamic LDS
ClosedPublic

Authored by scchan on Jan 17 2022, 8:15 AM.

Details

Summary

Disable promote alloca to LDS when HIP-style dynamic LDS since the size
is unknown at compile time.

Change-Id: I13244a767cf172f63d2261bf6dd78ee8d2c21c44

Diff Detail

Event Timeline

scchan created this revision.Jan 17 2022, 8:15 AM
scchan requested review of this revision.Jan 17 2022, 8:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2022, 8:15 AM
arsenm added inline comments.Jan 17 2022, 8:47 AM
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
787

The dynamic behavior is keyed on this being a 0 size allocation, not external linkage. Some of the graphics usage has external LDS with fixed, static sizes

787

You also don't need a separate loop, can move down to the existing loop over UsedLDS

arsenm added inline comments.Jan 17 2022, 8:51 AM
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
787

Although I'm wondering if externally defined globals are allowed to be larger than the declared size

This may cause some perf degradation.

Do we know why promote alloca does not work for dynamic LDS? Is there a better way to handle this? Thanks.

This may cause some perf degradation.

Do we know why promote alloca does not work for dynamic LDS? Is there a better way to handle this? Thanks.

We can't take LDS if the user may be using an unknown amount. The only way we could continue to do this is if we had an explicit optimization hint provided telling us either the max dynamic total size or allocation

arsenm added inline comments.Jan 17 2022, 10:27 AM
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
787

I guess logically if you're allowed to declare this as a larger size somewhere else if the size is 0, you're also allowed to do it for any other size. That said, I still think this check should be moved down to the existing loop over UsedLDS

scchan updated this revision to Diff 401010.Jan 18 2022, 3:17 PM

Check for zero-sized and move the check into the existing loop.

scchan marked an inline comment as done.Jan 18 2022, 3:18 PM
arsenm accepted this revision.Jan 19 2022, 7:15 AM
This revision is now accepted and ready to land.Jan 19 2022, 7:15 AM
yaxunl accepted this revision.Jan 19 2022, 7:31 AM

I am OK for now. We may need keep an eye for the perf regression and be prepared to figure a way to alleviate that.

I am OK for now. We may need keep an eye for the perf regression and be prepared to figure a way to alleviate that.

Right but there isn't an easy way. Either extending the language with a hint like Matt suggested or we have to clone the kernel and have the runtime to choose the one that fits at execution time.
Do you mind landing this patch? thanks

This revision was automatically updated to reflect the committed changes.