Page MenuHomePhabricator

AMDGPU: Round up kernel argument allocation size
ClosedPublic

Authored by arsenm on May 25 2018, 5:50 AM.

Details

Summary

AFAIK the driver's allocation will actually have to round this
up anyway. It is useful to track the rounded up size, so that
the end of the kernel segment is known to be dereferencable so
a wider s_load_dword can be used for a short argument at the end
of the segment.

One thing I'm unclear on is if we need to really allocate the
implicit arg bytes if they aren't actually used

Diff Detail

Event Timeline

arsenm created this revision.May 25 2018, 5:50 AM

Are we sure that is what RT(s) do?

Are we sure that is what RT(s) do?

It doesn't really matter if it does or not, since we're now requesting the larger allocation

t-tye added inline comments.May 25 2018, 12:05 PM
lib/Target/AMDGPU/AMDGPUSubtarget.cpp
426

I believe you can align this to 16. See HSA spec at www.hsafoundation.com/html_spec111/HSA_Library.htm#PRM/Topics/04_SyntaxSemantics/kernarg_segment.htm which says:

"The alignment of the base address of the kernel's kernarg segment variables is the larger of 16 bytes and the maximum alignment of the kernel's kernarg segment variables."

I suspect that the OpenCL runtime simply aligns all kernarg allocations to 256 but not sure of other languages.

Are we sure that is what RT(s) do?

It doesn't really matter if it does or not, since we're now requesting the larger allocation

Note that technically the HSA spec requires the kernarg segment size to match the size and alignment of the kernel arguments. So it ought not to be larger than deduced from the arguments. In this context the implicit arguments are just extra arguments.

arsenm added inline comments.May 25 2018, 12:09 PM
lib/Target/AMDGPU/AMDGPUSubtarget.cpp
426

I think we really only need to pad up to 4 at the end to avoid vmem extloads. Wider scalar loads at the end may be useful in some cases, but we do so badly at this now I wouldn't worry about it yet

t-tye accepted this revision.May 25 2018, 12:13 PM

LGTM

This revision is now accepted and ready to land.May 25 2018, 12:13 PM
t-tye added inline comments.May 25 2018, 1:58 PM
lib/Target/AMDGPU/AMDGPUSubtarget.cpp
426

Confirmed that OpenCL chooses to always align kernarg up to 128. I don't think the compiler should rely on anything higher than defined by the HSA spec 16 but interesting to know.

arsenm closed this revision.May 29 2018, 12:39 PM

r333456