This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Add support for llvm.r600.local.size.* instrics when targeting HSA
AbandonedPublic

Authored by tstellarAMD on Aug 28 2015, 3:33 PM.

Details

Reviewers
arsenm
Summary

For HSA these values are stored in the aql structure, which can be accesed
via the dispatch pointer which is loaded into user sgprs. For
simplicity, we are currently loading the dispatch pointer for all shaders,
even when it isn't used.

Diff Detail

Event Timeline

tstellarAMD retitled this revision from to AMDGPU/SI: Add support for llvm.r600.local.size.* instrics when targeting HSA.
tstellarAMD updated this object.
tstellarAMD added a reviewer: arsenm.
tstellarAMD added a subscriber: llvm-commits.
arsenm added inline comments.Aug 28 2015, 3:44 PM
lib/Target/AMDGPU/SIISelLowering.cpp
1044

Why is this an i16? We really don't want to have to do an argument extload, although from the tests it looks like that doesn't happen.

lib/Target/AMDGPU/SIISelLowering.cpp
1044

The extload isn't happening, because LowerParameter only does extloads for floating-point types, I can fix that.

The local size values are stored in memory as i16 values. We could use a 32-bit non-ext load for the z value, since the next 16-bits after the z value will always be 0.

For x and y, we always load both and then mask/shift to get the value we need. I'm not sure if 32-bit load + mask or shift is faster than 16-bit ext load.

arsenm accepted this revision.Aug 28 2015, 5:23 PM
arsenm edited edge metadata.

LGTM

lib/Target/AMDGPU/SIISelLowering.cpp
1044

The 32-bit load and mask will definitely be better because there are no scalar ext loads.

This revision is now accepted and ready to land.Aug 28 2015, 5:23 PM
arsenm added inline comments.Oct 2 2015, 3:18 PM
lib/Target/AMDGPU/SIISelLowering.cpp
1043

This should be Dim * 2 since this is an i16.

tstellarAMD edited edge metadata.

Added a fix for the offset bug, and added mask/shift to avoid using extloads.
Unfortunately, the DAGCombine still emits an extload for the local.size.x case.
We really need address space arguments for isExtLoadLegal().

arsenm added inline comments.Oct 5 2015, 11:37 AM
lib/Target/AMDGPU/SIISelLowering.cpp
476–477

I think Signed should stay the last parameter

1052–1053

The cases aren't supposed to be indented

1059–1060

Why do you want to use SRA? I would expect SRL to be preferrable

test/CodeGen/AMDGPU/work-item-intrinsics.ll
102–104

For this case specifically, it already should not happen. I added shouldReduceLoadWidth to stop producing < 32-bit type extloads. Is there somewhere missing this?

113

There should be tests that use all the combinations of local.size.x/y/z

Added more test cases, and fixes some coding style issues.

LGTM

lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
521

We should be able to detect if this will be needed from the IR, but I'm not sure where to put that

lib/Target/AMDGPU/SIISelLowering.cpp
632–635

I'm not sure why we have getPhysRegSubReg when getSubReg already exists. I'm already working on fixing this in other patches though

I was thinking about this, and I think it would be better if we added an intrinsic for the dispatch pointer and put the complexity of deciding what offset to read in the library. It will make it easier to detect when the dispatch pointer is a necessary input.

tstellarAMD abandoned this revision.Dec 1 2015, 2:26 PM

This has been implemented using the llvm.amdgcn.dispatch.ptr intrinsic instead.