For HSA these values are stored in the aql structure, which can be accesed
via the dispatch pointer which is loaded into user sgprs. For
simplicity, we are currently loading the dispatch pointer for all shaders,
even when it isn't used.
Details
- Reviewers
arsenm
Diff Detail
Event Timeline
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1022 | Why is this an i16? We really don't want to have to do an argument extload, although from the tests it looks like that doesn't happen. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1022 | The extload isn't happening, because LowerParameter only does extloads for floating-point types, I can fix that. The local size values are stored in memory as i16 values. We could use a 32-bit non-ext load for the z value, since the next 16-bits after the z value will always be 0. For x and y, we always load both and then mask/shift to get the value we need. I'm not sure if 32-bit load + mask or shift is faster than 16-bit ext load. |
LGTM
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1022 | The 32-bit load and mask will definitely be better because there are no scalar ext loads. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1021 | This should be Dim * 2 since this is an i16. |
Added a fix for the offset bug, and added mask/shift to avoid using extloads.
Unfortunately, the DAGCombine still emits an extload for the local.size.x case.
We really need address space arguments for isExtLoadLegal().
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
470–472 | I think Signed should stay the last parameter | |
1030–1031 | The cases aren't supposed to be indented | |
1037–1038 | Why do you want to use SRA? I would expect SRL to be preferrable | |
test/CodeGen/AMDGPU/work-item-intrinsics.ll | ||
108–110 | For this case specifically, it already should not happen. I added shouldReduceLoadWidth to stop producing < 32-bit type extloads. Is there somewhere missing this? | |
114 | There should be tests that use all the combinations of local.size.x/y/z |
LGTM
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
511 | We should be able to detect if this will be needed from the IR, but I'm not sure where to put that | |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
606–609 | I'm not sure why we have getPhysRegSubReg when getSubReg already exists. I'm already working on fixing this in other patches though |
I was thinking about this, and I think it would be better if we added an intrinsic for the dispatch pointer and put the complexity of deciding what offset to read in the library. It will make it easier to detect when the dispatch pointer is a necessary input.
We should be able to detect if this will be needed from the IR, but I'm not sure where to put that