Eliminate loads from the dispatch packet when they will have
a known value.
Also pattern match the code used by the library to handle partial
workgroup dispatches, which isn't necessary if reqd_work_group_size
is used.
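
To make the second point concrete, here is a minimal C++ sketch of the kind of computation being matched. It is not the code from the patch or from the device library. `DispatchDim`, `local_size_general`, `local_size_reqd`, and `forms_agree` are hypothetical names used only for illustration. It assumes the dispatch is uniform, meaning the grid size is a multiple of the required workgroup size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the per-dimension values a kernel would otherwise
// load from the dispatch packet at run time.
struct DispatchDim {
    std::uint32_t grid_size;   // total work-items in this dimension
    std::uint32_t group_size;  // work-items per workgroup
};

// General form: the last workgroup may be partial, so the remaining
// work-items are clamped to the workgroup size.
constexpr std::uint32_t local_size_general(DispatchDim d, std::uint32_t group_id) {
    std::uint32_t remaining = d.grid_size - group_id * d.group_size;
    return remaining < d.group_size ? remaining : d.group_size;
}

// With a required workgroup size and a uniform dispatch (an assumption of
// this sketch), the result is a compile-time constant and no load is needed.
template <std::uint32_t ReqdSize>
constexpr std::uint32_t local_size_reqd(DispatchDim, std::uint32_t) {
    return ReqdSize;
}

// Returns true if both forms agree for every workgroup of a dispatch whose
// grid size is num_groups * ReqdSize.
template <std::uint32_t ReqdSize>
bool forms_agree(std::uint32_t num_groups) {
    static_assert(ReqdSize > 0, "workgroup size must be non-zero");
    assert(num_groups <= UINT32_MAX / ReqdSize);  // avoid 32-bit overflow
    DispatchDim d{num_groups * ReqdSize, ReqdSize};
    for (std::uint32_t g = 0; g < num_groups; ++g) {
        if (local_size_general(d, g) != local_size_reqd<ReqdSize>(d, g))
            return false;
    }
    return true;
}
```

When the grid size is not a multiple of the workgroup size, `local_size_general` returns a smaller value for the last workgroup, which is why the constant replacement is only valid under the uniformity assumption stated above.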
Differential D47009
AMDGPU: Add pass to optimize reqd_work_group_size
Closed, Public. Authored by arsenm on May 17 2018, 3:45 AM.
Event Timeline
Herald added subscribers: t-tye, tpr, dstuttard and 3 others. May 17 2018, 3:45 AM
As far as I understand it is only applicable if:
Potentially other languages can benefit from it as well, per the language standard. It may be easier for an FE to call a simplified function, but an FE will not solve the issue of a call from a non-kernel function. Since you are writing a whole pass for it, it makes sense to address this as well.
This revision is now accepted and ready to land. May 18 2018, 1:20 PM
Revision Contents
Diff 147284
lib/Target/AMDGPU/AMDGPU.h
lib/Target/AMDGPU/AMDGPULowerKernelAttributes.cpp
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
lib/Target/AMDGPU/CMakeLists.txt
test/CodeGen/AMDGPU/reqd-work-group-size.ll