If we know we aren't using a component from the kernel, we can save
a few bit packing instructions.
We're still enabling the VGPR input to the kernel though.
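
To illustrate the kind of packing being saved, here is a minimal standalone C++ sketch. It is not code from the patch. It assumes the three workitem ID components share one 32-bit register in 10-bit fields (X in the low bits, then Y, then Z), and that a reqd_work_group_size dimension of 1 means that component is always 0. The names `packWorkItemIDs` and `PackResult`, and the operation counter, are hypothetical and exist only for illustration.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Assumption for this sketch: each ID component occupies a 10-bit field.
constexpr unsigned kBitsPerID = 10;

// Packed register value plus a count of the shift/or operations it took.
struct PackResult {
  uint32_t packed;
  unsigned opsUsed;
};

// reqdSize models reqd_work_group_size when the kernel declares it.
// A dimension of size 1 implies the matching ID component is always 0,
// so its shift and or can be skipped entirely.
PackResult packWorkItemIDs(
    uint32_t x, uint32_t y, uint32_t z,
    const std::optional<std::array<uint32_t, 3>> &reqdSize) {
  const std::array<uint32_t, 3> ids = {x, y, z};
  uint32_t packed = 0;
  unsigned ops = 0;
  bool haveValue = false;

  for (unsigned dim = 0; dim < 3; ++dim) {
    if (reqdSize && (*reqdSize)[dim] == 1)
      continue; // Component is known to be zero: nothing to pack.

    uint32_t field = ids[dim];
    if (dim != 0) {
      field <<= kBitsPerID * dim; // Move Y/Z into its field.
      ++ops;
    }
    if (haveValue) {
      packed |= field; // Merge with the components packed so far.
      ++ops;
    } else {
      packed = field; // First live component needs no or.
      haveValue = true;
    }
  }
  return {packed, ops};
}
```

With no required size, all three components cost two shifts and two ors in this model. With a required size of 64 x 1 x 1, only X is live and no packing operations are needed.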
Differential D116953
AMDGPU: Optimize outgoing workitem ID based on reqd_work_group_size
Closed · Public · Authored by arsenm on Jan 10 2022, 9:22 AM.
Event Timeline
Herald added subscribers: foad, kerbowa, hiraditya and 7 others. (Jan 10 2022, 9:22 AM)
arsenm added a child revision: D116954: AMDGPU: Select workitem ID intrinsics to 0 with req_work_group_size. (Jan 10 2022, 9:23 AM)
This revision is now accepted and ready to land. (Jan 10 2022, 11:00 AM)
Revision Contents
Diff 398671
llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-call-implicit-args.ll
llvm/test/CodeGen/AMDGPU/call-reqd-group-size.ll