Nic Curtis has done the experiments to prove it is faster than a
separate mul and add.
Fixes: SWDEV-332806
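The change selects a 32-bit integer multiply-add (IMAD32 in the title, i.e. a mul followed by an add) to v_mad_u64_u32, which multiplies two 32-bit values, adds a 64-bit addend and produces a 64-bit result. Below is a minimal Python sketch of why that is arithmetically sound. It assumes the per-lane semantics {carry_out, D.u64} = S0.u32 * S1.u32 + S2.u64 from the AMDGPU ISA documentation. Under that assumption, the low 32 bits of the 64-bit result equal the 32-bit a * b + c, whatever sits in the upper half of the addend. The helper names (mad_u64_u32, imad32_mul_add, imad32_via_mad64) are hypothetical. This is not the TableGen pattern from the patch, and it says nothing about speed, which rests on the experiments mentioned above.

```python
from typing import Tuple

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mad_u64_u32(s0: int, s1: int, s2: int) -> Tuple[int, int]:
    """Hypothetical model of one lane of v_mad_u64_u32 (assumed ISA semantics):
    {carry_out, D.u64} = S0.u32 * S1.u32 + S2.u64."""
    full = (s0 & MASK32) * (s1 & MASK32) + (s2 & MASK64)
    return full & MASK64, full >> 64  # (64-bit result, carry-out bit)


def imad32_mul_add(a: int, b: int, c: int) -> int:
    """IMAD32 as a separate 32-bit mul followed by a 32-bit add."""
    mul = (a * b) & MASK32
    return (mul + c) & MASK32


def imad32_via_mad64(a: int, b: int, c: int, addend_hi: int = 0) -> int:
    """IMAD32 via the 64-bit mad: keep only the low 32 bits of the result.
    addend_hi stands in for whatever occupies the upper half of the 64-bit
    addend; it cannot change the low 32 bits."""
    addend = ((addend_hi & MASK32) << 32) | (c & MASK32)
    result64, _carry = mad_u64_u32(a, b, addend)
    return result64 & MASK32


# Check every combination of a few edge-case 32-bit values.
vals = [0, 1, 7, 0x7FFFFFFF, 0x80000000, 0xDEADBEEF, 0xFFFFFFFF]
for a in vals:
    for b in vals:
        for c in vals:
            expect = imad32_mul_add(a, b, c)
            assert imad32_via_mad64(a, b, c) == expect
            assert imad32_via_mad64(a, b, c, addend_hi=0xFFFFFFFF) == expect
```

Because addition and multiplication modulo 2^64 reduce consistently modulo 2^32, the upper addend bits only ever reach bits 32 and above of the result. That is why the 32-bit addend does not need a meaningful high half.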
Differential D127253
[AMDGPU] Use v_mad_u64_u32 for IMAD32
Closed, Public. Authored by rampitec on Jun 7 2022, 2:43 PM.
Event Timeline
The newly added pattern with imm:$src2 causes an assert in the GlobalISel exporter with a debug build of TableGen: assert(WaitingForNamedOperands == 0 && "previous predicate didn't find all operands or " "nested predicate that uses operands"); Disabled GlobalISel for this rule. It does not work anyway, because GlobalISel does not support the two-result instruction.
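The "two-result instruction" here is v_mad_u64_u32, which writes two destinations per lane: the 64-bit sum and a carry-out bit. The small Python model below makes both results explicit and shows that an IMAD32 user reads only the low 32 bits of the first one, leaving the carry result unused. It assumes the ISA semantics {carry_out, D.u64} = S0.u32 * S1.u32 + S2.u64, and the names (MadResults, mad_u64_u32, imad32_use) are hypothetical. It illustrates the instruction, not how SelectionDAG or GlobalISel represent it.

```python
from typing import NamedTuple

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


class MadResults(NamedTuple):
    """The two per-lane results of v_mad_u64_u32 (assumed semantics)."""
    vdst: int   # 64-bit sum (vector register pair)
    carry: int  # carry-out bit (this lane's bit of the scalar destination)


def mad_u64_u32(s0: int, s1: int, s2: int) -> MadResults:
    full = (s0 & MASK32) * (s1 & MASK32) + (s2 & MASK64)
    return MadResults(vdst=full & MASK64, carry=full >> 64)


def imad32_use(results: MadResults) -> int:
    """An IMAD32 user needs only the low 32 bits of result 0;
    the carry result is never read."""
    return results.vdst & MASK32


r = mad_u64_u32(0xFFFFFFFF, 0xFFFFFFFF, MASK64)
print(hex(r.vdst), r.carry)  # 0xfffffffe00000000 1
print(hex(imad32_use(r)))    # 0x0
```

With both 32-bit operands at 0xFFFFFFFF and an all-ones addend, the carry-out is set, yet the low 32 bits still equal the 32-bit (a * b + c) mod 2^32, which is 0 here.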
LGTM, thanks. But we should really do gisel too, even if it means writing C++ code.
This revision is now accepted and ready to land. Jun 9 2022, 1:53 AM
I do not really think it needs C++ code. What it needs is a gisel exporter that can support what sdag can do. Maybe we need to extend the tg syntax to extract a specific result from a node. This is my religious belief. Looking even at the sdag code that handles mul_lohi, I believe this is unnecessary overkill, serving just one purpose: compensating for an underdeveloped gisel backend.
This revision was landed with ongoing or failed builds. Jun 9 2022, 11:40 AM
Closed by commit rG23db8e4b4322: [AMDGPU] Use v_mad_u64_u32 for IMAD32 (authored by rampitec). This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 435622
llvm/lib/Target/AMDGPU/VOP3Instructions.td
llvm/test/CodeGen/AMDGPU/atomic_optimizations_buffer.ll
llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll
llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll
llvm/test/CodeGen/AMDGPU/atomic_optimizations_raw_buffer.ll
llvm/test/CodeGen/AMDGPU/atomic_optimizations_struct_buffer.ll
llvm/test/CodeGen/AMDGPU/mad_64_32.ll
llvm/test/CodeGen/AMDGPU/mad_u64_u32.ll
llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll
llvm/test/CodeGen/AMDGPU/stack-pointer-offset-relative-frameindex.ll
llvm/test/CodeGen/AMDGPU/udiv.ll
Just curious: how does this work? This class has no GISelPredicateCode, but isn't that the same as having a GISelPredicateCode that always returns true?