This change folds (or (shl x, C0), (lshr y, C1)) into a funnel shift when C0
and C1 are constants whose sum equals the bit width of the shift
operations.
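For reference, a minimal standalone sketch (not taken from the patch) of the identity the combine relies on: when C0 + C1 equals the bit width, the shl/lshr/or pattern computes exactly a funnel shift left by C0 (equivalently, a funnel shift right by C1). The fshl32 helper and the constants below are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics of a 32-bit funnel shift left: concatenate x (high)
// and y (low), shift left by the amount, and keep the high 32 bits.
static uint32_t fshl32(uint32_t x, uint32_t y, uint32_t amt) {
  uint64_t concat = ((uint64_t)x << 32) | y;
  return (uint32_t)((concat << (amt & 31)) >> 32);
}

int main() {
  const uint32_t C0 = 9, C1 = 23; // C0 + C1 == 32, the bit width
  uint32_t x = 0x12345678, y = 0x9abcdef0;
  // The pattern the combine matches ...
  uint32_t pattern = (x << C0) | (y >> C1);
  // ... is a funnel shift left by C0 (or a funnel shift right by C1).
  assert(pattern == fshl32(x, y, C0));
  return 0;
}
```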
Diff Detail
- Repository: rG LLVM Github Monorepo

Event Timeline
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
3901: I think this comment belongs just above the FshOpc = line.
3907–3910: Likewise.
llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
2219: Maybe not your fault, but it's a bad idea to use a VALU instruction for uniform values, especially if it means we need to insert readfirstlanes.
llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
2219: Should probably do this in the post-regbank combiner.
llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
2219: We could maintain this generic combine and add an AMDGPU-specific post-regbank combine. More importantly, is this worth the effort? The constant shift amount pattern

    define amdgpu_kernel void @fshr_v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %amt, <4 x i32> addrspace(1)* %m) {
      %sub = sub <4 x i32> <i32 32, i32 32, i32 32, i32 32>, %amt
      %shl = shl <4 x i32> %a, %sub
      %lshr = lshr <4 x i32> %b, %amt
      %ret = or <4 x i32> %shl, %lshr
      store <4 x i32> %ret, <4 x i32> addrspace(1)* %m
      ret void
    }

has fewer instructions with the combine. How should we move forward?
llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
2219: Can't we just do what this comment in RegBankSelect says:

    case AMDGPU::G_FSHR: // TODO: Expand for scalar

maybe expanding it to S_LSHR_B64 and just taking the low part of the result? In any case I don't think this needs to block the current patch.
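To make the suggestion concrete, here is a small standalone sketch (not AMDGPU or RegBankSelect code) of the arithmetic it relies on: a 32-bit funnel shift right is the low half of a 64-bit right shift of the concatenated operands, which is what an S_LSHR_B64-based expansion would compute. The helper name and values are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Sketch only (not the actual lowering): expand a 32-bit funnel shift right
// through a 64-bit scalar shift and keep only the low half of the result.
static uint32_t fshr32_via_64bit_shift(uint32_t hi, uint32_t lo, uint32_t amt) {
  uint64_t concat = ((uint64_t)hi << 32) | lo; // hi:lo as one 64-bit value
  return (uint32_t)(concat >> (amt & 31));     // low 32 bits of the shift
}

int main() {
  uint32_t hi = 0x12345678, lo = 0x9abcdef0, amt = 5;
  // Reference: fshr(hi, lo, amt) == (lo >> amt) | (hi << (32 - amt)) for amt != 0.
  uint32_t ref = (lo >> amt) | (hi << (32 - amt));
  assert(fshr32_via_64bit_shift(hi, lo, amt) == ref);
  return 0;
}
```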