Add signed and unsigned integer version of med3 combine.
Source pattern is min(max(Val, K0), K1) or max(min(Val, K1), K0)
where K0 and K1 are constants and K0 <= K1. Destination is med3
that corresponds to signedness of min/max in source.
Details
- Reviewers
foad arsenm - Commits
- rG4a9bc59867b6: AMDGPU/GlobalISel: Add integer med3 combines
Diff Detail
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp | ||
---|---|---|
39–67 | This feels like reinventing MIPatternMatch | |
83 | Opcode checks first? | |
94 | Technically 0 is a valid instruction opcode (PHI) but we ignore that most places | |
107 | Braces | |
151–152 | This will fail if the constant is a bitcasted G_FCONSTANT when the other patch extends isConst | |
166 | Should not construct new MachineIRBuilder |
Use MIPatternMatch to match source pattern. Use Optional instead of return 0. Separate from fp med3 function to avoid bitcasted G_FCONSTANT and explicitly match G_CONSTANT.
Looks pretty good to me, just some questions inline.
Will we end up selecting v_med3 instructions even for uniform inputs? That would be bad.
llvm/include/llvm/CodeGen/GlobalISel/MIPatternMatch.h | ||
---|---|---|
177 | Why is the cast needed? | |
280 | Might be simpler to make this a subclass of AnyBinaryOp_match? | |
llvm/lib/CodeGen/GlobalISel/Utils.cpp | ||
349 | Swap the operands: MI->getOpcode() != TargetOpcode::G_CONSTANT | |
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp | ||
267 | I don't understand why you need to use Optional here. The default case could just be marked as unreachable. | |
281 | Is this a template so that you can use it for floating point values in the future? Then shouldn't "CstRegMatch" be called "ICstRegMatch"? | |
310–313 | I would remove these asserts. |
Should we move this combine after register bank select and skip min/max with sgpr bank?
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp | ||
---|---|---|
281 | Yes, fmed3 will match fconstant, and clamp fconstant or splat value |
This should probably be done after RegBankSelect, although post-regbankselect combines are a yet unsolved problem. The dummy pass is there for it, it just doesn't do anything yet
Move combines after register bank select. Check for register bank of the result register and only combine for vgpr bank. There is mir test for sgpr bank min/max input for completeness. However sgpr bank min/max are transformed to compare + select in regbankselect and we should not expect them in post regbank combiner.
llvm/include/llvm/CodeGen/GlobalISel/MIPatternMatch.h | ||
---|---|---|
266 | Nit: drop the parens around (R.match(MRI, Op0) && L.match(MRI, Op1)). | |
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp | ||
115 ↗ | (On Diff #306646) | Can we just assert the type is one of these, since we're running on legalized GMIR and these are the only types that we have integer min/max instructions for? (Or do we need to exclude v2s16?) |
124 ↗ | (On Diff #306646) | Nit: make this "K0_Imm" not "KO_Imm". |
Rebased. Updated uniform test (Jay fixed reg-bank-select for uniform min and max) and added test for v2i16 which we don't combine.
Renamed G_AMDGPU_MED3 to G_AMDGPU_SMED3 since this patch also adds UMED3.
Looks pretty good, just minor comments inline.
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp | ||
---|---|---|
115 ↗ | (On Diff #306646) | Ping? Maybe just if (Ty.isVector()) return false;? |
97 ↗ | (On Diff #340463) | I don't think you really need the form of m_CommutativeBinOp that does not check the opcode. This whole function could be written as: return mi_match(MI, MRI, m_any_of(m_CommutativeBinOp(Max, m_CommutativeBinOp(Min, Val, K1), K0)), m_CommutativeBinOp(Min, m_CommutativeBinOp(Max, Val, K0), K1))); |
Addressed review comments.
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp | ||
---|---|---|
115 ↗ | (On Diff #306646) | We need to exclude v2s16, there is a test that we don't combine in this case. |
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp | ||
---|---|---|
115 ↗ | (On Diff #306646) | Fair enough. |
I'd rather just have to pass MRI consistently