This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Add integer med3 combines
ClosedPublic

Authored by Petar.Avramovic on Oct 23 2020, 8:40 AM.

Details

Summary

Add signed and unsigned integer version of med3 combine.
Source pattern is min(max(Val, K0), K1) or max(min(Val, K1), K0)
where K0 and K1 are constants and K0 <= K1. Destination is med3
that corresponds to signedness of min/max in source.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptOct 23 2020, 8:40 AM
Petar.Avramovic requested review of this revision.Oct 23 2020, 8:40 AM
arsenm added inline comments.Oct 23 2020, 9:12 AM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
39–67 ↗(On Diff #300309)

This feels like reinventing MIPatternMatch

83 ↗(On Diff #300309)

Opcode checks first?

94 ↗(On Diff #300309)

Technically 0 is a valid instruction opcode (PHI) but we ignore that most places

107 ↗(On Diff #300309)

Braces

151–152 ↗(On Diff #300309)

This will fail if the constant is a bitcasted G_FCONSTANT when the other patch extends isConst

166 ↗(On Diff #300309)

Should not construct new MachineIRBuilder

foad added inline comments.Oct 27 2020, 3:06 AM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
132–133 ↗(On Diff #300309)

This should never fail.

141 ↗(On Diff #300309)

I think it only has to be K0 <= K1.

Petar.Avramovic edited the summary of this revision. (Show Details)

Use MIPatternMatch to match source pattern. Use Optional instead of return 0. Separate from fp med3 function to avoid bitcasted G_FCONSTANT and explicitly match G_CONSTANT.

foad added a comment.Nov 19 2020, 6:40 AM

Looks pretty good to me, just some questions inline.

Will we end up selecting v_med3 instructions even for uniform inputs? That would be bad.

llvm/include/llvm/CodeGen/GlobalISel/MIPatternMatch.h
177 ↗(On Diff #306179)

Why is the cast needed?

280 ↗(On Diff #306179)

Might be simpler to make this a subclass of AnyBinaryOp_match?

llvm/lib/CodeGen/GlobalISel/Utils.cpp
349 ↗(On Diff #306179)

Swap the operands: MI->getOpcode() != TargetOpcode::G_CONSTANT

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
267 ↗(On Diff #306179)

I don't understand why you need to use Optional here. The default case could just be marked as unreachable.

281 ↗(On Diff #306179)

Is this a template so that you can use it for floating point values in the future? Then shouldn't "CstRegMatch" be called "ICstRegMatch"?

310–313 ↗(On Diff #306179)

I would remove these asserts.

Address review comments.

Should we move this combine after register bank select and skip min/max with sgpr bank?

llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
281 ↗(On Diff #306179)

Yes, fmed3 will match fconstant, and clamp fconstant or splat value

foad added a comment.Nov 19 2020, 8:48 AM

Should we move this combine after register bank select and skip min/max with sgpr bank?

I'm not sure. @arsenm is that the usual way to handle this?

Should we move this combine after register bank select and skip min/max with sgpr bank?

I'm not sure. @arsenm is that the usual way to handle this?

This should probably be done after RegBankSelect, although post-regbankselect combines are a yet unsolved problem. The dummy pass is there for it, it just doesn't do anything yet

Move combines after register bank select. Check for register bank of the result register and only combine for vgpr bank. There is mir test for sgpr bank min/max input for completeness. However sgpr bank min/max are transformed to compare + select in regbankselect and we should not expect them in post regbank combiner.

foad added a comment.Nov 20 2020, 3:56 AM

However sgpr bank min/max are transformed to compare + select in regbankselect

That seems wrong since we have S_MIN/MAX_I32/U32 instructions.

However sgpr bank min/max are transformed to compare + select in regbankselect

That seems wrong since we have S_MIN/MAX_I32/U32 instructions.

I thought I fixed this before

arsenm requested changes to this revision.Mar 29 2021, 3:33 PM

The scalar min/max issue has been fixed, so this needs updating

This revision now requires changes to proceed.Mar 29 2021, 3:33 PM
foad added inline comments.Mar 30 2021, 2:00 AM
llvm/include/llvm/CodeGen/GlobalISel/MIPatternMatch.h
266 ↗(On Diff #306646)

Nit: drop the parens around (R.match(MRI, Op0) && L.match(MRI, Op1)).

llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
114

Can we just assert the type is one of these, since we're running on legalized GMIR and these are the only types that we have integer min/max instructions for? (Or do we need to exclude v2s16?)

123

Nit: make this "K0_Imm" not "KO_Imm".

Rebased. Updated uniform test (Jay fixed reg-bank-select for uniform min and max) and added test for v2i16 which we don't combine.
Renamed G_AMDGPU_MED3 to G_AMDGPU_SMED3 since this patch also adds UMED3.

arsenm added inline comments.Mar 31 2021, 10:46 AM
llvm/include/llvm/CodeGen/GlobalISel/MIPatternMatch.h
27–29 ↗(On Diff #334409)

I'd rather just have to pass MRI consistently

264–275 ↗(On Diff #334409)

Probably should split the new MIPatternMtach changes into a separate patch

Extracted MIPatternMtach changes.

Rebase and ping.

foad added a comment.Apr 26 2021, 3:00 AM

Looks pretty good, just minor comments inline.

llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
97

I don't think you really need the form of m_CommutativeBinOp that does not check the opcode. This whole function could be written as:

return mi_match(MI, MRI, m_any_of(m_CommutativeBinOp(Max, m_CommutativeBinOp(Min, Val, K1), K0)),
                                  m_CommutativeBinOp(Min, m_CommutativeBinOp(Max, Val, K0), K1)));
114

Ping? Maybe just if (Ty.isVector()) return false;?

Addressed review comments.

llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
114

We need to exclude v2s16, there is a test that we don't combine in this case.
I did find it more informative this way since it is explicit about available types for med3.

foad accepted this revision.Apr 26 2021, 4:13 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
114

Fair enough.

This revision was not accepted when it landed; it landed in state Needs Review.Apr 27 2021, 2:54 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.