This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GlobalISel] Support mad/fma_mix selection
ClosedPublic

Authored by Pierre-vh on Sep 21 2022, 5:41 AM.

Details

Summary

Adds support for selecting the following instructions using GlobalISel:

  • v_mad_mix/v_fma_mix
  • v_mad_mixhi/v_fma_mixhi
  • v_mad_mixlo/v_fma_mixlo

To select those instructions properly, some additional changes were
needed which impacted other tests as well.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
arsenm added inline comments.Sep 21 2022, 7:00 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/mad-mix.ll
2–5

I'd work to eliminate those differences. If it turns out to be difficult, I would probably switch to generated checks and have them share the same file

Pierre-vh updated this revision to Diff 462170.Sep 22 2022, 6:59 AM
Pierre-vh marked 7 inline comments as done.

Rebase on D134433, address some comments
I still have some work left to do but I think the majority can already be reviewed

Pierre-vh added inline comments.Sep 22 2022, 7:01 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/mad-mix.ll
2–5

The remaining difference fall in the following categories:

  • isCanonicalized has different behaviour between GISel/DAG, so there's v_mac instead of v_mad in a few places since SIFoldOperand doesn't fold
    • there's also v_madak_f32 that's no longer present, I think it's the same cause but I haven't looked into it yet
  • op_sel is on the second v_mad_mix instead of the first, seems like a harmless difference due to how DAG/GISel work?
  • A G_LSHR + G_SHUFFLEVECTOR (1,0) pair isn't folded out. I think those operations negate each other, perhaps a combine should be added for that?
  • Some unfinished things like -[v0| not being picked up yet.

The last 2 definitely have to be fixed, I'll look into them ASAP, but are the first 2 important as well? I'm not sure of what to do with isCanonicalized, is there a place where I can find the list of operations that should go in there?

llvm/utils/TableGen/GlobalISelEmitter.cpp
2526–2530

Actually it isn't, it looks like the FMA/MAD patterns expose a bug in GISel. Without that there's a crash (segfault) in executeMatchTable because the number of renderer fns is incorrectly reported and it doesn't allocate enough entries in the vector that holds them. It seems like we rarely go above 2 renderers but here there's 4 IIRC

Pierre-vh added inline comments.Sep 23 2022, 12:13 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/mad-mix.ll
2–5

For -|v0|, the gMIR looks like this:

bb.1 (%ir-block.0):
  liveins: $vgpr0, $vgpr1, $vgpr2
  %0:_(s32) = COPY $vgpr0
  %3:_(s32) = COPY $vgpr1
  %1:_(s16) = G_TRUNC %3:_(s32)
  %4:_(s32) = COPY $vgpr2
  %2:_(s16) = G_TRUNC %4:_(s32)
  %8:_(s16) = G_FCONSTANT half 0xH8000
  %7:_(<2 x s16>) = G_BUILD_VECTOR %8:_(s16), %8:_(s16)
  %5:_(<2 x s16>) = G_BITCAST %0:_(s32)
  %6:_(<2 x s16>) = G_FABS %5:_
  %18:_(<2 x s16>) = G_FNEG %6:_
  %9:_(<2 x s16>) = G_FADD %7:_, %18:_
  %19:_(s32) = G_BITCAST %9:_(<2 x s16>)
  %20:_(s32) = G_CONSTANT i32 16
  %21:_(s32) = G_LSHR %19:_, %20:_(s32)
  %17:_(s16) = G_TRUNC %21:_(s32)
  %12:_(s32) = G_FPEXT %17:_(s16)
  %13:_(s32) = G_FPEXT %1:_(s16)
  %14:_(s32) = G_FPEXT %2:_(s16)
  %15:_(s32) = G_FMA %12:_, %13:_, %14:_
  $vgpr0 = COPY %15:_(s32)
  SI_RETURN implicit $vgpr0

Could we add a combine to fold G_FADD (+-)0.0, x into just x?
If we add that and another one to fold G_LSHR + G_SHUFFLEVECTOR (1,0), it should address most of the remaining differences.

arsenm added inline comments.Sep 23 2022, 6:05 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/mad-mix.ll
2–5

fadd 0 can only be folded out if you don't care about signed zeros and don't care about canonicalizing (I think the existing DAG combine fails to consider this second point). Which test is this in? I don't see how this would have ever been able to fold into an fmax_mix operand.

We should have the shift + shuffle combine.

isCanonicalized is essentially a list of opcodes with floating point semantics.

Pierre-vh added inline comments.Sep 26 2022, 2:27 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/mad-mix.ll
2–5

the gMIR was from v_mad_mix_f32_preextractfabsfneg_f16hi_f16lo_f16lo. I'll remove the comment saying it should be -|v0| then.

For isCanonicalized, do I just need to add opcodes like in the DAG version? e.g.:

switch (Opcode) {
case AMDGPU::G_FADD:
case AMDGPU::G_FSUB:
case AMDGPU::G_FMUL:
case AMDGPU::G_FMA:
case AMDGPU::G_FMAD:
case AMDGPU::G_FDIV:
case AMDGPU::G_FREM:
case AMDGPU::G_FPOW:
case AMDGPU::G_FPEXT:
case AMDGPU::G_FPTRUNC:
  return true;
case AMDGPU::G_FNEG:
case AMDGPU::G_FMINNUM_IEEE:
case AMDGPU::G_FMAXNUM_IEEE:

If yes, I get this result in one of the tests:

v_mad_mixlo_f16_f16lo_f16lo_f32_clamp_pre_cvt: ; @v_mad_mixlo_f16_f16lo_f16lo_f32_clamp_pre_cvt
; %bb.0:
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_cvt_f32_f16_e32 v0, v0
	v_cvt_f32_f16_e32 v1, v1
	v_mac_f32_e32 v2, v0, v1
	v_med3_f32 v0, v2, 0, 1.0
	v_cvt_f16_f32_e32 v0, v0
	s_setpc_b64 s[30:31]

Compared to this for the DAG:

; v_mad_f32 v0, v0, v1, v2 clamp
; v_cvt_f16_f32_e32 v0, v0
; v_cvt_f32_f16_e32 v0, v0

Which is a really big difference.

Pierre-vh updated this revision to Diff 462872.Sep 26 2022, 5:34 AM

Rebase, update tests/uncomment broken test

arsenm added inline comments.Sep 26 2022, 7:40 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
580

eraseFromParent

3735–3747

This should definitely be a combine and should not be in the selector. This should only directly interpret fabs and fneg

llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
248

Don't see what happened here?

260

Leftover debugging

Pierre-vh updated this revision to Diff 463165.Sep 27 2022, 3:19 AM
Pierre-vh marked 4 inline comments as done.
  • Address comments
  • Fix failing tablegen test
  • Change tests to use fneg instead of fsub, as discussed.
    • v_mad_mix_f32_preextractfabsfneg_f16hi_f16lo_f16lo now uses -[v0| because of it, some other tests also use -v0 instead now.
  • ​Update Value Tracking functions to fix some additional test cases
foad added inline comments.Sep 27 2022, 3:48 AM
llvm/lib/CodeGen/GlobalISel/Utils.cpp
665–667 ↗(On Diff #463165)

This isn't true. You can get a NaN result even if none of the inputs are NaNs, e.g. from +inf + -inf.

Pierre-vh added inline comments.Sep 27 2022, 3:52 AM
llvm/lib/CodeGen/GlobalISel/Utils.cpp
665–667 ↗(On Diff #463165)

I just copied the SDag implementation so that one would be wrong too then

case ISD::FMA:
case ISD::FMAD: {
  if (SNaN)
    return true;
  return isKnownNeverNaN(Op.getOperand(0), SNaN, Depth + 1) &&
         isKnownNeverNaN(Op.getOperand(1), SNaN, Depth + 1) &&
         isKnownNeverNaN(Op.getOperand(2), SNaN, Depth + 1);
}

What is the alternative? Can this function not handle FMA/FMAD at all then?

foad added inline comments.Sep 27 2022, 3:55 AM
llvm/lib/CodeGen/GlobalISel/Utils.cpp
665–667 ↗(On Diff #463165)

D50804 fixed the sdag implementation for fadd/fmul etc but apparently not for fma/fmad :(

Pierre-vh updated this revision to Diff 463207.Sep 27 2022, 5:57 AM
Pierre-vh marked 2 inline comments as done.

Fix tests affected by isCanonicalized/isKnownNeverNaN changes
I minimized the addition to those functions to minimize the amount of tests impacted.
They both need more love + the FADD to FNEG combine would be nice too, but not in this patch.

  • Change tests to use fneg instead of fsub, as discussed.

This should be done as a separate precommit

  • Change tests to use fneg instead of fsub, as discussed.

This should be done as a separate precommit

I didn't change existing tests, I mean in the tests I added (in the GISel folder)

I didn't change existing tests, I mean in the tests I added (in the GISel folder)

But it's still a copy pasted copy of an existing test, so they're diverging. The goal is still to move towards a shared test

Switch to common, generated tests for DAG/GISel (Using D134793)

Pierre-vh edited the summary of this revision. (Show Details)Sep 28 2022, 12:48 AM

Note: I had to reduce the amount of check prefixes in the tests because it seem to be confusing update_llc_test_checks, it kept generating non-working tests if I had too many overlapping check prefixes across all run lines.

Pierre-vh updated this revision to Diff 463523.Sep 28 2022, 6:20 AM

Rebase, there's a few regressions due to legalizer changes now

arsenm added inline comments.Sep 28 2022, 8:29 AM
llvm/lib/CodeGen/GlobalISel/Utils.cpp
665–667 ↗(On Diff #463165)

Can you fix the DAG version? Plus this should be its own patch with its own testing

Pierre-vh updated this revision to Diff 463790.Sep 29 2022, 1:16 AM

Rebase on top of D134861
Splitting up the patch quite a bit so I can commit the straightforward changes sooner

Pierre-vh updated this revision to Diff 463795.Sep 29 2022, 1:35 AM

Rebase on top of D134862
Another set of changes removed on the diff, now it only focuses on mad_mix

Pierre-vh updated this revision to Diff 463841.Sep 29 2022, 4:02 AM

Rebase on tree that includes D134870.

All big regressions have been addressed, there's still a few minor differences between the DAG/GISel variants but I think they're not critical (I'll still look at them but please review in the meantime)
-> v_mad_mixhi_f16_f16lo_f16lo_f16lo_undeflo_clamp_precvt

-> weird use of shl but it's just one extra inst

-> v_mad_mixhi_f16_f16lo_f16lo_f16lo_intpack

-> constant rematerialization issue
Pierre-vh updated this revision to Diff 463856.Sep 29 2022, 5:45 AM

Fix v_mad_mixhi_f16_f16lo_f16lo_f16lo_undeflo_clamp_precvt

arsenm accepted this revision.Sep 30 2022, 5:23 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
5101–5108

I'd expect this to be dead code. 16-bit vector extracts are lowered to 32-bit bit ops

5164–5165

Asserting on s16 is adequate

This revision is now accepted and ready to land.Sep 30 2022, 5:23 AM
Pierre-vh updated this revision to Diff 464232.Sep 30 2022, 5:35 AM

Rebase, there's a few very small regressions due to the change in FMIN/MAXNUM legalization rules

Pierre-vh edited the summary of this revision. (Show Details)Sep 30 2022, 5:35 AM
Pierre-vh updated this revision to Diff 464235.Sep 30 2022, 5:39 AM
Pierre-vh marked 2 inline comments as done.
Pierre-vh edited the summary of this revision. (Show Details)

Comments

Needs re-review of the latest 2 changes because there's a few codegen changes due to the rebase

arsenm accepted this revision.Sep 30 2022, 5:55 AM

LGTM

Pierre-vh requested review of this revision.Oct 4 2022, 5:00 AM

Needs another quick re-review because of the shuffle/shift combine removal - there's some small regressions I was unable to fix for now but they only affect VI/CI

arsenm accepted this revision.Oct 12 2022, 8:43 AM
This revision is now accepted and ready to land.Oct 12 2022, 8:43 AM
foad added inline comments.Oct 17 2022, 3:59 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
534–535

Personally I prefer to write this as (IsFMA ? Subtarget->hasMadMixInsts() : Subtarget->hasFmaMixInsts()).

Also I don't understand what the (!Subtarget->hasMadMixInsts() && !Subtarget->hasFmaMixInsts()) part is for. Why can't the whole thing just be: (IsFMA ? Subtarget->!hasFmaMixInsts() : !Subtarget->hasMadMixInsts())?

753

Don't need the braces.

759

Don't need the braces.

llvm/lib/Target/AMDGPU/SIInstructions.td
2752 ↗(On Diff #467050)

This looks weird. Why would we every want to generate V_LSHLREV_B32 with an sgpr operand?

Pierre-vh marked 4 inline comments as done.

Comments

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
534–535

That condition was imported from selectFMAD_FMA in the DAGISel and yeah, after looking at it more carefully it's indeed redundant, I rewrote it.

llvm/lib/Target/AMDGPU/SIInstructions.td
2752 ↗(On Diff #467050)

I just changed the pattern matching to also allow VGPR operands (which, IIRC, will be copied to the right regbank in any case). Not sure why the pattern uses a SGPR output though.

I did this change because it helped codegen in one test go from:

v_mad_mix_f32 v0, v0, v1, v2 op_sel_hi:[1,1,1] clamp 
v_cvt_f16_f32_e32 v0, v0 
v_and_b32_e32 v1, 0xffff, v0 
v_lshl_or_b32 v0, v0, 16, v1

to

v_mad_mix_f32 v0, v0, v1, v2 op_sel_hi:[1,1,1] clamp
v_cvt_f16_f32_sdwa v0, v0 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD

Perhaps it'd be more adequate to duplicate the pattern and have another one that uses VReg input/output?

foad added inline comments.Oct 19 2022, 1:44 AM
llvm/lib/Target/AMDGPU/AMDGPUGISel.td
156

Is this required? I am confused about whether you are now matching VOP3PMadMixMods in TableGen patterns, or only doing it from the C++ code in selectG_FMA.

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
525

Call it selectG_FMA_FMAD.

534

The usual convention seems to be to return false (with no braces) here, and have the caller do:

if (selectG_FMA(I))
  return true;
return selectImpl(I, *CoverageInfo);

Would that work? Or is there something special about the return false on line 575?

558

Return false? No braces.

574

No braces.

751

Is this related to the current patch? Could it be split out?

llvm/lib/Target/AMDGPU/SIInstructions.td
2752 ↗(On Diff #467050)

Maybe split this into a separate patch and we can discuss it there? It doesn't seem to be directly related to selecting the mad_mix instruction.

Pierre-vh updated this revision to Diff 468822.Oct 19 2022, 2:09 AM
Pierre-vh marked 7 inline comments as done.

Comments, splitting up stuff in other diffs

Pierre-vh added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUGISel.td
156

I'm doing it exclusively in C++.
This is needed to tell the GlobalISelEmitter that, when it sees "VOP3PMadMixMods" it needs to use the "selectVOP3PMadMixMods" function from AMDGPUInstructionSelector.
Without that, it wouldn't import patterns that use it.

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
751

It's required. D136235

llvm/lib/Target/AMDGPU/SIInstructions.td
2752 ↗(On Diff #467050)

v_mad_mixhi_f16_f16lo_f16lo_f16lo_undeflo_clamp_precvt regressed due to splitting the change into D136236

foad added inline comments.Oct 19 2022, 2:18 AM
llvm/lib/Target/AMDGPU/AMDGPUGISel.td
156

I'm still confused. If we are now importing patterns that use VOP3PMadMixMods, why do you need to do the selection in C++ code?

Pierre-vh marked an inline comment as done.Oct 19 2022, 2:21 AM
Pierre-vh added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUGISel.td
156

It's a ComplexPattern

def VOP3PMadMixMods  : ComplexPattern<untyped, 2, "SelectVOP3PMadMixMods">;

Those functions work with DAG nodes and must be rewritten for GISel so the patterns can be imported
https://llvm.org/docs/GlobalISel/InstructionSelect.html#complexpatterns

foad added inline comments.Oct 19 2022, 2:26 AM
llvm/lib/Target/AMDGPU/AMDGPUGISel.td
156

Right, but why do you need to write code in selectG_FMA_FMAD to manually select MAD_MIX/FMA_MIX? Why can't the imported patterns do this automatically?

Pierre-vh marked 2 inline comments as done.Oct 19 2022, 2:31 AM
Pierre-vh added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUGISel.td
156

Because there are no patterns for MIX, only for MIXHI/MIXLO. MIX selection is still manual in the DAG as well (see SelectFMAD_FMA)

foad added inline comments.Oct 19 2022, 2:38 AM
llvm/lib/Target/AMDGPU/AMDGPUGISel.td
156

OK, thanks for explaining. Out of curiosity, is there a good reason why mix selection cannot be done with patterns?

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3729

I've also done this as part of D136238.

foad added inline comments.Oct 19 2022, 3:15 AM
llvm/lib/Target/AMDGPU/AMDGPUGISel.td
157

Why is this "s64"?

Pierre-vh updated this revision to Diff 468842.Oct 19 2022, 3:16 AM
Pierre-vh marked an inline comment as done.

Rebase

Pierre-vh added inline comments.Oct 19 2022, 3:18 AM
llvm/lib/Target/AMDGPU/AMDGPUGISel.td
157

I'm not sure about this actually, I just followed the examples above. It's supposed to be "the expected type at the root of the match" but IIRC it doesn't look like it matter much in this case at least

arsenm accepted this revision.Oct 19 2022, 8:27 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUGISel.td
157

I'm surprised this doesn't do anything but should probably be untyped

Pierre-vh marked an inline comment as done.Oct 20 2022, 3:01 AM
Pierre-vh added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUGISel.td
157

untyped doesn't work

[build] /home/pvanhout/work/trunk/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUGISel.td:157:5: error: Value specified for template argument 'GIComplexOperandMatcher:type' (#0) is of type ValueType; expected type LLT: untyped
[build]     GIComplexOperandMatcher<untyped, "selectVOP3PMadMixMods">,

There's a TODO for it

  // The expected type of the root of the match.
  //
  // TODO: We should probably support, any-type, any-scalar, and multiple types
  //       in the future.
  LLT Type = type;
`
Pierre-vh marked an inline comment as done.

Rebase

Pierre-vh updated this revision to Diff 470142.Oct 24 2022, 7:27 AM

DAG -> SDAG for FileCheck prefix
Still blocked by parent diff

Pierre-vh updated this revision to Diff 471432.Oct 28 2022, 1:24 AM

Rebase

There are some codegen changes (due to D136922) but it looks like a net positive everywhere

arsenm accepted this revision.Nov 1 2022, 2:41 PM

LGTM

Pierre-vh updated this revision to Diff 473175.Nov 4 2022, 2:28 AM

Rebase without D136922, some regressions are back but it's relatively small.

Pierre-vh updated this revision to Diff 473579.Nov 6 2022, 11:48 PM

Abandoning D135145 as we can't really decide whether it's a good or bad thing.
There's a couple of regressions then but nothing blocking landing afaik.
@arsenm can you confirm this is good to land? For the remainingg cases I'll add them to my notes and try to come up with something when time allows

Pierre-vh requested review of this revision.Nov 6 2022, 11:48 PM
arsenm accepted this revision.Nov 7 2022, 9:33 AM

Should work to fix these regressions vs. the DAG but they certainly aren't directly related to this

This revision is now accepted and ready to land.Nov 7 2022, 9:33 AM
This revision was landed with ongoing or failed builds.Nov 8 2022, 12:02 AM
This revision was automatically updated to reflect the committed changes.