This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Fix crash after mad/fma_mix fails selection
ClosedPublic

Authored by Petar.Avramovic on Nov 15 2022, 9:55 AM.

Details

Summary

When selectVOP3PMadMixModsImpl fails, it can still create new copy instr
via selectVOP3ModsImpl. When selectG_FMA_FMAD gives up, new copy instr
will remain dead but will not be automatically removed.
InstructionSelect does not check if instructions created during selection
are dead.
Such dead copy doesn't have register class on dst operand and causes crash.
Fix is to build copy when operands are being added to selected instruction.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptNov 15 2022, 9:55 AM
Petar.Avramovic requested review of this revision.Nov 15 2022, 9:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 15 2022, 9:55 AM
arsenm added inline comments.Nov 15 2022, 10:17 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3443–3447

This copy should just have been restricted in the first place. I also would have expected this to not have produced instructions unless the match happened; can you first try constraining the operands of this copy, and then splitting the predicate part from the instruction construction?

Split getting the operand modifiers and potential copy construction.

Pierre-vh added inline comments.Nov 16 2022, 4:33 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
4852

With this it looks like we will end up calling getVOP3ModsImpl twice. I guess this isn't called often enough (and the "get" function seems simple enough) to cause performance issues but it still feels suboptimal
Could this be refactored to make it so selectVOP3ModsImpl doesn't need to call "get" again?
If not then I suppose it's fine as is

Now that I checked in tests with madmix that has sgpr input, copy is not created but it should have been.
Patch is not correct at this state.
Is there a way to move

if (!MatchedSrc0 && !MatchedSrc1 && !MatchedSrc2)
  return false;

in selectG_FMA_FMAD before lookup for modifiers begin via selectVOP3PMadMixModsImpl?

What looks equivalent to me is to check if at least one of the operands is fpext (could be hidden behind other instructions that could be folded into mods). Unfortunately this still has to call getVOP3ModsImpl on each operand at worst case before actual mods selection begins

This should be equivalent to trunk llvm but without creating copy instruction when fma_mix does not get selected.

I was checking if test like this gets selected in the same way:

define amdgpu_vs float @test_f16_f32_add_fma_ext_mul(float inreg %x, float %y, float %z, half %u, half %v) {
.entry:
    %a = fmul half %u, %v
    %b = fpext half %a to float
    %abs_x = call contract float @llvm.fabs.f32(float %x)
    %c = call float @llvm.fmuladd.f32(float %abs_x, float %y, float %b)
    %d = fadd float %c, %z
    ret float %d
}

declare float @llvm.fmuladd.f32(float, float, float)
declare float @llvm.fabs.f32(float)

selects into

%0:sreg_32 = COPY $sgpr0
%1:vgpr_32 = COPY $vgpr0
%2:vgpr_32 = COPY $vgpr1
%5:vgpr_32 = COPY $vgpr2
%6:vgpr_32 = COPY $vgpr3
%8:vgpr_32 = nofpexcept V_MUL_F16_e64 0, %5, 0, %6, 0, 0, implicit $mode, implicit $exec
%14:vgpr_32 = COPY %0
%11:vgpr_32 = V_FMA_MIX_F32 2, %14, 0, %1, 8, %8, 0, 0, 0, implicit $mode, implicit $exec
%12:vgpr_32 = nofpexcept V_ADD_F32_e64 0, %11, 0, %2, 0, 0, implicit $mode, implicit $exec
$vgpr0 = COPY %12
SI_RETURN_TO_EPILOG implicit $vgpr0

Previous version of this patch would not make a copy and it would use sgpr directly. Visible only in mir. sifoldoperands folds this copy later anyway.

%11:vgpr_32 = V_FMA_MIX_F32 2, %0, 0, %1, 8, %8, 0, 0, 0, implicit $mode, implicit $exec

Previous version of this patch would not make a copy and it would use sgpr directly. Visible only in mir. sifoldoperands folds this copy later anyway.

%11:vgpr_32 = V_FMA_MIX_F32 2, %0, 0, %1, 8, %8, 0, 0, 0, implicit $mode, implicit $exec

I do think we should lean more on post-selection operand folding for dealing with SGPR copies. We should write a new and better version. The current version is "backwards". Traditionally after selection we would have excess SGPRs and try to remove them to legalize instructions. With RegBankSelect making every operand a VGPR, we need one written from the perspective of everything starts as VGPRs and needs to make an optimal choice for SGPR operands

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
4852

I agree, you should know everything after the first round of selectVOP3PMadMixModsImpl and should just pass through what to do

I did not understand last comment. What do I need to do exactly?
Now selectVOP3PMadMixModsImpl (unchanged) is called when we know that fma/mad_mix will be selected.
Afaik select modifiers functions don't fail and are called once pattern was matched.
The tricky thing here is that selectVOP3PMadMixModsImpl was used to check for actual pattern (it needs to find fpext of some operand).
Now we check for fpext before selecting modifiers for operands.

I did not understand last comment. What do I need to do exactly?
Now selectVOP3PMadMixModsImpl (unchanged) is called when we know that fma/mad_mix will be selected.

Remove the special case getVOP3ModsImpl and FPEXT checks. Delete the copy build inside of selectVOP3ModsImpl, and move that code down to where the new instruction is constructed. You just need a new build copy to VGPR helper

llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-mul.ll
173–174 ↗(On Diff #475815)

Should pre-contract this

Petar.Avramovic edited the summary of this revision. (Show Details)

Remove build copy from selectVOP3ModsImpl.
Copy is now built when adding operands to selected instruction.

arsenm added inline comments.Nov 17 2022, 8:27 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3435–3437

The name isn't accurate since this isn't just blindly inserting a copy. How about copyToVGPRIfSrcFolded

3438

Unrelated, but I don't understand why this modifier check is here. It doesn't factor into the constant bus restriction

3680

Could you put this down in the callback with the addReg (same with the rest)

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3438

I wanted to remove both checks, also build copy into a vgpr class but there were way too many changes in regression tests (mostly mir).
In those tests instructions have sgpr inputs (there is no copy).
Here is the summary of the effects of Mods != 0 and cloneVirtualRegister
sgpr input without mods, stays sgpr no copy
sgpr input with mods, creates copy but copy is still to sgpr! (cloneVirtualRegister is for Root, which was sgpr) not vgpr

In default use case (to my understanding at least) all checks could be removed
instructions that fold these mods have inputs with vgpr reg bank
Mods != 0 is most probably fast exit since operand should have already been vgpr
cloneVirtualRegister is just a convenience and avoids having to get reg class from register (needs llt, then size in bits and call to helper that will get the class)

Name change for helper function, move call inside lambda body. Copies are now created when src operand is being added to the new instruction (MIB) and inserted before it.

arsenm accepted this revision.Nov 18 2022, 8:44 AM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-mul.ll
173–174 ↗(On Diff #475815)

Should still use fmuladd or fma intrinsic here

This revision is now accepted and ready to land.Nov 18 2022, 8:44 AM

Update test that I missed, move to test file with other fma intrinsics.

arsenm accepted this revision.Nov 18 2022, 9:00 AM
This revision was landed with ongoing or failed builds.Nov 18 2022, 9:02 AM
This revision was automatically updated to reflect the committed changes.