This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp class support and introduce GlobalISel implementation for AMDGPU
ClosedPublic

Authored by JanekvO on Oct 7 2022, 7:24 AM.

Details

Reviewers
arsenm
Summary

Uses existing SelectionDAG lowering of the llvm.amdgcn.class intrinsic for llvm.is.fpclass

Diff Detail

Event Timeline

JanekvO created this revision.Oct 7 2022, 7:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2022, 7:24 AM
JanekvO requested review of this revision.Oct 7 2022, 7:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2022, 7:24 AM
arsenm added a comment.Oct 7 2022, 9:15 AM

Can do it as a follow up commit, but the existing combines we have for AMDGPU::FP_CLASS should be ported to use the generic intrinsic. Also, llvm.amdgcn.class should get bitcode upgraded to the generic

llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll
183

Should add some vector cases too

arsenm added inline comments.Oct 7 2022, 9:15 AM
llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll
4

Should also test/handle globalisel

arsenm added a comment.Oct 7 2022, 9:44 AM

Can do it as a follow up commit, but the existing combines we have for AMDGPU::FP_CLASS should be ported to use the generic intrinsic. Also, llvm.amdgcn.class should get bitcode upgraded to the generic

I just realized the amdgpu intrinsic allows non-immediate arguments, but is_fpclass does not so these are not equivalent

JanekvO planned changes to this revision.Oct 10 2022, 7:14 AM
JanekvO updated this revision to Diff 472930.Nov 3 2022, 7:25 AM
  • SelectionDAG fpclass vector support
  • GlobalISel llvm.is.fpclass support for AMDGPU
  • rebase
JanekvO retitled this revision from [AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp class support for AMDGPU to [AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp class support and introduce GlobalISel implementation for AMDGPU.Nov 3 2022, 10:20 AM
arsenm added inline comments.Nov 3 2022, 11:49 AM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2320

This should get an IRTranslator test to make sure the flags are passed through

2324–2325

getUniqueInteger is unnecessarily fancy, can just cast to ConstantInt directly

2332

Do you really need the float type operand? I know bfloat16 isn't going to work without it, but I thought the plan was to introduce FP types to LLT

llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td
133

Should avoid defining an AMDGPU node for this and move this to generic code

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
924

I don't see why you need to manually select this (maybe sharing the pattern between the existing intrinsic is annoying because the new intrinsic uses immarg?)

934–937

Should be no reason to check this here

948–956

You can just unconditionally materialize the constant into a register and let SIFoldOperands sort out the constant bus restriction

959–960

You shouldn't need to special case the result constraint

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
3945

Pretty sure this default constructs to null

llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll
3

Should use some share prefixes, a lot of these functions are the same. Also needs a gfx7 and 8 run lines for the half promotion

1923

v3f16 and v4f16 are also potentially interesting

JanekvO updated this revision to Diff 474002.Nov 8 2022, 7:31 AM
JanekvO marked 9 inline comments as done.
  • SelectionDAG fpclass vector support
  • GlobalISel llvm.is.fpclass support for AMDGPU
  • Address comments, add custom half promotion for gfx7
JanekvO added inline comments.Nov 8 2022, 7:48 AM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2320

Not sure if I completely hit the mark with my added test, but to me it seemed that not all flags were possible (e.g., nnan flag didn't work as it required a fp return type). For now I've added flag related tests that explicitly test the addition of nofpexcept. Do let me know if there's something missing or whether this copyFlagsFromInstruction is better omitted.

2332

I believe it's not necessary for amdgpu but required for the G_IS_FPCLASS target opcode. Leaving it out results in verifier errors (I also am unaware about introducing FP types and LLT).

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
924

I did look on whether I could re-use some of the existing tablegen but I couldn't get it quite into the right shape for it to match. llvm.is.fpclass requires the mask to be an immarg as you mentioned so materializing the immediate into a register anywhere before this function results in a verifier error.

llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll
3

I'm not that well versed in how gfx7 should do half promotion. I feel like either gfx7 selectiondag or gfx7 globalisel half promotion tests are incorrect (and if not, selectiondag version does seem suboptimal).

arsenm added inline comments.Nov 8 2022, 8:05 AM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2450

This will do the wrong thing for snans and also denormals inputs are flushed

arsenm added inline comments.Nov 10 2022, 7:17 PM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2450

I also don't see the corresponding DAG legalization. It's such a special case I think this should be split into a separate patch anyway.

arsenm added inline comments.Nov 10 2022, 7:19 PM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2320

I'd consider that a pre-existing bug in intrinsics. The IR is annoyingly strict about what things are allowed to have flags

2332

What do you mean verifier errors?

foad added inline comments.Nov 10 2022, 11:27 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6492–6493

Why did this change?

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
318–320

It seems annoying to have such a long list of types here - it'll need updating whenever we introduce a new one. Can you use something like FloatVectorTypes instead?

JanekvO marked 2 inline comments as done.Nov 11 2022, 6:15 AM
JanekvO added inline comments.
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2332

Sorry, I meant that the MachineVerifier will fail for the G_IS_FPCLASS instruction.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2450

I also don't see the corresponding DAG legalization.

I put the corresponding SelectionDAG type widening code for IS_FPCLASS is in target custom function LowerIS_FPCLASS as I couldn't bypass expansion in SelectionDAGBuilder.cpp when marking the action for the instruction with f16 as promote (i.e., it would call IS_FPCLASS expansion code even when trying to promote).

It's such a special case I think this should be split into a separate patch anyway.

As in, the widening code, or IS_FPCLASS support for amdgpu gfx7?

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6492–6493

isOperationLegalOrCustom will return false if the type is considered illegal regardless of whether the instruction's type is marked legal or custom whereas isOperationCustom won't explicitly check for type legality and returns whether the action was set to custom. I basically just wanted it to go through to target custom code.

(May revert this in favor of using the expand code for f16 in case there is no f16 fp class instruction for amdgpu)

For this patch I'd like to drop all the attempts to handle legalizing the f16 case and move that to a separate patch. It's a much more complicated edge case that doesn't have much in common with the base handling

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2332

Right, it's there in the operand list. I mean more abstractly, why is it there?

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2450

OK, there are several issues here. None of this should be done in target code.

I also don't approve of doing this expansion in the DAG builder, but see that's a pre-existing issue. GlobalISel does need to do the same expansion.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6492–6493

For the no f16 case, I think we need to do software expansion to get correct results for denormal values

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
318–320

This should be unnecessary, we have no vector class instructions. These should just expand into scalars

2778

This doesn't work correctly for denormals. The f16 denormal value won't be denormal after casting to f32 (if it wasn't flushed to zero under DAZ or FTZ modes)

JanekvO added inline comments.Nov 11 2022, 11:06 AM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2332

I can see the semantics being used in the target independent expansion of IS_FPCLASS in SelectionDAG (e.g., for retrieving inf of a particular fp semantic). I'm inferring that the rationale could be: GlobalISel will require a similar implementation and therefore requires the semantics. I haven't looked into whether any alternatives exist that don't require passing of the semantics through the operand, though.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2450

I was looking at implementing the SelectionDAG target independent expansion for GlobalISel lower(). I'll first remove f16 legalizing for cases where there is no f16 instructions available for amdgpu for this diff and move the GlobalISel's expansion/lower to another diff.

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
318–320

If not set as custom (or legal), these'll get expanded through the target independent expansion. Bypassing said target independent expansion does result into the desired scalarizing.

arsenm added inline comments.Nov 11 2022, 11:14 AM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2332

Currently the LLT directly implies the semantics for every operation

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
318–320

This is one of the problems with doing this kind of expansion in SelectionDAGBuilder. This should go through the usual legalization paths

JanekvO updated this revision to Diff 475117.Nov 14 2022, 5:30 AM
  • Remove IS_FPCLASS amdgpu f16 legalization, split tests into f16 and not f16 cases, temporarily disable gfx7 glisel tests
  • Rebase
JanekvO added inline comments.
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2332

@sepavloff Do you happen to recall the rationale of the fp semantic operand for G_IS_FPCLASS? My knowledge about it are a bit shallow but perhaps it can be removed

sepavloff added inline comments.Nov 14 2022, 7:45 AM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2332

It is used to workaround limitations of GlobalISel, - lack of floating-point types. Without this operand it is impossible to distinguish between half and bfloat16 and also between different flavors of 8-bit floats.

If LLT supported floating-point types, this operand could be removed.

arsenm added inline comments.Nov 14 2022, 9:11 AM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2332

But this is a problem for every single operation, not just this one. We don't have a decided upon strategy for dealing with this, so it doesn't make sense to me to try to deal with it here

sepavloff added inline comments.Nov 14 2022, 10:43 PM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2332

Sounds reasonable. Let's remove it, in separate commit.

2332
JanekvO updated this revision to Diff 475421.Nov 15 2022, 4:22 AM
  • Remove fpsem operand construction in irtranslator for G_IS_FPCLASS
arsenm added inline comments.Nov 16 2022, 2:54 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
314

Can you add a fixme that we just want scalarization?

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
924

You might need to split it into a different pattern instantiation, but you would just need the S_MOV_B32 from the mask to the constant (although I actually would expect it to work if you directly folded the constant anyway, since the operand should have been copied to VGPR anyway). Something like:

class ClassPat<Instruction inst, ValueType vt> : GCNPat <
  (fp_class (VOP3Mods vt:$src0, i32:$src0_mods), (i32 timm:mask))
  (inst $src0_mods, VSrc_b32:$src0, $src0_mods, (S_MOV_B32 $mask))
>;
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
984

I think this clampScalar isn't doing anything and can be dropped

llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll
12

Can you also add some cases where the input will be an SGPR?

107

s/float/f32 in these function names

arsenm requested changes to this revision.Nov 16 2022, 2:56 PM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-fpclass-flags.ll
2

-global-isel to front, also generate these checks

17

Needs additional checks with other flags besides the one just set

This revision now requires changes to proceed.Nov 16 2022, 2:56 PM
JanekvO added inline comments.Nov 18 2022, 6:57 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-fpclass-flags.ll
17

I've been wondering whether the flag copy from the IR intrinsic to G_IS_FPCLASS in IRTranslator should be removed altogether. I'd have to weaken the flags' constraints as they all require scalar or vector fp return types. Additionally, Any use of fast math flags outside of existing uses will most likely require amending langref. E.g., current descriptions of some fast math flags describe how input can result into a poison value but this wouldn't be possible for G_IS_FPCLASS as it's a bool return.

Let me know what you think, I can see some of the flags being useful by folding into constant bool values (e.g., not a nan flag + G_IS_FPCLASS test for nans) but I may be a bit naïve on useful cases beyond said folding.

arsenm added inline comments.Nov 18 2022, 3:53 PM
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
2320

You can remove the flag copy if you want, although the flags may be introduced in the future

2333–2334

I just realized there's no point in doing this. G_IS_FPCLASS is not marked as mayRaiseFPException, so the flag is implied

llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-fpclass-flags.ll
17

OK, might as well drop this test if we have end to end tests and there's nothing unique to test in the IRTranslator

JanekvO updated this revision to Diff 477969.Nov 25 2022, 8:05 AM
JanekvO marked 5 inline comments as done.
  • Remove patfrag dependency for is_fpclass
  • Add dedicated patterns
  • Remove globalisel manual selection and depend on selectiondag tablegen
  • Address comments
JanekvO marked 4 inline comments as done.Nov 25 2022, 8:07 AM
arsenm accepted this revision.Nov 28 2022, 9:34 AM

LGTM with nits. you have some dead checks and dead code

llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td
132–133

Whole file is now whitespace only changes which can be dropped

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
913–922

Dead code

llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.f16.ll
94–95 ↗(On Diff #477969)

This is broken for signaling nans. You dropped this from the patch but left these dead checks around

This revision is now accepted and ready to land.Nov 28 2022, 9:34 AM
JanekvO updated this revision to Diff 478322.Nov 28 2022, 12:17 PM
JanekvO marked 2 inline comments as done.
  • Remove dead code and tests
arsenm accepted this revision.Nov 28 2022, 12:17 PM
JanekvO marked an inline comment as done.Nov 28 2022, 12:38 PM

Sorry, haven't gotten github access yet: could you (or somebody in AMDGPU group) land this for me? 😅