This is an archive of the discontinued LLVM Phabricator instance.

[Codegen] Generate fast fp64-to-fp16 conversions in unsafe mode.
ClosedPublic

Authored by kosarev on Jul 5 2023, 10:22 AM.

Diff Detail

Event Timeline

kosarev created this revision.Jul 5 2023, 10:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 5 2023, 10:22 AM
kosarev requested review of this revision.Jul 5 2023, 10:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 5 2023, 10:22 AM
arsenm added inline comments.Jul 5 2023, 10:57 AM
llvm/lib/Target/AMDGPU/VOP1Instructions.td
519–522 ↗(On Diff #537417)

Should just let the expansion split them up, this is missing the source modifiers

kosarev updated this revision to Diff 537649.Jul 6 2023, 3:15 AM

Added matching modifiers.

kosarev added inline comments.Jul 6 2023, 3:18 AM
llvm/lib/Target/AMDGPU/VOP1Instructions.td
519–522 ↗(On Diff #537417)

Updated to match modifiers. The expansion would combine back to a single fpround f64, so doesn't work.

foad added a comment.Jul 6 2023, 3:20 AM

I think this should be implemented by legalizing (f16 fptrunc (f64 x)) to (f16 fptrunc (f32 fptrunc (f64 x))) in the unsafe case. There should be no need for extra isel patterns.

foad added inline comments.Jul 6 2023, 3:21 AM
llvm/lib/Target/AMDGPU/VOP1Instructions.td
519–522 ↗(On Diff #537417)

"The expansion would combine back to a single fpround f64" - that sounds like the combine is broken then? It should respect legality.

kosarev added inline comments.Jul 6 2023, 8:15 AM
llvm/lib/Target/AMDGPU/VOP1Instructions.td
519–522 ↗(On Diff #537417)

It's trivial to explain it to GISel that f16 = fp_round f64 should be expanded while f16 = fp_round f32 is legal as it is. However, in SDAG, if I'm not wrong, we can only differentiate by the resulting type, and if we choose custom expansion for all to-f16 roundings, then SDAG will try to combine whatever they result in.

It's not an immediate problem for this patch because we always do FP_TO_FP16, because we don't have 16-bit registers here yet. For True16, however, if we want it be a chain of two FP_ROUNDs, the expansion will be combined back. But then if we need the pattern for the True16 case anyway, then I'm not sure doing special work in GlobalISel and for the non-16bit cases is worth it. Would appreciate your opinions.

foad added inline comments.Jul 6 2023, 8:33 AM
llvm/lib/Target/AMDGPU/VOP1Instructions.td
519–522 ↗(On Diff #537417)

SITargetLowering::lowerFP_ROUND is already doing custom legalization for to-f16 fprounds. It expands f64-to-f16 into two steps, and leaves the others alone. I would hope that post-legalization combiners would not create new to-f16 fprounds, because generally after legalization you should not create any new nodes unless they are "Legal" - and this does not include "Custom".

kosarev added inline comments.Jul 6 2023, 8:52 AM
llvm/lib/Target/AMDGPU/VOP1Instructions.td
519–522 ↗(On Diff #537417)

Well, the two FP_ROUNDs we might be expanding into here are legal. But I'm not sure already-legal expansions should mean no combining attempts. Otherwise, why would we want any combinings after legalization.

arsenm added inline comments.Jul 6 2023, 8:53 AM
llvm/lib/Target/AMDGPU/VOP1Instructions.td
519–522 ↗(On Diff #537417)

This is one of the consequences of the DAG's single-type legality checks. You need to consider both the source and dest types for legality

kosarev added inline comments.Jul 6 2023, 8:58 AM
llvm/lib/Target/AMDGPU/VOP1Instructions.td
519–522 ↗(On Diff #537417)

I probably misread Jay's comment. Not combining anything post-legalization if this results into a node for which getOperationAction() yields 'Custom' makes sense, yes,

foad added inline comments.Jul 6 2023, 9:24 AM
llvm/lib/Target/AMDGPU/VOP1Instructions.td
519–522 ↗(On Diff #537417)

"Not combining anything post-legalization if this results into a node for which getOperationAction() yields 'Custom' makes sense" - right, that is my understanding of the desired behaviour. But I can't promise that all existing combines obey it.

kosarev updated this revision to Diff 538061.Jul 7 2023, 3:38 AM

Forbid folding legal fp_rounds into non-legal ones.

kosarev added inline comments.Jul 7 2023, 3:40 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17037–17040

This doesn't seem to cause any test failures here upstream and eliminates the need for the True16 f16 = fp_round f64 pattern downstream.

Not sure if we want that in a separate patch or better keep here to provide some context. It looks problematic to give it a real test without having some True16 support.

llvm/lib/Target/AMDGPU/VOP1Instructions.td
519–522 ↗(On Diff #537417)

This bit in DAGCombiner.cpp reads like the expectation was that combining legal SDAGs is not required to produce legal nodes:

// If this combine is running after legalizing the DAG, re-legalize any
// nodes pulled off the worklist.
if (LegalDAG) {
  SmallSetVector<SDNode *, 16> UpdatedNodes;
  bool NIsValid = DAG.LegalizeOp(N, UpdatedNodes);

  for (SDNode *LN : UpdatedNodes)
    AddToWorklistWithUsers(LN);

  ...

But then we also have this snippet that clearly addresses the same kind of problem, so not a precedent.

// Legalizing in AArch64TargetLowering::LowerCONCAT_VECTORS() and combining
// here could cause an infinite loop. That legalizing happens when LegalDAG
// is true and input of AArch64TargetLowering::LowerCONCAT_VECTORS() is
// scalable.
if (In.getOpcode() == ISD::CONCAT_VECTORS && In.hasOneUse() &&
    !(LegalDAG && In.getValueType().isScalableVector())) {
  ...
arsenm added inline comments.Jul 7 2023, 7:20 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17037–17040

I think you're supposed to be checking LegalOperations, not LegalDAG. Use the hasOperation helper?

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2535–2540 ↗(On Diff #538061)

Can move this into the generic code. Also it's unfortunate that we don't have fmf on the cast instructions

kosarev added inline comments.Jul 7 2023, 9:19 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17037–17040

For some reason that I don't quite understand LegalOperations gets raised as soon as vector operations are legalised:

LegalOperations = Level >= AfterLegalizeVectorOps;

So relying on that flag would mean we forbid the fold during the AfterLegalizeVectorOps combine.

kosarev updated this revision to Diff 538342.Jul 8 2023, 2:57 AM

Move the expansion logic to the legalisation helper.

foad added inline comments.Jul 11 2023, 1:44 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17037–17040

"So relying on that flag would mean we forbid the fold during the AfterLegalizeVectorOps combine." - why is that a problem?

kosarev added inline comments.Jul 11 2023, 2:43 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17037–17040

I'm thinking of cases where legalisation of vector operations may result in a chain of fp_rounds, which won't be combined before legalisation of non-vector operations because we forbid the fold too early.

foad added inline comments.Jul 11 2023, 2:52 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17037–17040

LegalizeVectorOps mostly just scalarizes things, so it is unlikely to introduce fp_rounds unless there were already (vector typed) fp_rounds in the input, which could have been combined earlier.

So I don't think this will be a problem. I would go with LegalOperations unless/until you have a real example of a problem that we can look at.

kosarev updated this revision to Diff 539145.Jul 11 2023, 9:12 AM

Updated to use hasOperation().

foad added a comment.Jul 11 2023, 9:30 AM

The code looks good. Can you update the description, since this is no longer just for GlobalISel and no longer AMDGPU-specific. Does it affect codegen tests for any other targets?

kosarev updated this revision to Diff 539154.Jul 11 2023, 9:33 AM

Update description.

kosarev retitled this revision from [AMDGPU][GlobalISel] Generate fast fp64-to-fp16 conversions in unsafe mode. to [AMDGPU] Generate fast fp64-to-fp16 conversions in unsafe mode..Jul 11 2023, 9:33 AM
kosarev updated this revision to Diff 539155.Jul 11 2023, 9:34 AM

Update description.

kosarev retitled this revision from [AMDGPU] Generate fast fp64-to-fp16 conversions in unsafe mode. to [Codegen] Generate fast fp64-to-fp16 conversions in unsafe mode..Jul 11 2023, 9:34 AM

this is no longer just for GlobalISel

Let's mention it here again that the SDAG change is not directly necessary for what the title declares, and was only kept as part of this patch to give it some context and explain the reasoning.

Does it affect codegen tests for any other targets?

No, check-all tests pass fine with all targets enabled.

arsenm accepted this revision.Jul 11 2023, 10:41 AM

I had forgotten about these sorts of uses of unsafe-fp-math, I thought we were getting closer to eliminating it

This revision is now accepted and ready to land.Jul 11 2023, 10:41 AM
chapuni added inline comments.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
6264–6265

S16 and S64 are used only here in -Asserts.

kosarev added inline comments.Jul 12 2023, 6:46 AM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
6264–6265

Fixed in rGe705b2b1f4a7, thanks.