Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/VOP1Instructions.td | ||
---|---|---|
519–522 | Should just let the expansion split them up, this is missing the source modifiers |
llvm/lib/Target/AMDGPU/VOP1Instructions.td | ||
---|---|---|
519–522 | Updated to match modifiers. The expansion would combine back to a single fpround f64, so doesn't work. |
I think this should be implemented by legalizing (f16 fptrunc (f64 x)) to (f16 fptrunc (f32 fptrunc (f64 x))) in the unsafe case. There should be no need for extra isel patterns.
llvm/lib/Target/AMDGPU/VOP1Instructions.td | ||
---|---|---|
519–522 | "The expansion would combine back to a single fpround f64" - that sounds like the combine is broken then? It should respect legality. |
llvm/lib/Target/AMDGPU/VOP1Instructions.td | ||
---|---|---|
519–522 | It's trivial to explain it to GISel that f16 = fp_round f64 should be expanded while f16 = fp_round f32 is legal as it is. However, in SDAG, if I'm not wrong, we can only differentiate by the resulting type, and if we choose custom expansion for all to-f16 roundings, then SDAG will try to combine whatever they result in. It's not an immediate problem for this patch because we always do FP_TO_FP16, because we don't have 16-bit registers here yet. For True16, however, if we want it be a chain of two FP_ROUNDs, the expansion will be combined back. But then if we need the pattern for the True16 case anyway, then I'm not sure doing special work in GlobalISel and for the non-16bit cases is worth it. Would appreciate your opinions. |
llvm/lib/Target/AMDGPU/VOP1Instructions.td | ||
---|---|---|
519–522 | SITargetLowering::lowerFP_ROUND is already doing custom legalization for to-f16 fprounds. It expands f64-to-f16 into two steps, and leaves the others alone. I would hope that post-legalization combiners would not create new to-f16 fprounds, because generally after legalization you should not create any new nodes unless they are "Legal" - and this does not include "Custom". |
llvm/lib/Target/AMDGPU/VOP1Instructions.td | ||
---|---|---|
519–522 | Well, the two FP_ROUNDs we might be expanding into here are legal. But I'm not sure already-legal expansions should mean no combining attempts. Otherwise, why would we want any combinings after legalization. |
llvm/lib/Target/AMDGPU/VOP1Instructions.td | ||
---|---|---|
519–522 | This is one of the consequences of the DAG's single-type legality checks. You need to consider both the source and dest types for legality |
llvm/lib/Target/AMDGPU/VOP1Instructions.td | ||
---|---|---|
519–522 | I probably misread Jay's comment. Not combining anything post-legalization if this results into a node for which getOperationAction() yields 'Custom' makes sense, yes, |
llvm/lib/Target/AMDGPU/VOP1Instructions.td | ||
---|---|---|
519–522 | "Not combining anything post-legalization if this results into a node for which getOperationAction() yields 'Custom' makes sense" - right, that is my understanding of the desired behaviour. But I can't promise that all existing combines obey it. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
17037–17040 ↗ | (On Diff #538061) | This doesn't seem to cause any test failures here upstream and eliminates the need for the True16 f16 = fp_round f64 pattern downstream. Not sure if we want that in a separate patch or better keep here to provide some context. It looks problematic to give it a real test without having some True16 support. |
llvm/lib/Target/AMDGPU/VOP1Instructions.td | ||
519–522 | This bit in DAGCombiner.cpp reads like the expectation was that combining legal SDAGs is not required to produce legal nodes: // If this combine is running after legalizing the DAG, re-legalize any // nodes pulled off the worklist. if (LegalDAG) { SmallSetVector<SDNode *, 16> UpdatedNodes; bool NIsValid = DAG.LegalizeOp(N, UpdatedNodes); for (SDNode *LN : UpdatedNodes) AddToWorklistWithUsers(LN); ... But then we also have this snippet that clearly addresses the same kind of problem, so not a precedent. // Legalizing in AArch64TargetLowering::LowerCONCAT_VECTORS() and combining // here could cause an infinite loop. That legalizing happens when LegalDAG // is true and input of AArch64TargetLowering::LowerCONCAT_VECTORS() is // scalable. if (In.getOpcode() == ISD::CONCAT_VECTORS && In.hasOneUse() && !(LegalDAG && In.getValueType().isScalableVector())) { ... |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
17037–17040 ↗ | (On Diff #538061) | I think you're supposed to be checking LegalOperations, not LegalDAG. Use the hasOperation helper? |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
2537–2542 | Can move this into the generic code. Also it's unfortunate that we don't have fmf on the cast instructions |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
17037–17040 ↗ | (On Diff #538061) | For some reason that I don't quite understand LegalOperations gets raised as soon as vector operations are legalised: LegalOperations = Level >= AfterLegalizeVectorOps; So relying on that flag would mean we forbid the fold during the AfterLegalizeVectorOps combine. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
17037–17040 ↗ | (On Diff #538061) | "So relying on that flag would mean we forbid the fold during the AfterLegalizeVectorOps combine." - why is that a problem? |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
17037–17040 ↗ | (On Diff #538061) | I'm thinking of cases where legalisation of vector operations may result in a chain of fp_rounds, which won't be combined before legalisation of non-vector operations because we forbid the fold too early. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
17037–17040 ↗ | (On Diff #538061) | LegalizeVectorOps mostly just scalarizes things, so it is unlikely to introduce fp_rounds unless there were already (vector typed) fp_rounds in the input, which could have been combined earlier. So I don't think this will be a problem. I would go with LegalOperations unless/until you have a real example of a problem that we can look at. |
The code looks good. Can you update the description, since this is no longer just for GlobalISel and no longer AMDGPU-specific. Does it affect codegen tests for any other targets?
this is no longer just for GlobalISel
Let's mention it here again that the SDAG change is not directly necessary for what the title declares, and was only kept as part of this patch to give it some context and explain the reasoning.
Does it affect codegen tests for any other targets?
No, check-all tests pass fine with all targets enabled.
I had forgotten about these sorts of uses of unsafe-fp-math, I thought we were getting closer to eliminating it
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
6264–6265 ↗ | (On Diff #539485) | S16 and S64 are used only here in -Asserts. |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
6264–6265 ↗ | (On Diff #539485) | Fixed in rGe705b2b1f4a7, thanks. |
Can move this into the generic code. Also it's unfortunate that we don't have fmf on the cast instructions