Without this change only the preferred fusion opcode is tested
when attempting to combine FMA operations.
If both FMA and FMAD are available then FMA ops formed prior to
legalization will not be merged post legalization as FMAD becomes
the preferred fusion opcode.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
1,290 ms | x64 debian > libomp.api::omp_get_wtime.c |
Event Timeline
will not be merged post legalization as FMAC becomes
the preferred fusion opcode
I'm not sure wha you mean about FMAD "becoming the preferred fusion opcode". Is that just an AMDGPU quirk? Or is there something in generic code that prefers FMA before legalization and FMAD after??
llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll | ||
---|---|---|
5 | Could use GCN-COUNT-13: v_fma. But maybe it would be better to generate the checks for this file? It's not immediately obvious (to me) why 13 is the optimal number of fmas. There are more than 13 fmuls and fadds and fsubs in the IR. |
bool HasFMAD = (LegalOperations && TLI.isFMADLegal(DAG, N)); bool HasFMA = TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(), VT) && (!LegalOperations || TLI.isOperationLegalOrCustom(ISD::FMA, VT)); unsigned PreferredFusedOpcode = HasFMAD ? ISD::FMAD : ISD::FMA;
This is most probably an AMDGPU quirk.
FMAD is only available when LegalOperations is true, i.e. post legalize vector ops. Where as FMA is always legal if backend sets it to be Legal or Custom. I don't know the origin of this behaviour; however, the only backend that marks ISD::FMAD legal appears to be AMDGPU (SI and R600).
So preferred fusion opcode will only be FMAD on AMDGPU.
llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll | ||
---|---|---|
5 | I'll generate the checks for this file and precommit. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
13056 | Why do you need the HasFMA and HasFMAD checks here? I can see that you don't want to create new FMA instructions if FMA is not legal, but I can't see why you wouldn't want to combine existing FMAs. In other words can this just be return Opcode == ISD::FMA || Opcode == ISD::FMAD;? |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
13056 | Sure, this yields some test changes (instruction count reductions). |
Looks fine to me, with or without the tweak suggested inline.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
13107 | Could use FMA.getOpcode() instead of PreferredFusedOpcode here, to try to preserve the original opcode of the A*B+something part of the expression. But that's getting pretty subtle and I'm not sure if it will make any practical difference. The same goes for any other parts of combines which preserve an fma(d) instead of creating a new one from scratch. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
13107 | What I meant was, use FMA.getOpcode() for A*B+something because it's preserving an existing fma(d) from the input, but still use PreferredFusedOpcode for C*D+something because it is a new fma(d) that we are creating by fusing an fmul and and fadd from the input. I realise this is pretty subtle and I'm not sure it will make any practical difference. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
13108 | On second thoughts, perhaps there's no point trying to preserve the original opcode for the A*B+something part, because the something is different, so it's impossible to retain the (fused vs unfused) rounding behaviour of the original opcode. So perhaps your original approach of using DAG.getNode(PreferredFusedOpcode, ...) everywhere is the best we can do. Sorry for the noise. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
13108 | No worries, I have spent a while thinking about it too. |
clang-tidy: warning: invalid case style for variable 'isFusedOp' [readability-identifier-naming]
not useful