This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix infinite loop with fma combines
ClosedPublic

Authored by kerbowa on Feb 3 2020, 5:46 PM.

Details

Summary

RFC since I'm not very familiar with the DAG
combiner.

https://reviews.llvm.org/D72312 introduced an infinite loop which involves
DAGCombiner::visitFMA and AMDGPUTargetLowering::performFNegCombine.

fma( a, fneg(b), fneg(c) ) => fneg( fma (a, b, c) ) => fma( a, fneg(b), fneg(c) ) ...

This only breaks with types where 'isFNegFree' returns flase, e.g. v4f32.
Reproducing the issue also needs the attribute 'no-signed-zeros-fp-math',
and no source mods allowed on one of the users of the Op.

This fix makes changes to indicate that it is not free to negate a fma if it
has users with source mods.

Diff Detail

Event Timeline

kerbowa created this revision.Feb 3 2020, 5:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 3 2020, 5:46 PM
arsenm added inline comments.Feb 3 2020, 6:23 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
737–740

My initial thought is it should not be necessary to override this

742

This is incomplete and FMAD will have the same problem at minimum

kerbowa added inline comments.Feb 3 2020, 6:48 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
737–740

I had the same initial thought.

The issue is that performFNegCombine is doing the right thing. And the generic fma combine show below relies on 'isNegatibleForFree'. In this case it's not actually profitable.

if (!TLI.isFNegFree(VT) &&
+ TLI.isNegatibleForFree(SDValue(N, 0), DAG, LegalOperations,
+ ForCodeSize) == 2)

742

The hang at least doesn't effect FMAD. The combine that triggers this wont happen.

This revision is now accepted and ready to land.Feb 4 2020, 10:40 AM
arsenm added inline comments.Feb 4 2020, 10:42 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
742

Where is the combine? It's also suspicious if an fneg combine triggered for FMA but not FMAD.

llvm/test/CodeGen/AMDGPU/fma-combine.ll
699

should also have an fmad case

kerbowa added inline comments.Feb 4 2020, 11:03 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
742

It's a loop so there are combines in two directions.

The FNeg combine -fma(a, b, c) => fma(a, -b, -c), triggers for FMA/FMAD and is correct.

AFAICS the unprofitable combine, fma(a, -b, -c) => -fma(a, b, c), only triggers for FMA.
https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L12652

arsenm added inline comments.Feb 4 2020, 11:21 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
742

This should be just as valid for fmad, but we probably don't want it. This should handle fmad as well.

I am somewhat concerned about splitting much of the same logic between isNegatibleForFree and the target specific fneg combines. Is there now more overlap in what happens in the AMDGPU performFNegCombine and DAGCombiner than there used to be?

kerbowa updated this revision to Diff 242390.Feb 4 2020, 11:41 AM

Handle fmad.

kerbowa added inline comments.Feb 4 2020, 11:56 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
742

It might just be this one combine that was added. I agree with you though that there is danger that the logic becomes unnecessarily split, but the only other short term option is reverting https://reviews.llvm.org/D72312.

I know not many targets are overriding isNegatibleForFree. In this case, the generic rule is that if fnegs are not free, then two is worse then one. I.e. -fma(a, b, c) is better than fma(a, -b, -c). It does seem like we need target specific logic here.

arsenm accepted this revision.Feb 4 2020, 12:43 PM

LGTM to unblock this, but we do need to consolidate this logic. Either all of the AMDGPU fneg combines should be moved to generic code, or at least the AMDGPU code should start using isNegatibleForFree

This revision was automatically updated to reflect the committed changes.