This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD
ClosedPublic

Authored by bsmith on Feb 10 2021, 8:07 AM.

Details

Summary

Add tablegen patterns for a variety of FP multiply type instructions.

Adjust generateFMAsInMachineCombiner to return false if SVE is present
in order to combine fmul+fadd into fma. Also add new pseudo instructions
so as to select the most appropriate of FMLA/FMAD depending on register
allocation.

Depends on D96599

Diff Detail

Event Timeline

bsmith created this revision.Feb 10 2021, 8:07 AM
bsmith requested review of this revision.Feb 10 2021, 8:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2021, 8:07 AM

Hello. Would the machine combiner method not be better, if it can reduce critical path length?

Hello. Would the machine combiner method not be better, if it can reduce critical path length?

In principal it perhaps would yes, however I'm not sure by itself it is up to the task. The main issue here is the fact that there are different instructions available depending on whether you have a predicate or not, (notably the combined instructions (FMLA etc) have only predicated forms). As such, ISelLowering lowers nodes (e.g. FADD + FMUL) to predicated forms to allow instruction selection to either select these predicated combined instructions (FMLA) or to select in such a way that the predicate is removed again if it could not combine instructions.

If this instruction combining was deferred to the machine combiner then either it would have to learn how to add PTRUES in to allow use of predicated forms, or ISelLowering would keep lowering to predicated forms and in the case of no combining happening a later post machine combiner pass would have to remove PTRUES that were no longer needed and revert instructions to unpredicated forms. Essentially doing it this way allows for a lot of this complexity to be taken care of by lowering and instruction selection.

Perhaps in the future this is something that could be addressed, but for now I think this is a reasonable approach.

Sure, It would at least make sense to do things in machine combiner at a later date. We need to be sure we don't make anything else worse whilst we are doing it though!

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
11553–11554

This feels like a separate change. I don't see a reason to not fold to FMA when HasFullFP16 in general. (But that would require it's own tests).

llvm/lib/Target/AArch64/AArch64SelectionDAGInfo.cpp
56

This needs to be more precise. Enabling SVE should not regress scalar/Neon code.

It looks like it should be possible to add the VT to this definition.

llvm/lib/Target/AArch64/AArch64SelectionDAGInfo.h
30–31

Could we just make it so that AArch64Subtarget->TSInfo is passed the subtarget directly on construction?

llvm/test/CodeGen/AArch64/sve-fp-combine.ll
4

It's better, in the long run, to use fast math flags over the global fp-contract I believe.

dmgreen added inline comments.Feb 11 2021, 8:36 AM
llvm/lib/Target/AArch64/AArch64SelectionDAGInfo.h
30–31

Alternatively, I'm not sure why this function is here in AArch64SelectionDAGInfo and could possibly just be moved to AArch64TargetLowering?

bsmith updated this revision to Diff 323297.Feb 12 2021, 5:44 AM
bsmith added a reviewer: dmgreen.

Updated patch:

  • Add some more FNMLA patterns that are required when things get combined slightly differently
  • Only return true in generateFMAsInMachineCombiner when the types in question are scalable, and not always when SVE is present.
  • Move generateFMAsInMachineCombiner into AArch64TargetLowering
  • Replace use of -fp-contract=fast in tests with fast flag on instructions
  • Pull out FP16 combining into a separate patch

Thanks. It looks like this could do with more half tests.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
2156

Should there be nxv4f16 and nxv2f16 patterns?

It may also make sense to make multipatterns from these. They all just differ in a similar way to how TriOpFrag is used.

And it would probably be best to keep these close to the instructions they create, unless there is a reason have them separate.

paulwalker-arm added inline comments.Feb 15 2021, 3:20 AM
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
2155–2163

I might have misread something but these looks like a duplicate of the patterns you've updated just above, which does support the other f16 vector types. As Dave said, for completeness the other selection blocks will need patterns for nxv4f16 and nxv2f16.

llvm/test/CodeGen/AArch64/sve-fp-combine.ll
6–8

Generally for these types of tests we use update_llc_test_checks.py to generate the CHECK lines as it aids future updates.

195–200

It didn't exist when the tests were written but today I think/hope you can use the fneg instruction and thus simplify the tests a little.

paulwalker-arm added inline comments.Feb 15 2021, 3:31 AM
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
2222–2223

This doesn't effect this patch but just so you know where my head is at, my feeling is that we may end up moving this kind of logic into an AArch64 specific DAG combine. I say this because between the select, fma and fneg we have three predicates in play and so there's likely to be more cases we can perform the match against plus in the future we're quite likely to have to consider the implications of floating point exceptions which will make predicate analysis even more important.

bsmith updated this revision to Diff 323750.Feb 15 2021, 7:39 AM
  • Regenerate tests with update_llc_test_checks.py
  • Restructure patterns to use a multiclass
  • Remove duplicated patterns
  • Add missing patterns for more f16 types
bsmith updated this revision to Diff 323769.Feb 15 2021, 8:28 AM
bsmith edited the summary of this revision. (Show Details)
  • Reverse order of dependant patches, pulling half tests from D96599
  • Add missing half tests
paulwalker-arm accepted this revision.Feb 16 2021, 10:47 AM

Given the extensive testing within sve-fp-combine.ll I do wonder why sve-fma-dagcombine.ll is required, but otherwise the patch looks good.

This revision is now accepted and ready to land.Feb 16 2021, 10:47 AM

Given the extensive testing within sve-fp-combine.ll I do wonder why sve-fma-dagcombine.ll is required, but otherwise the patch looks good.

Test removed in commit, as yes, it is totally redundant.