This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Duplicate FP_EXTEND/FP_TRUNC -> LOAD/STORE dag combines
ClosedPublic

Authored by bsmith on Nov 26 2021, 4:07 AM.

Details

Summary

By duplicating these dag combines we can bypass the legality checks that
they do, this allows us to perform these combines on larger than legal
fixed types, which in turn allows us to bring the same benefits D114580
brought but to larger than legal fixed types.

Depends on D114580

Diff Detail

Event Timeline

bsmith created this revision.Nov 26 2021, 4:07 AM
bsmith requested review of this revision.Nov 26 2021, 4:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 26 2021, 4:07 AM
Matt added a subscriber: Matt.Nov 26 2021, 5:52 AM

I suppose an alternate solution is a new TLI hook that is less reliant on the legalisation tables. However, if we're the only backend with this issue I guess copying a couple of DAG combines is not so bad.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15979–15980

Although true, here it's really type legalisation you don't want to be affected by and this cannot ask the usual question (because we only say we support custom lower for the legal types). However, as is the combine could result in an unselectable DAG so I think you still need a DCI.isBeforeLegalizeOps() check somewhere.

17341–17342

Same DCI.isBeforeLegalizeOps comment as above.

I'm not really happy that we're using extload/truncstore with floating-point types here. It's an obscure, mostly unused feature, intended to support certain ancient floating-point instruction sets that would implicitly extend floating-point loads to the natural register size (think x87, m68k). I'm concerned this is throwing us into mostly untested codepaths; in particular, I don't think any other target uses these operations with vector floating-point types or 16-bit floating-point types.

It's not clear to me why the special case is being implemented this way, anyway; if we want to combine this sort of load+shuffle or shuffle+store pattern into two loads/stores, we should just do that. It doesn't need to be specific to fcvt.

bsmith updated this revision to Diff 390660.Nov 30 2021, 3:52 AM
  • Only perform duplicated dag combines before legalization
paulwalker-arm accepted this revision.EditedNov 30 2021, 8:52 AM

I'm accepting this patch on the grounds it doesn't newly enable the use of fp extload/truncstore nodes, it is just extending their use for the cases not already handled by D114580.

With regards to using fp extload/truncstrore nodes: We experimented with several solutions and this ended up being the cleanest of the approaches. The other solutions either required fragile target specific DAG combines or the introduction of new common ISD nodes (e.g. FPEXTEND_INREG) that require significant plumbing that seemed overkill for our single use case. There is nothing in the code (documentation, asserts... etc) suggesting any kind of restriction and we added tests to verify they work as required. If something does arise we'll just fix it like any other bug. The SVE VLS support is somewhat specially anyway given almost all nodes are not considered legal and require custom lowering so in this regard there's no real difference between an extending int load and an extending fp load.

This revision is now accepted and ready to land.Nov 30 2021, 8:52 AM

Okay.

Any thoughts on adding load+shuffle/shuffle+store combines?

I guess I'm not sure what you're asking here. In the context of these floating point loads and stores there is no load/shuffle pattern, or at least not one that DAGCombine sees. Essentially before operation legalisation the DAG looks nice. The problem is introduced during operation legalisation and thus the first opportunity DAGCombine has to improve things is after everything has been converted to the predicated target specific nodes along with the (un)packs and subvector operations that marshal the data between fixed and scalable domains. Spotting these sequences and optimising them proved messy.