Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I suppose an alternate solution is a new TLI hook that is less reliant on the legalisation tables. However, if we're the only backend with this issue I guess copying a couple of DAG combines is not so bad.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
15952–15953 | Although true, here it's really type legalisation you don't want to be affected by and this cannot ask the usual question (because we only say we support custom lower for the legal types). However, as is the combine could result in an unselectable DAG so I think you still need a DCI.isBeforeLegalizeOps() check somewhere. | |
17313–17314 | Same DCI.isBeforeLegalizeOps comment as above. |
I'm not really happy that we're using extload/truncstore with floating-point types here. It's an obscure, mostly unused feature, intended to support certain ancient floating-point instruction sets that would implicitly extend floating-point loads to the natural register size (think x87, m68k). I'm concerned this is throwing us into mostly untested codepaths; in particular, I don't think any other target uses these operations with vector floating-point types or 16-bit floating-point types.
It's not clear to me why the special case is being implemented this way, anyway; if we want to combine this sort of load+shuffle or shuffle+store pattern into two loads/stores, we should just do that. It doesn't need to be specific to fcvt.
I'm accepting this patch on the grounds it doesn't newly enable the use of fp extload/truncstore nodes, it is just extending their use for the cases not already handled by D114580.
With regards to using fp extload/truncstrore nodes: We experimented with several solutions and this ended up being the cleanest of the approaches. The other solutions either required fragile target specific DAG combines or the introduction of new common ISD nodes (e.g. FPEXTEND_INREG) that require significant plumbing that seemed overkill for our single use case. There is nothing in the code (documentation, asserts... etc) suggesting any kind of restriction and we added tests to verify they work as required. If something does arise we'll just fix it like any other bug. The SVE VLS support is somewhat specially anyway given almost all nodes are not considered legal and require custom lowering so in this regard there's no real difference between an extending int load and an extending fp load.
I guess I'm not sure what you're asking here. In the context of these floating point loads and stores there is no load/shuffle pattern, or at least not one that DAGCombine sees. Essentially before operation legalisation the DAG looks nice. The problem is introduced during operation legalisation and thus the first opportunity DAGCombine has to improve things is after everything has been converted to the predicated target specific nodes along with the (un)packs and subvector operations that marshal the data between fixed and scalable domains. Spotting these sequences and optimising them proved messy.
Although true, here it's really type legalisation you don't want to be affected by and this cannot ask the usual question (because we only say we support custom lower for the legal types). However, as is the combine could result in an unselectable DAG so I think you still need a DCI.isBeforeLegalizeOps() check somewhere.