This patch adds the llvm.aarch64.sve.ldnf1 intrinsic, plus
DAG combine rules for non-faulting loads and sign/zero extends.
llvm/lib/Target/AArch64/SVEInstrFormats.td | ||
---|---|---|
5333 | Is this depending on hasSideEffects to preserve the correct ordering with instructions that read/write FFR? That probably works. I guess the alternative is to insert an IMPLICIT_DEF of FFR in the entry block of each function. What are the calling convention rules for FFR? Is it callee-save? If not, we might need to do some work to make FFR reads/writes do something sane across calls inserted by the compiler. |
llvm/lib/Target/AArch64/SVEInstrFormats.td | ||
---|---|---|
5333 | The FFR is not callee-saved. We will need to add support to save & restore it where appropriate at the point the compiler starts generating reads to the FFR, but for the purpose of the ACLE the user will be required to do this if necessary. |
llvm/lib/Target/AArch64/SVEInstrFormats.td | ||
---|---|---|
5333 | How can the user write correct code to save/restore the FFR? The compiler can move arbitrary readnone/argmemonly calls between the definition and the use. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
9998 | Could you replace GLD1* with Load? I believe that that will be still correct with the added bonus of covering the new case :) | |
11051 | You could use getSVEContainerType here instead. You'll need to extend it a wee bit. | |
12284 | The following switch statement will now cover more than just *Gather* nodes. Maybe SVE load nodes instead? | |
12328–12331 | Why not: SmallVector<SDValue, 4> Ops = {Src->getOperand(0), Src->getOperand(1), Src->getOperand(2), Src->getOperand(3), Src->getOperand(4)}; ? | |
12332 | Could you add a comment explaining what the underlying difference between LDNF1S and GLD1S is? Otherwise it's not clear why this if statement is needed. IIUC, GLD1S has an extra argument for the offsets (hence 5 args vs 4). |
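
The operand-count difference raised in the last comment above can be sketched as a small standalone model. This is not LLVM code: `LoadKind`, `numOperands`, and `memVTOperandIndex` are hypothetical names introduced for illustration, and treating the memory VT as the last operand is an assumption. Only the 5-versus-4 operand counts come from the comment itself.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the two node kinds discussed in the review.
enum class LoadKind {
  GatherLoad,      // models a GLD1S-style node
  NonFaultingLoad  // models an LDNF1S-style node
};

// Per the review comment: the gather load carries an extra offsets
// operand, so it has 5 operands where the non-faulting load has 4.
constexpr std::size_t numOperands(LoadKind K) {
  return K == LoadKind::GatherLoad ? 5 : 4;
}

// ASSUMPTION (not stated in the review): the memory VT is the last
// operand, so its index follows directly from the operand count.
constexpr std::size_t memVTOperandIndex(LoadKind K) {
  return numOperands(K) - 1;
}
```

Under that assumption, any code that reads the memory VT has to pick the index per node kind, which is why a comment explaining the `if` would help readers.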
llvm/lib/Target/AArch64/SVEInstrFormats.td | ||
---|---|---|
5333 | There are separate intrinsics for loading/writing the FFR (svrdffr, svsetffr, svwrffr), which use a svbool_t to keep the value of the FFR. These intrinsics are implemented in the same way with a Pseudo with hasSideEffects = 1 set. I thought this flag would prevent other calls from being scheduled/moved over these intrinsics, as they have unknown/unmodelled side-effects and would thus act kind of like a barrier? |
llvm/lib/Target/AArch64/SVEInstrFormats.td | ||
---|---|---|
5333 | The issue would be transforms at the IR/SelectionDAG level. We can probably model calls at the MIR level correctly, like you're describing. |
- Rebased patch
- Updated comments and extended getSVEContainerType to handle nxv8i16 & nxv16i8
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
12318–12321 | Move the assignment of MemVTOpNum to the switch statement above instead of special-casing it here? | |
12319 | nit: s/LD1SrcMemVT/SrcMemVT/ | |
12327 | Better make the default '5' if there is a large likelihood of there being 5 default values. | |
12327 | Instead of special-casing LDNF1S below, you can write this as: SmallVector<SDValue, 5> Ops; for (unsigned I = 0; I < Src->getNumOperands(); ++I) Ops.push_back(Src->getOperand(I)); |
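
The loop suggested in the last comment above copies however many operands the source node has, so the 4-operand and 5-operand cases need no separate branch. A minimal sketch of that idea, using hypothetical stand-ins (`Value`, `Node`, `collectOperands`) in place of the LLVM `SDValue` and `SDNode` classes:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; these are NOT the LLVM SDValue/SDNode classes.
using Value = int;

struct Node {
  std::vector<Value> Operands;
  std::size_t getNumOperands() const { return Operands.size(); }
  Value getOperand(std::size_t I) const { return Operands[I]; }
};

// Copy every operand of Src, whatever the count, instead of listing
// each index by hand and special-casing nodes with fewer operands.
std::vector<Value> collectOperands(const Node &Src) {
  std::vector<Value> Ops;
  Ops.reserve(Src.getNumOperands());
  for (std::size_t I = 0; I < Src.getNumOperands(); ++I)
    Ops.push_back(Src.getOperand(I));
  return Ops;
}
```

The same function handles a 5-operand node and a 4-operand node, which is the property the reviewer is asking for.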
- Some minor changes to performSignExtendInRegCombine to address comments from @sdesmalen
LGTM [with the caveat that we need to revisit the modelling of the FFR register and get rid of the PseudoInstExpansion at a later point, as discussed during the previous sync-up call]