This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Add intrinsic for non-faulting loads
ClosedPublic

Authored by kmclaughlin on Dec 19 2019, 4:15 AM.

Details

Summary

This patch adds the llvm.aarch64.sve.ldnf1 intrinsic, plus
DAG combine rules for non-faulting loads and sign/zero extends

Diff Detail

Event Timeline

kmclaughlin created this revision.Dec 19 2019, 4:15 AM
Herald added a project: Restricted Project. · View Herald Transcript
efriedma added inline comments.Dec 19 2019, 3:15 PM
llvm/lib/Target/AArch64/SVEInstrFormats.td
5570

This is depending on hasSideEffects to preserve the correct ordering with instructions that read/write FFR? That probably works. I guess the alternative is to insert an IMPLICIT_DEF of FFR in the entry block of each function.

What are the calling convention rules for FFR? Is it callee-save? If not, we might need to do some work to make FFR reads/writes do something sane across calls inserted by the compiler.

kmclaughlin added inline comments.Dec 20 2019, 9:16 AM
llvm/lib/Target/AArch64/SVEInstrFormats.td
5570

The FFR is not callee-saved. We will need to add support to save & restore it where appropriate at the point the compiler starts generating reads to the FFR, but for the purpose of the ACLE the user will be required to do this if necessary.

efriedma added inline comments.Dec 20 2019, 2:00 PM
llvm/lib/Target/AArch64/SVEInstrFormats.td
5570

How can the user write correct code to save/restore the FFR? The compiler can move arbitrary readnone/argmemonly calls between the definition and the use.

andwar added inline comments.Jan 2 2020, 6:27 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
10178–10180

Could you replace GLD1* with Load? I believe that that will be still correct with the added bonus of covering the new case :)

11254

You could use getSVEContainterType here instead. You'll need to extend it a wee bit.

12460

The following switch statement will now cover more than just *Gather* nodes. Maybe SVE load nodes instead?

12504–12507

Why not:

SmallVector<SDvalue, 4> Ops = {Src->getOperand(0), Src->getOperand(1), Src->getOperand(2), Src->getOperand(3), Src->getOperand(4)};

?

12508

Could you add a comment explaining what the underlying difference between LDNF1S and GLD1S is? Otherwise it's not clear why this if statement is needed. IIUC, GLD1S has an extra argument for the offsets (hence 5 args vs 4).

sdesmalen added inline comments.Jan 8 2020, 9:39 AM
llvm/lib/Target/AArch64/SVEInstrFormats.td
5570

There are separate intrinsics for loading/writing the FFR (svrdffr, svsetffr, svwrffr), which use a svbool_t to keep the value of the FFR. These intrinsics are implemented in the same way with a Pseudo with hasSideEffects = 1 set.

I thought this flag would prevent other calls from being scheduled/moved over these intrinsics, as they have unknown/unmodelled side-effects and would thus act kind of like a barrier?

efriedma added inline comments.Jan 8 2020, 12:39 PM
llvm/lib/Target/AArch64/SVEInstrFormats.td
5570

The issue would be transforms at the IR/SelectionDAG level. We can probably model calls at the MIR level correctly, like you're describing.

  • Rebased patch
  • Updated comments and extended getSVEContainerType to handle nxv8i16 & nxv16i8
kmclaughlin marked 5 inline comments as done.Jan 14 2020, 3:10 AM

Thanks for your suggestions, @andwar!

sdesmalen added inline comments.Jan 20 2020, 5:07 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12497

Move the assignment of MemVTOpNum to the switch statement above instead of special-casing it here?

12498

nit: s/LD1SrcMemVT/SrcMemVT/

12503

Better make the default '5' if there is a large likelihood of there being 5 default values.

12503

Instead of special -casing LDNF1S below, you can write this as:

SmallVector<SDValue, 5> Ops;
for(unsigned I=0; I<Src->getNumOperands(); ++I)
  Ops.push_back(Src->getOperand(I));
  • Some minor changes to performSignExtendInRegCombine to address comments from @sdesmalen
sdesmalen accepted this revision.Jan 21 2020, 1:00 AM

LGTM [with the caveat that we need to revisit the modelling of the FFR register and get rid fo the PseudoInstExpansion at a later point, as discussed during the previous sync-up call]

This revision is now accepted and ready to land.Jan 21 2020, 1:00 AM
This revision was automatically updated to reflect the committed changes.