This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Add intrinsics for non-temporal scatters/gathers
Closed, Public

Authored by andwar on Mar 4 2020, 3:43 AM.

Details

Summary

[AArch64][SVE] Add intrinsics for non-temporal scatters/gathers

This patch adds the following intrinsics for non-temporal gather loads
and scatter stores:

  • aarch64_sve_ldnt1_gather_index
  • aarch64_sve_stnt1_scatter_index

These intrinsics implement the "scalar + vector of indices" addressing
mode.
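
For illustration, a minimal IR sketch of the 64-bit gather form, modelled on the tests added by this patch (the exact nxv2i64 declaration shown here is an assumption):

  declare <vscale x 2 x i64> @llvm.aarch64.sve.ldnt1.gather.index.nxv2i64(<vscale x 2 x i1>, i64*, <vscale x 2 x i64>)

  ; Non-temporal gather: loads an i64 element from %base + %indices[i] * 8
  ; for each active lane of %pg.
  define <vscale x 2 x i64> @gldnt1d_index(<vscale x 2 x i1> %pg, i64* %base, <vscale x 2 x i64> %indices) {
    %load = call <vscale x 2 x i64> @llvm.aarch64.sve.ldnt1.gather.index.nxv2i64(<vscale x 2 x i1> %pg, i64* %base, <vscale x 2 x i64> %indices)
    ret <vscale x 2 x i64> %load
  }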

Unlike regular and first-faulting gathers/scatters, there is no
non-temporal instruction that takes indices and scales them itself.
Instead, the indices for non-temporal gathers/scatters are scaled before
the intrinsics are lowered to ldnt1/stnt1 instructions.

The new ISD nodes, GLDNT1_INDEX and SSTNT1_INDEX, are only used as
placeholders so that we can easily identify the cases implemented in
this patch in performGatherLoadCombine and performScatterStoreCombine.
Once encountered, they are replaced with:

  • GLDNT1_INDEX -> SPLAT_VECTOR + SHL + GLDNT1
  • SSTNT1_INDEX -> SPLAT_VECTOR + SHL + SSTNT1
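
To make the replacement concrete, here is a rough IR-level equivalent of the gather expansion (the actual rewrite happens on ISD nodes inside the DAG combines; the unscaled-offset intrinsic declaration below is an assumption based on the sibling non-temporal gathers):

  declare <vscale x 2 x i64> @llvm.aarch64.sve.ldnt1.gather.nxv2i64(<vscale x 2 x i1>, i64*, <vscale x 2 x i64>)

  define <vscale x 2 x i64> @gldnt1d_index_expanded(<vscale x 2 x i1> %pg, i64* %base, <vscale x 2 x i64> %indices) {
    ; SPLAT_VECTOR: splat the shift amount, log2(8 bytes) = 3 for i64 elements.
    %ins = insertelement <vscale x 2 x i64> undef, i64 3, i32 0
    %splat = shufflevector <vscale x 2 x i64> %ins, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
    ; SHL: pre-scale the indices into byte offsets.
    %offsets = shl <vscale x 2 x i64> %indices, %splat
    ; GLDNT1: the unscaled-offset non-temporal gather takes over from here.
    %load = call <vscale x 2 x i64> @llvm.aarch64.sve.ldnt1.gather.nxv2i64(<vscale x 2 x i1> %pg, i64* %base, <vscale x 2 x i64> %offsets)
    ret <vscale x 2 x i64> %load
  }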

The patterns for lowering ISD::SHL for scalable vectors (required by
this patch) were missing, so these are added too.
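
A sketch of the kind of scalable-vector shift those new patterns have to select; the expectation (an assumption here, mirroring the patch's tests) is a single shift such as lsl z0.d, z0.d, #3 rather than scalarisation:

  define <vscale x 2 x i64> @shl_by_3(<vscale x 2 x i64> %a) {
    ; Splat of the shift amount, as produced by the combines above.
    %ins = insertelement <vscale x 2 x i64> undef, i64 3, i32 0
    %splat = shufflevector <vscale x 2 x i64> %ins, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
    %shl = shl <vscale x 2 x i64> %a, %splat
    ret <vscale x 2 x i64> %shl
  }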

Diff Detail

Event Timeline

andwar created this revision. Mar 4 2020, 3:43 AM
Herald added a reviewer: efriedma.
Herald added a project: Restricted Project.
andwar marked an inline comment as done. Mar 4 2020, 3:52 AM
andwar added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.h
266

The new nodes (GLDNT1_INDEX and SSTNT1_INDEX) are only introduced to keep performGatherLoadCombine and performScatterStoreCombine relatively clean.

But maybe I shouldn't introduce them if they're always meant to be replaced with SPLAT_VECTOR + MUL + GLDNT1?

Just a quick note: this patch is formatted consistently with the files that it touches. Since these files diverge from the official clang-format rules, clang-format pre-merge checks failed.

If I'm following correctly, you don't actually create any SSTNT1_INDEX nodes? You're just using it as a placeholder in the call to performScatterStoreCombine? I guess that's not a big deal.

llvm/test/CodeGen/AArch64/sve2-intrinsics-nt-gather-loads-64bit-scaled-offset.ll
13

"mul"? Can we make a shift instead?

andwar marked an inline comment as done. Mar 5 2020, 5:34 AM

> If I'm following correctly, you don't actually create any SSTNT1_INDEX nodes? You're just using it as a placeholder in the call to performScatterStoreCombine? I guess that's not a big deal.

Correct. Same for GLDNT1_INDEX. I'm only using these nodes to uniquely identify scatters/gathers that:

  • are non-temporal
  • use the "scalar + vector of indices" addressing mode.

It's not great, but I think it's the least intrusive way of identifying these cases.

llvm/test/CodeGen/AArch64/sve2-intrinsics-nt-gather-loads-64bit-scaled-offset.ll
13

Sadly there are no patterns for lsl yet, so I'd have to add a call to @llvm.aarch64.sve.lsl for this to work. It's an option, but I'd prefer to leave a TODO instead (e.g. // TODO Replace MUL with SHL once patterns for lsl are added).
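
For context, a rough sketch of that alternative, pre-scaling the indices via the predicated SVE shift intrinsic instead of a plain IR mul or shl; the exact types are an assumption:

  declare <vscale x 2 x i64> @llvm.aarch64.sve.lsl.nxv2i64(<vscale x 2 x i1>, <vscale x 2 x i64>, <vscale x 2 x i64>)

  ; Shifts each active lane of %idx left by the matching lane of %amt.
  define <vscale x 2 x i64> @scale_indices(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %idx, <vscale x 2 x i64> %amt) {
    %off = call <vscale x 2 x i64> @llvm.aarch64.sve.lsl.nxv2i64(<vscale x 2 x i1> %pg, <vscale x 2 x i64> %idx, <vscale x 2 x i64> %amt)
    ret <vscale x 2 x i64> %off
  }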

efriedma added inline comments. Mar 5 2020, 9:04 AM
llvm/test/CodeGen/AArch64/sve2-intrinsics-nt-gather-loads-64bit-scaled-offset.ll
13

The lsl patterns are pretty simple; see https://reviews.llvm.org/D73602 . But sure, we can leave that for later.

andwar updated this revision to Diff 248699. Mar 6 2020, 4:39 AM
  • Replaced ISD::MUL from my original patch with ISD::SHL
  • Added patterns for ISD::SHL for scalable types
  • Extracted the scaling into a separate function
andwar marked 2 inline comments as done. Mar 6 2020, 5:17 AM
andwar added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
5267

This change is needed to accommodate scalable types.

llvm/test/CodeGen/AArch64/sve2-intrinsics-nt-gather-loads-64bit-scaled-offset.ll
13

Fair point, updated.

Btw, thanks for the link, but it seems unrelated. Did you have some other patch in mind?

andwar edited the summary of this revision. Mar 6 2020, 5:18 AM
andwar updated this revision to Diff 249299. Mar 10 2020, 3:35 AM

Fix formatting to appease Harbormaster.

Maybe you can avoid the need for SSTNT1_INDEX and GLDNT1_INDEX by refactoring performScatterStoreCombine/performGatherLoadCombine to take the parameters from the caller, rather than handling the special cases inside those functions.
I don't think that should hold up this patch, but it may be something to refactor afterwards.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3085

Can you also create some negative tests for this change?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12674

getScaledOffsetForLDNT1 is a bit of a misnomer for this function; how about getScaledOffsetForBitwidth()?

andwar updated this revision to Diff 249660. Mar 11 2020, 9:54 AM

Rename getScaledOffsetForLDNT1 to getScaledOffsetForBitwidth

andwar marked 2 inline comments as done. Mar 11 2020, 9:57 AM
andwar added inline comments.
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3085

If that's OK, I'll do it in a separate patch.

sdesmalen accepted this revision. Mar 11 2020, 11:18 AM

LGTM

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3085

Okay, that's fine with me. I guess this requires adding more patterns for shifts in order to test it, so it makes sense to do that in a separate patch.

This revision is now accepted and ready to land. Mar 11 2020, 11:18 AM
This revision was automatically updated to reflect the committed changes.