This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store
ClosedPublic

Authored by peterwaller-arm on Oct 19 2021, 7:41 AM.

Details

Summary

Contiguous gather => masked load:

(sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1))
=> (masked.load (gep BasePtr IndexBase) Align Mask undef)

Contiguous scatter => masked store:

(sve.ld1.scatter.index Value Mask BasePtr (sve.index IndexBase 1))
=> (masked.store Value (gep BasePtr IndexBase) Align Mask)

Tests with <vscale x 2 x double>:

[Gather, Scatter] for each [Positive test (index=1), Negative test (index=2), Alignment propagation].

Diff Detail

Event Timeline

peterwaller-arm requested review of this revision.Oct 19 2021, 7:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2021, 7:41 AM

Hi @peterwaller-arm, looks like a nice improvement, although I haven't reviewed it yet! Just one thought - do you think there is any value in adding a similar instcombine for normal IR, i.e.

%indices = call <vscale x 4 x i64> @llvm.experimental.stepvector()
%geps = getelementptr %base, %indices
%val = call <vscale x 4 x i32> @llvm.masked.gather(<vscale x 4 x i32*> %geps, ...)

?

Matt added a subscriber: Matt.Oct 19 2021, 9:43 AM

do you think there is any value in adding a similar instcombine for normal IR

Yes, I had a quick go at this but it requires a bit more time than I have, so I'll defer that for now.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
874

The SVE gather intrinsics explicitly zero inactive lanes and so the replacement masked load must do likewise.

peterwaller-arm edited the summary of this revision. (Show Details)
  • Update tests per review comments, use ptr param instead of alloca.
  • Use zeroinitializer instead of undef for PassThru to preserve gathers' semantics.
  • Fix store pattern (doesn't take undef, takes Value), also in commit message.

I had my wires crossed for this comment: "Update tests per review comments, use ptr param instead of alloca." which belongs to a different patch, please ignore.

peterwaller-arm marked an inline comment as done.Oct 26 2021, 2:03 AM
david-arm added inline comments.Nov 2 2021, 6:06 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
880

nit: Is it worth reversing the logic to remove indentation and avoid unnecessary creation of variables above? For example, something like this:

Value *Index = II.getOperand(2);
Value *IndexBase;
if (!match(Index, m_Intrinsic<Intrinsic::aarch64_sve_index>(
    m_Value(IndexBase), m_SpecificInt(1))))
  return None;

Value *Mask = II.getOperand(0);
Value *BasePtr = II.getOperand(1);
...
897

Should we also call II.eraseFromParent(); here too like the scatter case?

peterwaller-arm added inline comments.Nov 2 2021, 6:10 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
880

As with many other combines, I've structured it in this way to enable further combines to be added without having to touch my logic.

(I'm also a fan of early return in principle *tips hat*, but I think it might not be best for this case).

897

Nice nitpick, but my belief is that the IC.replaceInstUsesWith-related logic erases it in the load case because loads return a value. When these uses are removed the load becomes trivially dead and deleted. Whereas the stores do not have any uses, and therefore the erasure logic doesn't kick in.

peterwaller-arm added inline comments.Nov 2 2021, 6:18 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
880

I'm also making a loose assumption here that those variable assignments will optimize to be free or very cheap in the case where they are not used. Additionally, that those variable assignments are likely to find use in hypothetical future combines. These points are all debatable, I think!

david-arm accepted this revision.Nov 2 2021, 6:25 AM

LGTM! OK, fair enough and thanks for the explanations. :)

This revision is now accepted and ready to land.Nov 2 2021, 6:25 AM
peterwaller-arm marked 2 inline comments as done.Nov 2 2021, 6:39 AM
paulwalker-arm added inline comments.Nov 2 2021, 10:43 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
885–888

With the advice below this can become:

Align Alignment = BasePtr->getPointerAlignment(II.getModule()->getDataLayout());
894–895

Please use Builder.CreateMaskedLoad here.

921–924

As above.

931–932

Please use Builder.CreateMaskedStore here.

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-gatherscatter.ll
11

Choose to ignore if you disagree but I don't like seeing . in test function names and would prefer _ to be used.

peterwaller-arm marked 4 inline comments as done.

Address review comments:

  • Use Builder.Create* functions
  • Use _ for test names
peterwaller-arm marked an inline comment as done.Nov 3 2021, 2:40 AM

This patch triggers a read of uninitialized memory on sanitizer-x86_64-linux-fast. I intend to land a fix within the next hour or two, please revert if it is causing you a problem and I will reland it, otherwise fix will appear soon.

Relanded in 7a34145f407e4ec652111f5e9483a36f816a6a3a
Reland "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store"