This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Remove chains of unnecessary SVE reinterpret intrinsics
ClosedPublic

Authored by joechrisellis on Jan 5 2021, 2:32 AM.

Details

Summary

This commit extends SVEIntrinsicOpts::optimizeConvertFromSVBool to
identify and remove longer chains of redundant SVE reintepret
intrinsics. For example, the following chain of redundant SVE
reinterprets is now recognised as redundant:

%a = <vscale x 2 x i1>
%1 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 2 x i1> %a)
%2 = <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %1)
%3 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 4 x i1> %2)
%4 = <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %3)
%5 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 4 x i1> %4)
%6 = <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %5)
ret <vscale x 2 x i1> %6

and will be replaced with:

ret <vscale x 2 x i1> %a

Eliminating these can sometimes mean emitting fewer unnecessary
loads/stores when lowering to assembly.

Diff Detail

Event Timeline

joechrisellis created this revision.Jan 5 2021, 2:32 AM
joechrisellis requested review of this revision.Jan 5 2021, 2:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2021, 2:32 AM
david-arm added inline comments.Jan 11 2021, 5:32 AM
llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll
84

If you create a test with similar to this, but with "<vscale x 2 x i1> %a" is there a bug? From your algorithm above it looks like EarliestRemoval would be "%2 tail call ...", but we'd keep iterating Cursor until we get to "%a". If I've understood your algorithm correctly won't that mean we end up deleting %1 and %2 and end up with this?

define <vscale x 4 x i1> @reinterpret_test_partial_chain(<vscale x 8 x i1> %a) {

ret <vscale x 4 x i1> %2;

}

84

Sorry, I meant create a test similar to this, but with "<vscale x 8 x i1> %a"!!

joechrisellis added inline comments.Jan 11 2021, 5:57 AM
llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll
84

Hi @david-arm, just tested the algorithm with the following code:

define <vscale x 4 x i1> @foo(<vscale x 8 x i1> %a) {
  %1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv8i1(<vscale x 8 x i1> %a)
  %2 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %1)
  %3 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %2)
  %4 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %3)
  ret <vscale x 4 x i1> %4
}

And got the following output:

define <vscale x 4 x i1> @foo(<vscale x 8 x i1> %a) #1 {
  %1 = tail call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv8i1(<vscale x 8 x i1> %a)
  %2 = tail call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %1)
  ret <vscale x 4 x i1> %2
}

This makes sense to me, although maybe I haven't fed the pass the same code that you were thinking of?

david-arm added inline comments.Jan 11 2021, 6:04 AM
llvm/test/CodeGen/AArch64/sve-intrinsic-opts-reinterpret.ll
84

Ah ok, I see yes. It's this line that prevents us from removing %1 and %2:

if (Candidate->use_empty())

as %2 and %1 are being used. Could you add this as another test case please, since it's testing the case where the candidate list is bigger than what we end up removing. Thanks!

joechrisellis marked an inline comment as done.

Address @david-arm's comments.

  • Add test for the case where the number of removed instructions is strictly less than the number of candidates for removal.
david-arm accepted this revision.Jan 12 2021, 1:10 AM

LGTM! Thanks for adding the test.

This revision is now accepted and ready to land.Jan 12 2021, 1:10 AM