This commit extends SVEIntrinsicOpts::optimizeConvertFromSVBool to
identify and remove longer chains of redundant SVE reinterpret
intrinsics. For example, the following chain of SVE reinterprets is
now recognised as redundant:
  %a = <vscale x 2 x i1>
  %1 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 2 x i1> %a)
  %2 = <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %1)
  %3 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 4 x i1> %2)
  %4 = <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %3)
  %5 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 4 x i1> %4)
  %6 = <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %5)
  ret <vscale x 2 x i1> %6
and will be replaced with:
  ret <vscale x 2 x i1> %a
Eliminating these chains can sometimes mean fewer unnecessary
loads/stores are emitted when lowering to assembly.
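For reference in the review discussion below, here is a minimal sketch of the chain walk being described. It is a paraphrase rather than a quote of the patch: starting at a convert.from.svbool call, a Cursor steps back through the operands of to/from.svbool conversions, remembering the earliest value with the same type as the original instruction (named EarliestRemoval here to match the review's terminology) and collecting the intermediate conversions as removal candidates. The function name and signature are illustrative only, and the replacement/erasure step is omitted.

  // Illustrative sketch only; findRedundantSVBoolChain is not a real LLVM
  // function, and the RAUW/removal of dead conversions is not shown.
  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/DerivedTypes.h"
  #include "llvm/IR/IntrinsicInst.h"
  #include "llvm/IR/IntrinsicsAArch64.h"

  using namespace llvm;

  // Walk back from a convert.from.svbool call I through the chain of
  // to/from.svbool conversions. Returns the earliest value with the same
  // type as I (a valid replacement for it), or nullptr if none exists, and
  // fills Candidates with the intermediate conversions seen along the way.
  static Value *
  findRedundantSVBoolChain(IntrinsicInst *I,
                           SmallVectorImpl<Instruction *> &Candidates) {
    auto *IVTy = cast<VectorType>(I->getType());
    Value *Cursor = I->getOperand(0);
    Value *EarliestRemoval = nullptr;

    while (Cursor) {
      // A narrower type in the chain would require zeroing the extra lanes,
      // which breaks the equivalence, so stop there.
      auto *CursorVTy = cast<VectorType>(Cursor->getType());
      if (CursorVTy->getElementCount().getKnownMinValue() <
          IVTy->getElementCount().getKnownMinValue())
        break;

      // Any value of the same type as I is a viable replacement.
      if (Cursor->getType() == IVTy)
        EarliestRemoval = Cursor;

      // The chain ends at anything that is not a to/from.svbool conversion
      // (for example, a function argument such as %a).
      auto *IntrinsicCursor = dyn_cast<IntrinsicInst>(Cursor);
      if (!IntrinsicCursor ||
          (IntrinsicCursor->getIntrinsicID() !=
               Intrinsic::aarch64_sve_convert_to_svbool &&
           IntrinsicCursor->getIntrinsicID() !=
               Intrinsic::aarch64_sve_convert_from_svbool))
        break;

      Candidates.push_back(IntrinsicCursor);
      Cursor = IntrinsicCursor->getOperand(0);
    }

    return EarliestRemoval;
  }

Note that the walk deliberately continues past the earliest replacement, which is the behaviour the question below probes.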
If you create a test similar to this, but with "<vscale x 2 x i1> %a", is there a bug? From your algorithm above it looks like EarliestRemoval would be "%2 = tail call ...", but we'd keep iterating Cursor until we get to "%a". If I've understood your algorithm correctly, won't that mean we end up deleting %1 and %2, leaving us with this?
define <vscale x 4 x i1> @reinterpret_test_partial_chain(<vscale x 8 x i1> %a) {
}