Details

Reviewers

ctetreau
david-arm
efriedma
rengolin
sdesmalen

Commits

rG15e9a6c2118f: [llvm][CodeGen] Do not scalarize `llvm.masked.[gather|scatter]` operating on…

Summary

This patch prevents the llvm.masked.gather and llvm.masked.scatter intrinsics from being scalarized when invoked on scalable vectors.

The change in Function.cpp is needed to prevent the warning that is raised when getNumElements is used in place of getElementCount on VectorType instances. The tests guard against regressions on this change.
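
As a rough illustration of the Function.cpp change, the sketch below models an element count as a minimum lane count plus a scalable flag. It is not LLVM code: `ToyElementCount` and `widthMismatch` are hypothetical names introduced only for this example. It shows why comparing whole element counts, rather than plain integers, keeps `<4 x ...>` and `<vscale x 4 x ...>` distinct.

```cpp
// Hypothetical stand-in for an element count: the minimum number of lanes,
// plus a flag saying whether the real count is that minimum times vscale.
struct ToyElementCount {
  unsigned Min;
  bool Scalable;

  bool operator==(const ToyElementCount &Other) const {
    return Min == Other.Min && Scalable == Other.Scalable;
  }
  bool operator!=(const ToyElementCount &Other) const {
    return !(*this == Other);
  }
};

// Same shape as the width check in the diff: returns true when the two
// vector widths do NOT match. Works for fixed and scalable counts alike,
// because no fixed-width-only accessor is involved.
bool widthMismatch(const ToyElementCount &Ref, const ToyElementCount &ThisArg) {
  return Ref != ThisArg;
}
```

With this model, a fixed count of 4 and a scalable count with minimum 4 compare as different, while two scalable counts with the same minimum compare as equal.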

The tests make sure that calls to llvm.masked.[gather|scatter] are still scalarized when:

the intrinsics are operating on fixed size vectors, and
the compiler is not targeting fixed length SVE code generation.
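
The behavior the tests check can be sketched as a small decision function. This is a simplified model under stated assumptions, not the pass itself: `ToyVecType`, `Action`, and `decide` are hypothetical names, and `TargetSaysLegal` stands in for whatever the target reports about the intrinsic being legal.

```cpp
#include <vector>

// Hypothetical, simplified stand-in for a vector type.
struct ToyVecType {
  unsigned MinElts; // minimum number of lanes
  bool Scalable;    // true for <vscale x N x ...>
};

enum class Action { KeepIntrinsic, Scalarize };

// Types holds the result type (if any) and all vector operand types of the
// call. TargetSaysLegal models the target's answer for this intrinsic.
Action decide(const std::vector<ToyVecType> &Types, bool TargetSaysLegal) {
  // Scalable vectors are never scalarized, regardless of the target.
  for (const ToyVecType &Ty : Types)
    if (Ty.Scalable)
      return Action::KeepIntrinsic;
  // Fixed-width vectors are left alone only if the target handles them.
  if (TargetSaysLegal)
    return Action::KeepIntrinsic;
  return Action::Scalarize;
}
```

In this model, the scalable test cases map to `KeepIntrinsic`, and the fixed-width cases with no target support map to `Scalarize`.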

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fpetrogalli created this revision. · Aug 19 2020, 2:57 PM

Herald added a reviewer: rengolin. · Aug 19 2020, 2:57 PM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, psnobl, hiraditya, tschuett.

fpetrogalli requested review of this revision. · Aug 19 2020, 2:57 PM

ScalarizeMaskedMemIntrin can't scalarize scalable vectors on any target; would it make sense to check there?

We probably want the TTI change eventually for the sake of cost modeling in the vectorizer, but I'd prefer to be defensive about inappropriate transforms on scalable vectors.

Harbormaster completed remote builds in B68950: Diff 286669. · Aug 19 2020, 3:45 PM

In D86249#2227281, @efriedma wrote:

> ScalarizeMaskedMemIntrin can't scalarize scalable vectors on any target; would it make sense to check there?

Sure, good idea, working on it. It is a trivial change, but I would like to add a comment as an explanation. What is the reason for not being able to scalarize anything that works on scalable vectors? Is it because this scalarization pass cannot (shouldn't?) produce loops?

> We probably want the TTI change eventually for the sake of cost modeling in the vectorizer, but I'd prefer to be defensive about inappropriate transforms on scalable vectors.

OK - but I think I should remove them from this patch. We will introduce them when needed. Does that sound right to you?

Thank you!

Francesco

> What is the reason for not being able to scalarize anything that works on scalable vectors? Is it because this scalarization pass cannot (shouldn't?) produce loops?

The scalarization pass currently doesn't produce loops, so the resulting code is nonsense. We could, in theory, teach the scalarization pass how to generate loops, but it would be hard to do well.
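
To illustrate the point about loops, here is a toy sketch, assuming a gather is modelled as "load through each pointer whose mask bit is set, otherwise keep the pass-through value". `gatherFixed` and `gatherScalable` are hypothetical helpers, not part of LLVM. With a fixed width the lane count is a compile-time constant, so the work can be written out lane by lane; with a scalable width the lane count is only known at run time, so some kind of loop is unavoidable.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Fixed-width case: N is a compile-time constant, so this can be expanded
// into exactly N guarded loads with no loop left in the generated code.
template <std::size_t N>
std::array<int, N> gatherFixed(const std::array<const int *, N> &Ptrs,
                               const std::array<bool, N> &Mask,
                               const std::array<int, N> &PassThru) {
  std::array<int, N> Result = PassThru;
  for (std::size_t I = 0; I < N; ++I)
    if (Mask[I])
      Result[I] = *Ptrs[I];
  return Result;
}

// Scalable case: the lane count (modelled here by Ptrs.size()) is only known
// at run time, so the per-lane work has to sit inside a real loop.
std::vector<int> gatherScalable(const std::vector<const int *> &Ptrs,
                                const std::vector<bool> &Mask,
                                const std::vector<int> &PassThru) {
  std::vector<int> Result = PassThru;
  for (std::size_t I = 0; I < Ptrs.size(); ++I)
    if (Mask[I])
      Result[I] = *Ptrs[I];
  return Result;
}
```

A pass that only knows how to emit the first form has no correct output for the second, which is why bailing out on scalable vectors is the defensive choice.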

> OK - but I think I should remove them from this patch. We will introduce them when needed. Does that sound right to you?

Makes sense

I lifted the implementation to the target-agnostic part of the scalarize pass.

Harbormaster completed remote builds in B69318: Diff 287394. · Aug 24 2020, 8:28 AM

sdesmalen added a subscriber: sdesmalen. · Aug 27 2020, 3:27 AM

sdesmalen added inline comments.

llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp
900	Should this check be hoisted out of the switch statement and have it return false if the result or any of the operands to the intrinsic is scalable? The intrinsic itself doesn't really matter, given that the pass doesn't produce loops for any of the loads/stores of scalable vector types, not just gather/scatter.

efriedma added inline comments. · Aug 27 2020, 3:54 AM

llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp
900	I doubt anyone is going to care about scalable masked_expandloads anytime soon, but sure, hoisting it out makes sense.

fpetrogalli added inline comments. · Aug 28 2020, 12:42 PM

llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp
900	Given that we know this pass doesn't produce loops, I am happy to hoist this even outside the `if (II)`, and test it on a generic `CallInst` instead of `IntrinsicInst`. Does that make sense?

> [llvm][sve] Make llvm.masked.[gather|scatter] legal for SVE.

The title of the patch seems wrong, as this patch doesn't do any legalization of masked.gather/scatter for SVE?

llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp
900	Sure, that sounds fine.

fpetrogalli added inline comments. · Sep 7 2020, 8:15 AM

llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp
900	Hi both - sorry for the long wait. I had a go at hoisting the check outside the switch statement, and at applying it to a generic CallInst. I actually prefer the original version in this patch, for the following reasons:

First, the check on the generic CallInst is a bit weird, because the pass is called "ScalarizeMaskedMemIntrin". The pass is already expected to ignore anything that is not a masked memory intrinsic, so I don't see why we should create CallInst-specific behavior. I suggested this myself, but I no longer think it is the right thing to do after seeing a LIT test with a generic function call that invokes `opt` with `-scalarize-masked-mem-intrin`.

Second, hoisting the check outside the switch statement is viable, but it would only make sense if the scalarization process were generic over the masked memory intrinsics. As things are, scalarization is performed on a per-intrinsic basis. For this reason, I think we should perform this extra test in each case (as is done in this patch), so that we can enable scalarization for each intrinsic individually without having to revert a generic check covering all the masked memory intrinsics.
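
The two placements discussed in this thread can be contrasted with a small model. This is a sketch under assumptions, not the pass: `ToyIntrinsic`, `skipHoisted`, and `skipPerCase` are hypothetical names, and the per-case variant assumes that only the gather and scatter cases carry the guard, as the patch title suggests.

```cpp
// Hypothetical subset of the masked memory intrinsics handled by the pass.
enum class ToyIntrinsic { MaskedLoad, MaskedStore, MaskedGather, MaskedScatter };

// Hoisted placement: one guard before the switch covers every intrinsic.
// Returns true when the call must be left alone.
constexpr bool skipHoisted(ToyIntrinsic /*ID*/, bool AnyScalableType) {
  return AnyScalableType;
}

// Per-case placement: only the cases that carry the guard are protected.
// Any other intrinsic on scalable vectors would still reach scalarization.
constexpr bool skipPerCase(ToyIntrinsic ID, bool AnyScalableType) {
  switch (ID) {
  case ToyIntrinsic::MaskedGather:
  case ToyIntrinsic::MaskedScatter:
    return AnyScalableType;
  default:
    return false;
  }
}
```

The two agree for gather and scatter. They differ for the remaining intrinsics: the hoisted form is defensive by default, while the per-case form makes it easier to enable scalable support one intrinsic at a time.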

fpetrogalli retitled this revision from [llvm][sve] Make `llvm.masked.[gather|scatter]` legal for SVE. to [llvm][CodeGen] Do not scalarize `llvm.masked.[gather|scatter]` operating on scalable vectors. · Sep 7 2020, 8:43 AM

fpetrogalli edited the summary of this revision.

LGTM

This revision is now accepted and ready to land. · Sep 7 2020, 5:14 PM

@sdesmalen - I have moved the check outside the switch statement as discussed in our phone call.

Harbormaster completed remote builds in B71355: Diff 291187. · Sep 11 2020, 5:31 AM

fpetrogalli added a reviewer: sdesmalen. · Sep 14 2020, 3:20 AM

Thanks @fpetrogalli, LGTM!

llvm/test/CodeGen/AArch64/llvm-masked-gather-legal-for-sve.ll
55	nit: `s/passthro/passthru`

Fix typo.

This revision was landed with ongoing or failed builds. · Sep 16 2020, 9:03 AM

Closed by commit rG15e9a6c2118f: [llvm][CodeGen] Do not scalarize `llvm.masked.[gather|scatter]` operating on… (authored by fpetrogalli).

This revision was automatically updated to reflect the committed changes.

fpetrogalli added a commit: rG15e9a6c2118f: [llvm][CodeGen] Do not scalarize `llvm.masked.[gather|scatter]` operating on….

Harbormaster completed remote builds in B71887: Diff 292242. · Sep 16 2020, 9:31 AM

Diff 292245

llvm/lib/CodeGen/ScalarizeMaskedMemIntrin.cpp

[859 lines hidden; context: bool ScalarizeMaskedMemIntrin::optimizeBlock(BasicBlock &BB, bool &ModifiedDT) {]

   return MadeChange;
 }

 bool ScalarizeMaskedMemIntrin::optimizeCallInst(CallInst *CI,
                                                 bool &ModifiedDT) {
   IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI);
   if (II) {
+    // The scalarization code below does not work for scalable vectors.
+    if (isa<ScalableVectorType>(II->getType()) ||
+        any_of(II->arg_operands(),
+               [](Value *V) { return isa<ScalableVectorType>(V->getType()); }))
+      return false;
+
     switch (II->getIntrinsicID()) {
     default:
       break;
     case Intrinsic::masked_load:
       // Scalarize unsupported vector masked load
       if (TTI->isLegalMaskedLoad(
               CI->getType(),
               cast<ConstantInt>(CI->getArgOperand(1))->getAlignValue()))
[10 lines hidden; context: if (II) {]
     case Intrinsic::masked_gather: {
       unsigned AlignmentInt =
           cast<ConstantInt>(CI->getArgOperand(1))->getZExtValue();
       Type *LoadTy = CI->getType();
       Align Alignment =
           DL->getValueOrABITypeAlignment(MaybeAlign(AlignmentInt), LoadTy);
       if (TTI->isLegalMaskedGather(LoadTy, Alignment))
         return false;
       scalarizeMaskedGather(CI, ModifiedDT);
       return true;
     }
     case Intrinsic::masked_scatter: {
       unsigned AlignmentInt =
           cast<ConstantInt>(CI->getArgOperand(2))->getZExtValue();
       Type *StoreTy = CI->getArgOperand(0)->getType();
       Align Alignment =
           DL->getValueOrABITypeAlignment(MaybeAlign(AlignmentInt), StoreTy);
[20 lines hidden]

llvm/lib/IR/Function.cpp

[1,394 lines hidden; context: case IITDescriptor::VecOfAnyPtrsToElt: {]
     }

     // Verify the overloaded type "matches" the Ref type.
     // i.e. Ty is a vector with the same width as Ref.
     // Composed of pointers to the same element type as Ref.
     auto *ReferenceType = dyn_cast<VectorType>(ArgTys[RefArgNumber]);
     auto *ThisArgVecTy = dyn_cast<VectorType>(Ty);
     if (!ThisArgVecTy || !ReferenceType ||
-        (cast<FixedVectorType>(ReferenceType)->getNumElements() !=
-         cast<FixedVectorType>(ThisArgVecTy)->getNumElements()))
+        (ReferenceType->getElementCount() != ThisArgVecTy->getElementCount()))
       return true;
     PointerType *ThisArgEltTy =
         dyn_cast<PointerType>(ThisArgVecTy->getElementType());
     if (!ThisArgEltTy)
       return true;
     return ThisArgEltTy->getElementType() != ReferenceType->getElementType();
   }
   case IITDescriptor::VecElementArgument: {
[323 lines hidden]

llvm/test/CodeGen/AArch64/llvm-masked-gather-legal-for-sve.ll

This file was added.

				; RUN: opt -mtriple=aarch64-linux-gnu -mattr=+sve -scalarize-masked-mem-intrin -S < %s 2>%t | FileCheck %s
				; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t

				; If this check fails please read test/CodeGen/AArch64/README for instructions on how to resolve it.
				; WARN-NOT: warning

				; Testing that masked gathers operating on scalable vectors that are
				; packed in SVE registers are not scalarized.

				; CHECK-LABEL: @masked_gather_nxv4i32(
				; CHECK: call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32
				define <vscale x 4 x i32> @masked_gather_nxv4i32(<vscale x 4 x i32*> %ld, <vscale x 4 x i1> %masks, <vscale x 4 x i32> %passthru) {
				%res = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32(<vscale x 4 x i32*> %ld, i32 0, <vscale x 4 x i1> %masks, <vscale x 4 x i32> %passthru)
				ret <vscale x 4 x i32> %res
				}

				; Testing that masked gathers operating on scalable vectors of FP data
				; that is packed in SVE registers are not scalarized.

				; CHECK-LABEL: @masked_gather_nxv2f64(
				; CHECK: call <vscale x 2 x double> @llvm.masked.gather.nxv2f64
				define <vscale x 2 x double> @masked_gather_nxv2f64(<vscale x 2 x double*> %ld, <vscale x 2 x i1> %masks, <vscale x 2 x double> %passthru) {
				%res = call <vscale x 2 x double> @llvm.masked.gather.nxv2f64(<vscale x 2 x double*> %ld, i32 0, <vscale x 2 x i1> %masks, <vscale x 2 x double> %passthru)
				ret <vscale x 2 x double> %res
				}

				; Testing that masked gathers operating on scalable vectors of FP data
				; that is unpacked in SVE registers are not scalarized.

				; CHECK-LABEL: @masked_gather_nxv2f16(
				; CHECK: call <vscale x 2 x half> @llvm.masked.gather.nxv2f16
				define <vscale x 2 x half> @masked_gather_nxv2f16(<vscale x 2 x half*> %ld, <vscale x 2 x i1> %masks, <vscale x 2 x half> %passthru) {
				%res = call <vscale x 2 x half> @llvm.masked.gather.nxv2f16(<vscale x 2 x half*> %ld, i32 0, <vscale x 2 x i1> %masks, <vscale x 2 x half> %passthru)
				ret <vscale x 2 x half> %res
				}

				; Testing that masked gathers operating on 64-bit fixed vectors are
				; scalarized because NEON doesn't have support for masked gather
				; instructions.

				; CHECK-LABEL: @masked_gather_v2f32(
				; CHECK-NOT: @llvm.masked.gather.v2f32(
				define <2 x float> @masked_gather_v2f32(<2 x float*> %ld, <2 x i1> %masks, <2 x float> %passthru) {
				%res = call <2 x float> @llvm.masked.gather.v2f32(<2 x float*> %ld, i32 0, <2 x i1> %masks, <2 x float> %passthru)
				ret <2 x float> %res
				}

				; Testing that masked gathers operating on 128-bit fixed vectors are
				; scalarized because NEON doesn't have support for masked gather
				; instructions and because we are not targeting fixed width SVE.

				; CHECK-LABEL: @masked_gather_v4i32(
				; CHECK-NOT: @llvm.masked.gather.v4i32(
				define <4 x i32> @masked_gather_v4i32(<4 x i32*> %ld, <4 x i1> %masks, <4 x i32> %passthru) {
				%res = call <4 x i32> @llvm.masked.gather.v4i32(<4 x i32*> %ld, i32 0, <4 x i1> %masks, <4 x i32> %passthru)
				ret <4 x i32> %res
				}

				declare <vscale x 4 x i32> @llvm.masked.gather.nxv4i32(<vscale x 4 x i32*> %ptrs, i32 %align, <vscale x 4 x i1> %masks, <vscale x 4 x i32> %passthru)
				declare <vscale x 2 x double> @llvm.masked.gather.nxv2f64(<vscale x 2 x double*> %ptrs, i32 %align, <vscale x 2 x i1> %masks, <vscale x 2 x double> %passthru)
				declare <vscale x 2 x half> @llvm.masked.gather.nxv2f16(<vscale x 2 x half*> %ptrs, i32 %align, <vscale x 2 x i1> %masks, <vscale x 2 x half> %passthru)
				declare <2 x float> @llvm.masked.gather.v2f32(<2 x float*> %ptrs, i32 %align, <2 x i1> %masks, <2 x float> %passthru)
				declare <4 x i32> @llvm.masked.gather.v4i32(<4 x i32*> %ptrs, i32 %align, <4 x i1> %masks, <4 x i32> %passthru)

llvm/test/CodeGen/AArch64/llvm-masked-scatter-legal-for-sve.ll

This file was added.

				; RUN: opt -mtriple=aarch64-linux-gnu -mattr=+sve -scalarize-masked-mem-intrin -S < %s 2>%t | FileCheck %s
				; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t

				; If this check fails please read test/CodeGen/AArch64/README for instructions on how to resolve it.
				; WARN-NOT: warning

				; Testing that masked scatters operating on scalable vectors that are
				; packed in SVE registers are not scalarized.

				; CHECK-LABEL: @masked_scatter_nxv4i32(
				; CHECK: call void @llvm.masked.scatter.nxv4i32
				define void @masked_scatter_nxv4i32(<vscale x 4 x i32> %data, <vscale x 4 x i32*> %ptrs, <vscale x 4 x i1> %masks) {
				call void @llvm.masked.scatter.nxv4i32(<vscale x 4 x i32> %data, <vscale x 4 x i32*> %ptrs, i32 0, <vscale x 4 x i1> %masks)
				ret void
				}

				; Testing that masked scatters operating on scalable vectors of FP
				; data that is packed in SVE registers are not scalarized.

				; CHECK-LABEL: @masked_scatter_nxv2f64(
				; CHECK: call void @llvm.masked.scatter.nxv2f64
				define void @masked_scatter_nxv2f64(<vscale x 2 x double> %data, <vscale x 2 x double*> %ptrs, <vscale x 2 x i1> %masks) {
				call void @llvm.masked.scatter.nxv2f64(<vscale x 2 x double> %data, <vscale x 2 x double*> %ptrs, i32 0, <vscale x 2 x i1> %masks)
				ret void
				}

				; Testing that masked scatters operating on scalable vectors of FP
				; data that is unpacked in SVE registers are not scalarized.

				; CHECK-LABEL: @masked_scatter_nxv2f16(
				; CHECK: call void @llvm.masked.scatter.nxv2f16
				define void @masked_scatter_nxv2f16(<vscale x 2 x half> %data, <vscale x 2 x half*> %ptrs, <vscale x 2 x i1> %masks) {
				call void @llvm.masked.scatter.nxv2f16(<vscale x 2 x half> %data, <vscale x 2 x half*> %ptrs, i32 0, <vscale x 2 x i1> %masks)
				ret void
				}

				; Testing that masked scatters operating on 64-bit fixed vectors are
				; scalarized because NEON doesn't have support for masked scatter
				; instructions.

				; CHECK-LABEL: @masked_scatter_v2f32(
				; CHECK-NOT: @llvm.masked.scatter.v2f32(
				define void @masked_scatter_v2f32(<2 x float> %data, <2 x float*> %ptrs, <2 x i1> %masks) {
				call void @llvm.masked.scatter.v2f32(<2 x float> %data, <2 x float*> %ptrs, i32 0, <2 x i1> %masks)
				ret void
				}

				; Testing that masked scatters operating on 128-bit fixed vectors are
				; scalarized because NEON doesn't have support for masked scatter
				; instructions and because we are not targeting fixed width SVE.

				; CHECK-LABEL: @masked_scatter_v4i32(
				; CHECK-NOT: @llvm.masked.scatter.v4i32(
				define void @masked_scatter_v4i32(<4 x i32> %data, <4 x i32*> %ptrs, <4 x i1> %masks) {
				call void @llvm.masked.scatter.v4i32(<4 x i32> %data, <4 x i32*> %ptrs, i32 0, <4 x i1> %masks)
				ret void
				}

				declare void @llvm.masked.scatter.nxv4i32(<vscale x 4 x i32> %data, <vscale x 4 x i32*> %ptrs, i32 %align, <vscale x 4 x i1> %masks)
				declare void @llvm.masked.scatter.nxv2f64(<vscale x 2 x double> %data, <vscale x 2 x double*> %ptrs, i32 %align, <vscale x 2 x i1> %masks)
				declare void @llvm.masked.scatter.nxv2f16(<vscale x 2 x half> %data, <vscale x 2 x half*> %ptrs, i32 %align, <vscale x 2 x i1> %masks)
				declare void @llvm.masked.scatter.v2f32(<2 x float> %data, <2 x float*> %ptrs, i32 %align, <2 x i1> %masks)
				declare void @llvm.masked.scatter.v4i32(<4 x i32> %data, <4 x i32*> %ptrs, i32 %align, <4 x i1> %masks)

This is an archive of the discontinued LLVM Phabricator instance.

[llvm][CodeGen] Do not scalarize `llvm.masked.[gather|scatter]` operating on scalable vectors.
ClosedPublic
