This is an archive of the discontinued LLVM Phabricator instance.

Differential D88505

[InstCombine] ease alignment restriction for converting masked load to normal load
Closed · Public

Authored by spatel on Sep 29 2020, 10:53 AM.

Details

Reviewers

lebedev.ri
efriedma
RKSimon

Commits

rG0527c8749b90: [InstCombine] ease alignment restriction for converting masked load to normal…

Summary

I think we made this fold conservative to be safer, but we do not need the alignment attribute/metadata limitation because the masked load intrinsic itself specifies the alignment. A normal vector load is better for IR transforms and should be no worse in codegen than the masked alternative. If it is worse for some target, the backend can reverse this transform.
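
To make the equivalence in the summary concrete, here is a minimal C++ model of the two forms. It is not LLVM code: `Vec2`, `Mask2`, `maskedLoad`, `loadThenSelect`, and `agreeOnAllMasks` are hypothetical names invented for this sketch, and it assumes a 2-lane vector of doubles whose lanes are all readable (the "dereferenceable" precondition).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical stand-ins for a 2-lane vector and its mask; not LLVM types.
using Vec2 = std::array<double, 2>;
using Mask2 = std::array<bool, 2>;

// Models a masked load: only lanes whose mask bit is set read memory;
// the other lanes take the passthrough value.
Vec2 maskedLoad(const double *ptr, Mask2 mask, Vec2 passthru) {
  Vec2 result = passthru;
  for (std::size_t i = 0; i < result.size(); ++i)
    if (mask[i])
      result[i] = ptr[i];
  return result;
}

// Models the replacement: read every lane unconditionally, then select.
// Only valid when both lanes behind ptr are known to be readable.
Vec2 loadThenSelect(const double *ptr, Mask2 mask, Vec2 passthru) {
  Vec2 loaded = {ptr[0], ptr[1]}; // plays the role of "unmaskedload"
  Vec2 result{};
  for (std::size_t i = 0; i < result.size(); ++i)
    result[i] = mask[i] ? loaded[i] : passthru[i];
  return result;
}

// Returns true if both forms agree for every possible 2-lane mask.
bool agreeOnAllMasks(const double *ptr, Vec2 passthru) {
  assert(ptr != nullptr && "model requires a readable 2-element buffer");
  for (bool m0 : {false, true})
    for (bool m1 : {false, true}) {
      Mask2 mask = {m0, m1};
      if (maskedLoad(ptr, mask, passthru) != loadThenSelect(ptr, mask, passthru))
        return false;
    }
  return true;
}
```

Nothing in the model depends on how `ptr` is aligned, which mirrors the argument in the summary: the alignment on the new load comes from the intrinsic's own operand, so only readability of all lanes has to be proven.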

Event Timeline

spatel created this revision. · Sep 29 2020, 10:53 AM

Herald added a project: Restricted Project. · Sep 29 2020, 10:53 AM

Herald added subscribers: hiraditya, mcrosier.

spatel requested review of this revision. · Sep 29 2020, 10:53 AM

This seems obviously correct (C) to me, but I'm not very familiar with the semantics of masked intrinsics in LLVM, so I'll leave it for other reviewers.

LGTM

(I doubt this will come up often; most sources of "dereferenceable" also provide alignment. But it makes sense in any case.)

This revision is now accepted and ready to land. · Sep 29 2020, 11:17 AM

Closed by commit rG0527c8749b90: [InstCombine] ease alignment restriction for converting masked load to normal… (authored by spatel). · Sep 29 2020, 12:26 PM

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rG0527c8749b90: [InstCombine] ease alignment restriction for converting masked load to normal….

In D88505#2301482, @efriedma wrote:

LGTM

(I doubt this will come up often; most sources of "dereferenceable" also provide alignment. But it makes sense in any case.)

I missed posting this comment before commit:
The motivation comes from increased use of masked loads that would be produced by SLP with the current proposal in D57059. I suspect SLP could/should do a better job of producing canonical code, but it's a mess and relies on instcombine for cleanup.

Revision Contents

Path                                                     Size
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp     5 lines
llvm/test/Transforms/InstCombine/masked_intrinsics.ll    5 lines

Diff 295043

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

@@ ... @@ Value *InstCombinerImpl::simplifyMaskedLoad(IntrinsicInst &II) {
   // If the mask is all ones or undefs, this is a plain vector load of the 1st
   // argument.
   if (maskIsAllOneOrUndef(II.getArgOperand(2)))
     return Builder.CreateAlignedLoad(II.getType(), LoadPtr, Alignment,
                                      "unmaskedload");
 
   // If we can unconditionally load from this address, replace with a
   // load/select idiom. TODO: use DT for context sensitive query
-  if (isDereferenceableAndAlignedPointer(LoadPtr, II.getType(), Alignment,
-                                         II.getModule()->getDataLayout(), &II,
-                                         nullptr)) {
+  if (isDereferenceablePointer(LoadPtr, II.getType(),
+                               II.getModule()->getDataLayout(), &II, nullptr)) {
     Value *LI = Builder.CreateAlignedLoad(II.getType(), LoadPtr, Alignment,
                                           "unmaskedload");
     return Builder.CreateSelect(II.getArgOperand(2), LI, II.getArgOperand(3));
   }
 
   return nullptr;
 }
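
The decision that `simplifyMaskedLoad` makes after this change can be sketched as a small standalone model. This is a simplification under stated assumptions, not the LLVM implementation: `MaskedLoadFacts`, `Fold`, `FoldResult`, and `chooseFold` are hypothetical names, and the real `isDereferenceablePointer` query is reduced here to comparing a known dereferenceable byte count against the vector size.

```cpp
#include <cstdint>

// Hypothetical summary of the facts the fold inspects; not an LLVM type.
struct MaskedLoadFacts {
  bool maskAllOnesOrUndef;   // outcome of a maskIsAllOneOrUndef-style check
  std::uint64_t derefBytes;  // bytes known dereferenceable at the pointer
  std::uint64_t vectorBytes; // size in bytes of the loaded vector type
  std::uint64_t align;       // alignment operand of the masked load itself
};

enum class Fold { PlainLoad, LoadAndSelect, KeepMaskedLoad };

struct FoldResult {
  Fold kind;
  std::uint64_t loadAlign; // alignment placed on the new load; 0 if no load
};

FoldResult chooseFold(const MaskedLoadFacts &f) {
  // All lanes enabled (or undef): the masked load is just a vector load.
  if (f.maskAllOnesOrUndef)
    return {Fold::PlainLoad, f.align};
  // Only dereferenceability is required here; the alignment is copied from
  // the intrinsic rather than proven from the pointer (assumed simplification
  // of the dereferenceability query).
  if (f.derefBytes >= f.vectorBytes)
    return {Fold::LoadAndSelect, f.align};
  return {Fold::KeepMaskedLoad, 0};
}
```

For a `<2 x double>` load (16 bytes) with `dereferenceable(16)` and an intrinsic alignment of 4, as in `load_speculative_less_aligned`, the model picks `LoadAndSelect` with alignment 4. With only 8 bytes known dereferenceable it keeps the masked load.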

llvm/test/Transforms/InstCombine/masked_intrinsics.ll

@@ ... @@
 ;
   %res = call <2 x double> @llvm.masked.load.v2f64.p0v2f64(<2 x double>* %ptr, i32 4, <2 x i1> %mask, <2 x double> %ptv2)
   ret <2 x double> %res
 }
 
 define <2 x double> @load_speculative_less_aligned(<2 x double>* dereferenceable(16) %ptr, double %pt, <2 x i1> %mask) {
 ; CHECK-LABEL: @load_speculative_less_aligned(
 ; CHECK-NEXT:    [[PTV1:%.*]] = insertelement <2 x double> undef, double [[PT:%.*]], i64 0
 ; CHECK-NEXT:    [[PTV2:%.*]] = shufflevector <2 x double> [[PTV1]], <2 x double> undef, <2 x i32> zeroinitializer
-; CHECK-NEXT:    [[RES:%.*]] = call <2 x double> @llvm.masked.load.v2f64.p0v2f64(<2 x double>* nonnull [[PTR:%.*]], i32 4, <2 x i1> [[MASK:%.*]], <2 x double> [[PTV2]])
-; CHECK-NEXT:    ret <2 x double> [[RES]]
+; CHECK-NEXT:    [[UNMASKEDLOAD:%.*]] = load <2 x double>, <2 x double>* [[PTR:%.*]], align 4
+; CHECK-NEXT:    [[TMP1:%.*]] = select <2 x i1> [[MASK:%.*]], <2 x double> [[UNMASKEDLOAD]], <2 x double> [[PTV2]]
+; CHECK-NEXT:    ret <2 x double> [[TMP1]]
 ;
   %ptv1 = insertelement <2 x double> undef, double %pt, i64 0
   %ptv2 = insertelement <2 x double> %ptv1, double %pt, i64 1
   %res = call <2 x double> @llvm.masked.load.v2f64.p0v2f64(<2 x double>* %ptr, i32 4, <2 x i1> %mask, <2 x double> %ptv2)
   ret <2 x double> %res
 }
 
 ; Can't speculate since only half of required size is known deref