This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] simplify masked load intrinsics with constant masks
Closed, Public

Authored by spatel on Jan 28 2016, 12:45 PM.


Details

Reviewers

RKSimon
delena
igorb

Commits

rGb695c5557cc9: [InstCombine] simplify masked load intrinsics with all ones or zeros masks
rL259369: [InstCombine] simplify masked load intrinsics with all ones or zeros masks

Summary

A masked load with a zero mask means there's no load.
A masked load with an allOnes mask means it's a normal vector load.
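The two folds can be checked against a small scalar model of the intrinsic. This is only a sketch under stated assumptions: `maskedLoad` and `checkMaskedLoadFolds` are hypothetical names that do not appear in the patch, and the model treats the vector as a fixed-size array with one boolean per lane.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of llvm.masked.load for a fixed lane count N.
// Lanes whose mask bit is set read memory; the rest take the passthru value.
template <typename T, std::size_t N>
std::array<T, N> maskedLoad(const T *Ptr, const std::array<bool, N> &Mask,
                            const std::array<T, N> &PassThru) {
  std::array<T, N> Result = PassThru;
  for (std::size_t I = 0; I != N; ++I)
    if (Mask[I])
      Result[I] = Ptr[I]; // memory is read only for enabled lanes
  return Result;
}

// Checks both folds from the summary on a 2 x double vector.
inline void checkMaskedLoadFolds() {
  const double Memory[2] = {1.5, 2.5};
  const std::array<double, 2> PassThru = {-1.0, -2.0};

  // All-zeros mask: nothing is loaded, so the result is the passthru operand.
  assert((maskedLoad<double, 2>(Memory, {false, false}, PassThru) == PassThru));

  // All-ones mask: every lane is loaded, which matches a plain vector load.
  const std::array<double, 2> Loaded = {Memory[0], Memory[1]};
  assert((maskedLoad<double, 2>(Memory, {true, true}, PassThru) == Loaded));
}
```

Calling `checkMaskedLoadFolds()` exercises both cases. The model says nothing about alignment or faulting behavior, which the real intrinsic also has to respect.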

I think something similar may be happening in CodeGenPrepare with D13855, but it doesn't trigger for a target that actually supports these ops (an x86 AVX target for example). We may be able to remove some of that logic. Doing these transforms in InstCombine is a better solution because it will trigger sooner and allow more optimizations from other passes.

Eventually, I think we should be able to replace the x86 intrinsics with the llvm IR intrinsics.

Diff Detail

Repository: rL LLVM

Event Timeline

spatel updated this revision to Diff 46305. Jan 28 2016, 12:45 PM

spatel retitled this revision to [InstCombine] simplify masked load intrinsics with constant masks.

spatel updated this object.

spatel added reviewers: delena, igorb, RKSimon.

spatel added a subscriber: llvm-commits.

Herald added a subscriber: mcrosier. Jan 28 2016, 12:45 PM

Some minor thoughts - I'd like to see what others have to say as well though.

lib/Transforms/InstCombine/InstCombineCalls.cpp
690 ↗	(On Diff #46305)	Minor: You can relax the type requirements by using Constant::isNullValue() and Constant::isAllOnesValue(). But this is true for a lot of the code in InstCombineCalls.cpp...
708 ↗	(On Diff #46305)	I'm really not sure whether we should try to do this here or leave it until lowering. It's making a big assumption that the target is good at scalar loads + insertions.

spatel added inline comments. Jan 29 2016, 1:48 PM

lib/Transforms/InstCombine/InstCombineCalls.cpp
690 ↗	(On Diff #46305)	Yes, that makes the code simpler. I think we still want all of the asserts for sanity checking the intrinsic signature, but let me know if I'm missing anything.
708 ↗	(On Diff #46305)	I wasn't sure about that when I wrote it, so I threw it in to see what the consensus might be. But I agree with you now. I would only expect this intrinsic to be formed from a C intrinsic or the vectorizers, which implies that the masked op is supported by the target, but we shouldn't assume that other vector ops are supported equally well. It could also be wrong if someone's trying to minimize size (-Oz). Let's handle other constant values in the DAG.
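For reference, the single-set-bit expansion that was dropped from this patch can be sketched with a scalar model. The names `maskedLoadModel`, `scalarLoadAndInsert`, and `checkSingleLaneExpansion` are hypothetical and are not part of the patch. The sketch only shows that the two forms produce the same value; it says nothing about which one is cheaper on a given target, which is the concern raised above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec2 = std::array<double, 2>;
using Mask2 = std::array<bool, 2>;

// Hypothetical reference model: per-lane select between memory and passthru.
inline Vec2 maskedLoadModel(const double *Ptr, Mask2 Mask, Vec2 PassThru) {
  for (std::size_t I = 0; I != PassThru.size(); ++I)
    if (Mask[I])
      PassThru[I] = Ptr[I];
  return PassThru;
}

// The expansion under discussion: one scalar load plus one element insert.
inline Vec2 scalarLoadAndInsert(const double *Ptr, std::size_t Lane,
                                Vec2 PassThru) {
  double Scalar = Ptr[Lane]; // scalar load of the single enabled lane
  PassThru[Lane] = Scalar;   // insert it into the passthru vector
  return PassThru;
}

// Compares both forms for each single-set-bit mask of a 2-lane vector.
inline void checkSingleLaneExpansion() {
  const double Memory[2] = {3.0, 4.0};
  const Vec2 PassThru = {-1.0, -2.0};
  assert(maskedLoadModel(Memory, {false, true}, PassThru) ==
         scalarLoadAndInsert(Memory, 1, PassThru));
  assert(maskedLoadModel(Memory, {true, false}, PassThru) ==
         scalarLoadAndInsert(Memory, 0, PassThru));
}
```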

Patch updated based on feedback from Simon:

Cast the mask to plain 'Constant' for simpler predicates.
Remove the 'TODO' and associated test cases for constant masks with a single set bit.

delena added inline comments. Jan 30 2016, 11:38 PM

lib/Transforms/InstCombine/InstCombineCalls.cpp
678 ↗	(On Diff #46427)	The verifier checks the intrinsic signature, so you don't need these checks at all.
840 ↗	(On Diff #46427)	Not sure that you can handle masked gather/scatter this way.

Patch updated: remove assertions because these are or should be handled by the IR verifier.

spatel added inline comments. Jan 31 2016, 8:51 AM

lib/Transforms/InstCombine/InstCombineCalls.cpp
680 ↗	(On Diff #46493)	Thanks. I think the verifier has a hole - it doesn't confirm that the alignment operand is constant. But I've removed the assertions here because it does confirm the other things.
907 ↗	(On Diff #46493)	Please correct me if I've misunderstood, but I think scatter and gather each have one degenerate folding opportunity that we can handle here: A scatter with zero mask is a nop. A gather with zero mask will return the passthru arg.

LGTM

lib/Transforms/InstCombine/InstCombineCalls.cpp
907 ↗	(On Diff #46493)	Yes, for a zero mask it can be optimized.
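The zero-mask cases for gather and scatter agreed on here can be illustrated with the same kind of scalar model. This is a sketch only: `maskedGatherModel`, `maskedScatterModel`, and `checkZeroMaskGatherScatter` are hypothetical names, and these folds are not implemented in this patch (they remain under the TODO).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec2 = std::array<double, 2>;
using Mask2 = std::array<bool, 2>;
using Ptrs2 = std::array<double *, 2>;

// Hypothetical model of a masked gather: each enabled lane loads through its
// own pointer; disabled lanes keep the passthru value.
inline Vec2 maskedGatherModel(const Ptrs2 &Ptrs, Mask2 Mask, Vec2 PassThru) {
  for (std::size_t I = 0; I != Ptrs.size(); ++I)
    if (Mask[I])
      PassThru[I] = *Ptrs[I];
  return PassThru;
}

// Hypothetical model of a masked scatter: each enabled lane stores through
// its own pointer; disabled lanes write nothing.
inline void maskedScatterModel(const Vec2 &Value, const Ptrs2 &Ptrs,
                               Mask2 Mask) {
  for (std::size_t I = 0; I != Ptrs.size(); ++I)
    if (Mask[I])
      *Ptrs[I] = Value[I];
}

inline void checkZeroMaskGatherScatter() {
  double A = 1.0, B = 2.0;
  const Ptrs2 Ptrs = {&A, &B};
  const Vec2 PassThru = {-1.0, -2.0};
  const Mask2 ZeroMask = {false, false};

  // Gather with a zero mask returns the passthru argument unchanged.
  assert(maskedGatherModel(Ptrs, ZeroMask, PassThru) == PassThru);

  // Scatter with a zero mask is a nop: memory keeps its old contents.
  maskedScatterModel({9.0, 9.0}, Ptrs, ZeroMask);
  assert(A == 1.0 && B == 2.0);
}
```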

This revision is now accepted and ready to land. Feb 1 2016, 1:56 AM

Closed by commit rL259369: [InstCombine] simplify masked load intrinsics with all ones or zeros masks (authored by spatel). Feb 1 2016, 9:04 AM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D16828: [x86] convert masked store of one element to scalar store. Feb 2 2016, 3:28 PM

spatel mentioned this in rL260145: [x86] convert masked store of one element to scalar store. Feb 8 2016, 1:09 PM

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp	30 lines
llvm/trunk/test/Transforms/InstCombine/masked_intrinsics.ll	23 lines

Diff 46547

llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp

[747 lines not shown]
     if (C1 && C1->isInfinity()) {
       // fmax(x, inf) -> inf
       if (!C1->isNegative())
         return Arg1;
     }
   }
   return nullptr;
 }

+static Value *simplifyMaskedLoad(const IntrinsicInst &II,
+                                 InstCombiner::BuilderTy &Builder) {
+  auto *ConstMask = dyn_cast<Constant>(II.getArgOperand(2));
+  if (!ConstMask)
+    return nullptr;
+
+  // If the mask is all zeros, the "passthru" argument is the result.
+  if (ConstMask->isNullValue())
+    return II.getArgOperand(3);
+
+  // If the mask is all ones, this is a plain vector load of the 1st argument.
+  if (ConstMask->isAllOnesValue()) {
+    Value *LoadPtr = II.getArgOperand(0);
+    unsigned Alignment = cast<ConstantInt>(II.getArgOperand(1))->getZExtValue();
+    return Builder.CreateAlignedLoad(LoadPtr, Alignment, "unmaskedload");
+  }
+
+  return nullptr;
+}
+
 /// CallInst simplification. This mostly only handles folding of intrinsic
 /// instructions. For normal calls, it allows visitCallSite to do the heavy
 /// lifting.
 Instruction *InstCombiner::visitCallInst(CallInst &CI) {
   auto Args = CI.arg_operands();
   if (Value *V = SimplifyCall(CI.getCalledValue(), Args.begin(), Args.end(), DL,
                               TLI, DT, AC))
     return ReplaceInstUsesWith(CI, V);
[108 lines not shown]
   case Intrinsic::bitreverse: {
     Value *X = nullptr;

     // bitreverse(bitreverse(x)) -> x
     if (match(IIOperand, m_Intrinsic<Intrinsic::bitreverse>(m_Value(X))))
       return ReplaceInstUsesWith(CI, X);
     break;
   }

+  case Intrinsic::masked_load:
+    if (Value *SimplifiedMaskedOp = simplifyMaskedLoad(*II, *Builder))
+      return ReplaceInstUsesWith(CI, SimplifiedMaskedOp);
+    break;
+
+  // TODO: Handle the other masked ops.
+  // case Intrinsic::masked_store:
+  // case Intrinsic::masked_gather:
+  // case Intrinsic::masked_scatter:
+
   case Intrinsic::powi:
     if (ConstantInt *Power = dyn_cast<ConstantInt>(II->getArgOperand(1))) {
       // powi(x, 0) -> 1.0
       if (Power->isZero())
         return ReplaceInstUsesWith(CI, ConstantFP::get(CI.getType(), 1.0));
       // powi(x, 1) -> x
       if (Power->isOne())
         return ReplaceInstUsesWith(CI, II->getArgOperand(0));
[1,607 lines not shown]

llvm/trunk/test/Transforms/InstCombine/masked_intrinsics.ll

 ; RUN: opt -instcombine -S < %s | FileCheck %s

 declare <2 x double> @llvm.masked.load.v2f64(<2 x double>* %ptrs, i32, <2 x i1> %mask, <2 x double> %src0)

-; FIXME: All of these could be simplified.
-
 define <2 x double> @load_zeromask(<2 x double>* %ptr, <2 x double> %passthru) {
   %res = call <2 x double> @llvm.masked.load.v2f64(<2 x double>* %ptr, i32 1, <2 x i1> zeroinitializer, <2 x double> %passthru)
   ret <2 x double> %res

 ; CHECK-LABEL: @load_zeromask(
-; CHECK-NEXT: %res = call <2 x double> @llvm.masked.load.v2f64(<2 x double>* %ptr, i32 1, <2 x i1> zeroinitializer, <2 x double> %passthru)
-; CHECK-NEXT ret <2 x double> %res
+; CHECK-NEXT ret <2 x double> %passthru
 }

 define <2 x double> @load_onemask(<2 x double>* %ptr, <2 x double> %passthru) {
   %res = call <2 x double> @llvm.masked.load.v2f64(<2 x double>* %ptr, i32 2, <2 x i1> <i1 1, i1 1>, <2 x double> %passthru)
   ret <2 x double> %res

 ; CHECK-LABEL: @load_onemask(
-; CHECK-NEXT: %res = call <2 x double> @llvm.masked.load.v2f64(<2 x double>* %ptr, i32 2, <2 x i1> <i1 true, i1 true>, <2 x double> %passthru)
+; CHECK-NEXT: %unmaskedload = load <2 x double>, <2 x double>* %ptr, align 2
 ; CHECK-NEXT ret <2 x double> %res
 }
-
-define <2 x double> @load_onesetbitmask1(<2 x double>* %ptr, <2 x double> %passthru) {
-  %res = call <2 x double> @llvm.masked.load.v2f64(<2 x double>* %ptr, i32 3, <2 x i1> <i1 0, i1 1>, <2 x double> %passthru)
-  ret <2 x double> %res
-
-; CHECK-LABEL: @load_onesetbitmask1(
-; CHECK-NEXT: %res = call <2 x double> @llvm.masked.load.v2f64(<2 x double>* %ptr, i32 3, <2 x i1> <i1 false, i1 true>, <2 x double> %passthru)
-; CHECK-NEXT ret <2 x double> %res
-}
-
-define <2 x double> @load_onesetbitmask2(<2 x double>* %ptr, <2 x double> %passthru) {
-  %res = call <2 x double> @llvm.masked.load.v2f64(<2 x double>* %ptr, i32 4, <2 x i1> <i1 1, i1 0>, <2 x double> %passthru)
-  ret <2 x double> %res
-
-; CHECK-LABEL: @load_onesetbitmask2(
-; CHECK-NEXT: %res = call <2 x double> @llvm.masked.load.v2f64(<2 x double>* %ptr, i32 4, <2 x i1> <i1 true, i1 false>, <2 x double> %passthru)
-; CHECK-NEXT ret <2 x double> %res
-}