This is an archive of the discontinued LLVM Phabricator instance.

Convert a masked.load of a dereferenceable address to an unconditional load
Closed, Public

Authored by reames on Mar 22 2019, 9:56 AM.

Details

Reviewers

craig.topper
spatel

Commits

rG2ce017026af5: [InstCombine] Convert a masked.load of a dereferenceable address to an…
rL359000: [InstCombine] Convert a masked.load of a dereferenceable address to an…

Summary

If we have a masked.load from a location we know to be dereferenceable, we can simply issue a speculative unconditional load against that address. The key advantage is that this produces IR which is well understood by the optimizer. The resulting select(cond, load, passthrough) form should be pattern-matchable back to hardware predication if profitable.
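
As a rough illustration of why the rewrite described in the summary preserves behavior, here is a standalone C++ model (not LLVM code). `Vec`, `maskedLoad`, and `loadThenSelect` are hypothetical names introduced only for this sketch. It assumes every lane at the pointer is readable, which is the role the dereferenceable (and aligned) check plays in the patch.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model of a fixed-width vector; not an LLVM type.
template <typename T, std::size_t N>
using Vec = std::array<T, N>;

// Models llvm.masked.load: memory is read only for lanes whose mask bit is
// set; disabled lanes take the passthrough value.
template <typename T, std::size_t N>
Vec<T, N> maskedLoad(const T *Ptr, const Vec<bool, N> &Mask,
                     const Vec<T, N> &PassThru) {
  Vec<T, N> Res{};
  for (std::size_t I = 0; I < N; ++I)
    Res[I] = Mask[I] ? Ptr[I] : PassThru[I];
  return Res;
}

// Models the rewritten form: load every lane unconditionally, then select
// per lane. Assumption: all N elements at Ptr are readable, otherwise the
// unconditional read would be invalid.
template <typename T, std::size_t N>
Vec<T, N> loadThenSelect(const T *Ptr, const Vec<bool, N> &Mask,
                         const Vec<T, N> &PassThru) {
  Vec<T, N> Loaded{};
  for (std::size_t I = 0; I < N; ++I)
    Loaded[I] = Ptr[I]; // speculative: ignores the mask
  Vec<T, N> Res{};
  for (std::size_t I = 0; I < N; ++I)
    Res[I] = Mask[I] ? Loaded[I] : PassThru[I];
  return Res;
}
```

In this model the two functions return the same vector for any mask, as long as the whole range is readable. The only difference is that the second form touches memory for disabled lanes, which is why the dereferenceability precondition is needed.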

Event Timeline

reames created this revision. Mar 22 2019, 9:56 AM

Herald added subscribers: bollu, mcrosier. Mar 22 2019, 9:56 AM

missed a todo

ping?

I'm not really an expert in InstCombine but it looks good to me. Someone else should probably take a look as well. I added some minor comments though.

lib/Transforms/InstCombine/InstCombineCalls.cpp
1169	Isn't that implemented below (in this patch)?
1247	I would have preferred an actual type here (and in one or two other places below where it is not obvious). However, I don't know if this is the "standard" in this part of the code base.
test/Transforms/InstCombine/masked_intrinsics.ll
211	The zero here is the lane, correct? It looks "like a special case", even if it's not. Could we copy this test, change the width to 8 and gather lane 5? That should make sure it works "in general" with minimal effort on your part. Idk if there is, or if we need, a negative test as well, e.g., two active lanes in a constant mask.
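
The two test shapes requested above (a wider vector with a single active lane other than 0, and a constant mask with two active lanes as a negative case) can be illustrated with a small standalone C++ model. `singleActiveLane` is a hypothetical helper written for this sketch, not an LLVM function, and the mask is modeled as a plain `std::vector<bool>` rather than an IR constant.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the index of the only active lane in a
// constant mask, or -1 if the mask has zero or more than one active lane.
int singleActiveLane(const std::vector<bool> &Mask) {
  int Lane = -1;
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (!Mask[I])
      continue;
    if (Lane != -1)
      return -1; // a second active lane: not the single-lane case
    Lane = static_cast<int>(I);
  }
  return Lane;
}
```

With an 8-wide mask where only lane 5 is set, the helper reports lane 5. Setting a second lane makes it report -1, which corresponds to the negative test where the single-lane rewrite should not fire.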

These are 2 independent transforms, right? It would be better to split that into 2 smaller reviews/commits, so we are not missing anything (and easier to diagnose if there's trouble post-commit). The tests should show minimal IR patterns to exercise those 2 independent transforms.

In D59703#1473037, @spatel wrote:

These are 2 independent transforms, right? It would be better to split that into 2 smaller reviews/commits, so we are not missing anything (and easier to diagnose if there's trouble post-commit). The tests should show minimal IR patterns to exercise those 2 independent transforms.

You can view them that way, or not. I was looking at this as "what was needed to decompose a single element gather from a dereferenceable address", but I can see your point. I'll split.

lib/Transforms/InstCombine/InstCombineCalls.cpp
1169	Nope, that's the gather case. Analogous, but distinct intrinsics.
1247	I'm happy to change to explicit types, which did you find non-obvious?
test/Transforms/InstCombine/masked_intrinsics.ll
211	Will do.

reames mentioned this in rL358906: [Tests] Add a negative test for masked.gather part of D59703. Apr 22 2019, 11:26 AM

reames mentioned this in rG8f470890344f: [Tests] Add a negative test for masked.gather part of D59703. Apr 22 2019, 11:29 AM

Split the patch. This review is now only the masked.load part.

reames mentioned this in rL358907: [Tests] Revise a test as requested by reviewer in D59703. Apr 22 2019, 11:52 AM

reames mentioned this in rGf01583d09751: [Tests] Revise a test as requested by reviewer in D59703.

The masked.gather part is split to D60975.

LGTM. This canonicalization is the reverse of the typical direction (shorter code is usually preferred), but the reasoning in the summary makes sense: we have lots of IR optimizer logic for selects, and this should be reversible in the backend.

lib/Transforms/InstCombine/InstCombineCalls.cpp
1183	typo: sensative
1187	I think LLVM standards prefer to make this an explicit "Value *" rather than "auto *": http://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable

This revision is now accepted and ready to land. Apr 22 2019, 2:13 PM

Closed by commit rL359000: [InstCombine] Convert a masked.load of a dereferenceable address to an… (authored by reames). Apr 23 2019, 8:25 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Apr 23 2019, 8:25 AM

Revision Contents

Path	Size
lib/Transforms/InstCombine/InstCombineCalls.cpp	21 lines
test/Transforms/InstCombine/masked_intrinsics.ll	5 lines

Diff 196105

lib/Transforms/InstCombine/InstCombineCalls.cpp

#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/None.h"		#include "llvm/ADT/None.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
		#include "llvm/Analysis/Loads.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/MemoryBuiltins.h"		#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Analysis/VectorUtils.h"		#include "llvm/Analysis/VectorUtils.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
[...]
static APInt possiblyDemandedEltsInMask(Value *Mask) {
[...]
if (auto *CV = dyn_cast<ConstantVector>(Mask))		if (auto *CV = dyn_cast<ConstantVector>(Mask))
for (unsigned i = 0; i < VWidth; i++)		for (unsigned i = 0; i < VWidth; i++)
if (CV->getAggregateElement(i)->isNullValue())		if (CV->getAggregateElement(i)->isNullValue())
DemandedElts.clearBit(i);		DemandedElts.clearBit(i);
return DemandedElts;		return DemandedElts;
}		}

// TODO, Obvious Missing Transforms:		// TODO, Obvious Missing Transforms:
// * Dereferenceable address -> speculative load/select		// * Single active lane to scalar masked load
		Inline comment (jdoerfert): Isn't that implemented below (in this patch)?
		Inline comment (reames): Nope, that's the gather case. Analogous, but distinct intrinsics.
// * Narrow width by halfs excluding zero/undef lanes		// * Narrow width by halfs excluding zero/undef lanes
static Value *simplifyMaskedLoad(const IntrinsicInst &II,		static Value *simplifyMaskedLoad(const IntrinsicInst &II,
InstCombiner::BuilderTy &Builder) {		InstCombiner::BuilderTy &Builder) {
// If the mask is all ones or undefs, this is a plain vector load of the 1st
// argument.
if (maskIsAllOneOrUndef(II.getArgOperand(2))) {
Value *LoadPtr = II.getArgOperand(0);		Value *LoadPtr = II.getArgOperand(0);
unsigned Alignment = cast<ConstantInt>(II.getArgOperand(1))->getZExtValue();		unsigned Alignment = cast<ConstantInt>(II.getArgOperand(1))->getZExtValue();

		// If the mask is all ones or undefs, this is a plain vector load of the 1st
		// argument.
		if (maskIsAllOneOrUndef(II.getArgOperand(2)))
return Builder.CreateAlignedLoad(II.getType(), LoadPtr, Alignment,		return Builder.CreateAlignedLoad(II.getType(), LoadPtr, Alignment,
"unmaskedload");		"unmaskedload");

		// If we can unconditionally load from this address, replace with a
		// load/select idiom. TODO: use DT for context sensative query
		Inline comment (spatel): typo: sensative
		if (isDereferenceableAndAlignedPointer(LoadPtr, Alignment,
		II.getModule()->getDataLayout(),
		&II, nullptr)) {
		auto *LI = Builder.CreateAlignedLoad(II.getType(), LoadPtr, Alignment,
		"unmaskedload");
		Inline comment (spatel): I think LLVM standards prefer to make this an explicit "Value *" rather than "auto *": http://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable
		return Builder.CreateSelect(II.getArgOperand(2), LI, II.getArgOperand(3));
}		}

return nullptr;		return nullptr;
}		}

// TODO, Obvious Missing Transforms:		// TODO, Obvious Missing Transforms:
// * Single constant active lane -> store		// * Single constant active lane -> store
// * Narrow width by halfs excluding zero/undef lanes		// * Narrow width by halfs excluding zero/undef lanes
Instruction *InstCombiner::simplifyMaskedStore(IntrinsicInst &II) {		Instruction *InstCombiner::simplifyMaskedStore(IntrinsicInst &II) {
auto *ConstMask = dyn_cast<Constant>(II.getArgOperand(3));		auto *ConstMask = dyn_cast<Constant>(II.getArgOperand(3));
[...]
static Instruction *simplifyMaskedGather(IntrinsicInst &II, InstCombiner &IC) {
[...]
return nullptr;		return nullptr;
}		}

// TODO, Obvious Missing Transforms:		// TODO, Obvious Missing Transforms:
// * Single constant active lane -> store		// * Single constant active lane -> store
// * Adjacent vector addresses -> masked.store		// * Adjacent vector addresses -> masked.store
// * Narrow store width by halfs excluding zero/undef lanes		// * Narrow store width by halfs excluding zero/undef lanes
// * Vector splat address w/known mask -> scalar store		// * Vector splat address w/known mask -> scalar store
// * Vector incrementing address -> vector masked store		// * Vector incrementing address -> vector masked store
		Inline comment (jdoerfert): I would have preferred an actual type here (and in one or two other places below where it is not obvious). However, I don't know if this is the "standard" in this part of the code base.
		Inline comment (reames): I'm happy to change to explicit types, which did you find non-obvious?
Instruction *InstCombiner::simplifyMaskedScatter(IntrinsicInst &II) {		Instruction *InstCombiner::simplifyMaskedScatter(IntrinsicInst &II) {
auto *ConstMask = dyn_cast<Constant>(II.getArgOperand(3));		auto *ConstMask = dyn_cast<Constant>(II.getArgOperand(3));
if (!ConstMask)		if (!ConstMask)
return nullptr;		return nullptr;

// If the mask is all zeros, a scatter does nothing.		// If the mask is all zeros, a scatter does nothing.
if (ConstMask->isNullValue())		if (ConstMask->isNullValue())
return eraseInstFromFunction(II);		return eraseInstFromFunction(II);

test/Transforms/InstCombine/masked_intrinsics.ll

[...]
%res = call <2 x double> @llvm.masked.load.v2f64.p0v2f64(<2 x double>* %ptr, i32 4, <2 x i1> %mask, <2 x double> %ptv2)		%res = call <2 x double> @llvm.masked.load.v2f64.p0v2f64(<2 x double>* %ptr, i32 4, <2 x i1> %mask, <2 x double> %ptv2)
ret <2 x double> %res		ret <2 x double> %res
}		}

define <2 x double> @load_speculative(<2 x double>* dereferenceable(16) %ptr,		define <2 x double> @load_speculative(<2 x double>* dereferenceable(16) %ptr,
; CHECK-LABEL: @load_speculative(		; CHECK-LABEL: @load_speculative(
; CHECK-NEXT: [[PTV1:%.*]] = insertelement <2 x double> undef, double [[PT:%.*]], i64 0		; CHECK-NEXT: [[PTV1:%.*]] = insertelement <2 x double> undef, double [[PT:%.*]], i64 0
; CHECK-NEXT: [[PTV2:%.*]] = shufflevector <2 x double> [[PTV1]], <2 x double> undef, <2 x i32> zeroinitializer		; CHECK-NEXT: [[PTV2:%.*]] = shufflevector <2 x double> [[PTV1]], <2 x double> undef, <2 x i32> zeroinitializer
; CHECK-NEXT: [[RES:%.*]] = call <2 x double> @llvm.masked.load.v2f64.p0v2f64(<2 x double>* nonnull [[PTR:%.*]], i32 4, <2 x i1> [[MASK:%.*]], <2 x double> [[PTV2]])		; CHECK-NEXT: [[UNMASKEDLOAD:%.*]] = load <2 x double>, <2 x double>* [[PTR:%.*]], align 4
; CHECK-NEXT: ret <2 x double> [[RES]]		; CHECK-NEXT: [[TMP1:%.*]] = select <2 x i1> [[MASK:%.*]], <2 x double> [[UNMASKEDLOAD]], <2 x double> [[PTV2]]
		; CHECK-NEXT: ret <2 x double> [[TMP1]]
;		;
double %pt, <2 x i1> %mask) {		double %pt, <2 x i1> %mask) {
%ptv1 = insertelement <2 x double> undef, double %pt, i64 0		%ptv1 = insertelement <2 x double> undef, double %pt, i64 0
%ptv2 = insertelement <2 x double> %ptv1, double %pt, i64 1		%ptv2 = insertelement <2 x double> %ptv1, double %pt, i64 1
%res = call <2 x double> @llvm.masked.load.v2f64.p0v2f64(<2 x double>* %ptr, i32 4, <2 x i1> %mask, <2 x double> %ptv2)		%res = call <2 x double> @llvm.masked.load.v2f64.p0v2f64(<2 x double>* %ptr, i32 4, <2 x i1> %mask, <2 x double> %ptv2)
ret <2 x double> %res		ret <2 x double> %res
}		}

[...]
}		}

define <2 x double> @gather_lane0(double* %base, double %pt) {		define <2 x double> @gather_lane0(double* %base, double %pt) {
; CHECK-LABEL: @gather_lane0(		; CHECK-LABEL: @gather_lane0(
; CHECK-NEXT: [[PTRS:%.*]] = getelementptr double, double* [[BASE:%.*]], <2 x i64> <i64 0, i64 undef>		; CHECK-NEXT: [[PTRS:%.*]] = getelementptr double, double* [[BASE:%.*]], <2 x i64> <i64 0, i64 undef>
; CHECK-NEXT: [[PT_V2:%.*]] = insertelement <2 x double> undef, double [[PT:%.*]], i64 1		; CHECK-NEXT: [[PT_V2:%.*]] = insertelement <2 x double> undef, double [[PT:%.*]], i64 1
; CHECK-NEXT: [[RES:%.*]] = call <2 x double> @llvm.masked.gather.v2f64.v2p0f64(<2 x double*> [[PTRS]], i32 4, <2 x i1> <i1 true, i1 false>, <2 x double> [[PT_V2]])		; CHECK-NEXT: [[RES:%.*]] = call <2 x double> @llvm.masked.gather.v2f64.v2p0f64(<2 x double*> [[PTRS]], i32 4, <2 x i1> <i1 true, i1 false>, <2 x double> [[PT_V2]])
; CHECK-NEXT: ret <2 x double> [[RES]]		; CHECK-NEXT: ret <2 x double> [[RES]]
;		;
		Inline comment (jdoerfert): The zero here is the lane, correct? It looks "like a special case", even if it's not. Could we copy this test, change the width to 8 and gather lane 5? That should make sure it works "in general" with minimal effort on your part. Idk if there is, or if we need, a negative test as well, e.g., two active lanes in a constant mask.
		Inline comment (reames): Will do.
%ptrs = getelementptr double, double *%base, <2 x i64> <i64 0, i64 1>		%ptrs = getelementptr double, double *%base, <2 x i64> <i64 0, i64 1>
%pt_v1 = insertelement <2 x double> undef, double %pt, i64 0		%pt_v1 = insertelement <2 x double> undef, double %pt, i64 0
%pt_v2 = insertelement <2 x double> %pt_v1, double %pt, i64 1		%pt_v2 = insertelement <2 x double> %pt_v1, double %pt, i64 1
%res = call <2 x double> @llvm.masked.gather.v2f64.v2p0f64(<2 x double*> %ptrs, i32 4, <2 x i1> <i1 true, i1 false>, <2 x double> %pt_v2)		%res = call <2 x double> @llvm.masked.gather.v2f64.v2p0f64(<2 x double*> %ptrs, i32 4, <2 x i1> <i1 true, i1 false>, <2 x double> %pt_v2)
ret <2 x double> %res		ret <2 x double> %res
}		}

define <2 x double> @gather_lane0_maybe(double* %base, double %pt,		define <2 x double> @gather_lane0_maybe(double* %base, double %pt,